
Monday, October 24, 2011

AWS tricks #1 -- How to get rid of terminated volumes

Introduction

I spent this weekend figuring out how to install ODI 11.1.1.5 using SQL Server 2008, not Oracle 11g, as the database back end, and I now know two things:

1) I really ought to be doing this on Oracle instead of SQL Server as all of the blogs seem to cover the settings for Oracle. I find it interesting that the 10.1.3.x blog posts seemed to be oriented around SQL Server and 11.1.1.x around Oracle. However, as my environment was SQL Server, not Oracle, I was stuck with figuring it out.
2) I am not very good at installations. This may not come as a shock to anyone, especially me. :)

In an upcoming blog post, once I am convinced that I have everything working tickety-boo, I will document the settings and required downloads to make ODI and ODI Studio work in a 64-bit Windows server context.

Putting aside the additional grey hairs and general agita installing ODI caused me, this exercise got me back into the Amazon Web Services (AWS) cloud and that was a life-saver. Just like a virtual machine, I could spin up as many instances of the server as I needed to blunder my way through the install and could snapshot it when I had something halfway working. Unlike a virtual machine running on my laptop, I had 17+ gigabytes of RAM, super-fast networking (you haven’t lived till you switch from DSL to whatever AWS runs – I was getting 39 megabit per second downloads from Oracle’s website), and real server power.

However, as I was looking at my AWS account, I saw that I had a bunch of AMIs, snapshots, and volumes that I couldn’t really account for. Oh, I created them all right, but I hadn’t used them in ages and I was getting billed each month for them. A mass culling ensued but it occurred to me that this is a good time for some basic definitions.

AWS definitions

One of the (many) things that confuse me about AWS is the concept of “volumes”. Simply put, a volume is a hard drive.

Then there are these things called “snapshots”. Snapshots in AWS’ Elastic Block Store (EBS) world are just what they sound like – point-in-time copies of volumes that you can restore from at any time.

It is very important to note that the hard drive of an Amazon Machine Image (AMI) (a predefined server you can start) is ephemeral, i.e., your laptop’s hard drive this is not – when the instance is killed, the hard drive is gone. It’s very easy to blow away all of your work if you don’t understand how AWS treats volumes.

This post will cover what a volume is, and how, when, and why you would get rid of a volume (you get charged for each one) as they have a way of piling up when least expected.

What’s ephemeral, and what’s not

A really good explanation of AWS’ drive ephemerality can be found here: http://shlomoswidler.com/2009/07/ec2-instance-life-cycle.html. If you cannot be bothered to jump to a well written blog, the gist is that EBS-backed instances have volumes that persist as long as the instance is not terminated.

And how does that affect you?

Well, when you stop an instance (stopping an instance is akin to powering down a server), the volume is still extant. It makes sense (to me at least) that a volume hangs around when a server instance is stopped.

Terminating an instance is different. It is as if you shut down the server, ripped it out of your data center, and naively took it to a recycling center which then had the hardware shipped tout de suite to Liberia for completely unsafe disposal. Yet when you terminate an instance, there is a chance the volume will still hang around. But why is it still there when an instance is terminated? Doesn’t the destruction of the instance sort of imply that the hard drive volume should be trashed as well?

Remember, if you started your stopped instance back up, whatever changes you made to your boot drive are still there.

But what about the volume that is the product of a terminated instance? If you reattached it to a new instance, are your changes there? Nope, all of the changes you made are gone.

An example

Let’s pretend that you fired up John Booth’s EPM 11.1.2.1 AMI (go to http://www.metavero.com), did cool stuff, and then terminated the instance. Guess what – that C: drive is still around, and you’re getting billed for it.

Not deleted till it’s detached

Here’s what the web console to my AWS account looks like with four volumes attached to stopped instances and one 100 gigabyte volume from a terminated server instance.
What is that volume doing there? Did I really terminate that instance?

As I wrote, I’m being charged (not very much, but still) for storing that 100 gigabytes. I want to make that unloved hard drive go away.

It’s dead, Jim

That instance is deader than dead. That drive volume is in AWS purgatory.

What’s going on?

That volume is going to hang around till you delete it.

Why? A little searching tells us that the AMI must be set up to delete the volume on termination. If it’s not, then the volume must be manually deleted.

NB – A future blog post on launching an AMI from a command prompt will explain how to launch an AMI with that parameter; if the AMI has been set up to not delete the volume on termination, you must manually delete the volume as shown below.
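To give a rough idea of what "set up to delete the volume on termination" looks like as a setting, here is a minimal sketch that builds the block device mapping carrying that flag. It assumes the request shape used by the Python boto3 SDK (a `BlockDeviceMappings` list with an `Ebs.DeleteOnTermination` key), which is not the tooling used in this post. The helper name and the device name `/dev/sda1` are placeholders for illustration.

```python
def delete_on_termination_mapping(device_name, delete=True):
    """Build one block device mapping entry that sets DeleteOnTermination.

    The dict shape follows what boto3's EC2 client accepts for its
    BlockDeviceMappings parameter (an assumption; check your SDK version).
    """
    return {
        "DeviceName": device_name,
        "Ebs": {"DeleteOnTermination": bool(delete)},
    }


# Hypothetical root device name; the real one depends on the AMI.
mapping = delete_on_termination_mapping("/dev/sda1")
print(mapping)
```

With boto3 installed and credentials configured, a mapping like this could presumably be passed as `BlockDeviceMappings=[mapping]` to `run_instances` at launch, or to `modify_instance_attribute` for an instance that already exists.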

How to delete a volume

Luckily, this is an easy process. Simply right click on the available volume and select “Delete Volume”.

You get one more chance to change your mind.

AWS will take a short time to delete the volume:

If you get impatient (it can take a while to delete a volume and the bigger the virtual drive, the slower the delete), you can click on the Refresh button to see the current status.

Until the volume is finally gone.
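If clicking through the console gets tedious, the same check can be sketched in code. The example below is only an illustration: it assumes the response shape returned by boto3's `describe_volumes` call (a `Volumes` list whose entries have `VolumeId`, `State`, `Size`, and `Attachments`), and the sample data is made up. It makes no network calls; it just picks out volumes in the "available" state, which are the ones left behind by terminated instances.

```python
def find_orphaned_volumes(describe_volumes_response):
    """Return volumes that are not attached to any instance."""
    return [
        vol
        for vol in describe_volumes_response.get("Volumes", [])
        if vol.get("State") == "available" and not vol.get("Attachments")
    ]


def total_size_gib(volumes):
    """Sum the sizes (in GiB) of the given volumes."""
    return sum(vol.get("Size", 0) for vol in volumes)


# Made-up sample in the assumed shape of a describe_volumes response.
sample = {
    "Volumes": [
        {"VolumeId": "vol-aaaa1111", "State": "in-use", "Size": 30,
         "Attachments": [{"InstanceId": "i-bbbb2222"}]},
        {"VolumeId": "vol-cccc3333", "State": "available", "Size": 100,
         "Attachments": []},
    ]
}

orphans = find_orphaned_volumes(sample)
for vol in orphans:
    print(vol["VolumeId"], vol["Size"], "GiB")
print("Stored but unused:", total_size_gib(orphans), "GiB")
```

Against a real account, the response would come from `boto3.client("ec2").describe_volumes()`, and each orphan could then be removed with `delete_volume(VolumeId=...)`. Check the list by eye before deleting anything, as there is no undo.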

Conclusion

If the AMI is not set up to delete the volume on termination, you must do so manually, or modify the AMI via the command line interface to delete upon termination.

Set up your AMIs to delete volumes on termination or remember to check and delete available volumes – remember, you are being charged to store drives you no longer use.

Thursday, October 21, 2010

Who will rid me of this turbulent bug

It’s déjà vu all over again

Yep, life is sometimes like a Yogi Berra saying. That’s scary.

I just rolled off a Planning migration from h-e-double-toothpicks. I am reminded, again, that I am an applications, not an infrastructure, consultant. For some strange reason, I seem to enjoy parading my serial infrastructure incompetence to all and sundry via this blog. Dirty Harry said it best. I am embracing my limitations with renewed fervor.

My pain=your gain

In an effort to ensure that this particular problem doesn’t bite you, oh applications consultant/administrator reader, in the unmentionables, think back, far back to the long-ago days of Planning 2.2. Was there a release with that number? Oh yes, and even before that. I have been around Planning a long time. So why have I learnt so little?

Moving past questions that cannot be answered (or at least questions that have answers I do not want to hear), there was a problem in older releases of Planning – ephemeral port consumption. No, that is not a Victorian-era disease that involves sanitariums and bloody coughs.

Why do you care and what are they?

The issue is that when Planning refreshes filters, it consumes ephemeral ports during its communication with Essbase. When the OS runs out of ports, Planning filter refreshes fail.
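A back-of-the-envelope model may help show why a bigger user list tips the refresh over. This is a simplified sketch, not a description of how Planning or Essbase actually open connections: it assumes each short-lived connection ties up one ephemeral port for the full TIME_WAIT delay, that ports start at 1025, and that older Windows releases default to a top port of 5000 and a 240 second delay. All of those numbers are assumptions to verify on your own server.

```python
def usable_ports(max_user_port, first_port=1025):
    """Count ephemeral ports from first_port to max_user_port inclusive."""
    return max(0, max_user_port - first_port + 1)


def seconds_until_exhaustion(max_user_port, timed_wait_delay_s,
                             connections_per_second, first_port=1025):
    """Return seconds until the ports run out, or None if they never do.

    In steady state, rate * delay ports are tied up at once. If that
    fits in the pool, ports are recycled fast enough. If it does not,
    the pool drains before the first port is released.
    """
    ports = usable_ports(max_user_port, first_port)
    if connections_per_second * timed_wait_delay_s <= ports:
        return None
    return ports / connections_per_second


# Assumed defaults for older Windows releases (verify on your server).
DEFAULT_MAX_USER_PORT = 5000
DEFAULT_TIMED_WAIT_DELAY_S = 240

# Hypothetical connection rates, purely for illustration.
for rate in (10, 100):
    result = seconds_until_exhaustion(
        DEFAULT_MAX_USER_PORT, DEFAULT_TIMED_WAIT_DELAY_S, rate)
    print(rate, "connections/s ->", result)
```

The model also suggests why refreshing the failed usernames one at a time works: a single refresh never opens connections fast enough to drain the pool.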

What does it look like?

The symptoms

What should have tipped me off as to the error was that with 100 users in the app (I got pretty darn good with the Planning importsecurity.cmd/exportsecurity.cmd utilities) the refresh would work. The fact that the command line syntax for invoking the import and export utilities is completely different was just a dollop of Hyperion icing on the misery cake.

Getting back to what worked and didn’t, the filter refresh would work with 300 users in the app.

As the number of usernames increased (I was slowly adding known good MSAD usernames) to just over 600, at some point (and no, I never did get to the actual count that just tripped failure as I was adding in groups of 50) Planning would fail on the refresh.

I (and quite a few others) spent a lot of time trying to figure out if the MSAD ids were “bad” (some were and “bad” in MSAD means a bunch of different things, e.g., corrupted, locked out, etc.). But that wasn’t the issue.

Should have paid attention, but didn’t

What really threw me is that as I did the refresh, I’d get a pretty consistent list of failed usernames. However, when I selected those usernames individually, their refresh would work. Huh? Also, these same ids worked in other Planning apps. Huh, again.

And the answer(s) are

I would love to tell you that I came up with the diagnosis and the cure to this filter refresh failure, especially because I suffered through this in 2002, but I must give credit where it is due – say hello to Jason D’Onofrio who went into Metalink and started searching for an answer. Why would anyone want to search the help? If you don’t fancy my preferred diagnostic method of blindly poking around you too can search Metalink for knowledge base article 826673.1.

And the thing of it is, Tim Tow has documented this error and its fix for, oh, forever, maybe? A long time certainly.

If you can’t be bothered to read any of the explanations, here’s the quick and dirty Windows fix (the same issue affects *nix, but not very much and while the concept applies to that OS, the mechanics below do not):

1) Go into Windows Registry editor on the Essbase server.

2) Navigate to the following key: HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

3) Right click and select New or Edit->New and then select DWORD Value.

4) The name should be MaxUserPort.

5) Right click on the new “MaxUserPort” and edit the DWORD value. Enter a decimal value of 65534. You have just increased the number of ephemeral ports to their maximum value.

6) Again create a new DWORD Value. Call it TcpTimedWaitDelay. Set it to a decimal value of 30. You have now decreased the time Windows takes to release a port to its minimum.

Your registry settings should look like this when you’re done.

7) Reboot the Essbase box after stopping your various services – you know the boot order.

8) After starting the Oracle EPM services back up, try doing a refresh. You should have bottled magic at this point.

NB – The Metalink instructions go on about adding MaxFreeTcbs and setting it to a decimal value of 6250. That wasn’t necessary in my case.
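If you would rather not click through the registry editor, the two values from the steps above can be written to a .reg file and imported. The sketch below only generates the file text; the helper name is made up, and I have not run the output against a live Essbase server, so treat it as a starting point and back up the registry first. Note that .reg files store DWORDs in hexadecimal, so 65534 appears as 0000fffe and 30 as 0000001e.

```python
REG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_reg_file(settings, key=REG_KEY):
    """Return .reg file text that sets each name to a DWORD value."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in settings.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD: {value}")
        # DWORD values are written as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


# The decimal values from the manual steps.
reg_text = build_reg_file({"MaxUserPort": 65534, "TcpTimedWaitDelay": 30})
print(reg_text)
```

Saving `reg_text` to a file such as `ports.reg` (a hypothetical name) and importing it on the Essbase server should have the same effect as steps 1 through 6. The reboot in step 7 is still required.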

Why might you not see an error?

Maybe the registry settings are already there and you don’t know it.

Maybe you have small user communities and you never blow through Windows’ ephemeral ports.

Maybe you just can’t believe that this issue exists almost a decade after Hyperion Planning 1.0 was released on an unsuspecting world.

Maybe you’re on 11.1.2 and are using Windows 2008 which has a larger ephemeral port range. Yes, despite Essbase.sec’s almost complete emasculation in this release, filters are still stored in good old Essbase.sec.

Maybe you’re running some version of *nix.

Maybe you’re just lucky. :)

Phew, this is a problem I never want to revisit. Thanks again, Jason, for finding the answer.
