Virtualization - Cloud Design Phase 1 - Sizing

Follow my journey as I build a large-scale virtualization cloud.  One day recently, the boss told me that the company was ready for "cloud computing".  They had been looking at commercial cloud offerings from the big players like IBM and Amazon, but those didn't seem to offer the features they wanted.  They were finally ready to build a private cloud, in house.  I've been waiting for this for a long time.  I'd been supporting small islands of virtualization for the company for years, while supporting thousands of physical servers at the same time.

The process of delivering new physical servers had become so slow and painful that it took three to six months to get a server into production.  We had built a virtual lab environment to support about 40 VMs, which quickly became very popular because we could deploy a new server in minutes instead of months.  It was only a matter of time before we would finally build a full-scale virtualization platform for production.  Well, the time is now.  We're doing it.

To get started, we sat down with the inventory of servers, which contains the server names, operating system versions, and the primary role or application installed.  I guess that's useful information, but what we really needed to know was how big our servers are.  How many CPU cores are out there?  How much RAM?  How much data?  To build a virtual platform that will house all of the workloads running on our physical servers, we really need to determine how much disk space is in use, how much RAM is in use, and how much processing power is in use.  Time for some scripts.  You didn't think we'd get through this article without a Perl script or two, did you?

Collecting CPU, RAM, and disk information is not only helpful in sizing the cloud, it also gives a dramatic view of how much waste there is in your physical server environment.  I found that about two thirds of the RAM and disk space on my physical servers was unused.  What a waste of money.  And yet we still have capacity problems on many critical servers, with all that wasted disk space lying around.

Sizing the Storage Requirement
This first script takes a list of servers from a text file, uses WMI queries to get the total disk size, space used, and space free (in gigabytes) for each server, and writes the results to an output file.  You can then open the file in Excel and have a look at the results.

use strict;
use warnings;
use Win32::OLE qw(in);    # import in() so we can iterate WMI collections

my $HARD_DISK = 3;        # Win32_LogicalDisk DriveType 3 = local fixed disk
my $GB = 1073741824;      # bytes per gigabyte

open SERVERS, "servers.txt" or die "Cannot open servers.txt: $!";
open OUT, ">disk_space_report.txt" or die "Cannot open disk_space_report.txt: $!";
while(<SERVERS>){
 chomp;
 s/\s+//g;                # strip any whitespace
 next unless length;      # skip blank lines
 my $server = "\U$_";     # report the name in upper case
 my $serverDiskAllocated = 0;
 my $serverDiskUsed = 0;
 my $serverDiskFree = 0;
 # Connect to the remote server's WMI service
 if(my $objWMIService = Win32::OLE->GetObject("winmgmts:{impersonationLevel=impersonate}!\\\\$_\\root\\cimv2")){
  my $colDisks = $objWMIService->ExecQuery("Select * from Win32_LogicalDisk Where DriveType = $HARD_DISK");
  # Add up allocated, used, and free space across all local disks
  foreach my $objDisk (in $colDisks){
   my $diskAllocated = $objDisk->{Size};
   my $diskFree = $objDisk->{FreeSpace};
   my $diskUsed = $diskAllocated - $diskFree;
   $serverDiskAllocated += $diskAllocated;
   $serverDiskUsed += $diskUsed;
   $serverDiskFree += $diskFree;
  }
 }else{
  warn "Could not connect to WMI on $server\n";
 }
 # Tab-separated output in GB: server, allocated, used, free
 my $line = "$server\t".sprintf("%.1f",($serverDiskAllocated/$GB))."\t".sprintf("%.1f",($serverDiskUsed/$GB))."\t".sprintf("%.1f",($serverDiskFree/$GB))."\n";
 print $line;
 print OUT $line;
}
close SERVERS;
close OUT;

If you don't already have a list of servers, you can build one by searching Active Directory; a quick sketch of that follows below.  After running our disk space script, I found that we had an absolutely enormous amount of wasted space.  I mean on the order of hundreds of terabytes.  Sickening.  For me, this was the determining factor in my choice to go with thin provisioning.  I would normally shy away from it due to the potential pitfalls, but if it will save me hundreds of terabytes, I'm sold.
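
If you want to go that route, here's one possible way to generate servers.txt with an ADSI query through ADO and Win32::OLE.  This is only a minimal sketch: the LDAP path DC=mycompany,DC=com is a placeholder for your own domain, the operatingSystem filter assumes you only want machines running a Windows Server OS, and very large domains may need paged queries.

use strict;
use warnings;
use Win32::OLE;

# Query Active Directory (via the ADSI provider for ADO) for computers running a server OS.
# DC=mycompany,DC=com is a placeholder - substitute your own domain's LDAP path.
my $conn = Win32::OLE->new('ADODB.Connection') or die "Cannot create ADO connection\n";
$conn->{Provider} = 'ADsDSOObject';
$conn->Open('Active Directory Provider');

my $rs = $conn->Execute(
 '<LDAP://DC=mycompany,DC=com>;(&(objectCategory=computer)(operatingSystem=*Server*));name;subtree'
);

open OUT, ">servers.txt" or die "Cannot write servers.txt: $!";
until($rs->{EOF}){
 print OUT $rs->Fields('name')->{Value}, "\n";   # one server name per line
 $rs->MoveNext;
}
close OUT;
$conn->Close;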

So, with thin provisioning, I can total up the disk space actually in use, add some percentage for growth, and end up with my storage sizing target for the new cloud.  Wow, that was one heck of a Perl script.  I was never much of a WMI fan, but it came through this time.
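
As a rough sketch of that arithmetic, the snippet below totals the "used" column from disk_space_report.txt and adds a growth margin.  The 25% growth figure is just an assumption for illustration; use whatever your own projections call for.

use strict;
use warnings;

# Sum the "used" column (third field) of disk_space_report.txt and add growth headroom.
# The 25% growth margin is only an assumption for illustration.
my $growth = 0.25;
my $total_used_gb = 0;

open IN, "disk_space_report.txt" or die "Cannot open disk_space_report.txt: $!";
while(<IN>){
 chomp;
 my ($server, $allocated, $used, $free) = split /\t/;
 $total_used_gb += $used if defined $used;
}
close IN;

printf "Disk space in use: %.1f GB\n", $total_used_gb;
printf "Storage sizing target with %d%% growth: %.1f GB\n", $growth*100, $total_used_gb*(1+$growth);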

CPU and Memory Requirements
Next, I needed the number of CPU cores and amount of RAM in use on the physical servers.  Again, I turned to Perl and WMI.  This next script collects the RAM in use, RAM installed, and the number of CPU cores on each server in our text file.

use strict;
use warnings;
use Win32::OLE;

open IN, "servers.txt" or die "Cannot open servers.txt: $!";
open OUT, ">hwinfo.txt" or die "Cannot open hwinfo.txt: $!";
while(<IN>){
 chomp(my $server = $_);
 my $usedmem = 0;
 my $totalmem = 0;
 my $cores = 0;
 # Connect to the remote server's WMI service
 if(my $objWMIService = Win32::OLE->GetObject("winmgmts:\\\\$server\\root\\CIMV2")){
  # RAM installed and in use comes from Win32_OperatingSystem (values are in KB)
  if(my $colItems = $objWMIService->ExecQuery("SELECT * FROM Win32_OperatingSystem")){
   foreach my $objItem (Win32::OLE::Enum->new($colItems)->All){
    $totalmem = $objItem->{TotalVisibleMemorySize};
    my $freemem = $objItem->{FreePhysicalMemory};
    $usedmem = $totalmem - $freemem;
    $usedmem = sprintf("%.2f",($usedmem/(1024*1024)));     # KB -> GB
    $totalmem = sprintf("%.2f",($totalmem/(1024*1024)));   # KB -> GB
   }
  }
  # Total the CPU cores across all physical processors
  if(my $colItems = $objWMIService->ExecQuery("SELECT * FROM Win32_Processor")){
   foreach my $objItem (Win32::OLE::Enum->new($colItems)->All){
    $cores += ($objItem->{NumberOfCores} || 0);
   }
  }
  # Older operating systems don't report NumberOfCores; assume two
  if($cores == 0){ $cores = 2; }
 }else{
  warn "Could not connect to WMI on $server\n";
 }
 # Tab-separated output: server, RAM used (GB), RAM installed (GB), CPU cores
 print "$server\t$usedmem\t$totalmem\t$cores\n";
 print OUT "$server\t$usedmem\t$totalmem\t$cores\n";
}
close IN;
close OUT;

Again, we can open the output in Excel and see the results, and again I found a tremendous amount of waste: terabytes of RAM, sitting idle.  Totalling up the RAM in use (not installed) and adding a healthy percentage for growth, I arrive at the amount of RAM the new cloud must have.

Also in the output is the number of CPU cores per server.  This is the number installed; we didn't get a feel for how much CPU is actually in use.  That is a much more difficult thing to pin down, because of the spiky graph a CPU makes throughout the day; we'd have to monitor the CPUs over time to discover what the average utilization is.  Alternatively, you can accept what the industry says: that overall, the CPUs in your servers are probably running at around 10-15% of capacity.  You may disagree with that, and you may be right if all you have is a small number of servers with heavy workloads.  In my case, I've got thousands of servers with all kinds of workloads, most of them light to moderate, so I tend to agree with the 15% figure.

Whatever percentage you accept, and however you arrive at it, you can then take your output files and do the math.  For example, if your physical server farm has 1000 processor cores running and your virtualization factor is 15%, then theoretically you can build a cloud that contains 150 processor cores (plus some more for fault tolerance and growth).
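
To make that math concrete, here's a rough sketch that totals the cores and in-use RAM from hwinfo.txt and applies the factors discussed above.  The 15% virtualization factor and the 25% growth margin are assumptions, not measurements; plug in your own numbers.

use strict;
use warnings;

# Rough sizing math from hwinfo.txt (server, RAM used GB, RAM installed GB, cores).
# The 15% virtualization factor and 25% growth margin are assumptions - use your own.
my $cpu_factor = 0.15;
my $growth = 0.25;
my $total_cores = 0;
my $total_ram_used_gb = 0;

open IN, "hwinfo.txt" or die "Cannot open hwinfo.txt: $!";
while(<IN>){
 chomp;
 my ($server, $ram_used, $ram_total, $cores) = split /\t/;
 $total_ram_used_gb += $ram_used if defined $ram_used;
 $total_cores += $cores if defined $cores;
}
close IN;

printf "Physical cores: %d -> cloud cores needed: %d\n",
 $total_cores, int($total_cores * $cpu_factor * (1 + $growth) + 0.5);
printf "RAM in use: %.1f GB -> cloud RAM target: %.1f GB\n",
 $total_ram_used_gb, $total_ram_used_gb * (1 + $growth);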

So my sizing exercise started to come together.  Using the 15% virtualization factor for the processors, and the number of processor cores, RAM in use, and disk space in use from my scripts, I was able to figure out how big my cloud needed to be.

Stay tuned as we continue with the design.  We're busy choosing storage, server hardware, virtualization software, management tools, then we'll build the cloud and start migrating.  Lots to do!
