Memory Management in vSphere: Where We Are Today

This is a quick post on where vSphere stands with memory management today. vSphere has several mechanisms to reclaim memory before resorting to swapping to disk. Let’s briefly look at each of them.


Memory Reclamation

  • Transparent Page Sharing (TPS)
    • Think of this as deduplication for memory. Identical memory pages are shared among VMs instead of each VM keeping its own copy of the same page. If a VM later writes to a shared page, it gets its own private copy (copy-on-write). This can greatly reduce the RAM used on a host when many pages are identical, for example when many VMs run the same guest OS.
  • Ballooning
    • A balloon driver, installed in the guest with VMware Tools, increases memory pressure inside the guest so that memory the guest is not using can be reclaimed. If the hypervisor simply took memory pages away from guests, the guest operating systems would not react well. Instead, the balloon driver allocates ("inflates") memory inside the guest, which forces the guest OS to give up pages using its own memory management: it frees unused memory first and pages to its own virtual disk only if necessary. The hypervisor can then reclaim the memory backing the balloon without disrupting the guest OS.
  • Memory compression
    • Pages that would otherwise be swapped out by the hypervisor are compressed and kept in a compression cache in host memory. Decompressing a page is much faster than reading it back from disk, so compression is preferable to swapping.
  • Hypervisor swapping
    • This is the last resort. Memory pages are written to a swap file on disk. New in vSphere 5 is support for swapping to host-local SSDs (the host cache), which reduces the performance penalty when swapping is needed.
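To make the first and third techniques more concrete, here is a minimal sketch of the ideas behind transparent page sharing and memory compression. It is a toy model, not how ESXi is implemented: the function names, the 4 KB page size, the SHA-1 hash and the "keep a compressed page only if it shrinks to half its size or less" threshold are all assumptions for illustration. Copy-on-write after sharing is left out to keep it short.

```python
import hashlib
import zlib

PAGE_SIZE = 4096  # assumption: 4 KB pages
COMPRESSED_SLOT = PAGE_SIZE // 2  # assumption: keep a page only if it compresses to half size or less


def share_pages(vm_pages):
    """Toy transparent page sharing (hypothetical helper).

    vm_pages maps a VM name to a list of page contents (bytes).
    Returns (page_table, frame_count), where page_table maps
    (vm, guest page number) to a physical frame number.
    """
    frames = []   # frames[n] holds the contents of physical frame n
    by_hash = {}  # content hash -> frame numbers whose contents have that hash
    page_table = {}
    for vm, pages in vm_pages.items():
        for gpn, content in enumerate(pages):
            digest = hashlib.sha1(content).digest()
            candidates = by_hash.setdefault(digest, [])
            # A hash match is only a hint; confirm with a full byte comparison.
            frame = next((f for f in candidates if frames[f] == content), None)
            if frame is None:
                # No identical page yet: allocate a new frame for this content.
                frame = len(frames)
                frames.append(content)
                candidates.append(frame)
            page_table[(vm, gpn)] = frame
    return page_table, len(frames)


def compress_or_swap(page):
    """Keep the page compressed in memory if it is small enough, else swap it."""
    compressed = zlib.compress(page)
    if len(compressed) <= COMPRESSED_SLOT:
        return ("compressed", compressed)
    return ("swap", page)


zero = bytes(PAGE_SIZE)
vms = {"vm1": [zero, b"A" * PAGE_SIZE], "vm2": [zero, zero]}
table, frames_used = share_pages(vms)
print(frames_used)                # prints 2: four guest pages backed by two frames
print(compress_or_swap(zero)[0])  # prints compressed
```

Three of the four guest pages are zero pages, so they collapse onto a single frame; that is the effect TPS has on a host running many similar VMs.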

As you can see, vSphere has several memory management techniques that allow higher consolidation ratios. A hypervisor does much more than simply host guest VM images; there is a lot going on under the hood to consider before choosing one as the foundation for your infrastructure. Feel free to contact me if you would like to discuss any of the “under the hood” features of vSphere.

vSphere 5 High Availability: Bring on the Blades

vSphere 5 has many new and exciting features. This post concentrates on High Availability (HA) and how it affects blade designs. HA is certainly not new, but it has been rewritten from the ground up to be more scalable and flexible than ever. The old HA software was based on Automated Availability Manager (AAM), licensed from Legato, which is why HA had its own set of binaries and log files.

One of the problems with this now-legacy software was the method it used to track the availability of host resources. HA prior to vSphere 5 used the concept of primary nodes. There was a maximum of (5) primary nodes per HA cluster, chosen by an election process at boot time. The (5) primary nodes kept track of the cluster state so that when an HA failover occurred, the virtual machines could restart on an available host in the cluster. Without the primary nodes, there was no visibility into the cluster state, so if all (5) primary nodes failed, HA could not function.

This was not usually an issue in rackmount infrastructures. However, it posed challenges in blade infrastructures, where a single chassis failure can take down multiple blades at once. Blade environments should typically have at least two chassis for failover. If only a single chassis provided resources for an HA cluster, a failure of that chassis would cause an entire cluster outage. As the diagram below shows, using multiple chassis does not by itself mean that the entire HA cluster is protected.




In this case, two chassis are used and populated with blades. However, all of the HA primary nodes ended up on the same chassis. If that chassis were to fail, HA would not function and the virtual machines would not restart on the remaining hosts in the other chassis. The diagram below shows how to design around this scenario prior to vSphere 5.




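No more than (4) blades in a chassis should belong to the same HA cluster. Because a cluster has at most (5) primary nodes, they can never all sit in one chassis, so a single chassis failure cannot take out every primary node as long as the cluster spans more than one chassis. The chassis can still be fully populated: the remaining slots could be used for a second HA cluster. However, this workaround limits how far a single cluster can scale across the hardware.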
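The reasoning behind the four-blade rule can be checked with a small worst-case model. This is a minimal sketch, assuming the election could place every primary node on the same chassis whenever they fit; the function name and the input layout (blades of this cluster per chassis) are hypothetical, and the model ignores failover capacity and only asks whether some primary node survives.

```python
MAX_PRIMARIES = 5  # pre-vSphere 5 (AAM) HA: at most five primary nodes


def chassis_failure_can_break_ha(blades_per_chassis):
    """Return True if some single-chassis failure could leave no primary node.

    blades_per_chassis maps a chassis name to the number of blades in that
    chassis that belong to this HA cluster. Worst case assumed: the election
    may place every primary on the same chassis if they fit there.
    """
    cluster_size = sum(blades_per_chassis.values())
    primaries = min(MAX_PRIMARIES, cluster_size)
    return any(count >= primaries for count in blades_per_chassis.values())


print(chassis_failure_can_break_ha({"chassis-A": 8, "chassis-B": 8}))  # True
print(chassis_failure_can_break_ha({"chassis-A": 4, "chassis-B": 4}))  # False
```

With eight blades per chassis, all five primaries can land on one chassis; capping each chassis at four blades per cluster makes that impossible.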


vSphere 5 HA

Some significant changes were made in vSphere 5 HA to address this challenge. HA was completely rewritten as Fault Domain Manager (FDM). The new HA software is built into ESXi and does not rely on the AAM binaries at all. The concept of primary nodes has been abandoned. In its place is a single “Master” node, with every other host in the cluster acting as a “Slave” node. All the nodes in an FDM-based HA cluster can keep track of the cluster state. The “Master” node controls the distribution of cluster state information to the “Slave” nodes; however, any node in the cluster can initiate the HA failover process. If the “Master” node is the one that fails, the failover process includes electing a new “Master”. As the diagram below shows, a chassis failure can no longer take out an entire HA cluster that is stretched across multiple chassis.
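A toy model shows why a Master/Slave design survives a chassis failure: as long as any host is left, the survivors can elect a new Master. This is a sketch under stated assumptions, not the FDM implementation. The election rule used here (most connected datastores wins, ties broken by the lexically highest managed object ID) reflects our understanding of how FDM is usually described and should be treated as an assumption; the function names and host IDs are hypothetical.

```python
def elect_master(hosts):
    """hosts: list of (moid, connected_datastores) for hosts that are still up.

    Assumed rule: the most connected datastores wins; ties go to the
    lexically highest managed object ID (string comparison, so "host-99"
    beats "host-100").
    """
    if not hosts:
        return None  # no surviving hosts, nothing left to elect
    return max(hosts, key=lambda h: (h[1], h[0]))[0]


def master_after_chassis_failure(cluster, failed_chassis):
    """cluster maps moid -> (chassis, connected_datastores)."""
    survivors = [(moid, ds) for moid, (chassis, ds) in cluster.items()
                 if chassis != failed_chassis]
    return elect_master(survivors)


cluster = {
    "host-99": ("chassis-A", 4),
    "host-100": ("chassis-A", 4),
    "host-12": ("chassis-B", 4),
    "host-13": ("chassis-B", 3),
}
print(master_after_chassis_failure(cluster, failed_chassis=None))  # host-99
print(master_after_chassis_failure(cluster, "chassis-A"))          # host-12
```

Losing chassis-A takes out the current Master, but a host in chassis-B simply takes over the role, which is the property the old five-primary design lacked.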




The new FDM-based HA in vSphere 5 is much more resilient and allows large single clusters to scale in a blade environment. Blade architectures were certainly viable before, but they can now be used more fully without compromising on HA.