Stretched Clusters: Use Cases and Challenges Part I – HA

I have been hearing a lot of interest from my clients lately about stretched vSphere clusters. I can certainly see the appeal from a simplicity standpoint, at least on the surface. Let’s take a look at the perceived benefits, risks, and the reality of stretched vSphere clusters today.

First, let’s define what I mean by a stretched vSphere cluster. I am talking about a vSphere (HA / DRS) cluster where some hosts exist in one physical datacenter and some hosts exist in another physical datacenter. These datacenters can be geographically separated or even on the same campus. Some of the challenges are the same regardless of the distance between them.

To keep things simple, let’s look at a scenario where the cluster is stretched across two datacenters on the same campus. This is a scenario that I see attempted quite often.

image

This cluster is stretched across two datacenters. For this example, let’s assume that each datacenter has an IP-based storage array that is accessible to all the hosts in the cluster and that the link between the two datacenters is Layer 2. This means that all of the hosts in the cluster are Layer 2 adjacent. At first glance, this configuration may be desirable because of its perceived elegance and simplicity. Let’s take a look at the perceived functionality.

  • If either datacenter has a failure, the VMs should be restarted on the other datacenter’s hosts via High Availability (HA).
  • There is no need for manual intervention or for something like Site Recovery Manager.

Unfortunately, perceived functionality and actual functionality differ in this scenario. Let’s take a look at an HA failover scenario from a storage perspective first.

  • If virtual machines fail over from hosts in one datacenter to hosts in the other datacenter, their storage will still be accessed from the originating datacenter.
  • As a result, hosts in one datacenter will be accessing storage that resides in the other datacenter, as shown in the diagram below.

image
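The storage-locality problem described above can be illustrated with a small model. This is only a sketch under stated assumptions: the `VM` record, the site labels "A" and "B", and the VM names are hypothetical, and the code does not call any vSphere API. It simply shows that an HA restart changes where a VM runs, not where its datastore lives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    host_site: str       # datacenter of the host currently running the VM
    datastore_site: str  # datacenter of the array backing the VM's datastore


def fail_over(vms, failed_site, surviving_site):
    """Model an HA restart: VMs move to the surviving site, storage stays put."""
    return [
        VM(vm.name, surviving_site, vm.datastore_site)
        if vm.host_site == failed_site
        else vm
        for vm in vms
    ]


def remote_storage_vms(vms):
    """Names of VMs whose host and datastore are in different datacenters."""
    return [vm.name for vm in vms if vm.host_site != vm.datastore_site]


# Hypothetical inventory: every VM starts out local to its own storage.
vms = [
    VM("web01", "A", "A"),
    VM("db01", "A", "A"),
    VM("app01", "B", "B"),
]

# The hosts in datacenter A fail, but the array in A is still reachable.
after = fail_over(vms, failed_site="A", surviving_site="B")

print(remote_storage_vms(vms))    # []
print(remote_storage_vms(after))  # ['web01', 'db01']
```

Every VM that was restarted in datacenter B now depends on the inter-datacenter link for all of its storage traffic.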

This situation is not ideal in most cases, especially if the originating datacenter is completely isolated, because then its storage cannot be accessed at all. Let’s take a look at what happens when the two datacenters lose communication with each other while the hosts within each datacenter can still communicate locally. This is depicted in the diagram below.

image

  • Prior to vSphere 5.0, if the link between the datacenters went down or some other communication disruption happened at this location in the network, each set of hosts would think that the others were down. This is a problem because each datacenter would attempt to bring the other datacenter’s virtual machines up. This is known as a split-brain scenario.
  • As of vSphere 5.0, each datacenter would create its own Network Partition from an HA perspective and proceed to operate as two independent clusters (although with some limitations) until connectivity was restored between the datacenters.
  • However, this scenario is still not ideal due to the storage access.
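To make the partition behavior concrete, here is a toy model that groups hosts by who can still reach whom. It is a sketch only: the host names and the link list are hypothetical, and the connected-components search is a stand-in for illustration, not vSphere HA's actual heartbeat or election logic.

```python
from collections import defaultdict


def partitions(hosts, links):
    """Group hosts into sets that can still reach each other."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, groups = set(), []
    for host in hosts:
        if host in seen:
            continue
        # Depth-first walk over everything reachable from this host.
        stack, group = [host], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(group)
    return groups


def presumed_down(partition, hosts):
    """Hosts that the members of a partition can no longer see."""
    return sorted(set(hosts) - partition)


# Hypothetical four-host stretched cluster, two hosts per datacenter.
hosts = ["esx-a1", "esx-a2", "esx-b1", "esx-b2"]
local_links = [("esx-a1", "esx-a2"), ("esx-b1", "esx-b2")]
inter_site_link = [("esx-a2", "esx-b1")]

healthy = partitions(hosts, local_links + inter_site_link)
severed = partitions(hosts, local_links)

print(len(healthy))                  # 1
print([sorted(p) for p in severed])  # [['esx-a1', 'esx-a2'], ['esx-b1', 'esx-b2']]
print(presumed_down(severed[0], hosts))  # ['esx-b1', 'esx-b2']
```

With the inter-datacenter link severed, each side sees only its own hosts. Whether each side then tries to restart the other side's virtual machines, or carries on as its own partition, is the difference between the two behaviors described above.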

So what can be done? Beyond VM-to-host affinity rules, if the sites are truly to be active / standby (with the standby site perhaps running lower-priority VMs), the cluster should be split into two different clusters, perhaps even under different vCenter instances (one for each site) if Site Recovery Manager (SRM) will be used to automate the failover process. If there is a use case for a single cluster, then external technology needs to be used. Specifically, the storage access problem can be addressed by using a technology like VPlex from EMC. In short, VPlex allows one to have a distributed (across two datacenters) virtual volume that can be used for a datastore in the vSphere cluster. This is depicted in the diagram below.

image

A detailed explanation of VPlex is beyond the scope of this post. At a high level, the distributed volume can be accessed by all the hosts in the stretched cluster. VPlex is capable of keeping track of which virtual machines should be running on the local storage that backs the distributed virtual volume. In the case of a complete site failure, VPlex can determine that the virtual machines should be restarted on the underlying storage that is local to the other datacenter’s hosts.

Technology is bringing us closer to location-aware clusters. However, we are not quite there yet for a number of use cases, as external equipment and functionality tradeoffs need to be considered. If you have the technology and can live with the functionality tradeoffs, then stretched clusters may work for your infrastructure. The simple design choice for many continues to be separate clusters.

What Fantasy Football Can Teach You About Your Data Center

The return of football means different things to different people. For some, it means strapping on shoulder pads, placing a helmet on their head, and donning a jersey before heading to the gridiron. For others, it means loading up a bowl with corn chips, turning on the laptop and television, and putting on a jersey before sitting on the couch and wondering whether their kicker will score 7.1 points instead of his projected 6.9 points so they can beat the guy who sits in the cube across from them at work.

Yes, the arrival of autumn heralds the return of the world’s greatest fake sport: Fantasy Football. During this season, over 19 million people will “play” this “sport.” And, yes, I am one of those people who will manage their team while not registering the irony of putting on athletic clothing so I can more effectively sit on my backside and stare at various glowing rectangles all Sunday afternoon.

As I have found with other avenues of entertainment, Fantasy Football offers lessons about information technology solutions. In this case, I would like to point out some lessons that Fantasy Football can teach you about your data center.

Disaster Recovery/Have a backup plan

My number one draft pick this season was Jamaal Charles of the Kansas City Chiefs. Charles was ranked as a top five running back and projected to have a great season … until I drafted him. (Yes, I take credit for his injury.) After his unfortunate collision with the Detroit Lions mascot, Charles suffered a torn ACL, and his season ended.

With Charles’ injury, my season could easily have ended in disaster. Fortunately, I had planned ahead by drafting other running backs in case of injury, including the Buffalo Bills’ RB Fred Jackson. By having a backup plan, I still have a chance to contend for a pretend trophy.

With your data center, you too must have a disaster recovery plan. Unlike Fantasy Football, your data center actually means something in real life. In business, it is paramount for leaders to find ways to mitigate the risks to their businesses, protecting their employees, customers, and investments. Extended downtime or lost data equals lost productivity and lost profits.

In Fantasy Football, it is crucial to have a backup plan. Just like in disaster recovery, it is not a matter of if, but of when, you will need to use it.

Value of planning ahead

When managing your Fantasy Football team, you have to factor in that your players will have bye weeks. Every NFL team is assigned an off, or bye, week when it does not play. Accordingly, you must account for the weeks in which your players are off. You might have the best team in your league; however, if you have not planned ahead for when key positions are empty, you can put your season in jeopardy.

With your data center, it is not enough to focus on right now.  You need to plan ahead to scale and grow your data center for future needs. At TBL, our data center engineers can assist you by planning for future growth, and testing to see that your current infrastructure is optimized.

In both Fantasy Football and the data center, if you fail to plan, you plan to fail.

The power of defense

While it is not as exciting as drafting the best QB or picking up the unknown running back who becomes a dominant player, picking a good defense is key to your Fantasy Football success. In my first week matchup, my imaginary team squeaked out a 3.08 point victory. I needed every point that I could muster that week. Fortunately for me, I had chosen the San Francisco 49ers’ defense, which generated the most points in Week 1. Without a strong defense, I would have lost.

Even if your data center is not large, defense is crucial for your success. During a recent interview with Work It Richmond, Bryan Miller of Syrinx Technologies, who partners with TBL Networks on system security, said that smaller organizations often think “that the bad guys don’t want the resources they own. They feel because they are small that whatever they own isn’t worth the trouble” and that hackers often go after “low-hanging fruit. If you present yourself an easy target, the bad guys will take advantage of it.” A strong defense is key to success in both Fantasy Football and the data center.

While managing your Data Center might not involve keeping track of which Cincinnati Bengals are currently in jail, you can still find lessons from Fantasy Football to ensure victory in your systems management.

Hurricane Irene No Match for Collaboration Technologies

As you may have seen from previous posts, TBL eats its own dog food. Even in the wake of widespread power disruptions and communication network outages, TBL is able to continue operations and support its clients. As I write this post sitting in a local Starbucks, I cannot help but think about what it was like even a few short years ago and how things have changed. In a matter of moments, I’ll be posting this article to our website; I’m able to instant message with co-workers, clients, and partners on Cisco Jabber; and I can receive phone calls via Single Number Reach from our Communications Manager.

This is all in the face of nearly the entire county where I reside being without power, our corporate office being inaccessible for the day, and more than 1.2 million others throughout the area without power or communication services. So as you go about your day, raking leaves, cutting up brush, refueling your generators, or even responding to the occasional email, just stop and think for a moment about how technology has changed your life or business, and imagine what could be possible tomorrow.
