VMware VSAN – Cisco UCS C240 M3 – Networking Part #1 Direct Connect

One of the partners in the VMware VSAN Program is Cisco. Cisco is unique in that the UCS platform provides several connectivity options for the C-Series servers. You can leverage internal network configurations like other server vendors, or use the Unified Fabric that UCS is known for by connecting the C-Series servers to FEXes/Fabric Interconnects (depending on the model of virtual interface card and the UCS Manager software release) and managing them via UCS Manager. As of UCS Manager 2.2(1), with the VIC 1225, we can now connect the C-Series servers directly to the Fabric Interconnects without the need for a FEX. CIMC traffic is passed through a Network Controller Sideband Interface, so all traffic flows through the VIC 1225. Direct-connecting Cisco UCS C-Series servers allows for the same ease of management you would have with UCS B-Series blades, along with the same unified fabric support, without additional hardware.

Here is a list of supported Cisco UCS C-Series servers, firmware/BIOS versions, and VIC 1225 placement for Direct Connect:

Capture-UCS-Direct Connect

Capture-UCS-Direct-Connect-Firmware-Bios

Capture-UCS-Direct-Connect-VIC-Placement

For this exercise, we have chosen four Cisco UCS C240 M3s. These were provided by our partner for our Solution Center environment, where we can demo and show off the latest technology. This is NOT a sizing guide for a production environment; the subject matter in this post is network connectivity options with VSAN and UCS. The hardware provided is to demo VSAN with supported Cisco hardware. All hardware is listed on the VMware Virtual SAN Compatibility Guide.

Hardware:

4 x Cisco UCS C240 M3 Servers 

Options: VIC 1225, 1 x 200GB SSD, 2 x 300GB 10K – MegaRAID 9271-8i

2 x Cisco UCS 6248 Fabric Interconnects – UCS Manager 2.2(1c)

(We did not receive these servers with the VIC 1225; however, this is how they would be configured with that particular VIC. To show this configuration, we used the UCS Platform Emulator.)

Connectivity Option #1: Direct Connect w/VIC 1225

Capture-UCS-C240-Direct Connect

Using Direct Connect, UCS Manager will discover the C-Series servers connected to the Fabric Interconnects. At that point we can build our Pools, Policies, and Templates for these servers. Since this article is about networking as it relates to UCS, vSphere, and VSAN, we will focus our attention there.

Within UCS Manager I have configured QoS Policies for my vNICs. To give credit where credit is due, I have followed Brad Hedlund’s UCS/vSphere guide for the past couple of years; it’s still very relevant today and will be referenced here.

I first want to point out that there are a couple of ways of networking UCS using Virtual Interface Cards and vSphere, but the key point to take away is that we should treat VSAN traffic much like vMotion: we want to make sure this traffic stays East/West and does not traverse the uplinks.

If the Rack Server Discovery Policy is set to Immediate, UCS Manager will discover the C240s automatically. You need to be absolutely sure that you have the correct firmware, BIOS, and VIC 1225 PCIe slot placement listed above for Direct Connect to work properly.

Here we can see the UCS Topology for Direct Connect:

Capture-UCS-Rack-Topolgy-DirectConect

You can see the Discovery taking place within UCS Manager for a C240M3 here.

Capture-C240M3-Discovery
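If you would rather verify discovery programmatically than watch it in the GUI, here is a minimal sketch using Cisco’s ucsmsdk Python SDK. The UCS Manager address and credentials are placeholders, and the attribute names assume ucsmsdk’s snake_cased versions of the UCSM properties, so verify them against your SDK version:

```python
from ucsmsdk.ucshandle import UcsHandle

# Placeholder UCS Manager address and credentials
handle = UcsHandle("ucsm.example.com", "admin", "password")
handle.login()

# List every rack unit UCS Manager has discovered, with its state
for server in handle.query_classid("computeRackUnit"):
    print(server.dn, server.model, server.serial, server.oper_state)

handle.logout()
```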

Now let’s move on and roll out a Service Profile Template for these servers. Before we can do that, we need to change some Policies and vNIC Templates to account for server-type firmware, plus the additional vNIC(s) for VSAN traffic, depending on which route you take for fault tolerance.

As stated earlier, there are a couple of ways we can make sure this traffic moves East/West and not North/South. We can assign two vNICs for VSAN, one on Fabric A and the other on Fabric B, and use VMware’s network teaming in an Active/Standby fashion, or we can create one vNIC, assign it to a Fabric, and enable Fabric Failover. Either method forces traffic down the Fabric of your choosing: via the Active adapter in the VMware teaming settings, or via the UCS Fabric with a single vNIC and Fabric Failover.

I would recommend assigning this traffic to the opposite fabric from vMotion so as not to saturate one side, especially during maintenance activities such as host evacuation. So if vMotion is configured with Fabric A as primary using either method, I would suggest assigning VSAN traffic to Fabric B as primary. This also accounts for the added VSAN maintenance activities such as Full Data Migration.

You can see both configurations within UCS below:

Fabric Failover
Capture-VSAN-vNIC-Fabric-Failover
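For the Fabric Failover route, here is a hedged ucsmsdk sketch of what the vNIC Template might look like. The template name and org are placeholders, and switch_id "B-A" means primary on Fabric B with hardware failover to Fabric A; double-check the attribute names against your SDK version:

```python
from ucsmsdk.ucshandle import UcsHandle
from ucsmsdk.mometa.vnic.VnicLanConnTempl import VnicLanConnTempl

handle = UcsHandle("ucsm.example.com", "admin", "password")
handle.login()

# A VSAN vNIC template pinned to Fabric B with fabric failover enabled;
# "B-A" = primary on Fabric B, fail over to Fabric A in hardware
vsan_templ = VnicLanConnTempl(
    parent_mo_or_dn="org-root",
    name="VSAN-B",
    switch_id="B-A",
    templ_type="updating-template",
)
handle.add_mo(vsan_templ, modify_present=True)
handle.commit()
handle.logout()
```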

VMware Network Teaming
Capture-VSAN-vNIC-VMware-Teaming-A
Capture-VSAN-vNIC-VMware-Teaming-B
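For the VMware teaming route, the Active/Standby order is set on the portgroup in vSphere. Here is a minimal pyVmomi sketch; the vCenter address, credentials, host name, portgroup name, and vmnic numbering are all placeholders for whatever your environment uses:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; use verified certs in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
host = si.RetrieveContent().searchIndex.FindByDnsName(
    dnsName="esxi01.example.com", vmSearch=False)

# Give the VSAN portgroup an explicit failover order: the vNIC on Fabric B
# active, the vNIC on Fabric A standby, keeping VSAN traffic East/West on B
for pg in host.config.network.portgroup:
    if pg.spec.name == "VSAN":  # placeholder portgroup name
        spec = pg.spec
        spec.policy.nicTeaming = vim.host.NetworkPolicy.NicTeamingPolicy(
            policy="failover_explicit",
            nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(
                activeNic=["vmnic3"],   # vNIC pinned to Fabric B
                standbyNic=["vmnic2"],  # vNIC pinned to Fabric A
            ),
        )
        host.configManager.networkSystem.UpdatePortGroup(pg.spec.name, spec)

Disconnect(si)
```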

You may notice the QoS Policy assigned to these vNICs, which we haven’t really covered yet. One of the key benefits of Cisco UCS is the ability to apply QoS in hardware to prioritize traffic. This alleviates the need for a software-based QoS solution such as VMware’s NIOC (Network I/O Control).

In this case I have VSAN assigned to a QoS System Class of Silver and vMotion set to Bronze. The Silver System Class has a higher weight than the Bronze System Class, as we need to make sure that storage gets priority. After all, without storage, we don’t have much of a Virtualization Infrastructure.

vMotion & VSAN QoS Policies

Capture-Bronze-QoS-Policy

Capture-Silver-QoS-Policy
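If you prefer to script the System Class side, here is a hedged ucsmsdk sketch. The class DNs under fabric/lan/classes are fixed in UCSM, but the weight values below are illustrative, not necessarily the ones in my screenshots:

```python
from ucsmsdk.ucshandle import UcsHandle

handle = UcsHandle("ucsm.example.com", "admin", "password")
handle.login()

# Enable Silver with a higher weight than Bronze so VSAN (Silver) wins
# under contention while vMotion (Bronze) still gets guaranteed bandwidth
silver = handle.query_dn("fabric/lan/classes/class-silver")
silver.admin_state = "enabled"
silver.weight = "9"  # illustrative weight
handle.set_mo(silver)

bronze = handle.query_dn("fabric/lan/classes/class-bronze")
bronze.admin_state = "enabled"
bronze.weight = "7"  # illustrative weight
handle.set_mo(bronze)

handle.commit()
handle.logout()
```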

Now you are ready to build your Service Profile Template. In my case, I copied a current Blade Template, adjusted the Firmware and Boot Policies, added the VSAN vNIC(s), and I was off and running. I always standardize on two vNICs per function, so I opted for a vNIC on each Fabric for VSAN traffic, and since vMotion was active out Fabric A, I made the VSAN vNIC active on Fabric B.

Capture-VSAN-Networking
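One last vSphere-side step worth showing: the vmkernel interface on that VSAN portgroup has to be tagged for VSAN traffic. Here is a short pyVmomi sketch; the connection details and the vmk2 device name are placeholders ("vsan" is a valid vmkernel NIC type as of the vSphere 5.5 API):

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
host = si.RetrieveContent().searchIndex.FindByDnsName(
    dnsName="esxi01.example.com", vmSearch=False)

# Tag the vmkernel interface on the VSAN portgroup for VSAN traffic
vnic_mgr = host.configManager.virtualNicManager
vnic_mgr.SelectVnicForNicType(nicType="vsan", device="vmk2")  # placeholder vmk

# Confirm which vmknics are now selected for VSAN traffic
print(vnic_mgr.QueryNetConfig("vsan").selectedVnic)

Disconnect(si)
```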

Here is a diagram showing the traffic flow for VSAN in this use case:

VSAN-Traffic-Flow

Stay tuned for Part II when we leverage VMware NIOC!

vSphere Design Best Practices

Over the course of the last few weeks I’ve been reading Scott Lowe’s book on vSphere Design. While centered around vSphere 4.x, most if not all of the information still applies today. The concept of breaking design down into the Organizational, Operational, and Technical subsets that make up the overall design is still relevant and will be valid for years to come.

The overall planning, architecting, and implementation are set around a broad set of facts: the functional requirements. It’s within those facts, however, that extreme granularity comes into play.

You first need to understand the business requirements in broad terms. What exactly do they want or need to accomplish? What is the goal? What are the constraints? Once that is determined, the granularity comes into play at each of the subset levels, and each one has its own set of questions and facts that need to be obtained.

Virtualization brings many more changes to an organization than just a change of technology. More importantly, when we move from the physical to the virtual world, many things that applied in the physical world fall by the wayside. By the sheer nature of the “pooled” architecture, many things change, from management of resources to troubleshooting, and this is where most of the concerns will originate. IT shops and businesses are used to doing things a certain way, and they have a comfort level in what they know and do every day. I believe addressing those concerns within the overall design is most important: working with your client/customer and building a strong partnership to show that you are not there just to sell them some technology, but that you will guide them through the process in a clear and concise manner, making the transition as smooth as possible.

This brings me to the first principle I’m learning to accept as I go through this process: provide a design that is simple, modular, and supportable with minimal risk.

Unfortunately, as we all know, sometimes this just can’t happen. The customer may have some pretty stringent requirements where the only way to meet them is a complicated solution. Even in those cases, however, you should still look at making things easier by using a reference or converged architecture where it makes sense.

In the coming weeks, I’m going to continue down this path of learning vSphere Design principles and concepts. I’ll break down some of the things I’ve learned and get into specific areas such as Networking, Storage, Management Infrastructure, etc.