VMware VSAN – Cisco UCS C240 M3 – Networking Part #1 Direct Connect


One of the partners in the VMware VSAN program is Cisco.  Cisco is unique in that the UCS platform provides several connectivity options for the C-Series servers.  You can use internal network configurations like other server vendors, or use the Unified Fabric that UCS is known for by connecting the C-Series servers to FEXes/Fabric Interconnects (depending on the model of Virtual Interface Card and the UCS Manager software release) and managing them through UCS Manager. As of UCS Manager 2.2(1), with the VIC 1225, we can now connect the C-Series servers directly to the Fabric Interconnects without the need for a FEX. CIMC traffic is passed through a Network Controller Sideband Interface, and all traffic is passed through the VIC 1225. Direct Connect for Cisco UCS C-Series servers allows the same ease of management you would have with the UCS B-Series blades and the same unified fabric support, without the need for additional hardware.

Here is a list of supported Cisco UCS C-Series servers, firmware/BIOS versions, and VIC 1225 placement for Direct Connect:

Capture-UCS-Direct Connect

Capture-UCS-Direct-Connect-Firmware-Bios

Capture-UCS-Direct-Connect-VIC-Placement

For this exercise, we have chosen four Cisco UCS C240 M3s.  These were provided by our partner for our Solution Center environment, where we can demo and show off the latest technology. This is NOT a sizing guide for a production environment; the subject of this post is network connectivity options with VSAN and UCS. The hardware provided is to demo VSAN with supported Cisco hardware.  All hardware is listed on the VMware Virtual SAN Compatibility Guide.

Hardware:

4 x Cisco UCS C240 M3 Servers 

Options: VIC 1225 / 1 x 200GB SSD / 2 x 300GB 10k HDDs – MegaRAID 9271-8i

2 x Cisco UCS 6248 Fabric Interconnects – UCS Manager 2.2(1c)

(We did not receive these servers with the VIC 1225; however, this is how they would be configured with that particular VIC. To show this configuration we used the UCS Platform Emulator.)

Connectivity Option #1: Direct Connect w/VIC 1225

Capture-UCS-C240-Direct Connect

Using Direct Connect, UCS Manager will discover the C-Series servers connected to the Fabric Interconnects.  At that point we can build our Pools, Policies, and Templates for these servers. Since this article is about networking as it relates to UCS, vSphere, and VSAN, we will focus our attention there.

Within UCS Manager I have configured QoS Policies for my vNICs. To give credit where credit is due, I have followed Brad Hedlund’s UCS/vSphere guide for the past couple of years; it is still very relevant today and will be referenced here.

First, I want to point out that there are a couple of ways of networking UCS with Virtual Interface Cards and vSphere, but the key point to take away here is that we should treat VSAN traffic much like vMotion: we want to make sure this traffic stays East/West and does not traverse the uplinks.

If the Rack Server Discovery Policy is set to Immediate, UCS Manager will discover the C240s automatically. Be absolutely sure that you have the correct firmware, BIOS, and VIC 1225 PCIe slot placement listed above for Direct Connect to work properly.

Here we can see the UCS Topology for Direct Connect:

Capture-UCS-Rack-Topolgy-DirectConect

You can see the discovery taking place within UCS Manager for a C240 M3 here.

Capture-C240M3-Discovery
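If you would rather verify discovery from a script than from the GUI, here is a minimal sketch using Cisco’s ucsmsdk Python SDK. The UCSM address and credentials are placeholders, and attribute names should be checked against your UCSM release; it simply lists the rack units UCS Manager has discovered.

```python
# Sketch only: assumes the Cisco "ucsmsdk" Python SDK; the UCSM address and
# credentials below are hypothetical.
from ucsmsdk.ucshandle import UcsHandle

handle = UcsHandle("ucsm.lab.local", "admin", "password")  # placeholder
handle.login()

# List the rack units UCS Manager has discovered (our direct-connected C240 M3s)
for ru in handle.query_classid("ComputeRackUnit"):
    print(ru.dn, ru.model, ru.serial, ru.oper_state)

handle.logout()
```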

Now let’s move on and roll out a Service Profile Template for these servers. Before we can do that, we need to make sure we adjust some Policies and vNIC templates to account for the server-type firmware and for the additional vNIC(s) for VSAN traffic, depending on which route you take for fault tolerance.

As stated earlier, there are a couple of ways to make sure this traffic moves East/West and not North/South. We can assign two vNICs for VSAN, one on Fabric A and the other on Fabric B, and use VMware’s network teaming in an Active/Standby fashion, or we can create one vNIC, assign it to a Fabric, and enable Fabric Failover. Either method forces traffic down a single fabric: the one you choose as Active in the VMware teaming settings, or the one selected on the UCS side with a single vNIC and Fabric Failover.

I would recommend assigning this traffic to the opposite fabric from vMotion so you don’t saturate one side, especially during maintenance activities such as host evacuation. So if vMotion is configured with Fabric A as primary using either method, I would suggest assigning VSAN traffic to Fabric B as primary. This also accounts for the added VSAN maintenance activities such as Full Data Migration.

You can see both configurations within UCS below:

Fabric Failover
Capture-VSAN-vNIC-Fabric-Failover

VMware Network Teaming
Capture-VSAN-vNIC-VMware-Teaming-A
Capture-VSAN-vNIC-VMware-Teaming-B
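For the VMware teaming option, the matching change on the vSphere side is an explicit failover order on the VSAN port group. Here is a minimal pyVmomi sketch, assuming a standard vSwitch named vSwitch1, a port group named VSAN on VLAN 200, and vmnic2/vmnic3 as the vNICs pinned to Fabric A and Fabric B; all of those names, the host, and the credentials are placeholders.

```python
# Sketch only: placeholder host, credentials, vSwitch/port group names, VLAN, and vmnics.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="esxi01.lab.local", user="root", pwd="password",
                  sslContext=ssl._create_unverified_context())
# Assumes the first datacenter/cluster/host in the inventory
host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]

# Explicit failover order: VSAN active on the Fabric B vNIC, standby on Fabric A
teaming = vim.host.NetworkPolicy.NicTeamingPolicy(
    policy="failover_explicit",
    nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(
        activeNic=["vmnic3"],    # vNIC pinned to Fabric B
        standbyNic=["vmnic2"],   # vNIC pinned to Fabric A
    ),
)
pg_spec = vim.host.PortGroup.Specification(
    name="VSAN", vlanId=200, vswitchName="vSwitch1",
    policy=vim.host.NetworkPolicy(nicTeaming=teaming),
)
host.configManager.networkSystem.UpdatePortGroup(pgName="VSAN", portgrp=pg_spec)

Disconnect(si)
```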

You may notice the QoS policy, which we haven’t really covered yet. One of the key benefits of Cisco UCS is the ability to apply QoS in hardware to prioritize traffic. This alleviates the need for a software-based QoS solution such as VMware’s NIOC (Network I/O Control).

In this case I have VSAN assigned to a QoS System Class of Silver and vMotion set to Bronze. The Silver System Class has a higher weight than the Bronze System Class because we need to make sure storage gets priority. After all, without storage, we don’t have much of a virtualization infrastructure.

vMotion & VSAN QoS Policies

Capture-Bronze-QoS-Policy

Capture-Silver-QoS-Policy
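If you prefer to script the UCS side, a QoS policy that maps a vNIC to the Silver class can be created with ucsmsdk along these lines. This is a rough sketch: the EpqosDefinition/EpqosEgress objects come from the SDK’s generated model, the property values shown are common defaults rather than a recommendation, and the address/credentials are placeholders, so verify against your UCS Manager release.

```python
# Sketch only: assumes the Cisco "ucsmsdk" Python SDK; verify object names,
# properties, and defaults against your UCS Manager release.
from ucsmsdk.ucshandle import UcsHandle
from ucsmsdk.mometa.epqos.EpqosDefinition import EpqosDefinition
from ucsmsdk.mometa.epqos.EpqosEgress import EpqosEgress

handle = UcsHandle("ucsm.lab.local", "admin", "password")  # placeholder
handle.login()

# QoS policy "VSAN" whose egress traffic is tagged with the Silver system class
qos_policy = EpqosDefinition(parent_mo_or_dn="org-root", name="VSAN")
EpqosEgress(parent_mo_or_dn=qos_policy, prio="silver",
            rate="line-rate", host_control="none", burst="10240")

handle.add_mo(qos_policy, modify_present=True)
handle.commit()
handle.logout()
```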

Now you are ready to build your Service Profile Template. In my case, I copied a current blade template, adjusted the Firmware and Boot Policies, added the VSAN vNIC(s), and I was off and running. I always standardize on two vNICs per function, so I opted for a vNIC on each Fabric for VSAN traffic, and since vMotion was active out Fabric A, I made the second vNIC (Fabric B) active for VSAN.

Capture-VSAN-Networking
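Once the Service Profile is associated and ESXi sees the new vNICs, the VSAN VMkernel interface still has to be created and tagged for Virtual SAN traffic on each host. Here is a minimal pyVmomi sketch, again with placeholder host, credentials, port group name, and IP addressing.

```python
# Sketch only: placeholder host, credentials, port group name, and IP addressing.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="esxi01.lab.local", user="root", pwd="password",
                  sslContext=ssl._create_unverified_context())
# Assumes the first datacenter/cluster/host in the inventory
host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]

# Create a VMkernel adapter on the VSAN port group with a static IP
vmk = host.configManager.networkSystem.AddVirtualNic(
    portgroup="VSAN",
    nic=vim.host.VirtualNic.Specification(
        ip=vim.host.IpConfig(dhcp=False,
                             ipAddress="192.168.200.11",
                             subnetMask="255.255.255.0")),
)

# Tag the new interface (e.g. vmk2) for Virtual SAN traffic
host.configManager.virtualNicManager.SelectVnicForNicType("vsan", vmk)

Disconnect(si)
```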

Here is a diagram showing the traffic flow for VSAN in this use case:

VSAN-Traffic-Flow

Stay tuned for Part II when we leverage VMware NIOC!

7 thoughts on “VMware VSAN – Cisco UCS C240 M3 – Networking Part #1 Direct Connect”

  1. You state: “We can assign two vNICs for VSAN, one on Fabric A and the other on Fabric B, and use VMware’s network teaming in an Active/Standby fashion, or we can create one vNIC, assign it to a Fabric, and enable Fabric Failover.”

    Any particular reason why you wouldn’t set both vNICs to active in the vSwitch you created instead of going with Active/Standby? Is it just to give you more control?

    • Ernes,

      Thank you for your question. When going with two vNICs for vMotion/FT and now VSAN traffic, we only want that traffic to traverse East/West, server to server, and not go northbound of the Fabric Interconnects into your aggregation or core layer. If we use both vNICs Active/Active, the traffic can traverse the northbound switch to go from Fabric A to Fabric B. Remember, the connections between the two Fabric Interconnects are for cluster communication only, and no other traffic.

      Does that answer your question?

  2. Digging up a bit of an old thread here, but this doesn’t alleviate the need for connectivity between the Fabric Interconnects through the upstream network, does it? If a link dies from one server to one fabric, the standby adapter will become active and the upstream network will be the only way for that server to communicate with the other fabric, where all the other active NICs are. Please correct me if I’m wrong.

    • You are essentially forcing traffic out either Fabric Interconnect with either Active/Standby or Fabric Failover. If a single server loses connectivity on, say, Fabric A, all traffic active on Fabric A will now go through Fabric B, and all traffic already active on Fabric B will remain on Fabric B. The idea is to make sure that traffic which needs to remain server to server, i.e. vMotion and VSAN traffic in this case, comes in and out of the same fabric and does not go northbound of the Fabric Interconnects to other switches. The only traffic that should traverse northbound of the Fabric Interconnects in this case is the MGMT VMkernel and VM traffic.

      However, if you are saying that only one server has an issue, then yes, that server would send/receive traffic out of the standby (now active) interface, and server-to-server traffic for vMotion and VSAN would indeed traverse the northbound switch fabric. I usually don’t see issues with a single server’s network, since when traffic leaves the VIC it is sent out a shared fabric to the FIs. It is usually a global issue that impacts all servers, in which case all servers’ vMotion/VSAN traffic would traverse the standby (now active) interfaces, which would keep traffic within the FIs.
