Category Archives: Fibre Channel

The great misunderstanding of MPIO

Dual HBAs, dual fabrics, redundant cache, RAID-ed disks, dual controllers or switched matrices, HA X-bars, multipath software installed, and all OS drivers, firmware, microcode etc. are up to date. In other words you’re all sorted and you can sleep well tonight.

And then Murphy strikes…

As I’ve described in my previous articles, it takes one single misbehaving device to really screw up a storage environment. Congestion and latency will, at some point in time, cause FC frames to go into the bit-bucket, thereby causing one or more IO errors. So what exactly is an IO error?

When an application wants to read or write data it does this (in the open-systems world) via a SCSI command. (I’ll leave the device-specific commands for later.)
This command is then mapped at the FC-4 layer into FC frames which travel via the FC network to the target.

So let’s take for example a database application that needs to read a piece of data. This is never done in chunks of a couple of bytes, like single rows, but always with a certain IO size, which depends on the configuration of the application. For argument’s sake let’s assume the database uses 8KB IO sizes. The read command issued against a LUN at the SCSI layer more or less outlines the LUN id, the offset and the block count from that offset, so for a single read request an 8KB read is done on the array. Since a fibre channel frame holds only around 2KB, this IO is split into 4 FC frames which are linked via so-called sequence IDs. (I’ll spare you the entire handling of exchanges, sequences etc.) If one of these frames is dropped somewhere along the way we’re missing 2KB out of the total 8KB. This means that, for example, frames 1, 2 and 4 have arrived back at the HBA, but before the HBA can forward this to the SCSI layer it has to wait for frame 3 to arrive in order to re-assemble the full IO. If frame 3 was dropped for whatever reason, the HBA has to wait for a pre-determined time before it flags the IO as incomplete; it will then mark the entire FC exchange as invalid and send an abort message with a certain status code to the SCSI layer. This triggers the SCSI layer to retry the IO, which consumes the same resources on the system, FC fabric and storage array as the original request. You can imagine this can, and on many occasions will, cause performance issues or, with repeated occurrences, even an application failure.
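
To make the mechanics a bit more tangible, here is a small Python sketch that mimics the reassembly behaviour described above. It is purely illustrative: the frame payload size, the timeout value and all function names are my own assumptions and have nothing to do with how real HBA firmware is implemented.

    # Illustrative sketch only: an 8KB IO split into four ~2KB FC frames must be
    # fully reassembled before it can be passed up to the SCSI layer.
    IO_SIZE = 8192          # 8KB read issued by the database
    FRAME_PAYLOAD = 2048    # roughly what a single FC frame carries
    TIMEOUT_MS = 2000       # arbitrary "wait for the missing frame" timer

    def reassemble(received_frames, expected_count):
        """Return the full IO if all frames arrived, otherwise None (abort)."""
        if len(received_frames) < expected_count:
            # e.g. frame 3 was dropped due to congestion: the HBA waits for
            # TIMEOUT_MS, then marks the whole exchange as failed.
            print("Exchange incomplete after %d ms -> abort and retry the IO" % TIMEOUT_MS)
            return None
        return b"".join(received_frames[i] for i in sorted(received_frames))

    # frames 1, 2 and 4 arrived; frame 3 went into the bit-bucket
    frames = {1: b"A" * FRAME_PAYLOAD, 2: b"B" * FRAME_PAYLOAD, 4: b"D" * FRAME_PAYLOAD}
    data = reassemble(frames, expected_count=IO_SIZE // FRAME_PAYLOAD)
    # data is None, so the SCSI layer re-issues the read and the host, fabric
    # and array spend the same resources a second time.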

Now when you look at the above traffic flow, all this time there has not been a single indication that the actual physical or logical path between the HBA and the storage port has disappeared. No HBA, storage or switch port has gone offline. The above was just the result of frames being dropped due to congestion, latency or any other reason. This will not trigger any MPIO software to logically remove a path, so it will just keep on sending IOs to the target over a path that may be somewhat erroneous. Again, it is NOT the purpose of MPIO to monitor and act upon IO errors.

If you are able to identify which path observes these errors you can disable this path in the MPIO software and fix the problem path at your earliest convenience. As I mentioned above, this kind of behaviour very often occurs during Murphy time, i.e. your least convenient time. This means you will get called during your beauty sleep at 3:00AM with a message that your entire ERP application is down and that 4 factories and 3 logistics distribution centres are picking their nose at $20,000 a minute.

So what happens when a real path problem is observed? Basically it means that a physical or logical issue occurred somewhere down the line. This can be a physical issue like a broken cable or SFP, but also a bit- or word-synchronisation issue between two ports in that path. This will trigger the switch to send a so-called RSCN (Registered State Change Notification) to all ports in the same fabric and zone as the one that observed the problem. (This also depends on the RSCN state registration of those devices, but these are 99% of the time OK.) This RSCN contains all 24-bit fabric addresses which are affected. (There can be more than one of course when ISLs are involved.)

As soon as this RSCN arrives at the initiator the HBA will disassemble it and notify the upper layer of this change. This is done with different status codes than the IO errors I described above. Based upon the 24-bit fabric IDs, MPIO can then determine which path to that particular target and LUN was affected and take it offline. There can still be one or more IO errors, depending on how many IOs were in flight during the error.
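
As an illustration of that decision logic, here is a minimal sketch of how MPIO software could map the fabric IDs carried in an RSCN to its known paths. The path table and the handle_rscn function are hypothetical; no vendor’s MPIO product works exactly like this.

    # Illustrative sketch: mark every path whose target FCID appears in an RSCN
    # as offline, so subsequent IO only uses the remaining healthy paths.
    paths = [
        {"hba": "hba0", "target_fcid": 0x010200, "lun": 12, "state": "online"},
        {"hba": "hba1", "target_fcid": 0x020300, "lun": 12, "state": "online"},
    ]

    def handle_rscn(affected_fcids, paths):
        """Take every path offline whose target 24-bit fabric ID was reported."""
        for path in paths:
            if path["target_fcid"] in affected_fcids:
                path["state"] = "offline"
                print("Path via %s to 0x%06x taken offline" % (path["hba"], path["target_fcid"]))

    # RSCN received for fabric address 0x010200 (port/area/domain granularity
    # is ignored here for simplicity)
    handle_rscn({0x010200}, paths)
    # IO for LUN 12 now only travels over the remaining online path via hba1.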

So what is the solution? As always, the best way is to prevent this kind of troublesome scenario. Make sure you keep an eye on error counters and immediately fix misbehaving devices. If for some reason a device starts to behave this way during your beauty sleep, you need to make sure beforehand that it will not further impact the rest of the environment. You can do this by disabling a port either on the switch, HBA or storage side, depending on where the problem is observed. Use tools that are built into software like NX-OS or FOS to identify these troublesome links and disable them with features like portfencing. Although it might still have some impact, this is nothing compared to an ongoing issue which might take hours or even days to identify.
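
To give an idea of what “keeping an eye on error counters” can look like in practice, here is a rough sketch that compares counter values between two polling intervals and flags links exceeding a threshold. The counter values and the threshold are made-up examples; on a real switch you would pull these figures from the CLI (for example porterrshow on FOS), SNMP or a management API.

    # Illustrative sketch: watch per-port error counter deltas over a polling
    # interval and flag links that exceed a threshold so they can be fixed or
    # fenced before they cause congestion for everyone else.
    THRESHOLD_PER_POLL = 5   # arbitrary: more than 5 new errors per interval is suspicious

    def check_ports(previous, current):
        suspects = []
        for port, errors in current.items():
            delta = errors - previous.get(port, 0)
            if delta > THRESHOLD_PER_POLL:
                suspects.append((port, delta))
        return suspects

    previous = {"port4": 10, "port7": 3}
    current  = {"port4": 11, "port7": 152}   # port7 is clearly misbehaving
    for port, delta in check_ports(previous, current):
        print("%s logged %d new errors this interval -> investigate or disable" % (port, delta))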

As always use the manuals to determine how to set this up. If you’re inside the HDS network you can access a tool I wrote to very easily generate the portfencing configuration. Send me an email about this if you’re interested.

Hope this explains a bit of the difference between IO errors and path problems w.r.t. MPIO and removes the confusion about what MPIO is intended to do.

Kind regards,
Erwin

P.S. For those unaware, MPIO (Multi Path IO) is software that maps multiple paths to targets and LUNs to a single logical entity on a host so it can use all those paths to address that target/LUN. Software like Hitachi Dynamic Link Manager, EMC PowerPath, HP SecurePath and Veritas DMP falls into this category.

Brocade vs Cisco. The dance around DataCentre networking

When looking at the network market there is one clear leader and that is Cisco. Their products are ubiquitous from home computing to the enterprise. Of course there are others like Juniper, Nortel and Ericsson, but these companies only scratch the surface of what Cisco can provide. They rely on very specific differentiators and, given the fact they are still around, do a pretty good job at it.

A few years ago there was another network provider called Foundry, and they had some really impressive products, which is mainly why these are mostly found in the cores of data centres that push a tremendous amount of data. The likes of ISPs or Internet Exchanges are a good fit. It is for this reason that Brocade acquired Foundry in July 2008. A second reason was that Cisco had entered the storage market with the MDS platform, which left Brocade without a counterweight in the networking space to provide customers with an alternative.

When you look at the storage market it is the other way around. Brocade has been in the Fibre Channel space since day one. They led the way with their 1600 switches and have outperformed and out-smarted every other FC equipment provider on the planet. Many companies that have been in the FC space have either gone broke or have been swallowed by others. Names like Gadzoox, McData, CNT, Creekpath, Inrange and others have all vanished and their technologies either no longer exist or have been absorbed into the products of the vendors who acquired them.

With two distinctly different technologies (networking & storage) both Cisco and Brocade have attained a huge market share in their respective specialities. Since storage and networking are two very different beasts this has served many companies very well and no collision between the two technologies happened. (That is, until FCoE came around; you can read my other blog posts for my opinion on FCoE.)

Since Cisco, being bold, brave and sitting on a huge pile of cash, decided to also enter the storage market, Brocade felt its market share declining. It had to do something and thus Foundry was on the target list.

After the acquisition Brocade embarked on a path to align the product lines with each other and they succeeded with their own proprietary technology called VCS (I suggest you search for this on the web, many articles have been written). Basically what they’ve done with VCS is create an underlying technology which allows a flat layer-2 Ethernet network to operate on a flat fabric-based one, something they have had experience with since the beginning of time (storage networking, that is, for them).

Cisco wanted to have something different and came up with the technology that enables this merging, called FCoE. Cisco uses it extensively across their product set and it is the primary internal communications protocol in their UCS platform. Although I don’t have any indicators yet, it might well be that because FCoE will be ubiquitous in all of Cisco’s products the MDS platform will be abolished pretty soon from a sales perspective and the Nexus platforms will provide the overall merged storage and networking solution for Cisco data centre products, which in the end makes good sense.

So what is my view on the Brocade vs. Cisco discussion? Well, basically, I like them both. As they have different viewpoints on storage and networking there is not really a good vs. bad. I see Brocade as the cowboy company providing bleeding-edge, up-to-the-latest-standards technologies like Ethernet fabrics and 16G fibre channel, whereas Cisco is a bit more conservative, which improves stability and maturity. What the pros and cons for customers are I cannot determine since the requirements differ from customer to customer.

From a support perspective on the technology side I think Cisco has a slight edge over Brocade, since many of the hardware and software problems have been resolved over a longer period of time and, by nature, Brocade’s bleeding-edge, “first-to-market” strategy may sometimes run into a bumpy ride. That being said, since Cisco is a very structured company they sometimes lack a bit of flexibility, and Brocade has an edge on that point.

If you ask me directly which vendor to choose when deciding on a product set or vendor for a new data centre, I have no preference. From a technology standpoint I would still separate fibre channel from Ethernet and wait until both FCoE and Ethernet fabrics have matured and are well past their “hype cycle”. We’re talking data centres here and it is your data; not Cisco’s and not Brocade’s. Both FC and Ethernet are very mature and have a very long track record of operations, flexibility and stability. The excellent knowledge that is available on each of these specific technologies gives me more peace of mind than the outlook of having to deal with problems bringing the entire data centre to a standstill.

Erwin

Brocade Fabric Watch – The most underutilised feature

Many customer cases I handle are related to poor connectivity. A connectivity problem can be caused by unclean connectors, broken cables or SFPs. (See one of my earlier blog posts.)
Although the switches are capable of identifying physical issues and subsequently notifying administrators, it is hardly ever followed up. Very often an acute issue lingers for days before an administrator starts investigating, and in many cases this is only because a server admin starts complaining about SCSI errors, IO time-outs or very poor performance.
So how do we prevent this from happening? Well, for starters make sure that your environment is clean. By this I mean you should make sure that all connectors are not exposed to dust or other types of contamination. Secondly, try to handle cables with care. I’ve seen many cases where cables were under so much tension that Jimi Hendrix would have been able to compose one of his finest works on them. Although modern fibre cables are fairly rugged and can handle a fair amount of tension, try not to test this. As a last point I would suggest keeping an eye on optical transmit power levels. As you most likely know, lasers do not have an infinite lifetime and their transmission power decreases over time. At some point the receiving end of a link is no longer able to distinguish between on and off in a reliable manner, so the 8b/10b (or 64b/66b) encoding/decoding algorithm will start to detect bit flips and discard transmission words. The upper and lower power limits are published in the data sheets, so as soon as one of these values approaches its lower limit, replace the SFP.
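
To illustrate that last point, here is a minimal sketch that checks measured TX/RX power against data sheet limits. The dBm figures are invented for the example; always take the real limits from your own SFP’s data sheet.

    # Illustrative sketch: compare measured SFP TX/RX power against the limits
    # published in the SFP data sheet. All dBm values below are made-up examples.
    SPEC = {"tx_min_dbm": -8.4, "tx_max_dbm": -1.0,
            "rx_min_dbm": -16.0, "rx_max_dbm": 0.0}

    def check_sfp(tx_dbm, rx_dbm, spec):
        """Return a list of warnings for values outside the data sheet limits."""
        warnings = []
        if not spec["tx_min_dbm"] <= tx_dbm <= spec["tx_max_dbm"]:
            warnings.append("TX power %.1f dBm outside spec -> laser ageing, replace SFP" % tx_dbm)
        if not spec["rx_min_dbm"] <= rx_dbm <= spec["rx_max_dbm"]:
            warnings.append("RX power %.1f dBm outside spec -> check/clean the cable plant" % rx_dbm)
        return warnings

    for warning in check_sfp(tx_dbm=-2.1, rx_dbm=-17.5, spec=SPEC):
        print(warning)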

Now you might argue that if you have 10000 ports in your fabric you have other things to worry about than checking SFP power values every day. The stress put on storage admins was not decreasing the last time I looked, and that is unlikely to change in the years to come.

Fortunately you don’t have to. Both Brocade and Cisco provide options to monitor each individual component. For many years Brocade has had one of the best embedded management tools there is, namely Fabric Watch (FW). FW is not an active management tool per se; the underlying goal is to have a sort of self-healing and protecting framework to monitor, alert and take action on events that might have implications for overall fabric behaviour.

A single dodgy link can have significant implications for overall fabric behaviour, which can, and will, impact many hosts depending on topology and traffic pattern. FW allows you to set thresholds on many items in a switch, from SFP power values and link errors to temperature readings etc. Each of these items can be configured with certain characteristics like above, below, in-between or change values, and on each of these a time frame can be configured.

Now let’s take an example of a link that has some intermittent errors. Your applications tolerate a certain error ratio per time frame that they can recover from, so in case one or two IO errors per hour are seen by the OS or application it will re-send the read or write command and all is good. If, however, this starts to increase you might end up with the application going down or even data corruption. If you have configured FW to send a notification when the number of errors increases beyond the application tolerance, you will be able to take some action and investigate where the problem might be.

Now there is another issue and that is that you’re most likely not sitting behind a console 24×7 or monitoring emails during your holidays. So even if you do get notified there is a good chance you will not notice it. (I know I won’t when I’m playing golf :-))
This calls for some more drastic measures and this is also covered by FW. If a certain threshold increases beyond a warning level and reaches a critical level, FW allows you to take action right away. This is a feature Brocade calls port fencing. Basically it means that when this threshold is met, FW will just disable the port to prevent it from propagating the problems further up in the fabric. This is REALLY an area you SHOULD investigate. It can save you from having many issues showing up all over the fabric.
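
A minimal sketch of this warning/critical escalation idea is shown below. The threshold values and the disable_port placeholder are my own assumptions; Fabric Watch obviously implements this inside FOS, not in Python.

    # Illustrative sketch of the Fabric Watch idea: notify on a warning
    # threshold, fence (disable) the port on a critical threshold.
    WARNING_LEVEL = 10      # errors per time frame -> send a notification
    CRITICAL_LEVEL = 100    # errors per time frame -> fence the port

    def disable_port(port):
        # hypothetical placeholder for the actual "disable this port" action
        print("%s disabled (fenced) to stop it polluting the rest of the fabric" % port)

    def evaluate(port, errors_in_window):
        if errors_in_window >= CRITICAL_LEVEL:
            disable_port(port)
        elif errors_in_window >= WARNING_LEVEL:
            print("%s: %d errors this window -> alert the administrator" % (port, errors_in_window))

    evaluate("port12", 14)    # warning only
    evaluate("port12", 340)   # fenced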

The title of this blog post is unfortunately the status as it now stands with most of the installed base of fabrics, and the reason seems to be that administrators have a problem with software deciding on disruptive actions like disabling ports. My argument is that such a port is already in a degraded state, plus it also causes other links in the entire fabric to have problems. If you don’t know what you’re looking for and you have this large 10000-port fabric, it will take you a significant amount of time before you know what’s going on. In this time many, many more hosts and applications can and will suffer from significant performance and other problems, which might create some significant overtime for many people.

Regards,
Erwin

Fill Words. What are those, what do they do and why are they needed

There has been quite some confusion around the use of fill words since the adoption of the 8G fibre channel standard. Some admins have reported problems connecting devices at this speed, as well as numerous headaches with long-distance replication, especially when DWDM/CWDM equipment is involved.

An ordered set is a transmission word used to perform control and signaling functions. There are 3 types of ordered sets defined:

1. Frame delimiters. These identify the start and end of frames.
2. Primitive signals. These are normally used to indicate events or actions (like IDLE)
3. Primitive Sequences, which are used to indicate state or condition changes and are normally transmitted continuously until something causes the current state to change. Examples are NOS, OLS, LR and LRR.

So what is a fill-word? A fill-word is a primitive signal which is needed to maintain bit and word synchronization between two adjacent ports. It doesn’t matter what port type (F-port, E-port, N-port etc.) it is. Fill-words are not data frames in the sense that they transport user data; instead they communicate status between the two ports. If no user data is transmitted, the ports will send so-called IDLE words. These carry a bit pattern that lets the ports keep their synchronization at the bit level as well as the word level. On the wire the IDLE primitive, like any ordered set, starts with K28.5 (the fibre channel notation for an 8b/10b special character) followed by three data characters, of which the last 20 bits are 1010101010… etc. Depending on the content of these transmission characters it is either a fill-word or a non-fillword.

Examples of fillwords are IDLE, ARB(F0) and ARB(FF); non-fillwords are R_RDY, VC_RDY etc.

So what happened recently with the introduction of the 8G standard?

In the 1, 2 and 4G standards the IDLE primitive signal was used to keep bit and word synchronization. This bit pattern was OK at those speeds; however, it has been observed that when increasing the clock speed this pattern caused high emissions, which in turn could cause problems on adjacent ports and links. In order to reduce that, the standard now requires links running at 8G speed to use the ARB(FF) fill-word. This is a different bit pattern which doesn’t have this high emission characteristic.

You might wonder what this has to do with your connection problem. If links negotiate to 8G speed they both have to use the ARB(FF) fill-word. If that doesn’t happen for some reason, the ports cannot maintain word synchronisation and therefore cannot bring the port into the active state. This leaves both ports in some sort of deadlock situation and, although you may see a green status light on your HBA and switch port, the link is still not able to transfer data.

The standard defines that ports which connect at 8G speed first have to initialize with IDLE fill-words and, as soon as the port changes to the active state, the fill-word should change to ARB(FF).
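
As a toy model of this rule (and nothing more than that), the sketch below shows why a fill-word mismatch keeps a link from working even though both ends look “green”. It does not implement the FC standard; the states and names are my own simplifications.

    # Toy illustration of the 8G fill-word rule: initialise with IDLE, switch
    # to ARB(FF) once the port is active. Both ends must agree or word sync
    # cannot be maintained.
    def fill_word_for(speed_gbps, link_state):
        if speed_gbps >= 8 and link_state == "active":
            return "ARB(FF)"
        return "IDLE"

    def link_ok(speed_gbps, side_a_state, side_b_state):
        """Both ends must transmit the same fill-word or word sync is lost."""
        return fill_word_for(speed_gbps, side_a_state) == fill_word_for(speed_gbps, side_b_state)

    print(link_ok(8, "active", "active"))   # True  - both send ARB(FF)
    # A device (e.g. a TDM mux) that only understands IDLE effectively keeps
    # one side behaving as if it were still initialising:
    print(link_ok(8, "active", "init"))     # False - fill-word mismatch, link stalls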

It becomes even more complicated with DWDM and CWDM equipment particularly when multiplexers are used. These TDM devices normally crack open the fibre-channel link on a frame boundary level and then are able to multiplex this on a higher clock-rate so they are able to send data from multiple links into one wavelength. If however these TDM devices cannot open the fibre-channel link because they only look for IDLE fillwords then the end-to-end link will fail.

Verify with your manufacturer, if you use TDM devices, whether they support ARB(FF) fillwords. If not, then you may have to force the link speed down to a lower level like 4G.

The importance of clean fibre optics

I attended Cisco Live this week in Melbourne. It was very close to home and Cisco was kind enough to provide me with an entry ticket. (Many thanks for this.)

While strolling around the expo floor I ran into the nice people from Fluke Networks who were showing their testing equipment and of course I was very interested in the optical side of the fence. (I haven’t seen wireless storage networks yet so I’ll save that part of their impressive toolkit for later. :-)).

Since I’m doing troubleshooting as a day-to-day job I see many issues which have characteristics of a physical nature. This can be a bad cable, patch panel, SFP or anything of that nature.

Just when I wanted to start this blog post I saw that my Melbournian buddy Anthony Vandewerdt had beaten me to it and written the article “Semmelweiss could see the problem”, in which he describes the problem of unclean cables and where it might lead. (Read that first and then come back here.)

In order to complement that article I’ll try to explain why this is so important.

I’m pretty sure that everyone these days knows that computers work with bits which are either a 1 or a 0. To communicate with other computers (or devices in general) we transmit bits as either an on or an off signal, whether that is an electrical current or an optical wave. Electrical transmission has the nasty habit that energy is partially stored in the capacitance of the cable, so the signal has a certain decay period before it drops to zero. You can see this very well if you use a laptop charger with a small LED: when you unplug it from the wall socket it takes a couple of seconds before the current is completely gone from the capacitors in the transformer. This is also one of the primary reasons FC uses an 8b/10b encoding/decoding scheme to keep a balanced DC value.
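
To show why that DC balance matters, here is a tiny sketch that tracks the running surplus of ones over zeros in two bit streams. The patterns are arbitrary examples I made up; they are not real 8b/10b code groups.

    # Illustrative sketch: 8b/10b keeps the number of ones and zeros (nearly)
    # equal so no DC charge builds up on the line. An unbalanced stream drifts.
    def running_disparity(bits):
        """Cumulative surplus of ones over zeros along the stream."""
        disparity, trace = 0, []
        for bit in bits:
            disparity += 1 if bit == "1" else -1
            trace.append(disparity)
        return trace

    balanced   = "1010101010" * 4       # stays around zero
    unbalanced = "1110111011" * 4       # keeps drifting, i.e. a DC offset builds up
    print(max(running_disparity(balanced)))     # small
    print(max(running_disparity(unbalanced)))   # grows with every repetition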

The optical-to-electrical converters have the same issue, albeit not in the cable itself but in the physical characteristics of the circuitry. There is a certain fall-off and ramp-up time before the current becomes completely zero or completely one respectively. This is very important since it determines when a receiver should decide whether the incoming bit should be seen as a 1 or a 0.

The optics people and companies represented in IEEE and T11-2 do write up the official metrics so this is all being done for you. There is nothing on a switch, array or other network equipment where you can tune this.

The characteristics of a signal can be measured with an oscilloscope. The result looks like this:

 
The blue lines show the voltage on the oscilloscope, and this reveals the so-called eye-pattern. The hexagon in the middle is determined by the folks of IEEE and T11-2 and can be loaded as a software feature for ease of use on most equipment. (Note: be aware that this differs per technology and optical characteristic, like FC, Ethernet, DWDM etc.)
 
The above picture shows a perfect eye-pattern since it shows that the ramp-up time (from the bottom blue line to the top) is well before the “decision point” for becoming a 1 and the fall-off time is well after the decision point for becoming a 0.
 
“So what does this have to do with my fibre-cable” you may ask.
When connectors are not clean the light may be reflected back into the cable, causing jitter. It is this jitter that can significantly close the eye-pattern to a point where the receiver can no longer determine whether incoming light should be interpreted as a 1 or a 0. The picture below shows one that comes pretty close.
 
 
 
By default the receiver will keep the same value it had on the previous clock cycle. This means that a 1 remains a 1 even though it actually should have been a 0, and vice versa. The result is that the bitstream going from the receiver buffer into the serdes chip will be incorrect, thereby causing a decoding error. For FC this means that the er_enc_out or er_enc_in value in the LESB (Link Error Status Block) is incremented by one (depending on whether the 10-bit transmission word was part of an FC frame or not). On a Brocade switch this is shown in the enc_in or enc_out column of the porterrshow output.
 
If this happens on a bit which was part of a normally valid FC frame, the frame now contains an invalid byte. If we did not have a fall-back mechanism this would lead to an invalid byte being sent to the operating system and application, causing corruption and even system failures. Since we also do a CRC check on the entire frame, the destination port will discard it entirely and the upper-layer SCSI stack (or whatever protocol resides on the FC-4 layer) will retry the IO.
 
The problem is that with distance you get loss of power (remember that light is measured in dB). Depending on the type of cable (OM1, 2, 3 or 4) the loss budget of the cable is fixed. Every connection or splice (two optical cables welded together) adds to the link loss and decreases the optical power received on the other side of the link. The problem with dirty connections is that they significantly decrease the optical power, which can cause the received power in dB to fall outside the specification of that particular SFP. This can cause link losses and port flapping, causing all sorts of other nasty issues.
 
The link loss budget can be calculated based on the launch power of the transmitter, the number of connectors and splices in the cable plant, plus the margin on the receiver side. If this all falls below the receiver sensitivity mark, the receiver will drop the link and the ports will go offline.
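
A minimal sketch of such a budget calculation is shown below. All figures are example values I picked for illustration; use the launch power, receiver sensitivity and per-connector/splice losses from your own data sheets and cable plant documentation.

    # Illustrative link loss budget calculation. All numbers are example values.
    launch_power_dbm     = -3.0    # transmitter output
    receiver_sens_dbm    = -14.0   # minimum power the receiver can still detect
    fibre_loss_db_per_km = 3.0     # example figure for multimode fibre
    connector_loss_db    = 0.5     # per mated connector pair (when clean!)
    splice_loss_db       = 0.3     # per splice

    def received_power(length_km, connectors, splices):
        loss = (length_km * fibre_loss_db_per_km
                + connectors * connector_loss_db
                + splices * splice_loss_db)
        return launch_power_dbm - loss

    rx = received_power(length_km=0.3, connectors=4, splices=2)
    margin = rx - receiver_sens_dbm
    print("Received %.1f dBm, margin %.1f dB" % (rx, margin))
    # A dirty connector can easily add several dB of loss; once the margin goes
    # negative the receiver drops the link and the port goes offline.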
 
 
 
 
On a Brocade switch you can see the transmitter and receiver values with the “sfpshow” command:
 
 
The specifications of the SFP determine what the transmit and receive power should be. If the actual RX power values fall outside the specification of the SFP you should start to look at your cables and connectors and start cleaning them. If this doesn’t help there might be another problem, like a crack in the cable or a broken laser in the SFP. In that case replace the cable and/or SFP.
 
Hope this helps to explain why you might see strange things in your fibre channel network if the connectors are not clean, and why your support organisation keeps stressing that you should fix and maintain your cable plant. I did mention I work in support, and I see many connectivity issues resulting in flapping ports, overall performance issues and even data loss or corruption.
 
If you want to know the characteristics of optical cables or SFPs I suggest you have a look at the JDSU, Finisar or Avago websites. Also check out the FOA YouTube channel, which has uploaded some nice videos explaining the ins and outs of fibre optics in detail.
Regards,
Erwin

SoE, SCSI over Ethernet.

It may come as no surprise that I’m not a fan of FCoE. Although I have nothing against the underlying thought of converged networking, I do feel that the method of encapsulating multiple protocols in yet another frame is overkill, adds complexity, requires additional skills, training and operating methods, and introduces risk, so as far as I’m concerned it shouldn’t be needed. The main reason FCoE was invented is to have the ability to carry traffic from Fibre Channel environments through gateways (called FCFs) to an Ethernet-connected Converged Network Adapter in order to save on some cabling. Yeah, yeah, I know many say you’ll save a lot more, but I’m not convinced.
After staring at some ads from numerous vendors I still wonder why they never came up with the ability to map the SCSI protocol directly onto Ethernet in the same way they do with IP. After all, with the introduction of 10G Ethernet all issues of reliability appear to have gone (have they??), so it shouldn’t be such a problem to address this directly. This was the main reason Fibre Channel was invented in the first place. I think from a development perspective it should be roughly the same amount of effort to have SCSI transported directly on Ethernet as on Fibre Channel. From an interface perspective it shouldn’t be a problem either. I think storage vendors would be just as happy to shove in an Ethernet port in addition to FC. They wouldn’t need to use any difficult FCoE or iSCSI mechanisms.

Since all, or at least a lot of, development efforts these days seem to have shifted to Ethernet, why still invest in Fibre Channel? Ethernet still sits in a 7-layer OSI stack, but you should be able to use just three layers: physical, data link and network. This should be enough to shove frames back and forth in a flat Ethernet network (or Ethernet Fabric as Brocade calls it). For other protocols like TCP/IP this is no problem since they already use the same stack but just travel a bit higher up. This then allows you to have a routable iSCSI environment (over IP) as well as a native SCSI protocol running on the same network. The biggest problem is then security. If SCSI runs on a flat Ethernet network there is no way (yet) to secure SCSI packets arriving at all ports in that particular network segment. This would be the same as having no zoning active as well as disabling all LUN masking on the arrays. The only way to circumvent this is to invent some sort of “Ethernet firewall” mechanism. (There may be a product or vendor that provides this, but I’ve never heard of it.) It’s pretty easy to spoof a MAC address, so that’s no good as a security precaution.

As usual this should then also have all the other security features like authentication, authorisation etc etc. Fibre Channel already provides authentication based on DH-CHAP which is specified in the FC-SP standard. Although DH-CHAP exists in the Ethernet world it is strictly tied to higher layers like TCP. It would be good though to see this functionality on the lower layers as well.

I’m not an expert on Ethernet so I would welcome comments that would provide some more insight of the options and possibilities.

Food for thought.

Regards,
Erwin

Why not FCoE?

You may have read my previous articles on FCoE as well as some comments I’ve posted on Brocade’s and Cisco’s blog sites. It won’t surprise you that I’m no fan of FCoE. Not for the technology itself but for the enormous complexity and organisational overhead involved.

So let’s take a step back and try to figure out why this has become so much of a buzz in the storage and networking world.

First let’s make it clear that FCoE is driven by the networking folks, most notably Cisco. The reason for this is that Cisco has around 90% market share on the data centre networking side but only around 10 to 15% of the storage side. (I don’t have the actual numbers at hand but I’m sure it’s not far off.) Brocade, with their FC offerings, have that part (storage) pretty well covered. Cisco hasn’t been able to eat more out of that pie for quite some time, so they had to come up with something else. So FCoE was born. This allowed them (Cisco) to slowly but steadily get a foot in the storage door by offering a so-called “new” way of doing business in the data centre and convince customers to go “converged”.

I already explained that there is no or negligible benefit from an infrastructural and power/cooling perspective, so cost-effectiveness from a capex perspective is nil and maybe even negative. I also showed that the organisational overhaul that has to be accomplished is tremendous. Remember you’re trying to glue two different technologies together by adding a new one. The June 2009 FC-BB-5 document (where FCoE is described) is around 1.9 MB and 180 pages, give or take a few. FC-BB-6 is 208 pages and 2.4 MB thick. How does this decrease complexity?
Another part that you have to look at is backward compatibility. The Fibre Channel standard went up to 16Gb/s a while ago and most vendors have released products for it already. The FC standard specifies backward compatibility down to 2Gb/s, so I’m perfectly safe when linking up a 16G SFP with an 8Gb/s or 4Gb/s SFP and the speed will be negotiated to the highest possible. This means I don’t have to throw away older, not yet depreciated, equipment. How does Ethernet play in this game? Well, it doesn’t: 10G Ethernet is incompatible with 1G, so they don’t marry up. You have to forklift your equipment out of the data centre and get new gear from top to bottom. How’s that for investment protection? The network providers will tell you this migration process comes naturally with equipment refresh, but if you have to refresh one or two director-class switches that your other equipment can’t connect to, how is that a natural process? It means you have to buy additional gear that bridges between the old and the new, resulting in you paying even more. This is probably what is meant by “naturally”. “Naturally you have to pay more.”

So it’s pretty obvious that Cisco needs to pursue this path if it is ever to get more traction in the data centre storage networking club. They’ve also proven this with UCS, which looks like it is falling off the cliff as well if you believe the publications in the blog-o-sphere. Brocade is not pushing FCoE at all. The only reason they are in the FCoE game is to be risk averse: if for some reason FCoE does take off they can say they have products to support it. Brocade has no intention of giving up an 80 to 85% market share in fibre channel just to risk handing it over to the other side, being Cisco networking. Brocade’s strategy is somewhat different from Cisco’s. Both companies have outlined their ideas and plans on numerous occasions so I’ll leave that for you to read on their websites.

“What about the other vendors?” you’ll say. Well, that’s pretty simple. All array vendors couldn’t care less. For them it’s just another transport mechanism like FC and iSCSI, and there is no gain nor loss whether FCoE makes it or not. They won’t tell you this to your face, of course. The connectivity vendors like Emulex and QLogic have to be on the train with Cisco as well as Brocade; however, their main revenue comes from the server vendors who build products with Emulex or QLogic chips in them. If the server vendors demand an FCoE chip, either party builds one and is happy to sell it to any server vendor. For connectivity vendors like these it’s just another revenue stream they tap into, and they cannot afford to sit outside a certain technology if the competition is picking it up. Given the fact that there is significant R&D required w.r.t. chip development, these vendors also have to market their kit to get some ROI. This is normal market dynamics.

“So what alternative do you have for a converged network?” was a question I was asked a while ago. My response was: “Do you have a Fibre Channel infrastructure? If so, then you already have a converged network.” Fibre Channel was designed from the bottom up to transparently move data back and forth irrespective of the upper protocol used, including TCP/IP. Unfortunately SCSI has become the most common, but there is absolutely no reason why you couldn’t add a networking driver and the IP protocol stack as well. I’ve done this many times and have never had any trouble with it.

The question now is: “Who do you believe?” and “How much risk am I willing to take to adopt FCoE?”. I’m not on the sales side of the fence, nor am I in marketing. I work in a support role and have many of you on the phone when something goes wrong. My background is not in the academic world. I worked my way up and have been in many roles where I’ve seen technology evolve, and I know how to spot bad ones. FCoE is one of them.

Comments are welcome.

Regards,
Erwin

Will FCoE bring you more headaches?

Yes it will!

Bit of a blunt statement but here’s why.

When you look at the presentations all connectivity vendors (Brocade, Cisco, Emulex etc.) will give you, they pitch that FCoE is the best thing since sliced bread. Reduction in costs, cooling, cabling and complexity will solve all of today’s problems! But is this really true?

Let’s start with costs. Are the cost savings really as big as they promise? These days a 1G Ethernet port sits on the server motherboard and is more or less a freebie. The expectation is that the additional cost of 10GbE will be added to a server’s COG, but as usual it will decline over time. Most servers come with multiple of these ports. On average a CNA is two times more expensive than 2 GbE ports + 2 HBAs, so that’s not a reason to jump to FCoE. Each vendor has different price lists, so that’s something you need to figure out yourself. The CAPEX is the easy part.

An FCoE capable switch (CEE or FCF) is significantly more expensive than an Ethernet switch + a FC switch. Be aware that these are data center switches and the current port count on an FCoE switch is not sufficient to deploy large scale infrastructures.

Then there is the so-called power and cooling benefit. (?!?!?) I searched my butt off to find the power requirements of HBAs and CNAs, but no vendor publishes these. I can’t imagine an FC HBA chip eats more than 5 watts; a CNA will probably use more given the fact it runs at a higher clock speed, and for redundancy reasons you need two of them anyway, so in general I think these will equate to the same power requirements, or an Ethernet + HBA combination is even more efficient than CNAs. Now let’s compare a Brocade 5000 (32-port FC switch) with a Brocade 8000 FCoE switch from a BTU and power rating perspective. I used their own specs according to their data sheets, so if I made a mistake don’t blame me.

A Brocade 5000 uses a maximum of 56 watts and has a BTU rating of 239 at 80% efficiency. An 8000 FCoE switch uses 206 watts when idle and 306 watts when in use. Its BTU heat dissipation is 1044.11 per hour. I struggled to find any benefit here. Now you can say that you also need an Ethernet switch, but even if that has the same ratings as a 5000 switch you still save a hell of a lot of power and cooling by keeping separate switches. I haven’t checked out the Cisco, Emulex and QLogic equipment, but I assume I’m not far off on those either.
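
As a quick sanity check on those numbers (1 watt is roughly 3.412 BTU/hr), and under my own simplifying assumption that a separate Ethernet switch has roughly the same rating as the 5000:

    # Data sheet values as quoted in the text; the "two separate switches"
    # combination below is my own simplifying assumption.
    WATT_TO_BTU_HR = 3.412

    brocade_5000_w      = 56      # max, per data sheet
    brocade_8000_w_busy = 306     # in use, per data sheet

    print(brocade_8000_w_busy * WATT_TO_BTU_HR)   # ~1044 BTU/hr, matches the quoted figure
    print(2 * brocade_5000_w * WATT_TO_BTU_HR)    # two separate 5000-class switches: ~382 BTU/hr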

Now, hang on, all vendors say there is a “huge benefit” in FCoE-based infrastructures. Yes, there is: you can reduce your cabling plant, but even there there is a snag. You need very high quality cables, so an OM1 or OM2 cabling plant will not do. As a minimum you need OM3, but OM4 is preferred. Do you have this already? If so, good, you need less cabling; if not, buy a completely new plant.

Then there is complexity, also an FCoE sales pitch. “Everything is much easier and simpler to configure if you go with FCoE.” Is it??? Where is the reduction in complexity when the only benefit is that you can get rid of cabling? Once a cabling plant is in place you only need to administer the changes, and there is some extremely good and free software to do that. So even if you consider this a huge benefit, what do you get in return? A famous Dutch football player once said “Elk voordeel heb z’n nadeel” (that’s Dutch with an Amsterdam dialect spelling :-)), which more or less means that every benefit has its disadvantage, i.e. there is a snag with each benefit.

The snag here is that you get all the nice features like CEE, DCBX, LLDP, ETS, PFC, FIP, FPMA and a lot more new terminology introduced into your storage and network environment. (Say what???) This more or less means that each of these abbreviations needs to be learned by your storage administrators as well as your network administrators, which means additional training requirements (and associated costs). This is not a replacement for your current training and knowledge; it comes on top of that.
Also these settings are not a one-time-setup which can be configured centrally on a switch but they need to be configured and managed per interface.

In my previous article I also mentioned the complete organizational overhaul you need to do between the storage and networking department. From a technology standpoint these two “cultures” have a different mindset. Storage people need to know exactly what is going to hit their arrays from an applications perspective as well as operating systems, firmware, drivers etc. Network people don’t care. They have a horizontal view and they transport IP packets from A to B irrespective of the content of that packet. If the pipe from A to B is not big enough they create a bigger pipe and there we go. In the storage world it doesn’t work like this as described before.

Then there is the support side of the fence. Let’s assume you’ve adopted FCoE in your environment. Do you have everything in place to solve a problem when it occurs? (Mind the term “when”, not “if”.) Do you know exactly what it takes to troubleshoot a problem? Do you know how to collect logs the correct way? Have you ever seen a Fibre Channel trace captured by an analyzer? If so, were you able to make sense of it and actually pinpoint an issue if there was one and, more importantly, work out how to solve it? Did you ever look at fabric/switch/port statistics on a switch to verify whether something is wrong? For SNIA I wrote a tutorial (over here) in which I describe the overall issues support organisations face when a customer calls in for support, and also what to do about it. The thing is that network and storage environments are very complex. By combining them and adding all the 3- and 4-letter acronyms mentioned above, the complexity will increase 5-fold if not more. It therefore takes much, much longer to pinpoint an issue and advise on how to solve it.

I work in one of those support centres of a particular vendor and I see FC problems every day. Very often they are due to administrator errors, but far more often they are caused by a problem with software or hardware. These can be very obvious, like a cable problem, but in most cases the issue is not so clear and it takes a lot of skill, knowledge, technical information AND TIME to be able to sort it out. By adding complexity it just takes more time to collect and analyse the information and advise on resolution paths. I’m not saying it becomes undoable, but it just takes more time. Are you prepared, and are you willing to give your vendor this time to sort out issues?

Now, you probably think I must hold a major grudge against FCoE. On the contrary; I think FCoE is a great technology, but it’s been created for technology’s sake and not to help you as customer and administrator to really solve a problem. The entire storage industry is stacking protocol upon protocol to circumvent a very hard issue they screwed up a long time ago. (Huhhhh, why’s that?)

Be reminded that today’s storage infrastructure is still running on a three-decade-old protocol called SCSI (or SBCCS for z/OS, which is even older). Nothing wrong with that, but it implies that the shortcomings of this protocol need to be circumvented. SCSI originally ran on a parallel bus which was 8 bits wide and hit performance limitations pretty quickly, so they created “wide SCSI” which ran on a 16-bit-wide bus. By increasing the clock frequencies they pumped up the speed; however, the problem of distance limitations became more imminent and so they invented Fibre Channel. By disassociating the SCSI command set from the physical layer, the T10 committee came up with SCSI-3, which allowed the SCSI protocol to be transported over a serialized interface like FC, with a multitude of benefits like speed, distance and connectivity. The same thing happened with ESCON in the mainframe world. Both the ESCON command set (SBCCS, now known as FICON) as well as SCSI (on FC known as FCP) are now able to run on the FC-4 layer. Since Ethernet back then was extremely lossy, it was no option for a strict lossless channel protocol with low latency requirements. Now that they have fixed up Ethernet a bit to allow for lossless transport over a relatively fast interface, they map the entire stack into a mini-jumbo frame: the FCP SCSI command and data sits in an FC-encapsulated frame, which in turn sits in an Ethernet frame. (I still can’t find the reduction in complexity; if you can, please let me know.)

What should have been done, instead of introducing a fixer-upper like FCoE, is that the industry should have come up with an entirely new concept of managing, transporting and storing data. This should have been created based on today’s requirements, which include security (like authentication and authorization), retention, (de-)duplication, removal of awareness of locality etc. Your data should reside in a container which is a unique entity on all levels, from the application to the storage and every mechanism in between. This container should be treated as per the policy requirements encapsulated in that container, and those policies are based on the content residing in there. This then allows a multitude of properties to be applied to this container as described above and allows for far more effective transport.

Now this may sound like trying to boil the ocean, but try to think 10 years ahead. What will be beyond FCoE? Are we creating FCoEoXYZ? Five years ago I wrote a little piece called “The Future of Storage” which more or less introduced this concept. Since then nothing has happened in the industry to really solve the data growth issue. Instead the industry is stacking patch upon patch to circumvent current limitations (if any) or trying to generate a new revenue stream with something like the introduction of FCoE.

Again, I don’t hold anything against FCoE from a technology perspective, and I respect and admire Silvano Gai and the others at T11 for what they’ve accomplished in little over three years, but I think it’s a major step in the wrong direction. It had the wrong starting point and it tries to answer a question nobody asked.

For all the above reasons I still do not advise adopting FCoE, and I urge you to push your vendors and their engineering teams to come up with something that will really help you run your business instead of patching up “issues” you might not even have.

Constructive comments are welcome.

Kind regards,
Erwin van Londen

Fibre Channel improvements.

So what is the problem with storage networking these days? Some of you might argue that it’s the best thing since sliced bread and the most stable way to shove data back and forth, and maybe it is, however this is not always the case. The problem is that some gaps still exist which have never been addressed, and one of them is resiliency. A lot has been done to detect errors and to try to recover from them, but nobody ever thought of how to prevent errors from occurring. (Until now, that is.) Read on.

So what is the evolution of a standard like Fibre Channel? It is normally born out of a need that isn’t addressed by current technologies. The primary reason FC was created is that the parallel SCSI stack had a huge problem with distance. It did not scale beyond a couple of metres and was very sensitive to electrical noise, which could disturb the reliable transmission needed for a data-intensive channel protocol like SCSI. So somebody came up with the idea to serialise the data stream and FC was born. A lot of very smart people got together and cooked up the nifty things we now take for granted, like massive address space, zoning, huge increases in speed and a lot of other goodies which could never have been achieved with a parallel interface.

The problem is that these goodies are all created in the dark dungeons of R&D labs. These guys don’t speak much (if at all) to end-user customers so the stuff coming out of these labs is very often extremely geeky.
If you follow a path from the creation of a new thing (whether technology or anything else) you see something like this:

  1. Market demand
  2. R&D
  3. Product
  4. Sales
  5. Customers
  6. Post sales support

The problem is that very often there is no link between #5/#6 and #2. Very often for good reason, but this also introduces some serious challenges. Since I’m not smart enough to work in #2, I’m at the bottom of the food chain working in #6. 🙂 But I do see the issues that arise along this path, so I cooked something up. Read on.

Going back to fibre channel, there is one huge gap and that is fault tolerance and acting upon failures in an FC fabric. The protocol defines how to detect errors and how to try to recover from them, but it does not define anything about how to prevent errors from reoccurring. This means that if an error has been detected and frames get lost, we just say “OK, let’s try it again and see if it succeeds now”. It doesn’t take a genius to see that if something is broken this will fail again.

So on the practical side there are a couple of things that most often go wrong, and those are the physical things like SFPs and cables. These result in errors like encoding/decoding failures, CRC errors, and signal and synchronization errors. If these occur, the entire frame, including your data payload, will get dropped and we ask the initiator of that frame to try and resend it. If, however, the initiator no longer has this frame in its buffers, we rely on the upper-layer protocol to recover from this. Most of the time it succeeds; however, as previously mentioned, if things are really broken this will fail again. From an operating system perspective you will see this as SCSI check conditions and/or read/write failures. In a tape environment this will often result in failed backup/restore jobs.

Now, you’re gonna say “Hold on buddy, that’s why we have dual redundant fabrics, multiple entries to our LUNs, multipathing etc. etc.”, i.e. redundancy. True, BUT what if it is just partially broken? A dodgy SFP or HBA might send out good signals, but there could also be a certain amount of not-so-good signals. This will result in intermittent failures producing the above-mentioned errors, and if these happen often enough you will get these problems. So, although you have every piece of the storage puzzle redundant, you might still run into problems which, if severe enough, might affect your entire storage infrastructure. (And it does happen, believe me.)

The underlying problem is that there is no communication between N-ports and F-ports, as well as a lack of end-to-end path error verification, to check whether these errors occur in the fabric and, if so, how to mitigate or circumvent them. If an N-port sends out a signal to an F-port which gets corrupted under way, there is no way the F-port will notify the N-port and say “Hey dude, you’re sending out crap, do something about it”. A similar issue exists in meshed fabrics. We have all grown up since 1998 with FSPF (Fabric Shortest Path First), which is an FC protocol extension to determine the shortest path from A to B in an FC fabric based on a least-cost routing algorithm. Nothing wrong with that, but what if this path is very error prone? Does the fabric have any means to make a decision and say “OK, I don’t trust this path, I’ll direct that traffic via another route”? No, there is nothing in the FC protocol which provides this option. The only way routes are redefined is when there are changes in the fabric, like an N-port coming online/offline and registering/de-registering itself with the fabric nameserver, upon which RSCNs (Registered State Change Notifications) are sent out.
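
To make the gap concrete, here is a toy comparison between a purely cost-based path choice and one that also penalises error-prone links. This is my own illustration, not the FC standard and not the content of the proposal mentioned below.

    # Toy illustration: FSPF-style routing picks the lowest static cost path and
    # has no notion of link health; an error-aware variant could penalise lossy
    # links. Path names, costs and the penalty factor are made-up examples.
    paths = [
        {"name": "ISL-A", "static_cost": 500,  "crc_errors_per_hour": 250},
        {"name": "ISL-B", "static_cost": 1000, "crc_errors_per_hour": 0},
    ]

    def fspf_choice(paths):
        return min(paths, key=lambda p: p["static_cost"])

    def error_aware_choice(paths, penalty_per_error=10):
        return min(paths, key=lambda p: p["static_cost"]
                   + p["crc_errors_per_hour"] * penalty_per_error)

    print(fspf_choice(paths)["name"])         # ISL-A: cheapest, but error prone
    print(error_aware_choice(paths)["name"])  # ISL-B: the healthier path wins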

For this reason I submitted a proposal to the T11 committee, via my teacher and the father of Fibre Channel, Horst Truestedt, to extend the FC-GS services with new ways to solve these problems. (The proposal can be downloaded here.)

The underlying thoughts are to have port-to-port communication to be able to notify the other side of the link that it is not stable, as well as an end-to-end error verification and notification algorithm, so that hosts, HBAs and fabrics can act upon errors seen in the path to their end devices. This allows active redirection to prevent frames from passing via that route, as well as the option to extend management capabilities so that storage administrators can act upon these failures and replace/update hardware and/or software before the problem becomes imminent and affects the overall stability of the storage infrastructure. This will in the end result in far greater storage availability and application uptime, as well as prevent all the other nasty stuff like data corruption etc.

The proposal was positively received with an 8:0 voting ratio, so now I’m waiting for a company to take this further and actually start developing this extension.

Let me know what you think.

Regards
Erwin

Why FCoE will die a silent death

I’ve said it before: storage is not simple. There are numerous things you have to take into account when designing and managing a storage network. The collaboration between applications, IO stacks and storage networks has to be very stable in order to get something useful out of it, both in stability as well as performance. If something goes wrong it’s not just annoying; it might be disastrous for companies and people.


Now I’ve been involved in numerous positions in the storage business, from storage administrator to SAN architect and from pre-sales to customer support, and I know what administrators/users need to know in order to get things working and keep them that way. The complexity that lands on administrators is increasing every year, as does the workload. A decade ago I used to manage just a little over a terabyte of data and that was pretty impressive in those days. Today some admins have to manage a petabyte of data (yes, a 1000-fold more). Now, going from a 32GB disk drive to a 1TB disk drive might make it look like their life just got simpler, but nothing is further from the truth. The impact when something goes wrong is immense. The complexity of applications, host/storage-based virtualisation etc. has all added to an increase in the skills required to operate these environments.

So what does this have to do with FCoE? Think of it like this: you have two very complex environments (TCP/IP networking and Fibre Channel storage) which by definition have no clue what the other is about. Now try to merge these two together to be able to transport packets through the same cable. How do we do that? We rip away the lower levels of the OSI and FC layers, replace them with a new 10GbE CEE interface, create a new wrapper with new frame headers, addressing and protocol definitions on those layers, and away we go.

Now this might look very simple, but believe me, it was the same with fibre channel 10 years ago. Look how the protocol evolved, not only in speeds and feeds but also tremendously in functionality. Examples are VSANs, Virtual Fabrics and Fibre Channel Routing, to name a few. Next to that, the density of FC fabrics has increased, as has the functionality of storage arrays. I already wrote in a previous article that networking people in general are not interested in application behaviour. They don’t care about IO profiles, response times and some packet loss, since TCP/IP will solve that anyway. They just transport packets through a pipe, and if the pipe isn’t big enough they replace it with a bigger pipe or re-route some of the flow to another pipe. That is what they have done for years and they are extremely good at it. Storage people, on the other hand, need to know exactly what is hitting their arrays and disks. They have a much more vertical approach because each application behaves differently towards storage. If you mix a large sequential load with a very random one hitting the same array ports and spindles, you know you are in a bad position.

So here is where politics will collide. Who will manage the FCoE network? Will it be the networking people? (Hey, it’s Ethernet right? So it belongs to us!) Normally I have no problem with that, but they have to prove that they know how Fibre Channel behaves and what a FICON SBCCS code set looks like, as well as an FCP SCSI CDB. (I see some question marks coming already.)
Now, FCoE doesn’t work on your day-to-day Ethernet or fibre channel switch. You have to have specialized equipment like CEE and FCF switches to get things going. Most of them are not backwards compatible, so they act more as a bridging device between a CEE and an FC network. This in turn adds significantly to the cost you were trying to save by knocking off a couple of HBAs and network cards.

FCoE looks great, but the added complexity, in addition to an entire mind-shift in networking and storage management plus the need for extremely well-trained personnel, will make this technology sit in a closet for at least 5 years. There it will mature over time so that true storage and networking convergence might be possible as a real business value-add. At the time of this writing the standard is just a year old and will need some fixing up.

Businesses are looking for ways to save cost, reduce risk and simplify environments. FCoE currently provides none of these.