
1.1 – MAPS – Know what’s going on.

I’ve written about Fabric Watch quite a lot and I have always stressed the usefulness of this licensed add-on feature of FOS. This post outlines the major characteristics of MAPS and why you should migrate now. As of FOS 7.2 there has been a transition from Fabric Watch to MAPS (Monitoring and Alerting Policy Suite), and over the past few FOS versions MAPS has seen a huge improvement in overall RAS (Reliability, Availability and Serviceability) monitoring features. As of FOS 7.4 Fabric Watch is no longer included in FOS, so MAPS is the only option you have if you want policy-based monitoring and alerting. MAPS is one half of a two-part suite called Fabric Vision, together with its performance companion “Flow Vision”. MAPS can interact with Flow Vision based on criteria you specify, and monitor and alert on performance-related events.


Brocade Fabric Vision – Version 1

As you may have read in my previous posts, I’m not really a fan of marketing-driven terminology whereby existing technology is re-branded over and over again in order to obfuscate the underlying technology and make things more complex than they really are. The FC Gen-X nonsense is one example. With Brocade Fabric Vision it took me a while, but I now see where Brocade is going with this and, more importantly, where it is coming from.


Brocade FOS 7.1 and the cool features

After a very busy couple of weeks I’ve spent some time dissecting the release notes of Brocade FOS 7.1, and I must say there are some really nice features in there, but also some that I REALLY think should be removed right away.

It may come as no surprise that I always look very critically at whatever comes to the table from Brocade, Cisco and others w.r.t. storage networking. The troubleshooting side, and therefore the RAS capabilities of the hardware and software, have a special place in my heart, so if somebody screws up I’ll let them know via this platform. 🙂
So first of all some generics. FOS 7 is supported on the 8G and 16G platforms, which covers the GoldenEye2, Condor2 and Condor3 ASICs plus the AP blades for encryption, SAN extension and FCoE. (cough, cough)… Be aware that it doesn’t support the blades based on the older architecture, such as the FR4-18i and FC10-6 (which I think was never bought by anyone). Most importantly, this is the first version to support the new 6520 switch, so if you buy one it will ship with this version installed.
As for the software features, Brocade really cranked up the RAS features. I especially like the broadened scope of D-ports (diagnostic ports), which now includes ICL ports as well as links between Brocade HBAs and switch ports. One thing they should pay attention to, though, is that they need to sell a lot more of these HBAs. :-) The characteristics of the test patterns, such as test duration, frame sizes and number of frames, can now be specified as well. FEC (Forward Error Correction) has also been extended to Access Gateways and long-distance ports, which should increase stability w.r.t. frame flow. (It still doesn’t improve signal levels, but that is a hardware problem which cannot be fixed in software.)
There are some security enhancements for authentication such as extended LDAP and TACACS+ support.
The 7800 can now be used with Virtual Fabrics (VF), albeit without XISL functionality.
Finally, the E_D_TOV FC timer value is propagated onto the FCIP complex. What this basically means is that previously, even though an FC frame had long timed out according to the FC specs (in general after 2 seconds), it could still exist on the IP network inside an FCIP packet. The remote FC side would discard that frame anyway, thus wasting valuable resources. With FOS 7.1 the FCIP complex on the sending side discards the frame once E_D_TOV has expired.
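To make the new FCIP behaviour concrete, here is a minimal Python sketch of the discard decision on the sending side. The QueuedFrame type, the frames_to_tunnel function and the millisecond timestamps are hypothetical illustrations, not FOS internals; the 2000ms E_D_TOV is the typical default mentioned above.

```python
from dataclasses import dataclass

# Typical default E_D_TOV of 2 seconds, expressed in milliseconds.
E_D_TOV_MS = 2000


@dataclass
class QueuedFrame:
    """Hypothetical FC frame waiting in the FCIP send queue."""
    frame_id: int
    received_at_ms: int  # when the frame entered the FCIP complex


def frames_to_tunnel(queue, now_ms, e_d_tov_ms=E_D_TOV_MS):
    """Split queued frames into (send, discard) lists.

    A frame older than E_D_TOV would be thrown away by the remote FC side
    anyway, so the sending side drops it instead of wasting WAN bandwidth.
    """
    send, discard = [], []
    for frame in queue:
        age_ms = now_ms - frame.received_at_ms
        if age_ms > e_d_tov_ms:
            discard.append(frame)
        else:
            send.append(frame)
    return send, discard
```

With frames_to_tunnel([QueuedFrame(1, 0), QueuedFrame(2, 1500)], now_ms=2500), frame 1 is 2500ms old and is dropped, while frame 2 is only 1000ms old and still goes into the tunnel.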
One of the most underutilised features (besides Fabric Watch) is FDMI (Fabric Device Management Interface). This is a separate FC service (part of the new FC-GS-6 standard) which can hold a huge treasure trove of info w.r.t. connected devices. As an example:
FDMI entry:
        switch:admin> fdmishow
        Local HBA database contains:
          Ports: 1
              Port attributes:
                FC4 Types: 0x0000010000000000000000000000000000000000000000000000000000000000
                Supported Speed: 0x0000003a
                Port Speed: 0x00000020
                Frame Size: 0x00000840
                Device Name: bfa
                Host Name: X3650050014
                Node Name: 20:00:8c:7c:ff:01:eb:00
                Port Name: 10:00:8c:7c:ff:01:eb:00
                Port Type: 0x0
                Port Symb Name: port2
                Class of Service: 0x08000000
                Fabric Name: 10:00:00:05:1e:e5:e8:00
                FC4 Active Type: 0x0000010000000000000000000000000000000000000000000000000000000000
                Port State: 0x00000005
                Discovered Ports: 0x00000002
                Port Identifier: 0x00030200
          HBA attributes:
            Node Name: 20:00:8c:7c:ff:01:eb:00
            Manufacturer: Brocade
            Serial Number: BUK0406G041
            Model: Brocade-1860-2p
            Model Description: Brocade-1860-2p
            Hardware Version: Rev-A
            Driver Version:
            Option ROM Version:
            Firmware Version:
            OS Name and Version: Windows Server 2008 R2 Standard | N/A
            Max CT Payload Length: 0x00000840
            Symbolic Name: Brocade-1860-2p | | X3650050014 |
            Number of Ports: 2
            Fabric Name: 10:00:00:05:1e:e5:e8:00
            Bios Version:
            Bios State: TRUE
            Vendor Identifier: BROCADE
            Vendor Info: 0x31000000
As you can see, this shows a lot more than the fairly basic name server entries:
N    8f9200;      3;21:00:00:1b:32:1f:c8:3d;20:00:00:1b:32:1f:c8:3d; na
    FC4s: FCP 
    NodeSymb: [41] "QLA2462 FW:v4.04.09 DVR:v8.02.01-k1-vmw38"
    Fabric Port Name: 20:92:00:05:1e:52:af:00 
    Permanent Port Name: 21:00:00:1b:32:1f:c8:3d
    Port Index: 146
    Share Area: No
    Device Shared in Other AD: No
    Redirect: No 
    Partial: No
    LSAN: No
Obviously the end device needs to support FDMI and it has to be enabled. (PLEASE DO!!!!!!!!) It’s invaluable for troubleshooters like me…
One thing that has bitten me a few times was the SFP problem. For a long time, when a port was disabled and a new SFP was plugged in, the switch didn’t detect the new SFP until the port was enabled and had polled for up-to-date information. In the meantime you could get old/cached info from the old SFP, including temperature, dB values, current, voltage etc. This seems to be fixed now, so that’s one less thing to take into account.
Several CLI commands have been improved with new parameters that let you filter and select on certain errors etc.
The biggest idiocy in this version is allowing the administrator to change the severity level of event codes. This means that if you have a filter in BNA (or whatever management software you use) to exclude INFO-level messages, and certain ERROR or CRITICAL messages start to annoy you, you can change their severity to INFO and they won’t show up anymore. That doesn’t make the problem any less critical, so instead of fixing the issue we now just pretend it’s not there. From a troubleshooting perspective this is disastrous: we look at a fair chunk of supportsaves each day, and if we can’t rely on consistency in a log file it’s useless to look at it in the first place. Another one of those is the difference in deskew values on trunks when FEC is enabled. Due to a coding problem these values can differ by up to 40, which would normally indicate a massive difference in cable length. Only by running a D-port test can you determine whether that is really the case. My take is that they should fix the coding problem ASAP.
A similar thing that has pissed me off is the change in sfpshow output. Since the invention of the wheel this has been the worst output in the Brocade logs, so many people have scripted their ass off to make it more readable.
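Many of the FDMI attributes above are hex-encoded. The Python sketch below decodes a few of them; the helper names are hypothetical. It assumes the speed bit layout of the FDMI speed attributes as mirrored by the Linux scsi_transport_fc FC_PORTSPEED_* constants, so double-check that against the FC-GS spec before relying on it. Under that assumption the Supported Speed of 0x0000003a decodes to 2G, 4G, 8G and 16G and the Port Speed of 0x00000020 to 16G, while the Port Identifier 0x00030200 is domain 3, area 2, port 0 and the Frame Size 0x840 is 2112 bytes.

```python
# Assumed bit layout of the FDMI speed attributes (mirrors the Linux
# scsi_transport_fc FC_PORTSPEED_* constants); note that 10G sits below 4G.
SPEED_BITS = {
    0x01: "1G",
    0x02: "2G",
    0x04: "10G",
    0x08: "4G",
    0x10: "8G",
    0x20: "16G",
    0x40: "32G",
}


def decode_hex_attr(text):
    """Parse a hex attribute as printed by fdmishow, e.g. '0x00000840'."""
    return int(text, 16)


def decode_speeds(value):
    """Return the speed names whose bits are set, in ascending bit order."""
    return [name for bit, name in sorted(SPEED_BITS.items()) if value & bit]


def decode_port_id(value):
    """Split a 24-bit FC port identifier into (domain, area, port)."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


print(decode_speeds(decode_hex_attr("0x0000003a")))  # Supported Speed
print(decode_port_id(decode_hex_attr("0x00030200")))  # Port Identifier
```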
Normally it looks like this:
Slot  1/Port  0:
Identifier:  3    SFP
Connector:   7    LC
Transceiver: 540c404000000000 2,4,8_Gbps M5,M6 sw Short_dist
Encoding:    1    8B10B
Baud Rate:   85   (units 100 megabaud)
Length 9u:   0    (units km)
Length 9u:   0    (units 100 meters)
Length 50u:  5    (units 10 meters)
Length 62.5u:2    (units 10 meters)
Length Cu:   0    (units 1 meter)
Vendor Name: BROCADE         
Vendor OUI:  00:05:1e
Vendor PN:   57-1000012-01   
Vendor Rev:  A   
Wavelength:  850  (units nm)
Options:     003a Loss_of_Sig,Tx_Fault,Tx_Disable
BR Max:      0   
BR Min:      0   
Serial No:   UAF11051000039A 
Date Code:   101212  
DD Type:     0x68
Enh Options: 0xfa
Status/Ctrl: 0xb0
Alarm flags[0,1] = 0x0, 0x0
Warn Flags[0,1] = 0x0, 0x0
                                          Alarm                  Warn
                                   low        high       low         high
Temperature: 31      Centigrade    -10         90         -5          85
Current:     6.616   mAmps          1.000      17.000     2.000       14.000 
Voltage:     3273.4  mVolts         2900.0      3700.0    3000.0       3600.0 
RX Power:    -2.8    dBm (530.6uW) 10.0   uW 1258.9 uW   15.8   uW  1000.0 uW
TX Power:    -3.3    dBm (465.9 uW)125.9  uW   631.0  uW  158.5  uW   562.3  uW
And that is for every port, which basically drives you nuts.
So with some bash, awk and sed magic I scripted the output to look like this:
Port  Speed  Long-wave  Short-wave  Vendor   Serial-number    Wave(nm)  Temp(C)  Current(mA)  Voltage(mV)  RX(dBm)  TX(dBm)
1/0   8G     NA         50 m        BROCADE  UAF11051000039A  850       31       6.616        3273.4       -2.8     -3.3
1/1   8G     NA         50 m        BROCADE  UAF110510000387  850       32       7.760        3268.8       -3.6     -3.3
1/2   8G     NA         50 m        BROCADE  UAF1105100003A3  850       30       7.450        3270.7       -3.3     -3.3
From a troubleshooting perspective this is so much easier since you can spot issues right away.
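My actual script is bash/awk/sed and isn’t reproduced here, so treat the following as a minimal Python sketch of the same idea, not the script itself. It assumes the pre-FOS 7.1 sfpshow layout shown above (one "Slot x/Port y:" header per port, "Label: value" lines) and only picks up a subset of the fields; the speed and long-wave columns are left out. Because sfpshow prints the RX/TX readings in dBm but the alarm and warning thresholds in µW, it also includes two conversion helpers. All names are hypothetical.

```python
import math
import re

# Labels from the pre-FOS 7.1 sfpshow output above, mapped to short keys.
# The changed FOS 7.1 layout is not handled.
FIELDS = {
    "Length 50u": "len50",  # units of 10 meters
    "Vendor Name": "vendor",
    "Serial No": "serial",
    "Wavelength": "wave",
    "Temperature": "temp",
    "Current": "current",
    "Voltage": "voltage",
    "RX Power": "rx",
    "TX Power": "tx",
}
COLUMNS = ["vendor", "serial", "wave", "temp", "current", "voltage", "rx", "tx"]
SLOT_PORT = re.compile(r"Slot\s+(\d+)/Port\s+(\d+):")


def dbm_to_microwatts(dbm):
    """Convert optical power in dBm to microwatts (0 dBm = 1 mW = 1000 uW)."""
    return 1000.0 * 10 ** (dbm / 10.0)


def microwatts_to_dbm(uw):
    """Convert optical power in microwatts to dBm."""
    return 10.0 * math.log10(uw / 1000.0)


def parse_sfpshow(text):
    """Yield one dict of first-token field values per 'Slot x/Port y:' section."""
    record = None
    for line in text.splitlines():
        header = SLOT_PORT.match(line)
        if header:
            if record:
                yield record
            record = {"port": f"{header.group(1)}/{header.group(2)}"}
            continue
        if record is None or ":" not in line:
            continue
        label, _, rest = line.partition(":")
        key = FIELDS.get(label.strip())
        if key and rest.split():
            record[key] = rest.split()[0]
    if record:
        yield record


def format_row(rec):
    """Render one parsed port as a single whitespace-separated line."""
    short = f"{int(rec['len50']) * 10} m" if "len50" in rec else "NA"
    return " ".join([rec["port"], short] + [rec.get(c, "NA") for c in COLUMNS])
```

Feeding it the Slot 1/Port 0 block above yields "1/0 50 m BROCADE UAF11051000039A 850 31 6.616 3273.4 -2.8 -3.3", and microwatts_to_dbm(530.6) rounds to the -2.8 dBm that sfpshow prints next to 530.6uW.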
Now with FOS 7.1.x the FOS engineers screwed up the sfpshow output, which in turn broke my script and means a load more work/code/lines to get it back into shape. The same goes for the output showing the number of credits on the virtual channels.
Pre-FOS 7.1 it looks like this:
C:------ blade port 64: E_port ------------------------------
C:0xca682400: bbc_trc                 0004 0000 002a 0000 0000 0000 0001 0001 
With FOS 7.1 it looks like this:
bbc registers
0xd0982800: bbc_trc                 20   0    0    0    0    0    0    0    
(Yes, hair pulling stuff, aaarrrcchhhh)
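A defensive parser can at least survive both layouts. The Python sketch below is hypothetical and only knows the two sample lines above. The assumption that the pre-7.1 values are four-digit hex words comes from values such as 002a; reading the 7.1 values as decimal is a guess that you should verify against your own supportsave data.

```python
import string


def _is_hex_word(token):
    """True for a zero-padded four-digit hex word such as '002a'."""
    return len(token) == 4 and all(c in string.hexdigits for c in token)


def parse_bbc_trc(line):
    """Return the per-VC counters from a bbc_trc line, or None if there are none.

    Heuristic (an assumption): if every value is a four-digit hex word the
    line is treated as the pre-FOS 7.1 layout, otherwise as the 7.1 layout.
    """
    _, found, tail = line.partition("bbc_trc")
    tokens = tail.split()
    if not found or not tokens:
        return None
    if all(_is_hex_word(t) for t in tokens):
        return [int(t, 16) for t in tokens]
    if all(t.isdigit() for t in tokens):
        return [int(t) for t in tokens]
    return None
```

The width-based heuristic is a trade-off: a 7.1 line in which every counter happens to have exactly four digits would be misread as hex, so if you know which FOS version produced the log, passing that in explicitly is the safer design.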
Some more good things. The fabriclog now contains the direction of link resets. Previously we could only see that an LR had occurred, not who initiated it. Now we can, which gives us the option to figure out in which direction credit issues might have been happening. (phew…)
The CLI history is now also saved across reboots and firmware upgrades. It has always been a PITA to figure out who had done what at a certain point in time, and this should help to find out.
Another very useful addition, and a major plus in this release, is that the switchshow and islshow output now include the WWNN of the remote switch, even when the ISL has segmented for whatever reason. This is massively helpful because normally you didn’t have a clue what was connected, so you had to go through quite some hassle checking cabling or digging through the portlogdump with some debug flags enabled. Always a troublesome exercise.
The bonus points for this release go to the new fabretrystats command. It gives us troubleshooters a great overview of statistics on fabric events and commands.
0        0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
69       0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
71       0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
79       0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
131      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
140      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
141      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
148      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
149      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
168      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
169      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
174      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
175      0    0    0      0    0    0    0    0    0    0    0    0        0    0    0    0    0    0        
This release also fixes a gazillion defects, so it’s highly advisable to get to this level sooner rather than later. Check with your vendor for the latest supported release.
So all in all good stuff, but some things should be reverted, NOW!!! And PLEASE, BROCADE: don’t screw up any more output in a way that breaks existing analysis scripts etc…

The Emergency Health Threat (EHT) on Brocade FC fabrics

For a couple of FOS versions now, Brocade has been trying to fix a problem that Fibre-Channel has by definition: credit back-pressure. The word “problem” is overstated, since in 99.99999% of all fabrics you’ll never see this anyway. As you know, Fibre-Channel is a deterministic network architecture which requires devices to behave properly. The chaos theory of Ethernet networks, where flooding and broadcasts are more along the lines of shooting a fly with a cannon, doesn’t apply to Fibre-Channel.

So what is the issue then? By design, the two end points of a link tell each other during link initialization, in the FLOGI (Fabric Login) phase, how many buffers they have available to store an FC frame. This is bi-directional, so one side of the link can have 40 credits whilst the other has only three. This way the sending side knows that it can send X frames before it has to stop.

Below you see a snippet of an FC FLOGI trace plus the subsequent Accept.

The HBA tells the switch it has 3 buffers available, so the switch is allowed to send a maximum of 3 frames before it needs to wait for an R_RDY.

The switch then responds with an Accept in which it tells the HBA that it has 8 buffers available.

When the sending side transmits a frame, it subtracts one from its available credit. The receiving side on the other end of the link forwards the frame to its destination and sends a so-called R_RDY (Receiver Ready) primitive back to the transmitting side, which gives the transmitter one credit back. So far pretty simple, but imagine the following scenario.
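The credit bookkeeping described above fits in a few lines. This is a toy Python model, not how an ASIC implements it, and CreditLink and its method names are hypothetical. bb_credit is the buffer count the receiver advertised at login, so for the FLOGI example above the switch-to-HBA direction would be CreditLink(3).

```python
class CreditLink:
    """Toy model of buffer-to-buffer credit on one direction of an FC link."""

    def __init__(self, bb_credit):
        self.bb_credit = bb_credit  # buffers advertised by the receiver at login
        self.available = bb_credit  # credits the transmitter currently holds

    def can_send(self):
        return self.available > 0

    def send_frame(self):
        """Transmit one frame, consuming one credit."""
        if not self.can_send():
            raise RuntimeError("no credit left: wait for an R_RDY")
        self.available -= 1

    def receive_r_rdy(self):
        """An R_RDY returns one credit (ignored if none are outstanding)."""
        if self.available < self.bb_credit:
            self.available += 1


switch_to_hba = CreditLink(3)  # the HBA advertised 3 buffers in its FLOGI
for _ in range(3):
    switch_to_hba.send_frame()
print(switch_to_hba.can_send())  # False: all 3 credits are in flight
switch_to_hba.receive_r_rdy()
print(switch_to_hba.can_send())  # True: one credit replenished
```

A slow-draining device is simply one that calls receive_r_rdy late: the transmitter sits at zero credit and cannot send anything, no matter how much other traffic is waiting.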

I have three servers on the left and a storage array on the right with a switch in the middle. You see that the link between the storage array and the switch carries all traffic from all three servers. In a normal situation this is no problem as long as all devices behave as they should: send a frame as long as a credit is available and wait for an R_RDY to return to replenish that credit. Now what happens if the blue server starts behaving badly? I.e., for some reason it either becomes very slow in returning these R_RDY primitives, or the R_RDYs get corrupted due to a physical link problem. The second scenario is pretty easy to figure out, since the switch logs these kinds of problems in the porterrshow output, plus this particular link will log a lot of LR entries in the fabriclog. (Have a look.)

To get back to the problem: at some point, when all credits on the link from port #6 to port #3 are used up, the frames coming in on switch ingress port #7 can no longer be forwarded to port #6 and on to port #3. So the buffers on port #7 will fill up pretty rapidly, which also means there is no room anymore for frames arriving from the array that are destined for the green server via port #5 or for the red server via port #4. So even when these two servers have no problem at all, they might get really impacted by the blue server. One way to overcome this is to shut down port #6, and you will see that traffic for the red and green servers starts flowing again. When there is a driver or firmware issue you will have some more trouble finding out what’s causing all this.
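The head-of-line effect can be reproduced with a toy model of ingress port #7. This Python sketch is a simplification under stated assumptions: frames for the healthy egress ports (#4, #5) are forwarded immediately, frames for the stuck port #6 stay parked in the ingress buffer, and no hold-time discards happen yet. The function name and parameters are hypothetical.

```python
def simulate_ingress(arrivals, buffer_slots, stuck_ports):
    """Toy model of a single ingress port (port #7 in the example).

    arrivals: egress port number for each arriving frame, in order.
    stuck_ports: egress ports with no credit left (port #6 here).
    Returns (forwarded destinations, number of frames that found no free
    buffer and therefore had to wait on the sender's side).
    """
    parked = []  # frames stuck in the ingress buffer for a stuck egress port
    forwarded, stalled = [], 0
    for dest in arrivals:
        if len(parked) >= buffer_slots:
            stalled += 1  # no free buffer means no credit for the array
        elif dest in stuck_ports:
            parked.append(dest)
        else:
            forwarded.append(dest)
    return forwarded, stalled


print(simulate_ingress([6, 6, 6, 5, 4, 6, 5, 4], buffer_slots=4, stuck_ports={6}))
```

This returns ([5, 4], 2): the first frames for the green and red servers still get through, but once the fourth frame for port #6 takes the last buffer, the remaining frames for the perfectly healthy servers are stuck as well.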

Now you may argue: “listen dude, these frames do not sit in there indefinitely; at some point they will be discarded, which allows these buffers to be reused, so traffic starts flowing again.” Yes, you are right. There has always been a ballpark figure for a so-called “hold time”, which more or less means that a frame may sit in an ingress buffer for X amount of time before the ASIC may discard it, free up the buffer and consequently return an R_RDY to the transmitting side. The issue, however, is that the transmitting port on the other side of the link might well send another frame with the same troubled destination, which means that frame will also sit in the ingress buffer for that X amount of time. Brocade uses a formula to determine this “hold time”, and on a default-configured fabric it turns out to be 500ms. So in the previous example it may well be that you lose an entire second for sending only two frames. That’s an eternity in Fibre-Channel and storage, so every precaution needs to be taken to prevent this from happening. Your performance across the fabric will sink through the toilet.

Now let’s take this previous example one step further.

A fairly regular small to medium size SAN in a core-edge design looks a bit like this.

You have your servers on the left, the colored boxes are edge switches and the central ones are the cores. Nothing really fancy, but you can already see that from a traffic-flow perspective it becomes pretty weird, since everything can go everywhere based on the source and destination routing paths that are set up during fabric build and FSPF routing calculations. Especially on the ISLs between the two cores you see all traffic from all servers, disk arrays and tape drives. You can imagine that if things start to stack up somewhere, you will end up in the same scenario as I described before, with the difference that all traffic on those core ISLs can be affected. So even one device which sits behind the red switch and has a target on the purple switch may cause a problem for servers sitting behind the yellow switch going to a target behind the light-blue switch. The same applies the other way around, since high-latency devices or software issues might also sit in the target devices.

So how do you overcome such a situation? Well, since we can’t predict failures to the exact minute, we can try to prevent one device from clogging up traffic through the entire fabric. The best and only answer is something I described in earlier blog posts (check the Rotten Apple series): shut down the misbehaving port(s). My take is that if a port doesn’t work properly anyway, and it has a significant adverse effect on the rest of the fabric, it is really a no-brainer to shut it down. Brocade FOS has pretty nifty tools like Fabric Watch to get this sorted. From any logical, technical and reasonable standpoint this seems the best option. Until politics get involved, and from that point on all hell breaks loose. Every unrelated, non-technical and unreasonable argument is thrown back at you to prevent the automation of these great tools to shut down a misbehaving device. It’s like you know you’re being robbed and the police are ready to intervene and catch the criminals, but some politician (or business owner in your case) doesn’t allow that, so the robbers get away with the loot.

So the second-best answer is that we need to drop frames on the edge switches sooner than on the cores. With FOS 6.3.1b Brocade introduced a parameter which lets you adjust this edge hold time (EHT) from around 100ms up to the default 500ms, which in essence makes it a decent candidate for preventing these kinds of credit back-pressure problems. The reasoning is that if the edge switch discards a frame it cannot forward (for lack of credits) sooner than the core switch would discard a frame it cannot forward on its egress ports, the core switch will not be overloaded and thus no timing issues arise on that side. The cores keep moving traffic, since their ingress ports are not subjected to credit starvation: the frames are already dropped at an earlier stage and the outstanding buffers are replenished sooner. In the end, unrelated traffic is not affected. There is however one major “gotcha”. On Condor2 platforms this is a global switch setting (which needs to be set on the default switch), and it is applied PER ASIC AS SOON AS AN F-Port IS FOUND ON THAT ASIC. This means that if you share F-Ports and E-Ports on a single ASIC, the EHT is also applied to those E-Ports, so all returning traffic sitting in that edge switch’s ingress ports is immediately affected. So it is NOT a good idea to mix E-Ports and F-Ports on the same ASIC. It is also not a good idea to set the EHT too low (or to tinker with it at all) on ASICs where target ports are connected, since a single target port may have a significant fan-in ratio and is thus more susceptible to somewhat elongated response times anyway, which might cause timing issues.

Although the EHT in general seems a great idea, it can have some ramifications. Consider what will happen on a very busy fabric where the credit_zero counters are already ramping up rapidly: reducing the time a frame may sit in a buffer can itself become a problem. If frames are already being dropped due to timeouts while they are allowed to sit around for 500ms, they will certainly be dropped at a much higher rate if the hold time is reduced.

Now here is where Brocade is currently shooting itself in the foot.

In subsequent versions, especially the 7.0.1 and 7.1.0 releases, they changed the default value from 500ms to 220ms. ………. (keep thinking………)

Also, there is a difference between Condor3 platforms (16G) and Condor2 platforms (8G) in the virtual channels that live on the ISLs, plus the fact that this is now configurable per logical switch (on Condor3 platforms, that is). Great fun if you have mixed 8G and 16G blades in your DCX 8510.

Done thinking????

In general, what we see in support is that many edge switches are embedded switches in blade chassis, and these fairly often fall under the responsibility of the server administrators. Now, I don’t want to discredit these fine folks, but in general they don’t often look at the FC switches in these chassis. The result is that these switches are very often running old code. Now what happens if the storage admin decides to upgrade the core switches to FOS 7.1.0x??? The EHT will drop right away to 220ms, since that’s the new default in that FOS version. The edge switches still run old code with a default hold time of 500ms, so in the blink of an eye you have a reversed-EHT fabric: the edge switches (including the ones which might have badly behaving devices attached) keep trying to send a frame for up to 500ms, but these frames will most likely be delayed much longer on the core switches due to fan-in ratio and credit back-pressure, and will thus be discarded at a much higher rate there than on the edge. Given that the EHT does not discriminate between frames based on source and destination, these dropped frames are likely to come from many, many initiators and targets across the fabric. This results in the upper-layer protocols (like SCSI) having to re-drive their IOs on top of the normal workload, and thus the parabolic eclipse of misery rises exponentially.

So what is my advice?
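A quick consistency check can catch this before or after an upgrade. The Python sketch below assumes you already have a per-switch inventory of role and configured EHT (for instance collected by hand from configshow output); the SwitchEHT record and the reversed_eht_pairs function are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class SwitchEHT:
    """Hypothetical inventory record for one switch."""
    name: str
    role: str  # "edge" or "core"
    eht_ms: int


def reversed_eht_pairs(switches):
    """Return (edge, core) name pairs where the core drops frames sooner.

    The intent of EHT is the opposite: the edge should give up on a stuck
    frame no later than the core does, so the cores keep moving traffic.
    """
    edges = [s for s in switches if s.role == "edge"]
    cores = [s for s in switches if s.role == "core"]
    return [(e.name, c.name) for e in edges for c in cores if c.eht_ms < e.eht_ms]


fabric = [
    SwitchEHT("blade-sw1", "edge", 500),  # embedded switch on old code
    SwitchEHT("core1", "core", 220),      # core at the new 7.x default
    SwitchEHT("core2", "core", 500),
]
print(reversed_eht_pairs(fabric))  # [('blade-sw1', 'core1')]
```

An empty list means no core switch will give up on a frame sooner than any edge switch, which is the situation the EHT was designed for.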

  1. First, keep all codelevels as up-to-date as possible.
  2. Monitor for badly behaving devices, use Fabric Watch AND SHUT THESE DOWN with the port fencing feature!!!!!!!!!!!!!! Preventing problems is always better than having to react to them.
  3. If you use the EHT, make sure you configure it correctly. There are currently so many flaws in this feature that I wouldn’t recommend changing it from the value we have used for many years, which is 500ms. If your fabric is up for it, keep it consistent across all switches in the fabric.
  4. Use bottleneck monitoring and alerting to identify high-latency devices. Check these devices for up-to-date firmware and drivers, and check whether a physical issue might be the cause of the problem (especially check the enc_out column of the porterrshow output).
  5. Also use bottleneck monitoring to recover lost credits. Use the “bottleneckmon --cfgcredittools -intport -recover onLrOnly” command to have FOS fix credit issues on back-end links.
From a design perspective we often see schemas which resemble the picture I’ve drawn above: a core-edge design with servers on one side and targets on the other. Personally I think this is the worst design you could think of, since it is susceptible to all sorts of nasty things, of which physical issues are the most obvious and easiest to fix. Latency and congestion are a very different ballgame, so keeping initiators and targets as close to each other as possible has always had my preference. Try to localize per port group, then per ASIC, then blade, then switch, and last per hop count. This will give you the best-performing and most resilient fabric.
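The idea behind threshold-based fencing from point 2 can be sketched in a few lines. This is a hypothetical Python illustration of the decision logic only, assuming you have two snapshots of a per-port error counter (for example enc_out) taken one polling interval apart; in practice Fabric Watch (or MAPS) port fencing does the real work inside FOS.

```python
def ports_to_fence(previous, current, threshold):
    """Return ports whose error counter grew by more than `threshold`.

    previous/current: {port: counter} snapshots taken one polling interval
    apart. A port missing from `previous` is treated as starting at zero;
    a counter that went down (e.g. after the counters were cleared) is
    never flagged.
    """
    return sorted(
        port
        for port, value in current.items()
        if value - previous.get(port, 0) > threshold
    )


before = {0: 10, 1: 5, 2: 0}
after = {0: 12, 1: 250, 2: 0, 3: 40}
print(ports_to_fence(before, after, threshold=25))  # [1, 3]
```

Working on the delta per interval rather than the absolute counter is what keeps an old, long-fixed error burst from fencing a port that is healthy today.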

!!!!!!!!!!!   UPDATE April 3rd 2013  !!!!!!!!!!!!

I’ve received some feedback from Brocade based on some questions we had:

In the case of an existing fabric where the EHT is 500ms and you insert a brand-new 8510 in the core, the 8510 is set to 220ms. What is Brocade’s recommendation?

  • Brocade’s recommendation is to change the EHT for the new 8510 in the core to 500ms
  • Once set to 500ms on the 8510, any firmware upgrades to the DCX or 8510 will retain the EHT settings of 500ms

Can you set EHT on condor3 by port? Where can we check this setting in Supportsave logs?

  • You can’t set EHT by port … it is still one setting for the switch, but that setting will only take effect on individual ports based on the ASIC and port type:
  • ASIC has only F-ports. The lower EHT value will be programmed onto all F-ports
  • ASIC has both E-ports and F-ports.  All ports will be programmed with 500ms
  • ASIC has only E-ports.  All ports will be programmed with 500ms


  • ASIC has only F-ports.  The lower EHT value will be programmed onto all F-ports
  • ASIC has both E-ports and F-ports.  The lower EHT value will be programmed onto all F-ports, and 500ms will be programmed onto all E-ports
  • ASIC has only E-ports. All ports will be programmed with 500ms
  • You can see the current value set by using the following CLI command:

configshow | grep "edge"

  • This should also be part of the configShow output in the system group of the supportsave.
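The two rule lists in Brocade’s answer can be captured in one small function. The answer as quoted here does not label which list belongs to which platform, so this Python sketch models them as two behaviours (per_port_type=False for the first list, True for the second) and leaves the platform mapping to you; the function and parameter names are hypothetical.

```python
DEFAULT_EHT_MS = 500


def programmed_eht(port_types, configured_eht_ms, per_port_type):
    """Hold time programmed on each port of one ASIC, per Brocade's answer.

    port_types: e.g. ["F", "F", "E"] for the ports on a single ASIC.
    per_port_type=False: first list (any E-port keeps the whole ASIC at 500ms).
    per_port_type=True: second list (F-ports get the configured value,
    E-ports get 500ms).
    """
    has_e_port = "E" in port_types
    return [
        configured_eht_ms
        if t == "F" and (per_port_type or not has_e_port)
        else DEFAULT_EHT_MS
        for t in port_types
    ]


print(programmed_eht(["F", "F", "E"], 220, per_port_type=False))  # [500, 500, 500]
print(programmed_eht(["F", "F", "E"], 220, per_port_type=True))   # [220, 220, 500]
```

In both behaviours an ASIC with only E-ports stays at 500ms, and an ASIC with only F-ports gets the configured value.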

What is the real default for FOS 7.x: 220 or 500?

  • Brand-new system, newly installed with 7.x firmware: default will be 220ms

What is Brocade’s best practice? 220 on the edge and 500 in the core?

  • Yes, from our testing, we have shown this to be the optimal setting.

Additional info:

  • Existing DCX upgraded from 6.4.2 or 6.4.3 to 7.X: setting will remain at 500ms, with one exception:

Upgrading from 6.4.2 or 6.4.3, the 6.x default of 500ms will be retained, with one exception: if the configure command had never been run before, and the very first time it is run is after upgrading to 7.X, then the EHT will use the 7.x default of 220ms. (I don’t think this should be a concern, because every end user would have run the configure command. I think this would only apply to a new unit shipped with 6.x, never installed/configured, then upgraded to 7.x, then configured. In this case it would appear as a “new” 7.x install with the 7.x default.)
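The default-value rules from this feedback can be summarised as a small decision function. This is a hypothetical Python sketch that only encodes the cases Brocade described (a fresh 7.x install, or an upgrade from 6.4.2/6.4.3); an explicitly configured EHT value is not modelled.

```python
UPGRADE_SOURCES = {"6.4.2", "6.4.3"}


def default_eht_ms(upgraded_from=None, configure_run_before_upgrade=True):
    """Default EHT on FOS 7.x according to the Brocade feedback quoted above.

    upgraded_from: None for a brand-new system installed with 7.x, else the
    6.x version it was upgraded from (only 6.4.2 and 6.4.3 are covered).
    """
    if upgraded_from is None:
        return 220
    if upgraded_from not in UPGRADE_SOURCES:
        raise ValueError("the quoted feedback only covers 6.4.2 and 6.4.3")
    # 500ms is retained, unless configure was never run before the upgrade,
    # in which case the unit behaves like a new 7.x install.
    return 500 if configure_run_before_upgrade else 220


print(default_eht_ms())                # 220
print(default_eht_ms("6.4.3"))         # 500
print(default_eht_ms("6.4.2", False))  # 220
```

Raising an error for other source versions is deliberate: the feedback says nothing about them, so the sketch refuses to guess.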


As mentioned above, there will be some separate documentation around this topic. My preference would be to have the ability to segregate core from edge switches, plus to manually differentiate between E- and F-ports irrespective of FOS version and/or chip type, but I also acknowledge this might not be a realistic option on older equipment.

Hope this helps a bit if you get confused by the different options and settings regarding this Edge-Hold-Time.


PS. The “hold-time” formula I mentioned above is ((RA_TOV - ED_TOV) / (max_hops + 1)) / 2, which with the default values translates to ((10000ms - 2000ms) / (7 + 1)) / 2 = 500ms.
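The same formula as a small Python function, handy for checking what a different hop count would do. The parameter names are my own; the defaults are the values quoted above.

```python
def hold_time_ms(ra_tov_ms=10000, ed_tov_ms=2000, max_hops=7):
    """Hold time from ((RA_TOV - ED_TOV) / (max_hops + 1)) / 2, in ms."""
    return ((ra_tov_ms - ed_tov_ms) / (max_hops + 1)) / 2


print(hold_time_ms())            # 500.0
print(hold_time_ms(max_hops=3))  # 1000.0
```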

and PPS: pardon my drawing skills. 🙂