Category Archives: Troubleshooting

Host based mirroring kills your storage network!!

System administrators are very inventive and lazy. I know, I used to be one of them. 🙂 Anything that can make one’s life easier will be scripted, configured, designed etc. If you are responsible for an overall environment, from apps to servers to networks and storage, you can make very informed decisions on how you want to set up each aspect of your environment. The last time I had this opportunity was back in 1995. Since then I have not come across an environment where a single person or team was responsible for every technology aspect of the infrastructure. As environments grow, these teams grow as well. Business decisions like splits, acquisitions and outsourcing have an enormous impact not only on the business itself but also on the people who are now forced to work with other people and teams who may have different mind-sets, processes and procedures, and even completely different technologies. In many such instances strange things happen and result in very unpredictable behaviour of compute, network and storage systems. Below I’ll give you such an example, where decisions made from a systems-level perspective result in massive problems on a storage network.

Continue reading

Reset the Zoning Configuration and Prevent Mistakes.

There are occasions where you need to remove an entire zoning configuration from a switch. One of them is when you need to add a switch to an existing fabric and it still has a configuration on it. If the two zoning configurations conflict, the switch will simply segment and the ISLs get disabled. Another reason might be a configuration conflict where an administrator made zone changes whilst one switch was not participating in the fabric. Depending on how the switch is set up and the actual fabric state, you need to follow different procedures.

Continue reading

Appalling state of Storage Networks

In the IT world a panic most often means that an operating system’s kernel has run out of resources or has ended up in some unknown state from which it cannot recover, so it stops. Depending on the configuration it might dump its memory contents to a file or a remote system. This allows developers to check this file and look for the reason why it happened.

A fibre-channel switch running Linux as its OS is no different, but the consequences can be far more severe.

Continue reading

Open Source Software (OSS) and security breaches in proprietary firmware

It is no secret that many vendors use open source software in their products and solutions. One of the most ubiquitous is Linux, which is often the base of many of these products and used as core OS because of its flexibility and freely available status, without the need to keep track of licenses (to some extent) and costs.

These OSS tools have different development backgrounds and are subject to the policies of the person (or people/companies) who develop them. As a result, defects or bugs may turn into security issues, especially when network-related applications are involved. Recently the bugs in OpenSSL and Apache have gained much attention, as some of these are fairly significant and can result in access breaches or denial of service.

Continue reading

Performance misconceptions on storage networks

The piece of spinning Fe2O3 (i.e. rust) is by far the slowest piece of equipment in the IO stack. Heck, they didn’t invent SSD and Flash for nothing, right? To overcome the terrible latency involved when a host system requests a block of data, there are numerous layers of software and hardware that try to reduce the impact of physical disk related drag.

One of the most important is using cache, whether that is CPU L2/L3 cache, DRAM cache, some hardware buffering device in the host system or even the huge caches in the storage subsystems. All these can, and will, be used by numerous layers of the IO stack, as each cache hit avoids fetching data from a disk. (As an intro to this post you might read an earlier one I’ve written, which explains what happens, and where, when an IO request reaches a disk.)

Continue reading

Port counters may be flawed.

As a support guy you very often look at port counters. These not only provide insight into the status of a port but may also give statistical information which allows you to plan and design new connectivity layouts and diagrams or give some general advice. If you look at the wrong counters, though, you may be in for a surprise, as some may not tell you the truth.

Continue reading