Why FCoE will die a silent death

I’ve said it before: storage is not simple. There are numerous things you have to take into account when designing and managing a storage network. The interplay between applications, IO stacks and storage networks has to be very stable in order to get something useful out of it, both in stability and in performance. If something goes wrong it’s not just annoying, it might be disastrous for companies and people.


Now, I’ve held numerous positions in the storage business, from storage administrator to SAN architect and from pre-sales to customer support, and I know what administrators and users need to know in order to get things working and keep them that way. The complexity administrators face is increasing every year, as does the workload. A decade ago I used to manage just a little over a terabyte of data, and that was pretty impressive in those days. Today some admins have to manage a petabyte of data (yes, a thousandfold more). Now, going from a 32GB disk drive to a 1TB disk drive might make it look like their lives just got simpler, but nothing is further from the truth. The impact when something goes wrong is immense. The complexity of applications, host- and storage-based virtualisation and so on have all added to the skills required to operate these environments.

So what does this have to do with FCoE? Think of it like this: you have two very complex environments (TCP/IP networking and Fibre Channel storage) which by definition have no clue what the other is about. Now try to merge these two so they can transport packets through the same cable. How do we do that? We rip away the lower layers of the Ethernet and FC stacks, replace them with a new 10GbE CEE interface, create a new wrapper with new frame headers, addressing and protocol definitions on those layers, and away we go.
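
To make that layering a bit more concrete, here is a deliberately simplified Python sketch of what "wrapping an FC frame in Ethernet" boils down to. Field sizes, the SOF/EOF delimiters and padding are omitted, so treat it as an illustration of the encapsulation idea rather than a wire-accurate encoder.

```python
# Toy illustration of the FCoE layering idea: the original FC frame is left
# intact and simply wrapped in an Ethernet frame with its own EtherType (0x8906).
# Field sizes and delimiters are simplified; this is not a wire-accurate encoder.
from dataclasses import dataclass

FCOE_ETHERTYPE = 0x8906  # EtherType registered for FCoE

@dataclass
class FCFrame:            # the original Fibre Channel frame, untouched
    header: bytes         # 24-byte FC header (R_CTL, D_ID, S_ID, ...)
    payload: bytes        # SCSI CDB / data, up to 2112 bytes

@dataclass
class FCoEFrame:          # the Ethernet wrapper that CEE switches forward
    dst_mac: bytes        # fabric-assigned MAC of the FCF or CNA
    src_mac: bytes
    ethertype: int        # 0x8906 tells the switch "FC frame inside"
    fc_frame: FCFrame     # the encapsulated FC frame

def encapsulate(fc: FCFrame, src_mac: bytes, dst_mac: bytes) -> FCoEFrame:
    """Wrap an FC frame for transport over a lossless 10GbE (CEE) link."""
    return FCoEFrame(dst_mac=dst_mac, src_mac=src_mac,
                     ethertype=FCOE_ETHERTYPE, fc_frame=fc)
```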

Now this might look very simple, but believe me, it was the same with Fibre Channel ten years ago. Look at how the protocol evolved, not only in speeds and feeds but also tremendously in functionality: VSANs, Virtual Fabrics and Fibre Channel Routing, to name a few. Next to that, the density of FC fabrics has increased, as has the functionality of storage arrays. I already wrote in a previous article that networking people in general are not interested in application behaviour. They don’t care about IO profiles, response times and some packet loss, since TCP/IP will solve that anyway. They just transport packets through a pipe, and if the pipe isn’t big enough they replace it with a bigger pipe or re-route some of the flow to another pipe. That is what they have done for years and they are extremely good at it. Storage people, on the other hand, need to know exactly what is hitting their arrays and disks. They have a much more vertical approach, because each application behaves differently towards storage. If you mix a large sequential load with a very random one hitting the same array ports and spindles, you know you are in a bad position.

So here is where politics will collide. Who will manage the FCoE network? Will it be the networking people? (Hey, it’s Ethernet, right? So it belongs to us!) Normally I have no problem with that, but they will have to prove that they know how Fibre Channel behaves and what a FICON SBCCS code set looks like, as well as an FCP SCSI CDB. (I see some question marks coming already.)
Now, FCoE doesn’t work on your day-to-day Ethernet or Fibre Channel switch. You have to have specialised equipment, like CEE and FCF switches, to get things going. Most of them are not backwards compatible, so they act more as a bridging device between a CEE and an FC network. This in turn adds significantly to the cost you were trying to save by knocking off a couple of HBAs and network cards.

FCoE looks great, but the added complexity, the complete mind-shift required of networking and storage management, plus the need for extremely well trained personnel, will make this technology sit in a closet for at least five years. There it will mature over time, so that true storage and networking convergence might be possible as a real business value-add. At the time of writing the standard is just a year old and will need some fixing up.

Businesses are looking for ways to save cost, reduce risk and simplify their environments. FCoE currently delivers none of these.

Server virtualisation is the result of software development incompetence

Voila, there it is. The fox is in the hen-house.

Now let me explain before the entire world comes down on me. 🙂
First, I am not saying that software developers are incompetent. In fact I think they are extremely smart people.
Second, the main reason for my statement is that, since Moore’s law is still active, we have more or less got used to somewhat unlimited resources with respect to CPU, memory, bandwidth and so on, and developers most of the time write their own ideas into the code without looking at better or more appropriate alternatives.


So let’s take a look at why this server virtualisation got started in the first place.

The mainframe guys in the good old days already acknowledged the problem that application developers didn’t really give a “rat’s ass” about what else had to be installed on a system. They assumed that their application was the most important one and deserved a dedicated system with the resources to match. Now, this was a mainframe environment, which already had strict rules regarding system utilisation and the like, but the problem remained that conflicts between shared libraries caused application havoc. So instead of flicking the application back to the developers, IBM had to come up with something else, and virtual instances were born. Now, I’m too young to recollect the year this was introduced, but I assume it was somewhere in the 70s.

When Bill Gates came to power in the desktop industry, and later in the server market, you would assume that they had learned something from the mistakes made in the past. Instead they came out with MS-DOS (I’m ignoring the OS/2 bit, which they had some involvement in as well).
Now, I’m fully aware that an Intel 8086 CPU had nowhere near the capabilities of the CPUs in mainframe systems or minis, but the entire architecture was built for single-system, single-application use.
It ignored the fact that one system could do more than one task at the same time, and application developers wrote whatever they saw fit for their particular needs. Even today, with Windows and the various Unixes, you are very often stuck with conflicting dependencies on libraries, compiler versions and so on. Some administrators have called this DLL Hell.

I’ve personally been involved in sorting out this mess with different applications that had to run on a single system, so in that sense I know what I’m talking about.

So since the OS developers were constrained by business requirements (in the sense that they could not enforce hard restrictions on application development), they more or less had no means to overcome this problem.

Then there came some smart guys and dolls from Berkeley who started with a product that lets you install an entire operating system in a software container, with every resource needed by that operating system directed through this container, and voilà: VMware was born.

From my perspective this design has been the stupidest move ever made. I’m not saying the software is not good, but from an architectural point of view it was a totally wrong decision. Why should I waste 30% or more of my resources by installing the same kernel, libraries and functionality two, three, ten, twenty times over?

What they should have done was build an application abstraction layer which makes an inventory of the underlying OS type, functionality, libraries and so on. (You can safely assume that in current server farms each server has the same inventory if deployed from a central repository; even if not, this abstraction layer could detect and fix that.) This way you can create lightweight application containers which share all the common libraries and functionality from the OS that sits below this layer, but if that is not enough, or conflicts with those shared libraries, they use a library or other settings locked inside the application container.
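
As a thought experiment, a first pass of such a layer could look something like the sketch below: take an inventory of the libraries the host already resolves for an application (here via ldd on Linux) and only lock the conflicting versions inside the container. The function names and manifest layout are entirely made up for illustration.

```python
# Hypothetical sketch of an "application abstraction layer": inventory what the
# host OS already provides and decide what can be shared versus what must be
# kept as a private copy inside the application container.
import subprocess

def host_library_inventory(app_binary: str) -> dict[str, str]:
    """Map shared-library names to the paths the host resolves them to (Linux/ldd)."""
    inventory = {}
    out = subprocess.run(["ldd", app_binary], capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "=>" not in line or "not found" in line:
            continue
        name, resolved = line.split("=>", 1)
        inventory[name.strip()] = resolved.split()[0]
    return inventory

def build_container_manifest(app_binary: str, required: dict[str, str]) -> dict:
    """Share what the host already provides; keep conflicting versions private."""
    host = host_library_inventory(app_binary)
    shared  = {n: p for n, p in required.items() if host.get(n) == p}
    private = {n: p for n, p in required.items() if host.get(n) != p}
    return {"app": app_binary, "shared_from_host": shared, "private_copies": private}

# Example (made-up paths): the app was built against these library locations:
# build_container_manifest("/opt/app/bin/server",
#     {"libc.so.6": "/lib/libc.so.6", "libssl.so.0.9.8": "/opt/app/lib/libssl.so.0.9.8"})
```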

Now here comes the fun part. If I need to move applications to another server, I don’t need to move entire operating systems which rely on the underlying storage infrastructure; instead I could move, or even copy, this application container to one or multiple servers. After that has been done you should be able to keep the application container in sync, so that if one copy gets a corrupt file for whatever reason, the abstraction software should be able to correct that. This way you’re also assured that if I need to change anything, I only need to do that in the configuration within that container.
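
Keeping those copies in sync is conceptually simple; a naive version is just per-file checksumming and re-copying whatever drifted or got corrupted, as in this hypothetical sketch (the paths are examples):

```python
# Sketch of keeping replicas of an application container in sync: compare
# per-file checksums against the master and repair anything that differs.
import hashlib
import shutil
from pathlib import Path

def checksum(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def sync_container(master: Path, replica: Path) -> None:
    """Make the replica container identical to the master, file by file."""
    for src in master.rglob("*"):
        if not src.is_file():
            continue
        dst = replica / src.relative_to(master)
        if not dst.exists() or checksum(dst) != checksum(src):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)   # repair the missing or corrupt file

# sync_container(Path("/containers/app1"), Path("/srv/node2/containers/app1"))
```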

This architecture is far more flexible and can save organizations a lot of money.

The problem is: this software doesn’t exist yet. 🙂 (except maybe in development labs which I don’t have visibility of.)

You can’t compare it to cloud computing, since currently that is far too limited in functionality. Clouds are built with a certain subset of functionality, so although on the front-end you see everything through a web browser, that doesn’t mean the back-end in the data centres operates the same way. Don’t make the mistake of thinking that setting up a cloud infrastructure will solve your problems. You need a serious amount of real estate to even think about cloud computing.

The application container architecture mentioned above lets you grow far more easily.

Cheers,
Erwin

P.S. I used VMware as an example since they are pretty well known, but the same goes for all other server virtualisation technologies like Xen, Hyper-V etc.

Open Source Storage

Storage vendors are getting nervous. The time has come when SMB/SME-level storage systems can be built from scratch with just servers, JBODs and some sort of connectivity.

Most notably Sun (or Oracle these days) has been very busy in this area. Most of the IP was already within Sun: the Solaris source code has been made available, and they have an excellent file system (ZFS) which scales enormously and has a very rich feature set. Now extend that with Lustre** and you’re steaming away. Growth is easily accomplished by adding nodes to the cluster, which simultaneously increases both IO processing power and throughput.


But for me the absolute killer app is COMSTAR. With it you can create your own storage array from commodity hardware and make your HBAs Fibre Channel targets. Present your LUNs and connect other systems to them via a Fibre Channel network. Better yet, even iSCSI and FCoE are part of it now. Absolutely fabulous. These days there would be no reason to buy an expensive proprietary array; just use the kit that you have. Oh yes, talking about scalability: is 8 exabytes in one filesystem, spread over a couple of thousand nodes in a cluster, enough? If you don’t have those requirements it works on a single server as well.
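
For a flavour of what that looks like in practice, here is a rough sketch of the basic steps: carve out a ZFS volume and register it with COMSTAR as a SCSI logical unit. The pool and volume names are examples, and on a real OpenSolaris/illumos box you would simply run the zfs/stmfadm/itadm commands from the shell rather than wrapping them in Python; adding the view and flipping the HBA into target mode are left out here.

```python
# Rough sketch of "roll your own array with COMSTAR": back a LUN with a ZFS
# volume and register it as a SCSI logical unit. Names are examples only.
import subprocess

def run(cmd: list[str]) -> None:
    print("#", " ".join(cmd))
    subprocess.run(cmd, check=True)

def export_zvol_as_lun(pool: str, vol: str, size: str = "100G") -> None:
    run(["zfs", "create", "-V", size, f"{pool}/{vol}"])           # create the backing zvol
    run(["stmfadm", "create-lu", f"/dev/zvol/rdsk/{pool}/{vol}"])  # register it with COMSTAR
    # stmfadm list-lu shows the new LU's GUID; a view then makes it visible:
    # run(["stmfadm", "add-view", guid])
    run(["itadm", "create-target"])                                # iSCSI target; FC uses the HBA in target mode

# export_zvol_as_lun("tank", "lun0")
```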

The only thing lacking is mainframe support, but since the majority of systems in data centres are Windows or some sort of Unix farm anyway, this can be an excellent candidate for large-scale open source storage systems. Now that should make some vendors pretty nervous.

Regards,
Erwin

** ZFS is not yet supported in Lustre clusters, but it is on the roadmap for next year.

Something different

Do you have kids crawling around the internet without you having a clue what they’re doing? (I know this has nothing to do with storage, but I couldn’t keep this one to myself.)

I’ve had that same problem and I’ve tried numerous things, but last week I came across a very nifty service called OpenDNS. The good thing is you don’t have to install anything and it works right out of the box. What it basically does is check DNS queries: if a query from your IP address matches a site defined in one of the blocked categories, it returns a blocked page instead. You can even modify this page if you want.


The only thing you have to do is change your DNS servers from your ISP’s to theirs and you’re done. The best way to do this is to modify your router’s configuration (and they have lots of examples of how to do that).
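
If you want to verify that your queries really do go through OpenDNS after changing the router, a quick check is to compare an answer from their public resolvers (208.67.222.222 and 208.67.220.220) with what your system resolver returns. The little sketch below assumes the dnspython package (2.x) is installed; the domain is just an example.

```python
# Compare what OpenDNS returns with what your default resolver returns.
# Requires the dnspython package (pip install dnspython).
import socket
import dns.resolver

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["208.67.222.222", "208.67.220.220"]  # OpenDNS public resolvers

opendns_ips = sorted(r.address for r in resolver.resolve("example.com", "A"))
system_ip = socket.gethostbyname("example.com")

print("OpenDNS says:", opendns_ips)
print("System says :", system_ip)
# For a domain in a blocked category, OpenDNS answers with the IP of its block
# page, so the two answers will differ if filtering is active.
```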

Now, smart kids obviously know that if they change the DNS servers back to your ISP’s they can circumvent this. The way to prevent that is to restrict the rights on the PC so they can’t.

Have a look at http://www.opendns.com and start having your kids be safe on the net.

Now be aware that this doesn’t catch everything, so having multiple security measures like antivirus and a firewall in place is always advisable.

Regards,
Erwin

Save money managing storage effectively

How many tools do you use to manage your storage environment?

On average the storage admin uses 5 tools to manage a storage infrastructure.

1. Host tools (for getting host info like used capacity, volume configs etc.)
2. HBA tools (some OSes don’t expose that information themselves)
3. Fabric tools (extremely important)
4. Array tools (even more important)
5. Generic tools (for getting some sort of consolidated overview. Mainly Excel worksheets :-))


Sometimes storage management is performed like below:

As you can see, things can become quite complicated when storage infrastructures grow, and you’ll need a bigger whiteboard. At the point where you have an enterprise storage infrastructure you’ll probably need a bigger building and a lot more whiteboards. 🙂

So what is the best way?

One word:

Integration, Integration, Integration, Integration.

The database boys have known this for a long time: don’t store the same information twice. It’s called database normalization.
The same thing applies to storage management tools. Make sure that you use tools with an integrated framework that leverages as many components as possible.
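
To make the normalization point concrete, here is a toy sketch of what “store it once” means for storage configuration data: one authoritative record per volume, with host and fabric views holding only references to it. The keys, field names and values are invented for illustration.

```python
# Toy illustration of "don't store the same information twice": one record per
# volume, and the host and fabric views only reference it by key.
storage_volumes = {
    "wwn-db01-lun0": {"array": "USP-V_01", "size_gb": 500, "raid": "RAID-5"},
}

host_view   = {"dbserver01": ["wwn-db01-lun0"]}   # host -> volume keys
fabric_view = {"zone_db01":  ["wwn-db01-lun0"]}   # zone -> volume keys

def report(host: str) -> None:
    for key in host_view[host]:
        vol = storage_volumes[key]                 # the single authoritative record
        print(f"{host}: {key} on {vol['array']} ({vol['size_gb']} GB, {vol['raid']})")

report("dbserver01")
```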

If you’re using Hitachi kit this is pretty easy. Their entire Hitachi Storage Command Suite works together and shares a single configuration repository. The best thing is they do that across their entire array product line, from SMS to USP-V, and even for arrays from two generations ago (so that includes the 9900 and 9500 series), so other modules can make use of this information. The other benefit is that you only have to deploy a single host agent to obtain host information like volumes, filesystems and capacity usage, and have that shared across all the different products. Be aware, though, that there is no silver bullet for managing all storage from a single pane of glass if you have a heterogeneous environment. Every vendor has its own way of doing things, and although the SNIA is making good progress with SMI-S, it still lacks many of the nifty features storage vendors have released lately.

RTFM

Yeah, it’s been a while. A lot has happened in two years. One thing that really jumps out is that I moved Down Under. Yep, I’m now an inhabitant of kangaroo land and I’ve loved every day of it.

To storage:
You don’t want to know how many questions I get whose answers have been perfectly described in all sorts of manuals. It almost gets to the point where my job becomes manual reader and walking storage encyclopedia. 🙂 Now that’s something to put on my CV.


The big problem, however, is that with so many different (storage) products and related documentation, I can understand the problem storage admins have these days. Storage infrastructures become more and more complex, and an ever increasing level of knowledge is required to maintain all of this. Take into account all the updates these guys get from their vendors, almost on a monthly basis, and you can imagine what their workday looks like. My life is pretty easy: I only have to keep track of around 80 software products and approximately 15 storage hardware platforms, because I work for one of those vendors. Multiply that by an average of around 17 manuals per product, each between 10 and over 5000 (yes, five thousand) pages, and …… you do the maths. Take into account that I also need to know what happens at the OS level from an IO stack perspective, including all the different virtualisation kit that is out there, mainframe z/OS included, and that pretty much sums up my daily life. 😉

No, I’m not pitying myself. I have a fantastic wife, wonderful kids and a good job, so I’m quite happy with what’s going on in my life.

Going back to the storage admins: the big difference between them and me is that I have access to all the information I need, plus some competitive information about my counterparts at other vendors. Storage admins rely totally on what the vendors want them to have, and that is very often extremely restricted. I can understand that a lot of this is market sensitive and belongs behind lock and key as company confidential, but I also think that we should give them the right information and documentation (in whatever form you like) in a structured and easy to understand format, without the nitty-gritty that is totally irrelevant. This will ease the burden a lot of you guys out there suffer, and believe me, I’ve been there.

A second way of sharing experiences and knowledge is user communities. The perfect example for me has always been Encompass, or DECUS, the best user community ever, affiliated with Digital Equipment Corporation (HP still picks the fruit from that). I think it’s extremely important that vendors provide a platform where their users can share experiences (good or bad) and leverage the knowledge of their peers.

One of my primary tasks, besides being a technical conscience for my sales reps, is to provide my customers (you, the storage admins) with all the information they need and to help them manage the kit I sold them, so they can be heroes within their company.

TTY later.

Greetz,
Erwin

Address space vs. dynamic allocation

This article is something of a successor to my first blog post, “The future of storage”. I discussed that article with Vincent Franceschini personally a while ago, and although we have different opinions on some topics, in general we agree that we have to get more insight into the business value of data. This is the only way we can shift the engineering world to a more business-focused mindset. Unfortunately, today the engineering departments of all the major storage vendors still rely on old protocols like SCSI, NFS and CIFS, which all have some sort of limitation, generally address space.

To put this in perspective, it’s like building a road with a certain length and width, which has a capacity of a certain number of cars per hour. It cannot adapt dynamically to a higher load, i.e. more cars. You have to build new roads, or add lanes to existing ones if that is possible at all, to cater for more cars. With the growth of data and the changes companies are facing today, it’s time to come up with something new. Basically this means we have to step away from technologies which have limitations built into their architecture. Although this might look like boiling the ocean, I think we cannot afford the luxury of trying to improve current standards while the “data boom” is running like an avalanche.
Furthermore, it is becoming too hard for IT departments to keep up with the knowledge needed in every segment.

The question is: how do we accomplish this? In my opinion the academic world, together with the IT industry, has huge potential for developing the next generation of IT. In current IT environments we run into barriers of all sorts: performance, capacity, energy supply, etc.

So here’s an idea. Basically every word known to mankind has been written millions of times, so why do we need to write it over and over again? What can be done instead is to reference these words to compose an article. This leads both to a reduction in the storage capacity needed and to a referenceable index which can be searched. The index information can be in the SNIA XAM format, which also enables storage systems to leverage it and dynamically allocate the required capacity, or attach business values to these indexes. This way the only things that need to be watched are the integrity of the indexes and the word catalog. Another benefit is that when a certain word changes its spelling, the only thing that needs to be changed is that same word in the catalog. Since all articles just hold references to this word, the spelling is adjusted accordingly. (I’ll bet I will get some comments about that. :-))
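
A minimal sketch of that idea in Python: a catalog of unique words, and an article stored as a list of references into it. Correcting the catalog entry automatically “fixes” every article that references it.

```python
# Every word is stored once in a catalog; articles are just lists of indexes.
catalog: list[str] = []
index_of: dict[str, int] = {}

def store(text: str) -> list[int]:
    refs = []
    for word in text.split():
        if word not in index_of:
            index_of[word] = len(catalog)
            catalog.append(word)
        refs.append(index_of[word])
    return refs

def retrieve(refs: list[int]) -> str:
    return " ".join(catalog[i] for i in refs)

article = store("storage is not simple storage is complex")
catalog[index_of["complex"]] = "complicated"   # one change, every reference follows
print(retrieve(article))                       # "storage is not simple storage is complicated"
```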

As you can see, this kind of information storage and retrieval totally eliminates the need for de-duplication, since everything is written only once anyway, which in turn has major benefits for storage infrastructures, data integrity, authority and so on. Since the indexes themselves don’t have to grow, thanks to automatic elimination based on business value, the concept of dynamic allocation has been achieved. OK, there are some caveats around different formats, languages and overlapping context, but these can be taken care of by linguists.

The Smarter Storage Admin (Work Smarter not Longer)

Let’s start off with a question: who is the best storage admin?
1. The one that starts at 07:00 and leaves at 18:00
2. The one that starts at 09:00 and leaves at 16:00

Two simple answers, but they can make a world of difference to employers. Whenever an employer answers with no. 1, they often remark that this admin does a lot more work and is more loyal to the company. They might be right, but daily time spent at work is not a good measure of productivity, so the amount of work done might well be less than no. 2’s. This means that an employer has to measure other things and define clear milestones that have to be met.


Whenever I visit customers I often hear the complaint that they spend too much time on day-to-day administration: digging through log files, checking status messages, restoring files or emails and so on. These activities can occupy more than 60% of an administrator’s day, and much of that can be avoided.
To be more efficient you have to change your mindset from knowing everything to knowing what doesn’t work. It’s a very simple principle, but to get there you have to do a lot of planning.
An example: when a server reboots, do I want to know that its switch port went offline? Maybe I do, maybe I don’t. It all depends on the impact of that server. Was the reboot planned or not? Or maybe the server belongs to a test environment, in which case I don’t want to get a phone call in the middle of the night at all.
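
That kind of decision is easy to capture in a small policy; for example, something like the sketch below, where the host classes, rules and office hours are made up purely for illustration:

```python
# Simple call-out policy: whether an event is worth paging someone depends on
# what the affected server is for and whether the change was planned.
from datetime import datetime

POLICY = {
    "prod": {"call_out": True,  "office_hours_only": False},
    "test": {"call_out": False, "office_hours_only": True},
}

def should_page(host_class: str, planned: bool, now: datetime) -> bool:
    rule = POLICY.get(host_class, POLICY["prod"])   # treat unknown hosts as production
    if planned or not rule["call_out"]:
        return False                                # planned work and test kit never page
    if rule["office_hours_only"] and not (9 <= now.hour < 17):
        return False
    return True

print(should_page("test", planned=False, now=datetime(2010, 6, 1, 3, 0)))   # False: test box at 3 AM
print(should_page("prod", planned=False, now=datetime(2010, 6, 1, 3, 0)))   # True: production outage
```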

The software and hardware in a storage environment consist of many different components, and they all have to work together. The primary goal of such an environment is to move bytes back and forth to disk, tape or another medium, and they do that pretty well nowadays. The problem, however, is the management of all these different components, which requires different management tools, learning tracks and operating procedures. Even if we shift our mindset to “what doesn’t work”, we still have to spend a lot of time and effort on things we often don’t want to know about.

Currently there are no tools available that support the whole range of hardware and software, so for specific tasks we still need the tools the vendors provide. For day-to-day administration, however, there are some good tools which can be very beneficial for administrators. These tools can save more than 40% of an administrator’s time, so they can do more work in less time. It takes a simple calculation to determine the ROI, and another pro is that the chance of making mistakes is drastically reduced.
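
That calculation really is simple; a back-of-the-envelope version with made-up numbers looks like this:

```python
# Back-of-the-envelope ROI for a management tool: if it gives an admin back 40%
# of their day-to-day administration time, how long until it pays for itself?
# All figures below are example numbers, not quotes.
hours_per_year   = 1800          # one full-time admin
hourly_cost      = 75.0          # loaded cost per hour
time_saved_ratio = 0.40          # claimed saving on day-to-day administration
tool_cost        = 25_000.0      # licence plus implementation

yearly_saving = hours_per_year * time_saved_ratio * hourly_cost
payback_months = tool_cost / (yearly_saving / 12)
print(f"Saving per year: ${yearly_saving:,.0f}, payback in {payback_months:.1f} months")
```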

Another thing to consider is whether these tools fit into the business processes, if those are defined within the company. Does the company have ITIL, PRINCE2 or any other IT service management method in place? If so, the storage management tool has to align with these processes, since we don’t want to do things twice.

Last but not least is support for open standards. The SNIA (Storage Networking Industry Association) is a non-profit organization founded by a number of storage vendors in the late 90s. The SNIA works with its members around the globe to make storage networking technologies understandable, simpler to implement, easier to manage, and recognized as a valued asset to business. One of its standards, recently ratified by ANSI, is SMI-S. This standard defines a very large subset of storage components which can be managed through a single common methodology. This means that you get one common view of all your storage assets, with the ability to manage them through a single interface, independent of the vendor. If your storage management tool is based on this standard you do not have vendor lock-in, and day-to-day operations will be more efficient.
This does imply that the vendor has to support the SMI-S standard, so make sure you make the right choice if you are looking for a storage solution: ask the vendor whether they support SMI-S and to what extent.
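
To give an idea of what that common methodology buys you, here is a hedged sketch of querying an SMI-S (CIM/WBEM) provider from a script using the pywbem library. The URL, credentials and namespace are placeholders; the actual namespace differs per vendor’s provider.

```python
# Enumerate standard CIM classes through a vendor's SMI-S provider instead of
# using a vendor-specific tool. Connection details below are placeholders.
import pywbem

conn = pywbem.WBEMConnection("https://smis-provider.example.com:5989",
                             ("user", "password"),
                             default_namespace="root/cimv2")

# CIM_StorageVolume is part of the standard model, so the same call works
# against any array whose vendor ships an SMI-S provider.
for vol in conn.EnumerateInstances("CIM_StorageVolume"):
    print(vol["ElementName"], vol["NumberOfBlocks"])
```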

Greetz,
Erwin