Tag Archives: data growth

Storage in 2013 and beyond.

It comes as no surprise that a couple of technologies really struck in 2012. Flash drives, and specifically flash arrays, have gone mainstream. Two more that are still clinging on are converged networking and, of course, Big Data.

Big Data has become such a hype-word that many people have different opinions and descriptions for it. What it basically boils down to is that too many people have too much stuff hanging around which they never clean up or remove. This undeniably puts a huge burden on many IT departments, who only have one answer: add more disks…

So where do we go from here? There is no denying that exabyte-scale storage environments are becoming more common in many companies and government agencies. The question is what is being done with all these “dead” bytes. Will they ever be used again? What is being done to safeguard this information?

Some studies show that the cost of managing this old data outgrows the benefit one could obtain from it. The problem is that there are many really useful and beneficial pieces of data in this enormous pile of bits, but none of them are classified and tagged as such. This makes the “delete all” option a no-go, but the cost of actually determining what needs to be kept can rival the cost of keeping it all. We can be fairly certain that neither of the two options can hack it in the long run. Something has to be done to actually harvest the useful information and finally get rid of the old stuff.

The process of classification needs to run via heuristic mathematical determination. A mouthful, but what it actually means is that every piece of information needs to be tagged with a value. Let’s call this value X. X is generated from business requirements related to the type of business we’re actually in. Whilst indexing the entire information base, certain words, values and other pieces of information appear more often than others. These indicators can cause a certain information type to obtain a higher value than others and therefore rank higher (i.e. the X value increases). Of course you can have a multitude of information streams where one is by definition larger and causes its data to appear more frequently, in which case it ranks higher even though the actual business value is not that great, whilst you might have a very small project going on that could generate a fair chunk of your annual revenue. To identify such cases, data needs to be tagged with a second value called Y, which reflects the business value of the stream it belongs to. And last but not least we have age. Since all data loses its accuracy, and therefore its value, over time, the data needs to be tagged with a third value called Z.
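As a rough illustration of the tagging idea above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the keyword weights, the `project_weight` field standing in for Y and the exponential half-life used for Z. It is not an existing product or a proven algorithm, just one way the X, Y and Z values could be computed.

```python
from dataclasses import dataclass

# Hypothetical business keywords and their weights (assumption for this sketch).
KEYWORD_WEIGHTS = {"contract": 3.0, "invoice": 2.0, "patent": 5.0}


@dataclass
class Item:
    text: str              # indexed content of the file or object
    project_weight: float  # Y: business value of the stream/project it belongs to
    age_days: float        # age of the data in days


def x_value(item: Item) -> float:
    """X: content score, summing the weights of business keywords that appear."""
    words = item.text.lower().split()
    return sum(KEYWORD_WEIGHTS.get(word, 0.0) for word in words)


def z_value(item: Item, half_life_days: float = 365.0) -> float:
    """Z: age factor that halves every `half_life_days` (an assumed decay model)."""
    return 0.5 ** (item.age_days / half_life_days)


def score(item: Item) -> tuple:
    """Return the (X, Y, Z) tag for one item."""
    return (x_value(item), item.project_weight, z_value(item))
```

Calling `score(Item("signed contract and invoice", 2.0, 365.0))` would tag that item with an X built from the two matching keywords, a Y of 2.0 and a Z that has decayed by one half-life. In practice the weights would have to come from the business itself, which is exactly the hard part.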

Based upon these three values we can create three-dimensional value maps which can be projected onto different parts of the organization. These outline and quantify where the most valuable data resides and where the most savings can be obtained. This allows for a far more effective process of data elimination and therefore huge cost savings. Various mathematical algorithms already exist, but as far as I know they have not been applied in this way, so such technologies do not exist yet. Maybe something for someone to pick up. Good luck.
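To make the value map idea slightly more concrete, the sketch below aggregates already tagged items per organizational unit. The department names, the sample figures, the threshold and the rule that combines X, Y and Z into one number (a plain product) are all assumptions made purely for illustration.

```python
from collections import defaultdict

# (department, size_gb, X, Y, Z) for each tagged item; the sample data is made up.
tagged = [
    ("finance", 500, 5.0, 2.0, 0.9),
    ("finance", 4000, 0.0, 1.0, 0.1),
    ("research", 200, 8.0, 3.0, 0.5),
    ("research", 9000, 1.0, 1.0, 0.05),
]


def value_map(items, keep_threshold=1.0):
    """Summarize, per department, how many GB score above or below the threshold."""
    summary = defaultdict(lambda: {"keep_gb": 0, "candidate_gb": 0})
    for dept, size_gb, x, y, z in items:
        combined = x * y * z  # assumed combination rule, not a proven formula
        bucket = "keep_gb" if combined >= keep_threshold else "candidate_gb"
        summary[dept][bucket] += size_gb
    return dict(summary)


for dept, totals in value_map(tagged).items():
    print(dept, totals)
```

The `candidate_gb` totals show where a clean-up would free the most capacity, while `keep_gb` shows where the valuable data sits.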

As for the logical parts of the Big Data question, in 2013 we will see a bigger shift towards object-based storage. If you go back to one of my first articles you will see that I predicted this shift 6 years ago. Data objects need to get smarter by nature in order to increase value and manageability. By doing this we can think of all sorts of smarts to utilize the information to the fullest extent.

As for the other, more tangible technologies my take on them is as follows.


Flash technology will continue to evolve and price erosion will, at some point, cause it to compete with normal disks, but that is still a year or two away. R&D costs will remain a major burden on the price point of these drives and arrays, and that will level out as the uptake of flash continues. Reliability has mostly been tackled by advances in redundancy and cell technology, so that argument can be mostly negated. My take on dedicated flash arrays is that these are too limited in their functions and therefore overpriced. The only benefit they provide is performance, but that is easily countered by the existing array vendors adding dedicated flash controllers and optimized internal data-paths to their equipment. The benefit is that these can utilize the same proven functions that have been available for years. One of the most useful and cost-effective is of course auto-tiering, which allows for optimum usage and gives the most bang for your buck.

Converged networking

Well, what can I say? If designed and implemented correctly it just works, but many companies are just not ready from a knowledge standpoint to adopt it. There are just too many differences in processes, knowledge and many other points that cause conflict between the storage and networking folks. The arguments I voiced in my previous post have still not been countered by anyone and as such my standpoint has not changed. If reliability and uptime are among your priorities then don’t start with converged networking. Of course there are some exceptions. If for instance you want to buy a Cisco UCP then this system runs converged networking internally from front to back, but there is not really much that is configurable so the “Oops” factor is significantly minimized.

Processor and overall system requirements

More and more focus will be placed upon power requirements, and companies will be pushing vendors to the extreme to reduce the number of watts their systems suck from the wall socket. Software developers are strongly encouraged (and that’s an understatement) to sift through their code and check if optimizations can be achieved in this area.


Take a quick look at the tech news sites in 2012 and you’ve probably noticed an increase in court cases where people are held responsible for breaches in the confidentiality and availability of information infrastructures. This will become a real battle with outsourced cloud services in the very near future. Cloud providers like AWS, Rackspace and Microsoft negate all responsibility with respect to service and data availability and uptime in their terms of use and contracts, but just how far can they stretch this? There will be some point in time where courts will hold these providers accountable and you will see a major shift in the requirements these providers place on their infrastructures. All this will of course have significant ramifications on pricing, and cloud expectations will have to be adjusted.

Hope you all have a good 2013 and we’ll see if some of these will gain some uptake.


The end of spinning disks (part 2)

Maybe you found the previous article a bit hypothetical, not substantiated by facts but merely by some guesstimations?

To put some beef into the equation I’ll try to substantiate it with some simple calculations. Read on.

As shown in Cornell Uni’s report, the expected amount of data generated will reach 1700 exabytes in 2011, with an additional 2500 in 2012. 1700 exabytes equates to 1 billion, 700 milliard gigabytes in EU (long scale) notation, or 1.7 trillion gigabytes in US notation (say what…, look here)

So number-wise it looks like this: 1.700.000.000.000 GB

The average capacity of a disk drive in 2011 is around 1400 GB (the average of a high-RPM enterprise drive of 600 GB and the largest commercially available enterprise HDD of 2 TB). In consumer land WD has a 6 TB drive, but these will not become mainstream until the end of 2011 or the beginning of 2012. Maybe storage vendors will use the 3 and 4 TB versions, but I do not have visibility of that currently.

1700 EB / 1400 GB = 1.214.285.714 disk drives are needed to store this amount of information. (Ohh, in 2012 we need 1.785.714.286 units :-))

This leads us to have a look at production capabilities and HDD vendors. Currently there are two major vendors in the HDD market: Seagate (which shipped 50 million HDDs in FQ3 2011) and WD, shipping 49 million. (WD is acquiring HGST and Seagate is acquiring the HDD division of Samsung.) Those 4 companies combined have a production capacity of around 150 million disk drives per quarter, or 600 million per year. This means on an annual basis a shortage of: 1.214.285.714 - 600.000.000 = 614.285.714 HDDs
So who says the HDD business isn’t a healthy one? 🙂
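The back-of-the-envelope numbers above can be reproduced with a few lines of Python. This sketch assumes decimal units (1 EB = 10^9 GB), rounds to whole drives and takes the 150 million drives per quarter as the total industry output. The variable and function names are mine.

```python
GB_PER_EB = 10**9                 # decimal units: 1 EB = 1,000,000,000 GB
AVG_DRIVE_GB = 1400               # average drive capacity used in the text
DRIVES_PER_QUARTER = 150_000_000  # combined production capacity per quarter


def drives_needed(exabytes: int) -> int:
    """Number of average-sized drives needed to hold `exabytes` of data."""
    return round(exabytes * GB_PER_EB / AVG_DRIVE_GB)


annual_production = DRIVES_PER_QUARTER * 4
needed_2011 = drives_needed(1700)
needed_2012 = drives_needed(2500)
shortage_2011 = needed_2011 - annual_production

print(f"Drives needed for 2011: {needed_2011:,}")
print(f"Drives needed for 2012: {needed_2012:,}")
print(f"Annual production:      {annual_production:,}")
print(f"Shortage for 2011:      {shortage_2011:,}")
```

Changing `AVG_DRIVE_GB` shows how sensitive the outcome is to the assumed average capacity: the shortage only disappears once the average drive holds well over twice as much.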

OK, I agree, not everything is stored on HDD, and the offload to secondary media like DVD, Blu-ray, tape etc. will cut a significant piece out of this pie. However, the instantiation of new data will primarily be done on HDDs. Adoption of newer, larger-capacity HDDs is restricted for enterprise use because the amount of data behind a single spindle gets too high relative to the I/O it can deliver, which equates to higher latency and lower performance, and that is not acceptable in these kinds of environments.

This means new techniques will need to be adopted in all areas. From a performance perspective a lot can be gained with SSDs (Solid State Drives), which have extremely good read performance but still lack somewhat in write performance as well as long-term reliability. I’m sure over time this will be resolved. SSDs will however not fill the capacity gap that data growth creates.

As mentioned before my view is that this gap can and will be filled by advanced 3D optical media which provides new levels of capacity, performance, reliability and cost savings.

I’m open to constructive comments.


The end of spinning disks

Did you ever wonder how long this industry will rely on spinning disk? I do, and I think that within 5 to 10, maybe 15, years we will have reached the end of the ability of disks to keep up with demand and data growth ratios. A report from Andrey V Makarenko of Cornell University estimates that around 1700 exabytes (yes, EXA-bytes) will be generated in 2011 alone, with growth to over 2500 exabytes next year.

With new technologies invented and implemented in science, space exploration, health care and last but not least consumer electronics, this growth ratio will increase exponentially. Although disk drive technology has pretty much kept pace with Moore’s law, you can see that advances in the development of this technology are slowing. Rotational speed has been steady for years and the limits of perpendicular recording have almost been reached. This means that within the foreseeable future there will be a tipping point where demand will outgrow capacity. Even if production facilities were expanded to keep up with demand, do we as a society want to have these massive infrastructures, which are very expensive to build and maintain as well as being a huge burden on our environment? So where does this leave us? Do we have to stop generating data, or generate it in a far more efficient way, or should we also combine this with aggressive data life cycle management? I wrote an article earlier in this blog which shows how this could be achieved, and it doesn’t take a scientist to understand it.
To go back to the subject: there is talk that SSD will take over from a significant share of magnetic drives, and maybe it will, but it still lacks reliability in one form or another. I’m sure this will be resolved in the not so distant future, but will this technology be as cost-effective as spinning disks have been in the last decades? I think it will take a significant amount of time to reach that point. So where do we go from here? It is my take that, in addition to the uptake of SSD-based drives, significant advances will be made in 3D optical storage. This will not only allow for a massive increase in capacity per cubic inch but also a reduction in cost and energy as well as a massive increase in performance.
Advancements in laser technology and photonic behavior as well as optical media will clear the path to adoption in data centers the moment this becomes commercially attractive.

There are numerous scientific studies as well as commercial entities working on this type of technology, and market demand adds significant pressure on its development. Check out this Wikipedia article on 3D optical storage to get some more information on the technicalities.

Let me know your opinion.

Erwin van Londen