there are choices to be made more sticky than plaster.
Go read the whole poem. It’s brilliant.
From Server Huggers to Cloud Addicts:
Once upon a time, IT needed to convince application owners to virtualize their servers instead of hanging on to a dedicated physical server. Now, many IT shops have won that battle, with Virtual Machines (VMs) becoming the de facto standard, and old “server hugger” application owners increasingly sold on the benefits of server virtualization.
With the availability of new IT infrastructures and cloud services that are faster than ever before, a new set of expectations around speed, agility, and time to market has been established. Today’s “cloud addict” application owners expect instant provisioning of compute, storage, and network resources, and business managers increasingly cringe at the possibility of infrastructure constraints.
This thing I wrote came out decently. It’s a PDF; the rest of it talks about how the network gets in the way of making this shift work.
Also, this (mp3) classic rant was an inspiration.
A server with 16 SFF HDDs can give you 16 TB of storage today, and 32 TB in the future, probably matched by 32 cores at that time. But that one server’s IO bandwidth, even with those 16 disks, will be a tenth of what you get from ten such servers. It’s the IO bandwidth that was a driver for MapReduce, as the original Google paper points out. Observing in a 2012 paper that IO bandwidth was lagging DRAM isn’t new; that’s a complaint going back to the late 1980s.
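To put rough numbers on that, here’s a back-of-envelope sketch; the ~100 MB/s sustained streaming rate per SFF HDD is an assumption for illustration, not a measured figure.

```python
# Back-of-envelope: aggregate streaming bandwidth of one dense server
# versus the same number of disks spread across ten servers.
# The 100 MB/s sustained rate per SFF HDD is an assumed figure.

MB_S_PER_DISK = 100        # assumed sustained streaming rate per spindle
DISKS_PER_SERVER = 16

one_server = DISKS_PER_SERVER * MB_S_PER_DISK   # 1,600 MB/s
ten_servers = 10 * one_server                   # 16,000 MB/s

print(f"one dense server: {one_server:,} MB/s")
print(f"ten such servers: {ten_servers:,} MB/s")
```

Same disk count per box, but consolidating the data onto one server divides the bandwidth you can bring to bear on it by ten.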
If you want great IO bandwidth from HDDs, you need lots of them in parallel. RAID-5 filesystems with striped storage deliver this at a price; HDFS delivers it at a tangibly lower price. As the cost of SSDs falls, and once they get integrated into the motherboards, you’ll get something with better bandwidth and latency numbers (I’m ignoring wear levelling here, and hoping that at the right price point SSD could be used for cold data as well as warm data). SSD at the price/GB of today’s HDDs would let you store hundreds of TB in servers, transform the power budget of a Hadoop cluster, and make random access much less expensive. That could be a big change.
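As a purely hypothetical sketch of what that price point would mean, here’s the arithmetic for a 1 PB cluster; every per-drive figure below is an assumed placeholder, not a quoted spec or price.

```python
# Hypothetical comparison: a 1 PB cluster built from 1 TB HDDs versus
# 1 TB SSDs priced at HDD-like levels. All per-drive numbers are
# illustrative assumptions.

CLUSTER_TB = 1000  # 1 PB raw

HDD = {"watts": 8, "mb_s": 100}   # assumed per-drive power and streaming rate
SSD = {"watts": 3, "mb_s": 400}

for name, drive in (("HDD", HDD), ("SSD", SSD)):
    drives = CLUSTER_TB  # assuming 1 TB per drive
    power_kw = drives * drive["watts"] / 1000
    agg_gb_s = drives * drive["mb_s"] / 1000
    print(f"{name}: {drives} drives, {power_kw:.0f} kW, {agg_gb_s:.0f} GB/s aggregate")
```

Under these made-up numbers the SSD cluster draws under half the power and delivers several times the aggregate bandwidth, which is the “transform the power budget” point in miniature.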
Even with SSDs, you need lots in parallel to get the bandwidth, more than a single server’s storage capacity. If, in say 5-10 years, you could get away with a “small” cluster of a few tens of machines, ideally with SSD storage and lots of DRAM per server, you’d get an interesting setup. A small enough set of machines that a failure would be significantly less likely, changing the policy needed to handle failures. Less demand for stateless operations and use of persistent storage to propagate results; more value in streaming across stages. Fewer bandwidth problems, especially with multiple 10GbE links to every node. Lots of DRAM and CPUs.
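The failure-rate point is easy to put numbers on. A rough sketch, where the 5% annual per-node failure rate and the independence of failures are both assumptions:

```python
# Rough sketch of the failure-rate argument: probability of at least one
# node failing in a year, assuming independent failures at an assumed
# 5% annual failure rate per node.

P_NODE = 0.05  # assumed annual failure probability per node

for nodes in (30, 300, 3000):
    p_any_failure = 1 - (1 - P_NODE) ** nodes
    print(f"{nodes:4d} nodes: P(at least one failure/year) = {p_any_failure:.3f}")
```

At 30 nodes a failure in a given year is merely likely (~0.79); at hundreds of nodes it is a near certainty, which is why big clusters have to treat failure as routine while a small one could treat it as an event.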
This would be a nice “Hadoop 3.x” cluster.