“Ahoy hoy and welcome to the newly minted Twilio engineering blog! We the Twilio engineering team will be sharing some of the unique challenges we face bridging the 100-year-old world of realtime telecom with the world of HTTP and the web. Using cloud infrastructure to implement a communications platform has required us to build a highly automated, self-healing distributed platform that can be deployed across thousands of servers.”—Twilio Engineering Blog
Cloud services are everywhere, particularly in the home. This week I decided to try going full cloud on entertainment: I bought the latest Sony Blu-ray player, the BDP-S780, which comes with a boatload of online streaming services.
My selection criteria were based on the following services:
It could have been the fault of a telecom vendor, an app vendor, a customer who misconfigured a database or application, or something that went wrong in any of the thousands of bits of IT in any data center – each of which has a tiny bit of evil at its core. That evil lets it pretend to work perfectly until even a slight stutter would cause a big problem, then explode dramatically.
The central skill of data-center gurus is not in computing; it’s in disaster prevention.
Power failures, environmental disasters, hardware failures, software failures, sabotage, telecom problems, security problems, zombie apocalypses, onslaughts of fully authenticated BYOT devices – there are backup systems, redundancies and preventative measures for all of them.
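Nearly all of those preventative measures reduce to the same loop: probe for failure, detect it fast, and shift load before anyone notices. Here is a minimal sketch of that pattern; the hostnames, endpoints, and thresholds are hypothetical, and a real data center would drive this from a load balancer or DNS failover rather than a script.

```python
import time
import urllib.request

# Hypothetical endpoints and thresholds -- real shops wire this into
# a load balancer or DNS failover, not a standalone script.
PRIMARY = "http://primary.example.com/health"
STANDBY = "http://standby.example.com/health"
CHECK_INTERVAL = 5   # seconds between probes
MAX_FAILURES = 3     # consecutive failures before declaring disaster

def healthy(url, timeout=2.0):
    """True if the endpoint answers its health check in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def monitor():
    failures = 0
    while True:
        if healthy(PRIMARY):
            failures = 0
        else:
            failures += 1
            if failures >= MAX_FAILURES:
                # The tiny bit of evil just exploded: shift load to
                # the standby before the stutter becomes an outage.
                print("primary unhealthy, failing over to", STANDBY)
                return
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    monitor()
```

The guru skill is in tuning MAX_FAILURES and CHECK_INTERVAL so the evil is caught before the dramatic explosion, without flapping on every slight stutter.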
There are a bunch of tweets from Dan, but these two were the news and the most interesting. Interesting guy, interesting product and team. I suspect they would have done better in some sort of VC-incubation or Big Co partnership.
“Loura let workers choose between an iPhone or Android or Windows Phone 7-powered smartphone. The company has issued 2,000 smartphones, 92% of which are iPhones. About 6% of the smartphones chosen were Android-based while 2% were Windows Phone 7 devices.”—
Ralph Loura, CIO at Clorox, let employees choose their phone. The results are, um, interesting.
“Some people think that the potential adoption of OpenFlow API will magically materialize open-source software to control the OpenFlow switches, breaking the bonds of proprietary networking solutions. In reality, the companies that invested heavily in networking software (Cisco, Juniper, HP and a few others) might be the big winners … if they figure out fast enough that they should morph into software-focused companies.”—Cisco IOS Hints and Tricks: OpenFlow: BIOS does not a server make
“The notion that there is a sea change in switch buying appears flawed based on market-share data that show 59% of the 200 million managed switch ports sold in 2010 were feature-rich (priced 2x higher than value products), which is up from 51% in 2005. Five years of share gains by Cisco’s and Juniper’s feature-rich products is validation that price alone is not a long-term differentiator.”—
Right - or at least almost right. I would argue that Huawei’s success was not based purely on price, but on price AND their ability to throw lots of people at the problem when things didn’t go exactly to plan. They were also willing to take deals on financial terms unacceptable to other vendors (which is, in effect, price). The degree to which that model is sustainable in creating competitive advantage for the customer is still questionable.
In the enterprise market (the reference in this case is HP), where again services are a big play (can you say channel risk?), price alone has moved the needle for some segments of the market, especially where the network is not as critical to business operations. That said, there is enough change in user behavior (static to mobile) and in the data-center IT stack that you can win on price while also providing a better network experience. But if you simply compete or copy on price, the risk of change is too high and favors the incumbent.
“So after a lot of thought, we’ve decided to bid for Nortel’s patent portfolio in the company’s bankruptcy auction. Today, Nortel selected our bid as the “stalking-horse bid,” which is the starting point against which others will bid prior to the auction. If successful, we hope this portfolio will not only create a disincentive for others to sue Google, but also help us, our partners and the open source community—which is integrally involved in projects like Android and Chrome—continue to innovate. In the absence of meaningful reform, we believe it’s the best long-term solution for Google, our users and our partners.”—Google Official Blog
Google takes another potential step toward becoming an even bigger force in communications, LTE, and more. Avaya bid $475m as the stalking horse and then bought Nortel’s enterprise comms business for $900m after outbidding Siemens. I suspect there may be a few interested parties bidding on the patent portfolio - Cisco, RIM, Nokia, Ericsson, heck, even Apple and Microsoft could end up at the party. Fascinating.
“As noted in End User Prediction #1, above, the number of applications companies will be running is going to explode. Operations practices appropriate for one scale of application numbers will fall over when confronted with ten times as many applications. It’s unclear how this will turn out, but it’s very clear that existing operations practices will be stressed as never before.”—Cloud Computing: 2011 Predictions, CIO.com
“First, our engineers extended many of Twitter’s core systems to replicate Tweets to multiple data centers. Simultaneously, our operations engineers divided into new teams and built new processes and software to allow us to qualify, burn-in, deploy, tear-down and monitor the thousands of servers, routers, and switches that are required to build out and operate Twitter. With hardware at a second data center in place, we moved some of our non-runtime systems there – giving us headroom to stay ahead of tweet growth. This second data center also served as a staging laboratory for our replication and migration strategies. Simultaneously, we prepped a third, larger data center as our final nesting ground.
Next, we set out rewiring the rocket mid-flight by writing Tweets to both our primary data center and the second data center. Once we proved our replication strategy worked, we built out the full Twitter stack and copied all 20TB of Tweets – from @jack’s first to @honeybadger’s latest – to the second data center. Once all the data was in place, we began serving live traffic from the second data center for end-to-end testing and to continue to shed load from our primary data center. Confident that our strategy for replicating Twitter was solid, we moved on to the final leg of the migration: building out and moving all of Twitter from the first and second data centers to the final nesting grounds. This essentially required us to move much of Twitter two times.”—Twitter Engineering Blog
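Twitter hasn’t published the code behind that dual-write step, so the sketch below is only an illustration of the general pattern, with hypothetical primary and replica stores: the write to the live data center is authoritative, while replication to the second data center is queued asynchronously so a slow remote site can never stall the live one.

```python
import queue
import threading
import time

class DualWriter:
    """Illustrative dual-write: the primary write is authoritative,
    the replica write is queued so a slow second data center can
    never stall the live site."""

    def __init__(self, primary, replica):
        self.primary = primary            # live data center store
        self.replica = replica            # second data center store
        self.backlog = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def write_tweet(self, tweet_id, payload):
        self.primary.put(tweet_id, payload)    # must succeed
        self.backlog.put((tweet_id, payload))  # replicated best-effort

    def _drain(self):
        while True:
            tweet_id, payload = self.backlog.get()
            try:
                self.replica.put(tweet_id, payload)
            except Exception:
                time.sleep(1)                          # back off, then
                self.backlog.put((tweet_id, payload))  # retry later

# A toy in-memory store so the sketch runs end to end.
class Store(dict):
    def put(self, key, value):
        self[key] = value

if __name__ == "__main__":
    writer = DualWriter(Store(), Store())
    writer.write_tweet(1, "just setting up my twttr")
    time.sleep(0.1)  # give the replication thread a moment
    print(writer.replica)  # {1: 'just setting up my twttr'}
```

Once both stores agree, you can start serving reads from the second data center for end-to-end testing – exactly the load-shedding step the quote describes.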