This was a very interesting debate among some very heavy hitters who operate data centers about where the bottlenecks in data centers are, and whether the new model of massively distributed computing within one centralized data center is sustainable.
Storewidth is analogous to bandwidth. Storage capacity is not the problem; it is the bandwidth to the storage, between the storage, and onto the storage that is the problem.
I was pleased and surprised by the depth of the technical debate.
It is clear that opportunities exist for companies to optimize hardware components for new distributed computing architectures. My prediction: Google (GOOG) will fund startup Memory and Disk companies to supply what they need.
(Italics are my comments. Plain text is my transcription.) If folks have changes or corrections, by all means add them in the comments.
Panelists:
Stahlman: Google is a digital services company. He put a $2000 price target on Google earlier this year. Feels that centralizing computing on the network is the future.
Luiz Barroso: Responds to the question of whether his company is worth $2000 a share: “I am very passionate and knowledgeable about some things, but the stock market is not one of them.” (Good for him… not an appropriate question.)
Patterson: Things started to shift to this massively distributed model around 2001, when Google illustrated what could be done with racks and racks of cheap PCs. Google builds its own PCs from scratch and optimizes the hardware design for reliability, cost, and low power.
Gilder: This approach has worked for search applications; will it work going forward? (Good question… how do you know new apps won’t break this model?)
…
Gerasoulis: Power is now the barrier cost to data centers, not computing bandwidth.
Patterson: Future is 380V DC power straight into the PC. Get rid of AC power requirements.
Stahlman: Google will eventually open up their platform to other businesses. Salesforce.com allows third parties to write plugins to their application. Google will do the same thing for consumer applications. (Isn’t Yahoo already doing this? I think it already is. What would you call a Google Maps mashup?)
Innovation in drives is not heading in the right direction for data-center use. Density keeps going up but speed does not, so in high-speed applications full utilization cannot be achieved. Seek times are a big problem, so big data centers cope by distributing the load. Failure rates and mechanisms are not well understood; if the failure mechanisms were better understood, fewer backups might be needed (Google backs up in triplicate in some situations). NAND flash could start bridging the gap. (Intel announced some new flash technology that would hit this market head on here.)
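(The density-vs-speed point can be made concrete with a quick back-of-the-envelope sketch. The capacities and transfer rates below are my own rough, illustrative assumptions, not measured specs; the point is only that the time to stream an entire drive keeps growing.)

```python
# Back-of-the-envelope: drive capacity grows faster than transfer
# rate, so the time to read an entire drive keeps growing.
# All figures below are rough, illustrative assumptions.

def full_scan_hours(capacity_gb, throughput_mb_s):
    """Hours to read a whole drive sequentially, end to end."""
    return capacity_gb * 1024 / throughput_mb_s / 3600

drives = {
    "~2000 drive": (40, 30),    # assumed: 40 GB at ~30 MB/s
    "~2006 drive": (500, 70),   # assumed: 500 GB at ~70 MB/s
}

for name, (cap_gb, mb_s) in drives.items():
    print(f"{name}: {full_scan_hours(cap_gb, mb_s):.1f} h full sequential read")
```

Capacity went up more than 10x while throughput barely doubled, so the newer drive takes several times longer to read in full; seek-heavy random workloads fare even worse.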
Audience Question: RAM Disks based on flash would only accelerate by a factor of 2 because the OS is so poorly optimized for accessing fast storage. This problem needs to be solved.
Coates: The problem exists in server interconnect. Attempts have been made (InfiniBand), but the issue is always that not enough volume exists to drive the cost low enough to beat a big distributed architecture built from off-the-shelf components. There just are not that many customers for this type of equipment.
Gilder: Is it possible that these massive data centers being built next to hydroelectric dams will be obsolete in the near future? (Good question… will this capex pay off?)
Unknown Panelist: One disruptive solution would be to put a portion of the Google search index on every device and serve 80% of the requests locally. Send out updates via FedEx (or peer-to-peer networking).
Question for Stahlman: Is Sun (SUNW) going to be acquired by Google? Answer: Anything is possible. Dell (DELL) can no longer continue as the company it is; without R&D they are hopeless. They will merge with an EMC (EMC) or a Sun. (I totally disagree with this. Dell is the ultimate platform company.)
This post is one of a series as I blog the Gilder Telecosm 2006 conference. All posts can be found by searching for ‘Telecosm 2006’.
“Patterson: Future is 380V DC power straight into the PC. Get rid of AC power requirements.”
“Audience Question: RAM Disks based on flash would only accelerate by a factor of 2 because the OS is so poorly optimized for accessing fast storage. This problem needs to be solved.”
“Unknown Panelist: One disruptive solution would be to put a portion of the Google search index on every device and serve 80% of the requests locally.”
Maybe you shouldn’t read too much into these statements?
I’m just transcribing what is said… not agreeing/disagreeing.
Fair enough, I’ll put it more bluntly:
“380V DC man” thinks DC-to-DC conversions are better than AC-to-DC conversions. He’s not an electrical engineer; he’s simply heard what the Google engineer said recently and extrapolated.
“RAM disks based on flash” guy hasn’t even read a Wikipedia article on flash before saying this.
“Google search index guy”, will soon be studying long multiplication at MBA school to try to comprehend what 1TB of data is.
So I disagree to the point of being patronising about it.
On RAM disks… if you could get better write and read times and resolve the block-access problems, maybe this works?
On search indexing – perhaps you don’t send all of the data, maybe just the most-used 5%.
“if you could get better write and read times and resolve the block access problems, maybe this works?”
And the rest of the problems? For example, write endurance is only 1 million cycles at best, and commodity cheap flash dies after as little as 100k rewrite cycles, so it would die quickly if used as a disk. The only good property of flash is that it’s non-volatile. I don’t see why you would do that; even assuming you could fix some or all of the problems, you could get non-volatility simply by saving the data before the power is switched off…
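(Putting rough numbers on the endurance concern: the cycle count, device size, and write rates below are illustrative assumptions, and the answer depends heavily on whether writes are spread across the device by wear-leveling.)

```python
# Two bounding cases for flash endurance under disk-style workloads.
# All figures (cycle count, device size, write rates) are assumptions.

CYCLES = 100_000          # assumed rewrite endurance of commodity flash

# Worst case: one hot block (e.g. filesystem metadata) rewritten
# 10 times per second, with no wear-leveling at all.
hot_block_hours = CYCLES / 10 / 3600
print(f"hot block, no wear-leveling: {hot_block_hours:.1f} h to failure")

# Best case: perfect wear-leveling over an assumed 32 GB device
# absorbing a sustained 20 MB/s of writes.
total_mb = 32 * 1024 * CYCLES   # total MB writable before wear-out
leveled_days = total_mb / 20 / 86400
print(f"perfect wear-leveling: {leveled_days:,.0f} days to failure")
```

So an unmanaged hot block dies in hours, while a perfectly wear-leveled device lasts years; real devices land somewhere in between, which is why controller design matters so much here.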
On the other hand, if someone could invent a non-volatile memory with the same capabilities as DRAM, they’d make a killing. It wouldn’t be anything like flash is today, though.
“perhaps you don’t send all of the data, maybe just the most used 5%.”
I’m kind of estimating 20TB of web data, with 5% = 1TB, so we agree on the 5%. Even then you miss the parallel performance they get from the 200–500 boxes running each query in parallel.
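(A quick sketch of that arithmetic. The 20TB corpus and 5% slice are the estimates above; the 60 MB/s disk rate and the 300-box cluster are my own assumed figures for illustration.)

```python
# Why a local slice of the index loses the cluster's parallelism.
# Corpus size is the estimate from the thread; disk throughput and
# cluster size are illustrative assumptions.

index_tb = 20
local_fraction = 0.05
local_gb = index_tb * 1024 * local_fraction   # slice stored on the device
print(f"local slice: {local_gb:.0f} GB")

# Worst case: a query that must scan the whole slice sequentially
# from one assumed 60 MB/s disk...
one_disk_s = local_gb * 1024 / 60
# ...versus the same data spread over 300 machines scanning in parallel.
cluster_s = one_disk_s / 300
print(f"one disk: {one_disk_s / 3600:.1f} h, 300-way parallel: {cluster_s:.0f} s")
```

Even granting the 1TB slice fits on the device, a single spindle has no answer to a few hundred disks working the same query at once.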