Grid Computing

Amazon EC2

Amazon EC2 heralds the arrival of computing-as-a-utility-service. Unlike previous efforts by Sun, IBM, Nortel, and others, Amazon has hit on the right offering in features, pricing, and provisioning. While there will no doubt be plenty of copy-cats, Amazon has such a head start that the community that builds up will give them a big advantage.

I expect Amazon will become the eBay of utility computing services. That is somewhat surprising considering how Google is seen as the leader in networked computing technology (see also How Google Works), but Google seriously lags in turning that into innovative products.

Other folks think EC2 is hot too:

Carr on Grid
Reviews article in Grid Today which discusses business model issues in utility/elastic computing.


Applications I'd like to see available on-demand @ EC2:

Mark Logic Content Server - Enterprise Edition

Mathematica Personal Grid Edition

Nutch Appliance
Developer API provisioning and metering service.
Article on API metering

Freeswitch VOIP appliance is EC2 compatible.

Apple Xgrid

One of my bright ideas is a Dashboard widget to dial up EC2 instances that are then accessible via Xgrid.

Integration with OS X Server Xgrid controllers is a natural for EC2 also.

Xgrid for non-Mac

Xgrid works for any platform.

Java Toolkits

Apache Hadoop

Spawned from Lucene Nutch (wiki).

Globus Toolkit

Formerly proprietary clustering solution, now OSS with many features.


Weka-Parallel - parallel processing for Weka.

Grid Weka - grid computing with Weka.

weka4WS - distributed data mining.

Essence is a clustered/shared collections implementation for Java.

A nice idea, it would probably be more interesting implemented using Globus Toolkit, or even JINI/JavaSpaces/Blitz rather than JCache.


A comprehensive bioinformatics workflow management tool.
A turn-key grid service running Taverna.
Sharing myGrid workflows.

"Firefish is a Peer-to-Peer Grid service infrastructure for access to dynamic data featuring ease-of-use, interoperability, scalability and performance."

Is a grid toolkit with load-sensitive provisioning support.


Management GUI

Distributed Shells

Our Scripting Future

I don't agree with the anti-Java rant (nor, obviously, do the Lucene/Nutch/Hadoop folks). Of course Java's start-up time is definitely an issue for scripts running on JIT-compiled Java engines, but the solution isn't to avoid Java but to use precompilers for the engines. By the CTO of

Data Grid

Data mining operations (as opposed to compute-centered stuff like rendering and proofs) also have big data needs.

The Alexa Web Search platform (which is also an Amazon service, as is The Internet Archive) has a nice crawl of the web and also offers grid computing services for it. But their pricing is 10x that of EC2.

EC2 Web/Data Co-op

In the event that Amazon doesn't make Alexa data available at EC2 prices, then a great solution is to implement a shared web caching proxy using S3 for EC2. It would work as a web search/data mining co-op that reduced everyone's Internet transfer costs by sharing the S3 costs for the caching proxy. The co-op would also share the storage costs for big data sets like Google's trillion trigrams, USPTO, US Census Bureau, GIS, etc.
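
The co-op idea boils down to a read-through cache that every member shares: look in the common store first, and only go out to the Internet on a miss. Here is a rough sketch of that logic, not a real implementation. The `SharedCache` class, the `fake_fetch` function, and the plain dict standing in for an S3 bucket are all made up for illustration; a real version would swap in actual S3 and HTTP calls.

```python
import hashlib
from typing import Callable, Dict


class SharedCache:
    """Read-through cache: check the shared store first, fetch only on a miss."""

    def __init__(self, store: Dict[str, bytes], fetch: Callable[[str], bytes]):
        self.store = store    # hypothetical stand-in for an S3 bucket shared by the co-op
        self.fetch = fetch    # hypothetical stand-in for an HTTP fetch over the Internet
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(url: str) -> str:
        # Hash the URL so any URL maps to a safe, fixed-length object key.
        return hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> bytes:
        key = self.key_for(url)
        if key in self.store:
            # Another member (or this one) already paid for the transfer.
            self.hits += 1
            return self.store[key]
        # Miss: fetch once, then store the result for every other member.
        self.misses += 1
        body = self.fetch(url)
        self.store[key] = body
        return body


def fake_fetch(url: str) -> bytes:
    # Placeholder for a real download; returns deterministic dummy content.
    return ("content of " + url).encode("utf-8")


if __name__ == "__main__":
    bucket: Dict[str, bytes] = {}
    member_a = SharedCache(bucket, fake_fetch)
    member_b = SharedCache(bucket, fake_fetch)
    member_a.get("http://example.com/")
    member_b.get("http://example.com/")
    print(member_a.misses, member_b.hits)
```

The second member's request is served from the shared store, so only one Internet transfer happens for the two requests. That is where the co-op's savings would come from.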


Danny Hillis knows a thing or two about grid computing.

Elastic Compute Services

Elastic Live
Uses Enomalism for provisioning.
Distributed Potential
Uses Elastic Live and is priced at $0.50 per CPU/hour.

JBoss on EC2


"Akamai Gains Traction with Web Application Acceleration Service", eWeek. Minimum commitment is $10,000 per month.

Akamai Edge service with IBM WebSphere

IBM Deep Computing on Demand

Google Big Data transfer

Google is working with scientific users to schlep big data sets around, and they're interested in making that data available publicly.


Beowulf on EC2
Red Hat Cluster Suite
Presentations from 2007 Xen Summit


Using ZoneEdit with EC2

Cool Apps (SunGrid)

Is an OSS Java document indexer using Spring.

Grid Watch articles