January 13, 2012

Elastic Applications in the Cloud

Parallel computing and supercomputing gained popularity over the last two decades with the advent of faster processors, memory, and networking hardware. The need to run large-scale, high-performance applications was typically satisfied by procuring or gaining access to such high-end hardware.

However, the latter part of the last decade gave birth to cloud computing, which leveraged the long-standing paradigm of distributed computing. Cloud computing has continued to gain popularity due to its on-demand resource allocation and usage-based pricing model. Its growth has also been helped by several organizations (Amazon, Google, Microsoft, and more) offering cloud platforms for public use.

Cloud computing also presents an excellent alternative for running high-performance computations. However, successfully and efficiently harnessing the scale of resources available in and across multiple cloud computing platforms requires both fault tolerance and run-time adaptability to available resources. That is, applications must be able to harness resources as they become available and adapt to resource losses and failures at run time. Applications that dynamically adapt to resource availability in this way are termed elastic. Elasticity in turn gives applications better fault tolerance and scalability.
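To make the idea concrete, here is a minimal sketch of an elastic task pool in Python. It is a toy, single-process simulation, not Work Queue's API: the names ElasticPool, add_worker, remove_worker, step, and run_demo are all hypothetical and exist only for illustration. The point it tries to show is that workers can join or vanish mid-run, and a task held by a lost worker goes back into the queue instead of being lost.

```python
from collections import deque


class ElasticPool:
    """Toy master that hands tasks to whichever workers are present.

    Hypothetical illustration only; this is not the Work Queue API.
    """

    def __init__(self, tasks):
        self.pending = deque(tasks)  # tasks waiting for a worker
        self.running = {}            # worker name -> task it holds (None if idle)
        self.done = []               # results of completed tasks

    def add_worker(self, name):
        # A new resource became available; it starts out idle.
        self.running.setdefault(name, None)

    def remove_worker(self, name):
        # A resource was lost; requeue its unfinished task, if any.
        task = self.running.pop(name, None)
        if task is not None:
            self.pending.appendleft(task)

    def step(self, work):
        # Complete the tasks workers currently hold...
        for name, task in list(self.running.items()):
            if task is not None:
                self.done.append(work(task))
                self.running[name] = None
        # ...then hand pending tasks to idle workers.
        for name in self.running:
            if self.pending and self.running[name] is None:
                self.running[name] = self.pending.popleft()

    def finished(self):
        return not self.pending and all(
            task is None for task in self.running.values()
        )


def run_demo():
    pool = ElasticPool(range(6))
    pool.add_worker("w1")
    pool.step(lambda x: x * x)   # w1 picks up its first task
    pool.add_worker("w2")        # scale up: a second worker joins
    pool.step(lambda x: x * x)   # both workers now hold tasks
    pool.remove_worker("w2")     # w2 fails; its task is requeued
    # This loop assumes at least one worker remains to drain the queue.
    while not pool.finished():
        pool.step(lambda x: x * x)
    return pool.done


if __name__ == "__main__":
    print(run_demo())
```

In this sketch every task completes exactly once even though w2 disappears while holding one. A real framework would also have to detect failures, move data, and handle partial results, none of which is modeled here.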

When elastic applications become portable across multiple platforms, their scalability becomes bounded only by the cost of running resources.

So how do you build portable elastic applications, or convert existing parallel applications into them? Use a framework like Work Queue. Work Queue abstracts the underlying distributed execution environment and provides an interface for deploying and running applications in the cloud. Examples and details can be found here.

By adhering to the design guidelines for cloud computing abstractions and frameworks presented in this paper, Work Queue allows applications deployed through it to be elastic and to execute across multiple cloud platforms simultaneously. As a result, such applications exhibit excellent scalability, which is often required in high-performance computations to gain valuable scientific insights.