March 16, 2014

What is a distributed system?

Distributed systems have been defined in many ways, usually in terms of a system's architecture (client-server, peer-to-peer) or its goals (high scalability, high availability).

This post provides a contemporary (2014) definition for distributed systems by classifying them based on the functions they perform.

The definition is as follows:
"Any distributed system must perform one or more of these three functions: distributed computing, distributed service, and distributed storage."

1. Distributed computing: In such a system, computations (such as analysis, processing, and simulation) are partitioned and distributed across multiple machines for execution.

Distributing computations enables physical concurrency, overcomes the resource limits of a single machine, reduces the impact of failures, and allows computations to move to where the data is stored.

Examples of distributed computing systems include Storm, Work Queue, and SETI@home.
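
As a rough illustration of partitioning a computation, here is a minimal sketch in Python. It runs on one machine, with each thread standing in for a separate machine. The function names (`partition`, `sum_of_squares`, `distributed_sum_of_squares`) are made up for this example and do not come from Storm, Work Queue, or SETI@home.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split items into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    # Per-partition work; stands in for analysis or simulation code.
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(items, n_workers=4):
    # Assumption: each thread plays the role of one machine.
    chunks = partition(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # Combine the partial results into the final answer.
    return sum(partials)
```

A real system would also have to ship the chunks over a network and retry the work of any machine that fails, which this sketch leaves out.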

2. Distributed service: The system serves client requests for access to a service. Each request is handled either by a single machine chosen by the system or by a group of machines acting together.

Distributed service systems provide highly available and reliable delivery of the advertised services to their clients and users.

Examples of distributed service systems include the World Wide Web (an example of one machine chosen to satisfy a request) and the Domain Name System (DNS) (an example of a group of machines working together to provide a service).
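
The "single machine chosen by the system" case can be sketched as a small dispatcher that picks one healthy server per request. This is only an illustration under assumed names (`Dispatcher`, `mark_down`, `choose`); round-robin is one of several possible selection policies, and the post does not prescribe any particular one.

```python
import itertools


class Dispatcher:
    """Routes each request to one healthy server, chosen round-robin."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        # Hypothetical health signal; a real system would probe servers.
        self.healthy.discard(server)

    def choose(self):
        # Try each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

Skipping servers that are marked down is what lets the service stay available to clients when an individual machine fails.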

3. Distributed storage: The system stores data across a group of machines or storage nodes. The data is in the form of files, documents, or objects.

Distributed storage systems provide one or more of the following properties, with trade-offs among them: high availability, cost-efficiency, reliability, robustness, consistency, performance, and scalability (often in the size of the data stored).

Examples of distributed storage systems include Cassandra, Hypertable, MongoDB, and Memcached.
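
To make "stores data across a group of machines" concrete, here is a minimal sketch that places each key on a hashed primary node plus the next node in the list as a replica. The names `replica_nodes` and `TinyStore` are hypothetical, and the simple modulo placement is an assumption chosen for brevity; it is not how any of the systems named above are implemented (Cassandra, for instance, uses consistent hashing).

```python
import hashlib


def replica_nodes(key, nodes, replicas=2):
    """Return the nodes that hold a key: a hashed primary plus successors."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


class TinyStore:
    """In-memory stand-in for a cluster; each dict plays one storage node."""

    def __init__(self, nodes, replicas=2):
        self.nodes = list(nodes)
        self.replicas = replicas
        self.data = {node: {} for node in self.nodes}

    def put(self, key, value):
        # Write the value to every replica of the key.
        for node in replica_nodes(key, self.nodes, self.replicas):
            self.data[node][key] = value

    def get(self, key, down=()):
        # Read from the first replica that is not in the `down` collection.
        for node in replica_nodes(key, self.nodes, self.replicas):
            if node not in down:
                return self.data[node].get(key)
        raise RuntimeError("all replicas unavailable")
```

Keeping two copies lets a read succeed when one node is down, at the cost of extra storage and the need to keep the copies consistent, which is one instance of the trade-offs mentioned above.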

Most distributed systems today are distributed storage systems. As a result, the term "distributed system" in current usage typically refers to a distributed storage system.

Finally, a distributed system can perform more than one of these functions. A good example is Hadoop, which includes the HDFS storage system and offers an integrated framework for distributed computing and storage.
