The rise of "Big Data" and the increasing need to store terabytes or petabytes of data across many machines and multiple data centers have led to the proliferation of large-scale data storage systems. This project investigated the guarantees provided by these systems, looking specifically at Cassandra as an example of a popular system of this type, and examined approaches for achieving full consistency in such systems.
I considered four Big Data use-patterns: (1) write once/read many, (2) simple key-value updates, (3) compound key-value updates, and (4) database transactions. Each use-pattern places distinct requirements on the data storage system. To use Big Data effectively, it is necessary to select or design a system that provides the appropriate guarantees for the specific use case. In this project, I examined existing solutions for the first two use cases and evaluated the applicability of consistent replication techniques to use cases three and four. While replication engines are a possible approach to solving the problem of compound updates, more general agreement protocols are necessary to implement distributed transactions.