The Transaction Manifesto: A Direction for NoSQL Database Technology
Every application that needs to support simultaneous clients (concurrency) should be built using transactions with ACID properties. As the NoSQL databases technology matures, transactions will play an increasingly prominent role. In this article, David Rosenthal and Stephen Pimentel explains that transactions are the simplest and strongest programming model available to handle concurrency.
What is a Transaction?
A transaction is a set of database reads and writes that is handled as a unit with a few crucial properties. First, all reads in the transaction see the same snapshot of the database (e.g. do not see changes to the data resulting from other transactions executing concurrently). Second, the writes within a transaction either all succeed or all fail. (Failure might be caused, for example, by a connection loss.) Lastly, after a transaction succeeds (commits), the writes are permanently stored. These properties of transactions (called Atomicity, Consistency, Isolation, and Durability) are the fundamental “ACID” guarantees. We at FoundationDB think that support for ACID transactions is much more than a nice extra feature; transactions are critical for building robust systems in an efficient, simple way.
Everyone needs transactions
Every application that needs to support simultaneous clients (concurrency) should be built using transactions with ACID properties. Transactions are the simplest and strongest programming model available to handle concurrency and will play an increasingly prominent role as NoSQL technology matures.
A common misconception is that transactions are useful only for e-commerce or banking, which deal with financial transactions. However, the power of transactions comes from their engineering impact on application building, not from the details of any particular application. Because of the ACID properties, transactions can be composed to create new abstractions, support more efficient data structures, and enforce integrity constraints. As a result, building applications with transactions is easier, more reliable, and more extensible than any of the alternatives.
It may seem that the design tradeoffs for your system compel you to give up the advantages of transactions to gain speed, scalability, and fault tolerance. Systems built this way are usually fragile, difficult to manage, and often nearly impossible to adapt to changing business needs. The costs of foregoing transactions are rarely worth the benefits, especially if an alternative can provide transactional integrity at scale.
Transactions make concurrency simple
Concurrency arises whenever multiple clients, users, or parts of an application read and write the same data at the same time. Transactions make managing concurrency simple for developers. The main property of transactions that achieves this simplicity is isolation, the “I” in ACID. When a system guarantees that transactions are fully isolated (known as “serializability”), developers can treat each transaction as if it were executed sequentially, even though it may actually be executed concurrently. The burden of reasoning about potential interactions between operations from separate transactions goes away.
Slightly better than nothing: local transactions
Some systems offer ACID transactions for a limited set of predefined operations, typically restricted by the structure of the data model. For example, a document database may allow a single document to be updated in a transaction. A graph database may allow a relationship between two nodes to be added or dropped, updating the data elements for all three entities in a transaction.
However, the real power of transactions comes when they can be defined by the application developer over any set of data elements. An application developer working with a key-value store should be able to define transactions that read and write any number of key-value pairs. When developers can freely define transactions without arbitrary restrictions, they can use transactions as fundamental building blocks for an application.
Transactions enable abstraction
Application-defined transactions are composable. Isolation, in combination with atomicity (the “A” in ACID), ensures that the execution of one transaction can’t affect the visible behavior of another. This guarantee makes transactions composable with one another, allowing them to be assembled to build new abstractions. Abstractions can be encapsulated in layers. For example, a common abstraction is to maintain an index along with the primary data for to enable quickly finding data items matching some constraint. In any key-value store, this functionality can easily be implemented by storing a second copy of the data with the desired index field as the key. However, concurrent updates to the data potentially complicate this simple design. Without transactions it is difficult to ensure that, as the data changes, both the data and the index are updated consistently. With transactions, an indexing layer can update both the data and the index in a single transaction, guaranteeing their consistency and allowing a strong abstraction.
Transactions allow layers to build abstractions simply and efficiently, providing a highly extensible capability to support multiple data models. Data models optimized for graphs, hierarchical documents, column-oriented data, or relational data can all be implemented in layers on top of an ordered key-value store. In most of these cases, a single data object in the higher-level model will map onto multiple key-value pairs. Transactions make it straightforward to reliably implement these mappings by wrapping multiple key-value updates in atomic units.
Transactions are not as expensive as you think
You may be reluctant to employ transactions due to a perception that they impose serious technical tradeoffs, particularly for the kinds of high performance applications targeted by NoSQL databases. However, when you examine the tradeoffs more closely, the cost to the user is quite low. (The cost to the database engineer is quite high: distributed transactional systems are difficult to build!)
Performance and Scalability?
We know of no practical limits to the scalability or performance of systems supporting transactions. When the movement toward NoSQL databases began, early systems, such as Google BigTable, adopted minimal designs tightly focused on the scalability and performance. Features familiar from relational databases had been aggressively shed and the supposition was that the abandoned features were unnecessary or even harmful for scalability and performance goals.
Those suppositions were wrong. It is becoming clear that supporting transactions is a matter of engineering effort, not a fundamental tradeoff in the design space. Algorithms for maintaining transactional integrity can be distributed and scale out like many other problems. Transactional integrity does come at a CPU cost, but in our experience that cost is less than 10% of total system CPU. This is a small price to pay for transactional integrity and can easily be made up elsewhere.
Transactions guarantee the durability of writes (the ‘D’ in ACID). This guarantee comes with some increase in write latency. Durability means that committed writes stay committed, even in the face of subsequent hardware failures. As such, durability is an important component of fault tolerance. NoSQL systems that don’t support durability are necessarily weaker in regard to fault tolerance. Because of the importance of true fault tolerance, the write latency required for durable transactions is usually worth the cost. For those applications that have a hard requirement to minimize write latency, durability can be turned off without sacrificing the “ACI” properties.
Transactions are the future of NoSQL
As NoSQL databases become more broadly used for a wide variety of purposes, more applications built on them employ non-trivial concurrency from multiple clients. Without adequate concurrency control, all the traditional problems of concurrency re-emerge and create a significant burden for application developers. ACID transactions simplify concurrency for developers by providing serializable operations that can be composed to properly engineer application software. If you’re building an application that needs to be scalable and you don’t have transactions, you will eventually be burned. Fortunately, the scalability, fault-tolerance, and performance of NoSQL databases are still achievable with transactions. The choice to use transactions is ultimately not a matter of fundamental tradeoffs but of sound engineering. As the technology matures, transactions will form a foundational capability for future NoSQL databases.