Big Data, Bad Analogies

Data lakes, data exhaust, web scale, data is the new oil. As the market around data technologies grows, vendors throw new terms and analogies at us to convince us to buy their products. We change data persistence and transaction layers because “databases don’t scale” or because data is “unstructured”. If data had no structure, it wouldn’t be data; it would be noise. Schema on read, schema on write, schemaless databases: all of these terms imply structure underlying the data. All data has schema, but that word may not mean what you think it means.

This presentation describes concepts of data storage and retrieval from technology prehistory (i.e., before the 1980s) and examines the design principles behind both old and new technology for managing data, because sometimes post-relational is actually pre-relational. It is important to separate ideas that are identical to things tried in the past from new twists on old topics that deliver new capabilities.

Directly related to these topics are performance, scalability, and the realities of what organizations do with data over time. All of these should guide architecture decisions, to avoid the trap of creating technical debt that must be paid later, after systems are in place and change is difficult.
