Data Storage


Data storage is the second stage in the data engineering life cycle, and it is fundamental to every stage that follows. Choosing the right storage solution depends on knowing how the data will be used and accessed.

The data engineer needs to be aware of the different hardware and software components of data storage to assess the trade-offs in storage architectures.

Data often passes through RAM, magnetic hard disk drives and solid-state drives (SSDs). It passes through networks as well. Compression, serialization and caching also affect how data is stored and moved. These are the raw ingredients. Storage systems exist one level above these raw ingredients and include RDBMS, streaming storage, HDFS and others.
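
As a rough illustration of two of these raw ingredients, serialization and compression, here is a minimal Python sketch using only the standard library. The sample records are made up for the example, and JSON with gzip is just one possible combination, not a recommendation from this post.

```python
import gzip
import json

# Hypothetical sample records, invented for this illustration.
records = [{"id": i, "status": "ok"} for i in range(1000)]

# Serialization: turn in-memory objects into bytes that can be stored or sent.
serialized = json.dumps(records).encode("utf-8")

# Compression: shrink those bytes before writing to disk or sending over a network.
compressed = gzip.compress(serialized)

# Round trip: decompress and deserialize to recover the original objects.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))

print(f"serialized: {len(serialized)} bytes, compressed: {len(compressed)} bytes")
```

Repetitive data like this compresses well. Real workloads vary, so the trade-off between CPU time and bytes saved is worth measuring on your own data.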

Above the storage systems we have storage abstractions. Storage abstractions, the top level, include data lakes, data platforms, data lakehouses and cloud data warehouses.
