Data Transformation


What is data transformation? In data engineering, it is a stage of the data engineering life cycle in which the data is changed so that it becomes more useful to the users downstream of the data engineer. Data analysts and machine learning engineers need the data to be ready and in a usable format for reports, dashboards and models.

One of the first things to check is the format of the data. For example, you may need to convert strings to numeric data or convert strings to a date-time format. You might also need to remove rows with incomplete data. These are only a few of the transformations you could apply.
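As a rough illustration of these checks, here is a minimal sketch in plain Python. The field names (order_id, amount, ordered_at), the sample rows and the date format are all made up for the example; real data will need its own rules.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from a CSV export:
# every value is still a string.
raw_rows = [
    {"order_id": "1001", "amount": "19.99", "ordered_at": "2024-03-01 14:30:00"},
    {"order_id": "1002", "amount": "", "ordered_at": "2024-03-02 09:15:00"},
    {"order_id": "1003", "amount": "5.00", "ordered_at": "2024-03-03 18:45:00"},
]


def transform(rows):
    """Convert string fields to typed values and drop incomplete rows."""
    cleaned = []
    for row in rows:
        # Skip rows where any field is missing or blank.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),      # string -> integer
            "amount": float(row["amount"]),        # string -> numeric
            "ordered_at": datetime.strptime(       # string -> date-time
                row["ordered_at"], "%Y-%m-%d %H:%M:%S"
            ),
        })
    return cleaned


clean_rows = transform(raw_rows)
```

Here the second row is dropped because its amount is blank, and the remaining two rows come out with an integer, a float and a datetime value. Whether to drop incomplete rows or fill them in some other way depends on what the downstream users need.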

There are two ways you can transform the data: in batch or as a stream. In fact, almost all data starts off as a continuous stream, and batch is just a way to process that stream in chunks.
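To make the relationship between the two concrete, here is a small sketch in plain Python. The event_stream generator is a hypothetical stand-in for a real source such as a message queue, and the batch size of four is arbitrary.

```python
from itertools import islice


def event_stream():
    """Hypothetical stand-in for a continuous source; yields ten events."""
    for i in range(10):
        yield {"event_id": i, "value": i * 2}


def batches(stream, batch_size):
    """Group a stream into lists of at most batch_size events."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stream processing: handle each event as it arrives.
stream_total = 0
for event in event_stream():
    stream_total += event["value"]

# Batch processing: handle the same events in groups of four.
batch_sizes = []
batch_total = 0
for batch in batches(event_stream(), 4):
    batch_sizes.append(len(batch))
    batch_total += sum(event["value"] for event in batch)
```

Both loops see the same events and reach the same total. The only difference is how many events are handled at a time, which is the sense in which batch is a way of processing a stream.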

Before building data systems, we should model the data and consider normalizing it. Typically, data is considered normalized if it is in at least third normal form.
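As a small, simplified illustration of what normalizing looks like, the sketch below splits a flat set of order records into two tables so that customer details are stored once rather than repeated on every order. The table and column names are hypothetical, and a real model would be designed in the database schema rather than in Python dictionaries.

```python
# Hypothetical denormalized order records: customer details repeat on every order.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ada",
     "customer_city": "London", "amount": 25.0},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ada",
     "customer_city": "London", "amount": 40.0},
    {"order_id": 3, "customer_id": 11, "customer_name": "Grace",
     "customer_city": "New York", "amount": 15.0},
]

customers = {}  # one entry per customer, keyed by customer_id
orders = []     # one entry per order, referencing the customer by id only

for row in flat_orders:
    # Customer attributes depend on customer_id, not on order_id,
    # so they move to their own table.
    customers[row["customer_id"]] = {
        "customer_name": row["customer_name"],
        "customer_city": row["customer_city"],
    }
    orders.append({
        "order_id": row["order_id"],
        "customer_id": row["customer_id"],
        "amount": row["amount"],
    })
```

After the split, a change to a customer's city is made in one place, and each order row holds only facts about the order itself.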

In Chapter 8 of Fundamentals of Data Engineering (O’Reilly, 2022), Joe Reis and Matt Housley discuss queries, data modeling, transformations, whom you’ll work with, and undercurrents.
