Data Serving


Serving data is the final stage of the data engineering life cycle. In this context, serving data means making data available to the people and systems that consume it. The three main consumers are data analysts, data scientists (models and machine learning), and reverse ETL processes. Data analysts build reports, statistical analyses, and dashboards; business intelligence is treated as part of data analytics for this discussion. Data scientists use the data to build machine learning models. Reverse ETL is the process of sending data back to the data sources.

As a data engineer, you need people to trust the data you provide. It pays to go back and review the data you are serving to ensure it is of high quality. One simple step is to ask the stakeholders whether they trust the data. Data validation is the process of analyzing the data to ensure that it accurately represents the truth. For example, you can run queries and compare the results with those of other processes and queries.
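The idea of comparing query results can be sketched as a small reconciliation check. This is a minimal illustration, not a full validation framework: it uses an in-memory SQLite database, and the table names (source_orders, served_orders), the amount column, and the helper functions are all hypothetical.

```python
import sqlite3


def fetch_scalar(conn, sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


def validate_served_table(conn, source_table, served_table, amount_column):
    """Compare the row count and a column total between two tables.

    Table and column names are interpolated into the SQL, so they must come
    from trusted configuration, never from user input.
    Returns a dict mapping each check name to True (match) or False.
    """
    expressions = {
        "row_count": "COUNT(*)",
        "total": f"COALESCE(SUM({amount_column}), 0)",
    }
    checks = {}
    for name, expr in expressions.items():
        source_value = fetch_scalar(conn, f"SELECT {expr} FROM {source_table}")
        served_value = fetch_scalar(conn, f"SELECT {expr} FROM {served_table}")
        checks[name] = source_value == served_value
    return checks


# Hypothetical sample data: the served table is missing one order.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_orders (id INTEGER, amount INTEGER);
    CREATE TABLE served_orders (id INTEGER, amount INTEGER);
    INSERT INTO source_orders VALUES (1, 100), (2, 250), (3, 75);
    INSERT INTO served_orders VALUES (1, 100), (2, 250);
    """
)

result = validate_served_table(conn, "source_orders", "served_orders", "amount")
print(result)  # {'row_count': False, 'total': False}
```

A failed check does not say which side is wrong, only that the two disagree, so it is a prompt to investigate rather than proof of an error in the served data.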

What’s the objective of the data? Why does somebody need the data? What questions will the data hopefully answer? Who needs to know? What decisions will the data influence? When is the data needed? Data engineers aren’t expected to have all of these answers, but asking for them makes the job easier, because the answers define the requirements the data must meet. Perhaps the business has a business analyst who can help answer these questions, or the data engineer could approach senior management or a project manager.

What interface will the data consumers be using? Will they be able to create and run their own reports (self-serve) or will they be asking someone else to do that job? Self-service data is not easy to implement.

For the entities that the data represents, such as a “customer” or a “supplier”, it’s best practice to have formal data definitions that describe them. Data logic specifies the formulas and mathematical calculations that derive the required metrics. Documenting both formally helps with data correctness, consistency, and trustworthiness.
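One way to keep a data definition and its data logic together is to record them in code next to the calculation itself. The sketch below is only an illustration under assumed names: the MetricDefinition class, the average_order_value metric, and its input fields are hypothetical, and a real team might keep the same information in a data catalog or semantic layer instead.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class MetricDefinition:
    """A documented metric: what it means and how it is calculated."""

    name: str
    description: str  # the formal definition, in plain language
    formula: str  # the data logic, written out for human readers
    compute: Callable[[Mapping[str, float]], float]  # the same logic, executable


def _average_order_value(inputs: Mapping[str, float]) -> float:
    """Divide total revenue by order count, returning 0.0 when there are no orders."""
    if inputs["order_count"] == 0:
        return 0.0
    return inputs["total_revenue"] / inputs["order_count"]


# Hypothetical metric used only for illustration.
AVERAGE_ORDER_VALUE = MetricDefinition(
    name="average_order_value",
    description="Mean revenue per completed order.",
    formula="total_revenue / order_count",
    compute=_average_order_value,
)

value = AVERAGE_ORDER_VALUE.compute({"total_revenue": 1200.0, "order_count": 48})
print(f"{AVERAGE_ORDER_VALUE.name} = {value}")  # average_order_value = 25.0
```

Because the description, the written formula, and the executable calculation live in one object, every report that uses the metric draws on the same definition.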
