What is data architecture? It’s not a simple thing to define, partly because the field is changing so quickly. Data architecture is a subset of enterprise architecture, alongside the other three architectures: business, technical (computer and machine), and application (software).
Data architecture is one of the undercurrents of the data engineering lifecycle, according to Joe Reis and Matt Housley in their book Fundamentals of Data Engineering (O’Reilly, 2022). Their definition, on page 77 of the book, is: “Data architecture is the design of systems to support the evolving data needs of an enterprise, achieved by flexible and reversible decisions reached through a careful evaluation of trade-offs.”
What are the data requirements, and how will they be met? Data architecture is therefore divided into the what and the how. The what is the operational architecture and the how is the technical architecture. The operational architecture may be determined by a business analyst, a data engineer, or a data architect. Whoever does it will need to work with management.
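To make the what/how split concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the class names, the requirement names, and the component names are all hypothetical and do not come from the book. Requirements stand for the operational architecture, the mapping to components stands for the technical architecture, and a small check lists any requirement that no component covers yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Requirement:
    """A 'what': a data need stated in business terms (hypothetical example type)."""
    name: str
    description: str


@dataclass
class TechnicalDesign:
    """A 'how': maps each requirement name to the components meant to satisfy it."""
    components: Dict[str, List[str]] = field(default_factory=dict)

    def unmet(self, requirements: List[Requirement]) -> List[str]:
        # A requirement counts as unmet if it has no components mapped to it.
        return [r.name for r in requirements if not self.components.get(r.name)]


# Hypothetical requirements gathered together with management.
requirements = [
    Requirement("daily_sales_report", "Sales totals available every morning"),
    Requirement("customer_history", "Full order history per customer"),
]

# Hypothetical technical design that so far covers only one of them.
design = TechnicalDesign(
    components={"daily_sales_report": ["nightly batch job", "warehouse table"]}
)

unmet = design.unmet(requirements)
print(unmet)
```

A check like this is one way to keep the two halves in step: every time a requirement is added, the gap in the technical design becomes visible.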
What makes a good data architecture? Whatever decisions are made, data architecture should be continually monitored and re-evaluated. It’s always a work in progress. For guidance, we could look to the AWS Well-Architected Framework, which has six pillars. Google Cloud’s Five Principles for Cloud-Native Architecture has, as the name says, five principles. Joe Reis and Matt Housley list nine principles of good data architecture in their book.
What are some types of data architecture? Two examples are the data warehouse and the data lake.
After you decide on your data architecture, you will choose your technologies. There are a lot of different things to think about, and it’s no easy task. Assess the trade-offs and make reversible decisions.
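One simple way to assess trade-offs is a weighted scoring matrix. The Python sketch below is an assumption-laden illustration, not a method from the book: the criteria, the weights, the option names, and the 1 to 5 scores are all made up. Giving reversibility its own weight is one way to keep "how hard is this to undo?" in the comparison.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of an option's scores across all criteria."""
    total_weight = sum(weights.values())
    return sum(scores[criterion] * w for criterion, w in weights.items()) / total_weight


# Hypothetical criteria and weights (higher weight = matters more to us).
weights = {"cost": 3, "scalability": 2, "reversibility": 4}

# Hypothetical options scored from 1 (poor) to 5 (good) on each criterion.
options = {
    "managed_warehouse": {"cost": 2, "scalability": 5, "reversibility": 3},
    "self_hosted_lake": {"cost": 4, "scalability": 3, "reversibility": 2},
}

# Rank the options from best to worst weighted score.
ranked = sorted(
    options,
    key=lambda name: weighted_score(options[name], weights),
    reverse=True,
)

for name in ranked:
    print(f"{name}: {weighted_score(options[name], weights):.2f}")
```

The numbers matter less than the habit: writing the criteria down makes the trade-offs explicit, and the scores can be revisited when the architecture is re-evaluated.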