Data Analytics/Analysis Life Cycle


Data passes through several stages during its life: creation, testing, processing, consumption, and reuse. The Data Analytics Lifecycle maps out these stages for professionals working on data analytics projects. The phases are arranged in a circle, which is why the process is called a lifecycle. Data analysis itself is the work of modifying, processing, and cleaning raw data to obtain useful, significant information that supports business decision-making. The following steps are from Dell EMC; the model was created by David Dietrich. A similar diagram appears on page 29 of the book Data Science and Big Data Analytics by EMC Education Services, published by John Wiley & Sons, 2015. You are always free to move back to a previous step or steps and make changes, and in fact you are encouraged to do so.

Key Concepts of the Six Phases

Discovery. team, business domain, similar past projects, business objectives, business opportunities, business problems/pains, resources, project management, skills of the team, and forming hypotheses.

Data Preparation. create an analytic sandbox, extract transform load (ETL), extract load transform (ELT), ETLT, become very familiar with the data, document the data, sort, filter.

Model Planning. methods, techniques & workflow for the model-building phase, data exploration, data relationships, important variables, structured data, unstructured data, and suitable models; you might use R, SSAS, SAS, Python, or others.

Model Building. develop datasets for testing, training & production purposes, build models, evaluate existing tools & recommend different tools as needed (such as larger and faster hardware or different software).

Communicate Results. team in collaboration with major stakeholders determines if the project was a success or failure based on the objectives of phase 1, express key findings, business value, narratives, lessons learned, data visualization, present findings (perhaps as a PowerPoint) to stakeholders.

Operationalize. team delivers final reports, code, technical documentation, readmes, and perhaps a pilot project for the production environment and the phases start all over again.
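
As a rough illustration of two ideas from the phases above (extract, transform, load into an analytic sandbox, and splitting data into training and testing sets), here is a minimal Python sketch. The sample data, the `orders` table, and the helper names `extract`, `transform`, `load`, and `train_test_split` are all made up for this example and are not part of the Dell EMC model. An in-memory SQLite database stands in for the sandbox.

```python
import csv
import io
import random
import sqlite3

# Hypothetical raw export; a real project would read this from a source system.
RAW_CSV = """order_id,region,amount
1,north,120.50
2,South,80
3,north,
4,east,42.10
5,South,99.90
6,east,15.00
"""

def extract(text):
    """Extract: read the raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, lowercase regions, cast types."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # missing amount: skip the row
        clean.append(
            (int(row["order_id"]), row["region"].strip().lower(), float(row["amount"]))
        )
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a sandbox table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# The sandbox is an in-memory database, so the source data is never modified.
sandbox = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), sandbox)

orders = sandbox.execute(
    "SELECT order_id, region, amount FROM orders ORDER BY order_id"
).fetchall()
train, test = train_test_split(orders)
print(f"{len(orders)} clean rows: {len(train)} for training, {len(test)} for testing")
```

Swapping the order of `transform` and `load` (loading the raw rows first and cleaning them with SQL inside the sandbox) would turn the same sketch into ELT.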

Data Analysis – A Google Perspective

Here is another perspective on the data analysis process, from the Google Data Analytics course on Coursera. Google calls this the data analysis process, whereas David Dietrich (see above) calls it the data analytics process. Google’s version has six phases: Ask, Prepare, Process, Analyze, Share, and Act. Within the Analyze phase, Google describes four steps: organize data, format and adjust data, get input from others, and transform data.

The data analysis process is not the data life cycle. It’s the process of analyzing data.

  1. Ask – questions, define the problem, communicate, SWOT (strengths, weaknesses, opportunities, and threats), structured thinking, ask who, what, when, where, how, and why of the data, what metrics will you use to measure your data to achieve your objective, who are the stakeholders, understand stakeholders’ expectations and how they affect your analysis process and presentation, the five whys.
  2. Prepare – data, gather, collect, import, surveys, questionnaires, data location, data organization, data formats, sort the data, filter the data, unbiased, credible, organize, protect, timeline, what metrics to measure, locate data, protect data (security), what data is needed to solve the problem/objective, the limitations of data licensing, privacy, and accessibility, deliverable: a full description of the data.
  3. Process – software tools such as Python, R, SQL, Excel, Tableau and so on, transform data, test data, clean data, document data-cleaning processes, change log, combine data from multiple sources, spreadsheets, SQL, is it unbiased, communicate the results, Power Query, Power Query transformations, data errors, outliers, missing data, correct data types in columns, minimum, maximum, mean, median, other statistics, hypothesis testing, margin of error, connecting business objectives to data analysis.
  4. Analyze – tools, format data, transform, long vs wide data, patterns, calculations, minimum, maximum, standard deviation, combine data from multiple sources, predictions, trends, recommendations, data-driven decisions, Pivot Tables, Power Pivot, data validation, temporary tables, sorting, filtering, correlations. The 4 phases of analysis: organize data, format and adjust data, get input from others, and transform data by observing relationships between data points and making calculations.
  5. Share – data visualization, tables, charts, graphs, tell the story, Tableau, who is your audience, accessibility, communicate, tie back to the original problem or opportunity, the McCandless method.
  6. Act – apply insights, validate the findings, devise a plan to solve the problem, solve the problem, decisions, innovate, ask new questions
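
Several items in the Process and Analyze steps above (clean data, keep a change log, missing data, outliers, minimum, maximum, mean, median) can be sketched in a few lines of Python. This is only an illustration and is not taken from the Google course: the sample values and the helper names `clean`, `iqr_outliers`, and `summarize` are invented here, and the 1.5 × IQR rule is just one common convention for flagging outliers.

```python
import statistics

# Hypothetical raw column, as strings, the way a spreadsheet export might deliver it.
raw_values = ["10", "12", "11", "", "13", "12", "95", "n/a", "11"]

def clean(values):
    """Convert entries to float; log every entry that cannot be parsed."""
    cleaned, change_log = [], []
    for position, value in enumerate(values):
        try:
            cleaned.append(float(value))
        except ValueError:
            change_log.append(f"row {position}: dropped unparseable value {value!r}")
    return cleaned, change_log

def iqr_outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

cleaned, change_log = clean(raw_values)
outliers = iqr_outliers(cleaned)
without_outliers = [v for v in cleaned if v not in outliers]

for entry in change_log:
    print(entry)
print("outliers:", outliers)
print("all values:      ", summarize(cleaned))
print("without outliers:", summarize(without_outliers))
```

Comparing the two summaries shows why the median is often reported alongside the mean: a single extreme value moves the mean a great deal and the median very little. Whether an outlier should be removed or kept is a judgment call to document in the change log, not something the code can decide.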

The SAS data analytics lifecycle is very similar but has seven steps. The added step, Evaluate, is designed to help analysts evaluate their solutions and potentially return to the Ask phase.

There are different ways to do things but the same core ideas still appear in each model of the process.

PACE – Another Google Perspective

In its Advanced Data Analytics course on Coursera, Google introduces another model called the PACE framework. PACE is an acronym in which each letter represents an actionable stage of a project: Plan, Analyze, Construct, and Execute.

Key Roles

Who is involved in a data analytics project, and what are their job titles? Many roles overlap, and a particular task may be done by one role or another. Three roles sit in the middle, with other roles at the beginning and the end. Sometimes the analyst works with the data engineer to acquire and take care of the raw data. In some instances, the data engineer works with the database administrator (DBA) to manage the raw data, particularly with big data, which requires the skills of a DBA. Here are a few key roles that work with or near the data analyst.

  • Data Analyst – SQL, spreadsheets, databases, database queries, data visualization; might work on a business intelligence team creating dashboards. Data analysts work downstream from data engineers.
  • Data Engineer – works with the data analyst to turn raw data into actionable pipelines. Data engineers move data from point A to point B and keep it safe.
  • Data Scientist – turns the clean, actionable data into really cool machine learning models or statistical inferences that go “beyond” imagination; invents new tools, programs, optimizes, applies advanced statistics, makes predictions.
  • Database Administrator (DBA) – aka Data Specialist – provisions and configures the database environment to support the needs of the team; organizes data, security, access, maintenance, indexes, backup, designing data models, scalability, disaster recovery, large volumes of data, big data, data lakes.
  • Machine Learning Analyst
  • Site-Reliability Engineer (SRE)
  • DevOps Specialist

Large projects will have other roles, such as the machine learning analyst, site-reliability engineer, and DevOps specialist listed above. Smaller projects still need this work done, but fewer people share it.

Project Management

Have a look at our post on Project Management for more information.

Machine Learning Workflow

Machine Learning has a workflow. Have a look at our post called Machine Learning Workflow.

Data Engineering Life Cycle

Contrast this with the Data Engineering Life Cycle. That lifecycle happens before the data analytics lifecycle and serves data to the data analysts and data scientists. Data engineering sits upstream from data analytics.
