This post is based on a video on YouTube by Alex the Analyst. Thanks Alex.
Learn with YouTube
Alex the Analyst has a very good video on YouTube that describes some data cleaning tasks on a small dataset of people who have id numbers, first and last names, phone numbers, address and a couple more columns. There are only 20 rows so that we can see the entire dataset on one page. I like that. The video is called Data Cleaning in Pandas | Python Pandas Tutorials.
Other Videos
The video is called Data Analyst Portfolio Project | Data Cleaning in SQL | Project 3/4. This is project 3 of 4 projects. Where is the data? It’s over at Github and it’s Nashville Housing Data. The YouTube video has the link as well. The video is almost 55 minutes long.
For your information, Alex’s first project is called Data Analyst Portfolio Project | SQL Data Exploration | Project 1/4. It’s COVID data. The second one is called Data Analyst Portfolio Project | Tableau Visualization | Project 2/4. This second project is about Tableau (visualizations). The second one is also Covid data. The fourth project is about Python. Your project gets uploaded to GitHub. The dataset is a movie dataset from Kaggle.
There are a number of queries that Alex used in this video project number 3, that goes beyond the basic SELECT FROM WHERE ORDER BY query. What are they? He used CONVERT for the dates. He used a self-join and aliased the tables. UPDATE SET. SUBSTRING. CHARINDEX. Count. Removed duplicates using ROW_NUMBER PARTITION BY ORDER BY in a CTE. ALTER TABLE DROP COLUMN.
Some of these skills mentioned above are above Junior Data Analyst and would be mid-level Data Analyst. The mid-level Data Analyst will know SQL window functions, temporary tables, CTEs, stored procedures and things like these.