EDA Validating with Python


Exploratory Data Analysis (EDA) has six main practices: discovering, structuring, cleaning, joining, validating, and presenting. When working with a dataset, data professionals check that the data they are using is free of errors.

Validating is the process of verifying that the data is consistent and of high quality. For example, there could be misspellings in the column names or in the data itself. Dates are sometimes stored in inconsistent formats. A column of numbers may appear to contain integers but also hold floating-point numbers, or may actually be stored as text.
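Here is a minimal sketch of how a few of these checks might look in pandas. The DataFrame, its column names and the list of expected columns are made up for illustration; the idea is to compare column names against what you expect, scan distinct values for near-duplicate spellings, and count the Python types stored in a supposedly numeric column.

```python
import pandas as pd

# Hypothetical sample data with typical validation problems:
# a misspelled column name, inconsistent category spellings,
# and a "numeric" column that mixes ints, floats and text.
df = pd.DataFrame({
    "cuntry": ["Canada", "canada", "Mexico", "Mexcio"],
    "units": [10, 12.5, "7", "n/a"],
})

# 1. Compare column names against the names you expect (assumed list).
expected_columns = {"country", "units"}
unexpected = set(df.columns) - expected_columns
print("Unexpected columns:", unexpected)

# 2. List the distinct values; near-duplicates often reveal misspellings.
print(df["cuntry"].value_counts())

# 3. Count the Python types actually stored in the column.
print(df["units"].map(type).value_counts())

# 4. Coerce to numbers; anything that cannot be parsed becomes NaN.
numeric_units = pd.to_numeric(df["units"], errors="coerce")
print("Unparseable rows:")
print(df.loc[numeric_units.isna(), "units"])
```

Using `errors="coerce"` flags the problem rows without raising an exception, so you can review them before deciding how to fix them.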

When you look at the data, pay special attention to the data types. In pandas you can call the DataFrame's info() method, which lists each column's data type and non-null count. For example, date information sometimes comes in as strings, which pandas shows as the “object” type.
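As a rough sketch, assuming a made-up `order_date` column of text dates where one entry uses a different format, you could inspect the types with `info()` and then parse the column with `pd.to_datetime`. Passing an explicit `format` with `errors="coerce"` turns any entry that doesn't match into `NaT`, which makes inconsistent date formats easy to spot.

```python
import pandas as pd

# Hypothetical sample: dates arrive as text, one in a different format.
df = pd.DataFrame({
    "order_date": ["2023-01-05", "2023-02-17", "03/02/2023"],
    "amount": [19.99, 5.00, 42.50],
})

# info() prints each column's dtype and non-null count; the text dates
# show up as object (newer pandas versions may report a string dtype).
df.info()

# Parse with the expected format; entries that don't match become NaT.
parsed = pd.to_datetime(df["order_date"], format="%Y-%m-%d", errors="coerce")
print("Dates not in YYYY-MM-DD format:")
print(df.loc[parsed.isna(), "order_date"])

# Keep the parsed values once the problem rows have been reviewed.
df["order_date"] = parsed
print(df.dtypes)
```

After the conversion, `order_date` has a datetime64 dtype, so date arithmetic and filtering work as expected.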

We have a separate post, Cleaning Mixed Data Types, that illustrates how to deal with the object data type in pandas.
