Pandas is a Python library for working with data sets. It has functions for analyzing, cleaning, exploring, and manipulating data. The name “Pandas” refers to both “Panel Data” and “Python Data Analysis”. The library was created by Wes McKinney in 2008 and emerged in 2010. McKinney is also the author of the O’Reilly book Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Note that pandas is built on top of NumPy.
With Pandas we can analyze large data sets and draw conclusions based on statistical theory. Pandas can clean messy data sets and make them readable and relevant. It can compute correlations, averages, minimums, maximums, and more. It can also delete rows from a data set.
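As a rough illustration of the operations just listed, here is a minimal sketch. The column names and values are made up for this example and do not come from any real data set.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
df = pd.DataFrame({
    "duration": [60, 45, None, 30],
    "calories": [400.0, 300.0, 250.0, 200.0],
})

# Cleaning: drop every row that contains a missing value.
clean = df.dropna()

# Average, minimum, and maximum of one column.
mean_calories = clean["calories"].mean()
min_calories = clean["calories"].min()
max_calories = clean["calories"].max()

# Pairwise correlation between the numeric columns.
correlations = clean.corr()

# Deleting a row: remove the row whose index label is 0.
without_first = clean.drop(index=0)

print(clean)
print(mean_calories, min_calories, max_calories)
print(correlations)
print(without_first)
```

Each call returns a new object rather than changing df in place, so the original data stays available for comparison.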
For more information you can look at the w3schools.com tutorial on Pandas.
Below is an example of using pandas, which is typically imported under the alias pd. It reads a CSV file into a DataFrame and prints the whole table.
    import pandas as pd

    # Load the CSV file into a DataFrame
    df = pd.read_csv('my_data.csv')

    # to_string() renders every row instead of a truncated view
    print(df.to_string())
Below is some example code from w3schools.com. It builds a DataFrame from a Python dictionary.
    import pandas

    # Each dictionary key becomes a column; each list holds that column's values
    mydataset = {
        'cars': ["BMW", "Volvo", "Ford"],
        'passings': [3, 7, 2]
    }

    myvar = pandas.DataFrame(mydataset)

    print(myvar)
You can check the pandas version with the code below.
    import pandas as pd

    print(pd.__version__)
A pandas Series is similar to a column in a table. It is a one-dimensional array holding data of a particular data type. You create one by calling pd.Series().
    import pandas as pd

    a = [9, 5, 7, 2]

    # Build a Series from a plain Python list
    myvar = pd.Series(a)

    print(myvar)
If nothing else is specified, the values are labeled with their index number. The first value has index 0, the second has index 1, and so on.
This label can be used to access a specified value.
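A short sketch of label access, reusing the list from the Series example above. The second half passes custom labels through the index argument of pd.Series. That goes a little beyond what the text describes, and the label names "w" to "z" are made up for illustration.

```python
import pandas as pd

a = [9, 5, 7, 2]
myvar = pd.Series(a)

# With the default labels 0, 1, 2, 3, look up the first value by its label.
first = myvar[0]
print(first)

# Illustrative assumption: supply custom labels via the index argument.
labeled = pd.Series(a, index=["w", "x", "y", "z"])
third = labeled["y"]
print(third)
```

With custom labels in place, values are looked up by those names rather than by position.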
Learning with YouTube
Here is a series of tutorial videos on pandas with Alex the Analyst. It’s called Pandas for Beginners.