Indexing in pandas is an important topic that you need to know, as a data analyst or data scientist. Be aware that the index does not have to be unique.
Let’s learn with an example. Here below is a subset of the dataset used in the video mentioned below. If you want to follow along, you could create an empty csv file and paste the data below into it, save it on your local computer and import it into Jupyter Notebook and follow along. Click on the small icon in the upper right corner of the text below to open a popup window.
Country,Capital,Continent,2022 Population Afghanistan,Kabul,Asia,41128771 Albania,Tirana,Europe,2842321 Algeria,Algiers,Africa,44903225 Yemen,Sanaa,Asia,33696614 Canada,Ottawa,North America,38454327 United States,"Washington, D.C.",North America,338289857 Philippines,Manila,Asia,115559009
I will just show some code here that you can use in your projects. The first thing to note is that indexing happens automatically. It starts at zero and increases by one for each row.
We can set the index when we import it as shown below. Here our variable is dfi, not df.
Of course, we can set the index to Country after we import the data into our variable called df.
# If we didn't set the index when we imported it with read_csv, we could do it now. # inplace=True will "save" it to df. Otherwise we could use another variable and # write df_country = df.set_index('Country') df.set_index('Country', inplace = True) df
If we have an index of country we can use loc like this: df.loc[‘Canada’]. Let’s reset the index on df, as shown below.
# Let's reset the index df.reset_index(inplace=True) # inplace=True ensures that its saved back to df df
Multi-index
We could have two indexes. In the data we have a “category’ called. We have ‘Continent’. Multi-indexing is beyond the scope of this particular post.
Learn with YouTube
Here is a video on YouTube by Alex the Analyst called Indexes in Pandas | Python Pandas Tutorials. You can follow along with the data file by downloading it and reading it in.