How do you delete a column in a pandas DataFrame? In other words, you want to take a subset of your data by removing at least one column. Perhaps you are doing some “feature engineering” and have decided that a column in your dataset is no longer needed. For example, you might create a time duration column from a start column and an end column. After creating the duration column, you might decide to drop the two columns it’s based on. If a column will not contribute to the model you are building, you might drop it. Are there several columns you want to delete? That’s easy too.
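Here is a minimal sketch of the duration scenario. The column names start, end and duration_min and the timestamps are made up for illustration. The sketch derives the new column and then drops the two columns it came from.

```python
import pandas as pd

# Hypothetical data: a start and an end timestamp for each task
tasks = pd.DataFrame({
    "start": pd.to_datetime(["2023-01-01 09:00", "2023-01-01 10:30"]),
    "end": pd.to_datetime(["2023-01-01 09:45", "2023-01-01 12:00"]),
})

# Feature engineering: derive the duration in minutes from start and end
tasks["duration_min"] = (tasks["end"] - tasks["start"]).dt.total_seconds() / 60

# The two source columns are no longer needed, so drop them
tasks = tasks.drop(["start", "end"], axis=1)
print(tasks)
```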
First, how do you get a list of the column names? Use dataframename.columns.
I’ll run the example Python code in a Jupyter Notebook, starting with a DataFrame created by hand.
import pandas as pd  # import the pandas library into Python

data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rowan'],
        'amount': [12, 67, 33, 41],
        'color': ['Blue', 'Pink', 'Red', 'Green']}
df = pd.DataFrame(data)
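This short sketch shows what dataframename.columns returns for this DataFrame. The variable name column_names is only illustrative. The key point is that df.columns is a pandas Index. You can wrap it in list() when you need a plain Python list.

```python
import pandas as pd

data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rowan'],
        'amount': [12, 67, 33, 41],
        'color': ['Blue', 'Pink', 'Red', 'Green']}
df = pd.DataFrame(data)

# df.columns is a pandas Index object rather than a plain list
print(df.columns)

# Wrap it in list() when an ordinary Python list is more convenient
column_names = list(df.columns)
print(column_names)  # ['firstname', 'amount', 'color']
```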
Let’s drop a column.
# Optionally, save a copy of the original df
df2 = df.copy()
# drop the amount column
df2 = df2.drop('amount', axis=1)
df2
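In the code above, we could have passed ['amount'] (a list containing one column name) instead of 'amount'. The axis=1 argument tells drop() to look for the label among the columns rather than the row index.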
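This sketch checks that both spellings give the same result. It also adds a third spelling, the columns= keyword of drop(), which the original text does not use. It also shows that drop() returns a new DataFrame and leaves df itself unchanged. That is why the example above assigns the result back to df2.

```python
import pandas as pd

data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rowan'],
        'amount': [12, 67, 33, 41],
        'color': ['Blue', 'Pink', 'Red', 'Green']}
df = pd.DataFrame(data)

# Three equivalent ways to drop the amount column
a = df.drop('amount', axis=1)      # single label
b = df.drop(['amount'], axis=1)    # list containing one label
c = df.drop(columns='amount')      # columns= keyword instead of axis=1

print(a.equals(b) and b.equals(c))  # True
print(list(a.columns))              # ['firstname', 'color']

# drop() returned new DataFrames; df still has all three columns
print(list(df.columns))             # ['firstname', 'amount', 'color']
```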
Drop Multiple Columns
How would you drop two columns? Pass a list of column names. If you have many columns to drop, it may be more convenient to build the list first and then pass it to the drop() function. After dropping the columns, you might run the info() function to check which columns remain and what their data types are.
# drop two columns
df3 = df.copy()
df3 = df3.drop(['amount', 'color'], axis=1)
df3
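Here is a sketch of the “build the list first” approach, followed by info(). The name cols_to_drop is hypothetical. In practice the list might be assembled from many columns.

```python
import pandas as pd

data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rowan'],
        'amount': [12, 67, 33, 41],
        'color': ['Blue', 'Pink', 'Red', 'Green']}
df = pd.DataFrame(data)

# Build the list of columns to remove first; handy when there are many
cols_to_drop = ['amount', 'color']

df3 = df.drop(cols_to_drop, axis=1)

# info() prints the remaining column names, non-null counts and dtypes
df3.info()
```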
SQL
How do you drop a column in SQL? What is the syntax?
ALTER TABLE myTable DROP COLUMN myColumn;
Another way in Pandas
In pandas, another way to do this is to select only the columns you want to keep and assign the result back to the DataFrame variable. You use double brackets, and it’s best to think of the inner brackets as a Python list. Here is some simple code you can copy to prove it to yourself. First, we’ll create a DataFrame by hand.
import pandas as pd  # import the pandas library into Python

data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
        'amount': [12, 67, 33, 41],
        'color': ['Blue', 'Pink', 'Red', 'Green']}
df = pd.DataFrame(data)
df
Now we can specify the columns we want to keep and reassign the result to df.
df = df[["firstname", "amount"]]
df
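Keeping the columns you want and dropping the rest are two routes to the same DataFrame. The sketch below rebuilds the sample data and confirms this with DataFrame.equals(). The names kept and dropped are illustrative.

```python
import pandas as pd

data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
        'amount': [12, 67, 33, 41],
        'color': ['Blue', 'Pink', 'Red', 'Green']}
df = pd.DataFrame(data)

# Keep the columns you want...
kept = df[["firstname", "amount"]]

# ...or drop the one you don't want
dropped = df.drop("color", axis=1)

print(kept.equals(dropped))  # True
```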
List
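The inner set of brackets is actually a list. Here is some code you can try to prove that. It uses the original DataFrame, df, so re-run the code that creates it first, because the previous step overwrote df with only two columns and df["color"] would no longer exist.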
subset = ["firstname", "color"]  # an ordinary Python list
df_1 = df[subset]
df_1
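To make the “inner brackets are a list” point concrete, this sketch checks the types involved. Indexing with a list returns a DataFrame, even for a single column. Indexing with a bare string (single brackets) returns a Series.

```python
import pandas as pd

data = {'firstname': ['Bob', 'Sally', 'Suzie', 'Rohan'],
        'amount': [12, 67, 33, 41],
        'color': ['Blue', 'Pink', 'Red', 'Green']}
df = pd.DataFrame(data)

subset = ["firstname", "color"]
print(isinstance(subset, list))                 # True: a plain Python list

# Passing a list returns a DataFrame, even for a single column
print(isinstance(df[subset], pd.DataFrame))     # True
print(isinstance(df[["color"]], pd.DataFrame))  # True

# A bare string (single brackets) returns a Series instead
print(isinstance(df["color"], pd.Series))       # True
```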