Pandas Assign to Add a Column


This entry is part 2 of 3 in the series Pandas EDA Discovery

How do you add a column to a pandas DataFrame using assign?

The documentation for assign is found under pandas.DataFrame.assign. It includes a couple of examples where you have a temperature column in Celsius and you want to add a column that’s in Fahrenheit. The formula is F = C × 9/5 + 32. Assuming your DataFrame is called df, here is an example. The new column will be called temp_f. Its values are based on the existing column temp_c and the formula.

df.assign(temp_f = df['temp_c'] * 9 / 5 + 32)
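To try the conversion end to end, here is a minimal sketch. The sample temperatures and the name result are made up for illustration; only temp_c and temp_f come from the example above.

```python
import pandas as pd

# Hypothetical sample data; only the column name temp_c comes from the text.
df = pd.DataFrame({"temp_c": [0.0, 25.0, 100.0]})

# Apply F = C * 9/5 + 32 to every row and store it in a new column, temp_f.
result = df.assign(temp_f=df["temp_c"] * 9 / 5 + 32)

print(result)
```

Freezing, room temperature and boiling make it easy to check the formula by eye.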

The syntax is not complex. Let’s create a new function that does absolutely nothing, just for illustration.

def do_nothing(my_string):
    return my_string

df = df.assign(new_col = do_nothing('abc'))

Here assign returns a new DataFrame with an added column called new_col; df itself only changes if you assign the result back to it. What’s in the new column? Every row has the three characters ‘abc’.
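Two things are easy to verify with a small sketch: a single value passed to assign is repeated on every row, and the original DataFrame is untouched unless you store the result. The id column and the name result are invented for this illustration.

```python
import pandas as pd


def do_nothing(my_string):
    return my_string


# Hypothetical three-row DataFrame; the id column is just a placeholder.
df = pd.DataFrame({"id": [1, 2, 3]})

# The single string 'abc' is broadcast to every row of the new column.
result = df.assign(new_col=do_nothing("abc"))

print(result["new_col"].tolist())
print("new_col" in df.columns)  # checks whether the original df was modified
```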

We can add more than one column with assign(). The following code works.

df = df.assign(ab = '1', dc = '2')
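Columns added in one assign call can also be computed rather than constant. The sketch below is one way to do it, assuming a reasonably recent pandas in which a later keyword argument can refer to a column created earlier in the same call. The is_hot column, its threshold of 90 and the sample data are all hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"temp_c": [0.0, 100.0]})

# Each lambda receives the DataFrame as built so far, so is_hot can use
# temp_f even though temp_f is created in the same assign call.
df = df.assign(
    temp_f=lambda d: d["temp_c"] * 9 / 5 + 32,
    is_hot=lambda d: d["temp_f"] > 90,
)

print(df)
```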

Without assign

We can create a new column without using assign. In this example we’ll divide one column by another.

df['new_col'] = df['col1'] / df['col2']

We may also want to round the result. Here is the same calculation again, rounded to three decimal places.

df['new_col'] = (df['col1'] / df['col2']).round(3)
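Here is a small runnable version of the same idea. The profit and sales columns and their values are invented for illustration, and the Series round method is used for the rounding.

```python
import pandas as pd

# Hypothetical data; the column names are placeholders for your own.
df = pd.DataFrame({
    "profit": [10.0, 25.0, 7.0],
    "sales": [30.0, 40.0, 9.0],
})

# Divide one column by another, row by row, then round to 3 decimal places.
df["profit_ratio"] = (df["profit"] / df["sales"]).round(3)

print(df)
```

Unlike assign, this form changes df in place.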

Let’s do some comparisons.

SQL

How do you add a column in SQL?

ALTER TABLE table_name ADD column_name datatype;

After creating the column you could use UPDATE … SET to fill the column with some data. Here’s a post on this website called SQL Add Column.
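The two SQL steps can be sketched from Python using the standard-library sqlite3 module and an in-memory database. The readings table and its data are hypothetical, and the syntax shown is SQLite’s; other databases may differ slightly.

```python
import sqlite3

# In-memory SQLite database with a hypothetical table of Celsius readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (temp_c REAL)")
conn.executemany("INSERT INTO readings (temp_c) VALUES (?)", [(0.0,), (100.0,)])

# Step 1: add the empty column. Step 2: fill it with UPDATE ... SET.
conn.execute("ALTER TABLE readings ADD COLUMN temp_f REAL")
conn.execute("UPDATE readings SET temp_f = temp_c * 9 / 5 + 32")

rows = conn.execute(
    "SELECT temp_c, temp_f FROM readings ORDER BY temp_c"
).fetchall()
print(rows)
conn.close()
```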

DAX

If you are working with DAX, we have a post here at this website called Calculated Columns Introduction.

Tableau

In the visualization software Tableau, adding a column is easy. The added columns are called calculated fields. Calculated fields reference other fields (columns) in your data. For example, if you had a column called Sales and a column called Profit, you could create a calculated field called ‘Profit Ratio’ by dividing Profit by Sales. How do you accomplish this? Click the drop-down arrow at the top of the Data pane (on the left). Select Create Calculated Field. A window pops up where you name the field and type your calculation.
