The Palmer Penguins Dataset



The Palmer Penguins dataset is a sample dataset that you can use in both R and Python. It contains information about three penguin species (Adélie, Chinstrap, and Gentoo) from the Palmer Archipelago, including size measurements, clutch observations, and blood isotope ratios. The main penguins table has 344 rows and 8 columns.

An archipelago is a chain, cluster, or collection of islands, or sometimes a sea containing a small number of scattered islands. The Palmer Archipelago is in Antarctica, not far from the southern parts of Chile and Argentina in South America.

The Palmer Penguins dataset is often used as an alternative to Edgar Anderson's Iris dataset.

Installing Palmer Penguins in RStudio

To install and load the palmerpenguins package, run these two commands in the RStudio Console:

install.packages('palmerpenguins')
library('palmerpenguins')

If you have already installed the package, you only need to load it with the library function. After that, you can take a look at the data:

data(penguins)
View(penguins)

Installing Palmer Penguins in Python

When I write Python code locally on my computer, I use Anaconda Navigator. In a new project, you just need to install the seaborn library (for example, with conda install seaborn) and load the dataset, as follows:

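import seaborn as sns
# Load the palmer penguins dataset
# (load_dataset downloads it from seaborn's online data repository, so the first call needs internet access)
penguins = sns.load_dataset("penguins")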

Of course, a real project will likely import more than seaborn. You'll want pandas, since load_dataset returns a pandas DataFrame.
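Once the data is in a DataFrame, a first look with pandas might be sketched as below. This is a minimal sketch, not part of seaborn or palmerpenguins: the helper names overview and mean_measurements_by_species are hypothetical, and the column names assume the seaborn copy of the data, which may not include every column of the R version.

```python
import pandas as pd

# Size measurement columns (assumed to match the seaborn copy of the data)
MEASUREMENTS = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]


def overview(df: pd.DataFrame) -> dict:
    """Hypothetical helper: table shape, rows per species, and missing values per column."""
    return {
        "shape": df.shape,
        "species_counts": df["species"].value_counts().to_dict(),
        "missing": df.isna().sum().to_dict(),
    }


def mean_measurements_by_species(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: average each measurement per species, dropping rows with missing measurements."""
    complete = df.dropna(subset=MEASUREMENTS)
    return complete.groupby("species")[MEASUREMENTS].mean()
```

For example, after penguins = sns.load_dataset("penguins"), calling overview(penguins) shows how many rows have missing measurements, which is why mean_measurements_by_species drops incomplete rows before averaging.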

More Information

For more information, check out the palmerpenguins site. The data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network. The palmerpenguins package contains two datasets: penguins, which is a simplified version of the raw data, and penguins_raw, which contains all the variables with their original names as downloaded.
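To illustrate what "simplified" means, here is a minimal pandas sketch that derives penguins-style columns from a penguins_raw-style table. It is not the package's actual cleaning code. The raw column names (such as "Culmen Length (mm)" and "Date Egg") and the raw value formats are assumptions you should check against the palmerpenguins documentation, and the function name simplify_raw is hypothetical.

```python
import pandas as pd

# Assumed mapping from raw column names to the simplified names (verify against the package docs)
RAW_TO_SIMPLE = {
    "Species": "species",
    "Island": "island",
    "Culmen Length (mm)": "bill_length_mm",
    "Culmen Depth (mm)": "bill_depth_mm",
    "Flipper Length (mm)": "flipper_length_mm",
    "Body Mass (g)": "body_mass_g",
    "Sex": "sex",
}


def simplify_raw(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: keep a subset of raw columns and give them short, consistent names."""
    df = raw[list(RAW_TO_SIMPLE)].rename(columns=RAW_TO_SIMPLE)
    # Assumed raw format "Adelie Penguin (Pygoscelis adeliae)" becomes "Adelie"
    df["species"] = df["species"].str.split().str[0]
    # Assumed raw format "MALE"/"FEMALE" becomes "male"/"female"; missing values stay missing
    df["sex"] = df["sex"].str.lower()
    # Study year taken from the (assumed) egg-laying date column
    df["year"] = pd.to_datetime(raw["Date Egg"]).dt.year
    return df
```

The result has the same eight column names as the simplified penguins table, which is the main difference a reader notices between the two datasets.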
