The Diamonds dataset comes with seaborn, which is a Python library.
Below is some Python code you can use to work with this dataset. It’s purpose is for data professionals to practice working with datasets. Below is the data dictionary.
- price – price in US dollars ($326–$18,823)
- carat – weight of the diamond (0.2–5.01)
- cut – quality of the cut (Fair, Good, Very Good, Premium, Ideal)
- color – diamond colour, from D (best) to J (worst)
- clarity – a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
- x – length in mm (0–10.74)
- y – width in mm (0–58.9)
- z – depth in mm (0–31.8)
You could use feature engineering to create a coupe of columns, as listed below.
- depth – total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)
- table – width of top of diamond relative to widest point (43–95)
Let’s get into it by writing some code in Python, in Jupyter notebook.
# libraries import pandas as pd import matplotlib.pyplot as plt import seaborn as sns # Load the diamond dataset into a Pandas dataframe df = sns.load_dataset('diamonds') df.head(15).T # dot T will transpose the data.
df.info()