This dataset ships with the ggplot2 package, which is part of the tidyverse in R. If you haven't already installed the tidyverse, go ahead and do so. You only need to install it once in RStudio; after that, just load the library in each session. The dataset contains measurements on 10 variables (such as price, color, and clarity) for 53,940 diamonds.
In RStudio, you can load the dataset with library(ggplot2) followed by data(diamonds). The dataset is not a package of its own, so library(diamonds) is not needed. The diamonds dataset has 53,940 rows.
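A minimal setup sketch follows. It assumes an internet connection for the one-time install, which is commented out so the script can be re-run safely. The final dim() call is a quick check that the dataset loaded with the expected size.

```r
# One-time install (uncomment to run); the tidyverse includes ggplot2
# install.packages("tidyverse")

# Load ggplot2, which ships the diamonds dataset
library(ggplot2)

# Make the dataset available in the session
# (ggplot2 lazy-loads its data, so this line is optional once the package is loaded)
data(diamonds)

# Check the size: should report 53940 rows and 10 columns
dim(diamonds)
```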
In RStudio you can get help on the dataset with the command ?diamonds. As the most popular diamond shape, the round cut accounts for a large majority of all diamonds sold. In fact, data from diamond vendors suggests that around two thirds of consumers opt for the round cut when choosing a diamond. Round diamonds are cut with 58 facets that all reflect light back to your eyes. Note that the cut column in this dataset records the quality of the cut (Fair, Good, Very Good, Premium, Ideal), not the shape.
Calling head(diamonds) prints the first six rows. This is what you will see in the RStudio console:
> head(diamonds)
# A tibble: 6 × 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
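Here is what summary() returns at the console: quartiles, mean, minimum, and maximum for the numeric columns, and counts per level for the factor columns.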
> summary(diamonds)
     carat               cut        color        clarity          depth
 Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065   Min.   :43.00
 1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258   1st Qu.:61.00
 Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194   Median :61.80
 Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171   Mean   :61.75
 3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066   3rd Qu.:62.50
 Max.   :5.0100                     I: 5422   VVS1   : 3655   Max.   :79.00
                                    J: 2808   (Other): 2531
     table           price             x                y                z
 Min.   :43.00   Min.   :  326   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000
 1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710   1st Qu.: 4.720   1st Qu.: 2.910
 Median :57.00   Median : 2401   Median : 5.700   Median : 5.710   Median : 3.530
 Mean   :57.46   Mean   : 3933   Mean   : 5.731   Mean   : 5.735   Mean   : 3.539
 3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540   3rd Qu.: 6.540   3rd Qu.: 4.040
 Max.   :95.00   Max.   :18823   Max.   :10.740   Max.   :58.900   Max.   :31.800
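To drill into a single column, you can call summary() or table() on it directly. The sketch below does this and also follows up on one detail in the output above: x, y, and z all have a minimum of 0.000, which cannot be a real physical dimension. Treating those zeros as bad or missing measurements is an assumption; the name zero_dims is just an illustrative variable.

```r
library(ggplot2)

# Summarise one numeric column on its own
summary(diamonds$price)

# Count diamonds at each cut grade; these match the "cut" column of summary()
table(diamonds$cut)

# summary() reports Min. 0.000 for x, y and z; pull out rows where any
# dimension is zero so they can be inspected (zero_dims is a hypothetical name)
zero_dims <- subset(diamonds, x == 0 | y == 0 | z == 0)
nrow(zero_dims)
```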
Next, let's use the str() function, short for "structure". It lists each column's type along with its first few values.
> str(diamonds)
tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
 $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
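The str() output shows that cut, color, and clarity are ordered factors (Ord.factor), truncated with "..". The sketch below prints the full level order and shows that, because the factors are ordered, comparison operators follow that order. Keep in mind that the help page (?diamonds) describes color as running from D (best) to J (worst), so for color a "higher" level is a worse grade. The variable name better_than_good is illustrative only.

```r
library(ggplot2)

# Full level order of each ordered factor, lowest level first
levels(diamonds$cut)
levels(diamonds$color)
levels(diamonds$clarity)

# Ordered factors support <, >, etc. using the level order:
# this keeps Very Good, Premium and Ideal cuts
better_than_good <- diamonds$cut > "Good"
sum(better_than_good)
```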
Python
Are you working in Python instead of R? Do you want to work with this dataset in Python?