Kaggle Introduction


What is Kaggle? Wikipedia says: “Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.” In short, Kaggle is an online community of people who are passionate about data.

Kaggle was founded in 2010. On 8 March 2017, Google announced that it was acquiring Kaggle.

Kaggle has tens of thousands of datasets available for public use. Anyone can upload a dataset to Kaggle. If they choose to make it public, other Kagglers can use that dataset to create their own projects.

You need to create an account at Kaggle.com to be able to use it.

The Data Explorer

Once you are logged in to Kaggle, click on the Data icon in the Navigation bar on the left. This takes you to the Datasets home page. From here, you can create a new dataset or search for datasets created by other Kagglers.

Sample Datasets

If you find a dataset you are interested in, you can create a Notebook from it, download it, or check it out in the Data Explorer. Here are a few I’ve found: Classic Rock Top 500 Songs, IMDb Top 100 Movies, and NBA games data. You will find other datasets of interest.
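Downloading can also be scripted instead of done through the website. The following is a minimal sketch, not an official recipe. It assumes the `kaggle` Python package is installed and that an API token is configured (a `kaggle.json` file in `~/.kaggle`, or the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables). The helper names `parse_dataset_ref` and `download_dataset` are made up for this illustration, and the dataset reference you pass in is whatever `owner/dataset-slug` pair appears in the dataset's URL.

```python
from pathlib import Path


def parse_dataset_ref(ref):
    """Split an "owner/dataset-slug" reference into its two parts."""
    owner, sep, slug = ref.strip().partition("/")
    if not sep or not owner or not slug or "/" in slug:
        raise ValueError(f"expected 'owner/dataset-slug', got {ref!r}")
    return owner, slug


def download_dataset(ref, target_dir="data"):
    """Download and unzip a dataset into `target_dir`; return the file names."""
    parse_dataset_ref(ref)  # fail early on a malformed reference

    # Imported inside the function because the kaggle package may look for
    # credentials as soon as it is imported.
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()  # uses kaggle.json or the KAGGLE_* environment variables

    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    api.dataset_download_files(ref, path=str(target), unzip=True)
    return sorted(p.name for p in target.iterdir())
```

Calling `download_dataset("owner/dataset-slug")` with a real reference should leave the unzipped files in a local `data` folder and return their names.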

Create a Notebook

You can create a Notebook from any dataset that you are interested in. To link a dataset to a Kaggle notebook, click the New Notebook button in the dataset header. This creates a notebook in your Kaggle account that is linked to the dataset.
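Once a notebook is linked to a dataset, the dataset's files can be read from code. The sketch below assumes the usual Kaggle notebook layout, in which attached datasets are mounted under `/kaggle/input`. The function names are hypothetical helpers for this example, and it uses only the Python standard library so it makes no assumption about which packages are installed.

```python
import csv
from pathlib import Path

# Assumption: attached datasets are mounted read-only under this directory
# inside a Kaggle notebook session.
KAGGLE_INPUT_DIR = Path("/kaggle/input")


def list_data_files(root=KAGGLE_INPUT_DIR):
    """Return every file below `root`, sorted by path."""
    root = Path(root)
    if not root.exists():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


def preview_csv(path, n_rows=5):
    """Return the header and the first `n_rows` data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


if __name__ == "__main__":
    # Print each attached file and peek at the CSV files among them.
    for file_path in list_data_files():
        print(file_path)
        if file_path.suffix.lower() == ".csv":
            header, rows = preview_csv(file_path)
            print("  columns:", header)
            print("  first rows:", rows)
```

Run in the first cell of a new notebook, this lists what the linked dataset contains before you decide how to load it.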

Privacy

You can upload your own datasets and keep them private. This means that they are visible and accessible only to you. You can also add collaborators to your dataset as viewers or editors. Viewers can see your private dataset, and editors can make changes to it. You can share the link to your private dataset so that anyone with the link is able to view it. If you have a private dataset on Kaggle and you choose to make it public, you will not be able to make it private again. Your only option at that point is to delete the dataset from Kaggle completely.

Any notebooks that you create on Kaggle are private by default. As with datasets, you can add collaborators as viewers or editors. You can also make a notebook public, which shares it with the entire Kaggle community. If you add collaborators to your Kaggle notebook as editors, they can make changes to it. Make sure you communicate and coordinate with your collaborators, because the last person who saves the notebook will overwrite all of the previous work. If you’d like more fine-grained control over changes to your code, a version control platform like GitHub provides it.

Forbidden

If Kaggle.com will not let you access even its home page, it might be because you are using the secure VPN connection that comes with Norton.

CareerCon

Kaggle’s CareerCon resources are for anyone interested in a data analyst career. CareerCon is a free annual digital event that aims to help new data analysts land their first job in the field. Recorded sessions from CareerCon offer firsthand knowledge and advice from data analysts and hiring managers through seminars, coding workshops, and resume advice. Although the resources are aimed at data scientists, the principles and guidelines are similar to what data analysts can expect on their career journey. Here’s a series of YouTube videos.

Here is an article at Kaggle called Kaggle Kernels Guide for Beginners — Step by Step Tutorial. Here’s an article called Publishing your first dataset on Kaggle.

Learn from Books

Have a look at Google Books for the first few chapters of The Kaggle Book: Data analysis and machine learning for competitive data science.
