Open Data


This entry is part 2 of 4 in the series Big Data

What is open data? In data analytics, open data is one aspect of data ethics, the practice of collecting, sharing, and using data responsibly. Openness refers to free access, usage, and sharing of data. For data to be considered open, it has to:

  • Be available and accessible to the public as a complete dataset
  • Be provided under terms that allow it to be reused and redistributed
  • Allow universal participation so that anyone can use, reuse, and redistribute the data

There is a global movement that believes the openness of data can transform society and how decisions are made. Tim Berners-Lee, the inventor of the World Wide Web, is among its advocates. Data being open doesn’t mean we can ignore the other aspects of data ethics. We should still be transparent, respect privacy, and make sure we have consent for data that’s owned by others.

Open data must be available as a whole, preferably by downloading over the Internet in a convenient and modifiable form. Another standard surrounds reuse and redistribution. Open data must be provided under terms that allow reuse and redistribution including the ability to use it with other datasets. The last area is universal participation. Everyone must be able to use, reuse, and redistribute the data. You cannot restrict the data to a specific government, company or industry.

Interoperability is the ability of data systems and services to openly connect and share data. For example, interoperability is what allows your doctor to send your prescription directly to your pharmacy.
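
To make the prescription example concrete, here is a minimal sketch of two systems sharing data through an agreed format. It assumes both sides exchange plain JSON. The field names, the sample values, and the functions doctor_export and pharmacy_import are hypothetical and chosen only for illustration. Real health systems rely on formal standards such as HL7 FHIR.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Prescription:
    # Hypothetical fields; a real system would follow an agreed standard.
    patient_id: str
    medication: str
    dose_mg: int
    refills: int


def doctor_export(rx: Prescription) -> str:
    """Doctor's system: serialize the prescription to the shared JSON format."""
    return json.dumps(asdict(rx), sort_keys=True)


def pharmacy_import(payload: str) -> Prescription:
    """Pharmacy's system: parse the same format back into a record."""
    return Prescription(**json.loads(payload))


if __name__ == "__main__":
    # Made-up sample values, not real patient data.
    sent = Prescription("patient-001", "ExampleMed", 500, 2)
    received = pharmacy_import(doctor_export(sent))
    print(received == sent)
```

The point of the sketch is that neither system needs to know how the other stores its data. They only need to agree on the format that travels between them.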

Internet Sources of Open Data

US Government Data. Data.gov is the home of the U.S. Government’s open data. You will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more.

US Census Bureau.

Open Data Network.

Google Cloud Public Datasets.

Dataset Search.

OpenML.org

Kaggle.com

Monthly Retail Trade Report: Retail and Food Services Sales: Excel (1992–present) by the United States Census Bureau.

Here is a YouTube video called Best FREE Datasets | Open-Source data for machine learning projects by Joe James. Here’s another video called Where to get FREE Datasets to practice Data Analytics by Chandoo.

BigQuery has 150+ public datasets you can access and use.
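
As one way to try a public dataset from Python, here is a rough sketch. It assumes the google-cloud-bigquery client library is installed and that Google Cloud credentials are already configured. The table is one of the BigQuery public datasets (US baby names). The function names build_top_names_query and run_query are my own, used only for illustration.

```python
def build_top_names_query(limit: int = 5) -> str:
    """Return SQL that ranks names by total count in a public table."""
    return (
        "SELECT name, SUM(number) AS total\n"
        "FROM `bigquery-public-data.usa_names.usa_1910_2013`\n"
        "GROUP BY name\n"
        "ORDER BY total DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str) -> list:
    """Run the query and return rows as dictionaries.

    Assumes google-cloud-bigquery is installed and credentials are set up.
    """
    # Imported here so the rest of the sketch loads without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Printing the SQL needs no account; run_query(...) does.
    print(build_top_names_query())
```

You can also paste the printed SQL straight into the BigQuery console instead of running it from Python.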

Public Health Datasets

Global Health Observatory data: You can search for datasets from this page or explore featured data collections from the World Health Organization.

The Cancer Imaging Archive (TCIA) dataset: This data is hosted by Google Cloud Public Datasets and can be accessed through BigQuery.

1000 Genomes: This is another dataset from Google Cloud Public Datasets that can be accessed through BigQuery.

Public Climate Datasets

National Climatic Data Center: The NCDC Quick Links page has a selection of datasets you can explore.

NOAA Public Dataset Gallery: The NOAA Public Dataset Gallery contains a searchable collection of public datasets. As an example of what this data shows, NOAA reported that, despite the cooling influence of La Niña, 2022 was the sixth-warmest year on record.

Public Social-Political Datasets

UNICEF State of the World’s Children: This dataset from UNICEF includes a collection of tables that can be downloaded.

CPS Labor Force Statistics: This page contains links to several available datasets that you can explore. The Current Population Survey (CPS) is a monthly survey of households conducted by the Bureau of Census for the Bureau of Labor Statistics. It provides a comprehensive body of data on the labor force, employment, unemployment, persons not in the labor force, hours of work, earnings, and other demographic and labor force characteristics.

The Stanford Open Policing Project: This dataset can be downloaded as a .CSV file for your own use. According to the project, police officers in the United States make more than 50,000 traffic stops on a typical day. The project team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country.
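
Once you have a .CSV file like this on your computer, a first look can be as simple as counting the values in one column. Here is a minimal sketch using only the Python standard library. The rows and the column names date and outcome are made up to stand in for a downloaded file, so check the real file's header for the actual column names. The helper count_by_column is hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for a downloaded file; not real project data.
SAMPLE_CSV = """date,outcome
2020-01-01,warning
2020-01-01,citation
2020-01-02,warning
"""


def count_by_column(csv_file, column: str) -> Counter:
    """Count how often each value appears in the named column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # For a real download, pass open("your_file.csv", newline="") instead.
    counts = count_by_column(io.StringIO(SAMPLE_CSV), "outcome")
    print(counts.most_common())
```

Reading row by row like this keeps memory use low, which helps when a file holds millions of records.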

UN Data. The United Nations publishes open data as well. For example, there is a dataset that includes the population, surface area, and density of each country, with the population broken down by male and female.

Learn with YouTube

10 Free Dataset Resources for Your Next Project!

Best Places to Find Datasets for Your Projects: Kaggle, Google Dataset Search, FiveThirtyEight, data.gov, GitHub, data.nasa.gov.
