Data Science Introduction


In the simplest terms, data science is a way to dig out knowledge from data, either structured or unstructured, using scientific techniques and algorithms.

What is data science? Wikipedia says: “Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data.” Chikio Hayashi describes data science as a “concept to unify statistics, data analysis, informatics, and their related methods” in order to “understand and analyse actual phenomena” with data. Because the volume of data has grown so much in recent years, we need people who specialize in its various aspects.

Among others, data science uses techniques and knowledge from the following disciplines:

  • Mathematics and Statistics (Machine Learning and Artificial Intelligence)
  • Business (Accounting, Marketing, Finance, Production, Logistics), Data Analytics, and Business Intelligence
  • Computer Science (Artificial Intelligence & Machine Learning – ML is a subset of AI)

Data science is an umbrella term that includes machine learning, statistics, and analytics. It creates new ways of modeling and understanding the unknown by using raw data. Data scientists create new questions, whereas data analysts answer existing questions by drawing insights from data sources.

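With the advent of artificial intelligence (AI) and machine learning, the term “data science” gained popularity among the tech-savvy. So what are artificial intelligence and machine learning? Artificial intelligence is the development of computer systems able to perform tasks that normally require human intelligence. Machine learning is the subset of AI in which systems learn patterns from data instead of following hand-written rules. What is data mining? It is the process of discovering patterns in large data sets.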

How Do I Fit In?

Ask yourself how many decisions you expect to make. If you want to make a few decisions under uncertainty after looking at the data, you are in the discipline of statistics. Statisticians are very careful not to lead stakeholders astray.

Machine learning and AI involve making many, many decisions under uncertainty. Machine learning is all about performance and the challenge.

What if you don’t know what decisions you are looking for or how many you need to make? What if you’re not even sure the data contains anything interesting or insightful at all? That’s data analytics. Analytics is about how quickly you can explore a data landscape and find the gems, or the deposits where gems and metals are likely to be. It means looking into a vast amount of unknown data and creatively trying to find something interesting.

Watch Cassie’s video, Dimensions of data analytics. Are you wondering which area of data science to specialize in? Cassie suggests first considering your personality. Reread the paragraphs above to see which personalities and interests fit each area of data science.
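
The three mindsets described in this section can be made a little more concrete with a toy example. The Python sketch below is only an illustration, not anything taken from the video: the sales figures are made up, and the function names (describe, welch_t, fit_threshold_classifier) are hypothetical. It uses only Python's standard library. Analytics is shown as open-ended summarizing, statistics as one careful number behind a single decision, and machine learning as a learned rule reused for many decisions.

```python
import statistics

# Made-up daily sales figures for two hypothetical store layouts.
layout_a = [98, 102, 95, 101, 99, 104, 97, 100]
layout_b = [106, 103, 108, 101, 107, 105, 110, 104]


def describe(values):
    """Analytics: summarize the data with no fixed question in mind."""
    return {
        "min": min(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


def welch_t(a, b):
    """Statistics: one number to support a single, careful decision.

    Returns Welch's t statistic for the difference in means (b minus a).
    """
    standard_error = (
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    ) ** 0.5
    return (statistics.mean(b) - statistics.mean(a)) / standard_error


def fit_threshold_classifier(a, b):
    """Machine learning: learn a rule once, then reuse it for many decisions.

    The rule here is a midpoint threshold between the two group means,
    which is about the simplest classifier one could learn from data.
    """
    threshold = (statistics.mean(a) + statistics.mean(b)) / 2

    def predict(value):
        return "B" if value > threshold else "A"

    return predict


if __name__ == "__main__":
    # Analytics: look around first.
    print(describe(layout_a))
    print(describe(layout_b))

    # Statistics: is the difference between layouts large relative to noise?
    print(welch_t(layout_a, layout_b))

    # Machine learning: label many new observations automatically.
    predict = fit_threshold_classifier(layout_a, layout_b)
    print([predict(v) for v in (96, 101, 104, 109)])
```

A real project would use far more data and a proper library, but the split is the same: explore first, test a specific claim carefully, or automate a decision that has to be made many times.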

Learning More About Data Science

Below is a modified diagram from the book Data Science and Big Data Analytics by EMC Education Services, page 21.

What are some of the top programming languages for data science? They include Python, R, Java, Scala, SQL, and Julia.

Here’s an article called Which Programming Language Is Better: R, Scala, or Python?

What are the different data science roles? Here’s a YouTube video by Ken Jee called Different Data Science Roles Explained (by a Data Scientist).

Here’s an article at statisticsglobe.com called The Best Resources for Statistics, Data Science & Programming.

Here’s an article at Medium called Career Paths Within Data Science.

Books

Here is a video called Best Data Science Books for Beginners by Thu Vu data analytics. Here is another one called I Analyzed 1000 Data Science Books on Amazon: Here’s What I Found.