- Big Data Introduction
- Open Data
- Map Reduce
- The Growth of Data
What is big data? We can now collect and analyze data in ways that were impossible even a few years ago. Today, we have much more data than ever before and our ability to store and analyze that data is constantly improving.
Big data is sometimes described as having 3 Vs: volume, variety, and velocity. Although we might think only of volume when we hear the term big data, variety and velocity matter just as much. A fourth V, value, is sometimes added. Due to its size or structure, big data cannot be efficiently analyzed using only traditional databases or methods. It calls for new data architectures and analytic sandboxes, new tools, new analytical methods, and an integration of multiple skills into the new role of the data scientist.
The rate of data creation is accelerating. What accounts for this? Social media and genetic sequencing are among the fastest-growing sources of big data. In genomics, for example, genetic sequencing and human genome mapping generate very large data sets.
Big data comes in multiple forms, both structured and unstructured: financial data, text files, multimedia files, and genetic mappings. In contrast to much of the traditional data analysis performed by organizations, most big data is unstructured or semi-structured, which requires different techniques and tools to process and analyze.
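To make the distinction concrete, here is a minimal Python sketch of how the three kinds of data might be handled. The sample values, field names (`account`, `amount`, `user`, `tags`), and variable names are all invented for illustration; real big data workloads would use distributed tools rather than the standard library.

```python
import csv
import io
import json
import re
from collections import Counter

# Structured: fixed columns, every row has the same fields (hypothetical data).
structured = "account,amount\nA-1,120.50\nA-2,75.25\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: self-describing records whose fields can vary per record.
semi_structured = '[{"user": "ana", "tags": ["news"]}, {"user": "ben"}]'
records = json.loads(semi_structured)
tag_counts = Counter(tag for rec in records for tag in rec.get("tags", []))

# Unstructured: free text with no schema, so structure has to be extracted.
unstructured = "Big data is big. Data keeps growing."
word_counts = Counter(re.findall(r"[a-z]+", unstructured.lower()))

print(total, dict(tag_counts), word_counts.most_common(2))
```

The structured case can be summed directly because every row has an `amount` column. The semi-structured case has to tolerate a missing `tags` field, and the unstructured case needs a tokenizing step before anything can be counted.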
As a buzzword, big data gained popularity from the early 2000s through the mid-2010s. The big data engineer arrived, along with a host of data tools. Today, data is moving faster and growing larger than ever, and the term big data is essentially a relic of a particular time. The data is still big, but we now call our big data engineers simply data engineers. As Joe Reis and Matt Housley say in their book Fundamentals of Data Engineering on page 10, “as data engineers historically tended to the low-level details of monolithic frameworks such as Hadoop, Spark or Informatica, the trend is moving toward decentralized, modularized, managed and highly abstracted tools.” Data engineers now work with a collection of off-the-shelf, open source, third-party products assembled to make life easier. Data keeps growing, and so does the number of data formats. Data engineering is about connecting various technologies, which is why the data engineer described in their book is really more like a data lifecycle engineer.
Data engineers are increasingly concerned with the data engineering lifecycle and its undercurrents. They have better tools and techniques than ever, and their skills are in demand. Please have a look at Luke’s job posting project at datanerd.tech. Data scientists want to analyze data and build machine learning models. In practice, however, they spend too much of their time working at the bottom three levels of the data science hierarchy of needs.