One of the most important considerations in working with and presenting data is the data format, also known as the data set. A proper data set is rectangular in form. It has rows and columns. The top row has a unique column header. This is also known as a field name or attribute. There are no gaps or missing columns or missing rows. Each column has one specific data type and never contains a mixed set of different types. Each row, except for the header row, can also be called an observation. A proper data set does not contain any subtotals or grand totals.
Here is an article over at tibco.com that’s called What is transactional data?
Wide and Long Data
Data sets are either “wide” or “long”. Long data is data where each row contains a single data point for a particular item. Long data often has a lot of rows. Wide data often has a lot of columns. Generally, long data is easier to work with and analyze. Wide data is data where each row contains multiple data points for the particular items identified in the columns.
It is possible to transform the data from one format to the other. For example, in Excel’s Power Query you can use the pivot or unpivot commands. To go from wide to long you can use unpivot, as shown in our post called Unpivot in Power Query.
Dates
There is more than one correct way to have dates represented in your data set. You could have the date in a column or you could have the date split into three columns: year, month and day. Both are okay, but the date in one column might work better.
Chris Duttom from Maven has an example of using Excel’s Power Query to transform a small data set to get it in a proper form. It’s froma course at Udemy called Microsoft Excel: Business Intelligence w/ Power Query & DAX and it’s video number 23. The example below is in “wide” format, not “long.
The first post in this series is called An Introduction to Data.