Data File Management


Are you working with data files or data frames? How are they organized? Do you change the data files when you work with them, or do you download updated versions of the data and run a process on them each time? What kind of folder structure are you going to use, and why? Are you going to use ReadMe.txt files to record metadata? Will any metadata be part of the file name? Do you have an agreed-upon file naming convention? Will your file names start with a sortable date in the format YYYY-MM-DD or YYYYMMDD? If you are working for a company or on a team, there is even more to consider as the project(s) begin. What are the company policies? That’s a lot of questions! Answering them up front is worth it, because it saves time later when you or someone else needs to look at the project again.

Here is a YouTube video on this topic. It’s called Data – Episode 1 – Data Management.

File Naming

Here are four principles of file naming: machine-readable, human-readable, sorts well, and consistent. Avoid spaces, accented characters, and punctuation other than underscores and dashes, which you can use to separate words. This post is called Data File Management, but the WordPress slug uses dashes like this: data-file-management. If I had many versions of this file, it might be helpful to put the date at the beginning of the file name like this: 2023-03-11_data-file-management. Instead of 2023-03-11 you could use 20230311. Either way, putting the year first means that an alphabetical sort is also a chronological sort. File names should be short but descriptive, under 25 characters (Briney, 2015). Use capitals and underscores instead of periods, spaces, or slashes. Write down your naming convention in your data management plan.
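
If you would rather not build these names by hand, here is a minimal Python sketch of the convention described above. The function name make_file_name and its compact option are my own inventions for illustration; they are not part of any library. It removes accents, swaps spaces and punctuation for dashes, and puts a sortable date in front.

```python
import re
import unicodedata
from datetime import date


def make_file_name(title: str, when: date, compact: bool = False) -> str:
    """Build a name like 2023-03-11_data-file-management (hypothetical helper)."""
    # Split accented letters into base letter + accent mark, then drop
    # everything that is not plain ASCII (this removes the accent marks).
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then turn every run of non-alphanumeric characters
    # (spaces, punctuation) into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # YYYY-MM-DD by default, YYYYMMDD when compact=True.
    stamp = when.strftime("%Y%m%d" if compact else "%Y-%m-%d")
    return f"{stamp}_{slug}"


if __name__ == "__main__":
    print(make_file_name("Data File Management", date(2023, 3, 11)))
    print(make_file_name("Data File Management", date(2023, 3, 11), compact=True))
```

Running it prints 2023-03-11_data-file-management and then 20230311_data-file-management. The sketch uses lowercase with dashes to match the WordPress slug; if your convention uses capitals and underscores, you would adjust the slug step to suit.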

Folder Structure

A hierarchical folder structure adds another layer of organization to your files. As with file naming, use whatever makes the most sense for your data. For example, the top folder could be the name of the project. Above that, there might be a folder called Projects. On my computer I have a folder for projects (called Portfolio), and under that I have a folder for each type of project, such as R, Excel, SQL, and so on.
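
A layout like this can also be created with a short script. The Python sketch below is only one way to do it: the function name create_project_folder and the contents of the ReadMe.txt file are assumptions made for illustration, while the Portfolio folder and the project types come from the example above.

```python
from pathlib import Path


def create_project_folder(root: Path, project_type: str, project_name: str) -> Path:
    """Create root/Portfolio/<type>/<project> with a starter ReadMe.txt (hypothetical helper)."""
    project_dir = root / "Portfolio" / project_type / project_name
    # parents=True creates any missing folders along the way;
    # exist_ok=True makes it safe to run the script more than once.
    project_dir.mkdir(parents=True, exist_ok=True)

    # Add a small ReadMe.txt for metadata, without overwriting an existing one.
    readme = project_dir / "ReadMe.txt"
    if not readme.exists():
        readme.write_text(
            f"Project: {project_name}\nType: {project_type}\n",
            encoding="utf-8",
        )
    return project_dir


if __name__ == "__main__":
    # Assumption: build the structure under the current working directory.
    for kind in ("R", "Excel", "SQL"):
        print(create_project_folder(Path.cwd(), kind, "example-project"))
```

Because the folders are only created when they are missing, you can rerun this each time you start a new project without disturbing what is already there.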
