T-SQL Parsing Into Many Columns


This entry is part 1 of 7 in the series SQL Server Parsing Many Columns

This post is about working with one column in a SQL Server table that contains a comma-separated list of text values that we would like to “normalize” by creating another table that will hold that list. We need to create this new table. Why would we want to spend the time doing this? We end up with more efficient queries against the database, as well as more update flexibility. Also, we would reduce the change of errors. For example, if we were querying for the string “apple” in our example below, we might also get back “pineapple”, which we didn’t want. Why would we not just Google the solution to this and follow the algorithm we find? One reason is that the algorithm assumes that the comma-separated list you are working with has a maximum of four parts to it. The reason for this is because they suggest you use the PARSENAME function and that’s it. That function only works for strings that are at most four parts.

A List of Fruit

Here is a screenshot of what the table looks like after running the script shown below. It’s what we are starting with. This is fruit but it could be employees with a list of skills, movies that fall under a list of genres and so on.

Let’s work with an example. I created a database with SQL Server Management Studio (SSMS) called FruitParse. Let’s set up a small table with some data. You can run this script on your non-production server. Warning: if a table called ParseFruit already exists in the current database (very unlikely), it will first be deleted before it is created.

IF OBJECT_ID(N'[dbo].[ParseFruit]', N'U') IS NOT NULL
	DROP TABLE [dbo].[ParseFruit];

CREATE TABLE [dbo].[ParseFruit](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[Name] [nvarchar](50) NOT NULL,
	[ListFruit] [nvarchar](100) NULL,
 CONSTRAINT [PK_ParseFruit] PRIMARY KEY CLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[ParseFruit] ON 
GO
INSERT [dbo].[ParseFruit] ([Id], [Name], [ListFruit]) VALUES (1, N'Bob', N'apple')
GO
INSERT [dbo].[ParseFruit] ([Id], [Name], [ListFruit]) VALUES (2, N'Sally', N'pear,apple,pomegranate')
GO
INSERT [dbo].[ParseFruit] ([Id], [Name], [ListFruit]) VALUES (3, N'Samantha', N'cherry,peach,strawberry,prune')
GO
INSERT [dbo].[ParseFruit] ([Id], [Name], [ListFruit]) VALUES (4, N'William', N'raspberry,mango,fig,cantaloupe,apple,banana')
GO
INSERT [dbo].[ParseFruit] ([Id], [Name], [ListFruit]) VALUES (5, N'Linda', N'blueberry,blackberry,tangarine,watermellon,plantain,persimmon,pineapple')
GO
INSERT [dbo].[ParseFruit] ([Id], [Name], [ListFruit]) VALUES (6, N'Susan', NULL)
GO
INSERT [dbo].[ParseFruit] ([Id], [Name], [ListFruit]) VALUES (7, N'Ron', N'peach,blackberry,orange,blueberry,pear,passion fruit,dragonfruit,tamarind,kiwi')
GO
SET IDENTITY_INSERT [dbo].[ParseFruit] OFF
GO

Algorithm

In order to acomplish this, there are a few steps we need to do inorder to make it happen.

  1. Creata a column to hold the fruit frequency, of data type int and call it FruitFreq.
  2. Populate that column by using the example code in the post T-SQL Count Character Occurrences.
  3. From the maximum number of parts, how many groups of 4 do we need? Create the columns calling them Group1, Group2, and so on.
  4. Copy the CSV data over from the original column to the Group columns, using the frequency column as a guide.
  5. For each of the Group columns copy the data over to the individual columns.
  6. Create the columns to hold one fruit each and call them, in our case, fruit1, fruit2, fruit3, and so on up to the maximum. Allow NULLs.
  7. Copy the data from our Groups to our individual fruit columns with an UPDATE query using PARSENAME and REVERSE that we saw in our previous post called Multiple Genres in Genres Column.
  8. Create our new Fruit table. It will have a foreign key to our original table and the name of our fruit. It should also have a primary key which may simply be an Identity column.
  9. For each of the fruit columns (fruit1, fruit2 and so on) update (copy) the fruits into our new Fruits table. Write an update query that does each friut column one at a time. In the new table, include the primary key from our original table.
  10. Do any necessary clean up of the original table.

This article is presented in a series of posts.

Series NavigationT-SQL Parsing To Get Frequency >>