Multiple Genres in Genres Column


This entry is part 1 of 4 in the series SQL Server Parsing

This post discusses the code for a movie database that has a table with a list of genres in one of the columns. By list, I mean a string of comma-separated values. We really shouldn’t have a list inside a column in a database table. It’s great that the data exists, and we can be thankful for that, but there is a better way. To make the database more “normalized”, we should create several columns in the table to hold the movie genres. After that, we should actually create a new and separate table that holds all of the genres. This table will have a link to the Movies table so that each movie may have zero, one, or many rows in the Genres table. The Genres table has a foreign key that points back to the Movies table. That’s our end goal. So we have a three-step process.

This post features the function PARSENAME.

  1. Move the listed data over into new columns
  2. Move that data over to a new linked table
  3. Clean up the tables
-- ==============================================================================
-- Author:		Mike
-- Create date: Nov 10, 2021
-- Description:	Build an algorithm for splitting a comma-separated list of
--              genres in a column into multiple columns and then create another 
--              table with a foreign key to hold the list of genres for each movie.
-- ================================================================================
CREATE PROCEDURE [dbo].[BuildGenresNormalized] 
	-- Add the parameters for the stored procedure here
AS
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;

	IF OBJECT_ID(N'[dbo].[MovieGenres]', N'U') IS NOT NULL
	DROP TABLE [dbo].[MovieGenres];

	IF OBJECT_ID(N'[dbo].[Movies]', N'U') IS NOT NULL
	DROP TABLE [dbo].[Movies];

	CREATE TABLE dbo.Movies(
		 MovieId INT IDENTITY PRIMARY KEY
	   , Title NVARCHAR(200)
	   , Genres NVARCHAR(200)
	   );
	
	CREATE TABLE dbo.MovieGenres(
	   MovieGenreId INT IDENTITY PRIMARY KEY
	   , MovieId INT NOT NULL
	   , Genre NVARCHAR(100)
	   , FOREIGN KEY (MovieId) REFERENCES Movies(MovieId)
	   );

	INSERT INTO dbo.Movies(Title, Genres) VALUES('Lord of the Rings','Adventure,Fantasy');
	INSERT INTO dbo.Movies(Title, Genres) VALUES('To Kill a Mockingbird','Drama');
	INSERT INTO dbo.Movies(Title, Genres) VALUES('Airplane!','Comedy');
	INSERT INTO dbo.Movies(Title, Genres) VALUES('Avengers','Adventure,Sci-Fi');
	INSERT INTO dbo.Movies(Title) VALUES('Unknown Movie');
	
	SELECT * FROM dbo.Movies;

	ALTER TABLE Movies ADD Genre1 NVARCHAR(100) NULL, Genre2 NVARCHAR(100) NULL, Genre3 NVARCHAR(100) NULL, Genre4 NVARCHAR(100) NULL;

	UPDATE dbo.Movies SET Genre1 = REVERSE(PARSENAME(REPLACE(REVERSE(Genres), ',', '.'), 1));
    UPDATE dbo.Movies SET Genre2 = REVERSE(PARSENAME(REPLACE(REVERSE(Genres), ',', '.'), 2));
    UPDATE dbo.Movies SET Genre3 = REVERSE(PARSENAME(REPLACE(REVERSE(Genres), ',', '.'), 3));
	UPDATE dbo.Movies SET Genre4 = REVERSE(PARSENAME(REPLACE(REVERSE(Genres), ',', '.'), 4));

	SELECT * FROM dbo.Movies;

	INSERT INTO dbo.MovieGenres (MovieId, Genre)
	SELECT MovieId, Genre1
	FROM dbo.Movies
	WHERE Genre1 IS NOT NULL
	-- 
    INSERT INTO dbo.MovieGenres (MovieId, Genre)
	SELECT MovieId, Genre2
	FROM dbo.Movies
	WHERE Genre2 IS NOT NULL
	
	INSERT INTO dbo.MovieGenres (MovieId, Genre)
	SELECT MovieId, Genre3
	FROM dbo.Movies
	WHERE Genre3 IS NOT NULL

	INSERT INTO dbo.MovieGenres (MovieId, Genre)
	SELECT MovieId, Genre4
	FROM dbo.Movies
	WHERE Genre4 IS NOT NULL

	SELECT * FROM dbo.MovieGenres

	SELECT dbo.Movies.Title, dbo.MovieGenres.Genre
	FROM dbo.Movies LEFT JOIN 
	dbo.MovieGenres ON
	dbo.Movies.MovieId = dbo.MovieGenres.MovieId
	ORDER BY dbo.Movies.Title;

END

Here below is the SSMS output after you create the stored procedure and run it.

Click to Enlarge

PARSENAME

Parsename only works up to 4 parts. In this example, we have at most 4 genres in our list. What if you have 5 or more? Have a look at the post T-SQL Parse a String.

Series NavigationT-SQL Parse a String >>