- Multiple Genres in Genres Column
- T-SQL Parse a String
- T-SQL Parse a String 2
- T-SQL Count Character Occurrences
This post discusses the code for a movie database that has a table with a list of genres in one of the columns. By list, I mean a string of comma-separated values. We really shouldn’t have a list inside a column in a database table. It’s great that the data exists, and we can be thankful for that, but there is a better way. To make the database more “normalized”, we should create several columns in the table to hold the movie genres. After that, we should actually create a new and separate table that holds all of the genres. This table will have a link to the Movies table so that each movie may have zero, one, or many rows in the Genres table. The Genres table has a foreign key that points back to the Movies table. That’s our end goal. So we have a three-step process.
This post features the function PARSENAME.
- Move the listed data over into new columns
- Move that data over to a new linked table
- Clean up the tables
-- ============================================================================== -- Author: Mike -- Create date: Nov 10, 2021 -- Description: Build an algorithm for splitting a comma-separated list of -- genres in a column into multiple columns and then create another -- table with a foreign key to hold the list of genres for each movie. -- ================================================================================ CREATE PROCEDURE [dbo].[BuildGenresNormalized] -- Add the parameters for the stored procedure here AS BEGIN -- SET NOCOUNT ON added to prevent extra result sets from -- interfering with SELECT statements. SET NOCOUNT ON; IF OBJECT_ID(N'[dbo].[MovieGenres]', N'U') IS NOT NULL DROP TABLE [dbo].[MovieGenres]; IF OBJECT_ID(N'[dbo].[Movies]', N'U') IS NOT NULL DROP TABLE [dbo].[Movies]; CREATE TABLE dbo.Movies( MovieId INT IDENTITY PRIMARY KEY , Title NVARCHAR(200) , Genres NVARCHAR(200) ); CREATE TABLE dbo.MovieGenres( MovieGenreId INT IDENTITY PRIMARY KEY , MovieId INT NOT NULL , Genre NVARCHAR(100) , FOREIGN KEY (MovieId) REFERENCES Movies(MovieId) ); INSERT INTO dbo.Movies(Title, Genres) VALUES('Lord of the Rings','Adventure,Fantasy'); INSERT INTO dbo.Movies(Title, Genres) VALUES('To Kill a Mockingbird','Drama'); INSERT INTO dbo.Movies(Title, Genres) VALUES('Airplane!','Comedy'); INSERT INTO dbo.Movies(Title, Genres) VALUES('Avengers','Adventure,Sci-Fi'); INSERT INTO dbo.Movies(Title) VALUES('Unknown Movie'); SELECT * FROM dbo.Movies; ALTER TABLE Movies ADD Genre1 NVARCHAR(100) NULL, Genre2 NVARCHAR(100) NULL, Genre3 NVARCHAR(100) NULL, Genre4 NVARCHAR(100) NULL; UPDATE dbo.Movies SET Genre1 = REVERSE(PARSENAME(REPLACE(REVERSE(Genres), ',', '.'), 1)); UPDATE dbo.Movies SET Genre2 = REVERSE(PARSENAME(REPLACE(REVERSE(Genres), ',', '.'), 2)); UPDATE dbo.Movies SET Genre3 = REVERSE(PARSENAME(REPLACE(REVERSE(Genres), ',', '.'), 3)); UPDATE dbo.Movies SET Genre4 = REVERSE(PARSENAME(REPLACE(REVERSE(Genres), ',', '.'), 4)); SELECT * FROM dbo.Movies; INSERT INTO dbo.MovieGenres (MovieId, Genre) SELECT MovieId, Genre1 FROM dbo.Movies WHERE Genre1 IS NOT NULL -- INSERT INTO dbo.MovieGenres (MovieId, Genre) SELECT MovieId, Genre2 FROM dbo.Movies WHERE Genre2 IS NOT NULL INSERT INTO dbo.MovieGenres (MovieId, Genre) SELECT MovieId, Genre3 FROM dbo.Movies WHERE Genre3 IS NOT NULL INSERT INTO dbo.MovieGenres (MovieId, Genre) SELECT MovieId, Genre4 FROM dbo.Movies WHERE Genre4 IS NOT NULL SELECT * FROM dbo.MovieGenres SELECT dbo.Movies.Title, dbo.MovieGenres.Genre FROM dbo.Movies LEFT JOIN dbo.MovieGenres ON dbo.Movies.MovieId = dbo.MovieGenres.MovieId ORDER BY dbo.Movies.Title; END
Here below is the SSMS output after you create the stored procedure and run it.
PARSENAME
Parsename only works up to 4 parts. In this example, we have at most 4 genres in our list. What if you have 5 or more? Have a look at the post T-SQL Parse a String.