MySQL search optimization

May 23, 2019

The goal of this post is to implement a search mechanism that searches for songs and artists in a playlist table, that has about 11 million entries. All data has been gathered from a third party application that provides a history of the songs played on Austrian radio stations. One entry consists of the play date, the title and the artist. Unfortunately the data has to be downloaded as an html file that can be converted into SQL statements later on but does not have any structure.

Thanks to @davidkroell for the help concerning the database optimization.

Storing the data

At the beginning all the data had to be stored in a MySQL database. Therefore 2 tables were created: radio_stations and songs.

Radio Stations:

The table radio_stations was created using the following SQL statement. The radio stations were inserted manually and contain and id, an abbreviation that is used when downloading new data from the third party application and a display name.

CREATE TABLE `radio_stations` (
    `id` INT AUTO_INCREMENT,
    `abbreviation` VARCHAR(20) NOT NULL,
    `name` VARCHAR(100) NOT NULL,
    CONSTRAINT pk_radio_stations PRIMARY KEY (id)
);
id abbreviation name
1 fm4 FM4
2 oe3 Hitradio Ö3
3 radio_bgl Radio Burgenland
4 radio_ktn Radio Kärnten
5 radio_noe Radio Niederösterreich
6 radio_ooe Radio Oberösterreich
7 radio_sbg Radio Salzburg
8 radio_stmk Radio Steiermark
9 radio_tirol Radio Tirol
10 radio_vbg Radio Vorarlberg
11 radio_wien Radio Wien

Songs:

During the first iteration the table songs had the following columns. This table has not been normalized because of the sloppy input data and also for benchmarks. All data has been stored by using generated SQL files from the downloaded HTML file. To convert the HTML into an SQL file a custom bash script was used that replaced the HTML syntax with SQL syntax by using sed.

CREATE TABLE `songs` (
    `id` INT AUTO_INCREMENT,
    `radio_station_id` INT NOT NULL,
    `play_date` DATETIME NOT NULL,
    `title` VARCHAR(255) NOT NULL,
    `artist` VARCHAR(255) NOT NULL,
    constraint pk_songs primary key (id),
    constraint fk_songs_radio_stations foreign key (radio_station_id)
    references radio_stations(id)
);
id radiostationid play_date title artist
1 1 2012-01-01 00:00:00 Title 01 Artist 01
2 1 2012-01-01 00:05:00 Title 02 Artist 02
3 1 2012-01-01 00:10:00 Title 03 Artist 03
4 1
5 1

Search

The first search implementation only allowed searching for song titles and not for artists. This has been realized using the following query (simple but inefficient).

SELECT title, artist, COUNT(*) as count FROM songs
WHERE radio_station_id = 1
AND title like "%test%"
GROUP BY title, artist, radio_station_id
ORDER BY count DESC;
title artist count
Whitest Boy On The Beach Fat White Family 95
The Greatest View Flume / Isabelle Manfredi 77
Test Little Dragon 60

Benchmarks:

Search text MySQL speed Go API speed
test 3094ms 3105ms
a 3109ms 3055ms
bc 2390ms 2491ms
yellow 1391ms 1682ms
monday 1329ms 1557ms

Optimization 01 - Cached table

The first optimization was a cache table that stores all grouped songs per radio station. Therefore the group by can be left away. To insert the grouped entries the following statement has been used:

-- CREATE TABLE
CREATE TABLE songs_per_radio_station (
    id INT AUTO_INCREMENT,
    title VARCHAR(255),
    artist VARCHAR(255),
    radio_station_id INT NOT NULL,
    count INT NOT NULL,
    CONSTRAINT pk_songs_per_radio_station PRIMARY KEY (id),
    CONSTRAINT fk_songs_per_radio_station_radio_station FOREIGN KEY (radio_station_id)
        REFERENCES radio_stations (id)
);

-- INSERT DATA
INSERT INTO songs_per_radio_station
SELECT null as id, title, artist, radio_station_id, count(*) as count
FROM songs
GROUP BY title, artist, radio_station_id
ORDER BY radio_station_id asc, count desc;

Afterwards the following statement can be used to search for a title:

SELECT * FROM songs_per_radio_station
WHERE radio_station_id = 1
AND title like "%test%"
ORDER BY count DESC;
Search text old MySQL speed new MySQL speed old Go API speed new Go API speed
test 3094ms 110ms 3105ms 721ms
a 3109ms 141ms 3055ms 869ms
bc 2390ms 94ms 2491ms 212ms
yellow 1391ms 78ms 1682ms 181ms
monday 1329ms 62ms 1557ms 188ms

Optimization 02 - Fulltext Index

After creating a cache table a fulltext index has been used on the cached table. This allows the user to search for songs and artists. The best matching case will be shown first and afterwards the entries are sorted by count.

-- CREATE TABLE
CREATE TABLE songs_per_radio_station (
    id INT AUTO_INCREMENT,
    title VARCHAR(255),
    artist VARCHAR(255),
    radio_station_id INT NOT NULL,
    count INT NOT NULL,
    FULLTEXT (title, artist),
    CONSTRAINT pk_songs_per_radio_station PRIMARY KEY (id),
    CONSTRAINT fk_songs_per_radio_station_radio_station FOREIGN KEY (radio_station_id)
        REFERENCES radio_stations (id)
);

-- INSERT DATA
INSERT INTO songs_per_radio_station
SELECT null as id, title, artist, radio_station_id, count(*) as count
FROM songs
GROUP BY title, artist, radio_station_id
ORDER BY radio_station_id asc, count desc;

Songs and artists can be searched using the following query:

SELECT id, title, artist, count
FROM songs_per_radio_station
WHERE radio_station_id = 1
AND MATCH (title, artist) AGAINST ("test")
LIMIT 30
OFFSET 0;
Search text old MySQL speed new MySQL speed old Go API speed new Go API speed
test 110ms 31ms 721ms 164ms
a 141ms 31ms 869ms 31ms
bc 94ms 16ms 212ms 34ms
yellow 78ms 31ms 181ms 32ms
monday 62ms 32ms 188ms 34ms

These outcomes lead to the final result which is a performance improvement with a maximum request time reduction of 1/100 of the original request duration.


Written by default