FULLTEXT in MySQL

When querying your database of cats (as one does) using MySQL, sometimes a simple SELECT * FROM cats WHERE bio LIKE '%boxes%'; is either not strong enough (this query would obviously return many cats), or just inefficient and expensive, as it results in a full table scan. Instead, a FULLTEXT search is more nuanced, allowing you to run more explicit and multi-worded queries on your database.

There is one thing to first think about in regards to FULLTEXT. First, depending on the version of MySQL you are using, FULLTEXT may only work on a MyISAM type table (as of 5.6, InnoDB also supports FULLTEXT). I could write a whole post about the differences between MyISAM and InnoDB, but for now, this Stack Overflow post sums it up nicely instead.

Once you know your engine type and its FULLTEXT search capabilities, the first step is to make sure that the columns you want to query are indexed for FULLTEXT. If your table is already created, the ALTER command is necessary to change it.

In my table Cats, there are the following columns: cat_id, name, age, gender, breed, location, and bio. Let’s say I want to be able to run a FULLTEXT search on cats’ biographies:

ALTER TABLE cats ADD FULLTEXT (bio);

Running a FULLTEXT search is simple with the use of a WHERE clause, with a MATCH...AGAINST added in:

SELECT name, bio FROM cats WHERE MATCH (bio) AGAINST ('boxes Japan');

(In my example, this would return Maru.)

FULLTEXT has some important rules. Keywords that are less than four characters will be ignored; popular words (stopwords, like ‘and’, ‘the’, etc.) are also ignored; and, if more than half of the record match the keywords, the query won’t return anything at all. This last one is important and should be remembered, as you can become puzzled when your search returns nothing. This is because FULLTEXT wants to return relevant, specific results.

There are also some useful parameters that can help querying even further. Add IN BOOLEAN MODE to the AGAINST clause, and you can further specify if a word must be included (‘+boxes’) or excluded (‘-California’), as well as a word and its variations (the wildcard: ‘box*’ can return ‘box’ and ‘boxes’). It should be noted if you’re using boolean mode, the results will not be ordered or sorted by relevance. An ORDER BY clause after the AGAINST clause with the parameters you need will do this. Borrowing from my previous example, let’s say I’m looking for cats who like boxes, but I’m most interested in those who also live in Japan:

1
2
3
4
5
6
SELECT name, bio, 
  MATCH (bio) AGAINST ('boxes >Japan' IN BOOLEAN MODE) 
  AS japanese_box_relevance 
FROM cats 
WHERE MATCH (bio) AGAINST ('boxes >Japan' IN BOOLEAN MODE) 
ORDER BY japanese_box_relevance DESC;

FULLTEXT is a great feature of MyISAM/newer InnoDB tables, but there are some drawbacks. If you’re strictly InnoDB and using an older version of MySQL, or you have heavy traffic, for example, you may need other options, like a strictly fulltext search server like Sphinx.

Comments