Full-Text Searching
Often, a database query contains a search where you're not exactly sure what you're looking for but you know it might be like this or that. You may have already seen this while using LIKE in a WHERE conditional. An alternative is the full-text search, which has two benefits over LIKE:
There are four gotchas when it comes to full-text searches:
Full-text searches are case-insensitive. If you search using a term (or terms) that appears in more than half of the records, no rows will be returned, because that search will not be specific enough. One workaround is to use Boolean mode, discussed later in this chapter. There is a minimum word length for search terms, by default four characters. Shorter terms will never be found, even when you know they are present. Very popular words, called stopwords, are ignored.
Full-text searching is most important in situations where people will be entering keywords that must be searched against specific fields in a table, such as search engines. I'll first show you how to create the proper FULLTEXT index. Then I'll run through some simple full-text searches. Finally, you'll learn about the full-text Boolean mode, which allows for much more precise searching.
Tips
MySQL comes with several hundred stopwords already defined. These are part of the application's source code. You can also specify your own custom stopword file to use as of MySQL 4.0.10. The minimum keyword lengthfour characters by defaultis a configuration setting you can change in MySQL.
Creating a FULLTEXT index
Before performing a full-text search, you need to prepare the database by creating an index of type FULLTEXT on the column or columns to be used in the searches. Currently, you can only create FULLTEXT searches on MyISAM tables.
To create a FULLTEXT index when defining a table, you would do something like this:
CREATE TABLE tablename (
col1 TEXT,
col2 TEXT,
FULLTEXT (col1, col2)
)
If the table already exists, you can add a FULLTEXT index using an ALTER query:
ALTER TABLE tablename ADD FULLTEXT (columns)
I'll modify the expenses table (in the accounting database) to add a FULLTEXT index on the expense_description column.
To create a FULLTEXT index:
1. | Log in to the mysql client as a user that can access and modify the accounting database.
| 2. | Select the accounting database.
USE accounting;
| 3. | Make sure the expenses table is well populated ( Figure 10.9).
SELECT expense_description FROM
expenses;
It doesn't really matter what SQL you use during these steps as long as you add records with good descriptions to the expenses table. The more records a table has, the more useful and accurate full-text indexing and searching becomes.
| | | 4. | Create a full-text index ( Figure 10.10).

ALTER TABLE expenses
ADD FULLTEXT (expense_description);
Before I run any full-text searches on a table, I must create a full-text index on the column or columns involved.
| 5. | Confirm the table's structure by viewing its CREATE syntax ( Figure 10.11).
SHOW CREATE TABLE expenses \G
You've already seen a couple of SHOW commands by now, so hopefully this one isn't too surprising. The SHOW CREATE TABLE tablename command asks MySQL to print out the SQL command used to create (or re-create) the named table. Using the \G trick makes the results easier to view.
|
Tip
Inserting records into tables with FULLTEXT indexes can be much slower because of the complex index that's required. Similarly, adding a FULLTEXT index to an existing table can take some time, depending upon how much data is already there.
Performing basic FULLTEXT searches
Once you've established a FULLTEXT index on a column, you can start querying against it, using the MATCH and AGAINST keywords. The syntax for using full-text searching is
SELECT * FROM tablename WHERE MATCH (column) AGAINST ('string')
The result of the query will be the rows returned in order of relevance from most to least. In other words, the rows in which string most matches the values in column will be listed first.
If you want to match against multiple words, you can do so, separating the words by spaces:
SELECT * FROM tablename WHERE MATCH
(column) AGAINST ('word1 word2')
With a query like this, rows that contain both words will rank higher than those that contain only one or the other (which would still qualify as a match). This behavior can be tweaked using the Boolean mode, discussed next in the chapter.
To view the relevance of a row returned by a match, you would select it:
SELECT somecolumn, MATCH (column) AGAINST ('string') FROM tablename
Your FULLTEXT index must be created on the same column or combination of columns that you use in your SELECT queries. If you create the index on one column, you can use only that one column in your query. If you create the index on two columns, you must use exactly both those columns in your queries:
ALTER TABLE tablename ADD FULLTEXT (col1, col2)
SELECT * FROM tablename WHERE MATCH (col1, col2) AGAINST ('string')
|
To use full-text searching:
1. | Log in to the mysql client and select the accounting database, if you have not already.
USE accounting;
| 2. | Make sure the expenses table is well populated.
If you are paying close attention, you'll note that this step was included in the preceding sequence. It's important, though. The more populated a table is, the more useful a full-text search is. In fact, a (non-Boolean mode) full-text search won't even work if you have fewer than three records in the table!
| 3. | Do a full-text search using the word visual ( Figure 10.12).
SELECT expense_id, expense_description
FROM expenses WHERE
MATCH (expense_description) AGAINST ('visual');
This query will return the expense_id and expense_description columns wherever visual has a positive relevance in the expense_description values. In other words, any record that contains the word visual will be returned.
If you have different data in your table, you'll want to change the searched-for word accordingly.
| | | 4. | Run a full-text search using the words visual and guide ( Figure 10.13).
SELECT expense_id, expense_description,
MATCH (expense_description) AGAINST ('visual guide') AS rel
FROM expenses WHERE
MATCH (expense_description) AGAINST ('visual guide') \G
This query differs from that in Step 3 in two ways. First, I'm also selecting the MATCH...AGAINST value so that the calculated relevance number is displayed. Second, I've included a second term to match. Those records with both terms will score higher than those with just one.
|
Tips
You can weed out some results by using the MATCH...AGAINST() value in a WHERE clause: SELECT expense_id, expense_description,
MATCH (expense_description) AGAINST ('visual guide')
FROM expenses WHERE MATCH
(expense_description) AGAINST
('visual guide') > .001 Remember that if a FULLTEXT search returns no records, this means either that no matches were made or that over half of the records match (when not using Boolean mode). For sake of simplicity, I wrote all of the queries in this section as simple SELECT statements. You can certainly use FULLTEXT searches within joins or more complex queries.
Performing Boolean FULLTEXT searches
You can take your full-text searching one step further by using its Boolean mode (as of MySQL version 4.0.1). In Boolean mode the search terms can be preceded by special characters (Table 10.1) to indicate how their presence should be weighted with regard to relevancy.
Table 10.1. These characters are specifically used to affect the importance of terms in full-text searches.Special Boolean Mode Characters |
---|
Character | Meaning | Example | Matches |
---|
+ | Word is required | +punk rock | punk is required and rock is optional | - | Word must not be present | +punk -rock | punk is required and rock cannot be present | "" | A literal phrase | "punk rock" | Occurrences of the phrase punk rock are weighted | < | Less important | <punk +rock | rock is required, and punk is less significant | > | More important | >punk +rock | rock is required, but punk is more significant | () | Creates groups | (>punk roll) +rock | rock is required, both punk and roll are optional, but punk is weighted more | ~ | Detracts from relevance | +punk ~rock | punk is required, and the presence of rock devalues the relevance (but rock is not excluded) | * | Allows for wildcards | +punk +rock* | punk and rock are required, but rocks, rocker, rocking, etc., are counted |
SELECT columns FROM tablename WHERE MATCH (column) AGAINST ('+word1 word2' IN BOOLEAN MODE)
SELECT * FROM tablename WHERE MATCH (column) AGAINST ('+database -mysql' IN BOOLEAN MODE)
In the second example, a match will be made if the word database is found and mysql is not present. Alternatively, the tilde (~) is used as a milder form of the minus sign, meaning that the keyword can be present in a match, but such matches should be considered less relevant.
The wildcard character (*) matches variations on a word, so cata* matches catalog, catalina, and so on. Two operators explicitly state what keywords are more (>) or less (<) important. Finally, you can use double quotation marks to hunt for exact phrases. Parentheses can be used to make subexpressions.
The following query would look for records with the phrase Web develop with the word html being required and the word JavaScript detracting from a match's relevance:
SELECT * FROM tablename WHERE
MATCH(column) AGAINST('>"Web develop" +html ~JavaScript' IN BOOLEAN MODE)
When using Boolean mode, there are several differences as to how FULLTEXT searches work:
If a keyword is not preceded by an operator, the word is optional but a match will be ranked higher if it is present. Results will be returned even if more than 50 percent of the records match the search. The results are not automatically sorted by relevance.
Because of this last fact, you'll also want to sort the returned records by their relevance, as I'll demonstrate in the next sequence of steps. One important rule that remains with Boolean searches is that the minimum word length (four characters by default) still applies. So trying to require a shorter word using a plus sign (+SQL) still won't work.
One advanced topic that I don't cover in this book is the various control flow functions MySQL has to offer.
The basic function is an IF conditional, which acts like a ternary operator:
IF (condition, return_this_if_true, return_this_if_false)
If the condition is true, the second argument is returned. Otherwise, the third argument is returned. This next example returns either odd or even, depending upon whether or not the remainder of dividing a column's value (e.g., 3) by 2 is equal to 1:
SELECT IF( (some_num_col % 2) = 1, 'odd', 'even');
Along with the IF conditional, there are an IFNULL, a NULLIF, and a CASE. Each is really just a specific type of conditional. See the MySQL manual for syntax and examples of these.
|
To use Boolean mode:
1. | Log in to the mysql client and select the accounting database, if you have not already.
USE accounting;
| 2. | Do a full-text search requiring both the words visual and guide ( Figure 10.14).
SELECT expense_id, expense_description,
MATCH (expense_description) AGAINST ('+visual +guide' IN BOOLEAN MODE) AS rel
FROM expenses WHERE
MATCH (expense_description) AGAINST ('+visual +guide' IN BOOLEAN MODE) ORDER BY rel DESC \G
This query adds two new features. First, the IN BOOLEAN MODE text is added, as well as the plus signs, indicating required words. Second, the results are ordered by their relevance, using an alias.
| | | 3. | Do a full-text search requiring both the words visual and guide while stressing the word quickpro ( Figure 10.15).
SELECT expense_id, expense_description,
MATCH (expense_description) AGAINST ('+visual +guide >quickpro' IN BOOLEAN MODE) AS rel
FROM expenses WHERE
MATCH (expense_description) AGAINST ('+visual +guide >quickpro' IN BOOLEAN MODE) ORDER BY
rel DESC \G
I add > quickpro to my search terms to give preference to records containing this word.
|
Tip
|