What Is MySQL Full-Text Search?

A:- Introduction to MySQL Full-Text Search
Summary: in this tutorial, we will introduce you to MySQL full-text search and its features.

MySQL supports text searching by using the LIKE operator and regular expressions. However, when the text column is large and the number of rows in the table grows, these methods have some limitations:

  • Performance: MySQL has to scan the whole table to find text that matches the pattern in the LIKE operator or in the regular expression.
  • Flexible search: with the LIKE operator and regular expression searches, it is difficult to write a flexible search query, e.g., to find products whose description contains car but not classic.
  • Relevance ranking: there is no way to tell which rows in the result set are more relevant to the search terms.
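
To make these limitations concrete, here is a minimal sketch of how the "car but not classic" search might be approximated with LIKE. It assumes the products table and productDescription column of the sample database used later in this tutorial.

```sql
-- Sketch only: approximating "contains car but not classic" with LIKE.
-- The leading % wildcard prevents an ordinary B-tree index from being used,
-- so MySQL has to examine every row.
SELECT productName
FROM products
WHERE productDescription LIKE '%car%'
  AND productDescription NOT LIKE '%classic%';
```

Note that this matches substrings rather than words (for example, '%car%' also matches "cardboard"), and the rows come back with no indication of which one is the best match.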

Because of these limitations, MySQL provides a feature called full-text search. Technically, MySQL creates an index from the words in the columns that have full-text search enabled and performs searches on this index. MySQL uses a sophisticated algorithm to determine which rows match the search query.

The following are some important features of MySQL full-text search:

  • Native SQL-like interface: you use SQL-like statements to perform the full-text search.
  • Fully dynamic index: MySQL automatically updates the index of a text column whenever the data of that column changes.
  • Moderate index size: the index does not take much space to store.
  • Last but not least, searches are fast, even for complex search queries.

Notice that not all storage engines support the full-text search feature. In MySQL version 5.6 or later, only MyISAM and InnoDB storage engines support full-text search.
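
As a quick check before adding a full-text index, the following sketch shows one way to see which storage engines the server offers and which engine each table in the current database uses. It relies only on the standard SHOW ENGINES statement and the information_schema.TABLES view.

```sql
-- List the storage engines available on this server.
SHOW ENGINES;

-- Show which engine each table in the current database uses.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE();
```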

B:- Defining FULLTEXT Indexes for MySQL Full-Text Searching
Summary: in this tutorial, you will learn how to define the full-text index for performing various full-text searches in MySQL.

Before performing a full-text search on a column of a table, you must index its data. MySQL updates the full-text index whenever the data of the column changes. In MySQL, a full-text index is an index of type FULLTEXT.

MySQL indexes and re-indexes data automatically for a column that has full-text search enabled. MySQL version 5.6 or later allows you to define a full-text index on a column whose data type is CHAR, VARCHAR, or TEXT in a MyISAM or InnoDB table. Notice that InnoDB tables have supported full-text indexes since version 5.6; earlier versions supported them only in MyISAM tables.

MySQL allows you to define the FULLTEXT index by using the CREATE TABLE statement when you create the table or ALTER TABLE or CREATE INDEX statement for the existing tables.

Defining FULLTEXT index using CREATE TABLE statement
Typically, you define the FULLTEXT index for a column when you create a new table using the CREATE TABLE statement as follows:

CREATE TABLE table_name(
    column1 data_type,
    column2 data_type,
    column3 data_type,
    …
    PRIMARY KEY (key_column),
    FULLTEXT (column1, column2, ..)
);

To create the FULLTEXT index, you place a list of comma-separated columns in parentheses after the FULLTEXT keyword.
The following statement creates a new table named posts that has a FULLTEXT index that includes the post_content column.

CREATE TABLE posts (
  id int(4) NOT NULL AUTO_INCREMENT,
  title varchar(255) NOT NULL,
  post_content text,
  PRIMARY KEY (id),
  FULLTEXT KEY post_content (post_content)
);

Defining FULLTEXT index for existing tables
In case you already have existing tables and want to define full-text indexes, you can use the ALTER TABLE statement or CREATE INDEX statement.

Defining FULLTEXT index using ALTER TABLE statement
The following syntax defines a FULLTEXT index using the ALTER TABLE statement:

ALTER TABLE table_name
ADD FULLTEXT(column_name1, column_name2, …);

You put the table name after the ALTER TABLE keywords, followed by the ADD FULLTEXT clause that defines a FULLTEXT index for one or more columns.
For example, you can define a FULLTEXT index for the productDescription and productLine columns in the products table of the sample database as follows:

ALTER TABLE products
ADD FULLTEXT(productDescription, productLine);

Defining FULLTEXT index using CREATE INDEX statement
You can also use the CREATE INDEX statement to create a FULLTEXT index for existing tables. See the following syntax:

CREATE FULLTEXT INDEX index_name
ON table_name(idx_column_name,...);

The following statement creates a FULLTEXT index for the addressLine1 and addressLine2 columns of the offices table.

CREATE FULLTEXT INDEX address
ON offices(addressLine1, addressLine2);

Notice that for a table which has many rows, it is faster to load the data into the table that has no FULLTEXT index first and then create the FULLTEXT index, than loading a large amount of data into a table that has an existing FULLTEXT index.
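
The load-first, index-later order described above can be sketched as follows. The articles table, its columns, and the ft_body index name are hypothetical and used only for illustration; in practice the load step would be a bulk import rather than a two-row INSERT.

```sql
-- Hypothetical table, created without a FULLTEXT index.
CREATE TABLE articles (
    id INT AUTO_INCREMENT PRIMARY KEY,
    body TEXT
);

-- 1. Load the data while the table has no FULLTEXT index.
INSERT INTO articles (body)
VALUES ('first document'),
       ('second document');

-- 2. Build the FULLTEXT index once, after the data is loaded.
ALTER TABLE articles
ADD FULLTEXT INDEX ft_body (body);
```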

Removing full-text search columns
To remove a FULLTEXT index, you just delete the index using the ALTER TABLE … DROP INDEX statement. For example, the following statement removes the address FULLTEXT index in the offices table:
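
Assuming the index was created under the name address, as in the CREATE FULLTEXT INDEX example earlier, the statement would look like this:

```sql
-- Remove the FULLTEXT index named address from the offices table.
ALTER TABLE offices
DROP INDEX address;
```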

C:- MySQL Natural Language Full-Text Searches
Summary: in this tutorial, you will learn about MySQL natural language full-text search by using the MATCH() and AGAINST() functions.

Introduction to MySQL natural language full-text searches
In natural language full-text searches, MySQL looks for rows or documents that are relevant to the free-text natural human language query, for example, “How to use MySQL natural language full-text searches”.
Relevance is a non-negative floating-point number. A relevance of zero means that there is no similarity. MySQL computes the relevance based on various factors, including the number of words in the document, the number of unique words in the document, the total number of words in the collection, and the number of documents (rows) that contain a particular word.
To perform natural language full-text searches, you use MATCH() and AGAINST() functions. The MATCH() function specifies the column where you want to search and the AGAINST() function determines the search expression to be used.

MySQL natural language full-text search example
We will use the products table in the sample database for the demonstration.
products_table.png

First, you need to enable full-text search in the productLine column of the products table using the ALTER TABLE ADD FULLTEXT statement:

ALTER TABLE products
ADD FULLTEXT(productline);

Second, you can search for products whose product lines contain the term Classic. You use the MATCH() and AGAINST() functions as in the following query:

SELECT productName, productline
FROM products
WHERE MATCH(productline) AGAINST('Classic');

products-with-classic-product-line.png
To search for products whose product line contains the term Classic or Vintage, you can perform the following query:

SELECT productName, productline
FROM products
WHERE MATCH(productline) AGAINST('Classic,Vintage');

products-with-classic-and-vintage-product-line.png

The AGAINST() function uses the IN NATURAL LANGUAGE MODE search modifier by default, so you can omit it in the query. There are other search modifiers, e.g., IN BOOLEAN MODE for Boolean text searches.
You can explicitly use the IN NATURAL LANGUAGE MODE search modifier in your query as follows:

SELECT productName, productline
FROM products
WHERE MATCH(productline)
AGAINST('Classic,Vintage' IN NATURAL LANGUAGE MODE);

By default, MySQL performs searches in a case-insensitive fashion. However, you can instruct MySQL to perform case-sensitive searches by using a binary collation for the indexed columns.

Sort the result set by relevance
A very important feature of full-text search is how MySQL ranks the rows in the result set based on their relevance. When the MATCH() function is used in the WHERE clause, MySQL returns the rows that are more relevant first.
The following example shows you how MySQL sorts the result set by the relevance.
First, you enable the full-text search feature for the productName column of the products table.

ALTER TABLE products
ADD FULLTEXT(productName);

Second, you search for products whose names contain Ford and/or 1932 using the following query:

SELECT productName, productline
FROM products
WHERE MATCH(productName) AGAINST('1932,Ford');

mysql-full-text-search-sort-by-relevancy.png

The products whose names contain both 1932 and Ford are returned first, followed by the products whose names contain only the keyword Ford.
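
To see the relevance values behind this ordering, one option is to select the MATCH() ... AGAINST() expression as a column and sort on it explicitly. The following is a sketch that assumes the FULLTEXT index on productName created above; the score alias is an arbitrary name.

```sql
-- Expose the relevance score and sort on it explicitly.
SELECT productName,
       MATCH(productName) AGAINST('1932,Ford') AS score
FROM products
WHERE MATCH(productName) AGAINST('1932,Ford')
ORDER BY score DESC;
```

Sorting explicitly is also the safer choice when the query contains other clauses, since the default relevance ordering should not be relied on once the query becomes more complex.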

There are some important points you should remember when using the full-text search:

  • Search terms shorter than the minimum word length are ignored. The default minimum is 4 characters for MyISAM tables (the ft_min_word_len variable) and 3 characters for InnoDB tables (the innodb_ft_min_token_size variable). With the MyISAM default, for example, searching for a three-letter keyword such as car or cat returns no results.
  • Stop words are ignored. For MyISAM, MySQL defines the default list of stop words in storage/myisam/ft_static.c in the MySQL source code distribution; InnoDB uses its own, shorter default list.
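
If a search unexpectedly returns nothing, it can help to check these settings on your server. The statements below are a sketch using standard system variables and the INFORMATION_SCHEMA table that holds the default InnoDB stop word list.

```sql
-- Minimum indexed word length for MyISAM full-text indexes.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Minimum indexed word length for InnoDB full-text indexes.
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';

-- Default stop word list used by InnoDB full-text indexes.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
```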

In this tutorial, we have shown you how to use the MATCH() and AGAINST() functions to perform natural language searches in MySQL.

D:- MySQL Boolean Full-Text Searches
Summary: in this tutorial, you will learn how to perform MySQL Boolean full-text searches. In addition, you will learn how to use Boolean operators to form very complex search queries.

Introduction to MySQL Boolean full-text searches

Besides the natural language full-text search, MySQL supports an additional form of full-text search called Boolean full-text search. In Boolean mode, MySQL searches for the words themselves rather than for concepts, as it does in the natural language search.

MySQL allows you to perform a full-text search based on very complex queries in the Boolean mode along with Boolean operators. This is why the full-text search in Boolean mode is suitable for experienced users.

To perform a full-text search in Boolean mode, you use the IN BOOLEAN MODE modifier in the AGAINST expression. The following example shows you how to search for products whose product names contain the word Truck.

SELECT productName, productline
FROM products
WHERE MATCH(productName)
AGAINST('Truck' IN BOOLEAN MODE);

mysql-boolean-tex-searches-product-name-with-keyword-truck.png

Two products whose product names contain the Truck word are returned.

To find the products whose product names contain the word Truck but exclude any rows that contain Pickup, you can use the exclude Boolean operator ( - ), which removes rows containing the Pickup keyword from the result, as in the following query:
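
A query of this form, sketched here on the assumption that productName still has the FULLTEXT index created earlier, combines the plain term Truck with the excluded term -Pickup:

```sql
-- Rows must match Truck and must not contain Pickup.
SELECT productName, productline
FROM products
WHERE MATCH(productName)
AGAINST('Truck -Pickup' IN BOOLEAN MODE);
```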

mysql-boolean-tex-searches-with-Boolean-operator.png

MySQL Boolean full-text search operators

The following table illustrates the full-text search Boolean operators and their meanings:

Operator   Description
+          Include, the word must be present.
-          Exclude, the word must not be present.
>          Include, and increase the ranking value.
<          Include, and decrease the ranking value.
()         Group words into subexpressions (allowing them to be included, excluded, ranked, and so forth as a group).
~          Negate a word's ranking value.
*          Wildcard at the end of the word.
""         Defines a phrase (as opposed to a list of individual words, the entire phrase is matched for inclusion or exclusion).

The following examples illustrate how to use Boolean full-text operators in a search query:

  • To search for rows that contain at least one of the two words mysql or tutorial:
    'mysql tutorial'
  • To search for rows that contain both words, mysql and tutorial:
    '+mysql +tutorial'
  • To search for rows that contain the word "mysql", and rank rows that also contain "tutorial" higher:
    '+mysql tutorial'
  • To search for rows that contain the word "mysql" but not "tutorial":
    '+mysql -tutorial'
  • To search for rows that contain the word "mysql", and rank a row lower if it contains the word "tutorial":
    '+mysql ~tutorial'
  • To search for rows that contain the words "mysql" and "tutorial", or "mysql" and "training", in any order, but rank rows that contain "mysql tutorial" higher than "mysql training":
    '+mysql +(>tutorial <training)'

MySQL Boolean full-text search main features

  • MySQL does not automatically sort rows in order of decreasing relevance in Boolean full-text search.
  • To perform Boolean queries, InnoDB tables require that all columns of the MATCH expression have a FULLTEXT index. Notice that MyISAM tables do not require this, although the search is quite slow without one.
  • MySQL does not support multiple Boolean operators on a search word on InnoDB tables, e.g., '++mysql'. MySQL will return an error if you do so. However, MyISAM behaves differently. It ignores the other operators and uses the operator that is closest to the search word; for example, '+-mysql' becomes '-mysql'.
  • InnoDB full-text search does not support a trailing plus (+) or minus (-) sign. It only supports a leading plus or minus sign. MySQL will report an error if your search word is 'mysql+' or 'mysql-'. In addition, the following leading plus or minus combinations with a wildcard are invalid: +*, +-
  • The 50% threshold is not applied. The 50% threshold means that if a word appears in more than 50% of the rows, MySQL ignores it in the search result.
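
Because Boolean mode does not sort by relevance automatically, you can ask for that ordering yourself. The sketch below assumes the FULLTEXT index on productName from the earlier examples; the search terms and the score alias are chosen only for illustration.

```sql
-- Ford is required; Pickup is optional but raises the score.
SELECT productName,
       MATCH(productName) AGAINST('+Ford Pickup' IN BOOLEAN MODE) AS score
FROM products
WHERE MATCH(productName) AGAINST('+Ford Pickup' IN BOOLEAN MODE)
ORDER BY score DESC;
```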

In this tutorial, we have shown you how to perform MySQL Boolean full-text searches with many useful Boolean operators.

E:- Using MySQL Query Expansion
Summary: in this tutorial, you will learn about MySQL query expansion to widen the search results based on automatic relevance feedback.

Introduction to MySQL Query Expansion
In some cases, users want to search for information based on the knowledge that they have. Users use their experience to define keywords to search for information, and typically those keywords are too short.

To help users find information when their keywords are too short, the MySQL full-text search engine introduces a concept called query expansion.

Query expansion is used to widen the search result of full-text searches based on automatic relevance feedback (also called blind query expansion). Technically, the MySQL full-text search engine performs the following steps when query expansion is used:

  1. First, the MySQL full-text search engine looks for all rows that match the search query.
  2. Second, it checks all rows in the search result and finds the relevant words.
  3. Third, it performs a search again based on the relevant words instead of the original keywords provided by the users.

From the application perspective, you can use query expansion when the search results are too few. You perform the search again, but with query expansion, to offer users more information that is related and relevant to what they are looking for.

To use query expansion, you use the WITH QUERY EXPANSION search modifier in the AGAINST() function. The following illustrates the syntax of a query that uses the WITH QUERY EXPANSION search modifier.

SELECT column1, column2
FROM table1
WHERE MATCH(column1,column2)
      AGAINST('keyword' WITH QUERY EXPANSION);

MySQL Query Expansion Example

Let’s look at an example of query expansion to see how it works.
We will use the productName column of the products table to demonstrate the query expansion feature.
First, we enable the full-text search index for this column.

ALTER TABLE products
ADD FULLTEXT(productName);

Second, we search for a product whose product name contains the 1992 term without using query expansion.

SELECT productName
FROM products
WHERE MATCH(productName) AGAINST('1992');

+-----------------------------------+
| productName                       |
+-----------------------------------+
| 1992 Ferrari 360 Spider red       |
| 1992 Porsche Cayenne Turbo Silver |
+-----------------------------------+
2 rows in set (0.00 sec)

As you can see, the search result has two products whose product names contain the term 1992.
Third, we can widen the search result by using query expansion, as in the following statement:

SELECT productName
FROM products
WHERE MATCH(productName)
      AGAINST('1992' WITH QUERY EXPANSION);

+-------------------------------------+
| productName                         |
+-------------------------------------+
| 1992 Porsche Cayenne Turbo Silver   |
| 1992 Ferrari 360 Spider red         |
| 2001 Ferrari Enzo                   |
| 1932 Alfa Romeo 8C2300 Spider Sport |
| 1948 Porsche 356-A Roadster         |
| 1948 Porsche Type 356 Roadster      |
| 1956 Porsche 356A Coupe             |
+-------------------------------------+
7 rows in set (0.00 sec)

We got more rows in the search result when we used query expansion. The first two rows are the most relevant, and the other rows come from relevant keywords found in the first two rows, e.g., Ferrari.

Notice that blind query expansion tends to increase noise significantly by returning non-relevant results. It is highly recommended that you use query expansion only when the searched keyword is short.

F:- MySQL ngram Full-Text Parser
Summary: this tutorial shows you how to use MySQL ngram full-text parser to support full-text searches for ideographic languages such as Chinese, Japanese, Korean, etc.

Introduction to MySQL ngram full-text parser

The built-in MySQL full-text parser determines the beginning and end of words using white space. When it comes to ideographic languages such as Chinese, Japanese, or Korean, this is a limitation because these languages do not use word delimiters.

To address this issue, MySQL provides the ngram full-text parser. Since version 5.7.6, MySQL has included the ngram full-text parser as a built-in server plugin, meaning that MySQL loads this plugin automatically when the MySQL database server starts. MySQL supports the ngram full-text parser for both InnoDB and MyISAM storage engines.

By definition, an ngram is a contiguous sequence of n characters from a sequence of text. The main function of the ngram full-text parser is to tokenize a sequence of text into contiguous sequences of n characters.

The following illustrates how the ngram full-text parser tokenizes a sequence of text for different values of n:

n = 1: 'm','y','s','q','l'
n = 2: 'my', 'ys', 'sq','ql'
n = 3: 'mys', 'ysq', 'sql'
n = 4: 'mysq', 'ysql'
n = 5: 'mysql'
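
The sliding-window idea behind this tokenization can be imitated in plain SQL. The following sketch is not how the parser is implemented; it only reproduces the token lists above using a recursive common table expression, which requires MySQL 8.0 or later. The @txt and @n user variables are illustrative names, and @n is assumed to be no larger than the length of the text.

```sql
-- Text to tokenize and the token size n.
SET @txt = 'mysql', @n = 2;

-- Generate every start position, then cut n characters at each one.
WITH RECURSIVE pos (i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1
    FROM pos
    WHERE i + 1 <= CHAR_LENGTH(@txt) - @n + 1
)
SELECT i AS position,
       SUBSTRING(@txt, i, @n) AS token
FROM pos;
```

With @n = 2 this returns the four tokens my, ys, sq, and ql, matching the n = 2 line above.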

Creating FULLTEXT indexes with ngram parser

To create a FULLTEXT index that uses ngram full-text parser, you add the WITH PARSER ngram in the CREATE TABLE, ALTER TABLE, or CREATE INDEX statement.

For example, the following statement creates a new posts table and adds the title and body columns to a FULLTEXT index that uses the ngram full-text parser.

CREATE TABLE posts (
    id INT PRIMARY KEY AUTO_INCREMENT,
    title VARCHAR(255),
    body TEXT,
    FULLTEXT ( title , body ) WITH PARSER NGRAM
)  ENGINE=INNODB CHARACTER SET UTF8MB4;
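
For a table that already exists without a full-text index, the same clause can be attached to ALTER TABLE or CREATE INDEX. The sketch below assumes posts had been created without the FULLTEXT line; the index name ft_title_body is hypothetical. The two statements are alternatives, so you would run only one of them.

```sql
-- Option 1: add the ngram FULLTEXT index with ALTER TABLE.
ALTER TABLE posts
ADD FULLTEXT INDEX ft_title_body (title, body) WITH PARSER ngram;

-- Option 2: create the same index with CREATE INDEX instead.
-- CREATE FULLTEXT INDEX ft_title_body
-- ON posts (title, body) WITH PARSER ngram;
```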

The following INSERT statement inserts a new row into the posts table:

SET NAMES utf8mb4;

INSERT INTO posts(title,body)
VALUES('MySQL全文搜索','MySQL提供了具有许多好的功能的内置全文搜索'),
      ('MySQL教程','学习MySQL快速,简单和有趣');

Note that the SET NAMES statement sets the character set that both client and server will use to send and receive data; in this case, it is utf8mb4.
To see how the ngram tokenizes the text, you use the following statement:

SET GLOBAL innodb_ft_aux_table="test/posts";
 
SELECT
    *
FROM
    information_schema.innodb_ft_index_cache
ORDER BY doc_id , position;

MySQL-ngram-full-text-parser-example.png

This query is useful for troubleshooting. For example, if a word does not appear in the search results, the word may not have been indexed because it is a stopword, or there could be another reason.

Setting ngram token size

As you can see in the previous example, the default token size (n) of the ngram parser is 2. To change the token size, you use the ngram_token_size configuration option, which takes a value between 1 and 10.

Note that a smaller token size produces a smaller full-text search index and allows you to search faster.

Because ngram_token_size is a read-only variable, you can set its value in only two ways:

First, in the start-up string:

mysqld --ngram_token_size=1

Second, in the configuration file:

[mysqld]
ngram_token_size=1
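
After restarting the server with either option, you can confirm the value that is actually in effect. This is a small sketch using the standard SHOW VARIABLES statement.

```sql
-- Display the token size the ngram parser is currently using.
SHOW VARIABLES LIKE 'ngram_token_size';
```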

ngram parser phrase search

MySQL converts a phrase search into ngram phrase searches. For example, “abc” is converted into “ab bc”, which returns documents that contain “ab bc” and “abc”.

The following example shows you how to search for the phrase 搜索 in the posts table:

SELECT
    id, title, body
FROM
    posts
WHERE
    MATCH (title , body) AGAINST ('搜索' );

MySQL-ngram-full-text-parser-phrase-search.png

Processing search result with ngram

Natural language mode
In NATURAL LANGUAGE MODE searches, the search term is converted to a union of ngram values. Suppose the token size is 2 (a bigram); the search term mysql is then converted to my, ys, sq, and ql.

SELECT
    *
FROM
    posts
WHERE
    MATCH (title , body) AGAINST ('简单和有趣' IN NATURAL LANGUAGE MODE);

MySQL-ngram-full-text-parser-NATURAL-LANGUAGE-MODE.png

Boolean mode
In BOOLEAN MODE searches, the search term is converted to an ngram phrase search. For example:

SELECT
    *
FROM
    posts
WHERE
    MATCH (title , body) AGAINST ('简单和有趣' IN BOOLEAN MODE);