Effective full-text search: go advanced

July 7, 2020

In the previous article you read about the LIKE statement with its pros and cons. If you were thinking about that a little bit you may found that as a simple “contains” method which doesn’t provide any linguistic help for queries.

Now we will switch to something that can scale better than LIKE '%...%' and full of extras.

Full-text search on MS SQL Server

Let’s start with the very beginning: real full text search doesn’t work like LIKE . You need to “enable” that during installation time or you need to re-run the setup to add this feature to your SQL instance.

What you can index

Technically you can index any character types including xml and json, but as an extra you can index files stored in the database too. To be able to index files you may need to install 3rd party or Microsoft made iFilters. This will come later!

Availability of the service

As a developer you need to know in which scenarios you can build on this service.

Almost all editions of SQL Server support full-text search, except the Express edition without the advanced services.

Azure SQL databases support full-text search, but don’t support iFilter installation.

If you are a .Net developer using Entity framework, your only option for using full-text queries is to call stored procedures and map the result to your entities.

Environment setup

Let’s create a full-text catalog first:

CREATE FULLTEXT CATALOG aCatalog AS DEFAULT

“aCatalog” is just a name for the catalog I picked.

Make sure you have a unique key on your table. In general your primary key will do the work and then you are ready to add the indexes to the table.

CREATE FULLTEXT INDEX ON dbo.main ([name],[details]) KEY INDEX PK_main ON aCatalog

Little bit confusing, but you can create full-text catalog even without installing the feature, but you won’t be able to create indexes.

Full-Text Search is not installed, or a full-text component cannot be loaded.

Index population

As a developer you need to know a few things about full-text index population. If you are in an OLTP scenario - your data is changing frequently - you might want your index to be updated whenever your data changes. Because this is the general scenario, automatic change tracking is the default setting for a catalog. Just look for the full-text catalog in your database and check the settings in SSMS.

In certain OLAP like - analytical - scenarios, when you just populate your database once a day, maybe completely vanish that before the load, you can also do the index population after the data load using the below script. Of course in this case you do not need automatic change tracking.

ALTER FULLTEXT INDEX ON dbo.main START FULL POPULATION

Full index population is resource intensive so be cautious if you need to do that during peek load period on large tables.

Simple queries

Once you have a table with a few records you can write full-text queries. The simplest commands are CONTAINS and FREETEXT . CONTAINS is more advanced, but in many cases FREETEXT can be just enough.

FREETEXT statement

The return value of FREETEXT is a bit (bool) and it just searches for a words you put into the search phrase in the given columns.

SELECT TOP (1000) *

FROM [dbo].[main]

WHERE FREETEXT(name, 'black oslo ingress')

Please notice that it is enough if one of the words is included in the column. You can look for the same value in multiple columns at the same time.

SELECT TOP (1000) *

FROM [dbo].[main]

WHERE FREETEXT((name, details), 'europe')

Unfortunately you can’t filter two tables at the same time using only one FREETEXT statement, but you can of course add two to the query and use a JOIN .

SELECT

[main].*

, [details].*

FROM [dbo].[main]

LEFT OUTER JOIN [dbo].[details]

ON [main].id = [details].MainId

WHERE

FREETEXT([main].[name], 'europe')

OR FREETEXT([details].[Details1], 'europe')

CONTAINS statement

CONTAINS gives you more advanced options and brings you closer to the Google like search. CONTAINS supports logical operators like or and and .

SELECT [main].*

FROM [dbo].[main]

WHERE CONTAINS(*, '"europe"')

Notice that the searched word is enclosed by quotation marks (“europe”). Technically if you are searching for only one word you can omit those, but if you would like to extend this search phrase you need to use this syntax.

Another difference compared to the previous queries is the usage of * instead of listing all full-text indexed columns.

Prefixed queries

In the below query you can find a logical OR operator and also a prefixed search. This is useful when the ending of the word is uncertain.

SELECT [main].*

FROM [dbo].[main]

WHERE CONTAINS(*, '"nice" or "eur*"')

Be professional

Code is not ready without being tested. The accuracy of the result set is of course the most important part of the testing but not the only one. Running the queries on relevant — production like — data sets (tables) and understanding the execution plans are also critical.

It really doesn’t matter if your query can provide the best result when it times out on the user interface… when it locks half of the database and a general participant of deadlocks which used to cause crashes. Some of these issues can be seen in the query plan in black and white and some parts can be forecasted.

Running the last query above results in the below plan:

The query plan shows that the full-text statements are like queries on independent tables which provides a docid column as a result set which is later merged into the query containing the columns you would like to get back as a result set.

István Kovács
Cloud Architect