Effective full-text search: go relevant

Using pure CONTAINS can be painful. Not for you as a developer, but for your users. There is a reason why Google Search became an industry standard and even Microsoft copied Google search results to improve Bing when they saved the searches in Internet Explorer.

Ranking is the most important when you have a huge pile of data.

Ok maybe AI developers will not agree, but at least both of these are based on statistics.

Ranking in T-SQL

Microsoft SQL Server/T-SQL supports full-text search ranking out the box. This works very similar to CONTAINS and FREETEXT detailed in the previous article of the series: Effective full-text search: go advanced, but instead of the return type of a bit (bool) CONTAINSTABLE and FREETEXTTABLE are table valued functions so these return with a table having two columns:

KEY and RANK .

Let’s see some basic examples:

SELECT * FROM CONTAINSTABLE(main, *, '"nice" or "eur*"') SELECT * FROM FREETEXTTABLE(main, *, 'nice')

The result is very much the same as the result of the normal CONTAINS and FREETEXT , but now we have a rank. By joining these table valued functions to the original table — called main in this example — you can get exactly the same result as before, but now you can order the result set based on its relevancy for the user. I added the RANK field to the SELECT to show how ordering works, but that is not mandatory.

SELECT [main].*, [FT].Rank

FROM [main]

INNER JOIN CONTAINSTABLE(main, *, '"nice" or "eur*"') AS FT ON [main].id = [FT].[Key]

ORDER BY [FT].Rank DESC

Just the most relevant

There are two additional parameters of these functions. One is of course the language — which will be discussed in a future article — and the “Top N by rank”. If

you think about this TOP part, you could also achieve something similar using the normal SELECT TOP N * FROM ... statements like this:

SELECT TOP 1 [main].*, [FT].rank

FROM [main]

INNER JOIN CONTAINSTABLE(main, *, '"nice" or "eur*"') AS FT ON [main].id = [FT].[Key]

ORDER BY [FT].Rank DESC

Please do not ever do anything like this, it is just for curiosity, because obviously it will not perform well, unless you have only a few records because in that case it will just be ugly but you won’t notice the difference.

The correct version will look like this:

SELECT [main].*, [FT].rank

FROM [main]

INNER JOIN CONTAINSTABLE(main, *,'"nice" or "eur*"', 1) AS FT ON [main].id = [FT].[Key]

ORDER BY [FT].Rank DESC

The result set in both cases will be the same.

To get used to the right practice

As noted in the previous article you need to check the query plans. Always. Without exceptions. It is not optional.

I run a simplified query without the explicit ORDER BY statement and our query plan shows that we are using a table valued function which sounds correct and there is also a sort operation to be able to give back only the most relevant record(s). This means that the TOP N parameter of CONTAINSTABLE added an additional operation to the query plan.

SELECT [main].*, [FT].rank˛

FROM [main]

INNER JOIN CONTAINSTABLE(main, *, '"nice" or "eur*"', 1) AS FT ON [main].id = [FT].[Key]

There is another important thing on execution plan that you need to investigate every time you notice it. This is the warning sign on the “SELECT” box (little yellow triangle with an exclamation mark), but it is beyond the scope of this article so we will skip it.

What’s next

The story goes on to the next chapter in which I will explain what you can do with files in SQL Server.

István Kovács
Cloud Architect