I've recently received a question about how fulltext indices work in NexusDB.

First I'll have to explain how a normal index works.

An index is a sorted list of Key, RefNr pairs. A RefNr identifies a specific record.

When a record is inserted in a table, a Key is build from that record, usually just by copying together one or more fields (depending on the index definition). This Key and the RefNr identifying that record is then inserted in the correct position in the index. The result is that for each record there is exactly one Key, RefNr pair in each index.

A range works by finding the first key that is larger or equal to the specified start value and the last key that is smaller or equal to the specified end value. Any key in between is contained in the range.

A fulltext index in NexusDB is very similar, the only difference is how the keys that are inserted into the index are generated.

The fulltext index descriptor contains a set of token extractors and token filters that describe a record is turned into a set of tokens. A token is simply a string. The result is that for each record there will be a list of 0 to n strings. Each of these strings now becomes a Key and there will be between 0 and n pairs added to the fulltext index, all with the same RefNr identifying the same record.

This is really the only difference between a normal index and a fulltext index. Where a normal index contains each record exactly once, a fulltext index will contain each record between 0 and n times, once for each unique token in that record.

With a normal index it's usually very easy to see the value of each key as it usually is 1:1 the value of one (or more) fields.

With a fulltext index, because the exact same record is present multiple times with different keys that's not so obvious. To make the key value visible, TnxTable has a special flag in it's Options property: dsoAddKeyAsVariantField. This will add a calculated field called _Key which shows the key value.

With this information it should be clear that it is not possible to perform an expression search (e.g. "test" AND "token") on a fulltext index using TnxTable directly as it is not possible to specify a single range which would only contain the records that have a key for both "test" and "token".

A way to do this manually would be to set a range on "test" then iterate the records in the range and note down the value of some unique field (e.g. an autoinc or guid field). Then repeat the same with a range on "token" and combine the 2 lists so that only entries remain that are in both lists. This list can then be used with an index on the unique field to find the individual records that matched the search expression.

Fortunately it is not necessary to do all this manually as the SQL engine already implements this using the CONTAINS clause.

The formal syntax for the CONTAINS clause looks like this:

<ContainsClause> ::= "CONTAINS" '(' <SqlName> ',' <FullTextSearchExpr> ')'
<FullTextSearchExpr> ::= <FullTextSearchTerm> { "OR" <FullTextSearchTerm>}
<FullTextSearchTerm> ::= <FullTextSearchFactor> { "AND" <FullTextSearchFactor>}
<FullTextSearchFactor> ::= [ "NOT" ] <SimpleExpression>

Example:

SELECT * FROM tbl WHERE CONTAINS('FulltextIndexName', 'test' AND 'token')
Home | Community | Blogs | Thorsten's Blog