How to integrate Azure Cosmos DB with Azure Search

Đặng Trường Chính (DTC), team leader of the Rangers team, wrote a good post about how to integrate Azure Cosmos DB with Azure Search. We are happy to share it here.

First of all, let’s quickly get to know indexers in Azure Search. Please allow us to quote the essential information here.

An indexer in Azure Search is a crawler that extracts searchable data and metadata from an external data source and populates an index based on field-to-field mappings between the index and your data source. This approach is sometimes referred to as a ‘pull model’ because the service pulls data in without you having to write any code that pushes data to an index.

A data source specifies the data to index, credentials, and policies for identifying changes in the data (such as modified or deleted documents inside your collection). The data source is defined as an independent resource so that it can be used by multiple indexers.
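As a rough illustration, a Cosmos DB data source can be described by the JSON body sent to the Azure Search REST API (`PUT https://<service>.search.windows.net/datasources/<name>`). The sketch below builds such a body with System.Text.Json. The `isDeleted` soft-delete column is an assumption made for illustration, and the property set should be checked against the REST reference for the api-version you target.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class DataSourceSketch
{
    // Builds the JSON body of a Cosmos DB data source definition.
    public static string BuildCosmosDbDataSource(
        string name, string connectionString, string collection, string query)
    {
        var body = new Dictionary<string, object>
        {
            ["name"] = name,
            ["type"] = "cosmosdb",
            // Expected form: "AccountEndpoint=...;AccountKey=...;Database=<database id>"
            ["credentials"] = new Dictionary<string, object>
            {
                ["connectionString"] = connectionString
            },
            // Collection to crawl plus an optional query that shapes the documents
            ["container"] = new Dictionary<string, object>
            {
                ["name"] = collection,
                ["query"] = query
            },
            // Change detection: the indexer remembers the highest _ts it has processed
            ["dataChangeDetectionPolicy"] = new Dictionary<string, object>
            {
                ["@odata.type"] = "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
                ["highWaterMarkColumnName"] = "_ts"
            },
            // Deletion detection (assumed column): documents with isDeleted == "true" are removed
            ["dataDeletionDetectionPolicy"] = new Dictionary<string, object>
            {
                ["@odata.type"] = "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
                ["softDeleteColumnName"] = "isDeleted",
                ["softDeleteMarkerValue"] = "true"
            }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

The high-water-mark policy relies on `_ts`, a system property that Cosmos DB maintains automatically as the document's last-modified time in Unix seconds.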

An indexer describes how the data flows from your data source into a target search index. An indexer can be used to:

  • Perform a one-time copy of the data to populate an index.
  • Sync an index with changes in the data source on a schedule. The schedule is part of the indexer definition.
  • Invoke on-demand updates to an index as needed.
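The three usage modes map onto the indexer definition roughly as follows. This is a sketch of the JSON body for `PUT /indexers/<name>`, assuming the REST API: an indexer without a `schedule` runs once when it is created (the one-time copy), adding a `schedule` keeps the index in sync, and `POST /indexers/<name>/run` triggers an on-demand run. The 5-minute to 24-hour bounds checked below reflect the documented interval limits as we understand them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Xml;

public static class IndexerSketch
{
    // Builds the JSON body of an indexer that moves data from a data source into an index.
    // interval == null: no schedule, so the indexer only runs on creation or on demand.
    public static string BuildIndexer(
        string name, string dataSourceName, string indexName, TimeSpan? interval)
    {
        var body = new Dictionary<string, object>
        {
            ["name"] = name,
            ["dataSourceName"] = dataSourceName,
            ["targetIndexName"] = indexName
        };

        if (interval is TimeSpan every)
        {
            // Reject intervals outside 5 minutes .. 24 hours
            if (every < TimeSpan.FromMinutes(5) || every > TimeSpan.FromHours(24))
                throw new ArgumentOutOfRangeException(nameof(interval));

            // The service expects an ISO 8601 duration such as "PT30M"
            body["schedule"] = new Dictionary<string, object>
            {
                ["interval"] = XmlConvert.ToString(every)
            };
        }
        return JsonSerializer.Serialize(body);
    }
}
```

`XmlConvert.ToString(TimeSpan)` is used here only because it already emits the ISO 8601 duration format, which saves hand-formatting the interval.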

Configuration steps

Indexers can offer features that are unique to the data source. In this respect, some aspects of indexer or data source configuration will vary by indexer type. However, all indexers share the same basic composition and requirements. Steps that are common to all indexers are covered below.

  • Step 1: Create a data source. An indexer pulls data from a data source, which holds information such as a connection string and possibly credentials.
  • Step 2: Create an index.
  • Step 3: Create and schedule the indexer. The indexer definition is a construct specifying the index, the data source, and a schedule.
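With the REST API, the three steps become three requests that have to run in this order, since the indexer refers to both the data source and the index by name. A minimal sketch, assuming the REST endpoints `datasources`, `indexes` and `indexers` and the GA api-version `2020-06-30`; each request also needs the service's admin `api-key` header and a JSON body describing the resource.

```csharp
using System.Collections.Generic;

public static class SetupSequenceSketch
{
    // One GA version of the Azure Search REST API; adjust to the version you target
    private const string ApiVersion = "2020-06-30";

    // Returns "METHOD URL" lines for the three configuration steps, in dependency order.
    public static List<string> BuildRequests(
        string service, string dataSource, string index, string indexer)
    {
        string Url(string path) =>
            $"https://{service}.search.windows.net/{path}?api-version={ApiVersion}";

        return new List<string>
        {
            "PUT " + Url($"datasources/{dataSource}"), // Step 1: data source
            "PUT " + Url($"indexes/{index}"),          // Step 2: index
            "PUT " + Url($"indexers/{indexer}")        // Step 3: indexer, created last
        };
    }
}
```

PUT creates a resource or updates it if it already exists, which fits the create-or-update style used by the project's index builders.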

2. Connecting Cosmos DB with Azure Search using indexers in the project

To set up a Cosmos DB indexer in the project, we create three abstractions: Connectors, Indexes and IndexBuilders.

Connectors

A connector defines the data source name, the indexer name, and the query that shapes the indexed data.

public class UserConnector : BaseConnector
{
    public const string DATA_SOURCE_NAME = "ds-user";
    public const string INDEXER_NAME = "indexer-user";

    public UserConnector()
    {
        DataSourceName = DATA_SOURCE_NAME;
        IndexerName = INDEXER_NAME;

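        // Cosmos DB SQL needs "FROM c"; _ts is projected so the indexer can track its high-water mark
        DocQuery = "SELECT c.id, c.Username, c.Email, c._ts FROM c"
                    + " WHERE c.Type = 'UserModel'"
                    + " AND c._ts >= @HighWaterMark ORDER BY c._ts";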
    }
}
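To see what the `@HighWaterMark` clause buys, here is an in-memory sketch of one indexer run over the connector's query. It is not how Azure Search is implemented internally; the `CosmosDoc` record and the `Run` method are hypothetical stand-ins. Because `_ts` is in Unix seconds, several documents can share one value; filtering with `>=` means boundary documents are read again on the next run, which is harmless because re-indexing the same key overwrites the existing entry.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a Cosmos DB document
public record CosmosDoc(string Id, string Type, string Username, string Email, long Ts);

public static class HighWaterMarkSketch
{
    // Mimics one run of the connector's query:
    // keep UserModel documents whose _ts is at or above the mark, ordered by _ts.
    public static (List<CosmosDoc> Batch, long NewHighWaterMark) Run(
        IEnumerable<CosmosDoc> docs, long highWaterMark)
    {
        var batch = docs
            .Where(d => d.Type == "UserModel" && d.Ts >= highWaterMark)
            .OrderBy(d => d.Ts)
            .ToList();

        // Ordering by _ts lets the last document's _ts serve as the new checkpoint
        long newMark = batch.Count > 0 ? batch[batch.Count - 1].Ts : highWaterMark;
        return (batch, newMark);
    }
}
```

A second run that starts from the returned mark only picks up documents written at or after that timestamp, so unchanged documents are not re-read.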

Indexes

An index is the primary means of organizing and searching documents in Azure Search, similar to how a table organizes records in a database. When creating an index, the following attributes can be set on each field: name, type, key, retrievable, searchable, filterable, sortable, facetable …

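using Newtonsoft.Json;

// UserIndex derives from the project's BaseDocIndex so it satisfies the BaseIndexBuilder<T> constraint;
// IndexDoc is the project's own attribute, not part of the Azure Search SDK
public class UserIndex : BaseDocIndex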
{
    public const string INDEX_NAME = "index-user";

    public const string FieldId = "id";
    public const string FieldUsername = "Username";
    public const string FieldEmail = "Email";

    [IndexDoc(Name = FieldId, IsKey = true, IsFilterable = true, IsSortable = true)]
    [JsonProperty(FieldId)]
    public string Id { get; set; }

    [JsonIgnore]
    public string IndexName { get; protected set; }

    [IndexDoc(Name = FieldUsername, IsFilterable = true, IsSearchable = true, IsSortable = true)]
    [JsonProperty(FieldUsername)]
    public string Username { get; set; }

    [IndexDoc(Name = FieldEmail, IsFilterable = true, IsSearchable = true, IsSortable = true)]
    [JsonProperty(FieldEmail)]
    public string Email { get; set; }

    public UserIndex()
    {
        IndexName = INDEX_NAME;
    }
}
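The `IndexDoc` attribute itself is not shown in the post, so the following sketch guesses its shape from how `UserIndex` uses it and shows how an index builder might read it via reflection to derive the index's field list. `IndexDocAttribute`, `IndexSchemaSketch` and `DemoUserIndex` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical shape of the project's IndexDoc attribute, inferred from its usage
[AttributeUsage(AttributeTargets.Property)]
public sealed class IndexDocAttribute : Attribute
{
    public string Name { get; set; }
    public bool IsKey { get; set; }
    public bool IsSearchable { get; set; }
    public bool IsFilterable { get; set; }
    public bool IsSortable { get; set; }
}

public static class IndexSchemaSketch
{
    // One description per property carrying [IndexDoc];
    // properties without it (such as IndexName) stay out of the schema.
    public static List<string> DescribeFields(Type indexType)
    {
        return indexType.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Select(p => p.GetCustomAttribute<IndexDocAttribute>())
            .Where(a => a != null)
            .Select(a => $"{a.Name}: key={a.IsKey}, searchable={a.IsSearchable}, " +
                         $"filterable={a.IsFilterable}, sortable={a.IsSortable}")
            .ToList();
    }
}

// Cut-down copy of UserIndex for the demo
public class DemoUserIndex
{
    [IndexDoc(Name = "id", IsKey = true, IsFilterable = true, IsSortable = true)]
    public string Id { get; set; }

    [IndexDoc(Name = "Username", IsFilterable = true, IsSearchable = true, IsSortable = true)]
    public string Username { get; set; }

    public string IndexName { get; set; }
}
```

Keeping the field flags next to the C# properties means the index schema and the document class cannot drift apart silently.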

IndexBuilders

The connectors are built in the implementation class UserIndexBuilder. An index builder is responsible for:

  • Creating the data source, indexers and index.
  • Deleting the data source, indexers and index.
  • Forcing the indexers to run and pull data from the data source right away (otherwise each indexer only runs on its schedule, which defaults to every 30 minutes here).

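using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Search; // SearchServiceClient (legacy Azure Search .NET SDK)

// Contract shared by all index builders
public interface IIndexBuilder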
{
    Task CreateOrUpdateAsync(SearchServiceClient searchClient);
    Task DeleteAsync(SearchServiceClient searchClient);
    Task RunIndexersAsync(SearchServiceClient searchClient);
}

public abstract class BaseIndexBuilder<T> : IIndexBuilder where T : BaseDocIndex, new()
{
    protected readonly List<BaseConnector> _connectors;
    protected abstract List<BaseConnector> BuildConnectors();

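    // BuildConnectors() runs inside the base constructor, before the derived
    // constructor body, so overrides should not depend on state set there
    protected BaseIndexBuilder()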
    {
        _connectors = BuildConnectors();
    }

    public Task CreateOrUpdateAsync(SearchServiceClient searchClient) { ... }
    public Task DeleteAsync(SearchServiceClient searchClient) { ... }
    public Task RunIndexersAsync(SearchServiceClient searchClient) { ... }
}

public class UserIndexBuilder : BaseIndexBuilder<UserIndex>
{
    protected override List<BaseConnector> BuildConnectors()
    {
        return new List<BaseConnector>
        {
            new UserConnector()
        };
    }
}
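The method bodies of `BaseIndexBuilder` are elided above. The sketch below shows one plausible orchestration against a hypothetical `ISearchAdmin` abstraction rather than the Azure SDK, so it runs without a search service; `ConnectorInfo`, `IndexBuilderSketch` and `RecordingAdmin` are likewise illustrative names. Creation goes index, then data source, then indexer, because an indexer can only reference resources that already exist.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical abstraction over the search service (not the Azure SDK API)
public interface ISearchAdmin
{
    Task CreateOrUpdateIndexAsync(string indexName);
    Task CreateOrUpdateDataSourceAsync(string dataSourceName, string query);
    Task CreateOrUpdateIndexerAsync(string indexerName, string dataSourceName, string indexName);
    Task RunIndexerAsync(string indexerName);
    Task DeleteIndexerAsync(string indexerName);
    Task DeleteDataSourceAsync(string dataSourceName);
    Task DeleteIndexAsync(string indexName);
}

// Minimal connector shape assumed for this sketch
public record ConnectorInfo(string DataSourceName, string IndexerName, string DocQuery);

public class IndexBuilderSketch
{
    private readonly string _indexName;
    private readonly List<ConnectorInfo> _connectors;

    public IndexBuilderSketch(string indexName, List<ConnectorInfo> connectors)
    {
        _indexName = indexName;
        _connectors = connectors;
    }

    // Index first, then each connector's data source and the indexer that links them
    public async Task CreateOrUpdateAsync(ISearchAdmin admin)
    {
        await admin.CreateOrUpdateIndexAsync(_indexName);
        foreach (var c in _connectors)
        {
            await admin.CreateOrUpdateDataSourceAsync(c.DataSourceName, c.DocQuery);
            await admin.CreateOrUpdateIndexerAsync(c.IndexerName, c.DataSourceName, _indexName);
        }
    }

    // Tear down in the reverse order of creation
    public async Task DeleteAsync(ISearchAdmin admin)
    {
        foreach (var c in _connectors)
        {
            await admin.DeleteIndexerAsync(c.IndexerName);
            await admin.DeleteDataSourceAsync(c.DataSourceName);
        }
        await admin.DeleteIndexAsync(_indexName);
    }

    // Trigger every indexer now instead of waiting for its schedule
    public async Task RunIndexersAsync(ISearchAdmin admin)
    {
        foreach (var c in _connectors)
            await admin.RunIndexerAsync(c.IndexerName);
    }
}

// Test double that records each call so the ordering can be checked
public class RecordingAdmin : ISearchAdmin
{
    public List<string> Calls { get; } = new List<string>();

    private Task Log(string call) { Calls.Add(call); return Task.CompletedTask; }

    public Task CreateOrUpdateIndexAsync(string indexName) => Log("index:" + indexName);
    public Task CreateOrUpdateDataSourceAsync(string dataSourceName, string query) => Log("datasource:" + dataSourceName);
    public Task CreateOrUpdateIndexerAsync(string indexerName, string dataSourceName, string indexName) => Log("indexer:" + indexerName);
    public Task RunIndexerAsync(string indexerName) => Log("run:" + indexerName);
    public Task DeleteIndexerAsync(string indexerName) => Log("delete-indexer:" + indexerName);
    public Task DeleteDataSourceAsync(string dataSourceName) => Log("delete-datasource:" + dataSourceName);
    public Task DeleteIndexAsync(string indexName) => Log("delete-index:" + indexName);
}
```

Hiding the service behind an interface like this also makes the builders easy to unit-test, since a recording double can stand in for a real search service.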

Credits

This post quoted some information from docs.microsoft.com: