Implementing Complex Text Search in MongoDB with Facets and Atlas Search

Pavlo Lompas
4 min read · Apr 26, 2023

In this article, we’ll explore how to implement a fuzzy search using MongoDB that retrieves results from multiple collections based on a search term. We’ll also dive into the process of aggregating these results and displaying them as a unified list while maintaining the correct order based on the text score.

Scenario

Imagine we have a MongoDB database containing three collections: brands, models, and trims. Additionally, there's a searches collection with the following fields: text (string), objectType (string), and objectId (MongoDB ObjectId). The objectType field stores the collection name, like 'brands', 'models', or 'trims', while the objectId field refers to the document's ID in the corresponding collection.

// example documents of "searches" collection

_id: 1122334455
text: "Audi"
objectId: 1122334455
objectType: "brands"


_id: 1122334455
text: "Acura Integra"
objectId: 1122334455
objectType: "models"

_id: 1122334455
text: "Porsche 911 Carrera 4"
objectType: "trims"
objectId: 1122334455

Objective

Our goal is to write a query that performs a fuzzy search using the text field from the searches collection, and retrieves the associated object from the corresponding collection based on the objectType and objectId. Furthermore, we want to maintain the correct order of search results based on the text score.

// When searching for "porsche carrera":
[
  { objectType: "models", text: "Porsche Carrera GT", avatar: "Porsche-logo.png" },
  { objectType: "trims", text: "Porsche 911 Carrera", avatar: "Porsche-logo.png" },
  { objectType: "brands", text: "Porsche", avatar: "Porsche-logo.png" },
]

Solution

To achieve this, we’ll use MongoDB’s aggregation pipeline, which allows us to perform complex data processing tasks in combination with MongoDB’s Atlas Search product. Here’s the step-by-step breakdown of the pipeline:

1. Use the $search stage to fuzzy search the searches collection based on the search term.

{
  $search: {
    index: 'searchesIndex',
    text: {
      query: "text to search",
      path: { wildcard: '*' },
      fuzzy: {}
    }
  }
}
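
The empty fuzzy object accepts Atlas Search's defaults. As a minimal sketch, the stage could be wrapped in a small builder so that the search term and the fuzziness are passed in rather than hard-coded. The helper name buildSearchStage and the option values chosen here are assumptions for illustration; maxEdits and prefixLength are standard fuzzy options of the text operator. The stage also presumes that a search index named searchesIndex exists; a definition with dynamic mappings, as shown, is one plausible setup and not necessarily the one used for this article.

```javascript
// Assumed index definition for 'searchesIndex': dynamic mappings index
// every supported field, which suits the wildcard path used below.
const searchIndexDefinition = { mappings: { dynamic: true } };

// Hypothetical helper: builds the $search stage for a given term.
// maxEdits: how many single-character edits a term may differ by (1 or 2).
// prefixLength: number of leading characters that must match exactly.
function buildSearchStage(term, { maxEdits = 2, prefixLength = 0 } = {}) {
  return {
    $search: {
      index: 'searchesIndex',
      text: {
        query: term,
        path: { wildcard: '*' },
        fuzzy: { maxEdits, prefixLength }
      }
    }
  };
}

// Stricter matching: one edit allowed, first two characters must match.
const stage = buildSearchStage('porshe carrera', { maxEdits: 1, prefixLength: 2 });
```

Tightening maxEdits and prefixLength trades recall for precision, which can help when short terms produce too many loose matches.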

2. Add a new field textScore to the documents using $addFields.

{
  $addFields: {
    textScore: { $meta: 'searchScore' }
  }
}

3. Use the $facet stage to separate the pipeline into branches based on the objectType ('brands', 'models', and 'trims').

{
  $facet: {
    brands: [ /* ... */ ],
    models: [ /* ... */ ],
    trims: [ /* ... */ ]
  }
}

4. In each branch, filter by objectType with $match, then use a $lookup stage to fetch the associated object from the corresponding collection based on the objectId. For example, the $match and $lookup stages for 'brands':

[
  { $match: { objectType: 'brands' } },
  {
    $lookup: {
      from: "brands",
      localField: "objectId",
      foreignField: "_id",
      as: "matchedObject"
    }
  }
]
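
It may help to see what these two stages do to the data. $lookup always writes an array into the as field, even when exactly one document matches, which is why the complete pipeline follows each $lookup with $unwind. The plain JavaScript sketch below mimics that behavior in memory; the sample documents, the string IDs (real documents would use ObjectIds), and the scores are invented for illustration.

```javascript
// Sample data (hypothetical): one brand and two search hits.
const brands = [{ _id: 'b1', logo: 'Porsche-logo.png' }];
const hits = [
  { text: 'Porsche', objectType: 'brands', objectId: 'b1', textScore: 1.1 },
  { text: 'Ghost brand', objectType: 'brands', objectId: 'missing', textScore: 0.9 },
  { text: 'Porsche Carrera GT', objectType: 'models', objectId: 'm1', textScore: 3.4 }
];

// $match: keep only the hits that belong to this branch.
const brandHits = hits.filter((h) => h.objectType === 'brands');

// $lookup: attach an ARRAY of matching documents under matchedObject.
const joined = brandHits.map((h) => ({
  ...h,
  matchedObject: brands.filter((b) => b._id === h.objectId)
}));

// $unwind: emit one document per array element; empty arrays emit nothing.
const unwound = joined.flatMap((h) =>
  h.matchedObject.map((m) => ({ ...h, matchedObject: m }))
);

console.log(unwound.map((h) => h.text)); // [ 'Porsche' ]
```

One consequence worth noting: a searches entry whose objectId no longer resolves is dropped by $unwind rather than returned with an empty avatar.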

5. Within each branch, sort the results by textScore using the $sort stage.

{ $sort: { textScore: -1 } }

6. Use $concatArrays to combine the results from each branch, unwind the combined results, and replace the root with each combined result.

{
  $project: {
    combinedResults: {
      $concatArrays: ["$brands", "$models", "$trims"]
    }
  }
},
{ $unwind: "$combinedResults" },
{ $replaceRoot: { newRoot: "$combinedResults" } }

7. Sort the combined results by textScore using the $sort stage.

{ $sort: { textScore: -1 } }
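
To see why this final sort is needed: $facet emits a single document holding one array per branch, so concatenating the arrays groups the results by type rather than by relevance. The plain JavaScript sketch below mimics steps 6 and 7 in memory. The textScore values are invented for illustration and are not real Atlas Search scores.

```javascript
// Shape of the single document $facet emits (scores are made up).
const facetOutput = {
  brands: [{ objectType: 'brands', text: 'Porsche', textScore: 1.1 }],
  models: [{ objectType: 'models', text: 'Porsche Carrera GT', textScore: 3.4 }],
  trims: [{ objectType: 'trims', text: 'Porsche 911 Carrera', textScore: 2.7 }]
};

// Step 6: $concatArrays + $unwind + $replaceRoot flatten the branches.
const combined = [
  ...facetOutput.brands,
  ...facetOutput.models,
  ...facetOutput.trims
];

// Step 7: the final $sort restores relevance order across types.
const sorted = [...combined].sort((a, b) => b.textScore - a.textScore);

console.log(combined.map((r) => r.objectType)); // [ 'brands', 'models', 'trims' ]
console.log(sorted.map((r) => r.objectType));   // [ 'models', 'trims', 'brands' ]
```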

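8. Drop the textScore field from the final output using the $project stage. The snippet below lists the fields to keep; excluding the field with textScore: 0 works just as well once the branches have already narrowed each document to these fields.

{
  $project: {
    _id: 0,
    objectType: 1,
    text: 1,
    avatar: 1
  }
}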

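Here’s the complete aggregation pipeline. In addition to the steps above, it places a $limit stage right after $addFields so that only the top 10 search hits go through the lookups, and it follows each $lookup with $unwind to turn the matched array into a single embedded document: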

db.searches.aggregate([
  {
    $search: {
      index: 'searchesIndex',
      text: {
        query: 'search term',
        path: { wildcard: '*' },
        fuzzy: {}
      }
    }
  },
  {
    $addFields: {
      textScore: { $meta: 'searchScore' }
    }
  },
  {
    $limit: 10
  },
  {
    $facet: {
      brands: [
        { $match: { objectType: 'brands' } },
        {
          $lookup: {
            from: 'brands',
            localField: 'objectId',
            foreignField: '_id',
            as: 'matchedObject'
          }
        },
        { $unwind: '$matchedObject' },
        {
          $project: {
            _id: 0,
            objectType: 1,
            text: '$$ROOT.text',
            avatar: '$matchedObject.logo',
            textScore: 1,
          }
        },
        { $sort: { textScore: -1 } }
      ],
      models: [
        { $match: { objectType: 'models' } },
        {
          $lookup: {
            from: 'models',
            localField: 'objectId',
            foreignField: '_id',
            as: 'matchedObject'
          }
        },
        { $unwind: '$matchedObject' },
        {
          $lookup: {
            from: 'brands',
            localField: 'matchedObject.brandId',
            foreignField: '_id',
            as: 'matchedObject.brand'
          }
        },
        { $unwind: '$matchedObject.brand' },
        {
          $project: {
            _id: 0,
            objectType: 1,
            text: '$$ROOT.text',
            avatar: '$matchedObject.brand.logo',
            textScore: 1
          }
        },
        { $sort: { textScore: -1 } }
      ],
      trims: [
        { $match: { objectType: 'trims' } },
        {
          $lookup: {
            from: 'trims',
            localField: 'objectId',
            foreignField: '_id',
            as: 'matchedObject'
          }
        },
        { $unwind: '$matchedObject' },
        {
          $lookup: {
            from: 'models',
            localField: 'matchedObject.modelId',
            foreignField: '_id',
            as: 'matchedObject.model'
          }
        },
        { $unwind: '$matchedObject.model' },
        {
          $lookup: {
            from: 'brands',
            localField: 'matchedObject.model.brandId',
            foreignField: '_id',
            as: 'matchedObject.model.brand'
          }
        },
        { $unwind: '$matchedObject.model.brand' },
        {
          $project: {
            _id: 0,
            objectType: 1,
            text: '$$ROOT.text',
            avatar: '$matchedObject.model.brand.logo',
            textScore: 1
          }
        },
        { $sort: { textScore: -1 } }
      ],
    }
  },
  {
    $project: {
      results: { $concatArrays: ['$brands', '$models', '$trims'] }
    }
  },
  { $unwind: '$results' },
  { $replaceRoot: { newRoot: '$results' } },
  {
    $sort: {
      textScore: -1
    }
  },
  {
    $project: {
      textScore: 0
    }
  }
])
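
The three branches differ only in which parent collections they walk to reach the brand logo, so the pipeline can also be generated rather than written out by hand. The following is a sketch of that idea and not part of the original query: buildBranch and buildPipeline are hypothetical helper names, while the collection and field names (brandId, modelId, logo) are taken from the pipeline above.

```javascript
// Hops from a child document to its parent collection.
const toModel = { from: 'models', key: 'modelId', as: 'model' };
const toBrand = { from: 'brands', key: 'brandId', as: 'brand' };

// Hypothetical helper: builds one $facet branch. `chain` lists the parent
// hops needed after the first lookup to reach the brand that owns the logo.
function buildBranch(objectType, chain = []) {
  const stages = [
    { $match: { objectType } },
    { $lookup: { from: objectType, localField: 'objectId', foreignField: '_id', as: 'matchedObject' } },
    { $unwind: '$matchedObject' }
  ];
  let path = 'matchedObject';
  for (const { from, key, as } of chain) {
    const next = `${path}.${as}`;
    stages.push(
      { $lookup: { from, localField: `${path}.${key}`, foreignField: '_id', as: next } },
      { $unwind: `$${next}` }
    );
    path = next;
  }
  stages.push(
    { $project: { _id: 0, objectType: 1, text: 1, avatar: `$${path}.logo`, textScore: 1 } },
    { $sort: { textScore: -1 } }
  );
  return stages;
}

// Hypothetical helper: assembles the full pipeline for a search term.
function buildPipeline(term, limit = 10) {
  return [
    { $search: { index: 'searchesIndex', text: { query: term, path: { wildcard: '*' }, fuzzy: {} } } },
    { $addFields: { textScore: { $meta: 'searchScore' } } },
    { $limit: limit },
    {
      $facet: {
        brands: buildBranch('brands'),
        models: buildBranch('models', [toBrand]),
        trims: buildBranch('trims', [toModel, toBrand])
      }
    },
    { $project: { results: { $concatArrays: ['$brands', '$models', '$trims'] } } },
    { $unwind: '$results' },
    { $replaceRoot: { newRoot: '$results' } },
    { $sort: { textScore: -1 } },
    { $project: { textScore: 0 } }
  ];
}

const pipeline = buildPipeline('porsche carrera');
```

In mongosh, this would run as db.searches.aggregate(buildPipeline('porsche carrera')). Adding a fourth searchable collection then comes down to one more buildBranch call and one more entry in $concatArrays.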

Conclusion

In this article, we’ve demonstrated how to implement a fuzzy search in MongoDB that uses facets to retrieve results from multiple collections and keeps them in the correct order based on the text score. By combining MongoDB’s aggregation pipeline with Atlas Search, you can build flexible and efficient search functionality for your applications.
