Implementing Complex Text Search in MongoDB with Facets and Atlas Search
In this article, we’ll explore how to implement a fuzzy search using MongoDB that retrieves results from multiple collections based on a search term. We’ll also dive into the process of aggregating these results and displaying them as a unified list while maintaining the correct order based on the text score.
Scenario
Imagine we have a MongoDB database containing three collections: `brands`, `models`, and `trims`. Additionally, there's a `searches` collection with the following fields: `text` (string), `objectType` (string), and `objectId` (MongoDB ObjectId). The `objectType` field stores the collection name ('brands', 'models', or 'trims'), while the `objectId` field refers to the document's ID in the corresponding collection.
// example documents of "searches" collection
_id: 1122334455
text: "Audi"
objectType: "brands"
objectId: 1122334455

_id: 1122334455
text: "Acura Integra"
objectType: "models"
objectId: 1122334455

_id: 1122334455
text: "Porsche 911 Carrera 4"
objectType: "trims"
objectId: 1122334455
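To make the shape of a `searches` entry concrete, here is a minimal sketch of how one could be derived from a source document. The helper name `toSearchEntry` and the idea of passing the display text in as an argument are assumptions for illustration; the article does not describe how the collection is populated.

```javascript
// Hypothetical helper: builds a `searches` entry that points back at a
// document in one of the three source collections.
function toSearchEntry(objectType, doc, text) {
  const allowed = ['brands', 'models', 'trims'];
  if (!allowed.includes(objectType)) {
    throw new Error(`Unknown objectType: ${objectType}`);
  }
  return {
    text,               // the string Atlas Search will match against
    objectType,         // name of the collection the document lives in
    objectId: doc._id,  // _id of the document in that collection
  };
}

// Placeholder values only; a real _id would be an ObjectId.
const entry = toSearchEntry('brands', { _id: 1122334455 }, 'Audi');
console.log(entry);
```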
Objective
Our goal is to write a query that performs a fuzzy search using the `text` field from the `searches` collection, and retrieves the associated object from the corresponding collection based on the `objectType` and `objectId`. Furthermore, we want to maintain the correct order of search results based on the text score.
// When searching for "porsche carrera":
[
{objectType: "models", text: "Porsche Carrera GT", avatar: "Porsche-logo.png"},
{objectType: "trims", text: "Porsche 911 Carrera", avatar: "Porsche-logo.png"},
{objectType: "brands", text: "Porsche", avatar: "Porsche-logo.png"},
]
Solution
To achieve this, we’ll use MongoDB’s aggregation pipeline, which allows us to perform complex data processing tasks in combination with MongoDB’s Atlas Search product. Here’s the step-by-step breakdown of the pipeline:
1. Use the `$search` stage to fuzzy search the `searches` collection based on the search term.
{
  $search: {
    index: 'searchesIndex',
    text: {
      query: "text to search",
      path: { wildcard: '*' },
      fuzzy: {}
    }
  }
}
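The empty `fuzzy: {}` above leaves the matching tolerance at the Atlas Search defaults. As a rough sketch, the stage could be built by a small function that makes those options explicit. The function name `buildSearchStage` and the chosen values are assumptions for illustration; `maxEdits` and `prefixLength` are options of the Atlas Search `text` operator.

```javascript
// Hypothetical builder for the $search stage from step 1.
function buildSearchStage(term, options = {}) {
  const {
    maxEdits = 1,               // illustrative value: allow one typo per term
    prefixLength = 2,           // illustrative value: first 2 chars must match
    path = { wildcard: '*' },   // same path as the article's stage
  } = options;

  if (typeof term !== 'string' || term.trim() === '') {
    throw new Error('term must be a non-empty string');
  }

  return {
    $search: {
      index: 'searchesIndex',
      text: {
        query: term.trim(),
        path,
        fuzzy: { maxEdits, prefixLength },
      },
    },
  };
}

const stage = buildSearchStage('  porsche carrera ');
console.log(JSON.stringify(stage, null, 2));
```

Passing `{ path: 'text' }` as the second argument would restrict matching to the `text` field, so that a query such as "models" does not also match on `objectType`.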
2. Add a new field `textScore` to the documents using `$addFields`.
{
  $addFields: {
    textScore: { $meta: 'searchScore' }
  }
},
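3. Use the `$facet` stage to separate the pipeline into branches based on the `objectType` ('brands', 'models', and 'trims'). A separate branch per type is needed because the `from` field of `$lookup` must be a fixed collection name and cannot be read from `objectType`.
{
  $facet: {
    brands: [ /* ... */ ],
    models: [ /* ... */ ],
    trims: [ /* ... */ ]
  }
}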
4. In each branch, use the `$lookup` stage to fetch the associated object from the corresponding collection based on the `objectId`. For example, the `$match` and `$lookup` stages for 'brands':
[
  { $match: { objectType: 'brands' } },
  {
    $lookup: {
      from: "brands",
      localField: "objectId",
      foreignField: "_id",
      as: "matchedObject"
    }
  }
]
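The three branches differ only in the collection they start from and in how many parent lookups are needed to reach the brand's logo, so they can be generated rather than written out by hand. The following is a sketch under that assumption: `buildBranch` and its `parents` parameter are hypothetical names, while the field names `brandId`, `modelId`, and `logo` are the ones used in the complete pipeline later in this article.

```javascript
// Hypothetical helper: builds one $facet branch.
// `parents` lists the extra lookups needed to reach the brand document,
// e.g. trims -> models -> brands.
function buildBranch(objectType, parents = []) {
  const stages = [
    { $match: { objectType } },
    {
      $lookup: {
        from: objectType,
        localField: 'objectId',
        foreignField: '_id',
        as: 'matchedObject',
      },
    },
    { $unwind: '$matchedObject' },
  ];

  // Walk up the parent chain, nesting each looked-up document.
  let path = 'matchedObject';
  for (const { from, refField, as } of parents) {
    const nested = `${path}.${as}`;
    stages.push({
      $lookup: {
        from,
        localField: `${path}.${refField}`,
        foreignField: '_id',
        as: nested,
      },
    });
    stages.push({ $unwind: `$${nested}` });
    path = nested;
  }

  // The brand document at the end of the chain carries the logo.
  stages.push({
    $project: { _id: 0, objectType: 1, text: 1, avatar: `$${path}.logo`, textScore: 1 },
  });
  return stages;
}

const facetStage = {
  $facet: {
    brands: buildBranch('brands'),
    models: buildBranch('models', [{ from: 'brands', refField: 'brandId', as: 'brand' }]),
    trims: buildBranch('trims', [
      { from: 'models', refField: 'modelId', as: 'model' },
      { from: 'brands', refField: 'brandId', as: 'brand' },
    ]),
  },
};
console.log(JSON.stringify(facetStage.$facet.trims, null, 2));
```

The helper covers the match, lookup, and projection stages only; a per-branch `$sort` can be appended to the returned array if it is wanted.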
5. Within each branch, sort the results by `textScore` using the `$sort` stage.
{ $sort: { textScore: -1 } }
6. Use `$concatArrays` to combine the results from each branch, unwind the combined results, and replace the root with each combined result.
{
  $project: {
    combinedResults: {
      $concatArrays: ["$brands", "$models", "$trims"]
    }
  }
},
{ $unwind: "$combinedResults" },
{ $replaceRoot: { newRoot: "$combinedResults" } },
7. Sort the combined results by `textScore` using the `$sort` stage. This final sort is required because concatenation groups the results by branch rather than by score.
{ $sort: { textScore: -1 } }
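Steps 6 and 7 can be illustrated without a database. The sketch below mimics them in plain JavaScript on a hand-written stand-in for the single document that `$facet` emits. The function name `mergeBranches` and the score values are made up for illustration.

```javascript
// Stand-in for the one document produced by $facet (scores are invented).
const facetResult = {
  brands: [{ objectType: 'brands', text: 'Porsche', textScore: 1.2 }],
  models: [{ objectType: 'models', text: 'Porsche Carrera GT', textScore: 3.4 }],
  trims: [{ objectType: 'trims', text: 'Porsche 911 Carrera', textScore: 2.8 }],
};

function mergeBranches({ brands = [], models = [], trims = [] }) {
  // Equivalent of $concatArrays followed by $unwind and $replaceRoot:
  // one flat list, still grouped by branch.
  const combined = [...brands, ...models, ...trims];
  // Equivalent of { $sort: { textScore: -1 } }: highest score first.
  return combined.sort((a, b) => b.textScore - a.textScore);
}

const merged = mergeBranches(facetResult);
console.log(merged.map((r) => r.text));
// → [ 'Porsche Carrera GT', 'Porsche 911 Carrera', 'Porsche' ]
```

Before the sort, the list is ordered brands, models, trims regardless of score, which is why the pipeline sorts once more after combining.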
8. Drop the `textScore` field from the final output with a `$project` stage that keeps only the fields to return.
{
  $project: {
    _id: 0,
    objectType: 1,
    text: 1,
    avatar: 1
  }
}
Here’s the complete aggregation pipeline:
db.searches.aggregate([
  // Fuzzy-match the search term against the searches collection.
  {
    $search: {
      index: 'searchesIndex',
      text: {
        query: 'search term',
        path: { wildcard: '*' },
        fuzzy: {}
      }
    }
  },
  // Expose the relevance score so later stages can sort on it.
  {
    $addFields: {
      textScore: { $meta: 'searchScore' }
    }
  },
  // Keep the first 10 matches ($search returns them in score order)
  // before running the lookups.
  {
    $limit: 10
  },
  {
    $facet: {
      // brands: the logo is on the matched document itself.
      brands: [
        { $match: { objectType: 'brands' } },
        {
          $lookup: {
            from: 'brands',
            localField: 'objectId',
            foreignField: '_id',
            as: 'matchedObject'
          }
        },
        { $unwind: '$matchedObject' },
        {
          $project: {
            _id: 0,
            objectType: 1,
            text: '$$ROOT.text',
            avatar: '$matchedObject.logo',
            textScore: 1,
          }
        },
        { $sort: { textScore: -1 } }
      ],
      // models: one extra lookup to reach the parent brand's logo.
      models: [
        { $match: { objectType: 'models' } },
        {
          $lookup: {
            from: 'models',
            localField: 'objectId',
            foreignField: '_id',
            as: 'matchedObject'
          }
        },
        { $unwind: '$matchedObject' },
        {
          $lookup: {
            from: 'brands',
            localField: 'matchedObject.brandId',
            foreignField: '_id',
            as: 'matchedObject.brand'
          }
        },
        { $unwind: '$matchedObject.brand' },
        {
          $project: {
            _id: 0,
            objectType: 1,
            text: '$$ROOT.text',
            avatar: '$matchedObject.brand.logo',
            textScore: 1
          }
        },
        { $sort: { textScore: -1 } }
      ],
      // trims: two extra lookups (model, then brand) to reach the logo.
      trims: [
        { $match: { objectType: 'trims' } },
        {
          $lookup: {
            from: 'trims',
            localField: 'objectId',
            foreignField: '_id',
            as: 'matchedObject'
          }
        },
        { $unwind: '$matchedObject' },
        {
          $lookup: {
            from: 'models',
            localField: 'matchedObject.modelId',
            foreignField: '_id',
            as: 'matchedObject.model'
          }
        },
        { $unwind: '$matchedObject.model' },
        {
          $lookup: {
            from: 'brands',
            localField: 'matchedObject.model.brandId',
            foreignField: '_id',
            as: 'matchedObject.model.brand'
          }
        },
        { $unwind: '$matchedObject.model.brand' },
        {
          $project: {
            _id: 0,
            objectType: 1,
            text: '$$ROOT.text',
            avatar: '$matchedObject.model.brand.logo',
            textScore: 1
          }
        },
        { $sort: { textScore: -1 } }
      ],
    }
  },
  // Flatten the three branches back into one stream of documents.
  {
    $project: {
      results: { $concatArrays: ['$brands', '$models', '$trims'] }
    }
  },
  { $unwind: '$results' },
  { $replaceRoot: { newRoot: '$results' } },
  // Restore global ordering by relevance across all object types.
  {
    $sort: {
      textScore: -1
    }
  },
  // Hide the score from the final output.
  {
    $project: {
      textScore: 0
    }
  }
])
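In an application, the search term and the result limit would normally come from the caller rather than being hard-coded. The sketch below wraps the first three stages in a function for the MongoDB Node.js driver. The name `searchAll` and its parameters are assumptions: `db` is expected to be a connected `Db` instance, and `tailStages` is the array of stages from `$facet` onward, copied unchanged from the pipeline above.

```javascript
// Hypothetical wrapper around the pipeline above.
async function searchAll(db, term, tailStages, limit = 10) {
  const headStages = [
    {
      $search: {
        index: 'searchesIndex',
        text: { query: term, path: { wildcard: '*' }, fuzzy: {} },
      },
    },
    { $addFields: { textScore: { $meta: 'searchScore' } } },
    { $limit: limit },
  ];
  // aggregate() returns a cursor; toArray() resolves to the result documents.
  return db
    .collection('searches')
    .aggregate([...headStages, ...tailStages])
    .toArray();
}

// Usage sketch (not executed here; the URI and database name are placeholders):
//   const { MongoClient } = require('mongodb');
//   const client = await MongoClient.connect(uri);
//   const results = await searchAll(client.db('mydb'), 'porsche carrera', tailStages);
```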
Conclusion
In this article, we’ve demonstrated how to implement a complex fuzzy search in MongoDB that retrieves results from multiple collections using facets and maintains the correct order based on the text score. By using MongoDB’s powerful aggregation pipeline and Atlas Search, you can create flexible and efficient search functionality for your applications.