Influence Search Result Ranking with Function Scores in Atlas Search

When it comes to natural language searching, it’s useful to know how the order of the results for a query was determined. Exact matches might be obvious, but what about situations where not all the results are exact matches due to a fuzzy parameter, the $near operator, or something else?

This is where the document score becomes relevant.

Every document returned by a $search query in MongoDB Atlas Search is assigned a score based on relevance, and the documents included in a result set are returned in order from highest score to lowest.

You can rely on the scoring that Atlas Search determines based on the query operators, or you can customize its behavior using function scoring and tune it to your needs. In this tutorial, we’re going to see how the function option in Atlas Search can be used to influence how results are ranked.

Per the documentation, the function option allows the value of a numeric field to alter the final score of the document. You can specify the numeric field for computing the final score through an expression. With this in mind, let’s look at a few scenarios where this could be useful.

Let’s say that you have a review system like Yelp where the user needs to provide some search criteria such as the type of food they want to eat. By default, you’re probably going to get results based on relevance to your search term as well as the location that you defined. In the examples below, I’m using the sample restaurants data available in MongoDB Atlas.

The $search query (expressed as an aggregation pipeline) to make this search happen in MongoDB might look like the following:

[
    {
        "$search": {
            "text": {
                "query": "korean",
                "path": [ "cuisine" ],
                "fuzzy": {
                    "maxEdits": 2
                }
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "name": 1,
            "cuisine": 1,
            "location": 1,
            "rating": 1,
            "score": {
                "$meta": "searchScore"
            }
        }
    }
]

The above query is a two-stage aggregation pipeline in MongoDB. The first stage searches for “korean” in the “cuisine” document path. The fuzzy option is applied to the search so that spelling mistakes of up to two character edits still match. The documents returned by the first stage might be quite large, so in the second stage we specify which fields to return for every document. This includes a search score that is not part of the original document, but part of the search results.

As a result, you might end up with the following results:

[
    {
        "location": "Jfk International Airport",
        "cuisine": "Korean",
        "name": "Korean Lounge",
        "rating": 2,
        "score": 3.5087265968322754
    },
    {
        "location": "Broadway",
        "cuisine": "Korean",
        "name": "Mill Korean Restaurant",
        "rating": 4,
        "score": 2.995847225189209
    },
    {
        "location": "Northern Boulevard",
        "cuisine": "Korean",
        "name": "Korean Bbq Restaurant",
        "rating": 5,
        "score": 2.995847225189209
    }
]

The default ordering of the documents returned is based on the score value in descending order. The higher the score, the closer your match.

It’s very unlikely that you’re going to want to eat at the restaurants that have a rating below your threshold, even if they match your search term and are within the search location. With the function option we can assign a point system to the rating and perform some arithmetic to give better rated restaurants a boost in your results.

Let’s modify the search query to look like the following:

[
    {
        "$search": {
            "text": {
                "query": "korean",
                "path": [ "cuisine" ],
                "fuzzy": {
                    "maxEdits": 2
                },
                "score": {
                    "function": {
                        "multiply": [
                            {
                                "score": "relevance"
                            },
                            {
                                "path": {
                                    "value": "rating",
                                    "undefined": 1
                                }
                            }
                        ]
                    }
                }
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "name": 1,
            "cuisine": 1,
            "location": 1,
            "rating": 1,
            "score": {
                "$meta": "searchScore"
            }
        }
    }
]

In the above two-stage aggregation pipeline, the part to pay attention to is the following:

"score": {
    "function": {
        "multiply": [
            {
                "score": "relevance"
            },
            {
                "path": {
                    "value": "rating",
                    "undefined": 1
                }
            }
        ]
    }
}

What we’re saying in this part of the $search query is that we want to take the relevance score that we had already seen in the previous example and multiply it by whatever value is in the rating field of the document. This means that the score will potentially be higher if the rating of the restaurant is higher. If the restaurant does not have a rating, then we use a default multiplier value of 1.

If we run this query on the same data as before, we might now get results that look like this:

[
    {
        "location": "Northern Boulevard",
        "cuisine": "Korean",
        "name": "Korean Bbq Restaurant",
        "rating": 5,
        "score": 14.979236125946045
    },
    {
        "location": "Broadway",
        "cuisine": "Korean",
        "name": "Mill Korean Restaurant",
        "rating": 4,
        "score": 11.983388900756836
    },
    {
        "location": "Jfk International Airport",
        "cuisine": "Korean",
        "name": "Korean Lounge",
        "rating": 2,
        "score": 7.017453193664551
    }
]

So now, while “Korean Bbq Restaurant” might be farther away in terms of location, it appears higher in our result set because the rating of the restaurant is higher.
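The arithmetic behind these boosted scores can be reproduced in plain JavaScript. The following is only a sketch of the multiplication, not how Atlas Search computes scores internally. The relevance values are copied from the first result set in this article, while the `hits` array and the `functionScore` helper are names made up for illustration.

```javascript
// Relevance scores copied from the first result set in this article.
const hits = [
    { name: "Korean Lounge", rating: 2, relevance: 3.5087265968322754 },
    { name: "Mill Korean Restaurant", rating: 4, relevance: 2.995847225189209 },
    { name: "Korean Bbq Restaurant", rating: 5, relevance: 2.995847225189209 }
];

// Hypothetical helper that mirrors { "path": { "value": "rating", "undefined": 1 } }:
// multiply the relevance by the rating, or by the fallback when the rating is missing.
function functionScore(hit, fallback = 1) {
    const multiplier = typeof hit.rating === "number" ? hit.rating : fallback;
    return hit.relevance * multiplier;
}

// Compute the boosted score for every hit and sort from highest to lowest.
const ranked = hits
    .map(hit => ({ name: hit.name, score: functionScore(hit) }))
    .sort((a, b) => b.score - a.score);

console.log(ranked);
```

The two restaurants that tied on relevance are now separated by their ratings, and the lower rated airport lounge drops to the bottom even though it had the highest relevance score.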

Increasing the score based on rating is just one example. Another scenario could be to give search result priority to restaurants that are sponsors. A function multiplier could be used based on the sponsorship level.
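As a rough sketch of the sponsorship idea, the same function score could point at a different numeric field. The `sponsorLevel` field and the `buildBoostedSearch` helper below are hypothetical and are not part of the sample restaurants data. The sketch assumes the field holds a number, with higher values for higher sponsorship tiers.

```javascript
// Hypothetical helper that builds a $search stage multiplying the relevance
// score by the value of a numeric field, with 1 as the default multiplier.
function buildBoostedSearch(query, boostField) {
    return {
        "$search": {
            "text": {
                "query": query,
                "path": [ "cuisine" ],
                "score": {
                    "function": {
                        "multiply": [
                            { "score": "relevance" },
                            { "path": { "value": boostField, "undefined": 1 } }
                        ]
                    }
                }
            }
        }
    };
}

// "sponsorLevel" is an assumed field name used only for illustration.
const sponsorPipeline = [
    buildBoostedSearch("korean", "sponsorLevel"),
    {
        "$project": {
            "_id": 0,
            "name": 1,
            "sponsorLevel": 1,
            "score": { "$meta": "searchScore" }
        }
    }
];

console.log(JSON.stringify(sponsorPipeline, null, 4));
```

Restaurants without a sponsorship level would keep their plain relevance score because of the default multiplier of 1.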

Let’s look at a different use case. Say you have an e-commerce website that is running a sale. To push products that are on sale higher in the search results than items that are not on sale, you might use a constant score in combination with a relevancy score.

An aggregation that supports the above example might look like the following:

db.products.aggregate([
    {
        "$search": {
            "compound": { 
                "should": [
                    { 
                        "text": { 
                            "path": "promotions", 
                            "query": "July4Sale", 
                            "score": { 
                                "constant": { 
                                    "value": 1 
                                }
                            }
                        }
                    }
                ],
                "must": [ 
                    { 
                        "text": { 
                            "path": "name", 
                            "query": "bose headphones"
                        }
                    }
                ]
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "name": 1,
            "promotions": 1,
            "score": { "$meta": "searchScore" }
        }
    }
]);

To get into the nitty gritty of the above two-stage pipeline, the first stage uses the compound operator for searching. We’re saying that the search results must match “bose headphones” in the name path, and if a result also contains “July4Sale” in the promotions path, then a constant of one is added to the score for that particular result to boost its ranking.

The should clause doesn’t require its contents to be satisfied, so you could end up with headphone results that are not part of the “July4Sale” promotion. Those results just won’t have their score increased, and therefore show up lower in the list. The second stage of the pipeline just defines which fields should exist in the response.
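To make the effect of the constant score concrete, here is a small sketch in plain JavaScript. It assumes that the score of a compound query is the sum of the scores of the clauses that matched, so the promotion clause contributes exactly 1 when it matches and nothing otherwise. The product names, the `mustScore` values, and the `compoundScore` helper are all made up for illustration.

```javascript
// Hypothetical products. mustScore stands in for the relevance score that the
// "must" clause on the name path would produce for each product.
const products = [
    { name: "Bose Headphones Model A", promotions: [], mustScore: 2.4 },
    { name: "Bose Headphones Model B", promotions: [ "July4Sale" ], mustScore: 2.1 }
];

// Assumption: the compound score is the sum of the matching clause scores.
// The "should" clause adds its constant only when the promotion is present.
function compoundScore(product, promotion, constant = 1) {
    const shouldScore = product.promotions.includes(promotion) ? constant : 0;
    return product.mustScore + shouldScore;
}

// Score every product and sort from highest to lowest.
const rankedProducts = products
    .map(product => ({ name: product.name, score: compoundScore(product, "July4Sale") }))
    .sort((a, b) => b.score - a.score);

console.log(rankedProducts);
```

In this sketch the product on sale ends up first even though its name matched slightly less strongly, which is the boost that the constant score is meant to provide.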

Conclusion

Being able to customize how search result sets are scored can help you deliver more relevant content to your users. While we looked at the function option with the multiply expression and at the constant option, there are other ways you can use function scoring, like replacing the value of a missing field with a constant value or boosting the results of documents with search terms found in a specific path. You can find more information in the Atlas Search documentation.

Don’t forget to check out the MongoDB Community Forums to learn about what other developers are doing with Atlas Search.

This content first appeared on MongoDB.

Nic Raboy

Nic Raboy is an advocate of modern web and mobile development technologies. He has experience in C#, JavaScript, Golang and a variety of frameworks such as Angular, NativeScript, and Unity. Nic writes about his development experiences related to making web and mobile development easier to understand.