
Joining Collections in MongoDB with .NET Core and an Aggregation Pipeline


If you’ve been keeping up with my .NET Core series on MongoDB, you’ll remember that we explored creating a simple console application as well as building a RESTful API with basic CRUD support. In both examples, we used basic filters when interacting with MongoDB from our applications.

But what if we need to do something a bit more complex, like join data from two different MongoDB collections?

In this tutorial, we’re going to take a look at aggregation pipelines and some of the ways that you can work with them in a .NET Core application.

The Requirements

Before we get started, there are a few requirements that must be met to be successful:

  • Have a MongoDB Atlas cluster deployed and configured.
  • Install .NET Core 6+.
  • Install the MongoDB sample data sets.

We will be using .NET Core 6.0 for this particular tutorial. Older or newer versions might work, but there’s a chance that some of the commands may be a little different. The expectation is that you already have a MongoDB Atlas cluster ready to go. This could be a free M0 cluster or better, but you’ll need it properly configured with user roles and network access rules. You’ll also need the MongoDB sample data sets to be attached.

If you need help with this, check out a previous tutorial I wrote on the topic.

A Closer Look at the Data Model and Expected Outcomes

Because we’re expecting to accomplish some fairly complicated things in this tutorial, it’s probably a good idea to break down the data going into it and the data that we’re expecting to come out of it.

In this tutorial, we’re going to be using the sample_mflix database and the movies collection. We’re also going to be using a custom playlist collection that we’re going to add to the sample_mflix database.

To give you an idea of the data that we’re going to be working with, take the following document from the movies collection:

{
    "_id": ObjectId("573a1390f29313caabcd4135"),
    "title": "Blacksmith Scene",
    "plot": "Three men hammer on an anvil and pass a bottle of beer around.",
    "year": 1893,
    // ...
}

Alright, so I didn’t include the entire document because it is actually quite huge. You don’t need to know every field in the document for this example, as long as you’re familiar with the _id field.

Next, let’s look at a document in the proposed playlist collection:

{
    "_id": ObjectId("61d8bb5e2d5fe0c2b8a1007d"),
    "username": "nraboy",
    "items": [
        "573a1390f29313caabcd42e8",
        "573a1391f29313caabcd8a82"
    ]
}

Knowing the fields in the above document is important as they’ll be used throughout our aggregation pipelines.

One of the most important things to take note of between the two collections is the fact that the _id fields are ObjectId and the values in the items field are strings. More on this as we progress.
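As a quick illustration of that mismatch, the MongoDB.Bson library (which ships with the driver) can convert between the two representations in application code, and this is essentially what we’ll later ask the server to do inside the pipeline:

```csharp
using MongoDB.Bson;

// A playlist stores its movie references as plain strings
string itemValue = "573a1390f29313caabcd4135";

// The movies collection stores its _id values as ObjectId
ObjectId movieId = ObjectId.Parse(itemValue);

// The underlying hex value is the same; only the BSON type differs,
// and a join will only match when the types line up
Console.WriteLine(movieId.ToString() == itemValue); // True
```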

Now that we know our input documents, let’s take a look at what we’re expecting as a result of our queries. If I were to query for a playlist, I don’t want just the id values for each of the movies. I want them fully expanded, like the following:

{
    "_id": ObjectId("61d8bb5e2d5fe0c2b8a1007d"),
    "username": "nraboy",
    "items": [
        {
            "_id": ObjectId("573a1390f29313caabcd4135"),
            "title": "Blacksmith Scene",
            "plot": "Three men hammer on an anvil and pass a bottle of beer around.",
            "year": 1893,
            // ...
        },
        {
            "_id": ObjectId("573a1391f29313caabcd8a82"),
            "title": "The Terminator",
            "plot": "A movie about some killer robots.",
            "year": 1984,
            // ...
        }
    ]
}

This is where aggregation pipelines and joins come in, because we can’t get this result with a normal filter on a Find operation, short of performing multiple Find operations.
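To put that in perspective, here is a rough, hypothetical sketch of the multiple Find approach, where playlistCollection and movieCollection are assumed references to the two collections:

```csharp
// Hypothetical alternative: one Find for the playlist, then one Find per movie id
BsonDocument playlist = playlistCollection
    .Find(new BsonDocument { { "username", "nraboy" } })
    .FirstOrDefault();

List<BsonDocument> movies = new List<BsonDocument>();

foreach (BsonValue item in playlist["items"].AsBsonArray) {
    // Every iteration here is a separate round trip to the database
    BsonDocument movie = movieCollection
        .Find(new BsonDocument { { "_id", new ObjectId(item.AsString) } })
        .FirstOrDefault();
    movies.Add(movie);
}
```

An aggregation pipeline lets the server do all of this work in a single request instead.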

Creating a New .NET Core Console Application with MongoDB Support

To keep things simple, we’re going to be building a console application that uses our aggregation pipeline. You can take the logic and apply it towards a web application if that is what you’re interested in.

From the CLI, execute the following:

dotnet new console -o MongoExample
cd MongoExample
dotnet add package MongoDB.Driver

The above commands will create a new .NET Core project and install the latest MongoDB driver for C#. Everything we do next will happen in the project’s “Program.cs” file.

Open the “Program.cs” file and add the following C# code:

using MongoDB.Driver;
using MongoDB.Bson;

MongoClient client = new MongoClient("ATLAS_URI_HERE");

IMongoCollection<BsonDocument> playlistCollection = client.GetDatabase("sample_mflix").GetCollection<BsonDocument>("playlist");

List<BsonDocument> results = playlistCollection.Find(new BsonDocument()).ToList();

foreach(BsonDocument result in results) {
    Console.WriteLine(result["username"] + ": " + string.Join(", ", result["items"].AsBsonArray));
}

The above code will connect to a MongoDB cluster, get a reference to our playlist collection, and dump all the documents from that collection into the console. Finding and returning all the documents in the collection is not a requirement for the aggregation pipeline, but it might help with the learning process.

The ATLAS_URI_HERE string can be obtained from the MongoDB Atlas Dashboard after clicking “Connect” for a particular cluster.

Building an Aggregation Pipeline with .NET Core Using Raw BsonDocument Stages

We’re going to explore a few different options towards creating an aggregation pipeline query with .NET Core. The first will use raw BsonDocument type data.

We know our input data and we know our expected outcome, so we need to come up with a few pipeline stages to bring it together.

Let’s start with the first stage:

BsonDocument pipelineStage1 = new BsonDocument{
    {
        "$match", new BsonDocument{
            { "username", "nraboy" }
        }
    }
};

The first stage of this pipeline uses the $match operator to find only documents where the username is “nraboy.” This could be more than one because we’re not treating username as a unique field.

With the filter in place, let’s move to the next stage:

BsonDocument pipelineStage2 = new BsonDocument{
    { 
        "$project", new BsonDocument{
            { "_id", 1 },
            { "username", 1 },
            { 
                "items", new BsonDocument{
                    {
                        "$map", new BsonDocument{
                            { "input", "$items" },
                            { "as", "item" },
                            {
                                "in", new BsonDocument{
                                    {
                                        "$convert", new BsonDocument{
                                            { "input", "$$item" },
                                            { "to", "objectId" }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
};

Remember how the document _id fields were ObjectId and the values in the items array were strings? For the join to be successful, they need to be of the same type. The second pipeline stage is more of a manipulation stage with the $project operator. We’re defining the fields we want passed to the next stage, but we’re also modifying some of them, in particular the items field. Using the $map operator with $convert, we can take each string value and convert it to an ObjectId.

If your items array contained ObjectId instead of string values, this particular stage wouldn’t be necessary. It might also not be necessary if you’re using POCO classes instead of BsonDocument types. That is a lesson for another day though.

With our item values mapped correctly, we can push them to the next stage in the pipeline:

BsonDocument pipelineStage3 = new BsonDocument{
    {
        "$lookup", new BsonDocument{
            { "from", "movies" },
            { "localField", "items" },
            { "foreignField", "_id" },
            { "as", "movies" }
        }
    }
};

The above pipeline stage is where the JOIN operation actually happens. We’re looking into the movies collection and we’re using the ObjectId fields from our playlist collection to join them to the _id field of our movies collection. The output from this JOIN will be stored in a new movies field.

The $lookup is like saying the following:

SELECT movies
FROM playlist
JOIN movies ON playlist.items = movies._id

Of course there is more to it than the above SQL statement because items is an array, something you can’t natively work with in most SQL databases.
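If you’d like to prototype this stage outside of .NET, the same $lookup can be written in MongoDB shell syntax, which is how the BsonDocument above serializes. Keep in mind that, run on its own, this assumes the items values have already been converted to ObjectId by the earlier $project stage:

```javascript
db.playlist.aggregate([
    {
        "$lookup": {
            "from": "movies",
            "localField": "items",
            "foreignField": "_id",
            "as": "movies"
        }
    }
])
```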

So as of right now, we have our joined data. However, it’s not quite as elegant as what we wanted in our final outcome. This is because the $lookup output is an array, which can leave us with nested arrays to deal with. Remember, items was an array, and the new movies field is an array as well. Not the most pleasant thing to work with, so we probably want to further manipulate the data in another stage.

BsonDocument pipelineStage4 = new BsonDocument{
    { "$unwind", "$movies" }
};

The above stage will take our new movies field and flatten it out with the $unwind operator. The $unwind operator basically takes each element of an array and creates a new result item to sit adjacent to the rest of the fields of the parent document. So if you have, for example, one document that has an array with two elements, after doing an $unwind, you’ll have two documents.
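For example, a playlist document entering $unwind with two joined movies, abbreviated here for readability, would be split into two output documents, one per element of the movies array:

```javascript
// Before $unwind: one document, movies is an array
{ "username": "nraboy", "movies": [ { "title": "Blacksmith Scene" }, { "title": "The Terminator" } ] }

// After $unwind: two documents, movies is a single embedded document in each
{ "username": "nraboy", "movies": { "title": "Blacksmith Scene" } }
{ "username": "nraboy", "movies": { "title": "The Terminator" } }
```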

Our end goal, though, is to end up with a single dimension array of movies, so we can fix this with another pipeline stage.

BsonDocument pipelineStage5 = new BsonDocument{
    {
        "$group", new BsonDocument{
            { "_id", "$_id" },
            { 
                "username", new BsonDocument{
                    { "$first", "$username" }
                } 
            },
            { 
                "movies", new BsonDocument{
                    { "$addToSet", "$movies" }
                }
            }
        }
    }
};

The above stage will group our documents and add our unwound movies to a new movies field, one that isn’t multidimensional.

So let’s bring the pipeline stages together so they can be run in our application.

BsonDocument[] pipeline = new BsonDocument[] { 
    pipelineStage1, 
    pipelineStage2, 
    pipelineStage3, 
    pipelineStage4, 
    pipelineStage5 
};

List<BsonDocument> pResults = playlistCollection.Aggregate<BsonDocument>(pipeline).ToList();

foreach(BsonDocument pResult in pResults) {
    Console.WriteLine(pResult);
}

Executing the code thus far should give us our expected outcome in terms of data and format.

Now, you might be thinking that the above five-stage pipeline was a lot to handle for a JOIN operation. There are a few things that you should be aware of:

  • Our _id values and our items values were not of the same type, which resulted in an extra stage.
  • Our values to join were in an array, not a one-to-one relationship.

What I’m trying to say is that the length and complexity of your pipeline is going to depend on how you’ve chosen to model your data.

Using a Fluent API to Build Aggregation Pipeline Stages

Let’s look at another way to accomplish our desired outcome. We can make use of the Fluent API that MongoDB offers instead of creating an array of pipeline stages.

Take a look at the following:

var pResults = playlistCollection.Aggregate()
    .Match(new BsonDocument{{ "username", "nraboy" }})
    .Project(new BsonDocument{
            { "_id", 1 },
            { "username", 1 },
            {
                "items", new BsonDocument{
                    {
                        "$map", new BsonDocument{
                            { "input", "$items" },
                            { "as", "item" },
                            {
                                "in", new BsonDocument{
                                    {
                                        "$convert", new BsonDocument{
                                            { "input", "$$item" },
                                            { "to", "objectId" }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        })
    .Lookup("movies", "items", "_id", "movies")
    .Unwind("movies")
    .Group(new BsonDocument{
            { "_id", "$_id" },
            {
                "username", new BsonDocument{
                    { "$first", "$username" }
                }
            },
            {
                "movies", new BsonDocument{
                    { "$addToSet", "$movies" }
                }
            }
        })
    .ToList();

foreach(var pResult in pResults) {
    Console.WriteLine(pResult);
}

In the above example, we used methods such as Match, Project, Lookup, Unwind, and Group to get our final result. For some of these methods, we didn’t need to use a BsonDocument like we saw in the previous example.

Conclusion

You just saw two ways to do a MongoDB aggregation pipeline for joining collections within a .NET Core application. Like previously mentioned, there are a few ways to accomplish what we want, all of which are going to be dependent on how you’ve chosen to model the data within your collections.

There is a third way, which we’ll explore in another tutorial, that uses LINQ to get the job done.

If you have questions about anything you saw in this tutorial, drop by the MongoDB Community Forums and get involved!


This content first appeared on MongoDB.

Nic Raboy

Nic Raboy is an advocate of modern web and mobile development technologies. He has experience in C#, JavaScript, Golang and a variety of frameworks such as Angular, NativeScript, and Unity. Nic writes about his development experiences related to making web and mobile development easier to understand.