Aggregating with MongoDB

Moving from a relational database to a NoSQL data store isn’t without costs. If you’re like me and largely living in a Java world, one cost is immediately losing the expressiveness of ORM tools like Hibernate or JPA. Beyond Java, you also lose many of SQL’s projection and aggregation functions. For example, few applications can easily do without the count(), avg(), min() and max() functions.

MongoDB, my choice for NoSQL data stores, addresses this shortcoming with its built-in map/reduce functionality. Suppose you want to count the number of blog posts Bill has made. In a basic relational model, the SQL statement is straightforward:

select count(*) from blogposts where username='Bill';

Things are not so simple in MongoDB. To accomplish the same thing as the SQL statement above, you need to break out a map/reduce-style operation. One way to express it is with the shell's group command:

db.blogPosts.group({
    cond: {username: 'Bill'},
    reduce: function(obj, prev) { prev.count++; },
    initial: {count: 0}
});

We can see how this call maps to the SQL statement. The ‘cond’ parameter plays the role of the WHERE clause: here it selects documents whose username attribute equals “Bill”. Next we define the reduce function, which is called once for each document that matches the condition. It receives the current document (obj) and the result object (prev), and updates the result; here it simply increments prev.count. Finally, the ‘initial’ parameter sets the starting values of the attributes on that result object. Executing the command above returns the count:

[ { "count" : 27 } ]
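To make the roles of cond, reduce and initial more concrete, here is a minimal plain-Java sketch of what the group call does with them, assuming documents are modeled as Map<String, Object>. The GroupSketch class and its group method are hypothetical illustrations: they do not talk to MongoDB or use any driver, and they ignore grouping keys, so every matching document is folded into a single result object.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Hypothetical, illustrative class: an in-memory analogue of the shell's
// group({cond, reduce, initial}) call. No MongoDB driver is involved.
public class GroupSketch {

    public static Map<String, Object> group(List<Map<String, Object>> docs,
                                            Predicate<Map<String, Object>> cond,
                                            BiConsumer<Map<String, Object>, Map<String, Object>> reduce,
                                            Map<String, Object> initial) {
        // Copy "initial" so the caller's starting values are not mutated.
        Map<String, Object> prev = new HashMap<>(initial);
        for (Map<String, Object> obj : docs) {
            if (cond.test(obj)) {         // cond: keep only matching documents
                reduce.accept(obj, prev); // reduce: fold each match into prev
            }
        }
        return prev;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> blogPosts = List.of(
            Map.of("username", "Bill", "title", "First post"),
            Map.of("username", "Ann", "title", "Hello"),
            Map.of("username", "Bill", "title", "Second post"));

        Map<String, Object> result = group(
            blogPosts,
            obj -> "Bill".equals(obj.get("username")),                          // cond: {username: 'Bill'}
            (obj, prev) -> prev.put("count", (Integer) prev.get("count") + 1), // prev.count++
            Map.of("count", 0));                                               // initial: {count: 0}

        System.out.println(result);
    }
}
```

Running main prints {count=2}, since two of the three sample posts belong to Bill.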

If you’re trying to accomplish this aggregation with Spring Data MongoDB (part of the Spring Data project), it’s quite a bit more cumbersome:

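import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.GroupCommand;
import org.springframework.data.mongodb.core.MongoTemplate;

public DBObject countPostsByBill(MongoTemplate mongoTemplate) {
    DBCollection collection = mongoTemplate.getCollection("blogPosts");
    GroupCommand cmd = new GroupCommand(collection,
        null,                                  // keys: none, so all matches form one group
        new BasicDBObject("username", "Bill"), // cond
        new BasicDBObject("count", 0),         // initial
        "function(obj,prev) {prev.count++;}",  // reduce
        null);                                 // finalize: not used
    return collection.group(cmd);
}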

The ‘collection.group(cmd)’ statement returns a DBObject containing the count results.
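The trailing null in the GroupCommand constructor is the optional finalize function, which MongoDB runs on each group's result object after all the reductions are done. That second step is how a group call can produce something like SQL's avg(): reduce accumulates a running total and a count, and finalize divides them. Below is a minimal plain-Java sketch of that two-phase pattern. FinalizeSketch, averageComments and the numeric comments field are hypothetical names chosen for illustration, and no MongoDB driver is involved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, illustrative class: reduce accumulates, finalize derives avg().
// Plain Java only; it mirrors the shape of group's reduce/finalize, not its API.
public class FinalizeSketch {

    // Mirrors a reduce like: function(obj, prev) { prev.total += obj.comments; prev.count++; }
    static void reduce(Map<String, Object> obj, Map<String, Object> prev) {
        prev.put("total", (Integer) prev.get("total") + (Integer) obj.get("comments"));
        prev.put("count", (Integer) prev.get("count") + 1);
    }

    // Mirrors a finalize like: function(out) { out.avg = out.total / out.count; }
    static void finalizeResult(Map<String, Object> out) {
        int count = (Integer) out.get("count");
        // Sketch-specific choice: report 0.0 when nothing matched instead of dividing by zero.
        out.put("avg", count == 0 ? 0.0 : (Integer) out.get("total") / (double) count);
    }

    public static Map<String, Object> averageComments(List<Map<String, Object>> posts, String username) {
        Map<String, Object> prev = new HashMap<>(Map.of("total", 0, "count", 0)); // initial
        for (Map<String, Object> obj : posts) {
            if (username.equals(obj.get("username"))) { // cond
                reduce(obj, prev);
            }
        }
        finalizeResult(prev); // runs once, after every match has been reduced
        return prev;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> posts = List.of(
            Map.of("username", "Bill", "comments", 4),
            Map.of("username", "Bill", "comments", 7),
            Map.of("username", "Ann", "comments", 10));
        System.out.println(averageComments(posts, "Bill").get("avg"));
    }
}
```

For Bill's two sample posts with 4 and 7 comments, main prints 5.5. Keeping the division in a separate finalize step matters because reduce only ever sees one document at a time and cannot know when the last match has arrived.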

MongoDB 2.2 introduces a simplified aggregation framework that makes these operations much easier to understand and implement. This new framework will be the subject of a future post. In the meantime, I hope this has shed some light on how to perform operations in MongoDB that you would normally take for granted in a relational database. This post has only scratched the surface of aggregation in MongoDB; for the full scoop, refer to the MongoDB Aggregation Framework documentation.
