xinglinkedinenvelopebubblesgoogleplusfacebooktwitterfeedgithub

Finding a random document in MongoDB (with benchmarks)

15th February 2014 | by Adam Beres-Deak | mongodb, random

Finding a random element is not a trivial task when using MongoDB - especially when performance is crutial.

Finding one random document - Method 1

  1. First of all you have to count how many documents you have in the collection. Optionally you can provide a filter condition (query).

    N = db.myCollection.count(query)

  2. Then you have to generate a random number which is less than the number you counted before.

    R = Math.floor(Math.random() * N)

  3. Then skip that many records and retrieve the next one. If you provided a query at the first step, here you have to use it as well.

    db.collection.find(query).limit(1).skip(R))

Let's see an example

var query = { state: 'OK' };
var n = db.myCollection.count(query);
var r = Math.floor(Math.random() * n);
var randomElement = db.myCollection.find(query).limit(1).skip(r);

Pro:

The data can be intact, no preparation is needed.

Con:

Finding one random document - Method 2

For this method to work, the data has too meet some constraints:

When the data is all set up, querying is rather easy:

var query = {
    state: 'OK',
    rnd: {
        $gte: Math.random()
    }
};

var randomElement = db.myCollection.findOne({ $query: query, $orderby: { rnd: 1 } });

Pro:

Con:

Finding one random document - Method 2.5

There is a simpler variant of the second method. It doesn't retrieve a truly random document, but it may be enough for the specific case.

var query = {
    state: 'OK',
    rnd: {
        $gte: Math.random()
    }
};

var randomElement = db.myCollection.findOne(query);

Please note that there is no $orderby. This can improve performance. The price is that documents are sorted in "find order" rather than "random order". But this could be fine in many cases.

Pro:

Con:

Benchmarks

I created a very simple benchmark. The setup looks like this:

var i = 1000000;
while(i) {
    db.test.save({
        name: "some lorem ipsum not important",
        rnd: Math.random()
    });
    i--;
}
db.test.ensureIndex({ rnd: 1 });

Then I ran all the three methods on my computer with a local database, 10000 times each.

var startDate = new Date();
var i = 10000;
while (i) {
    var n = db.test.count();
    var r = Math.floor(Math.random() * n);
    var randomElement = db.test.find().limit(1).skip(r);
    i--;
}
var t1 = new Date() - startDate;


var startDate = new Date();
var i = 10000;
while (i) {
    var query = {
        rnd: {
            $gte: Math.random()
        }
    };

    var randomElement = db.myCollection.findOne({ $query: query, $orderby: { rnd: 1 } });
    i--;
}
var t2 = new Date() - startDate;


var startDate = new Date();
var i = 10000;
while (i) {
    var query = {
        rnd: {
            $gte: Math.random()
        }
    };

    var randomElement = db.myCollection.findOne(query);
    i--;
}
var t3 = new Date() - startDate;


print(t1, t2, t3);

And the winner is "Method 2.5"

Here are my benchmark results:

MethodTime for 10,000 runs (seconds)
Method 2.52.234
Method 22.297
Method 12.813

Although the differences are not quite big, the winner is definitely "Method 2.5". I think, when using more realistic data, the differences are also going to be much larger.

Do you have your own benchmark results with different data? Please feel free to comment it.

by Adam Beres-Deak

| Share | Tweet | Share | Share

Latest blog posts

Displaying icons with custom elements 14th October 2015

Plain JavaScript event delegation 26th January 2015

After the first year of blogging - what happened on my blog in 2014? 1st January 2015

Better webfont loading with using localStorage and providing WOFF2 support 18th December 2014

Worth watching: Douglas Crockford speaking about the new good parts of JavaScript in 2014 20th October 2014