Rabbit mongoose, part one: getting data faster

Mongoose makes developers’ lives easier and is easy to use. But the examples we’ve all seen are not the fastest way to use it. Here are some tips that can help when dealing with huge amounts of data.

♻ Foreword

I’m a fan of Promises, so I’m going to use them in my examples. However, the same can be achieved using callbacks.

 

♻ Plain JavaScript objects

By default, Mongoose returns its own objects, not the plain JavaScript objects returned by MongoDB’s driver. Mongoose objects have useful methods, like save.

  • This is handy… when we need those methods.
  • Transforming plain JavaScript objects into Mongoose’s magic objects has a performance cost… even when we don’t need those extra methods.

  The conclusion is obvious: avoid mongoose objects whenever they are not needed.

You probably already know, but in case you don’t: many model methods return query objects. Query objects have a lean option that prevents the transformation into Mongoose’s magic objects. Using lean is straightforward:

MyModel.find(…).lean().then(…);

 

♻ Get only what we need: fields list

By default, Mongoose returns full documents, with all their fields. But what if we need only some fields? All find methods accept a fields filter. If I want only the name and age fields, I write:

MyModel.find(…, 'name age').then(…);

The above example selects only the name and age fields. The opposite also exists:

MyModel.find(…, '-name -age').then(…);

This gets all fields except name and age. Note that you can exclude a field by default when declaring the model: see Selection in schema type. This is handy for security: if passwords or other sensitive information are stored, deselecting them by default ensures they are loaded only when needed, which reduces the risk of disclosure.
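Here is a minimal sketch of that schema-level exclusion. It assumes a hypothetical user model with name, email and password fields. The definition object is what you would pass to new mongoose.Schema(...). The helper findUserForLogin is also hypothetical: it shows how a query can opt back in to the hidden field with the '+password' selection syntax.

```javascript
'use strict';

// Hypothetical schema definition, meant to be passed to `new mongoose.Schema(...)`.
// `select: false` keeps `password` out of query results unless explicitly requested.
const userDefinition = {
  name: String,
  email: String,
  password: { type: String, select: false },
};

// Hypothetical login lookup. `UserModel` is assumed to be a Mongoose model built
// from the schema above; '+password' re-includes the deselected field for this query only.
function findUserForLogin(UserModel, email) {
  return UserModel.findOne({ email })
    .select('+password')
    .lean()   // plain object: we only need to read the hash
    .exec();  // returns a Promise
}
```

Because the model is passed in as a parameter, the helper can be exercised without a database connection.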

 

♻ Get only what we need: array elements

With the positional projection operator ($) or $elemMatch, query results contain only the first array element matching the query. This is useful when we need one subdocument stored in an array. Without them, a query returns all the array elements.

Let’s say we want the name and professional addresses of our contacts. We have to write a query that selects the contacts with an address of type pro and returns the expected fields:

ContactsModel.find({ 'addresses.type': 'pro' }, { _id: 1, name: 1, 'addresses.$': 1 }).then(…);

'addresses.$' is the way to get only the matched address (the same can be done using $elemMatch).
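To make the effect of addresses.$ concrete, here is a plain JavaScript sketch of what comes back for one contact. This is not how MongoDB implements it, since the projection runs server-side. The contact data and the projectFirstProAddress helper are made up for illustration. Even when several addresses match, only the first one is returned.

```javascript
'use strict';

// Illustration only: mimics the result of the projection
// { _id: 1, name: 1, 'addresses.$': 1 } for a contact matched by
// { 'addresses.type': 'pro' }. MongoDB does this on the server.
function projectFirstProAddress(contact) {
  const first = contact.addresses.find((address) => address.type === 'pro');
  return { _id: contact._id, name: contact.name, addresses: first ? [first] : [] };
}

// Hypothetical stored document: two 'pro' addresses and an unrequested field.
const contact = {
  _id: 1,
  name: 'Ada',
  phone: '555-0100',
  addresses: [
    { type: 'home', city: 'Lyon' },
    { type: 'pro', city: 'Paris' },
    { type: 'pro', city: 'Nantes' },
  ],
};

console.log(projectFirstProAddress(contact));
// { _id: 1, name: 'Ada', addresses: [ { type: 'pro', city: 'Paris' } ] }
```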

 

💡 Have you noticed the way fields are selected in the above example? Here I use an object and not a string as in the previous section. '_id name addresses.$' and { _id: 1, name: 1, 'addresses.$': 1 } have the same meaning, but the object form is the only one understood by MongoDB’s native driver: when we use the string form, Mongoose generates the matching object. The object form, although more verbose, is slightly faster.
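The string-to-object translation can be pictured with this simplified sketch. toProjection is a hypothetical helper, not Mongoose’s actual code, and it ignores extras such as Mongoose’s '+field' syntax.

```javascript
'use strict';

// Simplified illustration of turning a space-separated field list into the
// projection object the driver expects: '-field' excludes, 'field' includes.
function toProjection(fields) {
  const projection = {};
  for (const field of fields.trim().split(/\s+/)) {
    if (field === '') continue; // empty input produces an empty projection
    if (field.startsWith('-')) {
      projection[field.slice(1)] = 0;
    } else {
      projection[field] = 1;
    }
  }
  return projection;
}

console.log(toProjection('_id name addresses.$')); // { _id: 1, name: 1, 'addresses.$': 1 }
console.log(toProjection('-name -age'));           // { name: 0, age: 0 }
```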

 


 

This concludes my tips about Mongoose optimization. I hope you find them useful. Note that this post focuses only on coding; you can find tips about Mongoose configuration in the Mongoose docs.

 

  🙋 See you and have a nice code

 


Effective NodeJS, part two: setImmediate & process.nextTick

As explained in my previous posts, the event loop is single-threaded. If some code spends too much time in it, the event loop becomes a bottleneck: callbacks and events processing are delayed. An easy solution is to split the code into “chunks”: the event loop thread processes a chunk of code, then callbacks, then events, then another chunk, then callbacks, then events, and so on. Admittedly, such split code completes more slowly than one-block code, but at least the whole application stays responsive.

Node provides two means of code “chunking”: setImmediate and process.nextTick. To use them properly, one needs to understand how they fit into the event loop and how they interact with callbacks and events processing.
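Before looking at the ordering details, here is a minimal sketch of the chunking idea itself. It assumes a CPU-bound task (summing a large array) that can be cut into fixed-size slices. sumInChunks and the chunk size are illustrative choices, not something Node provides.

```javascript
'use strict';

// Sums `numbers` in slices of `chunkSize` items (must be > 0), yielding to the
// event loop with setImmediate between slices so pending events can be processed.
function sumInChunks(numbers, chunkSize) {
  return new Promise((resolve) => {
    let total = 0;
    let index = 0;

    function processChunk() {
      const end = Math.min(index + chunkSize, numbers.length);
      for (; index < end; index += 1) {
        total += numbers[index];
      }
      if (index < numbers.length) {
        setImmediate(processChunk); // schedule the next slice on a later iteration
      } else {
        resolve(total);
      }
    }

    processChunk();
  });
}

// Usage: 1 + 2 + ... + 100000, processed 10000 items at a time.
const numbers = Array.from({ length: 100000 }, (_, i) => i + 1);
sumInChunks(numbers, 10000).then((total) => console.log(total)); // 5000050000
```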

 

First example

setTimeout(() => {
    console.log('Timer 1');
}, 0);

setImmediate(() => {
    console.log('Immediate 1');
});

setImmediate(() => {
    console.log('Immediate 2');
});

process.nextTick(() => {
    console.log('Next tick 2');
});

setTimeout(() => {
    console.log('Timer 2');
}, 0);

(Timers are also handled by the event loop, so I added them to get a better overview of an event loop iteration.)

With Node.js 4.x, this produces the following output:

Next tick 2
Timer 1
Timer 2
Immediate 1
Immediate 2

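😯 The output does not follow the code order, and reordering the calls would not change this grouping: ticks first, then timers, then immediates. There is one caveat: when these calls are made from the main module, as here, Node does not guarantee whether a 0 ms timer fires before or after setImmediate, so some runs may print the immediates first. This shows how an event loop iteration behaves:

  • First, ticks and callbacks are executed
  • Then timers are processed
  • And finally, events and setImmediate callbacks are processed.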

 

Do you see how nextTick and setImmediate differ?

 

With nextTick, the code is executed at the beginning of the iteration, so it delays events processing. With setImmediate, the code is executed at the end, so it is delayed by callbacks and timers but does not delay events processing. Now it’s up to you to decide what your priority is.
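Node’s documentation describes one case where the order is fully deterministic. When the calls are made from inside an I/O callback, setImmediate always runs before a 0 ms timer. Here is a small sketch that records the order. It uses fs.stat on the current directory as an arbitrary I/O operation, and orderInsideIoCallback is a hypothetical name.

```javascript
'use strict';
const fs = require('fs');

// Schedules a tick, an immediate and a 0 ms timer from inside an I/O callback
// and resolves with the order in which they ran.
function orderInsideIoCallback() {
  return new Promise((resolve, reject) => {
    fs.stat(process.cwd(), (err) => {
      if (err) return reject(err);
      const order = [];
      setTimeout(() => {
        order.push('timeout'); // runs last: timers come on the next iteration
        resolve(order);
      }, 0);
      setImmediate(() => order.push('immediate')); // check phase, same iteration
      process.nextTick(() => order.push('nextTick')); // right after this callback
    });
  });
}

orderInsideIoCallback().then((order) => console.log(order));
// [ 'nextTick', 'immediate', 'timeout' ]
```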

 

Second example

'use strict';

setTimeout(() => {
    console.log('Timer 1');
}, 0);

setImmediate(() => {
    console.log('Immediate 1');
    setImmediate(() => {
        console.log('Immediate from immediate 1');
    });
    process.nextTick(() => {
        console.log('Next tick from Immediate 1');
    });
    setTimeout(() => {
        console.log('Timer from Immediate 1');
    }, 0);
});

process.nextTick(() => {
    console.log('Next tick 1');
});

setImmediate(() => {
    console.log('Immediate 2');
});

process.nextTick(() => {
    console.log('Next tick 2');
});

This produces the output:

Next tick 1
Next tick 2
Timer 1
Immediate 1
Immediate 2
Next tick from Immediate 1
Timer from Immediate 1
Immediate from immediate 1

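Here, no surprise: the first event loop iteration executes the ticks, timers and setImmediate callbacks, then the second iteration executes the ticks, timers and setImmediate callbacks added in the first setImmediate. (This output comes from Node 4.x. Since Node 11, a tick queued inside a setImmediate callback runs as soon as that callback returns, so newer versions print “Next tick from Immediate 1” right after “Immediate 1”.) But when one does the same with nextTick, there’s a surprise…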

 

Third example

'use strict';

setTimeout( () => {
    console.log('Timer 1');
}, 0);

setImmediate( () => {
    console.log('Immediate 1');
});

process.nextTick( () => {
    console.log('Next tick 1');
    process.nextTick( () => { 
        console.log('Next tick from next tick 1');
    });
    setImmediate( () => { 
        console.log('Immediate from next tick 1');
    });
    setTimeout( () => {
        console.log('Timer from next tick 1');
    }, 0);
});

setImmediate( () => {
    console.log('Immediate 2');
});

process.nextTick( () => {
    console.log('Next tick 2');
});

This leads to the output:

Next tick 1
Next tick 2
Next tick from next tick 1
Timer 1
Timer from next tick 1
Immediate 1
Immediate 2
Immediate from next tick 1

Ticks, timers and setImmediate callbacks added in the nextTick are executed in the same iteration, so “Next tick from next tick 1” delays setImmediate and events processing. And if it called yet another nextTick, that one would also be processed before the loop moves on and would also delay events processing, and so on…

 

Conclusion

One can “chunk” code using setImmediate or nextTick depending on its priority (chunked code executed before or after events). But when it comes to recursion (chunked code using setImmediate/nextTick to run code that itself calls setImmediate/nextTick), one should avoid nextTick, because it would postpone events processing and prevent NodeJS from behaving reactively.
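The difference can be measured with a rough sketch. It assumes a 0 ms timer is a fair stand-in for “events waiting to be processed”. chunksBeforeTimer is a hypothetical helper that runs a recursive chain of chunks and reports how many had completed when the timer got its turn. With nextTick the whole chain runs first. With setImmediate the timer fires part-way through.

```javascript
'use strict';

// Runs `steps` chunks, each one scheduling the next with `schedule`, and
// resolves with the number of chunks completed when a 0 ms timer fired.
function chunksBeforeTimer(schedule, steps) {
  return new Promise((resolve) => {
    let done = 0;
    let doneWhenTimerFired = null;

    setTimeout(() => {
      doneWhenTimerFired = done;
    }, 0);

    // After the last chunk, poll (via setImmediate) until the timer has fired.
    function waitForTimer() {
      if (doneWhenTimerFired === null) {
        setImmediate(waitForTimer);
      } else {
        resolve(doneWhenTimerFired);
      }
    }

    function chunk() {
      done += 1;
      if (done < steps) {
        schedule(chunk); // recursion: each chunk queues the next one
      } else {
        waitForTimer();
      }
    }

    schedule(chunk);
  });
}

const STEPS = 100000;
chunksBeforeTimer((fn) => process.nextTick(fn), STEPS)
  .then((n) => console.log(`nextTick: timer fired after ${n} of ${STEPS} chunks`))
  .then(() => chunksBeforeTimer((fn) => setImmediate(fn), STEPS))
  .then((n) => console.log(`setImmediate: timer fired after ${n} of ${STEPS} chunks`));
```

The setImmediate count depends on the machine. The nextTick count is always the full number of chunks, because the tick queue is drained before the loop can reach the timers.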

 


 

This concludes my NodeJS overview series. My next post will be about how to use Mongoose in an optimized way.

 

See you and have a nice code 🙋

Effective NodeJS, part one

NodeJS, what for?


 

 

Node is not suited for CPU-heavy applications, but it is fine for REST servers, chat servers or web gaming: Node is made for processing lightweight requests.

 

One should consider Node for serving requests issued by single-page applications written with Angular or Backbone. This is where Node is at its best. But you can also use Node to generate HTML using tools like Handlebars, and Sails lets you create applications with an MVC architecture. To be complete, streams allow fast processing of huge amounts of data, making Node a tool of choice for ETL.

 

♻ Effective application architecture

At the time of writing (February 2016), clustering a single Node app is the buzzword. It consists of forking the Node process; HTTP connections are then spread across the child processes. To find out how this can be done, I invite you to read this post.
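The linked post is not reproduced here. As a rough idea, a minimal sketch using Node’s built-in cluster module could look like the following. The port, the one-worker-per-CPU choice and the startCluster name are assumptions for illustration.

```javascript
'use strict';
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Forks one worker per CPU core; the workers share the listening port and
// incoming connections are distributed among them.
function startCluster(port) {
  // `cluster.isPrimary` exists since Node 16; older versions only have `isMaster`.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    const workers = Math.max(1, os.cpus().length);
    for (let i = 0; i < workers; i += 1) {
      cluster.fork();
    }
    cluster.on('exit', (worker, code) => {
      console.log(`worker ${worker.process.pid} exited with code ${code}`);
    });
  } else {
    // Each worker runs its own event loop and HTTP server.
    http.createServer((req, res) => {
      res.end(`handled by process ${process.pid}\n`);
    }).listen(port);
  }
}

// Usage: call startCluster(8000) at the top level of the script, so that the
// forked workers (which re-run the same script) reach the worker branch.
```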

 

♻ Code: lessons I’ve learned

When I started with Node, I needed a way to hash passwords with a salt before storing them in a database. For this I first used a library found on npms.org. It worked well and quickly… when processing one request at a time.
When I tried multiple requests, response times were terrible 😨.
What was going wrong ❓❓❓
The library I was using did all its computation in the event loop, which prevented Node from processing multiple requests in parallel 💡 (you can find an introduction to the event loop in my previous blog post).

Knowing that, I removed the lib and wrote a hashing and salting function using Node’s Crypto module.

  • It’s more work.
  • The response time for a single request was a little longer than with the previous lib.
  • … but the response time while processing multiple requests was actually better.

The reason? I was using asynchronous functions, which spend less time in the event loop and so let Node process requests in parallel (to simplify: the event loop could handle a request while a worker thread was computing the hash). This leads us to the golden rule when developing a Node app:

As far as possible, don’t overload the event loop

This can be achieved by following a few rules:

  • Always prefer asynchronous functions to synchronous ones. (I only break this rule in application initialization code.)
  • When comparing modules, prefer those using Node’s asynchronous API or having their own C++ module.
  • If you are used to developing JavaScript on the browser side, sorry, but forget your favorite libraries. No matter how fast they are, they are not designed to fit the event loop.
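To illustrate the first rule, here is a minimal sketch of salted password hashing with the asynchronous functions of Node’s crypto module. The post doesn’t show the original function. PBKDF2 and its parameters (100000 iterations, SHA-512, 64-byte key, 16-byte salt) are assumptions chosen for the example.

```javascript
'use strict';
const crypto = require('crypto');

// Illustrative parameters (assumptions, not the post's original settings).
const ITERATIONS = 100000;
const KEY_LENGTH = 64; // bytes
const DIGEST = 'sha512';

// Hashes a password with a fresh random salt. The async forms of randomBytes
// and pbkdf2 run in libuv's thread pool, so the event loop stays free.
function hashPassword(password) {
  return new Promise((resolve, reject) => {
    crypto.randomBytes(16, (err, salt) => {
      if (err) return reject(err);
      crypto.pbkdf2(password, salt, ITERATIONS, KEY_LENGTH, DIGEST, (err2, key) => {
        if (err2) return reject(err2);
        resolve({ salt: salt.toString('hex'), hash: key.toString('hex') });
      });
    });
  });
}

// Recomputes the hash with the stored salt and compares in constant time.
function verifyPassword(password, stored) {
  return new Promise((resolve, reject) => {
    const salt = Buffer.from(stored.salt, 'hex');
    crypto.pbkdf2(password, salt, ITERATIONS, KEY_LENGTH, DIGEST, (err, key) => {
      if (err) return reject(err);
      resolve(crypto.timingSafeEqual(key, Buffer.from(stored.hash, 'hex')));
    });
  });
}

// Usage: store { salt, hash } in the database, then verify at login.
hashPassword('s3cret')
  .then((stored) => verifyPassword('s3cret', stored))
  .then((ok) => console.log(ok)); // true
```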

 

Lastly, if you have code that keeps the event loop busy for a while, your application probably won’t be very responsive. There are 3 ways to address this:

  1. Create a separate application that can be launched from the event loop as a child process. Here is the NodeJS documentation for managing child processes.
  2. If you’re courageous enough, you can create your own Node addon.
  3. You can split your code into chunks using setImmediate or process.nextTick.
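For the first option, here is a minimal sketch with the child_process module. The CPU-bound work runs in a separate Node process started with execFile, so the parent’s event loop only waits for the result. The inline script passed with -e is a hypothetical stand-in for a real worker program.

```javascript
'use strict';
const { execFile } = require('child_process');

// Computes 1 + 2 + ... + n in a separate Node process; the parent event loop
// stays free while the child burns CPU. The inline script is illustrative only.
function sumInChildProcess(n) {
  const script =
    `let s = 0; for (let i = 1; i <= ${Number(n)}; i += 1) s += i; ` +
    'process.stdout.write(String(s));';
  return new Promise((resolve, reject) => {
    execFile(process.execPath, ['-e', script], (err, stdout) => {
      if (err) return reject(err);
      resolve(Number(stdout));
    });
  });
}

sumInChildProcess(100000).then((total) => console.log(total)); // 5000050000
```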

 

I haven’t used child processes or written an addon yet, so I won’t go into them. My next post will be about what I know: it will dig into the event loop and explain how to use setImmediate and process.nextTick.

 

See you and have a nice code 🙋