Toc
  1. Calls in a for loop exhaust memory
  2. async.forever
  3. setInterval
    1. The Moral
  • The most popular approach: async/await
    1. read more
    catzillaorz
    Loop in node
    2021/05/26 node loop node async
    Calls in a for loop exhaust memory
    • The Story

    Here’s how it started: two weeks ago I was writing a web scraper for thepiratebay. My idea was simple: I wanted to get a JSON dump of all torrent information available, so that I could later use it for some simple data analysis.

    After taking a look at the site, I realized that the simplest way to scrape all the existing torrents would be to just loop through all integers, querying each one sequentially – this is because TPB allows you to access torrents via their integer ID (which is always increasing).

    The rules are simple: if you get a 404 skip it – if you get a 200, the torrent exists and can be scraped!

    So, I sat down and wrote a first version that looked something like this:

    var request = require('request');

    for (var i = 0; i < 10000000; i++) {
      request('http://thepiratebay.se/' + i, ...);
    }
    • This is some pretty basic stuff:
      • Iterate through numbers? CHECK!
      • Make HTTP requests? CHECK!

    But to my dismay, after running for a few minutes I noticed that this small program was eating all the RAM on my laptop! But why?!

    I realized that Node.js blocks when running blocking code (e.g. a for loop) – but I figured that since I was making async requests from within the loop, things would continue to work normally.

    I was wrong.
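
    One way to see what goes wrong, as a minimal sketch that is not part of the original experiment: a synchronous for loop never hands control back to the event loop, so every callback it schedules stays queued in memory until the loop ends. Here setTimeout is only a stand-in for the HTTP request, and the names scheduleAll and events are made up for illustration.

```javascript
// Stand-in for the scraper loop: setTimeout plays the role of an async request.
// scheduleAll and events are hypothetical names used only for this sketch.
function scheduleAll(count, events) {
  for (let i = 0; i < count; i++) {
    // The callback is queued now, but it cannot run until the loop has finished.
    setTimeout(() => events.push('callback ' + i), 0);
    events.push('scheduled ' + i);
  }
  // Count how many callbacks have already run by the time the loop ends.
  return events.filter((e) => e.startsWith('callback')).length;
}

const events = [];
const ranDuringLoop = scheduleAll(3, events);
console.log('callbacks run during the loop:', ranDuringLoop);
```

    With ten million iterations, ten million pending requests would be created before the first response could be handled, which is consistent with the memory growth described above.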

    So, being confused about what was happening, I decided to dig a bit deeper. I narrowed my case down to a simpler test:

    for (var i = 0; i < 10000000; i++) {
      console.log('hi:', i);
    }

    async.forever

    But alas, the same problem. The program simply runs for a few minutes, then crashes as it uses all the RAM on my computer. Bummer.

    So then I started Googling around to find potential solutions. Surely this must be a common issue?

    Unfortunately, I didn’t see much discussion about this, and all the relevant Stack Overflow threads proposed solutions that didn’t require looping at all (not an option in my case).

    Next, I turned to async – the really popular flow control library for Node. After looking through the docs, I realized there was something that was seemingly perfect for this! The forever construct!

    So I then tried the following:

    var async = require('async');

    var i = 0;
    async.forever(
      function(next) {
        console.log('hi:', i);
        i++;
        next();
      },
      // async.forever only calls this callback when next() receives an error.
      function(err) {
        console.log('All done!');
      }
    );

    setInterval

    But again – the same issue. After a few thousand loops: crash.

    After writing quite a few different iterations of this simple program, and a significant amount of lost sleep (I can’t really sleep well knowing I don’t understand something – grr) – my coworker Robert proposed a working solution:


    // Tracks the current loop index.
    var Abstraction = function() {
      this.index = -1;
    };

    Abstraction.prototype.getIndex = function getIndex() {
      this.index++;
      return this.index;
    };

    Abstraction.prototype.isDoneTest = function isDoneTest() {
      return this.index > 10000000;
    };

    var list = new Abstraction();

    // Runs one iteration per timer tick, then stops the timer when done.
    function iterator() {
      var i = list.getIndex();
      console.log(i);
      if (list.isDoneTest()) {
        clearInterval(interval);
      }
    }

    var interval = setInterval(iterator, 1);

    Brilliant! I didn’t even think of setInterval for some reason.

    Anyhow: after a lot of discussion – we both came to the agreement that using setInterval is essentially the only way to solve this problem.

    After thinking about this some more, I decided to write a small abstraction layer to handle this – so I created lupus.

    lupus provides simple (albeit, basic) asynchronous looping for Node.js:

    var lupus = require('lupus');

    lupus(0, 10000000, function(n) {
      console.log("We're on:", n);
    }, function() {
      console.log('All done!');
    });

    Whatever you end up writing inside of the loop (blocking or not) – lupus doesn’t care.
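
    To make the idea concrete, here is a rough sketch of how such an asynchronous loop could be built. It is not lupus's actual implementation: asyncLoop and batchSize are hypothetical names, the end of the range is assumed to be exclusive, and it uses setImmediate rather than setInterval to hand control back to the event loop between batches.

```javascript
// Hypothetical helper, not lupus's actual implementation.
// Calls body(n) for start <= n < end, yielding to the event loop between batches.
function asyncLoop(start, end, body, done, batchSize = 1000) {
  let n = start;
  function runBatch() {
    const stop = Math.min(n + batchSize, end);
    for (; n < stop; n++) {
      body(n);
    }
    if (n < end) {
      // Let pending I/O callbacks run before starting the next batch.
      setImmediate(runBatch);
    } else if (done) {
      done();
    }
  }
  runBatch();
}

asyncLoop(0, 5000, function (n) {
  if (n % 1000 === 0) console.log("We're on:", n);
}, function () {
  console.log('All done!');
});
```

    Running the body in batches is a design choice in this sketch: it keeps the number of timer callbacks low while still giving other queued work a chance to run.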

    The Moral

    Performing asynchronous for loops in Node.js turned out to be quite a lot harder than I expected. I find it odd that it’s so easy to crash my programs with the simplest of looping examples.

    Oh well! Live and learn!


    The most popular approach: async/await

    A Better Way: Async/Await

    The async/await keywords are a wonderful mechanism for modeling asynchronous control-flow in computer programs. In JavaScript, these keywords are syntactic sugar on top of Promises – they abstract away the calls to Promise.then. In the following code, we refactor the getFishAndChips function to use async/await.

    // We have to get chips after we get fish...
    async getFishAndChips() {
      const fish = await fetch(this.fishApiUrl).then(response => response.json());
      this.fish = fish;

      const fishIds = fish.map(fish => fish.id),
        chipReqOpts = { method: 'POST', body: JSON.stringify({ fishIds }) };

      const chips = await fetch(this.chipsApiUrl, chipReqOpts).then(response => response.json());
      this.chips = chips;
    }
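
    The same keywords also fit the scraping loop from the first half of this post. The following is only a sketch under assumptions: scrapeRange is a made-up name, and fetchStatus is a hypothetical function that takes an ID and returns a Promise resolving to an HTTP status code. Because each request is awaited, only one is in flight at a time, so pending work does not pile up.

```javascript
// fetchStatus is a hypothetical function: given an ID, it returns a Promise
// that resolves to an HTTP status code (for example by wrapping an HTTP client).
async function scrapeRange(start, end, fetchStatus) {
  const found = [];
  for (let id = start; id < end; id++) {
    // await pauses the loop here, so only one request is in flight at a time.
    const status = await fetchStatus(id);
    if (status === 200) {
      found.push(id); // the torrent exists and can be scraped
    }
    // Any other status (such as 404) is skipped.
  }
  return found;
}

// Usage with a fake fetchStatus that pretends only even IDs exist.
const fakeFetchStatus = async (id) => (id % 2 === 0 ? 200 : 404);
const scrapePromise = scrapeRange(0, 6, fakeFetchStatus);
scrapePromise.then((ids) => console.log('found:', ids));
```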

    read more
    • Naive Approach

    Your first instinct when using fetch might be to do something like this:


    // my-component.jsx

    // We have to get chips after we get fish...
    getFishAndChips() {
      fetch(this.fishApiUrl) // Request fish species
        .then(fishRes => {
          fishRes.json().then(fish => {
            this.fish = fish;

            const fishIds = fish.map(fish => fish.id);

            fetch( // Request chips using fish ids
              this.chipsApiUrl,
              {
                method: 'POST',
                body: JSON.stringify({ fishIds })
              }
            )
              .then(chipsRes => {
                chipsRes.json().then(chips => {
                  this.chips = chips;
                })
              })
          })
        })
    }

    In the above code, we nest two fetch calls to ensure that we request our chips only after our fish has arrived, because the POST request for chips needs an array of fish IDs. This works, but there is a big readability problem here: the nesting grows deeper with every dependent request. It is possible to reduce the amount of code needed for this feature dramatically. Let’s take a look at Promise chaining!
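
    As a rough sketch of that idea, here is what a chained version might look like. It assumes a standalone function rather than the component method above: the two URLs are passed in as parameters, the result is returned instead of being stored on this, and fetchFn is a hypothetical parameter that defaults to the global fetch so a fake can be swapped in.

```javascript
// Hypothetical standalone version of getFishAndChips. The URLs are passed in,
// and fetchFn defaults to the global fetch (browsers, or Node.js 18 and later).
function getFishAndChips(fishApiUrl, chipsApiUrl, fetchFn = globalThis.fetch) {
  const result = {};
  return fetchFn(fishApiUrl) // Request fish species
    .then((fishRes) => fishRes.json())
    .then((fish) => {
      result.fish = fish;
      const fishIds = fish.map((f) => f.id);
      // Returning the Promise keeps the chain flat instead of nesting.
      return fetchFn(chipsApiUrl, {
        method: 'POST',
        body: JSON.stringify({ fishIds }),
      });
    })
    .then((chipsRes) => chipsRes.json())
    .then((chips) => {
      result.chips = chips;
      return result;
    });
}
```

    Each then callback returns either a value or another Promise, so every step sits at the same indentation level no matter how many dependent requests are added.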

    Author: catzillaorz
    Copyright notice: This article was first published on catzillaorz's blog. Please credit the source when reposting!