For Loop Calls That Exhaust Memory
The Story
Here’s how it started: two weeks ago I was writing a web scraper for thepiratebay. My idea was simple: I wanted to get a JSON dump of all torrent information available, so that I could later use it for some simple data analysis.
After taking a look at the site, I realized that the simplest way to scrape all the existing torrents would be to just loop through all integers, querying each one sequentially – this is because TPB allows you to access torrents via their integer ID (which is always increasing):
- http://thepiratebay.se/torrent/1
- http://thepiratebay.se/torrent/2
- http://thepiratebay.se/torrent/3
- http://thepiratebay.se/torrent/…
The rules are simple: if you get a 404, skip it; if you get a 200, the torrent exists and can be scraped!
So, I sat down and wrote a first version that looked something like this:
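The original snippet didn't survive the page extraction. Below is a minimal sketch of what that first version looked like, with the HTTP client replaced by a stub (`fetchTorrent`) so the sketch runs offline; the real code fired requests at the torrent URLs above.

```javascript
// Stand-in for an HTTP client call; the real code would request
// http://thepiratebay.se/torrent/<id> and parse the response. The stub
// keeps this sketch runnable offline while preserving the async shape.
let pending = 0;
let completed = 0;

function fetchTorrent(id, callback) {
  pending += 1;
  setImmediate(() => { // simulate asynchronous I/O
    completed += 1;
    callback(null, { id: id });
  });
}

// The naive first version: fire a request for every ID in a plain for loop.
for (let id = 1; id <= 100000; id += 1) {
  fetchTorrent(id, (err, torrent) => {
    // 404: skip it. 200: scrape the torrent and store its JSON.
  });
}

// The loop finishes before a single response is handled: all 100,000
// callbacks (and their request state) are now queued in memory at once.
console.log(pending, completed);
```

Because the for loop never yields to the event loop, every request gets queued before any response can be processed; with real sockets and response buffers, that ever-growing queue is what ate all the RAM.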
This is some pretty basic stuff:
- Iterate through numbers? CHECK!
- Make HTTP requests? CHECK!
But to my dismay, after running for a few minutes I noticed that this small program was eating all the RAM on my laptop! But why?!
I knew that Node.js blocks while running synchronous code (e.g., a for loop) – but I figured that since the requests I was making from inside the loop were asynchronous, things would continue to work normally.
I was wrong.
So, being confused about what was happening, I decided to dig a bit deeper. I narrowed my case down to a simpler test:
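The test snippet itself is lost, but its shape was a huge loop firing one async request per iteration. The same effect can be reproduced without touching the network: queue a few hundred thousand no-op callbacks and watch the heap grow.

```javascript
// Minimal repro: each setImmediate stands in for one async request.
// Every pending callback is an object V8 must keep alive until it runs.
const heapBefore = process.memoryUsage().heapUsed;

for (let i = 0; i < 500000; i += 1) {
  setImmediate(() => {}); // none of these can run until the loop ends
}

const heapAfter = process.memoryUsage().heapUsed;
console.log(heapAfter > heapBefore); // the pending queue alone costs real memory
```

Scale the iteration count up toward the billions (as the scraper did) and the pending-callback queue alone will exhaust available RAM before any work completes.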
But alas, the same problem. The program simply runs for a few minutes, then crashes as it uses all the RAM on my computer. Bummer.
So then I started Googling around to find potential solutions. Surely this must be a common issue?
Unfortunately, however, I didn’t see much discussion about this, and all the relevant Stack Overflow threads proposed solutions that didn’t require looping at all (not an option in my case).
Next, I turned to async – the really popular flow control library for Node. After looking through the docs, I realized there was something that was seemingly perfect for this! The forever construct!
So I then tried the following:
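The attempt isn't shown in the surviving text, but `async.forever(task, errback)` calls `task(next)` repeatedly until `next` is passed an error. The sketch below bounds the loop and stubs the HTTP call so it runs standalone; a tiny local equivalent of `forever` is included so no npm install is needed.

```javascript
// async.forever calls task(next) again each time next() is invoked, and
// stops when next(err) receives an error. This local stand-in mirrors that
// contract so the sketch runs without installing the async package.
function forever(task, errback) {
  function next(err) {
    if (err) return errback(err);
    task(next);
  }
  next();
}

// Stub HTTP client; the real code requested http://thepiratebay.se/torrent/<id>.
function fakeRequest(url, callback) {
  callback(null, { statusCode: 200 }, '<html>...</html>');
}

let id = 1;
const scraped = [];

forever(
  (next) => {
    fakeRequest('http://thepiratebay.se/torrent/' + id, (err, resp, body) => {
      scraped.push(id);
      id += 1;
      if (id > 5) return next(new Error('stopping the demo'));
      next(); // queue the next iteration
    });
  },
  (err) => {
    // the loop ends up here
  }
);
```

With the real library and real requests, this version crashed the same way for me: after a few thousand loops, memory was gone.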
But again – the same issue. After a few thousand loops: crash.
After writing quite a few different iterations of this simple program, and a significant amount of lost sleep (I can’t really sleep well knowing I don’t understand something – grr) – my coworker Robert proposed a working solution:
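Robert's snippet is missing from the extracted page; the idea was to drive the loop with setInterval so that control returns to the event loop between iterations. A bounded sketch, with the HTTP call stubbed:

```javascript
// Stub for the real HTTP request.
function fetchTorrent(id, callback) {
  setImmediate(() => callback(null, { id: id }));
}

let id = 1;

// Each tick does one unit of work, then hands control back to the event
// loop, so completed callbacks actually run and their memory is reclaimed.
const timer = setInterval(() => {
  fetchTorrent(id, (err, torrent) => {
    // ...scrape and store...
  });
  id += 1;
  if (id > 5) clearInterval(timer); // bounded for the demo
}, 10);
```

The interval acts as a crude throttle: at any moment only a handful of requests are in flight, so memory stays flat no matter how many iterations you run.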
Brilliant! I didn’t even think of setInterval for some reason.
Anyhow: after a lot of discussion, we both agreed that using setInterval is essentially the only way to solve this problem.
After thinking about this some more, I decided to write a small abstraction layer to handle this – so I created lupus.
lupus provides simple (albeit, basic) asynchronous looping for Node.js:
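The usage snippet didn't survive extraction. Going from memory of the lupus README, the call shape is roughly `lupus(start, stop, iterator, [done])` – treat the exact signature as an assumption and check the project's README. To keep the sketch runnable without npm, a local stand-in with the same assumed shape is used when the package is absent:

```javascript
let lupus;
try {
  lupus = require('lupus'); // npm install lupus
} catch (e) {
  // Local stand-in matching the assumed call shape: lupus(start, stop, fn, done).
  // It spaces iterations across event-loop ticks, like the setInterval trick.
  lupus = function (start, stop, fn, done) {
    let n = start;
    const timer = setInterval(() => {
      if (n >= stop) {
        clearInterval(timer);
        if (done) done();
        return;
      }
      fn(n);
      n += 1;
    }, 0);
  };
}

const visited = [];
lupus(0, 5, (n) => visited.push(n), () => {
  console.log('done:', visited);
});
```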
Whatever you end up writing inside of the loop (blocking or not) – lupus doesn’t care.
The Moral
Performing asynchronous for loops in Node.js turned out to be quite a lot harder than I expected. I find it odd that it’s so easy to crash my programs with the simplest of looping examples.
Oh well! Live and learn!
The Most Popular Approach: async/await
A Better Way: Async/Await
The async/await keywords are a wonderful mechanism for modeling asynchronous control-flow in computer programs. In JavaScript, these keywords are syntactic sugar on top of Promises; they abstract away the calls to Promise.then. In the following code, we refactor the getFishAndChips function to use async/await.
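The refactored snippet didn't survive extraction; below is a sketch of what an async/await version of getFishAndChips could look like. The endpoints, response shapes, and the `fishIds` payload are assumptions based on the surrounding description (fetch the fish first, then POST their IDs to get chips):

```javascript
// Hypothetical endpoints and shapes; the control flow is the point here.
async function getFishAndChips() {
  // await unwraps the Promise that fetch returns: no .then nesting needed
  const fishResponse = await fetch('https://example.com/api/fish');
  const fish = await fishResponse.json();

  // The chips request depends on the fish IDs, so it simply comes next
  const chipsResponse = await fetch('https://example.com/api/chips', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fishIds: fish.map((f) => f.id) }),
  });
  const chips = await chipsResponse.json();

  return { fish: fish, chips: chips };
}
```

The code now reads top to bottom like its synchronous equivalent, while each `await` still yields to the event loop while the request is in flight.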
Naive Approach
Your first instinct when using fetch might be to do something like this:
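The snippet is missing from the extracted page; based on the description that follows, the naive version nested one fetch inside another, POSTing the fish IDs to get the chips. A sketch (endpoints and shapes are hypothetical):

```javascript
// Naive version: nest the second fetch inside the first one's callbacks.
function getFishAndChips(callback) {
  fetch('https://example.com/api/fish')
    .then((fishResponse) => {
      fishResponse.json().then((fish) => {
        // The chips request needs the fish IDs, hence the nesting
        fetch('https://example.com/api/chips', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ fishIds: fish.map((f) => f.id) }),
        }).then((chipsResponse) => {
          chipsResponse.json().then((chips) => {
            callback({ fish: fish, chips: chips });
          });
        });
      });
    });
}
```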
In the above code, we nest two fetch calls to ensure that we request our chips only after we have requested our fish: the chips request needs an array of fish IDs in its POST body. This works; however, there is a big readability problem here, and it is possible to reduce the amount of code needed for this feature dramatically. Let's take a look at Promise chaining!