PhantomJS Crashes When I Open Too Many Pages And Ignores The Last URL
Solution 1:
Concurrent Requests
You really shouldn't load pages in a loop, because a loop is a synchronous construct whereas page.open() is asynchronous. The loop starts every page.open() call before any of them has finished, so all URLs are opened at the same time and memory consumption skyrockets. This becomes a problem with about 20 or more URLs in the list.
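To make the difference concrete, here is a minimal sketch that runs without PhantomJS. makeFakeOpener is a hypothetical stand-in for page.open(): it only queues the callback and completes it later, which is roughly how an asynchronous call behaves. All names in it are assumptions for illustration, not part of the PhantomJS API.

```javascript
// Hypothetical fake of an asynchronous opener: open() only queues work,
// drain() completes the queued requests one by one, oldest first.
function makeFakeOpener() {
    var pending = [];
    var stats = { maxInFlight: 0 };
    return {
        stats: stats,
        open: function (url, callback) {
            pending.push(callback);
            stats.maxInFlight = Math.max(stats.maxInFlight, pending.length);
        },
        drain: function () {
            while (pending.length > 0) {
                pending.shift()('success');
            }
        }
    };
}

// Loop version: every open() is issued before any request has completed.
function openAllInLoop(opener, urls) {
    for (var i = 0; i < urls.length; i++) {
        opener.open(urls[i], function () {});
    }
    opener.drain();
    return opener.stats.maxInFlight;
}

// Chained version: the next open() is issued from the previous callback.
function openOneByOne(opener, urls) {
    var index = 0;
    function next() {
        if (index >= urls.length) { return; }
        opener.open(urls[index++], next);
    }
    next();
    opener.drain();
    return opener.stats.maxInFlight;
}
```

With five URLs, the loop version has all five requests pending at once, while the chained version never has more than one.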
Function-level scope
The other problem is that JavaScript variables declared with var have function-level scope. That means that even when you define the page variable inside of the while block, it is visible throughout the enclosing function, or globally if the loop sits at the top level of the script. Every iteration therefore overwrites the same variable, which clashes with the asynchronous nature of PhantomJS: the page referenced inside of the page.onResourceRequested function definition is very likely not the same page that was used to open the URL which triggered the callback. A common solution would be to use an IIFE to bind the page variable to only one iteration, but you need to rethink your whole approach.
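The scoping issue can be reproduced without PhantomJS. In the sketch below, plain objects stand in for page instances, and the callbacks are invoked only after the loop has ended, as asynchronous callbacks would be. collectShared and collectBound are hypothetical names for this illustration.

```javascript
// Plain objects stand in for page instances; no PhantomJS is needed here.
function collectShared(urls) {
    var callbacks = [];
    var i = 0;
    while (i < urls.length) {
        // `var` belongs to the whole function, so each iteration overwrites `page`
        var page = { url: urls[i] };
        callbacks.push(function () { return page.url; });
        i++;
    }
    // The callbacks run after the loop has finished
    return callbacks.map(function (cb) { return cb(); });
}

function collectBound(urls) {
    var callbacks = [];
    var i = 0;
    while (i < urls.length) {
        // The IIFE gives each iteration its own `page` binding
        (function (page) {
            callbacks.push(function () { return page.url; });
        })({ url: urls[i] });
        i++;
    }
    return callbacks.map(function (cb) { return cb(); });
}
```

Called with ['a', 'b', 'c'], collectShared returns ['c', 'c', 'c'] because every callback sees the last page, whereas collectBound returns ['a', 'b', 'c'].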
Memory leak
You also have a memory leak, because when the URL in the page.onResourceRequested event doesn't match, you're neither aborting the request nor cleaning up the page instance. You probably want to do that for all URLs and not just the ones that match your specific regex.
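One way to make sure the abort and the cleanup happen on every request is to keep them outside of the regex check. The following is a sketch under assumptions: makeRequestHandler, pattern, onMatch and cleanup are hypothetical names, and the returned function only mirrors the (requestData, request) shape that PhantomJS passes to page.onResourceRequested.

```javascript
// Builds a handler shaped like a page.onResourceRequested callback.
// `pattern` should be a RegExp without the global flag, so that test()
// does not carry state from one call to the next.
function makeRequestHandler(pattern, onMatch, cleanup) {
    return function (requestData, request) {
        if (pattern.test(requestData.url)) {
            onMatch(requestData.url);
        }
        // These two steps run for every request, matching or not
        request.abort();
        cleanup();
    };
}
```

In a PhantomJS script, cleanup could for example be a function that calls page.close() when each URL gets its own page instance, or one that starts the next iteration when a single page is reused.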
Easy fix
A quick solution would be to define a function that does one iteration and calls the next iteration when the current one has finished. You can also reuse one page instance for all requests.
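    var webPage = require('webpage');
    var page = webPage.create();

    // `stream` is assumed to be an already opened file stream with one URL per line,
    // as in the question's code.

    // Handles one URL per call; the next call is made from the request handler.
    function runOnce() {
        if (stream.atEnd()) {
            phantom.exit();
            return;
        }
        var url = stream.readLine();
        if (url === "") {
            phantom.exit();
            return;
        }
        // Register the handler before opening the page
        page.onResourceRequested = function(requestData, request) {
            /**...**/
            request.abort();
            runOnce();
        };
        page.open(url, function() {});
    }
    runOnce();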