
Phantomjs Crashes When I Open Too Many Pages And Ignores The Last Url

System: Windows 8.1 64-bit with the binary from the main page, version 2.0. I have a .txt file with one URL per line. I read every line and open the page, searching for a specific url.matc

Solution 1:

Concurrent Requests

You really shouldn't load pages in a loop: a loop is a synchronous construct, whereas page.open() is asynchronous. The loop starts opening every URL before any page has finished loading, so all URLs are open at the same time and memory consumption skyrockets. This becomes a problem with 20 or more URLs in the list.
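The difference can be illustrated without PhantomJS. The following is a minimal sketch that assumes a fake opener standing in for page.open(); makeFakeOpener, drain, and the placeholder URLs are hypothetical names made up for this illustration, not PhantomJS API. It counts how many "loads" are in flight at once when they are started from a loop versus chained from the previous callback.

```javascript
// Fake asynchronous opener: a stand-in for page.open(), not PhantomJS API.
function makeFakeOpener() {
    var pending = [];                      // loads that have started but not finished
    var stats = { inFlight: 0, peak: 0 };
    return {
        stats: stats,
        open: function (url, callback) {
            stats.inFlight += 1;
            stats.peak = Math.max(stats.peak, stats.inFlight);
            pending.push(function () {
                stats.inFlight -= 1;
                callback(url);
            });
        },
        // Finish queued loads one at a time until none remain.
        drain: function () {
            while (pending.length > 0) {
                pending.shift()();
            }
        }
    };
}

var urls = ['a', 'b', 'c', 'd'];           // placeholder URLs

// Loop: every open() is issued before any load completes.
var loopOpener = makeFakeOpener();
urls.forEach(function (url) {
    loopOpener.open(url, function () {});
});
loopOpener.drain();

// Chain: the next open() is issued only from the previous callback.
var chainOpener = makeFakeOpener();
function openNext(index) {
    if (index >= urls.length) {
        return;
    }
    chainOpener.open(urls[index], function () {
        openNext(index + 1);
    });
}
openNext(0);
chainOpener.drain();

console.log(loopOpener.stats.peak);        // 4
console.log(chainOpener.stats.peak);       // 1
```

With the loop, the peak equals the number of URLs. With the chain, it stays at one no matter how long the list is.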

Function-level scope

The other problem is that JavaScript variables declared with var have function-level scope. Even when you declare the page variable inside the while block, it is visible throughout the enclosing function, or globally if the loop sits at the top level of the script. Because every iteration shares that one variable, the asynchronous nature of PhantomJS causes trouble: the page referenced inside the page.onResourceRequested callback is very likely not the page that opened the URL which triggered the callback. A common solution would be to use an IIFE to bind the page variable to a single iteration, but you need to rethink your whole approach.
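A small sketch in plain JavaScript shows the scoping issue and the IIFE fix. The objects below are stand-ins for PhantomJS pages, and the names and values are placeholders chosen for this illustration.

```javascript
var names = ['first', 'second', 'third'];  // placeholder values
var sharedCallbacks = [];
var boundCallbacks = [];
var i, page;

// Without an IIFE: `page` is a single function-scoped variable,
// so every callback sees whatever was assigned to it last.
for (i = 0; i < names.length; i++) {
    page = { name: names[i] };             // stand-in for a PhantomJS page
    sharedCallbacks.push(function () {
        return page.name;
    });
}

// With an IIFE: each iteration gets its own `page` parameter.
for (i = 0; i < names.length; i++) {
    (function (page) {
        boundCallbacks.push(function () {
            return page.name;
        });
    })({ name: names[i] });
}

var sharedResult = sharedCallbacks.map(function (cb) { return cb(); });
var boundResult = boundCallbacks.map(function (cb) { return cb(); });

console.log(sharedResult);                 // [ 'third', 'third', 'third' ]
console.log(boundResult);                  // [ 'first', 'second', 'third' ]
```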

Memory leak

You also have a memory leak: when the URL in the page.onResourceRequested event doesn't match, you neither abort the request nor clean up the page instance. You probably want to do both for all URLs, not just the ones that match your specific regex.
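One way to make the cleanup unconditional is to run the match check first and then abort in every case. This is a rough sketch, not the original poster's code: abortAndReport, pattern, and onDone are hypothetical names, and only page.onResourceRequested, requestData.url, and the request's abort() come from the PhantomJS API.

```javascript
// Hypothetical helper; `pattern` and `onDone` are illustrative names.
function abortAndReport(page, pattern, onDone) {
    page.onResourceRequested = function (requestData, networkRequest) {
        var matched = pattern.test(requestData.url);
        networkRequest.abort();            // abort whether or not the URL matched
        onDone(matched, requestData.url);  // the caller cleans up or moves on here
    };
}
```

Inside onDone, the caller could release the page with page.close() if it creates one page per URL, or move on to the next URL if it reuses a single page.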

Easy fix

A fast solution would be to define a function that does one iteration and calls the next iteration when the current one has finished. You can also reuse one page instance for all requests.

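var webPage = require('webpage');
var page = webPage.create();

// `stream` is assumed to be the already opened file stream from the
// question, with one URL per line.
function runOnce() {
    // Stop when the file is exhausted or an empty line is reached.
    if (stream.atEnd()) {
        phantom.exit();
        return;
    }
    var url = stream.readLine();
    if (url === "") {
        phantom.exit();
        return;
    }

    page.open(url, function() {});

    page.onResourceRequested = function(requestData, request) {
        /**...**/

        // Abort the request, then continue with the next URL.
        request.abort();

        runOnce();
    };
}

runOnce();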
