I set up Nutch with a db.fetch.interval.default of 60000 so that I can crawl every day. If I don't, it won't even look at my site when I crawl the next ...
I am looking for a web crawler solution that is mature enough and can be easily extended. I am interested in the following features... or possibility to extend the crawler to ...
I am stuck! I can't get Nutch to crawl for me in small patches. I start it with the bin/nutch crawl command, using the parameters -depth 7 and -topN 10000, and it never ends. ...
Is there a way to get Nutch to crawl pages that get updated frequently more often?
E.g. index pages and feeds.
It would also be of value to refresh fresh pages ...
I am developing a system that has to track the content of a few portals and check for changes every night (for example, download and index new sites that have been added during the ...