CrawlConditions

Crawljax supports CrawlConditions, which guide the crawl dynamically. When Crawljax finds a new state, it crawls that state only if all the CrawlConditions are satisfied. If no CrawlConditions are specified, every state is crawled.
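The rule above (crawl a state only when every condition holds, and crawl everything when there are no conditions) can be sketched in plain Java. This is a minimal illustrative model, not Crawljax's implementation: the `CrawlConditionSketch` class, its `State` type, and the `shouldCrawl` helper are hypothetical names introduced for this example, and the conditions are ordinary `Predicate`s rather than Crawljax `Condition` objects.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative model only: these types are hypothetical and are not Crawljax classes.
public class CrawlConditionSketch {

    /** Minimal stand-in for a newly discovered state. */
    public static final class State {
        public final String url;
        public final String dom;

        public State(String url, String dom) {
            this.url = url;
            this.dom = dom;
        }
    }

    /**
     * Returns true only when every condition accepts the state.
     * With an empty list, allMatch is vacuously true, so every state is crawled.
     */
    public static boolean shouldCrawl(State state, List<Predicate<State>> conditions) {
        return conditions.stream().allMatch(condition -> condition.test(state));
    }

    public static void main(String[] args) {
        // Assumed stand-ins for a URL check and a "no foo class" check.
        Predicate<State> urlContainsFoo = s -> s.url.contains("foo");
        Predicate<State> noFooMarker = s -> !s.dom.contains("class=\"foo\"");

        State accepted = new State("http://example.com/foo/index.html", "<p>hello</p>");
        State rejected = new State("http://example.com/foo/other.html", "<span class=\"foo\">x</span>");

        List<Predicate<State>> conditions = List.of(urlContainsFoo, noFooMarker);
        System.out.println(shouldCrawl(accepted, conditions));
        System.out.println(shouldCrawl(rejected, conditions));
        System.out.println(shouldCrawl(rejected, List.of()));
    }
}
```

In this sketch the first state passes both checks, the second fails the "no foo class" check, and the same second state is accepted once the list of conditions is empty.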

Methods in CrawlSpecification

addCrawlCondition(String description, Condition crawlCondition)
addCrawlCondition(String description, Condition crawlCondition, Condition... preConditions)

Example 1

Crawljax should only crawl pages with the text "foo" in the URL.

CrawljaxConfigurationBuilder builder = CrawljaxConfiguration.builderFor(URL);
...
UrlCondition onlyFooDomain = new UrlCondition("foo");
builder.addCrawlCondition("Only crawl foo site", onlyFooDomain);

Example 2

Crawljax should never crawl a page with a span with the class 'foo'.

CrawljaxConfigurationBuilder builder = CrawljaxConfiguration.builderFor(URL);
...
NotXPathCondition noFooClass = new NotXPathCondition("//SPAN[@class='foo']");
builder.addCrawlCondition("No spans with foo as class", noFooClass);

This page contains a foo and should therefore not be crawled by Crawljax. Thus this link should not be clicked by Crawljax.