Our company has thousands of PDF documents. How do we create a simple search engine using Lucene, Solr or Nutch? We'll provide a basic Java/JSP web page where people ...
I have crawled a few pages with Java Nutch.
I have also written a Lucene module in Java that executes queries on the indexed documents.
I know I created Nutch fields ...
How can I instruct Nutch to treat page#1 as belonging to a core and page#2 as belonging to a different core (both pages from the same domain)?
Practical situation: let's say Nutch ...