webcrawler « solr « Java Lucene Q&A

1. Recommendations for a spidering tool to use with Lucene or Solr?    stackoverflow.com

What is a good crawler (spider) to use against HTML and XML documents (local or web-based) and that works well in the Lucene / Solr solution space? Could be Java-based but ...

2. Crawler/parser for Xapian    stackoverflow.com

I would like to implement a search engine that crawls a set of web sites, extracts specific information from the pages, and creates a full-text index of that information. It seems ...

3. Which metadata should I save when downloading web pages?    stackoverflow.com

I'm going to download several thousand web pages (for future language processing). Now I'm wondering which metadata I should save. I have explored this, but I do not want to neglect ...

4. Design Question for Notification System    stackoverflow.com

I would like to know the best way to design a notification system for website updates. Example use case: suppose you have a site like craiglist.com, and any time a new posting ...

java2s.com
Copyright 2009 - 12 Demo Source and Support. All rights reserved.