plugin « nutch « Java Lucene Q&A


1. Parsing html data with nutch 1.0 and a custom plugin    stackoverflow.com

I am currently trying to write a custom plugin for Nutch 1.0. This plugin is supposed to parse HTML data and filter out relevant information from documents. I have a basic ...

2. How do Nutch plugins work?    stackoverflow.com

I am new to Nutch, but I know Nutch uses Lucene for indexing, which only understands text format. Nutch has many plug-ins that can be used for crawling the particular format that ...

3. Nutch : get current crawl depth in the plugin    stackoverflow.com

I want to write my own HTML parser plugin for Nutch. I am doing focused crawling by generating only those outlinks that fall within a specific XPath. In my use case, I want to fetch different ...

4. Why is nutch parsing application/x-javascript files?    stackoverflow.com

I configured Nutch with the following in my conf/nutch-site.xml:

<property>
  <name>plugin.includes</name>
  <value>urlfilter-regex|protocol-(http|file)|parse-(text|html|pdf|msword)|index-(basic|anchor|more)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is ...
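
Since the description above says the value is a regular expression over plugin directory names, one way to reason about the question is to test candidate names against it directly. The following is a minimal sketch in plain Java, not Nutch's own code: the class name `PluginIncludesCheck`, the helper `isIncluded`, and the list of candidate names are hypothetical, and it assumes a plugin is included only when its whole directory name matches the expression.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {
    // The plugin.includes value from the question, joined into one string.
    static final String PLUGIN_INCLUDES =
        "urlfilter-regex|protocol-(http|file)|parse-(text|html|pdf|msword)|"
        + "index-(basic|anchor|more)|query-(basic|site|url)|response-(json|xml)|"
        + "summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)";

    static final Pattern INCLUDES = Pattern.compile(PLUGIN_INCLUDES);

    // Assumption: the entire directory name must match, not just a substring.
    static boolean isIncluded(String pluginDirName) {
        return INCLUDES.matcher(pluginDirName).matches();
    }

    public static void main(String[] args) {
        // Illustrative candidate names; they are not taken from the question.
        String[] candidates = {"parse-html", "parse-js", "parse-text", "protocol-ftp"};
        for (String name : candidates) {
            System.out.println(name + " -> " + isIncluded(name));
        }
    }
}
```

Under that assumption, a name such as `parse-js` does not match this expression, so a check like this can help narrow down whether a given parser is being enabled through `plugin.includes` or through some other part of the configuration.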

5. Need plugin to overwrite default title    stackoverflow.com

I'm trying to write a plugin for Nutch based on http://sujitpal.blogspot.com/2009/07/nutch-custom-plugin-to-parse-and-add.html to get a custom title finder. This works well, and storing extracted titles in a new field is no problem. ...
