1. Number of connections to the host at the same time stackoverflow.com
How can I handle this?
3. Hadoop to create an index and Add() it to distributed Solr... is this possible? Should I use Nutch? Cloudera? stackoverflow.com
Can I use a MapReduce framework to create an index and somehow add it to a distributed Solr? I have a burst of information (logfiles and documents) that will be transported over ...
4. How can I develop a web crawler using Nutch in Windows XP? stackoverflow.com
I'm totally new to Nutch. I've installed Tomcat and, using NetBeans, I've made a little Java project, which looks like this:
5. Writing MetaData inside HDFS stackoverflow.com
We are using Nutch to crawl our intranet site. We extract the metadata into an XML file during the indexing phase (we modified the code of indexer.java), and when run in local ...
6. Run Nutch on existing Hadoop cluster stackoverflow.com
We have a Hadoop cluster (Hadoop 0.20) and I want to use Nutch 1.2 to import some files over HTTP into HDFS, but I couldn't get Nutch running on the cluster. I've ...
7. Increase Java heap space for language-identifier plug-in in Nutch stackoverflow.com
I am trying to add a new language to Apache Tika's automatic language detection tool. Adding a new language requires building a language profile, so I am using ...
9. I don't know what the symbol "#" means in the following source of Nutch's HttpBase.java stackoverflow.com
When I come to the following source of Nutch's
10. Nutch Crawl error - Input path does not exist stackoverflow.com
I have Nutch/Hadoop with 2 datanode servers. I try to crawl some URLs, but Nutch fails with this error:
11. Do the cancel() and interrupt() methods do duplicate work? stackoverflow.com
I read the source of
The source of the
12. Exploring Nutch over Hadoop stackoverflow.com
What can I possibly do with Hadoop and Nutch used as a search engine? I know that Nutch is used to build a web crawler, but I'm not finding ...
13. Setting up Nutch 1.3 and Hadoop 0.20.2 stackoverflow.com
I have a multi-node cluster running on UEC (Ubuntu Enterprise Cloud) and I thought it would be a good idea to set up Nutch with it. However, I found this tutorial unhelpful ...