crawler « Development « Java Network Q&A


1. Is it possible to discover plugged disks from Java?

I'm writing a disk crawler, and if the user doesn't provide an existing path, the program should search all available disks. Does anybody know whether this is possible and if ...
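
One possible approach, as a minimal sketch: the standard library can list file-system roots through `FileSystems.getDefault().getRootDirectories()`. The class name `DiskRoots` and the method `availableRoots` are made up for illustration. On Windows this typically yields one root per drive letter; on Unix-like systems it usually yields only `/`, so mounted disks would have to be found another way (for example via `FileSystems.getDefault().getFileStores()`).

```java
import java.io.File;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class DiskRoots {
    /** Returns the file-system roots the JVM can currently see. */
    static List<Path> availableRoots() {
        List<Path> roots = new ArrayList<>();
        for (Path root : FileSystems.getDefault().getRootDirectories()) {
            roots.add(root);
        }
        return roots;
    }

    public static void main(String[] args) {
        // Print each root together with the space the JVM reports as usable.
        for (Path root : availableRoots()) {
            File f = root.toFile();
            System.out.println(root + " usable bytes: " + f.getUsableSpace());
        }
    }
}
```

A crawler could fall back to iterating over `availableRoots()` whenever the user-supplied path is missing or does not exist.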

2. Lucene crawler (it needs to build lucene index)

I am looking for an Apache Lucene web crawler written in Java if possible, or in any other language. The crawler must use Lucene and create a valid Lucene index and document ...

3. crawler get external website search result

  1. What is the best practice, and which library can I use, to type into the search box of an external website and collect the search results?
  2. How do I tackle websites with different search box ...

4. Using Web crawler for price comparison

I need an open source, Java-based web crawler that I can extend for price comparison. How do I do the price comparison? Is there any open source code for that?

5. What is a good Java web crawler library?

I am about to develop a crawler in Java but don't feel like reinventing the wheel. A quick Google search gives a whole bunch of Java libraries to build ...

6. Writing a simple web crawler that interacts with the browser (Java)

I need to create an automated process (preferably using Java) that will:

  1. Open browser with specific url.
  2. Login, using the username and password specified.
  3. Follow one of the links on the page.
  4. Refresh the browser.
  5. Log ...

7. Building vertical crawler using Bixo

I came across an open source crawler, Bixo. Has anyone tried it? Could you please share what you learned? Could we build a directed crawler with enough ease (compared to Nutch/Heritrix)? Thanks, Nayn

8. How to optimize this ugly code?

I asked a question here the other day, but in the end I decided to do it myself because of time constraints. Now I have a little more time to fix it ...

9. web crawler in java

Possible Duplicate:
What is a good Java web crawler library?
Hello, I need to crawl some websites, say some 1000 websites. I need an open source ...

10. Crawling a Page with dynamically generated content

I have been using the crawler for a custom built crawler. The problem is with dynamically generated content, like comments on a blog for example. Consider the following ...

11. web crawler performance

I am interested to know, in a very general situation (a home-brew amateur web crawler), what the performance of such a crawler would be. More specifically, how many pages can a crawler process? When ...

12. What should i use to crawl many news articles?

I have a natural language processing project, but for that I need to crawl many web articles from sources like Yahoo News, Google News or blogs... I'm a Java developer (so ...

13. Nutch API advice

I'm working on a project where I need a mature crawler to do some work, and I'm evaluating Nutch for this purpose. My current needs are relatively straightforward: I need a ...

14. Fastest way to check list of crawler IPs via contains in Java

I got a list of crawlers from the following website: If you know a better list of IPs that is regularly updated, please let me know. Now I created an object:

 private static ...
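
The question's own code is cut off, so the following is only a sketch of one common answer: keep the addresses in a `HashSet`, where `contains` runs in expected constant time instead of scanning a list. The class `CrawlerIpFilter` and the sample addresses (taken from the reserved documentation ranges) are assumptions for illustration, not the asker's data.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: exact-match lookup of known crawler IP addresses.
class CrawlerIpFilter {
    private final Set<String> knownCrawlerIps;

    CrawlerIpFilter(Iterable<String> ips) {
        Set<String> set = new HashSet<>();
        for (String ip : ips) {
            set.add(ip.trim());
        }
        this.knownCrawlerIps = Collections.unmodifiableSet(set);
    }

    /** Expected O(1) lookup, regardless of how many addresses are loaded. */
    boolean isCrawler(String remoteAddr) {
        return remoteAddr != null && knownCrawlerIps.contains(remoteAddr.trim());
    }

    public static void main(String[] args) {
        // Sample addresses are placeholders from the documentation ranges.
        CrawlerIpFilter filter = new CrawlerIpFilter(Arrays.asList("192.0.2.1", "198.51.100.7"));
        System.out.println(filter.isCrawler("192.0.2.1"));
        System.out.println(filter.isCrawler("203.0.113.9"));
    }
}
```

This only handles exact addresses; if the list contains ranges or CIDR blocks, a different structure (for example a sorted map of range starts) would be needed.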

15. How do we build a website Crawler using Java

Posting this question again. I have started with the crawler, but I am stuck on the indexing part. I want an efficient and fast way to index the links. Currently what I am ...

16. crawler to retrieve o2 bluebook messages

Hi Please read the whole topic - it is an interesting one. Quite challenging. o2 have a service called Bluebook allowing users to backup their contacts, pictures and text messages. Unfortunately because of the ...

17. Java CSS Crawler

I'm looking for a web crawler with the ability to grab the page's CSS. I don't need any other fancy crawling abilities. I'm trying to make my way through Xapian, Nutch and ...

18. nutch crawler is crawling ' as †

The Nutch crawler is crawling let's as Let’s. Why? Is there any setting to change this charset?

19. Guide for setting up crawler4j

I would like to set up the crawler to crawl a website, let's say a blog, fetch only the links in the website, and write those links to a text file. ...

20. java web crawler

Hi, can anyone recommend a simple Java web crawler that crawls a website and returns a list of the links in the website? No, I do not need a parser. Thanks ...
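
As a rough sketch of the "list of links" part only, the snippet below pulls `href` values out of an HTML string with a regular expression and resolves them against a base URL. `LinkExtractor` and `extractLinks` are names invented here. A regex is not a real HTML parser and will miss or misread unusual markup, so treat this as an illustration rather than a robust solution.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects absolute http/https links from an HTML string.
class LinkExtractor {
    // Matches href="..." or href='...' in any letter case.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>(); // de-duplicates, keeps page order
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            if (href.isEmpty() || href.startsWith("#")) {
                continue; // skip empty values and same-page anchors
            }
            try {
                URI resolved = base.resolve(href);
                String scheme = resolved.getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed URI in the page; ignore it.
            }
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> <a href='http://example.org/x'>X</a>";
        for (String link : extractLinks(html, "http://example.com/index.html")) {
            System.out.println(link);
        }
    }
}
```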

21. Crawl Web Data using Web Crawler

I would like to use a web crawler to crawl a particular website. The website is a learning management system where many students upload their assignments, project presentations and so on. My ...

22. Are there any open-source implementations of the Mercator Web Crawler

Marc Najork and Allan Heydon have written an excellent paper on their scalable and extensible Java web crawler called Mercator. Here are some resources on the Mercator web crawler:

23. RDF crawler using triple

I'm using an RDF crawler, and in it I have a class with these imports:

import edu.unika.aifb.rdf.crawler.*;
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.util.FileManager;
These classes are flagged as errors, and I tried with the Jena packages but ...

24. How can I configure this java crawler

I want to configure this Java crawler, but I am confused about how to do it, as this is the first time I am working on this. I ...

25. Access is Denied error with Java Crawler

Possible Duplicate:
How can I configure this java crawler
Any idea what this error means, and why I am getting it? Any suggestions ...

26. Web Crawler's Functionality

Does a web crawler return the extracted text from webpages only? Say, if there are some pdf/doc files stored in the web server as well. Can a web crawler crawl through ...

27. Crawl Only HTML Pages

I want to crawl only HTML pages, but after I changed the regular expression in this code, it is still crawling some XML pages as well. Any suggestions why is it ...

28. Crawl only HTML page while checking the response header

I am trying to get all the URLs whose response header is Content-Type: text/html, so I am checking the response header of each URL, and if it has Content-Type: text/html, then I ...
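
A minimal sketch of the header check described above, assuming plain `HttpURLConnection` and a `HEAD` request so the body is not downloaded. The names `HtmlContentTypeCheck`, `isHtml` and `looksLikeHtml` are invented for this example. Note that the header often carries a charset suffix such as `text/html; charset=UTF-8`, so an exact string comparison would reject valid HTML pages.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper for filtering URLs by their Content-Type header.
class HtmlContentTypeCheck {
    /** True if the header's media type is text/html, ignoring case and parameters. */
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" before comparing.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/html");
    }

    /** Issues a HEAD request to an http/https URL and inspects only the response headers. */
    static boolean looksLikeHtml(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000); // timeouts in milliseconds, chosen arbitrarily
            conn.setReadTimeout(5000);
            return isHtml(conn.getContentType());
        } finally {
            conn.disconnect();
        }
    }
}
```

Some servers answer `HEAD` differently from `GET` or not at all, so a crawler may want to fall back to a `GET` and check the header before reading the body.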

29. Writing a web crawler in Java

How do I write a page scraper in Java to crawl the web and obtain information related to a particular topic? On searching Google I found only 1 video on YouTube with ...

30. How to write a java code for crawling sites with apache nutch 1.3 api?

I want to write a program with Java and the Nutch 1.3 API to crawl sites. I searched the web, but there is no sample code. How can I do that? Thanks ...

31. HTML Mixed Encodings?

First I would like to say thank you for the help in advance. I am currently writing a web crawler that parses HTML content, strips HTML tags, and then spell checks the ...

32. Use HtmlUnit as crawler

I needed a headless browser to parse pages. HtmlUnit allows me to set up a Heroku Java app to fulfil this purpose. But now I'm running into a couple of issues. The current one is malformed ...

33. Need a webcrawler that will pass login credentials and only follow specified path

So I need a web crawler (Preferably open source and java) that I can input a list of URLs (or just run multiple times I suppose considering it's open source) and ...

34. Nutch crawler only finds a subset of links on a given page?

I'm using the following command to crawl one single page with 788 links on it:

nutch crawl urls/ -dir crawls -depth 1 -topN 1000
The above command is only able to find 72 ...

35. Books about web crawler?

Are there any books that really focus on how to write a web crawler in Java? I'd like a book that teaches me to write a web spider step by step, not ...

36. web crawlers in java

37. Crawler

Are you asking "How can you tell if the client requesting a page is a crawler or a browser?" I don't think you can. A crawler could be written to send headers that perfectly impersonate a given browser. I wonder if you could build the smarts to notice a particular session running up a lot of bandwidth in a short time. ...

38. Java web crawler help

Hi all, I just got this .Net project and need to move its functionality into our Java project. Part of the .Net project is a web crawler that goes out to a site with a bunch of links to .xls files and downloads all those files locally. Can someone point me in the right direction on how to create ...

39. web crawler compilation problem

But I have another question. The program doesn't seem to be working. I typed a few URLs and it never shows any result. Does anybody know why? I have included the code here. Thank you.

package crawler;
import java.applet.Applet;
import java.text.*;
import java.awt.*;
import java.awt.event.*;
import java.util.*;
import*;
import*;
import java.awt.List;
public class WebCrawler extends Applet implements ActionListener, Runnable { ...

40. Web crawler doesn't work

Hi all, I am writing this as a new post. I found the web crawler sample here and managed to compile it, but when I type a URL, nothing is produced. Can someone tell me what is wrong? I have included the code here. Thank you.

package crawler;
import java.applet.Applet;
import java.text.*;
import java.awt.*;
import java.awt.event.*;
import java.util.*;
import*;
import*; ...

41. Web crawler or RSS Feed

I need to create a program for price comparison and scheme comparison between multiple cell service providers. Can anybody suggest a way or method to accomplish this task? Would a web crawler or an RSS feed be helpful? After googling I found that RSS feeds generally carry news-related items, and in the case of a web crawler the program needs to be updated ...

42. crawler

Actually, my concern is this: I have an application that crawls web links. If a login is required on a link, it logs in as well. But there are some sites that require a cookie-enabled browser. When I attempt to log in on such a site, I am always returned to the login page. In my application, the crawler first requests the login page, retrieves the cookie information from the header and ...

43. Searching a crawler

Hi, I want to download as many websites as possible. I used this tool in the beginning and started modifying it heavily. As a result, it is running, but sometimes it just does nothing (I checked the TCP states, threads, heap space etc. and everything looks fine to me), so I started searching around the internet to see if there is a ...

44. What resources would be required for a java based web crawler

The standard Java library has all you need to get started. I feel that web crawling has gotten a lot more complicated as people create more complex pages using more JavaScript to dynamically build a page. The "semantic web" represents an attempt to allow better tagging of resources in a more academic style. If I was doing a web crawler now, ...

45. ChilKat web crawler

Hey guys, if anyone has used or modified the Chilkat web crawler, could you tell me how? The crawler goes to each site, finds links, goes to the linked sites, finds more links, goes to those linked sites... etc. Does anyone know how to effectively limit the number of links that it may expand to? Thanks so much, Brent
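
Independently of Chilkat (this sketch does not use or claim to reflect its API), the usual way to bound link expansion is a breadth-first crawl with a depth limit and a page budget. `BoundedCrawl`, `crawl` and the `fetchLinks` function are hypothetical names; `fetchLinks` stands in for whatever code downloads a page and returns its outgoing links.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: breadth-first crawl bounded by depth and page count.
class BoundedCrawl {
    static List<String> crawl(String seed, int maxDepth, int maxPages,
                              Function<String, List<String>> fetchLinks) {
        List<String> visited = new ArrayList<>();     // pages in visit order
        Set<String> seen = new HashSet<>();           // everything ever queued
        Map<String, Integer> depth = new HashMap<>(); // link distance from the seed
        Deque<String> frontier = new ArrayDeque<>();

        seen.add(seed);
        depth.put(seed, 0);
        frontier.add(seed);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            int d = depth.get(url);
            if (d >= maxDepth) {
                continue; // visit the page but do not expand its links
            }
            for (String link : fetchLinks.apply(url)) {
                if (seen.add(link)) { // add() is false for already-queued URLs
                    depth.put(link, d + 1);
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

With `maxDepth = 1` the crawl visits the seed and the pages it links to directly; `maxPages` caps the total regardless of depth.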

46. Web Crawler

Hi, I have been trying this for quite a long time and still cannot figure out how to detect that a crawler is visiting my website. I know there are some web analytics tools already available for this, but I would like to know what API is behind them. Any help will be appreciated. Thanks

47. Website Spider / Crawler program

Quick question. Suppose I had to write a program to spider websites for specific information in volume (1000's of websites a week using a DB of URLs). Which language is best for this task, Perl or Java? I've been looking into both and trying to come to a conclusion one way or another. Thanks in advance.

48. Web crawler - how to deal with a post method

49. Web Crawler - Out Of Memory Exception

Hi, I am implementing a web crawler that crawls through web pages, extracts links, and moves on. At some point it discovers large web pages that cause OutOfMemory exceptions in the StringBuilder append method. I am using the following technique to read the page: - Open a stream with the URL - Read line by line - Append each line to the StringBuilder ...
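
One way to avoid unbounded buffering, sketched under the assumption that very large pages can simply be truncated: read at most a fixed number of bytes from the stream and stop. `CappedReader` and `readAtMost` are hypothetical names, and the cap is something the caller has to choose.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads a stream but never buffers more than maxBytes.
class CappedReader {
    /** Reads up to maxBytes (must be >= 0) from the stream and returns them. */
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.min(maxBytes, 8192));
        byte[] buf = new byte[8192];
        int remaining = maxBytes;
        while (remaining > 0) {
            // Never ask for more than the budget that is left.
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n == -1) {
                break; // end of stream before the cap was reached
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }
}
```

The alternative, when the full page must be processed, is to extract links from each line or chunk as it arrives instead of appending everything to a single `StringBuilder`.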

50. Creating a Java mp3 crawler

Hi, I am currently working on creating an mp3 crawler using Java. I will enter a site address into my crawler and then it will crawl the site and locate all the mp3 files on the site. Although I got it working, it's sort of a brute force solution. It will crawl page by page looking for mp3 and ...

51. Java Crawler

Hi, I'm working on a crawler now. At the moment we always have to fetch the page content before we can obtain the links in the page. Is there any way to obtain the links from a page directly, without having to fetch it first? Any advice will be appreciated. Thanks

52. Developing a Flight Booking System Crawler

Hi, I am planning to develop a Flight Booking System. Basically, all I want is a software system that searches across any airline. For example, the user will enter on the web application the dates they want to fly, and maybe the airline, or search for any flights. The live data will return all the list of results ...

54. Java and Web Crawlers...

Hi everyone, I was looking into how to build a web crawler with java and came across this old tutorial: After a few changes I got it to compile and run... but it doesn't work. Can anyone help? Code I'm using is: Basically my intention is to use this as a foundation for a little project I have going. Eventually ...


56. good java web crawlers