tika « Tools « Java Lucene Q&A


1. Solr's TikaEntityProcessor not working    stackoverflow.com

I'm trying to get Solr to index a database in which one column is a filename of a PDF document I'd like to index. My configuration looks like this:

<dataConfig>
 <dataSource name="ds-db" ...

2. SOLR Tika: add text of file to existing record (ExtractingRequestHandler)    stackoverflow.com

I am indexing posts in SOLR with "name", "title", and "description" fields. I'd like to later be able to add a file (like a Word doc or a PDF) using Tika ...

3. Solr Tika, Text with style    stackoverflow.com

I've seen this link: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika What I get from Tika is plain text without any styling for Solr to search in. Is it possible to have the text with its ...

4. Retrieving extracted text with Apache Solr    stackoverflow.com

I'm new to Apache Solr, and I want to use it for indexing PDF files. I managed to get it up and running so far and I can now search for ...
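
The teaser above is cut off, so the exact question is unknown. One way to see what Tika extracts without indexing anything is the `extractOnly` parameter of the ExtractingRequestHandler, optionally with `extractFormat=text`. Below is a minimal sketch that only builds such a URL. The helper name `build_extract_only_url` and the base URL in the usage note are assumptions for illustration, not taken from the question.

```python
from urllib.parse import urlencode


def build_extract_only_url(base_url, extract_format="text"):
    """Build an /update/extract URL that returns the extracted content
    instead of indexing it. `base_url` is the Solr core/webapp root
    (hypothetical here)."""
    if extract_format not in ("text", "xml"):
        raise ValueError("extract_format must be 'text' or 'xml'")
    # extractOnly=true asks the handler to return the Tika output
    # rather than add a document to the index.
    query = urlencode({"extractOnly": "true", "extractFormat": extract_format})
    return f"{base_url.rstrip('/')}/update/extract?{query}"
```

For example, `build_extract_only_url("http://localhost:8983/solr")` gives a URL to which the file can then be posted as multipart form data, in the same way as a `curl -F` upload.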

5. solr tika extraction problem    stackoverflow.com

I am using Tika with DataImportHandler. While executing the full-import, I am getting the following errors.

SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:tika-test Processing Document ...

6. tika solr integration    stackoverflow.com

I am trying to index using a curl-based request. The request is:

curl "http://localhost:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"
On submitting the request, I am getting this error:
 Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} ...
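
The request above packs every option into the query string by hand. As a rough sketch (not a diagnosis of the error, whose text is truncated), the same URL can be assembled from a dictionary so that each value is escaped consistently. The function name `build_extract_url` is hypothetical; the parameter names and values are the ones from the curl command.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, extra_params=None):
    """Assemble an ExtractingRequestHandler URL like the one in the
    curl command. `base_url` is the Solr webapp root."""
    params = {
        "literal.id": doc_id,            # unique key for the document
        "uprefix": "attr_",              # prefix for fields not in the schema
        "fmap.content": "attr_content",  # map Tika's content field
        "commit": "true",                # commit right after the upload
    }
    if extra_params:
        params.update(extra_params)
    # urlencode escapes each key and value and joins them with '&'.
    return f"{base_url.rstrip('/')}/update/extract?{urlencode(params)}"


url = build_extract_url("http://localhost:8080/solr1", "who.pdf")
print(url)
```

With the values from the question this reproduces the original URL. An id containing spaces or `&` would be escaped here, whereas pasting it into a hand-written query string would break the request.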

7. Solr Cell / ExtractingRequestHandler cannot parse some *.doc files    stackoverflow.com

I need to index content of doc/docx/pdf files uploaded by users and use Solr (1.4.1) ExtractingRequestHandler component (817165) for that. If that matters, I don't request indexing from it - the ...

8. What is the formatting of Solr Cell/Tika output? And how to fix it?    stackoverflow.com

I am using Solr to index DOC, DOCX and PDF files. I enabled stored for the text field and checked the output. Here's the result from a sample DOC file:

...

9. Solr display page no of PDF along with the results    stackoverflow.com

My question is a continuation of this activity, where I would like to display the page number for the searched word in the input document. Solr open document after searching a ...

10. Solr : file entity processor and delta import    stackoverflow.com

I'm using Solr 3.3 and I want to use delta import with the file entity processor and the Tika entity processor. Full import works fine but the delta import parameter doesn't import the ...
