Example usage for org.apache.lucene.analysis.miscellaneous ASCIIFoldingFilter incrementToken

Introduction

This page shows an example of calling incrementToken() on org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter.

Prototype

@Override
public boolean incrementToken() throws IOException

Usage

From source file: org.gbif.portal.action.dataset.SearchAction.java

License: Apache License

import java.io.IOException;
import java.io.StringReader;

import com.google.common.base.Strings; // assumed to be Guava's Strings
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Uses the solr.ASCIIFoldingFilter to convert a string to its ASCII equivalent. See the Solr documentation for full
 * details.
 * <br>
 * When doing the conversion, this method mirrors GBIF's registry-solr schema configuration for
 * {@code <fieldType name="text_auto_ngram">}. For example, it uses the KeywordTokenizer, which treats the entire
 * string as a single token regardless of its content. See the Solr documentation for more details.
 * <br>
 * This method is needed when checking whether the query string matches the dataset title. For example, the query
 * string "straße" won't match the dataset title "Schulhof Gymnasium Hürth Bonnstrasse" unless "straße" is
 * converted to its ASCII equivalent "strasse".
 *
 * @param q query string
 * @return query string converted to ASCII equivalent
 * @see org.gbif.portal.action.dataset.SearchAction#addMissingHighlighting(String, String)
 * @see org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter
 * @see org.apache.lucene.analysis.core.KeywordTokenizer
 */
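protected static String foldToAscii(String q) {
    if (!Strings.isNullOrEmpty(q)) {
        ASCIIFoldingFilter filter = null;
        try {
            // Reader-taking constructor as in Lucene 4.x; later versions use setReader(...) instead.
            StringReader reader = new StringReader(q);
            TokenStream stream = new KeywordTokenizer(reader);
            filter = new ASCIIFoldingFilter(stream);
            CharTermAttribute termAtt = filter.addAttribute(CharTermAttribute.class);
            // reset() must be called before the first incrementToken()
            filter.reset();
            // KeywordTokenizer emits the whole input as one token, so a single call is enough
            if (filter.incrementToken()) {
                // termAtt now holds q folded to its ASCII equivalent
                return termAtt.toString();
            }
        } catch (IOException e) {
            // not expected for an in-memory StringReader; fall through and return q unchanged
        } finally {
            if (filter != null) {
                try {
                    filter.end();
                    filter.close();
                } catch (IOException e) {
                    // nothing useful can be done if cleanup fails
                }
            }
        }
    }
    return q;
}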
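
The matching problem described in the Javadoc can be tried out without Lucene on the classpath. The following is a minimal sketch that uses only the JDK: it is not Lucene's ASCIIFoldingFilter and covers just a small subset of what that filter folds (combining diacritics via java.text.Normalizer, plus a hand-written rule for the sharp s). The class name AsciiFoldSketch and the methods fold and foldedContains are hypothetical names chosen for this illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical stand-in for ASCII folding; NOT Lucene's implementation. */
final class AsciiFoldSketch {

    private AsciiFoldSketch() {
    }

    /** Folds a small subset of non-ASCII characters to ASCII. */
    static String fold(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // The sharp s (U+00DF, U+1E9E) has no Unicode decomposition, so map it by hand.
        String expanded = s.replace("\u00DF", "ss").replace("\u1E9E", "SS");
        // NFD splits an accented letter into base letter + combining mark (u-umlaut -> u + U+0308).
        String decomposed = Normalizer.normalize(expanded, Normalizer.Form.NFD);
        // Drop the combining marks and keep the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Case-insensitive containment check performed on the folded forms. */
    static boolean foldedContains(String title, String query) {
        String foldedTitle = fold(title).toLowerCase(Locale.ROOT);
        String foldedQuery = fold(query).toLowerCase(Locale.ROOT);
        return foldedTitle.contains(foldedQuery);
    }

    public static void main(String[] args) {
        String title = "Schulhof Gymnasium H\u00FCrth Bonnstrasse";
        String query = "stra\u00DFe";
        System.out.println(title.contains(query));        // raw comparison
        System.out.println(fold(query));                  // folded query
        System.out.println(foldedContains(title, query)); // comparison after folding
    }
}
```

Comparing the raw strings fails because the title spells the word with "ss" while the query uses the sharp s; once both sides are folded, the query becomes a substring of the title. In the real method, the folding step is delegated to ASCIIFoldingFilter, which handles far more characters than this sketch.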