List of usage examples for org.apache.lucene.analysis.cn.smart WordType STRING
int STRING
From source file: com.churvey.graduate.chinese.WordSegmenter.java
License: Apache License
/**
 * Process a {@link SegToken} so that it is ready for indexing.
 *
 * This method calculates offsets and normalizes the token with {@link SegTokenFilter}.
 *
 * @param st input {@link SegToken}
 * @param sentence associated Sentence
 * @param sentenceStartOffset offset into sentence
 * @return Lucene {@link SegToken}
 */
public SegToken convertSegToken(SegToken st, String sentence, int sentenceStartOffset) {
    switch (st.wordType) {
    case WordType.STRING:
    case WordType.NUMBER:
    case WordType.FULLWIDTH_NUMBER:
    case WordType.FULLWIDTH_STRING:
        // For these word types, copy the token text straight from the sentence.
        st.charArray = sentence.substring(st.startOffset, st.endOffset).toCharArray();
        break;
    default:
        break;
    }

    // Normalize the token, then shift its sentence-relative offsets
    // so they are relative to the start of the whole input.
    st = tokenFilter.filter(st);
    st.startOffset += sentenceStartOffset;
    st.endOffset += sentenceStartOffset;
    return st;
}
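The offset handling in convertSegToken can be tried without Lucene on the classpath. The following is a minimal sketch, not the library's code: the class OffsetShiftSketch, its nested Token class, and the method toDocumentOffsets are hypothetical stand-ins that keep only the three SegToken fields used above. It leaves out the wordType switch and the SegTokenFilter normalization step.

```java
public class OffsetShiftSketch {

    /** Hypothetical stand-in for SegToken, reduced to the fields used here. */
    public static final class Token {
        public char[] charArray;
        public int startOffset; // sentence-relative on input
        public int endOffset;   // sentence-relative on input

        public Token(int startOffset, int endOffset) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /**
     * Copies the token text out of the sentence, then shifts both offsets
     * by the position of the sentence within the whole input.
     */
    public static Token toDocumentOffsets(Token t, String sentence, int sentenceStartOffset) {
        // Offsets are still sentence-relative here, so substring is safe.
        t.charArray = sentence.substring(t.startOffset, t.endOffset).toCharArray();
        t.startOffset += sentenceStartOffset;
        t.endOffset += sentenceStartOffset;
        return t;
    }

    public static void main(String[] args) {
        // Assume this sentence starts at character 10 of the full input.
        Token t = toDocumentOffsets(new Token(0, 6), "Lucene 4", 10);
        System.out.println(new String(t.charArray) + " " + t.startOffset + " " + t.endOffset);
        // prints: Lucene 10 16
    }
}
```

As in the original method, the text is extracted before the offsets are shifted. Doing it in the other order would index past the sentence boundaries.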