Testing the Analyzers
The JavaScript search engine comes with a set of normalizers that can be chained to the parser.
An analyzer is simply a parser associated with a (possibly empty) set of normalizers.
The goal of normalization is to reduce the set of words to index and to provide a way to
automatically remove typos, stem words, or reduce them to phonetic equivalents, so that each
word is represented in the index by a single form.
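To make this concrete, here is a minimal, self-contained sketch of an analyzer built from a parser and a chain of normalizers. The names used (simpleParser, makeAnalyzer, and the two toy normalizers) are hypothetical stand-ins for illustration, not the engine's actual API.

```javascript
// Hypothetical sketch: an analyzer = a parser + a (possibly empty) chain of normalizers.

// Naive parser: split on anything that is not a letter or a digit.
function simpleParser(text) {
    return text.split(/[^\p{L}\p{N}]+/u).filter(function (w) { return w.length > 0; });
}

// Build an analyzer that parses text, then runs every word through the normalizers in order.
function makeAnalyzer(parser, normalizers) {
    return function (text) {
        return parser(text).map(function (word) {
            return normalizers.reduce(function (w, normalize) { return normalize(w); }, word);
        });
    };
}

// Two toy normalizers: lowercasing, and a fake "stemmer" that strips a trailing "s".
var lowercase = function (w) { return w.toLowerCase(); };
var stripPluralS = function (w) { return w.replace(/s$/, ""); };

var analyzer = makeAnalyzer(simpleParser, [lowercase, stripPluralS]);
console.log(analyzer("Cats and Dogs")); // -> ["cat", "and", "dog"]
```

Whatever the chain, each word ends up in the index under the single form produced at the end of it.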
Each normalizer has its own strategy and level of "aggressiveness" (rough code approximations of two of the strategies follow the list). For instance:
- remove_duplicate_letters collapses sequences of the same letter into a single occurrence.
- to_lowercase_decomp produces a Unicode-decomposed, lowercased form of each word.
- to_lowercase_nomark lowercases each word and removes all diacritical marks.
- porter_stemmer stems words using English-based rules.
- french_normalizer aggressively normalizes French words by stemming them and simplifying them phonetically.
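As a rough illustration only, and not the engine's actual implementations, here are stand-alone approximations of two of the strategies above: collapsing repeated letters and lowercasing with diacritic removal.

```javascript
// Illustrative approximations only; the real normalizers may differ in detail.

// In the spirit of remove_duplicate_letters: collapse runs of the same letter.
function collapseDuplicateLetters(word) {
    return word.replace(/(.)\1+/g, "$1");
}

// In the spirit of to_lowercase_nomark: lowercase, then strip diacritical marks.
function lowercaseNoMark(word) {
    return word.toLowerCase().normalize("NFD").replace(/\p{M}/gu, "");
}

console.log(collapseDuplicateLetters("committee")); // -> "comitee"
console.log(lowercaseNoMark("Français"));           // -> "francais"
```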