Specification for annotator > tagger > lang-guesser
The language identification tool uses previously created language bigram models to guess the language of the input text. If the examined text is shorter than 24 characters, the language is instead guessed from the occurrences of non-standard letters in each defined language.
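The two-stage decision described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation. The function names, the log-probability scoring, the smoothing floor and the example letter sets are all assumptions made for illustration. Only the 24-character threshold and the default code 'xx' come from this specification.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the list of character bigrams of the lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(sample):
    """Build a bigram model (relative frequencies) from a sample text."""
    counts = Counter(bigrams(sample))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}


def score(text, model, floor=1e-6):
    """Sum of log-probabilities; unseen bigrams get a small floor (assumed)."""
    return sum(math.log(model.get(bg, floor)) for bg in bigrams(text))


def guess_language(text, models, special_letters, min_length=24, default="xx"):
    """Guess a language code for text.

    models:          {lang: bigram model}, hypothetical structure
    special_letters: {lang: string of non-standard letters}, hypothetical
    """
    if len(text) < min_length:
        # Short text: count non-standard letters per language.
        lowered = text.lower()
        hits = {lang: sum(ch in letters for ch in lowered)
                for lang, letters in special_letters.items()}
        best = max(hits, key=hits.get)
        return best if hits[best] > 0 else default
    # Longer text: pick the language whose bigram model scores highest.
    return max(models, key=lambda lang: score(text, models[lang]))
```

A real model would be trained on a large corpus per language. The sketch only shows where the length threshold switches between the two strategies and where a default code would be returned.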
guess-language ! simple-writer --tags !en
Selects sentences in English from multi-language text.
Die Familie Grimm war in Hanau beheimatet. Jacob Ludwig Carl Grimm, born on 4 January 1785, was 13 months older than his brother Wilhelm Carl Grimm. Obaj bracia byli członkami Akademii Nauk w Berlinie i uczonymi (językoznawcami), o znacznym dorobku.
Jacob Ludwig Carl Grimm, born on 4 January 1785, was 13 months older than his brother Wilhelm Carl Grimm.
Allowed options:
  --default-language arg (=xx)  Language code to be used for unrecognized strings, use 'none' to turn off putting a default language code
  --force                       All frags must be marked as text in some language
  --only-langs arg              Guesses language only from the given list of languages