
Specification for annotator > tagger > lang-guesser

lang-guesser

The language identification tool uses prebuilt per-language bigram models to guess the language of the input text. If the examined text is shorter than 24 characters, the language is instead guessed from the occurrences of non-standard letters characteristic of each defined language.
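The two strategies described above can be sketched roughly as follows. This is only an illustrative approximation, not the tool's actual implementation: the training samples, the `SPECIAL_LETTERS` table, the add-one smoothing, and all function names are assumptions made for the example. Only the 24-character threshold and the `xx` default code come from this page.

```python
import math
from collections import Counter

SHORT_TEXT_LIMIT = 24  # threshold stated in the description above

# Toy training samples (assumption); real models would be built from large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the brothers went home",
    "de": "die familie war in der stadt beheimatet und die brüder gingen nach hause",
    "pl": "obaj bracia byli członkami akademii nauk i poszli do domu późnym wieczorem",
}

# Letters treated as "non-standard" for each language (illustrative subset).
SPECIAL_LETTERS = {
    "en": set(),
    "de": set("äöüß"),
    "pl": set("ąćęłńóśźż"),
}


def bigrams(text):
    """Return all overlapping two-character sequences of the lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(samples):
    """Build one bigram frequency table per language."""
    return {lang: Counter(bigrams(sample)) for lang, sample in samples.items()}


def log_likelihood(text, counts, vocab_size):
    """Score text against one model, using add-one smoothing for unseen bigrams."""
    total = sum(counts.values())
    return sum(
        math.log((counts[bg] + 1) / (total + vocab_size)) for bg in bigrams(text)
    )


def guess_language(text, models, default="xx"):
    """Guess a language code; fall back to special letters for short texts."""
    if len(text) < SHORT_TEXT_LIMIT:
        lowered = text.lower()
        hits = {
            lang: sum(ch in letters for ch in lowered)
            for lang, letters in SPECIAL_LETTERS.items()
        }
        best = max(hits, key=hits.get)
        return best if hits[best] > 0 else default
    vocab_size = len(set().union(*models.values()))
    scores = {
        lang: log_likelihood(text, counts, vocab_size)
        for lang, counts in models.items()
    }
    return max(scores, key=scores.get)


models = train(SAMPLES)
print(guess_language("Straße", models))      # short text: decided by special letters
print(guess_language(SAMPLES["pl"], models))  # long text: decided by bigram scores
```

With samples this small the bigram scores are only reliable on text close to the training data; the sketch is meant to show the control flow, not to be a usable guesser.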

Aliases

guess-lang, guess-language

Examples

guess-language ! simple-writer --tags !en

Selects sentences in English from multi-language text.

in:
Die Familie Grimm war in Hanau beheimatet. 
Jacob Ludwig Carl Grimm, born on 4 January 1785, was 13 months older than his brother Wilhelm Carl Grimm.
Obaj bracia byli członkami Akademii Nauk w Berlinie i uczonymi (językoznawcami), o znacznym dorobku.
out:
Jacob Ludwig Carl Grimm, born on 4 January 1785, was 13 months older than his brother Wilhelm Carl Grimm.

Options

Allowed options:
  --default-language arg (=xx) Language code to be used for unrecognized 
                               strings, use 'none' to turn off putting a 
                               default language code
  --force                      All frags must be marked as text in some 
                               language
  --only-langs arg             Guesses language only from the given list of 
                               languages
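
One possible reading of how `--only-langs` and `--default-language` might affect the final choice is sketched below. This is a hypothetical model, not the tool's code: the `pick_language` function, the `scores` dictionary, and the idea that a string is "unrecognized" when no candidate language remains are all assumptions. The `--force` option is not modelled.

```python
def pick_language(scores, only_langs=None, default_language="xx"):
    """Choose a language code from per-language scores (higher is better).

    scores: dict mapping language code to score (hypothetical input).
    only_langs: optional collection restricting the candidates,
        mirroring --only-langs.
    default_language: code returned when nothing is recognized,
        mirroring --default-language; 'none' disables the default.
    """
    if only_langs is None:
        candidates = dict(scores)
    else:
        candidates = {l: s for l, s in scores.items() if l in only_langs}

    if candidates:
        return max(candidates, key=candidates.get)

    # Assumption: "unrecognized" means no candidate language is left.
    return None if default_language == "none" else default_language


scores = {"en": -10.0, "de": -12.5}
print(pick_language(scores))                            # best overall candidate
print(pick_language(scores, only_langs={"de", "pl"}))   # restricted candidate set
print(pick_language({}, default_language="none"))       # no default code is put
```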
