Specification for annotator > tagger > metagger
Metagger (Maximum Entropy Tagger) is a simple part-of-speech tagger that can be easily custom-trained. For the tagger to work, a morphological analyzer must be included in the pipeline before the tagger.
Currently no pretrained part-of-speech models are available, which renders the tagger unusable unless you provide your own model.
A typical pipeline for disambiguating potentially ambiguous output of the morfologik morphological analyzer is as follows:
read-text ! tokenize --lang pl ! morfologik ! metagger
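To illustrate what disambiguation means here, the following is a minimal Python sketch of how a maximum entropy model can choose one tag from the candidates proposed by a morphological analyzer. It is not Metagger's actual implementation: the feature names, the weights and the functions tag_distribution and disambiguate are hypothetical and serve only as an example.

```python
import math

# Hypothetical feature weights, keyed by (feature, tag).
# A real model would learn these from tagged training data.
WEIGHTS = {
    ("prev=prep", "noun"): 1.2,
    ("prev=prep", "verb"): -0.8,
    ("suffix=y", "noun"): 0.4,
    ("suffix=y", "verb"): 0.3,
}


def tag_distribution(features, candidate_tags):
    """Return P(tag | features), restricted to the analyzer's candidate tags."""
    # Score of a tag = sum of the weights of the active features for that tag.
    scores = {
        tag: sum(WEIGHTS.get((feature, tag), 0.0) for feature in features)
        for tag in candidate_tags
    }
    # Normalise exponentiated scores so the probabilities sum to 1.
    z = sum(math.exp(score) for score in scores.values())
    return {tag: math.exp(score) / z for tag, score in scores.items()}


def disambiguate(features, candidate_tags):
    """Pick the most probable tag among the candidates."""
    dist = tag_distribution(features, candidate_tags)
    return max(dist, key=dist.get)


if __name__ == "__main__":
    # The analyzer (assumed) proposed two readings for a token.
    print(disambiguate(["prev=prep", "suffix=y"], ["noun", "verb"]))
```

Restricting the normalisation to the analyzer's candidates is the reason a morphological analyzer has to run before the tagger in this sketch: the model only ranks readings that were already proposed.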
Most options concern custom-trained models and are not necessary if the default Polish POS model is used. A detailed description of the training procedure will be provided in a separate tutorial, which is currently under construction.
Allowed options:
  --lang arg (=guess)                    language
  --force-language                       force using specified language even if a text was recognised otherwise
  --model arg (=%ITSDATA%/%LANG%.blm)    model file
  --iterations arg (=50)                 number of iterations
  --unknown-pos arg (=ign)               unknown part of speech label
  --cardinal-number-pos arg (=card)      cardinal number part of speech label
  --proper-noun-pos arg (=name)          proper noun part of speech label
  --open-class-labels arg                open class labels
  --train                                training mode
  --save-text-model-files                saves text model files in training mode