metagger

Metagger (Maximum Entropy Tagger) is a simple part-of-speech tagger that can easily be custom-trained. For the tagger to work, a morphological analyzer (any one will do) must be placed in the pipeline before the tagger.
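
To illustrate the general idea behind maximum entropy disambiguation, here is a minimal toy sketch in Python. It is not metagger's actual implementation: the feature names, tag labels and weights are all invented for the example. It only shows how a model of this kind can score the candidate tags proposed by a morphological analyzer and pick the most probable one.

```python
import math

def maxent_probs(candidates, active_features, weights):
    """Return p(tag | context), normalised over the candidate tags only."""
    # Score of a tag = sum of the weights of the (feature, tag) pairs that fire.
    scores = {
        tag: sum(weights.get((feat, tag), 0.0) for feat in active_features)
        for tag in candidates
    }
    # Softmax normalisation (the maximum is subtracted for numerical stability).
    top = max(scores.values())
    exps = {tag: math.exp(score - top) for tag, score in scores.items()}
    total = sum(exps.values())
    return {tag: value / total for tag, value in exps.items()}

# Hypothetical data: the analyzer proposed two readings for one token.
candidates = ["subst", "verb"]
active_features = ["prev_tag=adj", "suffix=-a"]
weights = {
    ("prev_tag=adj", "subst"): 1.2,
    ("suffix=-a", "subst"): 0.3,
    ("suffix=-a", "verb"): 0.5,
}

probs = maxent_probs(candidates, active_features, weights)
best_tag = max(probs, key=probs.get)
print(best_tag, round(probs[best_tag], 3))
```

Because the probabilities are normalised over the analyzer's candidates only, a sketch like this can do nothing for a token that has no candidate readings, which is consistent with the requirement that an analyzer runs first.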

Currently, no pretrained part-of-speech models are available, so the tagger is unusable unless you provide your own model.

Example

A typical pipeline for disambiguating the potentially ambiguous output of the morfologik morphological analyzer looks as follows:

read-text ! tokenize --lang pl ! morfologik ! metagger
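
The pipeline above is a sequence of processor specifications separated by `!`. A minimal Python sketch of assembling such a specification programmatically might look as follows. The helper name `build_pipeline` and the ordering check are assumptions made for illustration; they are not part of the toolkit.

```python
def build_pipeline(stages, analyzer="morfologik", tagger="metagger"):
    """Join processor specifications with ' ! ' (hypothetical helper).

    Raises ValueError if the tagger is present but no morphological
    analyzer stage precedes it.
    """
    names = [stage.split()[0] for stage in stages]
    if tagger in names:
        tagger_pos = names.index(tagger)
        if analyzer not in names[:tagger_pos]:
            raise ValueError(f"{tagger} needs {analyzer} earlier in the pipeline")
    return " ! ".join(stages)

stages = [
    "read-text",
    "tokenize --lang pl",
    "morfologik",
    "metagger",
]
pipeline = build_pipeline(stages)
print(pipeline)
```

The check only encodes the ordering requirement stated in this specification; any other analyzer name can be passed through the `analyzer` parameter.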

Remarks

Most options concern custom-trained models and are not necessary if the default Polish POS model is used. The training procedure will be described in detail in a separate tutorial, which is currently under construction.

Options

Allowed options:
  --lang arg (=guess)                 language
  --force-language                    force using specified language even if a 
                                      text was recognised otherwise
  --model arg (=%ITSDATA%/%LANG%.blm) model file
  --iterations arg (=50)              number of iterations
  --unknown-pos arg (=ign)            unknown part of speech label
  --cardinal-number-pos arg (=card)   cardinal number part of speech label
  --proper-noun-pos arg (=name)       proper noun part of speech label
  --open-class-labels arg             open class labels
  --train                             training mode
  --save-text-model-files             saves text model files in training mode
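
As a rough illustration of how the options listed above combine, the following Python sketch renders a metagger processor specification from a dictionary. The helper `metagger_stage`, the model file name `my-model.blm` and the iteration count are hypothetical. How training data is supplied in training mode is not covered by this specification, so the sketch only builds the option string.

```python
def metagger_stage(options):
    """Render a metagger processor spec from {option name: value}.

    A value of True renders a bare flag; False or None omits the option.
    """
    parts = ["metagger"]
    for name, value in options.items():
        if value is True:
            parts.append(f"--{name}")
        elif value is not None and value is not False:
            parts.extend([f"--{name}", str(value)])
    return " ".join(parts)

# Hypothetical training configuration using only the options listed above.
training = metagger_stage({
    "lang": "pl",
    "train": True,
    "model": "my-model.blm",   # assumed file name, replace with your own
    "iterations": 100,         # overrides the default of 50
    "save-text-model-files": True,
})
print(training)
```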
