Specification for annotator > parser > puddle
A shallow parser based on the Spejd shallow parser originally developed at IPI PAN (http://zil.ipipan.waw.pl/Spejd/). For input, Puddle requires morphologically anotated text as produced, for instance, by the morfologik processor. It may also serve as a disambiguation tool itself or can be used chained with a POS-tagger (e.g. metagger processor).
Note that text needs to be annotated morphologically before passing it to puddle.
Currently, rules and tagsets are available for Polish only and used by default if not specified otherwise. The Polish parsing rules are for demonstration purposes only and are by no means complete.
For other languages, you need to provide custom rules and tag sets that are compatible with the morphological processer employed in before puddle in the processing pipeline. A tutorial on the rule format and tagsets is currently being prepared. See the Polish rule and tag sets for examples.
Aliasesparse, parse-generator, parser
read-text ! tokenize --lang pl ! lemmatize ! puddle ! graph --tree
A typical pipeline for parsing a Polish sentence with graphical output.
Alicja miała psa.
Allowed options: --lang arg (=guess) language --force-language force using specified language even if a text was resognised otherwise --tagset arg (=%ITSDATA%/%LANG%/tagset.%LANG%.cfg) tagset file --rules arg (=%ITSDATA%/%LANG%/rules.%LANG%) rules file