Specification for parser
gobio
A deep parser based on the parser used in the Translatica machine translation system.
Gobio operates on morphologically annotated text.
Gobio has predefined rule sets for several languages, but you can provide your own rule file with the --rules option.
Gobio rules are, in general, a kind of context-free grammar rules. A tutorial on how to write your own rule sets for gobio is being prepared.
Explanation of symbols used to denote grammatical categories for Polish:
- C (czasownik) - verb
- LG (liczebnik główny) - cardinal numeral
- P (przymiotnik) - adjective
- PR (przyimek) - preposition
- PS (przysłówek) - adverb
- R (rzeczownik) - noun
- S (spójnik) - conjunction
- ZP (zaimek przymiotny) - adjective pronoun
- ZRn (zaimek rzeczowny nieokreślony) - indefinite noun pronoun
- ZRo (zaimek rzeczowny osobowy) - personal noun pronoun
- ZRs (zaimek „się”) - pronoun “się”
- ZRw (zaimek wskazujący) - demonstrative pronoun
- ZS (zaimek przysłowny) - adverbial pronoun
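When gobio output is post-processed in a script, the symbols listed above can be kept in a small lookup table. The Python sketch below is illustrative only: the names POLISH_CATEGORIES and describe_category are hypothetical and are not part of gobio. Labels that are not in the list above (for example the phrase-level FR seen in parse output) are simply reported as unknown.

```python
from typing import Optional

# Category symbols for Polish, copied from the list above.
# The dictionary name is an assumption made for this sketch.
POLISH_CATEGORIES = {
    "C": "verb",
    "LG": "cardinal numeral",
    "P": "adjective",
    "PR": "preposition",
    "PS": "adverb",
    "R": "noun",
    "S": "conjunction",
    "ZP": "adjective pronoun",
    "ZRn": "indefinite noun pronoun",
    "ZRo": "personal noun pronoun",
    "ZRs": "pronoun \u201csi\u0119\u201d",
    "ZRw": "demonstrative pronoun",
    "ZS": "adverbial pronoun",
}


def describe_category(symbol: str) -> Optional[str]:
    """Return the English name of a category symbol, or None if it is not listed."""
    return POLISH_CATEGORIES.get(symbol)


if __name__ == "__main__":
    for symbol in ("R", "LG", "FR"):
        print(symbol, "->", describe_category(symbol))
```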
Aliases
parse, parse-generator, parser
Languages
de, pl, test
Examples
gobio --lang pl ! draw-parse-tree
Parse a Polish sentence and draw its constituent tree.
--line-by-line gobio --lang pl --terminal-tag parse-terminal ! bracketing-writer --disamb --tags parse --opening-bracket %c[
Parse Polish sentences line by line and print a simplified constituent tree for each sentence.
Komputer czyta zdania.
Każde zdanie ma swoje drzewo składniowe.
Zrobiłem już trzy zdania.
Z[FC[FR[R[Komputer]] C[czyta]]] FR[R[zdania]].
Z[FC[FR[ZP[Każde] R[zdanie]] C[ma]]] FR[ZP[swoje] R[drzewo] FP[P[składniowe]]].
Z[FC[C[Zrobiłem] FPS[PS[już]]]] FR[LG[trzy] R[zdania]].
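The bracketed lines above can be turned back into a nested structure for further processing. The following Python sketch assumes the output looks exactly like the sample: each constituent is written as a label immediately followed by "[", closed by "]", and anything else is a plain token. The function names parse_bracketing and leaves are hypothetical helpers, not part of gobio or bracketing-writer.

```python
import re
from typing import Iterator, List

# A token is either "LABEL[" (opens a constituent), "]" (closes one),
# or a plain run of characters without whitespace or brackets.
TOKEN = re.compile(r"([^\s\[\]]+)\[|\]|[^\s\[\]]+")


def parse_bracketing(line: str) -> List:
    """Parse one bracketed line into a list of nodes.

    A node is either a string (a plain token) or a tuple
    (label, children), where children is again a list of nodes.
    """
    root: List = []
    stack = [root]
    for match in TOKEN.finditer(line):
        token = match.group(0)
        if token == "]":
            if len(stack) == 1:
                raise ValueError("closing bracket without an opening one")
            stack.pop()
        elif match.group(1) is not None:
            children: List = []
            stack[-1].append((match.group(1), children))
            stack.append(children)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return root


def leaves(nodes: List) -> Iterator[str]:
    """Yield the plain tokens of a parsed line from left to right."""
    for node in nodes:
        if isinstance(node, str):
            yield node
        else:
            yield from leaves(node[1])


if __name__ == "__main__":
    tree = parse_bracketing("Z[FC[FR[R[Komputer]] C[czyta]]] FR[R[zdania]].")
    print(tree)
    print(list(leaves(tree)))
```

A stack is used instead of recursion in the parser so that unbalanced brackets are detected in one pass; a different --opening-bracket format would need a different token pattern.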
Options
Allowed options:
  --lang arg (=guess)                          language
  --force-language                             force using specified language even if a text was recognised otherwise
  --edge-number-limit arg (=-1)                maximal number of edges inserted between each two vertices
  --rules arg (=%ITSDATA%/%LANG%/rules.g)      file with rules in text format
  --terminal-tag arg (=parse-terminal)         tag for terminal
puddle
A shallow parser based on the Spejd shallow parser originally developed at IPI PAN (http://zil.ipipan.waw.pl/Spejd/). As input, Puddle requires morphologically annotated text as produced, for instance, by the morfologik processor. It can also serve as a disambiguation tool itself or be chained with a POS tagger (e.g. the metagger processor).
Note that text needs to be annotated morphologically before passing it to puddle.
Currently, rules and tagsets are available for Polish only and are used by default unless specified otherwise. The Polish parsing rules are for demonstration purposes only and are by no means complete.
For other languages, you need to provide custom rules and tagsets that are compatible with the morphological processor used before puddle in the processing pipeline. A tutorial on the rule format and tagsets is currently being prepared. See the Polish rules and tagset for examples.
Aliases
parse, parse-generator, parser
Languages
fr, pl
Examples
read-text ! tokenize --lang pl ! lemmatize ! puddle ! graph --tree
A typical pipeline for parsing a Polish sentence with graphical output.
Options
Allowed options:
  --lang arg (=guess)                                   language
  --force-language                                      force using specified language even if a text was recognised otherwise
  --tagset arg (=%ITSDATA%/%LANG%/tagset.%LANG%.cfg)    tagset file
  --rules arg (=%ITSDATA%/%LANG%/rules.%LANG%)          rules file