Specification for annotator > translator > bonsai
Bonsai is a tree-to-string decoder for statistical machine translation. It requires its input sentences to be parsed before translation. For predefined translation rules, it is best to use default options without modification, as the weights have already been optimized.
You can run a toy Polish-to-English translation model using the following pipe:
gobio --lang pl ! bonsai --lang pl --trg-lang en
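The pipe above consists of processor specifications joined by `!`. As a minimal sketch, the same string can be assembled programmatically, which may be convenient when only one documented option (for example the language pair) changes between runs. The helper `build_pipe` is hypothetical and not part of PSI-Toolkit, and how the resulting string is handed to PSI-Toolkit is left open here.

```python
def build_pipe(*processors):
    """Join (name, options) pairs into a PSI-Toolkit style pipe string.

    Hypothetical helper for illustration only; it performs plain string
    formatting and does not check that the options exist.
    """
    parts = []
    for name, options in processors:
        tokens = [name]
        for flag, value in options.items():
            tokens.append(f"--{flag}")
            if value is not None:  # None marks a switch that takes no argument
                tokens.append(str(value))
        parts.append(" ".join(tokens))
    return " ! ".join(parts)


pipe = build_pipe(
    ("gobio", {"lang": "pl"}),
    ("bonsai", {"lang": "pl", "trg-lang": "en"}),
)
print(pipe)
```

With the arguments shown, the printed string is identical to the pipe given above.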
It is possible to create your own translation rule set using any parser integrated with PSI-Toolkit and the training-specific tools described in the bonsai tutorial. This is an advanced topic, however, recommended for people with some experience in statistical machine translation. It is also quite resource- and time-intensive.
Allowed options:
  --lang arg (=guess)      language
  --force-language         force using specified language even if a text was
                           recognised otherwise
  --trg-lang arg           target language
  --config arg (=%ITSDATA%/%LANG%%TRGLANG%/%LANG%%TRGLANG%.cfg)
                           Path to configuration
  --rs arg                 Paths to translation rule sets
  --lm arg                 Paths to language models
  --stacksize arg (=20)    Node translation stack size
  --max_trans arg (=20)    Maximal number of transformations per hyper edge
  --max_hyper arg (=20)    Maximal number of hyper edges per symbol
  --eps arg (=-1)          Allowed transformation cost factor
  --nbest arg (=1)         Display n best translations
  --verbose arg (=0)       Level of verbosity: 0, 1, 2
  --pedantic               Pedantic cost calculation (for debugging)
  --mert                   Output for MERT (combine with nbest)
  --tm_weight arg          Weights for translation model parameters
  --rs_weight arg          Weights for different translation rule sets
  --lm_weight arg          Weights for different language models
  --word_penalty arg       Weight for word penalty
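The default value of `--config` is a template containing placeholders. The following is a minimal sketch of how such a template might expand, assuming each `%NAME%` placeholder is replaced verbatim by the corresponding value. The function name and the sample data directory `/data/bonsai` are made up for illustration; the actual substitution happens inside PSI-Toolkit and may differ.

```python
# Default template copied from the option listing.
DEFAULT_CONFIG = "%ITSDATA%/%LANG%%TRGLANG%/%LANG%%TRGLANG%.cfg"


def expand_config_path(template, its_data, lang, trg_lang):
    """Replace %ITSDATA%, %LANG% and %TRGLANG% in a config path template.

    Assumption: plain textual substitution; this is not PSI-Toolkit code.
    """
    values = {"ITSDATA": its_data, "LANG": lang, "TRGLANG": trg_lang}
    for name, value in values.items():
        template = template.replace(f"%{name}%", value)
    return template


# "/data/bonsai" is a hypothetical data directory.
path = expand_config_path(DEFAULT_CONFIG, "/data/bonsai", "pl", "en")
print(path)  # /data/bonsai/plen/plen.cfg
```

Under this assumption, the Polish to English pair resolves to a `plen` subdirectory containing `plen.cfg`.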