
Specification for annotator > tokenizer > detok


A simple detokenizer, typically used at the end of a translation pipeline. It composes the final text from the token edges of a given language. Tokens are emitted in the order induced by the values of their SurfacePosition attribute, not in their original order in the lattice. Tokens are usually joined with spaces, except around some punctuation marks (e.g. no space is inserted before a comma or after an opening bracket).
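
The joining rule described above can be illustrated with a minimal Python sketch. It is not the annotator's actual implementation: the Token class, the surface_position field (a stand-in for the SurfacePosition attribute) and the two punctuation sets are assumptions made for illustration only, and the real rules may differ by language.

```python
from dataclasses import dataclass

# Hypothetical punctuation sets; the real detokenizer's rules may differ.
NO_SPACE_BEFORE = {",", ".", ";", ":", "!", "?", ")", "]", "}"}
NO_SPACE_AFTER = {"(", "[", "{"}


@dataclass
class Token:
    text: str
    surface_position: int  # stand-in for the SurfacePosition attribute


def detokenize(tokens):
    """Join tokens in surface order, skipping spaces around some punctuation."""
    # Order by surface position, not by the order in which tokens were given.
    ordered = sorted(tokens, key=lambda t: t.surface_position)
    pieces = []
    previous = None
    for token in ordered:
        needs_space = (
            previous is not None
            and token.text not in NO_SPACE_BEFORE
            and previous.text not in NO_SPACE_AFTER
        )
        if needs_space:
            pieces.append(" ")
        pieces.append(token.text)
        previous = token
    return "".join(pieces)


# Tokens deliberately listed out of surface order.
example = [
    Token("world", 2),
    Token("Hello", 0),
    Token(",", 1),
    Token("(", 3),
    Token("again", 4),
    Token(")", 5),
]
print(detokenize(example))  # Hello, world (again)
```

Sorting first and deciding on spacing afterwards keeps the two concerns separate: the output order depends only on the surface positions, and the spacing depends only on each pair of neighbouring tokens.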


Allowed options:
  --lang arg (=guess)   language
  --force-language      force using specified language even if a text was
                        recognised otherwise
