Getting started


PSI-Toolkit is a set of linguistic tools for processing texts written in various languages and stored in various formats. The full range of PSI-Toolkit functions is available in the command-line version, but you can try out most of the features through this web service.

What is going on here?

To process your data, all you need to do is set the command, type the input text and run PSI-Toolkit.


Commands let you specify what you want to do, for instance, tokenize or lemmatize a text. A full command is usually a sequence of three tasks: read, annotate and write, because the data must first be read, then processed (annotated) and finally displayed.

Note that you do not have to specify every task in your command: PSI-Toolkit tries to guess the best tools to complete it. This is why neither a reader nor a writer was specified in the first screenshot.
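To make the read, annotate and write structure concrete, here is a minimal conceptual sketch in Python of how a command with missing stages could be completed. It is not PSI-Toolkit code: the `Stage` class, `complete_pipeline` function and the placeholder names `default-reader` and `default-writer` are hypothetical. PSI-Toolkit picks the best reader and writer for your input, whereas this sketch simply inserts fixed placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder names, used only for illustration;
# they are not real PSI-Toolkit reader or writer names.
DEFAULT_READER = "default-reader"
DEFAULT_WRITER = "default-writer"


@dataclass
class Stage:
    kind: str  # one of "read", "annotate" or "write"
    name: str  # the tool that performs this stage
    options: list = field(default_factory=list)  # e.g. ["--lang", "pl"]


def complete_pipeline(stages):
    """Return a read -> annotate -> write pipeline, adding a reader at the
    front and/or a writer at the end if the user's command omitted them."""
    stages = list(stages)  # copy so the caller's list is not modified
    if not stages or stages[0].kind != "read":
        stages.insert(0, Stage("read", DEFAULT_READER))
    if stages[-1].kind != "write":
        stages.append(Stage("write", DEFAULT_WRITER))
    return stages


# A command that only names an annotation step, as in the first screenshot.
user_command = [Stage("annotate", "tokenize", ["--lang", "pl"])]
pipeline = complete_pipeline(user_command)
print([s.name for s in pipeline])
# ['default-reader', 'tokenize', 'default-writer']
```

A command that already names a reader and a writer passes through unchanged; only the missing ends of the pipeline are filled in.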

One last thing: --lang pl, which sets the language to Polish, is an example of an option that can be passed to a given tool. All options are described in the documentation.

Input data

To specify the input data you want to process, you can either type your text or select a file from your disk. In the latter case, PSI-Toolkit supports formats such as .txt, .doc, .docx, .html, .pdf and others.

You may omit the reader from your command, and PSI-Toolkit will select the best one for you.


Output data

If you are not familiar with the available writers, you can select the output format from the list. The results can be displayed in various ways, depending on the writer and the options you choose. Finally, in most cases you can download the results as a file.

Other help resources