
Frequently Asked Questions

If you have a question that is not covered in this FAQ but should be, please send it by email.

Is the graphical user interface available offline?

Yes, but currently the GUI is available only to Linux users. After installing one of the PSI Linux packages, you get the same web service as this one and can run it offline with the command:

$ psi-server

Which types of input files does PSI-Toolkit support?

PSI-Toolkit handles all kinds of text files, including the internal formats of some NLP tools, such as PSI and UTT. Other supported file formats are (X)HTML, PDF, DOC, DjVu, DOCX, XLSX and PPTX.

Which languages does PSI-Toolkit support?

It depends on the processor. Some processors, such as tokenizer and segmentizer, support a wide range of languages, whereas others, such as morfologik or link-grammar, are designed for a specific language. The documentation lists the supported languages for each processor.

How can I display all text fragments with a particular tag?

You can filter the displayed PSI-lattice by using the tag option of simple-writer. See the page Working with PSI-lattice for details.

How can I extract the grammatical classes for each word in a sentence?

One possible solution is to run PSI-Toolkit with the following pipeline:

lemmatize ! write --tags lexeme --fallback-tags token

It returns, for each token, the list of all known lemmas and their grammatical classes, or simply the token itself if no lemma is found. The lexemes are returned in exactly the same order as the tokens produced by the tokenize command.
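As a rough sketch, the pipeline above could be run from a shell script like the one below. It assumes that psi-pipe is on your PATH, that it reads the input text from standard input, and that it takes the pipeline as its arguments. The sample sentence and the variable names are only illustrative.

```shell
#!/bin/sh
# Pipeline specification copied from the answer above.
PIPELINE='lemmatize ! write --tags lexeme --fallback-tags token'

# Hypothetical sample sentence; replace it with your own text.
TEXT='Ala ma kota'

if command -v psi-pipe >/dev/null 2>&1; then
    # Assumption: psi-pipe reads the text from standard input and takes
    # the pipeline as its arguments. $PIPELINE is left unquoted on purpose
    # so that the shell splits it into separate arguments.
    printf '%s\n' "$TEXT" | psi-pipe $PIPELINE
else
    echo "psi-pipe not found; install a PSI-Toolkit package first" >&2
fi
```

The pipeline is kept in single quotes so that the shell passes the ! separators through unchanged.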

Which character sets are supported?

PSI-Toolkit natively supports text in the UTF-8 character encoding, but libraries for automatic character set detection and conversion are also integrated into the psi-pipe command. See its --list-encoding and --encoding-conversion options.

Additionally, conversion to UTF-8 is enabled for all files sent to PSI-Server.
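If you would rather convert a text to UTF-8 yourself before processing it, the standard iconv utility can do this. The sketch below is not part of PSI-Toolkit; it assumes iconv is installed and that the source encoding (here ISO-8859-2, chosen only as an example) is known in advance.

```shell
#!/bin/sh
# Sample text "zażółć" as ISO-8859-2 bytes; octal escapes keep the
# script portable across POSIX shells.
LATIN2_BYTES='za\277\363\263\346'

# Convert the bytes from ISO-8859-2 to UTF-8 with iconv.
UTF8_TEXT=$(printf "$LATIN2_BYTES" | iconv -f ISO-8859-2 -t UTF-8)

# The converted text can now be passed on as ordinary UTF-8 input.
printf '%s\n' "$UTF8_TEXT"
```

For a file, the equivalent call would be iconv -f ISO-8859-2 -t UTF-8 input.txt > output.txt, where both file names are placeholders.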

Other help resources