The language of the engine to use.
The input paragraph or sentence string (must be SGML encoded).
This method returns a set of sentences as a single string. The sentences are separated by '\r\n'.
The language of the engine to use.
One input sentence string from SentenceSplitter.
The method returns a set of lines separated by '\r\n'. Each line contains a token, possibly followed by a POS tag (list) and a NER tag (list); fields are separated by '\t'.
The language of the engine to use.
One input tokenized sentence string from Tokenizer.
The method returns a set of lines separated by '\r\n'. Each line contains a token and a POS tag (list) separated by '\t', possibly followed by a NER tag (list), also separated by '\t'.
The language of the engine to use.
One input tokenized and tagged sentence string from Tagger.
The method returns a set of lines separated by '\r\n'. Each line contains a token, a POS tag (list) and a lemma, all separated by '\t'.
The language of the engine to use.
One input tokenized, tagged and lemmatized sentence string from Lemmatizer.
The method returns a set of lines separated by '\r\n'. Each line contains a token, a POS tag (list), a lemma and possibly a list of available chunks to which the current token belongs, all separated by '\t'.
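The line-oriented results described above can be parsed with a few lines of Python. This is a minimal sketch, assuming the fields of a lemmatized or chunked result appear in the order token, POS tag, lemma, chunk list, and that trailing fields may be absent. The names `TokenRow` and `parse_rows` are illustrative and not part of the service.

```python
from typing import List, NamedTuple, Optional


class TokenRow(NamedTuple):
    """One result line; the field order is an assumption, not a documented contract."""
    token: str
    pos: Optional[str] = None
    lemma: Optional[str] = None
    chunks: Optional[str] = None


def parse_rows(result: str) -> List[TokenRow]:
    """Split a '\\r\\n'-separated result string into tab-delimited rows."""
    rows = []
    for line in result.split("\r\n"):
        if not line:
            continue  # skip empty lines, e.g. after a trailing separator
        fields = line.split("\t")
        # Keep at most four fields; missing trailing fields stay None.
        rows.append(TokenRow(*fields[:4]))
    return rows
```

For a tokenized or tagged result that carries a NER tag in the third column, the `lemma` field would be mislabeled, so a separate row type would be needed in that case.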
The language of the engine to use.
The paragraph id. If it is equal to '' (the empty string), no paragraph id is inserted.
The input string as a paragraph or sequence of sentences.
The method returns a set of <seg...></seg> segments containing XCES (RACAI variant) data all separated by '\r\n'.
The UTF8-encoded input string.
The method returns a SGML character expansion of the input string.
The SGML-encoded input string.
The method returns the UTF8 character encoding of the input string.
The SGML-encoded input string.
The method returns a UTF7 representation of the input string.
This method performs sentence splitting. The input string must be SGML encoded: convert it with this package's UTF8toSGML, then apply SGMLtoUTF8 to the result value to obtain its UTF8 encoding.
This method performs tokenization. The input string must be SGML encoded: convert it with this package's UTF8toSGML, then apply SGMLtoUTF8 to the result value to obtain its UTF8 encoding.
This method performs POS tagging using HMM models. The input string must be SGML encoded: convert it with this package's UTF8toSGML, then apply SGMLtoUTF8 to the result value to obtain its UTF8 encoding.
This method performs lemmatization using lexicon lookup and statistical lemmatization. The input string must be SGML encoded: convert it with this package's UTF8toSGML, then apply SGMLtoUTF8 to the result value to obtain its UTF8 encoding.
This method performs chunking over sequences of POS tags defined by regexes. The input string must be SGML encoded: convert it with this package's UTF8toSGML, then apply SGMLtoUTF8 to the result value to obtain its UTF8 encoding.
This method performs XCES encoding on the input string and calls all the processing methods in order. The input string must be SGML encoded: convert it with this package's UTF8toSGML, then apply SGMLtoUTF8 to the result value to obtain its UTF8 encoding.
This static method performs UTF8 to SGML encoding on the input string.
This static method performs SGML to UTF8 encoding on the input string.
This static method performs SGML to UTF7 encoding on the input string.
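The descriptions above imply a calling order in which each method consumes the previous method's result. The following Python function is a sketch of that order under stated assumptions: `client` is a hypothetical proxy object (for example, one generated by a SOAP library), the argument order is (language, input), and the method name `Chunker` is assumed, since the documentation names SentenceSplitter, Tokenizer, Tagger and Lemmatizer but not the chunking method.

```python
from typing import Any, List


def process_paragraph(client: Any, lang: str, text_utf8: str) -> List[str]:
    """Run the processing methods in order; return one chunked result per sentence.

    `client` is a hypothetical proxy exposing the documented methods; the
    (lang, input) argument order and the name `Chunker` are assumptions.
    """
    # The processing methods expect SGML-encoded input.
    sgml = client.UTF8toSGML(text_utf8)
    sentences = client.SentenceSplitter(lang, sgml).split("\r\n")
    results = []
    for sentence in sentences:
        if not sentence:
            continue  # ignore empty entries between separators
        tokenized = client.Tokenizer(lang, sentence)
        tagged = client.Tagger(lang, tokenized)
        lemmatized = client.Lemmatizer(lang, tagged)
        chunked = client.Chunker(lang, lemmatized)
        # Convert each result back to UTF8 for the caller.
        results.append(client.SGMLtoUTF8(chunked))
    return results
```

Passing the client in as a parameter keeps the sketch independent of any particular SOAP library and lets it be exercised with a stub object.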