Apache OpenNLP 2.0.0 released

The Apache OpenNLP team is pleased to announce the release of Apache OpenNLP 2.0.0.

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.

It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.

Apache OpenNLP 2.0.0 binary and source distributions are available from our download page.

The OpenNLP library is also distributed via Maven Central. See the Maven Dependency page for more details.

What’s new in Apache OpenNLP 2.0.0

  • Adds ability to download models from within Apache OpenNLP

  • Now builds using Java 11

  • Supports model inference using the ONNX Runtime

  • Adds MASC format support

  • Made NameSample overlap exception more helpful

  • Tokenizers can now output a new line token

  • Added missing charset to DictionaryLemmatizer

  • Updated documentation to fix training API sample code

  • Fixed build issues with Java 17

A detailed list of the issues related to this release can be found in the release notes.

For a complete list of fixed bugs and improvements please see the README.html file included in the distribution.

-- The Apache OpenNLP Team

05 June 2022