Apache OpenNLP Models

In this section of the Apache OpenNLP tutorial, we shall briefly cover the following:

  • Tools for which OpenNLP Models are available.
  • Tools for which OpenNLP Models are not available.
  • Tools for which OpenNLP Models could be built.

All models officially provided by Apache OpenNLP are available at http://opennlp.sourceforge.net/models-1.5/.

Officially available Apache OpenNLP Models

Apache OpenNLP officially provides models for the following languages:

  • Danish
  • English
  • Spanish
  • Dutch
  • Portuguese

If models are required for other languages, they can be built using the training modules. Apache OpenNLP provides Java APIs and a command line interface for doing so.

The following tools have models pre-built by Apache, although not every tool has a model for every language:

  • Tokenizer
  • Sentence Detector
  • POS Tagger
  • Name Finder
  • Chunker
  • Parser
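As a rough illustration of how the tools above map to files on the models page, here is a minimal sketch that pairs each tool with one English model file and builds its download URL. The class and method names are hypothetical and chosen only for this example, and only one representative file is shown per tool (the page lists more, for instance one name finder model per entity type).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps each tool to one English model file on the models-1.5 page. */
final class EnglishModelFiles {

    // Base URL of the official models page mentioned in this tutorial.
    static final String BASE_URL = "http://opennlp.sourceforge.net/models-1.5/";

    /** Returns one representative English model file per tool, in list order. */
    static Map<String, String> modelFiles() {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("Tokenizer", "en-token.bin");
        files.put("Sentence Detector", "en-sent.bin");
        files.put("POS Tagger", "en-pos-maxent.bin");
        files.put("Name Finder", "en-ner-person.bin"); // person names only
        files.put("Chunker", "en-chunker.bin");
        files.put("Parser", "en-parser-chunking.bin");
        return files;
    }

    /** Builds the download URL for the given tool, or fails if the tool is unknown. */
    static String downloadUrl(String tool) {
        String file = modelFiles().get(tool);
        if (file == null) {
            throw new IllegalArgumentException("No model listed for tool: " + tool);
        }
        return BASE_URL + file;
    }

    public static void main(String[] args) {
        // Print every tool together with the URL of its example model.
        for (String tool : modelFiles().keySet()) {
            System.out.println(tool + " -> " + downloadUrl(tool));
        }
    }
}
```

Once downloaded, a model file is typically loaded by the matching OpenNLP tool class before that tool can be used.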

Tools for which OpenNLP Models must be custom built

The Document Categorizer is a special case, because no standard set of categories or training data is defined for it. The training data varies from one use case and application to another, so developers are expected to build their own models to suit their use case and training data.
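To make "your own training data" a little more concrete: the Document Categorizer's default training format is plain text with one document per line, where the category comes first and the document text follows after whitespace. The sketch below prepares such lines using only the Java standard library. The class name, the categories, and the sample sentences are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that formats labeled documents as doccat training lines. */
final class DoccatTrainingData {

    /** Formats one sample as "category text" on a single line. */
    static String toLine(String category, String text) {
        // The category is the first whitespace-delimited token, so it must not contain whitespace.
        if (category.isEmpty() || category.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("Category must be a single token: " + category);
        }
        // Collapse line breaks and repeated whitespace so the document stays on one line.
        String singleLine = text.trim().replaceAll("\\s+", " ");
        return category + " " + singleLine;
    }

    public static void main(String[] args) {
        // Made-up samples; real training data needs many documents per category.
        List<String> lines = new ArrayList<>();
        lines.add(toLine("sports", "The home team won the final match."));
        lines.add(toLine("finance", "Shares fell sharply after the\nquarterly report."));
        lines.forEach(System.out::println);
    }
}
```

Lines like these can be saved to a file and passed to the trainer as the training data.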

Tools for which OpenNLP Models could be custom built

Models can also be custom built for the tools that already have pre-built ones. Apache OpenNLP provides Java APIs and a command line interface to help us train and build a model from custom training data.
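As a small sketch of the command line route, the following Java snippet assembles a trainer command of the form used by the OpenNLP command line tools (trainer name followed by -model, -lang, -data and -encoding). It assumes the opennlp launcher script is on the PATH, and the class name and file names (en-doccat.train, en-doccat.bin) are placeholders for this example. The snippet only prints the command, and the line that would actually run it is left as a comment.

```java
import java.util.List;

/** Hypothetical helper that assembles an OpenNLP trainer command line. */
final class TrainerCommand {

    /** Builds the argument list for a trainer tool such as DoccatTrainer or TokenizerTrainer. */
    static List<String> build(String trainerTool, String lang, String dataFile, String modelFile) {
        return List.of(
                "opennlp", trainerTool,   // launcher script and trainer name
                "-model", modelFile,      // where the trained model is written
                "-lang", lang,            // language code of the training data
                "-data", dataFile,        // custom training data file
                "-encoding", "UTF-8");    // encoding of the training data file
    }

    public static void main(String[] args) {
        List<String> command = build("DoccatTrainer", "en", "en-doccat.train", "en-doccat.bin");
        System.out.println(String.join(" ", command));
        // To run the trainer from Java (assumes opennlp is installed and on the PATH):
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

The same pattern should apply to the other trainers by changing the trainer name and the file names.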

Conclusion

In this tutorial, we have learnt where to find the official Apache OpenNLP models, the tools for which pre-built models are available, the tool for which a model must be custom built, and how models can be custom built from our own training data.