Apache OpenNLP Models
In this section of the Apache OpenNLP Tutorial, we shall briefly cover the following:
- Tools for which OpenNLP Models are available.
- Tools for which OpenNLP Models are not available.
- Tools for which OpenNLP Models could be built.
All models officially provided by Apache OpenNLP are available at http://opennlp.sourceforge.net/models-1.5/.
Officially available Apache OpenNLP Models
Apache OpenNLP officially provides models for the following languages:
- Danish
- English
- Spanish
- Dutch
- Portuguese
If models are required for other languages, they can be built using the training modules. Apache OpenNLP provides Java APIs and a command line interface for doing so.
The following tools have models pre-built by Apache:
- Tokenizer
- Sentence Detector
- POS Tagger
- Name Finder
- Chunker
- Parser
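As a rough illustration of how a pre-built model might be located, the sketch below builds a download URL from a language code and a tool name. The base URL comes from this tutorial; the class name `ModelUrls`, the tool keys, and the file-name pattern `<language>-<suffix>.bin` are assumptions made for this example and should be checked against the models page before use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ModelUrls {
    // Base URL of the official models page, as given in this tutorial.
    static final String BASE_URL = "http://opennlp.sourceforge.net/models-1.5/";

    // Assumed file-name suffix for each tool; verify against the models page.
    static final Map<String, String> TOOL_SUFFIX = new LinkedHashMap<>();
    static {
        TOOL_SUFFIX.put("tokenizer", "token");
        TOOL_SUFFIX.put("sentence-detector", "sent");
        TOOL_SUFFIX.put("pos-tagger", "pos-maxent");
        TOOL_SUFFIX.put("chunker", "chunker");
    }

    // Builds the assumed download URL, e.g. language "en" and tool "tokenizer".
    static String urlFor(String languageCode, String tool) {
        String suffix = TOOL_SUFFIX.get(tool);
        if (suffix == null) {
            throw new IllegalArgumentException("No known pre-built model for tool: " + tool);
        }
        return BASE_URL + languageCode + "-" + suffix + ".bin";
    }

    public static void main(String[] args) {
        // Print the assumed URL of the English tokenizer model.
        System.out.println(urlFor("en", "tokenizer"));
    }
}
```

Once downloaded, a model file is typically loaded through the model class of the corresponding tool (for the tokenizer, `opennlp.tools.tokenize.TokenizerModel`).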
Tools for which OpenNLP Models must be custom built
The Document Categorizer is a special case because no standard training data is defined for it. The training data varies from one use case and application to another, so developers are expected to build their own models that suit their use case and training data.
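To make the idea of use-case-specific training data more concrete, here is a minimal sketch that prepares samples in the plain-text layout the Document Categorizer trainer is generally fed: one document per line, with the category first, followed by whitespace and the document text. The class name `DoccatTrainingData`, the helper `toTrainingLine`, and the sample categories are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class DoccatTrainingData {
    // Formats one sample as "<category> <text>" on a single line.
    static String toTrainingLine(String category, String text) {
        boolean hasWhitespace = category.chars().anyMatch(Character::isWhitespace);
        if (category.isEmpty() || hasWhitespace) {
            throw new IllegalArgumentException("Category must be a single non-empty token");
        }
        // Collapse line breaks and repeated spaces so the document stays on one line.
        String singleLine = text.trim().replaceAll("\\s+", " ");
        return category + " " + singleLine;
    }

    public static void main(String[] args) {
        // Hypothetical samples; real data depends entirely on the application.
        List<String> lines = new ArrayList<>();
        lines.add(toTrainingLine("sports", "The team won the match in extra time."));
        lines.add(toTrainingLine("finance", "Shares fell after the quarterly report."));
        lines.forEach(System.out::println);
    }
}
```

The resulting lines can be saved to a text file and passed to the training tools, for example the `DoccatTrainer` command of the `opennlp` command line interface.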
Tools for which OpenNLP Models could be custom built
Apache OpenNLP provides Java APIs and a command line interface to help us train and build a model from custom training data.
Conclusion
In this tutorial, we have learnt where to find the official Apache OpenNLP models, the tools for which pre-built models are available, and the tools for which models must be custom built.