POS Tagger Example in Apache OpenNLP using Java
A POS (Parts-of-Speech) Tagger in Apache OpenNLP marks each word in a sentence with its word type, that is, its part of speech.
In this tutorial, we will learn how to use the POS Tagger in Apache OpenNLP for Parts-of-Speech tagging.
The following example shows the output of the POS Tagger for a given input sentence.
Stage | Text |
---|---|
Input to POS Tagger | John is 27 years old. |
Output of POS Tagger | John_NNP is_VBZ 27_CD years_NNS old_JJ ._. |
The word types are the tags attached to each word. The Parts-of-Speech tags used here are from the Penn Treebank tag set.
Tag | Description |
---|---|
NNP | Proper Noun, Singular |
VBZ | Verb, 3rd person singular present |
CD | Cardinal Number |
NNS | Noun, Plural |
JJ | Adjective |
. | Sentence-final punctuation |
For a complete list of the Penn Treebank Parts-of-Speech tags, please refer to https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
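To see how the tagged form shown earlier relates to the tags in the table above, here is a small sketch in plain Java (no OpenNLP classes) that joins tokens and tags into the word_TAG form and looks up a description for each tag. The class PosTagFormat and its methods are hypothetical, and the description map covers only the tags listed in the table above, not the full Penn Treebank set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: renders tokens and Penn Treebank tags as "word_TAG" text. */
public class PosTagFormat {

    // Descriptions for the tags in the table above only (not the full Penn Treebank set).
    static final Map<String, String> TAG_DESCRIPTIONS = new LinkedHashMap<>();
    static {
        TAG_DESCRIPTIONS.put("NNP", "Proper noun, singular");
        TAG_DESCRIPTIONS.put("VBZ", "Verb, 3rd person singular present");
        TAG_DESCRIPTIONS.put("CD", "Cardinal number");
        TAG_DESCRIPTIONS.put("NNS", "Noun, plural");
        TAG_DESCRIPTIONS.put("JJ", "Adjective");
        TAG_DESCRIPTIONS.put(".", "Sentence-final punctuation");
    }

    /** Joins parallel token/tag arrays into "word_TAG word_TAG ..." form. */
    public static String toTaggedString(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i]).append('_').append(tags[i]);
        }
        return sb.toString();
    }

    /** Looks up a tag's description, falling back to the tag itself if it is not in the map. */
    public static String describe(String tag) {
        return TAG_DESCRIPTIONS.getOrDefault(tag, tag);
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "is", "27", "years", "old", "."};
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};
        // prints: John_NNP is_VBZ 27_CD years_NNS old_JJ ._.
        System.out.println(toTaggedString(tokens, tags));
        for (String tag : tags) {
            System.out.println(tag + " = " + describe(tag));
        }
    }
}
```

Unknown tags fall back to the tag itself rather than throwing, so the lookup keeps working if the tagger emits a tag that is not in this small map.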
Steps to Use POS Tagger in OpenNLP
Following are the steps to obtain the tags programmatically in Java using Apache OpenNLP.
Step 1: Tokenize the given input sentence into tokens.
String sentence = "John is 27 years old.";
// tokenize the sentence
InputStream tokenModelIn = new FileInputStream("en-token.bin");
TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
Tokenizer tokenizer = new TokenizerME(tokenModel);
String[] tokens = tokenizer.tokenize(sentence);
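The tagger works on the array of tokens, not on the raw sentence. As a rough illustration of that array's shape (one String per token, with punctuation as its own token), the following sketch uses a hypothetical regex-based NaiveTokenizer in place of TokenizerME. This is an assumption for illustration only: TokenizerME relies on the trained en-token.bin model and can split text differently, for example around contractions or abbreviations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in for TokenizerME, for illustration only. It splits text into
 * runs of letters/digits and single punctuation characters using a regular expression,
 * whereas TokenizerME uses the learned en-token.bin model.
 */
public class NaiveTokenizer {

    // A token is either a run of letters/digits or one non-space, non-letter, non-digit character.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    public static String[] tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("John is 27 years old.");
        // prints: John | is | 27 | years | old | .
        System.out.println(String.join(" | ", tokens));
    }
}
```

For this sentence the sketch yields the same six tokens that appear in the program output later in this tutorial; for real text, use the model-based TokenizerME as shown in the steps.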
Step 2: Read the parts-of-speech maxent model, “en-pos-maxent.bin”, into a stream.
InputStream posModelIn = new FileInputStream("en-pos-maxent.bin");
Step 3: Load the stream into a parts-of-speech model, POSModel.
POSModel posModel = new POSModel(posModelIn);
Step 4: Load the model into a parts-of-speech tagger, POSTaggerME.
POSTaggerME posTagger = new POSTaggerME(posModel);
Step 5: Get the tags using the method POSTaggerME.tag(), and the probability of each assigned tag using the method POSTaggerME.probs().
String[] tags = posTagger.tag(tokens);
double[] probs = posTagger.probs();
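The array returned by probs() is parallel to the tags array: probs[i] is the probability of tags[i] for tokens[i]. One possible use is to flag tags the model is less sure about. The class LowConfidenceTags and the 0.97 threshold below are hypothetical choices for illustration, and the sample values are this tutorial's example output rounded to four decimals.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: uses the probs array (parallel to the tags array)
 * to report tokens whose tag probability falls below a chosen threshold.
 */
public class LowConfidenceTags {

    public static List<String> lowConfidence(String[] tokens, String[] tags, double[] probs, double threshold) {
        if (tokens.length != tags.length || tags.length != probs.length) {
            throw new IllegalArgumentException("tokens, tags and probs must have the same length");
        }
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            // probs[i] is the probability the tagger assigned to tags[i]
            if (probs[i] < threshold) {
                flagged.add(tokens[i] + "_" + tags[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Values from the example output of this tutorial, rounded to four decimals.
        String[] tokens = {"John", "is", "27", "years", "old", "."};
        String[] tags = {"NNP", "VBZ", "CD", "NNS", "JJ", "."};
        double[] probs = {0.9875, 0.9668, 0.9890, 0.9792, 0.9895, 0.9923};
        // prints: [is_VBZ]
        System.out.println(lowConfidence(tokens, tags, probs, 0.97));
    }
}
```

A suitable threshold depends on the model and the text being tagged, so 0.97 here is only a placeholder.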
Step 6: Finally, print each token along with its tag and the probability of that tag.
Example – POS Tagger in OpenNLP
In this example, we will implement all the steps mentioned above.
POSTaggerExample.java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
/**
* www.tutorialkart.com
* POS Tagger Example in Apache OpenNLP using Java
*/
public class POSTaggerExample {

    public static void main(String[] args) {
        String sentence = "John is 27 years old.";

        // try-with-resources closes both model streams automatically, even if an error occurs
        try (InputStream tokenModelIn = new FileInputStream("en-token.bin");
             InputStream posModelIn = new FileInputStream("en-pos-maxent.bin")) {

            // tokenize the sentence
            TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
            Tokenizer tokenizer = new TokenizerME(tokenModel);
            String[] tokens = tokenizer.tokenize(sentence);

            // Parts-Of-Speech Tagging
            // loading the parts-of-speech model from the stream
            POSModel posModel = new POSModel(posModelIn);
            // initializing the parts-of-speech tagger with the model
            POSTaggerME posTagger = new POSTaggerME(posModel);
            // tagging the tokens
            String[] tags = posTagger.tag(tokens);
            // getting the probabilities of the tags given to the tokens
            double[] probs = posTagger.probs();

            System.out.println("Token\t:\tTag\t:\tProbability\n---------------------------------------------");
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "\t:\t" + tags[i] + "\t:\t" + probs[i]);
            }
        } catch (IOException e) {
            // Model loading failed (for example, a model file is missing), handle the error
            e.printStackTrace();
        }
    }
}
When the above program is run, it prints the following output to the console.
Output
Token : Tag : Probability
---------------------------------------------
John : NNP : 0.9874932809932121
is : VBZ : 0.9667574183085389
27 : CD : 0.9890000667325892
years : NNS : 0.979181322666035
old : JJ : 0.9894752224172251
. : . : 0.9923321017451305
The structure of the project is shown below:
Note that in this example, the model files en-pos-maxent.bin and en-token.bin are placed directly under the project folder. You can download the models from http://opennlp.sourceforge.net/models-1.5/ .
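Because FileInputStream resolves a relative name such as "en-token.bin" against the working directory, a missing or misplaced model file only shows up as an IOException at run time. A small pre-flight check can report which model files are missing before any model is loaded. The class ModelFileCheck and its missingModels method are hypothetical and use only java.nio.file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check: reports model files that are not present in a directory. */
public class ModelFileCheck {

    public static List<String> missingModels(Path dir, String... fileNames) {
        List<String> missing = new ArrayList<>();
        for (String name : fileNames) {
            // a model counts as present only if it exists as a regular file
            if (!Files.isRegularFile(dir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "." is the working directory, which is where a relative FileInputStream path is resolved
        List<String> missing = missingModels(Paths.get("."), "en-token.bin", "en-pos-maxent.bin");
        if (missing.isEmpty()) {
            System.out.println("All model files found.");
        } else {
            System.out.println("Missing model files: " + missing);
        }
    }
}
```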
Conclusion
In this Apache OpenNLP Tutorial, we have seen how to tag the words in a sentence with their parts of speech using the POSModel and POSTaggerME classes of the OpenNLP POS Tagger API.