Named Entity Extraction in OpenNLP
In this OpenNLP tutorial, we extract named entities from a sentence using OpenNLP's pre-built models, which have already been trained to find named entities.
What is Named Entity Recognition/Extraction (NER)?
Named Entity Recognition is the task of finding named entities in text and assigning each one to a category such as person, organization, date, or percentage.
How is Named Entity Extraction done in OpenNLP?
In OpenNLP, Named Entity Extraction is done using statistical models, i.e., machine learning techniques. Specifically, maximum entropy (Maxent) modeling is used. To get an intuition for how Maxent modeling works, refer to the motivating example of Maxent modeling.
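To make the Maxent idea a little more concrete, here is a minimal sketch in plain Java. It is not OpenNLP's implementation. A maximum entropy model scores each outcome by summing the weights of the features that are active for a token, then normalizes the exponentiated scores into probabilities. The outcome labels, feature names, and weights below are invented for illustration; a trained model learns its weights from annotated text.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MaxentSketch {

    // Hypothetical weights: outcome -> (feature -> weight). Invented for illustration only.
    static final Map<String, Map<String, Double>> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("person-start", Map.of("capitalized", 1.5, "sentence-start", 0.5));
        WEIGHTS.put("other", Map.of("lowercase", 2.0, "punctuation", 2.5));
    }

    // p(outcome | features) = exp(sum of active weights) / sum over all outcomes of the same term
    static Map<String, Double> probabilities(List<String> activeFeatures) {
        Map<String, Double> result = new LinkedHashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Map<String, Double>> e : WEIGHTS.entrySet()) {
            double score = 0.0;
            for (String feature : activeFeatures) {
                // Features without a weight for this outcome contribute nothing.
                score += e.getValue().getOrDefault(feature, 0.0);
            }
            double unnormalized = Math.exp(score);
            result.put(e.getKey(), unnormalized);
            total += unnormalized;
        }
        // Normalize so that the probabilities of all outcomes sum to 1.
        for (Map.Entry<String, Double> e : result.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(probabilities(List.of("capitalized", "sentence-start")));
        System.out.println(probabilities(List.of("lowercase")));
    }
}
```

With these made-up weights, a capitalized token at the start of a sentence gets a higher probability for "person-start" than for "other", while a lowercase token gets the opposite result.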
Example 1 – Named Entity Extraction Example in OpenNLP
The following example, NameFinderExample.java, shows how to use the NameFinderME class to extract named entities of two categories: person and location.
NameFinderExample.java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

/**
 * This class demonstrates how to use the NameFinderME class for Named Entity Recognition/Extraction.
 * @author tutorialkart.com
 */
public class NameFinderExample {

    public static void main(String[] args) {
        // find person names
        try {
            System.out.println("-------Finding entities belonging to category : person name------");
            new NameFinderExample().findName();
            System.out.println();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // find places
        try {
            System.out.println("-------Finding entities belonging to category : place name------");
            new NameFinderExample().findLocation();
            System.out.println();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Finds person names in the sentence.
     * @throws IOException if the model file cannot be read
     */
    public void findName() throws IOException {
        // load the model from file; the stream is closed automatically
        TokenNameFinderModel model;
        try (InputStream is = new FileInputStream("en-ner-person.bin")) {
            model = new TokenNameFinderModel(is);
        }
        // feed the model to the name finder class
        NameFinderME nameFinder = new NameFinderME(model);
        // input: the sentence as an array of tokens
        String[] sentence = new String[]{
            "John",
            "Smith",
            "is",
            "standing",
            "next",
            "to",
            "bus",
            "stop",
            "and",
            "waiting",
            "for",
            "Mike",
            "."
        };
        // nameSpans contains all the entities detected by the model
        Span[] nameSpans = nameFinder.find(sentence);
        for (Span s : nameSpans) {
            System.out.print(s.toString());
            System.out.print(" : ");
            // s.getStart() : index of the first token of the entity in the input array
            // s.getEnd() : index one past the last token of the entity (exclusive)
            for (int index = s.getStart(); index < s.getEnd(); index++) {
                System.out.print(sentence[index] + " ");
            }
            System.out.println();
        }
    }

    /**
     * Finds locations in the sentence.
     * @throws IOException if the model file cannot be read
     */
    public void findLocation() throws IOException {
        // load the model from file; the stream is closed automatically
        TokenNameFinderModel model;
        try (InputStream is = new FileInputStream("en-ner-location.bin")) {
            model = new TokenNameFinderModel(is);
        }
        // feed the model to the name finder class
        NameFinderME nameFinder = new NameFinderME(model);
        // input: the sentence as an array of tokens
        String[] sentence = new String[]{
            "John",
            "Smith",
            "is",
            "from",
            "Atlanta",
            "."
        };
        // nameSpans contains all the entities detected by the model
        Span[] nameSpans = nameFinder.find(sentence);
        for (Span s : nameSpans) {
            System.out.print(s.toString());
            System.out.print(" : ");
            // s.getStart() : index of the first token of the entity in the input array
            // s.getEnd() : index one past the last token of the entity (exclusive)
            for (int index = s.getStart(); index < s.getEnd(); index++) {
                System.out.print(sentence[index] + " ");
            }
            System.out.println();
        }
    }
}
When NameFinderExample.java is run, the console output is as follows.
Output
-------Finding entities belonging to category : person name------
[0..2) person : John Smith
[11..12) person : Mike
-------Finding entities belonging to category : place name------
[4..5) location : Atlanta
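The bracketed ranges in this output are half-open token ranges: [0..2) covers tokens 0 and 1 but not token 2. The sketch below reproduces that convention in plain Java, without OpenNLP. TokenRange is a hypothetical stand-in for opennlp.tools.util.Span, and the two ranges are written by hand rather than produced by a model; the point is only to show how the start and end indices map back to tokens.

```java
import java.util.Arrays;

public class SpanSketch {

    // Hypothetical stand-in for OpenNLP's Span: start is inclusive, end is exclusive.
    static final class TokenRange {
        final int start;
        final int end;
        final String type;

        TokenRange(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        // Joins the tokens at indices start .. end-1 with single spaces.
        String coveredText(String[] tokens) {
            return String.join(" ", Arrays.copyOfRange(tokens, start, end));
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    public static void main(String[] args) {
        String[] sentence = {"John", "Smith", "is", "from", "Atlanta", "."};
        // Hand-written ranges, chosen to mirror the entities in the example sentence.
        TokenRange[] ranges = {
            new TokenRange(0, 2, "person"),
            new TokenRange(4, 5, "location")
        };
        for (TokenRange r : ranges) {
            System.out.println(r + " : " + r.coveredText(sentence));
        }
    }
}
```

With real OpenNLP spans, the static helper Span.spansToStrings(spans, tokens) performs a similar conversion from spans to the covered text.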
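The example loads each model by its bare file name, so the model files must be in the directory from which the program is run (when run from an IDE, this is usually the project root).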
Model File
The model files en-ner-person.bin and en-ner-location.bin, along with other NER models, are available at http://opennlp.sourceforge.net/models-1.5/. For the latest releases of OpenNLP and its model files, see https://opennlp.apache.org/download.html
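If a model file cannot be found, the example fails with an IOException and prints a stack trace. A small pre-flight check, sketched below in plain Java, can report which model files are missing from a directory before any model is loaded. The class and method names are illustrative, and the check only tests that the files exist; it does not validate their contents.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ModelFileCheck {

    // Returns the names from modelNames that are not regular files inside dir.
    static List<String> missingModels(Path dir, String... modelNames) {
        List<String> missing = new ArrayList<>();
        for (String name : modelNames) {
            if (!Files.isRegularFile(dir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The empty path resolves against the current working directory.
        Path workingDir = Paths.get("");
        List<String> missing = missingModels(workingDir, "en-ner-person.bin", "en-ner-location.bin");
        if (missing.isEmpty()) {
            System.out.println("All model files found.");
        } else {
            System.out.println("Missing model files: " + missing);
        }
    }
}
```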
Conclusion
In this OpenNLP tutorial, we have seen how to use the Named Entity Extraction API of OpenNLP to extract named entities from a paragraph or sentence.