Intelligent Tagging - RESTful API
Download tutorial source code | Click here to download |
Last update | January 2020 |
Environment | Windows, Linux |
Compilers | JDK 1.7 or greater |
Prerequisites | Components |
Description
Our example opens and reads one or more input text files containing unstructured content, has Open Calais parse and analyze them, and creates one or more structured output files containing the results of the analysis:
- It picks up each file with text to be tagged from inputDir (included with the tutorial).
- For each input file, it creates an XML file with the produced tags in outputDir (also included with the tutorial).
Please note that any language that supports HTTP can be used to implement the request.
Setup Steps
The steps include:
- If you don't already have a free Open Calais access token, register for MyRefinitiv and then log in to PermID.org with your credentials. An Open Calais access token is then e-mailed to you automatically.
- Review the example source code
- Put the file(s) to be tagged into the inputDir folder (the tutorial includes a sample input text file, apireq.txt)
- Make sure outputDir exists and is writable (the tutorial includes the inputDir and outputDir folders)
- Build and run
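To check the folder setup before running, a few lines of plain Java are enough. The sketch below uses a hypothetical helper class, FolderCheck, that is not part of the tutorial source: it fails fast if inputDir is not a readable folder or outputDir is not a writable one, and returns the regular files in inputDir that would be submitted, in name order.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the tutorial source: validates the folder
// setup described in the steps above before any request is sent.
class FolderCheck {

    // Returns the regular files in inputDir, sorted by name, after checking that
    // inputDir is a readable directory and outputDir is a writable directory.
    static List<File> checkFolders(File inputDir, File outputDir) {
        if (!inputDir.isDirectory() || !inputDir.canRead()) {
            throw new IllegalArgumentException("Input folder is missing or unreadable: " + inputDir);
        }
        if (!outputDir.isDirectory() || !outputDir.canWrite()) {
            throw new IllegalArgumentException("Output folder is missing or not writable: " + outputDir);
        }
        File[] entries = inputDir.listFiles();
        List<File> files = new ArrayList<>();
        if (entries != null) {
            for (File f : entries) {
                if (f.isFile()) { // skip sub-folders
                    files.add(f);
                }
            }
        }
        // sort so the processing order is deterministic
        Collections.sort(files);
        return files;
    }
}
```

For the folders shipped with the tutorial, FolderCheck.checkFolders(new File("inputDir"), new File("outputDir")) should return a list that includes apireq.txt.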
Review the example code
The steps include:
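- Create the HTTP client
// create an instance of the tutorial's HttpClientCalaisPost class
HttpClientCalaisPost httpClientPost = new HttpClientCalaisPost();
// the Commons HttpClient instance that executes the request (used as client below)
HttpClient client = new HttpClient();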
- Create PostMethod
// specify end-point URL
private static final String CALAIS_URL = "https://api-eit.refinitiv.com/permid/calais";
PostMethod method = new PostMethod(CALAIS_URL);
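- Specify the request headers: the access token, the content type, the tag types to return and the output format
// uniqueAccessKey holds your access token (the third command-line argument)
method.setRequestHeader("X-AG-Access-Token", uniqueAccessKey);
method.setRequestHeader("Content-Type", "text/raw");
method.setRequestHeader("x-calais-selectiveTags", "company,person,industry,socialtags,topic");
// Set response/output format: "xml/rdf", or "application/json" for JSON output
method.setRequestHeader("outputformat", "xml/rdf");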
- Set the request body (entity) to the input file
method.setRequestEntity(new FileRequestEntity(file, null));
- Execute the POST method on the client and release the connection
try {
    int returnCode = client.executeMethod(method);
    if (returnCode == HttpStatus.SC_NOT_IMPLEMENTED) {
        System.err.println("The Post method is not implemented by this URI");
        // still consume the response body
        method.getResponseBodyAsString();
    } else if (returnCode == HttpStatus.SC_OK) {
        System.out.println("File post succeeded: " + file);
        saveResponse(file, method);
    } else {
        System.err.println("File post failed: " + file);
        System.err.println("Got code: " + returnCode);
        System.err.println("response: " + method.getResponseBodyAsString());
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    method.releaseConnection();
}
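The body of saveResponse(file, method) is not shown above. As a hedged illustration only, here is one way such a method could write the response to outputDir. The class ResponseSaver and its signature are hypothetical: it takes the response body as a String (for example, the result of method.getResponseBodyAsString()) rather than the PostMethod itself, and the "<input name>.xml" naming scheme is an assumption, not necessarily what the tutorial does.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical stand-in for the tutorial's saveResponse(file, method).
// Takes the response body as a String instead of the PostMethod.
class ResponseSaver {

    // Writes the response for inputFile into outputDir as "<input name>.xml"
    // (naming scheme assumed) and returns the file that was written.
    static File saveResponse(File inputFile, File outputDir, String responseBody) throws IOException {
        File target = new File(outputDir, inputFile.getName() + ".xml");
        Files.write(target.toPath(), responseBody.getBytes(StandardCharsets.UTF_8));
        return target;
    }
}
```

If you switch outputformat to "application/json", you would likely want a ".json" suffix instead.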
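Build
javac -cp ".;prereqs\httpclient-4.4.jar;prereqs\commons-codec-1.10.jar;prereqs\commons-httpclient-3.1.jar;prereqs\commons-logging-1.2.jar" tr\test\*.java
On Linux, use : instead of ; as the classpath separator and / instead of \ in paths, both here and in the run command.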
Run
The program takes three arguments:
- Input folder name to process
- Output folder name in which to store the responses from Calais
- Your Open Calais access token
java -cp ".;prereqs\httpclient-4.4.jar;prereqs\commons-codec-1.10.jar;prereqs\commons-httpclient-3.1.jar;prereqs\commons-logging-1.2.jar" tr.test.HttpClientCalaisPost inputDir outputDir YOURTOKENGOESHERE
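The three arguments are positional. The tutorial's own argument handling is not shown here, so the following is only a sketch of the kind of check a program like this can make before posting anything, using a hypothetical CalaisArgs class.

```java
import java.io.File;

// Hypothetical sketch: validates the three positional arguments
// (input folder, output folder, access token) before any request is sent.
class CalaisArgs {
    final File inputDir;
    final File outputDir;
    final String accessToken;

    private CalaisArgs(File inputDir, File outputDir, String accessToken) {
        this.inputDir = inputDir;
        this.outputDir = outputDir;
        this.accessToken = accessToken;
    }

    // Parses the command line; throws with a usage message if the argument
    // count is wrong or the token is blank.
    static CalaisArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "Usage: java tr.test.HttpClientCalaisPost <inputDir> <outputDir> <token>");
        }
        if (args[2].trim().isEmpty()) {
            throw new IllegalArgumentException("The access token must not be empty");
        }
        return new CalaisArgs(new File(args[0]), new File(args[1]), args[2].trim());
    }
}
```

Failing early on a missing token avoids a round trip that the service would reject anyway.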
Understanding the input
The input file is plain text:
This is a text about google, about Google is this text
Megan Smith
In September 2014, President Obama named Megan Smith the United States Chief Technology Officer (CTO) in the Office of Science and Technology Policy. In this role, she serves as an Assistant to the President. As U.S. CTO, Smith focuses on how technology policy and innovation can advance the future of our nation.
Megan Smith is an award-winning entrepreneur, engineer, and tech evangelist, most recently serving as a Vice President at Google[x], where she worked on a range of projects and co-created the company’s “SolveForX” innovation community project as well as its “WomenTechmakers” tech-diversity initiative.
…
Request URL:
https://api-eit.refinitiv.com/permid/calais
The request is an HTTP POST to this URL. It carries no query parameters: the parameters are sent as HTTP headers (see the code above), and the content of the input file is sent as the request body. In other words, we ask the Open Calais service at api-eit.refinitiv.com to tag the information contained in our file.
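Because everything travels as ordinary HTTP headers plus a body, the request does not depend on Commons HttpClient. As a sketch assuming only the JDK's java.net.HttpURLConnection, the same request can be prepared as follows; the header names and values are copied from the tutorial code, while the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical sketch using only the JDK instead of Commons HttpClient.
// Header names and values are copied from the tutorial code.
class CalaisRequestSketch {
    static final String CALAIS_URL = "https://api-eit.refinitiv.com/permid/calais";

    // Prepares (but does not send) a POST to the Calais end-point. No network
    // traffic happens until the caller writes the body or reads the response.
    static HttpURLConnection prepare(String accessToken, boolean json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(CALAIS_URL).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // the file content goes in the request body
        conn.setRequestProperty("X-AG-Access-Token", accessToken);
        conn.setRequestProperty("Content-Type", "text/raw");
        conn.setRequestProperty("x-calais-selectiveTags", "company,person,industry,socialtags,topic");
        conn.setRequestProperty("outputformat", json ? "application/json" : "xml/rdf");
        return conn;
    }
}
```

To send it, write the file's bytes to conn.getOutputStream(), then call conn.getResponseCode() and read conn.getInputStream() on 200, or conn.getErrorStream() otherwise.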
Understanding the expected output
This is the console output we receive when we submit the example file included with this tutorial for tagging:
working on all files in C:\projects\Web\OpenCalais\OpenCalaisHTTP\inputDir
File post succeeded: inputDir\apireq.txt
This is an excerpt from the tagged file:
<?xml version="1.0" encoding="UTF-8"?>
<!--Use of the Calais Web Service is governed by the Terms of Service located at http://www.opencalais.com. By using this service or the results of the service you agree to these terms of service.--><!--Relations:
Company: General Magic, Google, Malala Fund, PlanetOut, apple japan, mit media lab, technology review, vital voices
Person: Megan Smith, Obama
--><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:c="http://s.opencalais.com/1/pred/"><rdf:Description c:calaisRequestID="b901d7cc-68bd-f17d-16f8-5e2508851e2f" c:id="http://id.opencalais.com/SR7fEPxwmQlEzUUmcTkKSw" c:ontology="http://trit-us-east-1-sharedamd.int.refinitiv.com/owlschema/13.0-rc2/onecalais.owl.allmetadata.xml" rdf:about="http://d.opencalais.com/dochash-1/1ebb8fc9-2538-3117-881b-661763e348e5"><rdf:type rdf:resource="http://s.opencalais.com/1/type/sys/DocInfo"/><c:document><![CDATA[This is a text about google, about Google is this text
Megan Smith
In September 2014, President Obama named Megan Smith the United States Chief Technology Officer (CTO) in the Office of Science and Technology Policy. In this role, she serves as an Assistant to the President. As U.S. CTO, Smith focuses on how technology policy and innovation can advance the future of our nation.
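The XML response is standard RDF/XML, so the JDK's built-in DOM parser can read it. This sketch (the class name RdfRequestId is hypothetical) extracts the c:calaisRequestID attribute visible in the excerpt, using the two namespace URIs declared on the rdf:RDF element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: reads c:calaisRequestID from the rdf:Description
// elements of an xml/rdf response. Namespace URIs come from the excerpt.
class RdfRequestId {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String C_NS = "http://s.opencalais.com/1/pred/";

    // Returns the first non-empty calaisRequestID attribute, or null if none is present.
    static String requestId(String rdfXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getElementsByTagNameNS / getAttributeNS
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rdfXml)));
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element d = (Element) descriptions.item(i);
            // getAttributeNS returns "" when the attribute is absent
            String id = d.getAttributeNS(C_NS, "calaisRequestID");
            if (!id.isEmpty()) {
                return id;
            }
        }
        return null;
    }
}
```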
If we instead set outputformat to "application/json", this is an excerpt from the tagged file:
"http:\/\/d.opencalais.com\/comphash-1\/30c4d804-1991-3421-8014-56d6bf13d59e":{
"_typeGroup":"entities",
"_type":"Company",
"forenduserdisplay":"false",
"name":"mit media lab",
"confidencelevel":"0.944",
"recognizedas":"name",
"_typeReference":"http:\/\/s.opencalais.com\/1\/type\/em\/e\/Company",
"instances":[{
"detection":"[ \n \nShe has served on the boards of MIT, ]MIT Media Lab[, MIT Technology Review, and Vital Voices; as a]",
"prefix":" \n \nShe has served on the boards of MIT, ",
"exact":"MIT Media Lab",
"suffix":", MIT Technology Review, and Vital Voices; as a",
"offset":1710,
"length":13},
{"detection":"[she completed her master's thesis work at the ]MIT Media Lab[. \n]",
"prefix":"she completed her master's thesis work at the ",
"exact":"MIT Media Lab",
"suffix":". \n",
"offset":2060,
"length":13}],
"relevance":0.2,
"resolutions":[{
"name":"MIT Media Lab",
"permid":"5035087856",
"ispublic":"false",
"commonname":"MIT Media Lab",
"score":1,
"id":"https:\/\/permid.org\/1-5035087856"}],
"confidence":{
"statisticalfeature":"0.746",
"dblookup":"0.0",
"resolution":"1.0",
"aggregate":"0.944"}},
"http:\/\/d.opencalais.com\/pershash-1\/b90f7a7b-9e5a-3842-8972-2d854e6024e2":{
"_typeGroup":"entities",
"_type":"Person",
"forenduserdisplay":"true",
"name":"Megan Smith",
"persontype":"N\/A",
"nationality":"N\/A",
"confidencelevel":"0.806",
"commonname":"Megan Smith",
"confidence":{
"statisticalfeature":"0.516",
"dblookup":"0.95",
"resolution":"0.6655559",
"aggregate":"0.806"},
"resolutions":[{
"name":"Megan J Smith",
"personid":"1948800",
"paid":"34414846458",
"officerid":"2294220",
"commonname":"Megan Smith",
"score":0.6655559}],
"_typeReference":"http:\/\/s.opencalais.com\/1\/type\/em\/e\/Person",
"permid":"https:\/\/permid.org\/1-404011",
"instances":[{
"detection":"[2000 miles across the Australian outback. \n \n]She[ has served on the boards of MIT, MIT Media Lab,]",
"prefix":"2000 miles across the Australian outback. \n \n",
"exact":"She",
"suffix":" has served on the boards of MIT, MIT Media Lab,",
"offset":1673,
"length":3},
{"detection":"[Smith \nIn September 2014, President Obama named ]Megan Smith[ the United States Chief Technology Officer (CTO)]",
"prefix":"Smith \nIn September 2014, President Obama named ",
"exact":"Megan Smith",
"suffix":" the United States Chief Technology Officer (CTO)",
"offset":112,
"length":11},
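In the instances entries above, the fields fit together in a regular way: detection is the prefix, the exact text and the suffix, with the prefix and suffix wrapped in square brackets, and length is the length of exact. The sketch below checks those relationships. Treating offset as a character index into the submitted text is our assumption, not something this tutorial states, and the class name InstanceCheck is hypothetical. The JDK has no built-in JSON parser, so you would first read these fields with a JSON library of your choice.

```java
// Hypothetical sketch: checks the relationships visible in the "instances"
// entries of the JSON excerpt. Reading offset as a character index into the
// submitted document is an assumption.
class InstanceCheck {

    // Rebuilds the detection string: "[" + prefix + "]" + exact + "[" + suffix + "]".
    static String detection(String prefix, String exact, String suffix) {
        return "[" + prefix + "]" + exact + "[" + suffix + "]";
    }

    // True if the instance fields agree with each other and, under the offset
    // assumption above, with the submitted document text.
    static boolean consistent(String document, String prefix, String exact, String suffix,
                              String detection, int offset, int length) {
        if (!detection.equals(detection(prefix, exact, suffix)) || length != exact.length()) {
            return false;
        }
        return offset >= 0
            && offset + length <= document.length()
            && document.substring(offset, offset + length).equals(exact);
    }
}
```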
In addition to identifying and tagging individual text strings, Open Calais further enriches your data with metadata tags designed to describe the text. Open Calais automatically analyzes your input text and performs the following processes:
- Named Entity and Relationship Recognition
- Aboutness Tagging
- Social Tagging – Classifies the document based on the Wikipedia folksonomy.
- Category Tagging – Identifies the topics discussed in the document. The list of possible topics is defined by the Refinitiv Coding Services (RCS) and International Press Telecommunications Council (IPTC) taxonomies.
- Industry Tagging – Identifies the industries related to the text. The list of industries that can be identified is defined by the TRBC Business Classification taxonomy.
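The tutorial limits the response to five tag types through the x-calais-selectiveTags header (company,person,industry,socialtags,topic). If you want to vary that list, a small helper can keep it in one place. The enum below is hypothetical and contains only the five values the tutorial sends; this tutorial neither lists the full set of values the service accepts nor maps them one-to-one to the processes above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper: builds the comma-separated x-calais-selectiveTags value.
// Only the five values used in the tutorial are listed.
class SelectiveTags {
    enum Tag {
        COMPANY("company"), PERSON("person"), INDUSTRY("industry"),
        SOCIALTAGS("socialtags"), TOPIC("topic");

        final String headerValue;

        Tag(String headerValue) {
            this.headerValue = headerValue;
        }
    }

    // Joins the selected tags in declaration order, so the result is stable
    // regardless of how the caller's set orders them.
    static String headerValue(Set<Tag> tags) {
        if (tags.isEmpty()) {
            return ""; // EnumSet.copyOf rejects an empty non-EnumSet collection
        }
        StringBuilder sb = new StringBuilder();
        for (Tag t : EnumSet.copyOf(tags)) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(t.headerValue);
        }
        return sb.toString();
    }
}
```

With all five values selected, the helper reproduces the header value used in the tutorial code.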
Learn more
For more information, including developer guides, FAQs and release notes, check out the documentation.