How to get MRN News Analytics Data via WebSocket API | Refinitiv

Article

How to get MRN News Analytics Data via WebSocket API

March 03, 2020

Wasin Waeosri

Overview

Update: December 2021.

This article shows how to get each Refinitiv News Analytics (TRNA) field data from Machine Readable News (MRN) message. The example application consumes TRNA data from Refinitiv Real-Time Distribution System (Refinitiv Real-Time Advanced Data Hub and Refinitiv Real-Time Advanced Distribution Server) via Machine Readable News (MRN) domain via the Websocket API for Pricing Streaming and Real-Time Service aka Websocket API connection.

Update (As of December 2021): The GitHub example now supports the Refinitiv Real-Time -- Optimized (RTO) connection.

Please check the new mrn_trna_console_rto.py console application and mrn_trna_notebook_app_rto.ipynb notebook files.
The deployed Refinitiv Real-Time Distribution System (RTDS) examples are mrn_trna_console_app.py console application and mrn_trna_notebook_app.ipynb notebook files.

WebSocket API Overview and Prerequisite

The WebSocket API is direct access that enables easy integration into a multitude of client technology environments such as scripting and web. This API runs directly on your Refinitiv Real-Time Distribution System infrastructure or the on the cloud and presents data in an open (JSON) readable format.

This example is focusing on the Refinitiv Machine Readable News (MRN) data processing only. I highly recommend you check the WebSocket API Tutorials page if you are not familiar with WebSocket API.

The Tutorials page provides a step-by-step guide (connect, log in, request data, parse data, etc) for developers who are interested in developing a WebSocket application to consume real-time data from Refinitiv Real-Time. The Tutorials also cover both Refinitiv Real-Time Optimized (formerly known as ERT in Cloud) and deployed Refinitiv Real-Time Distribution System infrastructure connection scenarios.

Refinitiv News Analytics Overview

Refinitiv News Analytics (TRNA) provides real-time numerical insight into the events on multiple news sources, in a format that can be directly consumed by algorithmic trading systems. TRNA enables algorithms to exploit the power of news to seize opportunities, capitalize on market inefficiencies, and manage event risk.

TRNA is published via Refinitiv Real-Time as part of Refinitiv Machine Readable News (MRN) data model. MRN is an advanced service for automating the consumption and systematic analysis of news. It delivers deep historical news archives, ultra-low latency structured news, and news analytics directly to your applications.

MRN Data Behavior

MRN is published over Refinitiv Real-Time using an Open Message Model (OMM) envelope in News Text Analytics domain messages. The Real-time News content set is made available over MRN_STORY RIC. The content data is contained in a FRAGMENT field that has been compressed and potentially fragmented across multiple messages, to reduce bandwidth and message size.

A FRAGMENT field has a different data type based on a connection type:

RSSL connection (RTSDK C++/Java): BUFFER type
WebSocket connection: Base64 ASCII string

The data goes through the following series of transformations:

The core content data is a UTF-8 JSON string.
This JSON string is compressed using gzip.
The compressed JSON is split into several fragments (BUFFER or Base64 ASCII string) which each fit into a single update message.
The data fragments are added to an update message as the FRAGMENT field value in a FieldList envelope.

Therefore, to parse the core content data, the application will need to reverse this process. The WebSocket application also needs to convert a received Base64 string in a FRAGMENT field to bytes data before further process this field. This application uses Python base64 and zlib modules to decode Base64 string and decompress JSON string.

How to consume, assemble, and decode MRN data with WebSocket API.

Please visit Introduction to Machine Readable News with WebSocket API article which gives you a full explanation of MRN Data processing topics:

The above article explains MRN with Python language for MRN Real-time News data (MRN_STORY RIC) , but the main concept for consuming and assembling MRN data are the same for all technologies and all MRN-specific RIC names.

If you are not familiar with MRN concept, please visit Webinar Recording: Introduction to Machine Readable News which will give you a full explanation of the MRN data model and implementation logic.

How to get News Analytics fields from MRN JSON message WebSocket API.

The example application is available on GitHub page. It supports the classic Jupyter Notebook environment. Please check the README.md file in the project for more detail regarding how to run the application.

The Notebook application connects to Refinitiv Real-Time Distribution System via the WebSocket connection, then consumes TRNA Data as MRN data domain. When the Notebook receives TRNA data from Refinitiv Real-Time Distribution System, it assembles, decodes MRN textual News Analytics messages and keeps them in _trna_messages list variable. You can get each TRNA JSON message and associate analytics fields from this variable.

News Analytics Data Model Overview

After assembly and decompression, the News Analytics data appears as JSON message in UTF-8 encoding. You can find the full description of News Analytics Data Models and Fields data in User Guide document section of My Refinitiv's News Analytics page.

The News Analytics JSON message has three top-level items:

id: The value of this field is in [feedFamilyCode]:[sourceId] format (see more detail in below section).
analytics: Analytics Groups sub-group containing the analytics scores
newsItem: This group contains metadata sourced directly from the STORY item, in contrast to the newsItem group also inside the analytics group that contains data derived from the TRNA scoring.

News Item Group (Top-Level Group)

The News Analytics feed contains two news item groups. This top-level group (trna_json_message['newsItem']) contains values that are contained within the news item being processed; the other group within the analytics group contains values derived from the news item by the analytics system.

Because the fields below are sourced from the incoming news item data and mapped to the below fields, those mappings can vary by the feedFamilyCode value.

Example Fields:

dataType: The broad type of data the news item belongs to. One of "News", "Social"
feedFamilyCode: A code that identifies the family of feeds the news item came from. Thomson Reuters feeds = "tr"
headline: The headline text of the news item.
sourceTimestamp: UTC timestamp of this news item. Millisecond precision. The source of this data varies by the feedFamilyCode value.
provider: Identifier for the organization which provided the news item. The source of this data varies by the feedFamilyCode value.
- "tr": from provider field
- "mrvr": from sourceName or publisher field
urgency: Differentiates story types.
- 1: alert
- 3: article

Example Code:

    	
            news_item  = _trna_messages[0]["newsItem"]
 
print("dataType: ", news_item["dataType"])
print("Headline: ", news_item["headline"])
print("Regional Timestamp: ", news_item["sourceTimestamp"])
print("feedFamilyCode: ", news_item["feedFamilyCode"])
print("provider: ", news_item["provider"][3:]) # news_item["provider"] == NS:RTRS
print("urgency: ", news_item["urgency"], " : ", (lambda item_type: "alert" if 1 else "article")(news_item["urgency"]))

Example Output:

    	
            dataType:  News
Headline:  Nomad Technologies Holdings Ltd - Monthly Return of Equity Issuer on Movements in Securities for the month ended 29 February 2020(with URL)
sourceTimestamp:  2020-03-03T09:05:29.828Z
feedFamilyCode:  tr
provider:  HIIS
urgency:  3  :  alert

Analytics Score Group

Each analytics score group contains all the analytics information derived from the news item for a specific asset as a simple group of named values.

Example Fields for this group:

assetClass: The broad class that the asset belongs to. Also describes the type of TRTS sentiment engine used in the scoring.
- Either "CMPNY" for a company or "COM" for a commodity.
- Set to "CMPNY" for document-level scores because of use of the same scoring engine as used for company-level scores.
assetCodes: List of prefixed codes, in conjunction with assetId field below, which identify the asset within various symbologies.
- If assetClass value is "CMPNY": "P:" prefix for PermID and "R:" for RIC. Can contain multiple RICs for a single company, including the primary one and those tagged to the news item.
- If assetClass value is "COM": "N2" prefix for the topic code.
assetId: Primary identifier for the asset. PermID for company and topic code for commodity. Please see more detail regarding each topic code for commodity in Appendix 2 "Commodity & Energy Code Coverage" section of News Analytics Data Models document.
assetName: A human readable name for the asset, used as an identifier for unknown entity scoring.
firstMentionSentence: The first sentence, starting with the headline, in which the scored asset is mentioned. Thus, a value of 1 denotes the headline, 2 the first sentence of the story body, 3 the second sentence, etc.
relevance: A decimal number indicating the relevance of the news item to the asset. It ranges from 0 to 1.
sentimentClass: This field indicates the predominant sentiment class for this news item with respect to this asset. The indicated class is the one with the highest probability.
- 1: Positive
- 0: Neutral
- -1: Negative
sentimentNegative: The probability that the sentiment of the news item was negative for the asset.
sentimentNeutral: The probability that the sentiment of the news item was neutral for the asset.
sentimentPositive: The probability that the sentiment of the news item was positive for the asset.

Example Code:

    	
            asset_class = None
asset_codes = None
sentiment_class = {-1: 'Negative', 0: 'Neutral', 1: 'Positive'}
 
def get_permid(asset_codes):
    for code in asset_codes:
        if code[:2] == "P:":
            return code[2:]
 
def get_company(asset_codes):
    company = [code[2:] for code in asset_codes if code[:2] == "R:"]
    return " ".join(company)
 
def get_topic_code(asset_codes):
    topic = [code[3:] for code in asset_codes if code[:3] == "N2:"]
    return " ".join(topic)
 
analytic_scores_group = _trna_messages[0]["analytics"]["analyticsScores"]
 
for analytic_score in analytic_scores_group:
    if analytic_score["assetClass"]:
        asset_class = analytic_score["assetClass"]
        asset_codes = analytic_score["assetCodes"]
        print("assetClass: ", asset_class)
        print("assetCodes: ", asset_codes)
        if asset_class == "CMPNY":
            print("PermID: ", get_permid(asset_codes))
            print("Co: ", get_company(asset_codes))
        elif asset_class == "COM":
            print("Topic Codes: ", get_topic_code(asset_codes))
        print("assetId: ", analytic_score["assetId"])
        print("assetName: ", analytic_score["assetName"])
        print("relevance: ",analytic_score["relevance"])
        print("sentimentClass: ", analytic_score["sentimentClass"], 
              ":", sentiment_class[analytic_score["sentimentClass"]] )
        print("sentimentPositive: ", analytic_score["sentimentPositive"])
        print("sentimentNeutral: ", analytic_score["sentimentNeutral"])
        print("sentimentNegative: ", analytic_score["sentimentNegative"])

Example Output:

    	
            assetClass:  CMPNY
assetCodes:  ['P:5066556859', 'R:8645.HK']
PermID:  5066556859
Co:  8645.HK
assetId:  5066556859
assetName:  Nomad Technologies Holdings Ltd
relevance:  1.0
sentimentClass:  0 : Neutral
sentimentPositive:  0.158962
sentimentNeutral:  0.740094
sentimentNegative:  0.100944

Example Output 2:

    	
            assetClass:  COM
assetCodes:  ['N2:MTAL', 'N2:MTAL08']
Topic Codes:  MTAL MTAL08
assetId:  MTAL
assetName:  Non-Ferrous Metals
relevance:  0.707107
sentimentClass:  1
sentimentPositive:  0.399518
sentimentNeutral:  0.343445
sentimentNegative:  0.257036

Windowed Count Group

The windowed count group is used to associate a count with the window of time it relates to. It is used for the noveltyCounts and volumeCounts analytics properties.

Novelty Counts

The novelty of the content within a news item on a particular asset is calculated by comparing it with the asset-specific text over a cache of previous news items that contain the asset.

The comparison between items is done using a linguistic fingerprint. If the news items are similar, they are termed as being "linked". As a result, a content item can "link" only to an item of the same language.

Five historical periods are used in the comparison. The default periods are 12 hours, 24 hours, 3 days, 5 days, and 7 days prior to the news item’s timestamp.

Volume Counts

The volume of news for each asset is calculated. A cache of previous news items is maintained and the number of news items that mention the asset within each of five historical periods is calculated. The cache is language-specific, e.g., a volumeCount on an English-language item measures the number of other English-language items in that historical period.

By default, the historical periods are 12 hours, 24 hours, 3 days, 5 days and 7 days prior to the news item’s timestamp and are the same used in the novelty calculations. Thus, direct comparisons between similar and total items within the historical periods can be achieved.

Example Fields:

itemCount: Number of items
window: Length of time the count covers nH (for hours) or nD (for days). Default values are "12H", "24H", "3D", "5D", and "7D".

Example Code:

    	
            def windowsed_count_group(group):
    for item in group:
        print("itemCount: ", item["itemCount"])
        print("window: ", item["window"])
 
analytic_scores_group = _trna_messages[0]["analytics"]["analyticsScores"]
 
for analytic_score in analytic_scores_group:
    print("Novelty Counts:\n")
    windowsed_count_group(analytic_score["noveltyCounts"])
    print("--------------------------------------------------------")
    print("Volumn Counts:\n")
    windowsed_count_group(analytic_score["volumeCounts"])
    print("--------------------------------------------------------")

Example Output:

    	
            Novelty Counts:
 
itemCount:  1
window:  12H
itemCount:  1
window:  24H
itemCount:  1
window:  3D
itemCount:  1
window:  5D
itemCount:  1
window:  7D
--------------------------------------------------------
Volumn Counts:
 
itemCount:  1
window:  12H
itemCount:  1
window:  24H
itemCount:  1
window:  3D
itemCount:  1
window:  5D
itemCount:  1
window:  7D
--------------------------------------------------------

Linked Id Group

The linked id group is used to associate an id with its position in a longer list of ids. It is used for the linkedIds. This group is not populated for document-level scores, since novelty is not calculated.

Example Fields:

idPosition: Position of the linkedId in the complete list of linked Ids. 0 is the first/oldest, and the largest/most recent is the 7-day itemCount minus 1.
linkedId: id of the item at this position

Example Code:

    	
            linked_id_group = None
 
analytic_scores_group = _trna_messages[0]["analytics"]["analyticsScores"]
 
for analytic_score in analytic_scores_group:
    print("Linked Id Group: ")
    linked_id_group = analytic_score["linkedIds"]
    if linked_id_group:
        for linked_id in linked_id_group:
            print("idPosition: ", linked_id["idPosition"])
            print("linkedId: ", linked_id["linkedId"])

Example Output:

    	
            Linked Id Group: 
idPosition:  0
linkedId:  tr:HKS8b3CTf_2003032QCbnadsx97vfj84CPVr+wZf+s//2WLKx1mGvf

News Item Group (Analytics Sub-group)

The TRNA feed contains two news item groups. This group (trna_json_message["analytics"]["newsItem"]), within the analytics group, contains values derived from the news item by the analytics system.

Example Fields:

companyCount: The number of companies explicitly listed in the news item in the subjects field
exchangeAction: One of "IMBALANCE", "HALT", "RESUME", "BLOCK TRADE", "INDICATION", "UNDEFINED".
- Set to "UNDEFINED" for all Japanese-language scores.
marketCommentary: Indicator that the item is discussing general market conditions, such as "After the Bell" summaries.
sentenceCount: The total number of sentences in the news item.
wordCount: The total number of lexical tokens (words and punctuation) in the news item.

Example Code:

    	
            news_item_groups = _trna_messages[0]["analytics"]["newsItem"]
 
print("companyCount: ", news_item_groups["companyCount"])
print("exchangeAction: ", news_item_groups["exchangeAction"])
print("marketCommentary: ", news_item_groups["marketCommentary"])
print("sentenceCount: ", news_item_groups["sentenceCount"])
print("wordCount: ", news_item_groups["wordCount"])

Example Output:

    	
            companyCount:  1
exchangeAction:  UNDEFINED
marketCommentary:  False
sentenceCount:  9
wordCount:  116

Next Steps

Once the application can retrieve each News Analytics field data from Refinitiv Real-Time, the application needs to implement business logic to collect and analyze those data based on interested Analytics asset. Please see the examples of how to use each asset below:

Sentiment: Positive sentiment typically leads to asset price rise, negative sentiment to a decline
Relevance: Filter out News Analytics records with low relevance
Novelty: Filter out News Analytics records that are similar to more than 0 or 1 recent news items
Volume: A sudden spike in overall news volume often leads to increased trading volume and volatility

For more detail regarding each asset usage and information, please check Refinitiv News Analytics Product page.