
Machine Readable News (MRN) & N2_UBMS Comparison and Migration Guide

July 19, 2017

Warat Boonyanit


This article is intended for developers using Refinitiv Real-Time – Optimized and the Refinitiv Real-Time APIs who wish to migrate from N2_UBMS news to Machine Readable News (MRN). It highlights the key differences between N2_UBMS and MRN and gives examples to help developers replace N2_UBMS news with MRN.

N2_UBMS REAL-TIME NEWS

This section describes how N2_UBMS news is constructed and delivered. It is aimed at new developers who may not have background knowledge of N2_UBMS real-time news. Experienced developers can skip to the next section.

N2_UBMS Overview

N2_UBMS is a legacy RIC that carries Newsfeed broadcast messages. Every incoming news story results in broadcast messages being transmitted as updates to the N2_UBMS RIC. However, a news story may be made available in several parts. For example, the first part of a story may include one or more Alerts, each of which is a brief news headline containing the most essential information about an emerging story. An Alert is followed by a newsbreak, which is a headline and a piece of text for the story. This text and its associated headline are called a Take. More than one Take is possible, allowing more story text to be provided as and when it becomes available.

To identify each part of a news story, all transmitted news carries a common identifier that enables all parts of the same story to be recognized. This identifier is called the "Primary News Access Code" (PNAC).

Each broadcast message on N2_UBMS contains a set of Market Price domain fields, which are described in the next two sections. These fields contain the PNAC, headline, category codes, and timestamps, but not the story body.

The body of a story is split into a number of text segments, each of which has a unique identifier that is used as a RIC for retrieval. The RIC for the first story text segment is always the PNAC, and each segment contains pointers that link it to the previous and next segments. A consumer application must therefore use the PNAC from N2_UBMS, followed by the subsequent story text segment RICs, to retrieve the full body of a story.
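Retrieving a story body therefore amounts to walking a linked list of segment RICs. The sketch below is a minimal illustration under several assumptions: `fetch_segment(ric)` is a hypothetical callable that returns one segment's refresh fields as a dict keyed by field name (in a real consumer it would wrap a Market Price request through your API), a blank NEXT_LR marks the last segment, and the SEG_TEXT values can be concatenated as-is. The `max_segments` limit is likewise a hypothetical safety guard.

```python
from typing import Callable, Dict, List

Segment = Dict[str, str]  # one segment's fields, keyed by field name (hypothetical shape)


def read_story_body(pnac: str, fetch_segment: Callable[[str], Segment],
                    max_segments: int = 1000) -> str:
    """Follow NEXT_LR pointers starting at the PNAC and join the SEG_TEXT values."""
    parts: List[str] = []
    seen = set()
    ric = pnac  # the RIC of the first story text segment is always the PNAC
    while ric and ric not in seen and len(parts) < max_segments:
        seen.add(ric)  # guard against a malformed chain that loops back
        segment = fetch_segment(ric)
        parts.append(segment.get("SEG_TEXT", ""))
        ric = segment.get("NEXT_LR", "").strip()  # assumed blank on the last segment
    return "".join(parts)


# Hypothetical in-memory stand-in for Market Price requests on segment RICs.
segments = {
    "nHKS1w6r3n": {"SEG_TEXT": "The quick brown fox ", "NEXT_LR": "n#1EaGva03"},
    "n#1EaGva03": {"SEG_TEXT": "jumps over the lazy dog", "NEXT_LR": ""},
}
print(read_story_body("nHKS1w6r3n", segments.__getitem__))
# The quick brown fox jumps over the lazy dog
```

Each step issues one request, so a story with N segments costs N round trips; this is the main overhead MRN removes.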

A news story also has a set of category codes. These codes are transmitted with the broadcast messages. Codes represent different aspects of the news item and include product codes, topic codes, company codes, and attribution. Product codes identify whether a user is allowed to receive a news item and also define a broad range for the subject matter, e.g. "M" for Money International News Service. Topic codes describe the story's subject matter, e.g. "INT" for interest rates. Company codes identify a particular company affected by the news item, e.g. "RTR.L" for Reuters. Attribution defines the source of a specific news item.

Fields used in broadcast messages

FID	FID Name	Description
3	DSPLY_NAME	Message type
235	PNAC	Primary News Access Code
255	PROC_DATE	Take date
259	RECORDTYPE	Set to 232 for news
264	BCAST_TEXT	Headline text
715	STORY_ID	Sequential value for message loss detection
720	TAKE_SEQNO	Message sequence number
725	ATTRIBTN	News Source
749	PROD_CODE	List of product codes
750	TOPIC_CODE	List of topic codes
751	CO_IDS	List of company codes
752	LANG_IND	Language indicator
1015	TAKE_TIME	Take time
1024	STORY_TIME	Story time
1027	STORY_DATE	Story date

Broadcast message example

    	
UPDATE Item Name: N2_UBMS
Fid: 3 Name = DSPLY_NAME DataType: Rmtes Value: 2
Fid: 1 Name = PROD_PERM DataType: UInt Value: 7710
Fid: 235 Name = PNAC DataType: Ascii Value: nHKS1w6r3n
Fid: 255 Name = PROC_DATE DataType: Date Value: 7 / 7 / 2017
Fid: 259 Name = RECORDTYPE DataType: UInt Value: 232
Fid: 264 Name = BCAST_TEXT DataType: Rmtes Value: The quick brown fox jumps over the lazy dog
Fid: 715 Name = STORY_ID DataType: Ascii Value: n&1EVEmt01
Fid: 720 Name = TAKE_SEQNO DataType: UInt Value: 1
Fid: 725 Name = ATTRIBTN DataType: Rmtes Value: HIIS
Fid: 749 Name = PROD_CODE DataType: Rmtes Value: SHKI
Fid: 750 Name = TOPIC_CODE DataType: Rmtes Value: ASIA CMPNY CN EMRG ENER EQTY HK LEN RENE RENQ STX HIIS
Fid: 751 Name = CO_IDS DataType: Rmtes Value: 9999.HK
Fid: 752 Name = LANG_IND DataType: Rmtes Value: EN
Fid: 1015 Name = TAKE_TIME DataType: Time Value: 10:11:12:0
Fid: 1024 Name = STORY_TIME DataType: Time Value: 10:11:12:0
Fid: 1027 Name = STORY_DATE DataType: Date Value: 7 / 7 / 2017

Fields used in PNAC and subsequent story text segment messages

FID	FID Name	Description
2	RDNDISPLAY	Set to 136
237	PREV_LR	RIC for previous text segment
238	NEXT_LR	RIC for next text segment
254	UNIQUE_SN	PNAC
255	PROC_DATE	Take date
256	PROC_TIME	Take time
258	SEG_TEXT	Story text segment
259	RECORDTYPE	Set to 232 for news
723	TAB_TEXT	Tabular text indicator
727	MORE_NEWS	Used to indicate if more news is expected
752	LANG_IND	Language indicator

Story text segment example

    	
REFRESH Item Name: nL3N1J5319
Fid: 1 Name = PROD_PERM DataType: UInt Value: 511
Fid: 2 Name = RDNDISPLAY DataType: UInt Value: 136
Fid: 237 Name = PREV_LR DataType: Ascii Value: <BLANK>
Fid: 238 Name = NEXT_LR DataType: Ascii Value: n#1EaGva03
Fid: 254 Name = UNIQUE_SN DataType: Ascii Value: nHKS1w6r3n
Fid: 255 Name = PROC_DATE DataType: Date Value: 7 / 7 / 2017
Fid: 256 Name = PROC_TIME DataType: Time Value: 8:9:0:0
Fid: 258 Name = SEG_TEXT DataType: Rmtes Value: The quick brown fox jumps over the lazy dog
Fid: 259 Name = RECORDTYPE DataType: UInt Value: 232
Fid: 723 Name = TABTEXT DataType: Rmtes Value: X
Fid: 727 Name = MORE_NEWS DataType: Rmtes Value: R

The message type is stored in FID 3 of broadcast messages as a single digit. Each message type is briefly described below.

Message type	FID value	Notes
ALERT	1	Provides a brief summary of an important news item as quickly as possible. An alert can only be used for stories where there is, as yet, no associated story.
FIRST_TAKE	2	Associated with the newsbreak of the story text
SUBSEQUENT_TAKE	3	Associated with additional story text filed after the first take. Historically, legacy systems used message type 3, but these messages are now rendered as UPDATE (message type 6) on the live feed.
CORRECTION	4	Update that denotes a meaningful fact was incorrectly communicated and then corrected.
CORRECTED	5	Update that corrects minor issues, such as spelling mistakes or trivial details previously communicated.
UPDATE	6	A natural update to the story. May include additional context, reaction, analysis, background or facts.
DELETED	7	A story has been removed from the feed and should be deleted immediately, either because it was sent in error or because some other reason requires it to be deleted.
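Because FID 3 carries the message type as a digit in a string field, a consumer typically converts it into a named value before branching on it. A minimal sketch using Python's `IntEnum`; the class and function names are illustrative, not part of any Refinitiv API:

```python
from enum import IntEnum


class MessageType(IntEnum):
    """Message types carried as a single digit in N2_UBMS FID 3 (DSPLY_NAME)."""
    ALERT = 1
    FIRST_TAKE = 2
    SUBSEQUENT_TAKE = 3  # legacy; now rendered as UPDATE on the live feed
    CORRECTION = 4
    CORRECTED = 5
    UPDATE = 6
    DELETED = 7


def parse_message_type(dsply_name: str) -> MessageType:
    """Convert the FID 3 string value (e.g. "2") into a MessageType."""
    return MessageType(int(dsply_name.strip()))  # raises ValueError if unknown


msg_type = parse_message_type("2")
print(msg_type.name)  # FIRST_TAKE
if msg_type is MessageType.DELETED:
    print("remove the story immediately")
```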

The Primary News Access Code (PNAC, FID 235 for broadcast messages and FID 254 for request/response messages) is an 8-byte string identifier which is used to identify a story. Story text segment RICs (FIDs 237 PREV_LR and 238 NEXT_LR) are also 8-byte strings and represent the specific RIC associated with part of the story text. Both PNACs and Story Text Segment RICs are unique for a period of 24 hours.

Category codes associate each news item with additional information about it. The codes fall into three distinct sets: product codes in FID 749, topic codes in FID 750, and company codes in FID 751. Attribution (FID 725 ATTRIBTN) indicates the source of the news article; the source is also entered as a product code in FID 749.
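In the broadcast message example, each of these code fields holds a space-separated list. A minimal sketch that splits them into Python lists, assuming whitespace is the only delimiter (as in that example) and that the update has already been decoded into a dict keyed by field name:

```python
from typing import Dict, List

CODE_FIELDS = ("PROD_CODE", "TOPIC_CODE", "CO_IDS")  # FIDs 749, 750, 751


def parse_category_codes(fields: Dict[str, str]) -> Dict[str, List[str]]:
    """Split each space-separated code list; missing fields become empty lists."""
    return {name: fields.get(name, "").split() for name in CODE_FIELDS}


update = {
    "PROD_CODE": "SHKI",
    "TOPIC_CODE": "ASIA CMPNY CN EMRG ENER EQTY HK LEN RENE RENQ STX HIIS",
    "CO_IDS": "9999.HK",
}
codes = parse_category_codes(update)
print(codes["TOPIC_CODE"][:3])  # ['ASIA', 'CMPNY', 'CN']
```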

Two timestamps are included: the story date/time and the take date/time. The story date/time (FID 1024 STORY_TIME and FID 1027 STORY_DATE) indicates when the news item first appeared, and the take date/time (FID 255 PROC_DATE and FID 1015 TAKE_TIME) indicates when this portion of the news item was received.
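Because each timestamp is split across a date field and a time field, a consumer usually recombines them into a single value. The sketch below assumes the API has already decoded the fields into Python `date` and `time` values and that the feed times are UTC; the time zone is an assumption here, so confirm it against your feed documentation.

```python
from datetime import date, datetime, time, timezone


def combine_timestamp(d: date, t: time) -> datetime:
    """Join a date field (e.g. STORY_DATE) with a time field (e.g. STORY_TIME)."""
    # Assumption: N2_UBMS date/time fields are expressed in UTC.
    return datetime.combine(d, t, tzinfo=timezone.utc)


story = combine_timestamp(date(2017, 7, 7), time(10, 11, 12))  # STORY_DATE + STORY_TIME
take = combine_timestamp(date(2017, 7, 7), time(10, 11, 12))   # PROC_DATE + TAKE_TIME
print(story.isoformat())  # 2017-07-07T10:11:12+00:00
```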

The headlines and story text (FIDs 264 and 258) are encoded using the Reuters implementation of ISO 2022, called the "Reuters Multilingual Text Encoding Standard" (RMTES). Headline and story segment text are each limited to 255 bytes, the maximum length of these fields.

MACHINE READABLE NEWS

At the time of writing, four MRN content sets are available over the real-time data feed. The one that existing N2_UBMS users should adopt is Real-time News. This content set is sourced from news alerts and stories from Reuters and dozens of third-party news sources, and contains the headline, story body text, and associated metadata available at publication time.

MRN Data model

MRN is published over the real-time data feed using an Open Message Model (OMM) envelope in News Text Analytics domain RSSL messages. The Real-time News content set is available on the MRN_STORY RIC. The content data is carried in a FRAGMENT field (BUFFER type) that has been compressed, and potentially fragmented across multiple messages, to reduce bandwidth and RSSL message size.

The data goes through the following series of transformations:

The core content data is a UTF-8 JSON string
This JSON string is compressed using gzip
The compressed JSON is split into a number of fragments which each fit into a single RSSL update
The data fragments are added to an update message as the FRAGMENT field value in a FieldList envelope

Therefore, in order to parse the core content data, the application will need to reverse this process.
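The transformation steps can be made concrete with Python's standard library. The sketch below performs them on the publishing side and then reverses them, as a consumer ultimately must. It is illustrative only: the dicts stand in for RSSL FieldList envelopes, the GUID and MRN_SRC values are made up, the 16-byte fragment size is chosen purely so that the tiny example spans several fragments, and TOT_SIZE is placed in every fragment for simplicity.

```python
import gzip
import json
from typing import Dict, List


def build_fragments(item: dict, guid: str, mrn_src: str,
                    max_fragment: int = 1000) -> List[Dict[str, object]]:
    """Apply the four publishing steps to one data item (illustrative only)."""
    payload = json.dumps(item, ensure_ascii=False).encode("utf-8")  # 1. UTF-8 JSON
    compressed = gzip.compress(payload)                              # 2. gzip
    chunks = [compressed[i:i + max_fragment]                         # 3. split
              for i in range(0, len(compressed), max_fragment)]
    return [                                 # 4. one dict standing in for each FieldList
        {"MRN_SRC": mrn_src, "GUID": guid, "FRAG_NUM": n,
         "TOT_SIZE": len(compressed), "FRAGMENT": chunk}
        for n, chunk in enumerate(chunks, start=1)
    ]


fragments = build_fragments(
    {"headline": "The quick brown fox jumps over the lazy dog"},
    guid="EXAMPLE_GUID", mrn_src="EXAMPLE_SRC", max_fragment=16)

# Reversing the process: concatenate, decompress, then parse.
joined = b"".join(f["FRAGMENT"] for f in fragments)
print(json.loads(gzip.decompress(joined).decode("utf-8"))["headline"])
# The quick brown fox jumps over the lazy dog
```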

Five fields, as well as the RIC itself, are necessary to determine whether the entire item has been received in its various fragments and how to concatenate the fragments to reconstruct the item:

MRN_SRC: identifier of the scoring/processing system that published the FRAGMENT
GUID: a globally unique identifier for the data item. All messages for this data item have the same GUID value.
FRAGMENT: the compressed data item fragment itself
TOT_SIZE: total size in bytes of the fragmented data
FRAG_NUM: sequence number of the fragment within a data item. This is set to 1 for the first fragment of each item published and is incremented for each subsequent fragment of the same item.

A single MRN data item publication is uniquely identified by the combination of RIC, MRN_SRC, and GUID.

Fragmentation

For a given RIC-MRN_SRC-GUID combination, when a data item requires only a single message, then TOT_SIZE will equal the number of bytes in the FRAGMENT and FRAG_NUM will be 1.

When multiple messages are required, the data item can be deemed fully received once the combined number of bytes across all FRAGMENTs equals TOT_SIZE. The consumer will also observe that the FRAG_NUM values run from 1 to the number of fragments, with no intermediate integers skipped. In other words, a data item transmitted over three messages will contain FRAG_NUM values of 1, 2, and 3.
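The completion rule can be implemented with a small accumulator keyed by the RIC, MRN_SRC, and GUID combination. This is a sketch, not a reference implementation: each update is assumed to have already been decoded into a dict of field values (how that happens depends on the API in use), the class name is hypothetical, and TOT_SIZE is taken from the first message that carries it. A production consumer would also need to expire items that never complete, which is not shown.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (RIC, MRN_SRC, GUID) identifies one data item


class FragmentAssembler:
    """Collect FRAGMENT values per data item until the item is complete."""

    def __init__(self) -> None:
        self._pending: Dict[Key, dict] = {}

    def add(self, ric: str, fields: dict) -> Optional[bytes]:
        """Store one update; return the joined bytes once the item is complete."""
        key = (ric, fields["MRN_SRC"], fields["GUID"])
        state = self._pending.setdefault(key, {"total": None, "parts": {}})
        if state["total"] is None and fields.get("TOT_SIZE") is not None:
            state["total"] = fields["TOT_SIZE"]
        state["parts"][fields["FRAG_NUM"]] = fields["FRAGMENT"]

        numbers = sorted(state["parts"])
        received = sum(len(part) for part in state["parts"].values())
        no_gaps = numbers == list(range(1, len(numbers) + 1))  # 1..n, none skipped
        if state["total"] is not None and received == state["total"] and no_gaps:
            del self._pending[key]
            return b"".join(state["parts"][n] for n in numbers)
        return None


assembler = FragmentAssembler()
updates = [  # hypothetical decoded updates for one item
    {"MRN_SRC": "SRC", "GUID": "G1", "FRAG_NUM": 1, "TOT_SIZE": 6, "FRAGMENT": b"abc"},
    {"MRN_SRC": "SRC", "GUID": "G1", "FRAG_NUM": 2, "FRAGMENT": b"def"},
]
results = [assembler.add("MRN_STORY", u) for u in updates]
print(results)  # [None, b'abcdef']
```

Returning the joined bytes only on completion keeps decompression out of the accumulator, which matches the rule that fragments must not be decompressed individually.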

Compression

The FRAGMENT field is gzip-compressed, so the consumer must decompress it to reveal the plain-text JSON data in that field.

When an MRN data item is sent in multiple messages, all the messages must be received and their FRAGMENTs concatenated before being decompressed. In other words, the FRAGMENTs should not be decompressed independently of each other.

The decompressed output is encoded in UTF-8 and formatted as JSON.
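Given the concatenated FRAGMENT bytes of a complete item, the remaining steps reverse the compression and serialization. A minimal sketch using Python's `gzip` and `json` modules; the two-way split is simulated locally and `decode_item` is an illustrative name. It also shows why a lone fragment cannot be decompressed: it is not a complete gzip stream.

```python
import gzip
import json


def decode_item(joined: bytes) -> dict:
    """Decompress the concatenated FRAGMENT bytes and parse the UTF-8 JSON."""
    return json.loads(gzip.decompress(joined).decode("utf-8"))


# Simulated item split into two fragments after compression.
compressed = gzip.compress(json.dumps({"headline": "Example"}).encode("utf-8"))
first, second = compressed[:10], compressed[10:]

print(decode_item(first + second))  # {'headline': 'Example'}

try:
    decode_item(first)  # decompressing one fragment on its own
except (EOFError, OSError) as exc:
    print("incomplete gzip stream:", type(exc).__name__)
```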

N2_UBMS and MRN Real-time News Comparison

As mentioned in the overview, N2_UBMS content appears as field-value pairs in the Market Price domain, while MRN Real-time News content appears as UTF-8 JSON in the FRAGMENT field of the News Text Analytics domain.

The following table compares the steps a consumer application must take to extract the data in each case.

N2_UBMS	MRN Real-Time News
Use Market Price domain	Use News Text Analytics domain
Request RIC (N2_UBMS)	Request RIC (MRN_STORY)
Received update messages contain headline and PNAC	Received update messages contain fragments of the entire data item
Use the PNAC from the update message to request the first story body segment	
Use the pointer to request subsequent segments	
Combine the story body from each segment	Combine the fragments
	Decompress the gzipped data
	Parse the JSON content data

Because News Text Analytics is a newer domain, applications that use a legacy Market Data interface cannot consume it. Users of legacy APIs are recommended to upgrade to an OMM interface.

In N2_UBMS, a consumer application must make multiple requests to retrieve the full body of a story. First, it requests the N2_UBMS RIC for the headline and PNAC, then uses the PNAC to request the first body segment. After that, it follows the pointer to the next segment for each subsequent body part. In MRN, by contrast, the consumer makes a single RIC request and receives update messages containing fragments of data, which it must combine and decompress. An MRN consumer therefore requires a compression library that can decompress gzip and, because the news story content is in JSON format, a JSON parser as well.

Fields Mapping

Both MRN Real-time News and N2_UBMS contain the same headline, story body text, and associated metadata about the story. The following table maps MRN Real-time News’ fields to N2_UBMS fields.

Field Categories	General Description	MRN_STORY	N2_UBMS	Notes
Identifying Information	Item ID	Id		The two fields have the same value.
	Item ID	GUID		GUID is in the field list envelope, but not in the core JSON data item
	Primary News Access Code	altId	PNAC
	Take Sequence Number	takeSequence	TAKE_SEQNO
	News Source	provider	ATTRIBTN	MRN_STORY is prefixed by “NS:”
News Text	Headline	headline	BCAST_TEXT
	Story Body	body	SEG_TEXT	MRN_STORY value is escaped to ensure valid JSON
	Language	language	LANG_IND	N2_UBMS uses upper case; MRN_STORY uses lower case
Timestamps	Story Date	firstCreated	STORY_DATE
	Story Time	firstCreated	STORY_TIME
	Take Date	versionCreated	PROC_DATE
	Take Time	versionCreated	TAKE_TIME
Tagging	Company Codes	subjects	CO_IDS	MRN_STORY is prefixed by “R:”. MRN_STORY may additionally contain company PermIDs, prefixed with “P:”
	Named Item Code	instancesOf	NAMED_ITEM	MRN_STORY is prefixed by “NI:”
	Product Codes	audiences	PROD_CODE	MRN_STORY is prefixed by “NP:”
	Topic Codes	subjects	TOPIC_CODE	MRN_STORY is prefixed by “N2:”
	Item Classification	messageType	DSPLY_NAME	MRN messageType uses the same enum values as N2_UBMS FID 3.
	Item Classification	urgency	DSPLY_NAME	MRN urgency is a less specific classification.
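Several MRN fields carry the same codes as N2_UBMS but with a scheme prefix. The sketch below strips the prefixes listed in the table to rebuild N2_UBMS-style values from a decoded MRN item. The sample item is made up and trimmed to the fields used here, the function names are hypothetical, and the code lists are returned as Python lists rather than N2_UBMS's space-separated strings (`" ".join()` restores that form if needed).

```python
from typing import Dict, List


def strip_prefix(values: List[str], prefix: str) -> List[str]:
    """Keep values that carry the given scheme prefix, with the prefix removed."""
    return [v[len(prefix):] for v in values if v.startswith(prefix)]


def to_n2_ubms_fields(item: dict) -> Dict[str, object]:
    """Rebuild N2_UBMS-style values from a decoded MRN_STORY JSON item."""
    subjects = item.get("subjects", [])
    provider = item.get("provider") or ""
    return {
        "PNAC": item.get("altId"),
        "BCAST_TEXT": item.get("headline"),
        "LANG_IND": (item.get("language") or "").upper(),  # N2_UBMS uses upper case
        "ATTRIBTN": provider[3:] if provider.startswith("NS:") else provider,
        "PROD_CODE": strip_prefix(item.get("audiences", []), "NP:"),
        "TOPIC_CODE": strip_prefix(subjects, "N2:"),
        "CO_IDS": strip_prefix(subjects, "R:"),  # "P:" PermIDs are skipped
        "NAMED_ITEM": strip_prefix(item.get("instancesOf", []), "NI:"),
    }


mrn_item = {  # hypothetical, trimmed-down item for illustration
    "altId": "nHKS1w6r3n",
    "headline": "The quick brown fox jumps over the lazy dog",
    "language": "en",
    "provider": "NS:HIIS",
    "audiences": ["NP:SHKI"],
    "subjects": ["N2:ASIA", "N2:HK", "R:9999.HK", "P:0000000000"],
}
fields = to_n2_ubms_fields(mrn_item)
print(fields["CO_IDS"], fields["TOPIC_CODE"])  # ['9999.HK'] ['ASIA', 'HK']
```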

Handling News

When a newsworthy event occurs, the first part of a story can be an alert: a short, upper-case headline that contains the facts and essential detail. A newsbreak is then created a few minutes after any alerts. Newsbreaks comprise a headline and several paragraphs of body text and are usually filed in a single take. In some cases, however, further takes are necessary to add text or codes.

In N2_UBMS, the message classification is defined in FID 3.

In MRN, the classification is defined in the messageType field, an integer field that uses the same enum values as FID 3 of N2_UBMS.

All parts of a story should carry the same Primary News Access Code in N2_UBMS, or the same altId in MRN.

The most common way to handle news stories is to overwrite older messages sharing an altId with the most recently published message, discarding alerts, first takes, and previous updates.
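The overwrite strategy can be sketched as a dictionary keyed by altId. This is an illustrative policy, not a prescribed one. It assumes decoded MRN items arrive as dicts, that a DELETED message (messageType 7 in the message type table) should remove the story immediately, and that versionCreated values share one ISO 8601 UTC format such as "2017-07-07T10:11:12.000Z", so string comparison matches chronological order. The class name and sample altId are hypothetical.

```python
from typing import Dict

DELETED = 7  # messageType value for DELETED, same enum as N2_UBMS FID 3


class StoryStore:
    """Keep only the most recent message for each story, keyed by altId."""

    def __init__(self) -> None:
        self.stories: Dict[str, dict] = {}

    def apply(self, item: dict) -> None:
        alt_id = item["altId"]
        if item.get("messageType") == DELETED:
            self.stories.pop(alt_id, None)  # story must be removed immediately
            return
        current = self.stories.get(alt_id)
        # Assumption: versionCreated strings share one ISO 8601 UTC format,
        # so comparing them as strings matches chronological order.
        if current is None or item["versionCreated"] >= current["versionCreated"]:
            self.stories[alt_id] = item  # replaces the alert, first take, or update


store = StoryStore()
store.apply({"altId": "nExample01", "messageType": 1, "headline": "ALERT TEXT",
             "versionCreated": "2017-07-07T10:11:12.000Z"})
store.apply({"altId": "nExample01", "messageType": 2, "headline": "Newsbreak text",
             "versionCreated": "2017-07-07T10:15:00.000Z"})
print(store.stories["nExample01"]["headline"])  # Newsbreak text
```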

Correction

For a unique story, all story messages will share the same altId.

However, if there is a substantial error in an alert, the story will be filed with a new PNAC and CORRECTED- is inserted at the beginning of the alert headline.

But if there is a substantial error in a newsbreak, the amended story is usually filed with the same PNAC.