REST API Tutorial 3: On Demand data extraction workflow

Last update: Dec 2023

Environment: Any
Language: Any; HTTP is supported
Compilers: None
Prerequisites: DSS login, internet access
Source code: Below

Tutorial purpose

This tutorial is a prerequisite for the following tutorials.

It explains the workflow for a raw data extraction, using an On Demand extraction request.

It also gives some tips on request tuning for best performance.

 

Raw data extraction workflow outline

The raw data extraction workflow can apply to several types of On Demand historical data requests (which are covered in the following tutorials):

  • Tick data
  • Market depth
  • Intraday bars
  • End of Day data

Here are the basic steps:

  • Retrieve the available field list from the server (optional).
  • Request historical data from the server, using an On Demand request. The request is queued, then executed.
  • Check the request status. Poll it until the request is completed.
  • Retrieve the data.

 

Get available field list - HTTP request

This step is optional. If you do not know which content field names are available, you can request the list from the server.

The available field set depends on the type of data you want to request. The data type must therefore be specified in the request, which is done by setting a report template type in the URL.

List of possible values:

Data type        Report template type
Tick data        TickHistoryTimeAndSales
Market depth     TickHistoryMarketDepth
Intraday bars    TickHistoryIntradaySummaries
End of Day       ElektronTimeseries

As an example, here is the call to retrieve the available fields for tick data:

URL:

 

    	
            https://selectapi.datascope.refinitiv.com/RestApi/v1/Extractions/GetValidContentFieldTypes(ReportTemplateType=DataScope.Select.Api.Extractions.ReportTemplates.ReportTemplateTypes'TickHistoryTimeAndSales')
        
        
    

Method:          GET

Headers:

Note: for all requests we need a user token, set in the header. The token was retrieved in Tutorial 1.

    	
            

Prefer: respond-async

Content-Type: application/json

Authorization: Token F0ABE9A3FFF2E02E10AE2765ED872C59B8CC3B40EBB61B30E295E71DE31C254B8648DB9434C2DF9299FDC668AA123501F322D99D45C8B93438063C912BC936C7B87062B0CF812138863F5D836A7B31A32DCA67EF07B3B50B2FC4978DF6F76784FDF35FCB523A8430DA93613BC5730CDC310D4D241718F9FC3F2E55465A24957CC287BDEC79046B31AD642606275AEAD76318CB221BD843348E1483670DA13968D8A242AAFCF9E13E23240C905AE46DED9EDCA9BB316B4C5C767B18DB2EA7ADD100817ADF059D01394BC6375BECAF6138C25DBA57577F0061

The following tutorials show other possibilities.
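As an illustration, the URL and headers for this call could be assembled as follows. This is a minimal sketch: the helper names (field_list_url, json_headers) are hypothetical and not part of the API, and the token is whatever string was obtained in Tutorial 1.

```python
# Report template type for each data type, as listed in the table above.
REPORT_TEMPLATE_TYPES = {
    "Tick data": "TickHistoryTimeAndSales",
    "Market depth": "TickHistoryMarketDepth",
    "Intraday bars": "TickHistoryIntradaySummaries",
    "End of Day": "ElektronTimeseries",
}

BASE_URL = "https://selectapi.datascope.refinitiv.com/RestApi/v1"


def field_list_url(data_type: str) -> str:
    """Build the GetValidContentFieldTypes URL for one data type (hypothetical helper)."""
    template = REPORT_TEMPLATE_TYPES[data_type]
    return (
        f"{BASE_URL}/Extractions/GetValidContentFieldTypes("
        "ReportTemplateType=DataScope.Select.Api.Extractions."
        f"ReportTemplates.ReportTemplateTypes'{template}')"
    )


def json_headers(token: str) -> dict:
    """Headers used by the JSON requests in this tutorial (hypothetical helper)."""
    return {
        "Prefer": "respond-async",
        "Content-Type": "application/json",
        "Authorization": f"Token {token}",
    }
```

The resulting URL and headers can then be passed to any HTTP client with the GET method.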

 

Get available field list - HTTP response

If the token is valid, this is the response we get:

Status:                        200 OK

Relevant headers:

    	
            Content-Type: application/json; charset=utf-8
        
        
    

Body:

As an example, here is the beginning of the response, for tick data:

    	
            

{
  "@odata.context": "https://selectapi.datascope.refinitiv.com/RestApi/v1/$metadata#ContentFieldTypes",
  "value": [
    {
      "Code": "THT.Auction - Exchange Time",
      "Name": "Auction - Exchange Time",
      "Description": "Exchange supplied exchange time (Local or GMT depending on the exchange)",
      "FormatType": "Text",
      "FieldGroup": "Auction"
    },
    {
      "Code": "THT.Auction - Price",
      "Name": "Auction - Price",
      "Description": "Auction Price",
      "FormatType": "Number",
      "FieldGroup": "Auction"
    },

This goes on with all the other available fields. Here is the last part:

    	
            

    {
      "Code": "THT.Trade - Yield",
      "Name": "Trade - Yield",
      "Description": "An update to indicate Dividend Yield as adjusted by the last trade or closing price",
      "FormatType": "Number",
      "FieldGroup": "Trade"
    }
  ]
}

The resulting records contain for each field:

  • The field code.
  • The field name. This is used in the actual data request.
  • The field description.
  • The field type. It can be a number, text, or date.
  • The field group. The group classifies fields by record type (in the case of tick data: auctions, corrections, market conditions, quotes and trades). It is preferable not to mix record types in a request, because all the fields will be included in each row of results, resulting in many blank fields that burden the entire workflow.

Use this to choose all the field names you want. In the next step we will make a request for data, using some of these fields.
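Since it is preferable not to mix record types, one way to pick field names is to filter the returned list by field group. The sketch below assumes the response body has already been parsed from JSON; the sample holds only two records copied from the response above, and the function name is hypothetical.

```python
import json

# Two records copied from the sample response above (the real list is much longer).
sample_response = json.loads("""
{
  "value": [
    {"Code": "THT.Auction - Price", "Name": "Auction - Price",
     "Description": "Auction Price", "FormatType": "Number", "FieldGroup": "Auction"},
    {"Code": "THT.Trade - Yield", "Name": "Trade - Yield",
     "Description": "An update to indicate Dividend Yield as adjusted by the last trade or closing price",
     "FormatType": "Number", "FieldGroup": "Trade"}
  ]
}
""")


def field_names_in_group(response: dict, group: str) -> list:
    """Return the Name of every field whose FieldGroup equals `group`."""
    return [field["Name"] for field in response["value"] if field["FieldGroup"] == group]


# Names to use in a request that only contains trade records.
trade_fields = field_names_in_group(sample_response, "Trade")
```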


Request data - HTTP request

This is an On Demand extraction request, which means it will be immediately queued, then executed as soon as possible.

URL:

The raw extraction URL is the same for all data types.

    	
            https://selectapi.datascope.refinitiv.com/RestApi/v1/Extractions/ExtractRaw
        
        
    

Method:          POST

Headers:

The raw extraction headers are the same for all data types.

To avoid keeping the connection open and the caller code waiting, we set the preference to an asynchronous response.

For all requests we need to set the user token in the authorization header; the token was retrieved in Tutorial 1.

    	
            

Prefer: respond-async

Content-Type: application/json

Authorization: Token F0ABE9A3FFF2E02E10AE2765ED872C59B8CC3B40EBB61B30E295E71DE31C254B8648DB9434C2DF9299FDC668AA123501F322D99D45C8B93438063C912BC936C7B87062B0CF812138863F5D836A7B31A32DCA67EF07B3B50B2FC4978DF6F76784FDF35FCB523A8430DA93613BC5730CDC310D4D241718F9FC3F2E55465A24957CC287BDEC79046B31AD642606275AEAD76318CB221BD843348E1483670DA13968D8A242AAFCF9E13E23240C905AE46DED9EDCA9BB316B4C5C767B18DB2EA7ADD100817ADF059D01394BC6375BECAF6138C25DBA57577F0061

Body:

The body contents vary depending on the type of data. The body declares that this is an extraction request, and contains several parts:

  • The data type you want to request. It must be specified in the body of the request. List of possible values:
    Data type        Extraction request type
    Tick data        TickHistoryTimeAndSalesExtractionRequest
    Market depth     TickHistoryMarketDepthExtractionRequest
    Intraday bars    TickHistoryIntradaySummariesExtractionRequest
    End of Day       ElektronTimeseriesExtractionRequest
  • The list of field names. This also depends on the data type, and was determined in the previous step.
  • The list of instrument identifiers, each one with its type (they can be mixed).
  • The conditions: they also depend on the data type. A typical condition is the date range for the request.
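The data type is carried in the "@odata.type" value of the request body. The lookup below is a sketch only: the function name is hypothetical, and it assumes that all four request types share the namespace prefix shown in the tick data example that follows.

```python
# Extraction request type for each data type, as listed above.
EXTRACTION_REQUEST_TYPES = {
    "Tick data": "TickHistoryTimeAndSalesExtractionRequest",
    "Market depth": "TickHistoryMarketDepthExtractionRequest",
    "Intraday bars": "TickHistoryIntradaySummariesExtractionRequest",
    "End of Day": "ElektronTimeseriesExtractionRequest",
}

# Assumption: every request type uses the same namespace as the tick data example.
ODATA_PREFIX = "#DataScope.Select.Api.Extractions.ExtractionRequests."


def extraction_request_type(data_type: str) -> str:
    """Return the "@odata.type" value for a data type (hypothetical helper)."""
    return ODATA_PREFIX + EXTRACTION_REQUEST_TYPES[data_type]
```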

As an example, here is the body of a request for tick data:

    	
            

{
  "ExtractionRequest": {
    "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest",
    "ContentFieldNames": [
      "Trade - Price",
      "Trade - Volume",
      "Trade - Exchange Time"
    ],
    "IdentifierList": {
      "@odata.type": "#DataScope.Select.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
      "InstrumentIdentifiers": [
        { "Identifier": "CARR.PA", "IdentifierType": "Ric" }
      ]
    },
    "Condition": {
      "MessageTimeStampIn": "GmtUtc",
      "ApplyCorrectionsAndCancellations": false,
      "ReportDateRangeType": "Range",
      "QueryStartDate": "2016-09-29T00:00:00.000Z",
      "QueryEndDate": "2016-09-29T12:00:00.000Z",
      "DisplaySourceRIC": true
    }
  }
}
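The same body can be built programmatically, which helps when the instrument list or date range changes between requests. This is a sketch under assumptions: tick_request_body is a hypothetical helper, and the condition values are simply the ones from the example above.

```python
import json

ODATA_PREFIX = "#DataScope.Select.Api.Extractions.ExtractionRequests."


def tick_request_body(rics, fields, start, end) -> dict:
    """Build the ExtractRaw body for a tick data request (hypothetical helper)."""
    return {
        "ExtractionRequest": {
            "@odata.type": ODATA_PREFIX + "TickHistoryTimeAndSalesExtractionRequest",
            "ContentFieldNames": list(fields),
            "IdentifierList": {
                "@odata.type": ODATA_PREFIX + "InstrumentIdentifierList",
                "InstrumentIdentifiers": [
                    {"Identifier": ric, "IdentifierType": "Ric"} for ric in rics
                ],
            },
            # Condition values copied from the example body above.
            "Condition": {
                "MessageTimeStampIn": "GmtUtc",
                "ApplyCorrectionsAndCancellations": False,
                "ReportDateRangeType": "Range",
                "QueryStartDate": start,
                "QueryEndDate": end,
                "DisplaySourceRIC": True,
            },
        }
    }


body = tick_request_body(
    rics=["CARR.PA"],
    fields=["Trade - Price", "Trade - Volume", "Trade - Exchange Time"],
    start="2016-09-29T00:00:00.000Z",
    end="2016-09-29T12:00:00.000Z",
)
payload = json.dumps(body)  # JSON text to send as the POST body
```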

 

Request data - HTTP response

On Demand extraction requests are executed as soon as possible. There is no guarantee on the delivery time: it depends on the amount of requested data and on the server load.

In the request we set a preference for an asynchronous response. We will get a response in 30 seconds (default wait time) or less.

The HTTP status of the response can have one of several values. Here we detail the most likely ones:

  • 202 Accepted - this is the one we are most likely to receive. It means the request was accepted, but processing has not yet completed. The next step is to check the request status by polling it regularly until it returns a 200 OK.
  • 200 OK - this one may happen if the request was very small. It means the request processing has completed. We can skip the step where we check the request status, and proceed directly to the last step, which is to retrieve the data.
  • Other codes: follow this link for a full list with detailed explanations.

It is strongly recommended that you ensure your code handles all possible status codes.
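One simple way to make sure no status code is silently ignored is to route every response through a single decision function. The sketch below covers only the two codes detailed above and treats everything else as an error; the function name and the returned labels are hypothetical.

```python
def next_step(status_code: int) -> str:
    """Map the HTTP status of the ExtractRaw response to the next workflow step."""
    if status_code == 202:
        return "poll"      # accepted, still processing: poll the Location URL
    if status_code == 200:
        return "retrieve"  # completed: read JobId and Notes, then fetch the data
    # Any other code: never assume success. Real code would handle each case.
    raise RuntimeError(f"Unexpected HTTP status {status_code}")
```

Real code would replace the generic error with specific handling for the other codes in the full list.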

When requests take more than 30 seconds, a 202 Accepted is returned as the first response. Usually Tick History requests will take more than 30 seconds, which means that 202 Accepted will be the normal first response.

You can customize the wait time, but this is not recommended.

Let us now look at the two most common responses in detail.

 

Request data - 202 Accepted HTTP response

The request was accepted, but processing has not yet completed. This response is the most likely, especially if the request is for a large amount of data.

Status:                        202 Accepted

Relevant headers:

    	
            Location: https://selectapi.datascope.refinitiv.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='0x0785d7e9572c76b1')
        
        
    

The Location URL must be saved, as we will use it in the next step to check the request status. Note: the last part of the URL (0x0785d7e9572c76b1) is the job ID for this request.

Body:                          Response does not contain any data
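If the job ID is needed separately, for logging for instance, it can be extracted from the Location header. A minimal sketch, assuming the URL always has the ExtractionId='...' form shown above (the function name is hypothetical):

```python
import re


def job_id_from_location(location: str) -> str:
    """Extract the job ID from a Location URL of the form shown above."""
    match = re.search(r"ExtractionId='([^']+)'", location)
    if match is None:
        raise ValueError(f"No ExtractionId found in: {location}")
    return match.group(1)


location = (
    "https://selectapi.datascope.refinitiv.com/RestApi/v1/Extractions/"
    "ExtractRawResult(ExtractionId='0x0785d7e9572c76b1')"
)
job_id = job_id_from_location(location)
```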

 

Request data - 200 OK HTTP response

Instead of a 202 Accepted, we could receive a 200 OK. This means the request has completed.

Status:                        200 OK

Relevant headers:

    	
            Content-Type: application/json; charset=utf-8
        
        
    

Body:

    	
            

{

  "@odata.context": "https://selectapi.datascope.refinitiv.com/RestApi/v1/$metadata#RawExtractionResults/$entity",

  "JobId": "0x0785d7e9572c76b1",

  "Notes": [

    "Extraction Services Version 14.5.42294 (737b0965c07f), Built Apr  8 2021 13:43:46\nUser ID: 9008895\nExtraction ID: 2000000249659457\nSchedule: 0x0785d7e9572c76b1 (ID = 0x0000000000000000)\nInput List (1 items):  (ID = 0x0785d7e9572c76b1) Created: 04/19/2021 10:23:30 Last Modified: 04/19/2021 10:23:30\nReport Template (3 fields): _OnD_0x0785d7e9572c76b1 (ID = 0x0785d7e9574c76b1) Created: 04/19/2021 10:21:53 Last Modified: 04/19/2021 10:21:53\nSchedule dispatched via message queue (0x0785d7e9572c76b1), Data source identifier (ADF5F7B662B34B91A118EEF071688A29)\nSchedule Time: 04/19/2021 10:21:55\nProcessing started at 04/19/2021 10:21:55\nProcessing completed successfully at 04/19/2021 10:23:30\nExtraction finished at 04/19/2021 10:23:30 UTC, with servers: tm02n02\nInstrument <RIC,CARR.PA> expanded to 1 RIC: CARR.PA.\nQuota Message: INFO: Tick History Cash Quota Count Before Extraction: 998; Instruments Extracted: 1; Tick History Cash Quota Count After Extraction: 998, 99.8% of Limit; Tick History Cash Quota Limit: 1000\nManifest: #RIC,Domain,Start,End,Status,Count\nManifest: CARR.PA,Market Price,2016-09-29T07:00:11.672415651Z,2016-09-29T11:59:46.552806988Z,Active,3620\n"

  ]

}

The JobId value must be saved, as we will use it to retrieve the data.

The Notes contain important information on the request: IDs, timestamps, any errors, and the extraction quota status. If the request completed successfully, they also contain the message: Processing completed successfully. It is recommended to store the Notes and to analyze the text to detect issues, warnings or errors.

As the request status has been returned directly (because it was a very quick extraction), the next step (check request status) is not required.

We skip it to go directly to retrieve the data, using the returned JobId.
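The analysis of the Notes recommended above can start with a few text searches. The sketch below is an assumption-laden example: summarize_notes is a hypothetical helper, the regular expressions are derived only from the sample Notes shown above, and the sample text is an excerpt of those Notes.

```python
import re

# Excerpt of the Notes from the response above.
sample_notes = [
    "Processing started at 04/19/2021 10:21:55\n"
    "Processing completed successfully at 04/19/2021 10:23:30\n"
    "Quota Message: INFO: Tick History Cash Quota Count Before Extraction: 998; "
    "Instruments Extracted: 1; Tick History Cash Quota Count After Extraction: 998, "
    "99.8% of Limit; Tick History Cash Quota Limit: 1000\n"
]


def summarize_notes(notes: list) -> dict:
    """Report the success message and quota figures found in the Notes text."""
    text = "\n".join(notes)
    used = re.search(r"Quota Count After Extraction: (\d+)", text)
    limit = re.search(r"Quota Limit: (\d+)", text)
    return {
        "completed": "Processing completed successfully" in text,
        "quota_used": int(used.group(1)) if used else None,
        "quota_limit": int(limit.group(1)) if limit else None,
    }


summary = summarize_notes(sample_notes)
```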

 

Check request status - HTTP request

Skip this step if the previous step returned an HTTP status of 200 OK.

If the previous step returned an HTTP status of 202 Accepted, this step must be executed, and repeated in a polling loop until it returns an HTTP status of 200 OK.

URL:

This is the Location URL, taken from the 202 response header received in the previous step.

    	
            https://selectapi.datascope.refinitiv.com/RestApi/v1/Extractions/ExtractRawResult(ExtractionId='0x0785d7e9572c76b1')
        
        
    

Method:          GET

Headers:

This is the same as for the other steps:

    	
            

Prefer: respond-async

Content-Type: application/json

Authorization: Token F0ABE9A3FFF2E02E10AE2765ED872C59B8CC3B40EBB61B30E295E71DE31C254B8648DB9434C2DF9299FDC668AA123501F322D99D45C8B93438063C912BC936C7B87062B0CF812138863F5D836A7B31A32DCA67EF07B3B50B2FC4978DF6F76784FDF35FCB523A8430DA93613BC5730CDC310D4D241718F9FC3F2E55465A24957CC287BDEC79046B31AD642606275AEAD76318CB221BD843348E1483670DA13968D8A242AAFCF9E13E23240C905AE46DED9EDCA9BB316B4C5C767B18DB2EA7ADD100817ADF059D01394BC6375BECAF6138C25DBA57577F0061

 

Check request status - HTTP response

If you receive an HTTP status 202 Accepted response (the same as in the previous step), it means the request has not yet completed. You must wait a bit and check the request status again.

If you receive an HTTP status 200 OK response, the request has completed:

Status:                        200 OK

Relevant headers:

    	
            Content-Type: application/json; charset=utf-8
        
        
    

Body:

    	
            

{

  "@odata.context": "https://selectapi.datascope.refinitiv.com/RestApi/v1/$metadata#RawExtractionResults/$entity",

  "JobId": "0x0785d7e9572c76b1",

  "Notes": [

    "Extraction Services Version 14.5.42294 (737b0965c07f), Built Apr  8 2021 13:43:46\nUser ID: 9008895\nExtraction ID: 2000000249659457\nSchedule: 0x0785d7e9572c76b1 (ID = 0x0000000000000000)\nInput List (1 items):  (ID = 0x0785d7e9572c76b1) Created: 04/19/2021 10:23:30 Last Modified: 04/19/2021 10:23:30\nReport Template (3 fields): _OnD_0x0785d7e9572c76b1 (ID = 0x0785d7e9574c76b1) Created: 04/19/2021 10:21:53 Last Modified: 04/19/2021 10:21:53\nSchedule dispatched via message queue (0x0785d7e9572c76b1), Data source identifier (ADF5F7B662B34B91A118EEF071688A29)\nSchedule Time: 04/19/2021 10:21:55\nProcessing started at 04/19/2021 10:21:55\nProcessing completed successfully at 04/19/2021 10:23:30\nExtraction finished at 04/19/2021 10:23:30 UTC, with servers: tm02n02\nInstrument <RIC,CARR.PA> expanded to 1 RIC: CARR.PA.\nQuota Message: INFO: Tick History Cash Quota Count Before Extraction: 998; Instruments Extracted: 1; Tick History Cash Quota Count After Extraction: 998, 99.8% of Limit; Tick History Cash Quota Limit: 1000\nManifest: #RIC,Domain,Start,End,Status,Count\nManifest: CARR.PA,Market Price,2016-09-29T07:00:11.672415651Z,2016-09-29T11:59:46.552806988Z,Active,3620\n"

  ]

}

The JobId value must be saved, as we will use it in the next step.

The Notes contain information on the request: IDs, timestamps, any errors, and the extraction quota status. If the request completed successfully, they also contain the message: Processing completed successfully.

We can now retrieve the data, using the returned JobId.

Note: this 200 response is identical in format and content to the 200 OK response that could also have been returned directly when we requested the extraction.
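The polling described in this step can be sketched as a small loop. The code is independent of any HTTP library: check_status is a caller-supplied function (hypothetical) that sends the GET request to the Location URL and returns the status code and parsed body. The 30 second interval and the attempt limit are assumptions, not values prescribed by the API.

```python
import time


def poll_until_done(check_status, interval_seconds=30.0, max_attempts=120, sleep=time.sleep):
    """Call check_status() until it reports 200 OK, then return the body.

    check_status must return a (status_code, body) tuple.
    """
    for _ in range(max_attempts):
        status, body = check_status()
        if status == 200:
            return body  # completed: the body holds JobId and Notes
        if status != 202:
            raise RuntimeError(f"Unexpected HTTP status {status}")
        sleep(interval_seconds)  # still processing: wait before asking again
    raise TimeoutError("Extraction did not complete within the allowed attempts")
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.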


Retrieve data - HTTP request

It is mandatory to have received a 200 OK response with a JobId from a previous step before proceeding with this last step.

URL:

Note the JobId value used as parameter in the URL:

    	
            https://selectapi.datascope.refinitiv.com/RestApi/v1/Extractions/RawExtractionResults('0x0785d7e9572c76b1')/$value
        
        
    

Method:          GET

Headers:

The headers are the same as for the other steps, except that Content-Type is replaced by Accept-Encoding. The server always sends the data in compressed format (gzip CSV). Including “Accept-Encoding: gzip” for raw stream downloads allows the client (browser or SDK) to decompress the data automatically.

Prefer: respond-async

Accept-Encoding: gzip, deflate

Authorization: Token F0ABE9A3FFF2E02E10AE2765ED872C59B8CC3B40EBB61B30E295E71DE31C254B8648DB9434C2DF9299FDC668AA123501F322D99D45C8B93438063C912BC936C7B87062B0CF812138863F5D836A7B31A32DCA67EF07B3B50B2FC4978DF6F76784FDF35FCB523A8430DA93613BC5730CDC310D4D241718F9FC3F2E55465A24957CC287BDEC79046B31AD642606275AEAD76318CB221BD843348E1483670DA13968D8A242AAFCF9E13E23240C905AE46DED9EDCA9BB316B4C5C767B18DB2EA7ADD100817ADF059D01394BC6375BECAF6138C25DBA57577F0061
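For illustration, the retrieval URL and headers could be derived from the saved JobId as follows. Both helper names are hypothetical, and the token is the one obtained in Tutorial 1.

```python
BASE_URL = "https://selectapi.datascope.refinitiv.com/RestApi/v1"


def raw_results_url(job_id: str) -> str:
    """Build the data retrieval URL for a completed job (hypothetical helper)."""
    return f"{BASE_URL}/Extractions/RawExtractionResults('{job_id}')/$value"


def download_headers(token: str) -> dict:
    """Headers for the data retrieval request (hypothetical helper)."""
    return {
        "Prefer": "respond-async",
        "Accept-Encoding": "gzip, deflate",
        "Authorization": f"Token {token}",
    }
```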

 

Retrieve data - HTTP response

We should get a response of this type:

Status:                        200 OK

Relevant headers:

    	
            

Content-Encoding: gzip

Content-Type: text/plain

Body:

The content is compressed plain text in CSV format. Depending on the nature of the data, the time range and number of instruments, the response can be quite long and contain tens of thousands of lines.

As an example, here is the beginning of the response content for a tick data request:

    	
            

#RIC,Domain,Date-Time,Type,Price,Volume,Exch Time
CARR.PA,Market Price,2016-09-29T07:00:11.672415651Z,Trade,23.25,63,07:00:11.000000000
CARR.PA,Market Price,2016-09-29T07:00:11.672415651Z,Trade,23.25,64,07:00:11.000000000
CARR.PA,Market Price,2016-09-29T07:00:11.672415651Z,Trade,23.25,27,07:00:11.000000000
CARR.PA,Market Price,2016-09-29T07:00:11.672415651Z,Trade,23.25,2115,07:00:11.000000000
CARR.PA,Market Price,2016-09-29T07:00:11.672415651Z,Trade,23.25,21,07:00:11.000000000
CARR.PA,Market Price,2016-09-29T07:00:11.672674987Z,Trade,23.25,21,07:00:11.000000000
CARR.PA,Market Price,2016-09-29T07:00:11.672674987Z,Trade,23.25,11,07:00:11.000000000
CARR.PA,Market Price,2016-09-29T07:00:11.672674987Z,Trade,23.25,61,07:00:11.000000000
CARR.PA,Market Price,2016-09-29T07:00:11.672674987Z,Trade,23.25,235,07:00:11.000000000
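If the HTTP client does not decompress the response automatically, the raw bytes can be decompressed and parsed with the standard library. The sketch below simulates the download by compressing the first two data rows shown above; the function name is hypothetical, and UTF-8 encoding is an assumption.

```python
import csv
import gzip
import io


def parse_gzip_csv(raw: bytes) -> list:
    """Decompress gzip bytes and return the CSV rows as dictionaries."""
    text = gzip.decompress(raw).decode("utf-8")  # assumption: UTF-8 text
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for the downloaded bytes: the header and first two rows shown above.
sample_csv = (
    "#RIC,Domain,Date-Time,Type,Price,Volume,Exch Time\n"
    "CARR.PA,Market Price,2016-09-29T07:00:11.672415651Z,Trade,23.25,63,07:00:11.000000000\n"
    "CARR.PA,Market Price,2016-09-29T07:00:11.672415651Z,Trade,23.25,64,07:00:11.000000000\n"
)
rows = parse_gzip_csv(gzip.compress(sample_csv.encode("utf-8")))
total_volume = sum(int(row["Volume"]) for row in rows)
```

For very large result sets it may be preferable to stream the response to disk and decompress it incrementally, rather than hold it all in memory as this example does.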

 

Request tuning and best practices

Requests for raw data, tick data and market depth data can generate very large result sets.

To optimize the retrieval times, see the Request tuning and best practices document under the Documentation tab.