Refinitiv Data Platform APIs
Introduction To Filings - Python
Introducing Filings API Service on Refinitiv Data Platform
A new Filings API service is available on Refinitiv Data Platform (RDP), providing access to Global and EDGAR filing data: over 40 million documents from 135,000 companies worldwide, spanning more than 50 years of history dating back to 1968. Automated document feeds and newswires deliver timely and comprehensive collections for the USA, Canada, Japan, Norway, Italy, Australia, Singapore, India, China and Korea.
The Filings service consists of search and retrieval of public corporate disclosures. In this article, we review how to search for specific filings documents and how to download them from the API.
Filings Search Using GraphQL
Filings documents can be searched through our GraphQL endpoint. GraphQL is a query language for APIs that lets a client request and receive exactly the data it needs. Capabilities used to search for filings documents include:
- Filtering
- Sorting
- Limit
- Pagination
- Keyword Search
To learn more about GraphQL, please visit https://graphql.org/.
Python Environment
For the purpose of this demonstration, we use JupyterLab with Python 3.8. We discuss the code that is available for download from https://github.com
Valid Credentials - Replace in Code or Read From File
Valid RDP credentials are required to interact with an RDP service:
- USERNAME
- PASSWORD
- CLIENTID
USERNAME = "VALIDUSER"
PASSWORD = "VALIDPASSWORD"
CLIENT_ID = "SELFGENERATEDCLIENTID"
def readCredsFromFile(filePathName):
    """Read valid credentials from a file, one value per line:
    RDP machine ID, long password, generated client ID."""
    global USERNAME, PASSWORD, CLIENT_ID
    with open(filePathName, "r") as credFile:
        USERNAME = credFile.readline().rstrip('\n')
        PASSWORD = credFile.readline().rstrip('\n')
        CLIENT_ID = credFile.readline().rstrip('\n')

readCredsFromFile("..\\creds\\credFileHuman.txt")
# Uncomment - to make sure that creds are either set in code or read in correctly
#print("USERNAME="+str(USERNAME))
#print("PASSWORD="+str(PASSWORD))
#print("CLIENT_ID="+str(CLIENT_ID))
We include two ways to supply the valid credentials.
- One is to replace the placeholders in code ("VALIDUSER", ...) with valid personal credential values. To enable this, comment out the call that reads the credentials from a file:
#readCredsFromFile("..\\creds\\credFileHuman.txt")
- The other way is to store a set of valid RDP credentials in file "credFileHuman.txt" under path "../creds" and have the code read the credentials from that file.
The file is expected to be in a simple format, one value per line:
VALIDUSER
VALIDPASSWORD
SELFGENERATEDCLIENTID
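As a sketch of a slightly more defensive reader (the read_creds helper below is hypothetical, not part of the article's code), validating that the file holds exactly the three expected values:

```python
from pathlib import Path

def read_creds(path):
    """Read USERNAME, PASSWORD and CLIENT_ID from a credentials file
    holding one value per line, skipping blank lines and stray whitespace."""
    lines = [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]
    if len(lines) != 3:
        raise ValueError("expected 3 lines (user, password, client id), found %d" % len(lines))
    return tuple(lines)  # (username, password, client_id)
```

Failing fast on a malformed file is preferable to silently authenticating with empty strings.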
Define Token Handling and Obtain a Valid Token
Having a valid token is a prerequisite to requesting any RDP content; the token is passed into the next steps. For additional information on authorization and tokens, refer to the RDP tutorial Authorization - All about tokens.
The implementation steps that come next may look familiar: with some variation, they come up repeatedly in any RDP service interaction.
import requests, json, time

# Endpoint components; these values follow the RDP authorization tutorials
RDP_BASE_URL = "https://api.refinitiv.com"
CATEGORY_URL = "/auth/oauth2"
RDP_AUTH_VERSION = "/v1"
ENDPOINT_URL = "/token"
SCOPE = "trapi"
CLIENT_SECRET = ""        # empty for password-grant public clients
TOKEN_FILE = "token.txt"  # local cache for the token between runs

TOKEN_ENDPOINT = RDP_BASE_URL + CATEGORY_URL + RDP_AUTH_VERSION + ENDPOINT_URL
def _requestNewToken(refreshToken):
    if refreshToken is None:
        tData = {
            "username": USERNAME,
            "password": PASSWORD,
            "grant_type": "password",
            "scope": SCOPE,
            "takeExclusiveSignOnControl": "true"
        }
    else:
        tData = {
            "refresh_token": refreshToken,
            "grant_type": "refresh_token",
        }
    # Make a REST call to get the latest access token
    response = requests.post(
        TOKEN_ENDPOINT,
        headers = {"Accept": "application/json"},
        data = tData,
        auth = (CLIENT_ID, CLIENT_SECRET)
    )
    if response.status_code != 200:
        raise Exception("Failed to get access token {0} - {1}".format(
            response.status_code, response.text))
    # Return the new token
    return json.loads(response.text)
def saveToken(tknObject):
    print("Saving the new token")
    # Append the expiry time to the token, with a 10-second safety margin
    tknObject["expiry_tm"] = time.time() + int(tknObject["expires_in"]) - 10
    # Store it in the file
    with open(TOKEN_FILE, "w+") as tf:
        json.dump(tknObject, tf, indent=4)
def getToken():
    try:
        print("Reading the token from: " + TOKEN_FILE)
        # Read the token from a file
        with open(TOKEN_FILE, "r+") as tf:
            tknObject = json.load(tf)
        # Is the access token still valid?
        if tknObject["expiry_tm"] > time.time():
            # Return the cached access token
            return tknObject["access_token"]
        print("Token expired, refreshing a new one...")
        # Get a new token from the refresh token
        tknObject = _requestNewToken(tknObject["refresh_token"])
    except Exception as exp:
        print("Caught exception: " + str(exp))
        print("Getting a new token using Password Grant...")
        tknObject = _requestNewToken(None)
    # Persist this token for future queries
    saveToken(tknObject)
    # Return the access token
    return tknObject["access_token"]

accessToken = getToken()
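The expiry bookkeeping above (a 10-second margin subtracted when the token is saved, and the cached token reused only while expiry_tm lies in the future) can be sketched in isolation; expiry_time and is_valid are illustrative helpers, not part of the article's code:

```python
import time

EXPIRY_MARGIN_SEC = 10  # refresh slightly early, as saveToken does

def expiry_time(expires_in, now=None):
    """Absolute expiry timestamp for a token, padded by a safety margin."""
    now = time.time() if now is None else now
    return now + int(expires_in) - EXPIRY_MARGIN_SEC

def is_valid(expiry_tm, now=None):
    """True while the cached token may still be used."""
    now = time.time() if now is None else now
    return expiry_tm > now

tm = expiry_time(600, now=1000.0)   # token valid 600 s, issued at t=1000
print(is_valid(tm, now=1500.0))     # True  (well inside the window)
print(is_valid(tm, now=1595.0))     # False (inside the 10 s margin)
```

The margin ensures a token is never presented in the last moments before the server rejects it.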
Define Filings Helper Function requestSearch
RDP_FILINGS_VERSION = '/v1'  # service version prefix used at the time of writing
FILINGS_ENDPOINT = RDP_BASE_URL + '/data-store' + RDP_FILINGS_VERSION + '/graphql'
def requestSearch(token, payloadSearch):
    print("requestSearch...")
    headers = {
        'Content-Type': "application/json",
        'Authorization': "Bearer " + token,
        'cache-control': "no-cache"
    }
    response = requests.post(FILINGS_ENDPOINT, json={'query': payloadSearch}, headers=headers)
    print("Response status code =" + str(response.status_code))
    if response.status_code == 401:  # error when the token has expired
        token = getToken()           # refresh the token and retry once
        headers["Authorization"] = "Bearer " + token
        response = requests.post(FILINGS_ENDPOINT, json={'query': payloadSearch}, headers=headers)
    print('Raw response=')
    print(response)
    if response.status_code == 200:
        return json.loads(response.text)
    return ''
Search Filings by File Type
The following example searches for all 10-Qs on February 12, 2021.
payloadIn = """
{
FinancialFiling(filter: {AND: [
{FilingDocument: {DocumentSummary: {FormType: {EQ: "10-Q"}}}},
{FilingDocument: {DocumentSummary: {FilingDate: {BETWN: {FROM: "2021-02-12T00:00:00Z", TO: "2021-02-12T23:59:59Z"}}}}}]},
sort: {FilingDocument: {DocumentSummary: {FilingDate: DESC}}},
limit: 25 ) {
_metadata {
totalCount
cursor
}
FilingOrganization {
Names {
Name {
OrganizationName (filter: {OrganizationNameTypeCode: {EQ: "LNG"}}){
OrganizationName
}
}
}
}
FilingDocument {
Identifiers {
OrganizationId
Dcn
}
DocId
FinancialFilingId
DocumentSummary {
DocumentTitle
FeedName
FormType
HighLevelCategory
MidLevelCategory
FilingDate
SecAccessionNumber
SizeInBytes
}
FilesMetaData {
FileName
MimeType
}
}
}
}
"""
jsonFullResp = requestSearch(accessToken, payloadIn)
print('Parsed json response=')
print(json.dumps(jsonFullResp, indent=2))
docId = jsonFullResp["data"]["FinancialFiling"][0]["FilingDocument"]["DocId"]
print('DocId is', str(docId))
cursor = jsonFullResp["data"]["FinancialFiling"][0]["_metadata"]["cursor"]
print('cursor is', str(cursor))
Once we have identified the required DocId (or DocIds) and cursor, this information is used by the next steps to request the required filings documents.
Pagination
If a limit is not specified in the query, the maximum number of records returned is 200. The API uses cursor-based pagination: each response carries a cursor, a pointer to a specific record in the dataset, and each cursor is unique to that record. The cursor of the last record returned is used to paginate.
To view the next 25 results in the previous example, set the cursor to the value from the last data point in the response; this retrieves records 26-50.
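The cursor loop generalizes to any page size. A minimal sketch, in which the fetch callable is a stand-in: against the Filings API it would wrap requestSearch, splicing the previous response's _metadata.cursor into the `cursor:` argument of the query:

```python
def paginate(fetch):
    """Collect all records from a cursor-paginated source.
    `fetch(cursor)` returns (records, next_cursor); next_cursor is None
    once the dataset is exhausted."""
    records, cursor = [], None
    while True:
        page, cursor = fetch(cursor)
        records.extend(page)
        if cursor is None or not page:
            break
    return records

# Stub standing in for requestSearch: 60 fake records served 25 at a time.
def fake_fetch(cursor, data=list(range(60)), size=25):
    start = 0 if cursor is None else cursor
    page = data[start:start + size]
    next_cursor = start + size if start + size < len(data) else None
    return page, next_cursor

print(len(paginate(fake_fetch)))  # 60
```

Stopping on an empty page as well as on a missing cursor guards against a server that keeps returning the final cursor.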
payloadIn1 = """
{
FinancialFiling(filter: {AND: [
{FilingDocument: {DocumentSummary: {FormType: {EQ: "10-Q"}}}},
{FilingDocument: {DocumentSummary: {FilingDate: {BETWN: {FROM: "2021-02-12T00:00:00Z", TO: "2021-02-12T23:59:59Z"}}}}}]},
sort: {FilingDocument: {DocumentSummary: {FilingDate: DESC}}},
limit: 25
cursor: """
payloadIn2 = """
) {
_metadata {
totalCount
cursor
}
FilingOrganization {
Names {
Name {
OrganizationName (filter: {OrganizationNameTypeCode: {EQ: "LNG"}}){
OrganizationName
}
}
}
}
FilingDocument {
Identifiers {
OrganizationId
Dcn
}
DocId
FinancialFilingId
DocumentSummary {
DocumentTitle
FeedName
FormType
HighLevelCategory
MidLevelCategory
FilingDate
SecAccessionNumber
SizeInBytes
}
FilesMetaData {
FileName
MimeType
}
}
}
}
"""
print("Request=" + payloadIn1 + "\"" + str(cursor) + "\"" + payloadIn2)
jsonFullResp = requestSearch(accessToken, payloadIn1 + "\"" + str(cursor) + "\"" + payloadIn2)
print('Parsed json response=')
print(json.dumps(jsonFullResp, indent=2))
Search by OrganizationId
The next example searches for all filings documents for Tesla in 2021.
payloadIn = """
{
FinancialFiling(filter: {AND: [{FilingDocument: {Identifiers: {OrganizationId: {EQ: "4297089638"}}}},
{FilingDocument: {DocumentSummary: {FilingDate: {BETWN: {FROM: "2021-01-01T00:00:00Z", TO: "2021-12-31T23:59:59Z"}}}}}]},
sort: {FilingDocument: {DocumentSummary: {FilingDate: DESC}}},
limit: 10) {
_metadata {
totalCount
}
FilingOrganization {
Names {
Name {
OrganizationName (filter: {OrganizationNameTypeCode: {EQ: "LNG"}}){
OrganizationName
}
}
}
}
FilingDocument {
Identifiers {
OrganizationId
Dcn
}
DocId
FinancialFilingId
DocumentSummary {
DocumentTitle
FeedName
FormType
HighLevelCategory
MidLevelCategory
FilingDate
SecAccessionNumber
SizeInBytes
}
FilesMetaData {
FileName
MimeType
}
}
}
}
"""
jsonFullResp = requestSearch(accessToken, payloadIn)
print('Parsed json response=')
print(json.dumps(jsonFullResp, indent=2))
Keyword Search by Document Text
Another available feature is keyword search against document or section text.
payloadIn = """
{
FinancialFiling(
sort: {FilingDocument: {DocumentSummary: {FilingDate: DESC}}},
filter: {FilingDocument: {DocumentSummary: {FilingDate: {BETWN: {FROM: "2020-07-01T00:00:00Z", TO: "2020-08-01T00:00:00Z"}}}}},
keywords: {searchstring: "FinancialFiling.FilingDocument.DocumentText:COVID-19"},
limit: 5) {
_metadata {
totalCount
}
FilingOrganization {
Names {
Name {
OrganizationName(
filter: {AND: [ {
OrganizationNameLanguageId: {EQ: "505062"}}, {
OrganizationNameTypeCode: {EQ: "LNG"}}]})
{
OrganizationName
}
}
}
}
FilingDocument {
DocId
DocumentSummary {
DocumentTitle
FilingDate
FormType
FeedName
}
DocumentText
}
}
}
"""
jsonFullResp = requestSearch(accessToken, payloadIn)
print('Parsed json response=')
print(json.dumps(jsonFullResp, indent=2))
docId = jsonFullResp["data"]["FinancialFiling"][0]["FilingDocument"]["DocId"]
print('DocId is', str(docId))
Download Filings Documents
There are four identifiers, or retrieval methods, you can use to download a document.
- FilingId (FilingId, or Financial Filing Id, is an internal permanent identifier assigned to each filings document. This is our strategic filings identifier.)
- Dcn (Dcn, also known as Document Control Number, is an external identifier and an enclosed film-number specific to Edgar documents.)
- DocId (DocId, or Document Identifier, is an internal identifier assigned to financial filings documents.)
- Filename (Filename provides a faster and direct route to download documents without going through a resolver.)
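Each identifier kind maps to a path suffix, such as 'docId/54932207', that is appended to the retrieval endpoint. A hypothetical builder for those suffixes (the exact casing of the kind names is an assumption to verify against the API reference):

```python
# Kind names mirroring the four retrieval methods listed above (assumed casing)
RETRIEVAL_KINDS = {"docId", "filingId", "dcn", "filename"}

def build_retrieval_path(kind, value):
    """Build the path suffix passed to the retrieval endpoint,
    e.g. build_retrieval_path('docId', 54932207) -> 'docId/54932207'."""
    if kind not in RETRIEVAL_KINDS:
        raise ValueError(f"unknown retrieval kind: {kind}")
    return f"{kind}/{value}"

print(build_retrieval_path("docId", 54932207))   # docId/54932207
```

Centralizing the path construction keeps a typo in the kind name from silently producing a 404.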
Define Helper Function retrieveURL
def retrieveURL(token, retrievalParameters):
    ENDPOINT_DOC_RETRIEVAL = RDP_BASE_URL + '/data/filings' + RDP_FILINGS_VERSION + '/retrieval/search/' + retrievalParameters
    headers = {
        "Authorization": "Bearer " + token,
        "X-API-Key": "155d9dbf-f0ac-46d9-8b77-f7f6dcd238f8",
        "ClientID": "api_playground"
    }
    print("Next we retrieve: " + ENDPOINT_DOC_RETRIEVAL)
    response = requests.get(ENDPOINT_DOC_RETRIEVAL, headers=headers)
    print("Response status code =" + str(response.status_code))
    if response.status_code == 401:  # error when the token has expired
        token = getToken()           # refresh the token and retry once
        headers["Authorization"] = "Bearer " + token
        response = requests.get(ENDPOINT_DOC_RETRIEVAL, headers=headers)
        print("Response status code =" + str(response.status_code))
    if response.status_code == 200:
        return json.loads(response.text)
    return ''
Retrieve URL by DocID
jsonFullResp = retrieveURL(accessToken, 'docId/54932207')
print('full response is =')
print(json.dumps(jsonFullResp, indent=2))
fileName = list(jsonFullResp.keys())[0]
print("fileName is: ")
print(fileName)
signedUrl = jsonFullResp[fileName]["signedUrl"]
print("signedUrl to retrieve is: ")
print(signedUrl)
Retrieve URL by FilingId
jsonFullResp = retrieveURL(accessToken, 'filingId/97661417885')
print('full response is =')
print(json.dumps(jsonFullResp, indent=2))
fileName = list(jsonFullResp.keys())[0]
print("fileName is: ")
print(fileName)
signedUrl = jsonFullResp[fileName]["signedUrl"]
print("signedUrl to retrieve is: ")
print(signedUrl)
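The retrieval responses above are keyed by file name, each entry carrying a signedUrl. When a response holds more than one file, all pairs can be iterated rather than taking only the first key; extract_files is an illustrative helper assuming that response shape:

```python
def extract_files(resp):
    """Yield (fileName, signedUrl) pairs from a retrieval response shaped
    like {fileName: {"signedUrl": ...}, ...}."""
    for file_name, info in resp.items():
        yield file_name, info["signedUrl"]

# Fabricated sample response, for illustration only
sample = {
    "report.pdf": {"signedUrl": "https://example.com/report.pdf?sig=abc"},
    "exhibit.htm": {"signedUrl": "https://example.com/exhibit.htm?sig=def"},
}
for name, url in extract_files(sample):
    print(name, url)
```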
Download the Document
Now we are ready to download the filings document and save it under the downloads folder.
import os

def retrieveSaveDoc(fileName, signedUrl, token):
    headers = {
        'clientId': CLIENT_ID,
        'Authorization': "Bearer " + token
    }
    response = requests.get(signedUrl, headers=headers, allow_redirects=True)
    if response.status_code != 200:
        print("Response code on error is:", str(response.status_code))
        return ''
    filenameWithDir = './downloads/' + str(fileName)
    os.makedirs(os.path.dirname(filenameWithDir), exist_ok=True)
    with open(filenameWithDir, 'wb') as f:
        f.write(response.content)
    print("The document", fileName, "has been downloaded into the downloads subfolder")
    return fileName
retrieveSaveDoc(fileName, signedUrl, accessToken)
At this point, a filings PDF file is stored under the downloads folder.
For more common use-case examples that can be implemented in Python analogously, see the Filings Developer Guide, section Use Cases.