The Refinitiv Data Platform (RDP) Research API is Refinitiv's aggregate delivery system for providing the buy-side with their entitled sell-side research reports in real time. The system delivers asynchronous updates (alerts) via Amazon's Simple Queue Service (SQS). Because AWS offers a set of services for building serverless applications, it is possible to receive and process messages from the queue without provisioning or managing servers. In this article, we will use several of these AWS services to create an application that works with the RDP Research API.
I have structured the article into two parts. Part 1 explains the basics of the RDP Research API and the Amazon Web Services involved, and provides setup instructions and the application's workflow. Part 2 covers the basics of AWS Step Functions, together with setup and run instructions.
RDP Research API Overview
The RDP Research API uses an alerts mechanism to deliver updates. An application first needs to log in to the Refinitiv Data Platform and get an access token, which is used in every request to the Research API. The application can then use the API to subscribe to Research documents. After that, new updates (alerts) are put in an AWS SQS queue. It is the application's responsibility to keep polling the queue for new messages.
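The polling step can be sketched as below. This is a minimal illustration rather than the article's actual Lambda code: the function name `poll_alerts` and its parameters are hypothetical, and `sqs_client` is assumed to behave like a `boto3.client('sqs')` created with the temporary credentials that RDP issues for the queue.

```python
def poll_alerts(sqs_client, queue_url, max_polls=3, wait_seconds=10):
    """Poll an SQS queue and return the raw message bodies received.

    sqs_client is assumed to behave like boto3.client('sqs', ...);
    queue_url is assumed to be the endpoint returned by the subscription.
    """
    bodies = []
    for _ in range(max_polls):
        response = sqs_client.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,        # SQS returns at most 10 messages per call
            WaitTimeSeconds=wait_seconds,  # long polling (SQS allows up to 20 seconds)
        )
        # The 'Messages' key is absent when no message arrived in time
        for message in response.get('Messages', []):
            bodies.append(message['Body'])
            # Delete the message so that it is not delivered again
            sqs_client.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message['ReceiptHandle'],
            )
    return bodies
```

The bodies returned here are still encrypted alert payloads; they would need to be decrypted with the encryption key returned by the subscription before the document IDs can be read.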
You can find more information about the Research API from the following resources.
Amazon Web Services Overview
The application in this article uses several Amazon Web Services. Brief descriptions and resources for each are given below.
- AWS Lambda Function
AWS Lambda lets you run code without provisioning or managing servers. It supports multiple languages through the use of runtimes. In this article, we use the Python runtime to execute the application's code.
You can also use AWS Lambda to run your code in response to events, such as changes to data in an Amazon S3 bucket or an Amazon DynamoDB table. According to the AWS documentation page Using Lambda with Amazon SQS, Amazon SQS can also act as an event source that invokes a Lambda function with an event containing the queue messages. However, the SQS queue created by the Research API currently does not support this functionality, so in this article we implement a Lambda function that polls the SQS queue manually.
For more information about AWS Lambda: https://docs.aws.amazon.com/lambda/latest/dg/getting-started.html
- AWS Systems Manager Parameter Store (SSM Parameter Store):
AWS Systems Manager Parameter Store provides secure, hierarchical storage for configuration data management and secrets management. You can store data such as passwords, database strings, and license codes as parameter values. The value can be stored as plain text or encrypted data.
In this article, we use the SSM Parameter Store to store the username, password, UUID and access token used by the application. The SSM Parameter Store also records the Last Modified Date of each parameter, so we can use this timestamp to check whether the stored RDP access token has expired.
For more information about SSM Parameter Store: https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html
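The expiry check based on the Last Modified Date could look roughly like the sketch below. The function names, the `margin_seconds` safety margin and the `lifetime_seconds` argument are illustrative assumptions (the lifetime would typically come from the `expires_in` value of the token response, if you keep it); `ssm_client` is assumed to behave like `boto3.client('ssm')`, whose `get_parameter` response includes a `LastModifiedDate` timestamp.

```python
from datetime import datetime, timedelta, timezone


def is_token_expired(last_modified, lifetime_seconds, now=None, margin_seconds=30):
    """Return True when a token stored at last_modified should be refreshed."""
    now = now or datetime.now(timezone.utc)
    expires_at = last_modified + timedelta(seconds=lifetime_seconds)
    # Refresh slightly early so that the token does not expire mid-request
    return now >= expires_at - timedelta(seconds=margin_seconds)


def stored_token_expired(ssm_client, lifetime_seconds, name='EDPAccessToken'):
    """Look up the stored token's timestamp and apply the check above.

    ssm_client is assumed to behave like boto3.client('ssm').
    """
    parameter = ssm_client.get_parameter(Name=name)['Parameter']
    # boto3 returns LastModifiedDate as a timezone-aware datetime
    return is_token_expired(parameter['LastModifiedDate'], lifetime_seconds)
```

Keeping the date arithmetic in a separate function makes it easy to test without calling AWS.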
- Amazon Simple Storage Service (Amazon S3):
Amazon S3 is an object storage service. It is built around the concept of a “bucket”, which is a container for objects stored in Amazon S3. Every object is contained in a bucket.
For more information about S3: https://aws.amazon.com/s3/
- AWS Step Functions:
AWS Step Functions is a service that lets users coordinate multiple AWS services into a serverless workflow. A workflow is made up of a series of steps, with the output of one step acting as input to the next. The service translates an application's workflow into a state diagram that is easy to understand and monitor.
In this article, we use AWS Step Functions to chain the Lambda functions in the order of the Research API workflow: get an access token, subscribe to Research, and poll SQS. Below is the state diagram generated by Step Functions for the application. At run time, it displays the step currently being executed and logs the status and result of each step.
For more information about AWS Step Functions: https://aws.amazon.com/step-functions/
Application’s workflow
The application is implemented in Python, based on the sample application in the News and Research alerts in Python tutorial, and repeatedly polls SQS for new updates. Because Lambda is intended for small, simple functions, I split the implementation into several Lambda functions and integrate them with AWS Step Functions.
1. Step Functions invokes a Lambda function, “getEDPToken”, which reads the RDP username and password from AWS Systems Manager Parameter Store (SSM Parameter Store) and then uses them to get an RDP access token from the RDP API service. The access token is stored back in the Parameter Store. Below is a code snippet from the “getEDPToken” Lambda function that reads the parameters' values from the SSM Parameter Store.
import boto3

# Get parameters' values from SSM Parameter Store
client = boto3.client('ssm')
response = client.get_parameters(
    Names=['EDPUsername', 'EDPPassword', 'EDPClientId'],
    WithDecryption=False  # use True if the values are stored as SecureString
)
# Index the returned parameters by name for direct lookup
params = {p['Name']: p['Value'] for p in response['Parameters']}
username = params['EDPUsername']
password = params['EDPPassword']
clientId = params['EDPClientId']
2. A Lambda function, “subscribeResearch”, is invoked to subscribe to Research alerts and then pass the encryption key and endpoint to the next function. The function first removes any existing subscription to prevent a duplicate subscription error.
3. With the endpoint, a Lambda function, "getCloudCredential", requests cloud credentials from the RDP service.
4. The application repeatedly polls the SQS queue to check whether there is a new alert message, and then extracts the Research document ID from it.
5. A Lambda function, “getAlertMessage”, verifies whether messages containing document IDs are available in the queue. If so, the function passes the list of document IDs to the next function to download the documents. If not, a Wait X seconds state is entered to wait for the next polling interval.
6. A Lambda function, “refreshToken”, is invoked before the download documents state to refresh the token if the stored access token has expired. The function “downloadDocuments” is then invoked for each document ID to download the document from RDP and store the file in Amazon S3.
7. The Research API supports two types of results: text or pdf. The application in this article is implemented to get Research documents in text format. You can change the type to pdf in the "downloadDocuments" Lambda function. Below is the sample code for the pdf format.
import boto3
import requests

#=============================================================================
def downloadDocument(id, docUrl, outputBucket):
    # Stream the PDF from RDP straight into the S3 bucket
    s3 = boto3.client('s3')
    response = requests.get(docUrl, stream=True)
    response.raise_for_status()  # do not upload an error page as a document
    s3.upload_fileobj(response.raw, outputBucket, id + ".pdf")
#=============================================================================
def getDocumentUrl(token, docID, uId):
    document_type = "/pdf"
    p = {'uuid': uId}
    # document_URL is assumed to be defined elsewhere in the module
    RESOURCE_ENDPOINT = document_URL + docID + document_type
    ...
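To make the overall workflow more concrete, the steps above can be sketched as an Amazon States Language definition built from a Python dictionary. This is only an illustration under stated assumptions: the state names `hasDocuments` and `wait`, the `documentIds` field in the state input, the loop back to `getAlertMessage` after downloading, and the wait interval are all hypothetical, and the per-document fan-out of “downloadDocuments” is collapsed into a single task for brevity. The actual state machine is covered in part 2.

```python
import json


def build_state_machine(lambda_arns, wait_seconds=30):
    """Return a simplified Amazon States Language definition as a dict.

    lambda_arns maps each Lambda function name to its ARN.
    """
    def task(name, next_state):
        return {"Type": "Task", "Resource": lambda_arns[name], "Next": next_state}

    return {
        "StartAt": "getEDPToken",
        "States": {
            "getEDPToken": task("getEDPToken", "subscribeResearch"),
            "subscribeResearch": task("subscribeResearch", "getCloudCredential"),
            "getCloudCredential": task("getCloudCredential", "getAlertMessage"),
            "getAlertMessage": task("getAlertMessage", "hasDocuments"),
            # Branch on whether the poll returned at least one document ID
            "hasDocuments": {
                "Type": "Choice",
                "Choices": [{
                    "Variable": "$.documentIds[0]",  # hypothetical field name
                    "IsPresent": True,
                    "Next": "refreshToken",
                }],
                "Default": "wait",
            },
            # Nothing to download: wait, then poll the queue again
            "wait": {"Type": "Wait", "Seconds": wait_seconds, "Next": "getAlertMessage"},
            "refreshToken": task("refreshToken", "downloadDocuments"),
            "downloadDocuments": task("downloadDocuments", "getAlertMessage"),
        },
    }
```

Calling `json.dumps(build_state_machine(arns), indent=2)` produces the JSON text that Step Functions expects as a state machine definition.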
Below is the connectivity diagram describing how the application integrates with other Amazon Web Services and RDP API.
Environment setup
1. To use Lambda and other AWS services, you first need an AWS account and an IAM user. Below is the setup information from the Get Started with Lambda page. Please follow the instructions if you do not have an AWS account.
"To use Lambda and other AWS services, you need an AWS account. If you don't have an account, visit aws.amazon.com and choose Create an AWS Account. For detailed instructions, see Create and Activate an AWS Account.
As a best practice, you should also create an AWS Identity and Access Management (IAM) user with administrator permissions and use that for all work that does not require root credentials. Create a password for console access, and access keys to use command line tools. See Creating Your First IAM Admin User and Group in the IAM User Guide for instructions."
2. Download and install the AWS Command Line Interface (CLI), which is used to deploy the Lambda functions in this article.
3. Set up your AWS credentials in the AWS CLI.
First, you need your access key ID and secret access key. You can follow the instructions in this guide to get your credential information. Then run the configure command to set the region, access key ID and secret access key.
As for the default region, all resources are created in “us-east-1” because this is the region the Research API uses to create the SQS queue. To avoid cross-region data transfer costs, we use this region for all Amazon Web Services.
>aws configure
AWS Access Key ID [None]: accesskey
AWS Secret Access Key [None]: secretkey
Default region name [None]: us-east-1
Default output format [None]:
4. The application gets the RDP username, password, access token, and refresh token from the SSM Parameter Store, so you need to create the following parameters in the AWS Console.
- EDPUsername
- EDPPassword
- EDPClientId
- UUID
- EDPAccessToken
- EDPRefreshToken
- BucketStorage
Once you have created all the parameters, the console looks like the screen below.
5. The application downloads Research documents and then stores each one as a file object in an AWS S3 bucket. You need to create a bucket in the AWS Console.
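If you prefer to script step 4 instead of using the console, the parameters could be created with a short boto3 sketch like the one below. The function name `create_parameters` and the `placeholder` value are hypothetical; `ssm_client` is assumed to behave like `boto3.client('ssm')`. The parameters are created with type `String` to match the `WithDecryption=False` call shown earlier; if you store them as `SecureString` instead, the functions must read them with `WithDecryption=True`.

```python
# Parameter names listed in step 4
PARAMETER_NAMES = [
    'EDPUsername', 'EDPPassword', 'EDPClientId', 'UUID',
    'EDPAccessToken', 'EDPRefreshToken', 'BucketStorage',
]


def create_parameters(ssm_client, values, placeholder='none'):
    """Create or overwrite every parameter the application reads.

    values maps parameter names to their initial values. Names that are
    missing (for example the tokens, which the application writes itself)
    get a placeholder, because SSM does not accept empty values.
    """
    for name in PARAMETER_NAMES:
        ssm_client.put_parameter(
            Name=name,
            Value=values.get(name, placeholder),
            Type='String',
            Overwrite=True,
        )
```

For example, `create_parameters(boto3.client('ssm', region_name='us-east-1'), {'EDPUsername': '...', 'EDPPassword': '...'})` would create all seven parameters in the region used throughout this article.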
Create Lambda Functions
Next, we will create the Lambda functions from deployment packages using the AWS Command Line Interface (CLI). The deployment packages and installation scripts can be downloaded from GitHub. After extracting the file, you will see the following structure.
“install.ps1” is a PowerShell script that can be used for installation. You can run the script to set up all Lambda functions. Otherwise, please follow the instructions below.
First, open a command line and change the current directory to the extracted files’ location.
1. IAM Role for AWS service access
An IAM role is created in a user account to define permission policies. As described earlier, the application accesses various services, so we need to create an IAM role that contains the permission policies for those services.
- Create an IAM Role
This step returns the ARN of the created role. You will need the role's ARN to create the Lambda functions in the next step.
aws iam create-role --role-name lambda-sqs-ssm --assume-role-policy-document file://role.json
Below is the sample of ARN in the returned message.
- Attach Role Policy
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --role-name lambda-sqs-ssm
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonSSMFullAccess --role-name lambda-sqs-ssm
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AWSLambdaExecute --role-name lambda-sqs-ssm
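The contents of role.json are not shown in this article. As an assumption, a role that Lambda functions run under normally carries the standard trust policy that allows the Lambda service to assume it, so a file equivalent to the one in the download package could be generated as follows (the function name `write_trust_policy` is hypothetical).

```python
import json

# Standard trust policy that lets the Lambda service assume the role.
# Assumption: the role.json shipped with the article is equivalent to this.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def write_trust_policy(path='role.json'):
    """Write the trust policy to path and return the path."""
    with open(path, 'w') as policy_file:
        json.dump(TRUST_POLICY, policy_file, indent=2)
    return path
```

The trust policy only controls who may assume the role; the permissions themselves come from the managed policies attached in the commands above.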
2. Create Lambda Functions
You need to replace the $arn_info with the ARN of the IAM Role created in the previous step.
aws lambda create-function --function-name getEDPToken --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 5 --zip-file fileb://getEDPToken.zip --region us-east-1
aws lambda create-function --function-name subscribeResearch --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 5 --zip-file fileb://subscribeResearch.zip --region us-east-1
aws lambda create-function --function-name getCloudCredential --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 3 --zip-file fileb://getCloudCredential.zip --region us-east-1
aws lambda create-function --function-name getAlertMessage --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 10 --zip-file fileb://getAlertMessage.zip --region us-east-1
aws lambda create-function --function-name refreshToken --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 5 --zip-file fileb://refreshToken.zip --region us-east-1
aws lambda create-function --function-name downloadDocuments --runtime python3.7 --role $arn_info --handler lambda_function.lambda_handler --timeout 10 --zip-file fileb://downloadDocuments.zip --region us-east-1
After this step, you will see the list of Lambda functions in the AWS Console GUI.
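The six commands above differ only in the function name and the timeout, so they can also be generated from a small table. The sketch below is a hypothetical Python counterpart to the install script, not the script itself; the runtime is a parameter because AWS retires older Lambda runtimes over time.

```python
# Function names and timeouts (in seconds) taken from the commands above
FUNCTION_TIMEOUTS = {
    'getEDPToken': 5,
    'subscribeResearch': 5,
    'getCloudCredential': 3,
    'getAlertMessage': 10,
    'refreshToken': 5,
    'downloadDocuments': 10,
}


def build_create_commands(role_arn, runtime='python3.7', region='us-east-1'):
    """Return one 'aws lambda create-function' argument list per function."""
    commands = []
    for name, timeout in FUNCTION_TIMEOUTS.items():
        commands.append([
            'aws', 'lambda', 'create-function',
            '--function-name', name,
            '--runtime', runtime,
            '--role', role_arn,
            '--handler', 'lambda_function.lambda_handler',
            '--timeout', str(timeout),
            '--zip-file', 'fileb://' + name + '.zip',
            '--region', region,
        ])
    return commands
```

Each argument list can be passed to `subprocess.run(command, check=True)` from the directory that contains the zip files.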
Custom Python Library in Lambda Function
A Lambda function generally runs in a dedicated environment. If your function depends on libraries other than the SDK for Python (Boto 3), you need to create a deployment package that includes those libraries. In this article, the Lambda functions depend on the requests and pycryptodome libraries for REST API calls and decryption. For more information, please refer to the Updating a Function with Additional Dependencies section in AWS Lambda Deployment Package in Python.
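As a rough sketch of how such a package is put together, assume the dependencies have been installed into a local folder (for example with `pip install --target ./package requests pycryptodome`) and that lambda_function.py has been copied into the same folder. The folder can then be zipped so that its contents sit at the root of the archive. The function name `build_deployment_package` and the folder layout are illustrative assumptions.

```python
import os
import zipfile


def build_deployment_package(source_dir, zip_path):
    """Zip source_dir so that its contents sit at the root of the archive.

    zip_path should be outside source_dir, otherwise the archive would
    try to include itself.
    """
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for file_name in files:
                full_path = os.path.join(root, file_name)
                # Store paths relative to source_dir: Lambda expects
                # lambda_function.py and the libraries at the top level
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path
```

Note that pycryptodome contains compiled extensions, so the libraries should be installed on (or built for) a platform compatible with the Lambda execution environment.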
Conclusion
This article provides a basic understanding of the Research API and of the set of Amazon Web Services used by the application. It also describes the application's workflow, which interacts with these services. Finally, it provides instructions for setting up the environment and the Lambda functions in your AWS account. The next part of this article puts all the created serverless services together with AWS Step Functions.