Generating Alpha using Starmine Analytical Models - a Python example
Overview
This article will demonstrate how we can use Starmine Analytical Models to derive insight and generate alpha, in this case, for equity markets. It is intended as a teaser to intice you to further explore this rich vein rather than a rigorous scientific study. I will be using Eikon and more specifically our new Eikon Data API to access all the data I need to conduct this analysis - however Starmine Analytics is available in other products and delivery channels. I will also provide the Jupyter notebook source via gitub.
Pre-requisites:
Refinitiv Eikon / Refinitiv Workspace with access to Eikon Data APIs (Free Trial Available)
Python 2.x/3.x
Required Python Packages: eikon, pandas, numpy, matplotlib, sklearn, scipy
Starmine Quantitative Analytics
In short - StarMine provides a suite of proprietary alpha-generating analytics and models spanning sectors, regions, and markets. The list of models is really broad-based and includes both quantitative analytics (such as smartEstimates) and quantitative models (such as the Combined Alpha Model I will demo here).
All of these analytics and models can provide you with new sources of information and subsequently alpha - they are calculated by our Starmine team and delivered to you - saving you many man years of work and research etc. You can find out more about Starmine Analytics by going here.
How to Access the Starmine and Other Data
As mentioned above I will be using Eikon and specifically our new Eikon Data API - which is a really easy to use yet performant web API. At the time of writing this article it is still in Beta - but all Eikon users can access it by simply upgrading their Eikon to v4.0.36+ and going to the developer portal here and follow instructions. If you have any questions around the API there are also monitored Q&A forums.
Once we have access to the Eikon Data API its very straightforward to get the data we need. First lets import some packages we will need to conduct this analysis and also set our App ID (this is available from the App ID generator - see the quick start guide for further details):
import eikon as ek
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ek.set_app_key('YOUR APP KEY HERE')
Next we need to formulate our API call. In this case, I will be requesting data for all CAC-40 index constituents and I will be requesting a mixture of both Reference, Starmine Analytics and Price performance data for 8 quarters - I have made it more generic here - but essentially this is one line of code! And as you will see the API returns the data in a pandas dataframe.
RICS = ['0#.FCHI']
fields =['TR.TRBCIndustryGroup','TR.CombinedAlphaCountryRank(SDate=0,EDate=-7,Frq=FQ)','TR.CombinedAlphaCountryRank(SDate=0,EDate=-7,Frq=FQ).Date',
'TR.TotalReturn3Mo(SDate=0,EDate=-7,Frq=FQ)','TR.TotalReturn3Mo(SDate=0,EDate=-7,Frq=FQ).calcdate']
ids,err=ek.get_data(RICS,fields=fields)
ids.head(20)
Instrument | TRBC Industry Group Name | Combined Alpha Model Country Rank | Date | 3 Month Total Return | Calc Date | |
---|---|---|---|---|---|---|
0 | ACCP.PA | Hotels & Entertainment Services | 13.0 | 2017-12-31T00:00:00Z | 2.295706 | 2017-12-31 |
1 | ACCP.PA | 5.0 | 2017-09-30T00:00:00Z | 2.411987 | 2017-09-30 | |
2 | ACCP.PA | 36.0 | 2017-06-30T00:00:00Z | 7.890774 | 2017-06-30 | |
3 | ACCP.PA | 14.0 | 2017-03-31T00:00:00Z | 10.217330 | 2017-03-31 | |
4 | ACCP.PA | 17.0 | 2016-12-31T00:00:00Z | 0.339847 | 2016-12-31 | |
5 | ACCP.PA | 5.0 | 2016-09-30T00:00:00Z | 1.787259 | 2016-09-30 | |
6 | ACCP.PA | 13.0 | 2016-06-30T00:00:00Z | -5.187512 | 2016-06-30 | |
7 | ACCP.PA | 9.0 | 2016-03-31T00:00:00Z | -6.911636 | 2016-03-31 | |
8 | AIRP.PA | Chemicals | 44.0 | 2017-12-31T00:00:00Z | 2.396977 | 2017-12-31 |
9 | AIRP.PA | 22.0 | 2017-09-30T00:00:00Z | 4.297597 | 2017-09-30 | |
10 | AIRP.PA | 20.0 | 2017-06-30T00:00:00Z | 3.664822 | 2017-06-30 | |
11 | AIRP.PA | 17.0 | 2017-03-31T00:00:00Z | 1.372456 | 2017-03-31 | |
12 | AIRP.PA | 24.0 | 2016-12-31T00:00:00Z | 10.686223 | 2016-12-31 | |
13 | AIRP.PA | 10.0 | 2016-09-30T00:00:00Z | 4.185135 | 2016-09-30 | |
14 | AIRP.PA | 38.0 | 2016-06-30T00:00:00Z | -4.252091 | 2016-06-30 | |
15 | AIRP.PA | 26.0 | 2016-03-31T00:00:00Z | -4.592378 | 2016-03-31 | |
16 | AIR.PA | Aerospace & Defense | 62.0 | 2017-12-31T00:00:00Z | 3.220992 | 2017-12-31 |
17 | AIR.PA | 25.0 | 2017-09-30T00:00:00Z | 11.680556 | 2017-09-30 | |
18 | AIR.PA | 59.0 | 2017-06-30T00:00:00Z | 2.453877 | 2017-06-30 | |
19 | AIR.PA | 52.0 | 2017-03-31T00:00:00Z | 13.510503 | 2017-03-31 |
Now that we have our data in a dataframe we may need to do some wrangling to get the types correctly set. The get_data call is the most flexible of calls so this is to be expected. We can easily check the types of data in the dataframe by column:
ids.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 320 entries, 0 to 319
Data columns (total 6 columns):
Instrument 320 non-null object
TRBC Industry Group Name 320 non-null object
Combined Alpha Model Country Rank 316 non-null float64
Date 320 non-null object
3 Month Total Return 320 non-null float64
Calc Date 320 non-null object
dtypes: float64(2), object(4)
memory usage: 10.0+ KB
First we want to cast the Date object (a string) as a datetime type, then we want to set that datetime field as the index for the frame. Secondly, we want to make sure any numeric fields are numeric and any text fields are strings:
ids['Date']=pd.to_datetime(ids['Date'])
ads=ids.set_index('Date')[['Instrument','TRBC Industry Group Name','Combined Alpha Model Country Rank','3 Month Total Return']]
ads['3 Month Total Return'] = pd.to_numeric(ads['3 Month Total Return'], errors='coerse')
ads['TRBC Industry Group Name'] = ads['TRBC Industry Group Name'].astype(str)
ads.head(10)
Instrument | TRBC Industry Group Name | Combined Alpha Model Country Rank | 3 Month Total Return | |
---|---|---|---|---|
Date | ||||
2017-12-31 | ACCP.PA | Hotels & Entertainment Services | 13.0 | 2.295706 |
2017-09-30 | ACCP.PA | 5.0 | 2.411987 | |
2017-06-30 | ACCP.PA | 36.0 | 7.890774 | |
2017-03-31 | ACCP.PA | 14.0 | 10.217330 | |
2016-12-31 | ACCP.PA | 17.0 | 0.339847 | |
2016-09-30 | ACCP.PA | 5.0 | 1.787259 | |
2016-06-30 | ACCP.PA | 13.0 | -5.187512 | |
2016-03-31 | ACCP.PA | 9.0 | -6.911636 | |
2017-12-31 | AIRP.PA | Chemicals | 44.0 | 2.396977 |
2017-09-30 | AIRP.PA | 22.0 | 4.297597 |
ads.dtypes
Instrument object
TRBC Industry Group Name object
Combined Alpha Model Country Rank float64
3 Month Total Return float64
dtype: object
ads.head()
Instrument | TRBC Industry Group Name | Combined Alpha Model Country Rank | 3 Month Total Return | |
---|---|---|---|---|
Date | ||||
2017-12-31 | ACCP.PA | Hotels & Entertainment Services | 13.0 | 2.295706 |
2017-09-30 | ACCP.PA | 5.0 | 2.411987 | |
2017-06-30 | ACCP.PA | 36.0 | 7.890774 | |
2017-03-31 | ACCP.PA | 14.0 | 10.217330 | |
2016-12-31 | ACCP.PA | 17.0 | 0.339847 |
A bit more wrangling is required here as we can see for example that the API has returned the TRBC Industry Group only for the most current instance of each instrument - not for the historical quarters. We can recitfy this easily in 2 lines of code. First we will replace blanks with nan (not a number). We can then use the excellent and surgical fillna dataframe function to fill the Sector name to the historic quarters:
ads1 = ads.replace('', np.nan, regex=True)
ads1['TRBC Industry Group Name'].fillna(method='ffill',limit=7, inplace=True)
ads1.head(15)
Instrument | TRBC Industry Group Name | Combined Alpha Model Country Rank | 3 Month Total Return | |
---|---|---|---|---|
Date | ||||
2017-12-31 | ACCP.PA | Hotels & Entertainment Services | 13.0 | 2.295706 |
2017-09-30 | ACCP.PA | Hotels & Entertainment Services | 5.0 | 2.411987 |
2017-06-30 | ACCP.PA | Hotels & Entertainment Services | 36.0 | 7.890774 |
2017-03-31 | ACCP.PA | Hotels & Entertainment Services | 14.0 | 10.217330 |
2016-12-31 | ACCP.PA | Hotels & Entertainment Services | 17.0 | 0.339847 |
2016-09-30 | ACCP.PA | Hotels & Entertainment Services | 5.0 | 1.787259 |
2016-06-30 | ACCP.PA | Hotels & Entertainment Services | 13.0 | -5.187512 |
2016-03-31 | ACCP.PA | Hotels & Entertainment Services | 9.0 | -6.911636 |
2017-12-31 | AIRP.PA | Chemicals | 44.0 | 2.396977 |
2017-09-30 | AIRP.PA | Chemicals | 22.0 | 4.297597 |
2017-06-30 | AIRP.PA | Chemicals | 20.0 | 3.664822 |
2017-03-31 | AIRP.PA | Chemicals | 17.0 | 1.372456 |
2016-12-31 | AIRP.PA | Chemicals | 24.0 | 10.686223 |
2016-09-30 | AIRP.PA | Chemicals | 10.0 | 4.185135 |
2016-06-30 | AIRP.PA | Chemicals | 38.0 | -4.252091 |
We now need to shift the 3 Month Total Return column down by 1 so we can map CAM ranks to forward looking performance. Thankfully in pandas this is trivial and note the Groupby instrument clause.
ads1['3 Month Total Return'] = ads1.groupby('Instrument')['3 Month Total Return'].shift()
ads1.head(15)
Instrument | TRBC Industry Group Name | Combined Alpha Model Country Rank | 3 Month Total Return | |
---|---|---|---|---|
Date | ||||
2017-12-31 | ACCP.PA | Hotels & Entertainment Services | 13.0 | NaN |
2017-09-30 | ACCP.PA | Hotels & Entertainment Services | 5.0 | 2.295706 |
2017-06-30 | ACCP.PA | Hotels & Entertainment Services | 36.0 | 2.411987 |
2017-03-31 | ACCP.PA | Hotels & Entertainment Services | 14.0 | 7.890774 |
2016-12-31 | ACCP.PA | Hotels & Entertainment Services | 17.0 | 10.217330 |
2016-09-30 | ACCP.PA | Hotels & Entertainment Services | 5.0 | 0.339847 |
2016-06-30 | ACCP.PA | Hotels & Entertainment Services | 13.0 | 1.787259 |
2016-03-31 | ACCP.PA | Hotels & Entertainment Services | 9.0 | -5.187512 |
2017-12-31 | AIRP.PA | Chemicals | 44.0 | NaN |
2017-09-30 | AIRP.PA | Chemicals | 22.0 | 2.396977 |
2017-06-30 | AIRP.PA | Chemicals | 20.0 | 4.297597 |
2017-03-31 | AIRP.PA | Chemicals | 17.0 | 3.664822 |
2016-12-31 | AIRP.PA | Chemicals | 24.0 | 1.372456 |
2016-09-30 | AIRP.PA | Chemicals | 10.0 | 10.686223 |
2016-06-30 | AIRP.PA | Chemicals | 38.0 | 4.185135 |
ads1.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 320 entries, 2017-12-31 to 2016-03-31
Data columns (total 4 columns):
Instrument 320 non-null object
TRBC Industry Group Name 320 non-null object
Combined Alpha Model Country Rank 316 non-null float64
3 Month Total Return 280 non-null float64
dtypes: float64(2), object(2)
memory usage: 10.0+ KB
ads1.dropna(axis=0, how='any', inplace=True)
ads1.head(20)
Instrument | TRBC Industry Group Name | Combined Alpha Model Country Rank | 3 Month Total Return | |
---|---|---|---|---|
Date | ||||
2017-09-30 | ACCP.PA | Hotels & Entertainment Services | 5.0 | 2.295706 |
2017-06-30 | ACCP.PA | Hotels & Entertainment Services | 36.0 | 2.411987 |
2017-03-31 | ACCP.PA | Hotels & Entertainment Services | 14.0 | 7.890774 |
2016-12-31 | ACCP.PA | Hotels & Entertainment Services | 17.0 | 10.217330 |
2016-09-30 | ACCP.PA | Hotels & Entertainment Services | 5.0 | 0.339847 |
2016-06-30 | ACCP.PA | Hotels & Entertainment Services | 13.0 | 1.787259 |
2016-03-31 | ACCP.PA | Hotels & Entertainment Services | 9.0 | -5.187512 |
2017-09-30 | AIRP.PA | Chemicals | 22.0 | 2.396977 |
2017-06-30 | AIRP.PA | Chemicals | 20.0 | 4.297597 |
2017-03-31 | AIRP.PA | Chemicals | 17.0 | 3.664822 |
2016-12-31 | AIRP.PA | Chemicals | 24.0 | 1.372456 |
2016-09-30 | AIRP.PA | Chemicals | 10.0 | 10.686223 |
2016-06-30 | AIRP.PA | Chemicals | 38.0 | 4.185135 |
2016-03-31 | AIRP.PA | Chemicals | 26.0 | -4.252091 |
2017-09-30 | AIR.PA | Aerospace & Defense | 25.0 | 3.220992 |
2017-06-30 | AIR.PA | Aerospace & Defense | 59.0 | 11.680556 |
2017-03-31 | AIR.PA | Aerospace & Defense | 52.0 | 2.453877 |
2016-12-31 | AIR.PA | Aerospace & Defense | 14.0 | 13.510503 |
2016-09-30 | AIR.PA | Aerospace & Defense | 13.0 | 16.716196 |
2016-06-30 | AIR.PA | Aerospace & Defense | 22.0 | 4.078871 |
So now our Dataframe is seemingly in good shape - what are we going to do next? We can start with the question do higher CAM score leads to better performance? Or put more generically, are CAM scores RELATED to performance? Remember these CAM scores are dynamic and change overtime depending on changes in their underlying components AND relative to other companies in the universe (in this case companies in the same country).
So lets just see what this looks like in terms of a scatterplot for one sector of the CAC-40. Here we are just filtering the frame using a text filter and aggregating using a groupby statement. In this case we have 2 companies.
%matplotlib inline
adsl = ads1[ads1['TRBC Industry Group Name'] =='Industrial Conglomerates']
adsl = adsl.groupby('Instrument')
ax = adsl.plot(x='Combined Alpha Model Country Rank', y='3 Month Total Return', kind='scatter')
So lets be clear about what we are looking at. Each dot represents a pair of observations for CAM rank and 3M total return for a quarter - we should have 7 quarters of observations for each instrument. So we could implement a simple linear regression and that would have different parameters such as intercept, slope.
As I am in exploratory mode - I just am interested in the slope of the best fit linear line (of the form y = ax + b), where a is the slope coefficient. This should offer me some indication of how 3 Month total returns change for an increase in CAM rank. So our slope coefficient can tell us for example whether our variables are positively related (positive a) or negatively related (negative a) or really not related at all (near zero a).
We can also more formally calculate Spearman's Rank Correlation Coefficient which will give us a more rigorous measure of relatedness and also a probability that the null hypothesis (the two variables are not related) is true.
So rather than create plots for everything we can just create a model to provide the linear best fit solution for each instrument and then store that value in a new column as well as calculating the Spearman's Rank Correlation Coefficient and p-value.
First, I want to check for if any data is null as this may result in errors downstream.
ads1.isnull().any().count()
4
Here we can see there are 4 null values in our dataset - we can deal with these by replacing them with either mean values or most recent values. In our case, I choose the latter and will use a forward fill function to replace nulls:
adsNN = ads1.fillna(method='ffill')
adsNN.isnull().any()
Instrument False
TRBC Industry Group Name False
Combined Alpha Model Country Rank False
3 Month Total Return False
dtype: bool
Now we have confirmed we have no null values we can move on. Next we will use the Linear Regression model from the Linear Model tools from Scikit Learn package. We want to solve for 7 quarters of data for each Instrument. So we iterate over each intrument then use the model.fit method to generate the best fit linear solution (OLS) and then store the 'coef_' parameter of the model as a new column in the adsNN dataframe called 'slope'.
Whilst we are here we will also calculate a Spearman's Rank Correlation Coefficient using a routine from scipy package. The routine returns 2 values, the first is the Coefficient (Rho) and the second is the p-value. I just store these 2 elements in 2 seperate columns.
import sklearn
import scipy
from sklearn import linear_model
model = linear_model.LinearRegression()
for (group, adsNN_gp) in adsNN.groupby('Instrument'):
X=adsNN_gp[['Combined Alpha Model Country Rank']]
y=adsNN_gp[['3 Month Total Return']]
model.fit(X,y)
spearmans = scipy.stats.spearmanr(X,y)
adsNN.loc[adsNN.Instrument == adsNN_gp.iloc[0].Instrument, 'slope'] = model.coef_
adsNN.loc[adsNN.Instrument == adsNN_gp.iloc[0].Instrument, 'Rho'] = spearmans[0]
adsNN.loc[adsNN.Instrument == adsNN_gp.iloc[0].Instrument, 'p'] = spearmans[1]
adsNN.head(15)
Instrument | TRBC Industry Group Name | Combined Alpha Model Country Rank | 3 Month Total Return | slope | Rho | p | |
---|---|---|---|---|---|---|---|
Date | |||||||
2017-09-30 | ACCP.PA | Hotels & Entertainment Services | 5.0 | 2.295706 | 0.119443 | 0.684712 | 0.089666 |
2017-06-30 | ACCP.PA | Hotels & Entertainment Services | 36.0 | 2.411987 | 0.119443 | 0.684712 | 0.089666 |
2017-03-31 | ACCP.PA | Hotels & Entertainment Services | 14.0 | 7.890774 | 0.119443 | 0.684712 | 0.089666 |
2016-12-31 | ACCP.PA | Hotels & Entertainment Services | 17.0 | 10.217330 | 0.119443 | 0.684712 | 0.089666 |
2016-09-30 | ACCP.PA | Hotels & Entertainment Services | 5.0 | 0.339847 | 0.119443 | 0.684712 | 0.089666 |
2016-06-30 | ACCP.PA | Hotels & Entertainment Services | 13.0 | 1.787259 | 0.119443 | 0.684712 | 0.089666 |
2016-03-31 | ACCP.PA | Hotels & Entertainment Services | 9.0 | -5.187512 | 0.119443 | 0.684712 | 0.089666 |
2017-09-30 | AIRP.PA | Chemicals | 22.0 | 2.396977 | -0.250236 | -0.571429 | 0.180202 |
2017-06-30 | AIRP.PA | Chemicals | 20.0 | 4.297597 | -0.250236 | -0.571429 | 0.180202 |
2017-03-31 | AIRP.PA | Chemicals | 17.0 | 3.664822 | -0.250236 | -0.571429 | 0.180202 |
2016-12-31 | AIRP.PA | Chemicals | 24.0 | 1.372456 | -0.250236 | -0.571429 | 0.180202 |
2016-09-30 | AIRP.PA | Chemicals | 10.0 | 10.686223 | -0.250236 | -0.571429 | 0.180202 |
2016-06-30 | AIRP.PA | Chemicals | 38.0 | 4.185135 | -0.250236 | -0.571429 | 0.180202 |
2016-03-31 | AIRP.PA | Chemicals | 26.0 | -4.252091 | -0.250236 | -0.571429 | 0.180202 |
2017-09-30 | AIR.PA | Aerospace & Defense | 25.0 | 3.220992 | -0.234564 | -0.607143 | 0.148231 |
Voila - I have all the calculations I requested and I think I just want to average these by Instrument so I can get a summary view. (note averaging the slope, Rho and p values does not change them as they were calculated once for the 7 periods and just copied 7 times).
Averages = adsNN.groupby(['Instrument']).mean()
Averages
Combined Alpha Model Country Rank | 3 Month Total Return | slope | Rho | p | |
---|---|---|---|---|---|
Instrument | |||||
ACCP.PA | 14.142857 | 2.822199 | 0.119443 | 0.684712 | 0.089666 |
AIR.PA | 33.571429 | 5.714654 | -0.234564 | -0.607143 | 0.148231 |
AIRP.PA | 22.428571 | 3.193017 | -0.250236 | -0.571429 | 0.180202 |
ATOS.PA | 94.142857 | 8.784536 | -0.743983 | -0.054056 | 0.908365 |
AXAF.PA | 64.000000 | 4.571939 | -0.196042 | -0.126131 | 0.787572 |
BNPP.PA | 60.142857 | 7.160401 | -0.079672 | -0.252262 | 0.585241 |
BOUY.PA | 73.428571 | 4.506075 | -0.323423 | -0.500000 | 0.253170 |
CAGR.PA | 55.285714 | 8.206785 | -0.066242 | 0.142857 | 0.759945 |
CAPP.PA | 59.142857 | 3.231260 | -0.358011 | -0.535714 | 0.215217 |
CARR.PA | 26.285714 | -2.853468 | 0.209579 | -0.107143 | 0.819151 |
DANO.PA | 29.428571 | 2.312771 | 0.070155 | 0.142857 | 0.759945 |
ENGIE.PA | 46.428571 | 2.994981 | -0.117695 | -0.450469 | 0.310429 |
ESSI.PA | 37.285714 | 1.446916 | -0.205101 | -0.392857 | 0.383317 |
EXHO.PA | 55.428571 | 2.871624 | -0.066887 | -0.540562 | 0.210289 |
FTI.PA | 15.666667 | -4.249451 | -0.995839 | -1.000000 | 0.000000 |
LEGD.PA | 45.142857 | 4.603834 | -0.017527 | -0.035714 | 0.939408 |
LHN.PA | 35.857143 | 3.710595 | -0.248977 | -0.214286 | 0.644512 |
LVMH.PA | 70.571429 | 8.080266 | -0.167705 | -0.250000 | 0.588724 |
MICP.PA | 89.857143 | 5.207835 | 0.584094 | 0.630656 | 0.128888 |
MT.AS | 70.857143 | 13.962780 | 0.107790 | 0.198206 | 0.670085 |
ORAN.PA | 53.142857 | 0.186049 | -0.193575 | -0.846881 | 0.016197 |
OREP.PA | 43.000000 | 2.721421 | 0.055467 | 0.642857 | 0.119392 |
PERP.PA | 33.285714 | 4.797600 | -0.066685 | -0.071429 | 0.879048 |
PEUP.PA | 84.428571 | 3.700341 | -0.261221 | -0.214286 | 0.644512 |
PRTP.PA | 62.428571 | 14.708734 | -0.008738 | -0.018019 | 0.969415 |
PUBP.PA | 62.142857 | -0.332859 | 0.155443 | 0.321429 | 0.482072 |
RENA.PA | 67.000000 | 0.840024 | -0.286120 | -0.054056 | 0.908365 |
SAF.PA | 57.571429 | 5.371726 | -0.176439 | -0.357143 | 0.431611 |
SASY.PA | 71.285714 | 1.715126 | -0.241940 | -0.285714 | 0.534509 |
SCHN.PA | 53.142857 | 4.601030 | -0.002400 | 0.214286 | 0.644512 |
SGEF.PA | 70.285714 | 4.670457 | -0.264057 | -0.714286 | 0.071344 |
SGOB.PA | 68.285714 | 3.861674 | 0.131461 | 0.000000 | 1.000000 |
SOGN.PA | 55.857143 | 7.180684 | 0.029665 | 0.000000 | 1.000000 |
SOLB.BR | 61.142857 | 4.775618 | -0.236724 | -0.250000 | 0.588724 |
STM.PA | 73.285714 | 22.908329 | -0.388359 | -0.535714 | 0.215217 |
TOTF.PA | 51.857143 | 3.422500 | -0.031798 | -0.185312 | 0.690778 |
UNBP.AS | 32.000000 | -0.891114 | -0.095466 | -0.071429 | 0.879048 |
VIE.PA | 43.428571 | 1.959539 | -0.331911 | -0.828862 | 0.021174 |
VIV.PA | 28.285714 | 4.241614 | -0.079870 | -0.250000 | 0.588724 |
VLOF.PA | 88.142857 | 5.843416 | 0.022018 | -0.306319 | 0.504027 |
Conclusion
So now we have a summarised view of our study - can we answer the question posited earlier? 'Are CAM scores RELATED to performance?' Let us have a look at average CAM score versus average 3 month total return (over 7 quarters). This seems to indicate that there is a slight positive relation between the two variables.
ax = Averages.plot(x='Combined Alpha Model Country Rank', y='3 Month Total Return', kind='scatter')
We can check this further by running the same Spearman's Rank Correlation Coefficient test against these summary results. Here we can see Rho of 50 which is both large & positive and with p of 0.101 the null hypothesis can be safetly rejected.
spearmans = scipy.stats.spearmanr(Averages['Combined Alpha Model Country Rank'],Averages['3 Month Total Return'])
Rho = spearmans[0]
p = spearmans[1]
print(Rho,p)
0.5003048928999382 0.001010764442996582
Further Resources for Eikon Data API
For Content Navigation in Eikon - please use the Data Item Browser Application: Type 'DIB' into Eikon Search Bar.