ESG Factor Modeling With Python

June 25,2024

QuantCorner

Raksina Samasiri

Introduction to ESG

In recent years, sustainability in business has been brought to attention, and there has been a growing recognition among investors about the importance of Environmental, Social, and Governance (ESG) factors in investment analysis. These factors are seen as integral to evaluating a company's long-term sustainability and risk profile. Many investors seek to align their portfolios with their sustainability values, and this has led to the rise of ESG investing.

ESG is a framework for evaluating a company's performance and impact considering three main area:

Environmental: Assessing a company's impact on the environment, including its carbon footprint, pollution levels, and resource management practices.
Social: Examining a company's relationships with its stakeholders, including employees, customers, and communities. This could involve factors like labor practices, diversity and inclusion, and community engagement.
Governance: Focusing on a company's leadership structure, transparency, accountability, and risk management practices. Strong governance helps ensure long-term sustainability and mitigates risks like fraud or mismanagement.

London Stock Exchange Group (LSEG) has developed our own ESG scoring framework, aimed at providing a transparent and objective measure of a company’s ESG performance. This framework evaluates companies based on their relative performance, commitment, and effectiveness across 10 main themes, encompassing a wide range of criteria from environmental sustainability to corporate governance and social responsibility

Factor Modeling

Factor modeling seeks to explain the returns of a stock or portfolio by attributing them to various factors, each representing a specific characteristic or attribute that influences stock prices. Factor models are typically constructed using historical data through techniques such as regression analysis. Integration of ESG scores with factor models can be done by treating them as additional factors. This allows investors to incorporate sustainability considerations into their investment strategies.

Prerequisites

LSEG Workspace application with an access for RD library desktop session.
Python 3.9 or above
Python libraries
- pandas==2.1.4
- numpy==1.25.0
- seaborn==0.13.2
- matplotlib==3.8.0
- matplotlib-inline==0.1.6
- refinitiv-data==1.6.0
- scikit-learn==1.4.2
- scipy==1.9.3

Python Code Implementation

First we're importing necessary libraries and open the data library session (some other libraries such as scipy and sklearn will be imported in the step of visualization and analysis)

    	
            import pandas as pd
import numpy as np
 
import seaborn as sns
import matplotlib.pyplot as plt
 
import refinitiv.data as rd
 
rd.open_session()

Step 1) Prepare datasets for factor modeling.

In this demonstration utilized two datasets – asset price dataset and ESG scoring dataset. As the ESG score data is available in annually interval, our asset price dataset also contains the annually closing price of the asset. The ESG score dataset contains the ESG score across multiple domains of the corresponding asset, which will be used as factors in this modeling.

We're using RIC SCC.BK here, it can be changed to the instrument you're interested or chain, such as 0#.SETI for SET Index

The RIC (Refinitiv Identification Code) is a market-level identifier for instruments and pricing sources. To find the RIC you're looking for, you can use

Search Function
- Building Search into your Application Workflow
- Find content and functionality using Refinitiv Data Library with Eikon Advanced Search
Screener
- Find Your Right Companies with SCREENER

    	
            # Get annual pricing data with get_data function from year 2000 to 2024
fund_nav_df = rd.get_data(['SCC.BK'],
                          ['TR.PriceClose.date', 'TR.PriceClose'],
                          {'SDate': '2000-10-04', 'EDate': '2024-03-01', 'Frq': 'FY'})
fund_nav_df.set_index('Date', inplace=True)
fund_nav_df

    	
            # Plot graph to see the trend of pricing
price_close = fund_nav_df['Price Close']
price_close.plot()

Prepare ESG score data

    	
            # Get annual ESG score data with get_data function from year 2000 to 2024
df = rd.get_data(
    universe = ['SCC.BK'],
    fields = [
        'TR.TRESGScore.date',
        'TR.TRESGScore',
        'TR.TRESGCScore',
        'TR.TRESGCControversiesScore',
        'TR.TRESGResourceUseScore',
        'TR.TRESGEmissionsScore',
        'TR.TRESGInnovationScore',
        'TR.TRESGWorkforceScore',
        'TR.TRESGHumanRightsScore',
        'TR.TRESGCommunityScore',
        'TR.TRESGProductResponsibilityScore',
        'TR.TRESGManagementScore',
        'TR.TRESGShareholdersScore',
        'TR.TRESGCSRStrategyScore'
    ],
    parameters = {'Period':'FY0','Frq':'FY','SDate':'0','EDate':'-24'}
)
 
# Format data
factor_df = df
factor_df.sort_values(by = ['Date'], ascending = [True], inplace = True)
factor_df.dropna(inplace=True)
factor_df

Step 2) Calculate Log Return of asset

Calculate annually log returns from asset price dataset to observe price behavior, which will serve as the dependent variable in regression analysis.

    	
            # Calculate Log Return of Asset
 
log_returns = np.log(price_close / price_close.shift(1))
fund_nav_df["log_returns"] = log_returns
fund_nav_df

Step 3) Merge Datasets and Map ESG Scores

Merge datasets by years into a single dataframe and map ESG scores to corresponding assets.

    	
            fund_nav_df.drop(columns=['Instrument', 'Price Close'], inplace = True)
fund_nav_df['Year'] = fund_nav_df.index.strftime('%Y')
 
factor_df['Year'] = factor_df['Date'].dt.strftime('%Y')
 
df_beta = pd.merge(fund_nav_df, factor_df , on='Year', how='left').dropna()
df_beta

Step 4) Linear Regression

Performing linear regression on the entire dataset to assess the relationship between log returns and ESG factors. Log return data are assigned to y value and all corresponding factors are assigned as X value. Calculate R-Squared and regression intercept (Alpha) values.

    	
            from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
 
# Log return data are assigned to y value
y = df_beta.iloc[:, 1:2].to_numpy()
# All corresponding factors are assigned as X value
X = df_beta.drop(columns=['log_returns','Year','Instrument','Date'])
 
# coefficients linear regression
reg = LinearRegression(positive = True).fit(X, y)
liber_coeff = pd.DataFrame(reg.coef_, columns=reg.feature_names_in_)
 
# Calculate R-Squared and regression intercept (Alpha) values
R2 = r2_score( y, reg.predict(X))
liber_coeff["R2"] = R2
liber_coeff["Alpha"]=reg.intercept_
 
liber_coeff

Then we plot it as a heatmap graph as below

    	
            sns.heatmap(liber_coeff.T, annot=True, cmap='coolwarm')
plt.show()

To capture any relationship between different time periods within the dataset, performing linear regression on a subset of the dataset. Identifying variations in the relationship between variables across subsets.

    	
            # Performing linear regression on a subset of the dataset
# Identifying variations in the relationship between variables across subsets.
 
list_factor_year = []
 
x = range(0, len(df_beta), 3)
 
for n in x:
    df_beta_ = df_beta.iloc[n:n+2].drop(columns=['Date'])
    y = df_beta_.iloc[:, 1:2].to_numpy()
    X = df_beta_.drop(columns=['log_returns','Year','Instrument'])
    
    # Coefficients linear regression
    reg = LinearRegression(positive = True).fit(X, y)
    liber_coeff = pd.DataFrame(reg.coef_, columns=reg.feature_names_in_)
    R2 = r2_score(y, reg.predict(X))
    liber_coeff['R2'] = R2
    liber_coeff['Alpha']=reg.intercept_
 
    liber_coeff['Year'] = df_beta_.Year.iloc[-1]
    list_factor_year.append(liber_coeff)
 
df_factor_time = pd.concat(list_factor_year)
df_factor_time.set_index('Year', inplace=True)
df_factor_time

Then we plot it as a heatmap graph as below

    	
            sns.heatmap(df_factor_time.head(10), annot=True, cmap='coolwarm')
plt.show()

Step 5) Style Analysis

Find the optimal weights that minimizes the Tracking error between a portfolio of the explanatory variables(X) and the dependent variable(y). First, defining the styles_analysis function, then assigning asset log return (y value) as dependent variable and ESG factors (X value) as explanatory variables.

    	
            # Find the optimal weights that minimizes the Tracking error
# between a portfolio of the explanatory variables(X) and the dependent variable(y).
 
from scipy.optimize import minimize
 
def tracking_error(r_a, r_b):
    """
    Returns the Tracking Error between the two return series
    """
    return np.sqrt(((r_a - r_b)**2).sum())
 
def portfolio_tracking_error(weights, ref_r, bb_r):
    """
    returns the tracking error between the reference returns
    and a portfolio of building block returns held with given weights
    """
    return tracking_error(ref_r, (weights*bb_r).sum(axis=1))
 
# Defining the styles_analysis function
 
def style_analysis(dependent_variable, explanatory_variables):
    """
    Returns the optimal weights that minimizes the Tracking error between
    a portfolio of the explanatory variables and the dependent variable
    """
    n = explanatory_variables.shape[1]
    init_guess = np.repeat(1/n, n)
    bounds = ((0.0, 1.0),) * n # an N-tuple of 2-tuples!
    # construct the constraints
    weights_sum_to_1 = {'type': 'eq',
                        'fun': lambda weights: np.sum(weights) - 1
    }
    solution = minimize(portfolio_tracking_error, init_guess,
                       args=(dependent_variable, explanatory_variables,), method='SLSQP',
                       options={'disp': False},
                       constraints=(weights_sum_to_1,),
                       bounds=bounds)
    weights = pd.Series(solution.x, index=explanatory_variables.columns)
    return weights

    	
            # Assigning asset log return (y value) as dependent variable and ESG factors (X value) as explanatory variables
dependent_variable = df_beta[['log_returns']]
explanatory_variables =  df_beta.iloc[:, 4:-1]
 
# Find the optimal weights that minimizes the Tracking error
weights_b = style_analysis(dependent_variable['log_returns'], explanatory_variables)

Then we plot it as a bar chart as below

    	
            weights_b.sort_values(ascending=False).plot.bar()

Here's the weights used to plot the chart above

    	
            pd.set_option('display.float_format', lambda x: '%.15f' % x)
weights_b

Conclusion

In conclusion, the integration of Environmental, Social, and Governance (ESG) factors into factor modeling represents a significant advancement in investment analysis. ESG considerations have become increasingly essential for investors seeking to align their portfolios with sustainability values while managing risk effectively.

Reference

Register or Log in to applaud this article
Let the author know how much this article helped you

If you require assistance, please contact us here

GitHub - ESG Factor Modeling With Python

Refinitiv Data Library for Python

Articles

Developer Tools

Use Cases

Upcoming Events

Past Events

About Us