ESG Factor Modeling With Python

QuantCorner
QuantCorner community Thailand
Raksina Samasiri
Developer Advocate

Introduction to ESG

In recent years, sustainability in business has drawn growing attention, and investors increasingly recognize the importance of Environmental, Social, and Governance (ESG) factors in investment analysis. These factors are seen as integral to evaluating a company's long-term sustainability and risk profile. Many investors seek to align their portfolios with their sustainability values, which has led to the rise of ESG investing.

ESG is a framework for evaluating a company's performance and impact across three main areas:

  • Environmental: Assessing a company's impact on the environment, including its carbon footprint, pollution levels, and resource management practices.
  • Social: Examining a company's relationships with its stakeholders, including employees, customers, and communities. This could involve factors like labor practices, diversity and inclusion, and community engagement.
  • Governance: Focusing on a company's leadership structure, transparency, accountability, and risk management practices. Strong governance helps ensure long-term sustainability and mitigates risks like fraud or mismanagement.

London Stock Exchange Group (LSEG) has developed its own ESG scoring framework, which aims to provide a transparent and objective measure of a company’s ESG performance. The framework evaluates companies on their relative performance, commitment, and effectiveness across 10 main themes, covering a wide range of criteria from environmental sustainability to corporate governance and social responsibility.

Factor Modeling

Factor modeling seeks to explain the returns of a stock or portfolio by attributing them to various factors, each representing a specific characteristic or attribute that influences stock prices. Factor models are typically constructed from historical data using techniques such as regression analysis. ESG scores can be integrated into a factor model by treating them as additional factors, which allows investors to incorporate sustainability considerations into their investment strategies.
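
As a sketch of the idea, a linear factor model of the kind described above is commonly written as shown below. The symbols are notation chosen for this illustration and do not come from the original text: r_t is the asset return in period t, f_{k,t} is the value of factor k in that period (here, an ESG score), β_k is the sensitivity of the return to factor k, α is the intercept, and ε_t is the part of the return the factors do not explain.

```latex
r_t = \alpha + \sum_{k=1}^{K} \beta_k \, f_{k,t} + \varepsilon_t
```

Regression analysis estimates α and the β_k values from historical observations of r_t and f_{k,t}.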

Prerequisites

  • LSEG Workspace application with access to an RD Library desktop session.
  • Python 3.9 or above
  • Python libraries
    • pandas==2.1.4
    • numpy==1.25.0
    • seaborn==0.13.2
    • matplotlib==3.8.0
    • matplotlib-inline==0.1.6
    • refinitiv-data==1.6.0
    • scikit-learn==1.4.2
    • scipy==1.9.3

Python Code Implementation

First, we import the necessary libraries and open a data library session. Other libraries, such as scipy and sklearn, are imported later in the visualization and analysis steps.

import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

import refinitiv.data as rd

# Open the data library session
rd.open_session()

Step 1) Prepare datasets for factor modeling. 

This demonstration uses two datasets: an asset price dataset and an ESG score dataset. Because ESG scores are available at annual intervals, the asset price dataset also contains annual closing prices. The ESG score dataset contains the asset's ESG scores across multiple domains, which serve as the factors in this model.

We use the RIC SCC.BK here. You can change it to any instrument you're interested in, or to a chain such as 0#.SETI for the SET Index.

The RIC (Refinitiv Identification Code) is a market-level identifier for instruments and pricing sources. To find the RIC you're looking for, you can use 

    	
            

# Get annual pricing data with the get_data function, from 2000 to 2024
fund_nav_df = rd.get_data(['SCC.BK'],
                          ['TR.PriceClose.date', 'TR.PriceClose'],
                          {'SDate': '2000-10-04', 'EDate': '2024-03-01', 'Frq': 'FY'})
fund_nav_df.set_index('Date', inplace=True)
fund_nav_df

# Plot the closing price to see its trend
price_close = fund_nav_df['Price Close']
price_close.plot()

Prepare ESG score data

    	
            

# Get annual ESG score data with the get_data function, from 2000 to 2024
df = rd.get_data(
    universe = ['SCC.BK'],
    fields = [
        'TR.TRESGScore.date',
        'TR.TRESGScore',
        'TR.TRESGCScore',
        'TR.TRESGCControversiesScore',
        'TR.TRESGResourceUseScore',
        'TR.TRESGEmissionsScore',
        'TR.TRESGInnovationScore',
        'TR.TRESGWorkforceScore',
        'TR.TRESGHumanRightsScore',
        'TR.TRESGCommunityScore',
        'TR.TRESGProductResponsibilityScore',
        'TR.TRESGManagementScore',
        'TR.TRESGShareholdersScore',
        'TR.TRESGCSRStrategyScore'
    ],
    parameters = {'Period':'FY0','Frq':'FY','SDate':'0','EDate':'-24'}
)

# Format data: sort by date and drop rows with missing scores
factor_df = df
factor_df.sort_values(by = ['Date'], ascending = [True], inplace = True)
factor_df.dropna(inplace=True)
factor_df

Step 2) Calculate Log Return of asset

Calculate annual log returns from the asset price dataset to observe price behavior. These returns serve as the dependent variable in the regression analysis.

# Calculate the log return of the asset: ln(P_t / P_{t-1})
log_returns = np.log(price_close / price_close.shift(1))
fund_nav_df["log_returns"] = log_returns
fund_nav_df
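
To make the calculation concrete, here is a minimal sketch on made-up numbers. The prices below are hypothetical and are not SCC.BK data; the point is only to show what `shift(1)` does and why the first observation has no return.

```python
import numpy as np
import pandas as pd

# Hypothetical year-end closing prices, for illustration only
prices = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.to_datetime(["2020-12-31", "2021-12-31", "2022-12-31"]),
    name="Price Close",
)

# Log return for year t: ln(P_t / P_{t-1}).
# shift(1) moves each price down one row, so the first year has no prior price (NaN).
log_returns = np.log(prices / prices.shift(1))

# Log returns add up over time: their sum equals ln(P_last / P_first)
total_log_return = log_returns.sum()
```

Because the first row is NaN, it has to be dropped before regression, which the `dropna()` call in the merge step takes care of.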

Step 3) Merge Datasets and Map ESG Scores

Merge the two datasets by year into a single dataframe, so that each annual log return is paired with the ESG scores for the same year.

fund_nav_df.drop(columns=['Instrument', 'Price Close'], inplace=True)
fund_nav_df['Year'] = fund_nav_df.index.strftime('%Y')

factor_df['Year'] = factor_df['Date'].dt.strftime('%Y')

# Left join on Year, then drop years that lack a log return or an ESG score
df_beta = pd.merge(fund_nav_df, factor_df, on='Year', how='left').dropna()
df_beta
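
The following sketch shows the same merge pattern on small made-up tables, so the effect of `how='left'` followed by `dropna()` is easy to see. All values and the single `ESG Score` column are hypothetical.

```python
import pandas as pd

# Hypothetical log returns indexed by date, for illustration only
returns = pd.DataFrame(
    {"log_returns": [float("nan"), 0.10, -0.05, 0.02]},
    index=pd.to_datetime(["2019-12-31", "2020-12-31", "2021-12-31", "2022-12-30"]),
)
returns["Year"] = returns.index.strftime("%Y")

# Hypothetical ESG scores, available for only two of the four years
scores = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-12-31", "2021-12-31"]),
        "ESG Score": [70.0, 75.0],
    }
)
scores["Year"] = scores["Date"].dt.strftime("%Y")

# The left join keeps every return row; dropna() then removes
# 2019 (no return) and 2022 (no score)
merged = pd.merge(returns, scores, on="Year", how="left").dropna()
```

Note that merging on a column discards the date index of the left table, and the merged columns appear in the order left columns first, then the remaining right columns. Selecting columns by name in later steps avoids depending on that order.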

Step 4) Linear Regression

Perform linear regression on the entire dataset to assess the relationship between log returns and the ESG factors. The log returns are assigned to y and all ESG factor columns are assigned to X. Then calculate the R-squared and the regression intercept (alpha).

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Log returns are the dependent variable; select the column by name
# and keep y 2-D so that reg.coef_ has a single row
y = df_beta[['log_returns']].to_numpy()
# All ESG factor columns are the explanatory variables
X = df_beta.drop(columns=['log_returns', 'Year', 'Instrument', 'Date'])

# Fit the regression; positive=True constrains the coefficients to be non-negative
reg = LinearRegression(positive=True).fit(X, y)
liber_coeff = pd.DataFrame(reg.coef_, columns=reg.feature_names_in_)

# Calculate R-squared and the regression intercept (alpha)
R2 = r2_score(y, reg.predict(X))
liber_coeff["R2"] = R2
liber_coeff["Alpha"] = reg.intercept_

liber_coeff

Then we plot the coefficients as a heatmap:

sns.heatmap(liber_coeff.T, annot=True, cmap='coolwarm')
plt.show()
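
The regression above is fitted with `positive=True`, which restricts every coefficient to be zero or greater. The sketch below, on synthetic data with hypothetical factor names, compares that constrained fit with an ordinary unconstrained fit. It is an illustration of the option's effect, not a statement about the SCC.BK results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical factor scores (0-100) for 20 periods, for illustration only
X = pd.DataFrame(
    rng.uniform(0, 100, size=(20, 3)),
    columns=["Factor A", "Factor B", "Factor C"],
)

# Synthetic returns: Factor A helps, Factor B hurts, Factor C is irrelevant
y = 0.002 * X["Factor A"] - 0.001 * X["Factor B"] + rng.normal(0, 0.01, size=20)

unconstrained = LinearRegression().fit(X, y)
constrained = LinearRegression(positive=True).fit(X, y)

# Side-by-side view of the two sets of coefficients
coefficients = pd.DataFrame(
    {"unconstrained": unconstrained.coef_, "positive=True": constrained.coef_},
    index=X.columns,
)
r2_unconstrained = r2_score(y, unconstrained.predict(X))
r2_constrained = r2_score(y, constrained.predict(X))
```

In this setup the unconstrained fit can assign Factor B a negative coefficient, while the constrained fit cannot, so its in-sample R-squared is never higher. A factor that shows a zero coefficient under `positive=True` may therefore have a negative relationship with returns rather than no relationship.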

To capture how the relationship changes over time, we also perform linear regression on successive subsets of the dataset and compare the coefficients across subsets.

# Perform linear regression on subsets of the dataset
# to identify variations in the relationship between variables across subsets
list_factor_year = []

x = range(0, len(df_beta), 3)

for n in x:
    # Each subset is the first two rows of every block of three rows
    df_beta_ = df_beta.iloc[n:n+2].drop(columns=['Date'])
    # Select the log returns by name and keep y 2-D so that reg.coef_ has a single row
    y = df_beta_[['log_returns']].to_numpy()
    X = df_beta_.drop(columns=['log_returns', 'Year', 'Instrument'])

    # Fit the regression with non-negative coefficients
    reg = LinearRegression(positive=True).fit(X, y)
    liber_coeff = pd.DataFrame(reg.coef_, columns=reg.feature_names_in_)
    R2 = r2_score(y, reg.predict(X))
    liber_coeff['R2'] = R2
    liber_coeff['Alpha'] = reg.intercept_

    # Label each subset with its last year
    liber_coeff['Year'] = df_beta_.Year.iloc[-1]
    list_factor_year.append(liber_coeff)

df_factor_time = pd.concat(list_factor_year)
df_factor_time.set_index('Year', inplace=True)
df_factor_time

Then we plot the coefficients of the first 10 subsets as a heatmap:

sns.heatmap(df_factor_time.head(10), annot=True, cmap='coolwarm')
plt.show()
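
How the rows are split into subsets depends on two numbers: the step between subset start positions and the number of rows taken at each start. The sketch below uses a hypothetical helper, `split_rows`, on a made-up table of years to show how the two interact. It is only an illustration of `range` and `iloc` slicing.

```python
import pandas as pd

# Hypothetical yearly table with 8 rows, for illustration only
df = pd.DataFrame({"Year": [str(year) for year in range(2015, 2023)]})

def split_rows(frame, step, width):
    """Return the row subsets frame.iloc[n:n + width] for n = 0, step, 2*step, ..."""
    return [frame.iloc[n:n + width] for n in range(0, len(frame), step)]

# width == step: non-overlapping subsets that cover every row
full_cover = [list(s["Year"]) for s in split_rows(df, step=3, width=3)]

# width < step: the rows between subsets are left out
with_gaps = [list(s["Year"]) for s in split_rows(df, step=3, width=2)]
```

With a step of 3 and a width of 3, the subsets are 2015 to 2017, 2018 to 2020, and 2021 to 2022. With a width of 2, the years 2017 and 2020 do not appear in any subset. Also keep in mind that a subset with fewer rows than factors cannot pin down the coefficients uniquely, so results from very short subsets are best read as indicative only.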

Step 5) Style Analysis 

Style analysis finds the optimal weights that minimize the tracking error between a portfolio of the explanatory variables (X) and the dependent variable (y). First, we define the style_analysis function. Then we assign the asset log returns (y) as the dependent variable and the ESG factors (X) as the explanatory variables.
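
Written as an optimization problem, and using notation introduced here for illustration (y_t for the log return in period t, x_{k,t} for explanatory variable k in that period, w_k for its weight), the task described above can be sketched as:

```latex
\min_{w_1,\dots,w_K} \; \sqrt{\sum_{t=1}^{T} \Bigl( y_t - \sum_{k=1}^{K} w_k \, x_{k,t} \Bigr)^{2}}
\quad \text{subject to} \quad \sum_{k=1}^{K} w_k = 1, \qquad 0 \le w_k \le 1 .
```

The objective is the tracking error as defined in this article's code, and the two constraints correspond to the weights summing to 1 and each weight lying between 0 and 1.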

    	
            

# Find the optimal weights that minimize the tracking error
# between a portfolio of the explanatory variables (X) and the dependent variable (y)

from scipy.optimize import minimize

def tracking_error(r_a, r_b):
    """
    Returns the tracking error between the two return series
    """
    return np.sqrt(((r_a - r_b)**2).sum())

def portfolio_tracking_error(weights, ref_r, bb_r):
    """
    Returns the tracking error between the reference returns
    and a portfolio of building block returns held with given weights
    """
    return tracking_error(ref_r, (weights*bb_r).sum(axis=1))

# Define the style_analysis function
def style_analysis(dependent_variable, explanatory_variables):
    """
    Returns the optimal weights that minimize the tracking error between
    a portfolio of the explanatory variables and the dependent variable
    """
    n = explanatory_variables.shape[1]
    init_guess = np.repeat(1/n, n)  # start from equal weights
    bounds = ((0.0, 1.0),) * n  # an N-tuple of 2-tuples, one (min, max) pair per weight
    # Constraint: the weights must sum to 1
    weights_sum_to_1 = {'type': 'eq',
                        'fun': lambda weights: np.sum(weights) - 1
    }
    solution = minimize(portfolio_tracking_error, init_guess,
                        args=(dependent_variable, explanatory_variables,), method='SLSQP',
                        options={'disp': False},
                        constraints=(weights_sum_to_1,),
                        bounds=bounds)
    weights = pd.Series(solution.x, index=explanatory_variables.columns)
    return weights

    	
            

# Assign the asset log returns (y) as the dependent variable
# and the ESG factor columns (X) as the explanatory variables, selected by name
dependent_variable = df_beta[['log_returns']]
explanatory_variables = df_beta.drop(columns=['log_returns', 'Year', 'Instrument', 'Date'])

# Find the optimal weights that minimize the tracking error
weights_b = style_analysis(dependent_variable['log_returns'], explanatory_variables)

Then we plot the weights as a bar chart:

weights_b.sort_values(ascending=False).plot.bar()

Here are the weights used to plot the chart above:

pd.set_option('display.float_format', lambda x: '%.15f' % x)
weights_b
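
One way to build confidence in a constrained optimization like this is to run it on synthetic data where the answer is known. The sketch below is self-contained and uses made-up building-block series and hypothetical weights of 0.6, 0.3, and 0.1. It applies the same SLSQP setup as above (bounds of 0 to 1 and weights summing to 1) and checks whether those weights are approximately recovered.

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# Hypothetical building-block series, for illustration only
blocks = pd.DataFrame(
    rng.normal(0.0, 0.05, size=(60, 3)),
    columns=["Block A", "Block B", "Block C"],
)

# Target series built from known weights plus a little noise
true_weights = np.array([0.6, 0.3, 0.1])
target = blocks @ true_weights + rng.normal(0.0, 0.001, size=60)

def objective(weights):
    """Tracking error: square root of the summed squared differences."""
    return np.sqrt(((target - blocks @ weights) ** 2).sum())

n = blocks.shape[1]
solution = minimize(
    objective,
    np.repeat(1 / n, n),          # start from equal weights
    method="SLSQP",
    bounds=((0.0, 1.0),) * n,     # each weight between 0 and 1
    constraints=({"type": "eq", "fun": lambda w: np.sum(w) - 1},),
)
recovered = pd.Series(solution.x, index=blocks.columns)
```

If `recovered` is close to the known weights, the optimization setup behaves as intended. On real data there is no known answer, so the weights describe which mix of factors tracks the returns most closely under these constraints, not a causal attribution.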

Conclusion

In conclusion, the integration of Environmental, Social, and Governance (ESG) factors into factor modeling represents a significant advancement in investment analysis. ESG considerations have become increasingly essential for investors seeking to align their portfolios with sustainability values while managing risk effectively.
