Saturday, January 16, 2016

Test of a Basic Linear Regression Model

Introduction

My goal is to test the association between per-person electricity consumption and income per person using a basic linear regression. The data set comes from Gapminder (gapminder.org).

Data Preparation

The variables of interest are "relectricperperson" (the response variable) and "incomeperperson" (the explanatory variable).
I will center the explanatory variable, "incomeperperson", by subtracting its mean before performing the regression analysis.
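
As a minimal sketch of what centering does, the example below fits a straight line to raw and centered versions of the same explanatory variable. The numbers are hypothetical toy values, not Gapminder data, and numpy.polyfit stands in for the statsmodels call used later in this post. When the explanatory variable is centered on the mean of the same rows used in the fit, the slope is unchanged and the intercept becomes the mean of the response.

```python
import numpy

# Hypothetical toy values for illustration only (not from gapminder.csv)
income = numpy.array([1000.0, 2000.0, 4000.0, 8000.0, 16000.0])
electricity = numpy.array([150.0, 260.0, 420.0, 800.0, 1500.0])

# Centering: subtract the mean so the new variable has mean zero
income_centered = income - income.mean()

# numpy.polyfit with deg=1 returns the coefficients as [slope, intercept]
slope_raw, intercept_raw = numpy.polyfit(income, electricity, 1)
slope_centered, intercept_centered = numpy.polyfit(income_centered, electricity, 1)

print(slope_raw, slope_centered)                # the two slopes match
print(intercept_centered, electricity.mean())   # intercept equals the mean response
```

Centering therefore changes only how the intercept is read: it becomes the expected response at the average value of the explanatory variable rather than at zero.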

Code

import pandas
import numpy
import seaborn
import os
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# define method to load data of interest


def load_data(data_dir, csv_file):
    # build the file path relative to the current working directory
    DATA_PATH = os.path.join(os.getcwd(), data_dir)
    DATA_FILE = os.path.join(DATA_PATH, csv_file)
    # read the csv file into a DataFrame
    data = pandas.read_csv(DATA_FILE, low_memory=False)
    return data


# bug fix for display format to avoid run time errors
pandas.set_option('display.float_format', lambda x: '%f' % x)

# Set pandas to display all columns and rows in DataFrame
pandas.set_option('display.max_rows', None)
pandas.set_option('display.max_columns', None)


# loading data
data = load_data('data', 'gapminder.csv')
print(data)

#Making a copy of data frame
reg_data = data.copy()

# Extracting data pertinent to variables of interest
reg_data = \
reg_data[['incomeperperson', 'relectricperperson']]


print(reg_data)

# setting variables of interest to numeric
# (errors='coerce' turns values that cannot be parsed into NaN)
reg_data['incomeperperson'] = \
    pandas.to_numeric(reg_data['incomeperperson'], errors='coerce')
reg_data['relectricperperson'] = \
    pandas.to_numeric(reg_data['relectricperperson'], errors='coerce')


Center explanatory variable "incomeperperson"



mean_percapita_income = numpy.mean(reg_data['incomeperperson'])
print(mean_percapita_income)

8740.96607617579

reg_data['incomeperperson_centered'] =\
(reg_data['incomeperperson'] - mean_percapita_income)


print(reg_data)



Calculating mean of centered "incomeperperson" variable


mean_percapitaincome_centered =\
numpy.mean(reg_data['incomeperperson_centered'])

print(mean_percapitaincome_centered)

-1.1488354127658041e-13
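
The mean printed above is not exactly zero because of floating-point rounding; a value on the order of 1e-13 is zero for practical purposes. A small sketch of how this could be checked with numpy.isclose follows. The input values and the tolerance of 1e-9 are assumptions chosen for illustration.

```python
import numpy

# Hypothetical values for illustration only
values = numpy.array([0.1, 0.2, 0.3, 0.4])
centered = values - values.mean()
centered_mean = centered.mean()

# Compare against zero with an absolute tolerance instead of using ==
is_effectively_zero = numpy.isclose(centered_mean, 0.0, atol=1e-9)
print(is_effectively_zero)

# The same check applied to the mean reported in this post
reported_mean = -1.1488354127658041e-13
reported_is_zero = numpy.isclose(reported_mean, 0.0, atol=1e-9)
print(reported_is_zero)
```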

Basic Regression


Testing the association between incomeperperson (centered) and relectricperperson


scat1 = seaborn.regplot(x='incomeperperson_centered', y='relectricperperson',
                        fit_reg=True, data=reg_data)
plt.xlabel('Income Per Person (constant 2008 USD)')
plt.ylabel('Per Person Electric Consumption (kWh)')
# adjacent string literals are joined, which keeps the space after "and"
plt.title('Scatterplot for the Association Between income and '
          'electricity consumption globally')

print("OLS regression model for the association between income per person and \
real electric consumption per person")
reg_model = smf.ols('relectricperperson ~ incomeperperson_centered',
data=reg_data).fit()

print(reg_model.summary())

Program Output

OLS regression model for the association between income per person and real electric consumption per person
                                   OLS Regression Results
============================================================================================
Dep. Variable:     relectricperperson   R-squared:                       0.425
Model:                            OLS   Adj. R-squared:                  0.420
Method:                 Least Squares   F-statistic:                     94.47
Date:                Sun, 17 Jan 2016   Prob (F-statistic):           4.63e-17
Time:                        12:48:43   Log-Likelihood:                -1105.9
No. Observations:                 130   AIC:                             2216.
Df Residuals:                     128   BIC:                             2222.
Df Model:                           1
Covariance Type:            nonrobust
============================================================================================
                               coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------
Intercept                 1144.6759    105.855     10.814      0.000       935.223  1354.129
incomeperperson_centered     0.0904      0.009      9.719      0.000         0.072     0.109
============================================================================================
Omnibus:                      148.000   Durbin-Watson:                   2.123
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4079.319
Skew:                           4.030   Prob(JB):                         0.00
Kurtosis:                      29.232   Cond. No.                     1.14e+04
============================================================================================



Result Summary

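The results of my linear regression test demonstrate that income per person (beta = 0.0904, p < 0.001) was significantly and positively associated with per person electric consumption, F(1, 128) = 94.47, R-squared = 0.425. That is, a one unit increase in income per person is associated with an increase of 0.0904 kWh in per person electric consumption. Because the explanatory variable is centered, the intercept (alpha = 1144.6759, p < 0.001) is the expected per person electric consumption for a country at the mean income per person.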
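
To make the coefficients concrete, the sketch below plugs the intercept and slope from the regression output into the fitted equation relectricperperson = intercept + slope * incomeperperson_centered. The function name predict_electricity and the 1000 unit income difference are hypothetical choices for illustration.

```python
# Coefficients copied from the OLS output above
intercept = 1144.6759
slope = 0.0904


def predict_electricity(income_centered):
    """Predicted per person electric consumption for a centered income value."""
    return intercept + slope * income_centered


# A country at the mean income has a centered income of 0
at_mean = predict_electricity(0.0)

# A country 1000 income units above the mean
above_mean = predict_electricity(1000.0)

# The difference is slope * 1000, about 90.4 kWh
difference = above_mean - at_mean
print(at_mean, above_mean, difference)
```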
