Data Analysis and Interpretation: Test of a Basic Linear Regression Model

Introduction

My aim is to test the association between per-person residential electricity consumption and income per capita using a basic linear regression. The data set comes from Gapminder (gapminder.org).

Data Preparation

The variables of interest are "relectricperperson" (the response variable) and "incomeperperson" (the explanatory variable).

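Before running the regression, I center the explanatory variable, "incomeperperson", by subtracting its mean. Centering leaves the slope unchanged but makes the intercept interpretable as the predicted electricity consumption at the mean income, rather than at an income of zero.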
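To see why centering changes only the intercept, here is a minimal sketch on made-up numbers (the arrays `x` and `y` are hypothetical, not Gapminder values). It fits a straight line with `numpy.polyfit` before and after centering: the slope should be identical, and the centered intercept should equal the mean of `y`.

```python
import numpy as np

# Hypothetical toy data (not taken from Gapminder):
# x plays the role of income per person, y of electricity consumption.
x = np.array([1000.0, 4000.0, 9000.0, 15000.0, 30000.0])
y = np.array([300.0, 700.0, 1500.0, 2100.0, 4200.0])

# Least-squares straight line; polyfit returns [slope, intercept] for degree 1.
slope_raw, intercept_raw = np.polyfit(x, y, 1)

# Center the predictor by subtracting its mean, then refit.
x_centered = x - x.mean()
slope_c, intercept_c = np.polyfit(x_centered, y, 1)

print(f"raw:      slope = {slope_raw:.5f}, intercept = {intercept_raw:.2f}")
print(f"centered: slope = {slope_c:.5f}, intercept = {intercept_c:.2f}")
print(f"mean of y = {y.mean():.2f}")  # matches the centered intercept
```

Because a least-squares line with an intercept always passes through the point (mean of x, mean of y), shifting x so that its mean is zero moves the intercept to the mean of y without touching the slope.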

Code

import pandas
import numpy
import seaborn
import os
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# define a function to load the data set of interest
def load_data(data_dir, csv_file):
    """Read csv_file from data_dir (relative to the current working directory)."""
    data_path = os.path.join(os.getcwd(), data_dir)
    data_file = os.path.join(data_path, csv_file)
    data = pandas.read_csv(data_file, low_memory=False)
    return data

# display floats in fixed-point notation (workaround for display-format errors)
pandas.set_option('display.float_format', lambda x: '%f' % x)

# Set pandas to display all columns and rows in DataFrame
pandas.set_option('display.max_rows', None)
pandas.set_option('display.max_columns', None)

# loading data
data = load_data('data', 'gapminder.csv')
print(data)

# make a copy of the data frame
reg_data = data.copy()

# extract the columns for the variables of interest
reg_data = reg_data[['incomeperperson', 'relectricperperson']]

print(reg_data)

# convert the variables of interest to numeric; entries that cannot be
# parsed (e.g. blank cells) become NaN
reg_data['incomeperperson'] = pandas.to_numeric(
    reg_data['incomeperperson'], errors='coerce')
reg_data['relectricperperson'] = pandas.to_numeric(
    reg_data['relectricperperson'], errors='coerce')
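If a column contains entries that are not valid numbers (for example blank cells), `errors='coerce'` keeps `pandas.to_numeric` from raising an exception and turns those entries into `NaN` instead. A small sketch with a hypothetical Series `raw`:

```python
import pandas as pd

# Hypothetical raw column read as text: a blank cell and a placeholder among numbers
raw = pd.Series(["1.5", " ", "2000", "n/a"])

# errors="coerce" converts anything that cannot be parsed into NaN instead of raising
converted = pd.to_numeric(raw, errors="coerce")
print(converted)
print("missing after conversion:", converted.isna().sum())
```

Those `NaN` values are what later cause rows to be left out of the regression.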

Centering the explanatory variable "incomeperperson"

mean_percapita_income = numpy.mean(reg_data['incomeperperson'])
print(mean_percapita_income)

8740.96607617579

reg_data['incomeperperson_centered'] = (
    reg_data['incomeperperson'] - mean_percapita_income)

print(reg_data)

Checking that the centered "incomeperperson" variable has a mean of zero (up to floating-point rounding error)

mean_percapitaincome_centered = numpy.mean(
    reg_data['incomeperperson_centered'])

print(mean_percapitaincome_centered)

-1.1488354127658041e-13
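One caveat: `numpy.mean` on a pandas Series skips missing values, so the mean above is taken over every country that has an income value, while the statsmodels formula interface drops any row that is missing either variable (the summary reports 130 observations). The centered predictor therefore averages zero over all countries with income data, not necessarily over the regression sample. If you want it centered within the regression sample, one option is to drop incomplete rows first. A minimal sketch on a hypothetical four-row DataFrame `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical mini data set: one country lacks electricity data
df = pd.DataFrame({
    "incomeperperson": [1000.0, 5000.0, 9000.0, 30000.0],
    "relectricperperson": [200.0, 800.0, np.nan, 3500.0],
})

# Mean over every non-missing income value (pandas skips NaN, as numpy.mean on a Series does)
mean_all = df["incomeperperson"].mean()

# Keep only rows usable in the regression, then center within that sample
complete = df.dropna(subset=["incomeperperson", "relectricperperson"]).copy()
complete["incomeperperson_centered"] = (
    complete["incomeperperson"] - complete["incomeperperson"].mean()
)

print("mean over all incomes:", mean_all)
print("mean over regression sample:", complete["incomeperperson"].mean())
print("centered mean in regression sample:", complete["incomeperperson_centered"].mean())
```

Either way the slope is unchanged; only the intercept (the prediction at whichever mean was subtracted) shifts.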

Basic Regression

Testing the association between centered "incomeperperson" and "relectricperperson"

scat1 = seaborn.regplot(x='incomeperperson_centered', y='relectricperperson',
                        fit_reg=True, data=reg_data)
plt.xlabel('Income Per Person, centered (constant 2008 USD)')
plt.ylabel('Per Person Electric Consumption (kWh)')
plt.title('Scatterplot for the Association Between Income and '
          'Electricity Consumption Globally')
plt.show()

print("OLS regression model for the association between income per person and "
      "real electric consumption per person")
reg_model = smf.ols('relectricperperson ~ incomeperperson_centered',
                    data=reg_data).fit()

print(reg_model.summary())

Program Output

OLS regression model for the association between income per person and real electric consumption per person
                            OLS Regression Results
==============================================================================
Dep. Variable:     relectricperperson   R-squared:                       0.425
Model:                            OLS   Adj. R-squared:                  0.420
Method:                 Least Squares   F-statistic:                     94.47
Date:                Sun, 17 Jan 2016   Prob (F-statistic):           4.63e-17
Time:                        12:48:43   Log-Likelihood:                -1105.9
No. Observations:                 130   AIC:                             2216.
Df Residuals:                     128   BIC:                             2222.
Df Model:                           1
Covariance Type:            nonrobust
=========================================================================================
                               coef    std err          t      P>|t|   [95.0% Conf. Int.]
-----------------------------------------------------------------------------------------
Intercept                 1144.6759    105.855     10.814      0.000    935.223  1354.129
incomeperperson_centered     0.0904      0.009      9.719      0.000      0.072     0.109
=========================================================================================
Omnibus:                      148.000   Durbin-Watson:                   2.123
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4079.319
Skew:                           4.030   Prob(JB):                         0.00
Kurtosis:                      29.232   Cond. No.                     1.14e+04
==============================================================================
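For a model with one predictor, the coefficients and R-squared in the table above have simple closed forms: slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2), intercept = mean(y) - slope * mean(x), and R-squared = 1 - (residual sum of squares) / (total sum of squares). The sketch below computes them with numpy; the helper `simple_ols` and the data are hypothetical, written only for illustration and for sanity-checking an OLS summary on small inputs.

```python
import numpy as np

def simple_ols(x, y):
    """Hypothetical helper: fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    slope = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    # R-squared = 1 - (residual sum of squares) / (total sum of squares)
    r_squared = 1.0 - np.sum(residuals ** 2) / np.sum(y_dev ** 2)
    return intercept, slope, r_squared

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

intercept, slope, r_squared = simple_ols(x, y)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}, R-squared = {r_squared:.4f}")

# Cross-check against numpy's own least-squares line fit
slope_ref, intercept_ref = np.polyfit(x, y, 1)
print(f"polyfit:   intercept = {intercept_ref:.4f}, slope = {slope_ref:.4f}")
```

Applied to the same 130 complete rows, these formulas should give the same intercept and slope as `smf.ols`; with a single predictor, R-squared also equals the squared Pearson correlation between x and y.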

Result Summary

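The results of my linear regression show that income per person was significantly and positively associated with per-person electricity consumption (beta = 0.0904, p < 0.001). Because the explanatory variable was centered, the intercept (1144.6759, p < 0.001) is the predicted electricity consumption, in kWh per person, for a country whose income per person equals the centering mean of about 8,741 USD. Each one-unit (1 USD) increase in income per person is associated with a 0.0904 kWh increase in per-person electricity consumption, or roughly 90 kWh per additional 1,000 USD. Income per person accounts for about 42.5% of the variance in electricity consumption (R-squared = 0.425).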
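To make the coefficients concrete, the sketch below turns the reported intercept and slope into predicted per-person electricity consumption for a few incomes. `predicted_consumption` is a hypothetical helper; it assumes the centering mean printed earlier (8740.96607617579), and its predictions are only meaningful within the range of incomes in the data.

```python
# Values copied from the regression output and the centering step above.
INTERCEPT = 1144.6759              # predicted kWh per person at the centering mean
SLOPE = 0.0904                     # kWh per person per 1 USD of income per person
CENTERING_MEAN = 8740.96607617579  # mean income subtracted when centering

def predicted_consumption(income_per_person):
    """Fitted electricity consumption (kWh per person) for an uncentered income."""
    return INTERCEPT + SLOPE * (income_per_person - CENTERING_MEAN)

for income in (CENTERING_MEAN, 10000.0, 20000.0):
    print(f"income {income:>10.2f} USD -> {predicted_consumption(income):8.1f} kWh")

# Effect of an extra 1,000 USD of income per person
print(f"per 1,000 USD: {SLOPE * 1000:.1f} kWh")
```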

Saturday, January 16, 2016