Regression Analysis of Keynesian Consumption Function

Introduction

Regression analysis is a statistical tool used to estimate approximate linear relationships between variables. When specifying a model, it is necessary to distinguish between the dependent and independent variables. Regression models are also used to predict future values of the dependent variable. This paper carries out both simple and multiple regression analyses of the Keynesian consumption function, in which consumption is a function of income and wealth. Consumption data from a sample of 25 households are used in the analysis.

Scatter diagram

A scatter diagram is a graph that plots two related variables on a Cartesian plane. The independent variable is plotted on the x-axis while the dependent variable is plotted on the y-axis. In this case, the amount of consumption is plotted on the y-axis while wealth and annual disposable income are plotted on the x-axis, with a separate graph for each explanatory variable. A scatter diagram helps to establish whether a linear relationship exists between the two variables, which can be judged from the trend of the plotted points. The graph below shows the scatter diagram of consumer expenditure (Y) against disposable income (X₁).

Relationship between consumption and annual disposable income

The points on the diagram above slope upwards and concentrate along a line. This indicates a strong positive relationship between the variables. The correlation coefficient measures the strength of the correlation between consumption expenditure and annual disposable income.

The graph below shows the scatter diagram of consumer expenditure (Y) against the level of wealth (X₂).

Relationship between consumer expenditure and level of wealth

The points on the diagram above also slope upwards. This indicates a strong positive relationship between the variables. Thus, it is clear that both explanatory variables have a positive relationship with consumer expenditure.
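The correlation coefficient mentioned above can be computed directly from paired observations. The following is a minimal sketch in plain Python. The data values and the function name `pearson_r` are hypothetical and are not the 25-household sample used in this paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for illustration only
income = [100.0, 150.0, 200.0, 250.0, 300.0]
consumption = [95.0, 140.0, 198.0, 236.0, 291.0]

r = pearson_r(income, consumption)
print(f"r = {r:.4f}")
```

A value of r close to +1 corresponds to points that lie close to an upward-sloping line, which is the pattern described for both scatter diagrams.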

Regression of total consumer expenditure on annual disposable income

The dependent variable is the total consumer expenditure while the independent variable is the amount of annual disposable income. A sample of twenty-five households is used to estimate the regression equation.

When the ordinary least squares method is used, the regression line takes the simplified form shown below.

Simplified regression equation Y = b₀ + b₁X₁

Y = Consumer expenditure

X₁ = annual disposable income

The theoretical expectations are that b₀ can take any value and that b₁ > 0.

Regression Results

Variable		Coefficients of the variable
b₀	Y-intercept	0
b₁	Coefficient of annual disposable income	0.969973

From the above table, the regression equation can be written as Y = 0.969973X₁. The coefficient value of 0.969973 implies that as annual disposable income increases by one unit, consumer expenditure increases by 0.969973 units. The positive value of the coefficient implies a positive relationship between consumer expenditure and annual disposable income, as was evident in the scatter diagram. The regression line can be drawn on the scatter diagram as shown below.

In the above diagram, the line of best fit is shown by the plots of predicted consumption.
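The reported intercept of 0 suggests that the line was fitted through the origin, although the paper does not state this, so it is treated here as an assumption. Under that assumption the least squares slope is the sum of the products X·Y divided by the sum of X². The sketch below uses hypothetical data and a hypothetical function name.

```python
def slope_through_origin(xs, ys):
    """OLS slope for Y = b1 * X with the intercept fixed at zero."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / sum_xx


# Hypothetical values, not the household sample from the paper
income = [100.0, 200.0, 300.0, 400.0]
consumption = [98.0, 193.0, 292.0, 387.0]

b1 = slope_through_origin(income, consumption)
# Predicted consumption at each observed income; these points form the fitted line
fitted = [b1 * x for x in income]
print(f"b1 = {b1:.6f}")
```

Plotting `fitted` against `income` on the scatter diagram would give the line of best fit.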

Evaluation of regression model

Evaluation of the regression model can be done by testing the statistical significance of the variables. Testing statistical significance shows whether annual disposable income is a significant determinant of total consumer expenditure. A t-test is used since the sample size is small. A two-tailed t-test is carried out at a 95% level of confidence.

Null hypothesis: H₀: bᵢ = 0

Alternative hypothesis: H₁: bᵢ ≠ 0

The null hypothesis implies that the variable is not a significant determinant of consumer expenditure. The alternative hypothesis implies that the variable is a significant determinant of consumer expenditure. The table below summarizes the results of the t-tests.

	Variable	Computed t-value	Critical t (α = 0.05)	Decision
b₀	Y-intercept	N/A	1.9432	Ignore
b₁	Coefficient of annual disposable income	18.93113	1.9432	Reject

From the table above, the computed t-value is greater than the tabulated t-value. Therefore, the null hypothesis is rejected, which implies that annual disposable income is a significant explanatory variable. Thus, annual disposable income is statistically significant at the 5% level of significance.
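The decision rule used in the table can be written as a short function: the null hypothesis is rejected when the absolute value of the computed t exceeds the tabulated value. The sketch below takes the computed and critical values from the table as given. The function name is hypothetical.

```python
def t_test_decision(t_stat, t_crit):
    """Two-tailed rule: reject H0 when |t| exceeds the critical value."""
    return "Reject" if abs(t_stat) > t_crit else "Do not reject"


T_CRITICAL = 1.9432  # tabulated value as reported in the paper
t_income = 18.93113  # computed t for the income coefficient, from the table

print(t_test_decision(t_income, T_CRITICAL))
```

Using the absolute value matters for a two-tailed test, since a large negative t is as much evidence against the null hypothesis as a large positive one.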

R-square value

The coefficient of determination estimates the proportion of the variation in the dependent variable that is explained by the independent variables. A high coefficient of determination implies that the explanatory variables adequately explain variations in consumer expenditure, while a low value implies that they do not. For this regression, the value of R² is 93.72%. This implies that annual disposable income explains 93.72% of the variation in consumer expenditure, which indicates a strong explanatory variable. The value of adjusted R² is also high at 89.56%. The value of R² can be raised by adding more variables to the regression model.

Analysis of variance

Item	Value	Proportion
Total sum of squares (TSS)	832146.8	100.00%
Residual sum of squares (RSS)	52228.46	6.28%
Explained sum of squares (ESS)	779918.3	93.72%

The ESS is greater than the RSS by a large margin. From the table, the explained share of the total sum of squares (93.72%) is equal to the value of R² discussed above. It shows that the model is relevant in explaining the variations in consumer expenditure. This is consistent with the real-life situation in which people's consumption depends heavily on disposable income.
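The link between the analysis of variance table and R² can be checked with a few lines of arithmetic. The sketch below uses the TSS and RSS values from the table. The variable names are illustrative.

```python
tss = 832146.8  # total sum of squares, from the table
rss = 52228.46  # residual sum of squares, from the table

ess = tss - rss        # explained sum of squares
r_squared = ess / tss  # share of total variation explained by the model
rss_share = rss / tss  # share of total variation left unexplained

print(f"ESS = {ess:.1f}")
print(f"R^2 = {r_squared:.2%}, RSS share = {rss_share:.2%}")
```

The two shares sum to one, which is why the explained share and R² coincide.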

Regression of total consumer expenditure on level of wealth

The dependent variable is the total consumer expenditure while the independent variable is the level of wealth.

When the ordinary least squares method is used, the regression line takes the simplified form shown below.

Simplified regression equation Y = b₀ + b₂X₂

Y = Consumer expenditure

X₂ = level of wealth

The theoretical expectations are that b₀ can take any value and that b₂ > 0.

Regression Results

Variable		Coefficients of the variable
b₀	Y-intercept	0
b₂	Coefficient of the level of wealth	0.254056

From the above table, the regression equation can be written as Y = 0.254056X₂. The coefficient value of 0.254056 implies that as the level of wealth increases by one unit, consumer expenditure increases by 0.254056 units. The positive value of the coefficient implies a positive relationship between consumer expenditure and the level of wealth, as was evident in the scatter diagram. The regression line can be drawn on the scatter diagram as shown below.

In the above diagram, the line of best fit is shown by the plots of predicted consumption that tend to take a straight line.

Evaluation of regression model

A two-tailed t-test is carried out at a 95% level of confidence.

Null hypothesis: H₀: bᵢ = 0

Alternative hypothesis: H₁: bᵢ ≠ 0

	Variable	Computed t-value	Critical t (α = 0.05)	Decision
b₀	Y-intercept	N/A	1.9432	Ignore
b₂	Coefficient of the level of wealth	15.61478	1.9432	Reject

From the table above, the computed t-value is greater than the tabulated t-value. Therefore, the null hypothesis is rejected, which implies that the level of wealth is a significant explanatory variable. Thus, the level of wealth is statistically significant at the 5% level of significance. The regression coefficient shows a positive relationship between consumer expenditure and the level of wealth.

R-square value

The value of R² is 91.04%. This implies that the level of wealth explains 91.04% of the variation in consumer expenditure, which indicates a strong explanatory variable. The value of adjusted R² is also high at 86.87%.

Analysis of variance

Item	Value	Proportion
Total sum of squares (TSS)	832146.8	100.00%
Residual sum of squares (RSS)	74570.31	8.96%
Explained sum of squares (ESS)	757576.4	91.04%

The ESS is greater than the RSS by a large margin. From the table, the explained share of the total sum of squares (91.04%) is equal to the value of R² discussed above. It shows that the model is relevant in explaining the variations in consumer expenditure. This is consistent with the life-cycle hypothesis of consumption.

Prediction

Using the regression line Y = 0.969973X₁, the values of consumer expenditure can be predicted as below.

X₁	Y = 0.969973X₁
373	361.8000195
191.6	185.8468733
247.12	239.6997877
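The predictions in the table can be reproduced by multiplying each income value by the estimated slope. The sketch below uses the slope as reported to six decimal places, so its results differ from the table in about the fifth decimal place. The table appears to have been computed with the unrounded coefficient, which is an inference rather than something the paper states.

```python
B1 = 0.969973  # slope as reported in the paper (rounded to six decimals)


def predict_consumption(income):
    """Predicted consumer expenditure for a given annual disposable income."""
    return B1 * income


for x1 in (373, 191.6, 247.12):
    print(x1, round(predict_consumption(x1), 4))
```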

Multiple regression analysis

Regression of total consumer expenditure on disposable income and level of wealth

The dependent variable is the total consumer expenditure while the independent variables are disposable income and the level of wealth. The regression line will attempt to establish a linear relationship between consumer expenditure, disposable income, and level of wealth.

When the ordinary least squares method is used, the regression line takes the simplified form shown below.

Simplified regression equation Y = b₀ + b₁X₁ + b₂X₂

Y = Consumer expenditure

X₁ = Disposable income

X₂ = level of wealth

The theoretical expectations are that b₀ can take any value and that b₁, b₂ > 0.

Regression Results

The result of regression for each independent variable is shown in the table below.

Variable		Coefficients of the variable
b₀	Y-intercept	0
b₁	Coefficient of disposable income	0.677219
b₂	Coefficient of the level of wealth	0.080833

From the above table, the regression equation can be written as Y = 0.677219X₁ + 0.080833X₂. The intercept value of 0 implies that the line of best fit passes through the origin. Both coefficients are positive, that is, consumer expenditure is positively related to both disposable income and wealth. The coefficients of both disposable income and wealth are lower than their values in the simple regressions (0.969973 and 0.254056). This may result from correlation between the two explanatory variables.
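With two explanatory variables and no intercept, the least squares coefficients solve a pair of normal equations. The sketch below solves them with Cramer's rule in plain Python. The zero intercept is an assumption based on the reported value of b₀, and the data are hypothetical, constructed so that consumption equals 0.7 times income plus 0.1 times wealth exactly.

```python
def ols_two_regressors_no_intercept(x1, x2, y):
    """Solve the 2x2 normal equations for Y = b1*X1 + b2*X2."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        # Happens when one regressor is an exact multiple of the other
        raise ValueError("X1 and X2 are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Hypothetical values for illustration only
income = [100.0, 200.0, 300.0, 400.0]
wealth = [500.0, 700.0, 1500.0, 1600.0]
consumption = [120.0, 210.0, 360.0, 440.0]

b1, b2 = ols_two_regressors_no_intercept(income, wealth, consumption)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
```

The determinant shrinks toward zero as income and wealth become more closely related, which makes the individual coefficients harder to pin down. This is one way to see why the coefficients can change when both variables enter the same regression.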

Evaluation of regression model

Testing statistical significance shows whether disposable income and the level of wealth are significant determinants of total consumer expenditure. A two-tailed t-test is carried out at a 95% level of confidence.

Null hypothesis: H₀: bᵢ = 0

Alternative hypothesis: H₁: bᵢ ≠ 0

The null hypothesis implies that a variable is not a significant determinant of consumer expenditure. The alternative hypothesis implies that the variable is a significant determinant of consumer expenditure. The table below summarizes the results of the t-tests.

	Variable	Computed t-value	Critical t (α = 0.05)	Decision
b₀	Y-intercept	N/A	1.9432	Ignore
b₁	Coefficient of annual disposable income	3.717362	1.9432	Reject
b₂	Coefficient of the level of wealth	1.669606	1.9432	Do not reject

From the table above, the computed t-value for annual disposable income is greater than the tabulated t-value. Therefore, the null hypothesis is rejected, which implies that annual disposable income is a significant explanatory variable. However, the level of wealth is not statistically significant, since the null hypothesis is not rejected. Thus, it can be dropped from the regression model. Besides, there is a high likelihood that the level of wealth is strongly related to disposable income, since wealth is generated from disposable income among other factors.

R-square value

The value of multiple R is 97.16%. The value of R² is 94.40%. This implies that the two variables together explain 94.40% of the variation in consumer expenditure, which indicates strong explanatory power. The value is higher than in either simple regression, since adding variables never lowers the coefficient of determination.

Analysis of variance

The table below summarizes the analysis of variance.

Item	Value	Proportion
Total sum of squares (TSS)	832146.8	100.00%
Residual sum of squares (RSS)	46582.67	5.60%
Explained sum of squares (ESS)	785564.1	94.40%

The ESS is greater than the RSS by a large margin. From the table, the explained share of the total sum of squares (94.40%) is equal to the value of R² discussed above. It is worth noting that adding explanatory variables increases the value of ESS.

Descriptive statistics

The table below summarizes the descriptive statistics of some variables.

Item	Value
Variation in education level in 1970	3.242923827
Variation in education level in 2005	3.237961065
Variation in health in 1970	11.26764034
Variation in health in 2005	12.40498465
Minimum life expectancy in 1970	34.872099702
Maximum life expectancy in 1970	74.6492691
Minimum life expectancy in 2005	34.96585464
Maximum life expectancy in 2005	82.07544708
Maximum education level in 1970	11.81000042
Minimum Education level in 1970	0.100000001
Maximum education level in 2005	12.88500023
Minimum Education level in 2005	1.090000033

Regression model

The regression model is given by the equation below.

rgdpch2005 = b₀ + b₁rgdpch1970 + b₂avgsch1970 + b₃lifeex1970 + b₄trade1970

The results of the regression are summarized in the table below.

Variable	Coefficient	Computed t-value	Critical t (α = 0.05)	Decision
Intercept	-1256.756432	-1.00236	1.9432	Do not reject
rgdpch1970	0.290938782	3.974889	1.9432	Reject
avgsch1970	1426.641574	7.669623	1.9432	Reject
lifeex1970	63.08168093	2.501768	1.9432	Reject
trade1970	26.85484033	2.461359	1.9432	Reject

The regression equation can be written as Y = -1256.756 + 0.2909rgdpch1970 + 1426.6416avgsch1970 + 63.08lifeex1970 + 26.85trade1970. All the slope coefficients are positive, which implies that each variable contributes positively to real GDP per capita in 2005. A two-tailed t-test is carried out at a 95% level of confidence.

Null hypothesis: H₀: bᵢ = 0

Alternative hypothesis: H₁: bᵢ ≠ 0

Since the null hypothesis is rejected for all the coefficients of explanatory variables, it implies that all the variables are statistically significant.
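The estimated equation can be used to predict real GDP per capita in 2005 from 1970 values. The sketch below evaluates the equation with the coefficients reported in the table. The country profile is hypothetical and is not drawn from the data set, and the function name is illustrative.

```python
# Coefficients as reported in the regression table
COEFFICIENTS = {
    "intercept": -1256.756432,
    "rgdpch1970": 0.290938782,
    "avgsch1970": 1426.641574,
    "lifeex1970": 63.08168093,
    "trade1970": 26.85484033,
}


def predict_rgdpch2005(country):
    """Evaluate the fitted linear equation for one observation."""
    total = COEFFICIENTS["intercept"]
    for name, value in country.items():
        total += COEFFICIENTS[name] * value
    return total


# Hypothetical country profile for illustration only
example = {
    "rgdpch1970": 5000.0,
    "avgsch1970": 6.0,
    "lifeex1970": 65.0,
    "trade1970": 50.0,
}
print(round(predict_rgdpch2005(example), 2))
```

Because the schooling coefficient is large, one extra year of average schooling shifts the prediction by about 1,427 units of real GDP per capita, holding the other variables fixed.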

Estimating the equations using the log of variables

The results of the regression are summarized in the table below.

Variable	Coefficient	Computed t-value	Critical t (α = 0.05)	Decision
Intercept	1.948859327	3.210091	1.9432	Reject
rgdpch1970	0.388463603	2.470411	1.9432	Reject
avgsch1970	1.020557882	3.574301	1.9432	Reject
lifeex1970	0.876669597	5.051896	1.9432	Reject
trade1970	-0.489591873	-1.6706	1.9432	Do not reject

The regression equation can be written as Y = 1.95 + 0.39rgdpch1970 + 1.02avgsch1970 + 0.88lifeex1970 – 0.49trade1970, where the variables are measured in logarithms. All the coefficients are positive apart from that of trade. Their magnitudes are also much smaller than in the regression in levels, because the variables are now measured in logarithms. All the variables are statistically significant at the 95% level of confidence apart from trade in 1970.
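If both the dependent and the explanatory variables are in logarithms, each coefficient can be read as an elasticity. This reading is an assumption, since the paper does not state which variables were transformed. The sketch below converts a coefficient into the percentage change in Y implied by a given percentage change in X. The function name is hypothetical.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact percentage change in Y implied by a log-log coefficient."""
    return ((1 + pct_change_in_x / 100) ** elasticity - 1) * 100


# Coefficients taken from the log regression table
effect_gdp1970 = pct_change_in_y(0.388463603, 1.0)
effect_trade1970 = pct_change_in_y(-0.489591873, 1.0)

print(round(effect_gdp1970, 3), round(effect_trade1970, 3))
```

For small changes the result is close to the coefficient itself, so a 1% rise in 1970 GDP per capita is associated with a rise of roughly 0.39% in 2005 GDP per capita.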

Correlation matrix

The table below summarizes the correlation matrix.

	rgdpch2005	rgdpch1970	avgsch1970	lifeex1970	trade1970
rgdpch2005	1
rgdpch1970	0.448416845	1
avgsch1970	0.575739052	0.304870139	1
lifeex1970	0.418358277	0.291254712	0.399248689	1
trade1970	0.243317859	0.305530063	0.01657099	0.208791016	1

Multicollinearity is a condition in which the explanatory variables are strongly correlated with one another. The highest correlation in the matrix, between average schooling in 1970 and real GDP per capita in 2005 (0.58), involves the dependent variable and so is not evidence of collinearity. The correlations among the explanatory variables are all low, the largest being 0.40 between average schooling and life expectancy. Other symptoms of multicollinearity are high standard errors and low t-ratios. These are not found in the regression above, and thus multicollinearity does not appear to be present.
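A simple screen for collinearity is to flag any pair of explanatory variables whose correlation exceeds a chosen cutoff. The sketch below applies this to the correlations among the four explanatory variables in the matrix above. The cutoff of 0.8 is a common rule of thumb and an assumption here, not a value taken from the paper.

```python
# Pairwise correlations among the explanatory variables, from the matrix
CORRELATIONS = {
    ("rgdpch1970", "avgsch1970"): 0.304870139,
    ("rgdpch1970", "lifeex1970"): 0.291254712,
    ("avgsch1970", "lifeex1970"): 0.399248689,
    ("rgdpch1970", "trade1970"): 0.305530063,
    ("avgsch1970", "trade1970"): 0.01657099,
    ("lifeex1970", "trade1970"): 0.208791016,
}
THRESHOLD = 0.8  # rule-of-thumb cutoff; an assumption, not from the paper


def flag_collinear_pairs(correlations, threshold):
    """Return the variable pairs whose absolute correlation exceeds the cutoff."""
    return [pair for pair, r in correlations.items() if abs(r) > threshold]


flagged = flag_collinear_pairs(CORRELATIONS, THRESHOLD)
print(flagged or "no pair exceeds the threshold")
```

Pairwise correlations only detect collinearity between two variables at a time, so a clean result here does not rule out a near-linear relationship involving three or more regressors.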

Test for autocorrelation

Autocorrelation is a scenario where the error terms of different periods are related. It is often tested either graphically or by use of the Durbin-Watson test. Here the residual plots are inspected to ascertain whether autocorrelation exists.

From the residual plots above, the residuals show no clear pattern of autocorrelation for most of the variables.
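The Durbin-Watson test mentioned above gives a numerical alternative to reading residual plots. The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below computes it in plain Python. The residuals are hypothetical, and the statistic is only meaningful when the observations have a natural ordering.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals for illustration only
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
dw = durbin_watson(residuals)
print(round(dw, 3))
```

The statistic lies between 0 and 4. Values well below 2 point to positive autocorrelation and values well above 2 point to negative autocorrelation.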

Robust standard errors or White's heteroscedasticity-corrected standard errors

Heteroscedasticity is a scenario where the error term violates the assumption of constant variance. Standard errors that remain valid when heteroscedasticity is present are known as robust standard errors. The standard error from the regression above is 7510.717929, while the robust standard error is 8010.92728. Since the robust value is larger than the value generated by the ordinary regression analysis, the conventional standard errors are likely understated. This indicates the possible existence of heteroscedasticity.
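The difference between conventional and robust standard errors can be illustrated for the slope of a simple regression. The sketch below computes the classical standard error and White's HC0 version in plain Python. It is a simplified one-regressor case with hypothetical data, not a reproduction of the four-variable regression in this paper, and the function name is illustrative.

```python
from math import sqrt


def slope_standard_errors(xs, ys):
    """Return (classical SE, White HC0 SE) for the slope of Y = b0 + b1*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    sxx = sum(d * d for d in dev_x)
    b1 = sum(d * (y - mean_y) for d, y in zip(dev_x, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    # Classical: a single error variance is assumed for all observations
    classical = sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # White (HC0): each observation's squared residual enters with its own weight
    robust = sqrt(sum((d * e) ** 2 for d, e in zip(dev_x, resid)) / sxx ** 2)
    return classical, robust


# Hypothetical data for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.4, 3.5, 5.9, 5.0]
classical_se, robust_se = slope_standard_errors(x, y)
print(f"classical = {classical_se:.4f}, robust = {robust_se:.4f}")
```

A noticeable gap between the two values suggests that the constant-variance assumption may not hold, although a formal test such as White's test would be needed to confirm it.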

Estimation of the average GDP per capita of Western Europe

Membership of Western Europe is represented by a dummy variable, so the regression must be estimated with methods suited to dummy variables. A dummy variable is a binary variable that takes the value of either zero or one. Panel data analysis or random-effects models are used to analyze the regression equation. Two additional variables are included for the dummy, namely dieurope0 and dieurope1. Thus, the regression equation will take the form shown below.

The table below summarizes the results of the regression with the dummy variables.

Variable	Coefficient	Computed t-value	Critical t (α = 0.05)	Decision
Intercept	-1295.93	-1.13424	1.9432	Do not reject
rgdpch1970	0.202979	2.983556	1.9432	Reject
avgsch1970	851.1276	4.459757	1.9432	Reject
lifeex1970	76.36457	3.310644	1.9432	Reject
trade1970	24.6943	2.482368	1.9432	Reject
westerneurope0	0	65535	1.9432	N/A
westerneurope1	13951.28	6.562905	1.9432	Reject

From the table above, it is clear that the dummy variable westerneurope1 has a positive coefficient and is statistically significant. There are a number of ways in which the regression model can be improved. One way is by lagging the dummy variables by one or two years, which may improve the efficiency of the model.
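The zero coefficient reported for westerneurope0 is consistent with one of two complementary dummies being redundant once an intercept is included. This is one possible reading and is not stated in the paper. The sketch below builds both dummies from hypothetical region labels and shows that they always sum to one, which duplicates the intercept column.

```python
# Hypothetical region labels for illustration only
regions = ["Western Europe", "Asia", "Western Europe", "Africa", "Asia"]

# 1 if the country is in Western Europe, 0 otherwise
westerneurope1 = [1 if r == "Western Europe" else 0 for r in regions]
# The complementary dummy: 1 if the country is NOT in Western Europe
westerneurope0 = [1 - d for d in westerneurope1]

# The two dummies sum to 1 in every row, exactly like a column of ones
row_sums = [a + b for a, b in zip(westerneurope0, westerneurope1)]
print(row_sums)
```

Because of this exact linear dependence, only one of the two dummies can be estimated alongside an intercept. The coefficient on westerneurope1 then measures the difference in average GDP per capita between Western Europe and all other countries, holding the other variables fixed.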