Wednesday, March 26, 2025

The concept of correlation and regression analysis

Problems of economic analysis and forecasting are very often solved using statistical, reporting, or observational data. These data are assumed to be values of a random variable.

A random variable is a variable that, depending on chance, takes different values with certain probabilities. The distribution law of a random variable shows how often each of its values occurs in the population as a whole.

When relationships between economic indicators are studied from statistical data, the relationship between them is often stochastic: a change in one random variable changes the distribution law of another. The relationship between quantities can be complete (functional) or incomplete (distorted by other factors).

An example of a functional dependence is the relationship between the output of a product and its consumption under conditions of scarcity.

An incomplete relationship is observed, for example, between workers' length of service and their labor productivity. Workers with long experience usually work better than young ones, but additional factors such as education and health can distort this dependence.

The branch of mathematical statistics devoted to the study of relationships between random variables is called correlation analysis. Its main task is to establish the nature and closeness of the relationship between the resulting (dependent) and factor (independent) indicators (features) in a given phenomenon or process. A correlation can be detected only by comparing a large number of observations.

The nature of the relationship between the indicators is determined from the correlation field. If Y is the dependent feature and X is the independent feature, then plotting each observation i as a point with coordinates ($x_i$, $y_i$) gives the correlation field.

The closeness of the relationship is measured by the correlation coefficient, which lies in the interval from minus one to plus one. If the absolute value of the correlation coefficient lies between 0.9 and 1, the correlation is very strong. If it lies between 0.6 and 0.9, the correlation is said to be weak. Finally, if the correlation coefficient is between -0.6 and 0.6, the correlation is very weak or completely absent.
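The verbal scale above can be written as a small lookup. The following Python sketch is only an illustration: the function name `correlation_strength` is hypothetical, and assigning the boundary values 0.9 and 0.6 to the stronger class is an assumption, since the text does not say which class the boundaries belong to.

```python
def correlation_strength(r: float) -> str:
    """Map a correlation coefficient to the verbal scale used in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)          # the scale is applied to the absolute value
    if magnitude >= 0.9:        # boundary handling is an assumption
        return "very strong"
    if magnitude >= 0.6:
        return "weak"
    return "very weak or absent"


print(correlation_strength(0.95))
print(correlation_strength(-0.75))
print(correlation_strength(0.1))
```

The sign of the coefficient is ignored here on purpose: it describes the direction of the relationship, not its closeness.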

Thus, correlation analysis is used to find the nature and closeness of the relationship between random variables.

Regression analysis aims to derive (identify) the regression equation, including the statistical estimation of its parameters. The regression equation makes it possible to find the value of the dependent variable when the value of the independent variable or variables is known.

In practice, this means analyzing a set of points on a chart (i.e. a set of statistical data) and finding the line that reflects the pattern (trend) contained in this set as accurately as possible: the regression line.

According to the number of factors, one-, two- and multifactorial regression equations are distinguished.

According to the nature of the relationship, one-factor regression equations are divided into:

a) linear:

$$y = a + bx,$$

where x is an exogenous (independent) variable;

y is an endogenous (dependent, resulting) variable;

a, b are parameters;

b) power:

$$y = a x^{b};$$

c) exponential:

$$y = a b^{x};$$

d) other.

Determination of the parameters of the linear one-factor regression equation

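Suppose we have data on income (X) and demand for some commodity (Y) for a number of years (n):

| Year, n | Income, X | Demand, Y |
|---|---|---|
| 1 | $x_1$ | $y_1$ |
| 2 | $x_2$ | $y_2$ |
| 3 | $x_3$ | $y_3$ |
| ... | ... | ... |
| n | $x_n$ | $y_n$ |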

Suppose that there is a linear relationship between X and Y, i.e.

$$y = a + bx.$$

In order to find the regression equation, first of all it is necessary to investigate the closeness of the relationship between the random variables X and Y, i.e. the correlation relationship.

Let:

$x_1, x_2, \ldots, x_n$ be the set of values of the independent (factor) feature;

$y_1, y_2, \ldots, y_n$ be the set of corresponding values of the dependent (resulting) feature;

n be the number of observations.

To find the regression equation, the following values are calculated.

Averages:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$ for the exogenous variable;

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$ for the endogenous variable.

Deviations from the average values:

$$\Delta x_i = x_i - \bar{x}, \qquad \Delta y_i = y_i - \bar{y}.$$

Variance and standard deviation:

$$\sigma_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \sigma_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2.$$

$$\sigma_x = \sqrt{\sigma_x^2}, \qquad \sigma_y = \sqrt{\sigma_y^2}.$$

The variance and the standard deviation characterize the spread of the observed values around the average value. The greater the variance, the greater the spread.

Calculation of the correlation moment (covariance):

$$K_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

The correlation moment reflects the nature of the relationship between x and y. If $K_{xy} > 0$, the relationship is direct. If $K_{xy} < 0$, the relationship is inverse.

The correlation coefficient is calculated by the formula:

$$r = \frac{K_{xy}}{\sigma_x \sigma_y},$$

where $K_{xy}$ is the correlation moment and $\sigma_x$, $\sigma_y$ are the standard deviations of x and y.

It has been proved that the correlation coefficient lies in the range from minus one to plus one ($-1 \le r \le 1$). The squared correlation coefficient ($r^2$) is called the coefficient of determination.

If the correlation is close enough, the calculations continue.

Calculation of the parameters of the regression equation.

The coefficient b is found by the formula:

$$b = \frac{K_{xy}}{\sigma_x^2},$$

where $K_{xy}$ is the correlation moment and $\sigma_x^2$ is the variance of x.

After that, the parameter a is easily found:

$$a = \bar{y} - b\,\bar{x}.$$

The coefficients a and b are found by the method of least squares. Its main idea is that the sum of the squared differences (residuals) between the actual values of the resulting feature $y_i$ and its calculated values $\hat{y}_i$, obtained from the regression equation, is taken as a measure of the total error and is minimized:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \to \min.$$

The residuals are found by the formula:

$$e_i = y_i - \hat{y}_i,$$ where

$y_i$ is the actual value of y;

$\hat{y}_i$ is the estimated value of y.
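For reference, the formulas for a and b follow from minimizing the sum of squared residuals. The derivation below is the standard least-squares argument written in assumed notation ($\bar{x}$ and $\bar{y}$ for the averages, $S(a, b)$ for the sum being minimized); it is a sketch, not a quotation from the source.

```latex
S(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2 \to \min

\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
\quad\Longrightarrow\quad a = \bar{y} - b\,\bar{x}

\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
\quad\Longrightarrow\quad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

Dividing the numerator and the denominator of b by n - 1 gives the ratio of the correlation moment to the variance of x.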

Example. Suppose we have statistics on income (X) and demand (Y). It is necessary to find a correlation between them and determine the parameters of the regression equation.

| Year, n | Income, X | Demand, Y |
|---|---|---|
| 1 | 10 | 6 |
| 2 | 12 | 8 |
| 3 | 14 | 8 |
| 4 | 16 | 10.3 |
| 5 | 18 | 10.5 |
| 6 | 20 | 13 |

Suppose that there is a linear relationship between our quantities.

The calculations are best done in Excel using its statistical functions:

AVERAGE to calculate the average values;

VAR to find the (sample) variance;

STDEV to determine the standard deviation;

CORREL to calculate the correlation coefficient.

The correlation moment can be calculated by finding the deviations from the average values for the series X and the series Y, using the SUMPRODUCT function to obtain the sum of their products, and dividing that sum by n-1.

The results of the calculations can be summarized in a table.

Parameters of the linear single-factor regression equation

| Indicator | X | Y |
|---|---|---|
| Mean | 15 | 9.3 |
| Variance | 14 | 6.08 |
| Standard deviation | 3.7417 | 2.4658 |
| Correlation moment | 8.96 | |
| Correlation coefficient | 0.9712 | |
| Parameters | b = 0.64 | a = -0.3 |

As a result, our equation will be:

y = – 0.3 + 0.64x
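As a cross-check, the values in the table can be reproduced outside Excel. The following Python sketch uses only the standard `statistics` module and the example data; the variable names are illustrative, and the sample (n - 1) forms of the variance and the correlation moment are used because they match the table.

```python
from statistics import mean, stdev, variance

income = [10, 12, 14, 16, 18, 20]        # X from the example table
demand = [6, 8, 8, 10.3, 10.5, 13]       # Y from the example table
n = len(income)

x_mean, y_mean = mean(income), mean(demand)

# Correlation moment: sum of products of deviations divided by n - 1
cov_xy = sum((x - x_mean) * (y - y_mean)
             for x, y in zip(income, demand)) / (n - 1)

r = cov_xy / (stdev(income) * stdev(demand))   # correlation coefficient
b = cov_xy / variance(income)                  # slope
a = y_mean - b * x_mean                        # intercept

print(f"mean: {x_mean:.1f}, {y_mean:.1f}")
print(f"variance: {variance(income):.2f}, {variance(demand):.2f}")
print(f"correlation moment: {cov_xy:.2f}, r: {r:.4f}")
print(f"y = {a:.2f} + {b:.2f}x")
```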

Using this equation, you can find the estimated values of Y and plot them (Figure 2.1).

Fig. 2.1. Actual and calculated values of Y

The broken line on the chart reflects the actual Y values, and the straight line is constructed using the regression equation and reflects the trend of change in demand depending on income.

However, the question arises, how significant are parameters a and b? What is the margin of error?

Estimation of the error value of a linear single-factor equation

Let us denote the difference between the actual value of the resulting feature and its calculated value as $e_i$:

$$e_i = y_i - \hat{y}_i,$$ where

$y_i$ is the actual value of y;

$\hat{y}_i$ is the calculated value of y;

$e_i$ is the difference between the two.

As a measure of the total error, the following value is chosen:

$$S = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2.$$

For our example, S = 0.432.

Since $\bar{e}$ (the average value of the residuals) is zero, the total error is equal to the residual variance, which is found by the formula:

$$S_e^2 = \frac{1}{n-2}\sum_{i=1}^{n} (e_i - \bar{e})^2.$$

For our example, $S_e^2 = 0.432$. It can be shown that, when the number of observations is large,

$$S_e^2 \approx \sigma_y^2 (1 - r^2),$$

where $\sigma_y^2$ is the variance of y and r is the correlation coefficient.

If $r = \pm 1$, then $S_e^2 = 0$; if $r = 0$, then $S_e^2 \approx \sigma_y^2$.

Thus, $0 \le S_e^2 \lesssim \sigma_y^2$.

It is easy to see that if $r^2 \ge 0.8$, then

$$S_e^2 \lesssim 0.2\,\sigma_y^2.$$

This ratio shows that in economic applications the permissible total error can be no more than 20% of the variance of the resulting feature $\sigma_y^2$.

The standard error of the equation is found by the formula:

$$S_e = \sqrt{S_e^2},$$ where

$S_e^2$ is the residual variance. In our case, $S_e = \sqrt{0.432} \approx 0.6573$.

The relative error of the regression equation is calculated as:

$$\varepsilon = \frac{S_e}{\bar{y}} \cdot 100\%,$$

where $S_e$ is the standard error;

$\bar{y}$ is the average value of the resulting feature.

In our case, $\varepsilon = 0.6573 / 9.3 \cdot 100\% = 7.07\%$.

If the value of $\varepsilon$ is small and there is no autocorrelation of the residuals, then the predictive qualities of the estimated regression equation are high.

The standard error of the coefficient b is calculated by the formula:

$$S_b = \frac{S_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}.$$

In our case, it is equal to $S_b = 0.6573 / \sqrt{70} \approx 0.0786$.

To calculate the standard error of the coefficient a, the formula is used:

$$S_a = S_e \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}}.$$

In our example, $S_a \approx 1.2085$.

Standard errors of the coefficients are used to assess the parameters of the regression equation.

Coefficients are considered significant if

$$\frac{S_a}{|a|} < 0.5, \qquad \frac{S_b}{|b|} < 0.5.$$

In our example,

$$\frac{S_a}{|a|} = \frac{1.2085}{0.3} \approx 4.03, \qquad \frac{S_b}{|b|} = \frac{0.0786}{0.64} \approx 0.12.$$

The coefficient a is not significant, because its ratio is greater than 0.5, and the error of the regression equation is too high: the standard error amounts to 26.7% of the standard deviation of Y.
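The error measures of this section can be recomputed from the example data. The Python sketch below assumes the usual least-squares formulas for the standard errors of a and b, so the printed values of S_a and S_b should be read as illustrative; the total error and the relative error agree with the values 0.432 and 7.07% quoted above. Variable names are hypothetical.

```python
from math import sqrt

income = [10, 12, 14, 16, 18, 20]      # X from the example
demand = [6, 8, 8, 10.3, 10.5, 13]     # Y from the example
a, b = -0.3, 0.64                      # parameters of y = a + b*x found above
n = len(income)

x_mean = sum(income) / n
y_mean = sum(demand) / n

# Residuals: actual minus calculated values of y
residuals = [y - (a + b * x) for x, y in zip(income, demand)]

# Total error (residual variance) with n - 2 degrees of freedom
s2 = sum(e * e for e in residuals) / (n - 2)
s_e = sqrt(s2)                          # standard error of the equation
rel_error = s_e / y_mean * 100          # relative error, percent

sxx = sum((x - x_mean) ** 2 for x in income)
s_b = s_e / sqrt(sxx)                                       # assumed formula
s_a = s_e * sqrt(sum(x * x for x in income) / (n * sxx))    # assumed formula

print(f"S = {s2:.3f}, S_e = {s_e:.4f}, relative error = {rel_error:.2f}%")
print(f"S_b = {s_b:.4f}, S_b/|b| = {s_b / abs(b):.2f}")
print(f"S_a = {s_a:.4f}, S_a/|a| = {s_a / abs(a):.2f}")
```

Under these assumptions the ratio for b stays below 0.5 while the ratio for a is far above it, which is consistent with the conclusion that the intercept is the unreliable parameter.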

Standard errors of the coefficients are also used to assess the statistical significance of the coefficients with Student's t-criterion. The values of Student's t-criterion are found in reference books on mathematical statistics. Table 2.1 shows some of its values.

Next, the minimum and maximum values of the parameter b ($b_{\min}$, $b_{\max}$) are found by the formulas:

$$b_{\min} = b - t S_b, \qquad b_{\max} = b + t S_b,$$

where t is the value of Student's t-criterion and $S_b$ is the standard error of b.

Table 2.1. Some values of Student's t-criterion

| Degrees of freedom (n-2) | Confidence level c = 0.90 | Confidence level c = 0.95 |
|---|---|---|
| 1 | 6.31 | 12.71 |
| 2 | 2.92 | 4.30 |
| 3 | 2.35 | 3.18 |
| 4 | 2.13 | 2.78 |
| 5 | 2.02 | 2.57 |

For our example, with n - 2 = 4 degrees of freedom, we find:

$$c = 0.90: \quad b_{\min} = 0.64 - 2.13 \cdot 0.0786 \approx 0.47, \qquad b_{\max} = 0.64 + 2.13 \cdot 0.0786 \approx 0.81;$$

$$c = 0.95: \quad b_{\min} = 0.64 - 2.78 \cdot 0.0786 \approx 0.42, \qquad b_{\max} = 0.64 + 2.78 \cdot 0.0786 \approx 0.86.$$

If the interval ($b_{\min}$, $b_{\max}$) is small enough and does not contain zero, then the coefficient b is statistically significant at the confidence level c.

Similarly, the minimum and maximum values of the parameter a are found. For our example, at the confidence level c = 0.95:

$$a_{\min} = -0.3 - 2.78 \cdot 1.2085 \approx -3.66, \qquad a_{\max} = -0.3 + 2.78 \cdot 1.2085 \approx 3.06.$$

The coefficient a is not statistically significant, because the interval ($a_{\min}$, $a_{\max}$) is large and contains zero.
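The interval check can be written out as follows. This Python sketch takes t from Table 2.1; the helper names `confidence_interval` and `contains_zero` are hypothetical, and the standard errors passed in are rounded values computed from the example under the usual least-squares formulas, which is an assumption. Only the "contains zero" part of the criterion is automated; whether an interval is "small enough" remains a judgment call.

```python
# Student's t values from Table 2.1, keyed by (degrees of freedom, confidence level)
T_TABLE = {
    (1, 0.90): 6.31, (1, 0.95): 12.71,
    (2, 0.90): 2.92, (2, 0.95): 4.30,
    (3, 0.90): 2.35, (3, 0.95): 3.18,
    (4, 0.90): 2.13, (4, 0.95): 2.78,
    (5, 0.90): 2.02, (5, 0.95): 2.57,
}


def confidence_interval(estimate, std_error, n, level=0.95):
    """Return (minimum, maximum) of a coefficient at the given confidence level."""
    t = T_TABLE[(n - 2, level)]          # n - 2 degrees of freedom
    return estimate - t * std_error, estimate + t * std_error


def contains_zero(interval):
    """True if zero lies inside the interval, i.e. the sign is not settled."""
    low, high = interval
    return low <= 0.0 <= high


b_interval = confidence_interval(0.64, 0.0786, n=6)   # S_b: assumed value
a_interval = confidence_interval(-0.3, 1.2085, n=6)   # S_a: assumed value
print("b:", b_interval, "contains zero:", contains_zero(b_interval))
print("a:", a_interval, "contains zero:", contains_zero(a_interval))
```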

Conclusion: the results obtained are not significant and cannot be used for forecast calculations. The situation can be corrected in the following ways:

a) increase the number of observations n;

b) increase the number of factors;

c) change the form of the equation.

The problem of autocorrelation of residuals. The Durbin-Watson criterion

Regression equations are often estimated from time series, i.e. sequences of an economic indicator over a number of consecutive years (quarters, months).

In this case, the subsequent value of the indicator depends to some extent on its previous value; this dependence is called autocorrelation. In some cases it is very strong and affects the accuracy of the regression coefficients.

Let the regression equation be constructed and have the form:

$$y_t = a + b x_t + e_t,$$

where $e_t$ is the error of the regression equation in year t.

The phenomenon of autocorrelation of residuals is that in any year t the residual $e_t$ is not a random variable but depends on the residual of the previous year, $e_{t-1}$.

To determine the presence or absence of autocorrelation, the Durbin-Watson criterion is applied:

$$DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}.$$

The possible values of the DW criterion range from 0 to 4. If there is no autocorrelation of residuals, then DW ≈ 2.
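A minimal Python sketch of the criterion, using its standard definition (the sum of squared successive differences of the residuals divided by the sum of squared residuals). The function name and the two residual series are hypothetical and serve only to show the two extremes: values near 0 point to positive autocorrelation, values near 4 to negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    # Squared differences between each residual and the previous one
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residual series, for illustration only
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every period
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]      # long runs of one sign
print(durbin_watson(alternating))
print(durbin_watson(trending))
```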

Construction of the power regression equation

The power regression equation is:

$$y = a x^{b},$$ where

a, b are parameters determined from the data of the observation table.

The table of observations has the form:

| x | $x_1$ | $x_2$ | ... | $x_n$ |
|---|---|---|---|---|
| y | $y_1$ | $y_2$ | ... | $y_n$ |

Taking the logarithm of both sides of the original equation, we get:

$$\ln y = \ln a + b \ln x.$$

Denote ln y by $y'$, ln a by $a'$, and ln x by $x'$.

As a result of the substitution, we get:

$$y' = a' + b x'.$$

This equation is nothing more than a linear regression equation, the parameters of which we are able to find.

To do this, let us take the logarithms of the initial data:

| ln x | $\ln x_1$ | $\ln x_2$ | ... | $\ln x_n$ |
|---|---|---|---|---|
| ln y | $\ln y_1$ | $\ln y_2$ | ... | $\ln y_n$ |

Next, the computational procedures already described for finding the coefficients a and b are applied to the log-transformed data. As a result, we get the values of the coefficient b and of $\ln a$. The parameter a can then be found by the formula:

$$a = e^{\ln a}.$$

The EXP function in Excel can be used for this purpose.
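The log-transform procedure can be sketched in Python as follows. The helper names `fit_linear` and `fit_power` are hypothetical, the data are generated artificially from a known power law so that the result can be checked, and all x and y values are assumed to be positive (otherwise the logarithm is undefined).

```python
from math import exp, log


def fit_linear(xs, ys):
    """Least-squares estimates (a, b) for y = a + b*x."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    return y_mean - b * x_mean, b


def fit_power(xs, ys):
    """Estimate (a, b) for y = a * x**b by fitting ln y = ln a + b * ln x."""
    ln_a, b = fit_linear([log(x) for x in xs], [log(y) for y in ys])
    return exp(ln_a), b     # convert ln a back to a


# Hypothetical data generated exactly from y = 2 * x**1.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x ** 1.5 for x in xs]
a, b = fit_power(xs, ys)
print(round(a, 6), round(b, 6))
```

Note that least squares is applied to the logarithms, so the fit minimizes relative rather than absolute deviations of y; with noisy data the result can differ from a direct nonlinear fit.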

Two-factor and multifactor regression equations

The linear two-factor regression equation is:

$$y = a + b_1 x_1 + b_2 x_2,$$

where $a, b_1, b_2$ are the parameters;

$x_1, x_2$ are exogenous variables;

y is an endogenous variable.

This equation is best identified with the Excel LINEST function.

The power two-factor regression equation is:

$$y = a\, x_1^{b_1} x_2^{b_2},$$

where $a, b_1, b_2$ are the parameters;

$x_1, x_2$ are exogenous variables;

y is an endogenous variable.

To find the parameters of this equation, it is necessary to take its logarithm. As a result, we get:

$$\ln y = \ln a + b_1 \ln x_1 + b_2 \ln x_2.$$

This equation is also best identified with the Excel LINEST function. It should be remembered that the result is not the parameter a itself but its logarithm, which must be converted back by exponentiation.

The linear multifactor regression equation is:

$$y = a + b_1 x_1 + b_2 x_2 + \ldots + b_n x_n,$$

where $a, b_1, b_2, \ldots, b_n$ are the parameters;

$x_1, x_2, \ldots, x_n$ are exogenous variables;

y is an endogenous variable.

This equation is also best identified with the Excel LINEST function.
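For readers working outside Excel, the same least-squares estimates can be obtained by solving the normal equations. The Python sketch below is an illustration in pure standard-library code, not a description of how LINEST works internally; the function names are hypothetical and the data are generated artificially from a known equation so that the result can be checked.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("factors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):      # back substitution
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution


def fit_multifactor(rows, ys):
    """Least squares for y = a + b1*x1 + ... + bn*xn; returns [a, b1, ..., bn]."""
    design = [[1.0] + list(row) for row in rows]   # leading 1 for the intercept
    width = len(design[0])
    # Normal equations: (X^T X) * coefficients = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(width)]
           for i in range(width)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(width)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_multifactor(rows, ys)
print([round(c, 6) for c in coefficients])
```

For a power equation the same function can be applied to the logarithms of the data, after which the first coefficient has to be exponentiated to recover a.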
