Hello everyone, this tutorial shows how to run simple linear regression and multiple linear regression in R, using an example data set.

The data used in the video above can be downloaded at this link.

In Example #2 we will use data from our research published in the Journal of Management and Entrepreneurship; please download the data file table12reg.xlsx. As usual, we import the data into RStudio before starting the analysis. Our regression model is as in the image below, a linear regression model with KM and CI as independent variables and SMP (Junior High School) as the dependent variable.
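The import step can be sketched as below. This is an assumption on my part, not part of the original tutorial: it assumes the readxl package is installed and that table12reg.xlsx sits in the current working directory.

```r
# Hypothetical import sketch: run install.packages("readxl") once beforehand,
# and make sure table12reg.xlsx is in the working directory.
library(readxl)

table12reg <- read_excel("table12reg.xlsx")
str(table12reg)   # check that the KM, CI and SMP columns were read in
```

If the file is elsewhere, pass its full path to read_excel().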

The first step is to run the simple and multiple linear regression commands for our data and the model pictured above, as follows.

Building the Simple Linear Regression Model

Suppose our simple linear regression equation is: SMP = a + b·KM + e, where a is a constant and b is the regression coefficient. The variable names used in the R command must match the column names in our table12reg data, so the command is as below,

> regModelku <- lm(SMP ~ KM, data = table12reg)

Building the Multiple Linear Regression Model

Suppose our multiple linear regression equation is: SMP = a + b1·KM + b2·CI + e, where a is a constant and b1 and b2 are the regression coefficients. Again, the variable names used in the R command must match the column names in table12reg, so the command is as below,

> regModelku <- lm(SMP ~ KM + CI, data = table12reg)
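Readers who do not have the published data at hand can try the same call on simulated stand-in data. The column names KM, CI and SMP match the tutorial; the numbers themselves are invented for illustration only:

```r
set.seed(42)
# Invented stand-in data with the same column names as table12reg
sim <- data.frame(KM = rnorm(145, mean = 20, sd = 3),
                  CI = rnorm(145, mean = 15, sd = 2))
sim$SMP <- 1.35 + 0.10 * sim$KM + 0.74 * sim$CI + rnorm(145, sd = 2)

# Same lm() call as in the tutorial, on the simulated data
regModelku <- lm(SMP ~ KM + CI, data = sim)
coef(regModelku)   # intercept plus the KM and CI slope estimates
```

With real data, simply replace `sim` with `table12reg`.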

Before discussing the results of the multiple linear regression, we need to run the classical assumption tests on our regression model: normality, multicollinearity, heteroskedasticity, autocorrelation, and linearity.

To test whether the regression residuals are normally distributed, we run the normality check using the commands below,

> par(mfrow = c(2, 2))
> plot(regModelku)

The diagnostic plots will then appear in the Plots window, as in the figure below. Graph for normality check

Looking at the Normal Q-Q plot (top right corner), the data points lie close to a straight line, so the residuals can be considered normally distributed and the regression model meets the normality assumption.
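Besides the Q-Q plot, the residuals can also be checked numerically with the Shapiro-Wilk test from base R's stats package. This is an addition to the tutorial, not part of it, shown here on simulated data so the snippet is self-contained:

```r
set.seed(1)
# Simulated model as a stand-in for regModelku
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100)
m <- lm(y ~ x)

# Shapiro-Wilk test on the residuals:
# a p-value above 0.05 is consistent with normally distributed residuals
res <- shapiro.test(residuals(m))
res$p.value
```

On the real model, run `shapiro.test(residuals(regModelku))` instead.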

To check that the independent variables are not strongly correlated with each other, we look at the VIF values using the vif() command from the car package (you may need to install this package first), as below,

> library(car)
> vif(regModelku)
      KM       CI 
1.529073 1.529073

Since the VIF values for KM and CI are below two, we can say there is no multicollinearity, and our model meets the multicollinearity assumption.

To check for unequal residual variance (heteroskedasticity), we look at the spread of points in the Residuals vs Fitted plot, produced by the same commands as in the normality test.

> par(mfrow = c(2, 2))
> plot(regModelku)

The diagnostic plots again appear in the Plots window, as below. Heteroskedasticity test

In the Residuals vs Fitted plot (top left corner), the points are scattered and do not form any particular pattern, so we can say the residual variance is constant and our model passes the heteroskedasticity assumption test.
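A numeric complement to the graphical check is the Breusch-Pagan test, available as bptest() in the lmtest package (the same package used below for the Durbin-Watson test). This snippet is an addition to the tutorial, shown on simulated data so it runs on its own:

```r
library(lmtest)

set.seed(1)
# Simulated model as a stand-in for regModelku
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100)
m <- lm(y ~ x)

# Breusch-Pagan test:
# a p-value above 0.05 is consistent with constant residual variance
bp <- bptest(m)
bp
```

On the real model, run `bptest(regModelku)` instead.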

The autocorrelation test checks whether earlier observations influence later ones; it is mainly relevant for time-series data. We use the dwtest() command from the lmtest package (you may need to install this package first), with the command below,

> library(lmtest)
> dwtest(regModelku)

	Durbin-Watson test

data:  regModelku
DW = 1.9522, p-value = 0.3891
alternative hypothesis: true autocorrelation is greater than 0

Based on the test results above, the Durbin-Watson p-value (0.3891) is greater than 0.05, so we can say there is no influence of earlier observations on later ones; the no-autocorrelation assumption of the regression model is met.

To verify the linear relationship between the dependent and independent variables, we test the linearity assumption with the crPlots() command from the car package (the same package used for the VIF test above), as below,

> library(car)
> crPlots(regModelku)

A graph will then appear in the Plots window, as below. Linearity test graph between variables

The graph above shows that the two differently colored lines (the dashed line and the fitted line) lie almost on top of each other, meaning each independent variable has a linear relationship with the dependent variable; the linearity assumption of the model is met.

Reading the Multiple Linear Regression Results

The final stage of the multiple linear regression process is interpreting the results, which can be done with the command below,

> summary(regModelku)

Call:
lm(formula = SMP ~ KM + CI, data = table12reg)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.8578 -0.9147  0.0728  1.0631  5.1673 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.35207        ...     ...      ...
KM           0.10416        ...     ...      ...
CI           0.74446        ...     ...      ...
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.005 on 142 degrees of freedom
Multiple R-squared:  0.6235,	Adjusted R-squared:  0.6182 
F-statistic: 117.6 on 2 and 142 DF,  p-value: < 2.2e-16

According to the test output above, our multiple linear regression equation is

SMP = 1.35207 + 0.10416·KM + 0.74446·CI + e

From the multiple linear regression results above, we can state that the effect of KM and CI on SMP is significant, since the p-values are smaller than 0.05. Each one-unit increase in KM gives a 0.104 increase in SMP, and each one-unit increase in CI gives a 0.744 increase in SMP. Our model has an Adjusted R² of 0.6182, meaning the variables KM and CI are able to explain 61.82% of the variation in SMP. The F-statistic test gives a p-value < 2.2e-16, far smaller than 0.05, indicating that the regression model fits the data well.
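As a quick check of the equation above, we can plug sample scores into it by hand. The values KM = 20 and CI = 15 below are invented for illustration, not taken from the data:

```r
# Coefficients from the fitted regression equation in the tutorial
a  <- 1.35207
b1 <- 0.10416
b2 <- 0.74446

# Hypothetical respondent scores (invented for illustration)
KM <- 20
CI <- 15

# Predicted SMP from the fitted equation
SMP_hat <- a + b1 * KM + b2 * CI
SMP_hat   # → 14.60217
```

With the fitted model object, the same prediction comes from `predict(regModelku, newdata = data.frame(KM = 20, CI = 15))`.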