Saturday, 15 September 2012

Analysis of Residuals

  • A residual is the difference between the actual value of Y and the predicted value Y'.
  • Residuals should be approximately normally distributed. Histograms and stem-and-leaf charts are useful for checking this requirement.
  • A plot of the residuals against their corresponding Y' values is used to check that there are no trends or patterns in the residuals.
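The following is a minimal Python sketch of these residual checks. It assumes a simple one-variable least-squares line and made-up data, and the helper names `fit_simple`, `residuals` and `histogram` are hypothetical. It prints each fitted value Y' next to its residual, which are the pairs one would plot, and then a crude text histogram of the residuals.

```python
import math


def fit_simple(x, y):
    """Least-squares intercept a and slope b for Y' = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def residuals(x, y):
    """Return (Y', Y - Y') pairs for every observation."""
    a, b = fit_simple(x, y)
    return [(a + b * xi, yi - (a + b * xi)) for xi, yi in zip(x, y)]


def histogram(values, width):
    """Count values falling in bins [start, start + width); returns {start: count}."""
    counts = {}
    for v in values:
        start = math.floor(v / width) * width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

pairs = residuals(x, y)
for fitted, e in pairs:
    print(f"Y' = {fitted:6.2f}   residual = {e:+.2f}")

# With an intercept in the model, least-squares residuals sum to (numerically) zero.
print("sum of residuals:", round(sum(e for _, e in pairs), 10))
print("histogram:", histogram([e for _, e in pairs], 0.25))
```

The bullets above are looking for two things. The (Y', residual) pairs should form a patternless band around zero, and the histogram should look roughly bell-shaped.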

Qualitative Variables & Stepwise Regression

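  • Qualitative variables are non-numeric. To include one in a regression, it is coded as a dummy (or indicator) variable.
    • A dummy variable represents only two possible conditions. It is usually coded 1 if the condition is present and 0 if it is not.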
  • Stepwise regression leads to the most efficient regression equation.
    • Only independent variables with significant regression coefficients are entered into the analysis. Variables are entered in the order in which they increase R^2 the fastest.
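The Python sketch below illustrates both ideas under stated assumptions. First, a two-condition qualitative variable is coded as a 0/1 dummy. Second, a forward-selection variant of stepwise regression runs: at each step it adds the candidate that raises R^2 the most, but only if that candidate's partial F statistic exceeds an F-to-enter threshold. For a single added variable, the partial F equals t^2.

Several things here are assumptions rather than a standard: the threshold of 4.0, the helper names and the data are all hypothetical. Statistical packages apply their own entry and removal rules, and full stepwise procedures may also drop variables that later lose significance.

```python
def dummy(values, level):
    """Code a two-condition qualitative variable: 1.0 if value == level, else 0.0."""
    return [1.0 if v == level else 0.0 for v in values]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(columns, y):
    """Fit y on an intercept plus `columns` by least squares; return the error sum of squares."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def forward_stepwise(candidates, y, f_to_enter=4.0):
    """Greedy forward selection by largest R^2 gain, gated by a partial F test."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    chosen, current_sse = [], sst  # intercept-only model: SSE equals SST
    while len(chosen) < len(candidates):
        remaining = [name for name in candidates if name not in chosen]
        # Smallest SSE after adding a variable == largest increase in R^2.
        trial = {name: sse([candidates[c] for c in chosen + [name]], y) for name in remaining}
        best = min(trial, key=trial.get)
        df_error = n - (len(chosen) + 2)  # n - (k + 1) with k = len(chosen) + 1
        f_stat = (current_sse - trial[best]) / (trial[best] / df_error)
        if f_stat < f_to_enter:
            break  # the best remaining variable is not significant: stop
        chosen.append(best)
        current_sse = trial[best]
        print(f"entered {best:7s} F = {f_stat:8.1f}  R^2 = {1 - current_sse / sst:.3f}")
    return chosen


# Hypothetical data: y depends on temp and on having a garage, not on age.
temp = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
garage = dummy(["yes", "no", "no", "yes", "yes", "no", "yes", "no", "no", "yes"], "yes")
age = [3, 12, 7, 5, 9, 2, 11, 6, 8, 4]
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.1, 0.4, -0.5]
y = [100 - 2 * t + 30 * g + e for t, g, e in zip(temp, garage, noise)]

selected = forward_stepwise({"temp": temp, "garage": garage, "age": age}, y)
print("selected:", selected)
```

Choosing the smallest SSE at each step is the same as choosing the largest R^2 gain, because SST is fixed. The partial F check is what keeps non-significant variables out.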

Test for individual variables

  • This test is used to determine which independent variables have nonzero regression coefficients.
  • Variables whose coefficients cannot be shown to differ from zero are usually dropped from the analysis.
  • The test statistic follows the t distribution with n-(k+1) degrees of freedom, where n is the sample size and k is the number of independent variables.
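Here is a worked instance of the test for one coefficient. The values b_i = 2.5, s_{b_i} = 0.8, n = 20 and k = 3 are hypothetical and do not come from the post.

```latex
H_0: \beta_i = 0 \qquad H_1: \beta_i \neq 0

t = \frac{b_i - 0}{s_{b_i}} = \frac{2.5}{0.8} = 3.125,
\qquad \text{df} = n - (k + 1) = 20 - (3 + 1) = 16
```

The two-tailed critical value at the 0.05 level for 16 degrees of freedom is 2.120. Since |t| = 3.125 exceeds it, H0 would be rejected here and the variable kept.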

Global Test

  • The global test is used to investigate whether any of the independent variables have significant coefficients. The hypotheses are:
    H0: β1 = β2 = ... = βk = 0
    H1: Not all the βi are 0.
  • The test statistic follows the F distribution with k (the number of independent variables) and n-(k+1) degrees of freedom, where n is the sample size.
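A minimal Python sketch of the global F statistic follows. It computes F = (SSR/k) / (SSE/(n-(k+1))), where SSR and SSE are the regression and error sums of squares from the ANOVA table. The function name and the numbers passed to it are hypothetical.

```python
def global_f_test(ssr, sse, n, k):
    """Return (F, df1, df2) for H0: beta_1 = ... = beta_k = 0."""
    df1 = k              # numerator degrees of freedom
    df2 = n - (k + 1)    # denominator degrees of freedom
    msr = ssr / df1      # mean square for regression
    mse = sse / df2      # mean square error
    return msr / mse, df1, df2


# Hypothetical ANOVA-table values.
f, df1, df2 = global_f_test(ssr=600.0, sse=200.0, n=20, k=3)
print(f"F = {f:.2f} with ({df1}, {df2}) degrees of freedom")
```

For these values, F = 16.00 on (3, 16) degrees of freedom. That is well above the 0.05 critical value of about 3.24 from an F table, so H0 would be rejected.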

Correlation Matrix

  • A correlation matrix is used to show all possible simple correlation coefficients between all variables.
    • The matrix is useful for locating independent variables that are correlated with one another.
    • The matrix also shows how strongly each independent variable is correlated with the dependent variable.
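The Python sketch below builds a correlation matrix of simple (Pearson) correlation coefficients using only the standard library. The variable names and data are hypothetical, and `cost` is assumed to be the dependent variable.

```python
import math


def pearson(a, b):
    """Simple (Pearson) correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(data):
    """Return (names, matrix) where matrix[i][j] correlates names[i] with names[j]."""
    names = list(data)
    return names, [[pearson(data[r], data[c]) for c in names] for r in names]


# Hypothetical data: cost is the dependent variable, the rest are independent variables.
data = {
    "cost": [210, 340, 180, 60, 95, 230, 310, 280, 250, 110],
    "temp": [40, 25, 38, 62, 58, 33, 12, 15, 24, 50],
    "insul": [4, 3, 6, 7, 6, 5, 4, 8, 9, 3],
    "age": [5, 11, 4, 8, 7, 6, 9, 12, 10, 4],
}
names, matrix = correlation_matrix(data)
print("       " + "".join(f"{name:>8s}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:>7s}" + "".join(f"{r:8.3f}" for r in row))
```

The matrix can be read in two ways. The `cost` row shows how strongly each independent variable relates to the dependent variable. Large off-diagonal values between two independent variables flag pairs that carry overlapping information.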

The ANOVA table

  • The ANOVA table shows the variation in the dependent variable, split into the part explained by the regression equation and the part not explained by it.
  • More generally, analysis of variance (ANOVA) is a statistical test of whether the means of several populations are all equal. It is typically used when there are more than two populations.
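The regression ANOVA table splits total variation into an explained part and an unexplained part. One-way ANOVA, which tests whether several population means are equal, uses the same split: the between-group (treatment) sum of squares plays the role of the explained part. The Python sketch below computes a one-way ANOVA table. The function name and the three samples are hypothetical.

```python
def one_way_anova(groups):
    """Partition total variation into between-group (treatment) and within-group (error) parts."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_treatment = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_error = ss_total - ss_treatment
    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n - k)
    return {
        "SS treatment": ss_treatment, "df treatment": k - 1, "MS treatment": ms_treatment,
        "SS error": ss_error, "df error": n - k, "MS error": ms_error,
        "SS total": ss_total, "df total": n - 1,
        "F": ms_treatment / ms_error,
    }


# Hypothetical samples from three populations.
table = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
for key, value in table.items():
    print(f"{key:>13s}: {value}")
```

Here F = 27.0 on (2, 6) degrees of freedom. That is far above the 0.05 critical value of 5.14, so the hypothesis that all three means are equal would be rejected.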

Assumptions in Multiple Regression and Correlation

  • The independent variables and the dependent variable have a linear relationship.
  • The dependent variable must be continuous and at least interval-scale.
  • The variation in the residuals (Y-Y') must be the same for all fitted values Y'. When this is the case, the residuals are said to exhibit homoscedasticity.
  • The residuals should be normally distributed with mean 0.
  • Successive values of the dependent variable must be uncorrelated.
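The Python sketch below gives two rough checks of the last assumptions. The first uses the Durbin-Watson statistic, a standard measure of first-order autocorrelation in residuals that is not mentioned in the post. It is only meaningful when the observations have a natural order, such as time. The second is a crude spread ratio for homoscedasticity. This ratio is a hypothetical heuristic introduced here for illustration, not a formal test. The data are also hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation,
    values toward 0 suggest positive and toward 4 negative autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def spread_ratio(fitted, residuals):
    """Residual variance for the upper half of fitted values divided by that of the
    lower half; a value far from 1 hints at heteroscedasticity (rough check only)."""
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    half = len(order) // 2

    def variance(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / (len(v) - 1)

    low = [residuals[i] for i in order[:half]]
    high = [residuals[i] for i in order[half:]]
    return variance(high) / variance(low)


# Hypothetical fitted values and residuals, in the order the data were collected.
fitted = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]
resid = [0.4, -0.3, 0.1, -0.5, 0.6, -0.2, 0.3, -0.4]
print(f"Durbin-Watson: {durbin_watson(resid):.2f}")
print(f"spread ratio (high/low fitted): {spread_ratio(fitted, resid):.2f}")
```

These residuals alternate in sign, which pushes the Durbin-Watson statistic well above 2 and would point to negative autocorrelation between successive observations.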