Monday, 1 October 2012

Theory revision 2

  • population
    • it is a collection of possible individuals, objects or measurements of interest
    • example: population of Australia.
  • sample
    • it is a portion or part of the population of interest
    • example: 30 UTS students
  • parameter
    • it is a measurable characteristic of a population
    • example: total population of Melbourne
  • statistic
    • it is a measurable characteristic of a sample.
    • example: the number of wine lovers in a sample (15)
  • statistical inference
    • it is the process of drawing conclusions from data that are subject to random variation, such as observational errors and sampling variation.
    • example: 15 percent of beer lovers
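
The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of ages is made up, and the sample size of 30 is borrowed from the UTS example above.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people (made-up data).
population = [random.randint(18, 90) for _ in range(10_000)]

# Parameter: a measurable characteristic of the whole population.
population_mean = statistics.mean(population)

# Sample: a portion of the population (here 30 individuals).
sample = random.sample(population, 30)

# Statistic: the same characteristic measured on the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

Statistical inference is the step of using the sample mean to say something about the population mean, which in practice is usually unknown.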

Theory revision 1

  • Statistics is the science of collecting, organizing, presenting, analyzing and interpreting numerical data to assist in making more effective decisions. These techniques are used extensively in marketing, accounting and quality control, and by consumers, professional sports people, hospital administrators, educators, politicians and so on.
  • Descriptive statistics is the set of methods for organizing, summarizing and presenting data in an informative way.
  • Inferential statistics is the set of methods used to make a decision, estimate, prediction or generalization about a population, based on a sample.
  • Qualitative data: the characteristic or variable being studied is non-numeric.
  • Quantitative data: the variable being studied can be reported numerically.
  • A population is a collection of possible individuals, objects or measurements of interest.
  • A sample is a portion or part of the population of interest.

Homoscedasticity

In statistics, a sequence or a vector of random variables is homoscedastic if all random variables in the sequence or vector have the same finite variance.

Sunday, 30 September 2012

Theory revision 5

The multiple standard error of estimate is a measure of the effectiveness of the regression equation.
  • It is measured in the same units as the dependent variable.
  • It is difficult to determine what is a large value and what is a small value of the standard error.
  • The independent variables and the dependent variable have a linear relationship.
  • The dependent variable must be continuous and at least interval scale.
  • The variation in (Y-Y'), the residual, must be the same for all values of Y'. When this is the case, we say the residuals exhibit homoscedasticity.
  • A residual is the difference between the actual value of Y and the predicted value Y'.
  • Residuals should be approximately normally distributed. Histograms and stem-and-leaf charts are useful in checking this requirement.
  • The residuals should be normally distributed with mean 0.
  • Successive values of the dependent variable must be uncorrelated.
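
As a rough sketch of the quantities above, the residuals and the multiple standard error of estimate can be computed as follows. The formula used is the usual textbook one, sqrt(SSE / (n-(k+1))), where SSE is the sum of squared residuals and k is the number of independent variables. The data and predicted values are made up for illustration and are not the output of a real fitted model.

```python
import math

def standard_error_of_estimate(y, y_pred, k):
    """Multiple standard error of estimate: sqrt(SSE / (n - (k + 1)))."""
    n = len(y)
    # A residual is the actual value Y minus the predicted value Y'.
    residuals = [actual - predicted for actual, predicted in zip(y, y_pred)]
    sse = sum(r * r for r in residuals)
    return math.sqrt(sse / (n - (k + 1)))

# Made-up actual values and predictions, assuming k = 2 independent variables.
y = [10.0, 12.0, 15.0, 19.0, 24.0, 26.0]
y_pred = [10.5, 12.5, 14.0, 19.5, 23.0, 26.5]

residuals = [actual - predicted for actual, predicted in zip(y, y_pred)]
s = standard_error_of_estimate(y, y_pred, k=2)

print("residuals:", residuals)
print("mean of residuals:", sum(residuals) / len(residuals))
print("standard error of estimate:", s)
```

The result is in the same units as the dependent variable, which is why it is hard to call a value large or small without knowing the scale of Y.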

The ANOVA table 

  • The ANOVA table gives the variation in the dependent variable, both that which is and that which is not explained by the regression equation.
  • It is used as a statistical technique or test for detecting differences in population means, that is, whether or not the means of different groups are all equal when you have more than two populations.

Correlation Matrix 

  • A correlation matrix is used to show all possible simple correlation coefficients between all variables.
    • the matrix is useful for locating correlated independent variables.
    • How strongly each independent variable is correlated to the dependent variable is shown in the matrix.
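
A minimal sketch of a correlation matrix in plain Python is shown below. The helper names `pearson_r` and `correlation_matrix` and the three columns of data are assumptions made for illustration only.

```python
import math

def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient between two lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Correlation between every pair of variables, as a nested dict."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Made-up data: y is the dependent variable, x1 and x2 are independent.
data = {
    "y": [3.0, 5.0, 7.0, 9.0, 11.0],
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 5.0],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [f"{row[other]:.2f}" for other in matrix])
```

In this made-up data the entry for x1 and x2 is 0.8, which is the kind of value the matrix helps to locate when looking for correlated independent variables.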

Global Test 

  • The global test is used to investigate whether any of the independent variables have significant coefficients. 
  • The test statistic follows the F distribution with k (the number of independent variables) and n-(k+1) degrees of freedom, where n is the sample size.
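
The F statistic for the global test can be sketched as below, using the usual textbook form F = (SSR/k) / (SSE/(n-(k+1))). The function name is hypothetical, the data are made up, and the predictions are assumed to come from a least-squares fit with an intercept, so that SSR = SS total - SSE. The critical value still has to be looked up in an F table.

```python
def global_f_statistic(y, y_pred, k):
    """Return (F, numerator df, denominator df) for the global test."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((value - mean_y) ** 2 for value in y)
    sse = sum((actual - predicted) ** 2 for actual, predicted in zip(y, y_pred))
    ssr = ss_total - sse          # variation explained by the regression
    df_regression = k             # number of independent variables
    df_error = n - (k + 1)
    f = (ssr / df_regression) / (sse / df_error)
    return f, df_regression, df_error

# Made-up actual values and predictions, assuming k = 2 independent variables.
y = [10.0, 12.0, 15.0, 19.0, 24.0, 26.0]
y_pred = [10.5, 12.5, 14.0, 19.5, 23.0, 26.5]

f, df1, df2 = global_f_statistic(y, y_pred, k=2)
print(f"F = {f:.2f} with {df1} and {df2} degrees of freedom")
```

A computed F larger than the critical value would lead to rejecting the hypothesis that all regression coefficients are zero.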

Test for individual variables

  • This test is used to determine which independent variables have nonzero regression coefficients.
  • The variables whose regression coefficients are not significantly different from zero are usually dropped from the analysis.
  • The test statistic follows the t distribution with n-(k+1) degrees of freedom.

Qualitative Variables and Stepwise Regression

  • Qualitative variables are non-numeric and are also called dummy variables.
    • For a qualitative variable, there are only two conditions possible.
  • Stepwise regression leads to the most efficient regression equation.
    • Only independent variables with significant regression coefficients are entered into the analysis. Variables are entered in the order in which they increase R^2 the fastest.

Theory revision 4

Hypothesis Testing

  • Hypothesis testing is an essential part of statistical inference.
  • A statistical hypothesis is an assumption about a population. This assumption may or may not be true.
  • For example, claiming that a new drug is better than the current drug for treatment of the same symptoms.
  • The best way to determine whether a statistical hypothesis is true would be to examine the entire population.
  • Since that is often impractical, researchers examine a random sample from the population. If the sample data are consistent with the statistical hypothesis, the hypothesis is accepted; if not, it is rejected.
  • Null hypothesis: the null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance.
  • Alternative hypothesis: the alternative hypothesis, denoted by H1, is the hypothesis that sample observations are influenced by some non-random cause.
  • Statisticians follow a formal process to determine whether to accept or reject a null hypothesis based on sample data. This process, called hypothesis testing, consists of five steps.
    1. State the null and alternative hypotheses.
    2. Write down the relevant data and select a level of significance.
    3. Identify and compute the test statistic, Z, to be used in testing the hypothesis.
    4. Compute the critical value(s), Zc.
    5. Based on the sample, arrive at a decision.
  • Decision errors
    • Type I error: a Type I error occurs when the null hypothesis is rejected when it is true. The probability of committing a Type I error is called the significance level and is denoted by alpha (α).
    • Type II error: a Type II error occurs when the researcher accepts a null hypothesis that is false.
  • One-tailed and two-tailed tests
    • A test of a statistical hypothesis where the region of rejection is on only one side of the sampling distribution is called a one-tailed test.
    • For example, suppose the null hypothesis states that the mean is greater than or equal to 10. The alternative hypothesis would be that the mean is less than 10.
    • The region of rejection would consist of a range of numbers located on the left side of the sampling distribution, that is, a set of numbers less than 10.
    • A test of a statistical hypothesis where the region of rejection is on both sides of the sampling distribution is called a two-tailed test.
    • For example, suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10 or greater than 10.
    • The region of rejection would consist of a range of numbers located on both sides of the sampling distribution; that is, the region of rejection would consist partly of numbers that were less than 10 and partly of numbers that were greater than 10.
  • Degrees of freedom
    • The concept of degrees of freedom is central to the principle of estimating statistics of populations from samples of them.
    • It is the number of scores that are free to vary.
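
The five steps and the one-tailed versus two-tailed distinction can be sketched with a one-sample Z test. This is a minimal illustration, assuming the population standard deviation is known; the function name `z_test`, its parameters, and the numbers in the example are all made up.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, critical value, reject H0?) for a one-sample Z test."""
    # Step 3: compute the test statistic Z.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Step 4: compute the critical value Zc for the chosen tail.
    if tail == "two":
        z_c = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_c
    elif tail == "left":
        z_c = std_normal.inv_cdf(alpha)
        reject = z < z_c
    elif tail == "right":
        z_c = std_normal.inv_cdf(1 - alpha)
        reject = z > z_c
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    # Step 5: the decision is returned as a boolean.
    return z, z_c, reject

# Steps 1 and 2 (made-up data): H0: mean >= 10, H1: mean < 10,
# sample mean 9.4, sigma 2, n = 36, significance level 0.05.
z_left, zc_left, reject_left = z_test(9.4, 10, 2, 36, alpha=0.05, tail="left")
print(f"one-tailed: z = {z_left:.2f}, Zc = {zc_left:.3f}, reject H0: {reject_left}")

# Same data, two-tailed: H0: mean = 10, H1: mean != 10.
z_two, zc_two, reject_two = z_test(9.4, 10, 2, 36, alpha=0.05, tail="two")
print(f"two-tailed: z = {z_two:.2f}, Zc = {zc_two:.3f}, reject H0: {reject_two}")
```

With these made-up numbers the same sample rejects H0 in the one-tailed test but not in the two-tailed test, because the two-tailed test splits α across both sides of the sampling distribution.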

Theory revision 3

Bernoulli Trials

  • A random experiment with two outcomes (success or failure).
  • The random variable X is often coded as 0 (failure) and 1 (success).
  • A Bernoulli trial has a probability of success, usually denoted p.
  • Accordingly, the probability of failure (1-p) is usually denoted q:
    • q = 1-p
    • The probability function of the Bernoulli distribution is P(X = x) = p^x q^(1-x),
    • where x can be zero or one.

Binomial distribution

  1. A fixed number of identical trials.
  2. The binomial distribution consists of a fixed number of statistically independent Bernoulli trials.
  3. Two possible outcomes for each trial (success or failure).
  4. Each trial is independent (it does not affect the others).
  5. The probability of success is the same for each trial.
  6. Shapes of the binomial distribution
    • if p<0.5: the distribution will exhibit positive skew
    • if p=0.5: the distribution will be symmetric
    • if p>0.5: the distribution will exhibit negative skew
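
The binomial probabilities and the skew rule can be checked with a short sketch. It uses the standard binomial formula P(X = x) = C(n, x) p^x (1-p)^(n-x) and the standard skewness (1-2p)/sqrt(np(1-p)). The function names and the choice of n = 10 are assumptions for illustration. With n = 1 the formula reduces to a single Bernoulli trial.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for n independent Bernoulli trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_skewness(n, p):
    """Skewness of the binomial distribution: (1 - 2p) / sqrt(n p (1 - p))."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

n = 10  # hypothetical number of trials
for p in (0.2, 0.5, 0.8):
    probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
    print(f"p={p}: total probability={sum(probabilities):.4f}, "
          f"skewness={binomial_skewness(n, p):+.3f}")

# A single Bernoulli trial is the special case n = 1.
print("P(success) with p=0.3:", binomial_pmf(1, 1, 0.3))
print("P(failure) with p=0.3:", binomial_pmf(0, 1, 0.3))
```

The sign of the skewness matches the list above: positive for p<0.5, zero for p=0.5, negative for p>0.5.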

Poisson Random Variable 

  • A Poisson random variable represents the number of independent events that occur randomly over a unit of time.
  • It counts the number of times an event occurs during a given unit of measurement.
  • The number of events that occur in one unit is independent of other units.
  • The probability that an event occurs over a given unit is identical for all units (constant rate).
  • Events occur randomly.
  • The expected number of events (rate) in each unit is denoted by λ (lambda).
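
As a small sketch, the Poisson probabilities follow the standard formula P(X = x) = e^(-λ) λ^x / x!. The rate of 3 events per unit used here is a made-up value, and the function name `poisson_pmf` is chosen only for this illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with rate lam per unit."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 3.0  # hypothetical rate: 3 events expected per unit of time

for x in range(6):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.4f}")

# The probabilities sum to 1 and the expected count equals the rate.
total = sum(poisson_pmf(x, lam) for x in range(50))
expected = sum(x * poisson_pmf(x, lam) for x in range(50))
print(f"total probability = {total:.6f}, expected count = {expected:.6f}")
```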