Overview: Regression Analysis with Cross-Sectional Data


    Definition of the multiple linear regression model
    *
    *Motivation for multiple regression
    *Incorporate more explanatory factors into the model
    *Explicitly hold fixed other factors that would otherwise be in the error term
    *Allow for more flexible functional forms
    *
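    *For reference (the slide's own equation is not reproduced here), the population model with k explanatory variables takes the standard textbook form
        y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u
    *where \beta_0 is the intercept, \beta_1, \dots, \beta_k are the slope parameters, and the error term u collects all factors other than x_1, \dots, x_k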
    *Example: Wage equation
    *
    *Interpretation of the multiple regression model
    *
    *The multiple linear regression model manages to hold the values of other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration
    *"Ceteris paribus" interpretation
    *It has still to be assumed that unobserved factors do not change if the explanatory variables are changed
    *
    *Example: Determinants of college GPA
    *
    *Interpretation
    *Holding ACT fixed, one additional point of high school grade point average is associated with .453 additional points of college grade point average
    *Or: If we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 higher than that of student B
    *Holding high school grade point average fixed, 10 additional points on the ACT are associated with less than one additional point of college GPA
    *
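    *A minimal numerical sketch of this "holding other factors fixed" idea (synthetic data; the variable names and coefficient values below are illustrative only, loosely echoing the GPA example):
```python
# Sketch: with correlated regressors, the multiple-regression slope differs from
# the simple-regression slope, because the latter does not hold the other variable fixed.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
hsGPA = rng.normal(3.0, 0.5, n)                    # hypothetical regressor
ACT = 20 + 4 * hsGPA + rng.normal(0, 2, n)         # correlated with hsGPA
colGPA = 1.3 + 0.45 * hsGPA + 0.01 * ACT + rng.normal(0, 0.3, n)

X_multi = np.column_stack([np.ones(n), hsGPA, ACT])
b_multi, *_ = np.linalg.lstsq(X_multi, colGPA, rcond=None)

X_simple = np.column_stack([np.ones(n), hsGPA])    # ACT omitted
b_simple, *_ = np.linalg.lstsq(X_simple, colGPA, rcond=None)

print("slope on hsGPA, holding ACT fixed:  ", b_multi[1])   # close to 0.45
print("slope on hsGPA, ACT not held fixed: ", b_simple[1])  # larger: picks up ACT effect
```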
    *Standard assumptions for the multiple regression model
    *Assumption MLR.1 (Linear in parameters)
    *
    *
    *Assumption MLR.2 (Random sampling)
    *
    *Standard assumptions for the multiple regression model (cont.)
    *Assumption MLR.3 (No perfect collinearity)
    *
    *
    *Remarks on MLR.3
    *The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed
    *If an explanatory variable is a perfect linear combination of other explanatory variables it is superfluous and may be eliminated
    *Constant variables are also ruled out (collinear with intercept)
    *
    *Example for perfect collinearity: small sample
    *
    *Example for perfect collinearity: relationships between regressors
    *
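    *A small sketch of how such perfect collinearity shows up numerically (hypothetical regressors: two shares that sum to one, together with an intercept):
```python
# Sketch: if one regressor is an exact linear combination of the others
# (here shareB = 1 - shareA, i.e. collinear with shareA and the constant),
# X'X is singular and the OLS coefficients are not identified (MLR.3 fails).
import numpy as np

rng = np.random.default_rng(1)
n = 50
shareA = rng.uniform(0.2, 0.6, n)
shareB = 1.0 - shareA

X = np.column_stack([np.ones(n), shareA, shareB])
print("rank of X'X:", np.linalg.matrix_rank(X.T @ X), "out of", X.shape[1])
# Remedy: drop shareB (or the intercept); the dropped variable is superfluous.
```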
    *Standard assumptions for the multiple regression model (cont.)
    *Assumption MLR.4 (Zero conditional mean)
    *
    *
    *
    *In a multiple regression model, the zero conditional mean assumption is much more likely to hold because fewer things end up in the error
    *Example: Average test scores
    *
    *Discussion of the zero conditional mean assumption
    *Explanatory variables that are correlated with the error term are called endogenous; endogeneity is a violation of assumption MLR.4
    *Explanatory variables that are uncorrelated with the error term are called exogenous; MLR.4 holds if all explanatory variables are exogenous
    *Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators

    *Theorem 3.1 (Unbiasedness of OLS)
    *
    *Unbiasedness is an average property in repeated samples; in a given sample, the estimates may still be far away from the true values
    *
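    *A minimal simulation sketch of this point (synthetic data, illustrative parameter values; under MLR.1 - MLR.4 the estimates should be centered on the true coefficients):
```python
# Sketch: average OLS estimates over many repeated samples vs. the true parameters.
import numpy as np

rng = np.random.default_rng(2)
true_beta = np.array([1.0, 0.5, -0.3])      # illustrative true parameters
n, reps = 200, 2000
estimates = np.empty((reps, 3))

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)      # correlated regressors are allowed
    u = rng.normal(size=n)                  # zero conditional mean holds by construction
    y = true_beta[0] + true_beta[1] * x1 + true_beta[2] * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    sol, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates[r] = sol

print("average estimate:", estimates.mean(axis=0))   # close to true_beta
print("true parameters: ", true_beta)
# In any single sample, the estimate can still be far from the truth.
```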
    *Including irrelevant variables in a regression model
    *

    *Omitting relevant variables: the simple case
    *
    *
    *
    *Conclusion: All estimated coefficients will be biased
    *
    *
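    *For the simple case (true model with x_1 and x_2, but x_2 omitted from the estimated regression), the textbook omitted-variable-bias formula is
        E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1
    *where \tilde{\delta}_1 is the slope from regressing x_2 on x_1; there is no bias only if \beta_2 = 0 (the omitted variable is irrelevant) or if x_1 and x_2 are uncorrelated in the sample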
    *Standard assumptions for the multiple regression model (cont.)
    *Assumption MLR.5 (Homoscedasticity)
    *
    *

    *Example: Wage equation
    *
    *
    *Short hand notation
    *
    *Assumption MLR.6 (Normality of error terms)
    *
    *Theorem 3.2 (Sampling variances of OLS slope estimators)
    *
    *
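    *For reference, the sampling variances in Theorem 3.2 have the standard form (under MLR.1 - MLR.5)
        Var(\hat{\beta}_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)}, \qquad j = 1, \dots, k
    *where SST_j is the total sample variation in x_j and R_j^2 is the R-squared from regressing x_j on all other explanatory variables (including an intercept); a high R_j^2 (multicollinearity) inflates the variance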
    *An example for multicollinearity
    *
    *Discussion of the multicollinearity problem
    *In the above example, it would probably be better to lump all expenditure categories together because effects cannot be disentangled
    *In other cases, dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias)
    *Only the sampling variance of the variables involved in multicollinearity will be inflated; the estimates of other effects may be very precise
    *Note that multicollinearity is not a violation of MLR.3 in the strict sense
    *Multicollinearity may be detected through "variance inflation factors"
    *
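    *A small sketch of how such variance inflation factors, VIF_j = 1/(1 - R_j^2), can be computed (synthetic data; the helper function vif is illustrative, not from any particular package):
```python
# Sketch: VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing x_j
# on the remaining explanatory variables (plus an intercept).
import numpy as np

def vif(X):
    """X: n x k matrix of explanatory variables (without the intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        xj = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, xj, rcond=None)
        resid = xj - Z @ coef
        r2_j = 1.0 - resid.var() / xj.var()
        out[j] = 1.0 / (1.0 - r2_j)
    return out

rng = np.random.default_rng(3)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)       # nearly collinear with x1
x3 = rng.normal(size=500)
print(vif(np.column_stack([x1, x2, x3])))  # large for x1, x2; close to 1 for x3
```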
    *Estimating the error variance
    *
    *Theorem 3.3 (Unbiased estimator of the error variance)
    *
    *Efficiency of OLS: The Gauss-Markov Theorem
    *Under assumptions MLR.1 - MLR.5, OLS is unbiased
    *However, under these assumptions there may be many other estimators that are unbiased
    *Which one is the unbiased estimator with the smallest variance?
    *In order to answer this question one usually limits oneself to linear estimators, i.e. estimators linear in the dependent variable
    *
    *Theorem 3.4 (Gauss-Markov Theorem)
    *Under assumptions MLR.1 - MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e.
    *
    *OLS is only the best estimator if MLR.1 – MLR.5 hold; if there is heteroscedasticity for example, there are better estimators.
    *
    *Estimation of the sampling variances of the OLS estimators
    *
    *Note that these formulas are only valid under assumptions MLR.1 - MLR.5 (in particular, there has to be homoscedasticity)
    *
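    *A minimal sketch of how these quantities are computed in matrix form (synthetic data; valid only under homoscedasticity, as noted above):
```python
# Sketch: sigma^2 is estimated by SSR/(n-k-1) (Theorem 3.3); the estimated
# variance-covariance matrix of the OLS estimator is sigma2_hat * (X'X)^{-1},
# and the standard errors are the square roots of its diagonal.
import numpy as np

rng = np.random.default_rng(4)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)    # illustrative model

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
k = X.shape[1] - 1                                     # number of slope parameters
sigma2_hat = resid @ resid / (n - k - 1)
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
print("coefficients:   ", beta_hat)
print("standard errors:", se)
```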
    *Terminology
    *
    *
    *
    *Theorem 4.1 (Normal sampling distributions)
    *
    *Testing hypotheses about a single population parameter
    *Theorem 4.2 (t-distribution for the standardized estimators)
    *
    *Null hypothesis (for more general hypotheses, see below)
    *
    *t-statistic (or t-ratio)
    *

    *Distribution of the t-statistic if the null hypothesis is true
    *

    *Goal: Define a rejection rule so that, if H0 is true, it is rejected only with a small probability (= significance level, e.g. 5%)
    *
    *Testing against one-sided alternatives (greater than zero)
    *
    *Example: Wage equation
    *Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages
    *
    *Example: Wage equation (cont.)
    *
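    *A small sketch of the mechanics of such a one-sided test (the numbers below are placeholders, not the wage-equation estimates):
```python
# Sketch: one-sided t-test of H0: beta_j = 0 against H1: beta_j > 0 at the 5% level.
from scipy import stats

beta_hat_j, se_j = 0.004, 0.002   # hypothetical estimate and standard error
n, k = 500, 3                     # hypothetical sample size and number of slopes
df = n - k - 1

t_stat = beta_hat_j / se_j
crit = stats.t.ppf(0.95, df)      # one-sided 5% critical value
print(f"t = {t_stat:.2f}, critical value = {crit:.3f}, reject H0: {t_stat > crit}")
```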
    *Testing against one-sided alternatives (less than zero)
    *
    *Example: Student performance and school size
    *Test whether smaller school size leads to better student performance
    *
    *Example: Student performance and school size (cont.)
    *
    *Example: Student performance and school size (cont.)
    *Alternative specification of functional form:
    *
    *Example: Student performance and school size (cont.)
    *
    *Testing against two-sided alternatives
    *
    *Example: Determinants of college GPA
    *
    *"Statistically significant" variables in a regression
    *If a regression coefficient is different from zero in a two-sided test, the corresponding variable is said to be "statistically significant"
    *If the number of degrees of freedom is large enough so that the normal approximation applies, the following rules of thumb apply:
    *
    *Guidelines for discussing economic and statistical significance
    *If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical importance
    *The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!
    *If a variable is statistically and economically important but has the "wrong" sign, the regression model might be misspecified
    *If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), one may think of dropping it from the regression
    *If the sample size is small, effects might be imprecisely estimated so that the case for dropping insignificant variables is less strong
    *
    *Testing more general hypotheses about a regression coefficient
    *Null hypothesis
    *
    *
    *t-statistic
    *
    *
    *
    *The test works exactly as before, except that the hypothesized value is subtracted from the estimate when forming the statistic
    *
    *Example: Campus crime and enrollment
    *An interesting hypothesis is whether crime increases by one percent if enrollment is increased by one percent
    *
    *Computing p-values for t-tests
    *If the significance level is made smaller and smaller, there will be a point where the null hypothesis cannot be rejected anymore
    *The reason is that, by lowering the significance level, one increasingly guards against the error of rejecting a correct H0
    *The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test
    *A small p-value is evidence against the null hypothesis because one would reject the null hypothesis even at small significance levels
    *A large p-value is evidence in favor of the null hypothesis
    *P-values are more informative than tests at fixed significance levels
    *
    *How the p-value is computed (here: two-sided test)
    *
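    *A one-line sketch of the computation (placeholder values for the t-statistic and degrees of freedom):
```python
# Sketch: two-sided p-value = P(|T| > |t|) under H0, with T ~ t(df).
from scipy import stats

t_stat, df = 2.1, 120
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(p_value)
```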
    *Confidence intervals
    *Simple manipulation of the result in Theorem 4.2 implies that
    *
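    *In the usual notation, the resulting 95% confidence interval is
        \hat{\beta}_j \pm c \cdot se(\hat{\beta}_j)
    *where c is the 97.5th percentile of the t_{n-k-1} distribution (roughly 1.96 for large degrees of freedom)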
    *Interpretation of the confidence interval
    *The bounds of the interval are random
    *In repeated samples, the interval that is constructed in the above way will cover the population regression coefficient in 95% of the cases
    *
    *Confidence intervals for typical confidence levels
    *

    *Relationship between confidence intervals and hypothesis tests
    *
    *Example: Model of firms' R&D expenditures
    *
    *Testing hypotheses about a linear combination of parameters
    *Example: Return to education at 2 year vs. at 4 year colleges
    *
    *Impossible to compute with standard regression output because
    *
    *
    *Alternative method
    *
    *Estimation results
    *
    *This method always works for single linear hypotheses
    *
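    *As a sketch of the reparametrization behind this alternative method: to test a hypothesis about \theta_1 = \beta_1 - \beta_2, substitute \beta_1 = \theta_1 + \beta_2 into the model, e.g.
        y = \beta_0 + \theta_1 x_1 + \beta_2 (x_1 + x_2) + \beta_3 x_3 + u
    *so that the estimate of \theta_1, its standard error, and hence a t-test or confidence interval for \beta_1 - \beta_2 can be read directly off a regression of y on x_1, (x_1 + x_2) and x_3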
    *Testing multiple linear restrictions: The F-test
    *Testing exclusion restrictions
    *
    *Estimation of the unrestricted model
    *
    *Estimation of the restricted model
    *
    *
    *Test statistic
    *
    *Rejection rule (Figure 4.7)
    *
    *Test decision in example
    *
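    *For reference, the exclusion-restriction F statistic has the standard form
        F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}
    *where q is the number of restrictions and SSR_r, SSR_{ur} are the residual sums of squares of the restricted and unrestricted models; under H_0 (and MLR.1 - MLR.6) it follows an F_{q, n-k-1} distribution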
    *Discussion
    *The three variables are "jointly significant"
    *They were not significant when tested individually
    *The likely reason is multicollinearity between them
    *
    *Test of overall significance of a regression
    *
    *The test of overall significance is reported in most regression packages; the null hypothesis is usually overwhelmingly rejected
    *
    *Testing general linear restrictions with the F-test
    *Example: Test whether house price assessments are rational
    *
    *Unrestricted regression
    *
    *
    *Restricted regression
    *
    *
    *Test statistic
    *
    *Regression output for the unrestricted regression
    *
    *The F-test works for general multiple linear hypotheses
    *For all tests and confidence intervals, validity of assumptions MLR.1 – MLR.6 has been assumed. Tests may be invalid otherwise.
    *
    *Models with interaction terms
    *
    *Interaction effects complicate interpretation of parameters
    *
    *Reparametrization of interaction effects
    *

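    *In generic notation (a sketch, not the slide's exact equations), a model with an interaction term and its reparametrized version are
        y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + u, \qquad \partial y / \partial x_2 = \beta_2 + \beta_3 x_1
        y = \alpha_0 + \delta_1 x_1 + \delta_2 x_2 + \beta_3 (x_1 - \mu_1)(x_2 - \mu_2) + u
    *where \mu_1, \mu_2 are the sample means of x_1 and x_2, so that \delta_2 = \beta_2 + \beta_3 \mu_1 is the partial effect of x_2 evaluated at the mean of x_1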
    *Advantages of reparametrization
    *Easy interpretation of all parameters
    *Standard errors for partial effects at the mean values available
    *If necessary, interaction may be centered at other interesting values
    *
    *Qualitative Information
    *Examples: gender, race, industry, region, rating grade, …
    *A way to incorporate qualitative information is to use dummy variables
    *They may appear as the dependent or as independent variables
    *
    *A single dummy independent variable
    *
    *Dummy variable trap
    *
    *Estimated wage equation with intercept shift
    *
    *Does that mean that women are discriminated against?
    *Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.
    *
    *Using dummy explanatory variables in equations for log(y)
    *
    *
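    *For reference: in a model such as log(wage) = \beta_0 + \delta_0 female + \beta_1 educ + u (a generic sketch), the dummy coefficient has the usual log-model interpretation as an approximate proportionate differential, and the exact percentage differential is
        100 \cdot [\exp(\hat{\delta}_0) - 1] \, \%
    *which matters when |\hat{\delta}_0| is not small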
    *Using dummy variables for multiple categories
    *1) Define membership in each category by a dummy variable
    *2) Leave out one category (which becomes the base category)
    *
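    *A tiny sketch of steps 1) and 2) (toy data; the category names are made up):
```python
# Sketch: for a qualitative variable with g categories, include g-1 dummies and
# leave one category out as the base; including all g plus an intercept would
# create perfect collinearity (the dummy variable trap).
import numpy as np

region = np.array(["north", "south", "west", "south", "north", "west"])
categories = ["north", "south", "west"]
base = "north"                                           # omitted base category

dummies = np.column_stack(
    [(region == c).astype(float) for c in categories if c != base]
)
X = np.column_stack([np.ones(len(region)), dummies])     # intercept + g-1 dummies
print(X)
```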