## Economic and econometric specification

It is not uncommon for economists to depart from the proper economic and econometric specifications of the model because the observed variables are not precisely as required. For example, in an economic analysis of technology choice and efficiency on Australian dairy farms (Kompas and Che, 2006) a variable representing feed concentration was not available. Average grain feed (in kilograms per cow) was therefore used as a proxy for this variable in the inefficiency model. As a consequence, the economic interpretation of the resulting analysis was compromised. Policy implications drawn from this analysis should also be interpreted carefully because of this divergence between the fitted econometric model and the underlying economic theory.

In general, missing variables, or inclusion of incorrect variables in the econometric model, can cause results to be insignificant when they should be significant, and vice versa. For example, fodder used may be more easily measured in terms of value rather than quantity. However, value also depends on price, which varies from year to year, and from region to region. Therefore, economic analysis that uses value may not give the correct policy implication for the supply of feed to farms.

There are a number of widely available diagnostic tests that can be used to assess the statistical adequacy of an econometric model. The most common of these test for correct functional form, autocorrelation, heteroscedasticity, multicollinearity and normality of the residuals. Correct functional form means both the inclusion of the correct variables (including interactions where necessary) and the appropriate scale for modelling the relationship between the dependent and independent variables (log, linear, etc.). The econometric model specification is often limited by data availability, but the choice of functional form for the model may be quite flexible within the data range. If an econometric specification fails the diagnostic test for functional form, then it may not be appropriate for the specified economic analysis. However, even if the functional specification is appropriate for the observed data, there is always the danger of drawing false conclusions if other diagnostic checks are not performed.
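As an illustration of a functional form check, the following minimal sketch applies a RESET-type test (in the spirit of Ramsey's RESET, which is not named in the text and is used here as an assumption) to simulated data. The data-generating process, the seed and the helper names `ols`, `rss` and `reset_f` are all hypothetical. The idea is that if a straight-line specification is adequate, adding the squared fitted values should not improve the fit much, so the F statistic should be small.

```python
import math
import random

def ols(X, y):
    """Solve the normal equations X'X b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def rss(X, y, beta):
    """Residual sum of squares for coefficients beta."""
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def reset_f(x, y):
    """F statistic for adding fitted^2 to the regression of y on (1, x)."""
    n = len(y)
    X0 = [[1.0, xi] for xi in x]
    b0 = ols(X0, y)
    fitted = [b0[0] + b0[1] * xi for xi in x]
    X1 = [[1.0, xi, fi ** 2] for xi, fi in zip(x, fitted)]
    b1 = ols(X1, y)
    rss0, rss1 = rss(X0, y, b0), rss(X1, y, b1)
    # One added regressor, n - 3 residual degrees of freedom.
    return (rss0 - rss1) / (rss1 / (n - 3))

# Hypothetical data: y is log-linear in x, so a linear-in-levels model is misspecified.
random.seed(1)
x = [random.uniform(0, 10) for _ in range(200)]
y = [math.exp(1 + 0.3 * xi + random.gauss(0, 0.1)) for xi in x]

f_linear = reset_f(x, y)                         # y on x (wrong scale)
f_log = reset_f(x, [math.log(yi) for yi in y])   # log(y) on x (correct scale)
print(f"RESET F, linear: {f_linear:.1f}; log: {f_log:.2f}")
```

A large F statistic for the levels specification and a much smaller one for the log specification would suggest that the log scale is the more appropriate functional form for these simulated data.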

Autocorrelation occurs when the model error term for an observation is correlated with the error terms of other observations that are 'close' to it in space or in time. Thus, in a time series model one can assume that the error term at time $t$ is related to the error term at time $t - 1$ (lag 1 autocorrelation), in the sense that it can be written in the form $e_t = \rho e_{t-1} + u_t$, where $|\rho| < 1$ is the autocorrelation coefficient between the two error terms, and $u_t$ is a disturbance term whose distribution is the same at any time point and is uncorrelated across time. Under this type of autocorrelation structure, ordinary least squares (OLS) estimators are unbiased and consistent, but inefficient. This has four consequences: first, the true variance of the estimators is inflated compared to the no-autocorrelation case ($\rho = 0$); second, the estimated variances of the coefficient estimates are too small (biased downward); third, the coefficient t statistics are inflated (biased upward), making the estimates appear more significant than they actually are; and fourth, the estimated fit of the model to the data appears better than it actually is. Dealing with serial autocorrelation in linear regression is relatively straightforward (Greene, 2008; Maddala, 2001).
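The lag 1 structure described above can be sketched with simulated data. The example below is a minimal illustration, not a method taken from the cited references: it generates errors with a known autocorrelation coefficient (`rho` in the code), fits a simple regression by OLS, and computes the Durbin-Watson statistic, which is close to 2 when there is no lag 1 autocorrelation and falls towards 0 as positive autocorrelation increases. It then applies one Cochrane-Orcutt style quasi-differencing step as one possible remedy. The sample size, seed, coefficients and function names are assumptions made for illustration.

```python
import random

def simple_ols(x, y):
    """OLS intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b

def durbin_watson(res):
    """Durbin-Watson statistic; roughly 2 * (1 - lag 1 autocorrelation)."""
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    return num / sum(r * r for r in res)

# Hypothetical data with AR(1) errors: e_t = rho * e_{t-1} + u_t.
random.seed(42)
n, rho = 500, 0.8
x = [random.uniform(0, 10) for _ in range(n)]
e = [random.gauss(0, 1)]
for t in range(1, n):
    e.append(rho * e[t - 1] + random.gauss(0, 1))
y = [1.0 + 2.0 * xi + et for xi, et in zip(x, e)]

a, b = simple_ols(x, y)
res = [yi - a - b * xi for xi, yi in zip(x, y)]
dw = durbin_watson(res)

# Estimate rho by regressing each residual on its predecessor (no intercept).
rho_hat = (sum(res[t] * res[t - 1] for t in range(1, n))
           / sum(res[t - 1] ** 2 for t in range(1, n)))

# One quasi-differencing step: transform the data with rho_hat and re-fit.
y_star = [y[t] - rho_hat * y[t - 1] for t in range(1, n)]
x_star = [x[t] - rho_hat * x[t - 1] for t in range(1, n)]
a2, b2 = simple_ols(x_star, y_star)
res2 = [yi - a2 - b2 * xi for xi, yi in zip(x_star, y_star)]
dw2 = durbin_watson(res2)

print(f"DW before: {dw:.2f}, rho_hat: {rho_hat:.2f}, DW after: {dw2:.2f}")
```

In the transformed regression the slope still estimates the original slope, while the intercept estimates the original intercept multiplied by one minus the autocorrelation coefficient.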

Many commonly used statistical techniques for model fitting are efficient provided a number of assumptions about the model and the underlying data hold true, and so it is important to be aware of these assumptions and of the consequences if they are incorrect. For example, the OLS method of fitting a linear regression model is efficient when the error term has constant variance. This holds, for instance, when the error terms are all drawn from the same distribution. The data are said to be heteroscedastic when the variance of the error term varies from observation to observation. This is often the case with cross-sectional or time series data, and can arise when important variables are omitted from the economic model. Heteroscedasticity does not cause OLS coefficient estimates to be biased. However, the variance (and, thus, the standard errors) of the coefficients tends to be underestimated, t statistics become inflated, and variables that are in fact insignificant can appear to be statistically significant (Greene, 2008; Maddala, 2001). One approach to dealing with heteroscedasticity is to model it explicitly, as in Carroll and Ruppert (1988).
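A simple way to check for heteroscedasticity can be sketched as follows. The example uses the studentized (Koenker) form of the Breusch-Pagan test, which is an assumption here since the text does not name a specific test: the squared OLS residuals are regressed on the explanatory variable, and the sample size multiplied by the R-squared of that auxiliary regression is compared with a chi-squared critical value (3.84 at the 5 per cent level with one degree of freedom). The simulated data, in which the error standard deviation grows with `x`, and all names are hypothetical.

```python
import random

def simple_ols(x, y):
    """OLS intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b * xbar, b

def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    a, b = simple_ols(x, y)
    ybar = sum(y) / len(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical heteroscedastic data: error standard deviation is 0.5 * x.
random.seed(7)
n = 400
x = [random.uniform(1, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 0.5 * xi) for xi in x]

a, b = simple_ols(x, y)
squared_res = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]

# LM statistic: n times the R-squared of squared residuals regressed on x.
lm = n * r_squared(x, squared_res)
CHI2_5PCT_1DF = 3.84
print(f"LM = {lm:.1f}; reject constant variance: {lm > CHI2_5PCT_1DF}")
```

A value of the statistic above the critical value is evidence against constant error variance, in which case the usual OLS standard errors should not be relied upon.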

Multicollinearity occurs when the model used in a multiple regression analysis includes explanatory variables that are highly correlated, or where linear combinations of the explanatory variables are highly correlated. In such situations, parameter estimates and their variance estimates become highly unstable. Slight changes to the data used to fit the model, or removal of a statistically insignificant covariate, can result in radical changes to the estimates of the remaining coefficients. Belsley et al. (1980) contain an extensive discussion of this issue and methods for dealing with it. For example, one approach is to replace collinear variables by their principal components. Another approach, when exclusion or replacement of covariates is unacceptable, is ridge regression (Hoerl and Kennard, 1970).
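The instability caused by collinear regressors, and the stabilising effect of ridge regression, can be illustrated with two nearly identical explanatory variables. The sketch below is a minimal example under stated assumptions: the data are simulated, the variance inflation factor (VIF) is used as the collinearity measure although the text does not mention it, and the ridge penalty `lam = 5.0` is an arbitrary illustrative value rather than a recommended choice.

```python
import math
import random

# Hypothetical data: x2 is almost a copy of x1; the true coefficients are both 1.
random.seed(11)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + 0.05 * random.gauss(0, 1) for v in x1]
y = [u + v + random.gauss(0, 1) for u, v in zip(x1, x2)]

def centered(v):
    """Subtract the mean so that no intercept is needed."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

x1c, x2c, yc = centered(x1), centered(x2), centered(y)
s11 = sum(u * u for u in x1c)
s22 = sum(v * v for v in x2c)
s12 = sum(u * v for u, v in zip(x1c, x2c))
c1 = sum(u * w for u, w in zip(x1c, yc))
c2 = sum(v * w for v, w in zip(x2c, yc))

# With two regressors the VIF is 1 / (1 - r^2), r being their correlation.
r = s12 / math.sqrt(s11 * s22)
vif = 1.0 / (1.0 - r * r)

def ridge(lam):
    """Solve (X'X + lam * I) b = X'y for the two centered regressors."""
    a11, a22 = s11 + lam, s22 + lam
    det = a11 * a22 - s12 * s12
    return ((a22 * c1 - s12 * c2) / det, (a11 * c2 - s12 * c1) / det)

b_ols = ridge(0.0)      # lam = 0 reproduces OLS
b_ridge = ridge(5.0)    # illustrative penalty value

print(f"VIF = {vif:.0f}")
print(f"OLS coefficients:   {b_ols[0]:.2f}, {b_ols[1]:.2f}")
print(f"ridge coefficients: {b_ridge[0]:.2f}, {b_ridge[1]:.2f}")
```

A common rule of thumb treats a VIF above 10 as a sign of serious collinearity. The ridge penalty shrinks the coefficient vector towards zero, trading a small bias for a large reduction in variance; in practice the regressors are usually standardised first and the penalty is chosen by a data-driven criterion.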

It is quite common to observe highly non-normal residual distributions when modelling economic data. One of the main causes is the presence of outlying observations, which can be highly influential on the outcome of the modelling process. One approach, at least for OLS regression, is to identify and remove such observations from the analysis using Cook's D statistic (Cook, 1977). Because outliers are discarded, this approach can sometimes result in too optimistic an assessment of the fit of the model. An alternative approach is to use a robust fitting method that 'accommodates' outliers, such as M-regression (Huber, 1981), which automatically downweights outlying observations when fitting the regression model.
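Both approaches can be sketched for a simple regression with one planted outlier. The example below is an illustration under stated assumptions rather than the procedure of the cited authors: the data are simulated, Cook's D is computed from the standard leverage formula for a single regressor, and the M-estimate uses Huber weights fitted by iteratively reweighted least squares with the conventional tuning constant 1.345 and a scale based on the median absolute deviation. All function and variable names are hypothetical.

```python
import random
import statistics

def wls(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b*x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y)) / sxx
    return ybar - b * xbar, b

# Hypothetical data: true line y = 2 + 0.5 x, with the last point shifted up by 30.
random.seed(3)
n = 30
x = [float(i) for i in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 0.5) for xi in x]
y[-1] += 30.0

# OLS fit (all weights equal to one) and Cook's D for each observation.
a_ols, b_ols = wls(x, y, [1.0] * n)
res = [yi - a_ols - b_ols * xi for xi, yi in zip(x, y)]
s2 = sum(r * r for r in res) / (n - 2)          # residual variance, p = 2
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
leverage = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
cooks_d = [r * r / (2 * s2) * h / (1 - h) ** 2 for r, h in zip(res, leverage)]
most_influential = max(range(n), key=cooks_d.__getitem__)

# Huber M-regression by iteratively reweighted least squares.
c = 1.345
a_hub, b_hub = a_ols, b_ols
for _ in range(50):
    res = [yi - a_hub - b_hub * xi for xi, yi in zip(x, y)]
    med = statistics.median(res)
    scale = statistics.median([abs(r - med) for r in res]) / 0.6745
    # Full weight for small residuals, reduced weight for large ones.
    w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in res]
    a_hub, b_hub = wls(x, y, w)

print(f"most influential observation: {most_influential}")
print(f"OLS slope: {b_ols:.3f}, Huber slope: {b_hub:.3f} (true slope 0.5)")
```

The robust fit keeps every observation in the analysis but limits the pull of the outlying point, so the slope estimate stays close to the value that generated the bulk of the data.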
