Data access and dissemination

In the framework of the FNAC, data access and dissemination represented a very important opportunity to experiment with new techniques and technologies. The Second ... Training courses on hardware covered personal computers, servers, Unix machines, networks and maintenance, while training courses on software dealt with operating systems (MS DOS, Unix, Windows, Windows NT), programming languages, spreadsheets, word processors, databases and statistical software (especially SAS). Many...

Agriculture statistics: a centralized approach

The production and dissemination of national and provincial estimates on the agriculture sector is part of Statistics Canada's mandate. The statistical agency carries out monthly, quarterly, annual and/or seasonal data collection activities related to crop and livestock surveys and farm finances as needed. It also conducts the quinquennial Census of Agriculture in conjunction with the Census of Population (to enhance the availability of socio-economic data for this...

Preconstruction analysis

Before building a new frame, analysis is conducted to determine which states are most in need of one. Generally three to four states are selected to receive a new frame each year. Data collected from approximately 11000 segments during the JAS is used to determine the extent to which the land-use stratification has deteriorated for each state. This involves comparing the coefficients of variation for the survey estimates of major items over the life of the frame. Typically states with the...
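The CV comparison can be sketched in a few lines of Python. This is a minimal illustration, assuming each state's history is a list of (estimate, standard error) pairs per major item, ordered by survey year, and assuming a hypothetical selection rule (rank states by the average increase in CV between the first and the latest year of the frame); the passage is truncated before the actual criterion is stated, and the numbers below are made up.

```python
from statistics import mean


def cv_percent(estimate, standard_error):
    """Coefficient of variation of a survey estimate, in percent."""
    return 100.0 * standard_error / estimate


def cv_increase(history):
    """Change in CV between the first and the latest survey year.

    history: list of (estimate, standard_error) pairs, oldest first,
    for one major item in one state over the life of its frame.
    """
    cvs = [cv_percent(est, se) for est, se in history]
    return cvs[-1] - cvs[0]


def states_needing_new_frame(histories, n_frames=3):
    """Hypothetical rule: pick the states whose CVs grew the most.

    histories: {state: {item: history}}; a state's score is the mean
    CV increase over its major items.
    """
    scores = {
        state: mean(cv_increase(h) for h in items.values())
        for state, items in histories.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:n_frames]


# Illustrative, made-up numbers (not JAS results).
histories = {
    "A": {"corn": [(100.0, 2.0), (100.0, 5.0)]},
    "B": {"corn": [(200.0, 4.0), (200.0, 4.0)]},
    "C": {"soybeans": [(50.0, 1.0), (50.0, 2.0)]},
}
print(states_needing_new_frame(histories, n_frames=2))  # ['A', 'C']
```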

Farm structure surveys

The farm structure survey (FSS) is considered in UNECE countries to be the backbone of the agricultural statistics system. Together with agricultural censuses, FSSs make it possible to undertake policy and economic analysis at a detailed geographical level. This type of analysis at regular time intervals is considered essential. In the EU, several simplifications have been made in recent years. From 2010 the frequency of FSSs will be reduced from every two to every three years. The decennial...

Small and diversified farm operations

For agricultural statistics in general, and for the EU and the NASS survey programmes in particular, the coverage (for the different crops such as acres of corn or the number of cattle represented by the farms on the frame) is a very important issue. In the regulations used for EU statistics, the desired accuracy and coverage are described in detail. Furthermore, countries are requested to provide detailed metadata and quality information. In general, active records eligible for survey samples...

A simulated exercise

The proposed algorithm has been applied to simulated data to test its performance. The first stage of the simulation experiment consisted of generating 100 bivariate observations from a variable (X, Y) on a 10 × 10 regular lattice, with X uniformly distributed, $X \sim U(0,1)$, and Y generated from X with errors distributed as $N(0, \sigma^2)$, considered henceforth as the true error distribution. In more detail, we obtain different observations that are classified in a certain number K of groups, by using different values...
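The first stage of the experiment (data generation on the lattice) can be sketched as follows. This is a minimal illustration only: the passage does not give the model for Y, so the linear form Y = beta0 + beta1 * X + e and the values beta0 = 1, beta1 = 2, sigma = 1 are hypothetical placeholders, and the subsequent classification into K groups is not reproduced.

```python
import random


def simulate_lattice(beta0=1.0, beta1=2.0, sigma=1.0, side=10, seed=0):
    """Generate side * side observations (X, Y) on a regular lattice.

    X ~ U(0, 1); Y = beta0 + beta1 * X + e with e ~ N(0, sigma^2).
    The linear form of Y and the parameter values are hypothetical.
    """
    rng = random.Random(seed)
    data = []
    for row in range(side):
        for col in range(side):
            x = rng.uniform(0.0, 1.0)
            e = rng.gauss(0.0, sigma)  # draw from the "true" error distribution
            data.append({"site": (row, col), "x": x, "y": beta0 + beta1 * x + e})
    return data


obs = simulate_lattice()
print(len(obs))  # 100
```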

ABARE broadacre survey data

The data used in these case studies was obtained from the annual broadacre survey run by the Australian Bureau of Agricultural and Resource Economics (ABARE) from 1977-78 to 2006-07 (ABARE, 2003). The survey covers broadacre agriculture across all of Australia (see Figure 20.1), but excludes small and hobby farms. Broadacre agriculture involves large-scale dryland cereal cropping and grazing activities relying on extensive areas of land. Most of the information outlined in Section 20.3 was...

Acknowledgements

This chapter is based on the in-depth review of agricultural statistics in the UNECE region prepared for the Conference of European Statisticians (CES). In its third meeting of 2007/2008, the CES Bureau decided on an in-depth review of this topic. It was requested that the review take into account recent developments such as the increase in food prices and the impact of climate change, and incorporate the final conclusions and recommendations reached at the fourth International Conference of...

Administrative data versus sample surveys

A statistical system based on administrative data allows money to be saved and the response burden to be reduced. It also has advantages that are typical of complete enumeration, such as producing figures for very detailed domains (not only geographical) and estimating transitions over time. In fact, statistical units in a panel sample tend to abandon the survey after some time and comparisons over time become difficult, whilst units are obliged to deliver administrative data, or at least...

Agricultural Survey Methods

'G. d'Annunzio' University, Chieti-Pescara, Italy; National Institute of Statistics (ISTAT), Rome, Italy. A John Wiley and Sons, Ltd., Publication. This edition first published 2010. © 2010 John Wiley & Sons Ltd. John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom. For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our...

Algorithms for automatic localization of random errors

An overview of algorithms for automatically solving the localization problem for random errors in numerical data, based on the Fellegi-Holt paradigm, is given in De Waal and Coutinho (2005). In this section we briefly discuss a number of these algorithms. Fellegi and Holt (1976) describe a method for solving the error localization problem automatically. We sketch their method here; for details we refer to the original article by Fellegi and Holt (1976). The method is based on generating...

An overview of automatic editing

When automatic editing is applied, records are edited by computer without human intervention. Automatic editing is the opposite of the traditional interactive approach to the editing problem, where each record is edited manually. We can distinguish two kinds of errors: systematic ones and random ones. A systematic error is an error reported consistently by (some of) the respondents. It can be caused by the consistent misunderstanding of a question by (some of) the respondents. Examples are when...

Analysis

The analysis system is perhaps the module of interest to the broadest audience in the NASS. This module will provide the tools and functionality through which analysts in Headquarters and our SSOs will interact with the data. All processes prior to this point are ones with no manual intervention or, in the case of data capture, one in which only a few will touch the data. As one of our senior executives aptly put it: 'All this other stuff - data capture, edit and imputation - will happen while...

Analysis of the 2006 AGRIT data

The aim of the AGRIT survey is to produce area estimates of the main crops, as given in Table 13.1, at a national, regional and provincial level. Direct estimates of the area $LCT_c$ covered by crop type $c$ ($c = 1, \dots, C$), estimated standard errors (SE) and coefficients of variation (CV) were obtained as described in Section 13.2 for the 103 Italian provinces. The auxiliary variable is the number of pixels classified as crop type $c$ according to the satellite data in the small area $d$, $d = 1, \dots, D$, with D...

Area-level models

We now turn to model-based methods based on small-area linking models involving random small-area effects. Such models may be broadly classified into two types: (a) area-level models, which we study in this section; (b) unit-level models, which we study in the next. A basic area-level model that uses area-level covariates has two components. First, the direct survey estimate $\hat{\bar{Y}}_i$ of the $i$th area mean $\bar{Y}_i$, possibly transformed as $\hat{\theta}_i = g(\hat{\bar{Y}}_i)$, is equal to the sum of the population value $\theta_i = g(\bar{Y}_i)$ and...
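To show how such a model is typically used, here is a minimal sketch of the standard Fay-Herriot shrinkage (BLUP) estimator. It assumes the usual second component, the linking model $\theta_i = \beta_0 + \beta_1 z_i + v_i$ with $v_i \sim N(0, \sigma_v^2)$, a single area-level covariate $z_i$, known sampling variances $\psi_i$, and $\sigma_v^2$ treated as known (in practice it is estimated, which gives the EBLUP). The estimator is $\tilde{\theta}_i = \gamma_i \hat{\theta}_i + (1 - \gamma_i)(\hat{\beta}_0 + \hat{\beta}_1 z_i)$ with $\gamma_i = \sigma_v^2 / (\sigma_v^2 + \psi_i)$, and $\hat{\beta}$ obtained by weighted least squares with weights $1/(\sigma_v^2 + \psi_i)$. The function name is hypothetical.

```python
def fay_herriot_blup(theta_hat, z, psi, sigma2_v):
    """Shrink direct estimates towards a regression-synthetic estimate.

    theta_hat: direct (possibly transformed) area estimates
    z:         one area-level covariate per area
    psi:       known sampling variances of the direct estimates
    sigma2_v:  variance of the random area effects (assumed known here)
    """
    w = [1.0 / (sigma2_v + p) for p in psi]          # WLS weights
    sw = sum(w)
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    tbar = sum(wi * ti for wi, ti in zip(w, theta_hat)) / sw
    sxx = sum(wi * (zi - zbar) ** 2 for wi, zi in zip(w, z))
    sxy = sum(wi * (zi - zbar) * (ti - tbar) for wi, zi, ti in zip(w, z, theta_hat))
    b1 = sxy / sxx
    b0 = tbar - b1 * zbar
    out = []
    for ti, zi, p in zip(theta_hat, z, psi):
        gamma = sigma2_v / (sigma2_v + p)            # weight on the direct estimate
        synthetic = b0 + b1 * zi
        out.append(gamma * ti + (1.0 - gamma) * synthetic)
    return out


est = fay_herriot_blup([1.0, 3.0, 2.0, 4.0], [0.0, 1.0, 2.0, 3.0], [1.0] * 4, 1.0)
# approximately [1.15, 2.55, 2.45, 3.85]
```

Areas with large sampling variance $\psi_i$ get a small $\gamma_i$ and are pulled towards the synthetic estimate; areas with precise direct estimates keep most of their own value.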

Assuring quality

Measuring quality, however difficult it might be in specific cases, is only a lower level of quality ambition. Setting and meeting quality goals are, both from the user's and the producer's perspective, a higher ambition. The term 'quality assurance' is often used for the process of ensuring that quality goals are consistently met.

Quality assurance as an agency undertaking

Traditionally, we tend to think of quality as a property of a single variable. We measure this quality and try to...

Author Index

L. 372; Abuelhaj, T. 351; Adger, N. W. 341, 342; Adhikary, A. K. 138; Aggarwal, R. 111; Ahmad, Q. K. 346; Alinovi, L. 342; Allen, J. D. 195, 199, 213; Anderson, D. W. 111; Anderson, J. R. 324; Annoni, A. 158; Arino, O. 154; Arminger, G. 349; Arthur, W. B. 344; Atkinson, D. 68; Bamps, C. 155; Bankier, M. 122, 241, 254; Barnard, J. 218; Bartholome, E. 154; Bartholomew, D. J. 349; Bartlett, M. S. 352; Baruth, B. 376; Bassett, G. W. 330; Battese, G. E. 144, 145, 146, 199, 330; Berthelot, J. M. 247, 249, 250,...

Automatic editing of systematic errors

In this section we discuss several classes of systematic errors and ways to detect and correct them. As already mentioned, a well-known class of systematic errors is the so-called thousand errors. These are cases where a respondent replies in units rather than in the requested thousands of units. The usual way to detect such errors is - similar to selective editing - by considering 'anticipated' values, which could, for instance, be values of the same variable from a previous period or values...
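The ratio check against an anticipated value can be sketched as follows. This is a minimal illustration; the acceptance window (a ratio strictly between 300 and 3000) and the function name are assumptions chosen for the example, not values stated in this passage.

```python
def correct_thousand_error(reported, anticipated, lower=300.0, upper=3000.0):
    """Detect and correct a suspected 'thousand error'.

    reported:    value supplied by the respondent
    anticipated: e.g. last period's value for the same unit
    lower/upper: assumed bounds on reported / anticipated that flag
                 an answer given in units instead of thousands
    Returns (possibly corrected value, flagged).
    """
    if reported is None or anticipated is None or anticipated <= 0:
        return reported, False          # no usable anticipated value
    ratio = reported / anticipated
    if lower < ratio < upper:
        return reported / 1000.0, True  # rescale to thousands
    return reported, False


print(correct_thousand_error(250_000, 240))  # (250.0, True)
print(correct_thousand_error(260, 240))      # (260, False)
```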

Balanced sampling

The fundamental property of a balanced sample is that the HT estimators of the totals of a set of auxiliary variables are equal to the known population totals of those variables. This idea dates back to the pioneering work by Neyman (1934) and Yates (1946). More recently, balanced sampling designs have been advocated by Royall and Herson (1973) and Scott et al. (1978), who pointed out that such sampling designs ensure the robustness of the estimators of totals, where 'robustness' essentially means 'protection...
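The balancing property can be illustrated with a crude rejective procedure: draw simple random samples until the HT estimate of the auxiliary total falls within a tolerance of the known total. This is a sketch under stated assumptions, not the method of the text: rejection changes the actual inclusion probabilities, and in practice balanced samples are usually drawn with the cube method of Deville and Tillé. The function names, tolerance and auxiliary variable are hypothetical.

```python
import random


def ht_total(values, inclusion_probs):
    """Horvitz-Thompson estimator of a total."""
    return sum(v / p for v, p in zip(values, inclusion_probs))


def rejective_balanced_srs(x, n, tol=0.01, max_tries=10_000, seed=1):
    """Draw SRS samples until the HT estimate of sum(x) is within a
    relative tolerance tol of the known total; return sorted unit indices.

    Note: rejection alters the actual inclusion probabilities.
    """
    N = len(x)
    t_x = sum(x)                 # known auxiliary total
    pi = [n / N] * n             # nominal SRS inclusion probabilities
    rng = random.Random(seed)
    for _ in range(max_tries):
        s = rng.sample(range(N), n)
        if abs(ht_total([x[k] for k in s], pi) - t_x) <= tol * abs(t_x):
            return sorted(s)
    raise RuntimeError("no balanced sample found; relax tol or use the cube method")


farm_area = list(range(1, 101))  # hypothetical auxiliary variable
sample = rejective_balanced_srs(farm_area, n=10)
```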

Calibration and regression estimators

Calibration and regression estimators combine more accurate and objective observations on a sample (e.g. ground observations) with the exhaustive knowledge of a less accurate or less objective source of information, or co-variable (classified images). A characteristic property of these...

Table 12.2 Unweighted and weighted (unbiased) confusion matrix in LUCAS photo-interpretation for stratification (columns: Photo-interpretation, Arable, Permanent crops, Permanent grass, Forest & wood, Other, Total).
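A minimal sketch of the simple regression estimator follows, assuming simple random sampling of segments, with y the crop share observed on the ground in each sampled segment and x the share classified as that crop in the image; the population mean of x is known because the classification covers every segment. Names and data are hypothetical.

```python
from statistics import mean


def regression_estimator(y_sample, x_sample, x_pop_mean):
    """Regression estimator of the population mean of y under SRS.

    y_sample:   ground observations (accurate, sample only)
    x_sample:   classified-image values for the same sample units
    x_pop_mean: mean of the classified-image values over all units
    """
    xbar, ybar = mean(x_sample), mean(y_sample)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x_sample, y_sample))
    sxx = sum((xi - xbar) ** 2 for xi in x_sample)
    b = sxy / sxx                                  # estimated slope
    return ybar + b * (x_pop_mean - xbar)          # correct ybar for sample imbalance in x


est = regression_estimator([0.26, 0.42, 0.58, 0.74], [0.2, 0.4, 0.6, 0.8], 0.45)
# approximately 0.46
```

When the sample mean of x happens to equal its population mean, the estimator reduces to the plain sample mean of y; otherwise the slope corrects for the difference.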

Calibration weighting

One of the most relevant problems encountered in large-scale business (e.g. agricultural) surveys is finding estimators that are (i) efficient and (ii) derived in accordance with criteria of internal and external consistency (see below). The methodology presented in this section satisfies these requirements. The class of calibration estimators (Deville and Särndal, 1992) is an instance of a very general approach to the use of auxiliary information in the estimation procedures in finite-...
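As an illustration of the calibration idea, the sketch below computes linear (chi-square distance) calibration weights for a single auxiliary variable with known total $t_x$: $w_k = d_k(1 + \lambda x_k)$ with $\lambda = (t_x - \sum d_k x_k)/\sum d_k x_k^2$, so that the calibrated weights reproduce $t_x$ exactly. This is a minimal sketch assuming one auxiliary variable and unit scale factors $q_k = 1$; the function name is hypothetical.

```python
def linear_calibration_weights(d, x, t_x):
    """Linear calibration with one auxiliary variable.

    d:   design weights
    x:   auxiliary values for the sample units
    t_x: known population total of x
    Returns w_k = d_k * (1 + lam * x_k) with sum(w_k * x_k) == t_x.
    """
    sum_dx = sum(dk * xk for dk, xk in zip(d, x))
    sum_dx2 = sum(dk * xk * xk for dk, xk in zip(d, x))
    lam = (t_x - sum_dx) / sum_dx2
    return [dk * (1.0 + lam * xk) for dk, xk in zip(d, x)]


w = linear_calibration_weights([10.0, 10.0, 10.0], [1.0, 2.0, 3.0], 70.0)
```

Here the design-weighted total of x is 60, so the weights are stretched (more for units with larger x) until the weighted total equals the known value 70.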

Case study: the province of Foggia

Forecasting agricultural production is of critical importance for policy-makers as well as for market stakeholders. In the context of globalization, it is of great value to obtain as quickly as possible accurate overall estimates of crop production on an international scale. For example, timely and accurate estimates of durum wheat yield are an important management instrument for the European Commission's Directorate-General for Agriculture and Rural Development (Baruth et al., 2008). The main objective...

Changing concepts of quality

The most significant event is that the concepts of quality are being driven by customers, that is, externally, rather than determined internally by the statistical organization. The availability of official statistics on the Internet is increasing the audience of data users. In previous times, the data users were fewer in number and usually very familiar, from long experience, with the data they were being provided. The new audience of data users is becoming increasingly sophisticated and also more...

Combined use of different frames

When various incomplete registers are available and information included in their records cannot be directly used for producing statistics, a sample survey has to be designed. Administrative data are most often used to create a single sampling frame, even when it is built from two or more lists. This approach should be used only if the different lists contribute essential information to complete the frame and the record matching gives extremely reliable results; otherwise, the frame will be...

Combining ex ante and ex post auxiliary information: a simulated approach

In this section we measure the efficiency of the estimates produced when the sampling design uses only the ex ante, only the ex post or a mixture of the two types of auxiliary information. Thus, we performed some simulation experiments whose ultimate aim is to test the efficiency of some of the selection criteria discussed above. After giving a brief description of the archive at hand, we detail the Istat survey based on this archive. Then we review the technical aspects of the sample selection...

Conclusions

One of the key issues in edit development is determining what edits are essential to ensure the integrity of the data without over-editing. This is something that the edit group and Processing Sub-Team have struggled with. The team members represent an interesting blend of cultures. The longer-term, pre-census NASS staff developed within a culture of processing the returns from its sample surveys, where every questionnaire is hand-reviewed and corrected as necessary. While there is a need for...

Contents

1 The present state of agricultural statistics in developed countries: situation ...
1.2 Current state and political and methodological context
1.2.2 Specific agricultural statistics in the UNECE region
1.3 Governance and horizontal issues
1.3.1 The governance of agricultural statistics
1.3.2 Horizontal issues in the methodology of agricultural statistics
1.4 Development in the demand for agricultural statistics
1.5 Conclusions
Acknowledgements
Reference
Part I Census,...

Coverage Evaluation Study

The snapshot records that were neither flagged as must-gets nor found among the census records composed the survey frame for a Coverage Evaluation Study (CES). A sample of these farms was selected, prepared, collected, matched and searched much as in the FCFU, but at a later time. The farms that were confirmed as active and not counted in the census were weighted...

Table 5.1 Census of Agriculture coverage statistics, 2001 and 2006

Creating a farm register: the population

When a statistical register is created, all relevant sources should be used so that the coverage will be as good as possible. When microdata from different sources are integrated many quality issues become clear that otherwise would have escaped notice. However, a common way of working is to use only one administrative source at a time. Figure 2.5 shows the consequences of this. The Swedish Business Register has been based on only one source, the Business Register of the Swedish Tax Board....

Data and modelling issues

Farm economic surveys may fail to capture key output or input variables that are essential for economic analysis. If a database does not include the required economic information, then it is of little use for economic analysis. Most econometric modelling assumes that parameters are homogeneous across farms. A number of statistical models have been developed to relax this assumption. One of the most widely used is the mixed regression model (Goldstein, 1995; Laird and Ware,...

Data by StrCon and sector aggregated over areas

Once the StrCon has been defined for each area unit, the basic information aggregated over areas on the numbers of units classified by sector and StrCon can be represented as in Table 6.1. By definition, there is a one-to-one correspondence between the sectors and the StrCon. Subscript i (rows 1 to I) refers to the sector, and j (columns 1 to I) to StrCon. Generally, any sector is distributed over various StrCon, and any StrCon contains establishments from various sectors; in fact it contains all...

Description of the basic design

Let us now consider some basic features of an area-based multi-stage sampling design for an integrated survey covering small-scale economic units of different types. The population of units comprises a number of sectors, such as different types of establishments. We assume that, on the basis of some criterion, each establishment can be assigned to one particular sector. Sample size requirements in terms of number of establishments $n_{\cdot i}$ have been specified for each sector $i$. The available...

Development status

Timelines have been developed for specification and development of the various modules, and the groups are working hard to stick to them. Due to a number of factors beyond their control, the developmental work started at least a year later than it should have, considering the magnitude of the system overhaul. In spite of the delays and overall staff shortages compared with past censuses, the groups have done a fantastic job of moving ahead with the developmental work.

Direct tabulation of administrative data

Two interesting studies (Selander et al., 1998; Wallgren and Wallgren, 1999), financed jointly by Statistics Sweden and Eurostat, explored the possibility of producing statistics on crops and livestock through the Integrated Administrative and Control System (IACS, created for the European agricultural subsidies) and other administrative data. After a comparison of the IACS data with an updated list of farms, the first study came to the following conclusion: 'The IACS register is generally not...

Disadvantages of direct tabulation of administrative data

When administrative data are used for statistical purposes, the first problem to be faced is that the information acquired is not exactly that which is needed, since questionnaires are designed for specific administrative purposes. Statistical and administrative purposes require different kinds of data to be collected and different acquisition methods (which strongly influence the quality of data). Close interaction between statisticians and administrative departments is essential, although it...

Durum wheat yield forecast

In this subsection a regression model is presented and used for the spatial prediction of the yield of durum wheat in the province of Foggia. The idea is very simple: the estimated regression equation obtained with a sample of observations is used for predicting the value of y for given values of x. In other words, the geographical information (i.e. covariates) is available for each point of the spatial domain under investigation, and by using this information and the estimated model, a spatial...
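The fit-then-predict idea can be sketched in plain Python: estimate the regression on the sampled points, then apply the fitted equation to the covariates available at every cell of the spatial grid. This is a minimal illustration assuming ordinary least squares with an intercept; the covariates (think of a vegetation index and rainfall) and all data values are hypothetical, and the model actually used for Foggia may differ.

```python
def ols_fit(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y ~ 1 + X.

    X: list of covariate tuples (one per sampled point); y: observed yields.
    Solves the normal equations by Gaussian elimination with partial pivoting.
    """
    rows = [[1.0, *r] for r in X]          # prepend the intercept column
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= f * A[col][k]
            c[r] -= f * c[col]
    b = [0.0] * p
    for i in reversed(range(p)):           # back substitution
        b[i] = (c[i] - sum(A[i][k] * b[k] for k in range(i + 1, p))) / A[i][i]
    return b


def predict_grid(b, grid):
    """Predicted yield for every grid cell: {cell: covariates} -> {cell: yhat}."""
    return {cell: b[0] + sum(bk * xk for bk, xk in zip(b[1:], x))
            for cell, x in grid.items()}


# Hypothetical sample: two covariates per point, yields on an exact plane.
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 2.0)]
y = [2.5, 1.0, 1.5, 2.0, 1.5]
b = ols_fit(X, y)                          # approximately [2.0, 0.5, -1.0]
pred = predict_grid(b, {(0, 0): (4.0, 1.0)})
```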

Economic and econometric specification

It is not uncommon for economists to depart from the proper economic and econometric specifications of the model because the observed variables are not precisely as required. For example, in an economic analysis of technology choice and efficiency on Australian dairy farms (Kompas and Che, 2006) a variable representing feed concentration was not available. Average grain feed (in kilograms per cow) was therefore used as a proxy for this variable in the inefficiency model. As a consequence, the...

Empirical strategy: the Palestinian data set

The Palestinian Public Perception Survey (PPPS) is an inter-agency effort aimed at building understanding of the socio-economic conditions in the West Bank and Gaza Strip. The University of Geneva implemented the 11th PPPS in 2007, with the collaboration of several agencies, including the FAO for the food security component. Responsibility for the data collection rests with the Palestinian Central Bureau of Statistics. The PPPS provides a very rich data set, including key indicators relevant...

Errors in administrative registers

A pillar of sampling theory is that, when a sample survey is carried out, much care can be devoted to the collection procedure and to the data quality control, since a relatively small amount of data is collected; thus, non-sampling errors can be limited. At the same time, sampling errors can be reduced by adopting efficient sample designs. The result is that very accurate estimates can often be produced with a relatively small amount of data. The approach of administrative registers is the...

Estimates for small domains and areas

An issue already touched on earlier in this chapter is the increasing demand for data for small domains. In agriculture, these small domains could be geographical areas or unique commodities. Legislators are more frequently seeking data at lower levels of aggregation. For survey-based estimates to be reliable at these levels, sample sizes would have to increase beyond what the organization can afford. The NASS's approach has been to augment probability-based survey estimates with...

Estimation of a total

In a multiple-frame survey, probability samples are drawn independently from the frames $A_1, \dots, A_Q$, $Q \geq 2$. The union of the $Q$ frames is assumed to cover the finite population of interest, $U$. The frames may overlap, resulting in a possible $2^Q - 1$ non-overlapping domains. When $Q = 2$, the survey is called a dual-frame survey. For simplicity, let us consider the case of two frames (A and B), both incomplete and with some duplication, which together cover the whole population. The frames A and B...
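For two frames, the population splits into domain a (only in A), domain b (only in B) and the overlap ab. A classic way to combine the samples is Hartley's estimator, $\hat{Y} = \hat{Y}_a + \theta \hat{Y}_{ab}^{A} + (1 - \theta)\hat{Y}_{ab}^{B} + \hat{Y}_b$, where the overlap total is estimated from both samples and mixed with a weight $\theta$. The sketch below assumes design-weighted (HT) domain totals and a fixed $\theta = 0.5$ for illustration; Hartley's optimal $\theta$ minimizes the variance and is not computed here. Names and data are hypothetical.

```python
def hartley_total(sample_a, sample_b, theta=0.5):
    """Hartley-type dual-frame estimator of a population total.

    sample_a: (y, weight, also_in_b) for units drawn from frame A
    sample_b: (y, weight, also_in_a) for units drawn from frame B
    theta:    share given to frame A's estimate of the overlap domain ab
    """
    t_a = sum(w * y for y, w, in_b in sample_a if not in_b)    # domain a (A only)
    t_ab_a = sum(w * y for y, w, in_b in sample_a if in_b)     # overlap, from A
    t_ab_b = sum(w * y for y, w, in_a in sample_b if in_a)     # overlap, from B
    t_b = sum(w * y for y, w, in_a in sample_b if not in_a)    # domain b (B only)
    return t_a + theta * t_ab_a + (1.0 - theta) * t_ab_b + t_b


sample_a = [(10.0, 2.0, False), (4.0, 2.0, True)]
sample_b = [(6.0, 3.0, True), (5.0, 1.0, False)]
print(hartley_total(sample_a, sample_b))  # 38.0
```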

Examples of crop area estimation with remote sensing in large regions

The early Large Area Crop Inventory Experiment (LACIE; Heydorn, 1984) analysed samples of segments (5 × 6 nautical miles), defined as pieces of Landsat MSS images, and focused mainly on the sampling errors. It soon became clear (Sielken and Gbur, 1984) that pixel counting entailed a considerable risk of bias linked to the errors of commission and omission. Remote sensing was still too expensive in the 1980s (Allen, 1990), but the situation changed in the 1990s with the reduction of cost, both...
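Why pixel counting is biased can be shown with a two-class calculation. For simplicity, the sketch assumes omission is the probability that a crop pixel is labelled as something else, and commission is the probability that a non-crop pixel is labelled as the crop (a simplified definition; commission error is often expressed relative to the pixels classified as the crop). The expected pixel-count share is then $p(1 - o) + (1 - p)c$, which differs from the true share $p$ unless the two errors happen to offset. The function names and rates are illustrative.

```python
def pixel_count_share(p, omission, commission):
    """Expected share of pixels labelled as the crop.

    p:          true share of the crop in the region
    omission:   P(labelled other | truly crop)
    commission: P(labelled crop | truly other)
    """
    return p * (1.0 - omission) + (1.0 - p) * commission


def pixel_count_bias(p, omission, commission):
    """Bias of the pixel-counting estimate of the crop share."""
    return pixel_count_share(p, omission, commission) - p


print(round(pixel_count_share(0.2, 0.3, 0.1), 2))  # 0.22
```

The bias is zero only when $(1 - p)c = p\,o$; for a minor crop (small $p$) even a modest commission rate inflates the estimate, which is why calibration or regression with ground data is needed.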

Examples of crop yield estimation/forecasting with remote sensing

The USDA publishes monthly crop production figures for the United States and the world. At NASS, remote sensing information is used for qualitative monitoring of the state of crops but not for quantitative official estimates. NASS research experience has shown that AVHRR-based crop yield estimates in the USA are less precise than existing surveys. Current research is centred on the use of MODIS data within biophysical models for yield simulation. The Foreign Agricultural Service (FAS)...

Expected accuracy of area estimates with the LUCAS 2006 scheme

Before launching any survey, some idea is needed of the accuracy that can be achieved. The accuracy reached for the estimated area of land cover $c$ mainly depends on the size $D$ of the region and the proportion $p$ of $c$. The results for each country have allowed simple linear regressions without intercept to be fitted for agricultural classes, $S_{\mathrm{LUCAS}} = 0.743\, s_{\mathrm{ran}} + \varepsilon$ ($r^2 = 0.989$) and $S_{\mathrm{LUCAS}} = 1.182\, s_{\mathrm{ran}} + \varepsilon$ ($r^2 = 0.967$), where $s_{\mathrm{ran}}(p) = D\sqrt{p(1-p)/(n-1)}$ is the standard error that would have been obtained with simple...
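The expected-accuracy calculation is simple arithmetic: compute the standard error of the area estimate under simple random sampling, $s_{\mathrm{ran}} = D\sqrt{p(1-p)/(n-1)}$, then scale it by the fitted no-intercept slope (0.743 or 1.182 in the passage) to obtain the expected LUCAS standard error. This sketch assumes that reading of the formula (reconstructed from a garbled source); the region size, proportion and sample size below are hypothetical.

```python
from math import sqrt


def s_ran(D, p, n):
    """Standard error of the area estimate D * p_hat under simple random
    sampling of n points in a region of size D (reconstructed formula)."""
    return D * sqrt(p * (1.0 - p) / (n - 1))


def expected_s_lucas(D, p, n, slope=0.743):
    """Expected LUCAS standard error from the fit S_LUCAS = slope * s_ran."""
    return slope * s_ran(D, p, n)


# Hypothetical region of 10 000 km2, land cover share 20%, 1601 sample points.
print(round(expected_s_lucas(D=10_000, p=0.2, n=1601), 1))  # 74.3
```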

Farm Accounts Data Network

The Farm Accounts Data Network (FADN) is a specific EU instrument, developed and managed by the Directorate-General for Agriculture. The FADN is an important source for micro-economic data relating to commercial holdings. For purposes of aggregation, the FADN sample results are linked to population results derived from the FSS using groupings based on the community typology. The creation of unique identifiers in the context of the agricultural register would enhance this linkage and, if privacy...

From concept to measurement: the resilience framework

Figure 21.1 summarizes the rationale for attempting to measure resilience to food insecurity. Consistent with Dercon's (2001) framework, it is assumed that the resilience of a given household at a given point in time, T0, depends primarily on the options available to that household to make a living, such as its access to assets, income-generating activities, public services and social safety nets. These options represent a precondition for the household response mechanisms in the face of a...

General characteristics of SDA

SDA is a set of computer programs for the documentation and web-based analysis of survey data. There are also procedures for creating and downloading customized subsets of data sets. The software is maintained by the CSM. The current version of SDA is release 3.2; at the time of our experiment, version 1.2 was available. All the following information relates to version 1.2. Data analysis programs were designed to be run from a web browser. SDA provides the results of the analysis very quickly...

How does it work?

The intent of the data warehouse was to provide statisticians with direct access to current and historical data and with the capability to build their own queries or applications. Traditional transactional databases are designed using complex data models that can be difficult for anyone but power users to understand, thus requiring programmer assistance and discouraging ad hoc analysis. The database design results in many database tables (often over 100 tables), which result in many table forms...

Imputation of the missing auxiliary variables: an overview of the missing data problem

Let us denote by $\mathbf{Y}$ ($D \times C$) the matrix of estimates of the areas covered by crop types and by $\mathbf{Z}$ ($D \times C$) the matrix containing the number of pixels classified by crop type according to the satellite data in each small area. $\mathbf{Y}$ is considered fully observed, while satellite images often have missing data. Outliers and missing data in satellite information are mainly due to cloudy weather that does not allow the identification or correct recognition of what is being grown from the acquired digital...

Integrated economic and environmental accounting

At the aggregated level, sound indicators that give a good insight into the mechanism of agricultural society in relation to the economy and environment are needed. The integration of agricultural statistics with other statistics is a process that is tackled especially from the viewpoint of integrated economic and environmental accounting. The UNECE region is actively participating in preparations for the revision of the System of National Accounts (dating from 2008) where the relevance of...

Integrating agricultural and environmental information with LUCAS

Land cover and land use are of increasing importance in policy design and evaluation; they constitute a key element in particular for climate change studies (Feddema et al., 2005). Environmental, agricultural and regional transport policies increasingly demand two types of land cover data: maps and statistics. A large number of land cover maps have been produced with satellite images; examples at global level are Global Land Cover 2000 (known as GLC2000), based on SPOT VEGETATION images...

Introduction

Agricultural statistics in the UN Economic Commission for Europe (UNECE) region are well advanced. Information collected by farm structure surveys on the characteristics of farms, farmers' households and holdings is for example combined with a variety of information on the production of animal products, crops, etc. Agricultural accounts are produced on a regular basis and a large variety of indicators on agri-economic issues is available. At the level of the European Union as a whole the...

Issues in statistical analysis of farm survey data: multipurpose sample weighting

Since the sample of farms that contribute to a farm survey is typically a very small proportion of the farms that make up the agricultural sector of an economy, it is necessary to 'scale up' the sample data in order to properly represent the total activity of the sector. This scaling up is usually carried out by attaching a weight to each sample farm so that the weighted sum of the sample values of a survey variable is a 'good' estimate of the sector-based sum for the same variable. Methods for...
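The basic 'scaling up' step can be sketched as follows, assuming a stratified design with equal selection probabilities within each stratum, so that each sampled farm in stratum h receives the expansion weight $N_h / n_h$. Multipurpose weighting methods go further (for example by calibrating to several known totals at once); the strata, counts and values here are hypothetical.

```python
from collections import Counter


def expansion_weights(sample_strata, population_counts):
    """Design weight N_h / n_h for each sampled farm, by stratum."""
    n_h = Counter(sample_strata)
    return [population_counts[h] / n_h[h] for h in sample_strata]


def weighted_total(values, weights):
    """Weighted sum of sample values: the scaled-up sector estimate."""
    return sum(v * w for v, w in zip(values, weights))


strata = ["small", "small", "large"]
w = expansion_weights(strata, {"small": 100, "large": 50})
print(w)                                    # [50.0, 50.0, 50.0]
print(weighted_total([2.0, 4.0, 10.0], w))  # 800.0
```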

Land-use stratification

The process of land-use stratification is the delineation of land areas into land-use categories (strata). The purpose of stratification is to reduce the sampling variability by creating homogeneous groups of sampling units. Although certain parts of the process are highly subjective in nature, precision work is required of the personnel stratifying the land (called stratifiers) to ensure that overlaps and omissions of land area do not occur and land is correctly stratified. The stratification...

LUCAS 2001-2003: target region, sample design and results

Table 10.1 Some area estimates of LUCAS 2003 for EU15.

... are based on comparing each sample element with other sample elements geographically close to it. Wolter (1984) compares several estimators of this type for the one-dimensional case, some of which had been proposed by Yates (1949), Osborne (1942) and Cochran (1946). Matérn (1986) proposes similar estimators for the two-dimensional case. The usual variance estimator of the mean for...
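One estimator of this type for the one-dimensional case is the successive-difference estimator for the mean of a systematic sample, $v = \frac{1-f}{n}\,\frac{1}{2(n-1)}\sum_{i=2}^{n}(y_i - y_{i-1})^2$, which compares each element only with its neighbour along the line of selection. The sketch below assumes the sample values are already ordered spatially; it illustrates the idea and is not the specific estimator used for LUCAS.

```python
def successive_difference_variance(y, f=0.0):
    """Successive-difference variance estimate for the mean of a
    systematic sample; y is ordered along the line of selection and
    f is the sampling fraction."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two sample elements")
    ssd = sum((y[i] - y[i - 1]) ** 2 for i in range(1, n))
    return (1.0 - f) / n * ssd / (2 * (n - 1))


print(successive_difference_variance([1, 2, 3, 4]))  # 0.125
```

For this sample with a linear trend, the simple random sampling formula $s^2/n$ gives about 0.417, whereas the successive-difference estimate is 0.125, because neighbouring differences are not inflated by the smooth spatial trend.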

LUCAS 2006: a two-phase sampling plan of unclustered points

After some encouraging tests by the Joint Research Centre (JRC) in collaboration with the Greek Ministry of Agriculture in 2004, and the previous experience of the Italian AGRIT program (Martino, 2003), Eurostat decided to change the sampling scheme. The new scheme used a common map projection, the Lambert azimuthal equal-area projection recommended by the Infrastructure for Spatial Information in Europe (INSPIRE) initiative (Annoni et al., 2001). This decision improved the homogeneity of the sample layout, but...

Managing accuracy

Processes described previously under 'relevance' determine which programmes are going to be carried out, their broad objectives, and the resource parameters within which they must operate. Within those 'programme parameters', the management of accuracy requires particular attention during three key stages of a survey process: survey design, survey implementation, and assessment of survey accuracy. These stages typically take place in a project management environment, outlined in Section 17.4,...

Managing interpretability

Statistical information that users cannot understand - or can easily misunderstand - has no value and may be a liability. Providing sufficient information to allow users to properly interpret statistical information is therefore a responsibility of the Agency. 'Information about information' has come to be known as meta-information or metadata. Metadata are at the heart of the management of the interpretability indicator, by informing users of the features that affect the quality of all data...

Managing timeliness

The desired timeliness of information derives from considerations of relevance: for what period does the information remain useful for its main purposes? The answer to this question varies with the rate of change of the phenomena being measured, with the frequency of measurement, and with how quickly users must respond using the latest data. Specific types of agriculture data require different levels of timeliness. Data on crop area, stocks and production, for example, must be available soon...

Meat, livestock and egg statistics

These traditional animal and poultry product statistics - resulting from traditional regular livestock surveys as well as meat, milk and eggs statistics - still play a key role in the design, implementation and monitoring of the EU Common Agricultural Policy and also contribute to ensuring food and feed safety in the EU. European statistics on animals and animal products are regulated by specific EU legislation. Member states are obliged to send monthly, annual and multi-annual data to the...

Methodological approaches

The framework described above can be estimated through multivariate analysis models. Equation (21.1) is a hierarchical model in which some variables are dependent in one part of the model and independent in another. Unobservable (i.e. latent) variables also have to be dealt with. Figure 21.2 shows the path diagram of the model concerned. In the causal models literature (Spirtes et al. 2000), circles represent latent variables and boxes represent observed variables. Most of the hierarchical or...

Milk statistics

Milk statistics relate to milk produced by cows, ewes, goats and buffaloes. For the EU they cover milk collected by dairies (monthly and annually) at national and regional level, milk produced on agricultural holdings (farms), the protein content and the supply balance sheets. Triennial statistics provide information on the structure of the dairies. Data collection and pre-validation are carried out through, for example, the Web Forms system, which ensures the management...

More flexible models: an empirical approach

As is clear from the above illustrations, depending on the numbers and distribution of units of different types and on the extent to which the required sampling rates by sector differ, a basic model like (6.8) may be too inflexible, and may result in large variations in design weights and hence in large losses in efficiency of the design. It may even prove impossible to satisfy the sample allocation requirements in a reasonable way. [Figure 6.1: Design effect...]

Non-sampling errors in LUCAS 2006

Non-sampling errors are generally more difficult to assess than sampling errors (Lesser and Kalsbeek, 1999). In this section we study the possible order of magnitude of the main sources of non-sampling errors. The most important source of non-sampling error in an area frame survey is identification mistakes by enumerators. These can happen as a result of (b) incorrect identification because of inadequate enumerator training, mainly for minor crops; (c) misinterpretation of rules to label land...

Numerical illustrations and more flexible models

6.6.1 Numerical illustrations

Table 6.2 shows three simulated populations of establishments. The distribution of the establishments by sector is identical in the three populations - varying linearly from... [Table 6.2: Number of establishments, by economic sector and 'stratum of concentration' (three simulated populations).]

Probability proportional to size sampling

Consider a set-up where the study variable y and a positive auxiliary variable x are strongly correlated. Intuitively, in such a framework it should be convenient to select the elements to be included in the sample with probability proportional to x. Probability proportional to size (PPS) sampling designs can be applied in two different set-ups: fixed-size designs without replacement (πps) and fixed-size designs with replacement (pps). Only πps will be considered here; an excellent reference...
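A minimal Python sketch of one fixed-size πps design, systematic πps selection, together with the Horvitz-Thompson estimator of a total. The function names are hypothetical, and systematic selection is only one of several πps schemes; it is not necessarily the design used in the chapter. The sketch assumes that every inclusion probability pi_k = n * x_k / sum(x) is at most 1 (larger units would have to be taken with certainty first).

```python
import random


def inclusion_probabilities(x, n):
    """First-order inclusion probabilities pi_k = n * x_k / sum(x)."""
    total = sum(x)
    pi = [n * xk / total for xk in x]
    if any(p > 1 for p in pi):
        # Very large units would need to be taken with certainty first.
        raise ValueError("some pi_k exceed 1; remove take-all units first")
    return pi


def systematic_pips(x, n, rng=random):
    """Select n distinct unit indices by systematic pi-ps sampling."""
    pi = inclusion_probabilities(x, n)
    start = rng.random()                       # random start in [0, 1)
    points = [start + k for k in range(n)]     # n points spaced 1 apart
    sample, cum, j = [], 0.0, 0
    for idx, p in enumerate(pi):
        lower, cum = cum, cum + p              # unit idx covers [lower, cum)
        while j < n and lower <= points[j] < cum:
            sample.append(idx)                 # pi_k <= 1, so at most one hit
            j += 1
    return sample


def ht_total(y_sample, pi_sample):
    """Horvitz-Thompson estimator of the population total of y."""
    return sum(y / p for y, p in zip(y_sample, pi_sample))


if __name__ == "__main__":
    x = [10, 20, 30, 40, 50, 60]
    pi = inclusion_probabilities(x, 3)
    s = systematic_pips(x, 3, random.Random(1))
    y = [2 * v for v in x]                     # y exactly proportional to x
    print(s, ht_total([y[k] for k in s], [pi[k] for k in s]))
```

When y is exactly proportional to x, every possible sample returns the true total (420 in this toy example), which is the intuition behind selecting with probability proportional to x.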

Probability proportional to size selection of area units

National or otherwise large-scale household surveys are typically based on multi-stage sampling designs. First, a sample of area units is selected in one or more stages, and at the last stage a sample of ultimate units (dwellings, households, persons, etc.) is selected within each sample area. Especially in developing countries, a more or less standard two-stage design is increasingly common. In this design the first stage consists of the selection of area units with probability...
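The excerpt stops before the second stage is described. Under the common assumption (not stated here) that areas are drawn with probability proportional to a size measure M_i and that a fixed number b of households is then drawn with equal probability from the current listing of each selected area, the overall selection probability of a household can be checked with a short Python sketch. Function and parameter names are illustrative.

```python
def overall_probability(a, b, size_measure, total_measure, listed_count):
    """Probability that a given household is selected in a two-stage design.

    a             -- number of areas selected with PPS at stage 1
    b             -- fixed number of households taken per selected area
    size_measure  -- size measure M_i used for area i at stage 1
    total_measure -- sum of M_i over all areas
    listed_count  -- households actually listed in area i at stage 2
    """
    p_area = a * size_measure / total_measure   # PPS first stage
    p_within = b / listed_count                 # equal-probability second stage
    return p_area * p_within


if __name__ == "__main__":
    measures = [100, 200, 300]
    total = sum(measures)
    # When the listing matches the size measure the design is self-weighting.
    print([overall_probability(2, 10, m, total, m) for m in measures])
    # A listing that has grown since the frame was built breaks that property.
    print(overall_probability(2, 10, 100, total, 120))
```

With equal overall probabilities every sampled household carries the same design weight (30 in this toy example), which is one reason such a design is convenient.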

Registers, register systems and methodological issues

A register is a complete list of the objects belonging to a defined object set. The objects in the register are identified by identification variables. This makes it possible to update or match the register against other sources. A system of statistical registers consists of a number of registers that can be linked to each other. To make this exact linkage between records in different registers possible, the registers in the system must contain reference numbers or other identification...
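Exact linkage on a shared identification variable can be sketched in Python as a keyed lookup. The register layouts and the field name farm_id are hypothetical; real register systems also have to handle duplicated or changed identifiers, which this sketch only detects.

```python
def link_registers(reg_a, reg_b, key):
    """Exact one-to-one linkage of two registers on an identification variable.

    Returns the linked records (fields of both registers merged) and the
    records of reg_a that have no counterpart in reg_b.
    """
    index_b = {}
    for rec in reg_b:
        if rec[key] in index_b:
            raise ValueError(f"duplicate identifier {rec[key]!r} in second register")
        index_b[rec[key]] = rec
    linked, unmatched = [], []
    for rec in reg_a:
        other = index_b.get(rec[key])
        if other is None:
            unmatched.append(rec)
        else:
            linked.append({**rec, **other})    # fields of reg_b win on name clashes
    return linked, unmatched


if __name__ == "__main__":
    farms = [{"farm_id": 1, "region": "A"}, {"farm_id": 2, "region": "B"}]
    tax = [{"farm_id": 1, "sales": 52000}]
    print(link_registers(farms, tax, "farm_id"))
```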

Relative efficiency of the LUCAS 2006 sampling plan

We have compared the variance obtained with several single-stage sampling plans:

1. Simple random sampling (srs). We make an approximation of simple random sampling using the variance of random subsamples of the available systematic sample.
2. Pure systematic. A subsample was extracted by selecting in the LUCAS 2006 sampling plan the first eight replicates in all strata. We have used the non-stratified version of (10.3) and (10.4).
3. Post-stratified sample. The systematic sample of option 2...
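The srs approximation by random subsampling can be illustrated with a minimal Python sketch for an estimated proportion (for example, the share of points with a given land cover), followed by a relative efficiency computed as the ratio of the srs variance to the variance of a competing plan. The data, subsample size and number of replicates are invented, and the plan-specific variance formulas (10.3) and (10.4) are not reproduced.

```python
import random
import statistics


def srs_variance_by_subsampling(values, n, reps, rng):
    """Empirical variance of the mean over `reps` random subsamples of size n."""
    means = [statistics.fmean(rng.sample(values, n)) for _ in range(reps)]
    return statistics.pvariance(means)


def relative_efficiency(var_srs, var_plan):
    """Values above 1 mean the plan is more efficient than srs."""
    return var_srs / var_plan


if __name__ == "__main__":
    rng = random.Random(2006)
    points = [1] * 300 + [0] * 700            # 1 = point has the land cover
    v = srs_variance_by_subsampling(points, 100, 2000, rng)
    # Textbook srs (without replacement) variance of a mean, for comparison.
    N, n, p = len(points), 100, 0.3
    print(v, (1 - n / N) * (N / (N - 1)) * p * (1 - p) / n)
```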

Requirements of sample surveys for economic analysis

One of the most important requirements of sample surveys in general, not only farm surveys, is that sample sizes are large enough to enable sufficiently accurate estimates to be produced for policy analysis. Working against this are two important objectives: the provision of timely results and the need to collect detailed economic data (Section 20.3), which often necessitates expensive face-to-face data collection methods. At the same time the sample often needs to be spread spatially (for...

Respondent reluctance: privacy and burden concerns

Although the agricultural sector is somewhat unique and not directly aligned with the general population on a number of levels, concerns regarding personal security and privacy of information are similar across most population subgroups in the USA, Brazil, and Europe. Because of incidents in which personal information has been released by businesses and government agencies, respondents now have one more reason for not responding to surveys. While this is not the only reason for increasing non-response...

Rural development statistics

These are a relatively new domain and can be seen as a consequence of the reform of the Common Agricultural Policy, which accords great importance to rural development. Eurostat has started collecting indicators for a wide range of subjects - such as demography (migration), economy (human capital), accessibility to services (infrastructure), social well-being - from almost all member states at regional level. Most of the indicators are not of a technical agricultural nature. Data collected...

Sample allocation

The area frame sample is used to collect data on a wide range of agricultural items such as crop acreages, livestock inventories and economic data. Therefore, the allocation of the sample across states and within states to the land-use strata is extremely important. The NASS evaluates optimum allocations of the sample to obtain the most precision in the major survey estimates for a given budget. The number of sample segments allocated to each land-use stratum and state depends on factors such...
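The excerpt does not give the allocation formula NASS uses. As an illustration only, the classical cost-constrained optimum allocation for a single survey item, n_h proportional to N_h S_h / sqrt(c_h), can be sketched in Python. N_h, S_h and c_h (stratum size, standard deviation of the item and cost per sampled segment) are assumptions of the sketch, and a multipurpose survey such as this one would have to balance several such allocations.

```python
import math


def optimum_allocation(N, S, c, budget):
    """Cost-constrained optimum allocation for a stratified design.

    n_h = budget * (N_h S_h / sqrt(c_h)) / sum_k (N_k S_k sqrt(c_k)),
    which minimises the variance of the estimated total subject to
    sum_h c_h n_h = budget (fixed overhead costs excluded).
    Results are not rounded and are not capped at N_h.
    """
    denom = sum(Nh * Sh * math.sqrt(ch) for Nh, Sh, ch in zip(N, S, c))
    return [budget * Nh * Sh / math.sqrt(ch) / denom for Nh, Sh, ch in zip(N, S, c)]


if __name__ == "__main__":
    # Hypothetical land-use strata: intensive cropland, mixed, range land.
    N = [400, 900, 1500]          # segments in each stratum
    S = [60.0, 30.0, 5.0]         # std. dev. of the key item per segment
    c = [4.0, 4.0, 1.0]           # relative cost per sampled segment
    n = optimum_allocation(N, S, c, budget=600)
    print([round(v, 1) for v in n], sum(ch * nh for ch, nh in zip(c, n)))
```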

Sample estimation

This final section will briefly discuss the approaches used to estimate agricultural production with an area frame sample of segments. The NASS uses two area frame estimators, namely the closed and weighted segment estimators. Both require that the interviewer collect data for all farms that operate land inside each segment. (A farm is defined to be all land under one operating arrangement with gross farm sales of at least $1000 a year.) The portion of the farm that is inside the segment is...
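The excerpt stops before the two estimators are defined. The Python sketch below assumes their usual descriptions: the closed segment estimator expands only the values reported for the part of each farm (the tract) inside the segment, while the weighted segment estimator prorates each whole-farm value by the share of the farm's acreage that lies inside the segment. The expansion factor (the inverse of the segment's selection probability), the class name and all data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    """Part of one farm that lies inside a sample segment."""
    tract_value: float   # item reported for the land inside the segment
    farm_value: float    # same item for the whole farm
    tract_acres: float   # farm acres inside the segment
    farm_acres: float    # total acres operated by the farm


def closed_segment_estimate(segments):
    """segments: list of (expansion_factor, [Tract, ...])."""
    return sum(e * sum(t.tract_value for t in tracts) for e, tracts in segments)


def weighted_segment_estimate(segments):
    """Prorate whole-farm values by the share of farm acres in the segment."""
    return sum(
        e * sum(t.farm_value * t.tract_acres / t.farm_acres for t in tracts)
        for e, tracts in segments
    )


if __name__ == "__main__":
    segs = [
        (10.0, [Tract(8, 20, 50, 200)]),     # farm only partly inside
        (20.0, [Tract(3, 3, 100, 100)]),     # farm entirely inside
    ]
    print(closed_segment_estimate(segs), weighted_segment_estimate(segs))
```

The two estimators agree for farms lying entirely inside a segment and differ only in how partly included farms are treated.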

Sample rotation

As mentioned earlier, the NASS uses a five-year rotation scheme for the sample segments. Rotation is accomplished by replacing segments from specified replicates within a land-use stratum with newly selected segments. Preferably, the number of replicates is a multiple of 5 to provide a constant workload for sample selection and preparation activities in the AFS and for data collection work in the state offices. Naturally, instances occur when the number of replicates is not a multiple of 5,...
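The workload argument can be checked with a short Python sketch. It assumes, purely for illustration, that replicates are assigned round-robin to the five years of the cycle; the excerpt does not say how the replicates to be replaced are chosen each year.

```python
def rotation_schedule(num_replicates, cycle=5):
    """Map each year of the rotation cycle to the replicates replaced that year.

    Replicates 1..num_replicates are assigned round-robin (an assumption),
    so each year replaces num_replicates / cycle replicates whenever that
    is a whole number.
    """
    schedule = {year: [] for year in range(1, cycle + 1)}
    for r in range(1, num_replicates + 1):
        schedule[(r - 1) % cycle + 1].append(r)
    return schedule


if __name__ == "__main__":
    for reps in (10, 12):
        counts = [len(v) for v in rotation_schedule(reps).values()]
        print(reps, counts)   # 10 gives an even workload, 12 does not
```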

Sample selection

The procedures used to select the area frame samples will be described in this section for the equal and unequal probability of selection methods.

11.8.1 Equal probability of selection

Recall that a two-step selection procedure is followed when segments are selected with equal probability. The first step is PSU selection. A SAS program is run which uses the selection probabilities discussed in the previous section to select the PSUs. The program creates a listing of all chosen PSUs....
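The selection probabilities referred to above come from a section outside this excerpt. One common way to make the two steps give every segment the same chance, assumed here, is to select a PSU with probability proportional to its number of segments and then one segment at random within it. A Python sketch with hypothetical segment counts:

```python
import random


def select_segment(segment_counts, rng=random):
    """Two-step selection: PSU with probability proportional to its number
    of segments, then one segment with equal probability inside that PSU."""
    psus = list(range(len(segment_counts)))
    psu = rng.choices(psus, weights=segment_counts, k=1)[0]
    return psu, rng.randrange(segment_counts[psu])


def segment_probability(segment_counts, psu):
    """P(PSU) * P(segment | PSU) = (s_i / S) * (1 / s_i) = 1 / S."""
    total = sum(segment_counts)
    return (segment_counts[psu] / total) * (1 / segment_counts[psu])


if __name__ == "__main__":
    counts = [4, 6, 10]          # segments per PSU in one land-use stratum
    print(select_segment(counts, random.Random(7)))
    print([segment_probability(counts, i) for i in range(len(counts))])
```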

Satellite images and vegetation indices for yield monitoring

The use of remote sensing within the crop yield forecasting process has a series of requirements for most of the applications. Information is needed for large areas while maintaining spatial and temporal integrity. Suitable spectral bands are needed that characterize the vegetation to allow for crop monitoring, whether to derive crop indicators to be used in a regression or to feed biophysical models. High temporal frequency is essential to follow crop growth during the season. Historical data...

Satellites and sensors

The main characteristics of sensors for use in agricultural statistics are as follows. Spectral resolution: most agricultural applications use sensors that give information for a moderate number of bands (four to eight). Near infrared (NIR), short-wave infrared (SWIR) and red are particularly important for measuring the activity of vegetation; red-edge bands (between red and NIR) also seem promising for crop identification (Mutanga and Skidmore, 2004). Panchromatic (black and...
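A widely used way of turning the red and NIR bands into a crop indicator is the normalised difference vegetation index, NDVI = (NIR - red) / (NIR + red). The text does not name a specific index, so the Python sketch below is only one example of such an indicator; the reflectance values are invented.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index from NIR and red reflectances.

    High values indicate dense green vegetation; low values indicate sparse
    vegetation or bare soil, and negative values typically water.
    Returns NaN when both bands are zero.
    """
    denom = nir + red
    if denom == 0:
        return float("nan")
    return (nir - red) / denom


if __name__ == "__main__":
    # Hypothetical reflectances for a pixel over a growing crop and over bare soil.
    print(round(ndvi(0.50, 0.08), 3), round(ndvi(0.25, 0.20), 3))
```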

Selection probabilities

There are two methods for selecting the ultimate sampling unit or segment - equal and unequal selection. Which method is used depends on the availability of adequate boundaries for segments. If good boundaries are plentiful so that segments can be made approximately the same size within a land-use stratum, then segments are selected with equal probability. If adequate boundaries are not available, then unequal probability of selection is used since segment sizes are allowed to vary greatly in...

Selective editing

Manual or interactive editing is time-consuming and therefore expensive, and adversely affects the timeliness of publications. Moreover, when manual editing involves recontacting the respondents, it also increases the response burden. Therefore, most statistical institutes have adopted selective editing strategies. This means that only records that potentially contain influential errors are edited manually, whereas the remaining records are edited automatically. In this way, manual editing...
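A minimal Python sketch of how records might be split between manual and automatic editing. The local score used here (design weight times the absolute difference between the reported value and an anticipated value, such as last year's, divided by an estimate of the total) is one common form of score function; the excerpt does not specify the score, and the threshold is hypothetical.

```python
def local_score(record, total_estimate):
    """Potential impact of a record's possible error on the estimated total."""
    return record["weight"] * abs(record["reported"] - record["anticipated"]) / total_estimate


def split_for_editing(records, total_estimate, threshold):
    """Records scoring above the threshold go to manual (interactive) editing."""
    manual, automatic = [], []
    for rec in records:
        target = manual if local_score(rec, total_estimate) > threshold else automatic
        target.append(rec["id"])
    return manual, automatic


if __name__ == "__main__":
    recs = [
        {"id": "a", "weight": 10, "reported": 100, "anticipated": 100},
        {"id": "b", "weight": 10, "reported": 150, "anticipated": 100},
        {"id": "c", "weight": 10, "reported": 1000, "anticipated": 100},  # unit error?
    ]
    print(split_for_editing(recs, total_estimate=10_000, threshold=0.1))
```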

Small-area estimates

An important development is the need for up-to-date and accurate small-area estimates. The demand for early estimates for advance warning on crops, and for results for small domains, continues to increase. In agriculture these small domains could be geographical areas or unique commodities. Statistical methods are being used for small-area estimation that use models and modelling techniques borrowing strength from other data sources such as administrative data or other areas. The overview of...

Statistics Canada's agriculture statistics programme

Statistics Canada's agriculture statistics programme consists of a central farm register; numerous annual, sub-annual and occasional surveys; the use of administrative data, including tax data; and a census conducted every five years. Each of these components is described in this section.

Farm register

The farm register is a database of farm operations and operators. Its creation dates back to the late 1970s, while its present design and technology were developed in the mid-1990s. It contains key...

Statistics on crop production

The traditional statistics on crop production correspond in general to four families of data. First, the Early Estimates for Crop Products (EECP) provide, for cereals and certain other crops, data on area, yield and production before the harvest. Second, the current crop statistics provide at national level, for a given product, the area, yield and production harvested during the crop year. For some products data are requested at regional level. Third, the supply balance sheets give, for a...

Step 1: Match data from E to M

Every year, tax returns are matched to the farm register. Individuals' tax records are matched to operator records using key identifiers (name, sex, date of birth and address), while corporate tax records are matched to operations using other key identifiers (farm name and address). The matches are done using direct matching techniques. Very strong matches are accepted automatically, weaker matches are verified, and the weakest matches are automatically rejected. The thresholds between what is...
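The three-way decision can be sketched in Python as a score on agreeing identifiers plus two thresholds. The field weights and thresholds below are hypothetical, since the excerpt does not give the values used, and production record linkage usually relies on comparison functions that tolerate spelling differences rather than the exact comparisons used here.

```python
# Hypothetical agreement weights for the individual identifiers named in the text.
WEIGHTS = {"name": 4.0, "sex": 1.0, "birth_date": 3.0, "address": 2.0}


def match_score(tax_rec, register_rec, weights=WEIGHTS):
    """Sum the weights of identifiers that are present and agree exactly."""
    return sum(
        w for field, w in weights.items()
        if tax_rec.get(field) is not None and tax_rec.get(field) == register_rec.get(field)
    )


def classify(score, upper=8.0, lower=4.0):
    """Accept strong matches, reject weak ones, send the rest for verification."""
    if score >= upper:
        return "accept"
    if score < lower:
        return "reject"
    return "verify"


if __name__ == "__main__":
    reg = {"name": "J SMITH", "sex": "M", "birth_date": "1961-04-02", "address": "RR 2"}
    print(classify(match_score(dict(reg), reg)))                        # accept
    print(classify(match_score({"name": "J SMITH", "sex": "M"}, reg)))  # verify
```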

Stratification

Stratification is one of the most widely used techniques in finite population sampling. Strata are disjoint subdivisions of a population $U$, and the union of the strata coincides with the universe: $U = \bigcup_{h=1}^{H} U_h$, with $U_h \cap U_i = \emptyset$ for $h \neq i$, $h, i \in \{1, \dots, H\}$. Each group contains a portion of the sample. Many business surveys employ stratified sampling procedures where simple random sampling without replacement is performed within each stratum; see, for example, Sigman and Monsour (1995) and, for farm surveys, Vogel (1995)....
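With simple random sampling without replacement inside each stratum, the standard estimator of a population total is Y_hat = sum_h N_h * ybar_h, with variance estimate sum_h N_h^2 * (1 - n_h/N_h) * s_h^2 / n_h. A minimal Python sketch of these textbook formulas, using made-up data (each stratum needs at least two sampled units):

```python
import statistics


def stratified_total(strata):
    """Estimated total and its estimated variance under stratified SRSWOR.

    strata -- list of (N_h, sample values y_hj) pairs, one per stratum
    """
    total, var = 0.0, 0.0
    for N_h, ys in strata:
        n_h = len(ys)
        mean_h = statistics.fmean(ys)
        s2_h = statistics.variance(ys)              # divisor n_h - 1
        total += N_h * mean_h
        var += N_h ** 2 * (1 - n_h / N_h) * s2_h / n_h
    return total, var


if __name__ == "__main__":
    # Two hypothetical strata of farms: (population size, sampled values).
    print(stratified_total([(100, [1, 2, 3]), (50, [10, 10, 10, 10])]))
```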

Substratification

There is a further level of stratification which is applied to the frame. Sub-stratification is the process used to divide the population of sampling units within each stratum equally into categories (substrata). Unlike strata (e.g. '50% or more cultivated'), these substrata do not have a definition associated with them. Sampling units are placed into substrata based on likeness of agricultural content and, to a certain extent, location. Sub-stratification activities include ordering the PSUs,...
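Dividing a stratum's PSUs equally into substrata can be sketched as ordering followed by splitting into groups of (nearly) equal size. The ordering key used in the example, a hypothetical percentage of cultivated land followed by a location code, stands in for the ordering by agricultural content and location described in the text; the actual ordering rules are not given in this excerpt.

```python
def substratify(psus, num_substrata, key):
    """Order PSUs by `key` and split them into num_substrata groups whose
    sizes differ by at most one (earlier groups get the extra units)."""
    ordered = sorted(psus, key=key)
    base, extra = divmod(len(ordered), num_substrata)
    groups, start = [], 0
    for s in range(num_substrata):
        size = base + (1 if s < extra else 0)
        groups.append(ordered[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    psus = [{"id": i, "pct": p, "loc": i % 3}
            for i, p in enumerate([90, 10, 50, 70, 30, 20, 60])]
    for g in substratify(psus, 3, key=lambda p: (p["pct"], p["loc"])):
        print([p["id"] for p in g])
```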

Survey on plantations of certain species of fruit trees

Basic surveys (apple, pear, peach, apricot, orange, lemon and small-fruited citrus trees) are carried out in the EU every five years, to determine the production potential of plantations from which fruit produced is intended for the market. Data are collected on the areas under fruit trees broken down by region (production zone), species, variety, density (number of trees/ha) and age of the trees. The chronological crop data series start with data from the early 1960s. The statistical system...

Synthetic and composite estimates

Suppose the population is divided into large post-strata, indexed by $g$, for which reliable direct estimates of the post-stratum totals, $Y_{\cdot g}$, can be calculated from the survey data, where $Y_{\cdot g} = \sum_i Y_{ig}$ and $Y_{ig}$ is the total of the characteristic of interest, $y$, for the units in small area $i$ that belong to post-stratum $g$. Our interest is in estimating the small-area totals $Y_i = \sum_g Y_{ig}$, $i = 1, \dots, m$, using known auxiliary totals $X_{ig}$. A synthetic estimate of $Y_i$ is given by

$$\hat{Y}_i^{S} = \sum_g \frac{X_{ig}}{\hat{X}_{\cdot g}} \, \hat{Y}_{\cdot g},$$

where $\hat{Y}_{\cdot g}$ and $\hat{X}_{\cdot g}$ are reliable direct estimates...
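The ratio-synthetic estimate Y_i^S = sum_g (X_ig / X_hat_.g) * Y_hat_.g can be computed directly from the post-stratum direct estimates and the area-level auxiliary totals. The Python sketch below also forms a composite estimate phi_i * direct + (1 - phi_i) * synthetic, the usual way the two are combined; here the weight phi_i is supplied by the user rather than estimated, and all numbers are invented.

```python
def ratio_synthetic(x_area, y_hat_g, x_hat_g):
    """Ratio-synthetic estimates of small-area totals.

    x_area  -- dict area i -> dict post-stratum g -> known X_ig
    y_hat_g -- dict g -> direct estimate of the post-stratum total of y
    x_hat_g -- dict g -> direct estimate of the post-stratum total of x
    """
    return {
        i: sum(x_ig / x_hat_g[g] * y_hat_g[g] for g, x_ig in xs.items())
        for i, xs in x_area.items()
    }


def composite(direct, synthetic, phi):
    """Weighted combination of a direct and a synthetic estimate, 0 <= phi <= 1."""
    return phi * direct + (1 - phi) * synthetic


if __name__ == "__main__":
    y_hat = {"A": 1000.0, "B": 300.0}
    x_hat = {"A": 500.0, "B": 300.0}
    x_area = {1: {"A": 100.0, "B": 50.0}, 2: {"A": 0.0, "B": 200.0}}
    syn = ratio_synthetic(x_area, y_hat, x_hat)
    print(syn, composite(300.0, syn[1], 0.4))
```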

Testing resilience measurement

21.5.1 Model validation with CART

A cross-validation process is used to assess whether the procedures adopted for estimating the resilience index are meaningful. The cross-validation process tests the original hypothesis that sets of different variables and indicators belonging to different dimensions of food insecurity, the social sector and public services are correlated with (i.e. contribute to) the overall resilience index. The CART methodology (see Steinberg and Colla, 1995; Breiman et al., 1984) is used to estimate the...

The AGRIT survey

AGRIT is a sample survey whose aim is to produce estimates, taking into account the economic situation, of the areas and yields of the main crops and of the main land uses (see also Chapter 13 of this book). The survey uses spatial sampling techniques, in particular point sampling. Generally speaking, the method is based on the integration of data collected in ground samples with data acquired through remote sensing. The list consists of a set of points (point...

The effect of random weights

The design effect, which measures the efficiency of a sample design compared to a simple random sample of the same size, can be decomposed under certain assumptions into two factors: the effect of sample weights, and the effect of other aspects of the sample design, such as clustering and stratification. We are concerned here with the first component, the effect of sample weights on precision. This effect is generally to inflate variances and reduce the overall efficiency of the design. The increase...
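Under the assumptions usually attached to this decomposition (weights unrelated to the study variable), the weighting component is often approximated by Kish's formula deff_w = 1 + CV^2(w) = n * sum(w^2) / (sum w)^2. The excerpt does not state which approximation the chapter uses, so treat this Python sketch as illustrative.

```python
def weighting_design_effect(weights):
    """Kish's approximation n * sum(w^2) / (sum w)^2, i.e. 1 + CV^2 of the weights
    (CV computed with the population variance of the weights)."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def effective_sample_size(weights):
    """Size of an equally weighted sample giving the same precision."""
    return len(weights) / weighting_design_effect(weights)


if __name__ == "__main__":
    print(weighting_design_effect([1, 1, 3, 3]), effective_sample_size([1, 1, 3, 3]))
```

Equal weights give a design effect of exactly 1, so any variation in the weights can only inflate the variance under this approximation.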

The Fellegi-Holt paradigm

In this section we describe the error localization problem for random errors as a mathematical optimization problem, using the (generalized) Fellegi-Holt paradigm. This mathematical optimization problem is solved for each record separately. For each record $(x_1, \dots, x_n)$ in the data set that is to be edited automatically, we wish to determine - or, more precisely, to ensure the existence of - a synthetic record $(x_1^*, \dots, x_n^*)$ such that $(x_1^*, \dots, x_n^*)$ satisfies all edits $j$ ($j = 1, \dots, J$) given by (15.1) or...
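For a handful of categorical variables the Fellegi-Holt problem can be solved by brute force: try sets of variables in order of increasing total reliability weight and return the first set whose values can be changed so that every edit is satisfied. This Python sketch is a toy illustration of the paradigm, not the algorithm used in the chapter (practical systems use far more efficient methods); the variables, domains, edits and weights are hypothetical.

```python
from itertools import combinations, product


def localize_errors(record, domains, edits, weights):
    """Minimum-weight Fellegi-Holt error localization by exhaustive search.

    record  -- dict variable -> observed value
    domains -- dict variable -> list of admissible values
    edits   -- list of functions taking a record and returning True if satisfied
    weights -- dict variable -> reliability weight
    Returns (set of variables to change, one synthetic record satisfying all
    edits), or None if no such record exists.
    """
    variables = list(record)
    subsets = [s for k in range(len(variables) + 1) for s in combinations(variables, k)]
    subsets.sort(key=lambda s: sum(weights[v] for v in s))    # cheapest first
    for subset in subsets:
        # Try every combination of values for the variables in the subset.
        for values in product(*(domains[v] for v in subset)):
            trial = dict(record)
            trial.update(zip(subset, values))
            if all(edit(trial) for edit in edits):
                return set(subset), trial
    return None


if __name__ == "__main__":
    record = {"age_group": "child", "marital": "married", "relation": "spouse"}
    domains = {
        "age_group": ["child", "adult"],
        "marital": ["single", "married"],
        "relation": ["head", "spouse", "child"],
    }
    edits = [
        lambda r: not (r["age_group"] == "child" and r["marital"] == "married"),
        lambda r: r["relation"] != "spouse" or r["marital"] == "married",
    ]
    weights = {"age_group": 1.0, "marital": 2.0, "relation": 1.5}
    print(localize_errors(record, domains, edits, weights))
```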

The role of automatic editing in the editing process

As mentioned in Section 15.1, we aim to edit as many records as possible automatically, while ensuring that the final data are of sufficiently high quality. Automatic editing alone is generally not enough to obtain data of sufficiently high statistical quality. We believe that the combined use of modern editing techniques leads to a reduction in processing time and a decrease in required resources in comparison to the traditional interactive method, while preserving or often even increasing...

The role of resilience in measuring vulnerability

In most studies of poverty, vulnerability indicators consider the probability distribution of household consumption as their objective (Dercon, 2001). They consider consumption as a stochastic variable, try to estimate the deterministic part of it through regression models, and then calculate the probability of falling below a certain threshold, usually the poverty line or a proxy for food security. Other theoretical studies consider vulnerability as a function of people's exposure to risks and...
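A minimal Python sketch of the last step described: given a regression prediction of mean log consumption and a residual standard deviation, the probability of falling below the threshold follows from the normal distribution. Normality of log consumption, and the names beta, x and sigma, are assumptions of the sketch; the excerpt does not fix a distribution.

```python
import math


def vulnerability(beta, x, sigma, threshold):
    """P(consumption < threshold), assuming log consumption ~ N(x'beta, sigma^2)."""
    mu = sum(b * xi for b, xi in zip(beta, x))          # predicted mean log consumption
    z = (math.log(threshold) - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))      # standard normal CDF at z


if __name__ == "__main__":
    # Hypothetical household whose expected log consumption is one sigma above
    # the log poverty line.
    print(round(vulnerability([math.log(100) + 0.5], [1.0], 0.5, 100), 4))
```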

The transect survey in LUCAS 2001-2003

In each PSU of the 2001-2003 sample, a transect was defined as the 1200 m line joining the five points located at the north of the PSU. The transect was surveyed recording the intersections with linear elements (hedges, stone walls, etc.) and changes of major land cover types. Estimating the total length of linear elements is an application of the classical Buffon's needle problem (Wood and Robertson, 1998). An unbiased estimate of the total length is where is the number of transects of length...

Time series model of the growth in fodder use in the Australian cattle industry

The Australian cattle industry makes an important contribution to the Australian economy and farm sector. Cattle production accounted for about $7.9 billion, or about 23% of the value of farm production, in 2006-07 (ABS, 2008). Cattle production in Australia is spread across New South Wales, Victoria, southern Western Australia and northern Australia. Fodder plays an essential role in the cattle industry, as it is used not only as a supplement to dryland pasture in times of drought (Kompas and...

Typical contents of a farm economic survey

To obtain a basic economic profile of farms for construction of key economic variables such as major output and input variables, profit and TFP, and to undertake policy-relevant analysis, a large amount of information needs to be collected. In addition, construction of the survey database requires that certain basic information about the sample farms making up the database be collected and maintained. At a minimum, such 'paradata' should include a unique identifier so that data from the same...

Unit-level models

A basic unit-level model assumes that the unit $y$-values, $y_{ij}$, associated with the $j$th population unit ($j = 1, \dots, N_i$) in the $i$th area are related to unit-level covariates, $\mathbf{x}_{ij}$, for which the population mean vector $\bar{\mathbf{X}}_i$ is known. If $y$ is a continuous response (e.g. crop yield), we assume a one-fold nested error linear regression model

$$y_{ij} = \mathbf{x}_{ij}^{T}\boldsymbol{\beta} + v_i + e_{ij}, \quad j = 1, \dots, N_i, \; i = 1, \dots, m, \tag{9.9}$$

where the random small-area effects $v_i$ have mean 0 and common variance $\sigma_v^2$ and are independently distributed. Further, the $v_i$ are...
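Under model (9.9), the best linear unbiased predictor of a small-area mean shrinks the sample-based regression estimate toward the synthetic regression estimate: mu_i = Xbar_i' beta + gamma_i * (ybar_i - xbar_i' beta), with gamma_i = sigma_v^2 / (sigma_v^2 + sigma_e^2 / n_i). A minimal Python sketch, assuming that beta and both variance components are known (an EBLUP would plug in estimates), that the errors e_ij have common variance sigma_e^2, and ignoring the finite-population correction; names and numbers are illustrative.

```python
def _dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def blup_area_mean(beta, X_bar_pop, x_bar_sample, y_bar_sample, n_i, sigma_v2, sigma_e2):
    """BLUP of an area mean under the nested error model with known parameters.

    X_bar_pop    -- known population mean of the covariates in the area
    x_bar_sample -- sample mean of the covariates in the area
    y_bar_sample -- sample mean of y in the area (n_i sampled units)
    """
    gamma = sigma_v2 / (sigma_v2 + sigma_e2 / n_i)      # shrinkage factor in [0, 1)
    return _dot(X_bar_pop, beta) + gamma * (y_bar_sample - _dot(x_bar_sample, beta))


if __name__ == "__main__":
    # Intercept and one covariate; gamma = 1 / (1 + 4 / 4) = 0.5 here.
    print(blup_area_mean([1.0, 2.0], [1.0, 3.0], [1.0, 2.5], 7.5, 4, 1.0, 4.0))
```

Areas with small samples (small n_i) get a small gamma_i and lean on the regression prediction, while well-sampled areas stay close to their own data.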