EPUNet :: ECHP User Guide Online


1. Introduction
Purpose of this guide
The purpose of this basic user guide is to provide a straightforward presentation of the European Community Household Panel (ECHP) dataset for new users and to provide a reference to more detailed information for experienced users. The guide is intended to give the new user sufficient information to begin working with the ECHP data. For more experienced users, the guide provides, in a compact format, references to the appropriate Eurostat documentation dealing with key data issues.

Overview of the ECHP
The ECHP is a harmonised cross-national longitudinal survey focusing on household income and living conditions. It also includes items on health, education, housing, migration, demographics and employment characteristics.

The survey ran from 1994 to 2001. In the first wave (1994) a sample of some 60,500 households, containing approximately 130,000 adults aged 16 years and over, was interviewed across 12 member states (Belgium, Denmark, Germany, Greece, Spain, France, Italy, Ireland, Luxembourg, the Netherlands, Portugal and the United Kingdom). Austria joined the ECHP in wave 2 (1995) and Finland in wave 3 (1996). From wave 4 (1997) Sweden provided cross-sectional data in the UDB format, derived from its National Survey on Living Conditions.

For most of the countries the surveys were carried out using the harmonised ECHP questionnaire. For some countries the institutes in charge of producing the ECHP converted data from national surveys into the ECHP format, and these converted data replaced the ECHP from 1997 onwards. In Germany and the United Kingdom, the derived national data were provided from 1994 to 2001. The details of these variations across countries and national surveys are given in Table 1. Care is needed in analysing the converted data for these countries: some information might not have been collected in the national surveys and will therefore appear as missing in the ECHP. In other cases, variables that were not collected in the national survey were imputed on the basis of similar variables.

Table 1: Nature of ECHP data by country and year (Harmonised original ECHP data and data derived from existing national sources)

Countries	Full ECHP Data Format	ECHP Data Format Derived from National Surveys
Belgium, Denmark, France, Greece, Ireland, Italy, the Netherlands, Spain, Portugal	1994-2001	-
Austria	1995-2001	-
Finland	1996-2001	-
Germany	1994-1996	1994-2001 (SOEP)
Luxembourg	1994-1996	1997-2001 (PSELL)
United Kingdom	1994-1996	1994-2001 (BHPS)
Sweden	-	1997-2001 (SLCS) (Cross-sectional data only)

* The ECHP data for Belgium and the Netherlands come from a modification of existing national panels to meet the ECHP requirements. These are listed in the first column above because this system was in place from the beginning of the ECHP and national questionnaires were substantially modified to meet the ECHP requirements.

From the Production Data Base to the User Data Base
Within each country represented in the ECHP the surveys were carried out by the "National Data Collection Units" (NDUs), which are either the National Statistical Institutes or research centres. The results of the interviews were then transmitted to Eurostat in a format very close to the questionnaire. These datasets were checked and formatted by Eurostat as the 'Production Data Base' (PDB). The PDB, which is available only to the NDUs, was then used by Eurostat for weighting, imputation and construction of the 'User Data Base' (UDB). The UDB is the standardised, anonymised and more user-friendly version of the ECHP data made available to researchers under an ECHP research contract signed with Eurostat.

Characteristics of the ECHP dataset
Three central features of the ECHP make this dataset a valuable source of information for researchers.

The first feature is the multidimensional character of the topics covered. The ECHP provides microdata on a wide range of topics at the level of the individual and the household: income, social life, housing conditions, health, education, employment, training, and so on.

The second feature is the cross-national comparability of the data. The ECHP (apart from those countries using data derived from national sources, as noted above) is a harmonised and comparable dataset across countries. This has been achieved by implementing common procedures at all stages, including a harmonised questionnaire, harmonised definitions and common sampling requirements.

The third and final main feature of the ECHP is its longitudinal nature. Individuals who were members of a household in the first wave ('sample persons') are followed over time allowing researchers to examine how their circumstances change over time. As such, the ECHP provides information on relationships and transitions over time at the micro level.

Data access
The ECHP dataset is available through one of the Eurostat Datashops. Contact details for the Datashops and further details on the requirements for access to the data can be found on the following web site: http://forum.europa.eu.int/irc/dsis/echpanel/info/data/information.html.

Set-up programs
Eurostat distributes the dataset in two ways. The first is a CD-ROM that includes the documentation of the ECHP UDB as well as a self-extracting file that produces the dataset. The second is to download the files over the Internet.

In both cases Eurostat supplies the dataset files in "comma-separated values" (CSV) format. A file in CSV format is a plain text file in which all values (text, numbers...) are separated by commas and, in the case of string (alpha) variables, enclosed in double quotes. Because they are not application-specific, CSV files can be opened on any computer system and by most applications.

However, the files have to be converted into the native format of the package concerned before the data can be analysed using statistical packages such as SAS, Stata or SPSS. Some programs are available online for this purpose [Reference to EPUNet website where SPSS and Eurostat's SAS programmes will be available].
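As a rough illustration of what reading such a file involves, the sketch below parses a small comma-separated extract using only the Python standard library. The file content, the column layout and the sample values are assumptions made for the example and are not taken from the Eurostat documentation; for SAS, SPSS and Stata the set-up programs mentioned above remain the proper route.

```python
import csv
import io

# Hypothetical extract in the layout described above: values separated by
# commas, string values enclosed in double quotes. Real UDB files are much
# wider; the column names here are assumptions for illustration.
SAMPLE_CSV = '''"COUNTRY","PID","PG002"
"IE",101,0.85
"IE",102,1.15
"PT",101,1.00
'''


def read_udb_csv(handle):
    """Return a list of dicts, converting numeric-looking fields to numbers."""
    rows = []
    for record in csv.DictReader(handle):
        converted = {}
        for name, value in record.items():
            try:
                converted[name] = int(value)
            except ValueError:
                try:
                    converted[name] = float(value)
                except ValueError:
                    converted[name] = value  # leave as text
        rows.append(converted)
    return rows


rows = read_udb_csv(io.StringIO(SAMPLE_CSV))
print(rows[0])
```

To read a file from disk instead, an open file handle (for example `open(path, newline="")`) can be passed in place of the `io.StringIO` object.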

2. Documentation
Eurostat has prepared a number of documents providing details about the content and character of the ECHP data files, and background on survey methodology. For users who obtain the data on CD-ROM, the key documents are included on the CD itself.

They are also available on the CIRCA library web site page dedicated to ECHP http://forum.europa.eu.int/Public/irc/dsis/echpanel/library. Because of the large number of these documents we can't fully describe all of them here but Figure 1 outlines the structure of the CIRCA website in order to facilitate a search for a specific document. In addition, Annex A provides a comprehensive list of all documents available on this website following the same structure as on the CIRCA web site.

Figure 1: Organisation of the CIRCA web site on ECHP documentation [needs to be updated]

Table 2 outlines the content of some of the key documents likely to be of use to ECHP analysts.

Table 2: Key ECHP documentation

Document Name	Description of the content
PAN 15	The variable list and codebook for the household and personal register records as well as for the household and personal questionnaire records for Wave 1 (1994).
PAN 30	As above but for wave 2 (1995)
PAN 65	As above but for wave 3 (1996)
PAN 81	As above but for wave 4 (1997)
PAN 97	As above but for wave 5 (1998)
PAN 112	As above but for wave 6 (1999)
PAN 151	As above but for wave 7 (2000)
PAN 159	As above but for wave 8 (2001)
PAN 165	Describes the weighting procedure used to calculate the individual and household weights. It does not, however, describe how these weights should be used.
PAN 166	Describes the variables (data dictionary, code book, and differences between countries and waves). The document is divided between the Household file and the Personal file, themselves divided into categories such as General Information, Demographic, Income and so on.
PAN 167	Describes the conversion of the variables from the PDB questions to the UDB variable format for the household file, personal file as well as link fixed variables and link wave specific variables (household and personal)
PAN 168	Presents the ECHP history as well as describing the dataset (the various files), the contractual arrangements and the related documents (Doc PAN 164, PAN 165 etc…) and the contacts for the National Data Collection Units and Eurostat

3. Structure of the ECHP User Database
This section provides an outline of the content of the ECHP User Database (UDB) files, as covered in more detail in the document PAN 168. The UDB includes several files that are wave-specific (a "register file", a "relationship file", a "household file" and a "personal file") and two files that cover all the waves (the "country file" and the "longitudinal link file"). Annex B provides a list of the variables included in each file.

3.1 Country File
This file contains the following information for each wave and country:

Population figures (number of private households in the country, number of persons living in private households, number of persons aged at least sixteen and living in private households), which can be used to rescale the weights in order to gross up the results to population figures.
Purchasing power parities (PPP) and Purchasing Power Standards (PPS): PPPs are fictitious currency exchange rates that eliminate the impact of price level differences; 1 PPS will thus buy a comparable basket of goods and services in each country (they are scaled at EU level, which is why the PPS can be thought of as the Euro in real terms).
Exchange rate figures to convert national currencies to ECU/EURO. The country file also contains the fixed exchange rates for the 'Eurozone' countries (after 01.01.1999).
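The following sketch shows how a PPP might be applied to make amounts comparable across countries. It assumes that the PPP is expressed as units of national currency per 1 PPS, so that dividing a national-currency amount by the PPP gives the amount in PPS. The lookup table, the function name and all values are invented for illustration; the actual variable names and conventions should be checked in the country file documentation.

```python
# Hypothetical country-file entries: PPP expressed as units of national
# currency per 1 PPS (values invented for illustration).
ppp = {("IE", 1996): 0.70, ("PT", 1996): 130.0}


def to_pps(amount_national, country, year):
    """Convert a national-currency amount to PPS by dividing by the PPP."""
    return amount_national / ppp[(country, year)]


ie_income_pps = to_pps(14000, "IE", 1996)
pt_income_pps = to_pps(2600000, "PT", 1996)
print(round(ie_income_pps), round(pt_income_pps))
```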

3.2 Longitudinal Link File
This file contains a record for every person that ever appeared in the ECHP. The first section contains information (gender, year of birth…) that remains constant over time. The second section, which is repeated in each wave, contains all the information (household identifier, household size…) required to rebuild the "longitudinal status" of the person from the beginning to the end of the panel, derived from the personal and household register files. Each person receives an identification number (PID) that is fixed across all waves. Note that the PID is unique only within country: when several countries are being analysed, the country code must also be used to create a truly 'unique' identifier. [Link to this point in Register of Queries and Solutions].
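A minimal sketch of the point about identifiers, assuming hypothetical records that hold only a country code and a PID: combining the two gives a key that is unique across countries, whereas the PID alone does not. The country codes and the key format are illustrative choices, not UDB conventions.

```python
# Hypothetical link-file records: (country code, PID). The same PID value can
# occur in two countries, so PID alone does not identify a person.
records = [("IE", 101), ("PT", 101), ("IE", 102)]


def person_key(country, pid):
    """Combine country and PID into one identifier unique across countries."""
    return f"{country}-{pid}"


pids_only = {pid for _, pid in records}
full_keys = {person_key(country, pid) for country, pid in records}

print(len(pids_only))  # distinct PIDs
print(len(full_keys))  # distinct persons
```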

3.3 Register File
This file covers all persons currently living in households with a completed household interview. There is one register file for each wave of data. Principally the information available for each person is the household and personal identifier, the weights, year and month of birth, age and gender.

3.4 Relationship File
This file records the relationship between each pair of persons in the same household. There is a separate relationship file for each wave. Its records have the format "person X has relationship R with person Y". The following rule is used in specifying the variables corresponding to X and Y:
If the relationship is between an ascendant and a descendant (such as parent and child), 'R' (variable 'Relation') always specifies the descendant side of the relationship (e.g. the child, grandchild etc.). Variable PID1 is the fixed identification number (PID) of the ascendant, and variable PID2 is the fixed identification number (PID) of the descendant.
In the relationship file individuals are identified in terms of their fixed PIDs, so that the consistency and evolution of relationships can be traced over waves.
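The ascendant/descendant rule can be sketched as follows. The records and the relationship labels ("child", "spouse") are placeholders standing in for the real codes of the variable 'Relation', which are not reproduced in this guide; only the roles of PID1 (ascendant) and PID2 (descendant) are taken from the rule above.

```python
# Hypothetical relationship-file records for one wave and one country.
# The RELATION values are placeholder labels, not the real UDB codes.
relationships = [
    {"PID1": 101, "PID2": 103, "RELATION": "child"},
    {"PID1": 102, "PID2": 103, "RELATION": "child"},
    {"PID1": 101, "PID2": 102, "RELATION": "spouse"},
]


def children_of(pid, records):
    """PIDs recorded as children of `pid` (pid is the ascendant, PID1)."""
    return sorted(r["PID2"] for r in records
                  if r["PID1"] == pid and r["RELATION"] == "child")


def parents_of(pid, records):
    """PIDs recorded as parents of `pid` (pid is the descendant, PID2)."""
    return sorted(r["PID1"] for r in records
                  if r["PID2"] == pid and r["RELATION"] == "child")


print(children_of(101, relationships))
print(parents_of(103, relationships))
```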

3.5 Household File
This file contains one record for each household with a completed household interview.
The data in the household file is grouped into 7 sections as follows:

HG: General information
HD: Demographic information
HI: Household income
HF: Household financial situation
HA: Accommodation
HB: Durables
HL: Children

3.6 Personal File
This file contains one record for each adult with a completed personal interview. The information is grouped into 13 sections as follows:

PG: General information
PD: Demographic information
PE: Employment
PU: Unemployment
PS: Search for a job
PJ: Previous job
PC: Calendar of activities
PI: Income
PT: Training and Education
PH: Health
PR: Social relations
PM: Migration
PK: Satisfaction with various aspects of life

3.7 Variables Used for Matching Files
The following diagram shows how the UDB files are linked in terms of the identifiers or ID numbers. It shows, for example, that in order to 'attach' information on the household file to a particular individual, you would need to match the household file to the personal file using Wave, Country and HID.

Figure 2: Examples of Identifiers Used in Linking Files
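The household-to-person match described above can be sketched as below, using Wave, Country and HID as the matching key. The record layout and the field names other than HID and PID (for example HH_INCOME) are assumptions for the example; a statistical package would normally do this with its own merge facility.

```python
# Hypothetical records; field names other than HID and PID are assumptions.
household_file = [
    {"WAVE": 1, "COUNTRY": "IE", "HID": 10, "HH_INCOME": 20000},
    {"WAVE": 1, "COUNTRY": "PT", "HID": 10, "HH_INCOME": 15000},
]
personal_file = [
    {"WAVE": 1, "COUNTRY": "IE", "HID": 10, "PID": 101},
    {"WAVE": 1, "COUNTRY": "IE", "HID": 10, "PID": 102},
    {"WAVE": 1, "COUNTRY": "PT", "HID": 10, "PID": 101},
]

KEY = ("WAVE", "COUNTRY", "HID")


def attach_household(persons, households):
    """Add the matching household's fields to each personal record."""
    index = {tuple(h[k] for k in KEY): h for h in households}
    merged = []
    for person in persons:
        household = index.get(tuple(person[k] for k in KEY), {})
        merged.append({**household, **person})
    return merged


merged = attach_household(personal_file, household_file)
print(merged[2]["HH_INCOME"])
```

Note that the two households share the same HID value but belong to different countries, which is why the country code has to be part of the key.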

3.8 Number of observations in each file and wave
The following table gives an indication of the number of cases available for cross-sectional analysis in each wave. It shows the number of cases in each of the UDB data files.

Table 3: Number of observations in each file and wave

	Link File	Household	Register	Personal
Wave 1	N.A.	71,367	198,070	149,306
Wave 2	N.A.	73,715	204,060	156,063
Wave 3	N.A.	74,746	205,432	157,536
Wave 4	N.A.	68,788	186,987	143,935
Wave 5	N.A.	66,097	177,434	136,238
Wave 6	N.A.	64,285	171,093	131,372
Wave 7	N.A.	61,330	161,735	124,937
Wave 8	N.A.	59,852	156,606	121,122
All waves	277,240	N.A.	N.A.	N.A.

Note: Data files for Waves 1-3 include the data from national sources for Germany and the UK, as well as the data from the original ECHP.

3.9 Number of cases available for Longitudinal Analysis
The following table gives an indication of the number of cases available for longitudinal analysis. It shows the number of cases for which information is available in all of the indicated waves¹. It is clear that sample attrition has reduced the number of cases available for longitudinal analysis. [Link to Doc Pan on Attrition].

¹Note that the figures include both the ECHP and national samples for Germany and the UK (1994-96) and Luxembourg (1995-96).

Table 4: Number of cases available for selected longitudinal analyses

	Number Persons (all ages)	Number of Personal Interviews
Wave 1-2	179,464	132,220
Wave 2-3	187,573	139,594
Wave 1,2,3,4,5,6,7	99,516	70,966
Wave 1,2,3,4,5,6,7,8	92,350	65,622
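Counts of this kind can be obtained by keeping only the persons who have a record in every wave of interest. The sketch below does this with set intersection on hypothetical (country, PID) keys; the data are invented and the function name is illustrative.

```python
# Hypothetical presence data: for each wave, the set of (country, PID) keys
# with a record in that wave's file.
present = {
    1: {("IE", 101), ("IE", 102), ("PT", 101)},
    2: {("IE", 101), ("PT", 101), ("PT", 205)},
    3: {("IE", 101), ("PT", 205)},
}


def balanced_panel(present_by_wave, waves):
    """Return the keys that appear in every one of the given waves."""
    sets = [present_by_wave[w] for w in waves]
    return set.intersection(*sets)


waves_1_2 = balanced_panel(present, [1, 2])
waves_1_3 = balanced_panel(present, [1, 2, 3])
print(len(waves_1_2), len(waves_1_3))
```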

4. Weights
In the ECHP UDB files, weights are available for households and persons. These weights are calculated taking into account the sample design and characteristics of persons and households. The weights are calibrated to reflect the structure of the population.
The purpose of this section is to describe briefly the various weights and their appropriate use. For a detailed description of the weighting procedures that have been implemented for calculating weights in the ECHP, see PAN 165.

Table 5 describes the weights provided for each wave and file. There are two types of weights: the base weight (at individual level only), which is used for longitudinal analysis, and the cross-sectional weight (at both individual and household level), which is used for cross-sectional analysis.

The base weight is available only for 'sample persons'. In Wave 1 all persons in interviewed households are considered as 'sample persons' - eligible to be followed from one wave of the panel to the next. In the following waves, new entrants to existing households are defined as 'non-sample persons' to distinguish them from those present in the first wave. These new household members have a zero base weight, but a non-zero cross-sectional weight.

Table 5: Weights available in the ECHP UDB

Name of the weight variable

File type	Base weight	Cross-sectional weight
Register	RG003	RG002
Personal	PG003	PG002
Household	-	HG004

1 Under ECHP tracing rules, if a household does not contain any 'sample persons' (for example, they moved out or died), the household is dropped from the panel.

Register file
Two sets of weights are in the dataset: the base weight and the cross-sectional weight. Children, as well as adults, have a cross-sectional weight and a base weight on the Register file.
Base weight (RG003)
In Wave 1, all 'sample persons' (including children) receive a non-zero base weight and all persons in the same household share the same weight.
From Wave 2 onwards the base weights are computed on the basis of the Wave 1 base weights, modified to take into account attrition between the waves and calibration of the achieved sample to external control distributions by basic personal and household characteristics. New members (joining the household after Wave 1) do not have a base weight assigned.
Cross-sectional weight (RG002)
In Wave 1 the cross-sectional weights are identical to the base weights and are equal for all household members. From Wave 2 onwards all household members have the same cross-sectional weight, which is computed as the average of the base weights of all household members.

Personal file
As for the register file, base weights and cross-sectional weights are available in the dataset.
Base weight (PG003)
All 'sample persons' who complete a personal interview (and have a record in the Personal File) receive a non-zero base weight and 'non-sample persons' receive a zero base weight. The personal file base weight is derived from the register file base weight and is adjusted to take account of variations in response rates on the Personal Questionnaire by age, gender and other personal characteristics. Therefore, unlike the register file base weight, the personal file base weight will differ between household members.
From Wave 2 onwards the same applies, that is, 'sample persons' receive a non-zero base weight and 'non-sample persons' receive a zero base weight.
Cross-sectional weight (PG002)
In Wave 1 the cross-sectional weight is identical to the base weight.
From Wave 2 onwards all household members have the same cross-sectional weight, which is computed as the average of the base weights of the interviewed household members.

Household file
In the household file across all waves there is only one weight: the cross-sectional weight. By definition, the base weight cannot apply to households, as households are not stable in composition across time.

Population scaling
The weights in the household and personal file (as well as the register file) have been rescaled so that the mean of the weight within country is equal to one. For some analyses one might wish to scale results to population size, and this can be done with a grossing factor. This grossing factor is constructed as the ratio of two components, N/n. The numerator (N) is the size of the population analysed, that is, the country's total population (persons for the register file, households for the household file) or the population aged 16 and over (for the personal file). The denominator (n) is the sample size for the population analysed, that is, the number of cases in the register, household or personal file, as appropriate. Information on total population sizes (persons, households, persons aged 16 and over) can be found in the country file for each wave.
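A minimal sketch of the grossing-up calculation, assuming invented figures for one country and wave: each rescaled weight is multiplied by N/n, so that the weights sum to the population total and weighted counts become population estimates. The variable names and values are hypothetical.

```python
def grossing_factor(population_size, sample_size):
    """Ratio N/n as described above."""
    return population_size / sample_size


# Hypothetical figures for one country and wave: N would come from the
# country file, n is the number of cases in the file being analysed.
N = 1_200_000
weights = [0.8, 1.2, 1.0, 1.0]          # rescaled weights, mean equal to one
in_group = [True, False, True, False]   # e.g. an indicator of interest

factor = grossing_factor(N, len(weights))
grossed_weights = [w * factor for w in weights]
grossed_total = sum(w for w, flag in zip(grossed_weights, in_group) if flag)

print(factor)
print(round(grossed_total))
```

Because the rescaled weights average one within country, the grossed-up weights in this example sum to N.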

Longitudinal analysis, cross-sectional analysis and appropriate weight
Longitudinal analysis is, by definition, concerned only with persons who are present in each of a number of consecutive waves, and in principle this calls for a weight specific to that set of waves. Eurostat does not supply such a longitudinal weight in the ECHP dataset for each subset of waves that might be analysed. However, the base weight (RG003 or PG003) is available for 'sample persons' who were present in the panel in earlier waves. The appropriate weight to use for longitudinal analysis, then, is the base weight of the last wave analysed. So, for instance, to analyse employment information from the personal questionnaires for 1994-1999, PG003 from 1999 would be used.
On the other hand, if the analyst is interested in using the data for cross-sectional purposes, the cross-sectional weight should be used. This weight is available for all persons present (and personally interviewed, in the case of PG002) in that wave: both sample persons and non-sample persons.
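The choice of weight for a longitudinal analysis might be sketched as follows, taking the 1994-1999 example: records are weighted by the base weight (PG003) of the last wave, and persons with a zero base weight (non-sample persons) drop out. The employment indicators and all values are invented for illustration.

```python
# Hypothetical personal-file extract for the last wave analysed (1999).
# PG003 is the base weight; the EMPLOYED_* indicators are invented names.
last_wave = [
    {"PID": 101, "PG003": 1.2, "EMPLOYED_1994": 1, "EMPLOYED_1999": 1},
    {"PID": 102, "PG003": 0.8, "EMPLOYED_1994": 1, "EMPLOYED_1999": 0},
    # Non-sample person: joined after Wave 1, so zero base weight.
    {"PID": 103, "PG003": 0.0, "EMPLOYED_1994": None, "EMPLOYED_1999": 1},
]


def weighted_share(records, weight, condition):
    """Weighted share of records meeting `condition`, ignoring zero weights."""
    eligible = [r for r in records if r[weight] > 0]
    total = sum(r[weight] for r in eligible)
    hits = sum(r[weight] for r in eligible if condition(r))
    return hits / total


stayed_employed = weighted_share(
    last_wave,
    "PG003",
    lambda r: r["EMPLOYED_1994"] == 1 and r["EMPLOYED_1999"] == 1,
)
print(round(stayed_employed, 2))
```

For a cross-sectional estimate from the same wave, the same function could be called with the cross-sectional weight (PG002) instead, in which case non-sample persons would contribute.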

5. After the ECHP: EU-SILC
As mentioned earlier, the ECHP ran from 1994 to 2001. From 2004 onwards a new instrument called EU-SILC (Statistics on Income and Living Conditions) will replace it as the central source of micro-data on household incomes and social exclusion in the EU. EU-SILC will be organised under a framework Regulation adopted in 2003 by both the EU Council of Ministers and the European Parliament. Unlike the ECHP, it will therefore be compulsory for all Member States. Among the EU-15 Member States, six launched EU-SILC in 2003 on the basis of informal agreements signed with the Commission (Belgium, Denmark, Greece, Ireland, Luxembourg and Austria). Apart from Germany, the Netherlands and the UK, the other Member States will start in 2004 as foreseen in the Regulation; Germany, the Netherlands and the UK will start in 2005. Depending on the country, acceding and candidate countries will launch EU-SILC between 2004 and 2007.
The priority with EU-SILC is the provision of high-quality, timely cross-sectional information on household incomes and social exclusion. The emphasis is on output harmonisation rather than input harmonisation: the data may come from different sources in different EU member states, and countries with highly developed population registers will be encouraged to use these sources. Notwithstanding the efforts at output harmonisation (which are formalised in a series of Commission Regulations to be approved by the Directors-General of the national statistical institutes), the international comparability of data from EU-SILC will inevitably be diminished compared to the ECHP. EU-SILC is also more limited in content than the ECHP. The main differences between the ECHP and EU-SILC are:

Whereas the ECHP was a full panel (with all sample persons from wave 1 followed for the life of the panel) EU-SILC will allow for a rotational design in which an individual is followed for four years at most. Countries seeking to conduct a full panel will, however, be allowed to do so.
The ECHP was based on the use of harmonised questionnaires in all the participating member states (at least in the first three waves), but SILC allows key data on individuals to be drawn from registers or other sources where these are available in a country. This is done to ensure that each country can use what it considers to be its 'best source(s)' for income data, but reduces the degree to which the methodology is harmonised across countries.
Much of the detailed information on labour market situation and on non-monetary indicators of exclusion has been dropped from SILC.
EU-SILC allows certain income components to be provided only at the household level: family allowances (including Child Benefit and Lone Parent Allowance), property income (interest, dividends, rent), housing allowances (such as rent and mortgage interest supplements) and social assistance payments. In contrast, apart from social assistance and housing benefits, the ECHP recorded these at the level of the individual recipient. This means that SILC will lack the kind of data needed for tax-benefit modelling at the level of the individual or tax unit.

6. Further Information
As noted earlier, Eurostat has produced a large number of detailed documents on the ECHP, ranging from the 'blueprint' ECHP questionnaires, through documents dealing with methodological issues, to the agenda and minutes of ECHP meetings. These are available on the CIRCA website [Link to Circa website] and a list of the documents is provided in Annex A.

Apart from this User Guide, the Euro-Panel User Network (EPUNet) has a number of other resources that are likely to be of interest to new and advanced users of the ECHP. [Link to EPUNet website]. The resources include the following:

Set-up programmes for converting the comma-separated values (CSV) files issued by Eurostat into formats for use by SPSS [Link to SPSS set-up programmes], SAS [link to SAS set-up programmes] and Stata [Link to STATA set-up programmes].
A register of "Queries and Solutions" based on the experience of ECHP Users and covering problems and issues arising in the course of work on the ECHP and, where available, the solutions that have been proposed.
An e-mail hotline for queries not covered in the Register of Queries and Solutions.
Research Familiarisation Sessions for new and advanced users of the ECHP [Link to web page on Research Familiarisation Sessions].
Study Visits, in which researchers from institutions that do not already have access to the ECHP can come for a period of several weeks and use the ECHP data for a specific research project.
A database of programs for computing derived variables which have been used in previous research using the ECHP. [Link to database of programs for computing derived variables].
A database of research and publications based on the ECHP [Link to database web page].
An annual conference focusing on comparative research using the ECHP [Link to conference web page].

We hope that you have found this guide helpful. We would be glad to receive comments. We also hope that you will consider contributing your own queries, solutions, programs for computing derived variables and research papers to EPUNet. Contact for contributions. This is a network of ECHP users for ECHP users and its success depends on your participation.