1. Introduction
Purpose of this guide
The purpose of this basic user guide is to provide a straightforward presentation of the
European Community Household Panel (ECHP) dataset for new users and to provide a reference
to more detailed information for experienced users. The guide is intended to give the new
user sufficient information to begin working with the ECHP data. For more experienced
users, the guide provides, in a compact format, references to the appropriate Eurostat
documentation dealing with key data issues.
Overview of the ECHP
The ECHP is a harmonised cross-national longitudinal survey focusing on household income
and living conditions. It also includes items on health, education, housing, migration,
demographics and employment characteristics.
The survey ran from 1994 to 2001. In the first wave (1994) a sample of some 60,500
households, i.e. approximately 130,000 adults aged 16 years and over, was interviewed
across 12 Member States (Belgium, Denmark, Germany, Greece, Spain, France, Italy, Ireland,
Luxembourg, the Netherlands, Portugal and the United Kingdom). Austria joined the ECHP in
Wave 2 (1995) and Finland in Wave 3 (1996). From Wave 4 (1997) onwards Sweden provided
cross-sectional data in the UDB format, derived from its National Survey on Living
Conditions.
For most of the countries the surveys were carried out using the harmonised ECHP
questionnaire. For some countries the institutes in charge of producing the ECHP
converted data from national surveys into the ECHP format to replace the ECHP from 1997
onwards. In Germany and the United Kingdom, the derived national data was provided from
1994 to 2001. The details of these variations across countries and national surveys are
given in Table 1. Care is needed in analysing the converted data for these countries:
some information may not have been collected in the national surveys and will therefore
appear as missing in the ECHP. In other cases, variables that were not collected in the
national survey were imputed based on similar variables.
Table 1: Nature of ECHP data by country and year (harmonised original ECHP data and data derived from existing national sources)
Countries | Full ECHP Data Format | ECHP Data Format Derived from National Surveys
Belgium*, Denmark, France, Greece, Ireland, Italy, the Netherlands*, Spain, Portugal | 1994-2001 | -
Austria | 1995-2001 | -
Finland | 1996-2001 | -
Germany | 1994-1996 | 1994-2001 (SOEP)
Luxembourg | 1994-1996 | 1997-2001 (PSELL)
United Kingdom | 1994-1996 | 1994-2001 (BHPS)
Sweden | - | 1997-2001 (SLCS; cross-sectional data only)
* The ECHP data for Belgium and the Netherlands come from a modification of existing
national panels to meet the ECHP requirements. These are listed in the first column above
because this system was in place from the beginning of the ECHP and national
questionnaires were substantially modified to meet the ECHP requirements.
From the Production Data Base to the User Data Base
Within each country represented in the ECHP, the surveys were carried out by
"National Data Collection Units" (NDUs), which are either National
Statistical Institutes or research centres. The results of the interviews were then
transmitted to Eurostat in a format very close to that of the questionnaire. These datasets
were checked and formatted by Eurostat as the 'Production Data Base' (PDB). The PDB, which
is available only to the NDUs, is then used by Eurostat for weighting,
imputation and construction of the 'User Data Base' (UDB). The UDB is the standardised,
anonymised and more user-friendly version of the ECHP data, made available to
researchers under an ECHP research contract signed with Eurostat.
Characteristics of the ECHP dataset
Three central features of the ECHP make this dataset a valuable source of information for
researchers.
The first feature is the multidimensional character of the topics covered. The ECHP
provides microdata on a wide range of topics at the level of individual and household:
income, social life, housing condition, health, education, employment, training, and so
on.
The second feature is the cross-national comparability of the data. The ECHP (apart
from those countries using data derived from national sources, as noted above) is a
harmonised dataset that is comparable across countries. This has been achieved through the
implementation of common procedures at all stages, including the design of a harmonised
questionnaire, harmonised definitions and common sampling requirements.
The third and final main feature of the ECHP is its longitudinal nature. Individuals
who were members of a household in the first wave ('sample persons') are followed over
time, allowing researchers to examine how their circumstances change. As such,
the ECHP provides information on relationships and transitions over time at the micro
level.
Data access
The ECHP dataset is available through one of the Eurostat Datashops. Contact details for
the Datashops and further details on the requirements for access to the data can be found
on the following web site: http://forum.europa.eu.int/irc/dsis/echpanel/info/data/information.html.
Set-up programs
Eurostat distributes the dataset in two ways. The first is a CD-ROM
that includes the documentation of the ECHP UDB as well as a self-extracting file to
produce the dataset. The second is by downloading the files over the
Internet.
In both cases Eurostat supplies the dataset files in "comma separated
values" (CSV) format. A file in CSV format is a plain text file in which all values
(text, numbers...) are separated by commas and, in the case of string (alpha) variables,
enclosed in double quotes. Because they are not application-specific, CSV files can be
opened on any computer system and by most applications.
However, the files have to be converted into the package's own format before the data can
be analysed using statistical packages such as SAS, STATA or SPSS. Some programs are
available online for this purpose [Reference to EPUNet website where SPSS and Eurostat's
SAS programmes will be available].
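As a minimal illustration of the CSV layout described above, the following Python sketch parses a small in-memory sample with the standard `csv` module. The column names (COUNTRY, HID, REGION), the values and the presence of a header row are assumptions made for the example; they are not taken from the UDB files.

```python
import csv
import io

# Made-up sample in the layout described above: comma-separated values,
# with string (alpha) variables enclosed in double quotes.
# Assumption: the first line is a header row naming the columns.
SAMPLE = '"COUNTRY","HID","REGION"\n"IE",1001,"Border, Midland"\n"PT",2002,"Norte"\n'


def read_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


rows = read_csv_text(SAMPLE)
for row in rows:
    # The csv module returns every field as a string, so numeric
    # fields have to be converted explicitly.
    row["HID"] = int(row["HID"])
```

Note that a comma inside a quoted string value is kept as part of the value rather than treated as a separator, which is why the quoting of string variables matters.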
2. Documentation
Eurostat has prepared a number of documents providing details about the content and
character of the ECHP data files, and background on survey methodology. Users who obtain
the data on CD-ROM receive the key documents on the CD itself.
They are also available on the CIRCA library web page dedicated to the ECHP: http://forum.europa.eu.int/Public/irc/dsis/echpanel/library.
Because of the large number of these documents we can't fully describe all of them here
but Figure 1 outlines the structure of the CIRCA website in order to facilitate a search
for a specific document. In addition, Annex A provides a comprehensive list of all
documents available on this website following the same structure as on the CIRCA web site.
Figure 1: Organisation of the CIRCA web site on ECHP documentation [needs to be updated]
Table 2 outlines the content of some of the key documents likely to be of use to ECHP
analysts.
Table 2: Key ECHP documentation
Document Name | Description of the content
PAN 15 | The variable list and codebook for the household and personal register records as well as for the household and personal questionnaire records for Wave 1 (1994).
PAN 30 | As above but for Wave 2 (1995)
PAN 65 | As above but for Wave 3 (1996)
PAN 81 | As above but for Wave 4 (1997)
PAN 97 | As above but for Wave 5 (1998)
PAN 112 | As above but for Wave 6 (1999)
PAN 151 | As above but for Wave 7 (2000)
PAN 159 | As above but for Wave 8 (2001)
PAN 165 | Describes the weighting procedure used to calculate individual and household weights; it does not describe how these weights should be used.
PAN 166 | Describes the variables (data dictionary, code book, and differences between countries and waves). The document is divided between the Household file and the Personal file, themselves divided into categories such as General Information, Demographic, Income and so on.
PAN 167 | Describes the conversion of the variables from the PDB questions to the UDB variable format for the household file and personal file, as well as the fixed and wave-specific link variables (household and personal).
PAN 168 | Presents the ECHP history and describes the dataset (the various files), the contractual arrangements, the related documents (Doc PAN 164, PAN 165 etc.) and the contacts for the National Data Collection Units and Eurostat.
3. Structure of the ECHP User Database
This section provides an outline of the content of the ECHP User Database (UDB) files, as
covered in more detail in the document PAN 168. The UDB includes several files that are
wave-specific (a "register file", a "relationship file", a
"household file" and a "personal file") and two files that cover
all the waves (the "country file" and the "longitudinal link file").
Annex B provides a list of the variables included in each file.
3.1 Country File
This file contains the following information for each wave and country:
- Population figures (number of private households in the country, number of
persons living in private households, number of persons aged at least sixteen and living
in private households) which can be used to rescale the weight to gross-up the results to
population figures.
- Purchasing power parities (PPP) and Purchasing Power Standards (PPS): PPPs
are fictitious currency exchange rates that eliminate the impact of price level
differences; 1 PPS will thus buy a comparable basket of goods and services in each country
(they are scaled at EU level, which is why PPS can be thought of as the Euro in real
terms).
- Exchange rate figures to convert national currencies to ECU/EURO. The
country file also contains the fixed exchange rates for the 'Eurozone' countries (after
01.01.1999).
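The conversions that the country file supports can be sketched as follows. This is a minimal illustration only: it assumes that both the PPP and the exchange rate are expressed as national currency units per 1 PPS and per 1 ECU/euro respectively, so the conversion is a division. The function names and all figures are invented for the example; check the country file documentation for the actual definitions before using real values.

```python
def to_pps(amount_national, ppp):
    """Convert a national-currency amount to PPS.

    Assumption: ppp is expressed as national currency units per 1 PPS.
    """
    if ppp <= 0:
        raise ValueError("ppp must be positive")
    return amount_national / ppp


def to_euro(amount_national, exchange_rate):
    """Convert a national-currency amount to ECU/euro.

    Assumption: exchange_rate is national currency units per 1 ECU/euro.
    """
    if exchange_rate <= 0:
        raise ValueError("exchange_rate must be positive")
    return amount_national / exchange_rate


# Invented figures for illustration only, not actual country file values.
income = 1_000_000.0   # annual household income in national currency
ppp = 125.0            # hypothetical PPP
rate = 160.0           # hypothetical exchange rate

income_pps = to_pps(income, ppp)      # 8000.0
income_euro = to_euro(income, rate)   # 6250.0
```

The two results differ because the PPP adjusts for price levels as well as for the currency unit.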
3.2 Longitudinal Link File
This file contains a record for every person who ever appeared in the ECHP. The first
section contains information (gender, year of birth...) that remains constant over
time. The second section, which is repeated for each wave, contains all the information
(household identifier, household size...) required to rebuild the "longitudinal
status" of the person from the beginning to the end of the panel, derived from the
personal and household register files. Each person receives an identification number (PID)
that is fixed across all waves. Note that the PID is unique only within a country: when
several countries are analysed together, the country code must also be used to create a
truly unique identifier. [Link to this point in Register of Queries and Solutions].
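The point about identifiers can be illustrated with a short Python sketch. The records are made up, and COUNTRY and PID are placeholder field names used for the example rather than confirmed UDB variable names.

```python
# Made-up link-file records; COUNTRY and PID are placeholder field names.
records = [
    {"COUNTRY": 1, "PID": 101},
    {"COUNTRY": 2, "PID": 101},   # same PID, different country
    {"COUNTRY": 2, "PID": 102},
]


def unique_person_key(record):
    """Return a (country, PID) tuple that is unique across countries."""
    return (record["COUNTRY"], record["PID"])


# Counting by PID alone merges two different people into one.
pids = {r["PID"] for r in records}
# Counting by (country, PID) keeps all three persons distinct.
keys = {unique_person_key(r) for r in records}
```

Here `pids` contains two values while `keys` contains three, which is the undercount that occurs when pooled countries are matched on PID alone.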
3.3 Register File
This file covers all persons currently living in households with a completed household
interview. There is one register file for each wave of data. Principally the information
available for each person is the household and personal identifier, the weights, year and
month of birth, age and gender.
3.4 Relationship File
This file records the relationship between each pair of persons in the same household.
There is a separate relationship file for each wave. Its records have the format
"person X has relationship R with person Y". The following rule is used in
specifying the variables corresponding to X and Y:
If the relationship is between an ascendant and a descendant (such as parent and child),
'R' (variable 'Relation') always specifies the descendant side of the relationship (e.g.
the child, grandchild etc.). Variable PID1 is the fixed identification number (PID) of the
ascendant, and variable PID2 is the fixed identification number (PID) of the descendant.
In the relationship file individuals are identified in terms of their fixed PIDs, so that
the consistency and evolution of relationships can be traced over waves.
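The ascendant/descendant rule can be sketched in Python as follows. PID1 and PID2 follow the rule stated above; the text labels used for the relation ("child", "spouse") are hypothetical stand-ins for the coded values in the real file, and the records are made up.

```python
from collections import defaultdict

# Made-up relationship records. PID1 and PID2 follow the rule above
# (PID1 = ascendant, PID2 = descendant). The RELATION labels are
# hypothetical stand-ins for the coded values in the real file.
records = [
    {"PID1": 10, "PID2": 30, "RELATION": "child"},
    {"PID1": 20, "PID2": 30, "RELATION": "child"},
    {"PID1": 10, "PID2": 31, "RELATION": "child"},
    {"PID1": 10, "PID2": 20, "RELATION": "spouse"},
]


def children_by_parent(rows):
    """Map each ascendant PID to the sorted PIDs of its children."""
    lookup = defaultdict(list)
    for row in rows:
        if row["RELATION"] == "child":
            lookup[row["PID1"]].append(row["PID2"])
    return {parent: sorted(kids) for parent, kids in lookup.items()}


children = children_by_parent(records)
```

Because the PIDs are fixed across waves, the same lookup built from two different waves can be compared directly to see whether a household's composition has changed.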
3.5 Household File
This file contains one record for each household with a completed household interview.
The data in the household file is grouped into 7 sections as follows:
- HG: General information
- HD: Demographic information
- HI: Household income
- HF: Household financial situation
- HA: Accommodation
- HB: Durables
- HL: Children
3.6 Personal File
This file contains one record for each adult with a completed personal interview. The
information is grouped into 13 sections as follows:
- PG: General information
- PD: Demographic information
- PE: Employment
- PU: Unemployment
- PS: Search for a job
- PJ: Previous job
- PC: Calendar of activities
- PI: Income
- PT: Training and Education
- PH: Health
- PR: Social relations
- PM: Migration
- PK: Satisfaction with various aspects of life
3.7 Variables Used for Matching Files
The following diagram shows how the UDB files are linked in terms of the identifiers or ID
numbers. It shows, for example, that in order to 'attach' information on the household
file to a particular individual, you would need to match the household file to the
personal file using Wave, Country and HID.
Figure 2: Examples of Identifiers Used in Linking Files
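The household-to-person match described above can be sketched in plain Python. WAVE, COUNTRY and HID are the matching keys named in the text; the other field names (PID, TENURE) and all values are placeholders invented for the example.

```python
# Made-up records. WAVE, COUNTRY and HID are the matching keys named above;
# PID and TENURE are placeholder field names.
households = [
    {"WAVE": 1, "COUNTRY": 1, "HID": 500, "TENURE": "owner"},
    {"WAVE": 1, "COUNTRY": 2, "HID": 500, "TENURE": "tenant"},
]
persons = [
    {"WAVE": 1, "COUNTRY": 1, "HID": 500, "PID": 101},
    {"WAVE": 1, "COUNTRY": 1, "HID": 500, "PID": 102},
    {"WAVE": 1, "COUNTRY": 2, "HID": 500, "PID": 101},
]

KEYS = ("WAVE", "COUNTRY", "HID")


def attach_household(person_rows, household_rows, keys=KEYS):
    """Left-join household fields onto each person record by the given keys."""
    index = {tuple(h[k] for k in keys): h for h in household_rows}
    merged = []
    for p in person_rows:
        household = index.get(tuple(p[k] for k in keys), {})
        # Person fields take precedence if a field name appears in both.
        merged.append({**household, **p})
    return merged


merged = attach_household(persons, households)
```

The sample deliberately reuses HID 500 in two countries: matching on HID alone would attach the wrong household to the third person, which is why the country code is part of the key.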
3.8 Number of observations in each file and wave
The following table gives an indication of the number of cases available for
cross-sectional analysis in each wave. It shows the number of cases in each of the UDB
data files.
Table 3: Number of observations in each file and wave
Wave | Link File | Household | Register | Personal
Wave 1 | N.A. | 71,367 | 198,070 | 149,306
Wave 2 | N.A. | 73,715 | 204,060 | 156,063
Wave 3 | N.A. | 74,746 | 205,432 | 157,536
Wave 4 | N.A. | 68,788 | 186,987 | 143,935
Wave 5 | N.A. | 66,097 | 177,434 | 136,238
Wave 6 | N.A. | 64,285 | 171,093 | 131,372
Wave 7 | N.A. | 61,330 | 161,735 | 124,937
Wave 8 | N.A. | 59,852 | 156,606 | 121,122
All waves | 277,240 | N.A. | N.A. | N.A.
Note: Data files for Waves 1-3 include the data from national sources for Germany and
the UK, as well as the data from the original ECHP.
3.9 Number of cases available for Longitudinal Analysis
The following table gives an indication of the number of cases available for longitudinal
analysis. It shows the number of cases for which information is available in all of the
indicated waves (see note 1). It is clear that sample attrition has reduced the number of
cases available for longitudinal analysis. [Link to Doc Pan on
Attrition].
1 Note that the figures include both the ECHP and national samples for
Germany and the UK (1994-96) and Luxembourg (1995-96).
Table 4: Number of cases available for selected longitudinal analyses
Waves | Number of Persons (all ages) | Number of Personal Interviews
Wave 1-2 | 179,464 | 132,220
Wave 2-3 | 187,573 | 139,594
Wave 1,2,3,4,5,6,7 | 99,516 | 70,966
Wave 1,2,3,4,5,6,7,8 | 92,350 | 65,622
4. Weights
In the ECHP UDB files, weights are available for households and persons. These weights are
calculated taking into account the sample design and characteristics of persons and
households. The weights are calibrated to reflect the structure of the population.
The purpose of this section is to describe briefly the various weights and their
appropriate use. For a detailed description of the weighting procedures that have been
implemented for calculating weights in the ECHP, see PAN 165.
Table 5 describes the weights provided for each wave and file. There are two types of
weights: the base weight (at individual level only), which is used for longitudinal
analysis, and the cross-sectional weight (at both individual and household level), for use
in cross-sectional analyses.
The base weight is available only for 'sample persons'. In Wave 1 all persons in
interviewed households are considered 'sample persons', eligible to be followed from
one wave of the panel to the next. In the following waves, new entrants to existing
households are defined as 'non-sample persons' to distinguish them from those present in
the first wave. These new household members have a zero base weight, but a non-zero
cross-sectional weight.
Table 5: Weights available in the ECHP UDB
File type | Base weight variable | Cross-sectional weight variable
Register | RG003 | RG002
Personal | PG003 | PG002
Household | - | HG004
1 Under ECHP tracing rules, if a household does not contain any 'sample persons' (for
example, they moved out or died), the household is dropped from the panel.
Register file
Two sets of weights are in the dataset, the base weight and the cross sectional weight.
Children, as well as adults, have a cross-sectional weight and a base weight on the
Register file.
Base weight (RG003)
In Wave 1, all 'sample persons' (including children) receive a non-zero base weight
and all persons in the same household share the same weight.
From Wave 2 onwards the base weights are computed on the basis of the Wave 1 base
weights, modified to take into account attrition between the waves and calibration of the
achieved sample to external control distributions by basic personal and household
characteristics. New members (joining the household after Wave 1) do not have a base
weight assigned.
Cross-sectional weight (RG002)
In Wave 1 the cross-sectional weights are identical to the base weights and are equal for
all household members. From Wave 2 onwards all members of a household have the same
cross-sectional weight, which is computed as the average of the base weights of all
household members.
Personal file
As for the register file, base weights and cross sectional weights are available in the
dataset.
Base weight (PG003)
All 'sample persons' who complete a personal interview (and have a record in the Personal
File) receive a non-zero base weight and 'non-sample persons' receive a zero base weight.
The personal file base weight is derived from the register file base weight and is
adjusted to take account of variations in response rates on the Personal Questionnaire by
age, gender and other personal characteristics. Therefore, unlike the register file base
weight, the personal file base weight will differ between household members.
From Wave 2 onwards the same applies, that is, 'sample persons' receive a non-zero
base weight and 'non-sample persons' receive a zero base weight.
Cross-sectional weight (PG002)
In Wave 1 the cross-sectional weight is identical to the base weight.
From Wave 2 onwards all members of a household have the same cross-sectional weight, which
is computed as the average of the base weights of the interviewed household members.
Household file
In the household file there is only one weight across all waves: the cross-sectional
weight. By definition, the base weight cannot apply to households, as households are not
stable in composition across time.
Population scaling
The weights in the household and personal files (as well as the register file) have been
rescaled so that the mean of the weight within each country is equal to one. For some
analyses it may be desirable to scale results to population size, which can be done with a
grossing factor. This grossing factor is the ratio N/n.
The numerator (N) is the size of the population analysed: the country's total population
(persons for the register file, households for the household file) or the population aged
16 and over (for the personal file). The denominator (n) is the sample size for that
population, that is, the number of cases in the register, household or personal
file, as appropriate. Information on total population sizes (persons, households, persons
aged 16 and over) can be found in the country file for each wave.
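The grossing factor described above can be sketched in Python as follows. The function names, the population figure, the weights and the indicator variable are all invented for the example; in practice N would come from the country file and the weights from the relevant UDB file (for households, HG004).

```python
def grossing_factor(population_size, sample_size):
    """Return N/n: population size divided by the number of sample cases."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return population_size / sample_size


def grossed_total(values, weights, factor):
    """Weighted sum of values, scaled up to population level."""
    return factor * sum(v * w for v, w in zip(values, weights))


# Invented figures for illustration only, not actual ECHP values.
N = 1_200_000                      # private households in the country
weights = [0.5, 1.0, 1.5, 1.0]     # rescaled weights with a mean of one
owns_car = [1, 0, 1, 1]            # hypothetical 0/1 indicator per household

factor = grossing_factor(N, len(weights))             # 300000.0
estimate = grossed_total(owns_car, weights, factor)   # 900000.0
```

Because the weights have a mean of one within the country, the grossed-up weights sum to N, so a weighted count of all cases reproduces the population total.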
Longitudinal analysis, cross-sectional analysis and appropriate weight
Longitudinal analysis is, by definition, concerned only with persons present in a
number of consecutive waves, and this requires a specific weight. Eurostat does
not supply a longitudinal weight for each subset of waves that
might be analysed. However, the base weight (RG003 or PG003) is available for 'sample
persons' who were present in the panel in earlier waves. The appropriate weight for
longitudinal analysis is therefore the base weight of the last wave analysed. For
instance, to analyse employment information from the personal questionnaires for
1994-1999, PG003 from 1999 would be used.
On the other hand, if the analyst is interested in using the data for cross-sectional
purposes, the cross-sectional weight should be used. This weight is available for all
persons present (and personally interviewed, in the case of PG002) in that wave: both
sample persons and non-sample persons.
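The rule for longitudinal analysis (use the base weight of the last wave analysed) can be sketched in Python as follows. The records are made up, BASE_WEIGHT stands in for PG003, and a single country is assumed so that PID alone identifies a person.

```python
# Made-up person-wave records for a single country; BASE_WEIGHT stands in
# for the personal file base weight (PG003).
records = [
    {"PID": 1, "WAVE": 1, "BASE_WEIGHT": 1.0},
    {"PID": 1, "WAVE": 2, "BASE_WEIGHT": 1.2},
    {"PID": 1, "WAVE": 3, "BASE_WEIGHT": 1.5},
    {"PID": 2, "WAVE": 1, "BASE_WEIGHT": 0.8},
    {"PID": 2, "WAVE": 2, "BASE_WEIGHT": 0.9},   # not interviewed in wave 3
    {"PID": 3, "WAVE": 2, "BASE_WEIGHT": 0.0},   # non-sample person
    {"PID": 3, "WAVE": 3, "BASE_WEIGHT": 0.0},
]


def longitudinal_weights(rows, waves):
    """Return {PID: base weight in the last wave} for persons present in
    every requested wave with a non-zero base weight in the last wave."""
    waves = sorted(waves)
    last = waves[-1]
    by_person = {}
    for row in rows:
        by_person.setdefault(row["PID"], {})[row["WAVE"]] = row["BASE_WEIGHT"]
    result = {}
    for pid, weights in by_person.items():
        if all(w in weights for w in waves) and weights[last] > 0:
            result[pid] = weights[last]
    return result


panel = longitudinal_weights(records, waves=[1, 2, 3])
```

For a cross-sectional analysis of a single wave the selection step is not needed: every person in that wave's file, sample or non-sample, is used with the cross-sectional weight.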
5. After the ECHP: EU-SILC
As mentioned earlier, the ECHP ran from 1994 to 2001. From 2004 onwards a new instrument
called EU-SILC (Statistics on Income and Living Conditions) will replace it as the central
source of micro-data on household incomes and social exclusion in the EU. EU-SILC will be
organised under a framework Regulation adopted in 2003 by both the EU Council of Ministers
and the European Parliament. Unlike the ECHP, it will therefore be compulsory for all
Member States. Among the EU-15 Member States, six launched EU-SILC in 2003 on the
basis of informal agreements signed with the Commission (Belgium, Denmark, Greece,
Ireland, Luxembourg and Austria). The other Member States will start in 2004, as foreseen
in the Regulation, except for Germany, the Netherlands and the UK, which will start in
2005. Depending on the country, acceding and candidate countries will launch EU-SILC
between 2004 and 2007.
The priority with EU-SILC is the provision of quality, timely cross-sectional information
on household incomes and social exclusion. The emphasis is on output harmonisation rather
than input harmonisation: the data may come from different sources in different EU member
states, and countries with highly developed population registers will be encouraged to use
these sources. Notwithstanding the efforts at output harmonisation (which
are formalised in a series of Commission Regulations to be approved by the
Directors-General of national statistical institutes), the international comparability of
data from EU-SILC will inevitably be diminished compared to the ECHP. EU-SILC is also more
limited in content than the ECHP. The main differences between the ECHP and EU-SILC are:
- Whereas the ECHP was a full panel (with all sample persons from wave 1
followed for the life of the panel) EU-SILC will allow for a rotational design in which an
individual is followed for four years at most. Countries seeking to conduct a full panel
will, however, be allowed to do so.
- The ECHP was based on the use of harmonised questionnaires in all the
participating member states (at least in the first three waves), but SILC allows key data
on individuals to be drawn from registers or other sources where these are available in a
country. This is done to ensure that each country can use what it considers to be its
'best source(s)' for income data, but reduces the degree to which the methodology is
harmonised across countries.
- Much of the detailed information on labour market situation and on
non-monetary indicators of exclusion has been dropped from SILC.
- EU-SILC allows certain income components to be provided only at the
household level: family allowances (including Child Benefit and Lone Parent Allowance),
property income (interest, dividends, rent), housing allowances (such as rent and mortgage
interest supplements) and social assistance payments. In contrast, apart from social
assistance and housing benefits, the ECHP recorded these at the level of the individual
recipient. This means that SILC will lack the kind of data needed for tax-benefit
modelling at the level of the individual or tax unit.
6. Further Information
As noted earlier, Eurostat has produced a large number of detailed documents on the ECHP,
ranging from the 'blueprint' ECHP questionnaires, through documents dealing with
methodological issues, to the agenda and minutes of ECHP meetings. These are available on
the CIRCA website [Link to Circa website] and a list of the
documents is provided in Annex A.
Apart from this User Guide, the Euro-Panel User Network (EPUNet) has a number of other
resources that are likely to be of interest to new and advanced users of the ECHP. [Link to EPUNet website]. The resources include the following:
- Set-up programmes for use in converting the comma-separated values (CSV) files
issued by Eurostat into formats for use by SPSS [Link to SPSS set-up
programmes], SAS [link to SAS set-up programmes] and
STATA [Link to STATA set-up programmes].
- A register of "Queries and Solutions"
based on the experience of ECHP Users and covering problems and issues arising in the
course of work on the ECHP and, where available, the solutions that have been proposed.
- An e-mail hotline for queries not
covered in the Register of Queries and Solutions.
- Research Familiarisation Sessions for new and advanced users of the ECHP
[Link to web page on Research Familiarisation Sessions].
- Study Visits where researchers from
institutions who do not already have access to the ECHP can come for a period of several
weeks and use the ECHP data for a specific research project.
- A database of programs for computing derived variables which have been used
in previous research using the ECHP. [Link to database of programs
for computing derived variables].
- A database of research and publications based on the ECHP [Link to database web page].
- An annual conference focusing on comparative research using the ECHP [Link to conference web page].
We hope that you have found this guide helpful. We would be glad to receive comments.
We also hope that you will consider contributing your own queries, solutions,
programs for computing derived variables and research papers to EPUNet. Contact
for contributions. This is a network of ECHP users for
ECHP users and its success depends on your participation.