Non-sampling errors
Non-sampling errors generally result from human errors such as inattention,
misunderstanding or misinterpretation. Errors that occur randomly tend to
offset one another over a large number of observations, so their impact
is minimal. Errors that occur systematically can, on the other hand, have
a major impact on the reliability of estimates. Considerable time and effort
are invested in reducing non-sampling errors in SLID and SCF.
Non-sampling errors may arise from a variety of sources such as coverage,
response, non-response and processing errors.
Coverage error arises when sampling frame units do not exactly represent
the target population. Units may have been omitted from the sampling
frame (undercoverage), units not in the target population may have
been included (overcoverage), or units may have been included more
than once (duplicates). Undercoverage is the most common coverage
problem.
Slippage is a measure of survey coverage error. It is defined as the
percentage difference between control totals (Census population projections)
and weighted sample counts. Slippage rates for household surveys are
generally positive because some people who should be enumerated are
missed. Slippage rates have been revised back to 1996 using the 1996
Census population projections. According to Table E below, the slippage
rate in 2001 was 13.40%, meaning that SLID covered 86.6% of its target
population.
SLID estimation procedures use Census population projections to compensate
for this slippage.
Rates by sex, province and age group are also available upon request.
Table E: Slippage rates in SLID
| | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 |
| Canada (%) | 10.28 | 11.12 | 11.85 | 12.02 | 12.64 | 13.40 |
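The slippage calculation can be illustrated with a short sketch. It assumes that the percentage difference is taken relative to the control total, which is consistent with a 13.40% slippage rate corresponding to 86.6% coverage. The function names and the population figures are hypothetical and chosen only to reproduce the published 2001 rate.

```python
def slippage_rate(control_total: float, weighted_count: float) -> float:
    """Percentage shortfall of the weighted sample count relative to the
    control total (assumed base of the percentage)."""
    return 100.0 * (control_total - weighted_count) / control_total


def coverage_rate(slippage_pct: float) -> float:
    """Share of the target population covered, given a slippage rate."""
    return 100.0 - slippage_pct


# Hypothetical totals, not actual SLID figures.
control = 1_000_000   # Census population projection (control total)
weighted = 866_000    # weighted sample count before adjustment

rate = slippage_rate(control, weighted)
print(f"slippage: {rate:.2f}%")                  # slippage: 13.40%
print(f"coverage: {coverage_rate(rate):.2f}%")   # coverage: 86.60%
```

A positive rate means the weighted sample falls short of the projection, which is the usual situation described above.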
Response errors may be due to many factors, such as faulty questionnaire
design, interviewers’ or respondents’ misinterpretation
of questions, or respondents’ faulty reporting. Great effort
is invested in SCF and SLID to reduce the occurrence of response error.
Measures undertaken to minimize response errors include the use of
highly skilled, well-trained interviewers and the supervision of interviewers
to detect misinterpretation of instructions or problems with the questionnaire
design. Response error can also be brought about by respondents who,
willingly or not, provide inaccurate responses.
Income data are especially prone to misreporting, as income is a sensitive
issue and includes many items with which respondents are not always
familiar. To obtain more accurate information, income data for the
SCF and SLID are collected after the income tax “season” when
respondents are more familiar with their tax records. Respondents receive
information about the income interview prior to the interviewer’s
telephone call. This gives them time to consult documents and have
information available at the time of the interview. Nevertheless, a
comparison of data produced from the SCF with other sources suggests
that certain income components, such as EI benefits and self-employment
earnings, are under-reported in an income interview. For respondents
who grant Statistics Canada permission to access their tax files (the
majority of respondents), SLID collects income data directly from administrative
files. This procedure reduces misreporting of income in the SLID.
Non-response errors occur to some extent in any survey for reasons
such as household members being on vacation during the interview period
or refusing to supply requested information, despite attempts to obtain
complete response from sampled units. For these individuals, the missing
data are imputed either explicitly by assigning data to each non-respondent
on the basis of a similar respondent record, or implicitly by redistributing
the weight of the non-respondent individual to other responding individuals.
The bias introduced by non-response increases with the differences
between respondent and non-respondent characteristics. Methods employed
to compensate for non-response make use of information available for
both respondents and non-respondents in an attempt to minimize this
bias.
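The implicit approach, redistributing the weight of non-respondents to respondents, can be sketched as follows. This is a minimal illustration, not the surveys' actual procedure: it assumes that records are grouped into weighting classes built from information known for both respondents and non-respondents, and the class labels, field names and weights are hypothetical.

```python
from collections import defaultdict


def redistribute_nonresponse(records):
    """Return adjusted weights, one per record.

    Each record is a dict with keys 'cls' (weighting class), 'weight'
    and 'responded'. Within each class, respondent weights are inflated
    so that they also carry the weight of the non-respondents;
    non-respondents receive a weight of zero.
    """
    total = defaultdict(float)       # weight of all sampled units per class
    responding = defaultdict(float)  # weight of respondents per class
    for r in records:
        total[r["cls"]] += r["weight"]
        if r["responded"]:
            responding[r["cls"]] += r["weight"]

    adjusted = []
    for r in records:
        if r["responded"]:
            factor = total[r["cls"]] / responding[r["cls"]]
            adjusted.append(r["weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


# Hypothetical sample: one non-respondent in class A, none in class B.
sample = [
    {"cls": "A", "weight": 100.0, "responded": True},
    {"cls": "A", "weight": 100.0, "responded": True},
    {"cls": "A", "weight": 100.0, "responded": False},
    {"cls": "B", "weight": 200.0, "responded": True},
    {"cls": "B", "weight": 200.0, "responded": True},
]
print(redistribute_nonresponse(sample))
# [150.0, 150.0, 0.0, 200.0, 200.0]
```

The total weight (700) is preserved in this example. The adjustment removes bias only to the extent that respondents and non-respondents within a class are similar; a class with no respondents at all would lose its weight and would need to be merged with another class.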
Processing errors can occur at various stages in the survey: data
capture, editing, coding, weighting or tabulation. The computer-assisted
collection method used for SLID and SCF reduces the chance of introducing
capture errors because checks for consistency and completeness of the
data are built into the computer application. To minimize coding, weighting
or tabulation errors, diagnostic tests are carried out periodically.
These tests include comparisons of results with other data sources.
Weighting
The estimation of population characteristics from a survey is based on
the premise that each sampled unit represents, in addition to itself,
a certain number of unsampled units in the population. A basic survey
weight is attached to each record to indicate the number of units in
the population that are represented by that unit in the sample. Two
types of adjustment are then applied to the basic survey weights in
order to improve the reliability of the estimates. The basic weights
are first inflated to compensate for non-response. The non-response
adjusted weights are then further adjusted to ensure that estimates
of relevant population characteristics agree with population totals
from sources other than the survey. The population totals used for
SCF and SLID are based on Statistics Canada’s Demography Division
population counts for different province-age-sex groups as well as
counts by household and family size. In SLID, different weights apply
for cross-sectional and longitudinal estimates.
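The second adjustment can be illustrated with a simple post-stratification sketch, in which weights are scaled so that weighted counts in each group match an external total. This is a simplified assumption: it handles a single set of non-overlapping groups, whereas matching several sets of totals at once (for example province-age-sex groups together with household and family size) requires a more general calibration method. The group labels, weights and control totals are hypothetical.

```python
def poststratify(weights, groups, control_totals):
    """Scale weights so that the weighted count in each group equals
    the external control total for that group."""
    weighted_counts = {}
    for w, g in zip(weights, groups):
        weighted_counts[g] = weighted_counts.get(g, 0.0) + w
    return [
        w * control_totals[g] / weighted_counts[g]
        for w, g in zip(weights, groups)
    ]


# Hypothetical non-response-adjusted weights and age-sex groups.
weights = [150.0, 150.0, 200.0, 200.0]
groups = ["F 25-34", "F 25-34", "M 25-34", "M 25-34"]
controls = {"F 25-34": 360.0, "M 25-34": 500.0}  # external population totals

final = poststratify(weights, groups, controls)
print(final)  # [180.0, 180.0, 250.0, 250.0]
```

After the adjustment, the weights in each group sum to the corresponding control total (360 and 500 in this example).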
Cross-sectional representativeness of SLID
Each longitudinal sample, or “panel”, in SLID initially constitutes
a representative cross-sectional sample of the population. However, because
the real population changes each year, whereas by design the longitudinal
sample does not, the sample must be modified to properly reflect these
changes to the composition of the population. This is done by adding
to the sample all new people in the population who are found to be living
with the initial respondents (and likewise dropping them from the sample
if they leave at later time-points). Conversely, any original respondents
who leave the target population (by moving abroad, into institutions,
etc.) are given a zero weight for cross-sectional purposes. In this way,
the cross-sectional sample, composed of the original respondents minus
those who left the target population plus those who have entered it,
is almost fully representative of the population at each subsequent
time-point. The missing group is composed of persons who have newly entered
the target population and are not living with anyone who was in the target
population when the most recent panel was selected. Since SLID introduces
a new panel every three years, however, this group is quite small.
Response rates
High response rates are essential for the data quality of any survey
and thus considerable effort is invested to encourage effective participation
from SCF and SLID respondents.
For the SCF, response is calculated at the family level whereas in
SLID it is calculated at the household level. In SLID, a household
is considered to be “respondent” if at least one of its
members responds to either the January or the May interview. There
is the additional stipulation that the information on the household’s
composition cannot be missing for more than one year.
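The household-level rule can be expressed as a short sketch. It is a simplified illustration under stated assumptions: the rate is unweighted, every sampled household is treated as eligible, and the additional stipulation on missing household composition is not modelled. The function names and data are hypothetical.

```python
def household_responds(members):
    """A household is respondent if at least one member responded to
    either the January or the May interview.

    `members` is a list of (january_responded, may_responded) pairs.
    """
    return any(jan or may for jan, may in members)


def response_rate(households):
    """Unweighted percentage of respondent households."""
    respondents = sum(1 for h in households if household_responds(h))
    return 100.0 * respondents / len(households)


# Hypothetical sample of four households.
households = [
    [(True, True), (False, False)],    # one member answered both interviews
    [(False, True)],                   # answered in May only
    [(False, False), (False, False)],  # nobody responded
    [(True, False)],                   # answered in January only
]
print(response_rate(households))  # 75.0
```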
Within a respondent household, all members are assigned identical,
positive final weights. For members (if any) who did not respond
to one or both of the collection phases, the final data are
either shown as “missing” in the final database or imputed,
depending on the variable.
In the Survey of Consumer Finances (SCF), response rates ranged from 78.1%
(1989) to 82.1% (1995), while the cross-sectional response rates in
SLID ranged between 79.1% (2001) and 85.5% (1996).
The updated definition of respondent was introduced starting with
the release of data for 2000, and applied retroactively to 1996. It
had relatively little impact on response rates – the SLID response
rates for 1996 to 2000 are now one to two percentage points lower than
they were under the old definition.
Response rates given in Table F have been revised back to 1996, using
the new definition of a respondent household.
Table F: Response rate in SCF (1990-1995) and SLID (1996-2001)
| Year | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 |
| Response rate (%) | 79.0 | 80.0 | 80.7 | 80.0 | 79.5 | 82.1 | 85.5 | 83.6 | 82.3 | 82.8 | 80.8 | 79.1 |
Imputation for non-response
Income data are imputed in SCF – and in some cases in SLID – using
a “nearest neighbour” approach. This method involves identifying
another individual with certain similar characteristics, who becomes the “donor” for
the imputed value. SLID also uses other imputation techniques. In fact, the
primary method employed for imputing income data in this survey is to use the
previous year’s data, updated for any changes in circumstances. Only
in the absence of such data are income figures imputed using the “nearest
neighbour” technique in SLID.
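The order of the SLID imputation methods described above can be sketched as follows. This is only an illustration: the matching characteristics (age and weekly hours), the unscaled squared-distance measure and all field names are assumptions, and the previous year's value is carried forward unchanged rather than updated for changes in circumstances.

```python
def nearest_neighbour_donor(recipient, donors, keys):
    """Return the donor record closest to the recipient on the given
    characteristics (squared Euclidean distance, unscaled)."""
    def distance(donor):
        return sum((recipient[k] - donor[k]) ** 2 for k in keys)
    return min(donors, key=distance)


def impute_income(recipient, donors, keys):
    """Use last year's income if available; otherwise borrow the income
    of the nearest-neighbour donor."""
    if recipient.get("income_prev") is not None:
        return recipient["income_prev"]
    return nearest_neighbour_donor(recipient, donors, keys)["income"]


# Hypothetical donor pool of respondents with reported income.
donors = [
    {"id": 1, "age": 25, "hours": 20, "income": 18000},
    {"id": 2, "age": 41, "hours": 36, "income": 42000},
    {"id": 3, "age": 60, "hours": 10, "income": 30000},
]
keys = ["age", "hours"]

no_history = {"age": 40, "hours": 35, "income_prev": None}
with_history = {"age": 40, "hours": 35, "income_prev": 39000}

print(impute_income(no_history, donors, keys))    # 42000
print(impute_income(with_history, donors, keys))  # 39000
```

In practice the matching characteristics would normally be standardized or grouped into classes so that no single variable dominates the distance.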
Amounts received through certain government programs, such as child
tax benefits, the Goods and Services Tax/Harmonized Sales Tax Credit, and
the Guaranteed Income Supplement, are also derived from other information.
Data obtained from the tax route are complete and do not need imputation.
Comparability with other income data sources
Comparisons of figures produced from the SCF with other sources of data (Census
of Population, Longitudinal Administrative Data, National Economic and Financial
Accounts) reveal that certain income components, such as investment, self-employment
earnings, social assistance payments and EI benefits, are under-reported
in the SCF.
SLID’s estimates of the number of income recipients, aggregate
individual income and average family income are higher than the corresponding
estimates from SCF data.
Differences between SCF and SLID income figures can be attributed
to the different procedures for editing, imputation, and data collection
(entirely by questionnaire for the former versus partially by linkage
with T1 income tax files for the latter).