The SAGE Dictionary of Statistics
Duncan Cramer and Dennis Howitt
Cramer-Prelims.qxd 4/22/04 2:09 PM Page i

The SAGE Dictionary of Statistics
a practical resource for students
in the social sciences

Duncan Cramer and Dennis Howitt

SAGE Publications
London ● Thousand Oaks ● New Delhi

© Duncan Cramer and Dennis Howitt 2004
First published 2004
Apart from any fair dealing for the purposes of research or
private study, or criticism or review, as permitted under
the Copyright, Designs and Patents Act, 1988, this publication
may be reproduced, stored or transmitted in any form, or by
any means, only with the prior permission in writing of the
publishers, or in the case of reprographic reproduction, in
accordance with the terms of licences issued by the
Copyright Licensing Agency. Inquiries concerning
reproduction outside those terms should be sent to
the publishers.
SAGE Publications Ltd
1 Oliver’s Yard
55 City Road
London EC1Y 1SP
SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320
SAGE Publications India Pvt Ltd
B-42, Panchsheel Enclave
Post Box 4109
New Delhi 110 017
British Library Cataloguing in Publication data
A catalogue record for this book is available
from the British Library
ISBN 0 7619 4137 1
ISBN 0 7619 4138 X (pbk)
Library of Congress Control Number: 2003115348

Typeset by C&M Digitals (P) Ltd.
Printed in Great Britain by The Cromwell Press Ltd, Trowbridge, Wiltshire

Contents

Preface vii
Some Common Statistical Notation ix
A to Z 1–186
Some Useful Sources 187

To our mothers – it is not their fault that lexicography took its toll.

Preface

Writing a dictionary of statistics is not many people’s idea of fun. And it wasn’t ours.
Can we say that we have changed our minds about this at all? No. Nevertheless, now
the reading and writing is over and those heavy books have gone back to the library,
we are glad that we wrote it. Otherwise we would have had to buy it. The dictionary
provides a valuable resource for students – and anyone else with too little time on
their hands to stack their shelves with scores of specialist statistics textbooks.
Writing a dictionary of statistics is one thing – writing a practical dictionary of statistics is another. The entries had to be useful, not merely accurate. Accuracy is not that
useful on its own. One aspect of the practicality of this dictionary is in facilitating the
learning of statistical techniques and concepts. The dictionary is not intended to stand
alone as a textbook – there are plenty of those. We hope that it will be more important
than that. Perhaps only the computer is more useful. Learning statistics is a complex
business. Inevitably, students at some stage need to supplement their textbook. A trip
to the library or the statistics lecturer’s office is daunting. Getting a statistics dictionary from the shelf is the lesser evil. And just look at the statistics textbook next to it –
you probably outgrew its usefulness when you finished the first year at university.
Few readers, not even ourselves, will ever use all of the entries in this dictionary.
That would be a bit like stamp collecting. Nevertheless, all of the important things are
here in a compact and accessible form for when they are needed. No doubt there are
omissions but even The Collected Works of Shakespeare leaves out Pygmalion! Let us know
of any. And we are not so clever that we will not have made mistakes. Let us know if
you spot any of these too – modern publishing methods sometimes allow corrections
without a major reprint.
Many of the key terms used to describe statistical concepts are included as entries
elsewhere. Where we thought it useful we have suggested other entries that are
related to the entry that might be of interest by listing them at the end of the entry
under ‘See’ or ‘See also’. In the main body of the entry itself we have not drawn
attention to the terms that are covered elsewhere because we thought this could be
too distracting to many readers. If you are unfamiliar with a term we suggest you
look it up.
Many of the terms described will be found in introductory textbooks on statistics.
We suggest that if you want further information on a particular concept you look it up
in a textbook that is ready to hand. There are a large number of introductory statistics texts that adequately discuss these terms and we would not want you to seek out a
particular text that we have selected that is not readily available to you. For the less
common terms we have recommended one or more sources for additional reading.
The authors and year of publication for these sources are given at the end of the entry
and full details of the sources are provided at the end of the book. As we have discussed some of these terms in texts that we have written, we have sometimes
recommended our own texts!
The key features of the dictionary are:
• Compact and detailed descriptions of key concepts.
• Basic mathematical concepts explained.
• Details of procedures for hand calculations if possible.
• Difficulty level matched to the nature of the entry: very fundamental concepts are the most simply explained; more advanced statistics are given a slightly more sophisticated treatment.
• Practical advice to help guide users through some of the difficulties of the application of statistics.
• Exceptionally wide coverage and varied range of concepts, issues and procedures – wider than any single textbook by far.
• Coverage of relevant research methods.
• Compatible with standard statistical packages.
• Extensive cross-referencing.
• Useful additional reading.

One good thing, we guess, is that since this statistics dictionary would be hard to distinguish from a two-author encyclopaedia of statistics, we will not need to write one
ourselves.
Duncan Cramer
Dennis Howitt

Some Common Statistical Notation

Roman letter symbols or abbreviations:

a        constant
df       degrees of freedom
F        F test
logₑ     natural or Napierian logarithm
M        arithmetic mean
MS       mean square
n or N   number of cases in a sample
p        probability
r        Pearson’s correlation coefficient
R        multiple correlation
SD       standard deviation
SS       sum of squares
t        t test

Greek letter symbols:
α (lower case alpha)    Cronbach’s alpha reliability, significance level or alpha error
β (lower case beta)     regression coefficient, beta error
γ (lower case gamma)
δ (lower case delta)
η (lower case eta)
κ (lower case kappa)
λ (lower case lambda)
ρ (lower case rho)
τ (lower case tau)
φ (lower case phi)
χ (lower case chi)

Some common mathematical symbols:
Σ    sum of
∞    infinity
=    equal to
<    less than
≤    less than or equal to
>    greater than
≥    greater than or equal to
√    square root

A

a posteriori tests: see post hoc tests

a priori comparisons or tests: where
there are three or more means that may be
compared (e.g. analysis of variance with
three groups), one strategy is to plan the
analysis in advance of collecting the data (or
examining them). So, in this context, a priori
means before the data analysis. (Obviously
this would only apply if the researcher was
not the data collector, otherwise it is in
advance of collecting the data.) This is important because the process of deciding what
groups are to be compared should be on the
basis of the hypotheses underlying the planning of the research. By definition, this implies
that the researcher is generally uninterested in general or trivial aspects of the data which are not the researcher’s primary focus. As a consequence, just a few of the possible comparisons need to be made as these contain the crucial information relevant to the researcher’s
interests. Table A.1 involves a simple ANOVA
design in which there are four conditions –
two are drug treatments and there are two
control conditions. There are two control conditions because in one case the placebo tablet
is for drug A and in the other case the placebo
tablet is for drug B.
An appropriate a priori comparison strategy
in this case would be:
• Meana against Meanb
• Meana against Meanc
• Meanb against Meand

Table A.1 A simple ANOVA design

   Drug A     Drug B     Placebo control A     Placebo control B
   Meana =    Meanb =    Meanc =               Meand =

Notice that this is fewer than the maximum
number of comparisons that could be made
(a total of six). This is because the researcher
has ignored issues which perhaps are of little
practical concern in terms of evaluating
the effectiveness of the different drugs. For
example, comparing placebo control A with
placebo control B answers questions about
the relative effectiveness of the placebo conditions but has no bearing on which drug is
the most effective overall.
The a priori approach needs to be compared with perhaps the more typical alternative research scenario – post hoc comparisons.
The latter involves an unplanned analysis of
the data following their collection. While this
may be a perfectly adequate process, it is
nevertheless far less clearly linked with the
established priorities of the research than a
priori comparisons. In post hoc testing, there
tends to be an exhaustive examination of all
of the possible pairs of means – so in the
example in Table A.1 all four means would be
compared with each other in pairs. This gives
a total of six different comparisons.
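These counts are easy to verify; a short sketch using the condition names from Table A.1 (the pairing logic is standard, nothing beyond the entry is assumed):

```python
from itertools import combinations

# Conditions from Table A.1
conditions = ["Drug A", "Drug B", "Placebo control A", "Placebo control B"]

# All possible pairwise comparisons among the four means
all_pairs = list(combinations(conditions, 2))
print(len(all_pairs))  # 6

# The three planned (a priori) comparisons described in the entry
planned = [
    ("Drug A", "Drug B"),
    ("Drug A", "Placebo control A"),
    ("Drug B", "Placebo control B"),
]
```

With four means there are 4 × 3 / 2 = 6 possible pairs, of which the a priori strategy uses only three.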
In a priori testing, it is not necessary to
carry out the overall ANOVA since this
merely tests whether there are differences
across the various means. In these circumstances, failure of some means to differ from the others may produce non-significant
findings due to conditions which are of little
or no interest to the researcher. In a priori testing, the number of comparisons to be made
has been limited to a small number of key
comparisons. It is generally accepted that if
there are relatively few a priori comparisons
to be made, no adjustment is needed for the
number of comparisons made. One rule of
thumb is that if the comparisons are fewer in
total than the degrees of freedom for the main
effect minus one, it is perfectly appropriate to
compare means without adjustment for the
number of comparisons.
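When an adjustment is wanted (see the Bonferroni test entry), the simplest approach divides the significance level by the number of comparisons made; a sketch with hypothetical figures:

```python
# Hypothetical example: unadjusted significance level and number of comparisons
alpha = 0.05
m = 6  # e.g. all possible pairwise comparisons among four means

# Bonferroni-type adjustment: divide the significance level by the
# number of comparisons made
adjusted_alpha = alpha / m
print(adjusted_alpha)  # 0.05 / 6 ≈ 0.0083
```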
Contrasts are examined in a priori testing.
This is a system of weighting the means in
order to obtain the appropriate mean difference
when comparing two means. One mean is weighted (multiplied by) +1 and the other is weighted −1. The other means are weighted 0.
The consequence of this is that the two key
means are responsible for the mean difference. The other means (those not of interest)
become zero and are always in the centre of
the distribution and hence cannot influence
the mean difference.
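As a sketch of such a contrast (the four group means below are invented purely for illustration):

```python
# Hypothetical group means for the four conditions in Table A.1
means = [5.0, 3.0, 4.0, 6.0]  # Meana, Meanb, Meanc, Meand (illustrative values)

# Contrast weights comparing Meana with Meanb; the means not of
# interest are weighted 0 and so cannot influence the difference
weights = [1, -1, 0, 0]

contrast = sum(w * m for w, m in zip(weights, means))
print(contrast)  # 2.0, i.e. Meana minus Meanb
```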
There is an elegance and efficiency in the a
priori comparison strategy. However, it does
require an advanced level of statistical and
research sophistication. Consequently, the
more exhaustive procedure of the post hoc
test (multiple comparisons test) is more
familiar in the research literature. See also:
analysis of variance; Bonferroni test; contrast; Dunn’s test; Dunnett’s C test; Dunnett’s
T3 test; Dunnett’s test; Dunn–Sidak multiple comparison test; omnibus test; post hoc
tests

abscissa: this is the horizontal or x axis in a graph. See x axis

absolute deviation: this is the difference
between one numerical value and another
numerical value. Negative values are
ignored as we are simply measuring the distance between the two numbers.

Figure A.1 Absolute deviations (the scores 9 and 3 lie at absolute deviations of 4 and 2 from the mean of 5)

Most commonly, absolute deviation in statistics is
the difference between a score and the mean
(or sometimes median) of the set of scores.
Thus, the absolute deviation of a score of 9
from the mean of 5 is 4. The absolute deviation of a score of 3 from the mean of 5 is
2 (Figure A.1). One advantage of the
absolute deviation over deviation is that the
former totals (and averages) for a set of
scores to values other than 0.0 and so gives
some indication of the variability of the
scores. See also: mean deviation; mean, arithmetic

acquiescence or yea-saying response set or style: this is the tendency to agree or
to say ‘yes’ to a series of questions. This tendency is the opposite of disagreeing or saying
‘no’ to a set of questions, sometimes called a
nay-saying response set. If agreeing or saying
‘yes’ to a series of questions results in a high
score on the variable that those questions are
measuring, such as being anxious, then a
high score on the questions may indicate
either greater anxiety or a tendency to agree.
To control or to counteract this tendency,
half of the questions may be worded in the
opposite or reverse way so that if a person
has a tendency to agree the tendency will
cancel itself out when the two sets of items
are combined.

adding: see negative values

Figure A.2 Demonstrating the addition rule for the simple case of either heads or tails when tossing a coin (probability of head = 0.5; probability of tail = 0.5; the probability of head or tail is the sum of the two separate probabilities according to the addition rule: 0.5 + 0.5 = 1)

addition rule: a simple principle of
probability theory is that the probability of
either of two different outcomes occurring is
the sum of the separate probabilities for those
two different events (Figure A.2). So, the
probability of a die landing 3 is 1 divided by
6 (i.e. 0.167) and the probability of a die landing 5 is 1 divided by 6 (i.e. 0.167 again). The
probability of getting either a 3 or a 5 when
tossing a die is the sum of the two separate
probabilities (i.e. 0.167 + 0.167 = 0.333). Of course, the probability of getting any of the numbers from 1 to 6 spots is 1.0 (i.e. the sum of six probabilities of 0.167).

adjusted means, analysis of covariance: see analysis of covariance

agglomeration schedule: a table that shows which variables or clusters of variables are paired together at different stages of a cluster analysis. See cluster analysis
Cramer (2003)

algebra: in algebra numbers are represented as letters and other symbols when giving equations or formulae. Algebra therefore is the basis of statistical equations. So a typical example is the formula for the mean:

m = ΣX / N

In this m stands for the numerical value of the mean, X is the numerical value of a score, N is the number of scores and Σ is the symbol indicating in this case that all of the scores under consideration should be added together.
One difficulty in statistics is that there is a degree of inconsistency in the use of the symbols for different things. So generally speaking, if a formula is used it is important to indicate what you mean by the letters in a separate key.

algorithm: this is a set of steps which describe the process of doing a particular calculation or solving a problem. It is a common term to use to describe the steps in a computer program to do a particular calculation. See also: heuristic

alpha error: see Type I or alpha error

alpha (α) reliability, Cronbach’s: one of a number of measures of the internal consistency of items on questionnaires, tests and
other instruments. It is used when all the
items on the measure (or some of the items)
are intended to measure the same concept
(such as personality traits such as neuroticism). When a measure is internally consistent, all of the individual questions or items
making up that measure should correlate
well with the others. One traditional way of
checking this is split-half reliability in which
the items making up the measure are split
into two sets (odd-numbered items versus even-numbered items, the first half of the items compared with the second half).

Table A.2 Preferences for four foodstuffs plus a total for number of preferences

             Q1:     Q2:      Q3:      Q4:
             bread   cheese   butter   ham    Total
Person 1       0       0        0       0       0
Person 2       1       1        1       0       3
Person 3       1       0        1       1       3
Person 4       1       1        1       1       4
Person 5       0       0        0       1       1
Person 6       0       1        0       0       1

Table A.3 The data from Table A.2 with Q1 and Q2 added, and Q3 and Q4 added

             Half A:           Half B:
             bread + cheese    butter + ham
             items             items           Total
Person 1          0                 0            0
Person 2          2                 1            3
Person 3          1                 2            3
Person 4          2                 2            4
Person 5          0                 1            1
Person 6          1                 0            1

The
two separate sets are then summated to give
two separate measures of what would appear
to be the same concept. For example, the following four items serve to illustrate a short
scale intended to measure liking for different
foodstuffs:
1  I like bread     Agree  Disagree
2  I like cheese    Agree  Disagree
3  I like butter    Agree  Disagree
4  I like ham       Agree  Disagree

Responses to these four items are given in
Table A.2 for six individuals. One split half of
the test might be made up of items 1 and 2,
and the other split half is made up of items 3
and 4. These sums are given in Table A.3. If
the items measure the same thing, then the
two split halves should correlate fairly well
together. This turns out to be the case since
the correlation of the two split halves with each other is 0.5 (although it is not significant
with such a small sample size). Another name
for this correlation is the split-half reliability.
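The split-half correlation quoted above can be verified from the Table A.3 half scores; a minimal sketch in plain Python (no statistics package assumed):

```python
# Half scores for the six people, summed as in Table A.3
half_a = [0, 2, 1, 2, 0, 1]  # bread + cheese items
half_b = [0, 1, 2, 2, 1, 0]  # butter + ham items

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

print(pearson_r(half_a, half_b))  # 0.5, the split-half reliability
```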
Since there are many ways of splitting the
items on a measure, there are numerous split
halves for most measuring instruments. One
could calculate the odd–even reliability for
the same data by summing items 1 and 3
and summing items 2 and 4. These two forms
of reliability can give different values. This is
inevitable as they are based on different combinations of items.
Conceptually alpha is simply the average
of all of the possible split-half reliabilities that
could be calculated for any set of data. With a
measure consisting of four items, these are
items 1 and 2 versus items 3 and 4, items 2
and 3 versus items 1 and 4, and items 1 and 3
versus items 2 and 4. Alpha has a big advantage over split-half reliability. It is not dependent on arbitrary selections of items since it
incorporates all possible selections of items.
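Conceptually alpha is this average of split-half reliabilities; in practice it is usually obtained from the item and total-score variances. A sketch using the Table A.2 data (this variance-based formula is standard and gives the same answer as the ANOVA route the entry describes next):

```python
# Item responses from Table A.2 (rows are people, columns are items Q1-Q4)
items = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

k = len(items[0])                                  # number of items
item_vars = [variance([row[i] for row in items]) for i in range(k)]
total_var = variance([sum(row) for row in items])  # variance of total scores

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # 0.67, matching the value computed in the entry
```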
In practice, the calculation is based on the
repeated-measures analysis of variance. The
data in Table A.2 could be entered into a
repeated-measures one-way analysis of variance. The ANOVA summary table is to be
found in Table A.4. We then calculate coefficient alpha from the following formula:

alpha = (mean square between people − mean square residual) / mean square between people
      = (0.600 − 0.200) / 0.600
      = 0.400 / 0.600
      = 0.67
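The ANOVA figures used in this calculation can be reproduced directly from the Table A.2 data; a minimal sketch of the repeated-measures decomposition in plain Python:

```python
# Item responses from Table A.2 (rows are people, columns are items Q1-Q4)
data = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
n_people, n_items = len(data), len(data[0])
grand_mean = sum(sum(row) for row in data) / (n_people * n_items)

# Sums of squares for the repeated-measures decomposition
ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
ss_people = n_items * sum(
    (sum(row) / n_items - grand_mean) ** 2 for row in data
)
ss_items = n_people * sum(
    (sum(row[i] for row in data) / n_people - grand_mean) ** 2
    for i in range(n_items)
)
ss_residual = ss_total - ss_people - ss_items

ms_people = ss_people / (n_people - 1)                        # 3.000 / 5  = 0.600
ms_residual = ss_residual / ((n_people - 1) * (n_items - 1))  # 3.000 / 15 = 0.200

alpha = (ms_people - ms_residual) / ms_people
print(round(alpha, 2))  # 0.67
```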
0.600 Of course, SPSS and similar packages simply
give the alpha value. See internal consistency; reliability
Cramer (1998)

Table A.4 Repeated-measures ANOVA summary table for data in Table A.2

                      Sums of    Degrees of    Mean
                      squares    freedom       square
Between treatments     0.000         3         0.000 (not needed)
Between people         3.000         5         0.600
Error (residual)       3.000        15         0.200

alternative hypothesis: see hypothesis; hypothesis testing

AMOS: this is the name of one of the computer programs for carrying out structural equation modelling. AMOS stands for Analysis of Moment Structures. Information about AMOS can be found at the following website:
.html
See structural equation modelling

analysis of covariance (ANCOVA):
analysis of covariance is abbreviated as
ANCOVA (analysis of covariance). It is a form
of analysis of variance (ANOVA). In the simplest case it is used to determine whether the
means of the dependent variable for two or
more groups of an independent variable or
factor differ significantly w...