# Apple Inc. v. Samsung Electronics Co. Ltd. et al

### Filing
999

Administrative Motion to File Under Seal filed by Samsung Electronics America, Inc.(a New York corporation), Samsung Electronics Co. Ltd., Samsung Telecommunications America, LLC(a Delaware limited liability company). (Attachments: #1 Proposed Order Granting Motion to Seal, #2 Samsung's Opposition to Apple's Motion to Exclude Testimony of Samsung's Experts, #3 Declaration of Joby Martin in Support of Samsung's Opposition, #4 Exhibit A to the Martin Declaration, #5 Exhibit B to the Martin Declaration, #6 Exhibit C to the Martin Declaration, #7 Exhibit D to the Martin Declaration, #8 Exhibit E to the Martin Declaration, #9 Exhibit F to the Martin Declaration, #10 Exhibit G to the Martin Declaration, #11 Exhibit H to the Martin Declaration, #12 Exhibit I to the Martin Declaration, #13 Exhibit J to the Martin Declaration, #14 Exhibit K to the Martin Declaration, #15 Exhibit L to the Martin Declaration, #16 Exhibit M to the Martin Declaration, #17 Exhibit N to the Martin Declaration, #18 Exhibit O to the Martin Declaration, #19 Exhibit P to the Martin Declaration, #20 Exhibit Q to the Martin Declaration, #21 Exhibit R to the Martin Declaration, #22 Exhibit S to the Martin Declaration, #23 Proposed Order Denying Apple's Motion to Exclude Testimony of Samsung's Experts)(Maroulis, Victoria) (Filed on 5/31/2012)

EXHIBIT R
Quality & Quantity (2007) 41:601–626
DOI 10.1007/s11135-007-9089-z
© Springer 2007
Ordinal Methodology in the Analysis of Likert Scales
RAINER GÖB1,∗, CHRISTOPHER McCOLLIN2 and MARIA FERNANDA RAMALHOTO3
1 Institute for Applied Mathematics and Statistics, University of Würzburg, Sanderring 2, D-97070 Würzburg, Germany. E-mail: goeb@mathematik.uni-wuerzburg.de; 2 Nottingham Trent University, University Burton Street, Nottingham, NG1 4BU, United Kingdom. E-mail: Chris.McCollin@ntu.ac.uk; 3 Instituto Superior Técnico, Maths Dept., Av. Rovisco Pais, 1049-001 Lisbon, Portugal
Abstract. Likert scales are widely used in survey studies for attitude measuring. In particular, the questionnaires propagated by the SERVQUAL approach are based on Likert scales.
Though the problem of attitude suggests an ordinal interpretation of Likert scales, attitude
survey data are often evaluated with techniques designed for cardinal measurements. The
present paper discusses the interpretation of scales for attitude measuring and gives a survey
of data analysis techniques under the proper ordinal understanding.
Key words: attitude measuring, likert scales, ordinal scales, cardinal scales, SERVQUAL, statistical analysis.
1. Introduction
Likert scales are widely used for measuring attitudes, e.g., opinions, psychic and mental dispositions, preferences. Questionnaires and surveys based
on Likert scales are used in various areas, e.g., in psychometrics for the
analysis of subjective well-being, see Diener et al. (1985) or Watson et al.
(1988), in social studies and panels, or for purposes of business administration. The use of Likert scales has increased especially in the service sector
with consumer surveys now being commonplace within the hotel, leisure
and public utility sectors. In particular, the SERVQUAL approach introduced by Parasuraman et al. (1985, 1988) has received enormous interest.
The ways of collecting survey data vary widely from the use of telephone
questionnaires to on-line designed web pages for automatic input.
The statistical analysis of survey data can range from simple dot plots
to logistic regression and cluster analysis to determine any hidden structure. However, many studies conﬁne themselves to a descriptive analysis.
∗
Author for correspondence: E-mail: goeb@mathematik.uni-wuerzburg.de
Clason and Dormody (1994) compare 95 articles analyzing Likert scales
from the Journal of Agricultural Education. 51 reported only descriptive
statistics. In a recent review of some University Business School dissertations, most students opted for questionnaires and/or interviews for their
primary research and the main statistical analysis was of an exploratory
nature with bar charts, check lists and Pareto plots undertaken. It is interesting to note that in a similar way to Ishikawa’s three levels of tools which
provide the differentiation between Six Sigma Green and Black Belts, most
students will mainly only attempt Ishikawa’s level 1 tools (7 basic tools) to
carry out their analysis even though they have been taught level 2 and 3
tools such as ANOVA and regression.
Unfortunately, the promotion of ways to analyze data measured in
Likert scales is not widely available within textbooks. In fact, there is no
common standard accepted by the scientiﬁc community for the correct
interpretation and analysis of such data. Interpretation and analysis often
seem to be in a mismatch. In methodological considerations it is generally
acknowledged that attitude measuring scales should be considered as ordinal. Nevertheless, many studies use cardinal statistics as sample means,
sample variances, t-tests to analyze attitude data. Proper ordinal approaches
are in the minority. In particular, the SERVQUAL methodology as usually
propagated is completely based on cardinal statistics.
The objective of the present paper is to establish a framework for the
analysis of survey data under an explicitly ordinal interpretation of the
Likert scale. Sections 2 and 3 review the debate on the impact of scale typologies on statistical methodology. Sections 4 and 5 discuss the interpretation of Likert scales. Sections 6 and 7 suggest the multinomial model for
modelling data from attitude surveys. Sections 8 through 12 consider the
analysis of survey data from a homogeneous sample of respondents. Ways
of detecting and analyzing inhomogeneous samples are discussed in Section
13.
2. Ordinal and Cardinal Scales
We consider one-dimensional scales which can be identiﬁed with subsets
of the real line. Stevens (1946, 1951) characterizes the scale types nominal,
ordinal, interval, ratio in terms of permissible transformations. We use Stevens’ ideas to distinguish between ordinal and cardinal scales.
Ordinal measure scales consist of categories ordered by a relation of the
type “<” or “≤”, respectively. Any two measure values can be compared
in terms of the order relation. The admissibility of strictly increasing scale
transformations preserving the order relation is characteristic for ordinal
scales. Consequently, differences of scale values are not meaningful. Beyond
order, there is no measure for the distance between two scale values. For
instance, the ordinal scales 1, 2, 3, 4, 5 and 1, 3, 9, 27, 81 are equivalent.
However, in the ﬁrst scale the magnitude of differences between successive
points is identical, whereas it is increasing in the second scale.
Cardinal measure scales express magnitudes. Differences between scale
values are meaningful. In Stevens’ terminology, cardinal scales are interval scales. Interval scales are characterized by the admissibility of strictly
increasing linear transformations. For instance, the cardinal scales 1, 2, 3,
4, 5 and 0, 2, 4, 6, 8 are equivalent.
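The difference between the two scale types can be illustrated with a small sketch. It assumes two hypothetical groups of responses on a five-grade scale (`group_a`, `group_b` are invented for illustration) and applies the strictly increasing relabelling 1, 3, 9, 27, 81 mentioned above. The ranking of the groups by medians is unaffected, whereas the ranking by arithmetic means is reversed.

```python
from statistics import mean, median

# Strictly increasing relabelling of the grades 1..5 (the second ordinal scale in the text).
RELABEL = {1: 1, 2: 3, 3: 9, 4: 27, 5: 81}


def relabel(scores):
    """Apply the order-preserving transformation to a list of grades."""
    return [RELABEL[s] for s in scores]


# Hypothetical responses of two groups on a five-grade scale (illustration only).
group_a = [2, 2, 2, 2, 5]
group_b = [3, 3, 3, 3, 3]

# Ranking by medians: the same before and after the transformation.
median_order_before = median(group_a) < median(group_b)
median_order_after = median(relabel(group_a)) < median(relabel(group_b))

# Ranking by means: group_a is below group_b before, above it afterwards.
mean_order_before = mean(group_a) < mean(group_b)
mean_order_after = mean(relabel(group_a)) < mean(relabel(group_b))

print(median_order_before, median_order_after)  # True True
print(mean_order_before, mean_order_after)      # True False
```

A statement such as "group A scores lower on average than group B" is therefore only meaningful if the numerical labels are taken to express magnitudes.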
3. Scale Interpretation and Statistical Methodology
The rationale behind an axiomatic distinction of scales as described in
Section 2 is beyond doubt. However, the role of scale identiﬁcations in the
methodology of statistical data analysis is controversial.
Stevens (1951) and subsequently many other authors, e.g., Luce (1959),
Townsend and Ashby (1984) and Luce et al. (1990), postulate the following
steps of data analysis:
(S1) Scales for measuring the values of certain attributes are chosen
according to criteria provided by measurement theory.
(S2) The measure scale chosen in step (S1) prescribes certain statistics
and proscribes others.
In this view, data measured in a speciﬁc scale have to be analyzed by
statistics which preserve their meaning under the characteristic transformation of the scale. Admissible statistics for ordinal data are frequencies, histograms, order statistics. Methods involving arithmetic or weighted means
are appropriate for cardinal data, but they make no sense for the analysis
of ordinal data. Andrews et al. (1981) present an elaborate guide to select
statistical methods in accordance with measure scales.
The above view of the predominant role of measurement theory in data
analysis has been criticized by several authors, see Lord (1953), Savage
(1957), Tukey (1961), Adams et al. (1965) and Baker et al. (1986) for
instance. More references and a detailed survey of the discussion are given by
Velleman and Wilkinson (1993). Subsequently we consider only one, but
substantial critical argument.
The following propositions can be taken for granted: (i) Data analysis is
an autonomous discipline. (ii) Among other techniques, data analysis uses
formal mathematical methods, without being a part of mathematics. (iii)
Any data analysis is motivated by a speciﬁc problem, i.e., speciﬁc interests and objectives of knowledge discovery, occurs in a speciﬁc context, i.e.,
a speciﬁc scientiﬁc or pragmatic environment, and reﬂects methods with
respect to their solution potential for the problem in the context. (iv) The
criteria of adequacy of methods of data analysis result from the speciﬁc
problem, the speciﬁc context, and the solution potential.
Under propositions (i) through (iv), the description of data analysis by
steps (S1), (S2) requires the following further assumption: (v) Measurement
theory alone is able to reﬂect the criteria resulting from problem, context,
and solution potential, by determining uniquely an adequate measure scale.
However, considering the actual state of measurement theory as a discipline, see Luce et al. (1990) for instance, it will be difﬁcult to defend
Proposition (v). Customary measurement theory deliberately works without reﬂecting the potential of the entire corpus of statistical data analysis regarding problem, context, and solution potential. On the contrary,
the succession of steps (S1), (S2) claims that methods of analysis can be
selected or excluded without reﬂecting their potential. Measurement theory
claims to be a preliminary fundamental discipline for data analysis. However, it is strongly inﬂuenced by formal axiomatic reasoning and fails to
provide a conceptual framework to structure data analysis according to the
basic matters of problem and context, and solution potential.
The succession of steps (S1), (S2) has to be rejected. Scale type identiﬁcation is reasonable to avoid conceptual confusion. However, scale type
identiﬁcation by measurement theory is not exclusively decisive for the
choice of data analysis methods. The choice of appropriate methods is
determined by the three interdependent factors listed above. In this vein,
Adams et al. (1965): “Nothing is wrong per se in applying any statistical operation to measurements of given scale, but what may be wrong,
depending on what is said about the results of these operations, is that the
statement about them will not be empirically meaningful or else that it is not
scientifically significant.”
Examples illustrating the above argument are provided by Lord (1953)
and Wright (1997). The subsequent Section 5 discusses the interpretation
of Likert scales.
4. The Likert Scale
Rensis Likert (1932) introduced a scale and technique for attitude
measurement. An individual is confronted with statements which are
essentially value judgements. The value judgements may concern the individual’s reﬂections of reality or the individual’s psychic dispositions as
feelings, wants, desires, conative dispositions. The individual is invited to
deﬁne his attitude towards each statement by choosing among a number
of r grades (scores, degrees) on the r-grade Likert scale. Most popular are
ﬁve-grade and seven-grade Likert scales. The grades (scores, degrees) 1, . . . , r
are ordered in ascending order of agreement or approval of the individual
with respect to the value statement. In case of r = 5, the grades are usually
interpreted by strongly disagree, disagree, neutral (undecided), agree, strongly
agree.
Likert scales are widely used in different areas for attitude measurement
by surveys, e.g., in psychology, sociology, health care, marketing, quality
control. Popular applications are in the assessment of customers’ quality perceptions or expectations, and of subjective well-being. Subjective
well-being has become an important topic in research and practical ﬁelds
like health care, see Diener (1984) or Diener et al. (1999).
Lots of differently structured attitude measuring techniques based on
Likert scales are used. We describe some standard schemes which have
been widely used for many years: SERVQUAL, PANAS, SWLS, GSOEP.
4.1. SERVQUAL
Attitude surveys structured according to the SERVQUAL approach introduced by Parasuraman et al. (1985, 1988) are currently among the most
popular applications of sample surveys in industry. SERVQUAL surveys
intend to inquire customers’ attitudes towards service quality. Service quality is considered with respect to ν dimensions which are addressed by a
questionnaire regarding M performance items. Parasuraman et al. (1985,
1988) suggest M = 22 items grouped into ν = 5 dimensions of service quality: tangibles (environmental factors), reliability, responsiveness, assurance,
empathy. This setting has widely been accepted in applications. More subtle
investigations use statistical instruments like principal components analysis
to conﬁrm or modify the setting, see Asubonteng et al. (1996) for a literature survey.
The questionnaire contains a statement on each of the M performance
items. The respondent is invited to qualify his attitude towards each statement in a response scale of Likert type with grades or scores ranging from
1 (“strongly disagree”) to r (“strongly agree”). Most popular in SERVQUAL manuals and case studies are scales with r = 7 or r = 5 grades, see
Parasuraman et al. (1985, 1988). Occasionally, other values of r, e.g., r = 10
are also used, see Asubonteng et al. (1996).
SERVQUAL distinguishes between two attitudes: expectations on quality, i.e., what a customer expects from the service, and perceptions of
quality, i.e., the customer’s view of what actually happened. SERVQUAL
intends to measure the gap between expectations and perceptions. To this
end, SERVQUAL questionnaires are doubled: The respondent is invited
to qualify his attitude towards each of the M statements once in the sense
of expectation, irrespective of what actually happened, once in the sense of
perception of what actually happened.
The SERVQUAL community has adopted some standard quantitative methodology for the evaluation of SERVQUAL surveys which was
essentially coined by Parasuraman et al. (1985, 1988, 1991). These methods are based on an implicit cardinal interpretation of the Likert scale. For
each respondent, dimension scores and a total SERVQUAL score are calculated as arithmetic or appropriately weighted averages. Survey scores are
calculated as arithmetic averages of the respondent scores. Gap scores are
calculated as differences of perception score minus expectation score.
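The cardinal scoring just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the SERVQUAL reference procedure: it uses unweighted arithmetic averages only, and the function names, the `boundaries` argument and the example data are hypothetical.

```python
from statistics import mean


def dimension_scores(item_scores, boundaries):
    """Unweighted average of the item scores within each dimension.

    boundaries is a list of (start, stop) index pairs, one per dimension,
    using Python's zero-based, half-open slicing convention.
    """
    return [mean(item_scores[a:b]) for a, b in boundaries]


def gap_scores(perception, expectation, boundaries):
    """Gap per dimension: perception score minus expectation score."""
    p = dimension_scores(perception, boundaries)
    e = dimension_scores(expectation, boundaries)
    return [pi - ei for pi, ei in zip(p, e)]


# Hypothetical respondent: 4 items in 2 dimensions on a seven-grade scale.
perception = [5, 6, 3, 4]
expectation = [7, 7, 4, 4]
boundaries = [(0, 2), (2, 4)]

gaps = gap_scores(perception, expectation, boundaries)
total_gap = mean(perception) - mean(expectation)
print(gaps, total_gap)  # [-1.5, -0.5] -1.0
```

Every step treats the grades as magnitudes: averaging presupposes that the distance between adjacent grades is the same everywhere on the scale.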
4.2. PANAS
The positive and negative affect scale (PANAS) introduced by Watson et al.
(1988) is concerned with measuring subjective dispositions in the sense of
moods, momentary, mid-term, or long-term. PANAS refers to 20 feelings
or emotions in two dimensions, positive and negative. The respondent is
asked to notify the degree of realizing the feeling or emotion in a ﬁve-grade
Likert scale with values very slightly or not at all, a little, moderately, quite
a bit, extremely.
The quantitative methodology suggested by Watson et al. (1988) uses an
implicit cardinal interpretation of Likert scores.
4.3. SWLS
To measure global and rather persistent judgements on individual life, Diener et al. (1985) suggest the satisfaction with life scale (SWLS). SWLS
considers only ﬁve statements: “In most ways my life is close to my
ideal”, “The conditions of my life are excellent”, “I am satisﬁed with my
life”, “So far I have gotten the important things I want in life”, “If I could
live my life over, I would change almost nothing”. The respondent is asked
to notify the degree of approval with each statement in a seven-grade Likert scale ranging from strongly disagree to strongly agree.
The quantitative methodology used by Diener et al. (1985) to evaluate the SWLS technique is based on an implicit cardinal interpretation of
Likert scores.
4.4. GSOEP
The German Socio-Economic Panel (GSOEP) has been conducted as a longitudinal panel in Germany since 1984. It includes 11 questions concerning
satisfaction with work, income, health, housing, leisure, consumption. The
answers are notiﬁed in an eleven-grade Likert scale ranging from 0 (totally
dissatisﬁed) to 10 (totally satisﬁed). Further information about GSOEP
can be obtained on the WWW at http://www.diw.de/. GSOEP contains no
advice for scale interpretation or methods of analysis.
5. The Character of Likert Scales
Following the conclusion of Section 3, the speciﬁc problem, the context of
data analysis, and the problem solving potential of methods are crucial for
deciding about the scale type and appropriate methods of analysis. For
a Likert scale, the alternative is between an ordinal and a cardinal scale
type.
5.1. The Problem behind Likert Scales
The problem behind the use of Likert scales is measuring attitudes. Accordingly, the interpretation of such scales is discussed in psychology, sociology
and economics. Attitude measuring has to satisfy the following criteria:
• Longitudinal consistency or retesting reliability: At repeated measuring
times under invariant relevant side conditions respondents exhibit the
same rating.
• Longitudinal comparability: Responses given by an individual at different times with respect to the same item can be compared on the scale.
• Internal consistency.
• Interpersonal comparability: Responses from different individuals can be
compared on the scale.
• Plausibility: The measuring method has to conform to naive assessments of attitudes.
By deﬁnition in terms of admissible transformations, an ordinal scale is less
restrictive in interpretation than a cardinal scale. Hence it is easier to satisfy the above criteria under an ordinal interpretation than under a cardinal
interpretation of the Likert scale.
In particular, ordinal scales facilitate achieving comparability. For
instance, consider a seven-grade Likert scale to measure satisfaction and
let an individual report grade 2 at time t1 and grade 4 at time t2. Under
a cardinal interpretation this amounts to the controversial conclusion that
the individual at time t2 experiences twice the satisfaction experienced at
time t1. Under an ordinal interpretation it only means that the individual’s satisfaction increased from the second to the fourth position on the
scale. Comparability has been discussed extensively in the theory of utility
and of social choice, see Georgescu-Roegen (1968), van Praag (1991) or Sen
(1999). Many authors agree that under ordinal scaling interpersonal comparability is a justified working hypothesis, see Ferrer-I-Carbonell and van
Praag (2003). Cardinal interpretations, however, involve considerable difficulties in guaranteeing comparability.
Naive cardinal interpretations of ordinal scales may violate internal consistency and interpersonal comparability. Hart (1996) reports the results of
an experiment suggested by Lodge (1981) for quantifying the grades in a
Likert scale by magnitudes. A sample of respondents is invited to assign
magnitudes to the grades of a 7-grade Likert scale with the interpretations
atrocious, very bad, bad, so-so, good, very good, excellent. The result shows
considerable differences in the weights assigned to distances between the
grades on the Likert scale. For instance, the step from atrocious to very bad
is quantiﬁed by 0.6, whereas the step from so-so to good is quantiﬁed by
1.9.
5.2. The Context of the Use of Likert Scales
Consider the scientiﬁc and pragmatic context of the use of Likert scales in
survey techniques. In view of the context, methodology is rated by the following criteria:
• Acceptance by communities in practice or research.
• Standardization.
• Comparability of results.
Section 4 lists four popular standardized attitude measuring techniques
based on Likert scales: SERVQUAL, SWLS, PANAS, GSOEP. All are
widely accepted, deﬁnite, standardized. They differ in advice for scale interpretation and for methods of data analysis.
GSOEP contains no advice for scale interpretation or methods of analysis. Recent studies in GSOEP attitude data include explicitly ordinal scale
interpretations, see Ferrer-I-Carbonell and van Praag (2003) or Nolte and
McKee (2004), and implicitly cardinal ones, see Lucas et al. (2003) or
Ronellenﬁtsch and Razum (2004).
SERVQUAL, SWLS, and PANAS, in the manner originally conceived
by their authors, see Parasuraman et al. (1988), Diener et al. (1985) and
Watson et al. (1988), contain advice on data analysis. This advice implies
a cardinal interpretation of the Likert scale: empirical sums, means, variances and correlation coefﬁcients of scores are calculated. Such approaches
are mainly motivated by pragmatic reasoning since cardinal statistics are
widely available in textbooks and software. However, they contradict principles of attitude measuring which suggest ordinal scaling, see Section 5.1,
above. Essentially, two types of misleading conclusions may follow from the
conﬂict of intrinsic ordinality and imposed cardinality. (1) Complete distortion of results by applying strictly monotonous transformations to a scale
which bears a cardinal interpretation. Fortunately, this type of misinterpretation is prevented by the pragmatic context, where SERVQUAL, SWLS,
and PANAS are strictly linked to unambiguous Likert scales with grades
1,. . . ,7. The idea of subjecting the scale to transformations is purely academic. (2) Interpretation of attitude grades in terms of magnitudes. This
is a serious misinterpretation supported by approaches like SERVQUAL,
SWLS, or PANAS. Often enough, practitioners report results of surveys
by statements like “We’ve increased customer satisfaction by 150% in one
year.”
5.3. The Analysis of Attitude Surveys
Consider the problem solving potential of methods for the analysis of attitude surveys. The major criteria are:
• Clarity.
• Exactness.
• Informational value.
• Simplicity.
• Availability.
The cardinal scale approach suggested by standard descriptions of
SERVQUAL, PANAS, SWLS excels by simplicity and availability. Methods like principal components analysis, factor analysis, correlation analysis,
t-testing or ANOVA are from the conventional statistical toolbox, readily
available in textbooks or software packages.
Deﬁciencies of the cardinal scale approach are in clarity and exactness.
The basic problem of scale interpretation generally remains unmentioned
in the SERVQUAL environment and is discussed by few authors only, see
Hart (1996) or Hart (1999). Many of the methods usually recommended
are based on normality assumptions. These assumptions mostly remain
undiscussed. Attempts of substantiating by asymptotics are not made.
The informational value of methods recommended in SERVQUAL,
PANAS or SWLS schemes is undoubted. Summed or averaged scores convey information about respondents’ attitudes. However, cardinal statistics
also may hide or distort information. For instance, strong agreements and
strong disagreements may be averaged, providing a misleading impression
of average agreement.
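The masking effect of averaging can be made concrete with a short sketch. The two samples below are hypothetical and chosen only for illustration: one is split between the extreme grades, the other is uniformly neutral. Their arithmetic means coincide, while the grade proportions differ completely.

```python
from collections import Counter
from statistics import mean


def grade_proportions(scores, r=5):
    """Share of respondents choosing each grade 1..r."""
    counts = Counter(scores)
    return [counts[g] / len(scores) for g in range(1, r + 1)]


# Hypothetical samples on a five-grade scale (illustration only).
polarized = [1] * 50 + [5] * 50   # half strongly disagree, half strongly agree
neutral = [3] * 100               # everybody undecided

print(mean(polarized), mean(neutral))  # 3 3
print(grade_proportions(polarized))    # [0.5, 0.0, 0.0, 0.0, 0.5]
print(grade_proportions(neutral))      # [0.0, 0.0, 1.0, 0.0, 0.0]
```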
5.4. Conclusion on the Interpretation of Likert Scales
The problem of attitude measuring clearly suggests an ordinal interpretation of Likert scales. The context of use has established some implicit cardinal interpretation. To some extent, cardinal statistics have successfully
been applied in the analysis of attitude surveys.
In summary, ordinal methodology for Likert scale analysis conforms to
the problem of attitude measuring, but it differs from widely used and
sufﬁciently successful practice. To be acceptable for practitioners, a proper
ordinal approach to Likert scale analysis has to substantiate its problem
solving potential according to the criteria listed in Section 5.3, in particular with respect to simplicity and availability. The subsequent sections give
an overview of ordinal methodology which is competitive in this sense.
6. Formal Description of Attitude Questionnaires
The discussion of quantitative analysis of attitude surveys requires a formal
description of attitude questionnaires.
Consider a questionnaire with M statements expressing ν dimensions. In
SERVQUAL usually ν = 5, M = 22. Let $1 = m_1 < \cdots < m_{\nu+1} = M + 1$ and
let the statements (items) $m_\rho, \ldots, m_{\rho+1} - 1$ be associated with dimension
ρ, ρ = 1, . . . , ν. Responses are notified in a Likert scale with r grades represented by the numbers 1, . . . , r. The survey is conducted with n respondents i = 1, . . . , n. The response of respondent i with respect to statement
j is denoted by an r-tuple $X_{ij} = (X_{ij1}, \ldots, X_{ijr})$ with entries from {0, 1},
$X_{ij1} + \cdots + X_{ijr} = 1$. $X_{ijl} = 1$ means: with respect to statement j, respondent
i exhibits agreement grade 1 ≤ l ≤ r on the Likert scale. Then

$$
X_i^{(\rho)} = \sum_{j=m_\rho}^{m_{\rho+1}-1} X_{ij}, \quad \rho = 1, \ldots, \nu, \qquad
X_i = \sum_{\rho=1}^{\nu} X_i^{(\rho)} = \sum_{j=1}^{M} X_{ij} \tag{1}
$$

is the response vector of respondent i in dimension ρ, respectively the total
response vector in all M items.
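The encoding behind formula (1) can be sketched in a few lines. The helper names (`one_hot`, `add_vectors`, `response_vectors`) and the example data are assumptions made for illustration; `starts` holds the 1-based item indices m_1, . . . , m_{ν+1} from the text.

```python
def one_hot(grade, r):
    """r-tuple with a single 1 at the position of the chosen grade (1..r)."""
    if not 1 <= grade <= r:
        raise ValueError("grade must lie in 1..r")
    return [1 if l == grade else 0 for l in range(1, r + 1)]


def add_vectors(vectors):
    """Component-wise sum of equally long vectors."""
    return [sum(column) for column in zip(*vectors)]


def response_vectors(grades, starts, r):
    """Dimension response vectors and total response vector of one respondent.

    grades: the M grades given by the respondent, in item order.
    starts: 1-based indices m_1, ..., m_{nu+1} with m_1 = 1 and m_{nu+1} = M + 1.
    """
    encoded = [one_hot(g, r) for g in grades]
    dims = [
        add_vectors(encoded[starts[k] - 1:starts[k + 1] - 1])
        for k in range(len(starts) - 1)
    ]
    total = add_vectors(dims)
    return dims, total


# Hypothetical respondent: M = 5 items, nu = 2 dimensions, r = 5 grades.
dims, total = response_vectors([4, 5, 2, 2, 3], starts=[1, 3, 6], r=5)
print(dims)   # [[0, 0, 0, 1, 1], [0, 2, 1, 0, 0]]
print(total)  # [0, 2, 1, 1, 1]
```

Each component of a response vector counts how often the respondent chose the corresponding grade, so the components of the total vector sum to M.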
The above scheme can be used to describe the response on perception as
well as on expectation. The response of respondent i with respect to statement j in terms of the gap between perception and expectation is denoted
by a (2r − 1)-tuple

$$
Z_{ij} = (Z_{i,j,-(r-1)}, \ldots, Z_{i,j,0}, \ldots, Z_{i,j,r-1}) \tag{2}
$$

with entries from {0, 1}, $Z_{i,j,-(r-1)} + \cdots + Z_{i,j,r-1} = 1$. $Z_{ijl} = 1$, l < 0 means:
with respect to item j, the perception of respondent i is |l| degrees below
the expectation. $Z_{ijl} = 1$, l > 0 means: with respect to item j, the perception
of respondent i exceeds the expectation by l degrees. $Z_{ij0} = 1$ means: with
respect to item j, the perception of respondent i equals the expectation.
The above interpretation of gaps is consistent with an ordinal interpretation of the Likert scale. The cyphers −(r − 1), . . . , 0, . . . , r − 1 indicate
distances of expectation and perception in terms of degrees, not of magnitudes. These distances remain invariant under strictly monotonous scale
transformations.
The vectors

$$
Z_i^{(\rho)} = \sum_{j=m_\rho}^{m_{\rho+1}-1} Z_{ij}, \quad \rho = 1, \ldots, \nu, \qquad
Z_i = \sum_{\rho=1}^{\nu} Z_i^{(\rho)} = \sum_{j=1}^{M} Z_{ij} \tag{3}
$$

are the gap vectors of respondent i in dimensions ρ = 1, . . . , ν, respectively
the total gap vector of respondent i in all M items on the questionnaire.
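A gap vector can be built in the same way as a response vector. The sketch below is illustrative only: `gap_one_hot` and `total_gap_vector` are hypothetical names, and position 0 of the returned list corresponds to the gap −(r − 1), position r − 1 to the gap 0.

```python
def gap_one_hot(perception, expectation, r):
    """(2r - 1)-tuple marking the gap l = perception - expectation in degrees."""
    if not (1 <= perception <= r and 1 <= expectation <= r):
        raise ValueError("grades must lie in 1..r")
    l = perception - expectation          # l ranges over -(r-1), ..., r-1
    vec = [0] * (2 * r - 1)
    vec[l + r - 1] = 1                    # shift so that l = -(r-1) maps to index 0
    return vec


def total_gap_vector(perceptions, expectations, r):
    """Sum of the item gap vectors of one respondent over all items."""
    vectors = [gap_one_hot(p, e, r) for p, e in zip(perceptions, expectations)]
    return [sum(column) for column in zip(*vectors)]


# Hypothetical respondent with three items on a five-grade scale.
print(gap_one_hot(2, 5, 5))                          # [0, 1, 0, 0, 0, 0, 0, 0, 0]
print(total_gap_vector([2, 4, 5], [5, 4, 4], r=5))   # [0, 1, 0, 0, 1, 1, 0, 0, 0]
```

The vector records only how many grades lie between perception and expectation. It makes no claim about the magnitude of the difference in satisfaction.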
7. The Multinomial Model for Attitude Responses
The notation of Section 6 can be used to express survey evaluations based
on a cardinal view of the Likert scale by forming weighted sums and averages of scores. Under an ordinal interpretation, quantitative analysis is primarily interested in the proportions of respondents choosing a certain grade
on the attitude scale. In view of this interest, the multinomial distribution is
a natural stochastic model of response behaviour. The multinomial framework in terms of the multinomial logit is often used in explanatory modelling of choices or preferences, see Powers and Xie (1999). However, in the
analysis of sample surveys based on Likert scales the multinomial model is
not very popular, see Maravelakis et al. (2003) for instance.
An s-dimensional random vector Y has multinomial distribution $MULT(k, p_1, \ldots, p_s)$, briefly $Y \sim MULT(k, p_1, \ldots, p_s)$, if the probability density
function is given by

$$
P(Y = y) = \frac{k!}{y_1! \cdots y_s!}\, p_1^{y_1} \cdots p_s^{y_s}
\quad \text{for } y_1, \ldots, y_s \in \mathbb{N}_0,\ y_1 + \cdots + y_s = k, \tag{4}
$$

with parameters $k \in \mathbb{N}$ and $p_1, \ldots, p_s \ge 0$, $p_1 + \cdots + p_s = 1$. Formula (4) gives
the probability of choosing $y_l$ times the category l in k experiments where
the choice probability of choosing category l in one experiment is $p_l$.
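Formula (4) translates directly into code. The following is a minimal sketch using only the Python standard library; the function name `multinomial_pmf` and the example probabilities are assumptions for illustration.

```python
from math import factorial, prod


def multinomial_pmf(y, p):
    """Probability of the count vector y under MULT(k, p_1, ..., p_s), k = sum(y)."""
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    coefficient = factorial(sum(y))
    for y_l in y:
        coefficient //= factorial(y_l)    # stays an integer at every step
    return coefficient * prod(p_l ** y_l for p_l, y_l in zip(p, y))


# A single response (k = 1): the probability equals the choice probability.
print(multinomial_pmf((1, 0, 0), (0.2, 0.5, 0.3)))      # 0.2

# Four responses on three grades with hypothetical choice probabilities.
print(multinomial_pmf((2, 1, 1), (0.5, 0.25, 0.25)))    # 0.1875
```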
The above interpretation suggests to assume

$$
X_{ij} \sim MULT(1, p_{ij1}, \ldots, p_{ijr}), \qquad
Z_{ij} \sim MULT(1, q_{i,j,-(r-1)}, \ldots, q_{i,j,r-1}) \tag{5}
$$

for the response vector $X_{ij}$ of respondent i on item j and for the gap
vector $Z_{ij}$ between perception and expectation of respondent i on item j.
The choice probability $p_{ijl}$ respectively $q_{ijl}$ quantifies respondent i’s average
inclination to exhibit attitude l towards statement j.
Analogously, we assume multinomial distributions for dimension
responses or dimension gaps and for total responses and for total gaps, i.e.,

\begin{align}
X_i^{(\rho)} &\sim MULT(m_{\rho+1} - m_\rho,\ p_{i1}^{(\rho)}, \ldots, p_{ir}^{(\rho)}), \quad \rho = 1, \ldots, \nu, \tag{6} \\
X_i &\sim MULT(M,\ p_{i1}, \ldots, p_{ir}), \tag{7} \\
Z_i^{(\rho)} &\sim MULT(m_{\rho+1} - m_\rho,\ q_{i,-(r-1)}^{(\rho)}, \ldots, q_{i,r-1}^{(\rho)}), \quad \rho = 1, \ldots, \nu, \tag{8} \\
Z_i &\sim MULT(M,\ q_{i,-(r-1)}, \ldots, q_{i,r-1}). \tag{9}
\end{align}
Above, choice probabilities are indexed in the respondent i. In most cases,
choice probabilities are identical for groups of individuals or for the entire
population. We distinguish two assumptions:
(A1) Homogeneous population and sample: The choice probabilities are
invariant for all respondents from a given population, and in particular for all respondents 1, . . . , n. The respondent index i in formulae
(5) through (9) can be omitted.
(A2) Clustered population and sample: The choice probabilities are invariant in mutually exclusive subgroups (clusters) of the population, and
in particular in subgroups C1 , . . . , CQ , C1 ∪ · · · ∪ CQ = {1, . . . , n} of
respondents in the sample. Choice probabilities corresponding to
different clusters are different.
8. Estimation and Conﬁdence Intervals for Choice Probabilities in a
Homogeneous Population
We consider a homogeneous sample according to assumption (A1) where
the choice probabilities are identical for all respondents. Responses of
different individuals can be assumed to be independent. The respective vectors of choice probabilities, see formulae (5) through (9), are estimated by
the vectors
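
$$
\bar{X}_j = \frac{1}{n} \sum_{i=1}^{n} X_{ij}, \qquad
\bar{X}^{(\rho)} = \frac{1}{n(m_{\rho+1} - m_\rho)} \sum_{i=1}^{n} X_i^{(\rho)}, \qquad
\bar{X} = \frac{1}{nM} \sum_{i=1}^{n} X_i, \tag{10}
$$

$$
\bar{Z}_j = \frac{1}{n} \sum_{i=1}^{n} Z_{ij}, \qquad
\bar{Z}^{(\rho)} = \frac{1}{n(m_{\rho+1} - m_\rho)} \sum_{i=1}^{n} Z_i^{(\rho)}, \qquad
\bar{Z} = \frac{1}{nM} \sum_{i=1}^{n} Z_i, \tag{11}
$$

of empirical survey averages (empirical proportions). The components of
the survey averages are uniformly minimum variance unbiased (UMVU)
estimators for the corresponding choice probabilities, see for instance Lehmann
(1983) for the background in estimation theory.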
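The estimators are plain empirical proportions, as the following sketch shows for the item level and the pooled level. The names `item_proportions` and `pooled_proportions` and the small data set are assumptions for illustration; `responses[i][j]` is taken to hold the grade of respondent i on item j.

```python
def item_proportions(responses, j, r):
    """Empirical proportions of the grades 1..r for item j (zero-based column index)."""
    n = len(responses)
    column = [row[j] for row in responses]
    return [column.count(l) / n for l in range(1, r + 1)]


def pooled_proportions(responses, r):
    """Empirical proportions of the grades 1..r pooled over all n respondents and M items."""
    grades = [g for row in responses for g in row]
    return [grades.count(l) / len(grades) for l in range(1, r + 1)]


# Hypothetical survey: n = 4 respondents, M = 2 items, r = 5 grades.
responses = [[4, 5], [4, 3], [2, 4], [4, 4]]
print(item_proportions(responses, 0, r=5))   # [0.0, 0.25, 0.0, 0.75, 0.0]
print(pooled_proportions(responses, r=5))    # [0.0, 0.125, 0.125, 0.625, 0.125]
```

The pooled version is only a sensible estimate under the homogeneity assumption (A1), where the choice probabilities do not depend on the respondent.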
The accuracy of a parameter estimate is best supported by providing
a conﬁdence region at a sufﬁciently high conﬁdence level γ . In case of
a vector parameter, simultaneous confidence intervals are particularly useful since the accuracy of each component estimate can be evaluated separately. For the vector parameter $p = (p_1, \ldots, p_s)$ in an s-dimensional
multinomial distribution, the s simultaneous confidence intervals $I_1 = (LCL_1; UCL_1), \ldots, I_s = (LCL_s; UCL_s)$ at a nominal confidence level γ
have to satisfy

$$
P_p(p_1 \in I_1, \ldots, p_s \in I_s) \overset{!}{\ge} \gamma
\quad \text{for all values of } p = (p_1, \ldots, p_s). \tag{12}
$$
We consider an i.i.d. sample $Y_1, \ldots, Y_N$ of size N from a multinomial
distribution $MULT(1, p_1, \ldots, p_s)$. The confidence intervals for $p_l$ are
centered around the UMVU estimator for $p_l$, i.e., the sample average
(empirical proportion) $\bar{Y}_{\cdot l} = \frac{1}{N} \sum_{d=1}^{N} Y_{dl}$ with respect to component l.
Asymptotic simultaneous conﬁdence intervals for the choice probabilities have been constructed by Quesenberry and Hurst (1964) and Goodman
613
ORDINAL METHODOLOGY IN THE ANALYSIS OF LIKERT SCALES
(1965). Both Quesenberry and Hurst (1964) and Goodman (1965) suggest
for pl an interval with lower/upper limit
$$
LCL_l / UCL_l = \frac{z + 2N\bar Y_{\cdot l} \mp \sqrt{z\left(z + 4N\bar Y_{\cdot l}(1 - \bar Y_{\cdot l})\right)}}{2(N + z)}, \tag{13}
$$
where $z$ is a suitable $100\beta\%$ quantile of a $\chi^2$-distribution. Quesenberry and
Hurst (1964) show that the requirement (12) is satisfied asymptotically for
large $N$, i.e., the nominal confidence level is guaranteed asymptotically, by
choosing the $100\gamma\%$ quantile $z = z_{s-1}(\gamma)$ of the $\chi^2$-distribution $\chi^2(s-1)$
with $s - 1$ degrees of freedom. By a theorem of Wilks (1962) based on the
Bonferroni inequality, Goodman (1965) shows that (12) can be satisfied
asymptotically with narrower intervals by choosing the $100(1 - (1-\gamma)/s)\%$
quantile $z = z_1(1 - (1-\gamma)/s)$ of the $\chi^2$-distribution $\chi^2(1)$ with 1 degree of
freedom.
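A minimal sketch of formula (13) follows. The function names are illustrative assumptions. The interval routine takes the quantile $z$ as an argument, so it serves both the Quesenberry and Hurst choice (a $\chi^2(s-1)$ quantile, to be supplied from a table or statistical package) and the Goodman choice. For the latter, the sketch uses the fact that a $\chi^2(1)$ quantile is the square of a standard normal quantile.

```python
from math import sqrt
from statistics import NormalDist

def goodman_quantile(gamma, s):
    """100(1 - (1 - gamma)/s)% quantile of chi-square(1), via the standard normal quantile."""
    u = NormalDist().inv_cdf(1 - (1 - gamma) / (2 * s))
    return u * u

def simultaneous_intervals(proportions, N, z):
    """Lower and upper limits from formula (13) for each empirical proportion."""
    intervals = []
    for y in proportions:
        root = sqrt(z * (z + 4 * N * y * (1 - y)))
        centre = z + 2 * N * y
        intervals.append(((centre - root) / (2 * (N + z)),
                          (centre + root) / (2 * (N + z))))
    return intervals

# Hypothetical example: s = 3 degrees, N = 100 responses, confidence level 0.90.
props = [0.5, 0.3, 0.2]
z = goodman_quantile(0.90, len(props))
for p, (lo, hi) in zip(props, simultaneous_intervals(props, 100, z)):
    print(f"{p:.2f}: ({lo:.3f}; {hi:.3f})")
```

Unlike the intervals of formula (14), these limits are not symmetric around the estimate unless the estimate equals 0.5.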
Further simultaneous confidence intervals for the choice probabilities
are discussed in the literature. Bailey's (1980) approach based on a normalizing transformation of the estimators produces shorter intervals than
Goodman’s for small values of the estimators. The method of Sison and
Glaz (1995) is quite involved and cannot be used without software support.
Fitzpatrick and Scott (1987) suggest the simple intervals
$$
I_l = \left( \bar Y_{\cdot l} - \frac{c(\gamma)}{\sqrt{N}};\; \bar Y_{\cdot l} + \frac{c(\gamma)}{\sqrt{N}} \right), \tag{14}
$$

where $c(0.90) = 1.00$, $c(0.95) = 1.13$, $c(0.99) = 1.40$.
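Formula (14) is simple enough to compute by hand; a short sketch is given here for completeness. The dictionary holds only the three constants quoted above, and the function name is an assumption for illustration.

```python
from math import sqrt

# Constants c(gamma) as quoted in the text for formula (14).
C_GAMMA = {0.90: 1.00, 0.95: 1.13, 0.99: 1.40}

def fitzpatrick_scott(proportions, N, gamma=0.95):
    """Intervals of formula (14): the same half-width c(gamma)/sqrt(N) for every component."""
    half = C_GAMMA[gamma] / sqrt(N)
    return [(y - half, y + half) for y in proportions]

# Hypothetical example: N = 400 responses at confidence level 0.90.
print(fitzpatrick_scott([0.5, 0.3, 0.2], 400, gamma=0.90))
```

The sketch does not clip the limits to the range from 0 to 1, so for proportions near the boundary a limit can fall outside that range.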
May and Johnson (1997) compare the approaches of Quesenberry
and Hurst (1964), Goodman (1965), Fitzpatrick and Scott (1987), Sison
and Glaz (1995) and some more in a simulation study. Fitzpatrick and
Scott (1987) intervals are recommended for quick and rough calculations.
Quesenberry and Hurst (1964) intervals are wide and conservative, agreeing with formula (12) generally also for relatively small sample sizes. The
narrower Goodman (1965) intervals should be used only if the dimension
s of the multinomial distribution is small or if the expected occupancy in
each degree l = 1, . . . , s is at least 10.
To apply formulae (13) and (14) for constructing simultaneous conﬁdence intervals for choice probabilities in the setting introduced by
Section 7, the estimators Y ·l and the sample size N have to be identiﬁed
appropriately with parameters from formulae (10) and (11). The identifications can be found in Table I.
Table I. Interpretation of choice probabilities in formulae (5) through (9) for a homogeneous sample and corresponding estimators and sample sizes $N$ to be used in confidence interval formulae (13) and (14)

| Number $s$ of Likert degrees | Parameter | Estimator | Sample size |
|---|---|---|---|
| $s = r$ | $p_{jl}$, probability that a respondent chooses degree $l$ with respect to statement $j$ | $\bar Y_{\cdot l} = \bar X_{\cdot jl} = \frac{1}{n}\sum_{i=1}^{n} X_{ijl}$, sample average number of respondents choosing degree $l$ with respect to statement $j$ | $N = n$ |
| $s = r$ | $p_l^{(\rho)}$, average probability that a respondent chooses degree $l$ with respect to statements in dimension $\rho$ | $\bar Y_{\cdot l} = \bar X_{\cdot l}^{(\rho)} = \sum_{i=1}^{n} X_{il}^{(\rho)} / \left(n(m_{\rho+1} - m_\rho)\right)$, sample average number of respondents choosing degree $l$ with respect to statements in dimension $\rho$ | $N = n(m_{\rho+1} - m_\rho)$ |
| $s = r$ | $p_l$, average probability that a respondent chooses degree $l$ with respect to statements on the questionnaire | $\bar Y_{\cdot l} = \bar X_{\cdot l} = \frac{1}{nM}\sum_{i=1}^{n} X_{il}$, sample average number of respondents choosing degree $l$ with respect to statements on the questionnaire | $N = nM$ |
| $s = 2r - 1$ | $q_{jl}$, probability that a respondent chooses the gap degree $l$ with respect to statement $j$ | $\bar Y_{\cdot l} = \bar Z_{\cdot jl} = \frac{1}{n}\sum_{i=1}^{n} Z_{ijl}$, sample average number of respondents choosing gap degree $l$ with respect to statement $j$ | $N = n$ |
| $s = 2r - 1$ | $q_l^{(\rho)}$, average probability that a respondent chooses the gap degree $l$ with respect to statements in dimension $\rho$ | $\bar Y_{\cdot l} = \bar Z_{\cdot l}^{(\rho)} = \sum_{i=1}^{n} Z_{il}^{(\rho)} / \left(n(m_{\rho+1} - m_\rho)\right)$, sample average number of respondents choosing gap degree $l$ with respect to statements in dimension $\rho$ | $N = n(m_{\rho+1} - m_\rho)$ |
| $s = 2r - 1$ | $q_l$, average probability that a respondent chooses gap degree $l$ with respect to statements on the questionnaire | $\bar Y_{\cdot l} = \bar Z_{\cdot l} = \frac{1}{nM}\sum_{i=1}^{n} Z_{il}$, sample average number of respondents choosing the gap degree $l$ with respect to statements on the questionnaire | $N = nM$ |
9. Choice of Sample Size
A reasonable criterion for sample size selection is imposing an upper limit
on the width of simultaneous conﬁdence intervals, see Tortora (1978).
As in Section 8, above, we consider an i.i.d. sample $Y_1, \dots, Y_N$ of size
$N$ from a multinomial distribution $MULT(1, p_1, \dots, p_s)$. The length of the
intervals by Quesenberry and Hurst (1964) and Goodman (1965), see formula (13), is

$$
UCL_l - LCL_l = \frac{\sqrt{z\left(z + 4N\bar Y_{\cdot l}(1 - \bar Y_{\cdot l})\right)}}{N + z}, \tag{15}
$$

where $z$ is a suitable quantile of a $\chi^2$-distribution as discussed in Section 8.
The interval length depends on the estimator $\bar Y_{\cdot l}$ for $p_l$ and attains a maximum
at $\bar Y_{\cdot l} = 0.5$ with asymptotic length $\sqrt{z/N}$.
Sample size is determined according to the following criterion: The conﬁdence limits should not differ from the estimate by more than a prescribed
amount ε, i.e., the length of the conﬁdence interval for each pl , l = 1, . . . , s
should not exceed 2ε. Hence we obtain
$$
N = \frac{z}{4\varepsilon^2}. \tag{16}
$$
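The step from the interval length (15) to the sample size (16) can be spelled out as follows. This is a short worked derivation under the text's own large-$N$ approximation, not an additional result:

$$
\left.(UCL_l - LCL_l)\right|_{\bar Y_{\cdot l} = 0.5}
= \frac{\sqrt{z(z + N)}}{N + z}
= \sqrt{\frac{z}{N + z}}
\approx \sqrt{\frac{z}{N}} \le 2\varepsilon
\quad\Longleftrightarrow\quad
N \ge \frac{z}{4\varepsilon^2}.
$$

Since the length is largest at $\bar Y_{\cdot l} = 0.5$, a sample size chosen this way bounds the length of every one of the $s$ intervals.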
To identify the parameters $s$, $p_1, \dots, p_s$ and $N$ in the setting of Section 7,
consider Table I. The following Example 9.1 shows that sample size calculations by formula (16) are very sensitive with respect to the type of confidence intervals. For a conservative assessment of confidence, the technique
of Quesenberry and Hurst (1964) should be used.
9.1. Example
Consider a SERVQUAL survey based on a 7-grade Likert scale. Interest is
in estimating the choice probabilities $q_{j,-6}, \dots, q_{j,6}$ for gap degrees in each
item. Hence $s = 13$, $N = n$. Estimates should be accurate up to ±10% at
90% confidence.
Under the more conservative confidence intervals of Quesenberry and
Hurst (1964) we have to use the quantile $z = z_{12}(0.90) = 18.55$ of the
$\chi^2$-distribution $\chi^2(12)$ with 12 degrees of freedom. Hence from formula (16)
we obtain sample size $N = 464$.
Under the less conservative confidence intervals of Goodman (1965) we have to
use the quantile $z = z_1(0.9923) = 7.10$ of the $\chi^2$-distribution $\chi^2(1)$ with
1 degree of freedom. Hence from formula (16) we obtain sample size
$N = 177$.
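The arithmetic of Example 9.1 can be retraced with a few lines. In this sketch the $\chi^2(12)$ quantile is simply the value 18.55 quoted above, while the $\chi^2(1)$ quantile is recomputed as the square of a standard normal quantile. The function name and the rounding are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size(z, eps):
    """Formula (16): N = z / (4 eps^2), before rounding."""
    return z / (4 * eps ** 2)

s, gamma, eps = 13, 0.90, 0.10

# Quesenberry and Hurst: chi-square(12) quantile as quoted in the text.
z_qh = 18.55
# Goodman: chi-square(1) quantile at level 1 - (1 - gamma)/s, from the normal quantile.
z_g = NormalDist().inv_cdf(1 - (1 - gamma) / (2 * s)) ** 2

print(ceil(sample_size(z_qh, eps)))          # 464
print(round(z_g, 2), sample_size(z_g, eps))
```

For the Goodman quantile, formula (16) gives a value between 177 and 178 before rounding, in line with the sample size reported in the example.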
10. Testing the Equality of Choice Probabilities in a Homogeneous
Population
As in Section 8 we consider the survey as an independent sample from
a homogeneous population of respondents. The estimates together with
simultaneous conﬁdence intervals for the choice probabilities appearing in
formulae (10) and (11) provide a good insight into the attitudes of individuals in the population. Descriptive statistics like histograms, bar charts, pie
charts, Pareto charts should be used for presentation. In the present and
the subsequent Section 11 we present methods of statistical inference for
the comparison of choice probabilities: Tests of signiﬁcance for the equality
of choice probabilities, see below, and tests of signiﬁcance for rank orders
of choice probabilities, see Section 11.
The tests are based on an i.i.d. sample $Y_1, \dots, Y_n$ of size $n$ from a
multinomial distribution $MULT(M, p_1, \dots, p_s)$. The sum $Y = Y_1 + \dots + Y_n$
is a sufficient statistic for the probabilities $p_1, \dots, p_s$, see Lehmann
(1983). Hence testing can be based on $Y$, which has multinomial distribution $MULT(nM, p_1, \dots, p_s)$, see the reproduction theorem for the multinomial distribution in Appendix A.1. The identification of the quantities
$s$, $p_1, \dots, p_s$, $Y$ of the general scheme with the appropriate quantities in
the special settings of Section 7 is given in Table II.
We want to find out whether respondents prefer certain attitudes or
whether all among a given number of pairwise different attitudes $i_1, \dots, i_t$
have the same probability to be chosen. To this end we consider the equality hypothesis

$$
H: p_{i_1} = \dots = p_{i_t}. \tag{17}
$$

Similar to Fisher's well-known test for comparing binomial probabilities, a
reasonable test of (17) compares the results $Y_{i_1}, \dots, Y_{i_t}$ relative to the total
number $y = Y_{i_1} + \dots + Y_{i_t}$ of observed choices in degrees $i_1, \dots, i_t$. According to assertion (c) of Theorem A.2.1 in Appendix A.2, the conditional
distribution of $Y_{i_1}, \dots, Y_{i_t}$ under the condition $y = Y_{i_1} + \dots + Y_{i_t}$ is the multinomial distribution $MULT(y, \pi_1, \dots, \pi_t)$ where $\pi_l = p_{i_l}/(p_{i_1} + \dots + p_{i_t})$.
The equality hypothesis $H$ is equivalent to $H: \pi_1 = \dots = \pi_t = 1/t$.
Table II. Assignment of choice probabilities and statistics from formulae (5) through (9) to the general testing schemes of Sections 10 and 11

| Number $s$ of Likert degrees | Compared probabilities | Test statistic |
|---|---|---|
| $s = r$ | $p_{j1}, \dots, p_{jr}$, probabilities that a respondent chooses degree $1, \dots, r$ with respect to statement $j$ | $Y = \sum_{i=1}^{n} X_{ij}$, vector of total numbers of respondents choosing degrees $1, \dots, r$ with respect to statement $j$ |
| $s = r$ | $p_1^{(\rho)}, \dots, p_r^{(\rho)}$, average probabilities that a respondent chooses degrees $1, \dots, r$ with respect to statements in dimension $\rho$ | $Y = \sum_{i=1}^{n} X_i^{(\rho)}$, vector of total numbers of respondents choosing degree $1, \dots, r$ with respect to statements in dimension $\rho$ |
| $s = r$ | $p_1, \dots, p_r$, average probabilities that a respondent chooses degrees $1, \dots, r$ with respect to statements on the questionnaire | $Y = \sum_{i=1}^{n} X_i$, vector of total numbers of respondents choosing degrees $1, \dots, r$ with respect to statements on the questionnaire |
| $s = 2r - 1$ | $q_{j,-(r-1)}, \dots, q_{j,r-1}$, probabilities that a respondent chooses gap degree $-(r-1), \dots, r-1$ with respect to statement $j$ | $Y = \sum_{i=1}^{n} Z_{ij}$, vector of total numbers of respondents choosing gap degrees $-(r-1), \dots, r-1$ with respect to statement $j$ |
| $s = 2r - 1$ | $q_{-(r-1)}^{(\rho)}, \dots, q_{r-1}^{(\rho)}$, average probabilities that a respondent chooses gap degrees $-(r-1), \dots, r-1$ with respect to statements in dimension $\rho$ | $Y = \sum_{i=1}^{n} Z_i^{(\rho)}$, vector of total numbers of respondents choosing gap degree $-(r-1), \dots, r-1$ with respect to statements in dimension $\rho$ |
| $s = 2r - 1$ | $q_{-(r-1)}, \dots, q_{r-1}$, average probabilities that a respondent chooses gap degrees $-(r-1), \dots, r-1$ with respect to statements on the questionnaire | $Y = \sum_{i=1}^{n} Z_i$, vector of total numbers of respondents choosing gap degrees $-(r-1), \dots, r-1$ with respect to statements on the questionnaire |

A reasonable test statistic should be a measure of variation of the
responses $Y_{i_1}, \dots, Y_{i_t}$. Light and Margolin (1971) suggest the variation
measure

$$
V = V(Y_{i_1}, \dots, Y_{i_t}) = \frac{y}{2} - \frac{1}{2y}\sum_{l=1}^{t} Y_{i_l}^2 \tag{18}
$$

which is derived from Gini's (1955) well-known variation measure for categorical data. The equality hypothesis $H: \pi_1 = \dots = \pi_t = 1/t$ is rejected if the
sample variation is too large, i.e., if $V \ge c$. The p-value of this test under
sample realizations $Y_{i_1} = y_1, \dots, Y_{i_t} = y_t$, $y_1 + \dots + y_t = y$, is given by

$$
\frac{1}{t^y} \sum_{\substack{x_1, \dots, x_t \ge 0 \\ x_1 + \dots + x_t = y \\ V(x_1, \dots, x_t) \ge V(y_1, \dots, y_t)}} \frac{y!}{x_1! \cdots x_t!}. \tag{19}
$$
Further research on simplifying approximations of expression (19) is
necessary.
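For small totals $y$, expression (19) can be evaluated by brute-force enumeration. The sketch below follows formulae (18) and (19) as printed and assumes $y > 0$; the function names are illustrative. Because $V$ is largest when all counts are equal, the direction of the inequality in (19) is worth checking against Light and Margolin (1971) before the routine is used in practice.

```python
from math import factorial

def variation(counts):
    """Variation measure V from formula (18); requires a positive total."""
    y = sum(counts)
    return y / 2 - sum(c * c for c in counts) / (2 * y)

def compositions(y, t):
    """All t-tuples of non-negative integers summing to y."""
    if t == 1:
        yield (y,)
        return
    for first in range(y + 1):
        for rest in compositions(y - first, t - 1):
            yield (first,) + rest

def p_value_19(counts):
    """Expression (19) as printed, evaluated by enumeration."""
    y, t = sum(counts), len(counts)
    ss_obs = sum(c * c for c in counts)
    total = 0
    for x in compositions(y, t):
        # V(x) >= V(observed) is equivalent to sum(x_l^2) <= sum(y_l^2); compare in integers.
        if sum(v * v for v in x) <= ss_obs:
            coef = factorial(y)
            for v in x:
                coef //= factorial(v)
            total += coef
    return total / t ** y

print(variation([1, 1]), p_value_19([1, 1]), p_value_19([2, 0]))
```

The number of tuples to enumerate grows quickly with $y$ and $t$, which is consistent with the remark that simplifying approximations of (19) are needed.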
11. Ranking of Choice Probabilities in a Homogeneous Population
If choice probabilities are apparently not identical, major interest is in a
hypothesis on the rank order of the choice probabilities. Such a hypothesis may be formulated by the rank order of the empirical proportions in
the sample as expressed by a Pareto chart. Methods for conﬁrming this
hypothesis are required. Simultaneous conﬁdence regions are no help for
this purpose. In the sequel, we develop a method of testing the ranking
hypothesis by multiple comparisons as used in comparative treatment analysis, see Hsu (1996) for instance.
Consider the comparison of choice probabilities $p_1, \dots, p_s$, $\sum_l p_l = 1$. We
wish to confirm the composite rank order hypothesis

$$
K: p_{i_1} > p_{j_1}, \dots, p_{i_t} > p_{j_t} \tag{20}
$$

where $i_m \ne j_m$. To this end, we consider the negation $H = \neg K$ as the null
hypothesis. $K$ is confirmed by a significance test if $H$ can be rejected. $H$
is the disjunction $H = H_1 \cup \dots \cup H_t$ of the null hypotheses in the $t$ partial testing problems $H_1: p_{i_1} \le p_{j_1}$ against $K_1: p_{i_1} > p_{j_1}$, $H_2: p_{i_2} \le p_{j_2}$ against
$K_2: p_{i_2} > p_{j_2}$, and so on until $H_t: p_{i_t} \le p_{j_t}$ against $K_t: p_{i_t} > p_{j_t}$. We test $H$
against $K$ by successively testing $H_m$ against $K_m$. $H$ is rejected in favour of
$K$ if each $H_m$ is rejected in favour of $K_m$.
In the following Section 11.1 we describe the design of partial tests
under a prescribed level of signiﬁcance. The subsequent Section 11.2
considers the test of the composite hypothesis H against K.
11.1. Design of Partial Tests for Rank Order
Consider the partial problem

$$
H_m: p_{i_m} \le p_{j_m} \quad \text{against} \quad K_m: p_{i_m} > p_{j_m}, \tag{21}
$$

$m \in \{1, \dots, t\}$, where $p_{i_m} + p_{j_m} > 0$. An equivalent formulation of problem
(21) is

$$
H_m: \pi_m \le 0.5 \quad \text{against} \quad K_m: \pi_m > 0.5, \quad \text{where } \pi_m = \frac{p_{i_m}}{p_{i_m} + p_{j_m}}. \tag{22}
$$
Similar to Fisher's well-known test for comparing binomial probabilities,
a reasonable test of (21) compares the results $Y_{i_m}$, $Y_{j_m}$ relative to the total
number $y = Y_{i_m} + Y_{j_m}$ of observed choices in degrees $i_m$ and $j_m$. $H_m$ is
rejected in favour of $K_m$ if $Y_{i_m} > c$, i.e., if $Y_{i_m}$ accounts for too large a share of $y$.
We have to determine the critical value $c = c_{\alpha_0} \in \{0, \dots, y\}$ under a prescribed level of significance $0 < \alpha_0 < 1$. Clearly, in case of $0 = y = Y_{i_m} + Y_{j_m}$, $H_m$
cannot be rejected under any level $\alpha_0$, so formally $c = c_{\alpha_0} = 0 = y$ in case of
$y = 0$. Consider the case $0 < y = Y_{i_m} + Y_{j_m}$. According to assertion (d) of Theorem A.2.1 in Appendix A.2, the conditional distribution of $Y_{i_m}$ under the
condition $Y_{i_m} + Y_{j_m} = y$ is the binomial distribution $Bi(y, \pi_m)$. Hence (22)
can be tested by the well-known test for binomial probabilities. The critical
value $c = c_{\alpha_0}$ is determined as the minimum integer $c \in \{0, \dots, y\}$ satisfying
the inequalities
$$
1 - L_{y,c}(0.5) = 0.5^{y} \sum_{l=c+1}^{y} \binom{y}{l} \overset{!}{\le} \alpha_0 \overset{!}{<} 0.5^{y} \sum_{l=c}^{y} \binom{y}{l} = 1 - L_{y,c-1}(0.5), \tag{23}
$$
where Ly,c (0.5) is the distribution function of the binomial distribution
Bi(y, 0.5). These values are available in tables and are provided by any
modern statistical software package.
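As a minimal sketch of the search described by (23), the critical value can be found by stepping through $c = 0, 1, \dots, y$ until the upper tail of $Bi(y, 0.5)$ drops to $\alpha_0$ or below. The function names are assumptions for illustration.

```python
from math import comb

def upper_tail(y, c):
    """1 - L_{y,c}(0.5) = P(Y > c) for Y ~ Bi(y, 0.5)."""
    return sum(comb(y, l) for l in range(c + 1, y + 1)) / 2 ** y

def critical_value(y, alpha0):
    """Smallest c in {0, ..., y} with P(Y > c) <= alpha0, as required by (23)."""
    for c in range(y + 1):
        if upper_tail(y, c) <= alpha0:
            return c
    return y  # not reached: upper_tail(y, y) equals 0

print(critical_value(10, 0.05))  # 8
```

With $y = 10$ and $\alpha_0 = 0.05$ the tail probabilities are $56/1024$ for $c = 7$ and $11/1024$ for $c = 8$, so $H_m$ is rejected only if $Y_{i_m} > 8$. For $y = 0$ the routine returns 0, matching the convention stated above.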
11.2. Design of the Test for the Composite Rank Order Hypothesis
Let the significance level $0 < \alpha < 1$ be prescribed for a test of the composite
hypothesis $H$ against $K$. This level can be guaranteed by prescribing $\alpha_0 = \alpha/t$
for each of the $t$ partial problems $H_m$ against $K_m$. Let $R_m$ denote the event
that the $m$-th partial test, $m \in \{1, \dots, t\}$, rejects $H_m$ in favour of $K_m$, and let
$R$ denote the event that $H$ is rejected in favour of $K$. Then $P(R_m | H_m) \le \alpha/t$
by the design of the partial test and hence by the well-known Bonferroni
inequality

$$
P(R | H) = P(R_1 \cup \dots \cup R_t | H) \le \sum_{m=1}^{t} P(R_m | H) = \sum_{m=1}^{t} P(R_m | H_m) \le \alpha. \tag{24}
$$
11.3. The p-Value of the Tests for Rank Order
Under small $y = Y_{i_m} + Y_{j_m}$ it may be impossible to satisfy (23) with $\alpha_0 = \alpha/t$,
i.e., to guarantee a partial test of prescribed significance level $\alpha_0 = \alpha/t$. Hence
it may be more adequate to consider the p-value for rejecting $H = H_1 \cup \dots \cup H_t$
under sample realizations $Y_1 = y_1, \dots, Y_s = y_s$. The p-value of the partial
test of $H_m$ against $K_m$ with the rejection region of type $Y_{i_m} > c$ as described
in Section 11.1 is

$$
1 - L_{y_{i_m} + y_{j_m},\, y_{i_m} - 1}(0.5) = 0.5^{\,y_{i_m} + y_{j_m}} \sum_{l = y_{i_m}}^{y_{i_m} + y_{j_m}} \binom{y_{i_m} + y_{j_m}}{l}. \tag{25}
$$

By the Bonferroni inequality, an upper bound for the p-value of the test
of the composite hypothesis $H$ is

$$
\sum_{m=1}^{t} \left[ 1 - L_{y_{i_m} + y_{j_m},\, y_{i_m} - 1}(0.5) \right] = \sum_{m=1}^{t} 0.5^{\,y_{i_m} + y_{j_m}} \sum_{l = y_{i_m}}^{y_{i_m} + y_{j_m}} \binom{y_{i_m} + y_{j_m}}{l}. \tag{26}
$$
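Formulae (25) and (26) translate directly into a short computation. In this sketch each pair holds the observed counts $(y_{i_m}, y_{j_m})$ of one partial comparison. The function names are assumptions, and the cap at 1 is an addition made here because a bound above 1 carries no information.

```python
from math import comb

def partial_p_value(y_i, y_j):
    """Formula (25): P(Y >= y_i) for Y ~ Bi(y_i + y_j, 0.5)."""
    y = y_i + y_j
    return sum(comb(y, l) for l in range(y_i, y + 1)) / 2 ** y

def composite_p_bound(pairs):
    """Formula (26): Bonferroni upper bound, capped at 1 (the cap is not part of (26))."""
    return min(1.0, sum(partial_p_value(a, b) for a, b in pairs))

# Hypothetical counts for two comparisons: degree pairs with (8, 2) and (9, 1) choices.
print(partial_p_value(8, 2))
print(composite_p_bound([(8, 2), (9, 1)]))
```

For the counts shown, the partial p-values are $56/1024$ and $11/1024$, so the bound on the composite p-value is $67/1024$.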
11.4. Comparison of Choice Probabilities by Simultaneous Confidence Intervals
Choice probabilities $p_{i_m}$, $p_{j_m}$ may also be compared by simultaneous confidence intervals for the difference $p_{i_m} - p_{j_m}$ or the ratio $p_{i_m}/p_{j_m}$. Such simultaneous confidence intervals are provided by Goodman (1965).
12. In-Questionnaire Association
An important topic in survey data analysis is in-questionnaire association
or dependence, i.e., association or dependence between responses on certain items or dimensions or the entire questionnaire. Are responses tending to be similar or do they diverge? Particularly important is item-to-total
association, i.e., the relationship between responses on an individual item
and responses on the entire questionnaire. A detailed analysis goes beyond
the scope of the present paper. Important ordinal measures of association
are Spearman's well-known rank correlation coefficient, the gamma statistic
by Goodman and Kruskal (1954), Kendall's (1945) tau-b, and Somers' (1962)
d statistic. These measures should be investigated under the multinomial
response model to develop efﬁcient estimation and testing procedures.
13. Comparison of Vectors of Choice Probabilities
Sections 10 and 11 compare the choice probabilities in a given multinomial
parameter vector. A further important topic is the comparison of entire
parameter vectors. Two topics are interesting:
• Comparison of parameter vectors corresponding to questionnaire items
or questionnaire dimensions or the entire questionnaire. This topic is
related to in-questionnaire association, see Section 12, above.
• Comparison of parameter vectors corresponding to different respondents. Here, we question the assumption of a homogeneous population
of respondents, made throughout in Sections 8–11, see assumption (A1)
of Section 7.
Methods for comparing multinomial parameter vectors are provided by
literature. A simple approach is to use Pearson’s χ 2 -test, see Clason and
Dormody (1994). Light and Margolin (1971) and Margolin and Light
(1974) present an ANOVA scheme which tests the hypothesis of equality
of m multinomial probability vectors. A survey of measures of agreement
between respondents is provided by Adejumo et al. (2004).
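As an illustration of the simplest approach mentioned above, the following sketch computes Pearson's $\chi^2$ statistic for the equality of several multinomial probability vectors from a table of counts, with one row per group and one column per degree. The function name is an assumption; the sketch returns only the statistic and its degrees of freedom and assumes that every row and column total is positive.

```python
def pearson_chi_square(table):
    """Pearson statistic and degrees of freedom for a groups-by-degrees table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under equality of the probability vectors.
            expected = row_tot[i] * col_tot[j] / total
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: two groups of respondents, two degrees.
print(pearson_chi_square([[10, 20], [20, 10]]))
```

The statistic would then be compared with a quantile of the $\chi^2$-distribution with the returned degrees of freedom, taken from a table or a statistical package.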
In practice, populations of respondents are often inhomogeneous, i.e.,
the hypothesis of equality of the n multinomial probability vectors of
n respondents will often be rejected. Groups (clusters) of customers,
consumers, patients, social classes, age brackets, genders, may differ substantially in their attitudes. Some distinguishing factors may be quite obvious and known beforehand so that stratiﬁed surveys may be conducted.
In other cases, however, stratiﬁed sampling is impossible. Major obstacles
are: (1) Unknown factors. (2) Known factors, but unknown distribution of
factors in the population. (3) Practical unfeasibility of stratiﬁcation, e.g.,
due to economic restrictions. In such cases, groups of substantially different
respondents have to be identiﬁed from survey data.
Standard model free clustering algorithms based on distance measures
can contribute to clustering in attitude surveys. However, model based clustering is generally more efﬁcient. By means of assumption (A2), the multinomial model of Section 7 can describe clusters as groups sharing the
same vector of choice probabilities. Recently, advances in clustering multinomial samples have been made in genetics, see Medvedovic et al. (2000).
This type of probabilistic clustering has the potential to be more efﬁcient than model free techniques. The description of multinomial clustering goes beyond the scope of the present paper. However, in view of the
potential of such methods business statistics should reﬂect and adopt such
approaches.
14. Conclusion
The problem of attitude measuring suggests an ordinal interpretation of
the Likert scale. The above Sections 7–13 show that plenty of proper ordinal methods exist for the analysis of data measured in Likert scales. However, such methods are not as easily available in textbooks and statistical
packages as cardinal statistics. Some new methods were introduced in Sections 8, 10 and 11. Further work should concentrate on developing convenient and customized versions of ordinal statistics and on propagating
these among researchers and practitioners.
Appendix
A. Properties of Multinomial Random Vectors
A.1. Reproduction Theorem
Sums of independent multinomial random vectors with identical vectors of
choice probabilities follow a multinomial distribution:
THEOREM A.1.1. Let $Y_1, \dots, Y_n$ be $s$-dimensional independent random
vectors, each with multinomial distribution $MULT(k_i, p_1, \dots, p_s)$.
Then the sum $\sum_{i=1}^{n} Y_i$ has multinomial distribution $MULT\left(\sum_{i=1}^{n} k_i, p_1, \dots, p_s\right)$. For a proof of Theorem A.1.1 see Wilks (1962).
A.2. Marginal and Conditional Distributions
The following Theorem A.2.1 gives marginal and conditional distributions
in a multinomial vector.
THEOREM A.2.1. Let $Y = (Y_1, \dots, Y_r)$ be an $r$-dimensional random vector with multinomial distribution $MULT(N, p_1, \dots, p_r)$. Let $1 \le i_1 < \dots < i_m \le r$. Then we have the following results:
(a) For $y_{i_1}, \dots, y_{i_m} \ge 0$, $y_{i_1} + \dots + y_{i_m} = y \le N$ we have

$$
P(Y_{i_1} = y_{i_1}, \dots, Y_{i_m} = y_{i_m}) = \frac{N!}{y_{i_1}! \cdots y_{i_m}!\,(N - y)!}\, p_{i_1}^{y_{i_1}} \cdots p_{i_m}^{y_{i_m}} (1 - p_{i_1} - \dots - p_{i_m})^{N - y}. \tag{A.1}
$$

(b) The sum $Y_{i_1} + \dots + Y_{i_m}$ follows the binomial distribution $Bi(N, p_{i_1} + \dots + p_{i_m})$.
(c) Let $1 \le y \le N$. Then the conditional distribution of the $m$-dimensional
random vector $(Y_{i_1}, \dots, Y_{i_m})$ under the condition $Y_{i_1} + \dots + Y_{i_m} = y$ is
the multinomial distribution $MULT(y, \pi_1, \dots, \pi_m)$ where

$$
\pi_l = \frac{p_{i_l}}{p_{i_1} + \dots + p_{i_m}} \quad \text{for } l = 1, \dots, m. \tag{A.2}
$$

(d) Let $1 \le y \le N$, $m = 2$. Then the conditional distribution of the univariate random variable $Y_{i_1}$ under the condition $Y_{i_1} + Y_{i_2} = y$ is the binomial distribution $Bi(y, p_{i_1}/(p_{i_1} + p_{i_2}))$.
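Assertion (d) can be checked numerically on a small case. The sketch below is an illustration rather than a proof: it takes a hypothetical three-category multinomial vector, computes the conditional distribution of $Y_1$ given $Y_1 + Y_2 = y$ directly from the multinomial probabilities, and compares it with $Bi(y, p_1/(p_1 + p_2))$. All names and parameter values are assumptions.

```python
from math import comb, factorial

def multinomial_pmf(counts, probs):
    """Probability of the count vector under MULT(sum(counts), probs)."""
    coef = factorial(sum(counts))
    p = 1.0
    for c, q in zip(counts, probs):
        coef //= factorial(c)
        p *= q ** c
    return coef * p

def conditional_pmf(k, y, N, probs):
    """P(Y1 = k | Y1 + Y2 = y) for a three-category vector with distribution MULT(N, probs)."""
    joint = multinomial_pmf((k, y - k, N - y), probs)
    marginal = sum(multinomial_pmf((a, y - a, N - y), probs) for a in range(y + 1))
    return joint / marginal

def binomial_pmf(k, y, pi):
    return comb(y, k) * pi ** k * (1 - pi) ** (y - k)

N, probs, y = 6, (0.2, 0.3, 0.5), 4
pi = probs[0] / (probs[0] + probs[1])
for k in range(y + 1):
    # The two columns agree up to floating-point rounding.
    print(k, conditional_pmf(k, y, N, probs), binomial_pmf(k, y, pi))
```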
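The proof of Theorem A.2.1 makes use of the well-known theorem on
multinomial expansions:

$$
(p_{j_1} + \dots + p_{j_k})^{z} = \sum_{\substack{y_{j_1}, \dots, y_{j_k} \ge 0 \\ y_{j_1} + \dots + y_{j_k} = z}} \frac{z!}{y_{j_1}! \cdots y_{j_k}!}\, p_{j_1}^{y_{j_1}} \cdots p_{j_k}^{y_{j_k}}. \tag{A.3}
$$

Hence formula (A.1) in assertion (a) of Theorem A.2.1 is obtained by calculating

$$
P(Y_{i_1} = y_{i_1}, \dots, Y_{i_m} = y_{i_m}) = \frac{N!}{y_{i_1}! \cdots y_{i_m}!\,(N - y)!}\, p_{i_1}^{y_{i_1}} \cdots p_{i_m}^{y_{i_m}} \sum_{\substack{y_{j_1}, \dots, y_{j_{r-m}} \ge 0 \\ y_{j_1} + \dots + y_{j_{r-m}} = N - y}} \frac{(N - y)!}{y_{j_1}! \cdots y_{j_{r-m}}!}\, p_{j_1}^{y_{j_1}} \cdots p_{j_{r-m}}^{y_{j_{r-m}}},
$$

where $j_1, \dots, j_{r-m}$ denote the indices not among $i_1, \dots, i_m$. By (A.3), the sum equals $(p_{j_1} + \dots + p_{j_{r-m}})^{N-y} = (1 - p_{i_1} - \dots - p_{i_m})^{N-y}$.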
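For the proof of assertion (b) let $0 \le y \le N$. Using formula (A.3) on
multinomial expansion we obtain

$$
\begin{aligned}
P(Y_{i_1} + \dots + Y_{i_m} = y)
&= \sum_{\substack{y_{i_1}, \dots, y_{i_m} \ge 0 \\ y_{i_1} + \dots + y_{i_m} = y}} \frac{N!}{y_{i_1}! \cdots y_{i_m}!\,(N - y)!}\, p_{i_1}^{y_{i_1}} \cdots p_{i_m}^{y_{i_m}} (1 - p_{i_1} - \dots - p_{i_m})^{N - y} \\
&= \binom{N}{y} (1 - p_{i_1} - \dots - p_{i_m})^{N - y} \sum_{\substack{y_{i_1}, \dots, y_{i_m} \ge 0 \\ y_{i_1} + \dots + y_{i_m} = y}} \frac{y!}{y_{i_1}! \cdots y_{i_m}!}\, p_{i_1}^{y_{i_1}} \cdots p_{i_m}^{y_{i_m}} \\
&= \binom{N}{y} (p_{i_1} + \dots + p_{i_m})^{y} (1 - p_{i_1} - \dots - p_{i_m})^{N - y}.
\end{aligned}
$$

This proves assertion (b).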
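Investigating the conditional distribution considered in assertion (c) we
obtain for $y_{i_1}, \dots, y_{i_m} \ge 0$, $y_{i_1} + \dots + y_{i_m} = y$:

$$
\begin{aligned}
P(Y_{i_1} = y_{i_1}, \dots, Y_{i_m} = y_{i_m} \mid Y_{i_1} + \dots + Y_{i_m} = y)
&= \frac{P(Y_{i_1} = y_{i_1}, \dots, Y_{i_m} = y_{i_m})}{P(Y_{i_1} + \dots + Y_{i_m} = y)} \\
&= \frac{\dfrac{N!}{y_{i_1}! \cdots y_{i_m}!\,(N - y)!}\, p_{i_1}^{y_{i_1}} \cdots p_{i_m}^{y_{i_m}} (1 - p_{i_1} - \dots - p_{i_m})^{N - y}}{\dbinom{N}{y} (p_{i_1} + \dots + p_{i_m})^{y} (1 - p_{i_1} - \dots - p_{i_m})^{N - y}} \\
&= \frac{y!}{y_{i_1}! \cdots y_{i_m}!} \left( \frac{p_{i_1}}{p_{i_1} + \dots + p_{i_m}} \right)^{y_{i_1}} \cdots \left( \frac{p_{i_m}}{p_{i_1} + \dots + p_{i_m}} \right)^{y_{i_m}}.
\end{aligned}
$$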
Acknowledgements
This paper is supported by funding from the "Growth" program of the
European Community and was prepared in collaboration by member
organizations of the Thematic Network Pro-ENBIS, EC contract number
G6RT-CT-2001-05059.
References
Adams, E., Fagot, R. F. & Robinson, R. E. (1965). A theory of appropriate statistics. Psychometrika 30 (2): 99–127.
Adejumo, A. O., Heumann, C. & Toutenburg, H. (2004). A review of agreement measure as a subset of association measure between raters. SFB386-Discussion Paper 385,
Ludwig-Maximilians-Universität, München, Germany.
Andrews, F. M., Klem, L., Davidson, T. N., O’Malley, P. M. & Rodgers, W. L. (1981). A
Guide for Selecting Statistical Techniques for Analyzing Social Science Data. Ann Arbor:
Institute for Social Research, University of Michigan.
Asubonteng, A., McClearly, K. J. & Swan, J. E. (1996). SERVQUAL revisited: A critical
review of service quality. J. Service. Market. 10 (6): 62–81.
Bailey, B. J. R. (1980). Large sample simultaneous conﬁdence intervals for the multinomial probabilities based on transformations of the cell frequencies. Technometrics 22 (4):
583–589.
Baker, B. O., Hardyck, C. D. & Petrinovich, L. F. (1986). Weak measurements versus strong
statistics: An empirical critique of S. S. Stevens’ proscriptions on statistics. Education.
Psychol. Measure. 26: 291–309.
Clason, D. L. & Dormody, T. J. (1994). Analyzing data measured by individual likert-type
items. J. Agric. Education 35 (4): 31–35.
Diener, E.(1984). Subjective well-being. Psychol. Bull. 95: 542–575.
Diener, E., Emmons, R. A., Larsen, R. J. & Grifﬁn, S. (1985). The satisfaction with life scale.
J. Personal. Assess. 49 (1): 71–75.
Diener, E., Suh, E. M., Lucas, R. E. & Smith, H. L. (1999). Subjective well-being: Three
decades of progress. Psychol. Bull. 125 (2): 276–302.
Ferrer-I-Carbonell, A. & van Praag, B. M. S. (2003). Income satisfaction inequality and its
Causes. J. Econ. Inequality 1: 107–127.
Fitzpatrick, S. & Scott, A. (1987). Quick simultaneous conﬁdence intervals for multinomial
proportions. J. Am. Stat. Assoc. 82: 399.
Gini, C. W. (1955). Variabilità e Concentrazione. Vol. 1: Memorie di metodologia statistica.
Georgescu-Roegen, N. (1968). Utility. in International Encyclopedia of Social Sciences, Vol.
16, New York: MacMillan Co. & The Free Press, pp. 236–267.
Goodman, L. A. (1965). On simultaneous conﬁdence intervals for multinomial proportions.
Technometrics 7 (2): 247–254.
Goodman, L. A. & Kruskal, W. H. (1954). Measuring of association for cross classiﬁcations.
J. Am. Stat. Assoc. 49: 732–768.
Hart, M. C. (1996). Improving the discrimination of SERVQUAL by using magnitude scaling. In G. K. Kanji (ed.), Total Quality Management in Action. London: Chapman & Hall.
Hart, M. C. (1999). The quantification of patient satisfaction. In Managing Quality: Strategic Issues in Health Care Management. H. T. O. Davies, M. Tavakoli, M. Malek, and
A. Neilson (eds.), Aldershot: Ashgate.
Hsu, J. C. (1996). Multiple Comparisons. Theory and Methods. Boca Raton, London:
Chapman & Hall
Kendall, M. G. (1945). The treatment of ties in rank problems. Biometrika, 33: 239–251.
Lehmann, E. L. (1959). Testing Statistical Hypotheses. New York: John Wiley & Sons.
Lehmann, E. L. (1983). Theory of Point Estimation. New York: John Wiley & Sons.
Light, J. & Margolin, B. H. (1971). An analysis of variance for categorical data. J. Am. Stat.
Assoc. 66 (335): 534–544.
Likert, R. (1932). A technique for the measurement of attitudes. J. Social. Psychol. 5:
228–238.
Lodge, M. (1981). Magnitude scaling. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-025, Beverly Hills and London: Sage Publications
Lord, F. M. (1953). On the statistical treatment of football numbers. Am. Psychol. 8:
750–751.
Lucas, R. E., Clark, A. E., Georgellis, Y. & Diener, E. (2003). Reexamining adaptation and
the set point model of happiness: Reactions to changes in marital status. J. Personal.
Social Psychol. 84 (3): 527–539.
Luce, R. D. (1959). On the possible psychophysical laws. Psychol. Rev. 66: 81–95.
Luce, R. D., Krantz, D. H., Suppes, P. & Tversky, A. (1990). Foundations of Measurement.
Vol. III. New York: Academic Press.
Maravelakis, P. E., Perakis, M., Psarakis, S. & Panaretos, J. (2003). The use of indices in
surveys. Qual. Quant. 37: 1–19.
Margolin, B. H. & Light, J. (1974). An analysis of variance for categorical data, II. J. Am.
Stat. Assoc. 69 (347): 755–764.
May, W. L. & Johnson, W. D. (1997). Properties of simultaneous conﬁdence intervals for
multinomial proportions. Commun. Stat. Simulat. Comput. 26 (2): 495–518.
Medvedovic, M., Succop, P., Shukla, R. & Dixon, K. (2000). Clustering mutational spectra
via classiﬁcation Likelihood and Markov chain Monte Carlo algorithms. J. Agric. Biol.
Environ. Stat. 6 (1): 19–37.
Nolte, E. & McKee, M. (2004). Changing health inequalities in east and west Germany since
uniﬁcation. Social Sci. Med. 58 (1): 119–136.
Parasuraman, A., Berry, L. L. & Zeithaml, V. A. (1991). Reﬁnement and assessment of the
SERVQUAL. J. Retail. 67(4): 420–449.
Parasuraman, A., Zeithaml, V. A. & Berry, L. L. (1985). A conceptual model for service
quality and its implication for future research. J. Market. 41–50.
Parasuraman, A., Zeithaml, V. A. & Berry, L. L. (1988). SERVQUAL: A multiple-item scale
for measuring consumer perceptions of service quality. J. Retail. 64 (1): 12–40.
Powers, D. & Xie, Y. (1999). Statistical Methods for Categorical Data Analysis. Academic
Press, Inc.
Quesenberry, C. P. & Hurst, D. C. (1964). Large sample simultaneous conﬁdence intervals
for multinomial proportions. Technometrics 6 (2): 191–195.
Ronellenﬁtsch, U. & Razum, R. (2004). Deteriorating health satisfaction among immigrants
from eastern Europe to Germany. Int. J. Equity Health 3 (1):4
Savage, I. R. (1957). Nonparametric Statistics. J. Am. Stat. Assoc. 52: 331–334.
Sen, A. (1999). The possibility of social choice. Am. Econ. Rev. 89: 349–378.
Sison, C. P. & Glaz, J. (1995). Simultaneous conﬁdence intervals and sample size determination for multinomial proportions. J. Am. Stat. Assoc. 90 (429): 366–369.
Somers, R. H. (1962). A new asymmetric measure of association of ordinal variables. Am.
Sociol. Rev. 27: 799–811.
Stevens, S. S. (1946). On the theory of scales of measurements. Science 103: 677–680.
Stevens, S. S. (1951). Mathematics, measurement and psychophysics. In Handbook of Experimental Psychology. S. S. Stevens (ed.), New York: John Wiley & Sons pp. 1–49.
Tortora, R. D. (1978). A note on sample size estimation for multinomial populations. Am.
Stat. 32 (3): 100–102.
Townsend, J. T. & Ashby, F. G. (1984). Measurement scales and statistics: The misconception misconceived. Psychol. Bull. 96 (2): 394–401.
Tukey, J.W. (1961). Data analysis and behavioral science or learning to bear the quantitative
burden by Shunning Badmandments. In The Collected Works of John W. Tukey, Vol. III,
L. V. Jones (ed.), Belmont: Wadsworth. pp. 391–484.
van Praag, B. M. S. (1999). Ordinal and cardinal utility. J. Economet. 50: 69–89.
Velleman, P. F. & Wilkinson, L. (1993). Nominal, ordinal, interval, and ratio typologies are
misleading. Am. Stat. 47 (1): 65–72.
Watson, D., Clark, L. & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The panas scales. J. Personal. Social Psychol. 54 (6):
1063–1070.
Wilks, S. S. (1962). Mathematical Statistics. New York: John Wiley & Sons
Wright, D. B. (1997). Football standings and measurement levels. Statistician 46 (1):
105–110.