TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Summary of measurement validation techniques (17:01)
In
this video all key terms and concepts of reliability and validity are
summarized. Reliability and validity can be assessed using multiple
different techniques and this video will provide an overview of some of
those techniques in a summary form and then present you some suggestions
on which ones to use and which ones not to use.
Transcript
Reliability
and validity can be assessed using multiple different techniques and
this video will provide an overview of some of those techniques in a
summary form and then present you some suggestions on which ones to use
and which ones not to use.
For validity: measurement validation is a conceptual argument, so you cannot test validity empirically. You can provide some evidence that supports your validity claim, but whether variation in the construct causes variation in the indicators is ultimately a theoretical argument that you cannot prove empirically. Nevertheless, there are some techniques that are commonly applied.
Factor analysis is the most important technique for assessing validity. Factor analysis can be used for three different purposes. One is to assess whether the indicators have a common cause. If the indicators share common variation, that is an indication that they could measure the same thing.
The second purpose is to assess common method variance. If one factor explains the majority of the correlations in the data, that is an indication that your measurement method could be driving the correlations. There are also more refined techniques that allow you to assess method variance and the actual factors at the same time.
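As a rough illustration of the simplest version of this check, here is a minimal sketch in Python. The factor_analyzer package, the file name, and the data are assumptions for the example, not part of the lecture.

```python
# Minimal sketch of a one-factor check for method variance.
# Assumes the factor_analyzer package; "survey_items.csv" is a hypothetical
# file with one column per survey item.
import pandas as pd
from factor_analyzer import FactorAnalyzer

items = pd.read_csv("survey_items.csv")

fa = FactorAnalyzer(n_factors=1, rotation=None)  # single unrotated factor
fa.fit(items)

ss_loadings, prop_var, cum_var = fa.get_factor_variance()
print(f"Share of variance explained by one factor: {prop_var[0]:.2f}")
# If a single factor accounts for the majority of the common variance, that
# hints that the measurement method, rather than the constructs, may be
# driving the correlations.
```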
Factor analysis is also used for discriminant validity assessment. The idea of discriminant validity is that two scales are discriminant valid, or empirically distinct, if the factor correlation is well below one. If two factors correlate at 0.99, for example, then it is difficult to claim that those two factors represent two different things. So discriminant validity is about whether two scales are empirically distinct. Factor analysis also makes some assumptions that must be checked. Some of them cannot be checked, in which case they have to be justified based on existing theory.
Exploratory factor analysis assumes that all relationships are linear and that the error terms in the factor analysis are independent. The confirmatory factor analysis approach is more flexible. It only assumes that the model is correctly specified: you can model nonlinear relationships, in which case you would be doing item response theory analysis, or you can model correlated measurement errors, second-order factors, and so on. But it is important that the model is correctly specified, because otherwise it is not a proper test of your theory. Minimal reporting for exploratory factor analysis includes which factor rotation technique you used. You should always use the direct oblimin rotation.
Then the factor loading pattern: if you have ten indicators and four factors, you report a table of four columns and ten rows containing a factor loading for each indicator-factor pair, and you highlight the highest loadings so that it is easier to see the pattern.
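To make that reporting concrete, here is a minimal exploratory factor analysis sketch in Python. The factor_analyzer package, the file name, and the item names are assumptions for the example, not part of the lecture.

```python
# Minimal EFA sketch: direct oblimin rotation, loading pattern, factor correlations.
# Assumes the factor_analyzer package; "survey_items.csv" is a hypothetical file
# with the ten indicators as columns.
import pandas as pd
from factor_analyzer import FactorAnalyzer

items = pd.read_csv("survey_items.csv")

fa = FactorAnalyzer(n_factors=4, rotation="oblimin")  # direct oblimin, as recommended
fa.fit(items)

# Loading pattern: ten rows (indicators) by four columns (factors).
loadings = pd.DataFrame(fa.loadings_, index=items.columns,
                        columns=[f"F{i + 1}" for i in range(4)])
print(loadings.round(2))  # highlight the largest loading per row when reporting

# Factor correlations from the oblique rotation; values well below 1
# support discriminant validity.
print(pd.DataFrame(fa.phi_).round(2))
```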
In confirmatory factor analysis you report the estimated factor loadings and the chi-square statistic, its degrees of freedom, and the p-value. If the p-value rejects the model, then you also need to report what kind of diagnostics you did for the confirmatory factor analysis.
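Below is a minimal confirmatory factor analysis sketch in Python. It assumes the semopy package and a hypothetical two-factor model with indicators x1 to x6; the chi-square, degrees of freedom, and p-value would be read from the fit statistics it prints.

```python
# Minimal CFA sketch. Assumes the semopy package; the factor structure and
# indicator names (F1, F2, x1...x6) are hypothetical.
import pandas as pd
import semopy

items = pd.read_csv("survey_items.csv")  # hypothetical data

model_desc = """
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
"""
model = semopy.Model(model_desc)
model.fit(items)

print(model.inspect(std_est=True))  # estimated and standardized factor loadings
print(semopy.calc_stats(model).T)   # fit statistics, including chi-square, df, p-value
```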
The second technique is construct validity assessment with regression analysis and correlations. The idea of this technique is that you have different measures that could correlate, and they are supposed to measure different things. You also have a theoretical expectation of how the constructs that those indicators measure behave. These theoretical expectations are called the nomological network: the causal relationships, their directions, and their strengths. You then compare whether the empirical relationships between the measures match the theoretical expectations. If they do, you conclude that you may have construct validity.
The assumption here is that the nomological network, that is, all the theoretical relationships, is known a priori, and this is very difficult to satisfy in practice, because we are typically testing new theory in articles. If our theory is new and has never been tested before, how could we possibly know that it is correct? We can't.
The minimum reporting is the regression coefficient, its standard error, and the p-value. These are normally not reported as a table. Instead, they are reported in the text: you state which relationship you expect, followed by parentheses containing the regression coefficient, standard error, and p-value, telling whether the expected relationship was observed or not. Then you discuss whether the regression coefficients match your theoretical expectations.
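A minimal sketch of how those numbers could be obtained for one expected relationship, assuming statsmodels and hypothetical scale scores called satisfaction and commitment:

```python
# Minimal sketch of checking one nomological-network relationship and pulling
# the numbers reported in text. Assumes statsmodels; "scale_scores.csv" and the
# variable names are hypothetical.
import pandas as pd
import statsmodels.api as sm

scores = pd.read_csv("scale_scores.csv")

X = sm.add_constant(scores[["satisfaction"]])
res = sm.OLS(scores["commitment"], X).fit()

b = res.params["satisfaction"]
se = res.bse["satisfaction"]
p = res.pvalues["satisfaction"]
print(f"Satisfaction predicted commitment (b = {b:.2f}, SE = {se:.2f}, p = {p:.3f})")
```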
Then there are theoretical arguments. These are rarely seen, but they are very important. The idea of validity is that variation in the construct causes variation in the items; that is one definition of measurement validity.
The theoretical arguments must answer the question of why we should expect variation in the construct to cause variation in the data. So we have to explain the process through which the construct causes people to respond to the survey in a particular way, for example. The assumption is that the argument is logical and supported by prior theory.
Then we have principal component analysis, which is sometimes used, but it is not useful for measurement validation. People incorrectly apply principal component analysis as a factor analysis technique. It is not a factor analysis technique; it is a data summarization technique, and it is not useful for any of the purposes that we use factor analysis for.
Then reliability. For reliability we have to consider two important things: the scale scores that we calculate as the mean or sum of the items, and the need to quantify the reliability of those scale scores, which we do using reliability indices. So reliability indices tell us what the reliability of a scale score is.
There are many different types of reliability indices, and they differ in the assumptions that they make. The most commonly used is tau-equivalent reliability, or coefficient alpha, which assumes that the indicators are unidimensional measures of one thing, that all items are equally reliable, and that measurement error is purely random.
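For example, coefficient alpha for one scale could be computed as follows. This is a minimal sketch assuming the pingouin package and a hypothetical data file; neither comes from the lecture.

```python
# Minimal sketch of tau-equivalent reliability (coefficient alpha).
# Assumes the pingouin package; "scale_items.csv" is a hypothetical file
# holding the items of one unidimensional scale.
import pandas as pd
import pingouin as pg

scale_items = pd.read_csv("scale_items.csv")

alpha, ci = pg.cronbach_alpha(data=scale_items)
print(f"alpha = {alpha:.2f}, 95% CI = {ci}")
```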
The second most popular is congeneric reliability, also called composite reliability or coefficient omega, which assumes unidimensionality and random measurement error. The difference is that congeneric reliability allows the indicators to differ in their individual reliabilities. The minimum reporting is to explain why you chose a particular index, to justify its assumptions and explain how they were checked, if they were, and to report the actual value of the index.
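A minimal sketch of computing coefficient omega from the standardized loadings of a one-factor model, assuming the factor_analyzer package and hypothetical data:

```python
# Minimal sketch of congeneric reliability (coefficient omega) from the
# standardized loadings of a one-factor model. Assumes the factor_analyzer
# package; "scale_items.csv" is a hypothetical file of one scale's items.
import pandas as pd
from factor_analyzer import FactorAnalyzer

scale_items = pd.read_csv("scale_items.csv")

fa = FactorAnalyzer(n_factors=1, rotation=None)
fa.fit(scale_items)
lam = fa.loadings_[:, 0]  # standardized loadings (analysis of the correlation matrix)

# omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
omega = lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam ** 2).sum())
print(f"omega = {omega:.2f}")
```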
Then we have test-retest correlation, which can be used to assess the reliability of individual measures or scale scores. The idea is that we measure one thing now and then measure the same thing a week later. If the two measures correlate, that is an indication of reliability. This technique makes two assumptions. First, the delay between the measures must be sufficiently long that the informant forgets their previous answer. If we remember what we answered the last time we saw a question, then test-retest doesn't work; it assumes that we don't remember the previous answer, and that is the reason for the delay. Second, it assumes that the delay is not too long, so that the trait that we are measuring stays relatively stable. If we measure a child's height now and again two years from now, the fact that those two measures differ is not an indication of unreliability; it is an indication that the kid has grown during that time. So the trait must be stable. In the minimum reporting you should justify the delay, arguing that it is neither too long nor too short, and then report the actual test-retest correlations.
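A minimal sketch of the test-retest correlation itself, assuming scipy and a handful of hypothetical scores measured one week apart:

```python
# Minimal sketch of a test-retest correlation. Assumes scipy; t1 and t2 are
# hypothetical scale scores from the same respondents measured a week apart.
import numpy as np
from scipy.stats import pearsonr

t1 = np.array([3.2, 4.1, 2.8, 3.9, 4.4])  # hypothetical time-1 scores
t2 = np.array([3.0, 4.3, 2.9, 3.7, 4.5])  # hypothetical time-2 scores

r, p = pearsonr(t1, t2)
print(f"test-retest r = {r:.2f} (p = {p:.3f})")
```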
Then we have standardized factor loadings, which are used to assess individual item reliability. The square of a standardized factor loading is an estimate of the individual item's reliability. The assumptions are the ones that you make in your factor analysis, and typically the factor analysis reporting is interpreted as is, so you don't need to report anything special for this reliability estimate.
Then there is average variance extracted, which is sometimes used, but it is redundant with the others, so there is really no reason to use it. It is one index per scale, but it is not a reliability index in the same sense as the others, because it doesn't quantify the reliability of the sum. You need to report the factor loadings that go into the average variance extracted index anyway, so the AVE really doesn't give any additional value beyond the standardized factor loadings that you would normally report.
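The following minimal sketch shows how both the squared-loading item reliabilities and the AVE come directly from the same standardized loadings, which is why the AVE adds little. The factor_analyzer package and the data file are assumptions for the example.

```python
# Minimal sketch: item reliabilities and AVE are both simple transforms of the
# standardized loadings. Assumes factor_analyzer; "scale_items.csv" is hypothetical.
import pandas as pd
from factor_analyzer import FactorAnalyzer

scale_items = pd.read_csv("scale_items.csv")

fa = FactorAnalyzer(n_factors=1, rotation=None)
fa.fit(scale_items)
lam = fa.loadings_[:, 0]  # standardized loadings

item_reliability = pd.Series(lam ** 2, index=scale_items.columns)  # squared loadings
ave = (lam ** 2).mean()                                            # average variance extracted

print(item_reliability.round(2))
print(f"AVE = {ave:.2f}")
```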
So finally, how do you assess reliability and validity evidence reported by others? You have two roles here: reading published work, and reviewing work for publication, that is, acting as a conference or journal reviewer. We have four scenarios here; what should you do in each of them?
One common scenario is that the factor analysis is missing. What do you do about it? Factor analysis is the most important tool for validation, so what if the authors don't report it at all? They could say that it was conducted and that the results were okay without reporting it, or they might not say anything about factor analysis at all. If the scale has been previously validated in multiple different studies and that previous validation evidence is valid, the omission is probably okay. The fact that a scale has been applied before, and that some statistics have been reported about it, does not mean that it has actually been validated, because, for example, the people who presented the scale could have used principal component analysis, which is not useful for scale validation, and nevertheless got the paper published. So ask whether there is actual prior evidence that you can check; if yes, then it is probably okay. If you are reviewing somebody's work and they present a multiple-item scale without a factor analysis, then you should require a revision that includes the factor analysis results.
Then we have a reliability statistic reported without checking the assumptions. This is the default case of using coefficient alpha without even knowing what the assumptions are. If you are reading published work, it is useful to know that even if the assumptions of the reliability statistic are not completely fulfilled, the consequences may not be that severe, so you can probably trust the results. If you are reviewing somebody else's work for publication, then require that the authors justify the chosen reliability coefficient and report how the assumptions were checked.
The third case is that there is some evidence that you don't really understand. The authors discuss something about reliability and validity, and they use some index that you have never heard of, like the greatest lower bound (GLB) or whatever, and you don't understand what it means. What do you do about it? If you are a reader of the paper, and there is other evidence that you know how to interpret, and you can decide whether to trust the results using that other evidence, then it is probably okay to ignore the evidence that you don't understand. Another alternative is to treat this as a learning opportunity: study the technique, and importantly, study it using sources that are trustworthy, because particularly in measurement there are guideline-type articles that basically argue that because something has been applied before, it is good practice and should be applied in the future. The fact that something has been applied before doesn't make it a valid technique. So there are articles that advocate techniques not because of the merits of the techniques but because the technique has been used previously and the authors think that is evidence of its validity. Trustworthy sources include, for example, Organizational Research Methods; you can basically trust that what is said in that journal makes sense. Among other journals, Psychological Methods is okay, and a good book about measurement is okay as well.
If you are reviewing work by somebody else and you don't understand the statistics that they apply, then ask the authors to explain them in the paper. If you don't understand what the statistic tells you, then it is possible that other readers don't understand it either. Coefficient alpha: most people probably have an idea of what it means. The greatest lower bound statistic: most people in management probably have no clue what it does. So it is useful for the article to educate the readers a bit. Ask the authors to tell what the index is, how it is interpreted, why it was used, and what kind of assumptions it makes, and to cite appropriate papers to support that it is actually a useful index for the purpose that they are using it for.
Then we have the final case: a cross-sectional survey ignores common method variance, and there is either no assessment at all or only Harman's single factor test, which is a really weak test. What should you do when you read such published work? In published work you can check the correlations. If all the indicators and all the measures correlate with one another, that is an indication that there is a method variance problem. If there are sets of indicators that are only weakly correlated, that is evidence that there probably is not a method variance problem. If there are objective measures, or items that are specific instead of asking about a person's feelings, then you can probably trust the results. If you are reviewing work by others, require that the authors apply a confirmatory factor analysis with a method factor, and if they have marker indicators, those should be used as well. The authors should also mention the limitations of the technique that they apply for assessing method variance, because not all of these techniques work well in all scenarios.
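A minimal sketch of the reader-level correlation check described above, assuming pandas and numpy and a hypothetical file of survey items:

```python
# Minimal sketch: inspect whether essentially all items correlate with one
# another. Assumes pandas and numpy; "survey_items.csv" is a hypothetical file
# with one column per survey item.
import numpy as np
import pandas as pd

items = pd.read_csv("survey_items.csv")

corr = items.corr()
off_diag = corr.values[np.triu_indices_from(corr.values, k=1)]

# If nearly every item pair shows a clear correlation, that hints at a method
# variance problem; blocks of weakly correlated items are reassuring.
print(f"Share of item pairs with |r| > 0.3: {(np.abs(off_diag) > 0.3).mean():.2f}")
print(f"Median absolute correlation: {np.median(np.abs(off_diag)):.2f}")
```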
So every time you review work by others, the main thing that you do in the methods part is to make people justify their decisions, so that you understand why they made those decisions and can then make a call on whether each decision is justified or not.