TU-L0022 - Statistical Research Methods D, Lecture, 2.11.2021-6.4.2022
Summary of measurement validation techniques (17:01)
In this video, all key terms and concepts of reliability and validity are summarized. Reliability and validity can be assessed using multiple different techniques, and this video will provide an overview of some of those techniques in summary form and then present some suggestions on which ones to use and which ones not to use.
Transcript
Reliability and validity can be assessed using multiple different techniques, and this video will provide an overview of some of those techniques in summary form and then present some suggestions on which ones to use and which ones not to use.
For validity: measurement validation is a conceptual argument, so you cannot test validity empirically. You can provide some evidence that supports your validity claim, but whether the variance of the construct causes variation in the indicators is ultimately a theoretical argument that you cannot prove empirically. Nevertheless, there are some techniques that are commonly applied.
Factor analysis is the most important technique for assessing validity. Factor analysis can be used for three different purposes. One is to assess whether indicators have a common cause. If indicators have a common cause - if their variation is shared - that's an indication that they could measure the same thing.
Then the second purpose is to assess common method variance. If one factor explains the majority of the correlation in the data, that's an indication that your measurement method could be driving the correlations. There are also more refined techniques that allow you to assess method variance and the actual factors at the same time.
Then factor analysis is also used for discriminant validity assessment. The idea of discriminant validity is that two scales are discriminant valid, or empirically distinct, if the factor correlation is well below one. The idea is that if two factors correlate at 0.99, for example, then it's difficult to claim that those two factors represent two different things. So discriminant validity is about whether two scales are empirically distinct. Factor analysis also makes some assumptions that must be checked. Some of them can't be checked, in which case they have to be justified based on existing theory.
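As a rough illustration, here is a minimal sketch of how the factor correlation from an oblique exploratory factor analysis could be inspected in Python. It assumes the third-party factor_analyzer package, a hypothetical data frame called items that holds the indicators of two scales, and that the package exposes the factor correlation matrix as phi_ - treat all of these as assumptions, not as the method used in the lecture.

# Sketch: checking discriminant validity from an oblique EFA solution.
# `items` is a hypothetical pandas DataFrame of indicators from two scales.
import pandas as pd
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")  # direct oblimin = oblique
fa.fit(items)

# With an oblique rotation a factor correlation matrix is available;
# in factor_analyzer it is assumed to be stored as phi_.
factor_corr = fa.phi_[0, 1]
print(f"Factor correlation: {factor_corr:.2f}")
# A correlation well below 1 supports discriminant validity;
# something like 0.99 suggests the two scales are not empirically distinct.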
Exploratory factor analysis assumes that all relationships are linear and that the error terms in the factor analysis are independent. The confirmatory factor analysis approach is more flexible. It only assumes that the model is correctly specified, so you can model nonlinear relationships - in which case you would be doing item response theory analysis - or you can model correlated measurement errors, secondary factors and so on. But it's important that the model is correctly specified, because otherwise it's not a proper test of your theory. Minimal reporting of exploratory factor analysis is which factor rotation technique you used. You should always use the direct oblimin rotation.
Then the factor loading pattern: if you have ten indicators and four factors, you report a table of four columns and ten rows, with a factor loading for each indicator-factor pair, and then you highlight the highest ones so that it's easier to see the pattern in the loadings.
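A minimal sketch of producing such a loading pattern table in Python follows, again assuming the factor_analyzer package and a hypothetical data frame items with ten indicator columns.

# Sketch: a factor loading pattern table from an exploratory factor analysis
# with direct oblimin rotation (ten indicators, four factors).
import pandas as pd
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=4, rotation="oblimin")
fa.fit(items)

# Ten rows (indicators) by four columns (factors).
pattern = pd.DataFrame(fa.loadings_,
                       index=items.columns,
                       columns=[f"F{i+1}" for i in range(4)])

# Mark the highest loading on each row to make the pattern easy to see.
pattern["highest"] = pattern.abs().idxmax(axis=1)
print(pattern.round(2))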
In confirmatory factor analysis you report the estimated factor loadings and the chi-square statistic, the degrees of freedom and the p-value. If the p-value rejects the model, then you also need to report what kind of diagnostics you did for the confirmatory factor analysis.
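The sketch below shows one way such a CFA report could be produced in Python with the semopy package. The package choice, the two-factor model syntax, the indicator names a1-a3 and b1-b3, and the data frame items are all assumptions for illustration, and the exact layout of the fit-statistics output may differ by package version.

# Sketch: minimal CFA reporting (loadings, chi-square, df, p-value) with semopy.
import semopy

desc = """
FactorA =~ a1 + a2 + a3
FactorB =~ b1 + b2 + b3
"""
model = semopy.Model(desc)
model.fit(items)

print(model.inspect())              # estimated factor loadings and other parameters
stats = semopy.calc_stats(model)    # fit statistics, incl. chi-square, df and p-value
print(stats.T)
# If the p-value rejects the model, also report the diagnostics you ran.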
Then the second technique is construct validity assessment with regression analysis and correlations. The idea of this technique is that you have different measures that could correlate and they are supposed to measure different things. Then you have a theoretical expectation of how the constructs that those indicators measure behave. These theoretical expectations are called the nomological network. So you have the causal relationships, their directions and strengths, and then you compare whether your empirical relationships between the measures match the theoretical expectations. If they do, then you conclude that you may have construct validity.
The assumption here is that the nomological network, or all the theoretical relationships, are known a priori, and this is very difficult to satisfy in practice, because typically we're testing new theory in articles. If our theory is new - it has never been tested before - then how could we possibly know that it's correct? We can't.
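As a minimal sketch of checking one expected relationship in a nomological network, the Python snippet below regresses one hypothetical scale score on another with statsmodels; the variable names satisfaction and commitment and the data frame df are illustrative assumptions.

# Sketch: checking one expected nomological relationship with OLS regression.
import statsmodels.api as sm

X = sm.add_constant(df["satisfaction"])
fit = sm.OLS(df["commitment"], X).fit()

b = fit.params["satisfaction"]
se = fit.bse["satisfaction"]
p = fit.pvalues["satisfaction"]
# Reported in text, e.g. "satisfaction predicted commitment (b = ..., SE = ..., p = ...)".
print(f"b = {b:.2f}, SE = {se:.2f}, p = {p:.3f}")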
The minimum reporting is the regression coefficient, its standard error and the p-value. These are normally not reported as a table. Instead they are reported in the text: you say which relationships you expect, and then in parentheses you give the regression coefficient, standard error and p-value, telling whether the expected relationship was observed or not. Then you discuss whether the regression coefficients match your theoretical expectations.

Then you have theoretical arguments. This is rarely seen, but it is a very important thing. The theoretical argument follows from the idea of validity: that the variance of the construct causes variance in the items. That's one definition of measurement validity.
The conceptual or theoretical arguments must answer the question of why we should expect the construct variation to cause variation in the data. So we have to explain the process through which the construct causes people to respond to the survey in a particular way, for example. The assumption is that the argument is logical and supported by prior theory.
Then we have principal component analysis, which is sometimes used, but it's not useful for measurement validation. People have incorrectly applied principal component analysis as a factor analysis technique. It is not a factor analysis technique, it's a data summarization technique, and it's not useful for any of these purposes that we use factor analysis for.
Then reliability. With reliability we have to consider two important things: the reliability of the scale scores that we calculate as the mean or sum of the items, and how to quantify that reliability, which we do by using reliability indices. So reliability indices tell us what the reliability of a scale score is.
There are many different types of reliability indices, and they differ in the assumptions that they make. The most commonly used is the tau-equivalent reliability, or coefficient alpha, which assumes that the indicators are unidimensional measures of one thing, all items are equally reliable, and the measurement error is purely random.
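As a minimal sketch, coefficient alpha can be computed directly from the item data; items is again a hypothetical pandas DataFrame with one column per indicator.

# Sketch: tau-equivalent reliability (coefficient alpha) computed from item data.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

print(f"alpha = {cronbach_alpha(items):.2f}")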
The second most popular is the congeneric reliability, or composite reliability, or coefficient omega, which assumes unidimensionality and random measurement error. The difference here is that congeneric reliability allows the indicators to differ in their individual reliabilities. The minimum reporting is to explain why you chose a particular index, justify the assumptions and explain how they were checked, if they were, and then give the actual value of the index.
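A minimal sketch of congeneric reliability computed from standardized factor loadings follows; the loading values are made up for illustration, not taken from any real study.

# Sketch: congeneric reliability (composite reliability / coefficient omega)
# from standardized factor loadings of a unidimensional scale.
import numpy as np

loadings = np.array([0.81, 0.74, 0.69, 0.62])  # made-up standardized loadings
errors = 1 - loadings**2                        # unique variances (standardized)

omega = loadings.sum()**2 / (loadings.sum()**2 + errors.sum())
print(f"omega = {omega:.2f}")
# Unlike alpha, each indicator is allowed its own loading, i.e. its own reliability.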
Then we have the test-retest correlation, and it can be used to assess the reliability of individual measures or scale scores. The idea is that we measure one thing now and then we measure the same thing a week later. If the two measures correlate, that's an indication of reliability. This technique makes two assumptions. First, the delay between the measures must be sufficiently long that the informant forgets their previous answer. If we remember what we answered the last time to a question, then test-retest doesn't work. So it assumes that we don't remember the previous answer; that's the reason for the delay. It also assumes that the delay is not too long, so that the trait we are measuring stays relatively stable. If we measure a child's height now and then two years from now, the fact that those two measures are not the same is not an indication of unreliability - it's an indication that the kid has grown during that time. So the trait must be stable. As minimum reporting you should justify the delay - not too long, not too short - and then report the actual test-retest correlations.
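A minimal sketch of the calculation: the test-retest correlation is just the correlation between the two measurement occasions. The arrays score_t1 and score_t2 are hypothetical scores of the same scale measured, say, one week apart.

# Sketch: test-retest reliability as the correlation between two occasions.
from scipy.stats import pearsonr

r, p = pearsonr(score_t1, score_t2)
print(f"test-retest r = {r:.2f} (p = {p:.3f})")
# Report the correlation together with a justification of the delay:
# long enough to prevent recall, short enough for the trait to stay stable.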
Then we have standardized factor loadings, and standardized factor loadings are used to assess individual item reliability. The square of the standardized factor loading is an estimate of the individual item reliability. The assumptions are the ones that you make in your factor analysis, and typically the factor analysis reporting is interpreted as it is, so you don't need to report anything special for this reliability estimate.
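For example, with a made-up standardized loading of 0.80:

# Sketch: item reliability as the square of its standardized factor loading.
loading = 0.80
item_reliability = loading**2   # 0.64: about 64 % of the item's variance
                                # is explained by the factor.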
Then there is the average variance extracted, which is sometimes used, and it is redundant with the others. So there is really no reason to use it. It is one index per scale, but it is not a reliability index in the same sense as the others here, because it doesn't quantify the reliability of the sum. And you need to report the factor loadings that go into the average variance extracted index anyway, so the AVE really doesn't give any additional value beyond the standardized factor loadings that you would normally report anyway.
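To see the redundancy, here is a minimal sketch using the same made-up loadings as above: AVE is simply the mean of the squared standardized loadings.

# Sketch: average variance extracted (AVE) from standardized loadings.
import numpy as np

loadings = np.array([0.81, 0.74, 0.69, 0.62])  # same made-up values as above
ave = np.mean(loadings**2)
print(f"AVE = {ave:.2f}")
# Nothing here goes beyond the loadings that would be reported anyway.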
So finally, how do you assess reliability and validity evidence reported by others? You have two roles here: when you're reading published work and when you're reviewing work for publication, for example as a conference or journal reviewer. We have four scenarios here, and what should you do in each of them?
One common scenario is that the factor analysis is missing; what do you do about it? Factor analysis is the most important tool for validation, so what if the authors don't report it at all? They could say that it was conducted and the results were okay without reporting it, or they might not say anything about factor analysis at all. If the scale has been previously validated in multiple different studies and the previous validation evidence is valid, that's probably fine. But the fact that a scale has been applied before, and some statistics have been reported about it, doesn't mean that it has actually been validated, because, for example, the people who presented the scale could have used principal component analysis, which is not useful for scale validation, and nevertheless got the paper published. So check whether there is actual prior evidence; if yes, then it's probably okay. If you're reviewing somebody's work and they present a multiple-item scale and don't present the factor analysis, then you should require a revision that includes the factor analysis results.
Then we have a reliability statistic reported without checking the assumptions. This is the default case of using coefficient alpha without even knowing what the assumptions are. If you're reading published work, it's useful to know that for these reliability statistics, even if the assumptions are not fulfilled completely, the consequences may not be that severe. So you could probably trust the results. If you are reviewing somebody else's work for publication, then require that the authors justify the chosen reliability coefficient and report how the assumptions were checked.
Then the third case is that there is some evidence that you don't really understand. The authors discuss something about reliability and validity, and they use some index that you have never heard about, like the greatest lower bound (GLB) or whatever, and you don't understand what it means. So what do you do about it? If you're a reader of the paper, and there is some other evidence that you know how to interpret and you can decide whether to trust the results using that other evidence, then it's probably okay to ignore the evidence that you don't understand. Another alternative is that this is a learning opportunity for you, so study the technique - and importantly, study it using sources that are trustworthy, because particularly in measurement there are these guideline-type articles that basically say that this has been applied before, therefore it's good practice and should be applied in the future. The fact that something has been applied before doesn't make it a valid technique. So there are articles that advocate techniques not because of the merits of the techniques but because the technique has been used previously and the authors think that this is evidence for its validity. Trustworthy sources include, for example, Organizational Research Methods, so you can basically trust that what's said in that journal makes sense. Of the other journals, Psychological Methods is okay. A good book about measurement is okay as well.
If you're reviewing work by somebody else and you don't understand the statistics that they apply, then ask the authors to explain them in the paper. If you don't understand what the statistic tells you, then it's possible that other readers don't understand it either. Coefficient alpha - most people probably have an idea of what it means. The greatest lower bound statistic - most people in management probably have no clue what it does. So it's useful for the article to educate the readers a bit. Ask the authors to tell what the index is, how it's interpreted, why it was used and what kind of assumptions it makes, and then cite appropriate papers to support that it's actually a useful index for the purpose that the authors are using it for.
Then we have the final case: a cross-sectional survey ignores common method variance, and there is either no assessment at all or only Harman's single factor test, which is a really weak test. What should you do when you read published work? Well, in published work you can check the correlations. If all the indicators and all the measures correlate with one another, that's an indication that there's a method variance problem. If there are sets of indicators that are only weakly correlated, that's evidence that there probably is not a method variance problem. If there are objective measures, or items that are specific instead of asking about a person's feelings, then you can probably trust the results. If you are reviewing work by others, require that the authors apply a confirmatory factor analysis with a method factor, and if they have marker indicators, those should be used as well. The authors should also mention the limitations of the technique that they apply for addressing method variance problems, because not all of these techniques work well in all scenarios.
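As a minimal sketch of the reader's check described above, the snippet below scans a correlation matrix; corr is a hypothetical pandas DataFrame (e.g. rebuilt from a paper's correlation table), and the 0.3 cut-off is purely illustrative.

# Sketch: a quick look at how pervasive the correlations among measures are.
import numpy as np

values = corr.values[np.triu_indices_from(corr.values, k=1)]  # off-diagonal r's
print(f"share of correlations above 0.3: {(abs(values) > 0.3).mean():.0%}")
# If essentially every measure correlates with every other one, method variance
# may be driving the results; sets of only weakly correlated indicators are
# evidence against a pervasive method variance problem.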
So every time you review work by others, the main thing that you do in the methods part is to make people justify their decisions, so that you understand why they made those decisions. Then you can make a call on whether each decision is justified or not.