Test Reliability and Validity
Predictor
any variable used to predict a criterion (outcome)
Assessing the Quality of Predictors
Psychometric criteria = Reliability and Validity
Reliability
Is the predictor stable (reliable) over time?
You shouldn’t score differently when taking the test again
determined by the test itself.
consistency, stability, or equivalence of a measure
Test-Retest Reliability
Same test, same people, 2 separate times
simplest method
correlate scores of both test times for each person
Coefficient of stability – correlation of how stable test is over time
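A minimal sketch of how the coefficient of stability could be computed, assuming it is taken as the Pearson correlation between the two administrations. The function name `pearson_r` and the scores are made up for illustration; they are not from the notes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people at two test times (made-up data)
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 12, 14, 19, 21]

coefficient_of_stability = pearson_r(time_1, time_2)
print(round(coefficient_of_stability, 2))
```

The same calculation would give a coefficient of equivalence if the two lists held scores from two forms of the test instead of two test times.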
Equivalent-Form Reliability
Same people, 2 separate tests on the same construct
very difficult, least popular
correlate scores of both test forms for each person
the higher the r, the higher the equivalent-form reliability
Coefficient of equivalence
Internal-Consistency Reliability
1 test (later broken into smaller tests)
homogeneity of the test
Two ways to test for internal-consistency reliability
Split-half reliability
Cronbach’s alpha or Kuder-Richardson 20
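A rough sketch of both approaches, under stated assumptions: the split is odd-numbered vs. even-numbered items, the half-test correlation is stepped up with the Spearman-Brown formula, and Cronbach's alpha uses sample variances. The function names and the response data are hypothetical. Kuder-Richardson 20 is generally treated as the special case of alpha for items scored right/wrong.

```python
from math import sqrt
from statistics import variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def split_half_reliability(item_scores):
    """Correlate odd-item and even-item half scores, then apply Spearman-Brown."""
    odd_half = [sum(person[0::2]) for person in item_scores]
    even_half = [sum(person[1::2]) for person in item_scores]
    r_half = pearson_r(odd_half, even_half)
    # Spearman-Brown correction estimates reliability of the full-length test
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores[0])
    item_variances = [variance([person[i] for person in item_scores]) for i in range(k)]
    total_variance = variance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 5 people (rows) x 4 items (columns), each item scored 1-5
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

split_half = split_half_reliability(responses)
alpha = cronbach_alpha(responses)
print(round(split_half, 2), round(alpha, 2))
```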
Inter-rater reliability
1 test rated by 2 or more researchers
objective and subjective
r = agreement (consistency) among ratings
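For numeric ratings, agreement can be expressed as a correlation between raters. For categorical ratings (for example pass/fail), one common alternative that is not covered in these notes is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is illustrative only; the function name and the ratings are made up.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    n = len(rater_a)
    # Proportion of cases where the two raters gave the same rating
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if each rater assigned categories independently
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail ratings of eight applicants by two raters (made-up data)
rater_a = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 2))
```

Here the raters agree on 6 of 8 applicants, but kappa is noticeably lower than .75 because some of that agreement would be expected by chance.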
Validity
Can we accurately draw inferences from predictor (test) scores?
Does it measure what we want it to measure
determined by the use of the test
Types of validity
Construct Validity
Convergent validity coefficients
comparison of your test to pre-existing tests of construct
Divergent validity coefficients
comparison of your test to a test that measures something else
Criterion-Related Validity
Predictor relates to criterion (construct)
Two types (distinguished by timing)
Concurrent criterion-related validity
relationship of predictor and criterion at the same time
Predictive criterion-related validity
relationship of predictor and criterion in the future
High school GPA and college performance
Validity coefficient
a somewhat lower coefficient is acceptable because high validity coefficients are hard to obtain (around .30 or so)
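One standard way to read a validity coefficient, added here as an aid and not taken from the notes, is to square it: the squared correlation is the proportion of variance in the criterion that the predictor accounts for in a linear sense. Taking the value of about .30 from the line above:

```latex
r_{xy} = 0.30 \quad\Rightarrow\quad r_{xy}^{2} = 0.30^{2} = 0.09
```

So a predictor with a validity coefficient of .30 would account for roughly 9% of the variance in the criterion.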
Content Validity
Does the test adequately cover the intended construct?
From the opinion of Subject matter experts (test developers)
No r coefficient, minimal rigor
less rigorous than the others
Face Validity
Test appears to cover the intended construct
From the opinions of test takers
No r coefficient, minimal rigor
less rigorous than the others
without this or content validity, going through the other steps is meaningless
Predictor development
Two dimensions to classify predictors
Psychological tests and inventories
History of psychological testing
Sir Francis Galton - first scientist to devise a way of systematically measuring people
Cattell - introduced the term mental test. He devised an early intelligence test based on sensory discrimination and reaction time.
Ebbinghaus - German psychologist who developed math and sentence completion tests. In 1897 he reported that performance on the sentence completion test was related to school performance.
Binet - French psychologist who developed a test of intelligence. It consisted of 30 problems covering judgment, comprehension, and reasoning.
Terman - Continued Binet’s research and developed the concept of IQ (Intelligence Quotient)
Types of tests
Test vs Inventory
Test – answers are right or wrong (quiz)
Inventory – no right or wrong answers (personality inventory)
Speed versus Power Tests
speed – easy items with short time limit (typing)
power – difficult items with no time limit (final exam)
Individual versus Group Tests
individual – one test taker (IQ test)
group – several takers (Quiz)
Paper-and-Pencil versus Performance Tests
P&P – no physical task (quiz)
performance – requires physical skill (driving test)
Ethical standards in testing
APA code of professional ethics
Test user qualifications
Invasion of privacy
asking questions unrelated to construct or that are inherently intrusive
test reveals more information than is needed
unrelated: favorite sports team
intrusive: religion, sexual orientation, pregnancy
Confidentiality
who has access to the data/scores
confidential unless written release given by the test-taker
Sources of information about testing
Mental Measurements Yearbook (MMY) – big book of tests published every two years
Tests in Print VII – a bibliography of tests that helps locate them in the MMY
Test content
Intelligence tests
complex construct, multiple types of IQ
“g” = general mental ability
single best predictor of performance (r = 0.4 – 0.6)
Example Question
Four years ago, Jane was twice as old as Sam. Four years from now, Sam will be 3/4 of Jane’s age. How old is Jane now? (mensa.org)
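A worked solution to the example question, added for illustration. The symbols $J$ and $S$ (Jane's and Sam's current ages) are introduced here and are not part of the original question.

```latex
\begin{aligned}
J - 4 &= 2(S - 4) &&\Rightarrow\; J = 2S - 4 \\
S + 4 &= \tfrac{3}{4}(J + 4) &&\Rightarrow\; 4S + 16 = 3J + 12 \\
4S + 16 &= 3(2S - 4) + 12 = 6S &&\Rightarrow\; S = 8,\quad J = 12
\end{aligned}
```

Check: four years ago Jane was 8 and Sam was 4; four years from now Sam will be 12 and Jane 16, and 12/16 = 3/4. Jane is 12 now.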
Sternberg’s triarchic theory of intelligence
Mechanical aptitude tests
Recognition of mechanical principles
concepts include: sound & heat conductance, velocity, gravity, and force
predictive feature of performance in manufacturing and production jobs
Ex question: a car and a ball are dropped off a 100-foot cliff; which hits the ground first? (Ignoring air resistance, they hit at the same time.)
Personality inventories
No right or wrong answers, level of agreement
Predictive of job success
Myers-Briggs Type Indicator
Big five theory of personality (more scientific than others)
neuroticism – stability vs. instability
extraversion – sociable, assertive, outgoing
openness to experience – curious, imaginative
agreeableness – cooperative, helpful, easy going
conscientiousness – purposeful, organized
“p-factor”
Integrity test
Assess honesty, integrity, and character
Used to identify those who might steal or engage in counterproductive work behaviors (CWBs) (ex: absenteeism)
Overt integrity tests
Personality-based measures
these work the best
mainly test the conscientiousness and emotional stability personality factors
Physical abilities
assess strength, endurance, and movement quality
predictive of job performance in physically demanding jobs
static strength
the ability to use muscle to lift, push, pull, or carry objects
explosive strength
the ability to use short bursts of muscle force to propel oneself or an object
gross body coordination
the ability to coordinate the movement of arms, legs, and torso where the whole body is in motion
stamina
the ability of the lungs and the circulatory systems of the body to perform efficiently over time
Situational judgment tests
All answers are plausible, only 1 is appropriate in the given situation.
Computerized adaptive testing
like the GRE
each question depends on how you answered the previous ones
adapts to test-taker in terms of difficulty
tailored testing (very popular, but expensive)
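A toy sketch of the adaptive idea: difficulty goes up one level after a correct answer and down one level after a miss. This simple up/down rule is an assumption made for illustration; operational adaptive tests typically choose items with item response theory instead. The function names, the 1-5 difficulty scale, and the simulated test taker are all hypothetical.

```python
def run_adaptive_test(answers_correctly, n_items=5, start_level=3,
                      min_level=1, max_level=5):
    """Administer n_items, adjusting difficulty after each response.

    answers_correctly: function(level) -> bool, standing in for the test taker.
    Returns the list of difficulty levels that were administered.
    """
    level = start_level
    administered = []
    for _ in range(n_items):
        administered.append(level)
        if answers_correctly(level):
            level = min(level + 1, max_level)  # harder item next
        else:
            level = max(level - 1, min_level)  # easier item next
    return administered

# Hypothetical test taker who can answer items up to difficulty 4
def able_to_level_4(level):
    return level <= 4

levels = run_adaptive_test(able_to_level_4)
print(levels)  # [3, 4, 5, 4, 5]
```

The administered levels settle around the boundary between what this test taker can and cannot answer, which is how the test adapts to the test taker in terms of difficulty.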
Testing on the Internet
Faster and cheaper
test security is a major issue
proctored: take the test at a specific site with a proctor
not proctored: take the test anywhere you have internet access (and your buddies could help you take it)
Interview
social interactions between interviewer and applicant
can be biased by shared similarities, nonverbal behavior, and verbal cues
unstructured – ask applicants different questions
focus on g, education, interests, and work experience
Structured – questions are consistent across applicants
focus on job knowledge, interpersonal and social skills, and problem solving
more valid, predict job performance and mental ability, apply fairness to applicants
Situational – like SJTs, focus on:
experiences you’ve likely had (“how did you handle that?”)
unforeseen situations that might arise (“how would you handle that?”)
Experience-based v. situational questions
“Illusion of validity” – We are not good judges of people, but we think we are.
Assessment centers
Assess applicants via standardized group oriented exercises evaluated by raters
rate the performance of applicant
General characteristics
Assess management-level personnel
appraise individuals in groups (10 to 20)
Performance rated by trained observers
Use multiple methods to assess performance (group exercise, personality, etc)
Sources of criterion contamination
Work sample (High-fidelity simulations)
candidate performs actual task or representative task and is evaluated on proficiency
fidelity = high realism
typically used in “blue collar” physical jobs, not those that involve social aspects
Situational exercises (problem presented and asked how you would solve it).
In-basket Exercise
Leaderless Group Discussion
more white collar, decision making ability
Low-fidelity simulations
Biographical information
Predictor of promotion, salary, absenteeism, and productivity
issues of fairness
Legal implications
Letters of recommendation
Very often used but least valid
Primarily positive
Drug testing
Best used when danger to self or others is present (e.g., forklift operator, truck driver)
Screening test
Confirmation test
Newer controversial methods of assessment
Polygraphs or Lie Detection
used in government agencies
Graphology
used in other countries
predictive of affective states (ex: stress)
Emotional Intelligence
ability to manage emotional responses in social situations
scientific status still unknown
Test reliability and validity must be defensible in court when used to make personnel decisions.