Sample midterm exam items:
1. What are the major areas of assessment that were
discussed in class? Intellectual
assessment, personality assessment, diagnostic/emotional assessment,
relationship and family functioning, vocational interest and aptitude,
behavioral assessment.
2. What were some of the most important ethical, legal, and multicultural issues in assessment? Fairness to clients, use of appropriate
measures, appropriate use of measures, competence of the professional, cultural
competence including appropriate norms and language of administration, avoiding
pejorative labels and bias, etc.
3. What are norms and how are they used? A normative sample is a representative group
to whom the instrument is initially administered; the scores that group obtains are the norms. When you administer the instrument
to a client who is similar to the norm group, you can compare your client's scores to those of the norm group.
4. What are the pros and cons of raw scores, age- and
grade-equivalents, percentile ranks, and standard scores? Raw scores are the easiest to obtain but
are meaningful only to those very familiar with the test. Age- and grade-equivalents are easy for
parents to discuss (“Johnny is at a 6th-grade reading level”). They
are either determined by correct answers that are theoretically at a particular
age- or grade-level, or by obtaining scores that children typically receive at
that age or grade-level. However, this does not mean that the child being
assessed is similar to children of the age-level obtained in ways other than on
that particular measure, and so can be misleading. Percentile ranks are a
better expression of performance, and are generally understandable to clients
(Johnny scored at a level at or better than 95% of others his age). A drawback
is that percentiles are not evenly spaced in terms of numbers of correct
answers. Because a lot of people score around the 50th percentile,
it may require fewer additional correct/endorsed answers to move up to the 55th
percentile from the 50th than to move from the 90th to
the 95th, for example. The best representation of performance is the
standard score (a T-score of 60 means that your
score is one standard deviation above the mean), because it clearly expresses
how well a client did in relation to norms, on an interval scale. However,
standard scores are generally more understandable to clients when conveyed with
percentile ranks, or simply as the equivalent percentile ranks.
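The relationship between raw scores, standard scores, and percentile ranks in item 4 can be sketched in a few lines of Python. This is only an illustration: the norm-group mean of 30 and standard deviation of 6 are made-up values, the function names are hypothetical, and the percentile conversion assumes that scores in the norm group are normally distributed, which real test norms only approximate.

```python
from statistics import NormalDist

def standard_scores(raw, norm_mean, norm_sd):
    """Convert a raw score to a z-score, T-score, and percentile rank.

    Assumes normally distributed norms (an illustrative simplification).
    """
    z = (raw - norm_mean) / norm_sd          # distance from the mean in SD units
    t = 50 + 10 * z                          # T-scores have mean 50, SD 10
    percentile = NormalDist().cdf(z) * 100   # percent of norm group at or below
    return z, t, percentile

def raw_points_between(p_low, p_high, norm_mean, norm_sd):
    """Raw-score distance between two percentile ranks under a normal model."""
    dist = NormalDist(norm_mean, norm_sd)
    return dist.inv_cdf(p_high / 100) - dist.inv_cdf(p_low / 100)

# Hypothetical norms: mean 30, SD 6. A raw score of 36 is one SD above the mean.
z, t, pct = standard_scores(raw=36, norm_mean=30, norm_sd=6)
print(z, t)  # 1.0 60.0

# Percentiles are unevenly spaced: compare the raw-score gap near the middle
# of the distribution with the gap in the upper tail.
mid_gap = raw_points_between(50, 55, norm_mean=30, norm_sd=6)
top_gap = raw_points_between(90, 95, norm_mean=30, norm_sd=6)
print(mid_gap < top_gap)  # True
```

Under these assumptions, moving from the 50th to the 55th percentile takes less than one raw-score point, while moving from the 90th to the 95th takes roughly two, which is the drawback of percentile ranks described in the answer.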
For items 5-13, please briefly define each term:
5. Interrater reliability: Agreement/consistency between raters who are scoring answers or observing behavior.
6. Test-retest reliability: Correlation between first and second administration of test—shows consistent results.
7. Alternate-forms reliability: Correlation between two forms of test—shows consistency across similar versions.
8. Inter-item consistency: Correlation among items on a test, which reflects consistently
measuring the domain of interest.
9. Construct validity: A
construct is a theoretical idea such as a state like anxiety. In order to
measure it we have to operationally define it in terms of observable behaviors
and responses to items on measurement scales. If we successfully measure what
seems to be a real state or condition, we consider our techniques, and the
construct, to be valid. Construct validity indicates that we are measuring what we intended to measure. To
evaluate this, we typically look for evidence of convergent validity (moderate
to strong correlations with similar measures) and discriminant
validity (lower correlations with less similar measures).
10. Content validity: Are
we measuring the domain well? Typically experts can evaluate whether the domain
is adequately sampled by a measure. For example, an expert in depression might
want to ensure that a measure asks about not only dysphoria,
but also anhedonia, bodily symptoms, and so on.
11. Criterion validity: Does
our measure predict outcomes? This can be concurrent or predictive (in the
future). For example, the predictive validity of the GRE might be measured by
seeing whether it predicts first-year GPA in grad school. In order to do this
well the criterion measure must have good psychometric properties itself.
12. Incremental validity: Does our measure cost-effectively add to our understanding of the
client and aid in decision making, beyond what we already know from other sources?
13. Face validity: Does
our instrument appear, on its face, to measure what it is intended to measure? High face validity is not always desired,
because it makes results easy to fake, and it is not always necessary if the
items discriminate well between groups of respondents. However, if you have a
reluctant or unmotivated client and you use measures with low face validity
(such as some MMPI-type items), the client may be resistant
to completing the instruments carefully.
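Several of the definitions in items 5-13 come down to correlations, so a small worked example may help. The sketch below computes a test-retest correlation (Pearson r) and an inter-item consistency coefficient (Cronbach's alpha, one common index of inter-item consistency; the answers above do not name a specific statistic). All scores are invented for illustration and the function names are hypothetical.

```python
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of item scores per person."""
    k = len(responses[0])  # number of items
    item_vars = [variance([person[i] for person in responses]) for i in range(k)]
    total_var = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Invented scores for five clients tested twice (test-retest reliability).
time1 = [10, 12, 15, 18, 20]
time2 = [11, 13, 14, 19, 21]
retest_r = pearson_r(time1, time2)

# Invented responses of four clients to a three-item scale.
responses = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
alpha = cronbach_alpha(responses)

print(round(retest_r, 2), round(alpha, 2))
```

The same `pearson_r` function could be applied to scores from two forms of a test (alternate-forms reliability), to a measure and a similar or dissimilar measure (convergent and discriminant validity), or to a measure and an outcome such as first-year GPA (criterion validity).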
14. One of the assessment instruments used in family therapy is the FACES-III. What does
it measure? Adaptability
and cohesion in the family.
15. Another measure discussed in the text and in class is
the genogram. How is it
typically administered? Clients might be
asked to draw circles and squares designating female and male family members, respectively,
going back three generations or as far as they can remember, in the fashion of a family
tree, labeling them with names, dates of birth and death, occupations, and
marriages, and sometimes indicating other alliances or rifts between members.