Misconception #1: “You can’t measure social and emotional skills”
Part 1: Operational Definitions and Validity
This post is the first in a series about misconceptions we often hear about social and emotional learning (SEL). In each post, we will tackle one common point of confusion about SEL and try to clear it up. The first two posts in the series concern the measurement of SEL. In this post, we will cover operational definitions and validity. Next week, we’ll discuss reliability and how to select an SEL assessment.
In recent years, there has been increased interest in SEL as more and more research shows the importance of these skills for school, work, and life. With increased interest also comes an increased need for measuring social and emotional skills. Being able to measure these skills is key to identifying student strengths and areas for improvement, evaluating the efficacy of SEL programs, and in some instances, securing state and federal funding for SEL. However, we often hear that people think SEL is “soft”, “squishy”, or even “invisible” and, as such, can’t be measured. Thus, we have our first misconception: You can’t measure social and emotional skills.
Psychometrics says: We can measure social and emotional skills
An entire scientific field, called psychometrics, is devoted to the measurement of these “invisible” skills or constructs. The American Psychological Association defines psychometrics as “the branch of psychology concerned with the quantification and measurement of mental attributes, behavior, performance, and the like, as well as with the design, analysis, and improvement of the tests, questionnaires, and other instruments used in such measurement.…” So how do we do it?
We begin by developing operational definitions of the constructs (e.g., social and emotional skills) we want to measure. An operational definition describes a construct in terms of its “measurable units,” which can include behaviors, self-perceptions, and/or physiological factors. For example, we all have an idea of what stress is. However, people with different expertise may define and operationalize stress differently: a medical doctor may operationalize stress based on physiological reactions (e.g., an increase in blood pressure), a counselor based on how much stress intrudes on the patient’s daily life (e.g., trouble sleeping), and a researcher based on a checklist of stressful events (e.g., the Daily Hassles and Uplifts Scale; Kanner et al., 1981). Therefore, we first need to operationally define stress before we can develop the corresponding instrument to measure it. The same idea can be applied to skills such as Resilience or Grit. Once we have the operational definition, we can develop an appropriate assessment to measure the construct.
Are we measuring what we think we are? Validity
At this point, you probably have (at least) one other question: Which one of those definitions is the correct one? The truth is that all of them are. Although each definition operationalizes stress differently, all three measures target the same construct, so we would expect them to be strongly related to each other and to show similar patterns of relationships with other variables. Finding this would be evidence that the construct is being measured accurately, and thus that the measures are valid. Validity is one of the two cornerstones of psychometrics (along with reliability, which we’ll discuss in our next post). Something round and orange isn’t necessarily an orange; it could be a grapefruit, a traffic light, or even a basketball! Therefore, we need to check that we are actually hitting the target. Validity speaks to how well an assessment measures what we want it to measure.
Validation is an ongoing process during which we evaluate multiple sources of evidence. We discuss each source below and include examples from our ACT® Tessera® family of assessments.
- Content validity: This is the extent to which the whole construct or skill is covered. We gather content validity evidence by enlisting subject matter experts to review an assessment to check that all aspects that make up the construct are included while avoiding including “noise” (aspects that are not part of the construct).
- ACT Tessera example: When thinking about School Safety, there are several sources of perceived safety: perceived personal safety, safety conditions based on peer-peer relationships within the school, and building factors contributing to safety. The School Climate Safety scale included in ACT Tessera includes items to cover each aspect (e.g., I feel safe at school; There are gangs at my school; School classrooms and hallways are clean). A group of experts reviewed all ACT Tessera items in order to ensure all constructs are sufficiently covered.
- Convergent and discriminant validity: These concern whether or not we are zeroing in on the right construct with our assessment. Convergent validity is the extent to which an assessment is related to others that measure the same and/or related constructs. Discriminant validity, on the other hand, is found when we see little to no relationship with assessments that measure unrelated constructs or skills.
- ACT Tessera example: In ACT Tessera Workforce, the Integrity skill is strongly related to the Honesty-Humility scale from another well-established assessment; this shows convergent validity because they both measure the same thing. However, we neither expect nor find any relationship with extraversion, which demonstrates discriminant validity.
- Criterion validity: Probably the most important source of validity evidence is criterion validity. An assessment shows strong criterion validity if it strongly relates to an outcome that it should theoretically relate to. To a certain extent, criterion validity informs us of how useful the assessment is to achieve an outcome we want.
- ACT Tessera example: Based on previous research, we expect Grit to be a good predictor of academic success. Using ACT Tessera Middle School, we gathered criterion validity evidence by showing that our assessment of Grit is the strongest predictor of GPA.
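To make the convergent, discriminant, and criterion ideas above more concrete, here is a minimal sketch in Python that uses simulated data only. Nothing in it comes from ACT Tessera: the function names (`pearson_r`, `simulate_validity_evidence`), the noise levels, and the sample size are all hypothetical choices made for illustration. We simulate a hidden skill, two noisy scales that both try to measure it, an unrelated scale, and an outcome that partly depends on the skill, then look at the correlations.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_validity_evidence(n=2000, seed=0):
    """Simulate scores and return three illustrative correlations.

    All numbers here are assumptions for the sketch, not real results.
    """
    rng = random.Random(seed)
    # The "true" skill level of each respondent (never observed directly).
    latent = [rng.gauss(0, 1) for _ in range(n)]
    # Two different scales that both measure the same skill, with noise.
    scale_a = [t + rng.gauss(0, 0.5) for t in latent]
    scale_b = [t + rng.gauss(0, 0.5) for t in latent]
    # A scale for an unrelated construct, independent of the skill.
    unrelated = [rng.gauss(0, 1) for _ in range(n)]
    # An outcome (think GPA) that depends partly on the skill.
    outcome = [0.5 * t + rng.gauss(0, 1) for t in latent]
    return {
        "convergent": pearson_r(scale_a, scale_b),
        "discriminant": pearson_r(scale_a, unrelated),
        "criterion": pearson_r(scale_a, outcome),
    }


if __name__ == "__main__":
    for name, r in simulate_validity_evidence().items():
        print(f"{name}: r = {r:.2f}")
```

With these simulation settings, the two scales should correlate strongly with each other (convergent evidence), the unrelated scale should correlate near zero with the first scale (discriminant evidence), and the first scale should show a moderate correlation with the outcome (criterion evidence). Real validation studies involve far more than three correlations, but the logic is the same.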
To summarize, imagine you are taking a picture of a pen. Although the picture on the left includes a pen, it wouldn’t be considered a valid picture. The picture on the right more accurately captures the image of a pen, and therefore it is more valid.
People are often skeptical that SEL can be measured. Fortunately, the field of psychometrics has been working for more than a century on ways to measure constructs such as social and emotional skills. Operational definitions and validity are two key components of SEL assessment. Next week, we’ll discuss reliability and how to select an SEL assessment. We hope it becomes clear that SEL can be measured as long as these factors are taken into consideration. Stay tuned!
Kanner, A. D., Coyne, J. C., Schaefer, C., & Lazarus, R. S. (1981). Comparison of two modes of stress measurement: Daily hassles and uplifts versus major life events. Journal of Behavioral Medicine, 4(1), 1-39.