Definitions of Critical Thinking
One of the most debated aspects of critical thinking is its very definition. Table 1 shows definitions of critical thinking drawn from the frameworks reviewed in the Markle et al. (2013) paper. The frameworks' different sources (e.g., higher education and the workforce) emphasize different aspects of critical thinking: some value the reasoning process specific to critical thinking, while others emphasize its outcomes, such as whether it can be used for decision making or problem solving. Notably, none of the frameworks referenced in the Markle et al. paper offers an actual assessment of critical thinking based on the group's definition. For example, in the case of the VALUE (Valid Assessment of Learning in Undergraduate Education) initiative, part of the AAC&U's LEAP campaign, the VALUE rubrics were developed to serve as generic guidelines for faculty members designing their own assessments or grading activities. This approach gives faculty great flexibility and accommodates local needs, but it also raises reliability concerns about how faculty members apply the rubrics. A recent AAC&U research study found that percent agreement in scoring was fairly low when multiple raters scored the same student work using the VALUE rubrics (Finley, 2012); for the critical thinking rubric, the rate of perfect agreement across four scoring categories among multiple raters was only 36%.
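The percent-agreement statistic of the kind Finley (2012) reports can be illustrated with a minimal sketch. The ratings below are our own invented example, not the AAC&U data, and we assume "perfect agreement" means every rater assigns the same rubric category to an artifact:

```python
# Hypothetical illustration of exact percent agreement among raters
# applying a 4-category rubric (0-3) to the same student artifacts.
# These ratings are invented for demonstration only.

def percent_exact_agreement(ratings_per_artifact):
    """Share of artifacts on which every rater assigned the same category."""
    agree = sum(1 for ratings in ratings_per_artifact if len(set(ratings)) == 1)
    return agree / len(ratings_per_artifact)

# Each inner list holds the scores three raters gave one artifact.
ratings = [
    [2, 2, 2],  # perfect agreement
    [1, 2, 1],  # disagreement
    [3, 3, 3],  # perfect agreement
    [0, 1, 2],  # disagreement
    [2, 2, 3],  # disagreement
]
print(percent_exact_agreement(ratings))  # 2 of 5 artifacts -> 0.4
```

Note that exact agreement is a strict criterion: raters who differ by a single rubric level count as full disagreement, which is one reason observed agreement rates on multi-category rubrics can look low.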
In addition to the frameworks discussed by Markle et al. (2013), there are other influential research efforts on critical thinking. Unlike the frameworks discussed by Markle et al., these research efforts have led to commercially available critical thinking assessments. For example, in a study sponsored by the American Philosophical Association (APA), Facione (1990b) spearheaded an effort to identify a consensus definition of critical thinking using the Delphi approach, an expert consensus method. For the APA study, 46 members recognized for their experience or expertise in critical thinking instruction, assessment, or theory shared reasoned opinions about critical thinking. The experts were asked to provide their own lists of the skill and dispositional dimensions of critical thinking. After rounds of discussion, the experts reached agreement on the core cognitive dimensions of critical thinking: (a) interpretation, (b) analysis, (c) evaluation, (d) inference, (e) explanation, and (f) self-regulation, while making it clear that a person does not have to be proficient at every skill to be considered a critical thinker. The experts also reached consensus on the affective, dispositional components of critical thinking, such as “inquisitiveness with regard to a wide range of issues,” “concern to become and remain generally well-informed,” and “alertness to opportunities to use CT [critical thinking]” (Facione, 1990b, p. 13). Two decades later, the approach AAC&U took to define critical thinking was heavily influenced by the APA definitions.
Halpern also led a noteworthy research and assessment effort on critical thinking. In her 2003 book, Halpern defined critical thinking as
…the use of those cognitive skills or strategies that increase the probability of a desirable outcome. It is used to describe thinking that is purposeful, reasoned, and goal directed—the kind of thinking involved in solving problems, formulating inferences, calculating likelihoods, and making decisions, when the thinker is using skills that are thoughtful and effective for the particular context and type of thinking task. (Halpern, 2003, p. 6)
Halpern's approach to critical thinking has a strong focus on the outcome or utility aspect of critical thinking, in that critical thinking is conceptualized as a tool to facilitate decision making or problem solving. Halpern recognized several key aspects of critical thinking, including verbal reasoning, argument analysis, assessing likelihood and uncertainty, making sound decisions, and thinking as hypothesis testing (Halpern, 2003).
These two research efforts, led by Facione and Halpern, lent themselves to two commercially available assessments of critical thinking, the California Critical Thinking Skills Test (CCTST) and the Halpern Critical Thinking Assessment (HCTA), respectively, which are described in detail in the following section, where we discuss existing assessments. Interested readers are also pointed to research concerning constructs overlapping with critical thinking, such as argumentation (Godden & Walton, 2007; Walton, 1996; Walton, Reed, & Macagno, 2008) and reasoning (Carroll, 1993; Powers & Dwyer, 2003).
Existing Assessments of Critical Thinking
Multiple Themes of Assessments
Mirroring the multifaceted definitions offered for critical thinking, critical thinking assessments also tend to capture multiple themes. Table 2 presents some of the most popular assessments of critical thinking, including the CCTST (Facione, 1990a), California Critical Thinking Disposition Inventory (CCTDI; Facione & Facione, 1992), Watson–Glaser Critical Thinking Appraisal (WGCTA; Watson & Glaser, 1980), Ennis–Weir Critical Thinking Essay Test (Ennis & Weir, 1985), Cornell Critical Thinking Test (CCTT; Ennis, Millman, & Tomko, 1985), ETS® Proficiency Profile (EPP; ETS, 2010), Collegiate Learning Assessment+ (CLA+; Council for Aid to Education, 2013), Collegiate Assessment of Academic Proficiency (CAAP; CAAP Program Management, 2012), and the HCTA (Halpern, 2010). The last column in Table 2 shows how critical thinking is operationally defined in these widely used assessments. The assessments overlap in a number of key themes, such as reasoning, analysis, argumentation, and evaluation. They also differ along a few dimensions, such as whether critical thinking should include decision making and problem solving (e.g., CLA+, HCTA, and California Measure of Mental Motivation [CM3]), be integrated with writing (e.g., CLA+), or involve metacognition (e.g., CM3).
| Assessment | Publisher | Item format | Delivery | Testing time | Length | What is measured |
|---|---|---|---|---|---|---|
| California Critical Thinking Disposition Inventory (CCTDI) | Insight Assessment (California Academic Press) | Selected-response (Likert scale: extent to which students agree or disagree) | Online or paper/pencil | 30 min | 75 items (seven scales: 9–12 items per scale) | Seven scales of critical thinking disposition: (a) truth-seeking, (b) open-mindedness, (c) analyticity, (d) systematicity, (e) confidence in reasoning, (f) inquisitiveness, and (g) maturity of judgment (Facione, Facione, & Sanchez, 1994) |
| California Critical Thinking Skills Test (CCTST) | Insight Assessment (California Academic Press) | Multiple-choice (MC) | Online or paper/pencil | 45 min | 34 items (vignette based) | Scores on the following scales: (a) analysis, (b) evaluation, (c) inference, (d) deduction, (e) induction, and (f) overall reasoning skills (Facione, 1990a) |
| California Measure of Mental Motivation (CM3) | Insight Assessment (California Academic Press) | Selected-response (4-point Likert scale: strongly disagree to strongly agree) | Online or paper/pencil | 20 min | 72 items | Scores on the following areas: (a) learning orientation, (b) creative problem solving, (c) cognitive integrity, (d) scholarly rigor, and (e) technological orientation (Insight Assessment, 2013) |
| Collegiate Assessment of Academic Proficiency (CAAP) Critical Thinking | ACT | MC | Paper/pencil | 40 min | 32 items (four passages representative of issues commonly encountered in a postsecondary curriculum) | Students' skills in analyzing elements of an argument, evaluating an argument, and extending arguments (CAAP Program Management, 2012) |
| Collegiate Learning Assessment+ (CLA+) | Council for Aid to Education (CAE) | Performance task (PT) and MC | Online | 90 min (60 min for PT; 30 min for MC) | 26 items (one PT; 25 MC) | PTs measure higher order skills: (a) analysis and problem solving, (b) writing effectiveness, and (c) writing mechanics; MC items assess (a) scientific and quantitative reasoning, (b) critical reading and evaluation, and (c) critiquing an argument (Zahner, 2013) |
| Cornell Critical Thinking Test (CCTT) | The Critical Thinking Co. | MC | Computer based (using the software) or paper/pencil | 50 min (can also be administered untimed) | Level X: 71 items; Level Z: 52 items | Level X (intended for Grades 5–12+): (a) induction, (b) deduction, (c) credibility, and (d) identification of assumptions; Level Z (intended for Grades 11–12+): those skills plus (e) semantics, (f) definition, and (g) prediction in planning experiments (The Critical Thinking Co., 2014) |
| Ennis–Weir Critical Thinking Essay Test | Midwest Publications | Essay | Paper/pencil | 40 min | Nine-paragraph essay/letter | Areas of critical thinking competence: (a) getting the point, (b) seeing reasons and assumptions, (c) stating one's point, (d) offering good reasons, (e) seeing other possibilities, and (f) responding appropriately to and/or avoiding argument weaknesses (Ennis & Weir, 1985) |
| ETS Proficiency Profile (EPP) Critical Thinking | ETS | MC | Online and paper/pencil | About 40 min (full test is 2 h) | 27 items (standard form) | A student's ability to (a) distinguish between rhetoric and argumentation in a piece of nonfiction prose, (b) recognize assumptions and the best hypothesis to account for information presented, (c) infer and interpret a relationship between variables, and (d) draw valid conclusions based on information presented (ETS, 2010) |
| Halpern Critical Thinking Assessment (HCTA) | Schuhfried Publishing, Inc. | Forced choice (MC, ranking, or rating of alternatives) and open-ended | Computer based | Untimed; Form S1 (both open-ended and forced choice items): 60–80 min; Form S2 (all forced choice items): 20 min | 25 scenarios of everyday events (five per subcategory) | Five critical thinking subskills: (a) verbal reasoning skills, (b) argument and analysis skills, (c) skills in thinking as hypothesis testing, (d) using likelihood and uncertainty, and (e) decision-making and problem-solving skills (Halpern, 2010) |
| Watson–Glaser Critical Thinking Appraisal tool (WGCTA) | Pearson | MC | Online and paper/pencil | Standard (Forms A and B): 40–60 min if timed; Short form: 30 min if timed; Watson–Glaser II: 40 min if timed | Standard: 80 items; Short form: 40 items; Watson–Glaser II: 40 items | Five tests: (a) inference, (b) recognition of assumptions, (c) deduction, (d) interpretation, and (e) evaluation of arguments; each test contains both neutral and controversial reading passages and scenarios encountered at work, in the classroom, and in the media, but only the total score is reported (Watson & Glaser, 2008a, 2008b). Watson–Glaser II measures and provides interpretable subscores for three contemporary, business-relevant skill domains: the ability to (a) recognize assumptions, (b) evaluate arguments, and (c) draw conclusions (Watson & Glaser, 2010) |
The majority of the assessments exclusively use selected-response items such as multiple-choice or Likert-type items (e.g., CAAP, CCTST, and WGCTA). EPP, HCTA, and CLA+ use a combination of multiple-choice and constructed-response items (though the essay is optional in EPP), and the Ennis–Weir test is an essay test. Given the limited testing time, only a small number of constructed-response items can typically be used in a given assessment.
Test and Scale Reliability
Although constructed-response items have great face validity and have the potential to offer authentic contexts in assessments, they tend to have lower levels of reliability than multiple-choice items for the same amount of testing time (Lee, Liu, & Linn, 2011). For example, according to a recent report released by the sponsor of the CLA+, the Council for Aid to Education (Zahner, 2013), the reliability of the 60-min constructed-response section is only .43. The test-level reliability is .87, largely driven by the reliability of CLA+'s 30-min short multiple-choice section.
Because of the multidimensional nature of critical thinking, many existing assessments include multiple subscales and report subscale scores. The main advantage of subscale scores is that they provide detailed information about test takers' critical thinking ability. The downside, however, is that these subscale scores typically suffer from unsatisfactory reliability and a lack of distinction between scales. For example, the CCTST reports a score on overall reasoning skills and subscale scores on five aspects of critical thinking: (a) analysis, (b) evaluation, (c) inference, (d) deduction, and (e) induction. However, Leppa (1997) reported that the subscales have low internal consistency, ranging from .21 to .51, much lower than the reliabilities (.68 to .70) reported by the authors of the CCTST (Ku, 2009). Similarly, the WGCTA provides subscale scores on inference, recognition of assumptions, deduction, interpretation, and evaluation of arguments, yet studies found that the internal consistency of some of these subscales was low and varied widely, from .17 to .74 (Loo & Thorpe, 1999). Additionally, there is no clear evidence of distinct subscales: a meta-analysis of 60 published studies recovered a single-component structure (Bernard et al., 2008). Studies have also reported unstable factor structure and low reliability for the CCTDI (Kakai, 2003; Walsh & Hardy, 1997; Walsh, Seldomridge, & Badros, 2007).
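Part of why short subscales show low internal consistency is mechanical: holding item quality constant, reliability shrinks with test length. This can be sketched with the standardized form of Cronbach's alpha (algebraically the Spearman–Brown formula); the average inter-item correlation of .15 used below is an illustrative assumption, not a value estimated from any of the tests discussed:

```python
# Standardized Cronbach's alpha from the number of items k and the
# average inter-item correlation r_bar (Spearman-Brown form).
def standardized_alpha(k, r_bar):
    return k * r_bar / (1 + (k - 1) * r_bar)

# With a modest (hypothetical) average inter-item correlation of .15,
# a 6- or 7-item subscale falls well short of conventional reliability
# targets, while a 34-item total score approaches .86.
print(round(standardized_alpha(6, 0.15), 2))   # 0.51
print(round(standardized_alpha(34, 0.15), 2))  # 0.86
```

On this arithmetic alone, subscale scores built from a handful of items will tend to land in the low ranges reported above even when the full-length total score looks respectable.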
Comparability of Forms
For reasons such as test security and construct representation, most assessments employ multiple forms, and the comparability among forms is another source of concern. For example, Jacobs (1999) found that Form B of the CCTST was significantly more difficult than Form A, and other studies have likewise found low comparability between the two CCTST forms (Bondy, Koenigseder, Ishee, & Williams, 2001).
Table 3 presents some of the more recent validity studies of existing critical thinking assessments. Most studies focus on the correlation of critical thinking scores with scores on other general cognitive measures. For example, critical thinking assessments showed moderate correlations with general cognitive assessments such as the SAT® or GRE® tests (e.g., Ennis, 2005; Giancarlo, Blohm, & Urdan, 2004; Liu, 2008; Stanovich & West, 2008; Watson & Glaser, 2010). They also showed moderate correlations with course grades and GPA (Gadzella et al., 2006; Giancarlo et al., 2004; Halpern, 2006; Hawkins, 2012; Liu & Roohr, 2013; Williams et al., 2003). A few studies have examined the relationship of critical thinking to behaviors, job performance, or life events. Ejiogu, Yang, Trent, and Rose (2006) found that WGCTA scores correlated moderately and positively with job performance (corrected r = .32 to .52). Butler (2012) examined the external validity of the HCTA and concluded that those with higher critical thinking scores had fewer negative life events than those with lower critical thinking skills (r = −.38).
| Study | Assessment | Sample | N | Key findings |
|---|---|---|---|---|
| Butler (2012) | HCTA | Community college students, state university students, and community adults | 131 | Significant moderate correlation with the real-world outcomes of critical thinking inventory (r(131) = −.38), meaning those with higher critical thinking scores reported fewer negative life events |
| Ejiogu et al. (2006) | WGCTA Short Form | Analysts in a government agency | 84 | Significant moderate correlations, corrected for criterion unreliability, ranging from .32 to .52 with supervisory ratings of job performance behaviors; highest correlations were with analysis and problem solving (r(68) = .52) and judgment and decision making (r(68) = .52) |
| Ennis (2005) | Ennis–Weir Critical Thinking Essay Test | Undergraduates in an educational psychology course (Taube, 1997) | 198 | Moderate correlation with WGCTA (r(187) = .37); low to moderate correlations with personality assessments ranging from .24 to .35; low to moderate correlations with SAT verbal (r(155) = .40), SAT quantitative (r(155) = .28), and GPA (r(171) = .28) |
| | | Malay undergraduates with English as a second language (Moore, 1995) | 60 | Correlations with SAT verbal (pretest: r(60) = .34, posttest: r(60) = .59), TOEFL® (pre: r(60) = .35, post: r(60) = .48), ACT (pre: r(60) = .25, post: r(60) = .66), TWE® (pre: r(60) = −.56, post: r(60) = −.07), and SPM (pre: r(60) = .41, post: r(60) = .35) |
| | | 10th-, 11th-, and 12th-grade students (Norris, 1995) | 172 | Low to moderate correlations with WGCTA (r(172) = .28), CCTT (r(172) = .32), and Test on Appraising Observations (r(172) = .25) |
| Gadzella et al. (2006) | WGCTA Short Form | State university students (psychology, educational psychology, and special education undergraduate majors; graduate students) | 586 | Low to moderately high significant correlations with course grades ranging from .20 to .62 (r(565) = .30 for total group; r(56) = .62 for psychology majors) |
| Giddens and Gloeckner (2005) | CCTST; CCTDI | Baccalaureate nursing program in the southwestern United States | 218 | Students who passed the NCLEX had significantly higher total critical thinking scores on the CCTST entry test (t(101) = 2.5*, d = 1.0), CCTST exit test (t(191) = 3.0**, d = .81), and CCTDI exit test (t(183) = 2.6**, d = .72) than students who failed the NCLEX |
| Halpern (2006) | HCTA | Study 1: Junior and senior students from high school and college in California | 80 high school, 80 college | Moderate significant correlations with the Arlin Test of Formal Reasoning (r = .32) for both groups |
| | | Study 2: Undergraduate and second-year master's students from California State University, San Bernardino | 145 undergraduates, 32 master's | Moderate to moderately high correlations with the Need for Cognition scale (r = .32), GPA (r = .30), SAT Verbal (r = .58), SAT Math (r = .50), and GRE Analytic (r = .59) |
| Giancarlo et al. (2004) | CM3 | 9th- and 11th-grade public school students in northern California (validation study 2) | 484 | Statistically significant correlations between four CM3 subscales (learning, creative problem solving, mental focus, and cognitive integrity) and measures of mastery goals (r(482) = .09 to .67), self-efficacy (r(482) = .22 to .47), SAT9 Math (r(379) = .18 to .33), SAT9 Reading (r(387) = .13 to .43), SAT9 Science (r(380) = .11 to .22), SAT9 Language/Writing (r(382) = .09 to .17), SAT9 Social Science (r(379) = .09 to .18), and GPA (r(468) = .19 to .35) |
| | | 9th- to 12th-grade all-female college preparatory students in Missouri (validation study 3) | 587 | Statistically significant correlations between the four CM3 subscales and PSAT Math (r(434) = .15 to .37), PSAT Verbal (r(434) = .20 to .31), PSAT Writing (r(291) = .21 to .33), PSAT selection index (r(434) = .23 to .40), and GPA (r(580) = .21 to .46) |
| Hawkins (2012) | CCTST | Students enrolled in undergraduate English courses at a small liberal arts college | 117 | Moderate significant correlation between total score and GPA (r = .45); moderate significant subscale correlations with GPA ranged from .27 to .43 |
| Liu and Roohr (2013) | EPP | Community college students from 13 institutions | 46,402 | Students with higher GPA and more credit hours performed better on the EPP than students with lower GPA and fewer credit hours; GPA was the strongest significant predictor of critical thinking (β = .21, η² = .04) |
| Watson and Glaser (2010) | WGCTA | Undergraduate educational psychology students (Taube, 1997) | 198 | Moderate significant correlations with SAT Verbal (r(155) = .43), SAT Math (r(155) = .39), GPA (r(171) = .30), and Ennis–Weir (r(187) = .37); low to moderate correlations with personality assessments ranging from .07 to .33 |
| | | Three semesters of freshman nursing students in eastern Pennsylvania (Behrens, 1996) | 172 | Moderately high significant correlations with fall semester GPA ranging from .51 to .59 |
| | | Education majors in an educational psychology course at a southwestern state university (Gadzella, Baloglu, & Stephens, 2002) | 114 | Significant correlation between total score and GPA (r = .28); significant correlations between the five WGCTA subscales and GPA ranging from .02 to .34 |
| Williams et al. (2003) | CCTST; CCTDI | First-year dental hygiene students from seven U.S. baccalaureate universities | 207 | Significant correlations between the CCTST and CCTDI at baseline (r = .41) and at second semester (r = .26); significant correlations between CCTST and knowledge, faculty ratings, and clinical reasoning ranging from .24 to .37 at baseline and from .23 to .31 at second semester; for the CCTDI, significant correlations ranged from .15 to .19 at baseline with knowledge, faculty ratings, and clinical reasoning, and with faculty reasoning (r = .21) at second semester; the CCTDI was a more consistent predictor of student performance (4.9–12.3% variance explained) than traditional predictors such as age, GPA, and number of college hours (2.1–4.1% variance explained) |
| Williams, Schmidt, Tilliss, Wilkins, and Glasnapp (2006) | CCTST; CCTDI | First-year dental hygiene students from three U.S. baccalaureate dental hygiene programs | 78 | Significant correlation between CCTST and CCTDI at baseline (r = .29); significant correlations between CCTST and NBDHE Multiple-Choice (r = .35) and Case-Based (r = .47) tests at baseline and at program completion (r = .30 and .33, respectively); significant correlations between CCTDI and NBDHE Case-Based at baseline (r = .25) and at program completion (r = .40); CCTST was a more consistent predictor of student performance on both NBDHE Multiple-Choice (10.5% variance explained) and NBDHE Case-Based scores (18.4% variance explained) than traditional predictors such as age, GPA, and number of college hours |
Our review of validity evidence revealed that the quality and quantity of research support vary considerably among existing assessments. Common problems include insufficient evidence of distinct dimensionality, unreliable subscores, noncomparable test forms, and unclear evidence of differential validity across groups of test takers. In a review of the psychometric quality of existing critical thinking assessments, Ku (2009) observed that studies conducted by researchers unaffiliated with a test's authors tend to report lower psychometric quality than studies conducted by the authors and their affiliates.
For future research, a component of validity that is missing from many of the existing studies is the incremental predictive validity of critical thinking. As Kuncel (2011) pointed out, evidence is needed to clarify critical thinking skills' prediction of desirable outcomes (e.g., job performance) beyond what is predicted by other general cognitive measures. Without controlling for other types of general cognitive ability, it is difficult to evaluate the unique contributions that critical thinking skills make to the various outcomes. For example, the Butler (2012) study did not control for any measures of participants' general cognitive ability. Hence, it leaves room for an alternative explanation that other aspects of people's general cognitive ability, rather than critical thinking, may have contributed to their life success.
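The incremental-validity check Kuncel (2011) calls for can be sketched as a hierarchical regression: fit a baseline model with a general cognitive measure only, then add the critical thinking score and examine the gain in R². The data below are simulated and every effect size is an arbitrary assumption for illustration, not an estimate from any study cited here:

```python
import numpy as np

# Simulated sketch of incremental predictive validity (all values arbitrary).
rng = np.random.default_rng(0)
n = 500
g = rng.normal(size=n)                       # general cognitive ability
ct = 0.7 * g + 0.3 * rng.normal(size=n)      # critical thinking, correlated with g
y = 0.5 * g + 0.2 * ct + rng.normal(size=n)  # outcome (e.g., job performance)

def r_squared(predictors, outcome):
    """R^2 of an OLS fit with an intercept and the given predictor columns."""
    X = np.column_stack([np.ones(len(outcome))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ beta
    return 1 - resid.var() / outcome.var()

r2_base = r_squared([g], y)       # baseline: general ability only
r2_full = r_squared([g, ct], y)   # baseline plus critical thinking
print(round(r2_full - r2_base, 4))  # incremental R^2 attributable to ct
```

In this simulation the critical thinking score is largely redundant with the general measure, so the incremental R² is tiny, which illustrates why a zero-order correlation such as Butler's (2012) cannot, by itself, establish a unique contribution of critical thinking.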
Challenges in Designing Critical Thinking Assessment
Authenticity Versus Psychometric Quality
A major challenge in designing an assessment for critical thinking is to strike a balance between the assessment's authenticity and its psychometric quality. Most current assessments rely on multiple-choice items when measuring critical thinking. The advantages of such assessments lie in their objectivity, efficiency, high reliability, and low cost. Typically, within the same amount of testing time, multiple-choice items are able to provide more information about what the test takers know as compared to constructed-response items (Lee et al., 2011). Wainer and Thissen (1993) reported that the scoring of 10 constructed-response items costs about $30, while the cost for scoring multiple-choice items to achieve the same level of reliability was only 1¢. Although multiple-choice items cost less to score, they typically cost more in assessment development than constructed-response items. That being said, the overall cost structure of multiple-choice versus constructed-response items will depend on the number of scores that are derived from a given item over its lifecycle.
Studies also show high correlations between multiple-choice and constructed-response items measuring the same constructs (Klein et al., 2009). Rodriguez (2003) investigated the construct equivalence of the two item formats in a meta-analysis of 63 studies and concluded that they are highly correlated when measuring the same content: mean correlations were around .95 with item-stem equivalence and .92 without stem equivalence. The Klein et al. (2009) study compared the construct validity of three standardized assessments of college learning outcomes (i.e., EPP, CLA, and CAAP), including critical thinking, and found a school-level correlation of .93 between a multiple-choice and a constructed-response critical thinking test.
Given that constructed-response items can be more expensive to score and that multiple-choice items can in some cases measure the same constructs equally well, one might argue for using multiple-choice items exclusively and disregarding constructed-response items. However, constructed-response items make it possible to create more authentic contexts and to assess students' ability to generate rather than select responses. In real-life situations that call for critical thinking, no answer choices are provided; people must generate their own options and determine which is preferable given the question at hand. Research has long established that the ability to recognize is different from the ability to generate (Frederiksen, 1984; Lane, 2004; Shepard, 2000). For critical thinking, then, constructed-response items may be a better proxy for real-world scenarios than multiple-choice items.
We agree with researchers who call for multiple item formats in critical thinking assessments (e.g., Butler, 2012; Halpern, 2010; Ku, 2009). Constructed-response items alone are unlikely to meet psychometric standards because of their low internal consistency, one type of reliability. A combination of item formats offers the potential for an assessment that is both authentic and psychometrically sound.
Instructional Value Versus Standardization
Another challenge in designing a standardized critical thinking assessment for higher education is attending to the assessment's instructional relevance. Faculty members are sometimes concerned about the limited relevance of results from general student learning outcomes assessments, as these assessments tend to be created in isolation from curriculum and instruction. For example, although most institutions consider critical thinking a necessary skill for their students (AAC&U, 2011), few offer courses that specifically foster it. Therefore, even if assessment results show that students at a particular institution lack critical thinking skills, no specific department, program, or faculty member would claim responsibility, which greatly limits the practical use of the results. It is important to identify the common goals of general higher education and translate them into the design of learning outcomes assessments. The VALUE rubrics created by AAC&U (Rhodes, 2010) are good examples of how a common framework can align expectations about college students' critical thinking skills. While attending to instructional relevance, one should also keep in mind the persistent tension between instructional relevance and standardization. Standardized assessment offers comparability and generalizability across institutions and across programs within an institution. An assessment designed to closely reflect the objectives and goals of a particular program will have great instructional relevance and will likely offer rich diagnostic information about that program's students, but it may not serve as a meaningful measure of outcomes for students in other programs.
When designing an assessment for critical thinking, it is essential to find that balance point so the assessment results bear meaning for the instructors and provide information to support comparisons across programs and institutions.
Institutional Versus Individual Use
Another concern is whether the assessment should be designed to provide results for institutional or individual use, a decision with implications for psychometric considerations such as reliability and validity. For an institution-level assessment, the results need to be reliable only at the group level (e.g., major, department), while for an individual assessment, the results must be reliable at the level of the individual test taker. Typically, more items are required to achieve acceptable individual-level reliability than institution-level reliability. When assessment results are used only at an aggregate level, as they currently are at most institutions, the validity of the test scores is in question because students may not expend maximum effort when answering the items. Student motivation on low-stakes assessments has long been a source of concern. A recent study by Liu, Bridgeman, and Adler (2012) confirmed that motivation plays a significant role in student performance on low-stakes learning outcomes assessments in higher education: conclusions about students' learning gains in college can vary substantially depending on whether students are motivated to take the test. If possible, the assessment should be designed to provide reliable information about individual test takers, which allows test takers to benefit from the test (e.g., by obtaining a certificate of achievement); the increased stakes may help boost students' motivation.
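Why group-level reporting tolerates shorter tests can be sketched under a strong simplifying assumption of our own (only measurement error varies; the sampling of students into the group is ignored): averaging scores over n test takers divides error variance by n, which gives the group-mean reliability the same algebraic form as the Spearman–Brown formula.

```python
# Reliability of a group mean score, assuming (hypothetically) that all
# score variation beyond true ability is independent measurement error.
def group_mean_reliability(individual_reliability, n):
    r = individual_reliability
    return n * r / (1 + (n - 1) * r)

# A score with (hypothetical) individual-level reliability .50 is far too
# unreliable for individual decisions, yet a department mean over 50
# students is highly reliable under this assumption.
print(round(group_mean_reliability(0.50, 1), 2))   # 0.5
print(round(group_mean_reliability(0.50, 50), 2))  # 0.98
```

The sketch shows the asymmetry the paragraph above describes: a short test can support institution-level comparisons while remaining inadequate for reporting scores back to individual test takers.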
General Versus Domain-Specific Assessment
Critical thinking has been defined as a generic skill in many of the existing frameworks and assessments (e.g., Bangert-Drowns & Bankert, 1990; Ennis, 2003; Facione, 1990b; Halpern, 1998). On one hand, many educators and philosophers believe that critical thinking is a set of skills and dispositions that can be applied across specific domains (Davies, 2013; Ennis, 1989; Moore, 2011). The generalists depict critical thinking as an enabling skill similar to reading and writing, and argue that it can be taught outside the context of a specific discipline. On the other hand, the specifists hold that critical thinking is a domain-specific skill and that the type of critical thinking required for nursing would be very different from that practiced in engineering (Tucker, 1996). To date, much of the debate remains at the theoretical level, with little empirical evidence confirming either the generality or the specificity of critical thinking (Nicholas & Labig, 2013). One empirical study, however, yielded mixed findings. Powers and Enright (1987) surveyed 255 faculty members in six disciplinary domains to understand the kinds of reasoning and analytical abilities required for successful performance at the graduate level. The authors found that some general skills, such as “reasoning or problem solving in situations in which all the needed information is not known,” were valued by faculty in all domains (p. 670). Despite this consensus on some skills, faculty members across subject domains showed marked differences in their perceptions of the importance of other skills. For example, “knowing the rules of formal logic” was rated of high importance for computer science but not for other disciplines (p. 678).
Tuning USA is one of the efforts that consider critical thinking in a domain-specific context. Tuning USA is a faculty-driven process that aims to align goals and define competencies at each degree level (i.e., associate's, bachelor's, and master's) within a discipline (Institute for Evidence-Based Change, 2010). Tuning USA sets goals to foster critical thinking within certain disciplinary domains, such as engineering and history. For example, for engineering students who work on design, critical thinking suggests that they develop "an appreciation of the uncertainties involved, and the use of engineering judgment" (p. 97) and that they understand "consideration of risk assessment, societal and environmental impact, standards, codes, regulations, safety, security, sustainability, constructability, and operability" at various stages of the design process (p. 97).
In addition, there is insufficient empirical evidence showing that, as a generic skill, critical thinking is distinguishable from other general cognitive abilities measured by validated assessments such as the SAT and GRE (see Kuncel, 2011). Kuncel, therefore, argued that instead of being a generic skill, critical thinking is more appropriately studied as a domain-specific construct. This view may be correct, or at least plausible, but there also needs to be empirical evidence demonstrating that critical thinking is a domain-specific skill. It is true that examples of critical thinking offered by members of the nursing profession may be very different from those cited by engineers, but content knowledge plays a significant role in this distinction. Would it be reasonable to assume that skillful critical thinkers can be successful when they transfer from one profession to another with sufficient content training? Whether and how content knowledge can be disentangled from higher order critical thinking skills, as well as from other cognitive and affective faculties, awaits further investigation.
Despite the debate over the nature of critical thinking, most existing critical thinking assessments treat this skill as generic. Apart from the theoretical reasons, it is much more costly and labor-intensive to design, develop, and score a critical thinking assessment for each major field of study. If assessments are designed only for popular domains with large numbers of students, students in less popular majors are deprived of the opportunity to demonstrate their critical thinking skills. From a score user perspective, because of the interdisciplinary nature of many jobs in the 21st century workforce, many employers value generic skills that can be transferable from one domain to another (AAC&U, 2011; Chronicle of Higher Education, 2012; Hart Research Associates, 2013), which makes an assessment of critical thinking in a particular domain less attractive.
Total Versus Subscale Scores
Another challenge related to critical thinking assessment is whether to offer subscale scores. Given the multidimensional nature of the critical thinking construct, it is a natural tendency for assessment developers to consider subscale scores for critical thinking. Subscale scores have the advantages of offering detailed information about test takers' performance on each of the subscales and also have the potential to provide diagnostic information for teachers or instructors if the scores are going to be used for formative purposes (Sinharay, Puhan, & Haberman, 2011). However, one should not lose sight of the psychometric requirements when offering subscale scores. Evidence is needed to demonstrate that there is a real and reliable distinction among the subscales. Previous research reveals that for some of the existing critical thinking assessments, there is a lack of support for the factor structure on which reported subscale scores are based (e.g., CCTDI; Kakai, 2003; Walsh & Hardy, 1997; Walsh et al., 2007).
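One simple way to probe whether two subscales are really distinct is the classical correction for attenuation: divide the observed correlation between two subscale scores by the geometric mean of their reliabilities. If the corrected (disattenuated) correlation approaches 1.0, the subscales offer little evidence of measuring separate dimensions. The sketch below is a minimal illustration with hypothetical numbers, not data from any assessment cited here.

```python
import math

def disattenuated_corr(r_xy, rel_x, rel_y):
    """Correlation between two subscales corrected for measurement
    error, using the classical correction for attenuation."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical example: two subscales correlate at 0.65, with
# reliabilities of 0.72 and 0.70. The corrected correlation is ~0.92,
# suggesting the subscales may not be meaningfully distinct.
r_true = disattenuated_corr(0.65, 0.72, 0.70)
```

A corrected correlation well below 1.0 is one piece of evidence (alongside confirmatory factor analysis) that separate subscale scores carry real information beyond the total score.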
To be skilled in critical thinking is to be able to take one’s thinking apart systematically, to analyze each part, assess it for quality and then improve it. The first step in this process is understanding the parts of thinking, or elements of reasoning.
These elements are: purpose, question, information, inference, assumption, point of view, concepts, and implications. They are present in the mind whenever we reason. To take command of our thinking, we need to formulate both our purpose and the question at issue clearly. We need to use information in our thinking that is both relevant to the question we are dealing with, and accurate. We need to make logical inferences based on sound assumptions. We need to understand our own point of view and fully consider other relevant viewpoints. We need to use concepts justifiably and follow out the implications of decisions we are considering. (For an elaboration of the Elements of Reasoning, see a Miniature Guide to the Foundations of Analytic Thinking.)
In this article we focus on two of the elements of reasoning: inferences and assumptions. Learning to distinguish inferences from assumptions is an important intellectual skill. Many confuse the two elements. Let us begin with a review of the basic meanings:
- Inference: An inference is a step of the mind, an intellectual act by which one concludes that something is true in light of something else’s being true, or seeming to be true. If you come at me with a knife in your hand, I probably would infer that you mean to do me harm. Inferences can be accurate or inaccurate, logical or illogical, justified or unjustified.
- Assumption: An assumption is something we take for granted or presuppose. Usually it is something we previously learned and do not question. It is part of our system of beliefs. We assume our beliefs to be true and use them to interpret the world about us. If we believe that it is dangerous to walk late at night in big cities and we are staying in Chicago, we will infer that it is dangerous to go for a walk late at night. We take for granted our belief that it is dangerous to walk late at night in big cities. If our belief is a sound one, our assumption is sound. If our belief is not sound, our assumption is not sound. Beliefs, and hence assumptions, can be unjustified or justified, depending upon whether we do or do not have good reasons for them. Consider this example: “I heard a scratch at the door. I got up to let the cat in.” My inference was based on the assumption (my prior belief) that only the cat makes that noise, and that he makes it only when he wants to be let in.
We humans naturally and regularly use our beliefs as assumptions and make inferences based on those assumptions. We must do so to make sense of where we are, what we are about, and what is happening. Assumptions and inferences permeate our lives precisely because we cannot act without them. We make judgments, form interpretations, and come to conclusions based on the beliefs we have formed.
If you put humans in any situation, they start to give it some meaning or other. People automatically make inferences to gain a basis for understanding and action. So quickly and automatically do we make inferences that we do not, without training, notice them as inferences. We see dark clouds and infer rain. We hear the door slam and infer that someone has arrived. We see a frowning face and infer that the person is upset. If our friend is late, we infer that she is being inconsiderate. We meet a tall person and infer that he is good at basketball; we meet an Asian student and infer that she will be good at math. We read a book and interpret what the various sentences and paragraphs, and indeed the whole book, are saying. We listen to what people say and make a series of inferences as to what they mean.
As we write, we make inferences as to what readers will make of what we are writing. We make inferences as to the clarity of what we are saying, what requires further explanation, what has to be exemplified or illustrated, and what does not. Many of our inferences are justified and reasonable, but some are not.
As always, an important part of critical thinking is the art of bringing what is subconscious in our thought to the level of conscious realization. This includes the recognition that our experiences are shaped by the inferences we make during those experiences. It enables us to separate our experiences into two categories: the raw data of our experience in contrast with our interpretations of those data, or the inferences we are making about them. Eventually we need to realize that the inferences we make are heavily influenced by our point of view and the assumptions we have made about people and situations. This puts us in the position of being able to broaden the scope of our outlook, to see situations from more than one point of view, and hence to become more open-minded.
Often different people make different inferences because they bring to situations different viewpoints. They see the data differently. To put it another way, they make different assumptions about what they see. For example, if two people see a man lying in a gutter, one might infer, “There’s a drunken bum.” The other might infer, “There’s a man in need of help.” These inferences are based on different assumptions about the conditions under which people end up in gutters. Moreover, these assumptions are connected to each person’s viewpoint about people. The first person assumes, “Only drunks are to be found in gutters.” The second person assumes, “People lying in the gutter are in need of help.”
The first person may have developed the point of view that people are fundamentally responsible for what happens to them and ought to be able to care for themselves. The second may have developed the point of view that the problems people have are often caused by forces and events beyond their control. The reasoning of these two people, in terms of their inferences and assumptions, could be characterized in the following way:
| Person One | Person Two |
| --- | --- |
| Situation: A man is lying in the gutter. | Situation: A man is lying in the gutter. |
| Inference: That man’s a bum. | Inference: That man is in need of help. |
| Assumption: Only bums lie in gutters. | Assumption: Anyone lying in the gutter is in need of help. |
Critical thinkers notice the inferences they are making, the assumptions upon which they are basing those inferences, and the point of view about the world they are developing. To develop these skills, students need practice in noticing their inferences and then figuring out the assumptions that lead to them.
As students become aware of the inferences they make and the assumptions that underlie those inferences, they begin to gain command over their thinking. Because all human thinking is inferential in nature, command of thinking depends on command of the inferences embedded in it and thus of the assumptions that underlie it. Consider the way in which we plan and think our way through everyday events. We think of ourselves as preparing for breakfast, eating our breakfast, getting ready for class, arriving on time, leading class discussions, grading student papers, making plans for lunch, paying bills, engaging in an intellectual discussion, and so on. We can do none of these things without interpreting our actions, giving them meanings, making inferences about what is happening.
This is to say that we must choose among a variety of possible meanings. For example, am I “relaxing” or “wasting time?” Am I being “determined” or “stubborn?” Am I “joining” a conversation or “butting in?” Is someone “laughing with me” or “laughing at me?” Am I “helping a friend” or “being taken advantage of?” Every time we interpret our actions, every time we give them a meaning, we are making one or more inferences on the basis of one or more assumptions.
As humans, we continually make assumptions about ourselves, our jobs, our mates, our students, our children, the world in general. We take some things for granted simply because we can’t question everything. Sometimes we take the wrong things for granted. For example, I run off to the store (assuming that I have enough money with me) and arrive to find that I have left my money at home. I assume that I have enough gas in the car only to find that I have run out of gas. I assume that an item marked down in price is a good buy only to find that it was marked up before it was marked down. I assume that it will not, or that it will, rain. I assume that my car will start when I turn the key and press the gas pedal. I assume that I mean well in my dealings with others.
Humans make hundreds of assumptions without knowing it, without thinking about it. Many assumptions are sound and justifiable. Many, however, are not. The question then becomes: “How can students begin to recognize the inferences they are making, the assumptions on which they are basing those inferences, and the point of view, the perspective on the world, that they are forming?”
There are many ways to foster student awareness of inferences and assumptions. For one thing, all disciplined subject-matter thinking requires that students learn to make accurate assumptions about the content they are studying and become practiced in making justifiable inferences within that content. As examples: In doing math, students make mathematical inferences based on their mathematical assumptions. In doing science, they make scientific inferences based on their scientific assumptions. In constructing historical accounts, they make historical inferences based on their historical assumptions. In each case, the assumptions students make depend on their understanding of fundamental concepts and principles.
As a matter of daily practice, then, we can help students begin to notice the inferences they are making within the content we teach. We can help them identify inferences made by authors of a textbook, or of an article we give them. Once they have identified these inferences, we can ask them to figure out the assumptions that led to those inferences. When we give them routine practice in identifying inferences and assumptions, they begin to see that inferences will be illogical when the assumptions that lead to them are not justifiable. They begin to see that whenever they make an inference, there are other (perhaps more logical) inferences they could have made. They begin to see high quality inferences as coming from good reasoning.
We can also help students think about the inferences they make in daily situations, and the assumptions that lead to those inferences. As they become skilled in identifying their inferences and assumptions, they are in a better position to question the extent to which any of their assumptions is justified. They can begin to ask questions, for example, like: Am I justified in assuming that everyone eats lunch at 12:00 noon? Am I justified in assuming that it usually rains when there are black clouds in the sky? Am I justified in assuming that bumps on the head are only caused by blows?
The point is that we all make many assumptions as we go about our daily life and we ought to be able to recognize and question them. As students develop these critical intuitions, they increasingly notice their inferences and those of others. They increasingly notice what they and others are taking for granted. They increasingly notice how their point of view shapes their experiences.
This article was adapted from the book, Critical Thinking: Tools for Taking Charge of Your Learning and Your Life, by Richard Paul and Linda Elder.