BAB Validity Research - The Ball Foundation

Establishing the validity of a multi-aptitude battery first requires meeting the requisite psychometric standards for each of the individual assessments. Each test is selected for inclusion in the “collection” of assessments (i.e., the battery) in order to provide a comprehensive picture of an individual’s range of aptitudes. Subsequently, validity evidence requires criterion-specific information showing that the assessments work jointly to provide useful information in different performance environments.

The Ball Aptitude Battery® (BAB™) continues to meet the validity and reliability standards that are the key scientific requirements of test construction and development.

Reliability

Tests cannot be valid unless they are first reliable. Reliability refers to consistency in measurement across individuals and time. The three basic procedures used by the Foundation to estimate the reliability of the Ball Aptitude Battery are test-retest reliability, internal consistency reliability, and parallel forms reliability.

BAB reliability research suggests that:

The pattern of reliability among BAB subtests has remained consistent across BAB forms.

Internal consistency and test-retest methods of reliability estimation indicate that the items of each test measure the same construct and that they do so consistently over time.

Historically, studies in which reliability data have been reported for the BAB show that reliability coefficients range from the mid .70s to the low .90s, with a median coefficient in the mid .80s. During these investigations, care was taken to ensure that the components of the testing event (e.g., administration conditions, the motivation of individuals to do their best on the tests, etc.) were favorable.
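The internal consistency and test-retest estimates described above can be illustrated with a minimal sketch. It assumes Cronbach's alpha as the internal consistency coefficient and a Pearson correlation between two testing sessions as the test-retest coefficient; the source does not state which specific coefficients the Foundation computed. The function names and all data below are hypothetical and are not BAB results.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(responses):
    """Internal consistency estimate for one subtest.

    responses: one list of item scores per examinee (all the same length).
    """
    k = len(responses[0])  # number of items
    # Variance of each item across examinees
    item_vars = [variance([person[i] for person in responses]) for i in range(k)]
    # Variance of the total (summed) score across examinees
    total_var = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: 5 examinees answering 4 items scored 0/1
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)

# Hypothetical total scores for the same examinees at two sessions
first_session = [4, 3, 2, 1, 0]
second_session = [4, 3, 1, 2, 0]
retest_r = pearson_r(first_session, second_session)

print(round(alpha, 2), round(retest_r, 2))
```

With this toy data, alpha comes out at about .80 and the test-retest correlation at about .90, which happens to fall in the range of coefficients reported above. Real estimates would of course require far larger samples.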

Validity

Validity, in essence, refers to whether a test measures what it says it measures. Validation of a multi-aptitude test battery is an ongoing process. Through data collection and research investigations, the Foundation continuously examines validity. The two major types of validity evidence gathered on the BAB are discussed below.

Construct-Related Validity Evidence

Using factor-analytic methods, scores on BAB subtests have been analyzed in relation to both other BAB subtests and the subtests of other major aptitude batteries in order to establish that the subtests are each measuring unique and distinct constructs.

BAB construct-related validity research suggests that:

The BAB factor structure is stable across time.

Factor structure is stable across samples and forms.

Confirmatory Factor Analysis (CFA) on BAB Form M data found five mutually orthogonal factors: Verbal, Numerical Ability, Memory, Spatial Reasoning, and Perceptual-Motor Speed (Tirre & Field, 2002). These findings replicated Neuman et al.’s (2000) five-factor solution using a version of the BAB that had very similar item content to that of Form M but used a sample with a more diverse age range. In addition, evidence suggests that computer and paper-pencil versions of the BAB measure the same constructs (Tirre & Field, 2004, May).

Measurement equivalence studies support a lack of bias in the BAB.

The Foundation has conducted item-level analyses of differential item functioning (DIF) using the Mantel-Haenszel chi-square statistic. When item-level evidence of DIF has been identified, it has been considered among the criteria for eliminating items when shortening the BAB subtests. As a result, in Form M (introduced in 1998) and the computerized BAB, evidence of item-level DIF has been minimal, with roughly equal numbers of items favoring the focal and majority groups. Also, across several samples analyzed using confirmatory factor analysis, results have supported the measurement equivalence of the BAB (e.g., Tirre & Field, 2002; Tirre & Field, 2003, May).
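To show how the Mantel-Haenszel statistic mentioned above flags an item, here is a minimal sketch of the standard textbook computation. Examinees are grouped into strata by total score, and each stratum contributes a 2x2 table of group by correct/incorrect. The function name, the tuple layout, the clamped continuity correction, and all counts are assumptions for illustration; this is not the Foundation's actual procedure or data.

```python
def mantel_haenszel(strata):
    """Mantel-Haenszel chi-square (1 df) and common odds ratio for one item.

    strata: one tuple per total-score level, laid out as
        (majority_correct, majority_incorrect, focal_correct, focal_incorrect)
    """
    sum_a = sum_expected = sum_var = 0.0
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        sum_a += a
        # Expected majority-group correct count if the item shows no DIF
        sum_expected += (a + b) * (a + c) / n
        # Hypergeometric variance of that count
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    # Continuity-corrected statistic, clamped so it cannot go below zero
    chi2 = max(0.0, abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    odds_ratio = num / den
    return chi2, odds_ratio


# Hypothetical item where both groups perform identically within each stratum
no_dif = [(10, 10, 10, 10), (15, 5, 15, 5)]
chi2_none, or_none = mantel_haenszel(no_dif)

# Hypothetical item that favors the majority group at matched ability levels
with_dif = [(15, 5, 10, 10), (18, 2, 14, 6)]
chi2_dif, or_dif = mantel_haenszel(with_dif)

print(chi2_none, or_none)
print(round(chi2_dif, 2), round(or_dif, 2))
```

A chi-square above 3.84 (the .05 critical value with one degree of freedom) is conventionally treated as evidence of DIF, and an odds ratio above or below 1 indicates which group the item favors.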

Convergent and discriminant validity evidence has been gathered.

We have also evaluated the relationships between the subtests of the Ball Aptitude Battery and relevant subtests of other major aptitude batteries. Data from cross-correlational and confirmatory factor analyses (Dong et al., 1986; Tirre & Field, 2002) show that the aptitudes measured by the BAB are related in predictable ways to aptitudes measured by three well-known multi-aptitude batteries: Differential Aptitude Tests (DAT; The Psychological Corporation, 1972), General Aptitude Test Battery (GATB; U.S. Department of Labor, 1970), and Armed Services Vocational Aptitude Battery (ASVAB; U.S. Department of Defense, 1995).

Confirmatory Factor Analysis demonstrates that the BAB has a broad offering of unique aptitude measurements.

The Foundation obtained three separate samples and used Confirmatory Factor Analysis (CFA) to jointly analyze the BAB and one other multi-aptitude battery (CAB, ASVAB, or GATB) per sample (Tirre & Field, 2002). The intent was to compare the common dimensions measured and the overall breadth of aptitude coverage. Differences in factor structure primarily reflected the composition of the aptitude tests that were administered. The BAB does not measure specialized knowledge, such as that covered by the technical knowledge subtests in the ASVAB. However, when compared to the GATB or ASVAB, the BAB appears to offer greater breadth of aptitude coverage for use in comprehensive vocational assessment.

Criterion-Related Validity Evidence

Criterion-related validity evidence is one of the ways to add confidence and demonstrate that what is being measured is consistent in its relationships with important educational or work-related outcomes. The BAB has been administered to individuals who represent a variety of ages and in a variety of educational and work settings. Data collected for numerous research and consulting projects have provided support for the criterion-related validity of the BAB.

Data collected in educational settings:

BAB subtest scores were statistically significant predictors of academic performance across subjects.

In a concurrent validation study, students completed the BAB during their senior year in high school (Dong, Sung, & Goldman, 1985). BAB subtest scores were predictive of course grade composites for subjects such as English, Economics, Chemistry, Social Studies, Foreign Languages, and overall GPA.

As part of a three-year longitudinal study in a large Midwest high school, students completed a shortened form of the BAB during their ninth-grade year, and additional data were gathered throughout their high school years (ninth through twelfth grades). BAB subtest scores were predictive of course grade composites for subjects such as Art, Business, Health, Science, and Foreign Language, and of overall GPA (BCS tech manual, 2002).

BAB subtests and scores on standardized tests also show strong relationships with academic performance.

Concurrent validation results from students who completed the BAB during their senior year in high school (Dong, Sung, & Goldman, 1985) indicated that BAB subtest scores were predictive of course grade composites for subjects such as English, Economics, Chemistry, Social Studies, Foreign Languages, and overall GPA.

As part of a three-year longitudinal study in a large Midwest high school, scores from standardized tests (PSAT, PLAN, SAT, and ACT) administered in tenth through twelfth grades were collected. Correlations between BAB subtests and scores on the standardized tests had a median of .30 (range from .02 to .80). Four subtests (Analytical Reasoning, Numerical Computation, Numerical Reasoning, and Vocabulary) showed the strongest relationships across the academic criteria (BCS tech manual, 2002).

Criterion-related validity data collected on the BAB within occupational settings are described in our Performance Environments section.