Chapter 8

Quantitative Data Collection Techniques


Revised - 11 November 2002



  1. Technical characteristics
    1. Validity: the extent to which inferences made from the results are appropriate, meaningful, and useful 8.4, 8.5, 8.6, 8.8
      1. Two types of inferences commonly found in education
        1. Assessing achievement and thus focusing on the extent to which the content of a test represents the larger domain of content that could have been examined
        2. Assessing abstract traits that tend to be highly related to achievement (e.g., intelligence, creativity, self-esteem, attitudes, reasoning, etc.) and thus focusing on the extent to which the construct is represented
      2. Types of evidence
        1. Evidence based on test content: the extent to which the test covers the content it is supposed to cover 8.7
          1. Types
            1. Face validity: a cursory examination of the content of a test
            2. Content validity: a systematic examination of the content of a test and the level of cognition at which it is tested (see Bloom's Taxonomy 8.2)
          2. Estimated by creating a table of specifications and mapping items into it 8.21
        2. Evidence based on response processes: the extent to which observations of performance strategies or responses to specific tasks are consistent with what is intended to be measured
          1. Examples
            1. Solving a mathematical problem demonstrating mathematical reasoning
            2. Performing a complicated physical activity like designing a floor plan or performing a chemistry experiment
          2. Estimated by examining respondents' explanations and patterns of responses
        3. Evidence based on internal structure: the extent to which different parts of an instrument and the items related to these parts are related in prescribed manners (i.e., construct validity) 8.9
          1. Example: A "Survey of School Attitudes" scale might be developed around four dimensions: curriculum, teachers and administrators, physical environment, and social relationships among students. Items written to each dimension should be unique to that dimension and strongly related to one another. Each dimension should represent a unique aspect of the construct of school attitudes and therefore not be related highly to the other dimensions.
          2. Estimated statistically by inter-item correlations and factor analyses of dimensions
        4. Evidence based on relations to other variables: the extent to which scores on a predictor measure relate in a predictable manner to scores on a criterion measure
          1. Convergent and discriminant evidence
            1. Convergent evidence: scores on the predictor measure relate positively to those on the criterion measure (e.g., ACT scores are related to freshman GPAs, GRE scores are related to graduate school GPAs)
            2. Discriminant evidence: scores on the predictor measure relate negatively to those on the criterion measure (e.g., frequency of counseling sessions and disruptive behavior, lesson effectiveness and off-task behaviors)
            3. Estimated with correlation coefficients
          2. Criterion-related evidence
            1. Predictive validity: scores on a predictor measure are collected first while scores on the criterion measure are collected at some point in the future (e.g., ACT scores are collected during the student's senior year in high school while freshman GPA is collected after completing his first year of college)
            2. Concurrent validity: scores on the predictor and criterion measures are collected at the same time (e.g., a teacher's perception of her effectiveness is collected at the same time as an observer's ratings of that teacher's effectiveness are made)
            3. Estimated with correlation coefficients
      3. Importance of validity evidence
        1. A researcher must establish validity for the measures being used
        2. Validity evidence is a matter of degree, not presence or absence
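The outline notes that convergent, discriminant, and criterion-related evidence are all estimated with correlation coefficients. The following is a minimal sketch of that calculation, assuming the Pearson product-moment correlation is the coefficient intended. The function name `pearson_r` and the ACT and GPA values are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Sum of cross-products and sums of squared deviations
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx * dx for dx in dev_x)
    ss_y = sum(dy * dy for dy in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: ACT scores (predictor) and freshman GPA (criterion)
act_scores = [18, 21, 24, 27, 30, 33]
freshman_gpa = [2.1, 2.6, 2.9, 3.0, 3.5, 3.8]

validity_coefficient = pearson_r(act_scores, freshman_gpa)
print(round(validity_coefficient, 2))
```

A coefficient near +1 or -1 indicates a strong relationship between predictor and criterion. Because validity is a matter of degree, the value is interpreted on a continuum rather than as a pass/fail result.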
    2. Reliability: the extent to which the results are consistent, that is, they are similar over different forms of the same instrument or occasions of data collection 8.4, 8.5, 8.6, 8.8
      1. Conceptual formula: Obtained Score = True Score + Error
        1. Sources of measurement error in test construction and administration
          1. Changes in time limits
          2. Changes in directions
          3. Different scoring procedures
          4. Interrupted testing session
          5. Race of the test administrator
          6. Time the test is taken
          7. Sampling of items
          8. Ambiguity in wording
          9. Misunderstood directions
          10. Effect of heat, light, ventilation, etc. in the testing situation
          11. Differences in observers
        2. Sources of measurement error associated with the person taking the test
          1. Reactions to specific items
          2. Health
          3. Motivation
          4. Mood
          5. Fatigue
          6. Luck
          7. Fluctuation in memory or attention
          8. Attitudes
          9. Test-taking skills
          10. Ability to comprehend instruction
          11. Anxiety
      2. Types of reliability estimates
        1. Stability: taking the same test on two occasions (i.e., test-retest)
        2. Equivalence: taking two different forms of the same test at the same time (i.e., parallel forms)
        3. Equivalence and stability: taking parallel forms, one at one time and the other at a later time
        4. Internal consistency: artificially splitting a single test into two halves
          1. Split-half: literally any combination of halves of the test (e.g., odd-even, first half-second half, etc.)
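          2. Kuder-Richardson formulae: statistical formulae estimating the average of all combinations of split-halves
            1. KR 20: uses test statistics (number of items, variance of total scores) and item statistics (proportion answering each item correctly)
            2. KR 21: uses test statistics only (number of items, mean, and variance of total scores) but under-estimates the KR 20 unless all items are equally difficult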
          3. Cronbach alpha: similar to the KR 20 but applicable to non-dichotomous responses (e.g., Likert scales, Semantic Differential scales, etc.)
        5. Agreement: extent to which two or more people agree about what was observed or rated (i.e., inter-rater reliability)
      3. Interpreting reliability coefficients
        1. Reliability coefficients range from 0 to 1; the higher the coefficient, the greater the reliability
        2. Factors positively affecting the reliability coefficient
          1. Heterogeneity of the group taking the test
          2. Greater number of items
          3. Greater variability in scores
          4. Moderate levels of item difficulty
          5. Items that discriminate effectively between high and low achievers
      4. Importance of reliability
        1. A researcher must establish reliability for the measures being used
        2. Reliability is a necessary, but not sufficient, condition for validity
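The internal consistency estimates named above (KR 20, KR 21, and Cronbach alpha) can be sketched in a few lines. This is an illustration using the standard textbook formulas with population variances, not the procedure of any particular statistics package. The function names and both data sets are hypothetical.

```python
from statistics import mean, pvariance

def _totals(item_scores):
    """Total score for each respondent (one row per respondent)."""
    return [sum(person) for person in item_scores]

def cronbach_alpha(item_scores):
    """Alpha from the item variances and the variance of total scores."""
    k = len(item_scores[0])  # number of items
    item_variances = [
        pvariance([person[i] for person in item_scores]) for i in range(k)
    ]
    total_variance = pvariance(_totals(item_scores))
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def kr20(item_scores):
    """KR 20 for items scored 0/1; p is the proportion correct on an item."""
    k = len(item_scores[0])
    n = len(item_scores)
    sum_pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in item_scores) / n
        sum_pq += p * (1 - p)
    total_variance = pvariance(_totals(item_scores))
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

def kr21(item_scores):
    """KR 21 uses only the number of items and the mean and variance of totals."""
    k = len(item_scores[0])
    totals = _totals(item_scores)
    m = mean(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * pvariance(totals)))

# Hypothetical 5-item quiz scored right (1) or wrong (0), six examinees
quiz = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

# Hypothetical 4-item Likert scale (1 to 5), five respondents
attitude_scale = [
    [5, 4, 5, 4],
    [4, 4, 4, 3],
    [3, 3, 2, 3],
    [2, 2, 1, 2],
    [1, 2, 1, 1],
]

print(round(kr20(quiz), 3), round(kr21(quiz), 3))
print(round(cronbach_alpha(attitude_scale), 3))
```

On dichotomous data the variance of an item equals p(1 - p), so `cronbach_alpha(quiz)` and `kr20(quiz)` agree. On this quiz KR 21 comes out lower than KR 20 because the items differ in difficulty.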
  2. Tests
    1. Cognitive tests: measurement of student cognition
      1. Types of tests
        1. Standardized: tests which are characterized by prescribed methods for administering, scoring, and interpreting scores 8.11
          1. Typically developed in very meticulous, rigorous ways and therefore technically sound
          2. Broad based applications and uses
        2. Teacher-made: tests which are developed by teachers for use in their classrooms
          1. Typically less technically sophisticated than standardized tests
          2. Specific applications and uses
      2. Content of tests
        1. Achievement: measurement of what a student knows
        2. Aptitude
          1. Measurement of a student's potential to know
          2. Usually used to predict future performance
      3. Interpretation of test scores 8.12
        1. Norm-referenced: scores are interpreted relative to the scores of others taking the test (i.e., a norming group)
          1. Johnny performed better than 95% of the other students
          2. NRT scores include percentiles, stanines, CEEB scores (i.e., ETS scores), etc.
        2. Criterion-referenced: scores are interpreted relative to what the student knows
          1. Sally knows how to add, subtract, and multiply, but she does not know how to divide
          2. Typically interpreted relative to some performance standard (e.g., pass or fail, competent or incompetent, etc.)
      4. ERIC Test Locator 8.3
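The norm-referenced scores listed above (percentiles, stanines, CEEB scores) are all derived by comparing a raw score with a norming group. The sketch below assumes the common definitions: percentile rank as the percentage of the norming group scoring below the given score, CEEB scores with a mean of 500 and a standard deviation of 100, and stanines approximated as 2z + 5 rounded to the nearest whole number and limited to 1 through 9. The function names and the norming data are hypothetical.

```python
from math import floor
from statistics import mean, pstdev

def percentile_rank(score, norm_scores):
    """Percentage of the norming group scoring below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)

def z_score(score, norm_scores):
    """Distance from the norming group mean in standard deviation units."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)

def ceeb_score(z):
    """CEEB-style standard score: mean 500, standard deviation 100."""
    return 500 + 100 * z

def stanine(z):
    """Approximate stanine: 2z + 5, rounded half up, limited to 1..9."""
    return max(1, min(9, floor(2 * z + 5.5)))

# Hypothetical norming group of raw scores
norm_group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

raw = 80
z = z_score(raw, norm_group)
print(percentile_rank(raw, norm_group), round(ceeb_score(z)), stanine(z))
```

A criterion-referenced interpretation would instead compare the raw score with a fixed performance standard, such as a cut score, without reference to the norming group.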
    2. Alternative assessments: measurement of student performance and achievement in "authentic" contexts (e.g., giving a speech, conducting a science experiment, writing an original short story, etc.)
      1. Performance assessments: measuring student proficiency in cognitive skills by directly observing how a student performs the skill 8.13
        1. Represents a holistic perspective on the skill
        2. Performed in an "authentic" context
        3. Criterion-referenced in orientation
      2. Portfolios: purposeful, systematic collection and evaluation of student work that documents progress toward learning objectives
      3. Due to the dependence on subjective ratings and subsequent lack of reliability, there is a need for sound rubrics for scoring these assessments 8.20
    3. Affective scales 8.14, 8.16
      1. Measures of interests, attitudes, self-concept, values, personality traits, beliefs, etc.
      2. Characteristics of concern
        1. Response set
          1. A pattern of consistent responses based on format rather than the trait being measured (e.g., strongly agreeing to every statement once this response pattern is established)
          2. Controlled by using both positively and negatively worded items, which discourages a response set and makes one easier to detect
        2. Faking
          1. Disguising the true response to an item, typically by choosing the socially desirable or normal response rather than responding honestly
          2. Controlled with anonymity or confidentiality
        3. Reliability is typically low for affective scales
        4. The lack of "right" or "wrong" answers implies that scores are often interpreted from a norm-referenced perspective, and, thus, the characteristics of the norming group become important
        5. Construct validity evidence is difficult to establish
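Mixing positively and negatively worded items, as described under response set, means the negatively worded items must be reverse-scored before item scores are summed. The sketch below assumes a five-point scale and a simple check that flags respondents who chose the same raw option for every item. The function names, the item positions, and the sample responses are hypothetical.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a response so that, for example, 1 becomes 5 and 4 becomes 2."""
    return scale_max + scale_min - response

def score_scale(responses, negative_items, scale_min=1, scale_max=5):
    """Reverse-score the negatively worded items (zero-based positions)."""
    return [
        reverse_score(r, scale_min, scale_max) if i in negative_items else r
        for i, r in enumerate(responses)
    ]

def shows_response_set(responses):
    """Flag a respondent who gave the same raw answer to every item."""
    return len(set(responses)) == 1

# Hypothetical 4-item scale; the items at positions 1 and 3 are negatively worded
negative_items = {1, 3}

straight_liner = [1, 1, 1, 1]       # marked the first option on every item
careful_respondent = [1, 5, 2, 4]   # answers shift with the item wording

print(shows_response_set(straight_liner), score_scale(straight_liner, negative_items))
print(shows_response_set(careful_respondent), score_scale(careful_respondent, negative_items))
```

A flagged pattern is only a signal for review. A respondent with a genuinely neutral view could also select the midpoint on every item.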
  3. Questionnaire: a set of paper and pencil questions to which responses are requested
    1. Types of items
      1. Open (respondents create their responses) or closed (respondents choose their response from alternatives given to them)
      2. Scaled item formats
        1. Likert: a question or statement followed by a set of scaled responses
          1. Typical format is 1) Strongly Agree 2) Agree 3) Neutral 4) Disagree 5) Strongly Disagree
          2. Alternative formats describe levels of importance (i.e., very important to not important at all), time (i.e., always to never), performance (i.e., excellent to very poor), happiness (extremely happy to extremely sad), etc.
          3. Neutrality
            1. Neutral positions can be obtained by offering an odd number of points on the scale (e.g., 1 - 3, 1 - 5, etc.)
            2. Forced choices (positive or negative) can be obtained by offering an even number of points on the scale (e.g., 1 - 4, 1 - 6, etc.)
        2. Semantic differential: a statement or question followed by a set of bipolar adjectives called anchors and a set of responses placing the individual between these anchors
          1. Anchors include bipolar adjectives like easy - hard, like - dislike, fair - unfair, etc.
          2. The typical format is Easy: ___ ___ ___ ___ ___: Hard where the respondent checks the blank that corresponds to their feelings
          3. Neutral or forced choice responses can be controlled through the number of points on the scale as in the Likert format
        3. Ranked responses: statements or questions that require the respondent to rank their response alternatives
          1. Typical rankings are based on importance, fairness, desire, etc.
          2. Forces respondents to differentiate among alternatives
        4. Checklists: statements or questions followed by a number of options from which the respondent checks all that apply
    2. Development
      1. Justify the use of the technique
      2. Define the objective of the questionnaire
      3. Write the questions
        1. Make items clear
        2. Avoid double-barreled questions that contain two or more ideas (e.g., "Do you think the test was easy and fair?" asks for an opinion about both difficulty and fairness)
        3. Questions should be relevant
        4. Respondents must be competent to respond (e.g., reading levels are appropriate)
        5. Short and simple items are best
        6. Avoid negative items (e.g., "the test is fair" is a better statement than "the test is not fair")
        7. Avoid biased items or terms
      4. Organize the questionnaire and layout
      5. Review the scale and pilot test for technical merit
        1. Pilot testing involves administering the questionnaire to a sample of subjects to identify difficulties with directions, items, item formats, responses, time limits, etc.
    3. Advantages and disadvantages
      1. Advantages
        1. Efficient
        2. Practical
        3. Useful with relatively large samples
        4. Provides for standardized instructions
      2. Disadvantages
        1. Data is typically self-reported
        2. Possibility of misinterpreting questions
        3. Low return rates, typically less than 50%, for mailings
  4. Interview schedule: oral questions and answers
    1. Types
      1. Structured: specific questions are followed by a set of responses from which the respondents choose their responses
      2. Semi-structured: fairly specific questions with open-ended responses
      3. Unstructured: great latitude in terms of asking broad questions and flexibility of responses
    2. Development
      1. Justify the use of this technique
      2. Define objectives
      3. Write the questions and responses if it is a structured interview
      4. Organize the interview
      5. The interview process
        1. Importance of personal characteristics (e.g., appearance, personality, age, experience, racial background, gender, etc.)
        2. Use of probing for further clarification of an answer
    3. Advantages and disadvantages
      1. Advantages
        1. Flexibility and elaboration
        2. Opportunity to establish trust
        3. Depth of information
      2. Disadvantages
        1. Time consuming and typically expensive as a result
        2. Typically small samples
        3. Difficult data analysis
  5. Observation schedule: recordings of naturally occurring behavior seen or heard by the observer
    1. In quantitative research the observer remains detached and objective
    2. Development
      1. Justify the use of this technique
      2. Define observational units
        1. Types of observational data 8.19
          1. High inference: requires the observer to make judgements (e.g., enthusiastic, happy, etc.)
          2. Low inference: requires recording specific behaviors without making judgements in a larger sense (e.g., number of times a student leaves his seat, number of questions a student asks, etc.)
        2. Types of observations
          1. Duration: length of time a behavior lasts
          2. Frequency count: number of times a behavior occurs
          3. Interval: behaviors that occur within a specified interval of time
            1. Usually ten seconds to one minute
            2. Frequency counts within the interval
          4. Continuous: a brief description of behavior over an extended period of time
          5. Time sampling
            1. A selection of fixed or random time periods that will be used to observe
            2. Used in conjunction with all of the types of observations discussed above
      3. Training issues
        1. Observations must be objective, unbiased, and accurate
        2. Bias: idiosyncratic perceptions of the observer
    3. Advantages and disadvantages
      1. Advantages
        1. Records behavior as it naturally occurs
        2. Data is not self-reported
        3. Minimal effects of social desirability
        4. No response set effect
      2. Disadvantages
        1. Time consuming and thus expensive
        2. Difficult to conduct for complex behaviors
        3. Observer effects are difficult to control
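The frequency count, interval, and duration observations defined above can be derived from a simple timestamped record. The sketch below assumes times are recorded in whole seconds from the start of the session and compares two observers by the percentage of intervals in which they agree on whether the behavior occurred. The function names and the observation data are hypothetical.

```python
from math import ceil

def interval_counts(event_times, session_length, interval_length):
    """Frequency count of events within each fixed-length interval."""
    n_intervals = ceil(session_length / interval_length)
    counts = [0] * n_intervals
    for t in event_times:
        if 0 <= t < session_length:
            counts[int(t // interval_length)] += 1
    return counts

def total_duration(episodes):
    """Total time a behavior lasted, given (start, end) pairs in seconds."""
    return sum(end - start for start, end in episodes)

def percent_agreement(counts_a, counts_b):
    """Percentage of intervals where both observers agree on occurrence."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both observers must code the same number of intervals")
    agree = sum((a > 0) == (b > 0) for a, b in zip(counts_a, counts_b))
    return 100 * agree / len(counts_a)

# Hypothetical 60-second session coded in 10-second intervals:
# seconds at which each observer recorded a student leaving his seat
observer_a = [3, 8, 14, 31, 47, 52]
observer_b = [4, 15, 33, 36, 53]

counts_a = interval_counts(observer_a, session_length=60, interval_length=10)
counts_b = interval_counts(observer_b, session_length=60, interval_length=10)

print(len(observer_a), counts_a, counts_b)
print(round(percent_agreement(counts_a, counts_b), 1))
print(total_duration([(5, 12), (40, 58)]))
```

Percent agreement is the simplest index of consistency between observers. It does not correct for agreement that would occur by chance.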
  6. Unobtrusive measures: measures that are not influenced by subjects' awareness that they are participating in a study
    1. Types
      1. Physical traces
      2. Archives
        1. Running record
        2. Private record
      3. Simple observation
      4. Contrived observation (e.g., taping)
    2. Development follows those procedures described above for observations and interviews
    3. Advantages and disadvantages
      1. Advantages
        1. Controls for the Hawthorne Effect
        2. Controls for role selection
        3. Controls for response sets
      2. Disadvantages
        1. Time consuming and thus expensive
        2. Difficult to conduct for complex behaviors
        3. Observer effects are difficult to control
  7. Strengths and weaknesses of quantitative data collection techniques (See Table 8.7, p. 276)
  8. Websites related to general measurement issues
    1. Measurement 8.1
    2. The Quality of Assessment 8.10
    3. Types of Educational Measures 8.18




Original outline prepared for Addison, Wesley, Longman by Jeffrey Oescher, University of New Orleans


