Variables & Data

A variable is something that you want to measure as part of your research question. You may want to measure “enjoyment” or “learning” or “self-efficacy”. Identifying the correct data to collect to operationalize your variables is important for answering your research questions and increasing confidence in your findings.

Variables are the phenomena of interest that you want to measure or understand in order to answer a research question. Variables may not be directly measurable. Measuring “learning” may require an assessment or pre/post test. Measuring “enjoyment” or “self-efficacy” may require a survey that uses a validated instrument shown to measure the attribute of interest. Measuring “time on task” may require developing a strategy for estimating when students are working and then using that strategy to estimate student work time, similar to “study sessions” [GXH19]. The key to data collection is to identify the measures that can be used to answer the research question associated with specific variables. This is called operationalizing the variables and should be described in a data collection plan and later in the study manuscript.
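As an illustration of operationalizing “time on task”, the sketch below groups timestamped student events (for example, commits or submissions) into sessions and sums the session lengths. The function name, the 30-minute inactivity threshold, and the sample timestamps are assumptions made for this example; they are not taken from [GXH19], and a real study would need to justify its own threshold.

```python
from datetime import datetime, timedelta


def estimate_time_on_task(timestamps, max_gap=timedelta(minutes=30)):
    """Estimate work time by grouping events into sessions.

    A session is a run of events in which each consecutive pair is at most
    `max_gap` apart (an assumed threshold). The estimate is the sum of
    (last event - first event) over all sessions, so a session containing a
    single event contributes zero time.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return timedelta(0)

    total = timedelta(0)
    session_start = ordered[0]
    previous = ordered[0]
    for current in ordered[1:]:
        if current - previous > max_gap:
            # The gap is too long: close the current session, start a new one.
            total += previous - session_start
            session_start = current
        previous = current

    # Close the final session.
    total += previous - session_start
    return total


# Hypothetical event log for one student.
events = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 10),
    datetime(2024, 1, 1, 10, 25),
    datetime(2024, 1, 1, 14, 0),
    datetime(2024, 1, 1, 14, 20),
]
print(estimate_time_on_task(events))  # 0:45:00
```

The threshold is itself a measurement decision: a shorter gap undercounts time spent reading or thinking between events, while a longer gap counts breaks as work. Whatever value is chosen should be reported in the data collection plan.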

As you work on your data collection plan, you may find that the variables you’re interested in measuring don’t address the research questions that you’re asking or they may be challenging to obtain with the resources available for your study. You may find that you’ll refine your research questions as you consider what data it is feasible to collect, how the data operationalizes a variable, and what measures are most appropriate.

Reliability & Validity

Two considerations for the data you’re collecting are the reliability and validity of the collection process and/or the instrument.

Reliability is the “consistency of your measurement instrument, or the degree to which an instrument measures the same thing each time it is used” [BD12]. A highly reliable instrument produces consistent results each time it is administered. There are several ways to measure the reliability of instruments, described on csedresearch.org [MX19ReliabilityValdity]. Highly reliable instruments strengthen your research results.
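One widely used reliability statistic for multi-item surveys is Cronbach's alpha, which estimates internal consistency. The text above does not prescribe a particular statistic, so the sketch below is only one possible example. It assumes each participant answered every item on the same numeric scale, and the function name and sample responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Compute Cronbach's alpha for a table of survey responses.

    `responses` is a list of rows, one per participant, each holding the
    participant's numeric score on each of the k items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(responses) < 2:
        raise ValueError("need at least two participants")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")

    # Transpose rows (participants) into columns (items).
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 participants, 3 Likert-style items.
sample = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
print(round(cronbach_alpha(sample), 3))
```

Values closer to 1 indicate that the items vary together. Acceptable thresholds depend on the field and the purpose of the instrument, so report the value alongside the instrument rather than treating any single cutoff as definitive.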

Validity is the “strength of our conclusions or propositions” [BD12]. A highly valid instrument has been checked, usually in multiple ways, to ensure that it is measuring what it is supposed to measure. There are several different types of validity that may need to be considered, as described on csedresearch.org [MX19ReliabilityValdity]. Concerns with validity don’t mean that you can’t use the instrument. Those concerns are a threat to the validity of your research and should be reported so that others can understand the limitations of the study conclusions.

When choosing instruments, you want to maximize the reliability and validity of your measures so that you and your readers can trust the strength of your conclusions. If you’re developing a tool to collect data, you should have tests to show that it is collecting the correct data. Using existing instruments to measure phenomena of interest, such as attitudinal measures, saves the time needed to create new instruments, supports replication, and strengthens research results.

Measurements

Measurements are the data generated when running a data collection instrument as part of your research study. There are several different types of measurements to consider [BD12]:

Self-report - measures an individual’s thoughts, feelings, or behaviors
Tests - measures individual differences in ability or personality
Behavioral Measures - measures behaviors through systematic observation, which may include text or recordings of audio/video
Physical Measures - measures bodily activity (e.g., eye tracking, heart rate, blood pressure)

Measurements like self-report and tests may use surveys to gather data. An example of a self-report survey would be a survey about self-efficacy. An example of a test is the Myers-Briggs Type Indicator. Self-report data may also be gathered through interviews and focus groups. Behavioral measures are taken via observations, interviews, and focus groups. Physical measures require specialized biometric tools to support data collection.

Classroom Data

When conducting educational research, there are other sources of data available that can help answer research questions, some of which may be generated as part of “normal classroom practice”. These data, used in combination with other measures, can help answer research questions about the impact of educational interventions. Some common sources of classroom data are:

Grades - a measure of learning on assessments in the class
Learning Management System Logs - a measure of student interaction with course resources
Version Control System Data - a measure of student development history
Automated Grading System Data - a measure of student performance, possibly with a rich history of how performance changed throughout a project
Message Board Interaction - a measure of asynchronous help-seeking activity
Office Hours Interactions - a measure of synchronous help-seeking activity

Demographic Data

Demographic data is useful for characterizing the participants in a study, and demographics may be an important part of the research questions. Depending on your research questions, there are several key pieces of information to consider [MDZ18]:

Student Demographic Data

Where possible, provide the following demographic data about any student participant populations in the study. This can help others understand the institutional context of the research and whether the intervention is appropriate for their context.

Ages and levels - characterize the participants by age and grade or university levels. Using both age and levels appropriate for your context can help characterize the population to a global audience (e.g., middle grades or middle school is not a standard grade or age group).
Number of participants - describe the number and groups who participated in the study
Gender - describe the gender of the students. Where possible, consider using self-report with a free-text field so participants can describe their gender. If one group has fewer than 5 members, particularly for quantitative work, be careful about the possibility of identifying individuals from that information.
Locations - describe the location of the study. For dual-anonymous review, provide a general region and country (e.g., a research-intensive institution in the southeastern United States)
Prior Computing Knowledge - prior knowledge in computing is emerging as a key factor in student success in computing. This prior knowledge could be formal (e.g., an AP CS course) or informal (e.g., summer camp or after-school club)
Race/ethnicity - describe the race and ethnicity of student participants. If one group has fewer than 5 members, particularly for quantitative work, be careful about the possibility of identifying individuals from that information.
Student disabilities - describe and characterize student disabilities. Be careful about possible identification when numbers are low.
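The caution about small groups in the list above can be applied mechanically when preparing demographic tables. The sketch below counts participants per category and masks any count below a minimum cell size. The function name, the masking label, and the sample data are assumptions for illustration. The threshold of 5 follows the guidance above.

```python
from collections import Counter


def summarize_with_suppression(values, min_cell_size=5):
    """Count participants per category, masking small groups.

    Any category with fewer than `min_cell_size` participants is reported
    as the string "<N" instead of its exact count, to reduce the risk of
    identifying individuals.
    """
    counts = Counter(values)
    return {
        category: count if count >= min_cell_size else f"<{min_cell_size}"
        for category, count in counts.items()
    }


# Hypothetical self-reported gender responses.
genders = ["woman"] * 12 + ["man"] * 20 + ["non-binary"] * 2
print(summarize_with_suppression(genders))
# {'woman': 12, 'man': 20, 'non-binary': '<5'}
```

Masking a single cell is not always enough: if the total number of participants is also reported, a lone suppressed count can be recovered by subtraction. In that case, consider merging categories or suppressing an additional cell.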

You may also want to consider reporting the socio-economic status of students (particularly for K-12 research), first-generation college student status, veteran status, transfer or traditional undergraduate status, etc.

Instructor Demographic Data

When discussing an intervention, include information about the instructors who led the intervention, including the number of instructors, who taught what, and the instructors’ prior experience, gender, and race/ethnicity.

Program & Intervention Data

In addition to characterizing the study participants, the program should also be characterized to help others understand if the intervention is appropriate in their context. Include the following [MDZ18]:

Course description - what are the key learning outcomes for the course and is the course intended for majors, minors, non-majors, etc.
Type of activity - describe the intervention activity
Required or optional - describe if the activity was required (or part of normal classroom practice) or optional. If optional, did students have the opportunity to complete some alternative activity?
Timeframe - when was the activity or intervention done in the overall class context
Curriculum - what was the curriculum, who created it, and what is the general availability
Teaching methods - what teaching methods are used for the intervention or the classroom in general
Tool/language - what were the tool(s) and/or language(s) used for the class or intervention
Duration of the activity - what was the duration of the activity; for K-12 research, the number of contact hours is especially important
Accommodations - what accommodations, if any, were needed for participants with disabilities
