Data Collection Considerations

Your research questions should drive identification of the data that is collected for your study. The data that you collect should either directly answer a research question or help answer the research question when combined with other information [BD12].

When identifying data to collect, take into consideration the following questions:

Is the data NEEDED?

When considering data collection, the first question to ask is whether the data is needed to answer the research questions. Focus on the key information that you need to directly answer the research questions for your study.

Many researchers default to trying to gather all of the data they can. This approach to data collection raises several challenges and concerns.

  • Data might be hard to gather when it comes from many different sources (e.g., learning management systems, automated grading systems, gradesheets, exercises, exams, and surveys).
  • Some data might be hard to store. Videos and screen captures are large files, and a study with many students can produce a large number of items to store.
  • Some data might have restrictions on where it can be stored. Protected data like grades, financial aid, health records, etc. may only be stored on certain machines and with specific protections. You should review your institution’s data storage policies to ensure that you’re in compliance.
  • More data means more work wrangling the data into a form that you can use to address your research questions. This may include coding open-ended questions or transcribing interviews.
  • Too many surveys or activities beyond normal classroom practice may lead to “survey fatigue” or other challenges with data collection.
  • Collection of too much data may suggest hypothesis hunting where you’re searching for something interesting and then building the hypothesis or research question from the interesting thing. The research question should drive the research (even if there are null or negative results); the interesting finding shouldn’t drive the research!

Ultimately, use your common sense to identify the data that is most appropriate for answering your research questions, that reduces the threats to validity associated with your study, and that would convince an audience that your research questions have been answered.

Is it REASONABLE or even possible to gather the data?

Just because you want to collect some data doesn’t mean that it’s reasonable to get it!

When thinking about the reasonableness of data collection, the two main factors are (1) time and resources and (2) appropriateness.

Consider both the research team’s and the participants’ time and resources. For example, attempting to analyze video recordings of student help-seeking interactions during office hours for a 200+ person class is likely too time-consuming for the value it provides. In that case, would a subset work? Or would some other information about the interaction work, such as a brief characterization of the interaction rather than the full recording? From the participant perspective, asking for a significant time commitment will likely lead to low response rates. Excessively long surveys will see a drop-off in completion. Lab studies that run longer than two hours will likely have few participants.
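
If a subset is an option, one way to choose it without cherry-picking is to draw a random sample with a fixed seed, so that the selection can be reproduced and reported. The sketch below is only an illustration in Python: the function name sample_recordings, the session identifiers, the sample size of 20, and the seed value are all hypothetical and not taken from any particular study.

```python
import random


def sample_recordings(recording_ids, k, seed=0):
    """Return a reproducible random subset of at most k recordings, sorted."""
    rng = random.Random(seed)  # fixed seed so the same subset can be redrawn
    pool = sorted(recording_ids)  # sort so the result ignores input order
    return sorted(rng.sample(pool, min(k, len(pool))))


# Hypothetical identifiers for 200 office-hours recordings.
all_recordings = [f"session_{i:03d}" for i in range(1, 201)]

# Assumed sample size and seed; choose values that suit your own study.
subset = sample_recordings(all_recordings, k=20, seed=42)
print(len(subset), "recordings selected for analysis")
```

Whether a simple random sample is appropriate depends on the research question; a study comparing early and late weeks of a term, for instance, might instead sample within each week.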

Consider the appropriateness of collecting the data and the impact that it might have on your study. The way you request information may introduce bias or confounding factors. For example, asking a leading question may encourage the participant to give the answer that you’re seeking (e.g., How has the time you spent working on an assignment negatively impacted your mental health?). Likewise, asking a self-report question about a past event may lead to inaccurate or misremembered information (e.g., How did you feel about class registration when you attended your college orientation?). Collecting particularly sensitive information, like personally identifiable information (PII), opinions about others such as teammates or roommates, and health-related information, adds significant complexity to the research study and increases the likelihood that people will decline to participate. Sensitive information (e.g., social security numbers, class accommodations, peer evaluations) should only be collected if absolutely needed to answer the research question.

Is it ETHICAL to gather the data?

The final consideration about data collection is to determine whether it is ethical to collect the data. The Belmont Report identifies three principles for ethical human subjects research:

  • Respect for Persons - information about the study should be presented at the appropriate comprehension level for the participants, and participants should give informed consent to participate without coercion
  • Beneficence - the study should maximize benefits and minimize possible harm
  • Justice - there should be a fair distribution of the benefits and burdens on research participants

An ethics review, such as Institutional Review Board (IRB) review in the United States, examines the ethical aspects of the study.

As an example, consider a gamification system that categorizes student performance into bronze, silver, gold, and platinum. A study asking whether categorizing students’ performance would increase motivation was rejected by an ethics review because the categorization could have a negative impact on students.
