Your research questions should drive identification of the data that is collected for your study. The data that you collect should either directly answer a research question or help answer the research question when combined with other information [BD12].
When identifying data to collect, consider three questions: is the data needed, is it reasonable to collect, and is it ethical to collect?
When considering data collection, the first question to ask is whether the data is needed to answer the research questions. Your focus should be on the key information that you need to directly answer the research questions for your study.
Many researchers default to trying to gather all of the data they can. There are several challenges and concerns with this approach to data collection.
Ultimately, use your common sense to identify the data that is most appropriate for answering your research questions, that reduces the threats to validity associated with your study, and that would convince an audience that your research question is answered.
Just because you want to collect some data doesn’t mean that it’s reasonable to get it!
When thinking about the reasonableness of data collection, the two main factors are time and resources, and appropriateness.
Consider both the research team’s and the participants’ time and resources. For example, attempting to analyze video recordings of student help-seeking interactions during office hours for a 200+ person class is likely too time-consuming for the value it provides. In that case, would a subset work? Or would some other information about the interaction, such as a short characterization of it, be enough? From the participant perspective, asking for a significant time commitment will likely lead to low response rates. Excessively long surveys will see a drop-off in completion. Lab studies that run over 2 hours will likely have few participants.
Consider the appropriateness of collecting the data and the impact that it might have on your study. The way you request information may introduce bias or confounding factors. For example, asking a leading question may encourage the participant to give the answer that you’re seeking (e.g., How has the time you spent working on an assignment negatively impacted your mental health?). Asking a self-report question about a past event may lead to inaccurate or misremembered information (e.g., How did you feel about class registration when you attended your college orientation?). Collecting particularly sensitive information, like personally identifiable information (PII), opinions about others such as teammates or roommates, and health-related information, adds significant complexity to the research study and increases the likelihood that people will decline to participate. Sensitive information (e.g., social security numbers, class accommodations, peer evaluations) should only be collected if absolutely needed to answer the research question.
The final consideration about data collection is to determine whether it is ethical to collect the data. The Belmont Report identifies three principles for ethical human subjects research: respect for persons, beneficence, and justice.
An ethics review, such as Institutional Review Board (IRB) review in the United States, examines the ethical aspects of the study.
As an example, consider a gamification system that categorizes student performance into bronze, silver, gold, and platinum. A study asking whether categorizing students’ performance would increase motivation was rejected by an ethics review because the categorization could have a negative impact on students.