Processing Data
Overview
“Processing Data” is a key literacy domain that helps students learn to perform operations on data so that they can extract valuable insights from it in critical and ethical ways.
Processing Data is often understood as an iterative cycle involving several phases. For the purposes of this project, we identify four key phases: collecting data, preparing data, analyzing data, and storing and preserving data.
The resources offered under this literacy domain push students to ask critical questions about data processing, such as:
- What ethical issues do we need to consider when collecting data?
- What are useful strategies for preparing data?
- What are different ways we can analyze data to glean useful information?
- How can we store and preserve data to make it sustainable and accessible?
Collecting Data
Sample Toolkit Resources
Data Biography
Resource Type: Assignment
In this assignment, students apply the concept of a “data biography” to analyze the history behind a particular dataset: the who, what, when, how, and why of the dataset and its creation. In doing so, they learn about the different interpretative filters that shape the historical trajectory of a dataset, from its initial collection to its availability and usability today.
Preparing Data
Sample Toolkit Resources
Data Cleaning
Resource Type: Reading
In this brief article, Alice Macfarlan describes the motivation behind careful data preparation and outlines a set of steps and questions to ask oneself when preparing data. Macfarlan also provides links to further reading on why and how to clean data.
Strangers in the Dataset
Resource Type: Reading
This section from chapter five of D’Ignazio and Klein’s Data Feminism takes a critical look at the metaphor of “cleaning” data, assesses the implications of thinking about data in this way, and challenges readers to think more deeply about the assumptions that guide the data gathering and preparation process.
Analyzing Data
Sample Toolkit Resources
Attending to the Cultures of Data Science Work
Resource Type: Reading
This essay argues that data science communities have a responsibility to attend not only to the cultures that orient the work of domain communities, but also to the cultures that orient their own work. The author also describes how ethnographic frameworks such as thick description can be enlisted to encourage more reflexive data science work.
Critically Analyzing the World Happiness Report
Resource Type: Activity
This activity invites students to think about the variables included in the World Happiness Report dataset, about the relations between variables, and about the advantages and disadvantages of the authors’ approach to measuring happiness. This exercise is designed to help cultivate habits of critical reflection and to provide practice in data analysis, including reflection on correlation.
Exploring Data
Resource Type: Assignment
This assignment challenges students to examine, explore, and think critically about a dataset. In critically analyzing a subset of the 2019 American Community Survey conducted by the United States Census Bureau, students learn how counting the US population is inherently messy and implicitly (and sometimes explicitly) caught up in questions of power.
Storing and Preserving Data
Sample Toolkit Resources
The CARE Principles for Indigenous Data Governance
Resource Type: Reading
This scholarly article describes the CARE principles for Indigenous data management and stewardship, which are built around the concept of data sovereignty and designed to complement the existing FAIR principles. Readers are challenged to think about what researchers owe to communities (particularly Indigenous communities) who helped create the data that researchers collect and publish.