Introduction for Instructors
This assignment sequence lays out a drafted and workshopped major writing assignment designed for an upper-division undergraduate technical writing course and intended to take 1.5-2 weeks of the course to complete. It asks students to find a dataset to document and then leads them through a multi-step process of critical engagement with that dataset. The ultimate product is a set of documentation that is not merely technical but also critical and deeply contextual. One example of student work produced during the course of this assignment sequence can be found in the Student Showcase on this website.
The sequence consists of four assignments. The first three ask students to keep adding new material to their draft. After the third assignment, students get feedback from peers and the instructor (the feedback process is not detailed here) and then submit the final draft as the fourth assignment.
Before beginning this sequence, students will have seen the slide deck titled “Getting Started with Data for Advocacy.”
The student-facing instructions for all four assignments are as follows:
Assignment 1: Find a dataset & write draft documentation for it
To develop your skills in critiquing data, your next assignment is to find and document an existing dataset that could be used in data advocacy.
Finding data
First, decide what topic or advocacy question you are interested in pursuing. Do some internet searches that combine keywords about your chosen topic with words like “dataset.” You might find what you’re looking for pretty quickly.
If you don’t, ask yourself: who might have collected the data I want? If it’s a government, there’s a decent chance the data is online. Try internet searches with the name of the agency or jurisdiction. If it’s an academic researcher whose data you want, they might have made it available in a public archive – but no guarantees. If you can’t find the data you really want, settle for whatever’s closest. However, make sure you have an actual dataset, not just the published results of someone’s data analysis.
Documenting data
Most data on the web is poorly documented. YOU are not going to make that mistake.
One of the key goals of this project is to learn the skills necessary to make your work useful to, and usable by, future collaborators. This type of project documentation is not only a key genre of technical communication, but also a vital professional writing skill. That’s why your first important writing assignment in this class is the documentation of the dataset you found.
Many different standards for data documentation exist, tailored to the needs of different workers in different fields. Our field, data advocacy, does not have any set standard for data documentation as far as I am aware. Therefore, I have devised one for you to follow.
We’re going to build the complete documentation over multiple days of class. We’ll start with the technical documentation. Your dataset documentation should contain the following information, in the following order.
- Name of dataset
- Link to dataset
- Summary of dataset (4-5 sentences, including a brief explanation of the dataset’s potential relevance to data advocacy, and on which topic or issue)
- Keywords (search terms that future students might use to find this dataset in the archive. You can think of these as tags.)
- Creator(s) of dataset
- Funder(s) of dataset, if applicable
- Rights and permissions (Is the data in the public domain, like most government data? Or released under a Creative Commons license? Or are some rights restricted?)
- Source where you found the data
- Date of creation
- Date of last update, if different from date of creation
- Version number, if applicable
- File format(s) (CSV? HTML? JPEG? If there are multiple files, list each separately with all relevant info)
- List of variables (in most tabular datasets, the variables are the column names) with a one-sentence explanation of each
- Explanation of codes, if relevant (i.e. codes or abbreviations used in either the file names or the variables in the data files - for example, ‘999 indicates a missing value in the data’)
This is a first draft of your documentation, and we will be revising and adding to it in the coming days.
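For tabular data, a short script can help you draft the “List of variables” and “Explanation of codes” items. The following is only a minimal sketch in Python: the sample data, the function name `summarize_columns`, and the choice of `999` and blank cells as suspected missing-value codes are all assumptions made for illustration, not part of any real dataset. You would still need to write the one-sentence explanation of each variable yourself.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a downloaded CSV file.
SAMPLE_CSV = """county,year,median_income
Adams,2020,52000
Baker,2020,999
Clark,2021,61000
"""


def summarize_columns(csv_file, missing_codes=("999", "")):
    """Return the column names and, for each column, a count of how often
    each suspected missing-value code appears in it."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    code_counts = {name: Counter() for name in columns}
    for row in reader:
        for name in columns:
            # Short rows yield None for missing cells; treat those as blank.
            value = (row[name] or "").strip()
            if value in missing_codes:
                code_counts[name][value] += 1
    return columns, code_counts


columns, code_counts = summarize_columns(io.StringIO(SAMPLE_CSV))
for name in columns:
    print(name, dict(code_counts[name]))
```

To try this on a real file, you could replace `io.StringIO(SAMPLE_CSV)` with an opened file, for example `open("your_file.csv", newline="")`, where the file name is a placeholder. A count above zero only flags a value worth investigating; whether `999` really means “missing” is something the dataset’s own documentation or creators have to confirm.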
Sample documentation
Here is an anonymous student sample of what I’m asking you to create, including formatting.
Assignment 2: Read Data Feminism Ch. 6 and write a biography of your dataset
For next class, read Chapter 6 of Data Feminism. The overarching idea of this chapter is that basic technical documentation of data, like you have already created for your dataset, is not enough. Datasets need additional context.
Thus, I’m going to ask you to write a data biography of the dataset you have already begun documenting for class. Add the data biography to the same document that you’re already working in. Add it below the technical dataset documentation, in a similar format, using subsections and styles.
What’s a data biography?
To figure out exactly what should go into your data biography, we’ll use the “Datasheets for Datasets” proposal by Timnit Gebru et al. as a guide. This proposal was discussed in Chapter 6 of Data Feminism, and it has been quite influential. You don’t have to read the whole thing. Just look carefully at the questions in sections 3.1 through 3.7.
The questions in section 3.1 should already be answered in the technical documentation you’ve done so far. I recommend that you add a paragraph to your documentation for each section from 3.2 through 3.7, answering the questions in each section that are relevant to your dataset.
Many of these questions will require you to do background research on your dataset. If a question simply doesn’t apply to your dataset, you can ignore it – but if you can’t find the answer to a question that applies, then say you can’t find it instead of ignoring the question. Don’t blithely report “there are no errors in the dataset” unless you know enough to say for certain! (Also note: “targets” and “splits” are terms used in machine learning that aren’t super relevant for many data advocacy purposes.)
This should end up expanding your dataset documentation draft by somewhere around two pages. If you run into difficulties, shoot me a message.
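Before you write anything about errors in your dataset, it can help to run a few basic checks. The sketch below is one possible starting point in Python, assuming a CSV file: the sample rows and the function name `basic_quality_report` are hypothetical, and the script only counts blank cells and exact duplicate rows. Passing these checks does not show that a dataset is error-free.

```python
import csv
import io

# Hypothetical sample rows; a real check would open the downloaded file instead.
SAMPLE_CSV = """id,city,population
1,Springfield,30000
2,Shelbyville,
1,Springfield,30000
"""


def basic_quality_report(csv_file):
    """Count data rows, rows with at least one blank cell, and exact duplicate rows."""
    reader = csv.reader(csv_file)
    header = next(reader)  # the first line is assumed to hold the column names
    seen = set()
    total = rows_with_blanks = duplicate_rows = 0
    for row in reader:
        total += 1
        # A row shorter than the header, or with an empty cell, counts as having blanks.
        if len(row) < len(header) or any(cell.strip() == "" for cell in row):
            rows_with_blanks += 1
        key = tuple(row)
        if key in seen:
            duplicate_rows += 1
        seen.add(key)
    return {
        "rows": total,
        "rows_with_blanks": rows_with_blanks,
        "duplicate_rows": duplicate_rows,
    }


report = basic_quality_report(io.StringIO(SAMPLE_CSV))
print(report)
```

If the counts are not zero, report them in your data biography and, where you can, explain what they probably mean. A duplicate row might be a data-entry error, or it might be two genuinely identical records.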
Assignment 3: First complete draft of dataset documentation
For next class, I want you to complete the first draft of your dataset documentation by adding part three: an ethnographic assessment of your dataset.
What’s an ethnographic assessment?
If you haven’t already, read the influential article by Tricia Wang titled “Why Big Data Needs Thick Data.” Then, at the end of your draft, after the technical documentation and the data biography, explain what “thick data” might look like for the dataset you found. In general, you should first present any thick data that you can put together for your dataset. Then, describe the additional thick data needed to ensure accurate, useful, ethical analysis.
In many ways, thick data can be thought of as an ethnography of your dataset. Careful ethnography usually involves conducting multiple interviews, but I don’t expect you to conduct any interviews for this assignment (although feel free if you want to)! Instead, your assessment might explain which people you would ideally want to interview, what kinds of information you would want to get from them, and what negative consequences might arise if the dataset were used without that information.
However, interviews are not the only kind of thick data. Thick data could involve notes on context, a backstory of the investigation, documents that provide an in-depth look at a subset of the data, case studies, glossaries of terms, rules and regulations, official policies, and more. Basically, thick data is anything that gives you the necessary understanding of the human context of the dataset, including the norms and goals of the cultures or subcultures that created it. I do want you to find as much of this thick data as you can for your dataset without having to do any interviews. Start your ethnographic assessment by explaining the thick data you found, and then go on to explain what you didn’t have time to find.
Use similar formatting to the earlier sections of your documentation, and split this assessment into subsections as you deem appropriate – the organization may vary from dataset to dataset.
Student sample
Here is a sample from a past student with a thoughtful ethnographic assessment piece. The earlier parts of the documentation may not look like what you have, because the assignment instructions have changed somewhat since this student’s semester.
Assignment 4: Final draft of dataset documentation
Use all the feedback you got from your groupmates and me to revise and finalize your dataset documentation. Work to ensure its completeness, the clarity of the writing, its logical organization, and, most importantly, its usefulness to future researchers or advocates who might want to use this dataset.