This lesson plan introduces digital humanities concepts through an exploration of the process of preparing and analyzing textual data.
There are three modules with a combination of discussion and hands-on activities. Some of the modules require the participants to have computers in order to fully participate. These modules can either be done collectively as a workshop (or series thereof) or individually as one-shots.
Disciplinary teaching faculty (optional)
Senior undergraduate students (preferably with a capstone to be completed)
Graduate students
Faculty
Adaptable to any project or course that works with text.
See session outlines below.
Critically examine the foundational concepts of digital humanities
Underwood, T. (2015). Seven ways humanists are using computers to understand text. Retrieved from https://tedunderwood.com.
Svensson, P. (2012). Envisioning the digital humanities. Digital Humanities Quarterly 6(1).
Schulz, K. (2011, June 24). What is distant reading? The New York Times. Retrieved from https://www.newyorktimes.com.
Sinclair, S. & Rockwell, G. (2016). Text analysis and visualization: Making meaning count. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.), A New Companion to Digital Humanities (pp. 274-290). Malden, MA: John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118680605.ch19
Silge, J. (2016). She giggles, he gallops. The Pudding. Retrieved from https://pudding.cool/
Stevens, G. (2017). Early American Cookbooks. Retrieved from https://wp.nyu.edu/early_american_cookbooks/
Opportunity Insights & U.S. Census Bureau. (n.d.) The opportunity atlas. Retrieved from https://www.opportunityatlas.org/
Results of the activity/discussion
Pre- and post-session surveys (template included at the end)
Break participants into small groups of 4-6 to brainstorm answers to the following discussion questions. (Note: participants do not need to have any prior knowledge to answer these.):
What is Digital Humanities?
What isn’t Digital Humanities?
After a few minutes, have each of the groups report back on their brainstorming but don’t engage in discussion until the responses have been recorded for everyone to see.
Once everyone has contributed, use the answers as a jumping off point for discussion on what Digital Humanities is and is not. The following can also help facilitate discussion:
Results of a What is DH (Google Scholar) search.
DH as modeling humanities data with computers. The computer is a general-purpose “modeling machine.”
Is it/should it be an academic discipline or degree? Or does it belong in humanities departments?
Technology can help find patterns, but can also hide knowledge and patterns.
Some tools can have a “black box” effect (i.e., users don’t know how the software works and simply take the results as true).
Humanities are essentially static (in contrast to other sciences which have ongoing processes that can be studied/experimented with).
Scale and scope of field are questionable/changing.
Bring up projects or articles as examples of DH in action (projects can be brought up without participant prep, but the article should be read ahead of time if referenced).
Screen Directions Project: How does the project creator visualize information? Does it convey the information successfully? Also a good time to mention the open source nature of many DH projects as the code is available on GitHub.
Cookbooks Project: What was the project creator able to get out of analyzing cookbooks? They used HathiTrust and Tableau, so good example of text analysis that many people can do. Can show https://analytics.hathitrust.org if there’s time.
Opportunity Atlas: Great example of combining GIS with census data and statistical modeling. Has a methods page explaining how it was put together and its limitations.
Sinclair, S. & Rockwell, G. (2016). Text analysis and visualization: Making meaning count.
End with the group attempting to come up with their own definition of what DH is/is not. Discussion of this can include:
The question of what it is defines the field in some ways.
“Digital” can be used to differentiate it from other fields (sometimes even in a negative, condescending way).
Different terms used over time and in various contexts.
Lots of work in digital humanities is thinking/writing about what digital humanities is.
Hard to define it, but we know when we’re doing DH.
Provide links to DiRT and TAPOR for further self-directed exploration
Critically examine the concept of data and preparing textual data for analysis
Ignatow, G. & Mihalcea, R. (2017). An introduction to text mining: Research design, data collection, and analysis. Washington, DC: Sage. (Ch. 8 “Basic Text Processing”, Ch. 16 “Analyzing Topics”)
Turing, A. M. (1950). Computing machinery and intelligence. Mind 59(236), 433-460.
Stanford CoreNLP Tools (Part of Speech Tagging, Named Entity Recognition, Sentiment Analysis, and Dependency Tagging).
Text to run through the Stanford CoreNLP. Instructors should have some sample texts available, but participants can also bring their own. The text should already be digitized into a format that makes it easy to copy and paste sections.
Results of the activity/discussion
Pre- and post-session surveys (template included at the end)
Open with discussion question: What are data?
What does this mean in your field?
What sort of data do you and/or your students encounter that are “big” / “difficult,” etc?
Is art, literature, etc. the same as data?
Is there a difference between a digital representation of art or literature and the printed book or artwork itself?
What are some examples of humanities data?
What would “big data” be in the humanities?
How are humanities data different from data in other fields?
Define unstructured data and structured data
Digital Humanities often uses “unstructured data” like art or text.
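If the group would benefit from a concrete contrast, the following minimal Python sketch shows the same information first as unstructured text and then as structured rows. The recipe sentence, the field names (`quantity`, `unit`, `ingredient`), and the regular expression are all invented for illustration; real cookbook text would need far more careful handling.

```python
import re

# Unstructured data: free text with no predefined fields (invented example).
recipe_text = "Take 2 cups of flour and 1 cup of sugar, then add 3 eggs."

# A deliberately simple pattern: a number, an optional "cup(s) of", then a word.
pattern = re.compile(r"(\d+)\s+(?:(cups?)\s+of\s+)?(\w+)")

# Structured data: the same information as rows with named fields.
rows = [
    {"quantity": int(quantity), "unit": unit or None, "ingredient": item}
    for quantity, unit, item in pattern.findall(recipe_text)
]

for row in rows:
    print(row)
```

The structured version can be sorted, filtered, or counted directly, but everything the pattern did not anticipate (the verbs, the order of the steps) has been discarded. That trade-off can feed back into the discussion of what counts as data.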
Review the Data Science Process:
Highlight that the entire process is often iterative and not necessarily completed in a fixed order.
DH offers the ability to look at texts closely, or at large sets of texts distantly, in new ways. DH draws on data concepts, hypothesis testing, quantitative analysis, modeling, visualization, etc.
An early move in this direction was Franco Moretti’s idea of distant reading, which relies on computer assistance (see the Stanford Literary Lab).
Can look at one text in a new way or present it visually in a more concise manner.
Can look at many texts in a way that’s not possible for a human reader because of limits on memory, time, etc.
Making a computer understand data (specifically textual data). Discuss the following concepts:
Computers don’t understand language. We need to give them rules on how to work with it.
Turing tests, a common example of which is a CAPTCHA (an acronym for "completely automated public Turing test to tell computers and humans apart")
Text wrangling: What might you need to do to text in order to make a computer understand it?
OCR, correcting spelling changes, dropping punctuation, converting contractions, remapping names/words (e.g. British <-> American spellings)
Split your data up / Join your data together (Unit of Analysis, create corpora)
Extraction / Removal (e.g. Just proper names, locations. Measurements in cookbook)
Tokenizing: Breaking a text down into meaningful parts, typically words. It’s relatively easy to break text down to the word, sentence, or paragraph level in English as compared to languages such as Chinese that don’t use the same sort of punctuation and spacing to separate those chunks out. However, there can still be issues with words/spellings/punctuation e.g. home work, homework; comma splice in text.
Stop wording: Identifying words that may appear frequently but won’t add to your understanding of the text at a distant reading level.
Show a word cloud with stop words and without. Try to have participants guess the text each time.
In the above example, the first circle represented commonly used stop words, the second the text of Jane Eyre with stop words included, and the third the text of Jane Eyre with stop words excluded.
Introduce the blacklist/whitelist concept.
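For instructors who want to show what happens behind a word cloud, the wrangling, tokenizing, and stop-wording steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a recommended toolchain: the stop list, the contraction table, the spelling map, and the sample sentence are all invented and far smaller than anything used in practice.

```python
import re
from collections import Counter

# Illustrative stop list only; real stop lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "it",
              "was", "that", "is", "not", "do"}

# Hypothetical lookup tables for two of the wrangling steps.
CONTRACTIONS = {"don't": "do not", "it's": "it is"}
SPELLING_MAP = {"colour": "color", "honour": "honor"}  # British -> American


def normalize(text):
    """Lowercase, straighten curly apostrophes, and expand contractions."""
    text = text.lower().replace("\u2019", "'")
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return text


def tokenize(text):
    """Break text into word tokens; punctuation and digits are dropped."""
    return re.findall(r"[a-z]+", text)


def word_counts(text, remove_stop_words=True):
    """Count tokens after normalizing and remapping spelling variants."""
    tokens = [SPELLING_MAP.get(t, t) for t in tokenize(normalize(text))]
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return Counter(tokens)


sample = "I don't think the colour of the sky was the colour of the sea."
with_stops = word_counts(sample, remove_stop_words=False)
without_stops = word_counts(sample)

print(with_stops.most_common(3))
print(without_stops.most_common(3))
```

With stop words included, "the" dominates the counts; with them removed, the content words rise to the top, which is the same effect participants see when comparing the two word clouds.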
Stemming (getting to the root of the word)
Lemmatizing (getting to dictionary headword)
POS Tagging (part of speech tagging)
Look at Stanford NLP Tool to see this in action.
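The difference between stemming and lemmatizing is easier to see side by side. The sketch below is a toy under stated assumptions: `crude_stem` is a simple suffix stripper written for this example (not the Porter stemmer or the Stanford tools), and `LEMMAS` is a tiny hand-made lookup table standing in for the dictionary a real lemmatizer would use.

```python
# Toy suffix-stripping stemmer, for illustration only.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table standing in for a lemmatizer's dictionary.
LEMMAS = {"ran": "run", "running": "run", "studies": "study", "better": "good"}


def lemmatize(word):
    """Return the dictionary headword if known, else the word unchanged."""
    return LEMMAS.get(word, word)


for word in ["running", "studies", "ran", "gallops"]:
    print(word, "| stem:", crude_stem(word), "| lemma:", lemmatize(word))
```

Note that the stems ("runn", "studi") are not always real words and that the stemmer cannot connect "ran" to "run", while the lemmatizer only knows what is in its dictionary and leaves "gallops" untouched. Both limitations are worth raising when participants try the NLP tool on their own text.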
End by dividing class into small groups and giving them each a small amount of text (or use their own) to run through the NLP tool. Ask them to take it through the data cleaning process and see what they come up with.
Apply and critique the use of multiple digital tools to examine text.
Articles
Sinclair, S. & Rockwell, G. (2016). Text analysis and visualization: Making meaning count. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.), A New Companion to Digital Humanities (pp. 274-290). Malden, MA: John Wiley & Sons, Ltd.
Graham, S., Weingart, S. & Milligan, I. (2012). Getting started with topic modeling and MALLET. The Programming Historian.
What is topic modeling? (2017).
Silge, J. (2016). She giggles, he gallops. The Pudding.
Stevens, G. (2017). Early American Cookbooks.
Resources
N-grams: HathiTrust’s Bookworm and Google’s N-grams.
Papers with questions on them for the send-a-problem activity
Results of the activity/discussion
Pre- and post-session surveys (template included at the end)
Have each participant bring an example of text with them (an OCRed article, plain text, etc.). The instructor should have a few pieces of clean text on hand in case of problems.
Have students develop a research question they’d like to answer with text analysis.
Briefly discuss the functionality of N-grams (HathiTrust’s Bookworm and Google’s N-grams).
Send-a-problem activity: Divide the participants into three groups. Give each group a sheet of paper with a different question to try to answer using one or both of the N-gram tools, and a few minutes to answer it. The questions can be selected from the research questions that the class develops or can be predetermined by the instructor. After a few minutes, have each group pass the sheet with its question and answers to the next group and repeat. After a few more minutes, the groups pass the sheets again, but this time, instead of answering the question, each group evaluates the answers already recorded. This activity helps demonstrate the collaborative, interpretive nature of DH.
Use the results of the activity to lead a discussion of the positives and drawbacks of using N-grams. Were they able to answer the question? How successfully? Were there any concerns in using the data they found to answer the question? Were they able to find applicable data?
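To ground the discussion of what the N-gram tools are counting, the short Python sketch below builds n-grams from a token list and computes how often a phrase occurs relative to all phrases of the same length. The two-"year" corpus is invented for illustration, and the normalization is only an assumption about how such viewers work in spirit, not a description of how Bookworm or Google's tool is implemented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(phrase, text):
    """Share of the text's n-grams that match the phrase exactly."""
    target = tuple(phrase.lower().split())
    grams = ngrams(text.lower().split(), len(target))
    if not grams:
        return 0.0
    return Counter(grams)[target] / len(grams)


# Invented miniature corpus keyed by year; real tools draw on millions of books.
corpus = {
    1850: "the steam engine changed the town and the steam engine changed the trade",
    1950: "the jet engine changed travel and the jet age began",
}

for year, text in sorted(corpus.items()):
    print(year, round(relative_frequency("steam engine", text), 3))
```

Because the result depends entirely on which texts are in the corpus for each year, the sketch also illustrates one of the drawbacks worth raising: a change in frequency may reflect what was digitized rather than what was written.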
Topic Modeling
Introduce the concept of topic modeling. The illustration and definition from the What is topic modeling? reference above is a good resource for this.
She giggles, he gallops and Early American Cookbooks can also be good resources to help with discussing the concept.
Voyant
Show Voyant and the breadth of the tools it has
Have participants bring up Voyant on their own devices if possible. Small groups would be fine for this if there are not enough devices to go around.
Walk them through uploading the text they brought and ask them to explore the text through the different tools available for a few minutes.
Discuss what they see about their text by using Voyant. Any unexpected patterns? Or expected ones?
End with revisiting the original research questions. Can any of these tools help them answer it? Do they open up new avenues to explore? Or are they not appropriate for what they want to learn?
We have used these modules to teach both faculty and student audiences about the topic. Generally we have left our discussions fairly open-ended, posing questions and letting the audience steer the discussion toward the subject area applications that interest them. This can lead to some great discussion in an interdisciplinary group as it often gives them a chance to look at data and digital humanities (or digital scholarship if you would like to use a broader term) from a new perspective. Our recommendation would be to have some sample texts or examples of applications that apply to the discipline you are presenting to, or a variety if you are presenting to an interdisciplinary group. If some of the tools or examples don’t work as intended we have used that as a chance to discuss the (potentially) ephemeral nature of grant-funded tools, how digital humanities tools and methods are constantly changing, and the importance of continually adapting your tools and methods as you progress in your research.
Note: These questions are intended to be given to students in the class at the beginning and end of a session and can be adapted to more specifically meet the needs of the class or module that you are assessing. Additional questions related to the content of the classes can also be added.
Pre-Session Survey Questions
Please rate your current understanding of the topic (select one)
None
Basic
Intermediate
Advanced
Expert
Is there an assignment related to the topic being covered? (select one)
Yes
No
Post-Session Survey Questions
Please rate your understanding of the topic after taking part in this session (select one)
None
Basic
Intermediate
Advanced
Expert
Do you feel the information presented in this workshop will improve your ability to complete class assignments? (select one)
Definitely yes
Probably yes
Probably not
Definitely not
How would you rate your ability to use the information presented? (select one)
I am not able to use the information provided
I will be able to use the information provided with more guidance
I will be able to use the information provided with more practice
I can adequately apply the information provided with no more than minimal guidance
I can proficiently apply the information provided with no more than minimal guidance