
Preparing Letters as Data: Transcribing Archival Documents


Published on Oct 05, 2022


This lesson introduces students to transforming archival documents into digital artifacts through transcription and metadata tags, using the text-recognition and transcription software, Transkribus, to prepare them for future digital scholarship inquiry. Additionally, this lesson gives students a greater understanding of what it takes to collaborate on and manage projects with multiple moving parts.

Literacies & Competencies

  • SAA/RBMS Primary Source Literacy

    • 1.B: This lesson engages students with primary sources through conceptualizing their importance to the scholarly record

    • 3.A: Students read and understand the script within the primary source to transcribe and annotate it 

    • 4.C: Students critically examine the content within the primary source for the purpose of tagging and continuing to evaluate its contribution to their research

  • Bryn Mawr Digital literacies

    • 1.1 Networks and file management: Students use their transcriptions and tags to create derivative files (.txt and .xlsx) from a primary source

    • 2.1 Collaborative communication: Students work in groups to transcribe several documents in one collection and collaborate to mark their progress, troubleshoot, and create consensus around metadata decisions.

    • 3.4 Metadata: Students develop a working knowledge of the standards for tagging letters

  • ACRL Information Literacy Framework for Higher Education

    • Information Creation as a Process: Students use primary sources to create information (metadata) for future inquiries of their own or other researchers.


Undergraduate or graduate students who are transcribing primary source documents.

Curricular Context

This lesson was developed for an introductory digital humanities course, which is taught as a survey of different computational methods in Mississippi University for Women’s Digital Studies minor. The minor is interdisciplinary, and this course is taught by library faculty with digital scholarship experience. When this course was developed, we decided to use an archival collection as the basis for digital scholarship. 

In this lesson, students begin working with a set of letters in that collection by transcribing them as a group, thereby preparing the letters as data for later experiments with different computational methods. The letters are part of a collection of local family papers that span the 20th century and are written to and from several family members and close friends. (Learn more about the collection.)

This lesson occurs at the beginning of the semester, and the resulting transcriptions and tags are used in subsequent lessons. Each student works directly with 2-3 letters, but everyone contributes to the full set of transcriptions from the collection, which at this writing numbers about 200.

The lesson takes place over two 90-minute class sessions to learn the software, discuss transcription guidelines, and give students time to work on their own. The first class focuses on transcribing a letter from beginning to end (parts I-III of the lesson) and gives students time to begin transcribing independently in a supervised environment. The second class discusses metadata tags (parts IV-V of the lesson) and gives students further time to work independently in a supervised environment.

Instructional Partners

  • Hillary Richardson, Instructor of Digital Studies and Smith Papers Project manager

  • Elaine Walker, Graduate Studies Librarian and Smith Papers Project member


  • Desktop or laptop computers with Transkribus software installed

  • Access to project management software (examples in the lesson) or Google Sheets (a template is provided with this lesson)
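For instructors who track progress in a spreadsheet, the sketch below shows one way to summarize a tracker exported as CSV. The column names (`letter_id`, `transcriber`, `status`), the status labels, and the sample rows are all hypothetical; the template provided with this lesson may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical tracker contents: these columns and rows are illustrative
# assumptions, not the layout of the template provided with the lesson.
TRACKER_CSV = """letter_id,transcriber,status
letter_001,student_a,transcribed
letter_002,student_a,in progress
letter_003,student_b,tagged
letter_004,student_b,not started
"""


def summarize_progress(csv_text):
    """Count how many letters are at each status in the tracker."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["status"].strip().lower() for row in rows)


summary = summarize_progress(TRACKER_CSV)
for status, count in sorted(summary.items()):
    print(f"{status}: {count}")
```

A summary like this can open each class session and show the group which letters still need attention.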


This lesson uses letters from an archival collection within Mississippi University for Women but can be used with any collection (institutional or otherwise) accessible to the participants. The letters in this collection have also been the basis for a Handwritten Text Recognition (HTR) model in Transkribus (“Pauline Smith 2.0”). Transkribus offers other models as well, so the instructor may want to experiment with different models or explore creating their own, depending on the uniqueness of the collection at hand. Alternatively, Transkribus can be used to transcribe documents manually.

At the time of this publication, Transkribus has a downloadable version that is free (to a point) and a browser-based version in beta development (Transkribus Lite). This lesson is based on the downloaded program (which requires Java 8 or higher to run); future lessons may use the browser application.

For a minimal computing option, participants can transcribe in a standard text editor that is already installed on their device, such as Notepad. For the annotation work in the second session, instructors can use oXygen or another XML editor in place of Transkribus, or they can skip the annotation part entirely to shorten the lesson.

Learning Outcomes

Students will be able to:

  • Read, evaluate, and transcribe an archival, primary source document into machine-readable .txt files 

  • Add metadata tags to a primary source document to enrich the file, enhance discoverability, and create opportunities for further research

  • Track group progress through a project management template

  • Create different file types from transcribing a letter to prepare them for future digital scholarship (.txt, .docx, .xlsx, and .pdf)


  • If they haven’t already been digitized, identify and digitize a set of documents from an archival collection that can be transcribed. (A sample letter is provided with this lesson.)

  • Create a collection in Transkribus and import these items into that collection. (Note: This lesson does not demonstrate how to import letters into a collection, but Transkribus provides documentation.)

  • Before the first meeting, ask students to create Transkribus accounts and add their account emails to the collection as editors. 

  • If using an HTR model, identify (or begin working toward) a model that works well with the handwriting in the collection.

  • Make the lessons asynchronously available to students before the first meeting.



Student Learning 

There are several opportunities for discussion and check-ins built into the lessons for formative assessment. The original lesson linked to quizzes in a learning management system (LMS), so the discussions and check-ins can be recreated within an LMS or run as in-class discussions. Check-ins are meant to gauge students’ comprehension of the software (e.g., “What do you do if the transcription line is split when it shouldn’t be?” or “How do you create a tag when it isn’t already in the given list?”) and of the transcription guidelines (e.g., “What do you write if you can’t understand the handwriting?”). In-class discussions can also gauge students’ comfort level with reading different kinds of scripts, depending on the collection the instructor uses, or show whether they are comfortable choosing among different categories of tags (e.g., “What’s an example of a time you might use the comment tag?”).

The final transcriptions, tags, and project management templates can also be used as summative assessments once they are complete. Rather than going over these with a fine-tooth comb, we recommend having students share their drafts with each other and turn in one as a pair or group.

Implementation Fidelity

In the transcription assignment for the course in which this lesson was initially implemented, I offered additional time for students who had trouble deciphering the handwriting and did not finish the transcription in class. Students who came to optional office hours more often, or who used the project management tool to ask questions, were more likely to grasp the software and the content of the letters they transcribed. Students who waited to complete the assignment, did not attend office hours, or did not ask the group about transcriptions they felt unsure about were less likely to comprehend the nuanced definitions of the different metadata tags, and they were a little shakier on things like file types and harder-to-read words. Ample opportunities to engage with the instructor and each other will help students accurately, consistently, and confidently tag and transcribe the documents based on the standards outlined in the class and the lessons.


Students enjoyed transcribing the letters and getting into the details of the content, even though it was difficult for them to read the cursive handwriting at first. This response had much to do with students understanding the letters’ context and importance, which I provided them so they could be more invested.

The technical part of this lesson was not an obstacle to students’ enjoyment. Still, it was helpful to give them as many opportunities as possible to engage different functions of the software as a group and repeat steps often (while making sure to reference the steps in the lesson).

Finally, as mentioned, it was helpful to give students out-of-workshop time to work on transcriptions and ask questions about handwriting, software, and the content of the letters. I called them “transcribing parties” and told them to bring snacks.

Lesson Outline

Before you begin

  • Provide a statement on addressing bias in preservation descriptions (bias within the collection and in annotations, tags, metadata, etc.)

  • Review recommended reading on transcribing guidelines and software documentation

  • Install Transkribus, and provide either a project management tool or a spreadsheet for tracking progress (one is provided)

Session 1: Lessons I-III

  • Transcribing Guidelines: Tips to help maintain original document integrity while transcribing and tips for handwriting that is difficult to read

  • Text Regions in Transkribus: Step-by-step guide for arranging, creating, and editing text regions

  • Handwritten Text Recognition: Instructions for selecting and implementing a text recognition model

Session 2: Lessons IV and V 

  • Metadata Tags: Learn how to describe and annotate important and unique contents of the letters

  • Exporting Files in Transkribus: Options for saving and migrating documents
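As a companion to the export step, here is a minimal sketch of turning an exported transcription into plain text outside of Transkribus. It assumes the export includes PAGE XML, in which each transcribed line sits in a `TextLine` element with its text in a `TextEquiv`/`Unicode` child. The sample XML and its contents are invented for illustration, and a real export may carry a different schema version or extra elements.

```python
import xml.etree.ElementTree as ET

# Invented sample in the style of a PAGE XML export (an assumption for
# illustration; real exports contain coordinates and other metadata).
SAMPLE_PAGE_XML = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="sample_letter.jpg">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <TextEquiv><Unicode>Dear Mother,</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <TextEquiv><Unicode>I arrived safely on Tuesday.</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
""".strip()


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across schema versions."""
    return tag.rsplit("}", 1)[-1]


def page_xml_to_lines(xml_text):
    """Return the transcribed text of every TextLine, in document order."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Look only at the line's own TextEquiv, not word-level children.
        for equiv in element:
            if local_name(equiv.tag) != "TextEquiv":
                continue
            for unicode_el in equiv:
                if local_name(unicode_el.tag) == "Unicode":
                    lines.append(unicode_el.text or "")
    return lines


transcription = "\n".join(page_xml_to_lines(SAMPLE_PAGE_XML))
print(transcription)
```

Matching on local element names rather than a fixed namespace is a deliberate choice here, since the schema version in a given export is not something this lesson specifies.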

Additional Materials
