A hands-on introduction to structured data
This lesson plan is for a 2-hour workshop that introduces Wikidata and the SPARQL query language through hands-on examples and activities. It introduces participants to fundamental concepts related to structured data, which powers everything from search engines (such as Google’s Knowledge Panels) to AI assistants. The lesson exposes learners to the underpinnings of systems encountered in day-to-day life both by showing them what this underlying data looks like and by giving them a chance to practice editing it. Particular attention is paid to missing data, gaps, and failures, which encourages learners to train their critical eye.
Alexandra Provo & Nicole Helregel
Learners will be able to:
• Understand linked open data concepts and structures such as triples
• Understand the components of a Wikidata item: QID, property, value, qualifier, and reference
• Modify simple SPARQL queries
• Manually edit a Wikidata item
• Evaluate citation practices and norms in an open data platform
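To make the triple and item components named in these outcomes concrete, here is a minimal Python sketch. It is not part of the workshop materials. The `Statement` class and its field names are illustrative assumptions, not Wikidata's actual data model or API. The first example uses real identifiers (Q42 for Douglas Adams, P31 for "instance of", Q5 for "human"). The second uses placeholder identifiers and invented values, purely to show where qualifiers and references sit.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One Wikidata-style statement: a triple plus optional context."""
    subject: str   # QID of the item being described
    prop: str      # PID of the property (the predicate)
    value: str     # QID or literal value (the object)
    # Qualifiers refine the claim (for example, when it started to be true).
    qualifiers: dict[str, str] = field(default_factory=dict)
    # References record where the claim comes from.
    references: list[dict[str, str]] = field(default_factory=list)

    def as_triple(self) -> tuple[str, str, str]:
        """Return just the subject-predicate-object core of the statement."""
        return (self.subject, self.prop, self.value)


# Real identifiers: Douglas Adams (Q42) is an instance of (P31) human (Q5).
simple = Statement("Q42", "P31", "Q5")

# Hypothetical statement: every identifier and value below is a placeholder.
detailed = Statement(
    subject="Q_EXAMPLE_PERSON",
    prop="P_EXAMPLE_EMPLOYER",
    value="Q_EXAMPLE_LIBRARY",
    qualifiers={"start time": "2015"},
    references=[{"reference URL": "https://example.org/staff-page"}],
)

print(simple.as_triple())
print(detailed.qualifiers, detailed.references)
```

A statement with no references, like `simple` here, is the kind of gap learners can be asked to spot when evaluating citation practices.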
This lesson can be suitable for a range of audiences, including undergraduate students, graduate students, faculty, and researchers. Instructors can modify the lesson plan based on what they know of the learners’ existing knowledge of or experience with Wikidata, query languages, database thinking, or linked/structured data. If the audience’s technical or conceptual knowledge of databases is low or unknown, consider slimming down the content (e.g. just focusing on running queries, not on editing entries) or extending the workshop to 3 hours (see Adaptability section for more details).
This lesson assumes no prior learner knowledge of linked data, structured data, databases, or metadata. Previous experience conducting database searches in library systems is helpful, but not required. Similarly, learners knowledgeable about relational databases and query languages like SQL may find the lesson easier, but this is not required. The lesson works especially well for information or data science courses, where students may have more relevant prior knowledge, but can work for a course in almost any subject area wherein the instructor and students are interested in contributing to open knowledge/scholarship. The most important part of teaching this session to a subject-specific audience is to tailor the examples to be relevant to their discipline. For examples of different subject area emphases, see the Adaptability section.
The workshop has been designed to be taught virtually but could be taught in person. If taught virtually, screen-sharing capability for instructors and breakout room features are required. If taught in person, slides and live demonstrations should be projected on clearly visible screens, and laptops should be provided for attendees if needed. No special technology beyond a web browser and an Internet connection is needed to complete the activities. Wikidata accounts are recommended, and time can be allocated during the lesson to create them. The lesson is best taught by a team of 2-4 instructors who have at least basic familiarity with Wikidata, Wikipedia, and SPARQL queries (but you do not have to be an expert!). Sections of the lesson can be led by different instructors, and instructors can also take on a specific breakout room (virtual) or area of the room to support learners with specific tasks like creating accounts, writing queries, or editing Wikidata.
Modify the slide deck and workshop script and assign sections to each instructor
Modify and prepare handouts, if using (print, if the workshop is in person)
Adjust and test SPARQL queries as necessary
Prepare demonstration edits by identifying Wikidata items and specific edits to be demonstrated
Compile a list of suggested items to edit
Reserve the physical room or Zoom room for the lesson
Create a Wikidata account (if the instructor does not already have a Wikimedia account, such as a Wikipedia account) and (optionally) practice editing entries in Wikidata
Create a campaign on the Wikimedia Foundation’s Programs & Events Dashboard to track participant contributions (optional)
Ensure they have a working web browser available on their computer if bringing their own
Create a Wikidata account (if they do not already have a Wikimedia account, including a Wikipedia account)
Note: While you may decide to require participants to create an account ahead of time, it is not strictly necessary, especially if the focus of the lesson is on querying rather than editing. If participants have not created an account ahead of time, this can be done during the hands-on editing section.
Review asynchronous WikiEdu training: https://dashboard.wikiedu.org/training/students/querying-wikidata
Lesson plan - Detailed (a general outline is included below; the linked lesson plan is considerably more specific. It comes from an iteration of the workshop with a climate change theme but can be customized, as can all of the following materials)
Example handout from the event “Crafting Citable Futures: 2040 Now, Wikidata, & Wikipedia”
Query Builder Script/Handouts
Query Service Script/Handouts
Example feedback form
Regarding accessibility, lesson materials have been made accessible (color contrast, document headings, image alt text). We recommend thorough verbal explanations of visuals, such as the “Data relationships” and “Getting to know a Wikidata entry” slides. As part of the Wikimedia universe, Wikidata is within the scope of discussions around accessibility (see https://meta.wikimedia.org/wiki/Accessibility) and the community strives to make its platforms accessible to “any user that is not browsing Wikimedia wikis using their eyes, or is not using a graphical browser on a desktop computer.” Wikidata policy and documentation pages can be very text-heavy, which is something to be aware of when asking learners to engage with these pages as they can be overwhelming. We recommend magnifying web pages such as the Query Builder and Wikidata Query Service during live demos (for example, using your web browser’s zoom feature) for increased visibility.
Duration: 2 hours (can be extended to 3)
[Slides used throughout, back and forth with live demos; see slides Notes section for corresponding outline section number.]
Introduction (10 minutes)
Instructor introductions
Outline for the session
What is Wikidata? What is a knowledge graph?
Today’s theme [can be related to a specific context/theme or can be tailored to the audience in some way, e.g. public health data for public health students]
How to read/speak Wikidata (20 minutes)
How to Ask a Graph a Question activity - points to emphasize: what is left out when querying a representation of the world; disambiguating; directionality of triples; formulating a question that can be answered by the graph
Intro to Wikidata item components: QID, property, value [refer to relevant sections of Handout]
Building queries using the Query Builder [see Query Builder Scripts for detailed example - consider tailoring to your examples and sharing links with attendees] (20 minutes)
How to use the Query Builder [use prepared example within theme]
Activity: learners try it out with instructor-prepared prompts and/or their own questions
Points to discuss: What is surprising? What is missing?
Building queries using the Query Service: putting SPARQL into practice! [see Query Service Scripts for a detailed example - consider tailoring to your examples and sharing links with attendees] (20 minutes)
Building a query in stages to identify gaps in Wikidata’s information [use a prepared example within the theme]
Visualizations: graph builder; timeline
Editing Wikidata (20 minutes)
How to add a new property statement to an item
How to add a value to an existing property statement
Adding references when needed [use a prepared example within theme]
Points to discuss: use other Wikidata pages as a model if you’re not sure how to enter things or where to start; use WikiProjects as a place to find easy/simple edits and tasks that need doing [use prepared example within the theme - try to find a relevant WikiProject, or list a few in subjects you think will interest your audience]
Hands-on activities [in breakout rooms if virtual] [refer to resources and ideas on Handout] (25 minutes)
Any questions about what we’ve covered so far?
Rooms/areas for Creating a new Wiki account; Writing/modifying SPARQL queries; Customizing a SPARQL query; Wikidata editing
For each area/room: have a few pre-canned things that folks can do quickly/easily. But also encourage them to explore topics/areas that interest them!
Wrap-up (5 minutes)
Thank learners for attending
Assessment quiz, if using [see Workshop Feedback form]
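As one illustration of the "building a query in stages to identify gaps" step in the outline above, the following Python sketch assembles a SPARQL query string in three stages. It is not taken from the workshop's Query Service script. The `build_query` helper is hypothetical, and the choice of date of birth as the possibly missing property is an assumption made for the example. The identifiers used (P31 "instance of", Q5 "human", P19 "place of birth", Q60 "New York City", P569 "date of birth") should be checked on Wikidata before teaching. The query has not been run against the live service here, so test it and adjust the limit as needed.

```python
def build_query(stage: int) -> str:
    """Return a SPARQL query string; higher stages add more patterns."""
    select_vars = ["?person", "?personLabel"]
    patterns = [
        "?person wdt:P31 wd:Q5 .",   # stage 1: items that are humans
        "?person wdt:P19 wd:Q60 .",  # stage 1: born in New York City
    ]
    if stage >= 2:
        # Stage 2: show a date of birth where one is recorded,
        # but keep people who have none (that is what OPTIONAL does).
        select_vars.append("?dob")
        patterns.append("OPTIONAL { ?person wdt:P569 ?dob . }")
    if stage >= 3:
        # Stage 3: keep only the rows where no date of birth was found.
        patterns.append("FILTER(!BOUND(?dob))")
    # Ask the Wikidata label service for English labels.
    patterns.append(
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }'
    )
    body = "\n".join("  " + p for p in patterns)
    return f"SELECT {' '.join(select_vars)} WHERE {{\n{body}\n}}\nLIMIT 50"


for stage in (1, 2, 3):
    print(f"--- stage {stage} ---")
    print(build_query(stage))
```

Each printed query can be pasted into the Wikidata Query Service. Showing the stages side by side lets learners see how a single added line changes the question being asked, from "who is here?" to "what is missing?".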
Learning can be assessed via:
Observation of the learners completing activities during the lesson and during the breakout time. For example, walk around the room or enter breakout rooms to check understanding of the differences between properties and values, choice of SPARQL variables, and the differences between citations (references) and qualifiers.
Tracking of edits via a Wiki event dashboard (such as the Wikimedia Foundation’s Programs & Events Dashboard), which can be easily implemented specifically for the lesson event.
A short survey administered at the end of the lesson, asking learners to report their confidence levels in the learning outcomes of the lesson and to list anything they are still confused about or wish they understood better (this also allows for individual follow-up if the survey is not anonymous or if learners have the option of including their contact information).
Optionally, if working with a faculty member who is interested in creating an assignment based on the workshop, or if you want to more closely target the learning outcomes, consider adding some of the following questions to the feedback survey:
Learning outcome: Manually edit a Wikidata item
What item(s) did you edit? Provide links.
What was the editing process like? How did it feel?
Learning outcomes: Understand linked open data concepts and structures such as triples; Understand the components of a Wikidata item: QID, property, value
What statement(s) did you add? Identify the subject, predicate (property), and object of your statement.
Learning outcome: Modify simple SPARQL queries
What question did you ask Wikidata?
What was your SPARQL variable (the unknown)?
What property did you use?
What object/value did you choose?
Learning outcome: Evaluate citation practices and norms in an open data platform
What citations or references did you find in Wikidata? What did or did not surprise you about these references?
If you added references to a Wikidata item, how did you choose them?
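The SPARQL questions above ask learners to name their variable, property, and value. The toy Python sketch below shows how those three pieces form a triple pattern, and why the direction of a triple matters. It is an in-memory illustration only, not how Wikidata or SPARQL is implemented. The people in `GRAPH` are invented, and the `ask` helper is hypothetical.

```python
Triple = tuple[str, str, str]

# Toy graph of (subject, property, value) triples; "Ada" and "Ben" are invented.
GRAPH: list[Triple] = [
    ("Ada", "place of birth", "New York City"),
    ("Ben", "place of birth", "New York City"),
    ("Ada", "occupation", "scientist"),
    ("New York City", "country", "United States"),
]


def ask(subject=None, prop=None, value=None) -> list[Triple]:
    """Return triples matching the pattern; None acts like a SPARQL variable."""
    return [
        t for t in GRAPH
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    ]


# "Who was born in New York City?" The subject is the unknown.
born_in_nyc = [s for s, _, _ in ask(prop="place of birth", value="New York City")]
print(born_in_nyc)

# Reversing the direction matches nothing: the city is never the subject
# of a "place of birth" triple in this graph.
reversed_direction = ask(subject="New York City", prop="place of birth")
print(reversed_direction)
```

The same question in SPARQL would use a variable such as `?person` in the subject position, with the property and value fixed.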
While designed as a 2-hour online workshop, the lesson could be extended to 3 hours, and it could be implemented in person and/or as part of a semester-long course. More than 2 hours is recommended if the audience has very little experience with both the conceptual and technical components of the workshop; if you cannot do more than 2 hours, consider slimming down the content somewhat (e.g. just focusing on running queries, not on editing entries). The lesson can be adapted to focus on almost any subject area wherein the instructor and students are interested in contributing to open knowledge/scholarship. The most important part of teaching this session to a subject-specific audience is to tailor the examples to be relevant to their discipline. The example lesson outline and materials have a climate change focus, which works well for interdisciplinary audiences. Across the five iterations of this lesson we have tested so far, query examples and editing suggestions have been adjusted to a variety of themes:
Climate change (including endangered species, the Climate Change WikiProject, and films about climate change and related concepts)
Climate change by region or country
Severe weather events
New York City (including parks, geography, and NYC-born scientists)
20th-century art and artists
When brainstorming topic ideas, we recommend considering examples that draw out critical questions around inclusion, exclusion, and representation, and that highlight what or who is missing. For example, while querying for severe weather events you could highlight the geographic bias in the weather event data currently available in Wikidata; or when searching for scientists or artists you could explore gender representation and considerations around sourcing (e.g. whether an individual’s gender is from a source where they self-identify or not). These kinds of examples can be found in almost every subject area and will help bring to light some of the conceptual ideas about the inherent subjectivity and non-neutrality of description, categorization, linking data, etc.
The structure of this lesson – with its distinct sections on the basics of graph structures, querying with a visual interface, writing queries directly, and editing Wikidata – has made it possible to involve colleagues with various levels of linked data knowledge and confidence as workshop co-instructors. For example, collaborators less comfortable with writing SPARQL queries directly in the Query Editor can take on other sections of the lesson. Sharing the labor among instructors with varying levels of knowledge models a welcoming and supportive environment for learning, encouraging people to try a sometimes inscrutable and intimidating query language by presenting the instructors as learners themselves. Part of the fun we had in developing this lesson was learning from each other. While not always practical or feasible, sharing the teaching responsibility among many of us in this workshop has been a gratifying way to embody the collaborative spirit of Wiki communities.
In our teaching of this lesson so far, we have found it to be highly adaptable and customizable for different themes. That said, the parts of this workshop that require the most effort are writing example queries, coming up with lists of suggested items for editing, and planning edits for demonstrations (for example, by finding appropriate references). We recommend building in enough preparation time to test example queries, plan demonstration edits, and create lists of items to edit.
While one-hour workshops are often desired by collaborators and participants, we strongly advise scheduling this workshop for 1.5-2 hours, especially if hands-on activity time is desired. One iteration of this workshop was cut down to an hour, and it was difficult to fit in all of the learning outcomes. If your audience is already familiar with knowledge graphs and/or linked open data, some of the more general introductory material could be cut in favor of more querying and editing, but more than an hour is still recommended.
In future iterations of this lesson, we plan to be more intentional about collecting post-workshop feedback, and would consider incorporating more formative assessments into the workshop itself.
This lesson is based on workshops developed by Nora Lambert, Lia Warner, Nicole Helregel, Jojo Karlin, and Alexandra Provo at NYU Libraries. Many thanks to our peer reviewers - Zoe Bursztajn-Illingworth, Shelby Hallman, & Trent Wintermeier - whose comments and suggestions were insightful and sharpened this chapter and the accompanying materials.