
ARL White Paper on Wikidata: Opportunities and Recommendations

Published on Nov 03, 2023

Allison-Cassin, Stacy, Alison Armstrong, Phoebe Ayers, Tom Cramer, Mark Custer, Mairelys Lemus-Rojas, Sally McCallum, et al. ARL Task Force on Wikimedia and Linked Open Data. 2019. ARL White Paper on Wikidata: Opportunities and Recommendations. Association of Research Libraries.

About ARL

The Association of Research Libraries (ARL) is a nonprofit membership organization of libraries and archives in major public and private universities, federal government agencies, and large public institutions in the US and Canada. ARL advances research, learning, and scholarly communication, fosters the open exchange of ideas and expertise, promotes equity and diversity, and pursues advocacy and public policy efforts that reflect the values of the library, scholarly, and higher education communities. ARL forges partnerships and catalyzes the collective efforts of research libraries to enable knowledge creation and to achieve enduring and barrier-free access to information.1

ARL Task Force on Wikimedia and Linked Open Data

Stacy Allison-Cassin, York University

Alison Armstrong, Ohio State University

Phoebe Ayers, MIT; former trustee, Wikimedia Foundation

Tom Cramer, Stanford University

Mark Custer, Beinecke Rare Book & Manuscript Library, Yale University

Mairelys Lemus-Rojas, Indiana University–Purdue University Indianapolis (IUPUI)

Sally McCallum, Library of Congress

Merrilee Proffitt, OCLC Research

Mark A. Puente, Association of Research Libraries

Judy Ruttenberg, Association of Research Libraries

Alex Stinson, Wikimedia Foundation

Contributors and Editors

The ARL Task Force on Wikimedia and Linked Open Data made a draft of this paper available for public comment from November 19 through November 30, 2018. Many people (librarians, Wikimedians, and others) provided valuable and constructive feedback including the following:

  1. Asking for clarity on the framing of the document (was it meant to inform or to advocate, and to which audience or audiences?)

  2. Pointing out known cultural challenges within Wikipedia and questioning the extent to which those issues are present in the Wikidata community

  3. Pointing out known structural challenges within libraries with respect to how authoritative metadata is created and propagated, and the workflow implications of embracing Wikidata

  4. Challenging whether there is reciprocity and mutual benefit in strengthening the Wikimedia-library relationship, in particular when time is a scarce resource

  5. Questioning the use of a centralized wiki-database for structured data, with respect to both quality control and preservation in the library context

  6. Reflecting on how the growth and practice of Wikidata, and the white paper’s presentation of it, relates to linked open data in libraries writ large

While the Task Force on Wikimedia and Linked Open Data provides responses to some of the critiques and thorny issues, some will simply not be resolved in this paper. Some of the feedback provides the basis for a research agenda we hope will be taken up by readers.

ARL convened the task force and wrote this white paper to inform its membership about GLAM (galleries, libraries, archives, and museums) activity in Wikidata and highlight opportunities for research library involvement, particularly in community-based collections, community-owned infrastructure, and collective collections. The task force included colleagues who do not work in ARL member institutions, and the white paper covers activity well outside ARL libraries, including smaller libraries, museums, and scholarly communities. Many in the international research community, including in libraries, are focused on community-owned infrastructure2 and robust metadata3 to facilitate open scholarship practices,4 and this paper takes a close look at Wikidata and Wikibase through that lens—as a public good worthy of examination and support. The task force, the public comments, and the structures upon which the white paper is built are all volunteer-based. ARL is grateful to the volunteer effort that is the Wikimedia community.

External contributors generously fixed references, created items in Wikidata, added or improved examples and use cases, suggested new sections, and copy-edited for sense and clarity. This draft addresses the critiques, and benefits enormously from the contributions.

Thanks to Paul Burley, Karen Coyle, Teri Embrey, Beat Estermann, Steven Folsom, Violet Fox, Valeria De Francesca, Michelle Futornick, R. Benjamin Gorham, Stephen Hearn, Regine Heberline, Evelin Heidel, Andy Mabbett, Luca Martinelli, Monica McCormick, Daniel Mietchen, Jere Odell, Daniel Poulter, Lane Rasberry, Charles Riley, Amanda Rust, Dorothea Salo, Robert Sanderson, Dan Scott, Christina Spurgin, Andrew Su, Sara Thomas, Ruth Tillman, Simeon Warner, and several anonymous contributors.

Recommendations in this white paper are from the task force and are intended to inform those who wish to work with Wikidata.

Executive Summary

ARL and the Wikimedia Foundation have been in conversation about collaboration and mission alignment since 2015. Representatives from the two organizations held a summit at the International Federation of Library Associations and Institutions (IFLA) conference in 2016 in Columbus, Ohio, and subsequently contributed to several white papers5 exploring additional opportunities to work together. The IFLA summit surfaced the following principal areas of interest:

  1. Using linked open data (LOD) to describe and connect resources, to mutually enrich Wikimedia and library discovery sources

  2. Establishing learning communities for Wikimedians in libraries, cultural heritage, and research institutions

  3. Librarians addressing the gaps in content and the cultural barriers inherent in Wikipedia, and conversely, using Wikimedia projects to help address cultural barriers in traditional library and archival practice

ARL charged a Task Force on Wikimedia and Linked Open Data in mid-2018 to focus on areas 1 and 3: linked open data and diversity & inclusion. In practice, this meant (1) focusing on Wikidata6 as a potential repository for libraries’ linked open data, and (2) that a significant use case driving the formation of the task force was a mutual interest between libraries7 and the Wikimedia community8 in creating culturally competent descriptive metadata in collaboration with communities whose lives, collections, and relationships are being described. When the task force was convened, it became clear that Wikibase9 (the infrastructure) and Wikidata (the community and the knowledge base) should also be explored explicitly in the context of ARL’s stated commitment to equitable and barrier-free scholarly communication.

Wikidata is a collaboratively edited knowledge base hosted by the Wikimedia Foundation, the nonprofit organization that also hosts Wikipedia and several other wiki-based knowledge projects. Wikidata serves as a central knowledge base for Wikimedia projects as well as being a freely available open database of linked open data for other projects. The task force was asked to consider issues, challenges, educational needs, and resource requirements for libraries to participate in Wikidata (and its software, Wikibase) in their uses and applications of linked open data to advance and enrich discovery of locally curated collections on the global web. Task force members collectively articulated the following goals for a white paper:

  • Elevating the visibility of linked open data in research libraries and in general, and Wikidata/Wikibase in particular, as part of a global, openly licensed, linked data network for public good

  • Providing an in-depth and actionable introduction to Wikidata for the library community, including a method to get started—like Wikipedia, Wikidata is both easy to use and complex to master.

  • Modeling the role of librarians as contributors as well as users of a global, largely volunteer open knowledge network, by encouraging contribution of library services, workflows, tools, and software to the Wikidata community—this contribution can and should include research, development, and experimentation.

  • Identifying barriers for librarians and library staff to contribute, on a regular basis, to Wikidata/Wikibase—these barriers may include, among others, lack of sufficient time, skill, staff capacity, or institutional policy environment.

  • Creating a call for a community of practice around a metadata infrastructure for knowledge equity and diversity, with a focus on developing an understanding of the culture and skills of contributing to a knowledge commons like Wikidata

  • Providing example use cases for how libraries and other cultural heritage institutions are already using Wikidata, which might inspire other ways to work with the project, for example, Wikidata: WikiProject Open Access10

While ARL convened this task force to inform its membership as the principal audience, the recommendations in this white paper apply to libraries well beyond the Association. The recommendations are meant for individual librarians wishing to explore the Wikidata community, and for library departments and organizations wishing to consider structural commitments to this open data source in the advancement of their work.

Contributors to this paper during the public comment period raised serious concerns about library participation (both individual and organizational) in a community with no formal governance or leadership, where systemic social power dynamics can reign and where sustainability and persistence are at risk. These concerns, acknowledged and addressed to the extent practicable, might provide a fruitful research agenda for further conversation. Tangibly, one reviewer asked whether we can advise on the best way to proceed with Wikidata involvement, which we look forward to exploring with more data.

In the meantime, this paper’s recommendations are based upon use cases in cultural heritage documentation, open scholarly communication, archival and bibliographic discovery—in particular through collaborative description with communities represented in our collections.


Work on Wikidata is primarily contributed by individual volunteer contributors, who have organized their editing and data-modeling efforts in thematic projects, such as cultural heritage.11 Contributing to these projects provides a way to interlink Wikidata to sources of library data. Wikidata’s rise in linked open data communities also provides an opportunity for libraries to get involved in contributing to modeling and data efforts on a larger scale.12

For individual librarians:

  • Use and experiment with Wikidata, for example:

    • Contribute local name authorities to Wikidata, particularly for underrepresented creators and organizations.

    • Add institutional holdings to existing Wikidata items using the “archives at”13 property.14

    • Create items for faculty in an institution.

  • Explore and experiment with Wikidata editing tools such as Mix’n’match, batch uploading, and database dumps.15

  • Create a “hub of hubs” for authority controls, metadata vocabularies, and other data sources, to facilitate the connection between existing external metadata sources and Wikidata.

  • Get involved in the greater Wikimedia community by holding edit-a-thons and workshops, participating in discussions on email lists and in social media channels, and by joining the Wikimedia and Libraries User Group.16

  • Advocate within your research communities and organizations for open, compatible licensing of data sets so that they can be incorporated into Wikidata.17

For library leadership and organizations:

  • Give staff time to experiment and contribute to Wikidata, including by determining tasks that can be added to existing positions and workflows, or incorporating Wikidata participation into existing incentive and reward structures.

  • Expand capacity with Wikimedians in Residence or fellowships.

  • Inform and advocate with your patrons/scholars/research community to use LOD for their research projects that involve data/data sets.

  • Make data sets and scholarship from existing institutional projects visible on Wikidata as part of a global network of knowledge. Large-scale cooperative projects like Social Networks and Archival Context (SNAC),18 and VIAF, the Virtual International Authority File,19 for example, have added identifiers to Wikidata.

  • Provide linked data support to researchers, academics, and other patrons wishing to expand the context of their own research and data or to develop web applications representing knowledge from their field.

  • Engage scholars and communities working in underrepresented knowledge areas to help extend existing sets of knowledge in Wikidata.

  • Explore and advocate for the use of Wikidata identifiers (“Q IDs”) or equivalent uniform resource identifiers (URIs) in library and archival systems, repositories, and platforms.

  • Consider the use of Wikibase as a LOD store for local identifiers and authority-like data.

Wikidata is a largely volunteer community that leverages passion and enthusiasm. If through participation and experimentation libraries come to rely upon the infrastructure, they will want to assess their contributions to its growth and development within the context of their other official commitments and expenditures to Wikidata as a public good.


In 2017, librarians at York University advanced a compelling use case for Wikidata when they began a project with ARL to work in partnership with Indigenous communities to create inclusive, culturally competent metadata about Indigenous communities and collections in Canada.20 The work is part of a response to the national calls to action issued in the 2015 final report of Canada’s Truth and Reconciliation Commission. Museums and archives, as sites of public memory, were called out as crucial spaces to advance reconciliation. The York project leaders chose a linked open data approach in order to share their work—which would involve original research and consultation with Indigenous communities to identify self-determined names, places, and relationships in their collections—as widely as possible to the extent desired by the communities. Linked data facilitates semantic interoperability with related materials. York chose Wikidata as the place to store that original structured data because it is open, non-proprietary, flexible, community-oriented, and growing in adoption and use internationally. Finally, by contributing to the metadata hub underlying Wikipedia, the York project would increase the visibility and representation of Indigenous people and collections in Wikipedia in a way that addresses what the Truth and Reconciliation Commission called “ways that have excluded or marginalized Aboriginal peoples’ cultural perspectives and historical experience.”21

Much of the conversation between ARL and Wikimedia at their 2016 summit centered around diversity and inclusion challenges within the Wikimedia community, which are reflected in the content of Wikipedia and often grounded in the self-fulfilling prophecy of “notability.”22 Several readers of this paper pointed out similar inequities in Wikidata, with respect to data about people, for example, but others noted that the different notability criteria and the community of editors distinguish Wikidata as a place of greater opportunity than Wikipedia for inclusion of communities. Within the Wiki community writ large, librarians who are active Wikimedians are also among the community’s staunchest critics, and serve as advocates for other library professionals to remediate some of its social problems.

Wikidata is a multilingual project, supporting labeling of items in hundreds of languages, all stored in a single repository. A community of approximately 18,000 active editors from around the world works to create, enhance, and validate Wikidata’s data. Many of these editors also have experience as editors of Wikipedia or other Wikimedia projects, but many specialize in Wikidata, and the project has developed a distinct editorial community of its own, which also attracts contributors from outside the Wikimedia community.

Since the founding of Wikipedia in 2001, individual librarians have been involved in the project as volunteer contributors, but in the last several years, as the site has grown in prominence and reach, libraries have significantly increased their engagement and participation in Wikipedia. Recognizing Wikipedia as a highly-used source for their communities, librarians organize edit-a-thons,23 and institutions are rewriting job descriptions to encourage participation in Wikimedia projects. Some have hired Wikimedians in Residence24 or use graduate student interns or fellows to support representation of particular subjects from a diversity and equity perspective, and to enhance the visibility of their unique collections. Ease of use and familiarity of Wikipedia to library users has made it a successful platform around which to organize specific communities of interest and leverage their enthusiasm—in music, art and feminism, history, and more.25 Similarly, some expert communities, such as medicine, have organized both on- and offline edit-a-thons to ensure the quality and accuracy of Wikipedia articles.26

Libraries employ various strategies to make their collections discoverable and accessible on the web outside of traditional library discovery platforms. Such efforts have included making links to unique collections and local scholarship available in Wikipedia and other open knowledge platforms. Within the library community, moves toward openly licensed metadata, open citations, and linked open data are hallmarks of a shift toward more open scholarship and infrastructure. In addition to advancing a broad mission to contribute to the public good, enhance the reputation of their institutions, and contribute to the connectedness of their scholars, such efforts have also been tied to community engagement, broadly conceived. Increasing exposure of library collections in Wikipedia has also been a critical part of advancing a diversity and equity agenda by helping to fill and address known gaps.

Linked open data (LOD) is openly sourced, commonly formatted, and interrelated. LOD breaks apart the bibliographic record into its component parts (for example, dates, works, creators), leverages authority files and uniform resource identifiers (URIs), highlights the relationships among those components, and aims to transcend individual databases and put “information where people are looking for it—on the web.”27 LOD provides the potential for greater interlinking between library collections regardless of where the collections are physically housed or virtually hosted, and enables machines to connect related items across platforms. Deploying LOD applications in libraries has been a complex task involving developing the entirety of the infrastructure needed to create it. This complexity has primarily restricted LOD activity to large, well-resourced institutions, often with external financial support.

Wikidata offers software and an application framework in Wikibase, as well as user-friendly editing tools,28 that put experimentation and implementation of linked open data within reach of more libraries. According to a 2018 survey of international linked data implementers in libraries, Wikidata has become “the #5 ranked data source consumed by linked data projects/services.”29 Many in the international research community, including in libraries, are focused on community-owned infrastructure30 and robust metadata31 to facilitate open scholarship practices,32 and this white paper takes a close look at Wikidata and Wikibase through that lens.

Through Wikidata, libraries can use their expertise in the creation of structured data and resource description in an open, reusable, and globally interoperable environment. And as a crowd-sourced and open effort, Wikidata—a community and a knowledge base—both challenges libraries’ traditional practices around authority control and at the same time presents opportunities to promote core library values of diversity, equity, and inclusion in the collaborative creation of descriptive metadata.

Providing background on the Wikidata community and knowledge base, this white paper includes use cases exploring projects and initiatives in the broader cultural heritage community, including:

  • Using Wikidata and Wikibase to expand the scope of applications of linked open data that have until now been domain or academic-subject specific

  • Establishing the relationship between Wikidata and authority files

  • Increasing the visibility of institutions, content, people, events, and more in Wikipedia through structured data in Wikidata

  • Using Wikidata for bibliographic/archival description and discovery

  • Using Wikidata within an emerging system of open scholarly communication and scholarly infrastructure

  • Deploying Wikibase as infrastructure for linked open data (including open, authoritative data not open to edit)

  • Wikibase and Wikidata as infrastructure for FAIR (Findable, Accessible, Interoperable, Reusable) data33

  • Librarians learning from and influencing the Wikidata community by contributing to documentation and tool development that will reduce the barriers to participation

Most active LOD projects in libraries have been concentrated in large institutions in the US and Europe—the kind of organizations with dedicated information technology and systems departments, and staff with capacity to invest in retooling basic infrastructure and data models. The Task Force encourages library leaders to take a broad-based view of how their organizations might discuss and implement its recommendations, convening staff across their organizations to see Wikidata as part of:

  • Enhancing metadata and discovery

  • Advocating for open bibliographic data and open metadata exchange

  • Promoting greater visibility and reach of institutional research and scholarship (the inside-out library) through its open citation corpus

  • Sustainability of the open approach for libraries

From the earliest discussions of the semantic web, librarians along with other academic communities have been looking for practical pathways to connect dispersed knowledge on the internet. Linked data uses data “triples” (subject-predicate-object) that connect two different concepts (subject and object) with a predefined concept relationship (a predicate). With these relationships, a growing number of items become connected into an expanding graph of interrelated concepts, which can be traversed, and in turn represented, through computational processes and human exploration. Navigation of these growing networks of concepts allows for a serendipitous series of relationships. Queries of these concepts can lead to much more human answers to questions like “Which authors were born in a specific town?” or “Which authors of papers in our collection were PhD candidates at the same university?”
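As a minimal sketch of the triple model described above, the following Python snippet holds a handful of subject-predicate-object triples in memory and answers a question of the form “Which authors were born in a specific town?” by combining two lookups. Every name, predicate label, and place in it is invented for illustration; none is drawn from Wikidata or from any library data set, and the helper functions are hypothetical rather than part of any linked data toolkit.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    """One linked data statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example data: every name and place below is invented.
TRIPLES = [
    Triple("Author A", "occupation", "author"),
    Triple("Author A", "place of birth", "Town X"),
    Triple("Author B", "occupation", "author"),
    Triple("Author B", "place of birth", "Town Y"),
    Triple("Painter C", "occupation", "painter"),
    Triple("Painter C", "place of birth", "Town X"),
]


def subjects_with(triples: List[Triple], predicate: str, obj: str) -> Set[str]:
    """Return every subject that has the given predicate-object pair."""
    return {t.subject for t in triples if t.predicate == predicate and t.obj == obj}


def authors_born_in(triples: List[Triple], town: str) -> List[str]:
    """Intersect two lookups: subjects that are authors and subjects born in the town."""
    authors = subjects_with(triples, "occupation", "author")
    born_there = subjects_with(triples, "place of birth", town)
    return sorted(authors & born_there)


print(authors_born_in(TRIPLES, "Town X"))  # prints ['Author A']
```

The point of the sketch is that the answer is assembled from independent statements. Neither triple about “Author A” mentions the question being asked, yet together they answer it, and adding more triples extends the graph without changing the query logic.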

But the technical approach to creating linked data is challenging. The findings of OCLC’s 2015 survey of linked data implementers in the library world indicated that the principal motivations for publishing linked data were learning and experimentation, exposing data to a larger audience on the web, and improving discovery of local content by search engines. Among the most cited barriers to publishing linked data were “steep learning curve for staff,” “little documentation or advice on how to build the systems,” “lack of tools,” “restrictive or unclear licenses,” and “ascertaining who owns the data.”34

By 2018, however, Wikidata had risen from the 15th most used linked data source (9% of surveyed projects) in 2015 to the 5th most used linked data source (41% of surveyed projects).35 Because it was designed to support hundreds of languages, wide-scale human contribution, and the scalability required for supporting Wikipedia, Wikidata offers one of the better human-and-machine-interactive platforms for creating interdisciplinary linked data. In particular:

  • Wikidata has an established global community of stakeholders among national libraries (including the National Library of the Netherlands, National Library of Sweden, British Library, National Library of Finland, National Library of Wales, German National Library, National Library Service of Italy, and others).

  • Wikidata has investments from academic libraries (including the Linked Data for Libraries project, a collaboration variously involving the libraries of Columbia, Cornell, Harvard, Princeton, and Stanford, and the Library of Congress,36 and from research universities throughout Europe), other GLAM (galleries, libraries, archives, and museums) institutions (The Metropolitan Museum of Art, the Pritzker Military Museum & Library, and the Smithsonian Institution), and broader academic and professional communities working in linked data.

  • The experience of creating linked data within Wikidata is very human-centered, while still adopting most of the technical best practices from the broader linked data community. This means that teaching people how to create linked data requires fewer technical skills and can be achieved in less time using Wikidata compared to other platforms.

  • Because of its broad scope and open community, Wikidata allows for a much more diverse, multidisciplinary approach to linked data creation than many other linked data and authority projects that are bounded by institutional membership and participation.

  • Wikimedia has a long-term commitment to, and track record of, creating reliable knowledge on the internet.37

Though still-emerging platforms with many open questions about their role in the linked data community, both Wikidata and Wikibase have demonstrated substantial potential for changing the practice and the environment for practical implementation of linked open data in libraries.38 By consuming and contributing to Wikidata, libraries will also have a vested interest in its scalability and persistence.

A Brief History and Introduction to Wikidata

Wikidata is a knowledge base of structured linked data. Like the Wikimedia Foundation’s other projects, Wikidata is a wiki, and is editable both by individuals and by machines (using automatic programs that can make a series of specified changes, commonly known as “robots” or “bots”). Wikidata is published under the Creative Commons Public Domain Dedication 1.0, which means that the data contained in it is free to copy, modify, and distribute. It is a multilingual project, supporting labeling of items in hundreds of languages, all stored in a single site.

Wikidata was developed to connect and serve other Wikimedia projects, including Wikipedia (which has 292 active language editions),39 Wikimedia Commons (Wikimedia’s media repository, which as of February 2019 includes over 52 million freely licensed photos, videos, and other multimedia files),40 Wikisource (a collection of primary source and historical documents),41 and more. For instance, a single Wikidata item can provide links to Wikipedia articles about that item in various language editions of Wikipedia, connect to photos in Wikimedia Commons about the same topic, link to a definition on Wiktionary, and, if it is an item about a historical text, link to the full text on Wikisource. These inter-wiki links are dynamically updated: if a new Wikipedia article is written in a new language, the link will be added to Wikidata and it will be available from all other Wikipedia editions.

Figure 1. Graphic representing Wikidata’s data model with a statement group that includes opened and collapsed references, and identifiers.

Wikidata’s structured data is designed to summarize, augment, and update the content of Wikipedia articles in a number of ways, including through dynamically populating “infoboxes,” the summary boxes of data that sit on the side of many Wikipedia articles. Wikidata provides a central place to update data that can change (such as population figures), which ensures consistency across all Wikipedia editions and provides users with the most up-to-date information on the subject. Though this is not yet being used across all infoboxes or languages, Wikimedia’s volunteer developer and editor communities are rapidly building tools to make using Wikidata easier. Other uses for this rich central repository of structured data will continue to emerge both in Wikipedia and in other projects. Further, the use of structured data through Wikidata on Wikipedia articles directly impacts the wider web. For example, Google uses Wikidata and Wikipedia in its knowledge panels, the summarized claims to the right of a set of search results.

The first use for Wikidata was populating and supporting the connection between articles on different language editions of Wikipedia. For example, Wikidata acts as a central hub linking the article in the French-language edition of Wikipedia about “the French Revolution” with the article in the English Wikipedia and the Swahili Wikipedia, along with the 138 other language Wikipedias that have an article about this concept. If an article is created in a new language edition of Wikipedia (for instance, if someone starts an article about the French Revolution on the Igbo Wikipedia, which as of 2018 does not have one), the link to that article can be added to the Wikidata item, and the link to Igbo will be automatically populated across the other editions of Wikipedia.
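The hub arrangement described above can be sketched in a few lines of Python. This is an illustrative model only, not how Wikidata or Wikibase is implemented: the `HubItem` class, its method names, the item identifier, and the Igbo article title are all hypothetical placeholders. It shows why a link recorded once on the central item becomes visible from every language edition.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HubItem:
    """A central item holding one article title per language edition."""
    item_id: str
    sitelinks: Dict[str, str] = field(default_factory=dict)

    def add_sitelink(self, language: str, title: str) -> None:
        """Record the article for one language edition on the central item."""
        self.sitelinks[language] = title

    def links_from(self, language: str) -> Dict[str, str]:
        """Links an article in `language` can show to all other editions."""
        return {lang: title for lang, title in self.sitelinks.items() if lang != language}


# Placeholder identifier; the Igbo title below is also a placeholder.
revolution = HubItem(
    item_id="Q-EXAMPLE",
    sitelinks={"fr": "Révolution française", "en": "French Revolution"},
)

print("ig" in revolution.links_from("en"))  # False: no Igbo article is linked yet

# One edit on the hub item, rather than one edit per language edition.
revolution.add_sitelink("ig", "(hypothetical Igbo article title)")

print("ig" in revolution.links_from("en"))  # True
print("ig" in revolution.links_from("fr"))  # True
```

Without a hub, each of n language editions would need to maintain its own links to the other n - 1 editions. With the hub, a new article needs a single addition.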

As support for new data properties in Wikidata and coverage for Wikipedia projects expanded, an increasing number of other data sets and sources of information became the foundation for adding more and more topics and properties on Wikidata, including matching sets of external “identifiers,” which include URIs, name authorities, and controlled vocabularies from other data sources around the web. Wikidata includes one item for every topic that has a Wikipedia article in any language (including meta-items like Wikipedia categories), but it also now includes millions of items for topics that have no associated Wikipedia article yet meet Wikidata’s simpler notability requirements. Topics in Wikidata may never become Wikipedia articles, either because the topic is missing in Wikipedia or because Wikipedia editors have not found the topic notable enough for a stand-alone Wikipedia article. For example, Wikidata projects were developed to create items and contribute data related to paintings, monuments, and journal articles (discussed further below). Each of these subcommunities contributes to an ever growing body of knowledge created as linked open data.

Wikidata item creation page, with fields including language, label, description, and aliases

Figure 2. Wikidata item creation page. Creating an item in Wikidata can be done by anyone, through either the item creation page, or tools that allow for batch item creation.

Entities, the elements of the Wikidata knowledge base, are either items or properties. Entities act similarly to terms in a controlled vocabulary. Both item and property entities have their own section of the project,42 and each entity has a Wikidata page. They are assigned unique identifiers, using the letter “Q” for items and “P” for properties, followed by unique numbers. All entities can have labels (preferred name of the entity), descriptions (short description of the entity), and aliases (alternative names) in multiple languages.

Items contain data describing a single topic, object, or concept.43 Items contain a series of statements in the form of a property-value pair, which can be further enriched by the use of qualifiers. For instance, the following statement in Wikidata describes the fact that the novel The Able McLaughlins won the Pulitzer Prize for Fiction. The statement is further described using qualifiers to include when the prize was awarded and the name of the winner:

The Able McLaughlins (item) → award received (property) → Pulitzer Prize for Fiction (value) → point in time (qualifier) → 1924 (value) → winner (qualifier) → Margaret Wilson (value)

The claim made in each individual property-value statement can be supported with references. An item can also have sitelinks, which are interwiki links (links to other Wikimedia projects). Items and their statements can be created and edited by anyone.44
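
The structure described above can be sketched in a few lines of Python. This is only an illustrative model, not Wikibase's actual storage format: the class names (Snak, Statement, Item) and the helper functions are assumptions made for this sketch, human-readable labels stand in for the real “Q” and “P” identifiers, and only a single language is modelled.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Entity identifiers: "Q" plus a number for items, "P" plus a number for properties.
ENTITY_ID = re.compile(r"[QP][1-9]\d*")


def is_entity_id(text: str) -> bool:
    """Return True if text has the shape of an item or property identifier."""
    return ENTITY_ID.fullmatch(text) is not None


@dataclass
class Snak:
    """One property-value pair (hypothetical name; labels stand in for IDs)."""
    prop: str
    value: str


@dataclass
class Statement:
    """A main property-value pair plus optional qualifiers and references."""
    main: Snak
    qualifiers: List[Snak] = field(default_factory=list)
    references: List[Snak] = field(default_factory=list)


@dataclass
class Item:
    """A single topic with a label, description, aliases, and statements."""
    label: str
    description: str = ""
    aliases: List[str] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)


def build_example() -> Item:
    """Rebuild the Pulitzer Prize statement from the text above."""
    novel = Item(label="The Able McLaughlins", description="novel")
    novel.statements.append(
        Statement(
            main=Snak("award received", "Pulitzer Prize for Fiction"),
            qualifiers=[
                Snak("point in time", "1924"),
                Snak("winner", "Margaret Wilson"),
            ],
        )
    )
    return novel


def render(item: Item, statement: Statement) -> str:
    """Flatten one statement into an arrow-separated chain."""
    parts = [item.label, statement.main.prop, statement.main.value]
    for qualifier in statement.qualifiers:
        parts.extend([qualifier.prop, qualifier.value])
    return " → ".join(parts)
```

Calling render(item, item.statements[0]) on the item returned by build_example() yields the same chain shown above, without the parenthetical annotations. References are modelled as a list on each statement because, as noted, support is attached to an individual claim rather than to the item as a whole.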

Properties, on the other hand, are more controlled since they require a community review process. Properties are the equivalent of metadata elements/fields since they are used to record values, and they correspond to predicates in general linked data terminology. A property describes a relationship between the item and another item, an identifier, or a literal value (free text string, date, etc.). As of February 2019, there are over 6,000 properties, both general and domain-specific, that are supported on Wikidata.45

Properties that relate to existing or well-used metadata standards will be supported by the community—so getting support for new properties needed by libraries should be straightforward. However, at the time of writing, properties equivalent to common standards used in both libraries and other communities are inconsistent—often the first step for projects working with Wikidata is identifying the alignment of standard data models with the Wikidata model.

The values associated with properties may be restricted to only certain allowed values, such as other Wikidata items, or restricted to certain formatting. For example, any value entered as a Library of Congress authority ID is expected to be a “string of 1 or 2 lowercase letters, a 2- or 4-digit year and a sequence of 6 digits.” This is typically expressed with a format as a regular expression statement on the property page.46
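
As a rough illustration of such a format constraint, the quoted description can be turned into a regular expression. The pattern below is an assumption derived only from that prose description; the expression actually recorded on the property page may be stricter (for example, by limiting which letter prefixes are allowed), so this sketch should not be treated as the authoritative rule.

```python
import re

# Assumption: built from the quoted description, not copied from the property page.
# 1 or 2 lowercase letters, then a 2- or 4-digit year, then exactly 6 digits.
LC_AUTHORITY_ID = re.compile(r"[a-z]{1,2}(?:\d{2}|\d{4})\d{6}")


def looks_like_lc_authority_id(value: str) -> bool:
    """Check only the shape of the string, not whether the record exists."""
    return LC_AUTHORITY_ID.fullmatch(value) is not None
```

Under this pattern a value such as n79021164 (one letter, two-digit year, six digits) passes, while the same string with an uppercase prefix or a missing digit is rejected. A check like this could be run locally before a batch upload to catch malformed identifiers early.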

Example of a Wikidata property proposal, with description of the property, statement of motivation, and discussion

Figure 3. Property proposal for an architecture database maintained by the University of Washington Libraries. Most discussions about authorities are easy to advance with the Wikidata community, and property proposals are valuable opportunities for feedback on how best to structure a property so that it works well within Wikidata’s existing structure.

Wikidata’s software, Wikibase, includes a number of features that make both Wikidata and linked data projects flexible. The software has a human-editable interface—encouraging individual contribution, and allowing contribution from people with limited knowledge of linked-data standards—as well as a batch-editing API, which can be accessed with several generic batch uploading tools, including QuickStatements47 and OpenRefine,48 as well as custom-written bot scripts. The software supports multilinguality, allowing for simple translation of labels; and it has built-in quality-control features and permissions. Wikibase has been used in medical libraries and other GLAM projects. For example, OCLC led a pilot project to explore the application of Wikibase for linked open data in the library workflow.49 Similarly, a number of new domain-specific linked data projects are either adopting or seriously considering adopting the Wikibase software for linked data development.50 Because of this increasing demand for third-party installations of Wikibase, the software has been packaged with a number of utilities (including the batch uploading tool QuickStatements, and the SPARQL querying tool), for installation by server administrators, making the software accessible for creating new linked data projects.51

QuickStatements tool for creating or editing Wikidata items in batches. Includes options to import V1 commands or CSV commands.

Figure 4. QuickStatements tool for creating and/or editing Wikidata items in batches.

QuickStatements allows a user to batch edit Wikidata or Wikibase by importing multiple editing commands (or statements) in a delimited text format. Such statements can be generated from sources including OpenRefine, CSV files, and Zotero. The interface takes either tab/newline-delimited or comma-delimited data for batch upload into Wikidata and Wikibase. QuickStatements can be installed on a server alongside independent deployments of Wikibase.
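
A minimal sketch of generating such commands in the tab/newline (V1) style follows. It assumes the commonly documented V1 conventions: CREATE starts a new item, LAST refers to the item just created, Len and Den set the English label and description, and string values are wrapped in double quotes. The function names are hypothetical, and the property and item identifiers used (P31 “instance of,” Q5 “human,” P496 “ORCID iD”) should be verified on Wikidata before any real upload.

```python
from typing import Iterable, List, Tuple


def v1_string(text: str) -> str:
    """Wrap a string value in double quotes, refusing characters that would
    break the tab/newline-delimited layout."""
    if any(ch in text for ch in ('"', "\t", "\n")):
        raise ValueError("value contains a quote, tab, or newline: %r" % text)
    return '"' + text + '"'


def person_commands(name: str, description: str, orcid: str) -> List[str]:
    """Commands that create one item for a person with an ORCID iD."""
    return [
        "CREATE",
        "\t".join(["LAST", "Len", v1_string(name)]),         # English label
        "\t".join(["LAST", "Den", v1_string(description)]),  # English description
        "\t".join(["LAST", "P31", "Q5"]),                    # instance of: human
        "\t".join(["LAST", "P496", v1_string(orcid)]),       # ORCID iD
    ]


def to_batch(people: Iterable[Tuple[str, str, str]]) -> str:
    """Join the commands for several people into one newline-delimited batch."""
    lines: List[str] = []
    for name, description, orcid in people:
        lines.extend(person_commands(name, description, orcid))
    return "\n".join(lines)
```

The resulting text could be pasted into the QuickStatements import box. Before running a batch like this, each name should first be checked against existing items to avoid creating duplicates.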

Figure 5. Wikidata reconciliation service in OpenRefine.

The Wikidata reconciliation service as part of OpenRefine allows for rapid matching between strings and Wikidata items in a data set, before uploading. The OpenRefine tool is being generalized for use on other Wikibase installations.
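
The general idea of matching strings to items can be sketched as follows. This is not OpenRefine's algorithm, which queries a reconciliation service and scores candidates; it is a deliberately simple exact match on normalized labels and aliases, and the item identifiers in the usage note are hypothetical placeholders.

```python
import unicodedata
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def build_index(items: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each normalized label or alias to the item IDs that carry it."""
    index: Dict[str, List[str]] = {}
    for item_id, names in items.items():
        for name in names:
            candidates = index.setdefault(normalize(name), [])
            if item_id not in candidates:
                candidates.append(item_id)
    return index


def reconcile(strings: Iterable[str], index: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return candidate item IDs per input string; an empty list means no match."""
    return {s: list(index.get(normalize(s), [])) for s in strings}
```

With a sample of {"Q900001": ["Margaret Wilson", "Margaret W. Wilson"], "Q900002": ["Margaret Wilson"]}, the string “margaret wilson” returns both candidates. This is the kind of ambiguity that a human reviewer resolves before uploading, typically by comparing dates, occupations, or existing identifiers.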

Wikidata and Wikibase Lower Barriers to LOD, Help Scale Its Applications

Some past attempts to do linked data at scale and across domains have run into practical limitations. DBpedia,52 a linked data set based on Wikipedia (and recently including some Wikidata data) and created by a small group of academics, has been published inconsistently and requires parsing information from individual Wikipedias. Freebase, a Google-supported effort to create linked data, failed due to lack of a crowdsourcing community and investment. On the other hand, closed data sets like the Getty Vocabularies and Library of Congress Name Authorities are editorially created authority systems focusing on items held in museum or library collections, which means they represent a small fraction of the world’s culture and introduce layers of complexity to contribution. These limitations have largely precluded smaller institutions and independent contributors from participating, creating barriers for marginalized knowledge to enter the data set. Wikidata and Wikibase directly address many of the challenges created by other linked data environments:

  • it provides interfaces that allow for both strong human and bot-driven contribution;53

  • it maintains a precise history of changes;

  • the data model supports the addition of citation and/or attribution for each data point;

  • the Wikidata community extends the work and knowledge created by the community of contributors that make Wikipedia and Wikimedia while also encouraging the inclusion of other open licensed data sets; and

  • by being radically open, following the “anyone can edit” model of contribution, it encourages the participation of a wider range of users.

Additionally, the Wikimedia community has continued to create tools, tutorials, and games to assist participants in acquiring the confidence and expertise to participate fully in the linked data creation process across skill levels.54 Wikibase promotes these same principles, supporting rapidly changing and growing data structures. This allows for the creation of linked data projects which aren’t limited by the availability of technical skills—such as programming and data management—enabling broader participation by those with more expertise in the content itself.

The Culture and Policy Environment of Wikidata

The Wikidata community is as important an asset as the repository of data itself. The Wikidata project grows out of many of the same values that guide contributors in other Wikimedia projects. For example, Wikidata users embraced the importance of “verifiable”55 content within the data, building on the Wikipedia concept that in an ideal world, all facts presented could be substantiated through another source. Data quality is tied heavily to the idea that an original source provides the authority behind the information, which in practice differs from the Wikipedia practice of relying on authority from secondary or other published sources of information. Additionally, Wikidata allows for items to be created from a single external source, working around the barrier for inclusion of new marginalized topics in English Wikipedia, which requires representation of a topic in multiple sources.

Similarly, the consensus-driven approach to create policies, norms, and practices on Wikipedia has been brought in large part to Wikidata. For example, property creation (and thus the ontology of Wikidata) emerges as community members propose and then endorse the creation of a property. Similarly, if properties go unused, or data items are left unconnected to the larger graph, the community may delete them. If, on the other hand, items from niche or lesser known data sets are connected to properties in the larger graph, they can find a home in Wikidata where they might not in other name or authority systems.56 These values, practices, and norms of interaction that happen on Wikimedia projects require patience and learning how the community works. At the same time, at least for now, the Wikidata community is more open to newcomers than some of the Wikimedia sister projects.

Wikidata and Wikibase Applications

There are many different opportunities for libraries to contribute to Wikidata as a public good and to benefit their organizations. Some may consider using Wikibase as a platform for deploying a linked data store. This next section will highlight several thematic areas of interest to libraries and give examples of existing projects and initiatives to demonstrate the kinds of projects which lend themselves to Wikidata or Wikibase.

Authority Data and Using Wikidata as a Linking Hub

Contributing to open knowledge projects, such as Wikidata, aligns with the mission and values of libraries in their value-driven commitment to contribute to open culture. Authority data in the form of names (personal, corporate, or jurisdictional) are part of the backbone of functional bibliographic metadata. Authority data form the most highly structured and standardized metadata within bibliographic records, aiding with disambiguation. These data are the most readily usable and linkable as linked data on the open web.57 Implementing authority data in the form of uniform resource identifiers (URIs) connected to collections is a powerful way to link to related collections through Wikidata, as well as opening the possibility of enriching library bibliographic systems with external data sources.

To fully take advantage of these possibilities and to more fully participate in the linked data environment, modifications must be made to standards applied to bibliographic records. These changes include allowing for the use of alternate external data sources such as Wikidata, ORCID, and ISNI to establish and link to names. These changes would enable libraries to more fully link into the cloud, and reduce resource barriers caused by requirements of the Library of Congress’s Name Authority Cooperative Program (NACO). Further efforts should be made to engage in the creation of links between data stores and collections, to create fully actionable linked data.

The process of minting and verifying name authorities in the library community has often been constrained by editorial processes. For example, NACO requires a large resource commitment in both staff time and financial support, and contributions can only be made through a NACO hub, such as OCLC or SkyRiver. A more collaborative and open approach to the creation of structured data could allow libraries to concentrate efforts on unique collections, as well as choose the most appropriate mechanism for the minting of URIs for names. Furthermore, libraries could focus on creating and hooking into a network of names, rather than on a one-to-one relationship with the Library of Congress.

Wikidata works in concert with open platforms (whether on Wikidata, in appropriate relevant Wikibases, or on other dynamic LOD platforms with liberal contribution policies) to develop linked data, and connect to existing sources of that data. It also enables greater openness and opportunities for the creation, sharing, and reuse of bibliographic and authority-like data. Bibliographic data is part of a network of data sources and libraries should continue to advocate for reducing resource barriers to data creation processes and for making data as reusable and open as possible. The authors see particular benefits in areas related to unique and special collections and smaller institutions that might lie outside traditional library workflows.

Libraries’ participation in the creation and enhancement of Wikidata items related to their unique collections is an area of great potential. Providing a presence in Wikidata for creators of archival collections (corporate body, person, or family responsible for the creation/compilation of the works), and people and organizations represented in those collections, may help facilitate discoverability of the materials as well as their connection to materials in other repositories.

At the same time, the data in Wikidata is connected to external data sources, many of which were created through national libraries. Making these connections not only enriches the data, but it also aids in disambiguating concepts. Already large sets of identifiers from such data sources as the Library of Congress Name Authority File (LCNAF), the German National Library GND, and the National Library of France SUDOC have been extensively matched. The interconnection between Wikidata and trusted external data sources and schemas58 bolsters Wikidata as a trustworthy conveyor of data.

However, an area that will need further development is finding ways to increase the ease and speed of creating items in Wikidata. Even with the use of existing tools it is cumbersome to create items with a full set of associated properties. Community development of application profiles containing practical, base element sets for common library-related materials would aid in the utility of Wikidata and its place within library workflows. Furthermore, it would help to ensure a consistent application of data across users in libraries and other cultural heritage institutions.

A variety of projects described below demonstrate that Wikidata has the potential to be an important part of the linked data ecosystem for authorities. Smaller institutions with limited resources could benefit from the free, community-driven, easy-to-use Wikidata interface to provide a presence for underrepresented subjects in the knowledge base.59

Notable Examples in the Research Library Community

Enhanced Data in Library Catalogs

The University of Wisconsin–Madison Libraries developed a linked data project called BibCard to build knowledge cards for library data. BibCard brings identifiers related to authors from external sources (such as VIAF, Library of Congress Name Authority File, Wikidata, DBpedia, and the Getty Vocabularies) into the linked data instance of their library catalog.60 An example of this integration can be seen in the entry for Indiana University–Purdue University Indianapolis (IUPUI) professor Una Osili’s work Does Female Schooling Reduce Fertility? Evidence from Nigeria. (See Figure 6.) The Wikidata entry for Osili was created as part of the IUPUI initiative to bring faculty members to Wikidata, mentioned later in this white paper. This shows how a library’s contribution to Wikidata can benefit other institutions in the enhancement of their bibliographic catalog records.

Example of Wikidata used as a reference or external source of information

Figure 6. Una Osili’s information in the University of Wisconsin–Madison Libraries catalog, where Wikidata is used as one of the external sources for information to expand the author’s biographical data.

A prototype developed at Laurentian University demonstrates the integration of Wikidata into the institution’s Evergreen-based catalog.61 (See Figure 7.) In this project, identifier data is pulled from Wikidata to display contextual links for musical artists.

Catalog record using Wikidata to add to the record

Figure 7. Snippet of a catalog record in the Laurentian University Library ILS (Integrated Library System), Evergreen, where data pulled from Wikidata is used to augment the record.

A similar feature has been rolled out by library technology provider Zepheira, which is creating “About Author” tools for entities represented in the collections of public libraries across the United States.62 Once linked data entities are matched against Wikidata, a much larger community of reusers can take advantage of that data for discovery applications.

Archival and Special Collections Discovery

SNAC (Social Networks and Archival Context) provides the archival community with a central hub where data about archival creators (corporations, people, and families) are maintained. In SNAC, a record is defined as a “constellation” and each constellation is assigned a unique identifier. During the creation of the SNAC database, many of these constellations were linked to library authority records as well as Wikipedia. SNAC has also benefited from those links by pulling associated images from Wikimedia Commons into its interface when a SNAC constellation is linked to a Wikipedia entry. More recently, the SNAC constellations were matched with existing Wikidata items, providing an additional data source offering more information on a given concept. This work was done in two parts: a wiki editor affiliated with a library institution proposed a new property in Wikidata to accommodate SNAC identifiers; and another wiki editor created a procedure to match SNAC constellations with existing Wikidata items. The proposed property, SNAC Ark ID,63 underwent a community review process and was later added to the knowledge base. With the new property in place, the matching of constellations to existing Wikidata items resulted in the addition of about 128,000 SNAC IDs to the knowledge base. Since that time, the property has continued to be used by Wikidata editors, with additional matches and corrections being undertaken.

Europeana, the EU digital platform for cultural heritage, has called on libraries and other cultural heritage institutions to add their identifiers and vocabularies to Wikidata as a way for Europeana to pull them into their system. Wikidata was identified as a priority area for Europeana, and having more libraries using Wikidata as a linking hub strengthens the overall structure of semantic information about cultural heritage collections.64

The Smithsonian Institution is running an Open Data Pilot that focuses on contributing open data about the institutional collections to Wikidata. Because the Smithsonian is a multidisciplinary organization, the collection and management of the identities within their content management platforms has considerable challenges—including, for example, reconciling identities and data models across different disciplines. As part of this pilot for regularizing and building a model for impact around linked open data, the Smithsonian is developing two data sets: one in the sciences (Natural History Specimens) and one in the humanities (Artworks).65 Some of this experimentation will be paired with the Smithsonian’s American Women’s History Initiative as a series of digital innovation initiatives that work across institutional boundaries within the Smithsonian.

In a corollary project, Yale University Library, through the Wikidata for Digital Preservation project,66 is using Wikidata to store technical information about software preservation. Wikidata provides a way to model and create collaborative, responsive data that allows the software preservation community to reduce redundancies and maximize the sharing of work. Because Wikidata is a stable, collaborative project with a great deal of flexibility, it reduces barriers to collaboration and documentation that might make this work otherwise challenging.

How Can an Institution Get Started Adding Bibliographic Data to Wikidata?

  • Host workshops on Wikidata.67

  • Make micro-contributions by using existing external tools (for example through the Wikidata Distributed Game68 or through the Mix’n’match tool).69

  • Add archival holdings to existing Wikidata items using the “archives at” property.70

  • Create items for creators of archival collections and link them to your institution.

  • Add missing descriptions to existing items in your language of preference.71

  • Particularly for institutions where ORCIDs are widely used, and/or VIVO is deployed, create items for individual faculty members at your institution (including identifiers to external data sources)72 and their publications.

  • Integrate existing local name authorities or vocabularies that are important to your institution through the Mix’n’match tool.

  • Batch-upload data. There is a well-documented process for identifying and uploading batches of data to Wikidata.73 Data sets that might be a good fit for Wikidata include: data about people or other institutional entities that are relevant to your collections; extended information about the institution itself (especially faculty, buildings in existing data sets); or data about geographical entities from the local region.

Wikidata in the Landscape of Scholarly Communication

Academic institutions and their libraries are interested in collecting and sharing bibliographic information related to the scholarship produced by their faculty and researchers. This not only facilitates visibility and recognition of their work, but also strengthens the institution’s reputation. There are a number of resources and tools available to faculty, researchers, and their institutions for sharing their scholarly activities. One way this is accomplished is by displaying faculty profiles in institutional websites, which can contain both biographical and bibliographical data, or using academic network sites74 to expose the data. Examples include managing identities using ORCID;75 tracking citations in Google Scholar Citations; networking with other faculty and researchers with similar interests in ResearchGate and LinkedIn; and using VIVO,76 a cross-institutional application used to track the productivity of faculty and researchers. Limitations to these approaches include the need for individual faculty or researchers to keep the data up-to-date (both biographical and bibliographical), and the fact that the data shared on these sites are not always structured or licensed in a way that would allow for them to be easily reused.

Academic institutions have demonstrated the appetite and readiness for both research library communities and the larger scholarly community to access, share, and discover scholarly materials without restrictions. The promise of linked data in libraries is in part associated with the importance of making research more visible for communities to use and reuse for the creation of new scholarship. Open bibliographic metadata initiatives in the Wikimedia arena include the Initiative for Open Citations77 and WikiCite.78

As the number of tools and platforms in open scholarship proliferates (see “101 Innovations in Scholarly Communication”),79 there has been increased attention to ownership, portability and interoperability, and lock-in. Groups supporting “community-owned” or “academy-owned” scholarly infrastructure are investing in open source publishing platforms, preprints, and scholarly annotation tools. Wikidata is community-owned. It stores structured data that has open, reusable licenses, which makes the Wikidata knowledge base part of the open “connective tissue” that can help transcend particular tools and power discovery and integration of scholarship.

Scholarly Communication Exemplars

Wikidata is being used as a means of documenting and surfacing researchers, publications and research data in a number of ways. It provides an opportunity for sharing faculty scholarship on an open and accessible platform. The following are examples of current initiatives, tools, and use cases related to Wikidata:

  • WikiCite: “a Wikimedia initiative to develop a database of open citations and linked bibliographic data to serve free knowledge.”80 The WikiCite initiative has brought together a community of Wikidata contributors, open access advocates, and library metadata professionals to model source materials (including journal articles and books) that can be used as structured data citations for Wikimedia projects and other platforms. The goal is to create the “Sum of All Citations.” At the time of this paper’s writing, the WikiCite data set includes nearly 7 billion triples (subject-predicate-object) and over 150 million citation relationships among the works collected.81 To date, there have been three WikiCite conferences, bringing together Wikimedia contributors and community members, linked data professionals, and librarians to discuss and develop the future of bibliographic data on Wikidata. Future events are planned, and there is a discussion mailing list that any interested person can join.82

  • Scholia: a web service that makes live SPARQL queries (SPARQL Protocol and RDF Query Language) to Wikidata, which allows for live browsing of the relationships between works, authors, institutions, and other metadata about the works and their creators.83 In addition to rendering scholarly profiles, Scholia can be used as a bibliographic reference management tool. The bibliographic visualizations generated by the tool allow users to explore publications and their connections with other works. Scholia has served an important role in providing a better understanding of what is possible to achieve using Wikidata data.

  • Faculty profiles at IUPUI: In 2017, the University Library at Indiana University–Purdue University Indianapolis ran a pilot project84 to test the viability of Wikidata as a repository for data related to campus faculty members and the scholarship they produce. The core faculty from the IU Lilly Family School of Philanthropy was selected as a use case. Items were created for the selected faculty, their co-authors (regardless of their institutional affiliation), and some of their publications. Connections between works and creators were established. Cited works (works listed in the references) were also added to Wikidata. These contributions facilitated the use of Scholia to generate the scholarly profiles. IUPUI, in continuing its support and involvement with open knowledge projects such as Wikidata, has continued creating entries for the campus faculty across multiple disciplines, with a focus on women faculty.
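
To give a sense of the kind of live query a service like Scholia relies on, the sketch below builds a SPARQL query listing works whose author is a given item, along with a request URL for the public Wikidata Query Service. It is not taken from Scholia's code. The function names are hypothetical, P50 is assumed to be the “author” property, and the wd:, wdt:, wikibase:, and bd: prefixes are assumed to be predefined by the query service.

```python
import re
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def works_by_author_query(author_qid: str, limit: int = 50) -> str:
    """Build a query for works linked to the author via P50 (assumed: author)."""
    if re.fullmatch(r"Q[1-9]\d*", author_qid) is None:
        raise ValueError("not an item identifier: %r" % author_qid)
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        "  ?work wdt:P50 wd:" + author_qid + " .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        "LIMIT " + str(int(limit))
    )


def query_url(sparql: str) -> str:
    """Encode the query as a GET request URL asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})
```

For example, query_url(works_by_author_query("Q42", 10)) produces a URL that can be opened in a browser or fetched by a script. The sketch stops short of sending the request so that it has no network dependency. The identifier check also keeps arbitrary text from being spliced into the query.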

How Can an Institution Get Started Adding Scholarly Data to Wikidata?

  • Institutions can systematically add data for faculty members and their publications to Wikidata using the Source MetaData85 tool. This tool’s performance is dependent, in part, on works having digital object identifiers (DOIs), and those DOIs appearing in publicly viewable ORCID records.

  • Data can also be added ad hoc after being exported from Zotero86 into the QuickStatements tool.87

  • Institutions can consider running Wikidata edit-a-thons88 (for staff and/or as public events), especially during events like Open Access Week, Open Data Day, Mozilla Global Sprint, or Ada Lovelace Day, to add faculty and researcher profiles and publications to Wikidata.

Wikibase and Infrastructure for Linked Open Data

The software that drives Wikidata, Wikibase,89 can offer value to the broader library community independent of Wikidata. The relatively lightweight technology stack and easy-to-use human-readable interface make Wikibase a viable piece of infrastructure for developing a linked data store. Wikibase is particularly useful in cases where data and data models are highly specialized or there are considerations that require greater control over the data. Wikibase has a growing community of users in the GLAM and research sectors.

There are a number of benefits to using Wikibase, as Matt Miller of the Library of Congress describes:

  • Statement level provenance (what Wikibase calls references)

  • Revision tracking and history

  • A nice user interface for manual editing and curation

  • An API to do bulk data work

  • SPARQL endpoint90
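
As a small illustration of the API item in this list, the sketch below prepares a read request for several entities at once and extracts labels from a response. It assumes the standard Wikibase wbgetentities API module and its usual JSON layout (an entities object keyed by identifier). The function names are hypothetical, and the api.php address differs for each Wikibase installation.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import urlencode


def entity_request_url(api_url: str, entity_ids: Sequence[str],
                       languages: Sequence[str] = ("en",)) -> str:
    """Build a wbgetentities request for several entities in one call."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(entity_ids),        # multiple values are pipe-separated
        "props": "labels|claims",
        "languages": "|".join(languages),
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def labels_from_response(response: dict, language: str = "en") -> Dict[str, Optional[str]]:
    """Map each returned entity ID to its label in one language, or None."""
    labels: Dict[str, Optional[str]] = {}
    for entity_id, entity in response.get("entities", {}).items():
        labels[entity_id] = entity.get("labels", {}).get(language, {}).get("value")
    return labels
```

Pointed at Wikidata, the first function would be called with https://www.wikidata.org/w/api.php as api_url. A local Wikibase would substitute its own address, and the same code would then work unchanged. That portability is part of what makes the software reusable as a generic data store.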

Wikibase provides something that a number of other platforms in the linked data environment have not: a dynamic, machine- and human-contributable and readable environment that supports diversity of language and data structure from the beginning. The growing number of Wikibase implementations (examples below) suggests opportunities for scholarly and GLAM reuse of the software as a generic data store. Investment in community infrastructure means less lock-in to proprietary platforms and systems.

A Wikibase implementation may be a sandbox or drafting environment that allows for ingestion and reconciliation of data sets before they are merged into Wikidata. Or a Wikibase implementation may be the end point for managing a particular set of data or project.

Because the Wikibase software has only recently become readily available for independent deployment, many of the current examples of its deployment are experimental. There is a growing community of Wikibase adopters interested in running such experiments, identifying technical barriers to adoption, and facilitating increased feedback to the development of the software beyond its Wikidata application.

Wikibase Exemplars

  • OCLC Linked Data Wikibase Prototype:91 Since 2017, OCLC has been collaborating with university research libraries to pilot the use of Wikibase as a sandbox environment for creating bibliographic and other metadata, and to explore its use in metadata reconciliation. The project has recently concluded, and its results will be shared in forthcoming conference presentations and a report.

  • A Mellon-funded project from the Matrix Digital Humanities Center at Michigan State University uses Wikibase as a central platform for integrating discrete scholarly databases of slavery data. The goal of the project is to create a central platform to facilitate better research and storytelling about individual slave narratives as well as create tools for scholars to analyze and reflect on those sources.

  • Rhizome, an arts organization based in New York City, has used Wikibase to document its collection of born-digital art and to practice digital preservation. Wikibase was chosen because it offers a flexible and customizable structure for both modelling data and adding properties and qualifiers. Further, Rhizome needed highly specialized fields that may not have been appropriate for Wikidata.92

  • Structured data on Wikimedia Commons: The Wikimedia Foundation, with the support of the Alfred P. Sloan Foundation, has invested in the integration of Wikibase into the backend metadata storage for its Wikimedia Commons platform. Commons, which contains 50 million files, including millions that come directly from GLAM institutions, has long stored metadata as free text.

How Can an Institution Get Started Implementing Wikibase?

  • Explore how and why institutions and projects have decided to go with separate Wikibase installations.

  • Provide server space and staff time from institutional developers in order to launch and support a deployment of a local Wikibase.

  • Pilot linked data projects within the Wikibase environment before building or deploying custom platforms.

Diversity, Equity, and Inclusion

Most collections in the library and broader GLAM community build on a number of assumptions about what constitutes valuable content. Though there is a growing attempt to increase representation and collection of underrepresented communities and knowledges in these collections, many of the descriptive practices that underpin collection catalogs and authorities don’t provide the full flexibility to break out of colonial and patriarchal assumptions.

Wikidata offers an opportunity to document and increase the visibility of collections and research materials about underrepresented populations. In the last half decade, the Wikimedia movement has invested energy in populating biographies of notable individuals, and in turn, increasing the diversity of people represented in that content through focused drives, such as those implemented by the Art+Feminism (focused on women in the arts), Women in Red (focused on missing women in English Wikipedia), WikiMujeres (a group focused on representation of women in Spanish Wikipedia), and AfroCrowd (outreach to people of African descent) initiatives. These community-led Wikimedia initiatives focus on diversity of representation of people included in the projects and connected to other data sources. By populating Wikidata with content about these individuals, and then supporting collaborative initiatives coming out of the Wikimedia community, libraries have an opportunity to engage networks of activists in fully documenting and describing these communities.

There are still considerable gaps in Wikidata content, but the Wikimedia movement’s commitment to knowledge equity as part of the 2030 Wikimedia movement direction, and recent trends in organizing within the Wikimedia movement, suggest that a growing global community of practice is forming around using Wikidata to document marginalized knowledge.93 Library and other institutional metadata is an important foundation for building out that growing data set to address these issues.

Several institutions are already developing projects that leverage Wikidata to better connect resources from external sites with content within the Wikimedia ecosystem to create an emergent and dynamically growing body of knowledge. For example, the Smithsonian’s American Women’s History Initiative94 will be building on existing crowdsourcing and research efforts to identify women in their collections through open data practices that rely on Wikidata. These efforts will help the Smithsonian better understand what is already documented in their metadata by providing the crowdsourced context and will make it easier to surface the identified resources. Practically, this community-centered approach to enriching data should open up a number of opportunities for dynamically enriched explorations of women in the Smithsonian’s collection.

Another example of an emergent application of Wikidata is a recent project from a team at Yale University called Science Stories.95 The new platform pulls data from Yale University Library, Wikidata, and other sources to create a dynamic interpretative environment for understanding Yale alumni who are women in the sciences. In this setting, Wikidata acts as both a source of data and a bridge between different authoritative sources for enabling the auto-generation of these women’s story pages.96 These examples are emergent but promising models of how Wikidata could offer a pathway for institutions and communities to surface underrepresented groups, and the knowledge associated with them, from collections.

Due to the multilingual nature of Wikidata, its data is being used to generate content for underrepresented language editions of Wikipedia. This work is being accomplished by the use of the “article placeholder,”97 which dynamically generates information for the subject being searched in a particular Wikipedia edition based on statements available in Wikidata. For instance, the article placeholder has been adopted by the Welsh community of editors in an effort to increase access to knowledge in Welsh Wikipedia.98 Other applications include Wikidata-powered infoboxes99 and Listeria.100

Beyond leveraging Wikidata as a hub among projects trying to better understand existing metadata, the platform allows for more deliberate efforts to create and accurately represent these communities. This has been best demonstrated in the work being led by Stacy Allison-Cassin at York.101 The open-ended data model for working with content allowed her team to work with the Indigenous communities represented in materials and records of the York University archives to create authorities that accurately represent the individuals and cultural practices in those ecosystems. Similarly, the Black Lunch Table project102 connects underrepresented artists in Wikipedia, and through Wikidata the project has run campaigns to collect images and research about these artists. The flexibility of Wikidata and Wikipedia allows for the project to identify and expand description of these artists while respecting their self-determined identities.103

How Can an Institution Get Started Improving the Diversity, Equity, and Inclusion of Wikidata?

To address gaps in Wikidata coverage, institutions can:

  • Add descriptions to existing collections that carry biases because of how and when they were described, so that they include more diverse identities drawn from sets integrated in Wikidata or other related sets.

  • Create authorities, using Wikidata, for missing identities within existing collections, to increase the reliability of Wikimedia projects.

  • Use Wikidata as a mechanism for connecting data sets focused on underrepresented groups or marginalized collections.

  • Engage scholars working in underrepresented knowledge areas, such as non-Western, colonized, or other marginalized knowledge, to help extend existing sets of knowledge in Wikidata.

Community Outreach and Training

The Wikimedia community has a long history of collaborating with and training librarians to participate in Wikimedia projects.104 Much of the collaboration has centered around holding community-facing Wikipedia events in libraries supporting outreach, digital literacy, or other skill-based work—and the library community is beginning to see Wikidata and Wikibase skills being shared in workshop and edit-a-thon settings similar to those used with Wikipedia. For example, the 2017 Canadian Music Edit-a-thon series included Wikidata integration.

There are also themed communities in Wikidata that can catalyze public contribution, and there is value in bringing actors beyond the “library metadata” workers into thinking and caring about metadata.

Community Outreach Exemplars

Sum of All Paintings is a volunteer-organized effort to represent “every notable painting” in the world on Wikidata.105 This project has become a focal point for data modeling, not only for paintings and other works of art, but related topics like artistic methods, creators, and institutions holding the works. Sum of All Paintings and the larger cultural heritage community on Wikidata have been a driving force behind ingestion of data from GLAM institutions around the world.106 This community has allowed for the creation of several prototype tools for browsing heritage collections, such as Crotos.107

The Wikimedia community has been running the Wiki Loves Monuments campaign since 2011 to document listed monuments and heritage sites.108 Since its inception, Wiki Loves Monuments has generated millions of freely licensed, high-quality images of heritage sites, and set a Guinness World Record109 for the largest photography competition in the world. In the process of running the contest, the community collected and reconciled heritage registries from hundreds of jurisdictions, creating the biggest monuments database in the world. In the last few years, Wikimedia Sweden, with the support of a major Swedish grant and in collaboration with UNESCO (United Nations Educational, Scientific and Cultural Organization), converted much of that database into Wikidata, and imported additional heritage registries into Wikidata as well.110

How Can an Institution Get Started Doing Outreach on Wikidata?

  • Devote staff time to Wikidata by writing participation into job descriptions, and giving release time and professional development time to participate in Wikimedia events. Examples of positions explicitly incorporating Wikidata include a coordinator of Wikipedia initiatives at Brigham Young University Library, in the library’s department of Marketing, Design, and Communications, and the digital initiatives metadata librarian at IUPUI, who has participation, training, and advocacy for Wikimedia and open knowledge as secondary responsibilities.

  • Look into a Wikipedian/Wikimedian in Residence. More than 150 libraries, museums, archives, and other institutions have hosted Wikimedians in Residence since 2010. “The Wikipedian in Residence is not simply an in-house editor: the role is fundamentally about enabling the host organisation and its members to continue a productive relationship with the encyclopedia and its community after the Residency is finished.”111


It is unlikely at this point that Wikidata can replace systems needed to manage collection inventory, transactions, and other practical aspects of managing linked data within library systems. However, it is vitally important for the library community to invest in linked data solutions that benefit the larger ecosystem of scholarly metadata creators. Linked data should connect and make more interoperable the information created by different disciplines and fields, helping knowledge become less balkanized by the scholarly communities that create it. This requires a global community of collaborators, across many disciplines, and globally interoperable technology.

While libraries have generally seen the benefit of making their data openly available as linked data and enabling semantic data within their systems, the ability to implement projects or create linkable data has been out of reach for many institutions and organizations. Some of this is due to lack of resources, but some is due to a lack of accessible tools and techniques as well as a lack of consensus on application use beyond first adopters. Many cataloging systems do not generate linked data, and few make data available as linked open data. Research libraries, by getting involved in the Wikidata community as users and contributors, can help address these barriers. Using infrastructure such as Wikibase and Wikidata allows integration into a variety of systems and tools that can be used to maximize interoperability and minimize hands-on work, and potentially allows for efficiencies across the research library sector.

This white paper advances a number of recommendations for research libraries to contribute to Wikidata, and to the ecosystem of linked data that is growing around Wikidata in the form of other Wikibases. There are still a number of open questions about the long-term needs for maintaining library engagement with open platforms such as Wikidata, but there is momentum in that direction. Before the library community can develop a shared strategy for working in this space of collaborative linked data, it is important to both run pilot projects and develop understanding and skill among library staff. Through such work, the library community can develop an understanding of the circumstances under which Wikimedia engagement is a competitive way to invest institutional resources.
