
Integrating Internet Archive API into Wikidata Reference Validator
Closed, Resolved, Public

Description

Project title:
Integrating Internet Archive API into Wikidata Reference Validator

Project description:
This project aims to enhance the Wikidata Reference Validator by automatically suggesting archived versions of dead or broken reference links. When a reference URL is detected as dead, the tool will query the Internet Archive Wayback Machine API to find an available archived snapshot and present it as a suggested replacement for Wikidata editors.
This integration will improve the reliability of sources on Wikidata and help preserve reference integrity across the platform.
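The lookup step described above can be sketched as follows. This is a minimal sketch that assumes the public Wayback Machine Availability API (`https://archive.org/wayback/available`) is the endpoint being queried. The linked commit may use a different endpoint, such as the CDX server. The function names, the User-Agent string, and the rule of suggesting only snapshots recorded with an HTTP 200 status are illustrative assumptions, not the tool's actual code.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public Wayback Machine Availability API endpoint (assumed to be the one used).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the Availability API request URL for a (possibly dead) reference URL."""
    params = {"url": url}
    if timestamp:
        # Optional YYYYMMDDhhmmss prefix; the API returns the snapshot closest to it.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest archived snapshot URL, or None if nothing usable is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    # Design choice (assumption): only suggest snapshots the archive captured with a 200.
    if closest.get("status") != "200":
        return None
    return closest.get("url")


def suggest_archived_url(url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the Wayback Machine and return a suggested replacement URL (network call)."""
    request = urllib.request.Request(
        build_availability_query(url),
        # Hypothetical User-Agent string; identify your tool when calling the API.
        headers={"User-Agent": "reference-validator-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return parse_closest_snapshot(payload)
```

Separating the JSON parsing from the HTTP call keeps the parsing logic testable without network access. When no snapshot exists, the API returns an empty `archived_snapshots` object, and the sketch then yields `None` so that no suggestion is shown to the editor.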

Contact person(s)/Mentor(s):
JosefAnthony

Project type: (tool, feature, bug fix, documentation, research, etc.)
Tool enhancement (API integration, feature development)

Skills or technologies involved: (Python, MediaWiki, APIs, design, etc.)
Python, Flask, JavaScript, HTML/CSS, REST APIs, Internet Archive API, Wikidata APIs, Toolforge

Project doc or setup link: (e.g. link to the doc)
Tool: https://wikidata-reference-validator.toolforge.org/
Repository: https://gitlab.wikimedia.org/josefanthony/wikidata-reference-validator-app
Doc: https://gitlab.wikimedia.org/josefanthony/wikidata-reference-validator-app/-/blob/main/README.md?ref_type=heads

Task list: List the subtasks associated with this project. If none exist, create a sub Phab ticket and add the Phab ID here.

  1. https://phabricator.wikimedia.org/T409298
  2. https://phabricator.wikimedia.org/T409302
  3. https://phabricator.wikimedia.org/T409304

Success criteria: What does success look like? Mention measurable outcomes or key metrics (e.g., feature demoed, patch merged, docs improved, bug(s) fixed, etc.).

  1. Archived URL suggestions appear correctly for at least 80% of detected dead links
  2. Demonstration of a working prototype during the Wiki Indaba hackathon
  3. Code merged or ready for deployment on Toolforge
  4. Documentation updated to include the new feature

Participation format: (are you joining in-person, virtual, or hybrid)
virtual

Any other details to share?:

Event Timeline

JosefAnthony claimed this task.

This feature has now been fully implemented and deployed to the Wikidata Reference Validator.

The tool now automatically queries the Internet Archive Wayback Machine API when a dead or broken reference URL is detected and suggests an available archived version to editors.
Key outcomes:

  • Dead references are detected and logged
  • Archived URLs are fetched via the Internet Archive API
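The detection and logging outcomes listed above might be implemented roughly as in the sketch below. It assumes that a reference counts as dead when a HEAD request fails outright or returns a 4xx/5xx status. The function names and the logger name are hypothetical and are not taken from the repository.

```python
import http.client
import logging
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional

# Hypothetical logger name for this sketch.
logger = logging.getLogger("reference_validator_sketch")


def is_dead_status(status: Optional[int]) -> bool:
    """Treat outright failures (None) and 4xx/5xx responses as dead references."""
    return status is None or status >= 400


def check_reference(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for url, or None if the request failed outright."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the error object still carries the status code.
        return err.code
    except (OSError, http.client.HTTPException, ValueError):
        # DNS failure, refused connection, timeout, protocol error, or malformed URL.
        return None


def find_dead_references(
    urls: Iterable[str],
    checker: Callable[[str], Optional[int]] = check_reference,
) -> List[str]:
    """Check each reference URL, log the dead ones, and return them in input order."""
    dead = []
    for url in urls:
        status = checker(url)
        if is_dead_status(status):
            logger.warning("Dead reference: %s (status=%s)", url, status)
            dead.append(url)
    return dead
```

The `checker` parameter allows the network call to be swapped out in tests. Some servers reject HEAD with a 405 response, which this sketch would report as dead, so retrying with GET before flagging a link is one way to reduce false positives.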

Relevant commit:
https://gitlab.wikimedia.org/josefanthony/wikidata-reference-validator-app/-/commit/5de0594d3521b9f897062c6eccdf84d47d69c462