
Project Proposal: Broken Link Detection Tool for Wikimedia
Closed, Invalid | Public | Feature

Description

Wikimedia projects are built on the reliability of references and interconnected knowledge, and broken links are a growing threat to that reliability. As external websites shut down or URLs change, references become inaccessible, weakening the value of countless articles. The Lusophone community, however, lacks an automated way to detect and report these broken links. Hunting them down manually is time-consuming, inefficient, and unsustainable in the long run.

I'd like to propose building a "Broken Link Detection Tool", a lightweight, automated system that scans pages, detects broken internal and external links, and provides a clear, categorized report for editors to act on. The tool will empower contributors by taking care of the tedious technical part, so they can focus on content correction and improvement.

Technically, the tool would first scan Wikimedia articles to retrieve and parse every link they contain. External links would be validated with HTTP requests, recording the response status or any connection error, and internal links would be checked against Wikipedia's API to catch deleted or redirected pages. The tool would then generate a report listing each page with broken links, the type of broken link, the error code, and other relevant details.
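As a rough illustration of the external-link part of this pipeline, here is a minimal Python sketch. It is not a proposed implementation: the regular expression, the category names, the `LinkResult` type and the function names are all assumptions made for the example, and the wikitext parsing is deliberately simplified. The status lookup is passed in as a function so the logic can be tried without network access. Internal-link checks through the API are not covered here.

```python
import re
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Optional

# Simplified pattern for external URLs in wikitext (assumption: real wikitext
# parsing would need a proper parser rather than a regular expression).
EXTERNAL_LINK_RE = re.compile(r'https?://[^\s\[\]|<>"{}]+')


@dataclass
class LinkResult:
    page: str
    url: str
    status: Optional[int]  # None when no HTTP response was received
    category: str          # hypothetical category names, see classify_status


def extract_external_links(wikitext: str) -> List[str]:
    """Return unique external URLs in order of first appearance."""
    seen = {}
    for match in EXTERNAL_LINK_RE.finditer(wikitext):
        # Drop trailing sentence punctuation that is not part of the URL.
        seen.setdefault(match.group(0).rstrip(".,;"), None)
    return list(seen)


def classify_status(status: Optional[int]) -> str:
    """Map an HTTP status code (or None) to a report category."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirected"
    if status in (404, 410):
        return "not_found"
    if 400 <= status < 500:
        return "client_error"
    if 500 <= status < 600:
        return "server_error"
    return "unknown"


def fetch_status_urllib(url: str, timeout: float = 10.0) -> Optional[int]:
    """Fetch the status with a HEAD request; returns None on connection errors.

    Note: urlopen follows redirects by default, so this returns the status of
    the final response rather than the intermediate 3xx code.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "BrokenLinkSketch/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (OSError, ValueError):
        return None


def check_page(
    page: str, wikitext: str, fetch_status: Callable[[str], Optional[int]]
) -> List[LinkResult]:
    """Check every external link found in one page's wikitext."""
    results = []
    for url in extract_external_links(wikitext):
        status = fetch_status(url)
        results.append(LinkResult(page, url, status, classify_status(status)))
    return results
```

In a real run, `fetch_status_urllib` would be passed as `fetch_status`; for experiments, a dictionary lookup such as `{"https://example.org/a": 200}.get` can stand in for it. Some servers reject HEAD requests, so a production tool would likely need a GET fallback and rate limiting.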

This tool would be valuable because it helps preserve the credibility of articles, gives editors the information they need about links without much manual effort, and has a localized impact on the Portuguese-language community, which currently lacks this functionality.

If approved, the tool could be deployed in any of the following forms:

  • A web dashboard for contributors to view reports and search for affected pages.
  • A Wikimedia bot that posts weekly broken-link summaries on relevant discussion or maintenance pages.
  • A command-line tool for power users who want to run it manually.
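For the bot option, the weekly summary could be rendered as wikitext before posting. The sketch below is only an illustration under stated assumptions: the function name `render_summary`, the row layout `(page, url, category, status)` and the category labels are hypothetical, and the step of actually posting to a wiki page is left out.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

# Hypothetical row layout: (page title, URL, category, HTTP status or None).
Row = Tuple[str, str, str, Optional[int]]


def render_summary(rows: Iterable[Row]) -> str:
    """Group broken-link rows by category and render one wikitable per group."""
    by_category = defaultdict(list)
    for page, url, category, status in rows:
        by_category[category].append((page, url, status))

    lines = []
    for category in sorted(by_category):
        entries = by_category[category]
        lines.append(f"== {category} ({len(entries)}) ==")
        lines.append('{| class="wikitable"')
        lines.append("! Page !! URL !! Status")
        # Sort by page and URL only, since status may be None.
        for page, url, status in sorted(entries, key=lambda e: (e[0], e[1])):
            status_text = "n/a" if status is None else str(status)
            lines.append("|-")
            lines.append(f"| [[{page}]] || {url} || {status_text}")
        lines.append("|}")
    return "\n".join(lines)
```

The same grouped data could feed the web dashboard or the command-line output, with only the rendering step changing.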

To get the best out of the project, we would need:

  • Development time and mentorship for integration with Wikimedia APIs
  • Feedback and test cases from Lusophone contributors
  • Hosting via Wikimedia Cloud Services or any other hosting service approved by Wikimedia

At the end of the project, it is expected that the community would have:

  • A fully functioning link-checking tool for the Lusophone wiki community
  • Cleaner, more reliable articles across Wikimedia
  • A repeatable model that can be adapted for other language communities

This project solves a real, long-standing problem with a solution that is lightweight, maintainable, and immediately impactful, and it gives Wikimedia a tool that can be replicated across different language communities. With the community’s input and Wikimedia’s support, we can eliminate broken links at scale and protect the quality of knowledge that millions depend on.