This task tracks research into how AI-assisted tools can enhance content discovery, moderation, and newcomer support on Wikipedia. The initial focus is on building a content diff index to make Wikipedia’s content history more accessible for non-article-centric questions.
We will validate the index by targeting a specific, high-value query pattern and creating an early prototype. This will help us test technical feasibility, gather early feedback, and inform design decisions for future iterations.
Sprint 1: Content Diff Index
- Implement a first version of the content diff index for a chosen query pattern (e.g., tracking who added specific terms or sources).
- Validate the output and performance on a realistic subset of the dataset to confirm feasibility and identify any infrastructure constraints.
- Output: a database that supports efficient lookups for the chosen query pattern.
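
As a rough illustration of the "who added a specific term" query pattern, here is a minimal sketch. It assumes a token-level diff between consecutive revisions and an in-memory SQLite table. The sample revisions, the `added_term` table, and the function names are all hypothetical and do not reflect a chosen design or the real revision schema.

```python
import difflib
import re
import sqlite3

# Hypothetical revision history for one page: (rev_id, user, text).
REVISIONS = [
    (101, "Alice", "The river flows north."),
    (102, "Bob", "The river flows north through the valley."),
    (103, "Carol", "The river flows north through the glacial valley."),
]


def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer for the sketch."""
    return re.findall(r"\w+", text.lower())


def added_tokens(old_text, new_text):
    """Return tokens that the diff reports as inserted or replaced in new_text."""
    old, new = tokenize(old_text), tokenize(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


def build_index(conn, page_id, revisions):
    """Store one row per added token, keyed for lookup by term."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS added_term ("
        "term TEXT, page_id INTEGER, rev_id INTEGER, user TEXT)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS added_term_idx ON added_term (term, rev_id)"
    )
    previous = ""  # the first revision is diffed against an empty page
    for rev_id, user, text in revisions:
        rows = [(tok, page_id, rev_id, user) for tok in added_tokens(previous, text)]
        conn.executemany("INSERT INTO added_term VALUES (?, ?, ?, ?)", rows)
        previous = text
    conn.commit()


def who_added(conn, term):
    """Return (page_id, rev_id, user) for every revision that added the term."""
    return conn.execute(
        "SELECT page_id, rev_id, user FROM added_term WHERE term = ? ORDER BY rev_id",
        (term.lower(),),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, 1, REVISIONS)
print(who_added(conn, "glacial"))
```

The index on `(term, rev_id)` is what makes the lookup cheap in this sketch. Whether a per-token table scales to a realistic subset of the dataset is exactly the kind of infrastructure constraint this sprint is meant to surface.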
Sprint 2: Prototype UI for PM Feedback
- Build a simple UI prototype (possibly just a JSON API) that demonstrates how the content diff index can be used to answer practical user questions.
- Share the prototype with product managers to gather concrete feedback on usefulness, desired features, and next steps.
- Output: documented next steps informed by the feedback gathered from product managers.
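
If the prototype ends up being only a JSON API, it could be as small as the following sketch: a plain WSGI callable built with the Python standard library. The `term` query parameter, the response shape, and the `SAMPLE_INDEX` stand-in for the real index are assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs

# Hypothetical stand-in for the content diff index: term -> recorded additions.
SAMPLE_INDEX = {
    "glacial": [{"page_id": 1, "rev_id": 103, "user": "Carol"}],
}


def lookup(term):
    """Return the additions recorded for a term (empty list if none)."""
    return {"term": term, "additions": SAMPLE_INDEX.get(term.lower(), [])}


def app(environ, start_response):
    """WSGI entry point: answers requests of the form ?term=<word> with JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    term = params.get("term", [""])[0]
    if not term:
        status, payload = "400 Bad Request", {"error": "missing 'term' parameter"}
    else:
        status, payload = "200 OK", lookup(term)
    body = json.dumps(payload).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

For a local demo, `app` can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` and queried with a `?term=glacial` request.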