
Outreachy 31: Micro-task Generator for Organizers on Wikipedia
Open, Needs TriagePublic

Description

Micro-task Generator for Organizers on Wikipedia

Develop a tool that automatically generates prioritized micro-tasks for Wikipedia articles to help organizers and new editors. The tool will analyze article metadata, maintenance templates, and engagement metrics to surface specific improvements, such as "add citations" or "fix dead links." This reduces the burden on campaign organizers and provides clear entry points for new contributors. The final deliverable will be a web application prototype suitable for deployment on Wikimedia Toolforge or OnWiki.

Timeline

Phase 1: Foundation & Setup (Weeks 1-3)

Focus: Onboarding, environment setup, and detailed planning.

Week 1 (Dec 8-14)

Onboarding & Community Integration

Administrative tasks

  • GitLab (create a developer account)
  • Research Page
  • Sign Agreement
  • Complete the Toolforge Quickstart tutorial. Deploy a simple "Hello World" tool.
  • Primary Objective: Finalize and document the project's technical design & data flow (LiftWing API -> Backend -> Frontend View).

👥 Community events to get to know the environment better (⚠️ not compulsory):
Attend Igbo Wikimedians User Group Year-End Review Meeting on Saturday (Dec 13) at 4:00 PM WAT
AND/OR the Wikipedia Edit-a-thon (Dec 13)
(Dec 13) Event:Africa Wiki Women Roundtable community meetup/2025 END YEAR MEET UP
Note: Could not attend because of conflicting agendas


Week 2 (Dec 15-21)

Backend Foundation
🛠 Primary Objective: Build the core backend service (Python/FastAPI or similar).
Implement:

  • 1) Accept a list of article titles.
  • 2) Call the LiftWing Quality API for each.
  • 3) Parse and structure the response to extract the overall score and key feature scores (references, images, etc.).
  • 📦 Output a simple JSON for the frontend.

👥 Community: Attend the Afrika Baraza event (Dec 18 | Zoom link to register) to network and learn more about the community (we can discuss if presenting the prototype is useful)

Week 3 (Dec 22-28)

Frontend Foundation & Topic Integration
🖥 Primary Objective: Create a basic frontend (simple HTML/JS or a Toolforge tool interface).

Implement:

  • Input for article lists.
  • Display results in a table showing Title, Quality Score, and a "Potential Needs" column derived from low feature scores.
  • 🔗 Integrate a basic topic model (e.g., via ORES or a simple category check) to allow filtering by topic area.

Phase 2: Core Development & Extension (Weeks 4-7)

Focus: Building the main features and beginning to extend the backend with recommendations.

Week 4-5 (Dec 29 - Jan 11)

Extend the Backend & Refine UI

🛠 Primary Objective: Design and build the "maybe-add-this" prototype extension API.

  • Implement functions to generate simple recommendations for: Infobox template names, Categories, and Sections.

🖥 Frontend: Update the UI to toggle between the "high-level needs" view and a more detailed "recommendations" view.

  • Frontend UI toggle implemented.

Week 6-7 (Jan 12-25)

Feedback Loop & Polish

👥 Primary Objective: Solicit initial user feedback.

  • Identify 2-3 experienced Wikimedians (e.g., from attended events) and schedule brief demos.
  • Present the tool and ask: "Do these views match what you look for when choosing articles to improve?"
  • 📝 Document feedback and create a prioritized list of adjustments for the final weeks.
  • 🔧 Polish the UI/UX based on initial reactions.

Phase 3: MVP Finalization & "Nice-to-Haves" (Weeks 8-11)

Focus: Solidifying the MVP and exploring advanced, stretch features.

Week 8-9 (Jan 26 - Feb 8)

MVP Lockdown & Selection of the 5 "Nice-to-Haves" (if time is available) ← this list could change

  • 🎯 GOAL: MVP Feature Complete. All core backend and frontend work is finalized, tested, and documented on-wiki.
  • ✨ Nice-to-Have #1: On-Wiki Output. Add a feature for users to generate wikitext (e.g., a checklist or a worklist template) based on the tool's analysis, which they can paste into their userpage or a project page.

Week 10 (Feb 9-15)

Progress Tracking & Link Analysis

  • Fix bugs, test, and implement the feedback received.

✨ Nice-to-Have #2: Progress Dashboard. Design a simple dashboard concept. Could track: # of articles analyzed by a user, # of "potential tasks" identified, and (if feasible) allow users to manually mark tasks as "done." (We could no longer add this due to limitations and impracticality).

Week 11 (Feb 16-22)

Advanced Features & New Editor Research

  • ✨ Nice-to-Have #5: New Editor Usability & Retention Research. Draft a section for the final report analyzing: 1) How the tool lowers barriers for new editors. 2) A proposed method to track if event attendees who use the tool return to edit (e.g., via a voluntary opt-in survey or tag).

Phase 4: Wrap-up (Weeks 12-13)

Focus: Documentation, final presentation, and handover.

Week 12 (Feb 23 - Mar 1)

Final Integration & Documentation

📚 Primary Objective: Complete all project documentation.

  • Create a comprehensive on-wiki User Guide and Technical Documentation.
  • Ensure code is clean, commented, and deployed robustly on Toolforge.
  • Integrate any completed "nice-to-have" features.

Week 13 (Mar 2-6)

Final Demo, Report, and Handover

  • 🎤 Prepare and deliver a final presentation/demo for mentors and the community.
  • 📄 Submit final Outreachy report (this should be a Diff post), synthesizing work done, feedback received, and future possibilities (especially regarding new editor retention).
  • 👋 Final Day (Mar 6): Handover complete. Celebrate! 🎉

Event Timeline

Weekly Report

Week 1 (Dec 8 - 14)

  • Created my blog site and wrote my first blog for week 1 on December 8th.
  • Had my first virtual meeting with my mentors on Monday, December 8th.
  • Got assigned my first tasks: create a GitLab account and a Wikimedia Developer account, create a research page for the project, set up Toolforge, document the project workflow, and deploy a simple Hello World tool on Toolforge using both Flask and FastAPI.
  • Created my Wikimedia Developer account on December 8th.
  • Documented the project's workflow on December 9th.
  • Created the project's research page on Wikimedia Meta-Wiki on December 9th.
  • Completed the setup of my GitLab account and Toolforge demo tool on December 10th.
  • Got assigned a task on December 9th to set up my personal research page on Wikimedia Meta-Wiki, and completed it on December 11th.
  • Deployed my Flask demo tool on December 11th and my FastAPI demo tool on December 12th.

Challenges

  • Had issues logging in to Toolforge from the command line.
  • Had issues installing Flask on Toolforge from the command line.
  • Had issues installing FastAPI on Toolforge from the command line.

What I Learned

  • I learned how to log in to Toolforge from the command line and how to install Flask and FastAPI on Toolforge.
  • I learned how to create and edit research pages on Wikimedia Meta-Wiki, and how to create a personal page there.
  • I learned how the LiftWing API works and how the whole project should be connected.

Weekly Report

Week 2 (Dec 15 – 21)

Activities Completed

  • I completed the backend foundation for the project using Python and FastAPI, which accepts a list of Wikipedia article titles and calls the LiftWing Article Quality API for each article.
  • I implemented logic to retrieve each article’s latest revision ID using the MediaWiki API, then call the LiftWing Article Quality API with that ID and return the article quality score along with key feature scores (references, images, headings, and links) in JSON format.
  • I uploaded the project code to GitLab to enable mentors to review the code and assist with resolving the LiftWing API error.
  • I selected an event related to the project to attend from the Wikimedia Events page.
  • I reviewed related projects, such as the Newcomer Homepage and PetScan, to understand their strengths and limitations and to identify improvements for our project.
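
The revision lookup described in the activities above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names are hypothetical, the endpoint assumes English Wikipedia, and the parser assumes responses requested with formatversion=2.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia; other wikis use their own api.php URL.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_revision_query(titles):
    """Return the query string for a latest-revision lookup (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids",
        "titles": "|".join(titles),  # several titles share one request
        "format": "json",
        "formatversion": "2",
    }
    return urlencode(params)


def extract_revision_ids(response):
    """Map each existing page title to its latest revision ID."""
    rev_ids = {}
    for page in response.get("query", {}).get("pages", []):
        revisions = page.get("revisions")
        if revisions:  # missing pages carry no "revisions" key
            rev_ids[page["title"]] = revisions[0]["revid"]
    return rev_ids
```

The full request URL would be API_URL + "?" + build_revision_query(titles). Pages that do not exist are skipped here, so the caller can report them separately.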

Challenges

  • I faced issues with the LiftWing API responses, particularly around missing quality and feature score values.
  • I needed additional clarification on how LiftWing models score a specific revision as opposed to the current state of the article.
  • I faced errors when calling the LiftWing Quality API, which required mentor support and code review.

What I Learned

  • I learnt how to design a backend service that integrates the external LiftWing machine-learning API.
  • I improved my understanding of the LiftWing Article Quality model.
  • I learned that the LiftWing API requires request data to be passed as a JSON string using json.dumps, and that the request must use the data parameter instead of json in the requests.post call.
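
The point about sending the request body as a JSON string can be illustrated with a short sketch. It uses only the standard library instead of requests so that it stays self-contained; the equivalent call with requests would pass the same string through the data parameter. The endpoint URL, the payload fields, and the User-Agent value are assumptions for illustration, not taken from the project code.

```python
import json
import urllib.request

# Assumption: endpoint of a LiftWing article quality model.
LIFTWING_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/"
    "articlequality:predict"
)


def build_quality_request(rev_id, lang="en"):
    """Build (but do not send) a POST request whose body is a JSON string."""
    payload = {"rev_id": rev_id, "lang": lang}  # assumed payload shape
    body = json.dumps(payload).encode("utf-8")  # serialize explicitly
    return urllib.request.Request(
        LIFTWING_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "User-Agent": "microtask-demo/0.1 (example)",  # hypothetical
        },
        method="POST",
    )


request = build_quality_request(12345)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```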

Weekly Report

Week 3 (Dec 22 – 28)

Activities Completed

  • I created a simple frontend for the tool using HTML, CSS, and JS
  • I added an input field that allows users to submit a list of Wikipedia article titles for analysis.
  • I displayed the results in a table showing each article’s title, overall quality score, and a “Potential Needs” column based on low feature scores such as references, images, headings, and links.
  • I implemented a way to associate articles with topic areas using article categories.
  • I added a filtering function so users can view articles by topic area.
  • I connected the frontend to the backend API to make sure the results are fetched and displayed correctly.
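
A minimal sketch of how a "Potential Needs" column could be derived from low feature scores. The threshold, the labels, and the assumption that feature scores are normalized to the range 0 to 1 are all hypothetical; the actual tool may score features differently.

```python
LOW_SCORE_THRESHOLD = 0.5  # hypothetical cutoff for a "low" feature score

# Hypothetical mapping from feature name to a suggested micro-task.
NEED_LABELS = {
    "references": "Add citations",
    "images": "Add images",
    "headings": "Add sections",
    "links": "Add wikilinks",
}


def potential_needs(feature_scores, threshold=LOW_SCORE_THRESHOLD):
    """Return task labels for low-scoring features, weakest feature first."""
    needs = [
        (score, NEED_LABELS[name])
        for name, score in feature_scores.items()
        if name in NEED_LABELS and score < threshold
    ]
    return [label for _score, label in sorted(needs)]


scores = {"references": 0.2, "images": 0.9, "headings": 0.4, "links": 0.1}
print(potential_needs(scores))
```

Sorting by score puts the weakest feature first, which matches the idea of surfacing the most pressing need at the top of the column.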

Challenges

  • I faced issues with understanding how to retrieve Wikipedia articles that are similar to the input article
  • I faced issues with filtering the categories based on keyword searches on the MediaWiki API.
  • I faced issues with organizing the table format when the buttons are clicked.

What I Learned

  • I gained a much better understanding of the MediaWiki API ecosystem and how to extract article properties.
  • I learnt how to display the table in the correct format.
  • I learnt how to display the suggested categories on input and focus.

Weekly Report
Week 4 (Dec 29 – Jan 4)

Activities Completed

  • I implemented functions to generate suggestions for missing Infoboxes, absent images in Infoboxes, categories, wikilinks, and article sections, and displayed these recommendations on the frontend for the given articles.
  • I improved the frontend UI to allow toggling between the summarized needs and a more detailed recommendations view.
  • I added a function to redirect users to the edit pages of the inputted Wikipedia articles.

Challenges

  • I faced challenges in cleaning up the wikitext to generate pure text for displaying the section recommendations.
  • I faced challenges with extracting Wikipedia properties for infoboxes and sections.
  • I faced challenges with scoring the article quality features so that the lowest-scoring features are the ones recommended.

What I Learned

  • I gained more experience in building backend APIs that generate structured recommendations to improve content quality.
  • I learned how to improve frontend interactivity to enable users to toggle between summary and detailed recommendations easily.
  • I deepened my understanding of the Wikipedia API ecosystem and how to extract the article sections from wikitext.

Weekly Report
Week 5 (Jan 5 – Jan 11)

Activities Completed

  • I implemented functions to suggest categories based on a few search inputs, and let users enter the number of articles to display under the selected categories.
  • I added a function to call the LiftWing API topic model and display the topic categories of each article in the table.
  • I added a function to call the LiftWing API country model to display the geographical location of each article in the table.
  • I improved the UI to allow filtering by both topic and geographical location.
  • I improved the UI and logic to allow multi-filtering using these filters.

Challenges

  • I faced challenges in extracting topic category suggestions from the MediaWiki API.
  • I also faced challenges getting geographical values from the topic model to display under "filter by geography" instead of "filter by topic".
  • I faced challenges with selecting multiple filters and displaying the filtered results.

What I Learned

  • I learned how to extract topic category suggestions from the MediaWiki API after initially facing challenges with the process.
  • I learned how to correctly map geographical values from the country model so they display under the geography filter.
  • I learned how to implement and manage multiple filters simultaneously and correctly display the filtered results.
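
Combining several filters can be sketched as below. The matching rule is an assumption: an article passes a filter when it matches any selected value, and it must pass every active filter. The field names (topics, countries) are hypothetical.

```python
def filter_articles(articles, topics=(), countries=()):
    """Keep articles matching any selected topic AND any selected country.

    An empty selection means that filter is inactive.
    """
    topics, countries = set(topics), set(countries)
    return [
        article
        for article in articles
        if (not topics or topics & set(article["topics"]))
        and (not countries or countries & set(article["countries"]))
    ]


articles = [
    {"title": "A", "topics": ["History"], "countries": ["Nigeria"]},
    {"title": "B", "topics": ["Sports"], "countries": ["Nigeria"]},
    {"title": "C", "topics": ["History"], "countries": ["Kenya"]},
]
selected = filter_articles(articles, topics=["History"], countries=["Nigeria"])
print([a["title"] for a in selected])
```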

Weekly Report
Week 6 (Jan 12 – Jan 18)

Activities Completed

  • I created a merge request for my progress so far.
  • I fixed the issues from the feedback on my merge request from my mentors. This included:
    • Using Toolforge's external font resources instead of Google Fonts.
    • Handling redirects so that pages with alternate names, misspellings, or related terms resolve to a single main article.
    • Avoiding hardcoding the endpoint URL, among others.
  • I added a CONTRIBUTING.md file with instructions for running pre-commit locally before submitting a merge request.
  • I added a pre-commit config so that GitLab CI checks code quality and consistency before a merge request can be merged.
  • I added a GitLab CI file.

Challenges

  • I faced challenges when applying pre-commit formatting to main.py

What I Learned

  • I learnt that when two stylesheets in a document define the same rule, the one declared later overrides the earlier one.
  • I learnt about the usefulness of resolving redirects when calling the Wikipedia API.
  • I learnt how to write a CONTRIBUTING.md file, a pre-commit config, and a GitLab CI configuration.
  • I learnt that Toolforge tools should not load external third-party resources such as Google Fonts.

Weekly Report
Week 7 (Jan 19 – Jan 25)

Activities Completed

  • I fixed the category suggestions to suggest categories in the specific language
  • I hosted the tool on Toolforge.
  • I added one-year page view totals for each article to the article table, along with a sort button that toggles between high-to-low and low-to-high views.
  • I added a progress bar for the scores.
  • I added click handling to the table rows that toggles between high-level needs and the main description.

Challenges

  • I faced challenges when hosting on Toolforge.
  • I faced challenges when extracting page views from the MediaWiki API.

What I Learned

  • I learnt how to add a toggle function to the rows of a table.
  • I learnt how to fetch and total one year of page views from the MediaWiki API.
  • I learnt how to debug on Toolforge through the logs.
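
A sketch of how a one-year page view total could be computed. The report does not name the exact endpoint, so this assumes the Wikimedia Pageviews REST API with monthly granularity; the function names and default project are hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import quote


def pageviews_url(title, end, project="en.wikipedia.org"):
    """Build a per-article pageviews URL covering the 365 days before `end`."""
    start = end - timedelta(days=365)
    article = quote(title.replace(" ", "_"), safe="")  # titles use underscores
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{project}/all-access/user/{article}/monthly/"
        f"{start:%Y%m%d}/{end:%Y%m%d}"
    )


def total_views(response):
    """Sum the `views` field over every item in a pageviews response."""
    return sum(item["views"] for item in response.get("items", []))
```

Keeping URL construction separate from the summing step means the total can be checked against a saved response without calling the API.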

Weekly Report

Week 8 (Jan 26 – Feb 1)

Activities Completed

  • I made the results table sortable using DataTables, allowing users to sort by article title, quality signal, namespace, and other columns.
  • I fixed an issue where non-existing articles were incorrectly marked as “Up to date”; missing pages are now clearly flagged as “Page does not exist.”
  • I fixed a namespace handling issue where category results included Talk pages by normalizing titles.
  • I removed duplicate article recommendations so each article appears only once in the generated task list.
  • I fixed the incorrect splitting of page titles containing commas; page titles are now only split by newlines.
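
The newline-only splitting and the removal of duplicate entries can be sketched together. The function name is hypothetical; the point is that commas are never treated as separators, so a title such as "Washington, D.C." stays intact.

```python
def parse_titles(raw):
    """Split input on newlines only and drop blanks and repeated titles."""
    seen = set()
    titles = []
    for line in raw.splitlines():
        title = line.strip()
        if title and title not in seen:
            seen.add(title)
            titles.append(title)  # first occurrence wins, order preserved
    return titles


print(parse_titles("Lagos\nWashington, D.C.\n\nLagos\n"))
```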

Challenges

  • I faced challenges when implementing DataTables, particularly around sorting behavior and table updates.
  • I faced challenges with some existing articles being reported as not existing.

What I Learned

  • I learnt how to work with DataTables to make tables sortable and interactive.
  • I learnt how to debug to identify which API endpoint is failing.
  • I learned how to deduplicate results effectively to prevent repeated recommendations in generated task lists.
  • I learned how to properly handle non-existing pages by detecting missing articles from the MediaWiki API and displaying accurate status messages.

Weekly Report
Week 9 (Feb 2 - Feb 8)
Activities Completed

  • I improved the article list textarea by setting a minimum height to display at least 7-8 lines and added a vertical scrollbar.
  • I fixed the pagination issue where page links moved down when articles had long names by making the pagination controls sticky/fixed at the bottom of the results section.
  • I added "Showing 1 to m of n entries" information at the top of the results table in addition to the bottom for better visibility.
  • I added a display counter showing the total number of articles in the input list box above the textarea.
  • I moved pagination links to appear at both the top and bottom of the results table for easier navigation.
  • I added a close button to the filter dropdowns so users can manually close them after making multiple selections, instead of relying on clicking outside.
  • I improved the "All" button labeling to dynamically show "All/None" depending on the current filter selection state.
  • I implemented duplicate handling for redirect articles by checking revision IDs and ensuring only the first instance of each unique article is processed, preventing the same article from appearing multiple times due to redirects.
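
The redirect deduplication described in the last item can be sketched as follows, assuming each processed article is a dictionary with a hypothetical rev_id field holding the revision ID returned after redirect resolution.

```python
def dedupe_by_revision(articles):
    """Keep only the first article seen for each revision ID."""
    seen = set()
    unique = []
    for article in articles:
        if article["rev_id"] not in seen:
            seen.add(article["rev_id"])
            unique.append(article)
    return unique


rows = [
    {"title": "Lagos", "rev_id": 111},
    {"title": "Lagos, Nigeria", "rev_id": 111},  # redirect to the same page
    {"title": "Abuja", "rev_id": 222},
]
print([row["title"] for row in dedupe_by_revision(rows)])
```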

Challenges

  • I faced challenges with making pagination links stay in a fixed position regardless of page content length, especially when article names wrapped to multiple lines.
  • I experienced performance issues when toggling the "All" button with large numbers of filter options, particularly in the "Filter by Topics" section.
  • I had to understand how MediaWiki handles redirects and revision IDs to properly deduplicate articles that redirect to the same page.

What I Learned

  • I improved my understanding of how to reduce load times by fetching category members sequentially in batches, looking up revision IDs in parallel batches, and processing articles concurrently, with 4 API calls per article for quality, topics, countries, and pageviews.
  • I learned how to implement sticky/fixed positioning for pagination controls to keep them visible regardless of content length.
  • I learned techniques for improving JavaScript performance when handling bulk checkbox operations on large datasets.
  • I learned how MediaWiki's revision ID system works and how a redirect, once resolved, yields the same revision ID as its target page.
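
Batching with parallel lookups can be sketched as below. The batch size of 50 reflects the usual per-request title limit of the MediaWiki API for regular users; the worker count and the fetch_batch callable are hypothetical stand-ins for the real API calls.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size=50):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_all(titles, fetch_batch, size=50, workers=4):
    """Run `fetch_batch` on each batch in parallel and merge the dict results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(fetch_batch, chunked(titles, size)):
            merged.update(result)
    return merged
```

Here fetch_batch is expected to take a list of titles and return a dictionary keyed by title, for example a wrapper around a revision ID query.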

Weekly Report
Week 10 (Feb 9 - Feb 15)
Activities Completed

  • I implemented a "Days Since Last Edit" prioritization column that calculates and displays the number of days elapsed since an article's last revision by fetching revision timestamps from the Wikipedia API and calculating the difference from the current date.
  • I added a "Number of Lang." (language links/sitelinks) column that shows how many other language versions of Wikipedia have articles on the same topic, using the langlinks property from the Wikipedia API.
  • I added a more detailed progress bar for each recommended need, shown when a table row is toggled.
  • I wrote functions to export and copy the table as wikitext, TSV, and CSV.
  • I added explicit width specifications to all table columns so that DataTables does not run into undefined width calculations.
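
The "Days Since Last Edit" calculation can be sketched as follows, assuming the revision timestamp arrives in the ISO 8601 UTC form used by the MediaWiki API (for example 2026-02-01T12:00:00Z). The function name and the optional now argument are hypothetical.

```python
from datetime import datetime, timezone


def days_since_last_edit(timestamp, now=None):
    """Return whole days elapsed between a revision timestamp and `now`."""
    edited = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (now - edited).days
```

Passing now explicitly keeps the function deterministic when it is tested.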

Challenges

  • I struggled with generating wikitext and TSV tables in JavaScript.
  • I had trouble making sure the "maintenance message" feature only makes recommendations when appropriate.
  • I faced issues with fetching the sitelinks from the Wikipedia API.

What I Learned

  • I learned how to use Wikipedia's revision API to fetch both revision IDs and timestamps simultaneously, and how to handle redirect resolution and title normalization in the same query.
  • I gained an understanding of how Wikipedia's langlinks property, which represents the number of interlanguage links on the current wiki, differs from Wikidata's sitelinks count.
  • I learnt how to include wikitext formatting in the export function.
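
The wikitext and TSV export formats can be sketched as below. The tool itself does this in JavaScript; this Python version only illustrates the table syntax, and the function names and the "wikitable sortable" class are assumptions.

```python
def to_wikitext(headers, rows):
    """Render headers and rows as a wikitext table."""
    lines = ['{| class="wikitable sortable"']
    lines.append("! " + " !! ".join(headers))
    for row in rows:
        lines.append("|-")  # row separator
        lines.append("| " + " || ".join(str(cell) for cell in row))
    lines.append("|}")
    return "\n".join(lines)


def to_tsv(headers, rows):
    """Render headers and rows as tab-separated values."""
    return "\n".join(
        "\t".join(str(cell) for cell in row) for row in [headers, *rows]
    )


print(to_wikitext(["Title", "Views"], [["Lagos", 10]]))
```

Cell values containing a pipe or a tab character would need escaping before export; that step is left out of this sketch.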

Weekly Report
Week 11 (Feb 16 – Feb 22)
Activities Completed

  • I fixed a bug where articles with lowercase titles would show zero pageviews.
  • I added commas to large pageview numbers so they are easier to read and sort.
  • I added tooltips across the interface to explain things that were unclear, including more descriptive names for the tasks.
  • I removed geography-based topics from the topics filter, since regional filtering is already handled by the country selector.
  • I removed the separate Quality column from the table and folded that information into the progress bar column, which now reads something like "Quality progress: 65%."
  • I grouped API requests together and added timing controls so that data is fetched more efficiently, reducing unnecessary load on Wikipedia's servers.

Challenges

  • Making sure the corrected article title was used consistently across every type of API call (pageviews, language links, quality) was more challenging than expected.
  • It was challenging to get the pageview column to sort correctly without accidentally breaking other columns that also contain commas.

What I Learned

  • I learned how Wikipedia's API normalizes titles, and how to use that corrected title as the starting point for everything else in the tool.
  • I got a better understanding of how table sorting works and how to handle numbers that are formatted with thousands separators.
  • I learned how to send articles from the frontend in batches, with a delay between batches.

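
Handling numbers with thousands separators can be sketched as follows: format for display, but sort on the parsed integer so that "12,345" does not sort before "700" as text. The function names are hypothetical.

```python
def format_views(count):
    """Format an integer with comma thousands separators for display."""
    return f"{count:,}"


def parse_views(text):
    """Convert a comma-formatted number back to an integer for sorting."""
    return int(text.replace(",", ""))


displayed = [format_views(n) for n in (9800, 12345, 700)]
print(sorted(displayed, key=parse_views))
```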
Weekly Report
Week 12 (Feb 23 – March 1)
Activities Completed

  • Wrote a final report and documentation for a Diff post.
  • Created an "about page" for the tool.
  • Completed all the remaining "nice to haves" like adding a link from the tool to the documentation page.
  • Updated the Wiki Table copy format to the current table entries.
  • Deployed the final code on Toolforge.

Challenges

  • No challenges for this week.

What I Learned

  • I learnt how to link an article section to the menu content on the about page.
  • I learnt how to write technical documentation.