
GSoC 2026: Programs & Events Dashboard system-wide metrics and data downloads - Sharon Mwenda
Closed, Declined | Public

Description

Google Summer of Code 2026 Proposal: Programs & Events Dashboard System-Wide Metrics & Data Downloads

Applicant: Sharon Mwenda
Contact: smkathambi@gmail.com
GitHub: https://github.com/sharon-kathambi
Location: Budapest, Hungary
Timezone: CET/CEST
Organization: Wikimedia Foundation
Project: Programs & Events Dashboard System-Wide Metrics & Data Downloads
Mentors: @Ragesoss, @Abishekdascs
Duration: 350 hours (Medium)

1. Abstract

The Wikimedia Programs & Events Dashboard (outreachdashboard.wmflabs.org) tracks hundreds of organized editing projects: edit-a-thons, education campaigns, and community programs across the entire Wikimedia ecosystem. But Wikimedia Foundation staff must request system-wide data manually, and the set of metrics exposed to the public is outdated and incomplete.

This project delivers two tightly related improvements:

  • A secure, authenticated data-download endpoint (CSV/JSON) giving WMF staff on-demand access to aggregated data across all editing projects on the P&E Dashboard.
  • An overhauled system-wide metrics layer: new metrics, updated calculations, and a redesigned public-facing statistics page that makes the impact of organized editing visible to the whole community.

Both deliverables are grounded in the existing Ruby on Rails codebase and will follow the project's established conventions for data aggregation, API design, and React-based frontend components.

2. Motivation & Context
2.1 The Problem
The P&E Dashboard already collects granular data: article edits, bytes added, new editors, article quality scores via ORES, wiki coverage, and more. However, three gaps limit its usefulness for institutional stakeholders:

  • WMF staff point person @FRomeo_WMF must manually compile system-wide data on request, a slow, error-prone process that cannot scale as the number of programs grows.
  • The existing /stats page exposes only a narrow slice of available metrics and has not been substantially updated to reflect newer data models.
  • There is no API surface for programmatic consumption of aggregate statistics, making integration with WMF analytics pipelines impossible without a custom database query.

2.2 Why This Matters
The P&E Dashboard is the primary tool that Wikimedians worldwide use to organize and document their impact. Robust, accessible system-wide statistics serve multiple audiences: WMF staff making grant decisions, program organizers demonstrating community value, researchers studying organized editing, and the general public exploring the Wikimedia ecosystem. Automating data access directly removes a recurring bottleneck for WMF operations.

3. Technical Approach
3.1 Codebase Orientation
WikiEduDashboard is a Ruby on Rails application with a React/JSX frontend, a MySQL/MariaDB database, and Sidekiq for background jobs. Data about courses, revisions, and article quality is regularly fetched from Wikimedia APIs and stored in local tables. The existing app/controllers/analytics_controller.rb and related models are the natural anchoring point for new data-export features.

3.2 System-Wide Data Download Endpoint
A new Rails controller, app/controllers/staff_downloads_controller.rb, will expose authenticated endpoints under /staff/downloads. Access will be gated by an admin/super admin role check consistent with the existing Role model. The endpoint will support:

  • Format selection: ?format=csv and ?format=json query params, handled by Rails respond_to.
  • Scope filtering: by date range, wiki, program type (course, edit-a-thon, generic), and campaign.
  • Async generation for large datasets: a Sidekiq background job generates the file, stores it temporarily, and delivers a download link, which prevents request timeouts.
  • Audit logging: every download request will be recorded with staff user ID, timestamp, and filter parameters.
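
The scope filtering and audit logging described above could be prototyped in plain Ruby before being wired into the controller. The following is a minimal sketch only: the module name DownloadFilters, the parameter names (start_date, end_date, wiki, program_type, campaign), and the exact program type strings are illustrative assumptions, not names taken from the existing codebase.

```ruby
require 'date'
require 'time'

# Hypothetical helper (not part of WikiEduDashboard): turns raw query
# parameters into a validated filter hash and builds an audit record.
module DownloadFilters
  # Assumed string values for the program types named in the proposal.
  PROGRAM_TYPES = %w[course edit-a-thon generic].freeze

  module_function

  def parse(params)
    type = params['program_type']
    if type && !PROGRAM_TYPES.include?(type)
      raise ArgumentError, "unknown program type: #{type}"
    end

    # Date.iso8601 raises on malformed input, so bad dates fail early.
    start_date = params['start_date'] && Date.iso8601(params['start_date'])
    end_date = params['end_date'] && Date.iso8601(params['end_date'])
    if start_date && end_date && start_date > end_date
      raise ArgumentError, 'start_date must not be after end_date'
    end

    # compact drops filters that were not supplied.
    {
      start_date: start_date,
      end_date: end_date,
      wiki: params['wiki'],
      program_type: type,
      campaign: params['campaign']
    }.compact
  end

  # Captures the three audit fields listed above: user, time, filters.
  def audit_entry(user_id:, filters:, now: Time.now.utc)
    { user_id: user_id, requested_at: now.iso8601, filters: filters }
  end
end

filters = DownloadFilters.parse('start_date' => '2026-01-01', 'wiki' => 'en.wikipedia.org')
puts DownloadFilters.audit_entry(user_id: 42, filters: filters)
```

Validating filters in one place keeps the controller thin and means the same hash can be stored in the audit log and passed to the background job.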

Exported fields will be defined in a dedicated presenter class (app/presenters/system_data_presenter.rb), making the shape of exports easy to review and extend. Initial field set (to be finalized with mentors and @FRomero_WMF):

  • Program metadata: id, title, type, campaign, institution, home wiki, start/end dates, language.
  • Participation: total editors, new editors (registered during course), returning editors.
  • Content impact: articles edited, articles created, bytes added, references added.
  • Quality signals: average article assessment before/after (via ORES wp10 scores), total articles improved.
  • Coverage: list of wikis edited across.
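
To illustrate how a presenter can keep the export shape in one reviewable place, here is a minimal plain-Ruby sketch using only the standard csv and json libraries. The FIELDS list is a shortened, assumed subset of the fields above, and the input is an array of plain hashes standing in for real program records; the actual presenter would read from the application's models.

```ruby
require 'csv'
require 'json'

# Hypothetical sketch of the presenter: a single FIELDS list drives both
# the CSV and the JSON output, so the two formats cannot drift apart.
class SystemDataPresenter
  # Assumed subset of the proposed field set, for illustration only.
  FIELDS = %i[id title type campaign home_wiki start_date end_date
              total_editors new_editors articles_edited bytes_added].freeze

  # programs: array of hashes standing in for real program records.
  def initialize(programs)
    @programs = programs
  end

  # Keeps only whitelisted fields; anything else in the input is dropped.
  def rows
    @programs.map { |program| FIELDS.to_h { |field| [field, program[field]] } }
  end

  def to_csv
    CSV.generate do |csv|
      csv << FIELDS.map(&:to_s)
      rows.each { |row| csv << row.values_at(*FIELDS) }
    end
  end

  def to_json(*_args)
    JSON.generate(rows)
  end
end

presenter = SystemDataPresenter.new(
  [{ id: 1, title: 'Example edit-a-thon', type: 'edit-a-thon',
     home_wiki: 'en.wikipedia.org', total_editors: 12, internal_note: 'not exported' }]
)
puts presenter.to_csv
puts presenter.to_json
```

Because rows only copies whitelisted keys, adding or removing an exported field is a one-line change to FIELDS.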

3.3 Overhauled System-Wide Metrics
The existing SpecialStats model and /stats view will be refactored into a dedicated SystemStats service object (app/services/system_stats.rb) responsible for computing and caching aggregated metrics. Caching will use Rails.cache with a configurable TTL (default 24h), refreshed by a nightly Sidekiq job.
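
The caching behaviour can be sketched without Rails. In the sketch below, TtlCache is a small in-memory stand-in whose fetch(key, expires_in:) mirrors the shape of Rails.cache.fetch; the class layout of SystemStats, the cache key, and the injected clock are assumptions made so the example runs on its own.

```ruby
# In-memory stand-in for Rails.cache, for illustration only.
class TtlCache
  # clock is injectable so expiry can be exercised without sleeping.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @entries = {}
  end

  # Returns the cached value while it is fresh; otherwise runs the block,
  # stores the result with a new expiry time, and returns it.
  def fetch(key, expires_in:)
    entry = @entries[key]
    return entry[:value] if entry && @clock.call < entry[:expires_at]

    value = yield
    @entries[key] = { value: value, expires_at: @clock.call + expires_in }
    value
  end
end

# Hypothetical shape of the service object: the expensive computation is
# passed in as a block and only runs on a cache miss.
class SystemStats
  DEFAULT_TTL = 24 * 60 * 60 # 24 hours, in seconds

  def initialize(cache:, ttl: DEFAULT_TTL, &compute)
    @cache = cache
    @ttl = ttl
    @compute = compute
  end

  def metrics
    @cache.fetch('system_stats', expires_in: @ttl) { @compute.call }
  end
end

stats = SystemStats.new(cache: TtlCache.new) { { total_programs: 3 } }
puts stats.metrics
```

With this structure the nightly job only needs to recompute and overwrite the cached entry, while page requests keep reading the cached value.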

New and updated metrics to be added (full list subject to mentor review):

  • Active programs: currently tracked vs. all-time totals, broken down by type and wiki.
  • Editor retention: proportion of editors active in more than one program.
  • Geographic/linguistic spread: number of distinct wikis and languages covered.
  • Content quality improvement: aggregate delta in article assessment scores.
  • References added: a new metric leveraging existing revision data.
  • Historical trend data: monthly aggregates for plotting over time.
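
Two of the metrics above, editor retention and monthly aggregates, reduce to simple grouping operations. The sketch below computes them over in-memory arrays; the module name MetricsSketch and the input shapes are assumptions for illustration, and the real implementation would most likely push this work into SQL queries.

```ruby
require 'date'
require 'set'

# Hypothetical helpers operating on plain Ruby data, for illustration only.
module MetricsSketch
  module_function

  # participations: array of [editor_id, program_id] pairs.
  # Returns the share of editors who appear in more than one distinct program.
  def editor_retention(participations)
    programs_by_editor = Hash.new { |hash, editor| hash[editor] = Set.new }
    participations.each { |editor_id, program_id| programs_by_editor[editor_id] << program_id }
    return 0.0 if programs_by_editor.empty?

    returning = programs_by_editor.count { |_editor, programs| programs.size > 1 }
    returning.to_f / programs_by_editor.size
  end

  # revisions: array of hashes with :date (Date) and :bytes_added (Integer).
  # Returns a hash of 'YYYY-MM' => total bytes added, in chronological order.
  def monthly_bytes_added(revisions)
    revisions
      .group_by { |rev| rev[:date].strftime('%Y-%m') }
      .transform_values { |revs| revs.sum { |rev| rev[:bytes_added] } }
      .sort
      .to_h
  end
end

puts MetricsSketch.editor_retention([[1, 'a'], [1, 'b'], [2, 'a']])
puts MetricsSketch.monthly_bytes_added(
  [{ date: Date.new(2026, 2, 3), bytes_added: 10 }, { date: Date.new(2026, 1, 5), bytes_added: 5 }]
)
```

Using a Set per editor means repeated enrolment in the same program is not counted as retention.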

3.4 Frontend Updates
The public /stats page will be redesigned as a React component tree, consistent with other data-heavy pages in the codebase. It will feature:

  • A responsive summary card row (total programs, total editors, total articles improved, total wikis).
  • Time-series charts (using the Recharts or Chart.js integration already present in the codebase).
  • Filterable metric breakdowns (by wiki, program type, and date range).
  • A JSON API endpoint (/api/v1/stats) to expose the same data for third-party consumers.

3.5 Testing Strategy
Every new Ruby class will have RSpec unit tests, and new controller endpoints will have request specs. React components will be covered by Jest/RTL tests. I will target ≥90% line coverage on all new code. CI must remain green (rake spec + npm test) throughout.

4. Project Timeline
Total: 350 hours over approximately 14 weeks. The project is structured in four phases with a buffer week for review and unexpected complexity.

| Phase / Week | Hours | Deliverables |
| Community Bonding (May 1 - May 24) | 30 hrs | Get up to speed with mentors and codebase. Trace existing stats models, the course/campaign data pipeline, and the role/permissions system. Sync with mentors and FRomeo_WMF to finalize the export field requirements and agree on the full metrics list. Produce a written implementation spec for mentor sign-off before coding begins. |
| Phase 1 (May 25 - Jun 20) | 90 hrs | Build the staff data-download foundation: StaffDownloadsController with admin role-gating, new route, SystemDataPresenter, CSV and JSON serialization, async Sidekiq job, Active Storage file handling, and audit log model + migration. Full RSpec request and unit test suite. Submit PR 1 for review. |
| Phase 2 (Jun 21 - Jul 9) | 100 hrs | Refactor SpecialStats into SystemStats service. Implement new metrics, caching layer, and nightly refresh job. Build /api/v1/stats JSON endpoint. Redesign /stats page as React component with summary cards, time-series charts, and filter controls. RSpec + Jest tests. Submit PR . Midterm evaluation. |
| Phase 3 (Jul 11 - Aug 16) | 90 hrs | Stakeholder feedback pass: address WMF/mentor review comments. Implement additional filter options and any newly requested metrics. Improve download UI (progress indicator, history of past downloads for staff). Performance profiling and query optimization for large datasets. Documentation: API docs and admin guide updates. |
| Buffer & Polish (Aug 17 - 24) | 40 hrs | Bug fixes, accessibility review of new UI, code cleanup per Rubocop/ESLint. Final PR reviews and merge. Write project summary for documentation purposes. |

Midterm evaluation target (Jul 6–10): the staff download endpoint is fully functional and merged (PR 1); the SystemStats service and new metrics backend are complete and tested; the redesigned /stats React page is open for review (PR 2). The post-midterm period (Phase 3 plus Buffer & Polish) is dedicated to stakeholder feedback, performance hardening, documentation, and a clean final submission by August 24.

5. About Me
I am a Master's student in Computer Science (AI/ML) at Obuda University. I have 3.5 years of experience as a full-stack developer and 3 years of experience contributing to open source.
I have 3 years of experience writing code in Ruby and React.

5.1 Engagement with WikiEduDashboard
I set up the full WikiEduDashboard development environment locally early in my application process, working through the project's setup documentation and resolving environment-specific issues along the way. That hands-on exploration gave me a solid map of the codebase: how courses, campaigns, revisions, and users relate to each other, how background jobs are structured with Sidekiq, and how the React frontend consumes Rails API endpoints.

I have submitted and merged the following contributions as part of my application preparation:

  • Issue #6557: Admin CSV export for all public courses and instructors. I added a new admin-only ReportsController action, a new route, and extended ReportCsvWorker to generate the file asynchronously, preloading associations to avoid N+1 queries and using find_each for batch processing. This PR directly mirrors the core technical work planned for this GSoC project.
  • Issue #6646: Live preview for the Site Notice admin setting. I replaced a single-line text input with an auto-expanding TextAreaInput and built a new SiteNoticePreview React component with dynamic state sync via useEffect, giving me direct experience with the project's React component and state management conventions.

6. Deliverables Summary

  • Authenticated staff data-download endpoint with CSV and JSON support, async generation, audit logging, and scoped filtering.
  • SystemStats service object with expanded metrics set, 24-hour cache, and nightly refresh background job.
  • /api/v1/stats public JSON endpoint for third-party analytics consumers.
  • Redesigned /stats React page with summary cards, time-series charts, and filter controls.
  • Comprehensive RSpec and Jest test suites (≥90% line coverage on new code).
  • Updated API and admin documentation reflecting all new functionality.

7. Risks & Mitigations
| Risk | Mitigation |
| Export field requirements unclear | Finalize spec with FRomero_WMF during community bonding before writing any code. |
| Large dataset query performance | Profile early with realistic data volumes; add database indexes and paginate exports. |
| Async job complexity | Start with synchronous CSV generation; move to async only once synchronous path is fully tested. |
| Scope creep on metrics | Prioritize a small, well-defined set of high-value metrics; defer nice-to-haves to a follow-on PR. |
| Availability conflicts | My semester ends on May 24, and I have no scheduling conflicts through August. |

Event Timeline

Hi, thanks for submitting your GSoC 2026 project proposal with Wikimedia!

Please make sure you’ve also submitted your proposal on the official Summer of Code website: https://summerofcode.withgoogle.com. The deadline for both submission and any edits is the same, so ensure everything is finalized before March 31, 18:00 UTC, as changes won’t be possible after that.

We strongly recommend completing any updates at least 30 minutes before the deadline to avoid last-minute glitches or unexpected technical issues.

Wishing you all the best for your application. Hope to see you as part of the program soon! 🚀

Hi, thank you for your submission and the effort you put into your proposal. This year we received over 380 strong applications, and unfortunately we were not able to offer you a slot. This was a very competitive process, and many high-quality proposals could not be selected. We truly encourage you to stay engaged and continue contributing to Wikimedia projects. Over the years, many contributors who were not selected for Google Summer of Code have gone on to make impactful contributions and become long-term members of the community. Please do not see this as a failure, but as a step forward in your journey. We would love to stay in touch and support your continued involvement.

If you would like guidance on how to contribute to our projects outside GSoC, feel free to reach out to any of the mentors or org admins; they will be happy to help you get started.

You can get started or continue contributing here:

We hope to see your contributions in our community soon.