
Add a new endpoint to get the participant ongoing events
Closed, ResolvedPublic

Description

Create a new endpoint that returns the participant's ongoing events.
It is used to list the events the user can associate with an edit when editing an article.

Acceptance criteria

  • This endpoint will return events in which the current user participates and that:
    • Have track contributions enabled
    • Target the current wiki
    • Are ongoing
    • Are not deleted
    • NB: This includes events for which the user has registered privately
  • The endpoint is documented
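
As an illustration only, the selection criteria above can be written as a single filter. This is a minimal Python sketch, not the extension's PHP code; the `Event` and `Registration` classes, all field names, and the `events_for_association` function are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    # All field names are hypothetical, chosen only for this illustration.
    event_id: int
    name: str
    track_contributions: bool
    target_wikis: set
    start: datetime
    end: datetime
    deleted: bool = False


@dataclass
class Registration:
    event: Event
    private: bool = False  # private registrations are deliberately not filtered out


def events_for_association(registrations, wiki, now):
    """Return the events from a user's registrations that meet every criterion."""
    return [
        r.event
        for r in registrations
        if r.event.track_contributions          # track contributions enabled
        and wiki in r.event.target_wikis        # targets the current wiki
        and r.event.start <= now <= r.event.end  # ongoing
        and not r.event.deleted                 # not deleted
    ]
```

Note that the `private` flag is never consulted, which mirrors the NB in the criteria: events the user registered for privately are still returned.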

Event Timeline

cmelo changed the task status from Open to In Progress.Aug 5 2025, 7:50 PM
cmelo claimed this task.

Performance Question: API Query Optimization for Ongoing Events

Background:
Adding ongoing=true parameter to /campaignevents/v0/participant/{userid}/event_registrations API.
This endpoint will be called on every article edit or creation to populate a dropdown of ongoing events, so that we can ask whether the edit is related to an event the user is attending. (We may also use a cache, and have questions about how to implement it; see T401211.)

I would implement option 2, which sounds better in terms of performance and avoids unused data, but we are wondering if option 1 is also acceptable, even though it will return unused data for this use case.

Current API (without ongoing parameter):

  • Uses getEventsByParticipant(), followed by newEventsFromDBRows()
  • Results in 6 total queries (1 main + 5 related data queries)
  • Returns full event objects with all data

Two options for implementing ongoing=true:

Option 1: Modify existing method

  • Add ongoing filter to getEventsByParticipant()
  • Still triggers 6 queries but filters by date
  • Returns full event objects (we only need event_id and event_name)

Option 2: Create new optimized method

  • New method getOngoingEventIDsByParticipant()
  • Single query: SELECT event_id, event_name
  • Avoids 5 additional queries and unused data loading
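
For illustration, Option 2 could look roughly like the following. This is a minimal sketch in Python with SQLite, not the extension's actual PHP or schema: the `events` and `participants` tables, their columns, and the function names are assumptions made for this example. The point is only that one query returns the two columns the dropdown needs.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (
            event_id INTEGER PRIMARY KEY,
            event_name TEXT NOT NULL,
            event_start TEXT NOT NULL,  -- YYYYMMDDHHMMSS, sorts lexicographically
            event_end TEXT NOT NULL
        );
        CREATE TABLE participants (
            event_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL
        );
        INSERT INTO events VALUES
            (1, 'Ongoing edit-a-thon', '20250801000000', '20251001000000'),
            (2, 'Finished workshop',   '20250101000000', '20250201000000');
        INSERT INTO participants VALUES (1, 42), (2, 42), (1, 7);
        """
    )
    return conn


def get_ongoing_events_by_participant(conn, user_id, now):
    """Single query returning only (event_id, event_name) for ongoing events."""
    return conn.execute(
        """
        SELECT e.event_id, e.event_name
        FROM events AS e
        JOIN participants AS p ON p.event_id = e.event_id
        WHERE p.user_id = ?
          AND e.event_start <= ?
          AND e.event_end >= ?
        ORDER BY e.event_id
        """,
        (user_id, now, now),
    ).fetchall()


conn = make_demo_db()
print(get_ongoing_events_by_participant(conn, 42, "20250901000000"))
```

No event objects are built, so none of the related-data queries are needed.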

Question for Performance Review:
Since we only need event_id and event_name for the dropdown, should we create the optimized single-query method, or is the 6-query approach acceptable when this runs on every article edit?

Files Modified:

  • src/Event/Store/IEventLookup.php
  • src/Event/Store/EventStore.php
  • src/Rest/ListEventsByParticipantHandler.php

Screenshot 2025-08-06 at 12.29.53.png (840×1 px, 118 KB)

I think my question here would be, more generally, what is our performance budget. The context is that we need to get information about a user right after they made an edit. Everything else is basically up for discussion, i.e. whether we can implement this as a standalone API request, whether we should have a warmup request (during edit stashing), whether we should cache this information and where, etc. I think the number of queries is just one of the aspects, and possibly a minor one (I'm not sure how much of a difference it would make).

Also, keep in mind that the extra 5 queries are only run if we find at least one event that the user participates in. This means that for the vast majority of users (and edits) it would still be just one query.
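
The query-count argument above can be made concrete with a small sketch. This is a simplified Python model of the behaviour described in this comment, not the real getEventsByParticipant() code; `load_events` and its callables are hypothetical names used only here.

```python
def load_events(main_query, related_queries):
    """Run the main query; run the related-data queries only if it found events.

    Returns the rows from the main query and the number of queries executed.
    """
    queries_run = 1
    rows = main_query()
    if not rows:
        # No participation found: stop after the single main query.
        return [], queries_run
    for related_query in related_queries:
        related_query(rows)  # e.g. load one kind of related data for these events
        queries_run += 1
    return rows, queries_run
```

With five related-data queries, a user with no matching events costs 1 query, while a user with at least one matching event costs 6.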

@Daimona I may be missing some context here. Could you zoom out to the larger problem being solved? I.e., what requirements or needs led you to the choice of making an API request post-edit? This is already a very specific solution, with most of the performance costs and the potential reliability cost-benefit already locked in.

What data do you want to update after article edits?

Generally speaking, primary database data integrity should probably not rely on a client coming back to perform a page view, load and execute some JavaScript, and successfully make another HTTP request. I'd expect that to involve significant data loss, as devices/browsers have no obligation to remain connected, open, online, on power, without network congestion, and without any JavaScript errors (plus, Grade C browsers where we don't load JavaScript).

When we use server-side DeferredUpdate or JobQueue job, this is orders of magnitude cheaper, simpler, and more reliable. Would those work here? If already ruled out, the reasons would help understand the problem space better.

Even a Deferred or Job may experience loss during timeouts and post-send errors, especially as the data you describe sounds like it might experience lock contention and database waits. (We previously experienced this around editcount, category members, and site stats. Is this similar?)

Would a stateless naive approach work? For example, if we fetch/compute what you need on-demand in the place where you want to display it. That should give you "perfect" data in theory. I take it that doesn't scale in your case, but it helps to share what specific data you're computing, and why the naive way can't scale. A step up from that might be to store and re-calculate it in bulk periodically on a per-event basis, possibly triggered by a de-duped job after edits. This is how the BetaFeatures extension counts active users, for example.

https://wikitech.wikimedia.org/wiki/MediaWiki_Engineering/Guides/Backend_performance_practices#Persistence_layer

@Daimona I may be missing some context here. Could you zoom out to the larger problem being solved? I.e., what requirements or needs led you to the choice of making an API request post-edit? This is already a very specific solution, with most of the performance costs and the potential reliability cost-benefit already locked in.

What data do you want to update after article edits?

Yep, apologies. This is in the context of the Collaborative Contributions project that we are working on next. The parent task, T378035, has a broad overview, but the TLDR is that when someone makes an edit, we want to see if that person is participating in an ongoing event (based on the event start and end date and its target wikis; this is information we already collect). If so, we want to ask them whether the edit was made as part of that event, so that we can create a dashboard with all edits that were made as part of that event, including statistics etc. Some design prototypes to help visualize this: F65720219 (dashboard, overview), F65720216 (dashboard, details), F65698893 (post-edit dialog to associate edit & event).

For the user interaction part, we had two options: a post-save dialog (as in the image I shared above), or a new field in the editor itself (see previous designs in T395620#10989435). For the MVP, we chose the former option and left the second for a later iteration (possibly). I suppose we could revert that decision if there are fundamental concerns with it.

This task is concerned with the first part, i.e., determining whether the person making the edit is participating in any events. Because the dialog is shown post-edit, a possible approach would be to listen to the postEdit hook in JS, make an API request to determine if the user participates in any events, and if so, display the dialog. The reason I wanted folks to weigh in on performance is that this needs to be done after every edit [for clients that receive JS, but that's OK given the use case]. So, that means loading a new RL module after each edit, and making an API request. Because this changes a core user flow (editing), I wasn't sure about the performance budget, and more generally I thought it would be worth publicising.

Generally speaking, primary database data integrity should probably not rely on a client coming back to perform a page view, load and execute some JavaScript, and successfully make another HTTP request. I'd expect that to involve significant data loss, as devices/browsers have no obligation to remain connected, open, online, on power, without network congestion, and without any JavaScript errors (plus, Grade C browsers where we don't load JavaScript).

I think these are valid concerns, but given the use case, they are also acceptable risks. The worst that could happen is that an edit that was meant to be associated with an event will not be registered as such, which is basically the status quo today. The edit<->event association will also be editable, so data can be corrected retroactively if need be.

When we use server-side DeferredUpdate or JobQueue job, this is orders of magnitude cheaper, simpler, and more reliable. Would those work here? If already ruled out, the reasons would help understand the problem space better.

Those would work if we showed the event selector within the editing interface, with the submitted value being processed as part of the edit (onPageSaveComplete et al). However, as mentioned above, we ultimately went with a post-edit dialog, at least for the initial version. I think the main reason behind this choice was having more space available for explanations / help text, adding additional controls (e.g., "don't ask me again for this event"), and potentially showing more information like progress measurements or other things to make the experience a bit more engaging.

Even a Deferred or Job may experience loss during timeouts and post-send errors, especially as the data you describe sounds like it might experience lock contention and database waits. (We previously experienced this around editcount, category members, and site stats. Is this similar?)

Note that we will only write data if the user chooses to associate the edit and the event, which is a tiny minority of all edits. I'm not much concerned about that part. At any rate, all the writes would be made to the extension's tables, which for production are in x1 and separated from the main wiki DB. We will still need to gather data about the edit and its parent revision though, like the size, number of links, etc. We plan to do this via the JobQueue anyway.

Would a stateless naive approach work? For example, if we fetch/compute what you need on-demand in the place where you want to display it. That should give you "perfect" data in theory. I take it that doesn't scale in your case, but it helps to share what specific data you're computing, and why the naive way can't scale. A step up from that might be to store and re-calculate it in bulk periodically on a per-event basis, possibly triggered by a de-duped job after edits. This is how the BetaFeatures extension counts active users, for example.

It would work for the data associated with that edit (e.g., the byte difference, number of links, etc.), but we still need to determine whether a given edit should be associated with the event or not; and the only way to know is to ask the editor during or immediately after the edit itself.

Change #1176514 had a related patch set uploaded (by Cmelo; author: Cmelo):

[mediawiki/extensions/CampaignEvents@master] Add ongoing boolean parameter to participant events API

https://gerrit.wikimedia.org/r/1176514

Change #1176514 merged by jenkins-bot:

[mediawiki/extensions/CampaignEvents@master] Introduce new method getEventsForContributionAssociationByParticipant

https://gerrit.wikimedia.org/r/1176514

cmelo renamed this task from Update participant-events API to support ongoing events filter to Add a new method on EventStore to get events for contribution association by participant.Sep 17 2025, 2:58 PM
cmelo updated the task description. (Show Details)

Just as a note, we decided not to create a new endpoint, but instead to add a new method in EventStore and have the backend call it directly and return this data for the front end to use. This removes the need for a new HTTP request, which is why the task description and AC were changed.

A note for @vaughnwalters: this is not testable by itself, but it will be used by T400953, so testing that task also covers this one. This task is responsible for returning the participant's ongoing events when the modal is shown for the user to select an event to associate with an edit.

cmelo renamed this task from Add a new method on EventStore to get events for contribution association by participant to Add a new endpoint to get the participant ongoing events.Sep 18 2025, 7:57 PM
cmelo updated the task description. (Show Details)

Change #1189568 had a related patch set uploaded (by Daimona Eaytoy; author: Daimona Eaytoy):

[mediawiki/extensions/CampaignEvents@master] Create REST endpoint to list events that can be associated with edits

https://gerrit.wikimedia.org/r/1189568

Change #1189568 merged by jenkins-bot:

[mediawiki/extensions/CampaignEvents@master] Create REST endpoint to list events that can be associated with edits

https://gerrit.wikimedia.org/r/1189568

This endpoint will return events in which the current user participates and that:
✅ Have track contributions enabled
✅ Target the current wiki
✅ Are ongoing
✅ Are not deleted
✅ NB: This includes events for which the user has registered privately
✅ The endpoint is documented

Screenshot 2025-10-01 at 10.29.32 AM.png (934×1 px, 113 KB)

Screenshot 2025-10-01 at 4.52.53 PM.png (1×1 px, 249 KB)

All AC are met; sending to product sign-off.