@Halfak Yeah, I recently came to the same conclusion: this happens both because deleted revisions aren't handled properly, and because for itwiki only deleted revisions are left. I guess showing a clearer message would be great for the moment, while the proper solution would of course be to overcome the limitations of JSONP.
Using a sample API request, I noticed that:
- https://it.wikipedia.org/w/api.php?action=compare&format=json&fromrev=68597719&torev=68597875&callback=xxx (identical to the actual one, except for the callback name) returns an error
- https://it.wikipedia.org/w/api.php?action=compare&format=json&fromrev=68597719&torev=68597875 (the same as above but without callback) works fine
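To compare the two responses above, it helps to unwrap the JSONP envelope first, since the payload is the same JSON either way. Below is a minimal Python sketch (for illustration only; the extension itself is PHP, and the helper name and the mocked error payload are assumptions, not the actual API output):

```python
import json
import re


def unwrap_jsonp(body: str, callback: str) -> dict:
    """Strip a JSONP wrapper like `xxx({...});` and parse the JSON inside.

    Hypothetical helper for inspecting the two responses above; the
    error shape in the example is mocked, not a real API response.
    """
    pattern = r'^\s*' + re.escape(callback) + r'\s*\(\s*(.*)\s*\)\s*;?\s*$'
    match = re.match(pattern, body, re.DOTALL)
    if match is None:
        raise ValueError("response is not JSONP-wrapped")
    return json.loads(match.group(1))


# Mocked error response, similar in shape to what the first URL returns:
payload = unwrap_jsonp('xxx({"error":{"code":"somecode"}});', 'xxx')
if 'error' in payload:
    print('JSONP request failed:', payload['error']['code'])
```

Checking for an `error` key after unwrapping makes the difference between the two requests easy to diff programmatically.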
On m.o I read:
Mon, Nov 19
In the best scenario this only makes logs a bit messy. However we cannot exclude that something is broken underneath.
Fri, Nov 16
So, what's the status here? Adding wgUser is trivial, do we want to do it?
Just a comment: "_text" variables are for page titles, and so are "_prefixedtext" variables. So, if you're interested in covering such variables, then you also have to include "_title" and "_prefixedtitle" per T173889.
Adding Platform team per maintainers, plus CC'ing people active in this project and Tim, who reviewed the patch. Since the related tasks are UBN and release blockers, the plan is to get this done before the 1.32 release. Also, a note for whoever will run the script: it's not included in production (it'll be in wmf.6), but given the circumstances, could you please deploy it and do the dry run? Deploying the script alone (and adding it to autoload) would be enough; no other code from that patch is required.
Thu, Nov 15
Will file a separate task once the patch is merged.
Setting to high to reflect the importance of the parent task and related ones (specifically T203336 which is 1.32 blocker).
@Tgr that is true for old_wikitext_slot and new_wikitext_slot (or whatever we decide to call them), but not for derived variables like added_lines_slot, edit_delta_slot, etc. I also think that for users it would be quite confusing to see lazy-loaded elements within the same array, so we should take this into account.
T209565 for the maintscript
Wed, Nov 14
I think per-slot filters aren't the right way. As for using suffixes versus arrays, the idea is the same, and both ways are good. The advantage of arrays could be a less cluttered variables list (and you get the idea that you're using a part of the whole). Another thing to take into account: each element of the variable should be lazy-loaded independently of the others, e.g. if the page has new_wikitext for slots A and B, only load the required ones, as if they were separate variables. Effectively making them separate variables would help with this.
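The per-element lazy loading mentioned above can be sketched like this (in Python for illustration, since the extension itself is PHP; the class name and loader signature are assumptions):

```python
class LazySlotVariable:
    """Sketch: one logical variable (e.g. new_wikitext_slot) whose
    per-slot elements are computed only when first accessed."""

    def __init__(self, loaders):
        # loaders: dict mapping slot name -> zero-argument callable
        self._loaders = loaders
        self._cache = {}

    def __getitem__(self, slot):
        if slot not in self._cache:
            # Compute on demand, once; untouched slots are never loaded.
            self._cache[slot] = self._loaders[slot]()
        return self._cache[slot]


loaded = []
new_wikitext_slot = LazySlotVariable({
    'main': lambda: loaded.append('main') or 'main slot text',
    'data': lambda: loaded.append('data') or '{"k": "v"}',
})

print(new_wikitext_slot['main'])  # only the main slot is computed
print(loaded)                     # the 'data' loader never ran
```

Whether each element hides behind `__getitem__` on one array object or behind genuinely separate variables, the effect is the same: a filter touching only slot A never pays for slot B.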
Random idea: add a new AFPData type, e.g. DNONE, to distinguish between actual null variables and non-initialized ones, and make the parser permissive when it encounters this data type. I wrote a stub patch for this some weeks ago, but stopped working on it as new problems came up... I'm also not that confident about making such a breaking change to the parser, although I think this idea is worth investigating.
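The DNONE idea boils down to a dedicated sentinel value. A minimal Python sketch of the concept (names are hypothetical; the real AFPData type would live in the PHP parser):

```python
class DNone:
    """Hypothetical sentinel ("DNONE"): a variable that was never set,
    as opposed to one explicitly set to null.  Implemented as a
    singleton so identity checks (`is DNONE`) are reliable."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return 'DNONE'


DNONE = DNone()


def get_var(env: dict, name: str):
    # A permissive parser would return DNONE (possibly logging a
    # warning) instead of raising when the variable was never set.
    return env.get(name, DNONE)


env = {'added_lines': None}                   # explicitly null
print(get_var(env, 'added_lines') is None)    # True: a real null
print(get_var(env, 'new_size') is DNONE)      # True: never initialized
```

The point is that downstream code can now tell "the variable legitimately holds null" apart from "nobody ever computed this variable", which is exactly the distinction the parser currently cannot make.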
@Tgr If done carefully, it won't be a breaking change. Arrays are currently handled as strings in several functions; that's because of the poor initial implementation. Just think of how common something like added_lines contains/like/rlike ... is: in these cases we just cast the array to a string, because added_lines was turned into an array when lots of filters already used that syntax, and people decided not to break everything. A step forward is to add array-specific functions, see https://gerrit.wikimedia.org/r/#/c/424298/. This way we add new ways to work with arrays (mostly treating them as such) without changing anything in the current syntax.
Adding associative arrays would be the same: certainly a big (and possibly useful) change, but it wouldn't interfere with existing filters.
The two points above (specific functions and associative arrays) would IMHO make arrays acceptable, and we could use such arrays in this patch after having addressed T204654.
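The backwards-compatible cast described above can be sketched as follows (a Python illustration of the idea, not the actual PHP code; the newline separator and function names are assumptions):

```python
def coerce_to_string(value) -> str:
    """B/C cast: when a string operator (contains/like/rlike) receives
    an array, join its elements into one string so old filters written
    against the string form keep working unchanged."""
    if isinstance(value, list):
        return '\n'.join(str(item) for item in value)
    return str(value)


def op_contains(haystack, needle) -> bool:
    # `added_lines contains "spam"` style check, casting arrays first.
    return coerce_to_string(needle) in coerce_to_string(haystack)


added_lines = ['first new line', 'some spam here']
print(op_contains(added_lines, 'spam'))  # True
```

Array-specific functions would then operate on the list directly, while the legacy string operators keep going through the cast; neither path breaks the other.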
@Jdforrester-WMF Huh? Weren't we planning to make the change so that nothing will need to be updated? It wouldn't be easy to update all those filters...
@Tgr Well, that's an important point. I think we should instead improve arrays (and yeah, they pretty suck right now) somehow, maybe introducing associative arrays and changing my previous idea to use them.
@Jdforrester-WMF That would be easy, but I'm not sure it's worth it. It'd be similar to T173889, and as for that task we would end up keeping old variables anyway. Unless we want to edit hundreds of filters.
I think a good approach could be to:
- Keep current variables as they are, i.e. include all slots (concatenated)
- Add new variables, for instance new_wikitext_slot, added_lines_slot, ..., each of them being an array. To retrieve its content for the nth slot, you just have to access e.g. new_wikitext_slot and proceed as usual
I don't know if this will be feasible and painless for filter maintainers, but I guess it'd be a relatively simple but effective approach. However, please note that:
- I haven't investigated what code changes would be necessary
- Since all these arrays will have variable length, we'll need to change the syntax checker, which right now fails with non-fixed-length arrays (part of T204654)
- In any case, also given point 2, this will not be easy.
Note: arrays with variable length should also be allowed: the syntax check doesn't allow accessing an array element if the array is generated at runtime (as is the case for added_lines), although it makes sense to do so. To reproduce the problem you may try checking added_lines, although that gives the wrong error (T198531).
Tue, Nov 13
Mon, Nov 12
@Thryduulf Ah, that's a totally different thing. The /examine interface doesn't really contain a reference to the log entry itself, and thus it's also missing a link for oversighting it. The real point here is T20655. Currently, "details" is the one focused on the entry itself and all the variables, while "examine" is almost entry-independent (if two filters catch the same edit, "examine" is the same for both entries) and is more meant for doing some testing with an arbitrary pattern, which may or may not be the one of the triggered filter. I don't know if I explained that properly, but the point is that "examine" is in place only to allow people to take the variables from a logged AbuseFilter hit and do whatever they want with them, disregarding the entry. It may sound funny, but the problem reported is more or less a feature. My suggestion as a filter maintainer is to use "details" in these cases; for the future, we'll see in T20655.
Thu, Nov 8
Wed, Nov 7
@Amorymeltzer I checked, and I confirm that it's a duplicate of T44734. Also, please note that this only happens in Special:AbuseLog, and not in Special:AbuseFilter/test or /examine (as it did for the problem originally reported in this task). And when it comes to specific AbuseLog entries, suppressing them is the right way, at least while waiting for a fix to T44734 (which won't be easy).
@Huji thanks, excellent analysis! The idea is great; however, I agree that it will be quite complex and need lots of work. I'm reopening this task, but I believe that for the moment we should focus on other major changes, which aim to fix software bugs, instead of adding new features.
If I understood T208537#4714444 correctly, this happens because we call numParams on null, which in turn means $filterId is null for the current entry. I have no idea about why this happens, though. Do we have some more context?
@Krinkle AFAICS the fatal only happens for uploads, specifically for uploads where the specified title isn't valid. As I was saying above, avoiding the fatal doesn't avoid silent errors, but I agree that it'd be less intrusive for users. I can't send a patch right now, but the one above is definitely overkill. All we need to do is remove the "Title" typehint from line 1100. Also, could you please take a look at https://gerrit.wikimedia.org/r/466842?
Tue, Nov 6
@Amorymeltzer Hmm, I don't think it's intended; it's probably just some edge case, possibly due to the fact that the page doesn't exist (maybe we handle non-existing pages differently, or the Revision class does). I'll investigate tomorrow anyway. Could you please confirm that the reproduction steps below are correct?
*Create a page with some revisions (and trigger a filter with one of them)
*Delete the page
*Suppress the revision (already deleted) which triggered the AF
*Go to Special:AbuseLog for that same revision, which will only check for the "deletedtext" right (i.e. it's visible to sysops who aren't oversighters, but not to "normal" users)
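The reported visibility behaviour in the last step can be sketched as a permission check (a Python illustration of the buggy logic being reproduced, not the actual PHP; the function name is hypothetical, while the right names follow MediaWiki conventions):

```python
def can_see_log_entry(user_rights, revision_deleted, revision_suppressed):
    """Sketch of the reported behaviour: the AbuseLog entry only checks
    the "deletedtext" right, even when the underlying revision was also
    suppressed.  A fixed version would require "suppressrevision" in the
    suppressed case; this models the bug, not the desired logic."""
    if revision_suppressed:
        # BUG being reproduced: should require 'suppressrevision',
        # but only 'deletedtext' is actually checked.
        return 'deletedtext' in user_rights
    if revision_deleted:
        return 'deletedtext' in user_rights
    return True


sysop_rights = {'deletedtext'}  # sysop who is NOT an oversighter
print(can_see_log_entry(sysop_rights,
                        revision_deleted=True,
                        revision_suppressed=True))  # True: the bug
```

Under the correct logic, that last call would return False for a non-oversighter.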
Also, I'm unsure if that task is related... The reason could be similar, but in that case solving the problem isn't trivial at all.
I don't know how difficult it would be, but it would certainly require several changes: at least a new page in the UI (and we'll add several new pages for performance monitoring) and some sort of DB change, namely (at least) a new column in abuse_filter_log if testing past hits is OK, or a new table to make it possible to add edits for which there are no entries in the AbuseLog. It could be done, but I'd rather wait for T36180 and see if this feature will still be needed.
This would require adding lots of new logic. I wonder if Special:AbuseFilter/test would be enough, especially when T36180 will be resolved.
Mon, Nov 5
Uh, as a filter maintainer I don't really fancy adding tons of new variables. However, adding logic to apply filters only to some slots would IMHO be even worse. If we decide to add new variables, I guess we should make the existing ones refer to all slots, to keep B/C. But again, I'm new to MCR, so all I can do is probably express my thoughts as a filter maintainer.
Just a note: this failure is often seen when something abruptly halts the script. It could be due to something happening elsewhere.
@Jdforrester-WMF I was just going to ask what's to be done here, since I don't know much about MCR. This also means that I probably won't be able to help much with this. Anyway, AF uses the EditFilterMergedContent hook for edits. If there are no changes to the logic related to this hook, AF will probably work as usual.
Sun, Nov 4
Sat, Nov 3
- We only need a fairly simple syntax for filters, there's no need to allow complicated stuff
- Our dedicated parser is probably faster than Lua's (exactly because it's dedicated), and this will be especially true once CachingParser is implemented
- Higher expressive power means potential performance troubles from badly-written filters
- Adapting Lua to our needs (removing potentially dangerous stuff, injecting our variables, regex library etc.) could possibly nullify the performance gain
- AbuseFilter's syntax is actually simpler than Lua's; switching would only make writing filters harder for non-tech people (they'd have to learn a full programming language instead of a basic one).
Memo: we should also update the user_blocked variable during the process.
Wed, Oct 31
Tue, Oct 30
Sat, Oct 27
A short-term solution could be to remove the Title typehint from $title (which was added in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/460335/). However, it wouldn't actually do much, since our code isn't really meant to handle filtering without a title.
The proper, long-term solution is to ensure filtering only happens with a real title. This may be done either by halting AF execution if the title is null, or by changing the hook we use to make sure it always passes a title. Anyway, this only happens with uploads, and I guess only for uploads where an invalid title is specified.
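The "halt AF execution if the title is null" option is essentially a guard clause at the entry point. A minimal sketch (in Python for illustration; the real code is PHP, and the function name, parameters, and return shape are all hypothetical):

```python
def filter_action(title, action):
    """Hypothetical entry point: skip filtering entirely when no valid
    title is available (e.g. an upload with an invalid target name),
    instead of fataling on a null Title further down."""
    if title is None:
        # Halt AbuseFilter execution: there is nothing sensible to
        # filter against, and downstream code assumes a real title.
        return {'filtered': False, 'reason': 'no-valid-title'}
    # ... normal filtering would happen here ...
    return {'filtered': True, 'title': title}


print(filter_action(None, 'upload'))
print(filter_action('File:Example.png', 'upload'))
```

The alternative (fixing the hook so it always passes a title) pushes the same guarantee one layer up, which is why it's the preferable long-term shape.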
See T144265 (and related patches) for the progress in the long-term.
Fri, Oct 26
See T58371 (and consequently T191722) for why checking the summary doesn't work. I still have to check if the EditFilterMergedContent hook provides enough context to determine whether the edit is creating a new section.
Nothing more to do here, this task can now be made public.
Wed, Oct 24
Ah, I didn't know. If someone with merge access on wmf. branches comes across the patch, that'd be nice! Otherwise, we can just abandon it, as the new version is already on group0 and the fix is already deployed on the other wikis.
I'm not sure we want to do this: since this log is an actual log on Special:Log, we cannot selectively hide entries from some people, IIRC. This means that deletions of private filters would show up for every user as well, and together with the fact that vandals (especially LTAs) may already know if there's a filter targeting them (and its ID), this way they could tell whether such a filter has been disabled.
Tue, Oct 23
@Huji the fix is already deployed to WMF wikis per T207085#4679131. There's still this patch to be merged on wmf.26 (could you please merge it?), but it doesn't really matter; it's just for completeness' sake.
This is nasty, and even if the cause may not be easy to understand, the test should at least be included in @Broken until it's fixed.
Pushed to gerrit, for all mentioned branches. I only merged the one on master, since I'd like someone else to double-check the backports to be sure that everything is fine; CC @Reedy @Legoktm. Please note that I had to manually +2/verify the one on REL1_31 due to T189560. Also: can this task be made public once everything is merged?
Mon, Oct 22
I wouldn't say this is a good first bug. Something has changed in the meantime, and I don't know if it's worth it to change this behaviour. Also, it should be noted that we don't explicitly require caching for blockautopromote (as we do for throttle).
This is intended, see the patch where it was introduced, and also the reasoning behind it in T45646. However, I do recognize that reusing the messages isn't that clear, so we should probably add separate messages to handle this specific case. CC'ing @Tgr who wrote the patch.
Oct 21 2018
On my wiki, running the previously used query gives the following:
Claiming as I took over the patch.
Claiming as I took over the patch.
Claiming as I took over the patch.
I rewrote the task description to make it easier to understand. The second part was removed as we already have T187072. As for the remaining part, I'm closing it as invalid because the interface is different now: the uploadtext message is still there, but the filter warning is inside the section whose title is the "uploaderror" message. Now the messages are clearly separated, and it's also completely fine to keep an introduction on Special:Upload.
I tried my best to understand the problem here, but I'm not sure I really got it. However, supposing that Andre's explanation is correct (and to me it looks so), the behaviour is intended. If a filter is set to group throttle by page, throttle counters will be increased on a per-page basis. So, 3 edits on page XXX and 1 edit on page YYY mean that the counter is 3 for XXX and 1 for YYY, and that's all. If the filter is set to take any action after e.g. 5 edits, then it'll execute those actions as soon as the highest counter reaches 5, whichever page it is for.
If, instead, the aim of this task was to report a different problem, please feel free to reopen it and provide further details, as suggested above.
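The per-page grouping described above can be sketched with one counter per page (a Python illustration of the intended behaviour; class and method names are made up for this sketch, not taken from the extension):

```python
from collections import defaultdict


class PageThrottle:
    """Sketch of 'throttle grouped by page': one independent counter per
    page, with the filter's actions triggering as soon as ANY single
    page's counter reaches the configured limit."""

    def __init__(self, limit):
        self.limit = limit
        self.counters = defaultdict(int)

    def record_hit(self, page):
        """Increment the counter for `page`; return True when the
        filter should take its actions."""
        self.counters[page] += 1
        return self.counters[page] >= self.limit


throttle = PageThrottle(limit=5)
for _ in range(3):
    throttle.record_hit('XXX')      # counter for XXX is now 3
throttle.record_hit('YYY')          # counter for YYY is 1, independent
print(throttle.record_hit('XXX'))   # False: XXX is at 4, below 5
print(throttle.record_hit('XXX'))   # True: XXX reached the limit of 5
```

Note that the YYY edit never contributes to XXX's counter, which matches the "3 for XXX and 1 for YYY, and that's all" behaviour described above.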
A good solution could be to replace the API query with loadMessagesIfMissing, and then just call mw.message(xxx).parse(). However, parse() doesn't support many things (like templates), and the goal of this task is exactly to improve the message in the presence of such things.
I'll do whatever is needed to fix this, as part of T203587. This will probably get fixed on the way, but given that it seems to happen randomly I'm also adding a testme.
The original problem is gone; for a cleaner message there's T199622.
Oct 20 2018
Thanks Krenair and Reedy. BTW @Cyberpower678, a workaround is to re-set the throttle parameters which got lost; you can see them in the filter history. However, please note that this only works because you're disabling the filter: it wouldn't work as it used to anyway, since, alas, throttle has been completely broken since July, and that's why the other task is UBN.
Not being able to disable filters is a big deal. Let's wait for the stack trace, although I suspect this is T203336, which is already UBN and whose urgency I've already expressed my worries about.