Babel: it-N, en-3, fr-1
In order to tell whether an AFPData is NULL because it's really null, or just because there was nothing to parse we'd need something like https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/478424/. Which is old, WIPpy, etc.
I also re-added the subtask for weird syntax. While vvv's plans were to support that syntax in the new parser, I think we should instead forbid it in the old parser. Which is another blocker because it's not supported already by CachingParser. I think I'll start with the "deprecate" part now.
@Krinkle Actually, I don't think we have to re-implement the parser from scratch. I trust the current implementation of the CachingParser, so all I need is some time to read and understand it. At most I could send a couple of patches for docs / code quality, but that'd be quick anyway. A more detailed plan:
Wed, May 22
The trace tells nothing useful: it's obviously a timeout error, happening for the link you provided, because the query is too slow. What's interesting is the query itself:
@Krinkle Well, that's cool! I think we should first expand code coverage a bit, there are a few patches on gerrit. Then I'd like to move with subtasks of this task, backing up the changes with more and more tests. Honestly, the CachingParser is harder to understand than the usual one, and I've never really dug into it. Parsers also tend to become more and more complex very easily, so that's really something for which extra help is always appreciated.
I'm a bit unsure about my availability in the next couple of weeks, but I can definitely find time for it. For a more detailed plan (which I still don't have) and better coordination, just tell me what's better for you; for me, it's fine either here on this task, IRC, hangout, etc.
Tue, May 21
Faster gun draw.
Gonna fix it anyway shortly.
Mon, May 20
Have you seen T187153? If it's the same issue (and I think it is), mRecord is null because there's a Revision object serialized and stuffed into the DB which could be 10 years old now, it's pretty obvious that unserializing and using it is not going to work...
Looking at the stacktrace and at the amount of errors on fiwiki, I'm still convinced that it's a duplicate of T187153. At any rate, we're gonna prove or disprove it once https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/510725/ hits production in wmf.6.
Sun, May 19
Per above, and skins are being updated to 0.6.0.
Something worth noting about the behaviour of substr in edge-cases: it's different in PHP5 (and HHVM) and PHP7. I came across this peculiarity while writing this patch for AF. See the changelog on php.net.
This basically means that in PHP7 you have:
Sat, May 18
Thu, May 16
Only happens on fiwiki because they have some automated process in place which scrapes AbuseLog entries and often comes across broken entries. In short, the error happens because in the past we used to store in the DB serialized instances of AFComputedVariable, which in turns includes instances of Revision etc (scary, right?). The Revision class has changed since then, and the error pops out. The proper solution is a maintenance script as devised in T213006, which in turn is blocked (at least) on T34478 and T213478. A hack was put in place at https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/502946/, but apparently that's not enough.
Wed, May 15
I checked with 2.x, and we have 64 DoubleEscaped of a total of 512 warnings, so they're not really a problem. I'll sample a few warnings and check how many false positives I got. If there are too many, it may be worth fixing taint-check first (if the fix is easy), then start working on core as soon as a future version (not 2.0) is released.
Fixed in phan 1.3.0, included in 0.6.0 of our config.
Mon, May 13
This is essentially the same as the child task, given that the strict PHP requirement is imposed by the old version of phan. Moreover, one could force-install seccheck via --ignore-platform-reqs.
Thu, May 9
(Copying my comment from gerrit)
Seen 400 times in the last 24 hours, can't tell what the impact is though.
Wed, May 8
@Jdforrester-WMF Huh, I forgot that LibraryUpgrader is sorta dead right now. Maybe updating the container would be quicker, but given that the phan upgrade will have to be done at some point, I'd suggest going ahead with it. I have updated the mediawiki/phan patch (https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/+/506064/) to require 1.3.2. If @Legoktm could merge it and release 0.5.1, and you could use such a script, that'd be awesome. Thanks!
Tue, May 7
@hashar For what concerns the progress bar, Lego's fix in phan was included in the 1.2.8 release, so updating our phan config will fix it. I sent https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/+/506064/ for that (actually, it could be updated to 1.3.2). Nevertheless, that would require launching libraryupgrader to update mediawiki/phan inside extensions.
Mon, May 6
Well, I think the EXEC bits should be removed from Linker::link params, as nothing is output therein. Something similar happened for Message::rawParams (fix is here) and at this point I think the same should happen for HtmlArmor::__construct. So I downloaded Sudo master, reverted rESUD2321205fcd6d654b99acb7270c3bcd754d944c06 and ran the last patch in 2.x (this one). The result is now:
There should probably be a check for the list being empty, and this should not bubble up to the user.
Bad value for parameter $dbkey: should not be empty unless namespace is main and fragment is non-empty
I just checked and this will be fixed (without the need to suppress the issue) with taint-check 2.0 (still in dev).
Fri, May 3
Well, core is definitely slower... We can try running it on different extensions, but looking at previous runs it seems like AF has more or less the same runtime as other extensions (or maybe slightly higher). I also tried running seccheck (on AF) with xdebug enabled to see if I could get something useful. I ended up with a runtime of 4000 seconds and no useful data :-/
So I did some testing, running seccheck on AbuseFilter with mwext-fast. The runtime was 103 seconds. Then I removed seccheck from phan's config, and got a runtime of 72 seconds, which should be our lower bound. Given that 30 seconds aren't that much, I guess it's really not a priority.
I tried to do this as part of the plugin upgrade. However, phan has no way to filter unused suppressions. I think our best bet is to add the config option to phan. This problem wouldn't be a thing if seccheck ran in the same job as phan does (where unusedsuppression is enabled), but I doubt this is something we'd want to do in the near future.
Thu, May 2
I checked the OOUI thingy and yes, it's a false positive due to annotations. Actually, it's the same issue mentioned by @Bawolff in the commit message of . Now it also affects other widgets. Given that no XSS is reported without those annotations (which is nice!), I guess that our best bet is indeed to merge the exclusion hack  and remove the annotations when all extensions will use seccheck 2.x.
With this, I confirm that 2.x is ready for release. It's not perfect, of course, but there's time to improve it. I'll send some other patches for various FIXMEs etc. from now on. We can decide to include them in 2.x, but that's completely optional. Should you need any clarification about my patches, want to coordinate for mass-merge, or just need another pair of eyes, I'm available; either here, via email, on IRC, hangout, whatever you prefer.
Wed, May 1
Likely-final update: I fixed another couple of regressions, and tested the plugin on a bunch of MW extensions, results below. NOTE: I used random versions of the extensions (i.e. for some of them it could be an old version).
- AbuseFilter --> 4 DoubleEscaped OOUI warnings
- CheckUser --> 0 warnings
- CentralNotice --> 0 warnings
- CodeEditor --> 0 warnings
- ContentTranslation --> 1 XSS warning (unverified)
- Echo --> Several warnings. Some are OOUI widgets, others are unverified.
- Flow --> 0 warnings
- MassMessage --> 2 warnings, unverified
- MobileFrontend --> 4 DoubleEscaped OOUI warnings
- SpamBlacklist --> 1 warning, unverified
- Thanks --> 0 warnings
- TitleBlacklist --> 0 warnings
- VisualEditor --> 3 DoubleEscaped OOUI warnings
Tue, Apr 30
@Jdforrester-WMF I completely agree, there are plenty of reasons to do that. However, it won't be quick. Code for it is already on gerrit (see T213006), however I'm currently waiting for input on T34478 and T213478. And the code needs review of course.
@thcipriani What would you suggest? Decreasing the log level to info?
IMHO, from a generic POV this caught exception should be logged as warning. However, given that some bot is flooding logs with it, and that apparently the underlying issue won't be fixed anytime soon, we can certainly make it less scary.
Seen 7 times in the last week, 4 of which were on zhwiki today at short breaks (probably the user re-tried to submit the form).
[XMg2-wpAICIAAE4Q9OwAAABN] /wiki/Special:%E5%B0%81%E7%A6%81/2001:B011:E005:0:0:0:0:0/48 InvalidArgumentException from line 591 of /srv/mediawiki/php-1.34.0-wmf.1/includes/CommentStore.php: $comment can not be null
Mon, Apr 29
An almost-final update: every integration test is passing now. I also took the chance for a change which isn't totally related to the upgrade, https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/SecurityCheckPlugin/+/506869/. See the commit message about why it's included there. So, as of https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/SecurityCheckPlugin/+/506124/, CI can be re-enabled (after T218719 is resolved).
Now I started running this new version on AbuseFilter (which passes with seccheck 1.5.1) to see if we missed any regression. In fact, I found one with branch scopes (fixed in https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/SecurityCheckPlugin/+/507015/) and I now see just 4 DoubleEscaped warnings all due to OOUI\Widget. I'll try to see if those are other regressions or actual issues, then test the plugin again on other extensions to ensure everything is working.
After that, I think the 2.x branch will be ready for release.
Sun, Apr 28
OK, so the final request is:
- Have both an old version (0.1.2? 0.1.4?) and 1.0.1 for the seccheck job, just like it happens for the phan job
- Have 1.0.1 enabled for jobs running in the seccheck repo, when seccheck 2.0 will be released.
Sat, Apr 27
Yes, that's the "pseudobot" I mentioned in T187153#5101909 and the comment above. Something on fiwiki is mass-requesting abuselog entries since weeks. Anyway, IMHO having it logged as warning (and out of the exception channel) should make it less noisy... If not, we can safely remove the logging, although the underlying issue should still have high priority.
A little update: I managed to fix several regressions. Right now, only 5 tests are failing: callbackhook, json, multiassign, registerhook and skinjson. For some of these, the cause is https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/SecurityCheckPlugin/+/506491/ which needs to be partly rewritten. Probably, this affects json, registerhook and skinjson (which use passbyref). The reason which makes multiassign fail is already at T216974#5137415 (I didn't investigate any more). No info about callbackhook.
Thu, Apr 25
https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/SecurityCheckPlugin/+/506491/ should partly fix the reference trouble, and depends on phan 1.3.2. However, the refescape test is still failing for the same root cause.
Wed, Apr 24
Now I'm working on fixing the fresh code (i.e. make tests pass). It's not quick, but I have already identified a major problem: the bit of code for passByReference params (in TaintednessBaseVisitor::handleMethodCall) stopped working between 0.8.0 and 0.8.6. That piece of code already had a couple of fixme and todos saying that it had to be rewritten, so I guess the moment has come... The main problem here is that $func->getInternalScope()->getVariableByName( $param->getName() ) now returns an instance of Parameter, which doesn't have the taintedness of the var, while it used to be a PassByReferenceVariable holding much more info (including the taintedness).
At the point of https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/SecurityCheckPlugin/+/506124/, there are 14 integration tests failing:
The patch above should be the last one for this task, and it upgrades phan to 1.3.1 and ast to 1.0.1. I guess at this point we should re-enable CI for the 2.x branch, having PHP 7.2 and ast 1.0.1 there. Then, check if everything works as expected and fix whatever doesn't. Of course I can do that, aside from the CI changes.
Apr 22 2019
This makes sense. But then, isn't MySQL implicitly casting af_id when it has to compare a bigint with a varchar? It's either not casting the column (sounds weird) or doing that in such a way that the index is used.
Apr 21 2019
[XLxsMQpAADsAAImQla8AAACB] /w/index.php?title=Special:Contributions&offset=20190418104744&limit=1&target=KrBot WMFTimeoutException from line 39 of /srv/mediawiki/wmf-config/set-time-limit.php: the execution time limit of 60 seconds was exceeded
@Marostegui Thanks! What if a FORCE INDEX (af_id) is added to the join with abuse_filter? Is it still ignored? That's bad, and it means that we'll really have to go straight with the schema change. Apparently the engine is not casting af_id to string implicitly, or it's doing that differently, or it's a bug in the optimizer, I sincerely don't know.
Apr 20 2019
Yes, that's correct. However I'd encourage to investigate this issue now: you know, logs get garbage collected, people may forget what they did... Generally speaking, it could become hard or impossible to find tracks of this change in a few months. That being said, I checked again, and the first occurrence of "Kogo" on mw1274 is on 18/04 at 9:41 UTC. Nothing suspicious in the SAL at that time. Maybe it could be due to some unlogged change? The ConfigExceptions started at that time with a constant rate of ~10 per minute, so I'd expect the offending change to have happened no more than a couple of minutes before 9:41. But again, the only thing in the SAL around that time is a sync of InitialiseSettings which cannot have caused this issue.
Apr 19 2019
Calling resolved thanks to https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/501677/. The test doesn't fail anymore, see e.g. https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CloseWikis/+/492449/ or anything submitted recently to CloseWikis.
This is a bit difficult to investigate because jenkins logs have been garbage collected and they weren't copied here. Maybe someone could submit a DNM commit in BlueSpiceNamespaceManager and trigger PHP jobs until they fail? Or test it locally with PHP 7.1 if it's reproducible?
Apr 18 2019
Given that they're close on the keyboard, I originally thought of a typo. But that would mean someone edited the file and introduced the typo on mw1274. Could be something more subtle of course.
@mmodell Well, I guess we should always try to keep our code cross-DBMS-compat, and Postgres compatibility is mandatory for including AF in the tarball. That being said, there's no hurry to make it work now, given that it's been broken for 10 years or so... Basing on Logstash and tendril, it doesn't seem like AF is causing too many timeouts, aside from a couple of not-so-common queries on abuse_filter_log. Normalization is always great, although as for every schema change this one would take some time to be merged and deployed.
Thanks @mmodell! Now we have plenty of time to understand how to get this working. However, I fear that our only possibility to keep the cast is to FORCE the index (and I still don't know if it would work). Other than that, I guess we're left with the option to keep the Postgres incompatibility for now and jump straight to T220791. That one would also normalize the schema a bit (so that Postgres won't be the only DBMS to benefit from it), but it won't be quick to do.
@matej_suchanek Ah, so the index isn't used because of the cast? That's interesting, I'll do some experiments. However, what mostly confused me is that the DB engine performs the cast implicitly all the same, and I wouldn't expect a different performance. Probably there's some difference in how such cast is performed. I'll follow up with an update after having analyzed the EXPLAIN more thoroughly.
Indeed, I echo Urbanecm. Giving high priority because a refresh fixes it and the code already looks correct. If you think it should be UBN, feel free to escalate it.
Train blockers are UBN, and this is a TB because it's impossible to open Special:AbuseLog or even use the API (plus hotfix coming).
Well, looking at the message and at this line of code, it seems like it used to read"Kogo" instead of "Logo", although there's no sign of this typo in wmf.25, nor in master.
Although I found it on itwiki where we're always at 1.33-wmf.25...
ConfigException from line 53 of /srv/mediawiki/php-1.33.0-wmf.25/includes/config/GlobalVarConfig.php: GlobalVarConfig::get: undefined option: 'Kogo'
@DannyS712 Thanks for the help! There are some good explanations on the web, just searching "code coverage" on google yields something interesting. Basically, code coverage is the amount of code which gets executed during tests.
Apr 17 2019
@mobrovac Yes, it is, nothing has changed on this side since 2011. Since the plan is to fix the DB storage format and add new back-compat code, we can also start adding the utf-8 flag as part of T213006. The point is: is adding the flag enough?