Page MenuHomePhabricator

Source wiki editing and deletion always fails
Closed, ResolvedPublic8 Estimated Story Points

Description

Status

The source wiki edit and delete features are currently enabled on production, since ~ 11:30 UTC, September 18.

The feature was first enabled and available to our users from roughly 13:00 UTC on July 22 until 13:30 UTC on July 24.

Demo instructions

  • Import a file to commonswiki, as a user without admin privileges (but you need to be confirmed) on the source wiki. Enable "mark source file". Check for the NowCommons template prepended to the source file, in your name.
  • Import a file, as a user with admin privileges on the source wiki. Delete the file. Verify that the original was deleted.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
WMDE-leszek updated the task description. (Show Details)Jul 24 2019, 1:06 PM
WMDE-leszek set the point value for this task to 8.
awight updated the task description. (Show Details)Jul 24 2019, 11:58 PM

One thing I would check is whether this could be caused by wiki configuration for Title#getFullURL. That function requires special flags to return a valid URL rather than relative protocol string, and in T228846 we learned that the beta cluster config gives a different result than production's.

Another angle: can use the feature to import from test2 to testwiki, and recreate the failure in the more productiony environment?

Change 536590 had a related patch set uploaded (by Awight; owner: Awight):
[operations/mediawiki-config@master] Enable source wiki editing for testwiki

https://gerrit.wikimedia.org/r/536590

Once the config patch is deployed (hopefully in next Monday's SWAT), we can use testwiki and test2wiki to experiment with the feature.

Demo file:
https://test.wikipedia.org/wiki/Special:ImportFile?clientUrl=https%3A%2F%2Ftest2.wikipedia.org%2Fwiki%2FFile%3A1471-2156-10-59-1.jpg&importSource=FileExporter

Here's (roughly) a single timeline from one of the failed imports:

All I can make out is that we got an HTTP 403 status from the RemoteApiRequestExecutor. The centralauth errors seem to come afterwards and are probably unrelated. I'll see if there are corresponding logs on the source wiki, huwiki.

The only thing that might be related is this entry from 11:42:46, but it looks healthy. This token ID doesn't show up elsewhere.

Expected key centralauth:api-token:0660b3666423e8445f4c51929faaa5d192f2b6 found.

Nothing in the logs around that time matches the "403", surprisingly. Is it possible that we didn't hit the huwiki API?

The exception is more interesting than I'd originally noticed. The inner exception has the strange message "<Error, collected 1 error(s) on the way, integer value set>", which might be specific to a certain code path. However this is happening, we should unpack the exception to show actual errors.

The exception is more interesting than I'd originally noticed. The inner exception has the strange message "<Error, collected 1 error(s) on the way, integer value set>", which might be specific to a certain code path. However this is happening, we should unpack the exception to show actual errors.

False alarm. This pretty table and the message are part of Status serialization, all it means is that we got a 403.

Change 536811 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/FileImporter@master] Add debug logging for remote API failure

https://gerrit.wikimedia.org/r/536811

Re-reading the task description, I was wrong about the "failed to do remote request" string not appearing. Maybe the search was defeated by a missing whitespace before "failed" (which I've fixed in the above patch). I'll still add some logging to catch the json_decode failure, in case it ever happens.

Change 536974 had a related patch set uploaded (by Awight; owner: Awight):
[operations/mediawiki-config@master] Enable File Importer source wiki edits on beta cluster

https://gerrit.wikimedia.org/r/536974

Change 536811 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Add debug logging for remote API failures

https://gerrit.wikimedia.org/r/536811

Change 536980 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/FileImporter@wmf/1.34.0-wmf.22] Add debug logging for remote API failures

https://gerrit.wikimedia.org/r/536980

Change 536980 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@wmf/1.34.0-wmf.22] Add debug logging for remote API failures

https://gerrit.wikimedia.org/r/536980

Change 536590 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable source wiki editing for testwiki

https://gerrit.wikimedia.org/r/536590

Mentioned in SAL (#wikimedia-operations) [2019-09-16T11:10:09Z] <awight@deploy1001> Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: [[gerrit:536980|Add debug logging for remote API failures (T228851)]] (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2019-09-16T11:14:37Z] <awight@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:536590|Enable source wiki editing for testwiki (T228851)]] (duration: 01m 02s)

Change 536974 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable File Importer source wiki edits on beta cluster

https://gerrit.wikimedia.org/r/536974

Mentioned in SAL (#wikimedia-operations) [2019-09-16T11:21:22Z] <awight@deploy1001> Synchronized wmf-config/CommonSettings-labs.php: SWAT: [[gerrit:536974|Enable File Importer source wiki edits on beta cluster (T228851)]] (duration: 01m 03s)

The good news is that our warning text appears correctly, after a failure:

awight added a comment.EditedSep 16 2019, 12:03 PM

Here's the error logged by testwiki, at 2019-09-16T11:36:28

FileImporter\Remote\MediaWiki\RemoteApiRequestExecutor::doRequest failed to do remote request to test2.wikipedia.orgwith params {"action":"delete","format":"json","token":"765af166a00d33281e602d00137dadfb5d7f73bc+\\","title":"File:038pro01.jpg","reason":"This file is now on Wikipedia as \/\/test.wikipedia.org\/wiki\/File:038pro01.jpg (moved with FileImporter)."}: <Error, collected 1 error(s) on the way, integer value set>

I wish we were logging the centralauthtoken parameter on the URL. <-- TODO

test2wiki said:

/w/api.php?action=query&meta=tokens&format=json&centralauthtoken=bf456fe9262308daee03b3df9aa9700c90cfa7
Expected key centralauth:api-token:bf456fe9262308daee03b3df9aa9700c90cfa7 found.

In two hours or so, once the refinery job has imported webrequests, we can see the API calls with more granularity:

select * from webrequest
where
    uri_host = 'test2.wikipedia.org'
    and year = 2019
    and month = 9
    and day = 16
    and hour = 11;

This is the job which will import webrequest records for the hour above: https://hue.wikimedia.org/hue/jobbrowser/#!id=job_1564562750409_191782

I was able to verify that our delete request hit test2, which eliminates the theory that we were attempting to POST to a relative URL.

hostname sequence dt time_firstbyte ip cache_status http_status response_size http_method uri_host uri_path uri_query content_type referer x_forwarded_for user_agent accept_language x_analytics range is_pageview record_version client_ip geocoded_data x_cache user_agent_map x_analytics_map ts access_method agent_type is_zero referer_class normalized_host pageview_info page_id namespace_id tags isp_data accept webrequest_source year month day hour
cp1085.eqiad.wmnet 1127980291 2019-09-16T11:36:28Z 6.9E-5 2620:0:861:1:208:80:154:49 int-front 403 1941 POST test2.wikipedia.org /w/api.php ?centralauthtoken=841d5656c61ad03a8cdda72ca1930be090cfa7 text/html; charset=utf-8 - NULL mw-ext-FileImporter/* (https://www.mediawiki.org/wiki/Extension:FileImporter) - - - false 0.0.21 2620:0:861:1:208:80:154:49 {"city":"Unknown","subdivision":"Unknown","longitude":"-1.0","timezone":"Unknown","country_code":"--","country":"Unknown","latitude":"-1.0","continent":"Unknown","postal_code":"Unknown"} cp1085 int {"browser_family":"Other","os_major":"-","wmf_app_version":"-","browser_major":"-","os_minor":"-","os_family":"Other","device_family":"Other"} NULL 2019-09-16 11:36:28.0 desktop spider NULL none {"project_class":"wikipedia","project":"test2","qualifiers":[],"tld":"org","project_family":"wikipedia"} NULL NULL NULL [] {"organization":"Unknown","autonomous_system_organization":"Unknown","isp":"Unknown","autonomous_system_number":"-1"} - text 2019 9 16 11

The centralauthtoken is fresh, and isn't used by any other requests.

We can squeeze a little bit more information out with this request, once the data lands in Hadoop:

select * from event.mediawiki_api_request
where
        meta.domain = 'test2.wikipedia.org'
        and year = 2019
        and month = 9
        and day = 16
        and hour = 11;

Strange discovery: the event.mediawiki_api_request log does *not* include our attempted delete action. This means that ApiMain#logRequest is never called, which feels like a clue about what level is emitting the 403. logRequest should be called even if the API call results in an exception. The only ApiMain code which can respond with 403 and not log a request is the CORS check, but that should only be triggered if we include an origin parameter with our request (we do not).

Most of the errors we might suspect here are handled inside of a try block which results in logRequest, for example when:

  • token parameter is required but missing
  • invalid token
  • incorrect user permissions to perform a delete
  • attempt to delete a nonexistent file

So we're looking for something that could fail us earlier than the ApiMain#executeAction.

Change 537323 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/FileImporter@master] Log response headers and actual URL for remote API failures

https://gerrit.wikimedia.org/r/537323

Varnish will reject insecure POST requests using http rather than https. The response status will be 403.

Meanwhile, I realized that we're getting 301 redirects for our other, successful requests. This is almost certainly because we're requesting the insecure URL and being redirected to https, which cannot be done for POST requests.

The "http" theory is gaining steam: in the debug logs, I discovered this line:

2019-09-17 08:29:37 [XYCZbwpAIEIAAHczmr4AAACO] mw1329 testwiki 1.34.0-wmf.22 FileImporter DEBUG: Got cross-wiki, authorized API URL: //test2.wikipedia.org/w/api.php?centralauthtoken=<redacted>

Although we began the import with an https URL, it was converted to relative protocol along the way.

Change 537338 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/FileImporter@master] Use https rather than protcol-relative remote API URLs

https://gerrit.wikimedia.org/r/537338

Change 537345 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/FileImporter@wmf/1.34.0-wmf.22] Use https rather than protcol-relative remote API URLs

https://gerrit.wikimedia.org/r/537345

Change 537338 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Use https rather than protcol-relative remote API URLs

https://gerrit.wikimedia.org/r/537338

Change 537350 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Use wfExpandUrl() instead of hard-coded strings

https://gerrit.wikimedia.org/r/537350

Change 537323 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Log response headers and actual URL for remote API failures

https://gerrit.wikimedia.org/r/537323

Change 537350 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Use wfExpandUrl() instead of hard-coded strings

https://gerrit.wikimedia.org/r/537350

Note to future self: once we burn-in this feature a bit, we'll want to enable it on production, and preferably well in advance of the "default" deployments.

Change 537345 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@wmf/1.34.0-wmf.22] Use https rather than protcol-relative remote API URLs

https://gerrit.wikimedia.org/r/537345

Mentioned in SAL (#wikimedia-operations) [2019-09-17T11:27:23Z] <awight@deploy1001> Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: [[gerrit:537345|Use https rather than protcol-relative remote API URLs (T228851)]] (duration: 00m 58s)

awight added a comment.EditedSep 17 2019, 11:53 AM

Hooray: source wiki deletion works! https://test2.wikipedia.org/wiki/Special:Undelete/File:1471-2156-10-59-1.jpg

Source wiki editing will take some more preparation to verify, I think the sitelinks need to be set up on test.wikidata.org, and both $wgFileImporterWikidataEntityEndpoint and $wgFileImporterWikidataNowCommonsEntity configured for testwiki, to link to a template on test2wiki.

Change 537375 had a related patch set uploaded (by Awight; owner: Awight):
[operations/mediawiki-config@master] NowCommons test & test2wiki configuration

https://gerrit.wikimedia.org/r/537375

Final test configuration scheduled for tomorrow's SWAT.

Change 537586 had a related patch set uploaded (by Awight; owner: Awight):
[operations/mediawiki-config@master] Enable FileImport source wiki editing

https://gerrit.wikimedia.org/r/537586

Change 537375 merged by jenkins-bot:
[operations/mediawiki-config@master] NowCommons test & test2wiki configuration

https://gerrit.wikimedia.org/r/537375

Mentioned in SAL (#wikimedia-operations) [2019-09-18T11:14:55Z] <awight@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:537375|NowCommons test & test2wiki configuration (T228851)]] (duration: 01m 15s)

Change 537586 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable FileImport source wiki editing

https://gerrit.wikimedia.org/r/537586

Mentioned in SAL (#wikimedia-operations) [2019-09-18T11:25:52Z] <awight@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:537586|Enable FileImport source wiki editing (T228851)]] (duration: 01m 03s)

Mentioned in SAL (#wikimedia-operations) [2019-09-18T11:27:02Z] <awight@deploy1001> Synchronized wmf-config/CommonSettings-labs.php: SWAT: [[gerrit:537586|Enable FileImport source wiki editing (T228851)]] (duration: 00m 59s)

Monitoring

awight updated the task description. (Show Details)Sep 18 2019, 11:46 AM

Mentioned in SAL (#wikimedia-operations) [2019-09-18T11:25:52Z] <awight@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:537586|Enable FileImport source wiki editing (T228851)]] (duration: 01m 03s)

This means: the feature is enabled in production!

awight claimed this task.Sep 18 2019, 11:47 AM
thiemowmde closed this task as Resolved.Sep 23 2019, 8:46 AM
thiemowmde added a subscriber: thiemowmde.

Since this was not a new feature but a bugfix, I believe this doesn't need separate PM approval.

Runing a local multiwiki, I believe that our PROTO_HTTPS assumption makes import impossible when the target wiki is HTTP-only. Lines like this convert protocol-relative links to https:

wfExpandUrl( $api, PROTO_HTTPS )

The workaround seems harmless, the target wiki needs to specify the protocol in $wgServer, but this requirement should be documented on the extension page.

Uh, this is a bee nest I don't want to mess with. See T118413: Wikimedia wikis should use https:// in $wgServer.

I also wonder if this comment is misplaced, as this ticket was closed long ago?

Uh, this is a bee nest I don't want to mess with. See T118413: Wikimedia wikis should use https:// in $wgServer.

I also wonder if this comment is misplaced, as this ticket was closed long ago?

Misplaced considering that my conclusion is a suggested edit to mw:Extension:FileImporter. I think you're saying that the implementation is a bee's nest, but do you agree that the workaround makes sense? If a site is http-only, it seems like protocol-relative $wgServer shouldn't serve any real purpose and it's okay to suggest hardcoding the protocol in configuration.

I don't know. Considering the discussion in T118413 I would not encourage anybody to change $wgServer. We can't guarantee this doesn't cause trouble. What we can do in the FileImporter documentation is to document a protocol-relative configuration on a source wiki as a possible source of errors. Maybe some kind of FAQ or "Possible errors" section?