Page MenuHomePhabricator

Tagging file uploads with change tags is difficult bordering on impossible
Closed, ResolvedPublic

Description

Tagging file upload with change tags is difficult bordering on impossible.

  • Uploading code in UploadBase really sucks and doesn't let the user get the log entry and revision id that it creates.
  • Unlike for all other page creations, there's no revision entry inserted into RC, but the upload log entry.
  • Both the revision, log and RC entry have to be tagged.

Related Objects

StatusSubtypeAssignedTask
OpenNone
OpenNone
ResolvedNone
Resolvedmatthiasmullie
Resolvedmatmarex
DuplicateFeatureNone
DuplicateNone
DuplicateNone
ResolvedSamwilson
OpenNone
Resolvedmatthiasmullie
Resolvedmatmarex
OpenNone
Resolvedmatmarex
DeclinedFeatureNone
DuplicateNone
Resolvedmatmarex
Resolvedmatmarex

Event Timeline

matmarex raised the priority of this task from to Needs Triage.
matmarex updated the task description. (Show Details)
matmarex subscribed.
Restricted Application added subscribers: StudiesWorld, Steinsplitter, TTO, Aklapper. · View Herald Transcript

I spent some time digging into this code, starting with UploadBase::performUpload() (which seems to be a main entry point into the whole mess).

The ids are constructed and inserted a few levels deeper in the call stack: UploadBase::performUpload() → LocalFile::upload() → LocalFile::recordUpload2(). Unfortunately, the rev_id and rc_id are inserted inside an 'onTransactionIdle' callback, and I have no idea when that code actually runs.

For cross-wiki uploads (T121873), which are comparatively rare, I implemented tagging as a hack in WikimediaEvents (as of d72dc255ea16acae10df657dd661fe020e72e1d3). It runs on the UploadComplete hook and simply queries the database for the last rev_id and log_id associated with the given title (which LocalFile can provide), in yet another deferred update, hoping that this will be the one that was just created. This feel very brittle and wasteful (we need three queries to master to get the ids we should already know), and it's inconvenient that it has to be deferred (I'm not sure if that even guarantees that it'll work correctly, but it seems to work).

So I suppose we could do one of three things:

  • Pass the tag(s) to add to UploadBase::performUpload() and all the way down to whenever we know the log_id and rev_id. This doesn't seem very neat…
  • Defer change tag updates in UploadBase::performUpload(), and acquire the ids with additional queries, as in my hack. This is probably quite wasteful and not good enough to do for all uploads.
  • Change the code not to defer inserting the DB rows, and to return all the ids it just inserted into the database as part of the Status that UploadBase::performUpload() returns. No idea if this is possible.

@aaron, you rewrote recordUpload2() quite a bit recently. Thoughts?

[22:06] <AaronSchulz> RC rows are deferred to avoid contention and wasting time blocking the request. I wouldn't want to change that. It seems like passing the tags "all the way down" is the best in way in theory...though the current code probably makes that uglier than it should be.

matmarex triaged this task as Medium priority.Dec 18 2015, 9:45 PM

Change 265620 had a related patch set uploaded (by Bartosz Dziewoński):
Make it possible to tag new file uploads without messy queries

https://gerrit.wikimedia.org/r/265620

Change 265620 merged by jenkins-bot:
Make it possible to tag new file uploads without messy queries

https://gerrit.wikimedia.org/r/265620

matmarex removed a project: Patch-For-Review.
matmarex set Security to None.

Not pretty, but it'll do.