Page MenuHomePhabricator

Tagging file uploads with change tags is difficult bordering on impossible
Closed, ResolvedPublic


Tagging file upload with change tags is difficult bordering on impossible.

  • Uploading code in UploadBase really sucks and doesn't let the user get the log entry and revision id that it creates.
  • Unlike for all other page creations, there's no revision entry inserted into RC, but the upload log entry.
  • Both the revision, log and RC entry have to be tagged.

Related Objects


Event Timeline

matmarex created this task.Dec 18 2015, 4:43 PM
matmarex raised the priority of this task from to Needs Triage.
matmarex updated the task description. (Show Details)
matmarex added a subscriber: matmarex.
Restricted Application added a project: Multimedia. · View Herald TranscriptDec 18 2015, 4:43 PM
Restricted Application added subscribers: StudiesWorld, Steinsplitter, TTO, Aklapper. · View Herald Transcript
Restricted Application added a subscriber: Matanya. · View Herald TranscriptDec 18 2015, 4:44 PM
matmarex added a subscriber: aaron.Dec 18 2015, 8:58 PM

I spent some time digging into this code, starting with UploadBase::performUpload() (which seems to be a main entry point into the whole mess).

The ids are constructed and inserted a few levels deeper in the call stack: UploadBase::performUpload() → LocalFile::upload() → LocalFile::recordUpload2(). Unfortunately, the rev_id and rc_id are inserted inside an 'onTransactionIdle' callback, and I have no idea when that code actually runs.

For cross-wiki uploads (T121873), which are comparatively rare, I implemented tagging as a hack in WikimediaEvents (as of d72dc255ea16acae10df657dd661fe020e72e1d3). It runs on the UploadComplete hook and simply queries the database for the last rev_id and log_id associated with the given title (which LocalFile can provide), in yet another deferred update, hoping that this will be the one that was just created. This feel very brittle and wasteful (we need three queries to master to get the ids we should already know), and it's inconvenient that it has to be deferred (I'm not sure if that even guarantees that it'll work correctly, but it seems to work).

So I suppose we could do one of three things:

  • Pass the tag(s) to add to UploadBase::performUpload() and all the way down to whenever we know the log_id and rev_id. This doesn't seem very neat…
  • Defer change tag updates in UploadBase::performUpload(), and acquire the ids with additional queries, as in my hack. This is probably quite wasteful and not good enough to do for all uploads.
  • Change the code not to defer inserting the DB rows, and to return all the ids it just inserted into the database as part of the Status that UploadBase::performUpload() returns. No idea if this is possible.

@aaron, you rewrote recordUpload2() quite a bit recently. Thoughts?

matmarex claimed this task.Dec 18 2015, 9:39 PM

[22:06] <AaronSchulz> RC rows are deferred to avoid contention and wasting time blocking the request. I wouldn't want to change that. It seems like passing the tags "all the way down" is the best in way in theory...though the current code probably makes that uglier than it should be.

matmarex triaged this task as Normal priority.Dec 18 2015, 9:45 PM

Change 265620 had a related patch set uploaded (by Bartosz Dziewoński):
Make it possible to tag new file uploads without messy queries

Change 265620 merged by jenkins-bot:
Make it possible to tag new file uploads without messy queries

matmarex closed this task as Resolved.Jan 25 2016, 9:01 PM
matmarex removed a project: Patch-For-Review.
matmarex set Security to None.

Not pretty, but it'll do.