|mediawiki/core : master||RELEASE-NOTES-1.27: Add a note about file upload patrolling|
|mediawiki/core : master||Allow patrol of uploads|
- Mentioned In
  - T124205: Error: 2013 Lost connection to MySQL server during query on NewFilesPager
  - T127848: Should be possible to associate a revision to a log entry without making it unpatrolled
  - T120867: Large amounts of unwanted files (mostly copyvios) uploaded via cross-wiki upload tool (A/B test of different upload interfaces)
  - T70936: rc_this_oldid should contain the revision IDs of null revisions created by log events
  - T98617: Make page moves patrollable
- Mentioned Here
  - T86611: API does not fail gracefully when data is too large
  - T124205: Error: 2013 Lost connection to MySQL server during query on NewFilesPager
  - T98617: Make page moves patrollable
What's the progress on this?
This would be of great help on Commons. Currently we do have a manual way:
But it's basically a lot of double work: while wandering around on Commons I come across people's uploads, and being able to patrol those would reduce the backlogs.
However, the old manual way on the above page would mean I'd have to do an entire day part.
Hopefully, with patrol functionality on uploads, this can be sorted out and organised much better (e.g. by filtering out patrolled uploads).
I assume this change would mean that anyone with the autopatrol right (bots, autopatrollers, sysops) is of course autopatrolled for uploads as well.
After a chat on IRC we came to the conclusion that this should actually be
working right now (partially).
There are basically two options:
- Make the upload action patrollable. This means that theoretically an upload
without a description page is patrollable; it also means that re-uploads are
patrollable and that it needs to be integrated into the patrol model as
something new (as Rob suggests above).
- The creation of file description pages is currently only patrollable if it was
not the result of an upload (i.e. accidental page creation, or for example a
local Wikipedia file page for a Commons image, such as
). This functionality could be extended to cover all file description pages instead of only the ones not created when uploading.
That the latter option is not already the case seems like something of a bug, since
it means a lot of page creations are not patrollable. On the other hand, I
think it's not wise to implement both, because that would cause double work.
Created attachment 7794
rough patch for patrolling uploads
Rough patch to make uploads patrollable.
I haven't tested this very well. Some issues that come to mind:
* It generates patrol log entries of the form "User:foo patrolled revision 0 of file:some_image.png" (probably not too difficult to fix).
* It re-uses a bunch of the new page patrol stuff, so the interface seems to suggest you're patrolling a page instead of a file.
* And the big one: it only works if you follow links from RC (this is the case for new page patrol too). Krinkle pointed out that most people wouldn't do new image patrolling from Special:RecentChanges.
I'll check the patch out tonight. A few things come to mind which may or may not be taken care of automatically by MediaWiki if the RC entry is unpatrolled.
- Ability in the API to show only recent uploads (perhaps by adding a way to specify rc_log_type and rc_log_action?), including showing only recent unpatrolled uploads.
- On Special:RecentChanges, upload log entries should show a red exclamation mark if they are unpatrolled, and the patrolled ones should be hidden when clicking "Hide patrolled edits" (which may need rephrasing, as it hasn't shown just edits for a long time: log entries have been part of RC for quite some time).
- NewPage patrolling is rarely done from RecentChanges (it's impractical and doesn't provide the filtering options needed); there is Special:NewPages for that, with options to toggle "logged-in users, patrolled edits and bots". The same goes for files: patrolling files from RecentChanges will most likely not be done, as it doesn't give the info needed (size, thumbnail, etc.). Special:NewFiles is perfectly suited for this. These four toggle options have to be added to Special:NewFiles as well.
- When viewing a diff whose right side is an upload (i.e. a re-upload / overwrite), it should show a [mark as patrolled] link in the diff frame just like it does for edits (this was done for edits in diffs in r24607).
- Edits are mostly not new pages, but the opposite holds for uploads, where the majority are 'new' and not 're-uploads'. That means bug 15936-like situations (where going to the oldid of the first revision of a file would not show the [mark] link) are unacceptable in my opinion; without it, it's pretty useless.
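The filtering asked for in the first two points (only upload log entries, optionally only unpatrolled ones) can be sketched roughly as below. This is a minimal illustration, not MediaWiki code: the `RecentChange` class and the `filter_log_entries` function are hypothetical names, and only the field names (rc_type, rc_log_type, rc_log_action, rc_patrolled) are taken from this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

RC_LOG = 3  # rc_type value for log entries (assumed here for illustration)


@dataclass
class RecentChange:
    """Hypothetical in-memory stand-in for one recentchanges row."""
    rc_title: str
    rc_type: int
    rc_log_type: Optional[str]
    rc_log_action: Optional[str]
    rc_patrolled: bool


def filter_log_entries(
    changes: Iterable[RecentChange],
    log_type: str,
    log_action: Optional[str] = None,
    hide_patrolled: bool = False,
) -> List[RecentChange]:
    """Keep only log entries of one type, optionally one action, optionally unpatrolled."""
    result = []
    for rc in changes:
        if rc.rc_type != RC_LOG or rc.rc_log_type != log_type:
            continue  # not a log entry of the requested type
        if log_action is not None and rc.rc_log_action != log_action:
            continue  # wrong action, e.g. 'overwrite' when 'upload' was requested
        if hide_patrolled and rc.rc_patrolled:
            continue  # the "Hide patrolled" toggle
        result.append(rc)
    return result
```

With `hide_patrolled=True` this corresponds to the "only recent unpatrolled uploads" case; a real implementation would push these conditions into the database query instead of filtering in memory.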
Oh, and before I forget:
I was thinking about a variable like $wgUseLogPatrol instead of $wgUserUploadPatrol,
which would be either a boolean or an array. As a boolean it disables or enables patrolling for every log type; as an array it enables it only for some,
e.g. $wgUseLogPatrol = array( 'upload' => true, 'move' => true );
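To make the proposed boolean-or-array semantics concrete, here is a small sketch of how such a setting could be resolved for a given log type. It is written in Python purely for illustration; `is_log_patrol_enabled` is a hypothetical helper, $wgUseLogPatrol is only a proposal in this comment, and treating log types missing from the array as disabled is an assumption.

```python
from typing import Mapping, Union

# Either one switch for all log types, or a per-log-type map.
LogPatrolSetting = Union[bool, Mapping[str, bool]]


def is_log_patrol_enabled(setting: LogPatrolSetting, log_type: str) -> bool:
    """Resolve a boolean-or-map setting for one log type."""
    if isinstance(setting, bool):
        # Boolean form: applies to every log type.
        return setting
    # Map form: only listed log types are enabled (assumption: unlisted means off).
    return bool(setting.get(log_type, False))
```

For example, `is_log_patrol_enabled({"upload": True, "move": True}, "delete")` evaluates to `False`, while `is_log_patrol_enabled(True, "delete")` evaluates to `True`.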
(In reply to comment #8)
- On Special:RecentChanges the upload log entries should show a red exclamation
mark if it's unpatrolled, and the patrolled ones should be hidden when clicking
I did that as part of the patch (and also made the b marker for bot uploads work).
- Just like NewPage patrolling is rarely done from RecentChanges (as it's
unpractical and doesn't provide the filtering options needed) - there is
Special:NewPages for that with options to toggle "logged-in users, patrolled
edits and bots". The same is the case for files, patrolling files from
RecentChanges will most likely not be done as it doesn't give the info needed
(size, thumbnail, etc.). Special:NewFiles is suited for this perfectly. These
four toggle options have to be added to Special:NewFiles aswell.
That might be an issue, as Special:NewImages uses the image table, not the recentchanges table. (Special:NewImages also does something rather odd when filtering bots that seems inefficient: it checks whether the uploader is currently a "bot", not whether they were a bot when the image was uploaded. But I'm not all that well versed in DB efficiency.) There don't seem to be any indexes on the needed fields in recentchanges, so filtering RC down to only uploads might be inefficient. (Again, I don't really understand the intricacies of DB efficiency, so take what I say here with a grain of salt.)
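The difference between the two bot checks can be shown with a toy example. This is only a sketch with made-up data and hypothetical function names; it assumes a per-upload bot flag recorded at upload time (named rc_bot here) is available alongside the current group membership.

```python
from typing import Dict, List, Set

Upload = Dict[str, object]


def hide_bots_by_current_group(uploads: List[Upload], current_groups: Dict[str, Set[str]]) -> List[Upload]:
    """Hide uploads whose uploader is in the 'bot' group right now."""
    return [u for u in uploads if "bot" not in current_groups.get(str(u["user"]), set())]


def hide_bots_by_recorded_flag(uploads: List[Upload]) -> List[Upload]:
    """Hide uploads that were flagged as bot actions when they were made."""
    return [u for u in uploads if not u["rc_bot"]]


uploads = [
    {"name": "A.png", "user": "ExBot", "rc_bot": True},   # made while ExBot had the bot flag
    {"name": "B.png", "user": "Alice", "rc_bot": False},
]
# ExBot has since lost the bot flag.
current_groups = {"ExBot": set(), "Alice": set()}

shown_by_group = [u["name"] for u in hide_bots_by_current_group(uploads, current_groups)]
shown_by_flag = [u["name"] for u in hide_bots_by_recorded_flag(uploads)]
```

Here the current-group check still shows A.png even though it was a bot upload, while the recorded flag hides it.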
- Although edits are mostly not new pages, the contrary is with uploads where
the majority are 'new' and not 're-uploads'. Which means bug 15936 -like
situations (where going to the oldid of the first revision of a file would
-not- show the [mark] link) are unacceptable in my opinion, without it it's
I don't think that'd be easy to do (at least not the way I did it in the patch above), since it's patrolling the log actions, not edits, so the oldid (revision ID) has no relation to what we're patrolling.
The other issues you mention are also things that generally apply and need to be worked on.
What is in place is this:
- A generic database implementation for patrolling recent changes entries (any entry, be it RC_EDIT, RC_NEW page or RC_LOG). RC_LOG includes upload actions, so this doesn't need anything for uploads; it is already generically in place. rc_patrolled is toggleable for log actions in theory, and uploads are logged and in recentchanges.
- A front-end for patrolling edits
  - Configurable: Through $wgUseRCPatrol // enabled in mw by default, disabled on most wmf wikis, enabled on nl.wikipedia, commons.wikimedia and dozens of others
  - List: Special:RecentChanges with "Hide patrolled edits"
  - Patrol-interface: On the diff pages with "[mark as patrolled]"
- A front-end for patrolling new pages
  - Configurable: Through $wgUseNPPatrol // enabled in mw by default
  - List: Special:NewPages with "Hide patrolled edits"
  - Patrol-interface: At the bottom when viewing a page, with "[mark as patrolled]" (though subject to bug 15936, which causes the link only to be there when visiting the page from Special:NewPages)
What we need is a front-end for patrolling uploads.
- Configurable: Of course
- List: We already have two file-list special pages (though plans to merge them exist afaik), so it would only need a way to indicate the patrol mark and a way to exclude patrolled files from the view.
- Patrol interface: I would like it to be given first-class treatment like for edits (not like with new pages, where it is just dumped at the bottom through the ugly hack of passing it through a query parameter, making it not really related to the page you're looking at).
And unlike edits and new pages, we need something special for uploads, because uploads are log actions rather than regular edits. Right now those are not patrollable because Log sets rc_patrolled to false by default.
And then there is the issue of uploads (usually?) causing two events: a new page creation and a log event, though the new page creation is usually not emitted afaik.
Thank you so much for implementing this feature. The query appears to be a little slow. I repeatedly got
A database query error has occurred. This may indicate a bug in the software. Function: IndexPager::buildQueryInfo (NewFilesPager) Error: 2013 Lost connection to MySQL server during query (10.64.48.19)
Any idea how to speed it up?
Note, the page may be a little faster when "Show bots" is checked - https://commons.wikimedia.org/wiki/Special:NewFiles?showbots=1&hidepatrolled=1&limit=50&offset=
Some things I notice about the query. Locally it is:
SELECT /* IndexPager::buildQueryInfo (NewFilesPager) Bawolff */ * FROM `image` LEFT JOIN `user_groups` ON (ug_group = 'bot' AND (ug_user = img_user)) INNER JOIN `recentchanges` ON ((rc_title = img_name) AND (rc_user = img_user) AND (rc_timestamp = img_timestamp)) WHERE (ug_group IS NULL) AND rc_type = '3' AND rc_log_type = 'upload' AND rc_patrolled = '0' ORDER BY img_timestamp DESC LIMIT 51
Using * as the field list means that every column is returned. On some files img_metadata is huge (up to 16 MB). In the worst case this query could return more than 800 MB.
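The "more than 800 MB" figure follows from the LIMIT 51 in the query combined with the 16 MB upper bound on img_metadata mentioned above. A quick check of that arithmetic (the function name is just for illustration, and other columns are ignored as negligible):

```python
def worst_case_result_mb(row_limit: int, max_metadata_mb: int) -> int:
    """Upper bound on result size if every returned row carries maximal metadata."""
    return row_limit * max_metadata_mb


print(worst_case_result_mb(51, 16))  # 816
```

Listing only the columns the pager actually needs, instead of *, would avoid transferring img_metadata at all.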
Second, running EXPLAIN on Tool Labs suggests that this filesorts the entire image table on enwiki, and filesorts the recentchanges table on commonswiki, which is bad.
Maybe we could try to make it at least use the new_name_timestamp index on recentchanges (since for this query rc_new = 0 and rc_namespace = 6, and we're ordering by rc_timestamp).
Otherwise I think we need to add some new indexes (on rc_type, rc_patrolled, rc_timestamp I guess? Or maybe even move rc_patrolled into the image table).
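As a rough illustration of why an index on (rc_type, rc_patrolled, rc_timestamp) could help, the sketch below compares query plans with and without it. Important assumptions: it uses SQLite via Python's sqlite3 module rather than MySQL, so the planner behaves differently from production; the table is a cut-down stand-in for recentchanges; and the index name rc_type_patrolled_timestamp is hypothetical.

```python
import sqlite3


def explain_unpatrolled_uploads(with_index: bool) -> str:
    """Return SQLite's query plan for listing recent unpatrolled log entries."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the recentchanges table (only the relevant columns).
    conn.execute(
        """CREATE TABLE recentchanges (
               rc_id INTEGER PRIMARY KEY,
               rc_timestamp TEXT NOT NULL,
               rc_title TEXT NOT NULL,
               rc_type INTEGER NOT NULL,
               rc_log_type TEXT,
               rc_patrolled INTEGER NOT NULL DEFAULT 0
           )"""
    )
    if with_index:
        # Hypothetical index: equality columns first, sort column last.
        conn.execute(
            "CREATE INDEX rc_type_patrolled_timestamp "
            "ON recentchanges (rc_type, rc_patrolled, rc_timestamp)"
        )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT rc_title, rc_timestamp FROM recentchanges "
        "WHERE rc_type = 3 AND rc_patrolled = 0 "
        "ORDER BY rc_timestamp DESC LIMIT 51"
    ).fetchall()
    conn.close()
    # The last column of each plan row holds the human-readable detail.
    return "; ".join(row[3] for row in rows)


plan_without = explain_unpatrolled_uploads(with_index=False)
plan_with = explain_unpatrolled_uploads(with_index=True)
```

Without the index, SQLite has to scan the table and sort the matches in a temporary B-tree (its rough equivalent of a filesort); with it, the rows can be read from the index already in timestamp order. Whether MySQL would pick such an index for the full joined query would still need checking with EXPLAIN on the real schema.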