Make file uploads patrollable
Closed, ResolvedPublic

Authored By

	Ciell
	Apr 5 2007, 9:07 AM

Description

New articles are yellow in the list
(http://nl.wikipedia.org/wiki/Speciaal:Newpages) until they are marked as
patrolled; can this be done for new uploads as well?

URL: http://nl.wikipedia.org/wiki/Speciaal:Newimages

Details

Reference: bz9501

	Subject	Repo	Branch	Lines +/-
	RELEASE-NOTES-1.27: Add a note about file upload patrolling	mediawiki/core	master	+3 -0
	Allow patrol of uploads	mediawiki/core	master	+145 -12


Related Objects

Status	Subtype	Assigned	Task
Resolved		GTrang	T72518 Tags for page moves should apply to auto-generated edits
Open		None	T70950 rc_this_oldid (resp. log_params) should contain revision ID of null revision (resp. redirect) created by moves
Resolved		Cenarium	T127852 Publish of recent change for a log entry should be deferred
Resolved		Cenarium	T127848 Should be possible to associate a revision to a log entry without making it unpatrolled
Open		None	T70936 rc_this_oldid should contain the revision IDs of null revisions created by log events
Open		None	T134802 Improve the curator workflow for reviewing new files
Open		None	T120453 Copyright violation detection tool for Commons
Resolved		None	T121870 Improve Special:NewFiles
Resolved		Cenarium	T11501 Make file uploads patrollable
Resolved		Cenarium	T122089 Log entries and associated page revisions are not actually associated in the database
Duplicate	Feature	None	T31793 Check uploaded images with Google image search to find copyright violations
Duplicate		None	T123517 Automatically check Commons uploads for possible copyright violations
Duplicate		None	T230561 Create a ML model to score new files in commons for copyvio issues
Resolved		Samwilson	T145165 Investigation: Copyvio tools for Commons
Open		None	T132650 Copyright detection (acoustic fingerprint matching) for audio files

Event Timeline

• bzimport raised the priority of this task from to Low.Nov 21 2014, 9:39 PM

• bzimport added a project: MediaWiki-Patrolling.

• bzimport set Reference to bz9501.

• bzimport added a subscriber: Unknown Object (MLST).

Ciell created this task.Apr 5 2007, 9:07 AM

robchur wrote:

We'll need to alter things so new uploads generate an "unpatrolled" entry
somewhere in the database...possibly worth rethinking part of the patrol model.

What's the progress on this?

This would be of great help on Commons, where we currently have to do this manually:

http://commons.wikimedia.org/wiki/Commons:Recent_uploads_patrol

But it's basically a lot of double work: while wandering around Commons I come across people's uploads anyway, and being able to patrol them on the spot would reduce the backlog.

With the manual process on the page above, however, I'd have to work through an entire part of a day.

Hopefully patrol functionality for uploads will make this much better organised (e.g. by filtering out patrolled uploads).

I assume this change would mean that anyone with the autopatrol right (bots, autopatrollers, sysops) is of course autopatrolled for uploads as well.

– Krinkle

content hidden in Bugzilla

After a chat on IRC we came to the conclusion that this should actually be
working right now (partially).

There are basically two options:

Make the upload action patrollable. This means that theoretically an upload without a description page is patrollable; it also means that re-uploads are patrollable, and that it needs to be integrated into the patrol model as something new (as Rob suggests above).

Another option:

Creation of file description pages is currently only patrollable if it was not the result of an upload (i.e. accidental page creation, or for example a local Wikipedia file page for a Commons image, such as
http://en.wikipedia.org/w/index.php?title=Special:Log&page=File:Penile-Clitoral+Structure.JPG&hide_review_log=0&hide_patrol_log=0
). This functionality could be extended to cover all file description pages instead of only the ones not created when uploading.

That the latter is not already the case seems kind of like a bug, since it
means a lot of page creations are not patrollable. On the other hand, I
don't think it's wise to implement both, because that would cause double work.

Created attachment 7794
rough patch for patrolling uploads

Rough patch to make uploads patrollable.

I haven't tested this very well. Some issues that come to mind:
*It generates patrol log entries of the form User:foo patrolled revision 0 of file:some_image.png (probably not too difficult to fix)
*It re-uses a bunch of the new page patrol stuff, so the interface seems to suggest you're patrolling a page instead of a file.
*And the big one: It only works if you follow links from RC (this is the case for new page patrol too). Krinkle pointed out that most people wouldn't do new image patrolling from special:recentchanges.

Attached:

uploadPatrol.patch (9 KB)

(In reply to comment #6)

*And the big one: It only works if you follow links from RC (this is the case
for new page patrol too).

That's bug 15936 for the record.

I'll check the patch out tonight. A few things come to mind which may or may not be taken care of automatically by MediaWiki if the RC entry is unpatrolled.

Ability from the API to show only recent uploads (perhaps by adding a way to specify rc_log_type and rc_log_action?), including showing only recent unpatrolled uploads
- http://www.mediawiki.org/wiki/API:Query_-_Lists#recentchanges_.2F_rc

On Special:RecentChanges the upload log entries should show a red exclamation mark if unpatrolled, and the patrolled ones should be hidden when clicking "Hide patrolled edits" (this may need rephrasing, as the list hasn't shown just edits for a long time; log entries have been part of RC for quite a while).

New page patrolling is rarely done from RecentChanges (it's impractical and doesn't provide the filtering options needed); Special:NewPages exists for that, with options to toggle "logged-in users, patrolled edits and bots". The same holds for files: patrolling files from RecentChanges will most likely not be done, as it doesn't give the info needed (size, thumbnail, etc.). Special:NewFiles is perfectly suited for this. These four toggle options have to be added to Special:NewFiles as well.

When viewing a diff whose right side is an upload (i.e. re-upload / overwrite), it should show a [mark as patrolled] link in the diff frame just like it does for edits (this was done for edits in diffs in r24607).

While most edits are not page creations, the opposite holds for uploads, where the majority are 'new' and not 're-uploads'. This means bug 15936-like situations (where going to the oldid of the first revision of a file would -not- show the [mark] link) are unacceptable in my opinion; without the link the feature is pretty useless.

Krinkle

Oh, and before I forget.
I was thinking about a variable like $wgUseLogPatrol instead of $wgUserUploadPatrol.
It would be either a boolean or an array. As a boolean, it enables or disables patrolling for every log type; as an array, it enables patrolling only for the listed types,

i.e. $wgUseLogPatrol = array( 'upload' => true, 'move' => true );
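
The boolean-or-array behaviour proposed above can be sketched as follows. This is only an illustration in Python, not MediaWiki code: the function name is_log_patrol_enabled is hypothetical, and treating log types missing from the array as disabled is an assumption the comment does not spell out.

```python
from typing import Mapping, Union

# Hypothetical model of the proposed $wgUseLogPatrol setting:
# either one boolean for all log types, or a per-log-type mapping.
LogPatrolSetting = Union[bool, Mapping[str, bool]]


def is_log_patrol_enabled(setting: LogPatrolSetting, log_type: str) -> bool:
    """Return True if patrolling is enabled for the given log type."""
    if isinstance(setting, bool):
        # Boolean form: enables or disables patrolling for every log type.
        return setting
    # Mapping form: only listed types are enabled.
    # Assumption: a log type that is not listed counts as disabled.
    return bool(setting.get(log_type, False))


use_log_patrol = {"upload": True, "move": True}
print(is_log_patrol_enabled(use_log_patrol, "upload"))  # True
print(is_log_patrol_enabled(use_log_patrol, "delete"))  # False
print(is_log_patrol_enabled(True, "delete"))            # True
```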

(In reply to comment #8)

On Special:RecentChanges the upload log entries should show a red exclamation
mark if it's unpatrolled, and the patrolled ones should be hidden when clicking

I did that as part of the patch (also made the b marker for bot uploads work too)

Just like NewPage patrolling is rarely done from RecentChanges (as it's
impractical and doesn't provide the filtering options needed) - there is
Special:NewPages for that with options to toggle "logged-in users, patrolled
edits and bots". The same is the case for files, patrolling files from
RecentChanges will most likely not be done as it doesn't give the info needed
(size, thumbnail, etc.). Special:NewFiles is suited for this perfectly. These
four toggle options have to be added to Special:NewFiles as well.

That might be an issue, as Special:NewImages uses the image table, not the RC table. (Special:NewImages also does something rather odd when filtering bots that seems inefficient: it checks whether the uploader is currently a "bot", not whether they were a bot when the image was uploaded. But I'm not all that well versed in DB efficiency.) There don't seem to be any indexes on the needed fields in recentchanges, so filtering RC down to only uploads might be inefficient. (Again, I don't really understand the intricacies of DB efficiency, so take what I say here with a grain of salt.)
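
To make the bot-filtering point concrete, here is a small Python sketch of the two ways to decide whether an upload is a bot upload. It is a toy model, not the Special:NewImages code: the Upload class, the bot_at_upload field (standing in for a flag recorded with the event) and the user names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Upload:
    file_name: str
    uploader: str
    bot_at_upload: bool  # hypothetical flag stored when the upload happened


def hide_bots_by_current_group(uploads: Iterable[Upload],
                               current_bots: Set[str]) -> List[Upload]:
    """Filter on whether the uploader is a bot *now* (the behaviour described above)."""
    return [u for u in uploads if u.uploader not in current_bots]


def hide_bots_by_recorded_flag(uploads: Iterable[Upload]) -> List[Upload]:
    """Filter on whether the uploader was a bot *when the file was uploaded*."""
    return [u for u in uploads if not u.bot_at_upload]


uploads = [
    Upload("Batch_0001.jpg", "ExampleBot", bot_at_upload=True),
    Upload("Holiday.jpg", "Alice", bot_at_upload=False),
]
current_bots: Set[str] = set()  # ExampleBot has since lost its bot flag

print([u.file_name for u in hide_bots_by_current_group(uploads, current_bots)])
# ['Batch_0001.jpg', 'Holiday.jpg']
print([u.file_name for u in hide_bots_by_recorded_flag(uploads)])
# ['Holiday.jpg']
```

The two filters only disagree for accounts whose bot status changed after the upload, which is exactly the case the comment is worried about.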

Although edits are mostly not new pages, the contrary is with uploads where
the majority are 'new' and not 're-uploads'. Which means bug 15936 -like
situations (where going to the oldid of the first revision of a file would
-not- show the [mark] link) are unacceptable in my opinion, without it it's
pretty useless.

I don't think that'd be easy to do (at least not the way I did it in the patch above), since it's patrolling the log actions, not edits, so the oldid (revision ID) has no relation to what we're patrolling.

The other issues you mention are also things that generally apply and need to be worked on.

In response to added keywords.

Note, my patch was meant more as a starting point; I don't really think it's ready to be committed to trunk at this stage.

So what does need doing here, the addition of patrol stuff to [[Special:NewFiles]]? The system is in place, isn't it?

Not really.

What is in place is this:

A generic database implementation for patrolling recent changes entries (any entry, be it RC_EDIT, RC_NEW page or RC_LOG). RC_LOG includes upload actions, so nothing extra is needed for uploads; it is already generically in place. rc_patrolled is toggleable for log actions in theory, and uploads are logged and appear in recentchanges.

A front-end for patrolling edits
- Configurable: Through $wgUseRCPatrol [1] // enabled in MediaWiki by default, disabled on most WMF wikis, enabled on nl.wikipedia, commons.wikimedia and dozens of others
- List: Special:RecentChanges with "Hide patrolled edits"
- Patrol-interface: On the diff pages with "[mark as patrolled]"

A front-end for patrolling new pages
- Configurable: Through $wgUseNPPatrol [2] // enabled in MediaWiki by default
- List: Special:NewPages with "Hide patrolled edits"
- Patrol-interface: On the bottom when viewing a page, with "[mark as patrolled]" (though subject to bug 15936, which causes the link only to be there when visiting the page from Special:NewPages)

What we need is a front-end for patrolling uploads.
- Configurable: Of course
- List: We already have two file-list special pages (though plans to merge exist afaik), so it would only need a way to indicate the patrol mark and a way to exclude patrolled files from the view.
- Patrol-interface: I would like it to be given first-class treatment like for edits (not like with new pages, where it is just dumped at the bottom through the ugly hack of passing it through a query parameter, making it not really related to the page you're looking at).

And unlike edits and new pages, uploads need something special because they are log actions. Right now those are not patrollable because Log sets rc_patrolled to false by default.

And then there is the issue of uploads (usually?) causing two events: New page creation and a log event. Though the new page creation is usually not emitted afaik.

[1] https://www.mediawiki.org/wiki/Manual:$wgUseRCPatrol
[2] https://www.mediawiki.org/wiki/Manual:$wgUseNPPatrol
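
As a rough illustration of the generic model described in this comment, the Python sketch below treats recent changes rows as records with a type, an optional log type and a patrolled flag. The field names follow the recentchanges columns mentioned in this thread (rc_type, rc_log_type, rc_patrolled); the class, the helper functions and the sample rows are hypothetical, and nothing here is the merged implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

# rc_type values as used by MediaWiki for edits, page creations and log entries.
RC_EDIT, RC_NEW, RC_LOG = 0, 1, 3


@dataclass
class RecentChange:
    rc_id: int
    rc_type: int
    rc_log_type: Optional[str]  # e.g. "upload" or "move"; None for edits
    rc_title: str
    rc_patrolled: bool


def unpatrolled_uploads(rows: List[RecentChange]) -> List[RecentChange]:
    """A 'hide patrolled' view restricted to upload log entries."""
    return [
        r for r in rows
        if r.rc_type == RC_LOG and r.rc_log_type == "upload" and not r.rc_patrolled
    ]


def mark_patrolled(rows: List[RecentChange], rc_id: int) -> bool:
    """Flip rc_patrolled for one entry; works the same for any rc_type."""
    for r in rows:
        if r.rc_id == rc_id:
            r.rc_patrolled = True
            return True
    return False


rows = [
    RecentChange(1, RC_EDIT, None, "Main_Page", False),
    RecentChange(2, RC_LOG, "upload", "File:Example.png", False),
    RecentChange(3, RC_LOG, "move", "Some_page", False),
    RecentChange(4, RC_LOG, "upload", "File:Other.png", True),
]
print([r.rc_title for r in unpatrolled_uploads(rows)])  # ['File:Example.png']
mark_patrolled(rows, 2)
print([r.rc_title for r in unpatrolled_uploads(rows)])  # []
```

The point of the sketch is that the storage side needs nothing upload-specific; what is missing is the list and the patrol interface on top of it.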

Nemo_bis awarded a token.Dec 12 2014, 8:46 AM

Cenarium subscribed.May 17 2015, 4:03 PM

I plan on uploading a patch set for this and T98617, since it's the same mechanism.
A checkbox in Special:NewFiles would allow selecting only unpatrolled files.

Change 211656 had a related patch set uploaded (by Cenarium):
Allow patrol of page moves and uploads

https://gerrit.wikimedia.org/r/211656

gerritbot added a project: Patch-For-Review.May 17 2015, 11:43 PM

Krinkle renamed this task from List/indication of unpatrolled uploaded media files to Make file uploads patrollable.May 18 2015, 12:30 AM

Krinkle updated the task description. (Show Details)

Krinkle set Security to None.

Krinkle removed subscribers: gerritbot, Unknown Object (MLST).

Cenarium mentioned this in T98617: Make page moves patrollable.May 18 2015, 2:26 PM

Ricordisamoa subscribed.Jul 26 2015, 7:41 AM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 26 2015, 7:41 AM

Change 251795 had a related patch set uploaded (by Cenarium):
Allow patrol of uploads

https://gerrit.wikimedia.org/r/251795

Catrope unsubscribed.Nov 13 2015, 8:21 PM

matmarex added a parent task: T121870: Improve Special:NewFiles.Dec 18 2015, 4:39 PM

matmarex added a subtask: T122089: Log entries and associated page revisions are not actually associated in the database.Dec 21 2015, 10:15 PM

Change 251795 merged by jenkins-bot:
Allow patrol of uploads

https://gerrit.wikimedia.org/r/251795

ReleaseTaggerBot added projects: MW-1.27-release-notes, MW-1.27-release (WMF-deploy-2016-01-12_(1.27.0-wmf.10)).Jan 7 2016, 2:00 AM

Whoo! Thanks, @Cenarium!

I'm going to write a release note about it and drop a note somewhere on Commons.

matmarex closed this task as Resolved.Jan 7 2016, 2:05 AM

matmarex removed a project: Patch-For-Review.

matmarex closed subtask T122089: Log entries and associated page revisions are not actually associated in the database as Resolved.Jan 7 2016, 2:34 AM

matmarex mentioned this in T70936: rc_this_oldid should contain the revision IDs of null revisions created by log events.Jan 7 2016, 3:27 AM

matmarex added a parent task: T70936: rc_this_oldid should contain the revision IDs of null revisions created by log events.Jan 7 2016, 3:29 AM

Johan moved this task from To Triage to In current Tech/News draft on the User-notice board.Jan 7 2016, 1:22 PM

Change 263005 had a related patch set uploaded (by Bartosz Dziewoński):
RELEASE-NOTES-1.27: Add a note about file upload patrolling

https://gerrit.wikimedia.org/r/263005

gerritbot added a project: Patch-For-Review.Jan 7 2016, 11:07 PM

Change 263005 merged by jenkins-bot:
RELEASE-NOTES-1.27: Add a note about file upload patrolling

https://gerrit.wikimedia.org/r/263005

Meno25 unsubscribed.Jan 9 2016, 2:15 PM

Johan moved this task from In current Tech/News draft to Recently announced in Tech/News on the User-notice board.Jan 12 2016, 9:49 AM

Already posted by @Cenarium: https://commons.wikimedia.org/wiki/Commons:Village_pump#Soon_possible_to_patrol_files

matmarex removed a project: Patch-For-Review.Jan 13 2016, 10:14 PM

Johan moved this task from Recently announced in Tech/News to Already announced/Archive on the User-notice board.Jan 18 2016, 6:37 PM

matmarex mentioned this in T120867: Large amounts of unwanted files (mostly copyvios) uploaded via cross-wiki upload tool (A/B test of different upload interfaces).Jan 18 2016, 11:24 PM

Cenarium mentioned this in T127848: Should be possible to associate a revision to a log entry without making it unpatrolled.Feb 23 2016, 5:32 PM

Thank you so much for implementing this feature. The query appears to be a little slow. I repeatedly got

A database query error has occurred. This may indicate a bug in the software.

    Function: IndexPager::buildQueryInfo (NewFilesPager)
    Error: 2013 Lost connection to MySQL server during query (10.64.48.19)

Any idea how to speed it up?

Luke081515 added a subscriber: jcrespo.Feb 28 2016, 6:30 PM

Luke081515 subscribed.

Forgot to mention, this happens at Wikimedia Commons.

In T11501#2070758, @Rillke wrote:
Thank you so much for implementing this feature. The query appears to be a little slow. I repeatedly got
A database query error has occurred. This may indicate a bug in the software.

    Function: IndexPager::buildQueryInfo (NewFilesPager)
    Error: 2013 Lost connection to MySQL server during query (10.64.48.19)
Any idea how to speed it up?

This is T124205.

Note, the page may be a little faster when "Show bots" is checked - https://commons.wikimedia.org/wiki/Special:NewFiles?showbots=1&hidepatrolled=1&limit=50&offset=

Some things I notice about the query. Locally it is:

SELECT /* IndexPager::buildQueryInfo (NewFilesPager) Bawolff */  * 
FROM `image` LEFT JOIN `user_groups` ON (ug_group = 'bot' AND (ug_user = img_user))
INNER JOIN `recentchanges` ON ((rc_title = img_name) AND (rc_user = img_user) AND (rc_timestamp = img_timestamp)) 
WHERE (ug_group IS NULL) AND rc_type = '3' AND rc_log_type = 'upload' AND rc_patrolled = '0'
ORDER BY img_timestamp DESC LIMIT 51

Using * as the field list means that every column is returned. On some files img_metadata is huge (up to 16 MB). In the worst case this query could return more than 800 MB.

Second, running EXPLAIN on Tool Labs suggests that this filesorts the entire image table on enwiki, and filesorts the recentchanges table on commonswiki, which is bad.

Maybe we could try to make it at least use the new_name_timestamp index on recentchanges (since for this query, rc_new = 0 and rc_namespace = 6, and we're ordering by rc_timestamp).

Otherwise I think we need to add some new indexes (on rc_type, rc_patrolled, rc_timestamp, I guess? Or maybe even move rc_patrolled into the image table).
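
The size estimate can be checked by hand: LIMIT 51 rows at up to 16 MB of img_metadata each is 816 MB. The sketch below uses Python's sqlite3 module with a toy schema (only the columns named in the query above, no user_groups join, made-up rows) to show the same join selecting an explicit column list instead of *. Which columns a real fix should select is an assumption here, and SQLite's planner says nothing about MySQL's index or filesort behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy schema: only the columns the query above touches.
conn.executescript("""
CREATE TABLE image (
    img_name TEXT PRIMARY KEY,
    img_user INTEGER,
    img_timestamp TEXT,
    img_metadata BLOB
);
CREATE TABLE recentchanges (
    rc_id INTEGER PRIMARY KEY,
    rc_title TEXT,
    rc_user INTEGER,
    rc_timestamp TEXT,
    rc_type INTEGER,
    rc_log_type TEXT,
    rc_patrolled INTEGER
);
""")
conn.executemany(
    "INSERT INTO image VALUES (?, ?, ?, ?)",
    [
        ("A.jpg", 1, "20160228000001", b"x" * 1024),
        ("B.jpg", 2, "20160228000002", b"x" * 1024),
    ],
)
conn.executemany(
    "INSERT INTO recentchanges "
    "(rc_title, rc_user, rc_timestamp, rc_type, rc_log_type, rc_patrolled) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A.jpg", 1, "20160228000001", 3, "upload", 0),  # unpatrolled upload
        ("B.jpg", 2, "20160228000002", 3, "upload", 1),  # already patrolled
    ],
)

# Explicit column list: the large img_metadata blob is never fetched.
rows = conn.execute("""
SELECT img_name, img_user, img_timestamp
FROM image
INNER JOIN recentchanges
    ON rc_title = img_name AND rc_user = img_user AND rc_timestamp = img_timestamp
WHERE rc_type = 3 AND rc_log_type = 'upload' AND rc_patrolled = 0
ORDER BY img_timestamp DESC
LIMIT 51
""").fetchall()
print(rows)  # [('A.jpg', 1, '20160228000001')]

# Worst case for SELECT *: 51 rows, each carrying up to 16 MB of metadata.
worst_case_mb = 51 * 16
print(worst_case_mb)  # 816
```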

Bawolff mentioned this in T124205: Error: 2013 Lost connection to MySQL server during query on NewFilesPager.Feb 28 2016, 7:26 PM

In T11501#2070814, @Bawolff wrote:

Using * as the field list means that every column is returned. On some files img_metadata is huge (up to 16 MB). In the worst case this query could return more than 800 MB.

Fixed very recently in https://gerrit.wikimedia.org/r/#/c/269429/.

Second, running EXPLAIN on Tool Labs suggests that this filesorts the entire image table on enwiki, and filesorts the recentchanges table on commonswiki, which is bad.

I don't think it does any filesorting after T124205.

	F3847: uploadPatrol.patch
	Nov 21 2014, 9:39 PM
