Page MenuHomePhabricator

Implement way of querying Proofread Page Status of a Page: (or revision) from the databases directly
Open, LowPublic

Description

Currently on English Wikisource Proofread page uses an embedded string at the head of the page to find the status of a page and the user that changed that status.

This means that whilst a search like:
insource: /pagequality level\=\"4\" user\=\"ShakespeareFan00\" / or
insource: /pagequality level\=\"4\" user\=\"Sfan00\" /

will find pages that I validated, it's a lot less easy to determine if these were later downgraded or re-validated over concerns about the original validation.

It's also possible to examine the status be cross-referencing the Category_links table.

However it would be appreciated if there was a way to query the Proofread Page status of any given Page: more directly such that a complex query can be reduced to a very simple WHERE clause based on a numeric test.

I'm not sure how this could be implemented in the long term though

Event Timeline

https://quarry.wmflabs.org/query/20680 is a partial soloution, but it can't easily track when something's status changed.

The pageprop mechanism seems a good way to allow such things (and will remove the need for the extension to rely on categories): https://www.mediawiki.org/wiki/Manual:Page_props_table

That was my thinking... The categories are useful though

My other additional thought was to have an additional "revision properties" table, that was like page properties but applied to specific revisions, so you can track status changes more closely by examining specific entries in a revision properties log_table.

Side question, how and where are certain types of "logged" edit summaries stored? (Examining that might be a possible suggestion on how revision properties might be added.)

The other issue would be recording in the database (possibly with a revision) who changed the status.

We could use the revision tag system and add a tag for each revision giving the status or a tag for each status change. See: https://www.mediawiki.org/wiki/Manual:Tags
I believe that storing which user changed the status in the database is not useful because it could be easily retrieved from the revision table (especially if we choose to have specific tags for each status change)

Hmm so the /* Problematic */ etc auto summaries become proper revision tags?

That makes sense. :)

Hmm so the /* Problematic */ etc auto summaries become proper revision tags?

It's the goal :-).

And how would this allow the sort of query I had in mind in my use case?

(I am assuming you do a Quarry query and cross reference the tag table as well, but I am not as up on SQL stuff with mediawiki as I perhaps should be.)

And how would this allow the sort of query I had in mind in my use case?

You do two joins between the revision table and the change_tag table (one for the current revision and one for the previous revision using the rev_parent_id column) and you use a WHERE close to filter on the previous/current revision status (using the change_tag table). You could retrieve the editing user with the rev_user column. With that you could retrieve e.g. all the validations done by a given user.

Which was my use case, Thanks...

The next problem is how someone could change proofread page in a non-breaking way to support this in future. Hmmm...

The next problem is how someone could change proofread page in a non-breaking way to support this in future. Hmmm...

It should not be too hard to add proper revision tags for new revisions. Then we could run a database update script to fill the tags for older revisions. When every revision is going to have their tag we could start to use them in ProofreadPage code instead of categories. But it's not a small task.

That's appreciated, and kind of why I'd tagged this as longer-term.

Still it would be VERY useful.

The next problem is how someone could change proofread page in a non-breaking way to support this in future. Hmmm...

It should not be too hard to add proper revision tags for new revisions. Then we could run a database update script to fill the tags for older revisions. When every revision is going to have their tag we could start to use them in ProofreadPage code instead of categories. But it's not a small task.

To clarify, is this adding, eg, validated as a tag to every revision of the page that is validated, or just to the edit that sets the status to validated?

Ideally, the status of individual revisions, and a status "change" indication should be different tags or properties.

My intended use case was doing a query to find out which pages I had changed from 'proofread' to 'validated', or "Not proofread' to 'Proofread' so I could review my efforts more closely, given concerns about incomplete validation, or proofreading which missed things. As there would be little point in re-examining pages that had been validated by othera (and which i perhaps edited for minor things like Lint errors, or minor typos) the ability to remove those minor edits from query results would be desirable.

Change 702798 had a related patch set uploaded (by Inductiveload; author: Inductiveload):

[mediawiki/extensions/ProofreadPage@master] WIP: Apply proofread level to page revisions

https://gerrit.wikimedia.org/r/702798

Change 702798 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Apply proofread levels to page revisions as change tags

https://gerrit.wikimedia.org/r/702798

Inductiveload renamed this task from Implement way of querying Proofread Page Status of a Page: (or revision) from the databases directly... to Implement way of querying Proofread Page Status of a Page: (or revision) from the databases directly.Aug 18 2021, 12:57 PM

While I am for adding such revision tags I am against migrating to and depending the value there (which is good for watching, etc.).

While I agree we should migrate to a better format, I recommend a move to an MCR slot as per: T291293: ProofreadPage: Move page proofreading status to an MCR slot.