Create an extension for storing WikiProject article assessment metadata in a database table via a parser function
Closed, Resolved · Public · 13 Estimated Story Points

Assigned To

	Niharika
Authored By

	kaldari
	Oct 30 2015, 12:37 AM

Description

Right now WikiProject article assessment metadata is all stored in WikiText in templates on article talk pages (although some of it can also be accessed via categories). This means that there are no easy ways to query this data and it has to be aggregated and reported through scripts and bots.

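The proposal is to create an extension that provides a parser function, which would be embedded in the existing assessment templates. Then each time a template is transcluded on an article talk page, it would add an entry into a table like so: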

+-------------------------------------------------------------------------+
| Page       | Namespace  | Project    | Class   | Importance | Revision  |
+-------------------------------------------------------------------------+
| First aid  | 0          | Medicine   | C       | High       | 73264     |
+-------------------------------------------------------------------------+

It should also record a log entry each time an assessment is updated.
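A minimal sketch of how the table and the update log described above could behave, using Python's built-in sqlite3 purely for illustration. The table and column names (`assessments`, `assessment_log`, and their fields) are hypothetical, not the extension's actual schema. The sketch keeps one row per page and project, lets a re-parse of an unchanged banner refresh only the stored revision, and writes a log row only when the class or importance actually changes.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """One assessment as emitted by the parser function (hypothetical shape)."""
    page: str
    namespace: int
    project: str
    assessment_class: str
    importance: str
    revision: int


def create_tables(conn: sqlite3.Connection) -> None:
    # Current state: at most one row per (page, namespace, project).
    conn.execute(
        """CREATE TABLE IF NOT EXISTS assessments (
               page TEXT, namespace INTEGER, project TEXT,
               class TEXT, importance TEXT, revision INTEGER,
               PRIMARY KEY (page, namespace, project))"""
    )
    # History: one row per change of class or importance.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS assessment_log (
               page TEXT, namespace INTEGER, project TEXT,
               old_class TEXT, new_class TEXT,
               old_importance TEXT, new_importance TEXT,
               revision INTEGER)"""
    )


def record(conn: sqlite3.Connection, a: Assessment) -> bool:
    """Upsert one assessment; return True if a log entry was written."""
    row = conn.execute(
        "SELECT class, importance FROM assessments"
        " WHERE page = ? AND namespace = ? AND project = ?",
        (a.page, a.namespace, a.project),
    ).fetchone()
    old_class, old_importance = row if row else (None, None)
    # Always refresh the stored revision, even if the rating is unchanged.
    conn.execute(
        "INSERT OR REPLACE INTO assessments VALUES (?, ?, ?, ?, ?, ?)",
        (a.page, a.namespace, a.project, a.assessment_class, a.importance, a.revision),
    )
    changed = (old_class, old_importance) != (a.assessment_class, a.importance)
    if changed:
        conn.execute(
            "INSERT INTO assessment_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (a.page, a.namespace, a.project, old_class, a.assessment_class,
             old_importance, a.importance, a.revision),
        )
    return changed
```

For example, recording `Assessment("First aid", 0, "Medicine", "C", "High", 73264)` twice writes one log row. Changing the class to B afterwards writes a second one.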

Once this table exists, developers (WMF or volunteer) can create tools that help WikiProjects organize their ratings.

Related Objects

Status	Assigned	Task
Resolved	None	T120219 PageAssessments deployment to WMF wikis
Resolved	Niharika	T120849 Write tests for page assessment tool
Resolved	Niharika	T117142 Create an extension for storing WikiProject article assessment metadata in a database table via a parser function

Event Timeline

kaldari created this task. Oct 30 2015, 12:37 AM

kaldari raised the priority of this task to Needs Triage.

kaldari updated the task description.

kaldari added projects: Community-Tech, WikiProject-X.

kaldari subscribed.

Restricted Application added a subscriber: Aklapper. Oct 30 2015, 12:37 AM

kaldari added a parent task: T117116: [DO NOT USE] Make life better for WikiProjects (tracking) [superseded by #WikiProject-tools]. Oct 30 2015, 12:37 AM

Harej subscribed. Oct 30 2015, 12:40 AM

Ricordisamoa subscribed. Oct 30 2015, 11:18 AM

• DannyH triaged this task as Medium priority. Oct 30 2015, 4:29 PM

• DannyH moved this task from New & TBD Tickets to Older: Team Work on the Community-Tech board.

• DannyH set Security to None.

Harej moved this task from Needs Triage to Requests on the WikiProject-X board. Nov 5 2015, 1:36 AM

If portability between projects is a concern, I would recommend using a term other than "WikiProject," which as far as I know is only used on a handful of Wikimedia projects (and is itself a fairly ambiguous term if you don't already know what it means). I would recommend something like "editorial review".

We should talk to WikiProject (or equivalent) users on non-English Wikipedia sites to see what would work best for them (as far as what metadata to store), but we need to make sure we keep it simple as this could easily spiral into Wikidata-lite.

Assessment for prioritization:
Support: Medium (came out of discussions with WikiProject X, but unclear how much demand there is for this outside of en.wiki)
Feasibility: Medium (fairly clear scope, but involves creating a new extension and probably a new database table)
Impact: High (would replace a lot of existing bot/script functionality like WP1.0 Bot, and would be available to all projects)
Risk: Medium (need to make sure that it will work for all projects, but need to keep scope under control)

Priority: Normal

• DannyH moved this task from Older: Team Work to Needs Discussion on the Community-Tech board. Nov 5 2015, 6:48 PM

@Harej: Do you know any community folks who are active in non-English Wikipedia WikiProjects that we could talk to?

• DannyH mentioned this in T116093: Investigation: What can we do for WikiProjects? Nov 5 2015, 10:57 PM

This was my GSoC project years ago (https://www.mediawiki.org/wiki/User:Yuvipanda/GSoC) and https://phabricator.wikimedia.org/diffusion/SVN/browse/trunk/extensions/SelectionSifter/ is the code. Not sure how much of it is useful, however!

@yuvipanda: Do you think that using a parser function (rather than encoding data in the HTML and reparsing it out), is a good solution? I imagine it might have a tiny negative effect on parsing performance, but since assessment templates are only included on talk pages, I don't think anyone would notice.

• DannyH updated the task description. Nov 6 2015, 12:11 AM

Halfak subscribed. Nov 6 2015, 12:15 AM

Looks like a great idea to me. I'm going to describe a niche use-case that is probably out of scope, but I figured it may be valuable to record it in case it's easy to lump on.

I've been building machine learning models to predict quality assessments. I use the current template systems as training data for the models. In order to do this effectively, I need to work out which revision of which page was assessed. This is complex and error-prone due to the inconsistent use of templates and the fact that the template lives on the talk page. There are a surprising number of *broken redirects* that get assessed as Featured Articles in English Wikipedia if you follow that strategy naively. ;)

If your implementation can save assessments historically and include information about the relevant revision of the article at the time of assessment, I would find that very valuable. This wouldn't need to be a substantial part of the feature; it could be done with the logging table. I imagine something like log_type="assessment" and log_action="updated" that would behave like log_type="rights" and log_action="rights" and have the log_page field set to the page that was assessed (not the talk page please).
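A rough sketch of this suggestion, assuming the extension knows the talk page's namespace number. In MediaWiki, each talk namespace is the odd number directly above its subject namespace (Talk: is 1, the main namespace is 0), so the log target can be derived by subtracting one. `LogEntry`, `subject_namespace`, and `assessment_log_entry` are hypothetical names. `log_type`, `log_action`, `log_namespace`, and `log_title` mirror columns of the logging table, but `rev_id` is an assumption: the logging table has no such column, so the article revision would presumably have to travel in something like `log_params`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    log_type: str       # proposed value "assessment" (not an existing log type)
    log_action: str     # proposed value "updated"
    log_namespace: int  # namespace of the assessed page, not the talk page
    log_title: str      # title text without the namespace prefix
    rev_id: int         # hypothetical: article revision at assessment time


def subject_namespace(ns: int) -> int:
    """Map a talk namespace (odd) to its subject namespace (even, one lower)."""
    if ns < 0:
        raise ValueError("special namespaces have no talk pages")
    return ns - 1 if ns % 2 == 1 else ns


def assessment_log_entry(talk_ns: int, title: str, article_rev_id: int) -> LogEntry:
    # Log against the assessed page rather than the talk page holding the banner.
    return LogEntry("assessment", "updated", subject_namespace(talk_ns), title, article_rev_id)
```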

@kaldari yes, parser functions definitely sound better. I can't think of any reason to use HTML reparsing other than that Yuvi in 2011 was more stupid...

@Halfak: Good point. I think it would be useful to at least record the revision ID as part of the assessment data. Using the logging table to log changes is also a good idea. I'll update the task description.

kaldari updated the task description. Nov 6 2015, 1:00 AM

kaldari mentioned this in T117124: Investigation: Discussion module for WikiProjects that lists the most recent new topics on talk pages in that category. Nov 6 2015, 7:03 PM

kaldari updated the task description. Nov 6 2015, 7:21 PM

• DannyH edited a custom field. Nov 6 2015, 7:25 PM

• DannyH moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board.

Note from the sprint planning meeting: After we finish this task, there are two follow-up tasks -- Adding the tables in production and a security review.

In T117142#1790271, @DannyH wrote:

Note from the sprint planning meeting: After we finish this task, there are two follow-up tasks -- Adding the tables in production and a security review.

There are significantly more steps required to deploy a new extension, see the checklist on https://www.mediawiki.org/wiki/Review_queue

Legoktm added a project: MediaWiki-extension-requests. Nov 9 2015, 4:46 AM

• DannyH added a project: WikiProject-tools. Nov 19 2015, 7:13 PM

Krenair removed a project: WikiProject-tools. Nov 19 2015, 7:38 PM

Krenair added a project: WikiProject-tools. Nov 20 2015, 12:49 AM

Krenair subscribed.

• DannyH edited projects, added Community-Tech-Sprint; removed Community-Tech. Nov 24 2015, 6:10 PM

Niharika claimed this task. Nov 25 2015, 12:40 PM

Niharika moved this task from Ready to In Development on the Community-Tech-Sprint board.

@Harej, I got the basic extension implementation in place, yay. I have a question: how often do people remove the assessment template from a page, so that we'd need to remove the record from the DB? Do you have any idea how significant this problem is?

Right now the extension is triggered every time a page containing the assessment parser function is saved, but in order to handle removed templates, we'd need to trigger the extension on every (Talk) page save.
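The removal case boils down to comparing what is stored for a page with what the new parse emitted. A minimal sketch, assuming both sides are available as plain dictionaries mapping project name to (class, importance); the function name and data shapes are hypothetical. If the comparison runs on every talk page save, a page whose last banner was removed simply yields an empty `parsed` dict, and every stored project ends up in the delete set.

```python
from __future__ import annotations


def reconcile(stored: dict[str, tuple[str, str]],
              parsed: dict[str, tuple[str, str]]) -> tuple[set[str], set[str], set[str]]:
    """Diff one page's stored assessments against those found in the new parse.

    Returns (to_insert, to_update, to_delete) as sets of project names.
    """
    to_insert = parsed.keys() - stored.keys()   # new banners
    to_delete = stored.keys() - parsed.keys()   # banners that were removed
    # Banners present in both, but with a different class or importance.
    to_update = {p for p in parsed.keys() & stored.keys() if parsed[p] != stored[p]}
    return to_insert, to_update, to_delete
```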

Another thing: if you have ideas for other features we could add to this extension, feel free to open tickets so we can work on them as we go along. Thank you!

• DannyH mentioned this in T119997: API for WikiProject article assessment data. Dec 1 2015, 6:35 PM

• DannyH added a parent task: T119997: API for WikiProject article assessment data. Dec 1 2015, 6:45 PM

Harej moved this task from Requests to In Progress on the WikiProject-X board. Dec 2 2015, 3:45 PM

@NiharikaKohli, regarding removal, I don't know that it happens a lot but it does happen from time to time (usually when a WikiProject is deemed defunct) so it is a situation worth accounting for. In lieu of triggering the extension on every talk page save, could you have a maintenance script that checks talk pages for deleted parser functions?

For other ideas: The most useful thing for me is the association between pages and WikiProjects. That information is the backbone of the other WikiProject X reports. What I am wondering, though, is whether this extension could be used to generate the reports directly. I don't think it would be sensible to bake these reports directly into the extension; rather, it could provide a generic framework for producing reports based on the data.

Some approaches to reports:

Intersections between the WikiProject table and a category (WikiProject Chemistry pages in need of Expert Attention, for example)
Subsets of a WikiProject table (all WikiProject Chemistry pages missing quality assessments, or for showcasing jobs well done, all the WikiProject Chemistry articles that are featured articles)
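Once the page-to-project association is stored, both report types above are simple filters. A sketch over in-memory rows, assuming category membership has already been fetched separately as a set of article titles. `Row`, the helper functions, and the sample data in the checks are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Row:
    page: str
    project: str
    assessment_class: str  # "" when the project has not assessed quality yet


def in_category(rows: Iterable[Row], project: str, members: set[str]) -> list[str]:
    """Intersection: project pages that are also in a category (e.g. expert attention)."""
    return sorted(r.page for r in rows if r.project == project and r.page in members)


def missing_class(rows: Iterable[Row], project: str) -> list[str]:
    """Subset: project pages with no quality assessment."""
    return sorted(r.page for r in rows if r.project == project and not r.assessment_class)


def with_class(rows: Iterable[Row], project: str, cls: str) -> list[str]:
    """Subset: project pages with a given class, e.g. "FA" for a showcase list."""
    return sorted(r.page for r in rows if r.project == project and r.assessment_class == cls)
```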

Building report generation functionality directly into this extension makes it easier for others to produce reports; currently if you want a report, you have to either (a) know that wikiproject.json exists and that you can edit it or (b) get me to do it, and either way you are limited to the specific set of reports my tools make available. Building in report generation also helps this extension realize its potential as an editorial curation tool; not only is the information stored, it can be analyzed and presented in meaningful ways.

Harej mentioned this in T105747: Enable the listing of content pages from Special:RecentChangesLinked when the associated talk page is linked. Dec 3 2015, 9:40 AM

Niharika added a parent task: T120219: PageAssessments deployment to WMF wikis. Dec 3 2015, 3:04 PM

Slaporte awarded a token. Dec 3 2015, 6:16 PM

@Harej: One of the things we're talking about adding to this extension is an API for querying the data (T119997). That should make report generation a lot easier. We're hoping that the community will use this API to generate a wide variety of different reports. Your input on what types of queries it should support would be useful. So far, we only have 2 use cases listed: Returning all the assessment data for an article, and returning all the assessments for a WikiProject.
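For the two use cases listed (all assessments for an article, all assessments for a WikiProject), a rough in-memory stand-in looks like the following. `AssessmentIndex` is a hypothetical name, not the API proposed in T119997; in a database, the equivalent would be one index on the page column and another on the project column.

```python
from collections import defaultdict


class AssessmentIndex:
    """Serves the two lookups from (page, project, class, importance) records."""

    def __init__(self, records):
        self.by_page = defaultdict(list)
        self.by_project = defaultdict(list)
        for page, project, cls, importance in records:
            self.by_page[page].append((project, cls, importance))
            self.by_project[project].append((page, cls, importance))

    def for_article(self, page):
        # All (project, class, importance) assessments of one article.
        return list(self.by_page.get(page, []))

    def for_project(self, project):
        # All (page, class, importance) assessments made by one WikiProject.
        return list(self.by_project.get(project, []))
```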

He7d3r awarded a token. Dec 6 2015, 12:22 PM

He7d3r subscribed.

Consider also allowing people to add (optionally) the reason (a short text) why they think an article must have assessment X instead of Y.

@He7d3r: That's not a bad idea, although our current plan is to piggyback on the existing assessment templates, which don't have such a parameter (at least on English and French Wikipedias).

• DannyH mentioned this in T120849: Write tests for page assessment tool. Dec 8 2015, 6:13 PM

• DannyH added a subtask: T120849: Write tests for page assessment tool.

kaldari removed a subtask: T120849: Write tests for page assessment tool. Dec 10 2015, 4:33 AM

kaldari added a parent task: T120849: Write tests for page assessment tool.

kaldari added a parent task: T121068: Have PageAssessments store all the assessments in ExtensionData until the page is finished parsing. Dec 10 2015, 4:40 AM

Niharika mentioned this in T120219: PageAssessments deployment to WMF wikis. Dec 10 2015, 12:12 PM

kaldari removed a parent task: T121068: Have PageAssessments store all the assessments in ExtensionData until the page is finished parsing. Dec 10 2015, 6:06 PM

Stevietheman subscribed. Dec 12 2015, 2:58 PM

I've been thinking about this over the past few days, and I don't think adding a parser function to store this info in a database table is a substantial benefit, given that it's already mostly available through categorylinks. A parser function has the downside of requiring wikitext and complex wrapper templates, meaning that any tool or human that wants to write data or make an assessment still needs to navigate the mess of templates. There's also the awkwardness that the parser function goes on the talk page instead of the actual page.

An alternative idea I've been thinking of would be to move the ratings to Wikidata (property like "English Wikipedia WikiProject rating" -> "B-class" qualified by "WikiProject" -> "WikiProject Birds"). A MediaWiki extension might provide some simplified Lua bindings, an api.php query/prop module, and a special page to generate WP:1.0-style reports using the Wikidata query service. It would also provide a JavaScript/OOUI widget of some kind that would let you edit the ratings directly on the Wikipedia itself (like the current sitelinks widget). I think this would give us a lot of flexibility in how the data can be read and written, not locking us into wikitext banner templates or something.

The main problems I foresee with such a solution would be 1) whether Wikidata would accept this kind of data and 2) the initial import of said data. At least for #2, I can volunteer to help with that :)

As I commented on IRC, it would require Wikidata to play along. This cannot be guaranteed. However, a proposal they might consider would entail a generic “Wikimedia editorial assessment” property, with qualifiers for project (English Wikipedia) and WikiProject (for which there should be Wikidata items).

Another problem is that as a meta property, it may not necessarily belong on a Wikidata item. After all, the Wikidata item for “San Francisco” is not about the Wikipedia article on San Francisco, but is about the city (and county) in California. Items are necessarily tied to concepts, rather than specific articles.

I do know that Legoktm has used Wikidata to power an extension to great success, so it’s certainly something we can consider.

The proposed properties were rejected on Wikidata:

but these are listed in the development plan for badges:
https://www.wikidata.org/wiki/Wikidata:Development_plan#Badges

I'm surprised the Wikidata people didn't create a "MetaWikidata" for these purposes.

He7d3r added a subscriber: Lydia_Pintscher. Dec 13 2015, 1:39 PM

In T42810#460662, @Ricordisamoa wrote:

I think of an extension that would allow sysops and 'quality rating administrators' to edit quality status of pages; then such badges would be displayed in the 'Other languages' section, when linked from another language version.

@Legoktm: The idea of storing assessments in Wikidata has been suggested many times, but consistently rejected by the Wikidata community. The badges feature is for storing a single quality assessment for a page (such as "Good Article"). That's great for interlanguage links but not really useful for WikiProjects (the target user group for this feature). There is no plan for adding WikiProject-level assessments to badges, i.e. triplets of [ project : importance : quality ]. Keep in mind that the importance assessment in particular typically varies by project.

Using the categorylinks table is a possibility (that's what WP1.0 bot uses), but it's English Wikipedia specific, as you have to know the naming convention for the categories set by the templates, including which categories are importance-related and which are quality-related.
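To illustrate why the categorylinks route is tied to one wiki's conventions, here is a sketch that recovers assessments from talk page category names. The regular expressions encode an assumed English Wikipedia naming pattern (`C-Class medicine articles`, `High-importance medicine articles`); other wikis use different names and languages, so each wiki would need its own patterns. Underscores are converted to spaces because category names are stored with underscores in the database. Note that the project comes back as the lowercase category fragment, so mapping it to a WikiProject name is yet another convention to know.

```python
import re

# Assumed enwiki naming patterns; other wikis (and some projects) differ.
CLASS_RE = re.compile(r"^(?P<value>\w+)-Class (?P<project>.+) articles$")
IMPORTANCE_RE = re.compile(r"^(?P<value>\w+)-importance (?P<project>.+) articles$")


def assessments_from_categories(categories):
    """Return {project: {"class": ..., "importance": ...}} from category names."""
    result = {}
    for cat in categories:
        name = cat.replace("_", " ")  # database form uses underscores
        for key, pattern in (("class", CLASS_RE), ("importance", IMPORTANCE_RE)):
            m = pattern.match(name)
            if m:
                result.setdefault(m.group("project"), {})[key] = m.group("value")
    return result
```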

I'm open to other ideas for how to accomplish this, but so far I haven't been able to come up with any realistic ideas other than using a template-embedded parser function.

Niharika moved this task from In Development to Needs Review/Feedback on the Community-Tech-Sprint board. Dec 16 2015, 4:19 PM

The first iteration for this is basically done, but is awaiting security review.

Adding Job Queue functionality is in T121069. Adding an API is in T119997.

Harej moved this task from In Progress to Radar on the WikiProject-X board. Jan 7 2016, 12:37 AM

Harej mentioned this in T123028: Deploy CollaborationKit on English Wikipedia. Jan 8 2016, 5:08 AM

• DannyH moved this task from Needs Review/Feedback to Q3 2018-19 on the Community-Tech-Sprint board. Jan 11 2016, 6:11 PM

• DannyH closed this task as Resolved. Jan 13 2016, 6:13 PM

• DannyH moved this task from Q3 2018-19 to Q1 2018-19 on the Community-Tech-Sprint board.

• DannyH edited projects, added Community-Tech; removed Community-Tech-Sprint. Jan 19 2016, 9:04 PM

• DannyH moved this task from Up Next (June 3-21) to Archive on the Community-Tech board.

kaldari mentioned this in T120230: Get code review for PageAssessments extension done. Jan 25 2016, 8:56 PM

In T117142#1878553, @kaldari wrote:

@Legoktm: The idea of storing assessments in Wikidata has been suggested many times, but consistently rejected by the Wikidata community.

Is there evidence to support the "consistently rejected" claim? I see two discussions on Wikidata, both from March 2014 and both have limited participation. And even in those small discussions, people were somewhat amenable to the idea, but there was a desire to start with badges and then re-evaluate in the future. Are there other discussions/rejections from Wikidata?

@MZMcBride: Even if Wikidata didn't reject storing article assessments, this is still a simpler solution (especially for querying the data), and it doesn't preclude the option of using Wikidata in the future. This is just a simple hack to get the assessment data from the existing templates into a structured and consistent format so that community developers, such as yourself, have easy access to the data. It doesn't require any changes to anyone's workflow, it doesn't require any modifications to existing software, and it only requires a small bit of developer effort to implement (most of which is already done). I'm not sure why you are so opposed to the implementation of this extension, as I really don't see any down-side to it. If you really think it would make more sense to store this data in Wikidata, you are welcome to try to convince the community there. The "perfect" solution should not, however, be the enemy of the "good" solution.

@MZMcBride: These discussions are getting hard to keep track of. Please direct additional comments about the architecture, implementation, and/or need for this extension to T120219.

In T117142#1996560, @kaldari wrote:

@MZMcBride: These discussions are getting hard to keep track of.

I agree, though I didn't file all of these tasks, fracturing the discussion.

Harej moved this task from Radar to Done on the WikiProject-X board. Mar 8 2016, 6:21 PM

kaldari removed a parent task: T119997: API for WikiProject article assessment data. Mar 24 2016, 5:26 PM

• Quiddity mentioned this in T117122: Investigation: Related changes by category, including main and talk namespaces (for WikiProjects). Sep 9 2016, 6:41 PM

• Phabricator_maintenance removed a subscriber: yuvipanda. Jun 7 2017, 6:46 PM

Liuxinyu970226 removed a parent task: T117116: [DO NOT USE] Make life better for WikiProjects (tracking) [superseded by #WikiProject-tools]. Feb 14 2018, 1:36 AM
