
PageAssessments deployment to WMF wikis
Open, Normal, Public

Description

This is the epic task for tracking the work that needs to be done before the extension is deployed to production.

https://www.mediawiki.org/wiki/Extension:PageAssessments

Related Objects

Status    Assigned    Task
Open      kaldari
Resolved  Niharika
Resolved  Niharika
Resolved  Niharika
Resolved  Niharika
Resolved  Niharika
Resolved  Niharika
Resolved  Niharika
Resolved  Niharika
Resolved  Niharika
Resolved  Niharika
Resolved  Niharika
Resolved  kaldari
Resolved  jcrespo
Resolved  Samwilson
Resolved  kaldari
Invalid   None
Resolved  kaldari
Resolved  kaldari
Resolved  MusikAnimal
Open      kaldari
Resolved  MusikAnimal
Open      None
Resolved  MusikAnimal
Declined  None
Open      None
Declined  None
Open      None
Resolved  MusikAnimal

Event Timeline


@MZMcBride: Even if Wikidata didn't reject storing article assessments, this is still a simpler solution (especially for querying the data), and it doesn't preclude the option of using Wikidata in the future. This is just a simple hack to get the assessment data from the existing templates into a structured and consistent format so that community developers, such as yourself, have easy access to the data. It doesn't require any changes to anyone's workflow, it doesn't require any modifications to existing software, and it only requires a small bit of developer effort to implement (most of which is already done). I'm not sure why you are so opposed to the implementation of this extension, as I really don't see any down-side to it. If you really think it would make more sense to store this data in Wikidata, you are welcome to try to convince the community there. The "perfect" solution should not, however, be the enemy of the "good" solution.

You're still overstating the query restrictions, I think. As I understand it, Wikidata is queryable in exactly the way you want it to be. Is there reason to believe otherwise?

Speaking generally, when adding to the pile of technical debt, nobody ever thinks it's a big deal. And it isn't a big deal in that specific and isolated context. But I think we should solve the problem as best as possible rather than rushing in with a simple hack. We can and should take the longer view. I agree with not making the perfect the enemy of the good, but if we already have Wikidata today and now, why not use it? Be a pioneer in using and exploiting Wikidata! That would be cool and exciting, unlike yet another extension that we need to kill one day.

What is being proposed here is a quick fix, but I still don't see what the rush is, particularly if Wikidata is available and usable now.

Speaking of which, does this extension have logging?

We talked about doing logging, but dropped it for the MVP. It could always be added later. We mainly want to get a proof-of-concept out there and see if people actually use it.

Okay, then this extension cannot be deployed to production wikis until there's logging in place. It's fine for Beta Labs or wherever, of course.


There may be some confusion on my part here. For some reason I thought the idea for a {{#wikiproject:}} parser function had been killed. What's the intended input mechanism for the PageAssessments extension? I thought a major win here was moving away from wikitext?

As a secondary table, the page_props content has always just been generated from the contents of other tables by parsing stuff out of them. Unless we want to hook extra stuff into the parser or whatever for this automatic parsing to happen here as well, I don't think page_props is the right place for this either.

If we're talking about having a {{#wikiproject:}} parser function, this sounds exactly like what page_props table has been used for.

There may be some confusion on my part here. For some reason I thought the idea for a {{#wikiproject:}} parser function had been killed. What's the intended input mechanism for the PageAssessments extension? I thought a major win here was moving away from wikitext?

Looking at https://www.mediawiki.org/wiki/Extension:PageAssessments#Usage, this seems like you're basically creating a programmatic interface (good) that future solutions such as Wikidata will then have to either accommodate or replicate in shim logic, or that will result in a breaking change for clients/consumers (bad).

WikiProject assessment is not, however, a common feature of wikis. It's a use-case that is specific to Wikipedia. Thus implementing it in an extension makes sense to me.

First of all, Wikipedia doesn't run the latest and greatest database software. We're currently running MariaDB 10.0 in production. Our software is also expected to run, in theory, on MySQL 5.0.2 and Postgres 9.0. MariaDB supports indexing of JSON through the use of virtual columns, but there are several problems with this. The implementations are incompatible in some cases between MariaDB and MySQL. Plus, many popular SQL tools such as mysqldump, phpMyAdmin, SQLyog, etc. don't know how to handle these yet. Whatever database features we use have to be supported by our entire operations and development infrastructure and should be easy for third parties to support as well.

So you're trying to solve for Wikimedia wikis or MediaWiki wikis?

I responded to a specific point about the ability of modern database engines to index blobs. I wasn't suggesting using blobs.

So you're trying to solve for Wikimedia wikis or MediaWiki wikis?

Wasn't that answered already?

WikiProject assessment is not, however, a common feature of wikis. It's a use-case that is specific to Wikipedia. Thus implementing it in an extension makes sense to me.

Wasn't that answered already?

I included both quotes in my reply as the two quotes seem to be contradicting each other. I'm not sure how I could have made that clearer. One quote suggests that PageAssessments is intended primarily for Wikimedia wikis, while the other quote suggests that all third-party MediaWiki installations must be supported.

Just a side comment: if you don't find an agreement here about the best technical approach, then maybe you could look for broader attention at wikitech-l or at the TechCom?

I've certainly been around long enough to recognize the anti-pattern here. This is hardly the first case of a Wikimedia Foundation team coming along and saying "we've already decided how we're going to do this in the quickest way possible, here's some code." There will, of course, be predictable consequences from taking this approach.

Wasn't that answered already?

One quote suggests that PageAssessments is intended primarily for Wikimedia wikis, while the other quote suggests that all third-party MediaWiki installations must be supported.

I don't know how "should be easy" in T120230#1994320 suddenly became a rather different "must be supported" in T120219#1998135. So I don't see a "contradiction".

Looking at https://www.mediawiki.org/wiki/Extension:PageAssessments#Usage, this seems like you're basically creating a programmatic interface (good) that future solutions such as Wikidata will then have to either accommodate or replicate in shim logic, or that will result in a breaking change for clients/consumers (bad).

This would have no effect on a Wikidata implementation except to make it a lot easier. Here's how a switch to Wikidata would work:

  1. A script takes the data from the page_assessments table and imports it into Wikidata (which would be much easier than trying to scrape it from each wiki with its own category conventions).
  2. Someone removes the parser function from each wiki's master assessment template (which is actually an optional step).

If everything were actually migrated to Wikidata, you could get the assessment data for a specific article easily from the Wikidata API. Getting aggregate data, however, such as all the articles that belong to WikiProject Medicine, would be more difficult, as you would have to use the Wikidata Query Service, which has a steep learning curve.

kaldari added a comment (edited). Feb 4 2016, 9:14 PM

Also, I'm not really sure how you would store this data in Wikidata anyway since Wikidata is wiki-language agnostic, but this data is specific to each wiki (which is one of the reasons the proposals were rejected on Wikidata). I guess you would have to create a WikiProjects property for each wiki and then the values would be the specific WikiProjects. Then under each value, you would have modifiers for the importance and quality. So for an item like Zika fever, you would have the following property entries:

English WikiProjects:
    WikiProject Medicine
        importance: mid
        class: start
    WikiProject Viruses
        importance: low
        class: start
    WikiProject Women's health
        importance: low
        class: start
Spanish WikiProjects:
    Wikiproyecto Enfermedades
    ...

Since modifiers can only be created as part of the Wikibase software, you would still have to write a Wikibase extension in order to support the full set of data. Either that, or create a large number of very specific properties like "WikiProject Medicine (English Wikipedia) importance". In either case, I doubt you will be able to have performant queries for aggregate WikiProject data. I also doubt that the Wikidata Community would support hosting the data, but you're welcome to make a proposal there.

Harej added a comment. Feb 4 2016, 9:16 PM

Not to mention that Wikidata items are about concepts, not specific Wikipedia articles. So it would be inappropriate to attribute things to “Zika fever” the disease when you should be attributing them to “Zika fever” the English Wikipedia article.

You're still overstating the query restrictions, I think. As I understand it, Wikidata is queryable in exactly the way you want it to be. Is there reason to believe otherwise?

Yes, see the comments above. Basically: you would have to use the Wikidata Query Service, which has a steep learning curve, rather than a simple dedicated API; it's very unlikely that queries for aggregate WikiProject data would be performant enough to be used in real-time; it's not even clear how such data would be stored in Wikidata to begin with or if the community would be willing to host the data.

There may be some confusion on my part here. For some reason I thought the idea for a {{#wikiproject:}} parser function had been killed.

Is there a previous discussion about a {{#wikiproject:}} parser function?

What's the intended input mechanism for the PageAssessments extension? I thought a major win here was moving away from wikitext?

The input mechanism (for now) is a single parser function that would be embedded in the master assessment template for each wiki, for example Template:WPBannerMeta on English Wikipedia. Eventually, however, non-wikitext-based interfaces could be built that interface with the extension directly (as @Harej has proposed). Then both the assessment templates and the parser function could be deprecated. The parser function allows us to have a smooth transition between people using templates and whatever future interface is created, as it basically accomplishes half of the transition (getting the data out of wikitext) without requiring an immediate change of workflow. I'll try to write up more about this on mediawiki.org so it's easier to grok.
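As a rough sketch of what "embedded in the master assessment template" means (the parameter names and order here are hypothetical illustrations, not taken from the extension documentation), the template could simply pass its existing parameters through to the parser function:

```wikitext
{{#assessment: {{{project|}}} | {{{class|}}} | {{{importance|}}} }}
```

Because the template is already transcluded on every assessed talk page, adding this one line would populate the structured table without any change to editors' workflow.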

If we're talking about having a {{#wikiproject:}} parser function, this sounds exactly like what page_props table has been used for.

Yes, that's why using page_props was the first idea we looked at. I would love to use page_props for this as it would make everything much simpler, but it just wouldn't work. We could store the data there as a blob, but (without being able to use virtual columns) the queries would not be performant, mainly because we wouldn't have adequate indexing; the huge size of the existing table wouldn't help either. Wikidata had the same problem (they store multidimensional data in blobs), and the only way they were finally able to make their data fully queryable was to have the WMF build a graph database layer on top of their data (the Wikidata Query Service), and any kind of complicated queries still take forever to run.
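To make the indexing point concrete, here is a minimal sketch (the rows and blob layout are invented for illustration and are not the actual page_props schema) of why serializing assessments into blobs defeats querying: answering "all pages in WikiProject Medicine" requires deserializing every row, because no SQL index can see inside a serialized value.

```python
import json

# Hypothetical page_props-style storage: one JSON blob of assessments per page.
page_props = {
    "Zika fever": json.dumps({"Medicine": {"class": "Start", "importance": "Mid"},
                              "Viruses": {"class": "Start", "importance": "Low"}}),
    "Bandage":    json.dumps({"Medicine": {"class": "C", "importance": "Low"}}),
    "Chess":      json.dumps({"Board games": {"class": "B", "importance": "High"}}),
}

# Finding every page in a given project means decoding every blob --
# the SQL equivalent of a full-table scan over a very large table.
medicine_pages = sorted(
    title for title, blob in page_props.items()
    if "Medicine" in json.loads(blob)
)
print(medicine_pages)
```

A dedicated table with a `(project, page)` index answers the same question with an index lookup instead of a scan, which is the gap the extension's page_assessments table fills.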

I have experience with programmatically storing data both in page_props (the Disambiguator extension) and in Wikidata (the WikiGrok extension), so I'm familiar with the limitations of both. Unfortunately, there isn't a sensible way to modify either of those to fit this use case. I also have experience with creating, altering, and deleting production database tables, and I really don't think it creates a huge amount of overhead or technical debt. I think our difference of opinion regarding PageAssessments ultimately boils down to the cathedral vs. the bazaar. You want a solution that will fit arbitrary future use cases, while I want to build a specific solution to a specific problem until I know of other use cases that need to be accommodated. Hopefully, in the future, it will be feasible to store this sort of data in page_props (once something like virtual columns are actually standardized and supported), but we're a long way from that.

There will, of course, be predictable consequences from taking this approach.

Yes, there will be a small amount of technical debt created by this project, but the debt vs. gain ratio looks pretty good from my point of view. Yes, we must consider alternatives, but I don't see any alternatives that are feasible at this point. Honestly, the one alternative that I think would be the closest to making sense (but would still have limitations) would be to have each Wikipedia supporting its own Wikibase repo for article metadata. Considering that the project to have Commons support its own Wikibase repo for media metadata has been ongoing for many years now and is still not close to fruition, I really don't want to go down that path (which seems more like a death march). I also honestly don't see much of a down-side to this implementation. It's technically simple; will have little maintenance cost; doesn't require any changes from the community; won't add any bloat to existing tables; and provides exactly the functionality that is requested. If you see a significant down-side that I'm missing, let me know.

Not to mention that Wikidata items are about concepts, not specific Wikipedia articles. So it would be inappropriate to attribute things to “Zika fever” the disease when you should be attributing them to “Zika fever” the English Wikipedia article.

My understanding is that Wikidata already largely cleared that hurdle by implementing badges.

Harej added a comment. Feb 5 2016, 2:52 AM

Badges are per-article assessments and only include the basic good/featured tiers. The WikiProject assessment system has, at minimum, FA/A/GA/B/C/Start/Stub/List, plus deviations from that formula. (Some WikiProjects have a B+ rating.) And WikiProjects have their own ratings; WikiProject A may rate an article differently from WikiProject B. And then there is the matter of importance ratings, which definitely differ between projects.

The Wikidata developers might be reluctant to build this level of complexity into the Wikibase extension.

Is there a previous discussion about a {{#wikiproject:}} parser function?

There's T117142, which uses {{#wikiproject:}} in its task description. It looks like the name was later changed to {{#assessment:}}.

I thought a major goal here was to move away from wikitext. I thought maybe someone would finally develop an interface for modifying page properties that didn't require a large blob of wikitext being run through the preprocessor and parser. I understand now that the scope here is significantly narrower.

If we're talking about having a {{#wikiproject:}} parser function, this sounds exactly like what page_props table has been used for.

Yes, that's why using page_props was the first idea we looked at. I would love to use page_props for this as it would make everything much simpler, but it just wouldn't work. We could store the data there as a blob, but (without being able to use virtual columns) the queries would not be performant, mainly because we wouldn't have adequate indexing; the huge size of the existing table wouldn't help either. Wikidata had the same problem (they store multidimensional data in blobs), and the only way they were finally able to make their data fully queryable was to have the WMF build a graph database layer on top of their data (the Wikidata Query Service), and any kind of complicated queries still take forever to run.

All right.

I think our difference of opinion regarding PageAssessments ultimately boils down to the cathedral vs. the bazaar. You want a solution that will fit arbitrary future use cases, while I want to build a specific solution to a specific problem until I know of other use cases that need to be accommodated. Hopefully, in the future, it will be feasible to store this sort of data in page_props (once something like virtual columns are actually standardized and supported), but we're a long way from that.

Sounds about right. I think adding properties to a MediaWiki page is so common that it should be natively supported. We've made inroads with categorylinks and page_props, but both apparently fall short here. I continue to think that it would be better to work through that problem instead of what I see as working around it. But I can respect our difference of opinion. Thank you for the thoughtful and thorough responses.

There will, of course, be predictable consequences from taking this approach.

Yes, there will be a small amount of technical debt created by this project, but the debt vs. gain ratio looks pretty good from my point of view. Yes, we must consider alternatives, but I don't see any alternatives that are feasible at this point. Honestly, the one alternative that I think would be the closest to making sense (but would still have limitations) would be to have each Wikipedia supporting its own Wikibase repo for article metadata. Considering that the project to have Commons support its own Wikibase repo for media metadata has been ongoing for many years now and is still not close to fruition, I really don't want to go down that path (which seems more like a death march).

This was basically suggested in T117142#1876256. Setting up more than one Wikibase is an interesting idea.

Not to mention that Wikidata items are about concepts, not specific Wikipedia articles. So it would be inappropriate to attribute things to “Zika fever” the disease when you should be attributing them to “Zika fever” the English Wikipedia article.

@Lydia_Pintscher

Yep, that is a correct assessment of the current situation from my side. Wikidata items are indeed about concepts and not about articles.

He7d3r added a subscriber: He7d3r. Feb 10 2016, 11:27 AM

Badges are per-article assessments and only include the basic good/featured tiers.

Isn't that just because the others were not created yet?

WikiProject A may rate an article differently from WikiProject B.

That does not seem to be the case for all wikis (at least on ptwiki there is a single quality rating for a given article, independent of wikiprojects; only the importance changes from wikiproject to wikiproject).

DannyH renamed this task from "Investigation for PageAssessments deployment to WMF wikis (at least enwiki)" to "PageAssessments deployment to WMF wikis (at least enwiki)". Mar 24 2016, 5:25 PM
Qgil removed a subscriber: Qgil. Mar 30 2016, 9:55 AM

Documentation about the PageAssessments project has been posted at https://meta.wikimedia.org/wiki/Community_Tech/PageAssessments.

PageAssessments has been deployed to Beta Labs. Please test it out and report any issues here. Currently, the extension consists of the {{#assessment}} parser function and 2 APIs: projectpages and pageassessments.

Querying for project(s):
http://simple.wikipedia.beta.wmflabs.org/w/api.php?action=query&list=projectpages&wppprojects=Medicine&wppassessments=true

Querying for page title(s):
http://simple.wikipedia.beta.wmflabs.org/w/api.php?action=query&prop=pageassessments&titles=Bandage&formatversion=2

The next step will be testing on test.wikipedia.org.

Harej added a comment. Jun 9 2016, 11:10 PM

I noticed you've embedded the {{#assessment}} parser function on the ns 0 Medicine rather than the ns 1 Talk:Medicine. Would it make a difference if you embedded it on a talk page instead of the non-talk page? I am assuming you would be embedding the parser function in WikiProject templates, and those reside on the talk page.

Niharika added a comment (edited). Jun 10 2016, 3:14 AM

I noticed you've embedded the {{#assessment}} parser function on the ns 0 Medicine rather than the ns 1 Talk:Medicine. Would it make a difference if you embedded it on a talk page instead of the non-talk page? I am assuming you would be embedding the parser function in WikiProject templates, and those reside on the talk page.

No, it wouldn't make a difference. I'll generate some more test cases later today.

Here, http://simple.wikipedia.beta.wmflabs.org/w/api.php?action=query&list=projectpages&wppprojects=Botany|Clothing&wppassessments=true
Both of those pages have the assessment on the Talk page.

Change 294656 had a related patch set uploaded (by Kaldari):
Make sure PageAssessments extension is automatically branched

https://gerrit.wikimedia.org/r/294656

Change 294656 merged by jenkins-bot:
Make sure PageAssessments extension is automatically branched

https://gerrit.wikimedia.org/r/294656

Titoxd added a subscriber: Titoxd. Aug 31 2016, 7:01 PM
kaldari renamed this task from "PageAssessments deployment to WMF wikis (at least enwiki)" to "PageAssessments deployment to WMF wikis". Sep 7 2016, 11:07 PM
kaldari added a comment (edited). Sep 7 2016, 11:14 PM

PageAssessments has been deployed to English Wikivoyage and integrated into the master assessment template there. No problems were encountered during deployment. Deployment to English Wikipedia will be next (possibly within a week or two). To reiterate, this extension makes no user-facing changes other than making 2 new APIs and a parser function available. Otherwise, it is a completely invisible deployment.

Alsee added a subscriber: Alsee. Oct 16 2016, 12:11 PM

Remember that someone could put "JohnDoe(555)555-1234" in there, or even "JohnDoeIsAPedophile".

Did you remember to remove the value when the page is deleted, oversighted, or page moved? And copy the value again after page move, undeletion, un-oversight?

Remember that someone could put "JohnDoe(555)555-1234" in there, or even "JohnDoeIsAPedophile".
Did you remember to remove the value when the page is deleted, oversighted, or page moved? And copy the value again after page move, undeletion, un-oversight?

Did you test this before asking? If you had, or reviewed the implementation, then you would have noticed that this uses the standard mechanisms for page properties and handles all of those automatically.

Niharika reassigned this task from Niharika to kaldari. Nov 23 2016, 3:30 PM

Assigning to Ryan since he's looking over this now.

czar added a subscriber: czar. Jul 13 2017, 8:44 PM

I went to test this out on enwp today but I'm getting:

{
    "error": {
        "code": "internal_api_error_DBQueryError",
        "info": "[WWfbIQpAEDMAAFYiJJ4AAACS] Database query error."
    },
    "servedby": "mw1286"
}
MaxSem added a subscriber: MaxSem. Edited Jul 13 2017, 8:53 PM

That's a timeout on

SELECT  pa_page_id AS `page_id`,pa_project_id AS `project_id`,page_title AS `title`,page_namespace AS `namespace`,pap_project_title AS `project_name`
    FROM `page_assessments`
        JOIN `page` ON ((page_id = pa_page_id))
        JOIN `page_assessments_projects` ON ((pa_project_id = pap_project_id)) 
    ORDER BY pa_project_id, pa_page_id
    LIMIT 11
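The shape of that query can be reproduced against stand-in tables to see what it returns when it does complete (column sets here are inferred from the SELECT above; the real production schema has more columns, and the timeout presumably comes from the sheer size of the joined tables rather than the query's structure):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Minimal stand-ins for the three tables referenced by the query above.
cur.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT,
                   page_namespace INTEGER);
CREATE TABLE page_assessments (pa_page_id INTEGER, pa_project_id INTEGER);
CREATE TABLE page_assessments_projects (pap_project_id INTEGER PRIMARY KEY,
                                        pap_project_title TEXT);
CREATE INDEX pa_project ON page_assessments (pa_project_id, pa_page_id);
""")
cur.execute("INSERT INTO page VALUES (1, 'Zika_fever', 0)")
cur.execute("INSERT INTO page_assessments VALUES (1, 10)")
cur.execute("INSERT INTO page_assessments_projects VALUES (10, 'Medicine')")

rows = cur.execute("""
SELECT pa_page_id AS page_id, pa_project_id AS project_id,
       page_title AS title, page_namespace AS namespace,
       pap_project_title AS project_name
FROM page_assessments
JOIN page ON page_id = pa_page_id
JOIN page_assessments_projects ON pa_project_id = pap_project_id
ORDER BY pa_project_id, pa_page_id
LIMIT 11
""").fetchall()
print(rows)
```

With only a handful of rows this is instant; at English Wikipedia scale, the unfiltered three-way join ordered over millions of assessment rows is what hits the API's query time limit.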
geraki added a subscriber: geraki. May 12 2019, 11:05 AM