This is the Epic task for tracking the work that needs to be done before the extension is deployed to production.
Description
Details
Project | Branch | Lines +/- | Subject
---|---|---|---
mediawiki/tools/release | master | +1 -0 | Make sure PageAssessments extension is automatically branched
Event Timeline
You're still overstating the query restrictions, I think. As I understand it, Wikidata is queryable in exactly the way you want it to be. Is there reason to believe otherwise?
Yes, see the comments above. Basically: you would have to use the Wikidata Query Service, which has a steep learning curve, rather than a simple dedicated API; it's very unlikely that queries for aggregate WikiProject data would be performant enough to be used in real-time; it's not even clear how such data would be stored in Wikidata to begin with or if the community would be willing to host the data.
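To make the learning-curve point concrete: even a simple aggregate question has to be phrased as SPARQL against the Wikidata Query Service endpoint, rather than as a parameter on a dedicated API module. A minimal sketch follows; the endpoint URL is the real WDQS endpoint, but the query itself is purely illustrative (it counts enwiki sitelinks carrying what I believe is the "featured article" badge item, Q17437796), and no request is actually sent:

```python
from urllib.parse import urlencode

# Real WDQS endpoint; the query below is illustrative only.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Count items whose enwiki sitelink carries the "featured article" badge.
# Even this simple aggregate requires writing SPARQL, whereas a dedicated
# API could expose it as a single query parameter.
sparql = """
SELECT (COUNT(?item) AS ?count) WHERE {
  ?sitelink schema:about ?item ;
            schema:isPartOf <https://en.wikipedia.org/> ;
            wikibase:badge wd:Q17437796 .
}
"""

def build_request_url(query: str) -> str:
    """Return the GET URL one would fetch to run `query` on WDQS."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

url = build_request_url(sparql)
print(url)
```

And that is the easy case; per-WikiProject class and importance ratings are not modeled on Wikidata at all, so there is nothing there to query yet.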
What's the intended input mechanism for the PageAssessments extension? I thought a major win here was moving away from wikitext?
The input mechanism (for now) is a single parser function that would be embedded in the master assessment template for each wiki, for example Template:WPBannerMeta on English Wikipedia. Eventually, however, non-wikitext interfaces could be built that interface with the extension directly (as @Harej has proposed). Then both the assessment templates and the parser function could be deprecated. The parser function allows us to have a smooth transition between people using templates and whatever future interface is created, as it accomplishes half of the transition (getting the data out of wikitext) without requiring an immediate change of workflow. I'll try to write up more about this on mediawiki.org so it's easier to grok.
Yes, that's why using page_props was the first idea we looked at. I would love to use page_props for this, as it would make everything much simpler, but it just wouldn't work. We could store the data there as a blob, but (without being able to use virtual columns) the queries would not be performant: we wouldn't have adequate indexing, and the huge size of the existing table wouldn't help either. Wikidata had the same problem (they store multidimensional data in blobs), and the only way they were finally able to make their data fully queryable was to have the WMF build a graph database layer on top of it (the Wikidata Query Service), and even then any kind of complicated query still takes forever to run.
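The indexing problem can be demonstrated in miniature with SQLite (a sketch with made-up table and column names, not the actual MediaWiki schema): a JSON blob in a page_props-style table gives the query planner nothing to work with, while a dedicated table with a composite index can be searched directly.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# page_props-style storage: one opaque blob per page (illustrative names).
cur.execute("CREATE TABLE props (page_id INTEGER, pp_value TEXT)")
cur.execute("INSERT INTO props VALUES (1, ?)",
            (json.dumps({"Medicine": {"class": "B", "importance": "High"}}),))

# Dedicated, normalized storage with a composite index (illustrative names).
cur.execute("CREATE TABLE assessments (page_id INTEGER, project TEXT, "
            "class TEXT, importance TEXT)")
cur.execute("CREATE INDEX idx_proj ON assessments (project, page_id)")
cur.execute("INSERT INTO assessments VALUES (1, 'Medicine', 'B', 'High')")

# The blob query must scan and parse every row; no index can help.
blob_plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT page_id FROM props "
    "WHERE json_extract(pp_value, '$.Medicine.class') = 'B'").fetchall()

# The normalized query can walk the index directly.
idx_plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT page_id FROM assessments "
    "WHERE project = 'Medicine'").fetchall()

print(blob_plan)  # plan shows a full SCAN of props
print(idx_plan)   # plan shows a SEARCH using idx_proj
```

The same shape holds on MariaDB: without virtual (generated) columns you cannot index into the blob, so every filter on assessment data degenerates into a full scan of an already huge table.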
I have experience with programmatically storing data both in page_props (the Disambiguator extension) and in Wikidata (the WikiGrok extension), so I'm familiar with the limitations of both. Unfortunately, there isn't a sensible way to modify either of those to fit this use case. I also have experience with creating, altering, and deleting production database tables, and I really don't think it creates a huge amount of overhead or technical debt. I think our difference of opinion regarding PageAssessments ultimately boils down to the cathedral vs. the bazaar. You want a solution that will fit arbitrary future use cases, while I want to build a specific solution to a specific problem until I know of other use cases that need to be accommodated. Hopefully, in the future, it will be feasible to store this sort of data in page_props (once something like virtual columns are actually standardized and supported), but we're a long way from that.
There will, of course, be predictable consequences from taking this approach.
Yes, there will be a small amount of technical debt created by this project, but the debt vs. gain ratio looks pretty good from my point of view. Yes, we must consider alternatives, but I don't see any alternatives that are feasible at this point. Honestly, the one alternative that I think would be the closest to making sense (but would still have limitations) would be to have each Wikipedia support its own Wikibase repo for article metadata. Considering that the project to have Commons support its own Wikibase repo for media metadata has been ongoing for many years now and is still not close to fruition, I really don't want to go down that path (which seems more like a death march). I also honestly don't see much of a downside to this implementation. It's technically simple; will have little maintenance cost; doesn't require any changes from the community; won't add any bloat to existing tables; and provides exactly the functionality that is requested. If you see a significant downside that I'm missing, let me know.
My understanding is that Wikidata already largely cleared that hurdle by implementing badges.
Badges are per-article assessments and only include the basic good/featured tiers. The WikiProject assessment system has, at minimum, FA/A/GA/B/C/Start/Stub/List, plus deviations from that formula. (Some WikiProjects have a B+ rating.) And WikiProjects have their own ratings; WikiProject A may rate an article differently from WikiProject B. And then there is the matter of importance ratings, which definitely differ between projects.
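The consequence of all this for the data model is that the natural key is (page, project), not page alone. A hypothetical in-memory sketch (the page and project names are made up):

```python
# Hypothetical model of per-WikiProject assessments: the same page can
# carry a different quality class and importance under each project.
assessments = {
    ("Battle of Hastings", "Military history"): {"class": "A",  "importance": "High"},
    ("Battle of Hastings", "England"):          {"class": "GA", "importance": "Mid"},
}

def classes_for(page):
    """All quality classes assigned to `page`, one per WikiProject."""
    return {proj: v["class"] for (p, proj), v in assessments.items() if p == page}

print(classes_for("Battle of Hastings"))
```

Wikidata badges, by contrast, are keyed on the sitelink alone, so they cannot represent two projects disagreeing about the same article.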
The Wikidata developers might be reluctant to build this level of complexity into the Wikibase extension.
There's T117142, which uses {{#wikiproject:}} in its task description. It looks like the name was later changed to {{#assessment:}}.
I thought a major goal here was to move away from wikitext. I thought maybe someone would finally develop an interface for modifying page properties that didn't require a large blob of wikitext being run through the preprocessor and parser. I understand now that the scope here is significantly narrower.
> Yes, that's why using page_props was the first idea we looked at. I would love to use page_props for this as it would make everything much simpler, but it just wouldn't work. We could store the data there as a blob, but (without being able to use virtual columns) the queries would not be performant, mainly because we wouldn't have adequate indexing, but the huge size of the existing table wouldn't help either. Wikidata had the same problem (they store multidimensional data in blobs) and the only way they were finally able to make their data fully queryable was to have the WMF build a graph database layer on top of their data (the Wikidata Query Service), and any kind of complicated queries still take forever to run.
All right.
> I think our difference of opinion regarding PageAssessments ultimately boils down to the cathedral vs. the bazaar. You want a solution that will fit arbitrary future use cases, while I want to build a specific solution to a specific problem until I know of other use cases that need to be accommodated. Hopefully, in the future, it will be feasible to store this sort of data in page_props (once something like virtual columns are actually standardized and supported), but we're a long way from that.
Sounds about right. I think adding properties to a MediaWiki page is so common that it should be natively supported. We've made inroads with categorylinks and page_props, but both apparently fall short here. I continue to think that it would be better to work through that problem instead of what I see as working around it. But I can respect our difference of opinion. Thank you for the thoughtful and thorough responses.
> There will, of course, be predictable consequences from taking this approach.
> Yes, there will be a small amount of technical debt created by this project, but the debt vs. gain ratio looks pretty good from my point of view. Yes, we must consider alternatives, but I don't see any alternatives that are feasible at this point. Honestly, the one alternative that I think would be the closest to making sense (but would still have limitations) would be to have each Wikipedia support its own Wikibase repo for article metadata. Considering that the project to have Commons support its own Wikibase repo for media metadata has been ongoing for many years now and is still not close to fruition, I really don't want to go down that path (which seems more like a death march).
This was basically suggested in T117142#1876256. Setting up more than one Wikibase is an interesting idea.
Yep, that is a correct assessment of the current situation from my side. Wikidata items are indeed about concepts, not about articles.
Isn't that just because the others were not created yet?
> WikiProject A may rate an article differently from WikiProject B.
That does not seem to be the case for all wikis (at least on ptwiki there is a single quality rating for a given article, independent of WikiProjects; only the importance changes from WikiProject to WikiProject).
Documentation about the PageAssessments project has been posted at https://meta.wikimedia.org/wiki/Community_Tech/PageAssessments.
PageAssessments has been deployed to Beta Labs. Please test it out and report any issues here. Currently, the extension consists of the {{#assessment}} parser function and two API modules: projectpages and pageassessments.
Querying for project(s):
http://simple.wikipedia.beta.wmflabs.org/w/api.php?action=query&list=projectpages&wppprojects=Medicine&wppassessments=true
Querying for page title(s):
http://simple.wikipedia.beta.wmflabs.org/w/api.php?action=query&prop=pageassessments&titles=Bandage&formatversion=2
The next step will be testing on test.wikipedia.org.
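For anyone scripting against the test deployment, the two queries above can be built programmatically. A sketch using only the standard library; the module and parameter names (`projectpages`, `wppprojects`, `wppassessments`, `pageassessments`) are taken directly from the URLs above, and multiple values are joined with `|` per the usual API convention:

```python
from urllib.parse import urlencode

API = "http://simple.wikipedia.beta.wmflabs.org/w/api.php"

def projectpages_url(projects, assessments=True):
    """URL for the list=projectpages query (projects joined with |)."""
    params = {
        "action": "query",
        "list": "projectpages",
        "wppprojects": "|".join(projects),
        "format": "json",
    }
    if assessments:
        params["wppassessments"] = "true"
    return API + "?" + urlencode(params)

def pageassessments_url(titles):
    """URL for the prop=pageassessments query on one or more titles."""
    params = {
        "action": "query",
        "prop": "pageassessments",
        "titles": "|".join(titles),
        "formatversion": "2",
        "format": "json",
    }
    return API + "?" + urlencode(params)

print(projectpages_url(["Medicine"]))
print(pageassessments_url(["Bandage"]))
```

Fetching either URL (e.g. with `urllib.request.urlopen`) returns the same JSON you see in a browser.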
I noticed you've embedded the {{#assessment}} parser function on Medicine (ns 0) rather than Talk:Medicine (ns 1). Would it make a difference if you embedded it on a talk page instead of the non-talk page? I am assuming you would be embedding the parser function in WikiProject templates, and those reside on the talk page.
No, it wouldn't make a difference. I'll generate some more test cases later today.
Here: http://simple.wikipedia.beta.wmflabs.org/w/api.php?action=query&list=projectpages&wppprojects=Botany|Clothing&wppassessments=true
Both of those pages have the assessment on the Talk page.
Change 294656 had a related patch set uploaded (by Kaldari):
Make sure PageAssessments extension is automatically branched
Change 294656 merged by jenkins-bot:
Make sure PageAssessments extension is automatically branched
PageAssessments has been deployed to English Wikivoyage and integrated into the master assessment template there. No problems were encountered during deployment. Deployment to English Wikipedia will be next (possibly within a week or two). To reiterate, this extension makes no user-facing changes other than making two new API modules and a parser function available. Otherwise, it is a completely invisible deployment.
Remember that someone could put "JohnDoe(555)555-1234" in there, or even "JohnDoeIsAPedophile".
Did you remember to remove the value when the page is deleted, oversighted, or moved? And to restore it after a move, undeletion, or un-oversighting?
Did you test this before asking? If you had, or reviewed the implementation, then you would have noticed that this uses the standard mechanisms for page properties and handles all of those automatically.
I went to test this out on enwp today but I'm getting:
{ "error": { "code": "internal_api_error_DBQueryError", "info": "[WWfbIQpAEDMAAFYiJJ4AAACS] Database query error." }, "servedby": "mw1286" }
That's a timeout on
SELECT pa_page_id AS `page_id`,pa_project_id AS `project_id`,page_title AS `title`,page_namespace AS `namespace`,pap_project_title AS `project_name` FROM `page_assessments` JOIN `page` ON ((page_id = pa_page_id)) JOIN `page_assessments_projects` ON ((pa_project_id = pap_project_id)) ORDER BY pa_project_id, pa_page_id LIMIT 11
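Whatever the actual culprit on enwiki (the joins are a plausible contributor), the general shape of the problem — an ORDER BY ... LIMIT with no matching index forcing a sort of the whole result before the LIMIT can apply — can be shown in miniature with SQLite. This is a sketch, not the production MariaDB schema; only the two relevant columns are modeled:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Toy version of the page_assessments table (illustrative, not the real schema).
cur.execute("CREATE TABLE page_assessments (pa_page_id INTEGER, "
            "pa_project_id INTEGER)")

# Without a matching index, ORDER BY ... LIMIT must sort everything first.
plan_no_idx = cur.execute(
    "EXPLAIN QUERY PLAN SELECT pa_page_id, pa_project_id "
    "FROM page_assessments ORDER BY pa_project_id, pa_page_id LIMIT 11"
).fetchall()

# An index in the same column order lets the scan stop after 11 rows.
cur.execute("CREATE INDEX pa_proj_page ON page_assessments "
            "(pa_project_id, pa_page_id)")
plan_idx = cur.execute(
    "EXPLAIN QUERY PLAN SELECT pa_page_id, pa_project_id "
    "FROM page_assessments ORDER BY pa_project_id, pa_page_id LIMIT 11"
).fetchall()

print(plan_no_idx)  # plan typically includes "USE TEMP B-TREE FOR ORDER BY"
print(plan_idx)     # plan walks the index; no separate sort step
```

On the real tables the joins add further cost, but an index whose column order matches the ORDER BY is the usual first thing to check when a LIMIT-ed listing query times out.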
Changing project tag from Tracking-Neverending to Epic as per task description.
Although, could this task be closed? Is there a need to link all the different deployments of this tool across the WMF cluster?