
API for WikiProject article assessment data
Closed, Resolved · Public · 5 Story Points

Description

Follow-up to T117142: Create an extension for storing WikiProject article assessment metadata in a database table via a parser function

Build an API so that people can access the data.

There are two possible uses:

  • Article-based: Looking at an article -- API would tell you which WikiProjects that article belongs to, and any assessments
  • Project-based: For WikiProject:Beetles, show all of the articles and assessments.
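A minimal Python sketch of what the two kinds of request could look like, assuming a wiki with the extension installed (the endpoint below is a placeholder). The project-based parameters (`list=projectpages`, `wppprojects`, `wppassessments`) are copied from the test request posted later in this task. The article-based module name `pageassessments` is an assumption taken from the Gerrit change title and may not match the final module.

```python
from urllib.parse import urlencode

# Placeholder endpoint: any wiki with the extension installed (assumption).
API_URL = "https://en.wikipedia.org/w/api.php"


def article_query(titles):
    """Article-based use: which projects claim these pages, and their ratings.

    The 'pageassessments' module name is an assumption taken from the
    Gerrit change title; 'titles' is the standard query parameter.
    """
    params = {
        "action": "query",
        "prop": "pageassessments",
        "titles": "|".join(titles),
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def project_query(projects, with_assessments=True):
    """Project-based use: every page claimed by the given projects.

    Parameter names mirror the test request posted later in this task.
    """
    params = {
        "action": "query",
        "list": "projectpages",
        "wppprojects": "|".join(projects),
        "format": "json",
    }
    if with_assessments:
        params["wppassessments"] = "true"
    return f"{API_URL}?{urlencode(params)}"


print(article_query(["Ladybird"]))
print(project_query(["Beetles"]))
```

`urlencode` percent-encodes the `|` separator as `%7C`, which the API decodes back into a list of values.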

We'll talk to @Harej and others to see how this could be used.

Also add documentation for the API on the Extension page on mediawiki.org: https://www.mediawiki.org/wiki/Extension:PageAssessments.

Event Timeline

DannyH created this task.Dec 1 2015, 6:35 PM
DannyH updated the task description.
DannyH raised the priority of this task to Normal.
DannyH added a project: Community-Tech.
DannyH moved this task to To be estimated/discussed on the Community-Tech board.
DannyH added subscribers: DannyH, Harej.
Restricted Application added a subscriber: Aklapper. Dec 1 2015, 6:35 PM
DannyH set Security to None.

I am responding to a request for comment made at the English Wikipedia WikiProject Council.
What is the question here?

kaldari added a subscriber: kaldari.Dec 3 2015, 6:34 PM

@Bluerasberry: We're interested in specific use cases for an article assessment query API. Assuming that all the article assessment data will be stored in a database table (T117142), what kinds of queries would you like to run against that data? For example:

  • A query to return all the featured articles (and associated assessment data) belonging to a particular WikiProject
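As a rough illustration of the example query above, here is a client-side filter over a response shaped like the test output posted later in this task (`query` → `projects` → list of pages, each with an `assessment` dict). The sample data is hypothetical, and `FA` is the English Wikipedia class label; other wikis use other labels. A server-side filter parameter would avoid downloading the whole project, but none is assumed here.

```python
def featured_articles(response, project):
    """Return titles in `project` whose quality class is 'FA'."""
    pages = response.get("query", {}).get("projects", {}).get(project, [])
    return [
        page["title"]
        for page in pages
        if page.get("assessment", {}).get("class") == "FA"
    ]


# Hypothetical response in the same shape as the test output.
sample = {
    "query": {
        "projects": {
            "Wikiproject:Beetles": [
                {"pageid": 1, "ns": 0, "title": "Ladybird",
                 "assessment": {"class": "FA", "importance": "High"}},
                {"pageid": 2, "ns": 0, "title": "Weevil",
                 "assessment": {"class": "C", "importance": "Mid"}},
            ]
        }
    }
}
print(featured_articles(sample, "Wikiproject:Beetles"))  # ['Ladybird']
```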

A lot of this is obviously going to overlap with what WP1.0 bot currently does on English Wikipedia, so feel free to pull ideas from that, but also let us know what functionality you would like that isn't currently met by WP1.0 bot.

The basic functionality would be to provide the data behind the Task Force Assessment table:
https://en.wikipedia.org/wiki/Template:Task_force_assessment
That would give featured articles and lots more, and is the established basis of conversation about WikiProjects.
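Tables like this are essentially counts of pages by quality class and importance. A minimal sketch of that cross-tabulation over assessment records, assuming the record shape from the test output later in this task; the class and importance orderings and the sample pages are hypothetical, since real scales vary by wiki and by project.

```python
from collections import Counter

# Hypothetical row/column orders; real scales vary by wiki and by project.
CLASSES = ["FA", "GA", "B", "C", "Start", "Stub"]
IMPORTANCES = ["Top", "High", "Mid", "Low"]


def assessment_matrix(pages):
    """Count pages per (class, importance) pair."""
    return Counter(
        (p["assessment"]["class"], p["assessment"]["importance"]) for p in pages
    )


def render(matrix, classes=CLASSES, importances=IMPORTANCES):
    """Lay the counts out as a tab-separated grid with a per-class total."""
    lines = ["\t".join(["Class"] + importances + ["Total"])]
    for cls in classes:
        counts = [matrix[(cls, imp)] for imp in importances]
        lines.append("\t".join([cls] + [str(n) for n in counts] + [str(sum(counts))]))
    return "\n".join(lines)


pages = [  # hypothetical pages in the projectpages output shape
    {"title": "Ladybird", "assessment": {"class": "FA", "importance": "High"}},
    {"title": "Weevil", "assessment": {"class": "C", "importance": "Mid"}},
    {"title": "Dung beetle", "assessment": {"class": "C", "importance": "Mid"}},
]
print(render(assessment_matrix(pages)))
```

Pages whose class or importance is not in the chosen orderings are counted but not displayed, so the orderings need to match the project's actual scale.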

Beyond that functionality, I would like to see better differentiation between WikiProjects and their task forces. Some huge WikiProjects, like WP:Biography, would be broken down into subprojects if the data were easy to manage. Instead, largely because the data is hard to manage, many people create separate WikiProjects, which splits datasets that ought to be together. I would like to see data tables that acknowledge task forces and subprojects, so that someone pulling two datasets can be more confident that they do not overlap.
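Until parent/child relationships are recorded, overlap between two pulled datasets can at least be measured. A minimal sketch comparing two page lists by page ID; the example uses the page IDs from the test output further down in this task (Medicine: 4 and 6; History: 6 and 8). The Jaccard ratio is one common overlap measure, chosen here for illustration.

```python
def overlap(pages_a, pages_b):
    """Compare two page lists by page ID."""
    ids_a = {p["pageid"] for p in pages_a}
    ids_b = {p["pageid"] for p in pages_b}
    union = ids_a | ids_b
    shared = ids_a & ids_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(ids_a - ids_b),
        "only_b": sorted(ids_b - ids_a),
        # Share of all distinct pages that appear in both lists.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


medicine = [{"pageid": 4}, {"pageid": 6}]
history = [{"pageid": 6}, {"pageid": 8}]
print(overlap(medicine, history))
```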

Going beyond that, pageview data is the Wikimedia equivalent of the metrics that serve as currency in the communications industry: pageviews = hits = impressions = tweets = likes. A major reason why Wikipedia is not acknowledged as legitimate in the communications sector is that metrics are hard to get from Wikipedia. If pageview data were easily reported for WikiProjects, that would probably be the single biggest thing that could be done to increase the prestige and market competitiveness of Wikimedia projects as a communication channel. Going further, if all the WikiProject watching and metrics tools could be applied to any arbitrary set of articles, WikiProjects would become a more personal experience for each user and much more valuable.
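Pageviews are outside this extension, but a project's article list could be fed into the Wikimedia REST API's per-article pageviews endpoint. A minimal sketch that builds those URLs and sums the `views` field of decoded responses; the article titles are hypothetical and the actual HTTP fetching is left out.

```python
from urllib.parse import quote

# Wikimedia REST API per-article pageviews endpoint (separate from this extension).
PAGEVIEWS = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start, end, project="en.wikipedia"):
    """URL for daily views of one article; start and end are YYYYMMDD strings."""
    # Titles use underscores; '/' and other reserved characters are escaped.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{PAGEVIEWS}/{project}/all-access/all-agents/{article}/daily/{start}/{end}"


def total_views(responses):
    """Sum the 'views' field over every item in a set of decoded JSON responses."""
    return sum(item["views"] for resp in responses for item in resp.get("items", []))


# Hypothetical project article list; fetching each URL is left to the caller.
for title in ["Ladybird", "Dung beetle"]:
    print(pageviews_url(title, "20160101", "20160131"))
```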

Going beyond that idea, I do not think an API for pulling data from WikiProjects should be the sole priority. Whatever tools are developed, I would like them to be able to pull data for any user's arbitrary set of articles. In other words, users should be able to create multiple personal watchlists, WikiProjects, or lists of articles, and then get data about any such list.
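An article-based query could, in principle, serve an arbitrary personal list by splitting it into batches for the `titles` parameter. A minimal sketch, assuming the usual limit of 50 titles per request for accounts without the `apihighlimits` right; the list below is hypothetical.

```python
def title_batches(titles, size=50):
    """Yield pipe-joined chunks of an arbitrary article list.

    Duplicates are dropped (first occurrence kept) so each title is queried once.
    """
    unique = list(dict.fromkeys(titles))
    for start in range(0, len(unique), size):
        yield "|".join(unique[start:start + size])


my_list = ["Ladybird", "Weevil", "Ladybird", "Dung beetle"]  # hypothetical user list
for chunk in title_batches(my_list):
    print(chunk)  # Ladybird|Weevil|Dung beetle
```

Each chunk would then be passed as the `titles` value of one request.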

DannyH edited a custom field.Dec 4 2015, 6:44 PM
Fhocutt moved this task from Ready to In Development on the Community-Tech-Sprint board.

Working branch until I figure out appropriate dependencies/remotes in Gerrit: https://github.com/fhocutt/Assessments/tree/T119997-write-API

Current issue: getting the minimum boilerplate in place for the extension's API module to show up on the api.php help page. I have not found helpful documentation.

Change 263575 had a related patch set uploaded (by Fhocutt):
[WIP] Begin API page assessment generator module

https://gerrit.wikimedia.org/r/263575

kaldari updated the task description. Jan 12 2016, 5:41 PM

I would recommend holding off on building an API for this extension until we figure out if this extension is even needed.

@MZMcBride: Frances has almost finished this already. It's just a basic preliminary API. Let's continue discussing the extension on the tracking ticket (T120219) and let Frances finish her work on this.

@Raymond: I was told that you might have some insights for us regarding how WikiProjects work on the German Wikipedia. It seems that they are a bit different than how WikiProjects work on English Wikipedia (and many other projects). On the English and French Wikipedias, WikiProjects basically claim certain articles as belonging to that WikiProject and each article is assigned a quality rating and an importance rating by each WikiProject that claims it. Danish Wikipedia is similar, but doesn't use importance ratings. On the Spanish and Italian Wikipedias, WikiProjects just claim articles, but don't give them ratings. On the German Wikipedia, as far as I can tell, it doesn't look like WikiProjects claim articles at all. Is that accurate?

kaldari claimed this task.Feb 16 2016, 6:29 PM
kaldari removed kaldari as the assignee of this task.Mar 15 2016, 5:25 PM
Niharika claimed this task.Mar 16 2016, 4:04 PM
Niharika moved this task from Ready to In Development on the Community-Tech-Sprint board.

@kaldari, I think I made all the necessary fixes, but I'm not very familiar with how the API works, so your review would be very helpful!
Here's what the test input produces:
Input: http://localhost:8080/w/api.php?action=query&list=projectpages&wppprojects=Medicine|History&wppassessments=true
Output:

{
    "batchcomplete": "",
    "query": {
        "projects": {
            "Wikiproject:Medicine": [
                {
                    "pageid": 4,
                    "ns": 0,
                    "title": "Test1",
                    "assessment": {
                        "class": "A",
                        "importance": "Low"
                    }
                },
                {
                    "pageid": 6,
                    "ns": 0,
                    "title": "Test2",
                    "assessment": {
                        "class": "G",
                        "importance": "Up"
                    }
                }
            ],
            "Wikiproject:History": [
                {
                    "pageid": 6,
                    "ns": 0,
                    "title": "Test2",
                    "assessment": {
                        "class": "A",
                        "importance": "High"
                    }
                },
                {
                    "pageid": 8,
                    "ns": 0,
                    "title": "Test3",
                    "assessment": {
                        "class": "F",
                        "importance": "High"
                    }
                }
            ]
        }
    }
}
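The project-based response above can be turned around into the article-based view from the task description (page → projects → assessment). A minimal sketch that inverts exactly the test output shown above; `by_page` is a hypothetical helper, not part of the extension.

```python
from collections import defaultdict

# The test output above, with the same pages, projects, and assessments.
RESPONSE = {
    "batchcomplete": "",
    "query": {"projects": {
        "Wikiproject:Medicine": [
            {"pageid": 4, "ns": 0, "title": "Test1",
             "assessment": {"class": "A", "importance": "Low"}},
            {"pageid": 6, "ns": 0, "title": "Test2",
             "assessment": {"class": "G", "importance": "Up"}},
        ],
        "Wikiproject:History": [
            {"pageid": 6, "ns": 0, "title": "Test2",
             "assessment": {"class": "A", "importance": "High"}},
            {"pageid": 8, "ns": 0, "title": "Test3",
             "assessment": {"class": "F", "importance": "High"}},
        ],
    }},
}


def by_page(response):
    """Invert project -> pages into title -> {project: assessment}."""
    index = defaultdict(dict)
    for project, pages in response["query"]["projects"].items():
        for page in pages:
            index[page["title"]][project] = page.get("assessment", {})
    return dict(index)


for title, projects in by_page(RESPONSE).items():
    print(title, projects)
```

Test2 ends up with two entries, one per claiming project, each carrying that project's own rating.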

Note that the project names have 'Wikiproject:' prepended to them, because that is how the corresponding pages are titled.
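Because the request uses bare names (`Medicine|History`) while the response keys carry the prefix, a client may want to map the keys back. A minimal sketch with hypothetical helpers; the prefix check ignores capitalisation in case titles use `WikiProject:` instead.

```python
PREFIX = "Wikiproject:"  # as it appears in the test output above


def short_name(project_title):
    """Strip the prefix, ignoring its capitalisation; other titles pass through."""
    if project_title.lower().startswith(PREFIX.lower()):
        return project_title[len(PREFIX):]
    return project_title


def strip_prefixes(projects):
    """Re-key a query -> projects mapping by the names used in the request."""
    return {short_name(name): pages for name, pages in projects.items()}


print(short_name("Wikiproject:Medicine"))  # Medicine
```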

@kaldari, per Brad's review on https://gerrit.wikimedia.org/r/#/c/279130/2/PageAssessmentsBody.php and my chat with him on IRC, it seems like a better idea to maintain a separate table of WikiProject names, each with an auto-increment ID, and reference those IDs here.
We still need to think through some other points Brad mentions in the code review though.

kaldari added a comment (edited). Mar 24 2016, 4:44 PM

@Niharika: Yeah, Brad brings up some good points. Let's go with your suggestion and create a new table for WikiProject names. And let's not worry about trying to tie WikiProject names to actual WikiProject pages; I think that's just going to be too much of a headache. Sorry I didn't realize that earlier. I like your idea, though. It will let us save space in both the pa_project column and the index.
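A minimal sketch of the agreed layout, using Python's sqlite3 as a stand-in for MySQL. All table and column names are hypothetical (only the `pa_` prefix echoes the comment above). Project names are stored once as plain strings, not linked to project pages, and each assessment row stores a small integer ID instead of the name. The sample rows are the assessments from the test output earlier in this task.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for MySQL here.
SCHEMA = """
CREATE TABLE project_names (
    project_id   INTEGER PRIMARY KEY,   -- auto-assigned (MySQL: AUTO_INCREMENT)
    project_name TEXT NOT NULL UNIQUE
);
CREATE TABLE assessments (
    pa_page_id    INTEGER NOT NULL,
    pa_project_id INTEGER NOT NULL REFERENCES project_names (project_id),
    pa_class      TEXT,
    pa_importance TEXT,
    PRIMARY KEY (pa_page_id, pa_project_id)
);
"""


def project_id(conn, name):
    """Return the integer ID for a project name, creating it on first use."""
    conn.execute(
        "INSERT OR IGNORE INTO project_names (project_name) VALUES (?)", (name,)
    )
    row = conn.execute(
        "SELECT project_id FROM project_names WHERE project_name = ?", (name,)
    ).fetchone()
    return row[0]


def record(conn, page_id, project, cls, importance):
    """Store one assessment, keyed by page ID and project ID."""
    conn.execute(
        "INSERT OR REPLACE INTO assessments VALUES (?, ?, ?, ?)",
        (page_id, project_id(conn, project), cls, importance),
    )


def pages_in(conn, project):
    """All (page ID, class, importance) rows for one project name."""
    return conn.execute(
        """SELECT a.pa_page_id, a.pa_class, a.pa_importance
           FROM assessments AS a
           JOIN project_names AS p ON p.project_id = a.pa_project_id
           WHERE p.project_name = ?
           ORDER BY a.pa_page_id""",
        (project,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for row in [(4, "Medicine", "A", "Low"), (6, "Medicine", "G", "Up"),
            (6, "History", "A", "High"), (8, "History", "F", "High")]:
    record(conn, *row)
print(pages_in(conn, "History"))  # [(6, 'A', 'High'), (8, 'F', 'High')]
```

The space saving comes from the assessment rows and their index holding a fixed-size integer rather than repeating the project name string for every page.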

kaldari closed this task as Resolved.Mar 24 2016, 5:25 PM

This is basically done. We just need to change how it works (with T130844).

Change 263575 abandoned by Kaldari:
Add PageAssessments API modules (projectpages and pageassessments)

Reason:
Squashed into https://gerrit.wikimedia.org/r/#/c/279130/

https://gerrit.wikimedia.org/r/263575