Remove authentication from Wikimedia Commons Query Services (WCQS)
Open, High, Public

Description

It was fine for the WCQS beta to have authentication, but the production SPARQL endpoint shouldn't be limited by authentication. A policy shift with such implications is not something a single WMF team should decide unilaterally. This is something that should go all the way up to the WMF board to decide. So please disable it.

At https://commons.wikimedia.org/wiki/Commons_talk:SPARQL_query_service/Upcoming_General_Availability_release#Mandatory_authentication_considered_harmful Andrew lists why this shouldn't be done.

Regarding this part of the announcement:

    "The biggest change to user behavior will be the requirement for user authentication to use all endpoints." 

To my (@Fuzheado) recollection, this is the only instance of needing to be authenticated to experience the main corpus of Wikimedia content. So this is a major policy shift. There are a number of concerns:

    Endangered species? Are publicly viewable knowledge graphs like this at risk with WCQS locked up behind an authentication system?

    Restricted reading. We've come so far in finally spending the time to model and add millions of statements to Commons (huzzah!) and we finally have a usable query service for it (huzzah!) and then for the last mile, we're restricting access to it by instituting authentication? For a community that has "open by default" as an ethos, it feels like such an "own goal" misstep here.

    Tools implications. I'm thinking about the number of tools, scripts, and utilities that use SPARQL queries via Wikidata/WDQS and have given us tremendous capabilities... and the same approach or set of activities cannot be realized for WCQS because of this constraint. We should not underestimate the headache of having to implement OAuth2 for each and every SPARQL query. I'm also puzzled why a service that has not even launched yet has to be this closed when none of our other APIs and services started this way. Other tool creators have shared this concern in Phabricator task T290300.

    Public perception. In terms of public outreach, especially for our GLAM-Wiki work, this is hard to swallow and reconcile with what we are evangelizing. As we are asking cultural and heritage partners to open up their collections and to share their metadata, we are doing so with the expectation of showcasing the benefits of open knowledge to the world. Or we thought we were. With this WCQS policy, every mention of "open content" and "open access" will require an asterisk. This will introduce an asymmetry in contributing content and experiencing its benefits.

    Alternative solutions. I am sympathetic to the complex support issues that arise when any service is made available for public access, whether it's the MediaWiki API or a SPARQL endpoint. However, our "open by default" ethos is a core tenet for the movement and for equitable access to knowledge. Like-minded entities such as Openverse have found ways to offer different tiers of access without requiring API keys. We should bend over backward to find "least restrictive" solutions such as throttling or limiting call frequency before we completely block access with mandatory authentication.
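To make the "throttling or limiting call frequency" alternative concrete, here is a minimal token-bucket sketch in Python. It is only an illustration of the general technique, not how WCQS or WDQS actually limit traffic; the class name, parameters, and the per-client usage are all hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical per-client limiter: allows short bursts, caps the sustained rate."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate_per_second = rate_per_second  # tokens added per second
        self.capacity = capacity                # maximum burst size
        self.tokens = float(capacity)           # start with a full bucket
        self.clock = clock                      # injectable clock, handy for testing
        self.last_refill = clock()

    def allow(self):
        """Return True if one more request may proceed right now."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A service could keep one such bucket per client address and answer with HTTP 429 when `allow()` returns False, so anonymous access stays open while heavy users are slowed down.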

Thanks. - Fuzheado

Notes from round table discussion: https://docs.google.com/document/d/13BFQqjfAbzek8pmpLJenyQyxqM1VQiij1lYNjcOCyX0/edit#heading=h.gkr3sreu7vcd

Event Timeline

Multichill renamed this task from Remove authentication from Wikimedia Commons Query Services (WMQS) to Remove authentication from Wikimedia Commons Query Services (WCQS). Dec 18 2021, 5:06 PM

We may introduce two levels of service (authenticated and anonymous) with separate resources and different timeouts (e.g. 15s/60s or 60s/300s).
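As a rough sketch of what such a two-tier setup could look like, the Python below picks a query timeout from the caller's authentication state. The 15s/60s pair is taken from the example figures in this comment; which value applies to which tier is an assumption (anonymous gets the shorter one), and the names `ServiceTier` and `select_tier` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceTier:
    name: str
    timeout_seconds: int


# Assumed mapping: anonymous callers get the shorter timeout.
ANONYMOUS = ServiceTier(name="anonymous", timeout_seconds=15)
AUTHENTICATED = ServiceTier(name="authenticated", timeout_seconds=60)


def select_tier(is_authenticated: bool) -> ServiceTier:
    """Choose the service tier (and thus the query timeout) for a request."""
    return AUTHENTICATED if is_authenticated else ANONYMOUS
```

With this shape, anonymous requests would still be served, just with a tighter budget, while logging in would buy a longer timeout rather than being a precondition for access.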

As mentioned in the WCQS beta 2 announcement, authentication (or the lack of it) on WCQS is an issue that requires further discussion and planning after the beta 2 release. I'll mark this as high priority as a feature request.

MPhamWMF moved this task from Incoming to Feature Requests on the Wikidata-Query-Service board.

Adding authentication to WCQS might have an unexpected effect. Since it is not possible to write a federated query that is submitted to a remote SPARQL endpoint and reaches WCQS from there, federated queries can only be run directly on WCQS, which means that WCQS needs to deal with all the complexity of a query. Removing the login requirement would allow the majority of that complexity to be dealt with at a remote endpoint.
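For readers unfamiliar with federation, the sketch below builds the kind of query this comment has in mind: a query submitted to some other endpoint that delegates one graph pattern to WCQS through a SPARQL 1.1 `SERVICE` clause. The endpoint URL, the helper name `build_federated_query`, and the placeholder pattern are assumptions for illustration, and the code only builds the query string without sending it anywhere.

```python
# Assumed WCQS endpoint URL, used here for illustration only.
WCQS_ENDPOINT = "https://commons-query.wikimedia.org/sparql"


def build_federated_query(remote_endpoint: str, remote_pattern: str, limit: int = 10) -> str:
    """Wrap a graph pattern in a SPARQL 1.1 SERVICE clause for a remote endpoint."""
    return (
        "SELECT * WHERE {\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Placeholder triple pattern; a real query would use Commons-specific predicates.
query = build_federated_query(WCQS_ENDPOINT, "?file ?predicate ?value .")
print(query)
```

The endpoint that receives such a query has to call WCQS on the user's behalf, which is the step a mandatory login gets in the way of.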

Can the password feature on the SDCQC please, please, please, please pretty please be removed/disabled? The SDCQC is an epic feature, but almost useless thanks to the requirement to log in. Basically, Commons remains a data silo on its own.
I keep running into issues where I am building a query that I want to share, reuse in a Jupyter notebook, or run as a federated query from Wikidata. The decision to require OAuth here is really a poor design choice.

FYI, OpenRefine will likely implement a SPARQL importer in the coming months (May-August 2022) through an Outreachy internship. Many OpenRefine users have asked for the ability to start OpenRefine projects from a SPARQL query.

I have explicitly asked to also investigate if it is even possible for us to start a project from a WCQS query. In our user research (as part of the Wikimedia-funded project to include Wikimedia Commons functionalities in OpenRefine), Wikimedia Commons users have asked for this feature.