Serve WCQS Sparql endpoint through api.wikimedia.org with OAuth 2
Closed, Duplicate (Public)

Description

As an external/bot developer I want to be able to authenticate with WCQS via my application/bot so that it can use the service without human interaction.

Beta WCQS is currently set up as a 3rd party auth app, which is needed to verify registered users. We want to be able to use project-level authentication with WCQS so that bots can interact with it the same way they do with other Wikimedia projects.
After investigation, it was clear that the best way to do that would be to expose the SPARQL endpoint for WCQS through api.wikimedia.org, which provides us with the OAuth 2 flow and rate limiting (which can even differ between logged-in and logged-out users).

AC:

  • OAuth bot authentication is available and documented
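
For illustration, a bot-side request under this proposal might look roughly like the sketch below. The gateway path, the token value, and the function name are assumptions made for the example, since the task does not say where WCQS would be exposed; only the `Authorization: Bearer` header follows the usual OAuth 2 convention.

```python
import urllib.parse
import urllib.request

# Hypothetical gateway path; the task does not specify the real one.
ENDPOINT = "https://api.wikimedia.org/service/wcqs/sparql"


def build_sparql_request(query: str, access_token: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated SPARQL GET request."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={
            # OAuth 2 bearer token obtained by the bot beforehand.
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/sparql-results+json",
            "User-Agent": user_agent,
        },
    )


# EXAMPLE_TOKEN is a placeholder, not a real credential.
req = build_sparql_request(
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1", "EXAMPLE_TOKEN", "example-bot/0.1"
)
```

The request is only constructed here; a bot would pass it to `urllib.request.urlopen` once it holds a valid token.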

Event Timeline

Gehel triaged this task as Medium priority. Sep 6 2021, 12:47 PM
Gehel moved this task from Incoming to SDAW on the Wikidata-Query-Service board.

As an external/bot developer I want to be able to authenticate with WCQS via my application/bot so that it can use the service without human interaction.

I can’t speak for others, but as a tool developer, I don’t want to have to authenticate with WCQS. If you put an OAuth gate before the query service, I will just remove support for it from my tool entirely (as I already said in this commit message).

@LucasWerkmeister the current WCQS-beta behaviour is a bug: the SPARQL endpoint should be authenticated as well, not only the UI (we created a ticket for that, T290889). Production WCQS will start with the authentication already in place.

In general - I understand your sentiment. Using the service without any additional authentication is much easier for tool developers - taking WDQS as an example.

On the other hand, this has affected us greatly when maintaining that service: we don't have an effective way to block or limit users that cause or contribute to issues with the service stability. We made the decision to use authentication in WCQS so that we won't have the same limitation with another service we maintain. I think this will help us as a team do a better job of keeping WCQS running smoothly.

In the future, we plan to use an API gateway (this ticket), which will provide an easier way for non-interactive users to use the service. If you decide to drop support for WCQS now, I urge you to reconsider that in the future.

How are you planning to handle the use of federated queries? Afaik tool creators will just route the queries through Wikidata or some other endpoint that has no authentication. On the other hand, if you are blocking federated queries from Wikidata, for example, then the service is just bad.

In any case, with mandatory authentication you will basically limit all use cases where the client side queries data dynamically for the UI (say, like https://wikidocumentaries-demo.wmflabs.org ) to something where user approval is asked first.

On the other hand, this has affected us greatly when maintaining that service: we don't have an effective way to block or limit users that cause or contribute to issues with the service stability. We made the decision to use authentication in WCQS so that we won't have the same limitation with another service we maintain. I think this will help us as a team do a better job of keeping WCQS running smoothly.

I share the concerns of @LucasWerkmeister here - we've come so far to finally spend the time to model and add statements to Commons (huzzah!) and we finally have a usable query service for it (huzzah!) and then for the last mile, we're restricting access to it by instituting authentication? For a community that has "open by default" as an ethos, it feels like such an "own goal" misstep here.

I'm thinking about the number of tools, scripts, and utilities that utilize SPARQL queries via Wikidata/WDQS that have given us tremendous capabilities... and the same approach or set of activities cannot be realized for WCQS because of this constraint. We should not underestimate the headache of having to implement OAuth2 for each and every SPARQL query. I'm also puzzled how a service that has not even launched yet has to be this closed when none of our other APIs and services have started this way.

Having helped with investigating a WDQS outage caused by a *single* user once, I have a lot of sympathy for why we'd want authentication, but I worry that putting auth walls up for any access at all is a bad step (we've long had similar discussions about this for the MediaWiki API too). I think it would be helpful to have pointers to earlier conversations where less drastic measures like increased rate limits, or a split of unauthenticated vs authenticated traffic were considered and why they were deemed unworkable. And what resources are needed to offer this. I read the announcement, and personally I would take a reduced SLO + no auth required over having better uptime with authentication required.
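
To make the "split of unauthenticated vs authenticated traffic" idea concrete, here is a minimal sketch of per-tier rate limiting with token buckets. It is not how WDQS, WCQS, or the API gateway actually enforce limits; the tier names and the numbers in `LIMITS` are invented purely for illustration.

```python
class TokenBucket:
    """Simple token bucket; time is passed in explicitly to keep it deterministic."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Illustrative numbers only: (burst capacity, tokens refilled per second).
LIMITS = {"anonymous": (2, 0.1), "authenticated": (10, 1.0)}


class TieredLimiter:
    """Keeps one bucket per (tier, client), with looser limits for authenticated clients."""

    def __init__(self):
        self.buckets = {}

    def allow(self, client_id: str, authenticated: bool, now: float) -> bool:
        tier = "authenticated" if authenticated else "anonymous"
        key = (tier, client_id)
        if key not in self.buckets:
            capacity, rate = LIMITS[tier]
            self.buckets[key] = TokenBucket(capacity, rate, now)
        return self.buckets[key].allow(now)
```

With a scheme along these lines, anonymous clients (keyed by IP address, say) would still get through without logging in, only with a smaller budget than authenticated ones.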

In any case, with mandatory authentication you will basically limit all use cases where the client side queries data dynamically for the UI (say, like https://wikidocumentaries-demo.wmflabs.org ) to something where user approval is asked first.

I see no reason why a tool couldn't use a tool-specific account for queries like that.

Hi,

I see no reason why a tool couldn't use a tool-specific account for queries like that.

Because it prevents creating client-only solutions, and requests would need to be routed via a proxy that does the authentication. This would increase overall complexity. The same kind of situation would arise with tools like wikishootme, which currently queries information directly from WDQS on the client side.

Anyway, like Legoktm, I would also take a reduced SLO + no auth required over having better uptime with authentication required.

Br,
Kimmo Virtanen, Zache

Because it prevents creating client-only solutions, and requests would need to be routed via a proxy that does the authentication. This would increase overall complexity.

+1. We're in the really fortunate position of being one of the very few large websites with an API that is accessible without authentication. It's really beneficial when explaining concepts of APIs and knowledge graphs to students that they don't need to jump through hoops to understand authentication and other things before doing a simple HTTP GET call. It's in our mission that we want to share the sum of human knowledge. It doesn't say anywhere that we should make that as easy as possible, but I think we should. Putting up an authentication layer makes it harder for people to access our knowledge.

Agreed w/ many above: as a user I wouldn't want to authenticate most of the time either.
On a new device, on a public machine, on the go, testing something out, etc. Echoing Lego: "I would prefer a reduced SLO + no auth required over better uptime + auth"

Why were separate service channels deemed unworkable in the past? An optional "higher SLO + higher red-tape service channel" seems to make sense. Even in a total authocracy, you could automatically generate a new account for people who haven't logged in / can't log in / don't have an account; this can be invisible to them.

I share the opinion of @Multichill, @LucasWerkmeister and others: I understand the rationale for authentication, and I think I can live with it as a user; but as a tool developer, I don’t want to have to implement OAuth2 in my tools (such as Tool-inteGraality) − I was planning to add WCQS to inteGraality (T294893) but frankly I’m unlikely to do so if I have to throw in OAuth on top.

So a first step in making this acceptable could be an authentication mechanism that is already transparently set up for Toolforge accounts (that might be covered by the API gateway plans mentioned by @Zbyszko?) − for example, credentials already available on disk, as for ToolsDB, for some straightforward mechanism (OAuth token, basic auth for all I care). I assume SUL is planned as the auth provider − having to register a Wikimedia username for every tool would also be an unnecessary hassle (some tools may have a companion bot account, as integraality does; many won’t) − so tool-users would need to be recognized somehow as well.
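
As a rough sketch of what "credentials already available on disk" could mean for a tool: the snippet below parses an INI-style credentials file and turns it into a request header. The file layout, the `[client]` section, and the `access_token` key are all hypothetical, loosely modeled on the `replica.my.cnf` file that Toolforge provides for database access; no such file exists for WCQS as far as this task says.

```python
import configparser


def header_from_credentials(text: str) -> dict:
    """Turn the contents of a (hypothetical) credentials file into an auth header."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Section and key names are assumptions made for this example.
    token = parser["client"]["access_token"]
    return {"Authorization": f"Bearer {token}"}


# Placeholder contents; a tool would read this from a file in its home directory.
EXAMPLE_FILE = """
[client]
access_token = EXAMPLE_TOKEN
"""

headers = header_from_credentials(EXAMPLE_FILE)
```

The point of such a mechanism would be that the tool never goes through an interactive OAuth flow; it just reads what the platform has already provisioned.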

This of course does not solve the issue for non-Toolforge tools, or perhaps more crucially for user scripts (examples of SPARQL-querying scripts that come to mind are IdentifierInput and ExMusica.js − there must be plenty of others). What’s the plan for such scripts − having their credentials in plain text in the JS? Some proxy?

Merging this ticket into T313813, where we are tracking the more general work of using the API Gateway for WDQS and WCQS. The gateway is designed specifically as a platform tool for managing various WMF services. This includes, among other things, specific handling of authentication, which should be more robust and better documented than if the Search team were to continue working on it independently.