Serve WCQS Sparql endpoint through api.wikimedia.org with OAuth 2
Closed, Duplicate (Public)

Assigned To

None

Authored By

	• Zbyszko
	Sep 3 2021, 7:33 AM

Description

As an external/bot developer I want to be able to authenticate with WCQS via my application/bot so that it can use the service without human interaction.

Beta WCQS is currently set up essentially as a third-party auth app, which is needed to verify registered users. We want to be able to use project-level authentication with WCQS so that bots can interact with it the same way they do with other Wikimedia projects.
After investigation, it was clear that the best way to do that would be to expose the SPARQL endpoint for WCQS through api.wikimedia.org, which would provide us with the OAuth 2 flow and rate limiting (even based on logged-in/not-logged-in status).

AC:

OAuth bot authentication is available and documented
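
For illustration, a bot-side request under the proposed setup might look roughly like the sketch below. It rests on assumptions: the gateway URL is a placeholder (this task does not specify the final path under api.wikimedia.org), the User-Agent string is made up, and the access token is assumed to have been obtained separately through an OAuth 2 flow. The function only builds the request; it does not send it.

```python
import urllib.parse
import urllib.request

# Placeholder only: the task does not define the real path under api.wikimedia.org.
HYPOTHETICAL_ENDPOINT = "https://api.wikimedia.org/EXAMPLE/wcqs/sparql"


def build_sparql_request(query, access_token, endpoint=HYPOTHETICAL_ENDPOINT):
    """Build (but do not send) a SPARQL POST request carrying an OAuth 2 bearer token."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            # Bearer token usage as defined for OAuth 2; the token value is assumed.
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
            # Hypothetical identifier for the bot.
            "User-Agent": "example-bot/0.1 (hypothetical contact)",
        },
    )


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1", "EXAMPLE_TOKEN")
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out here because the endpoint above is not real.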

Related Objects

Status	Assigned	Task
Resolved	Ladsgroup	T271851 Clean up gui from the wdqs deploy repo and puppet
Resolved	None	T260568 [EPIC] Productionize WCQS
Resolved	EBernhardson	T280006 Set up the application authentication for WCQS on commons-query.wikimedia.org
Duplicate	None	T290300 Serve WCQS Sparql endpoint through api.wikimedia.org with OAuth 2

Event Timeline

• Zbyszko created this task. Sep 3 2021, 7:33 AM

Gehel triaged this task as Medium priority. Sep 6 2021, 12:47 PM

Gehel moved this task from Incoming to SDAW on the Wikidata-Query-Service board.

As an external/bot developer I want to be able to authenticate with WCQS via my application/bot so that it can use the service without human interaction.

I can’t speak for others, but as a tool developer, I don’t want to have to authenticate with WCQS. If you put an OAuth gate before the query service, I will just remove support for it from my tool entirely (as I already said in this commit message).

@LucasWerkmeister the current WCQS beta behaviour is a bug: the SPARQL endpoint should be authenticated as well, not only the UI (we created a ticket for that, T290889). Production WCQS will start with the authentication already in place.

In general - I understand your sentiment. Using the service without any additional authentication is much easier for tool developers - taking WDQS as an example.

On the other hand, we have been affected by this greatly when maintaining that service: we don't have an effective way to block or limit users who cause or contribute to issues with the service's stability. We made the decision to use authentication in WCQS so that we won't have the same limitation with another service we maintain. I think this will help us as a team do a better job of keeping WCQS running smoothly.

In the future, we plan to use an API gateway (this ticket) that will provide an easier way for non-interactive users to use the service. If you decide to drop support for WCQS now, I urge you to reconsider that in the future.

How are you planning to handle federated queries? AFAIK, tool creators will just route their queries through Wikidata or some other endpoint that does not require authentication. On the other hand, if you block federated queries from Wikidata, for example, then the service is just bad.

In any case with mandatory authentication you will basically limit all use cases where the client side would query data dynamically for UI (say like https://wikidocumentaries-demo.wmflabs.org ) to something where user approval is asked first.

• MPhamWMF removed a project: Discovery-Search (Current work). Sep 20 2021, 3:27 PM

• Zbyszko mentioned this in T290299: Replace token store in MW OAuth WCQS proxy with JWT. Oct 1 2021, 6:02 PM

Fuzheado subscribed. Dec 16 2021, 5:38 PM

In T290300#7349048, @Zbyszko wrote:

On the other hand, we have been affected by this greatly when maintaining that service: we don't have an effective way to block or limit users who cause or contribute to issues with the service's stability. We made the decision to use authentication in WCQS so that we won't have the same limitation with another service we maintain. I think this will help us as a team do a better job of keeping WCQS running smoothly.

I share the concerns of @LucasWerkmeister here - we've come so far to finally spend the time to model and add statements to Commons (huzzah!) and we finally have a usable query service for it (huzzah!) and then for the last mile, we're restricting access to it by instituting authentication? For a community that has "open by default" as an ethos, it feels like such an "own goal" misstep here.

I'm thinking about the number of tools, scripts, and utilities that use SPARQL queries via Wikidata/WDQS and have given us tremendous capabilities... and the same approach or set of activities cannot be realized for WCQS because of this constraint. We should not underestimate the headache of having to implement OAuth2 for each and every SPARQL query. I'm also puzzled why a service that has not even launched yet has to be this closed when none of our other APIs and services started this way.

Having helped with investigating a WDQS outage caused by a *single* user once, I have a lot of sympathy for why we'd want authentication, but I worry that putting auth walls up for any access at all is a bad step (we've long had similar discussions about this for the MediaWiki API too). I think it would be helpful to have pointers to earlier conversations where less drastic measures like increased rate limits, or a split of unauthenticated vs authenticated traffic were considered and why they were deemed unworkable. And what resources are needed to offer this. I read the announcement, and personally I would take a reduced SLO + no auth required over having better uptime with authentication required.
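
To make the "split of unauthenticated vs authenticated traffic" option concrete, here is a minimal sketch of per-tier rate limiting with token buckets. It is not how WDQS, WCQS, or the API Gateway implement limits; the class names and the numeric limits are invented for illustration, and the clock is injectable only so the behaviour can be checked deterministically.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class TieredLimiter:
    """Keep one bucket per client, with looser limits for authenticated clients."""

    # Illustrative numbers only (refill per second, burst size); not WMF settings.
    LIMITS = {"anonymous": (1.0, 5), "authenticated": (5.0, 30)}

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id, authenticated):
        tier = "authenticated" if authenticated else "anonymous"
        key = (tier, client_id)
        if key not in self.buckets:
            rate, capacity = self.LIMITS[tier]
            self.buckets[key] = TokenBucket(rate, capacity, self.clock)
        return self.buckets[key].allow()
```

In this sketch anonymous clients would be keyed by something like an IP address and authenticated clients by account name, so unauthenticated access stays possible but with a smaller budget.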

In T290300#7354239, @Zache wrote:

In any case with mandatory authentication you will basically limit all use cases where the client side would query data dynamically for UI (say like https://wikidocumentaries-demo.wmflabs.org ) to something where user approval is asked first.

I see no reason why a tool couldn't use a tool-specific account for queries like that.

Hi,

I see no reason why a tool couldn't use a tool-specific account for queries like that.

Because it prevents client-only solutions: requests would need to be
routed via a proxy that does the authentication, which would increase
overall complexity. The same situation would arise with tools like
wikishootme, which currently queries information from WQS directly on
the client side.

Anyway, like Legokm, I would also take a reduced SLO + no auth required
over having better uptime with authentication required.

Br,

Kimmo Virtanen, Zache

GFontenelle_WMF added a subscriber: FRomeo_WMF. Dec 16 2021, 8:18 PM

GFontenelle_WMF subscribed.

Abbe98 subscribed. Dec 17 2021, 8:36 AM

Because it prevents creating client only solutions and requests would need to be routed via proxy which would do the authentication. This would increase overall complexity.

+1. We're in the really fortunate position of being one of the very few large websites with an API that is accessible without authentication. It's really beneficial when explaining the concepts of APIs and knowledge graphs to students that they don't need to jump through hoops to understand authentication and other things before doing a simple HTTP GET call. It's in our mission that we want to share the sum of human knowledge. It doesn't say anywhere that we should make that as easy as possible, but I think we should. Putting up an authentication layer makes it harder for people to access our knowledge.

LWyatt subscribed. Dec 17 2021, 4:31 PM

Multichill awarded a token. Dec 18 2021, 4:49 PM

No, as a tool developer I don't want to authenticate, see https://commons.wikimedia.org/wiki/Commons_talk:SPARQL_query_service/Upcoming_General_Availability_release#Mandatory_authentication_considered_harmful . Filed T297995 to remove it.

Agreed w/ many above, as a user I wouldn't want to authenticate most of the time either.
On a new device, on a public machine, on the go, testing something out, &c &c. Echoing Lego: "I would prefer a reduced SLO + no auth required over better uptime + auth"

Why were separate service channels deemed unworkable in the past? An optional "higher SLO + higher red-tape service channel" seems to make sense. Even in a total authocracy, you could automatically generate a new account for people who haven't logged in / can't log in / don't have an account; this can be invisible to them.

Izno subscribed. Dec 19 2021, 7:15 AM

There has been a response on Commons to this from the WMF: https://commons.wikimedia.org/wiki/Commons_talk:SPARQL_query_service/Upcoming_General_Availability_release#Follow_up_response_to_WCQS_Authentication

I share the opinion of @Multichill, @LucasWerkmeister and others: I understand the rationale for authentication, and I think I can live with it as a user; but as a tool developer, I don’t want to have to implement OAuth2 in my tools (such as Tool-inteGraality). I was planning to add WCQS to inteGraality (T294893), but frankly I’m unlikely to do so if I have to throw in OAuth on top.

So a first step in making this acceptable could be to have an authentication mechanism that’s already transparently set up for Toolforge accounts (that might be covered by the API gateway plans mentioned by @Zbyszko?), for example credentials already available on disk, as with ToolsDB, for some straightforward mechanism (OAuth token, basic auth for all I care). I assume SUL is planned as the auth provider; having to register a Wikimedia username for every tool would also be an unnecessary hassle (some tools may have a companion bot account, as integraality does; many won’t), so tool-users would then need to be somehow recognized as well.
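
As a rough sketch of what "credentials already available on disk" could mean for a tool, the snippet below reads a token from an INI-style file, by analogy with the file Toolforge provides for ToolsDB credentials. Everything specific here is an assumption made for illustration: no such WCQS credentials file exists as far as this thread shows, and the section name `wcqs` and key `access_token` are invented.

```python
import configparser
from pathlib import Path


def load_tool_token(path):
    """Read a bearer token from an INI-style credentials file."""
    parser = configparser.ConfigParser()
    with Path(path).open(encoding="utf-8") as handle:
        parser.read_file(handle)
    # Hypothetical section and key names, chosen for this sketch only.
    return parser["wcqs"]["access_token"].strip()


def auth_headers(path):
    """Return the HTTP header a tool would attach to each SPARQL request."""
    return {"Authorization": f"Bearer {load_tool_token(path)}"}
```

A tool would call `auth_headers()` once with the path to its credentials file and reuse the result for its requests, with no interactive OAuth step.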

This of course does not solve the issue for non-Toolforge tools, or perhaps more crucially for user scripts (examples of SPARQL-querying scripts that come to mind are IdentifierInput and ExMusica.js; there must be plenty of others). What’s the plan for such scripts: having their credentials in plain text in the JS? Some proxy?

Spinster subscribed. Dec 22 2021, 11:59 AM

Sannita subscribed. Jan 3 2022, 3:57 PM

• sdkim mentioned this in T299649: Limit problematic queries for WCQS. Jan 20 2022, 3:24 PM

Dominicbm mentioned this in T307391: Enable CORS support for WCQS SPARQL endpoint access. May 2 2022, 7:06 PM

Gehel mentioned this in T313813: API Gateway to provide authorization and capacity management for W[CD]QS. Jul 26 2022, 2:49 PM

Merging this ticket into T313813, where we are tracking the more general work of using the API Gateway, which is designed specifically as a platform tool to manage various WMF services, for WDQS and WCQS. This includes, among other things, specific handling of authentication that should be more robust and better documented than if the Search team were to continue working on it independently.

• MPhamWMF closed this task as a duplicate of T313813: API Gateway to provide authorization and capacity management for W[CD]QS. Jul 27 2022, 12:38 PM
