Page MenuHomePhabricator

wdqs-proxy: write protection (reject SPARQL update queries)
Closed, ResolvedPublic2 Estimated Story Points

Description

wdqs-proxy needs the capability to control the semantic of incoming queries and block
UPDATEs.

In the current architecture, this is handled by nginx assigning a specific header that instructs Blazegraph to reject SPARQL UPDATE queries. In v2, we can centralize this logic in the passthrough layer.

This would make the configuration easier for us to reason about (it's closer to query inspection, and federation allow list management) and simplify the setup for third parties (e.g., Wikibase Cloud and Suite).

AC

  • wdqs-proxy has the capability to reject UPDATE queries. (Edit: changed our minds about this part of the original AC: "This should be default behaviour, configurable ")

Details

Related Changes in GitLab:
TitleReferenceAuthorSource BranchDest Branch
Add tests to document preexisting behavior for SPARQL updates in wdqs-proxyrepos/wikidata-platform/wdqs/wdqs-proxy!33lericksonwrite-protectionmain
Customize query in GitLab

Event Timeline

trueg renamed this task from wdqs-proxy: write protection to wdqs-proxy: write protection (reject SPARQL update queries).Apr 30 2026, 3:59 PM
lerickson set the point value for this task to 2.May 6 2026, 2:21 PM
lerickson changed the task status from Open to In Progress.May 11 2026, 6:36 PM

lerickson merged https://gitlab.wikimedia.org/repos/wikidata-platform/wdqs/wdqs-proxy/-/merge_requests/33

Add tests to document preexisting behavior for SPARQL updates in wdqs-proxy

It turns out that the proxy was already rejecting updates, because it created and executed a Query object and in Jena, a Query is used to represent the read-only operations (see Query javadoc with the different QueryTypes). UpdateRequest is used to hold a request to insert/delete and other non-read-only actions.

Right now passing an update string throws a QueryException, which is re-thrown as QueryParsingException. QueryRewriterService.rewriteQuery() catches QueryParsingException and converts it to a BadRequestException (HTTP 400).

This is fine for use because 4xx requests do not count towards our error budget, but it's a bit obscure IMHO. How about we handle the update esplicitely and return a 405 (method not allowed), signaling that the endpoint is read-only?

Not tested, but how about something like this?

private Query parseQuery(String queryString) throws QueryParsingException, UpdateOperationException {
    String withPrefixes = SparqlTools.defaultPrefixes() + queryString;                                                                                                 
    try {                                                                                                                                                              
        return QueryFactory.create(withPrefixes);
    } catch (QueryException ex) {                                                                                                                                      
        try {                                                                                                                                                          
            UpdateFactory.create(withPrefixes);
            throw new UpdateOperationException("SPARQL update operations are not permitted");                                                                          
        } catch (QueryException ignored) {
            throw new QueryParsingException(ex.getLocalizedMessage(), ex);                                                                                             
        }
    }                                                                                                                                                                  
}

and then we handle the UpdateOperationException explicitely in rewriter error code path

public String rewriteQuery(String query) {
    try {                                                                                                                                                              
        return queryRewriter.rewriteQuery(query).toString();
    } catch (UpdateOperationException e) {                                                                                                                             
        throw new WebApplicationException(
            Response.status(Response.Status.METHOD_NOT_ALLOWED)                                                                                                        
                    .header("Allow", "GET, POST")
                    .entity("SPARQL update operations are not permitted")                                                                                              
                    .build());
    } catch (QueryParsingException ex) {                                                                                                                               
        throw new BadRequestException(ex);
    } catch (FederationException e) {                                                                                                                                  
        throw new BadRequestException(e);                                                                                                                              
    } catch (QueryRewriterException e) {
        throw new ServerErrorException(Response.Status.INTERNAL_SERVER_ERROR, e);                                                                                      
    }                                                                                                                                                                  
}

This should not introduce overhead on the hot path. For a valid SELECT/ASK/CONSTRUCT/DESCRIBE query, QueryFactory.create() succeeds and the method returns.

We pay some extra cost only when QueryFactory has already thrown, which means the request was going to result in an error response anyway. In this path the latenc
of the response doesn't matter.

Thank you! I ended up doing something very similar. I like your idea of passing along the cause with an UpdateOperationException and then returning a better HTTP response. I was originally planning to update the response but had 403 in mind, but this one (405) is better.

The MR I have in progress now also rearranges things a bit to preserve the ability to pass the updates along through the rewriter (unchanged) if the config is set up to allow it (that feature will come in a follow-up MR).

lerickson opened https://gitlab.wikimedia.org/repos/wikidata-platform/wdqs/wdqs-proxy/-/merge_requests/35

Draft: Remove "hint" from the default prefixes list, and add an explicit test for hint removal.

lerickson closed https://gitlab.wikimedia.org/repos/wikidata-platform/wdqs/wdqs-proxy/-/merge_requests/35

Remove "hint" from the default prefixes list, and add an explicit test for hint removal.

Thank you! I ended up doing something very similar. I like your idea of passing along the cause with an UpdateOperationException and then returning a better HTTP response. I was originally planning to update the response but had 403 in mind, but this one (405) is better.

On second thought. 405 came to mind because it's what Fuskei does. However, Fuseki will return a 405 when a sparql update query is sent via a GET (sparql updates are only allowed over POST) to signify the GET http method invalid.

Returning 405 on both POST and GET is mixing sparql and http semantics. Maybe 403, without and allow header in response, was a better choice here?

I think 405 makes more sense. A 403 suggests that there is a way to get permission, which is not the case, writes are simply not possible via this endpoint.

Actually, I was to quick in answering, while a 403 does not really fit, a 405 does not, either (because the HTTP method is not the problem here, and we cannot reply with a proper Allow header). I now think we should actually stick to a 400 and produce a proper error reply as suggested in T424218.

Good point @trueg that the HTTP method isn't the issue, so 405 isn't as close as I thought it was. I think I'm OK with using 400 given all this context.

lerickson merged https://gitlab.wikimedia.org/repos/wikidata-platform/wdqs/wdqs-proxy/-/merge_requests/34

Add UpdateOperationException to convey that an illegal update operation was attempted.

The proxy now rejects updates (well, it actually already did) and diagnoses them as being disallowed writes. Right now it just returns a plain old 400 Bad Request, but later on in T424218 we will add better info. This one is done, though.