
Increase the max length of URL to be shortened
Open, Needs Triage, Public

Description

Currently the maximum size for the URL to be shortened on the URL shortener is 2,000 characters. This is not enough for some Wikidata queries that may contain up to 4000 characters. My suggestion is to increase the limit to 10K and see if there are some use cases of longer URLs.

(example of query)

Event Timeline


@Habitator_terrae This is why the Meta stewards have the ability to delete a link if it presents security or personal data protection issues, for example.

This would help solve another problem - at the moment the existing third-party shortener embedded in the query service returns a 431 error for queries which are longer than ~3000 characters.

(It's usually possible to cut a query to get it under this limit but it means that sometimes there's an incentive to remove whitespace, comments, etc, which can be counterproductive)

Is there any absolute length on what the query service will handle via the URL? If so, we should perhaps use that to define the upper limit on what's acceptable for the URL shortener.

Is there any absolute length on what the query service will handle via the URL? If so, we should perhaps use that to define the upper limit on what's acceptable for the URL shortener.

I don’t think so, though eventually it has to switch how it submits the query to the backend (T216590).

@Habitator_terrae This is why the Meta stewards have the ability to delete a link if it presents security or personal data protection issues, for example.

If the link length limit grew to 4000, links like w.wiki/uR (a file in Base64 code between []) would become more effective. So the shortener could also be misused for sharing other content that has nothing to do with Wikimedia. This is very problematic, because the short links get longer as the number of short links grows. And I don't see how the stewards could fix this alone.

This is not enough for some Wikidata queries that may contain up to 4000 characters. My suggestion is to increase the limit to 10K and see if there are some use cases of longer URLs.

After a certain point we're just creating an arbitrary text store, with little oversight and visibility. I don't really want to deal with the abuse potential of 10k characters.

I don't really think gigantic URLs are the best way to share SPARQL queries tbh, but that's a bit off topic for this task.

I don't really think gigantic URLs are the best way to share SPARQL queries

In case anyone is wondering, a work-around is to put the query on a wiki page and use a short URL for that page; like this:

https://w.wiki/vj

It should also be noted that the max url length for IE/Edge for instance is still 2047 characters even to this day.

I don't really think gigantic URLs are the best way to share SPARQL queries

In case anyone is wondering, a work-around is to put the query on a wiki page and use a short URL for that page; like this:

https://w.wiki/vj

This is a good workaround for a link to the query, but we cannot make a link to the results page this way.

The other workaround, of course, is just to copy & paste the URL into tinyurl, which seems able to take at least 4500 characters -- but that is something of a step back to square 1.

+ 1 on increasing length. Indeed long wikidata queries are probably the single most valuable use-case for the w.wiki url shortener.

The usefulness of the query system increases massively if links are easily shared in twitter posts, news articles, academic articles (Per comments at T224259).

Is the abuse potential significantly raised when character limit is increased from 2k->10k? I'd have thought the main danger point was going from 200->2k.

If the link length limit grew to 4000, links like w.wiki/uR (a file in Base64 code between []) would become more effective. So the shortener could also be misused for sharing other content that has nothing to do with Wikimedia. This is very problematic, because the short links get longer as the number of short links grows. And I don't see how the stewards could fix this alone.

Maybe I'm being very naive, but I'm not sure how much of an issue this would really be. The sizes here are going to be a few kb at most - it's not like people will be able to code 300mb into a URL and abuse it for large-scale filesharing.

An alternative way to prevent this kind of abuse would be to add a filter to check that the "shortened link" is actually valid - w.wiki/uR gives a "not found" error so could be filtered out that way.

I'm going to mark this as declined, as we have no plans to raise the length limit for now. You can use wiki pages or some other mechanism to share insanely long URLs, this isn't a pastebin service.

A core point of this service was to be able to share long queries. (I'm guessing WDQS now accounts for the majority of organically requested shortenings?)

But the service doesn't let us share long queries. So we now have less functionality than we did before, now we don't even have a link from WDQS to the tinyurl shortener any more.

The shortener was supposed to be a way to avoid blocking up the wikitext of talk pages, emails, etc with vast long URLs.

"Declined" is not acceptable.

You can use wiki pages or some other mechanism to share […] long URLs

You can do this for URLs of any length – by that argument, you don’t need a URL shortener at all. The whole idea is that it makes sharing URLs more convenient, and for the Wikidata Query Service, that’s how queries have always been shared. There’s enough comments on this task already to testify that there is a real user need for supporting longer URLs.

What if we add a second rate limit to the URL shortener, to limit the number of bytes you shorten, in addition to the current limit on the number of URLs? (User::pingLimiter() has an $incrBy parameter, so technically, this should be possible.) The current limits are 10 (IP, newbie) and 50 (user) queries per 120 seconds, and the current average URL length is 366.3951 bytes, so an additional limit of 4000 / 20000 bytes per 120 seconds would be unlikely to affect other users, but allow users of the Query Service to still share URLs to long queries (at a lower rate).
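
A minimal sketch of the byte-based second limit suggested above, assuming an in-memory sliding window. The class `ByteBudgetLimiter` and its `allow` method are hypothetical names for illustration, and this is not how `User::pingLimiter()` works internally; only the numbers (10 URLs and 4000 bytes per 120 seconds for IPs and new users) come from the comment.

```python
import time
from collections import defaultdict, deque


class ByteBudgetLimiter:
    """Hypothetical sliding-window limiter counting both URLs and bytes per client."""

    def __init__(self, max_urls, max_bytes, window_seconds=120, clock=time.monotonic):
        self.max_urls = max_urls
        self.max_bytes = max_bytes
        self.window = window_seconds
        self.clock = clock
        # client key -> deque of (timestamp, byte_count) for accepted requests
        self.history = defaultdict(deque)

    def allow(self, client, url):
        """Return True and record the request if it fits both limits."""
        now = self.clock()
        size = len(url.encode("utf-8"))
        events = self.history[client]
        # Drop events that have fallen out of the window.
        while events and now - events[0][0] >= self.window:
            events.popleft()
        used_bytes = sum(n for _, n in events)
        if len(events) >= self.max_urls or used_bytes + size > self.max_bytes:
            return False
        events.append((now, size))
        return True


# Limits for IPs and new users as proposed in the comment above.
anon_limiter = ByteBudgetLimiter(max_urls=10, max_bytes=4000)
```

With these numbers an anonymous user could still shorten ten URLs of average length per window, or a single URL of up to 4000 bytes, but not ten long ones.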

You can use wiki pages or some other mechanism to share […] long URLs

You can do this for URLs of any length – by that argument, you don’t need a URL shortener at all. The whole idea is that it makes sharing URLs more convenient, and for the Wikidata Query Service, that’s how queries have always been shared. There’s enough comments on this task already to testify that there is a real user need for supporting longer URLs.

I'm not disputing the use case, I'm just saying that UrlShortener can't meet them. Wiki pages have significant anti-abuse measures from logging (RecentChanges, IRC, etc.) to filters (AbuseFilter, SpamBlacklist) and so on. UrlShortener (for good reasons) has none of that.

In any case, I think we're going up the wrong avenue here. As I alluded to earlier, why are we using URLs to store such giant blobs of text? It's not editable, collaborate-able, etc. Any refinement needs a brand new URL that needs to be re-distributed to everyone. I think it would be more valuable for such large queries that are intended to be shared (and not throwaways) to have a one click button or something on WDQS that automatically creates a wiki page for the query, and dumps the current results in a table or similar format. And then a convenient way to edit the query/re-generate the results.

Another option would be to have differential rate limits - 2k in general, longer links only allowed for query.wikidata.org URLs? (suggestion from Stefano)
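
One way to picture the per-host limit, as a rough sketch only: the function names and the 5000 value are assumptions for illustration (the thread floats both 5k and 10k), and the 2000 default is the current limit from the task description.

```python
from urllib.parse import urlsplit

DEFAULT_LIMIT = 2000   # current general limit, per the task description
LONG_LIMIT = 5000      # assumed higher limit; not a decided value
LONG_URL_HOSTS = {"query.wikidata.org"}


def max_length_for(url):
    """Return the length limit that would apply to this URL."""
    # Compare the exact hostname so that look-alike domains such as
    # "query.wikidata.org.evil.example" do not get the higher limit.
    host = urlsplit(url).hostname or ""
    return LONG_LIMIT if host in LONG_URL_HOSTS else DEFAULT_LIMIT


def is_short_enough(url):
    """Check the URL against the limit for its host."""
    return len(url) <= max_length_for(url)
```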

Since the main use case for shortening long urls within wikimedia is 99% for query.wikidata.org, could that page instead include an option for the query to be piped straight to the url shortener (or compress its url)?

One of the provided example queries that query.wikidata.org provides is:

https://query.wikidata.org/#%23Get%20known%20variants%20reported%20in%20CIViC%20database%20%28Q27612411%29%20of%20genes%20reported%20in%20a%20Wikipathways%20pathway%3A%20Bladder%20Cancer%20%28Q30230812%29%0ASELECT%20DISTINCT%20%3Fpathway%20%3FpathwayLabel%20%3Fpwpart%20%3FpwpartLabel%20%3Fvariant%20%3FvariantLabel%20%3Fdisease%3FdiseaseLabel%20WHERE%20%7B%0A%0A%20%20%20VALUES%20%3Fpredictor%20%7Bp%3AP3354%20p%3AP3355%20p%3AP3356%20p%3AP3357%20p%3AP3358%20p%3AP3359%7D%0A%20%20%20VALUES%20%3FpredictorQualifier%20%7Bpq%3AP2175%7D%0A%20%20%20VALUES%20%3FwpID%20%7B%22WP2828%22%7D%0A%20%20%0A%20%20%20%3Fpathway%20wdt%3AP2410%20%3FwpID%20%3B%20%23%20Pathways%20has%20a%20Wikipathways%20identifier%0A%20%20%20%20%20%20%20%20%20%20wdt%3AP527%20%3Fpwpart%20.%20%23%20which%20contains%20pathways%20parts%0A%20%20%0A%20%20%20%3Fdisease%20wdt%3AP279%2B%20wd%3AQ504775%20.%20%20%23%20The%20disease%20is%20a%20subclass%20of%20urinary%20bladder%20cancer%20%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%23%20based%20on%20annotations%20in%20the%20Disease%20ontology%0A%20%20%20%3Fvariant%20wdt%3AP3329%20%3FcivicID%20%3B%20%23%20a%20variant%20known%20in%20CIViC%20%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%3Fpredictor%20%3Fnode%20%3B%20%23%20has%20a%20predicting%20relation%20with%20diseases%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%23%20labeled%20as%20being%20a%20subclass%20of%20urinary%20bladder%20cancer%0A%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP3433%20%3Fpwpart%20.%20%20%23%20variant%20is%20biological%20variant%20of%0A%20%20%20%0A%20%20%20%7B%3Fnode%20%3FpredictorStatement%20%3Fdrug_label%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3FpredictorQualifier%20%3Fdisease%20.%7D%0A%20%20%20UNION%20%0A%20%20%20%7B%0A%20%20%20%20%20%20%3Fnode%20%3FpredictorStatement%20%3Fdisease%20%20.%0A%20%20%20%7D%0A%20%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D

An interim solution could be to automatically run everything >2000 characters through goo.gl or bit.ly first, then run that url through w.wiki so that they do the screening for us but we still get to have the w.wiki url?

It should also be noted that the max url length for IE/Edge for instance is still 2047 characters even to this day.

I don't think that's true anymore.

  • For ResourceLoader we have bumped the limit to 5,000 - citing that IE9 supports up to 5,000. The limit of 2K applied to IE6-8 only. I'm assuming this means IE11 and Edge support 5K as well.
  • For EventLogging we considered the same (T208282), but that was declined given the pending migration to EventGate POST bodies.

Ladsgroup added a subscriber: Ladsgroup.

I agree that we should increase the size limit just a bit. At least make it on par with URL shortener services like bitly. It can be abused, but I think having a smaller rate limit would be good anyway.

I do agree with @Legoktm that it would be nice to have a service where one could record and browse one's own and other people's recent queries -- and perhaps tag them into particular categories of interest.

T104762 "Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry)" has been open for this since July 2015. But no steps have been made towards it.

In the meantime, it would be useful now to be able to shorten longer queries. As others have noted above, this is the key use-case that the URL shortener was instituted for. It is unfortunate that the service is not fulfilling it.

Is there a problem with increasing the size limit? It /seems/ like a very easy quick win. What's the story?

Given the community requests here, I took the liberty to reopen it. Talking to DBAs a while back, there's not much storage issues with url shortener and also as long as the blob is below 4GB (IIRC talking to Jaime) it should be fine. So I suggest increasing it to 5K for now (similar to ResourceLoader).

Change 617843 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Increase the url shortener url size limit from 2k to 5k

https://gerrit.wikimedia.org/r/617843

Given the community requests here, I took the liberty to reopen it. Talking to DBAs a while back, there's not much storage issues with url shortener and also as long as the blob is below 4GB (IIRC talking to Jaime) it should be fine. So I suggest increasing it to 5K for now (similar to ResourceLoader).

The issue is not, and has never been database size nor technical limitations. It's about the lack of anti-abuse measures and monitoring. Increasing the URL size means we're creating a place for anyone to just dump giant blobs of text that we now host/distribute. There are definitely legitimate cases for this, but unless we also have anti-abuse measures in place, I don't think we should be doing this (like I said earlier T220703#5383773...).

Ultimately, UrlShortener is a url shortener, not a pastebin.

I had to use TinyURL for several queries recently because the query service's own URL shortening failed. The two I can find again had URLs 3170 and 2521 characters long so 5k seems like a good limit for my queries.

What about tying the higher limit to a validation that the URL payload is indeed a SPARQL query? If you're really worried about abuse, you could even strip all comments and validate that the remainder is a syntactically valid SPARQL query. That would solve the main pain point of not being able to share long queries, yet not open up a pastebin for arbitrary large amounts of text.
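
A rough sketch of the first half of that idea: pull the query out of a query.wikidata.org URL fragment, strip comments, and apply a crude plausibility check. All function names here are hypothetical, and the keyword check is only a heuristic that a determined abuser could get past. Validating that the remainder is syntactically valid SPARQL would need a real SPARQL parser, which this sketch does not include.

```python
import re
from urllib.parse import unquote, urlsplit

# Strings and IRIs are matched first so that a "#" inside them is not
# mistaken for the start of a comment.
_LONG_DQ = r'"""(?:\\.|[^\\])*?"""'
_LONG_SQ = r"'''(?:\\.|[^\\])*?'''"
_SHORT_DQ = r'"(?:\\.|[^"\\\n])*"'
_SHORT_SQ = r"'(?:\\.|[^'\\\n])*'"
_IRI = r'<[^<>"{}|^`\\\x00-\x20]*>'
_COMMENT = r"#[^\n]*"
_TOKEN = re.compile(
    "(" + "|".join([_LONG_DQ, _LONG_SQ, _SHORT_DQ, _SHORT_SQ]) + ")"
    + "|(" + _IRI + ")"
    + "|(" + _COMMENT + ")",
    re.DOTALL,
)
_QUERY_FORM = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)


def query_from_url(url):
    """Return the decoded query from a query.wikidata.org URL, else None."""
    parts = urlsplit(url)
    if parts.hostname != "query.wikidata.org":
        return None
    return unquote(parts.fragment)


def strip_comments(query):
    """Remove '#' comments while leaving strings and IRIs untouched."""
    return _TOKEN.sub(lambda m: "" if m.group(3) is not None else m.group(0), query)


def looks_like_sparql(url):
    """Heuristic only: a query form keyword and a '{' remain after stripping."""
    query = query_from_url(url)
    if query is None:
        return False
    stripped = strip_comments(query)
    return bool(_QUERY_FORM.search(stripped)) and "{" in stripped
```

Counting only the comment-stripped length against the higher limit would also remove the incentive, mentioned earlier in the thread, to delete comments just to fit.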

Given the community requests here, I took the liberty to reopen it. Talking to DBAs a while back, there's not much storage issues with url shortener and also as long as the blob is below 4GB (IIRC talking to Jaime) it should be fine. So I suggest increasing it to 5K for now (similar to ResourceLoader).

The issue is not, and has never been database size nor technical limitations. It's about the lack of anti-abuse measures and monitoring. Increasing the URL size means we're creating a place for anyone to just dump giant blobs of text that we now host/distribute. There are definitely legitimate cases for this, but unless we also have anti-abuse measures in place, I don't think we should be doing this (like I said earlier T220703#5383773...).

Anti-abuse measures seem important; what about reducing the rate limit? The total number of shortened urls is pretty small and I don't see any reason to keep the rate limit that high, and it would address your concern, at least to some degree. We can also have two levels of rate limit in url shortener: you would be able to shorten urls up to 2k with a rate limit, and shorten urls up to 5k with a much stricter rate limit (like 1 per minute). Whichever sounds better for you.

Ultimately, UrlShortener is a url shortener, not a pastebin.

URL shortener that doesn't shorten long urls would defy the point of having url shortener in the first place.

...but unless we also have anti-abuse measures in place, I don't think we should be doing this (like I said earlier T220703#5383773...).

Is reducing the rate limit enough for this as Ladsgroup suggested?

Anti-abuse measures seem important; what about reducing the rate limit? The total number of shortened urls is pretty small and I don't see any reason to keep the rate limit that high, and it would address your concern, at least to some degree. We can also have two levels of rate limit in url shortener: you would be able to shorten urls up to 2k with a rate limit, and shorten urls up to 5k with a much stricter rate limit (like 1 per minute). Whichever sounds better for you.

The rate limit just caps the amount of abuse we have to deal with; it doesn't actually stop it. For reference, this is 5000 characters:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

On the English Wikipedia, articles need to be at least 1.5k characters to be considered for DYK. So in theory you could fit 3+ articles inside the 5k limit, which illustrates that a limit that high is just an arbitrary text store.

I think what I said earlier is apt:

In any case, I think we're going up the wrong avenue here. As I alluded to earlier, why are we using URLs to store such giant blobs of text? It's not editable, collaborate-able, etc. Any refinement needs a brand new URL that needs to be re-distributed to everyone. I think it would be more valuable for such large queries that are intended to be shared (and not throwaways) to have a one click button or something on WDQS that automatically creates a wiki page for the query, and dumps the current results in a table or similar format. And then a convenient way to edit the query/re-generate the results.

I suppose I'll take a stab at working on T104762: Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry) then.

I want to mention that not all languages are ASCII, for example, just one letter in Persian becomes six characters when it's url-encoded: https://fa.wikipedia.org/wiki/%D9%88

E.g. this link to a WP:ANI case is already 792 characters:
https://fa.wikipedia.org/wiki/%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%D8%AA%D8%A7%D8%A8%D9%84%D9%88%DB%8C_%D8%A7%D8%B9%D9%84%D8%A7%D9%86%D8%A7%D8%AA_%D9%85%D8%AF%DB%8C%D8%B1%D8%A7%D9%86#%D8%AF%D8%B1%D8%AE%D9%88%D8%A7%D8%B3%D8%AA_%D9%82%D8%B7%D8%B9_%D8%AF%D8%B3%D8%AA%D8%B1%D8%B3%DB%8C_%D8%AF%D8%A7%D8%A6%D9%85_(%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:Amirrezasoltani)_%D8%A8%D9%87_%D8%AF%D9%84%DB%8C%D9%84_%D8%A7%D8%AE%D9%84%D8%A7%D9%84%DA%AF%D8%B1%DB%8C_%D9%87%D8%A7%DB%8C_%D9%85%D8%B3%D8%AA%D9%85%D8%B1_%D8%A7%D8%B2_%D8%B7%D8%B1%DB%8C%D9%82_%D8%A7%D9%81%D8%B2%D9%88%D8%AF%D9%86_%D9%85%D8%B7%D8%A7%D9%84%D8%A8_%D9%86%D8%A7%D9%85%D8%B1%D8%A8%D9%88%D8%B7_%D9%88_%D8%AD%D8%B0%D9%81_%D8%A7%D8%B7%D9%84%D8%A7%D8%B9%D8%A7%D8%AA_%D8%A7%D8%B5%D9%84%DB%8C_%D9%85%D9%82%D8%A7%D9%84%D8%A7%D8%AA

In comparison to English, this would be more than 5000 characters if it's URL-encoded:
وووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووووو
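
The arithmetic behind this can be checked with a short sketch using Python's standard `urllib.parse.quote`. The helper name `encoded_length` is made up for illustration, and 5000 is the limit proposed earlier in the thread.

```python
from urllib.parse import quote


def encoded_length(text):
    """Length of text after percent-encoding every non-unreserved character."""
    return len(quote(text, safe=""))


letter = "\u0648"  # the letter from the fa.wikipedia.org link above
per_letter = encoded_length(letter)   # 6, i.e. "%D9%88"
letters_in_5k = 5000 // per_letter    # 833 such letters fit in 5000 characters
```

So a 5000-character limit on the encoded URL corresponds to roughly 833 letters of text in a script like Persian, compared with up to 5000 unreserved ASCII characters.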

@Legoktm is there any evidence that the current link shortener has been abused in the way you fear a link shortener capable of shortening longer URIs might be? b/c right now, a couple of years after this change request was made, we still have no effective way of communicating lengthy SPARQL reports on, for instance, social media, or in the Wikidata weekly summary - which for obvious reasons would prefer not to be bloated by 5k SPARQL report URIs where 19 char URIs would suffice.

@Legoktm is there any evidence that the current link shortener has been abused in the way you fear a link shortener capable of shortening longer URIs might be?

Yes :(

b/c right now, a couple of years after this change request was made, we still have no effective way of communicating lengthy SPARQL reports on, for instance, social media, or in the Wikidata weekly summary - which for obvious reasons would prefer not to be bloated by 5k SPARQL report URIs where 19 char URIs would suffice.

Just put the super long URL in a wiki page and link that.

@Legoktm is there any evidence that the current link shortener has been abused in the way you fear a link shortener capable of shortening longer URIs might be?

Yes :(

So arbitrarily constraining the maximum URL length that can be shortened has not prevented the abuse that the constraint was claimed to prevent.

Just put the super long URL in a wiki page and link that.

As noted here two and a half years ago: "This is a good workaround for a link to the query, but we cannot make a link to the results page this way."

@Legoktm might you be able to provide any details of that evidence?

@Legoktm is there any evidence that the current link shortener has been abused in the way you fear a link shortener capable of shortening longer URIs might be?

Yes :(

So arbitrarily constraining the maximum URL length that can be shortened has not prevented the abuse that the constraint was claimed to prevent.

Maybe! Maybe lengthening the limit won't invite more abuse, or maybe the 5k limit is holding back some stuff, or maybe people just haven't discovered it as a possible abuse vector.

Just put the super long URL in a wiki page and link that.

As noted here two and a half years ago: "This is a good workaround for a link to the query, but we cannot make a link to the results page this way."

That's fair. I made a new tool today called "long-sparql". Given a page like https://www.wikidata.org/wiki/User:Legoktm/test, where there's a query on the page with {{SPARQL|query=...}} looking at ?action=info, the page id is 19420423. You can now link directly to that query with https://ls.toolforge.org/p/19420423. The tool looks up the query on the page and then redirects you to it, which I believe should be identical to w.wiki. It's rather rough, but if people find it useful, it would be straightforward to add support to linking to queries on specific revisions of wiki pages, and provide a form for people to input a page's title and get the shorter link from it. It's mostly in proof-of-concept status currently, so let me know (source code: https://gitlab.com/legoktm/long-sparql/).

Ideally, the Wikidata query service interface would give you the option to save too-long queries onto a wiki page (some OAuth login), and then provide a different link based on that.

As a side note, when trying to deploy that tool to Toolforge, I ran into limits on how long of a URL could be redirected via Location: ... headers, because nginx has some default limits that the example query on my test page exceeded. w.wiki does not go through nginx but rather some other proxies (Apache, Varnish, ATS, envoy), so very long URLs might get caught in one of those too. Probably fixable or avoidable, but just another thing to keep in mind.

long-sparql looks neat. Do you think you could make it support {{query page}} as well? It has the same query parameter as {{SPARQL}}; you could also ask the API to parse {{query_page_name|style=url}} to get the URL out of it, but it looks like you already have a wikitext parser, so I guess you don’t need that.

Ideally, the Wikidata query service interface would give you the option to save too-long queries onto a wiki page (some OAuth login)

The straightforward way to do this would require enabling CORS access from query.wikidata.org to www.wikidata.org. I requested this a while ago in T218568, but it was declined, and having since found several XSS vulnerabilities in the query service UI, I’m now very glad about that, and fairly reluctant to consider it again.

Using OAuth would be an alternative, I guess (with a server backend that only allows very specific actions: creating user subpages with SPARQL content), but building and deploying that backend would be more work.

Just put the super long URL in a wiki page and link that.

As noted here two and a half years ago: "This is a good workaround for a link to the query, but we cannot make a link to the results page this way."

That's fair. I made a new tool today called "long-sparql"... You can now link directly to that query...

So how does long-sparql support "a link to the results page"?

So how does long-sparql support "a link to the results page"?

Maybe I'm missing some context, can you give an example of what a link to a results page would be?

So how does long-sparql support "a link to the results page"?

Maybe I'm missing some context, can you give an example of what a link to a results page would be?

Yes, anything under the "Short URL to results" link on query result pages.

Example: https://w.wiki/4QHr

Note that "Short URL to results" returns "URL shortening failed" for the example Tagishsimon gave, in a tweet cited up-thread; precisely because "the maximum size for the URL to be shortened on the URL shortener is 2,000 characters", as stated in the original ticket.

long-sparql looks neat. Do you think you could make it support {{query page}} as well? It has the same query parameter as {{SPARQL}}; you could also ask the API to parse {{query_page_name|style=url}} to get the URL out of it, but it looks like you already have a wikitext parser, so I guess you don’t need that.

Done. Now it just looks for the first query.wikidata.org link on the page (including via templates), which should make it compatible with all templates I hope: https://gitlab.com/legoktm/long-sparql/-/commit/07426c30876fd130aeceacc82fe4884c699594f6

Ex: https://www.wikidata.org/wiki/Template:Query_page/sandbox -> https://ls.toolforge.org/p/72919344

(sidenote: just so no one thinks I'm crazy, I'm definitely not parsing wikitext, it's using Parsoid HTML via https://docs.rs/parsoid/ :))
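
For readers who want to picture the lookup step described above (find the first query.wikidata.org link in the page's rendered HTML, then redirect to it), here is a rough stdlib sketch. It is not the tool's code: long-sparql is a separate project that uses Parsoid HTML via the Rust `parsoid` crate, and the class and function names below are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class FirstQueryLink(HTMLParser):
    """Remembers the first <a href> that points at query.wikidata.org."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.target is not None:
            return
        href = dict(attrs).get("href") or ""
        # Protocol-relative links ("//query.wikidata.org/...") are treated as https.
        if href.startswith("//"):
            href = "https:" + href
        if urlsplit(href).hostname == "query.wikidata.org":
            self.target = href


def first_query_link(html):
    """Return the redirect target for a page's HTML, or None if there is none."""
    parser = FirstQueryLink()
    parser.feed(html)
    return parser.target
```

A service built on this would answer a request for a page id with a redirect whose `Location:` header carries the returned URL.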

Ideally, the Wikidata query service interface would give you the option to save too-long queries onto a wiki page (some OAuth login)

The straightforward way to do this would require enabling CORS access from query.wikidata.org to www.wikidata.org. I requested this a while ago in T218568, but it was declined, and having since found several XSS vulnerabilities in the query service UI, I’m now very glad about that, and fairly reluctant to consider it again.

Using OAuth would be an alternative, I guess (with a server backend that only allows very specific actions: creating user subpages with SPARQL content), but building and deploying that backend would be more work.

The main reason I suggested OAuth is because I know WCQS is already using it, though I don't know the specific implementation nor how complex adding something like this would be. I agree with not enabling CORS access (like I did back then :)).

Yes, anything under the "Short URL to results" link on query result pages.

Example: https://w.wiki/4QHr

Cool, didn't know about embed.html. If the wiki page links to the embed.html page, it should just work. E.g. https://www.wikidata.org/wiki/User:Legoktm/Test2 -> https://ls.toolforge.org/p/16582503