
Increase the max length of URL to be shortened
Closed, Declined · Public

Description

Currently the maximum size for the URL to be shortened on the URL shortener is 2,000 characters. This is not enough for some Wikidata queries that may contain up to 4,000 characters. My suggestion is to increase the limit to 10K and see if there are some use cases of longer URLs.

(example of query)

Event Timeline

Restricted Application added a subscriber: Aklapper. · Apr 11 2019, 2:29 PM
Lea_Lacroix_WMDE updated the task description.

Two other long queries are linked in T163683#3913304 – 1448 and 4850 characters.

Hi,
first, sorry if this is written on the wrong page:
It should be kept in mind that this could very easily be misused to link a very big file with https://de.wikipedia.org/[FILE]. For example, I used it to link some comments (sorry for that joke, I will never do it again) like w.wiki/Jt.

@Habitator_terrae This is why the Meta stewards have the ability to delete a link if it presents security or personal data protection issues, for example.

agray added a subscriber: agray. · Apr 12 2019, 9:32 AM

This would help solve another problem - at the moment the existing third-party shortener embedded in the query service returns a 431 error for queries which are longer than ~3000 characters.

(It's usually possible to cut a query to get it under this limit but it means that sometimes there's an incentive to remove whitespace, comments, etc, which can be counterproductive)

Is there any absolute limit on the URL length that the query service will handle? If so, we should perhaps use that to define the upper limit on what's acceptable for the URL shortener.

This comment was removed by Lucas_Werkmeister_WMDE.

Is there any absolute limit on the URL length that the query service will handle? If so, we should perhaps use that to define the upper limit on what's acceptable for the URL shortener.

I don’t think so, though eventually it has to switch how it submits the query to the backend (T216590).

@Habitator_terrae This is why the Meta stewards have the ability to delete a link if it presents security or personal data protection issues, for example.

If the link length were allowed to grow up to 4,000, links like w.wiki/uR (a file in Base64 code between []) would become more effective. So it could be misused for sharing other content that has nothing to do with Wikimedia. This is very problematic, because the short links themselves get longer as the number of short links grows. And I don't see how the stewards could fix this alone.

This is not enough for some Wikidata queries that may contain up to 4,000 characters. My suggestion is to increase the limit to 10K and see if there are some use cases of longer URLs.

After a certain point we're just creating an arbitrary text store, with little oversight and visibility. I don't really want to deal with the abuse potential of 10k characters.

I don't really think gigantic URLs are the best way to share SPARQL queries tbh, but that's a bit off topic for this task.

I don't really think gigantic URLs are the best way to share SPARQL queries

In case anyone is wondering, a work-around is to put the query on a wiki page and use a short URL for that page; like this:

https://w.wiki/vj

Framawiki added a subscriber: Framawiki.
TheDJ added a subscriber: TheDJ. · Apr 29 2019, 10:09 AM

It should also be noted that the max URL length for IE/Edge, for instance, is still 2,047 characters even to this day.

I don't really think gigantic URLs are the best way to share SPARQL queries

In case anyone is wondering, a work-around is to put the query on a wiki page and use a short URL for that page; like this:
https://w.wiki/vj

This is a good workaround for a link to the query, but we cannot make a link to the results page this way.

Jheald added a comment. · May 8 2019, 8:07 AM

The other workaround, of course, is just to copy & paste the URL into tinyurl, which seems able to take at least 4,500 characters -- but that is something of a step back to square one.

+1 for this. Further example of a long SPARQL report is at https://twitter.com/Tagishsimon/status/1128769104674996234 fwiw

+1 on increasing the length. Indeed, long Wikidata queries are probably the single most valuable use case for the w.wiki URL shortener.

The usefulness of the query system increases massively if links are easily shared in twitter posts, news articles, academic articles (Per comments at T224259).

Is the abuse potential significantly raised when the character limit is increased from 2k to 10k? I'd have thought the main danger point was going from 200 to 2k.

If the link length were allowed to grow up to 4,000, links like w.wiki/uR (a file in Base64 code between []) would become more effective. So it could be misused for sharing other content that has nothing to do with Wikimedia. This is very problematic, because the short links themselves get longer as the number of short links grows. And I don't see how the stewards could fix this alone.

Maybe I'm being very naive, but I'm not sure how much of an issue this would really be. The sizes here are going to be a few kB at most - it's not like people will be able to encode 300 MB into a URL and abuse it for large-scale filesharing.

An alternative way to prevent this kind of abuse would be to add a filter checking that the "shortened link" is actually valid - w.wiki/uR gives a "not found" error, so it could be filtered out that way.
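
A minimal sketch of what such a filter might look like, in MediaWiki-style PHP (the function, message key, and exact behaviour are assumptions for illustration, not existing UrlShortener code):

```php
use MediaWiki\MediaWikiServices;

// Hypothetical pre-save check: HEAD-request the target and refuse to
// shorten URLs whose target only resolves to a 404, as w.wiki/uR's does.
function validateTargetResolves( string $url ): Status {
	$req = MediaWikiServices::getInstance()->getHttpRequestFactory()
		->create( $url, [ 'method' => 'HEAD', 'timeout' => 5 ], __METHOD__ );
	$req->execute();
	if ( $req->getStatus() === 404 ) {
		// 'urlshortener-target-not-found' is an assumed message key
		return Status::newFatal( 'urlshortener-target-not-found' );
	}
	return Status::newGood();
}
```

One caveat: a synchronous HEAD request on every shortening adds latency, and a target could start returning 404 only after the check, so this would reduce rather than eliminate the abuse vector.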

Legoktm closed this task as Declined. · Aug 1 2019, 7:35 AM

I'm going to mark this as declined, as we have no plans to raise the length limit for now. You can use wiki pages or some other mechanism to share insanely long URLs; this isn't a pastebin service.

Jheald added a comment. (Edited) · Aug 1 2019, 9:19 AM

A core point of this service was to be able to share long queries. (I'm guessing WDQS now accounts for the majority of organically requested shortenings?)

But the service doesn't let us share long queries. So we now have less functionality than we did before, since we don't even have a link from WDQS to the tinyurl shortener any more.

The shortener was supposed to be a way to avoid clogging up the wikitext of talk pages, emails, etc. with vast long URLs.

"Declined" is not acceptable.

You can use wiki pages or some other mechanism to share […] long URLs

You can do this for URLs of any length – by that argument, you don’t need a URL shortener at all. The whole idea is that it makes sharing URLs more convenient, and for the Wikidata Query Service, that’s how queries have always been shared. There are enough comments on this task already to testify that there is a real user need for supporting longer URLs.

What if we add a second rate limit to the URL shortener, limiting the number of bytes you shorten, in addition to the current limit on the number of URLs? (User::pingLimiter() has an $incrBy parameter, so technically, this should be possible.) The current limits are 10 (IP, newbie) and 50 (user) shortened URLs per 120 seconds, and the current average URL length is 366.3951 bytes, so an additional limit of 4000 / 20000 bytes per 120 seconds would be unlikely to affect other users, but would still allow users of the Query Service to share URLs to long queries (at a lower rate).
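
For illustration, this could be wired up roughly as follows (the 'urlshortcode-bytes' action name and message key are invented for the sketch; the $incrBy parameter of User::pingLimiter() is the only part confirmed above):

```php
// Hypothetical $wgRateLimits entries implementing the numbers above:
$wgRateLimits['urlshortcode-bytes'] = [
	'ip' => [ 4000, 120 ],     // anonymous: 4,000 bytes per 120 s
	'newbie' => [ 4000, 120 ], // new accounts: same budget
	'user' => [ 20000, 120 ],  // established users: 20,000 bytes per 120 s
];

// At shortening time, charge the byte limiter by the URL's length,
// alongside the existing per-URL limiter:
if ( $user->pingLimiter( 'urlshortcode' ) ||
	$user->pingLimiter( 'urlshortcode-bytes', strlen( $url ) )
) {
	return Status::newFatal( 'urlshortener-ratelimit' ); // assumed message key
}
```

With these numbers, an average-length URL (~366 bytes) barely dents the byte budget, while a single 4,000-byte query consumes an anonymous user's whole 120-second allowance - throttling long URLs rather than forbidding them.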

You can use wiki pages or some other mechanism to share […] long URLs

You can do this for URLs of any length – by that argument, you don’t need a URL shortener at all. The whole idea is that it makes sharing URLs more convenient, and for the Wikidata Query Service, that’s how queries have always been shared. There are enough comments on this task already to testify that there is a real user need for supporting longer URLs.

I'm not disputing the use case, I'm just saying that UrlShortener can't meet it. Wiki pages have significant anti-abuse measures, from logging (RecentChanges, IRC, etc.) to filters (AbuseFilter, SpamBlacklist) and so on. UrlShortener (for good reasons) has none of that.

In any case, I think we're going down the wrong avenue here. As I alluded to earlier, why are we using URLs to store such giant blobs of text? It's not editable, collaborate-able, etc. Any refinement needs a brand new URL that needs to be re-distributed to everyone. I think it would be more valuable for such large queries that are intended to be shared (and not throwaways) to have a one-click button or something on WDQS that automatically creates a wiki page for the query and dumps the current results in a table or similar format. And then a convenient way to edit the query / re-generate the results.

agray added a comment. · Aug 1 2019, 9:41 AM

Another option would be to have differential length limits - 2k in general, with longer links only allowed for query.wikidata.org URLs? (suggestion from Stefano)
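
A sketch of how such a host-dependent limit might look (hypothetical thresholds and message key, not a committed design):

```php
// Allow longer inputs only for Wikidata Query Service permalinks;
// everything else keeps the existing 2,000-character ceiling.
$host = parse_url( $url, PHP_URL_HOST );
$maxLength = ( $host === 'query.wikidata.org' ) ? 10000 : 2000;
if ( strlen( $url ) > $maxLength ) {
	return Status::newFatal( 'urlshortener-url-too-long' ); // assumed message key
}
```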

Thomas_Shafee added a comment. (Edited) · Aug 1 2019, 11:28 AM

Since the main use case for shortening long URLs within Wikimedia is 99% query.wikidata.org, could that page instead include an option for the query to be piped straight to the URL shortener (or compress its URL)?

One of the example queries that query.wikidata.org provides is:

https://query.wikidata.org/#%23Get%20known%20variants%20reported%20in%20CIViC%20database%20%28Q27612411%29%20of%20genes%20reported%20in%20a%20Wikipathways%20pathway%3A%20Bladder%20Cancer%20%28Q30230812%29%0ASELECT%20DISTINCT%20%3Fpathway%20%3FpathwayLabel%20%3Fpwpart%20%3FpwpartLabel%20%3Fvariant%20%3FvariantLabel%20%3Fdisease%3FdiseaseLabel%20WHERE%20%7B%0A%0A%20%20%20VALUES%20%3Fpredictor%20%7Bp%3AP3354%20p%3AP3355%20p%3AP3356%20p%3AP3357%20p%3AP3358%20p%3AP3359%7D%0A%20%20%20VALUES%20%3FpredictorQualifier%20%7Bpq%3AP2175%7D%0A%20%20%20VALUES%20%3FwpID%20%7B%22WP2828%22%7D%0A%20%20%0A%20%20%20%3Fpathway%20wdt%3AP2410%20%3FwpID%20%3B%20%23%20Pathways%20has%20a%20Wikipathways%20identifier%0A%20%20%20%20%20%20%20%20%20%20wdt%3AP527%20%3Fpwpart%20.%20%23%20which%20contains%20pathways%20parts%0A%20%20%0A%20%20%20%3Fdisease%20wdt%3AP279%2B%20wd%3AQ504775%20.%20%20%23%20The%20disease%20is%20a%20subclass%20of%20urinary%20bladder%20cancer%20%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%23%20based%20on%20annotations%20in%20the%20Disease%20ontology%0A%20%20%20%3Fvariant%20wdt%3AP3329%20%3FcivicID%20%3B%20%23%20a%20variant%20known%20in%20CIViC%20%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%3Fpredictor%20%3Fnode%20%3B%20%23%20has%20a%20predicting%20relation%20with%20diseases%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%23%20labeled%20as%20being%20a%20subclass%20of%20urinary%20bladder%20cancer%0A%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP3433%20%3Fpwpart%20.%20%20%23%20variant%20is%20biological%20variant%20of%0A%20%20%20%0A%20%20%20%7B%3Fnode%20%3FpredictorStatement%20%3Fdrug_label%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%3FpredictorQualifier%20%3Fdisease%20.%7D%0A%20%20%20UNION%20%0A%20%20%20%7B%0A%20%20%20%20%20%20%3Fnode%20%3FpredictorStatement%20%3Fdisease%20%20.%0A%20%20%20%7D%0A%20%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D

An interim solution could be to automatically run everything >2,000 characters through goo.gl or bit.ly first, then run that URL through w.wiki, so that they do the screening for us but we still get to have the w.wiki URL?
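
As a rough illustration of that interim idea (assuming TinyURL's public api-create.php endpoint and glossing over error handling):

```php
// Hypothetical two-step shortening: pre-shorten anything over 2,000
// characters via a third-party service, then hand the result to w.wiki.
function preShorten( string $longUrl ): string {
	if ( strlen( $longUrl ) <= 2000 ) {
		return $longUrl;
	}
	$short = file_get_contents(
		'https://tinyurl.com/api-create.php?url=' . urlencode( $longUrl )
	);
	// Fall back to the original URL if the third-party call fails.
	return $short !== false ? $short : $longUrl;
}
// The returned URL would then be submitted to the w.wiki shortener as usual.
```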

Krinkle added a subscriber: Krinkle. (Edited) · Aug 1 2019, 11:49 AM

It should also be noted that the max URL length for IE/Edge, for instance, is still 2,047 characters even to this day.

I don't think that's true anymore.

  • For ResourceLoader we have bumped the limit to 5,000, citing that IE9 supports up to 5,000. The limit of 2K applied to IE6-8 only. I'm assuming this means IE11 and Edge support 5K as well.
  • For EventLogging we considered the same (T208282), but it was declined given the pending migration to EventGate POST bodies.
Ladsgroup added a subscriber: Ladsgroup.

I agree that we should increase the size limit just a bit. At least make it on par with URL shortener services like bitly. It can be abused, but I think having a smaller rate limit would be good anyway.

Jheald added a comment. · Aug 1 2019, 3:01 PM

I do agree with @Legoktm that it would be nice to have a service where one could record and browse one's own and other people's recent queries -- and perhaps tag them into particular categories of interest.

T104762 "Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry)" has been open for this since July 2015, but no steps have been taken towards it.

In the meantime, it would be useful to be able to shorten longer queries. As others have noted above, this is the key use case that the URL shortener was instituted for. It is unfortunate that the service is not fulfilling it.