Throttling access to Special Pages that make potentially expensive queries
Closed, Declined · Public

Assigned To

None

Authored By

	Reedy
	Mar 20 2017, 4:08 PM

Description

Special:AllPages isn't cached, and it probably shouldn't be. But it can be used to make expensive queries, and a single user can issue a lot of simultaneous requests, which piles up load on the databases.

It would be reasonable to have a way to limit simultaneous queries by a user in these cases.
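As a rough illustration of the kind of limit the description asks for, here is a minimal sketch of a per-user cap on in-flight requests. It is not MediaWiki code: the class name, the `max_concurrent` parameter and the in-process dictionary are all assumptions made for the example, and a real deployment would need shared state across application servers.

```python
import threading
from contextlib import contextmanager


class PerUserConcurrencyLimiter:
    """Hypothetical helper: caps how many expensive requests one user
    may have in flight at the same time (single process only)."""

    def __init__(self, max_concurrent: int) -> None:
        self._max = max_concurrent
        self._in_flight = {}  # user -> number of running requests
        self._lock = threading.Lock()

    def try_acquire(self, user: str) -> bool:
        """Reserve a slot for `user`; return False if the cap is reached."""
        with self._lock:
            current = self._in_flight.get(user, 0)
            if current >= self._max:
                return False
            self._in_flight[user] = current + 1
            return True

    def release(self, user: str) -> None:
        """Give back a slot previously reserved with try_acquire()."""
        with self._lock:
            current = self._in_flight.get(user, 0)
            if current <= 1:
                self._in_flight.pop(user, None)
            else:
                self._in_flight[user] = current - 1

    @contextmanager
    def slot(self, user: str):
        """Yield True if the request may run, False if it should be refused."""
        acquired = self.try_acquire(user)
        try:
            yield acquired
        finally:
            if acquired:
                self.release(user)
```

A request handler would wrap the expensive query in `with limiter.slot(user) as allowed:` and return an error page when `allowed` is false, rather than queueing the request.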

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		• jcrespo	T160916 Special:AllPages disabled due to performance issues
		Declined		None	T160920 Throttling access to Special Pages that make potentially expensive queries

Event Timeline

Reedy created this task. Mar 20 2017, 4:08 PM

Restricted Application added a subscriber: Aklapper. Mar 20 2017, 4:08 PM

Reedy mentioned this in T160914: Databases overflown with connections due to slow query on Special:AllPages. Mar 20 2017, 4:09 PM

What about doing pagination?

It is using pagination

@Reedy I am guessing that each time it sorts the entire table by URL or by article name? I guess that is extremely expensive compared to sorting by ID.

Also, any chance you could provide me the URLs of all articles?

The main page currently shows:

Welcome to Wikipedia,
the free encyclopedia that anyone can edit.
5,363,335 articles in English

In T160920#3114875, @BurstPower wrote:

@Reedy I am guessing that each time it sorts the entire table by URL or by article name? I guess that is extremely expensive compared to sorting by ID.

Also, any chance you could provide me the URLs of all articles?

The main page currently shows:

Welcome to Wikipedia,
the free encyclopedia that anyone can edit.
5,363,335 articles in English

https://dumps.wikimedia.org/enwiki/20170301/enwiki-20170301-all-titles-in-ns0.gz is a list of all pages in NS 0, a.k.a. "content" pages. On enwiki, NS 0 is the only content namespace:

reedy@tin:/srv/mediawiki-staging/php-1.29.0-wmf.16$ mwscript eval.php enwiki
> var_dump( $wgContentNamespaces );
array(1) {
  [0]=>
  int(0)
}

In T160920#3114842, @Reedy wrote:

It is using pagination

It's most likely a bug, as it uses Elasticsearch, unless it is using both MySQL and Elasticsearch for searching. Elasticsearch's default limit for search results is 10,000.

In T160920#3114927, @Paladox wrote:

In T160920#3114842, @Reedy wrote:

It is using pagination

It's most likely a bug, as it uses Elasticsearch, unless it is using both MySQL and Elasticsearch for searching. Elasticsearch's default limit for search results is 10,000.

It has nothing to do with Elasticsearch.

Why is it potentially expensive? We have an index on NS + title.
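To make the point about the index on NS + title concrete, here is a small sketch using an in-memory SQLite database. It is only an illustration under stated assumptions: the table is a cut-down stand-in for MediaWiki's `page` table, the index name `ns_title_idx` is made up, and SQLite is not the production database. The query fetches one page of titles by seeking to a start title, which an index on (namespace, title) can serve without sorting the whole table.

```python
import sqlite3


def make_db(titles):
    """Build a toy, in-memory stand-in for the page table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page ("
        " page_id INTEGER PRIMARY KEY,"
        " page_namespace INTEGER NOT NULL,"
        " page_title TEXT NOT NULL)"
    )
    # Hypothetical index name; the point is the (namespace, title) column order.
    conn.execute(
        "CREATE UNIQUE INDEX ns_title_idx ON page (page_namespace, page_title)"
    )
    conn.executemany(
        "INSERT INTO page (page_namespace, page_title) VALUES (0, ?)",
        [(t,) for t in titles],
    )
    return conn


def titles_from(conn, start, limit, namespace=0):
    """Return up to `limit` titles in `namespace`, starting at `start`.

    The WHERE and ORDER BY clauses both follow the index order, so the
    database can walk the index from `start` and stop after `limit` rows.
    """
    rows = conn.execute(
        "SELECT page_title FROM page"
        " WHERE page_namespace = ? AND page_title >= ?"
        " ORDER BY page_title LIMIT ?",
        (namespace, start, limit),
    )
    return [row[0] for row in rows]
```

Adding a condition on a column that is not part of the index, such as a redirect filter, means rows have to be read and discarded until enough matches are found, which is where a cost like the one discussed in this thread can come from.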

@Reedy thank you very much, that file really helps me. It contains redirects as well, but I can handle them :)

Ah, OK, the redirect filter. Just remove it? Unindexed queries like that should not be exposed in miser mode.

@Reedy I need some more help with redirects.

HttpWebRequest doesn't show the redirect target in ResponseUri.

So my only option is to encode the title to obtain the real URL.

However, I see that the space character is encoded as _ instead of +.

Are there any other rules?

@Tgr @Paladox

In T160920#3115217, @BurstPower wrote:

@Reedy I need some more help with redirects.

HttpWebRequest doesn't show the redirect target in ResponseUri.

So my only option is to encode the title to obtain the real URL.

However, I see that the space character is encoded as _ instead of +.

Are there any other rules?

@Tgr @Paladox

Well, the " " being exposed as "_" is very typical in MediaWiki; look at the page URLs to see this.

The API will resolve redirects for you, for example https://en.wikipedia.org/w/api.php?action=query&titles=WP:AWB&redirects

@Reedy thanks for the answer. However, it still doesn't show the encoded URL. I need encoded URLs to match against the same pages in other languages :)

I wish it did a server-side redirect instead of a client-side one. That way I would have the absolute final URL.

But I wonder about this:

Can we assume that the absolute final URL is the title of the page (obtained from the H1), with spaces replaced by _, and then URL-encoded?

I note this is really the wrong task for these discussions.

What sort of encoding do you have? What sort of encoding do you need?

In T160920#3115310, @BurstPower wrote:

Can we assume that the absolute final URL is the title of the page (obtained from the H1), with spaces replaced by _, and then URL-encoded?

You can't take the title of the page from the <h1>, as it can be modified for display (e.g. iPhone with lowercase first letter or various articles with HTML formatting).

But… if you're already able to access the page HTML, you must already have a title, and in fact, you must already have the URL you used to access the page… why do you need to build it again?

FWIW, canonical page URLs are just https://en.wikipedia.org/wiki/ + page title with spaces replaced with _ and other special characters percent-encoded as usual.
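The rule just described can be sketched in a few lines. This is a minimal example, not MediaWiki's own encoder: the function name is made up, and the set of characters left unescaped (`/` and `:`) is an assumption, since the thread only says "percent-encoded as usual".

```python
from urllib.parse import quote


def canonical_url(title: str, base: str = "https://en.wikipedia.org/wiki/") -> str:
    """Build a page URL from a title: spaces become underscores, then the
    result is percent-encoded as UTF-8.

    Assumption: '/' and ':' are left unescaped so that subpages and
    namespace prefixes stay readable.
    """
    return base + quote(title.replace(" ", "_"), safe="/:")


print(canonical_url("Albert Einstein"))
# https://en.wikipedia.org/wiki/Albert_Einstein
print(canonical_url("Café au lait"))
# https://en.wikipedia.org/wiki/Caf%C3%A9_au_lait
```

The title passed in should be the real page title (for example from the dump or the API), not the text of the <h1>, for the reasons given above.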

matmarex unsubscribed. Mar 20 2017, 6:38 PM

The API can resolve redirects for you (see the redirects parameter); that, unlike filtering, should not raise any performance concerns.
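For anyone following along, here is a minimal sketch of using the redirects parameter from a script. The helper names are made up for the example, and the sample response in the usage note is hand-written to show the general shape of the `normalized` and `redirects` lists in the JSON output, not copied from a live request.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def redirect_query_url(titles):
    """Build an action=query URL that asks the API to resolve redirects."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # several titles are separated by '|'
        "redirects": 1,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def resolve_title(title, response):
    """Follow the 'normalized' and then the 'redirects' mapping of a parsed
    JSON response to find the title the API ended up with."""
    query = response.get("query", {})
    for key in ("normalized", "redirects"):
        mapping = {entry["from"]: entry["to"] for entry in query.get(key, [])}
        title = mapping.get(title, title)
    return title
```

With a hand-written response such as `{"query": {"redirects": [{"from": "WP:AWB", "to": "Wikipedia:AutoWikiBrowser"}]}}`, `resolve_title("WP:AWB", response)` returns `"Wikipedia:AutoWikiBrowser"`; a title that is not mentioned in either list is returned unchanged.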

@BurstPower here is a list of all pages in enwiki's mainspace minus redirects: http://tools.wmflabs.org/betacommand-dev/reports/en_articles.txt

• jcrespo mentioned this in T160985: Create an easy to deploy kill switch for every self-contained mediawiki functionality. Mar 21 2017, 6:02 PM

1997kB moved this task from To triage to Special:Contributions / Special:DeletedContributions on the MediaWiki-Special-pages board. Dec 24 2019, 2:03 PM

1997kB moved this task from Special:Contributions / Special:DeletedContributions to To triage on the MediaWiki-Special-pages board. Dec 24 2019, 2:53 PM

Izno moved this task from To triage to SpecialPage system on the MediaWiki-Special-pages board. Apr 24 2022, 12:50 AM

In T160920#3115124, @Tgr wrote:

Ah, OK, the redirect filter. Just remove it? Unindexed queries like that should not be exposed in miser mode.

That was already disabled with https://gerrit.wikimedia.org/r/c/mediawiki/core/+/343681

Is there still a need to throttle the special page, for example for other queries from that page? The task description needs some more information.

In T160920#8327261, @Umherirrender wrote:

In T160920#3115124, @Tgr wrote:

Ah, OK, the redirect filter. Just remove it? Unindexed queries like that should not be exposed in miser mode.

That was already disabled with https://gerrit.wikimedia.org/r/c/mediawiki/core/+/343681

Is there still a need to throttle the special page, for example for other queries from that page? The task description needs some more information.

Does anyone know? (Heads up to Reedy)

Umherirrender unsubscribed. Feb 10 2023, 7:32 PM

Is there still a need to throttle the special page, for example for other queries from that page? The task description needs some more information.

@Reedy: No reply; declining for the time being.
