
LinkSearch uses numeric offset paging instead of paging by last entry returned
Open, Medium, Public

Description

I noticed today that paging on Special:LinkSearch uses a numeric offset. For example: http://commons.wikimedia.org/w/index.php?title=Special:LinkSearch&limit=500&offset=5000&target=http%3A%2F%2Fwww.wga.hu%2Fart%2F

For efficiency reasons it should probably use an offset of the last entry displayed on the page. (For example: offset=http://www.wga.hu/art/c/crespi/daniele/pieta.jpg or something)

Note: I say this without having looked at how it is implemented or at the DB queries, so I may be misunderstanding the issues involved.
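
To illustrate the difference between the two paging styles, here is a minimal sketch using an in-memory SQLite database. The `links` table, its columns, and the function names are hypothetical stand-ins, not the actual MediaWiki schema or code. It assumes the same URL may appear more than once, so an `id` column is used as a tie-breaker.

```python
import sqlite3

# Hypothetical, simplified stand-in for the real table: one row per link.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
conn.execute("CREATE INDEX links_url ON links (url, id)")
conn.executemany(
    "INSERT INTO links (url) VALUES (?)",
    [(f"http://www.wga.hu/art/{i:05d}.jpg",) for i in range(2000)],
)


def page_by_offset(limit, offset):
    # Numeric offset: the database walks past `offset` rows before
    # it can return the first row of the requested page.
    return conn.execute(
        "SELECT url, id FROM links ORDER BY url, id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


def page_after(limit, last=None):
    # Paging by last entry: resume just after the last (url, id) pair
    # shown on the previous page, which the index can seek to directly.
    if last is None:
        return conn.execute(
            "SELECT url, id FROM links ORDER BY url, id LIMIT ?", (limit,)
        ).fetchall()
    last_url, last_id = last
    return conn.execute(
        "SELECT url, id FROM links "
        "WHERE url > ? OR (url = ? AND id > ?) "
        "ORDER BY url, id LIMIT ?",
        (last_url, last_url, last_id, limit),
    ).fetchall()


first = page_after(500)
second = page_after(500, last=first[-1])
```

Both approaches return the same rows for a given page; the difference is only in how much work the database does to reach the start of that page.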


Version: 1.21.x
Severity: normal
URL: http://commons.wikimedia.org/w/index.php?title=Special:LinkSearch&limit=500&offset=5000&target=http%3A%2F%2Fwww.wga.hu%2Fart%2F

Related Objects

Status | Subtype | Assigned | Task
Resolved | | Bawolff |
Open | | None |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Duplicate | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Antoine_Quhen |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Resolved | | Ladsgroup |
Open | | Ladsgroup |
Resolved | BUG REPORT | Ladsgroup |

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:14 AM
bzimport set Reference to bz45237.
bzimport added a subscriber: Unknown Object (MLST).

The API also uses an offset, because there is no primary key that can be used for pagination. That also makes it hard to reverse the order.

A primary key was added with Gerrit change 51675; maybe that helps.

@matmarex Why was T130058 merged? That appears to be tied to another recent bug fix.

I think this is the only way to solve T130058. The artificial limit of 10 000 could be removed, I guess, but then you'd hit the real limit around 50 000 or so when the query time exceeds the timeout. Scanning 50 000 rows is just too slow.

@matmarex Sorry for not seeing this sooner: 50,000 addresses most of The Wikipedia Library's concern for the short term (though not all -- JSTOR, one of our longest-running partnerships, has over 70,000 links).

The magic bullet for the large-number problem (since it mostly affects English) might be the ability to generate reports by namespace -- that way you could distinguish between article space, project space, and user page links. In most use cases for Special:LinkSearch, there is a functional difference between how links are used in articles and how they are used in user space, etc.
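
A rough sketch of what a per-namespace report could look like, assuming a simplified, hypothetical schema (the `links` table, its columns, and `links_in_namespace` are illustrative, not MediaWiki's real tables or API). Namespace 0 is the article namespace and 2 is the User namespace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables: each link row points at the page it is on.
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE links (id INTEGER PRIMARY KEY, from_page INTEGER, url TEXT);
""")
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [(1, 0, "Article_A"), (2, 2, "Example/sandbox"), (3, 0, "Article_B")],
)
conn.executemany(
    "INSERT INTO links (from_page, url) VALUES (?, ?)",
    [
        (1, "http://www.jstor.org/stable/1"),
        (2, "http://www.jstor.org/stable/2"),
        (3, "http://www.jstor.org/stable/3"),
        (1, "http://example.org/"),
    ],
)


def links_in_namespace(prefix, namespace, limit, after_id=0):
    # instr(url, prefix) = 1 is a literal prefix match, so characters such
    # as "%" or "_" in the URL are not treated as wildcards.
    return conn.execute(
        "SELECT links.id, page_title, url FROM links "
        "JOIN page ON page_id = from_page "
        "WHERE instr(url, ?) = 1 AND page_namespace = ? AND links.id > ? "
        "ORDER BY links.id LIMIT ?",
        (prefix, namespace, after_id, limit),
    ).fetchall()


article_links = links_in_namespace("http://www.jstor.org/", 0, 10)
user_links = links_in_namespace("http://www.jstor.org/", 2, 10)
```

Splitting the report this way would keep each result set smaller, though whether it stays under the row limits on a large wiki depends on how the links are distributed across namespaces.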

Change 935445 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] ExternalLinks: Make order by and continue only rely on el_id in READ NEW

https://gerrit.wikimedia.org/r/935445

This patch doesn't solve the issue but it would make it easier to solve.
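
As a sketch of why ordering and continuing on a single unique ID makes paging simpler: the continuation value is just the last ID returned. This is not the code from the patch. Only the `el_id` name comes from the patch title; the table layout, the `url` column, and the function names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in table; `url` is an illustrative column name.
conn.execute("CREATE TABLE externallinks (el_id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO externallinks (url) VALUES (?)",
    [(f"http://example.org/{i}",) for i in range(25)],
)


def fetch_batch(limit, continue_from=0):
    """Return (rows, next_continue); next_continue is None on the last batch."""
    # Ask for one extra row to find out whether another batch exists.
    rows = conn.execute(
        "SELECT el_id, url FROM externallinks "
        "WHERE el_id > ? ORDER BY el_id LIMIT ?",
        (continue_from, limit + 1),
    ).fetchall()
    if len(rows) > limit:
        rows = rows[:limit]
        return rows, rows[-1][0]
    return rows, None


def fetch_all(limit):
    # Follow the continuation value until the server reports no more rows.
    out, cont = [], 0
    while cont is not None:
        rows, cont = fetch_batch(limit, cont)
        out.extend(rows)
    return out
```

Because `el_id` is unique, no tie-breaker column is needed, and each batch starts with an index seek rather than a scan over skipped rows.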

Change 935445 merged by jenkins-bot:

[mediawiki/core@master] ExternalLinks: Make order by and continue only rely on el_id in READ NEW

https://gerrit.wikimedia.org/r/935445

Change 935856 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.41.0-wmf.16] ExternalLinks: Make order by and continue only rely on el_id in READ NEW

https://gerrit.wikimedia.org/r/935856

Change 935857 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.41.0-wmf.15] ExternalLinks: Make order by and continue only rely on el_id in READ NEW

https://gerrit.wikimedia.org/r/935857

Change 935857 abandoned by Ladsgroup:

[mediawiki/core@wmf/1.41.0-wmf.15] ExternalLinks: Make order by and continue only rely on el_id in READ NEW

Reason:

https://gerrit.wikimedia.org/r/935857

Change 935856 merged by jenkins-bot:

[mediawiki/core@wmf/1.41.0-wmf.16] ExternalLinks: Make order by and continue only rely on el_id in READ NEW

https://gerrit.wikimedia.org/r/935856

Mentioned in SAL (#wikimedia-operations) [2023-07-10T13:35:09Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:935856|ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237)]]

Mentioned in SAL (#wikimedia-operations) [2023-07-10T13:36:38Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:935856|ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-07-10T13:46:12Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:935856|ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237)]] (duration: 11m 03s)

Change 939354 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] [WIP] SpecialLinkSearch: Switch to ordering based on el_id

https://gerrit.wikimedia.org/r/939354