Page MenuHomePhabricator

isCategoryRedirect is very slow
Open, NormalPublic

Description

In page.py we have a function "isCategoryRedirect":

def isCategoryRedirect(self):
   """Return True if this is a category redirect page, False otherwise."""

This is rather slow. Maybe some caching should be used.

Details

Reference
bz59080

Event Timeline

bzimport raised the priority of this task from to Normal.Nov 22 2014, 2:33 AM
bzimport added a project: Pywikibot-General.
bzimport set Reference to bz59080.
bzimport added a subscriber: Unknown Object (????).
Ladsgroup closed this task as Invalid.Jun 7 2015, 3:26 PM
Ladsgroup removed a project: Pywikibot-General.
Ladsgroup set Security to None.

I checked the code very carefully, no more caching can be added and it's fast enough (it's not very fast for first run in every site though but It caches template afterwards and gets really fast)

jayvdb reopened this task as Open.Jun 7 2015, 3:32 PM
jayvdb added a subscriber: jayvdb.

It is using caching, but it is doing additional, unnecessary work.

It is also determining the category redirect target, so it uses templatesWithParams.

There is no speed advantage of doing that in isCategoryRedirect. It can be done in isCategoryRedirectTarget.

Instead isCategoryRedirect can use Page.templates() , which will be cached and is used by templatesWithParams() anyway.

Some of them are unnecessary works but none of them are resource consuming (by creating unnecessary API calls) and removing them doesn't affect very much at speed of the method. That's just my opinion.

There is no speed advantage of doing that in isCategoryRedirect. It can be done in isCategoryRedirectTarget.

getCategoryRedirectTarget?

jayvdb added a comment.Jun 7 2015, 3:59 PM

There is no speed advantage of doing that in isCategoryRedirect. It can be done in isCategoryRedirectTarget.

getCategoryRedirectTarget?

yes, sorry.

Restricted Application added a subscriber: TerraCodes. · View Herald TranscriptSep 25 2016, 4:50 AM

Marking as easy.
What essentially needs to be done is change the self.templatesWithParams() to self.templates() because we don't want the parameters which contain the target of the redirect.

The appropritate functionality will have to be moved to getCategoryRedirectTarget as that's where the parameter (the redirect target) is needed.

This makes the function faster because we lazily find the target only when needed.

Change 339930 had a related patch set uploaded (by IulianR):
Improve isCategoryReview

https://gerrit.wikimedia.org/r/339930

matej_suchanek updated the task description. (Show Details)
matej_suchanek moved this task from Backlog to Doing on the good first bug board.
matej_suchanek added a subscriber: Ankit.0905.
Xqt added a subscriber: Xqt.Feb 26 2017, 3:21 PM

The proposal of the given patch is a breaking change because isCategoryRedirect() is True now even if the redirect target is not a category or the target is not given. I don't see any advantage here.

Xqt added a comment.Feb 26 2017, 3:43 PM

An alternate implementation is made in https://gerrit.wikimedia.org/r/#/c/312739,edit/ with the same Problem. I don't see an advantage against the current implementation for both patches. There is a kind of code duplication reading the templates which gives wrong results now. To ensure isCategoryRedirect you have to confirm there is a template and that leads to a valid category which means you have to get the target.

@Xqt , from a wiki point of view, the new behavior is correct: when the category redirects to a page from another namespace, an error is shown, but the category is still included in the "Redirected categories" tracking category. It is also in accordance with the behavior of page.isRedirect, where no check is made.

The problem is just if category.py can handle the new behavior. I say we move to this new behavior after announcing it on the list and then change the rest of the scripts to accomodate it.

Xqt added a comment.Feb 27 2017, 5:02 AM

I think we could introduce a new method is_category_redirect with the new behavior and deprecate the old one but keep the old behavior unchanged. This ensures not breaking private scripts. And be should find out the reasons for the current implementations e.g. by blaming the old compat code parts.

Aklapper moved this task from Doing to Backlog on the good first bug board.Oct 16 2017, 2:35 PM
D3r1ck01 moved this task from Backlog to Doing on the good first bug board.Nov 5 2017, 5:22 PM

Change 339930 abandoned by Xqt:
Improve isCategoryReview

https://gerrit.wikimedia.org/r/339930

Change 406271 had a related patch set uploaded (by Matěj Suchánek; owner: Matěj Suchánek):
[pywikibot/core@master] Cleanup Page.isCategoryRedirect

https://gerrit.wikimedia.org/r/406271

matej_suchanek removed IulianR as the assignee of this task.Jan 26 2018, 10:55 AM

Change 406271 abandoned by Matěj Suchánek:
Cleanup Page.isCategoryRedirect

Reason:
Superseded by Ib74d621

https://gerrit.wikimedia.org/r/406271

Dvorapa added a subscriber: Dvorapa.Jun 3 2018, 3:31 PM
In T61080#3056850, @Xqt wrote:

I think we could introduce a new method is_category_redirect with the new behavior and deprecate the old one but keep the old behavior unchanged. This ensures not breaking private scripts. And be should find out the reasons for the current implementations e.g. by blaming the old compat code parts.

Agree