Page MenuHomePhabricator

Page.isEmpty
Closed, ResolvedPublic

Description

I've always considered this method to be a very strange thing, with its definition of empty being unlikely to be useful elsewhere. Why 4? my guess is that 4 characters is too small to be useful, e.g. {{a}} and [[a]] are 5 chars.

If isEmpty is unintentionally used on a category page, which on small wikis are usually 'empty' except for category links, the category page will be considered to be 'empty'. This is why interwiki.py needs to also check isCategory.

My gut feeling is that this method should be copied to be a function in interwiki.py, where you can optimise it how you want, so that it skips useless pages using faster algorithms. e.g. if the page is in a content namespace, rather than looking at interwikilinks and category links, I am guessing that the following will be much faster and be almost as good:

def simpleEmptyCheck(page):
    try:
        # get the 50th character, if it exists
        page.text[50]
        return False
    except IndexError:
        return True

Then Page.isEmpty can be marked as @deprecated .

The use in page_tests would then be moved into a new test method in TestPageDeprecation.

Details

Related Gerrit Patches:

Event Timeline

jayvdb created this task.Sep 11 2015, 10:08 PM
jayvdb raised the priority of this task from to Needs Triage.
jayvdb updated the task description. (Show Details)
jayvdb added a subscriber: jayvdb.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. · View Herald TranscriptSep 11 2015, 10:08 PM

i would like to work on this bug could you assigned me this bug

i would like to work on this bug could you assigned me this bug

Hello @Rajdeep594! You neednt have the task assigned to you to start working on it. If you feel like, you can click on 'Edit Task' and add your name to 'Assinged To' field.

You can find the steps to create and push a patch to gerrit at https://www.mediawiki.org/wiki/Gerrit/Tutorial Good luck!

hi 01tonythomas,
i will be glad to work on this bug.
thank you
Regards,
Rajdeep kaur

Rajdeep594 set Security to None.
XZise added a subscriber: XZise.Oct 1 2015, 2:51 PM

Hello @Rajdeep594! You neednt have the task assigned to you to start working on it. If you feel like, you can click on 'Edit Task' and add your name to 'Assinged To' field.

Alternatively if you comment below you can change the “Action” from “Comment” to “Reassign/Claim” and you can set whoever claims it (by default it's you). And if someone is committed to work on a bug it's always better to claim it of course so that others see that it's being handled. Now I don't know what your intentions were, because of course you can still work on it without claiming it (as you said).

Change 243487 had a related patch set uploaded (by John Vandenberg):
deprecate Page.isEmpty

https://gerrit.wikimedia.org/r/243487

Restricted Application added a subscriber: StudiesWorld. · View Herald TranscriptNov 11 2015, 9:46 PM

@Rajdeep594 , do you intend to complete this task? If not, you can un-assign it, so someone else may complete the task.

Xqt added a subscriber: Xqt.Nov 15 2015, 12:40 PM

The main reason for this isEmpty() method is to prohibit empty pages beeing fetched into the interlanguage links on other sites or removed from it (because the page was vandalized). This method leads interwiki.py just for skipping it. I don't see any advantage to change that behaviour.

Xqt, the proposal is for interwiki.py to have its own 'is empty' function which can be improved to be more intelligent (e.g. adding a 'is vandalised' check) without breaking other tools which depend on Page.isEmpty returning False if it is four chars or less.

Change 243487 abandoned by John Vandenberg:
deprecate Page.isEmpty

Reason:
no response from @Rajdeep, please feel free to resume this work if you are still interested.

https://gerrit.wikimedia.org/r/243487

jayvdb removed Rajdeep594 as the assignee of this task.Nov 17 2015, 11:46 PM
Xqt added a comment.Nov 18 2015, 7:54 AM

@jayvdb, anyway this was an inappropriate implementation for a minor problem if any.

@jayvdb, anyway this was an inappropriate implementation for a minor problem if any.

Yup. Ok

vadi claimed this task.Dec 7 2015, 5:12 PM
XZise added a comment.Dec 7 2015, 5:28 PM

Just as a general note, these 6 lines mentioned in the OP could be easily shortened into one: return len(page.text) > 50

Change 257367 had a related patch set uploaded (by Vadiraja.k):
Deprecate page.isEmpty() and add function in interwiki to check for empty pages

https://gerrit.wikimedia.org/r/257367

Change 257367 merged by jenkins-bot:
Deprecate page.isEmpty() and add function in interwiki to check for empty pages

https://gerrit.wikimedia.org/r/257367

jayvdb closed this task as Resolved.Dec 9 2015, 4:45 AM