
Enhance Pywikibot.Page with is_person method
Closed, Declined | Public | Feature

Description

Feature summary: Pywikibot.Page should have an is_person_wd() method to determine if the article is about a person.

Return True if the page is an instance of a person, i.e. the value of P31 is Q5 (the simplest way).
Return False if:

  • it is not a person
  • namespace is not 0
  • page is not connected to Wikidata
  • project is other than Wikipedia

The _wd suffix signals to users that the method is based on Wikidata.
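The rules above could be sketched roughly as follows. This is only an illustration, not the patch uploaded to Gerrit: the names `decide_is_person` and `is_person_wd` as a free function are hypothetical, and the adapter assumes a Pywikibot page object that offers `site.family.name`, `namespace().id` and `data_item()`. Pywikibot is imported lazily so that the pure decision function can be tried without it.

```python
from typing import Iterable, Optional

INSTANCE_OF = "P31"  # Wikidata property "instance of"
PERSON_QID = "Q5"    # Wikidata item "human"


def decide_is_person(family: str, namespace: int,
                     instance_of_ids: Optional[Iterable[str]]) -> bool:
    """Apply the rules from the task description to plain values.

    instance_of_ids is None when the page is not connected to Wikidata.
    """
    if family != "wikipedia":      # project is other than Wikipedia
        return False
    if namespace != 0:             # not an article
        return False
    if instance_of_ids is None:    # no connected Wikidata item
        return False
    return PERSON_QID in set(instance_of_ids)


def is_person_wd(page) -> bool:
    """Hypothetical adapter for a pywikibot.Page (an assumption, see above)."""
    from pywikibot.exceptions import NoPageError  # lazy import

    family = page.site.family.name
    namespace = page.namespace().id
    ids = None
    if family == "wikipedia" and namespace == 0:
        # Query Wikidata only when the cheap checks have passed.
        try:
            item = page.data_item()
            item.get()
            targets = (c.getTarget() for c in item.claims.get(INSTANCE_OF, []))
            ids = [t.id for t in targets if t is not None]
        except NoPageError:
            ids = None  # page is not connected to Wikidata
    return decide_is_person(family, namespace, ids)
```

Keeping the decision separate from the lookup means the rules can be checked with plain values, for example `decide_is_person("wikipedia", 0, ["Q5"])`.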

Use case(s): Deciding whether an article is about a person is a frequent task, as biographies may have several properties, categories, and maintenance tasks associated with them. It is usually painful, because there are no obvious, easy-to-parse markers in the article text and most Wikipedias don't have a single category for biographies. Biographies form a major field of articles, not just one topic among millions.

Benefits: The method would make the decision easy. Additional advantages: we could create pagegenerators or subclass Page with special methods.

Event Timeline

Xqt triaged this task as Low priority. Feb 4 2023, 4:50 PM

Change 888791 had a related patch set uploaded (by Ayush Anand33; author: Ayush Anand33):

[pywikibot/core@master] Add is_person_wd() method

https://gerrit.wikimedia.org/r/888791

Thank you! Could you please add a docstring so that it appears in doc?

This is too Wikimedia-specific (or even Wikipedia-specific). I'd prefer having this as a utility function or application logic rather than making it part of the model.

project is other than Wikipedia

Why?

This is too Wikimedia-specific (or even Wikipedia-specific). I'd prefer having this as a utility function or application logic rather than making it part of the model.

I think I am not far wrong in supposing that Pywikibot is used on Wikipedia in the vast majority of cases (although I have already used it elsewhere myself). I would not make this an ideological question. The Page methods are the best place for this.

Where, concretely, do you suggest putting this method?

project is other than Wikipedia

Why?

Because we don't know how other projects work. They may have their own repositories with other Ps and Qs. Do you have a better idea?

What I dislike about the proposal is that it introduces domain-specific code into an almost completely generic interface. Indeed, I checked that almost every piece of code in Page/BasePage is site-agnostic and would work on whatever wiki you use it on. This enhancement would not (even though that can be documented). In my opinion, it's a bad practice and a bad precedent.

Do you have a better idea?

Make it at least a bit generic. For example, make it a helper method similar to Page.get_best_claim where P31=Q5 would be the input.
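A minimal sketch of what such a generic helper might look like, with the property and the target item passed in by the caller. The name `item_has_claim` is hypothetical and not part of Pywikibot; the code only assumes an ItemPage-like object exposing `get()`, a `claims` mapping, and claims with `getTarget()`.

```python
def item_has_claim(item, prop: str, target_id: str) -> bool:
    """Return True if any claim for `prop` points at the item `target_id`.

    `item` is assumed to behave like pywikibot.ItemPage: get() loads the
    entity data and `claims` maps property IDs to lists of claims.
    """
    item.get()  # make sure the claims are loaded
    for claim in item.claims.get(prop, []):
        target = claim.getTarget()
        # Targets without an `id` (e.g. strings, or None for unknown
        # values) are simply skipped.
        if getattr(target, "id", None) == target_id:
            return True
    return False
```

A caller interested in biographies would then pass `"P31"` and `"Q5"` itself, e.g. `item_has_claim(page.data_item(), "P31", "Q5")`, and decide how to handle pages without a connected item. Nothing in the helper is tied to Wikipedia or to persons.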

Should we then create a derived WikipediaPage class? I don't really like the idea, but it could later hold other wiki-specific methods. It would be explicitly dedicated to this project, and whoever calls it on the wrong project has only themselves to blame.

As noted by others, I agree that it's too domain-specific to be part of the Page class.

FWIW, I've implemented this (using logic specific to enwiki) as part of dyk-tools: Article.is_biography()

Yes, if we cannot agree on a good solution, I will develop a huwiki-specific module. I just thought this feature would be useful throughout Wikipedia, but I don't insist on it.

OK then, I will leave the framework as it is and implement this in my own scripts.

Change 888791 abandoned by Xqt:

[pywikibot/core@master] Add is_person_wd() method

Reason:

T328769 was declined

https://gerrit.wikimedia.org/r/888791