`APISite.is_data_repository` does not work if `self` is an `APISite`
Open, Needs Triage · Public

Description

This method cannot be used to determine whether a site is a data repository unless the site is already a DataSite instance, and in that case isinstance() would work just as well.

If possible, this should be handled together with T85331, because the current system does not allow a Wikibase repo to use another Wikibase repo (although I'm not sure whether that is even possible).

Event Timeline

XZise created this task. Dec 29 2014, 10:42 PM
XZise raised the priority of this task from to Needs Triage.
XZise updated the task description.
XZise added subscribers: Aklapper, Unknown Object (MLST), XZise and 2 others.

I believe Wikimedia will soon have repos that talk to other repos. If I understand correctly, the Commons Metadata project will make use of Wikidata... somehow.

matej_suchanek added a subscriber: matej_suchanek.

I think the method now works as expected.

>>> import pywikibot
>>> site = pywikibot.Site('en', 'wikipedia')
>>> site.is_data_repository()
False
>>> repo = site.data_repository()
>>> repo
DataSite("wikidata", "wikidata")
>>> repo.is_data_repository()
True
>>> pywikibot.Site('wikidata', 'wikidata')
DataSite("wikidata", "wikidata")
>>> pywikibot.Site('en', 'wiktionary').is_data_repository()
False

The only improvement we could make is to override this method in DataSite:

def is_data_repository(self):
    """Return True, since a DataSite is always a data repository."""
    return True

since Site.data_repository() always returns a DataSite.

Note that this task is referenced from two places in code, so some cleanup would be useful.

Restricted Application added a subscriber: TerraCodes. Jan 31 2017, 8:56 PM

>>> site = pywikibot.Site('wikidata', 'wikidata', interface='APISite')
>>> site
APISite("wikidata", "wikidata")
>>> site.is_data_repository()
False
>>> site.data_repository()
DataSite("wikidata", "wikidata")

Though I'm not sure why someone would do interface='APISite' in the first place.

I wonder whether this method is useful for anything but confusion. If a site is supposed to be a repo, it is a DataSite, and you can test that with isinstance(). The identity test in the current APISite.is_data_repository() method is either redundant (for repos themselves) or does not work (for the case mentioned above).
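
To make the two cases concrete, here is a minimal sketch that uses stand-in classes, not the real pywikibot ones. The body of is_data_repository() is an assumption based on the identity test described above, and the module-level _REPO object is a hypothetical replacement for the real repository lookup.

```python
# Stand-in classes for illustration only; these are NOT pywikibot's classes.


class APISite:
    def __init__(self, code, family):
        self.code = code
        self.family = family

    def data_repository(self):
        # Assumption: the lookup always hands back the shared DataSite object.
        return _REPO

    def is_data_repository(self):
        # Assumed shape of the identity test discussed in this task.
        return self is self.data_repository()


class DataSite(APISite):
    pass


# Hypothetical shared repository object (stands in for the real lookup).
_REPO = DataSite('wikidata', 'wikidata')

client = APISite('en', 'wikipedia')      # an ordinary client wiki
plain = APISite('wikidata', 'wikidata')  # like interface='APISite'
repo = client.data_repository()          # always a DataSite

# Repos themselves: the identity test only repeats what isinstance() says.
print(repo.is_data_repository(), isinstance(repo, DataSite))

# The case mentioned above: same wiki, but a different object, so the
# identity test reports that it is not a repository.
print(plain.is_data_repository(), plain.data_repository() is repo)
```

In this sketch the plain APISite for Wikidata is not recognised as a repository by either check, which is the situation shown in the session above.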

IMO the best solution would be:

  1. deprecate and remove APISite.is_data_repository()
  2. always query for the repository
  3. override DataSite.data_repository() with return self (for better performance)
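
The three steps could look roughly like the following. This is only a sketch with stand-in classes, not a patch against pywikibot: the warning text, the class-level cache and the helper name repo_for() are all assumptions made for illustration.

```python
import warnings

# Stand-in classes for illustration only; these are NOT pywikibot's classes.


class APISite:
    _shared_repo = None  # hypothetical cache replacing the real lookup

    def __init__(self, code, family):
        self.code = code
        self.family = family

    def data_repository(self):
        # Stand-in for the real (potentially expensive) repository lookup.
        if APISite._shared_repo is None:
            APISite._shared_repo = DataSite('wikidata', 'wikidata')
        return APISite._shared_repo

    def is_data_repository(self):
        # Step 1: deprecate first, remove in a later release.
        warnings.warn(
            'is_data_repository() is deprecated; call data_repository() '
            'or use isinstance(site, DataSite) instead',
            DeprecationWarning, stacklevel=2)
        return self is self.data_repository()


class DataSite(APISite):
    def data_repository(self):
        # Step 3: a repository is its own repository, so skip the lookup.
        return self


def repo_for(site):
    # Step 2: callers always query for the repository instead of asking
    # is_data_repository() first. (Hypothetical helper name.)
    return site.data_repository()


client = APISite('en', 'wikipedia')
repo = DataSite('wikidata', 'wikidata')
```

With this shape, repo_for(repo) returns repo itself without any lookup, while repo_for(client) still returns a DataSite.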

> I wonder whether this method is useful for anything but confusion. If a site is supposed to be a repo, it is a DataSite, and you can test that with isinstance().
>
> • deprecate and remove APISite.is_data_repository()

+1. Agreed, makes sense. I don't write bots for Wikidata, though, so I can't speak for Wikidata bot coders.

> Though I'm not sure why someone would do interface='APISite' in the first place.

In fact, you may need to treat the site as a client wiki. We shouldn't prevent this.

> In fact, you may need to treat the site as a client wiki. We shouldn't prevent this.

DataSite inherits from APISite. Anything valid in APISite (as a client wiki) should be equally valid in DataSite (as a repository wiki), but not necessarily the other way around. If you need to treat Wikidata as a client wiki, using only methods provided by APISite, whether you initialize it as APISite or DataSite should make no difference.

> In fact, you may need to treat the site as a client wiki. We shouldn't prevent this.

> whether you initialize it as APISite or DataSite should make no difference.

It does; see, for example, WikidataSPARQLPageGenerator. This is a case where you may want the generator to return pages in the project namespace that are connected to items, for instance.