
Pull out all Wikibase-related parts of pywikibot to pywikibot/wikibase and use it as a submodule
Closed, Declined (Public)

Description

It seems that large parts of pywikibot are devoted to handling the Wikibase data model, which should not be the case. More importantly, that code is not usable outside of pywikibot. I have already created pywikibot/wikibase, and it needs some work.

Event Timeline

Ladsgroup claimed this task.
Ladsgroup raised the priority of this task to High.
Ladsgroup updated the task description.
Ladsgroup subscribed.

Why is this a high priority? pywikibot/wikibase is effectively the same as pywikibot, so how does that help?
IMO T102741 is a dependency for this task - that needs to be done first.

@jayvdb, it's a high priority because it's a direct dependency of the work we want to do next.

It also seems that this might be considered a duplicate of T102741, but it is not a dependency. If T102741 is not addressed in the short term, we'll still need to make progress on this since we have a deadline to meet.

@Halfak, what work do you want to do next? What do you need? What is your deadline? I am struggling to understand what "pywikibot/wikibase" solves as opposed to "pywikibot (with wikibase functionality)". I suspect that you are wanting something different from either of those.

T102741 is waiting on a Phabricator project being created, so perhaps you could nudge whoever can help with that.
A well-designed and stable data model layer should be a prerequisite for building a new client access layer.
And a new client access layer would look nothing like pywikibot's wikibase code, which was mostly designed to fit within pywikibot's existing wikitext-centric model; most of its design decisions only make sense within the pywikibot framework.

@jayvdb, we need to build feature extractors for Wikidata in order to predict vandalism. @Ladsgroup's work is poised to make that feature extraction easier. We're not using pywikibot for a lot of reasons, many of which have to do with its bloat and the poor fit between its API and the needs of revscoring's dependency injection framework.

FWIW, we have already extracted the data model for Wikidata and have been iterating on its structure. See https://github.com/wikimedia/pywikibot-wikibase
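As a rough illustration of what a feature extractor over an extracted data model might look like, here is a minimal Python sketch. It assumes the entity is available as a dict in the Wikibase JSON layout (`labels`, `descriptions`, `claims`, `sitelinks` keys). The functions `extract_features` and `diff_features` and the chosen features are hypothetical; they are not the revscoring or pywikibase API, only an example of describing an edit as the change between two revisions.

```python
import json
from typing import Any, Dict


def extract_features(entity: Dict[str, Any]) -> Dict[str, int]:
    """Count a few simple properties of one Wikibase entity revision (hypothetical features)."""
    claims = entity.get("claims", {})
    return {
        "labels": len(entity.get("labels", {})),
        "descriptions": len(entity.get("descriptions", {})),
        "properties": len(claims),
        # Each property ID maps to a list of statements.
        "statements": sum(len(statements) for statements in claims.values()),
        "sitelinks": len(entity.get("sitelinks", {})),
    }


def diff_features(parent: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, int]:
    """Describe an edit as the change in each feature between two revisions."""
    before = extract_features(parent)
    after = extract_features(current)
    return {name: after[name] - before[name] for name in after}


# A hand-written entity in the Wikibase JSON layout, trimmed for brevity.
doc = json.loads("""
{
  "id": "Q42",
  "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
  "descriptions": {},
  "claims": {"P31": [{"mainsnak": {"property": "P31"}}]},
  "sitelinks": {"enwiki": {"site": "enwiki", "title": "Douglas Adams"}}
}
""")

print(extract_features(doc))
# {'labels': 1, 'descriptions': 0, 'properties': 1, 'statements': 1, 'sitelinks': 1}
print(diff_features({"id": "Q42"}, doc)["statements"])  # 1
```

Because the extractor only reads plain data, it has no dependency on a bot framework or a live site connection, which is the kind of decoupling the comment above is asking for.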

Even when T102741 is resolved (which I don't expect to happen in 2015), we'll need a compatibility layer with the 'real' data model.
It makes sense to extract it ASAP.

The part for revscoring is done, but I still want to remove the corresponding code from pywikibot-core.

Is this actually resolved? As @Ladsgroup mentioned, the parts are still in pywikibot.

The part for wb-vandalism is done, but we still have the old code in pywikibot, which means this bug is not resolved yet. As far as the revscoring team is concerned, though, it's done. We have the option of removing the revscoring project, but I'll leave that to @Halfak :)

Ladsgroup removed a project: User-Ladsgroup.
Xqt subscribed.

I am strongly against splitting major parts of the framework out of the current project. On the bot owner side, it is more difficult to assemble a working bot; on the developer side, it is more difficult to review the code when it is distributed across several parts. Moreover, current and future development of Wikibase support happens inside the pywikibot/core framework, not inside pywikibot/wikibase.

On bot owner side it is more difficult to combine a working bot

Why is that? pywikibot already has dependencies (e.g. mwoauth), so why not add another one? Dependencies can be pinned to specific versions, so you can guarantee that the code doesn't change until the pinned version does.

developers side it is more difficult to check the code when it is distributed on several parts

Again, I point to other dependencies. I don't think it scales to have the same small set of developers reviewing all changes. Would you like to help me review pull requests to mwoauth? Maybe not, because you don't think about the internal workings of MediaWiki's OAuth implementation (or maybe you do; I don't know). But I'll focus on making sure that we practice good (if not great!) software management in mwoauth so that it remains a happy dependency. Why not do the same for pywikibase?

One more note. As it stands, we have pywikibase as a separate library that a number of things depend on (e.g. revscoring and ORES). We'll never depend on pywikibot because it's an awkward monolith and we only really want/need the functionality of pywikibase. If pywikibot maintains a parallel implementation of pywikibase-like code, that duplicate functionality will have to be maintained separately. In the best case, bugs will be fixed in both libraries and features will be implemented twice. In the more likely case, the two libraries will drift out of sync, work will simply be duplicated, and end-user programmers will be confused. I don't see this as a good outcome at all.

One thing worth noting is that pywikibot should not need to care about how Wikidata serialization works; it should only care about the data model.
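To make the distinction concrete, here is a minimal Python sketch, assuming a simplified item that only has an ID and labels. The `Item` class and the `item_from_json`/`item_to_json` functions are hypothetical and are not part of pywikibot or pywikibase. Code that works with `Item` never touches the JSON layout, so a change in the serialization format would only affect the two mapping functions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Item:
    """Data model: plain Python values with no knowledge of the JSON layout."""
    id: str
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label text


def item_from_json(data: Dict[str, Any]) -> Item:
    """Serialization layer: map the Wikibase JSON label layout onto the model."""
    return Item(
        id=data["id"],
        labels={lang: entry["value"] for lang, entry in data.get("labels", {}).items()},
    )


def item_to_json(item: Item) -> Dict[str, Any]:
    """Inverse mapping from the model back to the JSON layout."""
    return {
        "id": item.id,
        "type": "item",
        "labels": {
            lang: {"language": lang, "value": value}
            for lang, value in item.labels.items()
        },
    }


raw = {"id": "Q42", "type": "item", "labels": {"en": {"language": "en", "value": "Douglas Adams"}}}
item = item_from_json(raw)
print(item.labels["en"])  # Douglas Adams
assert item_to_json(item) == raw  # the round trip preserves the simplified entity
```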

Xqt lowered the priority of this task from High to Lowest. (Dec 18 2018, 9:41 AM)

As far as I can see, splitting pywikibot into parts has never worked in the past. Here are a few examples where it failed:

  • spelling has been unsupported for 10 years and is now archived
  • wiktionary has been unsupported for 8 years and is now archived
  • commonsdelinker has been unsupported for 5 years
  • opencv has been unsupported for 5 years and is now archived
  • pycolorname has been unsupported for 5 years and is now archived
  • wikibase has been unsupported for 1 year and was never merged with the pywikibot code
  • misc was never used and was deleted months ago
  • backporting patches from the master branch to pywikibot 2.0 lagged for months and was dropped
  • i18n is the only external library that is still supported, thanks to automatic updates from translatewiki.net (twn)
  • mwapi is supported and used quite widely
  • mwxml is widely used, supported and vastly more performant than pywikibase's XML dump processing utility
  • mwbase is currently used by ORES and revscoring so long term support can be expected, I'd recommend its adoption within pywikibot.
  • mwparserfromhell has always been separate and is well supported

Oh! I almost forgot about mwoauth which is already used internally by pywikibot

@matej_suchanek Do you think Pywikibot-wikibase will be revived? Or should we close the repo for good?

I don't know; this is a question for the maintainers. The "master" for Wikidata is still inside Pywikibot. This is more like a fork that, if it ever was the leader, is now far behind.

I was asking about the future: whether we should keep it for the future Wikidata implementation you are planning, or just close it.

BTW, Pywikibot could also use WikidataIntegrator (T222608).

The Pywikibot-wikibase repository has already been closed.