
Reducing complexity of the Family class
Open, Low, Public

Description

Since the overall trend should be towards using AutoFamily, I want to list here everything that looks unnecessary because it can be fetched via the API.

The following could be replaced by API calls:

  • namespacesWithSubpage: This should already be possible via the Namespace class, as [[https://www.mediawiki.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces|action=query&meta=siteinfo&siprop=namespaces]] returns it as subpages="". Maybe the Namespace class should get properties like has_subpages to make this easier to use.
  • linktrails and linktrail(): At least on newer wikis it is reported via the API ([[https://www.mediawiki.org/w/api.php?action=query&meta=siteinfo&siprop=general|action=query&meta=siteinfo&siprop=general]]), although I'm not sure how much “MediaWiki:Linktrail” changes or overrides it. The main problem is parsing it into a Python regex (see also Gerrit 184216). According to API:Meta it was added in 1.21, so older wikis still need some support. Gerrit 207179 adds dynamic support.
  • known_families and get_known_families(): Could be replaced by using the interwiki map. There is only one usage in the library, which could easily be replaced.
  • nocapitalize: This is namespace specific and already represented in the Namespace class (see Link.parse). Its primary use is to make sure the username is not capitalized when an APISite instance is created. But according to Manual:$wgCapitalLinkOverrides the User namespace is never affected by that (and it is thus always False). Gerrit 190619 is going to deprecate it.
  • interwiki_forward and interwiki_forwarded_from: This can be done via the API, to determine which project a code such as en redirects to (on commons, for example, to the Wikipedia). (T104129)
  • obsolete: This is an odd beast with an ambiguous definition. There is a patch to make it obsolete (Gerrit 187358).
  • languages_by_size: There is a patch, but it only works efficiently for some families. There is also a patch to do it manually, which would work on any family but is relatively slow because it needs to contact every code.
  • scriptpath(): Is in the siteinfo (like the linktrail), but obviously it needs to be defined in order to reach the API in the first place. AutoFamily (with the complete URL) already supplies it.
  • versionnumber() and version(): These are already deprecated, and if the version needs to be configured, force_version() should be used.
  • shared_image_repository(): There is a patch (Gerrit 181416) to make it more dynamic, but unfortunately it doesn't always work, so there is still some dynamic configuration needed. (T74847)
  • shared_data_repository(): There is already a bug report (T85331), and it depends on how multiple repositories are represented in the future.
  • server_time(): Already deprecated in favor of a site method.
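
As a rough illustration of the first two entries, the sketch below derives both values from an already fetched siteinfo response (action=query&meta=siteinfo&siprop=general|namespaces, JSON format). The function names are hypothetical and not part of Pywikibot. It assumes that a namespace with subpages carries a subpages key (an empty string in the legacy JSON format, a boolean with formatversion=2) and that the linktrail is a delimited PHP regex such as /^([a-z]+)(.*)$/sD. Only a few PHP modifiers are mapped, and PCRE syntax that Python's re module does not understand would still need extra handling.

```python
import re

# Subset of PHP regex modifiers with a direct Python equivalent.
# Unknown modifiers (e.g. "D", "u") are ignored in this sketch.
PHP_FLAGS = {
    'i': re.IGNORECASE,
    'm': re.MULTILINE,
    's': re.DOTALL,
    'x': re.VERBOSE,
}


def namespaces_with_subpages(siteinfo):
    """Return the sorted ids of all namespaces that allow subpages."""
    namespaces = siteinfo['query']['namespaces']
    return sorted(
        ns['id'] for ns in namespaces.values()
        # Legacy format: key present with '' when enabled, absent otherwise.
        # formatversion=2: key holds True or False.
        if ns.get('subpages', False) is not False
    )


def linktrail_to_regex(linktrail):
    """Compile a delimited PHP regex (as reported by siteinfo) for Python."""
    delimiter = linktrail[0]
    end = linktrail.rindex(delimiter)
    if end == 0:
        raise ValueError('no closing delimiter in %r' % linktrail)
    flags = 0
    for modifier in linktrail[end + 1:]:
        flags |= PHP_FLAGS.get(modifier, 0)
    return re.compile(linktrail[1:end], flags)
```

A Namespace.has_subpages property could then be little more than a lookup of that subpages key, while the linktrail conversion would still need a fallback for wikis older than 1.21.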

There are also some configuration variables. These should be moved into config2.py with a “global default” and the possibility to override it for each family with a specific setting. One problem could arise when they need to be dynamic and thus executable code.

  • protocol(): AutoFamily defines it automatically. Maybe there should be a simpler approach which just reads a use_https boolean attribute, so that whenever someone needs a normal Family class they can set use_https = True. Alternatively, generate_family_file.py should always add it (with the protocol correctly derived from the URL) so the user easily sees what needs to be done.
  • ignore_certificate_error(): Should be handled similarly when a normal Family class is used (a boolean attribute, which generate_family_file adds with the correct value).
  • interwiki_attop
  • interwiki_on_one_line
  • interwiki_text_separator
  • category_attop
  • category_on_one_line
  • category_text_separator
  • categories_last
  • interwiki_putfirst
  • interwiki_putfirst_doubled
  • ssl_pathprefix(): Although it depends on how the siteinfo then changes, it could be retrieved from there (same problem as scriptpath()).
  • nicepath()
  • rcstream_host()
  • _get_path_regex(self): That needs to change, especially if a site is accessible via multiple hostnames, or it should never be defined at all.
  • maximum_GET_length()
  • force_version()
  • code2encoding() and encoding(): It depends on which encoding is meant. The communication with the server on the HTTP level? If so, shouldn't the server answer accordingly when there is no valid encoding? The client could then use that encoding. If it is really required (and not UTF-8), we could still implement it via a configuration variable.
  • post_get_convert() and pre_put_convert(): These should probably be rewritten into a list of converters, some of which could then be enabled via the configuration.
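
A minimal sketch of the “global default with per-family override” idea, under stated assumptions: all names below (GLOBAL_DEFAULTS, FAMILY_OVERRIDES, family_setting, run_converters, the strip_bom converter and the family name examplefamily) are hypothetical and do not exist in config2.py. It also shows how protocol() could be reduced to reading a use_https boolean, and how the convert hooks could become a configured list of named converters.

```python
# Hypothetical global defaults, as they might appear in a config module.
GLOBAL_DEFAULTS = {
    'use_https': False,
    'interwiki_attop': False,
    'converters': (),  # names of text converters to apply, in order
}

# Hypothetical per-family overrides; only deviating settings are listed.
FAMILY_OVERRIDES = {
    'examplefamily': {
        'use_https': True,
        'converters': ('strip_bom',),
    },
}

# Registry of available converters, addressed by name from the configuration.
CONVERTERS = {
    'strip_bom': lambda text: text.lstrip('\ufeff'),
}


def family_setting(family_name, key):
    """Return the family-specific value for key, else the global default."""
    overrides = FAMILY_OVERRIDES.get(family_name, {})
    if key in overrides:
        return overrides[key]
    return GLOBAL_DEFAULTS[key]


def protocol(family_name):
    """Derive the protocol from the use_https boolean setting."""
    return 'https' if family_setting(family_name, 'use_https') else 'http'


def run_converters(family_name, text):
    """Apply every converter enabled for the family to the text."""
    for name in family_setting(family_name, 'converters'):
        text = CONVERTERS[name](text)
    return text
```

Settings that must be computed at runtime do not fit this plain-data layout, which is the “dynamic and executable code” problem mentioned above.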

Some of the methods are static and don't need to be changed or overridden, and thus don't need to be removed:

  • language_groups: Although this could probably be statically defined and doesn't change with other families
  • hostname() and ssl_hostname(): Those are set correctly in AutoFamily, and the question is whether they need to be overridden in normal Family instances.
  • path(), querypath(), apipath(), nice_get_address(): Those probably never change and are always relative to scriptpath()/nicepath()
  • from_url()

I'm not sure about these however:

  • category_redirect_templates, category_redirects(), get_cr_templates()
  • use_hard_category_redirects: Is not used anywhere in the project, except to set a Site attribute.
  • disambiguationTemplates, disambig(): disambiguationTemplates is only used for disambig(), which is only used in pywikibot.page.
  • cross_projects
  • cross_projects_cookies
  • cross_projects_cookie_username
  • cross_allowed
  • disambcatname
  • ldapDomain
  • crossnamespace
  • iwkeys(): Is not used anywhere in the project. It basically lists all codes plus the codes from interwiki_forward. It could probably be replaced by a better interwiki map implementation which allows getting the complete mapping (instead of the current way, which only returns one definition). (T104129)
  • _addlang(): Is not used anywhere in the project.
  • dbName()
  • code2encodings() and encodings(): Those two are somewhat strange, because they return by default the same value as the singular variants (not even wrapping them in a list). But even then why does it need to define multiple encodings?
  • isPublic(): Is not used anywhere in the project but might be helpful, because if it is False, the API is probably not accessible without logging in. But that could pose some problems, as part of the code assumes it is always possible to determine the version, for example (though force_version() could be used there?).
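
For the iwkeys() entry, the “complete mapping” could be sketched roughly as follows. This assumes an already fetched response of action=query&meta=siteinfo&siprop=interwikimap whose entries each carry a prefix and a url; both function names are hypothetical and not the Pywikibot implementation. Grouping the prefixes by hostname makes it visible when several codes forward to the same site.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def interwiki_targets(siteinfo):
    """Map every interwiki prefix to the URL pattern it points to."""
    return {
        entry['prefix']: entry['url']
        for entry in siteinfo['query']['interwikimap']
    }


def prefixes_by_host(siteinfo):
    """Group all interwiki prefixes by the hostname of their target."""
    grouped = defaultdict(list)
    for prefix, url in interwiki_targets(siteinfo).items():
        grouped[urlsplit(url).hostname].append(prefix)
    return {host: sorted(prefixes) for host, prefixes in grouped.items()}
```

With such a mapping, known_families and interwiki_forward would not need to be maintained by hand, provided the wiki's interwiki table is itself up to date.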

Event Timeline

XZise created this task. Feb 13 2015, 1:14 PM
XZise raised the priority of this task from to Needs Triage.
XZise updated the task description.
XZise added a project: Pywikibot.
XZise added a subscriber: XZise.
Restricted Application added subscribers: Aklapper, Unknown Object (MLST). Feb 13 2015, 1:14 PM
jayvdb added a subscriber: jayvdb. Feb 15 2015, 4:40 AM
XZise updated the task description. Feb 16 2015, 1:24 PM
XZise set Security to None.
XZise added a comment. Feb 16 2015, 1:26 PM

I'm not 100 % sure, but it seems like dbName, use_hard_category_redirects and disambiguationTemplates are effectively configurable variables. So those might be moved to the section of entries which become configuration variables.

Omegat added a subscriber: Omegat. Mar 1 2015, 3:51 PM

Change 201446 had a related patch set uploaded (by XZise):
[FEAT] Load the settings from wiki

https://gerrit.wikimedia.org/r/201446

Ricordisamoa added a subscriber: Ricordisamoa.
XZise updated the task description. Apr 29 2015, 8:30 AM

Change 219610 had a related patch set uploaded (by John Vandenberg):
Family attribute namespacesWithSubpage is unused

https://gerrit.wikimedia.org/r/219610

Change 219610 merged by jenkins-bot:
Family attribute namespacesWithSubpage is unused

https://gerrit.wikimedia.org/r/219610

Change 219822 had a related patch set uploaded (by John Vandenberg):
Remove unused Family._addlang

https://gerrit.wikimedia.org/r/219822

Change 219822 merged by jenkins-bot:
Remove unused Family._addlang

https://gerrit.wikimedia.org/r/219822

Change 221439 had a related patch set uploaded (by XZise):
[FEAT] APISite.article_path to replace nicepath

https://gerrit.wikimedia.org/r/221439

Change 221439 merged by jenkins-bot:
[FEAT] Replace nicepath by APISite.article_path

https://gerrit.wikimedia.org/r/221439

jayvdb updated the task description. Sep 4 2015, 8:37 AM
jayvdb updated the task description.
Xqt triaged this task as Low priority. Jun 28 2017, 8:22 AM

Change 516645 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [cleanup] Remove interwiki_replacement_overrides

https://gerrit.wikimedia.org/r/516645

Change 516645 merged by jenkins-bot:
[pywikibot/core@master] [cleanup] Remove interwiki_replacement_overrides

https://gerrit.wikimedia.org/r/516645