Page MenuHomePhabricator

Add superset query service as pywikibot generator
Closed, ResolvedPublicFeature

Description

Superset is SQL query interface to Wikimedia database replica. It has API which allows to run SQL queries over https and it would be nice to have local SQL query access from pywikibot without need of wikimedia cloud accounts. With superset user needs only give OAUTH permission.

Howto use

  1. add mediawiki test user for meta.wikimedia.org to user-config.py
  2. login to meta.wikimedia.org
  3. login to superset https://superset.wmcloud.org/login and give OAUTH permissions
  4. python -m unittest -v tests.superset_tests

mybot.py

import pywikibot
from pywikibot import pagegenerators
from pywikibot.bot import ExistingPageBot

class MyBot(ExistingPageBot):
    def treat_page(self):
        """Load the given page, do some changes, and save it."""
        print(self)

def main():
    """Parse command line arguments and invoke bot."""
    options = {}
    gen_factory = pagegenerators.GeneratorFactory()

    # Option parsing
    local_args = pywikibot.handle_args()  # global options
    local_args = gen_factory.handle_args(local_args)  # generators options

    MyBot(generator=gen_factory.getCombinedGenerator(), **options).run()

if __name__ == '__main__':
    main()

python pwb.py mybot.py -family:commons -lang:commons -supersetquery:"SELECT page_namespace, page_title FROM page LIMIT 10"

Example SQL queries

  • mandatory return columns for creating are: page_id OR page_namespace and page_title
  • if page_wikidb is defined then it is used as site. Example values are enwiki or enwiki_p or commonswiki or commonswiki_p.
SELECT gil_wiki AS page_wikidb, gil_page AS page_id FROM globalimagelinks GROUP BY gil_wiki LIMIT 10
SELECT page_id FROM page LIMIT 10
SELECT page_namespace, page_title FROM page LIMIT 10

Future work

  • T367393 - In future Superset should be able to query ToolsDB public databases also and this needs to be added.

Details

Related Changes in Gerrit:

Event Timeline

Xqt triaged this task as Low priority.EditedJun 17 2024, 7:51 AM
Xqt changed the subtype of this task from "Task" to "Feature Request".
Xqt subscribed.

Looks like it could be implemented similar to |sparql or mysql

Change #1046603 had a related patch set uploaded (by Zache-tool; author: Zache-tool):

[pywikibot/core@master] Add superset pagegenerator

https://gerrit.wikimedia.org/r/1046603

Zache updated the task description. (Show Details)
Zache updated the task description. (Show Details)

@Xqt Yes, I looked those as an example when I started.

Zache updated the task description. (Show Details)
Zache updated the task description. (Show Details)

Change #1046603 merged by jenkins-bot:

[pywikibot/core@master] Add superset pagegenerator

https://gerrit.wikimedia.org/r/1046603