Evaluate improving/replacing mkdocs-mdpo-plugin to better support TWN integration
Closed, Resolved · Public · Spike

Description

The mkdocs-mdpo-plugin outputs one .po file per source file per language. It does not output any .po files for the source language (English).

TWN (translatewiki.net) wants a single English-language .po file to use as input to its translation memory, and will output one .po file per target language that meets the configured export threshold.

The actual work of extracting translation units from Markdown input files into .po files, and of applying translations back, is handled by the mdpo library (https://github.com/mondeja/mdpo). mkdocs-mdpo-plugin handles how that library is used within the MkDocs build flow.
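To make "translation unit" concrete, here is a deliberately simplified sketch of a Markdown → units → Markdown round trip. It is not mdpo's algorithm: mdpo works from a real Markdown parse and handles inline markup, lists, tables and more, whereas this toy treats each blank-line-separated block as one unit and never translates fenced code. All function names are hypothetical.

```python
"""Toy Markdown <-> translation-unit round trip (a simplification, not mdpo)."""
from typing import Dict, List

FENCE = "`" * 3  # a code fence marker


def split_blocks(markdown: str) -> List[str]:
    """Split on blank lines, keeping fenced code blocks (even with blank lines) whole."""
    blocks: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def is_code(block: str) -> bool:
    return block.lstrip().startswith(FENCE)


def extract_units(markdown: str) -> List[str]:
    """Return the translatable blocks (the msgids) in document order."""
    return [block for block in split_blocks(markdown) if not is_code(block)]


def apply_translations(markdown: str, catalog: Dict[str, str]) -> str:
    """Rebuild the page, using the source text where no translation exists."""
    out = []
    for block in split_blocks(markdown):
        if is_code(block):
            out.append(block)
        else:
            # An empty msgstr means "untranslated" in PO files.
            out.append(catalog.get(block) or block)
    return "\n\n".join(out) + "\n"
```

Note that the rebuild normalizes runs of blank lines to a single blank line; a real tool has to preserve far more of the original layout than this.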

Investigate the complexity of creating a fork or replacement of mkdocs-mdpo-plugin to better align with an established TWN workflow:

  • preparation of the translation dictionary as a single English-language .po file
  • consumption of the localized .po files found in the git clone, as a single file per target language

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". Nov 17 2021, 12:05 AM
bd808 changed the task status from Open to In Progress. Nov 17 2021, 12:06 AM
bd808 triaged this task as High priority.
bd808 moved this task from Inbox to Implementation on the Wikimedia-Developer-Portal board.

I don't have any code to show folks for this, but at this point I have read through most of the code for both mkdocs-mdpo-plugin and mdpo. The good news is that I think it is possible to use mdpo to create a workflow that matches the current expectations of TWN for a GNU Portable Object (gettext .po) based project. The less good news is that it will not be a simple update of mkdocs-mdpo-plugin. I haven't asked upstream whether they would be interested in the level of change this will need.

I would currently recommend writing a local plugin to replace mkdocs-mdpo-plugin which does not attempt to abstract operations for reuse in generating another site. Instead, I would recommend focusing on the exact operations needed to build this single project, using mkdocs-mdpo-plugin as a reference for some of the operations that this purpose-built plugin will need to perform. I would not consider this to be forking mkdocs-mdpo-plugin.

> I would currently recommend writing a local plugin to replace mkdocs-mdpo-plugin which does not attempt to abstract operations for reuse in generating another site.

This sounds reasonable to me. When you say "local", does that mean it lives inside the same codebase as the rest of the site, or does it have to be deployed separately?

> This sounds reasonable to me. When you say "local", does that mean it lives inside the same codebase as the rest of the site, or does it have to be deployed separately?

I'm currently trying to do it in the same codebase as the content, but that may not work out well in the long term. The main difficulty is getting the Docker containers to play nicely with it across host platforms. This is related to the issue described at T295823: Failures updating python packages in dev environment following Linux Docker improvements, which could be fixed by T296046: Allow build time control of effective UID/GID for runtime in Blubber generated Dockerfile or a similar solution. For now I'm getting things done in my local deployment thanks to the Docker Desktop for macOS workaround of using the build-time UID/GID identity as my local runtime identity as well. This is needed because of the way Poetry installs the local codebase into its managed virtual environment, which is used at runtime. If the custom plugin were installed as an external library, changing its implementation would be more tedious, but it would be easier to manage across platforms with Poetry + Docker.

Dump of the event order when building the current POC site with a placeholder plugin that logs all on_* events:

DEBUG    -  Loading configuration file: /srv/app/mkdocs.yml
DEBUG    -  Loaded theme configuration for 'material' from
            '/opt/lib/poetry/wikimedia-developer-portal-2uZo5AhP-py3.7/lib/python3.7/site-packages/material/mkdocs_theme.yml':
            {'language': 'en', 'direction': None, 'features': [], 'palette':
            {'primary': None, 'accent': None}, 'font': {'text': 'Roboto',
            'code': 'Roboto Mono'}, 'icon': None, 'favicon':
            'assets/images/favicon.png', 'include_search_page': False,
            'search_index_only': True, 'static_templates': ['404.html']}
DEBUG    -  Config value: 'config_file_path' = '/srv/app/mkdocs.yml'
DEBUG    -  Config value: 'site_name' = 'Wikimedia Developer Portal'
DEBUG    -  Config value: 'nav' = [{'Get started': 'index.md'}, {'Wikimedia
            APIs': 'api/index.md'}]
DEBUG    -  Config value: 'pages' = None
DEBUG    -  Config value: 'site_url' = 'https://developer.wikimedia.org/'
DEBUG    -  Config value: 'site_description' = 'Portal for discovering technical
            documentation about Wikimedia projects'
DEBUG    -  Config value: 'site_author' = 'Wikimedia Foundation and
            contributors'
DEBUG    -  Config value: 'theme' = Theme(name='material',
            dirs=['/opt/lib/poetry/wikimedia-developer-portal-2uZo5AhP-py3.7/lib/python3.7/site-packages/material',
            '/opt/lib/poetry/wikimedia-developer-portal-2uZo5AhP-py3.7/lib/python3.7/site-packages/mkdocs/templates'],
            static_templates=['404.html', 'sitemap.xml'],
            locale=Locale(language='en', territory=''), language='en',
            direction=None, features=['navigation.instant',
            'navigation.tracking', 'navigation.tabs', 'navigation.sections',
            'toc.integrate', 'navigation.top', 'search.share'],
            palette=[{'media': '(prefers-color-scheme: light)', 'scheme':
            'default', 'toggle': {'icon': 'material/lightbulb-outline', 'name':
            'Switch to dark mode'}}, {'media': '(prefers-color-scheme: dark)',
            'scheme': 'slate', 'toggle': {'icon': 'material/lightbulb', 'name':
            'Switch to light mode'}}], font=False, icon=None,
            favicon='assets/images/favicon.png', include_search_page=False,
            search_index_only=True)
DEBUG    -  Config value: 'docs_dir' = '/srv/app/src'
DEBUG    -  Config value: 'site_dir' = '/srv/app/site'
DEBUG    -  Config value: 'copyright' = 'Copyright © 2021 Wikimedia
            Foundation and contributors'
DEBUG    -  Config value: 'google_analytics' = None
DEBUG    -  Config value: 'dev_addr' = Address(host='127.0.0.1', port=8000)
DEBUG    -  Config value: 'use_directory_urls' = True
DEBUG    -  Config value: 'repo_url' = ''
DEBUG    -  Config value: 'repo_name' = ''
DEBUG    -  Config value: 'edit_uri' = ''
DEBUG    -  Config value: 'extra_css' = ['assets/stylesheets/theme.css']
DEBUG    -  Config value: 'extra_javascript' = []
DEBUG    -  Config value: 'extra_templates' = []
DEBUG    -  Config value: 'markdown_extensions' = ['toc', 'tables',
            'fenced_code', 'meta', 'attr_list', 'pymdownx.highlight',
            'pymdownx.superfences', 'pymdownx.inlinehilite']
DEBUG    -  Config value: 'mdx_configs' = {'toc': {'permalink': True,
            'toc_depth': 3}}
DEBUG    -  Config value: 'strict' = False
DEBUG    -  Config value: 'remote_branch' = 'gh-pages'
DEBUG    -  Config value: 'remote_name' = 'origin'
DEBUG    -  Config value: 'extra' = {'alternate': [{'name': 'English', 'lang':
            'en'}, {'name': 'Español', 'link': '/es/', 'lang': 'es'}]}
DEBUG    -  Config value: 'plugins' = PluginCollection([('search',
            <mkdocs.contrib.search.SearchPlugin object at 0x7ff516ae4a20>),
            ('macros', <mkdocs_macros.plugin.MacrosPlugin object at
            0x7ff516a2ddd8>), ('wikimedia', <plugin.plugin.WikimediaPlugin
            object at 0x7ff5169bc6a0>), ('mdpo',
            <mkdocs_mdpo_plugin.plugin.MdpoPlugin object at 0x7ff516953710>)])
INFO     -  [macros] - Macros arguments: {'module_name': 'macros',
            'modules': [], 'include_dir': 'data/includes', 'include_yaml': [],
            'j2_block_start_string': '', 'j2_block_end_string': '',
            'j2_variable_start_string': '', 'j2_variable_end_string': '',
            'verbose': True}
DEBUG    -  [macros] - Project dir '/srv/app'
INFO     -  [macros] - Found local Python module 'macros' in: /srv/app
INFO     -  [macros] - Found external Python module 'macros' in:
            /srv/app
DEBUG    -  [macros] - Variables: ['extra', 'config', 'environment',
            'plugin', 'git', 'alternate', 'categories']
INFO     -  [macros] - Extra variables (config file): ['alternate']
DEBUG    -  [macros] - Content of extra variables (config file):
            {'alternate': [{'name': 'English', 'lang': 'en'}, {'name':
            'Español', 'link': '/es/', 'lang': 'es'}]}
INFO     -  [macros] - Extra filters (module): ['pretty']
DEBUG    -  [macros] - Docs directory: /srv/app/src
INFO     -  [macros] - Includes directory: data/includes
DEBUG    -  [event] on_config
DEBUG    -  [event] on_pre_build
INFO     -  Cleaning site directory
INFO     -  Building documentation to directory: /srv/app/site
DEBUG    -  [event] on_files
INFO     -  The following pages exist in the docs directory, but are not
            included in the "nav" configuration:
              - api/reading.md
DEBUG    -  [event] on_nav
DEBUG    -  Reading markdown pages.
DEBUG    -  Reading: index.md
DEBUG    -  [event] on_pre_page[index.md]
DEBUG    -  [event] on_page_read_source[index.md]
DEBUG    -  [event] on_page_markdown[index.md]
DEBUG    -  [event] on_pre_page[es/index.md]
DEBUG    -  [event] on_page_read_source[es/index.md]
DEBUG    -  [event] on_page_markdown[es/index.md]
DEBUG    -  [event] on_page_content[es/index.md]
DEBUG    -  [event] on_page_content[index.md]
DEBUG    -  Reading: api/index.md
DEBUG    -  [event] on_pre_page[api/index.md]
DEBUG    -  [event] on_page_read_source[api/index.md]
DEBUG    -  [event] on_page_markdown[api/index.md]
DEBUG    -  [event] on_pre_page[es/api/index.md]
DEBUG    -  [event] on_page_read_source[es/api/index.md]
DEBUG    -  [event] on_page_markdown[es/api/index.md]
DEBUG    -  [event] on_page_content[es/api/index.md]
DEBUG    -  [event] on_page_content[api/index.md]
DEBUG    -  Reading: api/reading.md
DEBUG    -  [event] on_pre_page[api/reading.md]
DEBUG    -  [event] on_page_read_source[api/reading.md]
DEBUG    -  [event] on_page_markdown[api/reading.md]
DEBUG    -  [event] on_pre_page[es/api/reading/index.md]
DEBUG    -  [event] on_page_read_source[es/api/reading/index.md]
DEBUG    -  [event] on_page_markdown[es/api/reading/index.md]
DEBUG    -  [event] on_page_content[es/api/reading/index.md]
DEBUG    -  [event] on_page_content[api/reading.md]
DEBUG    -  [event] on_env
DEBUG    -  Copying static assets.
DEBUG    -  Copying media file: 'assets/stylesheets/theme.css'
DEBUG    -  Copying media file: 'assets/images/favicon.png'
DEBUG    -  Copying media file: 'assets/javascripts/bundle.1e84347e.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/bundle.1e84347e.min.js.map'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.ar.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.da.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.de.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.du.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.es.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.fi.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.fr.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.hi.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.hu.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.it.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.ja.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.jp.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.multi.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.nl.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.no.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.pt.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.ro.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.ru.min.js'
DEBUG    -  Copying media file:
            'assets/javascripts/lunr/min/lunr.stemmer.support.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.sv.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.th.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.tr.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.vi.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/min/lunr.zh.min.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/tinyseg.js'
DEBUG    -  Copying media file: 'assets/javascripts/lunr/wordcut.js'
DEBUG    -  Copying media file:
            'assets/javascripts/workers/search.8397ff9e.min.js'
DEBUG    -  Copying media file:
            'assets/javascripts/workers/search.8397ff9e.min.js.map'
DEBUG    -  Copying media file: 'assets/stylesheets/main.db9e7362.min.css'
DEBUG    -  Copying media file: 'assets/stylesheets/main.db9e7362.min.css.map'
DEBUG    -  Copying media file: 'assets/stylesheets/palette.3f5d1f46.min.css'
DEBUG    -  Copying media file:
            'assets/stylesheets/palette.3f5d1f46.min.css.map'
DEBUG    -  Building theme template: 404.html
DEBUG    -  [event] on_pre_template[404.html]
DEBUG    -  [event] on_template_context[404.html]
DEBUG    -  [event] on_post_template[404.html]
DEBUG    -  Building theme template: sitemap.xml
DEBUG    -  [event] on_pre_template[sitemap.xml]
DEBUG    -  [event] on_template_context[sitemap.xml]
DEBUG    -  [event] on_post_template[sitemap.xml]
DEBUG    -  Gzipping template: sitemap.xml
DEBUG    -  Building markdown pages.
DEBUG    -  Building page index.md
DEBUG    -  [event] on_page_context[index.md]
DEBUG    -  [event] on_post_page[index.md]
DEBUG    -  Building page api/index.md
DEBUG    -  [event] on_page_context[api/index.md]
DEBUG    -  [event] on_post_page[api/index.md]
DEBUG    -  Building page api/reading.md
DEBUG    -  [event] on_page_context[api/reading.md]
DEBUG    -  [event] on_post_page[api/reading.md]
DEBUG    -  Building page es/index.md
DEBUG    -  [event] on_page_context[es/index.md]
DEBUG    -  [event] on_post_page[es/index.md]
DEBUG    -  Building page es/api/index.md
DEBUG    -  [event] on_page_context[es/api/index.md]
DEBUG    -  [event] on_post_page[es/api/index.md]
DEBUG    -  Building page es/api/reading/index.md
DEBUG    -  [event] on_page_context[es/api/reading/index.md]
DEBUG    -  [event] on_post_page[es/api/reading/index.md]
DEBUG    -  [event] on_post_build
INFO     -  Documentation built in 1.10 seconds
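The placeholder plugin itself isn't part of this task. One way to write such a logger is sketched below, assuming the MkDocs 1.2-era plugin API: plugins subclass `mkdocs.plugins.BasePlugin`, handlers are discovered by their `on_<event>` attribute names, and MkDocs keeps the previous value when a handler returns `None`. The class name, logger name, and the import fallback (which only exists so the sketch runs without MkDocs installed) are hypothetical.

```python
"""Hypothetical placeholder MkDocs plugin that logs every on_* event it receives."""
import logging

try:
    from mkdocs.plugins import BasePlugin
except ImportError:  # lets the sketch be exercised without MkDocs installed
    BasePlugin = object

log = logging.getLogger("mkdocs.plugins.eventlog")  # hypothetical logger name

# The build events that appear in the dump above (MkDocs 1.2-era names).
EVENTS = (
    "config", "pre_build", "files", "nav", "env", "post_build",
    "pre_template", "template_context", "post_template",
    "pre_page", "page_read_source", "page_markdown", "page_content",
    "page_context", "post_page",
)


def _make_handler(event):
    def handler(self, item=None, **kwargs):
        page = kwargs.get("page")
        if page is not None:
            label = f"[{page.file.src_path}]"
        elif "template_name" in kwargs:
            label = f"[{kwargs['template_name']}]"
        else:
            label = ""
        log.debug("[event] on_%s%s", event, label)
        # Returning the item unchanged (None when there is none) leaves the
        # build untouched; for on_page_read_source, None means "read the file".
        return item

    handler.__name__ = f"on_{event}"
    return handler


class EventLogPlugin(BasePlugin):
    """Gets one pass-through on_<event> method per name in EVENTS."""


for _event in EVENTS:
    setattr(EventLogPlugin, f"on_{_event}", _make_handler(_event))
```

Generating the handlers in a loop keeps the logger in sync with a single list of event names instead of fifteen near-identical methods.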

Processing order is roughly:

  • load config (on_config)
  • run pre-build handlers (on_pre_build)
  • find files on disk (on_files)
  • create site nav structure (on_nav)
  • for each markdown file found:
    • run pre-page callbacks (on_pre_page)
    • run read_source callbacks (on_page_read_source)
    • read markdown from file (if not done by a read_source callback)
    • run markdown callbacks (on_page_markdown)
    • convert markdown to HTML
    • run HTML callbacks (on_page_content)
  • run global template env callbacks (on_env)
  • copy static assets to output location
  • for each theme template (non-markdown pages):
    • run template load callbacks (on_pre_template)
    • run template context callbacks (on_template_context)
    • render template
    • run pre-save callbacks (on_post_template)
    • store output (unless cancelled by a pre-save callback)
  • for each markdown file:
    • run Jinja context callbacks (on_page_context)
    • render page
    • run pre-save callbacks (on_post_page)
    • store output (unless cancelled by a pre-save callback)
  • run post-build handlers (on_post_build)
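In MkDocs 1.x, a pre-save callback "cancels" storing a page by returning blank output: the build only writes the file if the string returned from on_post_page is non-empty. A minimal sketch of that mechanism, assuming a hypothetical `draft` front-matter key (read from the page's parsed metadata in `page.meta`); the plugin name and import fallback are also hypothetical.

```python
"""Hypothetical plugin: cancel writing a page by returning empty output."""
try:
    from mkdocs.plugins import BasePlugin
except ImportError:  # lets the sketch be exercised without MkDocs installed
    BasePlugin = object


class SkipDraftsPlugin(BasePlugin):
    def on_post_page(self, output, page, config):
        # `draft` is a hypothetical front-matter key used for illustration.
        if page.meta.get("draft"):
            return ""  # blank output: MkDocs skips writing this page
        return output
```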

One thing I notice from this is that the rendered HTML from the markdown processing step seems to be held in RAM for all pages in the site during the build. You can hold a lot of HTML in, say, 1 GiB of RAM, but this is something to be aware of if we start seeing out-of-memory (OOM) failures during site generation.
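To put "a lot of HTML" into numbers: assuming, purely for illustration, an average of 100 KiB of rendered HTML per page, 1 GiB holds 1024³ / (100 × 1024) ≈ 10,485 pages. Each target language adds a full copy of every page (the log above shows an es/ variant of each file), and CPython stores a str at 1, 2, or 4 bytes per character depending on the widest character it contains (PEP 393), so pages in non-Latin scripts can take several times their ASCII size. The helper names below are hypothetical, and they count only the HTML strings, not the rest of the build's memory.

```python
"""Back-of-envelope helpers for rendered-HTML memory use (CPython)."""
import sys
from typing import Iterable

GIB = 1024 ** 3


def html_bytes_in_memory(pages: Iterable[str]) -> int:
    """Approximate RAM held by rendered page strings (CPython object sizes)."""
    return sum(sys.getsizeof(html) for html in pages)


def pages_that_fit(avg_page_bytes: int, budget_bytes: int = GIB) -> int:
    """How many pages of the given average size fit in the memory budget."""
    return budget_bytes // avg_page_bytes
```

For example, `pages_that_fit(100 * 1024)` gives 10485 for the 1 GiB default budget.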

I'm going to declare this spike {{Done}}. I did not produce a solid estimate of the effort required to implement the new plugin, but I have a scientific wild ass guess (SWAG™) estimate of 30-50 ideal hours.