Consider a static site generator for Wikimedia Design Style Guide
Closed, Invalid (Public)

Description

Given that we are alright with using a build process for the style guide (T164246), we should consider using a static site generator. It will:

  • Reduce repetition of markup - header, footer, parts of the sidebar
  • Improve i18n support
  • Remove the need to keep built files in the repo (we currently have just one btw)

| Name | Language | i18n support | File format |
| --- | --- | --- | --- |
| Middleman | Ruby | Yes (falls back to the default language if there are no strings for the language it is looking for) | YAML or many others ...even wiki markup (o_0) |
| Hexo | Node.js | Only for layout, not for source, which is not helpful to us (not sure about this, could someone test and update the task?). Couldn't get it to work though. | JSON or YAML |
| Webpack build (html-loader, i18n-webpack-plugin) | Node.js | Yes (no fallbacks, I am guessing this does a simple replace) | JSON |
| Pelican | Python | i18n for source (page content); may be able to translate UI via Jinja templates + passed parameter | YAML/Markdown/reStructuredText |
| Hugo | Go | Yes (uses go-i18n) | Markdown and Org-mode |
| Jekyll | Ruby | With plugins: polyglot or jekyll-multiple-languages-plugin (note: not with gh-pages) | Markdown |
| Gatsby | JavaScript | Yes. Content: with GraphQL query or with plugin. UI: jQuery.i18n bindings for React or any other React i18n plugin | Plugins for: Markdown, YAML, JSON, CSV, etc. |

It shouldn't be too hard to get these to work with http://translatewiki.net/.

Event Timeline

We are already using Middleman for the Transparency Report, and so I have some experience with it.

  • work with translatewiki.net,
  • be easily understandable for non-technically advanced contributors to adapt,
  • be performant

We can surely figure out a way to make these work with translatewiki, and as they would essentially serve pre-built static pages performance should not be a problem either.

Ease of understanding is a bit of a problem though. Thinking about it from the perspective of Middleman, the markup will have references to the strings being used, like <%= t('privacy.information_produced') %>, and the strings will live in a YAML file.

This means that to edit the content, you'd first have to open the markup file, identify which string needs to be edited, and then edit it in the YAML file. That is in no way easy, but I can't think of any other simple solution.
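
To make the indirection concrete, here is a minimal Python sketch of how a dotted key such as 'privacy.information_produced' could be resolved against nested per-locale strings, with a fallback to the default language. It is an illustration only, not Middleman's implementation: the `STRINGS` data, the English text, and the `lookup` and `t` functions are all hypothetical.

```python
# Hypothetical strings, shaped like the per-locale YAML files after parsing.
STRINGS = {
    "en": {"privacy": {"information_produced": "Information produced"}},
    "fi": {"privacy": {}},  # no translation yet
}


def lookup(strings, locale, key):
    """Walk a dotted key through the nested dict; None if any part is missing."""
    node = strings.get(locale, {})
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node if isinstance(node, str) else None


def t(key, locale, default_locale="en", strings=STRINGS):
    """Return the translation, else the default-locale string, else the key."""
    value = lookup(strings, locale, key)
    if value is None:
        value = lookup(strings, default_locale, key)
    return value if value is not None else key
```

An editor changing the text therefore has to follow the key from the markup into the YAML file, which is the two-step edit described above.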


I haven't tested this, but we could have a build system where the markup could be...

<p t-string="typography-intro">Lorem ipsum … sit amet</p>

... and a build process that runs before Middleman kicks in could convert it to...

<p><%= t('typography.intro') %></p>

… and create the following YAML file...

---
en:
  typography:
    intro: Lorem ipsum … sit amet

This way, a contributor only edits the markup, and only the English version, and the rest is taken care of by the intermediate build process, Middleman and translatewiki.
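
The untested idea above could be sketched roughly as follows. This is a minimal Python illustration under several assumptions: the annotated elements contain no nested tags and carry no other attributes, the first hyphen in the `t-string` value separates the section from the string name, and the function names `extract` and `to_yaml` are made up for this example.

```python
import json
import re

# Matches a simple element such as <p t-string="typography-intro">text</p>.
# Assumption: no nested tags inside the element and no other attributes.
T_STRING = re.compile(
    r'<(?P<tag>\w+) t-string="(?P<key>[\w-]+)">(?P<text>.*?)</(?P=tag)>',
    re.DOTALL,
)


def extract(markup, locale="en"):
    """Return (erb_markup, strings) for one page of annotated HTML."""
    strings = {}

    def replace(match):
        # "typography-intro" becomes the nested key "typography.intro".
        section, _, name = match.group("key").partition("-")
        strings.setdefault(section, {})[name] = match.group("text").strip()
        tag = match.group("tag")
        return f"<{tag}><%= t('{section}.{name}') %></{tag}>"

    return T_STRING.sub(replace, markup), {locale: strings}


def to_yaml(data, indent=0):
    """Serialize nested dicts of strings as block-style YAML lines."""
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(" " * indent + f"{key}:")
            lines.extend(to_yaml(value, indent + 2))
        else:
            # A JSON string literal is also a valid double-quoted YAML scalar.
            lines.append(" " * indent + f"{key}: {json.dumps(value, ensure_ascii=False)}")
    return lines
```

Running `extract` over each page and writing `"\n".join(to_yaml(strings))` to the locale file would give Middleman the ERB markup and the English YAML, leaving the other languages to the translation workflow.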

@Prtksxna What do you refer to with source in the task description?

As I couldn't get this to work at all I am not fully sure, but my guess is that source refers to the actual content of the pages and layout is the stuff that is around - headers, footers, menus etc.

I might be wrong about both things here :)

I used pelican (python) for creating websites. It is similar to hexo. It may share the same source/content problem, though if "source refers to the actual content of the pages and layout" is correct, I can say that pelican seems to do both, see e.g. fordes

Ah! I forgot about Pelican. Do you mind adding it to the table? Could you also update the fallback language when translations aren't available?

In last week's Editing design meeting @Pginer-WMF noted that the style guide will, for the next few months, undergo a lot of changes, and then stabilize into a document that needs fewer updates. Given this timeline, it makes sense to keep the style guide easy to edit (like it is now) and mark it for translation later. This makes sense from a translation perspective too.

Volker_E renamed this task from Consider a static site generator for WikimediaUI Styleguide to Consider a static site generator for WikimediaUI Style Guide. May 25 2017, 1:04 PM

Any reason why we haven't considered more popular static site generators like Jekyll or Hugo? Here is a list by github metrics.

Thanks for the suggestions @bmansurov! I didn't consider jekyll because it doesn't have i18n support (not without a plugin AFAIK). Hugo wasn't on my radar at that time, and it seems it has multilingual mode. If you're familiar with it, do add it to the table in the description.

When comms was considering a styleguide-based site last quarter (project abandoned) for the foundation strategy, I chose to convert the styleguide to Jekyll because of its easy integration with github. In my opinion, Jekyll has many limitations, like poor i18n support, and a bare-bones feature set, but it offers a few benefits, like integrated github-pages support, and not having to maintain the build toolchain.

One feature that I think should be given consideration when choosing an SSG is the ability for non-technical stakeholders to edit content. With the comms jekyll site, I hooked it up to http://prose.io/ which is an online markdown editor that integrates into github. With this workflow, anyone with an authorized github account can make changes to the site content (i.e. a folder of markdown files) through a WYSIWYG editor and the changes are then 'saved' as a pull request to the repo. Once the pull request is merged, github automatically rebuilds the jekyll site with the updated content. I found this to be a very user-friendly approach to editing a static site.

This is a link to the comms site repo https://github.com/j4n-co/wmf-2030-strategy-site
and here on gh-pages https://j4n-co.github.io/wmf-2030-strategy-site/home/

I personally don't "like" the limitations of Jekyll very much (for example, creating the menu was very convoluted), but I think editing content online and having the site rebuild automatically is a very convenient workflow.

I agree, an important reason for using an SSG over HTML should be simpler content editing.

@Jdrewniak prose.io seems very useful, thanks for sharing. I partially ported the style guide to Middleman to see if there was any benefit of doing so, and also to see if it would work with Prose. I say partially because I only moved the text content, not any of the images or fonts. I also converted only one page to markdown to test with Prose. Several styles and classes may be missing too.

You can see all the code on GitHub and view the generated website on http://prtksxna.github.io/wsg-middleman. I am happy with the results of this experiment and wanted to share them here.


Style guide on Middleman

Structure

Like most SSGs, Middleman lets you move shared markup out into layouts and partials, which makes individual pages simpler. The content HTML pages can then hold just the content and nothing else. Not having the HTML of the layout on every page makes it easier to edit, especially if we plan to use something like Prose.

ToC

Initially, I moved the ToC into a separate partial that was loaded in the main layout. I then added some helper methods to be able to add the is-on class on its own without manual effort. Finally, I wrote a method that could generate the ToC with all the correct classes for every page using a YAML file that looks like this:

- name: Introduction
  path: "/"
- name: Design principles
  path: "/design-principles.html"
- name: Visual style
  path: "/visual-style.html"
  submenu:
    - name: Principles
      path: "/visual-style.html"
    - name: Colors
      path: "/visual-style--colors.html"
    - name: Typography
      path: "/visual-style--typography.html"
    - name: Icons
      path: "/visual-style--icons.html"
    - name: Imagery
      path: "/visual-style--imagery.html"
    - name: Illustrations
      path: "/visual-style--illustrations.html"

New pages can be added to the ToC by editing just this.
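
As a rough illustration of the approach (the actual helper is a Ruby method in the Middleman port, which is not shown here), the following Python sketch renders a menu shaped like the YAML above and adds the is-on class on its own. The `<ul>`/`<li>` markup, the rule that a parent is also marked when one of its submenu entries is the current page, and the function names are assumptions made for this example.

```python
from html import escape

# Mirrors part of the YAML menu file above, already parsed into Python.
MENU = [
    {"name": "Introduction", "path": "/"},
    {"name": "Visual style", "path": "/visual-style.html", "submenu": [
        {"name": "Principles", "path": "/visual-style.html"},
        {"name": "Colors", "path": "/visual-style--colors.html"},
    ]},
]


def is_on(item, current_path):
    """True if the item, or any entry in its submenu, is the current page."""
    if item["path"] == current_path:
        return True
    return any(is_on(child, current_path) for child in item.get("submenu", []))


def render_toc(items, current_path):
    """Render the menu as nested <ul> markup, flagging active entries."""
    parts = ["<ul>"]
    for item in items:
        cls = ' class="is-on"' if is_on(item, current_path) else ""
        href = escape(item["path"])
        label = escape(item["name"])
        parts.append(f'<li{cls}><a href="{href}">{label}</a>')
        if "submenu" in item:
            parts.append(render_toc(item["submenu"], current_path))
        parts.append("</li>")
    parts.append("</ul>")
    return "".join(parts)
```

Calling `render_toc(MENU, "/visual-style--colors.html")` marks both "Visual style" and "Colors" as active, so no page needs hand-written classes.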

Prose

Prose works out of the box for any markdown file. It was also able to correctly edit the frontmatter of the page which means editing of metadata would be possible with necessary configuration. Here is what editing the main page looks like:

[Screenshots: the main page open in Prose, once in the Markdown view and once in the Preview view]

Build

Nothing beats Jekyll's integration with GitHub, but automated builds for Middleman aren't too hard either. Middleman lets you add deploy strategies and GitHub is one of them. I set this up for a personal project once and it was quite straightforward.

But, given that we'd likely be deploying this on our own servers it makes things even easier (we already do so for the Transparency Report).

@Prtksxna yeah middleman reminds me of the good old days.. when we didn't have to use webpack :P
I've looked into Middleman for this use case myself. I haven't done any styleguide porting, but I looked at how it handles i18n and wikitext.

Middleman

wikitext
This was just a curiosity, and practically speaking, importing content from mediawiki isn't very useful without templates, but Middleman can feasibly import wikitext using Tilt, even from a remote endpoint. Again, probably useless, but what the heck.

i18n
I think Middleman's i18n capabilities are pretty bare-bones. This is where I ended my exploration, because I couldn't produce a simple language picker (a locale switcher is still in progress).

Menus
I don't think there is a built-in helper for creating menus in Middleman. I'd prefer to have the menu dynamically generated from the content, so when someone adds a page, they can position it in the menu using a field in the markdown front-matter. I don't know if this is easy in Middleman, but that question led me to explore Hugo, the SSG written in Go.
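
The idea of positioning a page in the menu through a front-matter field can be sketched independently of any particular SSG. The Python below is only an illustration: the `menu_weight` field name is hypothetical, and the parser handles nothing beyond flat `key: value` lines between `---` markers.

```python
def parse_front_matter(text):
    """Parse simple `key: value` front matter delimited by `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_menu(pages):
    """Order pages by their (hypothetical) `menu_weight` front-matter field.

    `pages` maps a URL path to the raw Markdown source of that page.
    Pages without a weight are left out of the menu.
    """
    entries = []
    for path, text in pages.items():
        meta = parse_front_matter(text)
        if "menu_weight" in meta:
            entries.append((int(meta["menu_weight"]), meta.get("title", path), path))
    return [{"name": title, "path": path} for _, title, path in sorted(entries)]
```

With this shape, adding a page to the menu means adding one line to that page's front matter rather than editing a central menu file.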

Thoughts on Hugo

I briefly explored Hugo as an option, and although I know nothing about Go, I was pretty impressed. It has very good i18n support (language switching is no problem) and built-in menu functionality. What made me really consider Hugo, though, is its theme functionality. Themes in Hugo are basically just Hugo sites placed inside other Hugo sites. I think this is advantageous because you could, for example, build the styleguide site, then place that site inside a "theme" folder inside of a new site, and all the styleguide components will be made available. You can insert the styleguide as a git submodule, and then the parent site can just update the submodule to get the latest version.

My cons for Hugo so far are:

  • Very big ramp-up time (because it has lots of features and it's written in Go)
  • No extensible functionality (i.e. plugins) (...because Go is a compiled language)

Let's continue the conversation, especially when thinking about a layout base template that could be used for other projects (similar to what @Jdrewniak started for the 2030 microsite). But I'll put it on Backlog as it doesn't seem feasible from our conversations (still evaluation needed and possible technical entrance hurdles) for v1 of the Style Guide.

@dbarratt Could you evaluate?

Added it to the table, let me know if you have any questions about it.

@dbarratt Thanks! Any public repo/project where you have it in use?

Yep. I'm using it here: https://github.com/boggs-love/web and here's the prod version: https://boggs.love

but a better example would be the React website itself: https://reactjs.org and the repo for it: https://github.com/reactjs/reactjs.org

@dbarratt the thing that I find puzzling about Gatsby.js is how the generated output is inextricably dependent on React. That means that even if your website doesn't use javascript, you're still shipping the React library to the client, which I find strange. Looking at this GitHub issue which asks "how do you strip out all javascript" the answer seems to be "why would you do that?" which makes me feel old :P

ha! yeah it's a little weird. I guess you could always just remove the JS file from the generated HTML file. Since it's all rendered at build time using ReactDOMServer you get a no-JS static version of the site, so just not including the JS file would have the same effect as the user disabling JS.

But yeah, I don't know why you would do that, it would strip away the biggest features of Gatsby (no-page refreshes between pages, preloading content, etc.)
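
For what it's worth, removing the JS from already-built pages could be a small post-build step. The Python sketch below is generic and makes no claims about Gatsby's actual output: it simply drops every `<script>` element from an HTML string with a regular expression, which is good enough for well-formed generated markup but is not a general HTML parser. The `strip_scripts` name is made up for this example.

```python
import re

# Matches both external (<script src=...></script>) and inline script elements.
SCRIPT_TAG = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.DOTALL | re.IGNORECASE)


def strip_scripts(html):
    """Remove every <script> element, inline or external, from an HTML string."""
    return SCRIPT_TAG.sub("", html)
```

The pre-rendered markup stays in place, so the page reads the same as it would for a visitor with JS disabled.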

Volker_E renamed this task from Consider a static site generator for WikimediaUI Style Guide to Consider a static site generator for Wikimedia Design Style Guide. Jul 16 2018, 12:36 PM

This task hasn't seen much movement recently, but I'd like to point out that it's still something I'm very interested in.

I've investigated many (if not all) of the suggested frameworks, and think that a common weakness among them is i18n support. Many frameworks advertise some degree of i18n, but quickly run into limitations (e.g. it's surprisingly difficult to create a language switcher in Middleman) or require plugins, which risk being unmaintained or falling out of sync with the core framework.

From my research, I believe that Hugo has the best i18n support, offering built-in i18n features for translating UI elements, content, menus, menu items, as well as built-in language-switcher support and more advanced features such as rewriting links to support language-specific subdomains.

I've jotted down some notes here regarding the research I've done so far.

I do concede that Hugo has a steep learning curve, since the Go templating language is rather obtuse (and there doesn't seem to be much appetite for supporting Mustache), but once you get used to it, it provides lots of flexibility in terms of specifying different layouts for pages, embedding UI widgets in content, importing external content (from a wiki maybe...) etc.

I've created a sample Wikimedia Styleguide Hugo "theme" project on github. Please check it out. One thing I like about Hugo is that a "theme" is really just a Hugo site that's placed inside of another Hugo site. This offers benefits for maintainability since the theme can be kept up-to-date as a git submodule.

You can install the sample project with npm install and boot it with npm run start. The GitHub pages version is also available (test language switcher in the footer).

@Jdrewniak: I also tried hugo (https://urbook.fordes.de/) and was pretty happy with it (not speaking go) – I was able to fiddle with the templates, that's a good sign :-)

I particularly liked:

  • that one could hook in other text converters (pandoc), too, in case one takes the performance penalty.
  • Powerful menu structures possible. Submenus needed lots of Jinja-Hacks in Pelican (at least in my approach, but I found no better practices)

Does Hugo's i18n support work with translatewiki.net? Or would you be using something else for translations?

@Nikerabbit might be able to help us here: We would like to ensure possible connectivity with translatewiki.net before putting more developer time into a solution like Hugo. Are there possibilities to do so?

I suppose it might be worth splitting the translation into Content vs. Interface. As far as I know (and please correct me if I'm wrong @Nikerabbit), translatewiki is primarily (if not exclusively) a tool for Interface translations.

You could declare that all of your content is Interface and use translatewiki for all of it (which is what we did on Interaction Timeline), or you could declare that all of your content is Content (I think wikimediafoundation.org does this) or there could be a mix (MediaWiki does this).

If you have Content translations, I guess it's a good question to ask: Where do translators translate the content? Do they do it in markdown files in a git repo? Do they do it in a CMS?

If you have Interface translations, do they translate on translatewiki or something else?

Regardless of the system you choose, it's good to think about these things. Worst case scenario, you end up building a plugin for your system to work with translatewiki, but hey, the more systems that work with translatewiki, the better. :)

@dbarratt good points. As you mention, there is a distinction between interface translations and content translations.

Most of these frameworks, including Hugo, make this distinction as well. For interface translations, Hugo (and Middleman, Pelican etc.) support passing strings from a directory of YAML files, so in that respect, integrating translatewiki looks very similar to many other projects.

Content translation, on the other hand, is different. I think content translations would have to go through the same editing workflow as standard content, which ultimately means committing markdown files to a git repository. It might be possible to use UI translation strings as content, but that would require further investigation.

Involving markdown and git in the editing and translation workflows is a high barrier to entry, especially for non-technical people. However there are a few tools that make this easier, such as http://prose.io, a visual markdown editor that integrates with github, and "saves" edits as pull-requests.

Another option for editing could be integrating the content more with mediawiki itself. In that scenario, the page content could be stored on a wiki, and then, along with the translations, the site-generator could pull down the content from the wiki and process it to generate the final HTML.

We don't want to introduce new interfaces for translators for this kind of translation (where translations are expected to have 1:1 equivalence on content). Not integrating with Translate adds maintenance costs and harms discovery and usability.

In my experience the main reason why translating prose in this kind of system is difficult is the lack of stable identifiers. When we are dealing with plain text, it harms usability for authors to add those markers in, see for example Translate's page translation. When dealing with non-plaintext, it should be possible to hide those markers, but as far as I know that hasn't been implemented anywhere. It's also possible to manage the markers outside the text, but that needs a process to map equivalences every time something changes. We kind of do this at translatewiki.net, but since we lack good support for renames, it is impractical to place such a burden on us.

You also need to implement an automatic segmentation so that e.g. each paragraph, heading and maybe even list item is a separate translation unit. And you need to take care that only minimal mark-up is present in the translation units. Usually this happens with an extractor tool that identifies translatable parts from a document and places them into a standard l10n file format (json, yaml) and also stores enough metadata to reconstruct the original file with translations.
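
A toy version of such an extractor might look like the Python sketch below. It is an assumption-laden illustration, not an existing tool: blocks are split on blank lines only (so a whole list is one unit rather than one unit per item), Markdown markers such as "# " are left inside the units instead of being stripped, and identifiers are derived from a hash of the source text, which means an edited paragraph gets a new identifier and loses its link to existing translations. That last point is exactly the stable-identifier problem described above.

```python
import hashlib
import json


def segment(markdown):
    """Split a Markdown document into blocks separated by blank lines."""
    return [block.strip() for block in markdown.split("\n\n") if block.strip()]


def unit_id(block):
    """Derive an identifier from the block's text (changes whenever the text does)."""
    return "unit-" + hashlib.sha1(block.encode("utf-8")).hexdigest()[:8]


def extract_units(markdown):
    """Return ({id: source text}, [ids in document order])."""
    blocks = segment(markdown)
    order = [unit_id(block) for block in blocks]
    return dict(zip(order, blocks)), order


def rebuild(order, source, translations):
    """Reassemble the document, falling back to the source text per unit."""
    return "\n\n".join(translations.get(uid, source[uid]) for uid in order)


def to_l10n_json(units):
    """Serialize the units as a flat JSON message file."""
    return json.dumps(units, ensure_ascii=False, indent=2)
```

The `order` list plays the role of the metadata needed to reconstruct the original file once translated units come back.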

If you can implement a solution where the prose is placed on translatable wiki pages (on mediawiki.org for example) and have a script to pull in those translations when building the site (taking the HTML output and massaging it as necessary?), I think that would be the solution that requires the least amount of new complex tooling. There are known usability issues with editing translatable wiki pages, but when that is improved, you would get those improvements too without extra effort.
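
A build script along these lines could ask the MediaWiki Action API for the rendered HTML of a page and of its translation subpages. The Python sketch below uses `action=parse` with `prop=text`; the page title passed in, the `User-Agent` string, and the function names are placeholders, and the "massaging" of the returned HTML is left out. It assumes translated pages live at `Title/<language code>`, as the Translate extension's page translation does.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.mediawiki.org/w/api.php"


def parse_url(title, lang=None, api=API):
    """Build an action=parse URL for a page or its /<lang> translation subpage."""
    page = f"{title}/{lang}" if lang else title
    query = {
        "action": "parse",
        "page": page,
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    }
    return f"{api}?{urlencode(query)}"


def extract_html(payload):
    """Return the rendered HTML from an API response body, or None on error."""
    data = json.loads(payload)
    if "error" in data:
        return None
    return data["parse"]["text"]


def fetch_page_html(title, lang=None):
    """Fetch one page over the network (placeholder User-Agent, no retries)."""
    request = Request(parse_url(title, lang), headers={"User-Agent": "styleguide-build-sketch/0.1"})
    with urlopen(request) as response:
        return extract_html(response.read().decode("utf-8"))
```

A missing translation comes back as an API error, so the build can fall back to the source-language page for that language.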

What's the issue(s)?

In the discussion around evaluating a future Vue.js-based component library demo tool in T290912, @AnneT has brought up and experimented with VitePress. It uses markdown as file format and supports i18n – similar to its precursor VuePress. Seems like a tempting candidate!

I +1 @bmansurov and @Jdrewniak's suggestion of using Hugo, as it provides decent i18n and a large feature set, is framework agnostic, and has a strong user base. Though WMF officially adopted Vue.js, so it may make more sense to use VitePress. I will research what we would lose by using one or the other.

Concerning the initial goals:

Reduce repetition of markup - header, footer, parts of the sidebar
Improve i18n support
Remove the need to keep built files in the repo (we currently have just one btw)

I don't see either of the two being a blocker. And I see i18n support as more of a "how can we integrate it with TranslateWiki" issue than a "does the generator support i18n" one.

@Nikerabbit is segmentation of content required? In my brief experience with TranslateWiki I can tell that segmentation is often an issue because the translator has no clue about the context of what's being translated.

Yes, because:

  • The translation interface is optimized for short form content, at most one paragraph at a time.
  • It splits up the translation work into more manageable chunks for volunteers. If they get stuck trying to find a good translation for some part, they can skip that for now and translate the others.
  • Translation memory works better for shorter content
  • It's easier to review and update only the changed parts, instead of checking a whole document.

Context shouldn't be a problem. We do segmentation for translatable pages, but translators can still see the context. For example: https://www.mediawiki.org/w/index.php?title=Special:Translate&group=page-Help%3AExtension%3ATranslate&action=page&filter=&language=fi
And the message documentation feature is there to add additional info where needed.

Codex, the design system for Wikimedia, is the successor of DSG, and it uses VitePress with Markdown files as its content source.