Build linting tools for technical documentation
Closed, Resolved (Public)

Description

Build tools that can be used to analyze technical documentation and generate feedback or suggestions on how to improve it. Use a general-purpose prose linter as the basis for the tools.

Context

In a complex ecosystem of tools, codebases, and APIs used by a global community of contributors, technical documentation is a critical resource. But writing high-quality docs can be difficult. To make it easier to write good documentation, we will build prototypes of tools that provide feedback about documentation while it's being written, during publishing, or after it's live.

To that end, we will use an existing, general-purpose prose linter with a custom configuration based on our style guides. We will create proof-of-concept solutions embedding the linter in our processes, for example in GitLab CI pipelines.

We hope that the tools built in this project will empower new writers, increase current writers' confidence in their docs, and improve documentation quality.

Why linting?

Linting is the process of analyzing code in search of errors, bugs, and suspicious constructs (source). It's a crucial and familiar part of many development processes. It allows developers to automate static checks against a set of rules, for example defined in a style guide.

Technical documentation isn't code, but it's also governed by a set of clear rules. Technical writers and editors often codify these rules in style guides of their own, but such guides tend to be lengthy because natural language can get complex. An automated linter that checks your documentation during writing or reviewing can make an existing style guide much more accessible and easier to use.

Outcomes

In the course of this project, we've created artifacts described in the following sections.

Note:

  • All tools designed in this project were developed by the Technical Documentation Team and function as prototypes. If these tools are broadly adopted, they might need to be improved, refactored, or rewritten.
  • The focus in this project is on linting documentation written in English. Tools and configurations developed here don't work with other languages and aren't translatable. However, we hope that some configurations, designs, and lessons learned in this project will be directly transferable to similar projects for other languages.

Linter configuration

Linter configuration with a programmatically codified version of selected style guide rules

Vale expects its configuration file, .vale.ini, to be located either in the user's home directory or in the project's home directory. This configuration file should also specify the location of the styles directory, which contains the style rules.

The linter-quickstart repository provides the .vale.ini file and the .styles directory. The .styles/Wikimedia directory contains YAML files with style guide rules codified using Vale's rule syntax. Each file defines one or more rules and a severity level.

Placing the .vale.ini file and the .styles directory in the home directory on your device should be enough to run Vale in CLI mode and as a language server.
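
As a rough illustration of the layout described above, the following Python sketch writes a minimal .vale.ini and one rule file into a directory. The rule itself, its file name (ExampleSubstitution.yml), and the [*.md] file pattern are assumptions made for this example; the actual configuration and rules are the ones in the linter-quickstart repository.

```python
from pathlib import Path

# Minimal configuration: where the styles live, and which style applies to which files.
# The [*.md] pattern is an assumption for this sketch.
VALE_INI = """\
StylesPath = .styles
MinAlertLevel = suggestion

[*.md]
BasedOnStyles = Wikimedia
"""

# Hypothetical rule for illustration only, written in Vale's YAML rule syntax.
EXAMPLE_RULE = """\
extends: substitution
message: "Use '%s' instead of '%s'."
level: warning
ignorecase: true
swap:
  utilize: use
"""


def write_vale_layout(base_dir: Path) -> Path:
    """Create .vale.ini and .styles/Wikimedia/ under base_dir; return the config path."""
    style_dir = base_dir / ".styles" / "Wikimedia"
    style_dir.mkdir(parents=True, exist_ok=True)
    (style_dir / "ExampleSubstitution.yml").write_text(EXAMPLE_RULE, encoding="utf-8")
    config_path = base_dir / ".vale.ini"
    config_path.write_text(VALE_INI, encoding="utf-8")
    return config_path
```

Calling write_vale_layout(Path.home()) would produce the home-directory setup mentioned above, though copying the files from the quickstart repository is the intended route.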

GitLab CI configuration

GitLab CI configuration with the linter available for importing in custom pipelines

The linter-quickstart repository contains a GitLab CI configuration file with a linting job that you can copy to the .gitlab-ci.yml file in a different repository. This linting job requires two environment variables: one that specifies the Vale configuration and one that specifies the documentation directory. The job returns linting information in the form of direct output from Vale running on the specified files or directories.
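
To make the role of the two variables more concrete, here is a minimal Python sketch of how a job script could turn them into a Vale invocation. The variable names VALE_CONFIG_PATH and DOCS_DIR are hypothetical (the quickstart repository defines the real names), and the real job is defined in GitLab CI YAML, not in Python.

```python
import os
from typing import List, Mapping

# Hypothetical variable names, used only in this sketch.
CONFIG_VAR = "VALE_CONFIG_PATH"
DOCS_VAR = "DOCS_DIR"


def build_vale_command(env: Mapping[str, str] = os.environ) -> List[str]:
    """Build the Vale command line from the two required variables."""
    missing = [name for name in (CONFIG_VAR, DOCS_VAR) if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")
    # --config points Vale at a specific .vale.ini; the last argument is what to lint.
    return ["vale", f"--config={env[CONFIG_VAR]}", env[DOCS_VAR]]
```

Failing early when a variable is missing keeps the job output readable: the pipeline reports a configuration problem instead of a confusing linter error.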

The pipeline job fails when the linted files trigger any rule with the "error" severity level, and succeeds if the linter outputs only warnings or suggestions. There are currently no rules with an "error" severity level in the .styles directory, so the provided rules alone don't cause the job to fail.
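
The pass/fail rule can be sketched in Python as follows. This assumes Vale's JSON output (vale --output=JSON), taken here to be an object that maps each file path to a list of alerts with a "Severity" field. It illustrates the rule and is not necessarily how the job is implemented; the job may simply rely on Vale's own exit status.

```python
import json
from typing import Dict, List


def exit_code_for(vale_json: str) -> int:
    """Return 1 if any alert has the 'error' severity, otherwise 0."""
    # Assumed shape: {"path/to/file.md": [{"Severity": "warning", ...}, ...]}
    results: Dict[str, List[dict]] = json.loads(vale_json)
    for alerts in results.values():
        if any(alert.get("Severity") == "error" for alert in alerts):
            return 1
    return 0
```

With this rule, a report that contains only warnings and suggestions yields 0, so the job succeeds while the feedback still appears in the job log.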

Web-based tool

Web-based tool for linting wiki page content

The documentation linter web app is a Toolforge tool with a simple interface. It allows users to request linter output for a single page on mediawiki.org or Wikitech. Linter output isn't cached: the linter runs on every request.

The app displays a separate section for each detected issue. Each section contains the type of feedback, the rule, the line or section that the feedback applies to, and any other information necessary to improve the documentation. The application stores no history or context; it only runs the linter on the current page content. As page content changes, the app displays fewer issues when the docs improve, and more issues if the changes introduce new problems.
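
A minimal sketch of the per-issue sections, assuming the alerts come from Vale's JSON output with "Severity", "Check", "Line", and "Message" fields. The function name and the section keys are hypothetical and aren't taken from the app's code.

```python
from typing import Dict, List


def alerts_to_sections(results: Dict[str, List[dict]]) -> List[dict]:
    """Flatten Vale results into one display section per detected issue."""
    sections = []
    for alerts in results.values():
        # Present issues in the order they occur on the page.
        for alert in sorted(alerts, key=lambda a: a.get("Line", 0)):
            sections.append(
                {
                    "feedback_type": alert.get("Severity", ""),
                    "rule": alert.get("Check", ""),
                    "line": alert.get("Line", 0),
                    "message": alert.get("Message", ""),
                }
            )
    return sections
```

Because the function only depends on its input, it fits the stateless design described above: the same page content always produces the same sections.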

The web app is written in Python using Flask. It serves as a thin wrapper around Vale and offers little functionality on its own. It doesn't require authentication and has a simple architecture, with the back end providing most of the functionality. The app is designed to minimize CPU, RAM, and bandwidth requirements for its users.
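
The "thin wrapper" idea can be sketched as a single function that hands text to the Vale CLI and returns its parsed JSON output. This is an illustration under stated assumptions, not the app's actual code: the function name, the temporary page.txt file, and the injectable run parameter are all hypothetical, and the Flask routing and page fetching are left out.

```python
import json
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def lint_text(text: str, run: Callable = subprocess.run) -> Dict[str, List[dict]]:
    """Write text to a temporary file, run Vale on it, and return the parsed JSON."""
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "page.txt"  # hypothetical file name and extension
        page.write_text(text, encoding="utf-8")
        completed = run(
            ["vale", "--output=JSON", str(page)],
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit status can simply mean "issues found"
        )
    return json.loads(completed.stdout or "{}")
```

Passing the runner as a parameter keeps the wrapper testable without Vale installed; in normal use the default subprocess.run calls the vale binary found on PATH, which in turn locates .vale.ini as described in the linter configuration section.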

The app repository includes linter configuration and the full set of style rules.

Documentation

Plan

Note: Tasks can run in parallel.

Phase 1

  • Pick a linter (T388111: Choose a linter to use in the linting tools project)
  • Evaluate rules and guidelines from the available style guides
  • Select the rules to codify programmatically
  • Codify the selected rules in the language of the linter
  • Specify any other configuration options necessary for the linter to run
  • Test the linter and the configuration on existing documentation
  • Write instructions on how to use the linter and the configuration in a local development environment

Phase 2

  • Design a GitLab CI pipeline documentation linting step
  • Create a CI configuration that can be included in a pipeline
  • Test the configuration in a real project
  • Consider the steps necessary to replicate the functionality for projects in Gerrit
  • Write instructions on how to use the linter in any project pipeline

Phase 3

  • Design a web-based tool for linting wiki page content
  • Develop the tool
  • Deploy the tool
  • Test the tool on existing wiki docs
  • Document the tool

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Update style rules and dependencies | repos/technical-documentation/documentation-linting/linter-webapp!8 | kbach | release/0.0.3 | main
Update rule links and documentation | repos/technical-documentation/documentation-linting/linter-quickstart!14 | kbach | release-candidate | main
Update README.md | repos/technical-documentation/documentation-linting/linter-quickstart!13 | kbach | update-readme | main
Improve page cleanup | repos/technical-documentation/documentation-linting/linter-webapp!7 | kbach | improve-page-cleanup | main
Publish version 0.0.2 | repos/technical-documentation/documentation-linting/linter-webapp!6 | kbach | release/0.0.2 | main
Fix bugs | repos/technical-documentation/documentation-linting/linter-webapp!5 | kbach | bugfixes | main
Fix bugs | repos/technical-documentation/documentation-linting/linter-quickstart!11 | kbach | bugfixes | main
Prepare to publish version 0.0.1 | repos/technical-documentation/documentation-linting/linter-webapp!4 | kbach | release/0.0.1 | main
style rules: suggested additions | repos/technical-documentation/documentation-linting/linter-quickstart!8 | tburmeister | rules | main
First prototype | repos/technical-documentation/documentation-linting/linter-webapp!1 | kbach | first-prototype | main

Event Timeline

KBach changed the task status from Open to In Progress. (Mar 6 2025, 11:17 AM)
KBach triaged this task as Medium priority.
KBach moved this task from Backlog to In progress on the Tech-Docs-Team board.
KBach moved this task from In progress to Active projects on the Tech-Docs-Team board.
KBach updated the task description.

Over the last week:

  • We've decided to use Vale as the linter of choice in this project. For details, see T388111.
  • I've created an initial/placeholder repository with the linter configuration. It contains a few of our style guide rules codified using Vale's YAML syntax and will serve as a starting point for future work.
  • I've spent some time designing the CI pipeline. My initial approach of using Blubber directly with Vale's official Docker image was unsuccessful because Blubber doesn't currently support Alpine-based images (reported in T388769). I have put together four alternative configurations and am now evaluating their usability.
KBach updated the task description.

Over the last week:

  • I've clarified the expected outcome ideas and added more details to their descriptions in this task. The designs of all planned artifacts are now finalized.
  • I've created the linter CI pipeline repository and run some initial linting jobs on a test document. The main focus is now on minimizing the number of steps necessary to use the linter pipeline in another repository (that is, on UX).
KBach updated the task description.

Over the last week:

  • I've moved away from keeping configurations in separate repositories and created a single quickstart repo with all the configuration files and instructions, because at this stage separate repositories introduced unnecessary complexity. I'll rethink this structure closer to the end of the project. The new quickstart repository contains all the information necessary to start using Vale locally and in a GitLab CI pipeline (though the number of checks performed by the linter is still small at the moment).
  • I've created a repository for the web application and started building it based on earlier prototypes.

This week, I started defining a Vale ruleset for linting Wikimedia docs. I reviewed existing rulesets from companies and open-source projects, Wikimedia style guides, and past work on plain language guidelines. I created a draft list of proposed rules and alert levels to start codifying and experimenting with. See the working Google Doc.

The first prototype of the web application is almost ready; I just need to fix a few bugs. I should be able to submit a request for review early next week.

We made good progress this week on codifying the rules. We've submitted merge requests that add nine new rules and adjust several of the existing rules. Based on our testing, we've iterated on our approach to the rules; see the doc for more details. Next, we'll be merging the patches, looking into readability metrics, resolving remaining to-dos, and testing with a wider range of pages.

  • First prototype of the wiki linter has been reviewed by @apaskulin and is now available on the main branch. I will now work on deploying it on Toolforge.
  • I am still working on a few of the remaining style rules.

This week's update:

  • Completed a study of available readability metrics and their basis in Vale's built-in metric variables. Based on my findings, submitted a merge request with two experimental rules, and a second merge request that preserves the rules I used to generate page stats for development purposes.
  • Submitted a merge request with a few low-priority rules.
  • Merged two merge requests from new contributors (!8 and !9).
  • First prototype of the wiki linter is now live on Toolforge.
  • Main set of style rules in the quickstart repository is now complete.
  • We are now ready to start testing the tools in real projects and on existing documentation.

This week, I drafted a list of repositories and wiki pages to use as an initial test set. So far, I've tested 7 repositories and 12 wiki pages to evaluate the rules for bugs, adjustments, and false positives, amounting to over 5,000 suggestions.

I'm still in the process of testing the tools:

  • Almost done testing the wiki linter. Still need to identify a root cause for one bug.
  • Started testing the repositories. Here I'm focusing on making improvements to style rules with particularly high false positive rates.

This week, I completed my testing of the remaining two repositories and consolidated my findings and suggestions for the prototype release of the ruleset.

Main tests are done and bug fixes are ready for review. I will now work on minor UI improvements in the web app, continue verifying the fixed rules, and write documentation.

  • I deployed a new version of the wiki linter to https://techdoc-linter.toolforge.org/. This version features UI improvements, new usage instructions, and a few bug fixes.
  • One final bug fix is ready for review in this MR in GitLab.
  • I'm still working on the documentation.
  • I merged and deployed the final bug fix in the web app.
  • I researched the possibility of replicating the linter CI pipeline for projects in Gerrit. We now have a general idea of how to achieve it, but will not implement it as part of this task.
  • Linter rule reference documentation is now undergoing a review. I will publish it on mediawiki.org when it's ready.
  • Other documentation is still in progress.

I've been away most of the week, so there hasn't been much progress on this task. Work on the documentation is still ongoing.

The linter rule reference is now on wiki, awaiting a few final modifications. I'll move it to its final location when the other documentation (still in progress but almost done) is ready.

The main project documentation page is now in review. I'm still making minor changes in the other docs (README, reference), and in the tools themselves. I'm planning to finalize this project next week.

This project has now been completed.