Redesign Data Platform docs on Wikitech
Open, In Progress, Medium, Public

Description

This task covers technical writer support for the conversion of Data Engineering docs on Wikitech from a team-focused collection to a technical-product-focused collection for data platform(s).

This work depends partially on the migration of team-focused documentation from Wikitech to MediaWiki.

Technical documentation work included:

  • Content audit of DPE docs on Wikitech
  • Collaborative design of a new, task-focused structure for the data platform documentation
  • Implementation of new documentation structure and metadata
  • Creation of one or more landing pages for different data platform components
  • Deprecation and redirection of legacy team pages

Event Timeline

TBurmeister changed the task status from Open to In Progress. (Nov 9 2023, 7:29 PM)
TBurmeister triaged this task as Medium priority.
TBurmeister added a project: Goal.
TBurmeister moved this task from In progress to Active projects on the Tech-Docs-Team board.
TBurmeister changed the status of subtask T350910: Copyedits for Data Platform team docs from Open to In Progress.

Status update:

  • Finished and shared draft of content strategy (Google Doc) highlighting my findings so far from DPE docs content audit; identified design goals.
  • Started co-designing the information architecture for DPE docs on Wikitech; the main goals right now are:
    • Get aligned on the plan for how the docs on Wikitech will be evolving
    • Identify a meaningful set of subcollections that can improve the user experience for the two major audiences: data producers and data consumers.
  • Olja and other DPE team leads will meet in the coming days to get aligned on overall docs strategy for Wikitech and team docs on MediaWiki.
  • Content audit will continue in parallel with planning/design meetings in the coming week.

Status update:

  • Olja created the next iteration of the information architecture draft (Google doc) for the DPE docs, and we had a working session to get it into a review-able state so that other stakeholders can provide feedback. Olja is coordinating the discussions about this content structure with her team/eng leads and comments are already happening on the doc; Tricia will support as needed.
  • We agreed to focus on finalizing the structure and content groupings now, and worry about presentation and formatting on-wiki after everyone is aligned on the underlying IA.
  • We agreed to abandon the binary separation of "data consumer" and "data producer" docs, since a non-trivial subset of the content is for power users and would be hard to fit into that rigid distinction.

Status update: work started on T350914. More details there!

To help with making sense of this large corpus of docs (with many confusing page redirects), and to power future landing page design, I've created some categories on Wikitech and started categorizing the pages. So far, I've created the following categories and started applying them to pages that were obvious candidates based on their current/legacy page structure:

Category:Data_domains

This is by no means a complete set of categories; it's just a start to help move work forward on this task, since we've sort of lost traction due to the overwhelming amount of content. The idea is that having meaningful groupings of pages based on category metadata will help make it easier to move away from the legacy and outdated page structure that is no longer serving us.

Messy details and ongoing work are tracked in this Google spreadsheet (https://docs.google.com/spreadsheets/d/1g_m-W__Hrg2zyxLhY8y2WbwHh0ZS3cMAdUrDBoskcks/edit?usp=sharing), but I'll continue to update this and related phab tasks with meaningful status updates so no one has to try to interpret that sheet :-)

it occurs to me that we should broaden "edits data" to "contribution data", since some kinds of contribution we would want to collect there aren't edits. otherwise looks spiffy!

Good point! I added a parent category for Contribution data and nested Edits data under that.

Since work on this project has been stalled for a bit due to its large scope and complexity, today I built out a set of demo wiki pages in my sandbox, to try to help us feel out what our WIP content framework could look like when more fully built-out on wiki: https://wikitech.wikimedia.org/wiki/User:Triciaburmeister/Sandbox/Data_platform

My hope is that, if this feels workable, it will provide an easier set of buckets for us to work on moving existing content into, or at least a more concrete artifact for the teams to talk about and iterate on.

Status update:

  • Timeline established (Google doc) and phab subtasks filed under T350914 for landing page review of the 4 major sections of the new Data platform docs portal page (the main landing page)
  • Working group of subject matter experts created and we now have a dedicated Slack channel for coordinating this work
  • Draft of the "Discover data" landing page (1 of 4) is shared and undergoing review with a deadline of April 12.
  • Draft of the "Analyze data" landing page (2 of 4) is in progress with a deadline for sharing the draft for feedback on April 15

As part of replacing the outdated Data Engineering docs on Wikitech, I have made the following revisions after consultation with the DPE team/stakeholders:

Pages deleted:

Pages redirected and not moved:

Pages moved and redirected:

Still to do / requires discussion:

Decision from 12 June working session (see https://wikitech.wikimedia.org/wiki/Data_Engineering/TOC and this content analysis of page structure for reference):

  • Move the following to be subpages of Data_Platform, and add Category:Data_Platform:
    • Pages under Data_Engineering/, with the exception of /Systems
    • Pages under Analytics/, with the exception of /Systems and /Cluster
  • Move the following to be subpages of Data_Platform/Systems, and add Category:Data_Platform_systems:
    • Pages under Data_Engineering/Systems
    • Pages under Analytics/Systems and Analytics/Cluster
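The move rules above can be sketched as a small prefix-mapping function, which may help when checking a list of page titles before a bulk move. This is only an illustration under stated assumptions, not the tooling actually used for the migration: the `RULES` table, the `plan_move` helper, and the example page titles are hypothetical, and the choice to keep a `Cluster` path segment under `Data_Platform/Systems` is a guess the decision text does not settle. Root pages, `Analytics/Archive`, and `Analytics/Team` are treated as outside these rules.

```python
from typing import Optional, Tuple

# Hypothetical rule table: (old prefix, new prefix, category to add).
# Most specific prefixes come first so the /Systems and /Cluster exceptions win.
RULES = [
    ("Data_Engineering/Systems", "Data_Platform/Systems", "Data_Platform_systems"),
    ("Analytics/Systems", "Data_Platform/Systems", "Data_Platform_systems"),
    # Assumption: the Cluster segment is kept below Systems.
    ("Analytics/Cluster", "Data_Platform/Systems/Cluster", "Data_Platform_systems"),
    ("Data_Engineering", "Data_Platform", "Data_Platform"),
    ("Analytics", "Data_Platform", "Data_Platform"),
]

# Not covered by the rules above: the archive stays in place, and team
# pages are archived separately.
EXCLUDED = ("Analytics/Archive", "Analytics/Team")


def _under(title: str, prefix: str) -> bool:
    """True if title is the prefix page itself or one of its subpages."""
    return title == prefix or title.startswith(prefix + "/")


def plan_move(title: str) -> Optional[Tuple[str, str]]:
    """Return (new_title, category) for a page, or None if no rule applies."""
    if "/" not in title:
        return None  # root pages such as "Analytics" are handled by hand
    if any(_under(title, p) for p in EXCLUDED):
        return None
    for old, new, category in RULES:
        if _under(title, old):
            return new + title[len(old):], category
    return None


if __name__ == "__main__":
    # Hypothetical page titles, for illustration only.
    for page in ["Data_Engineering/Systems/Example", "Analytics/Example",
                 "Analytics/Archive/Example"]:
        print(page, "->", plan_move(page))
```

Matching on `prefix + "/"` rather than a bare `startswith` avoids treating a page such as `Analytics/Systemsfoo` as if it lived under `Analytics/Systems`.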

This means that there will be the following page structure for all DPE and Data Platform docs on Wikitech:

  • Data_Platform_Engineering (team page / portal)
  • Data_Platform (landing page)
    • Data_Platform/Systems [and subpages]
    • Data_Platform/Data_Lake [and subpages]
    • Data_Platform/AQS [and subpages]
    • Data_Platform/Evaluations [and subpages]

Analytics/Archive will remain as-is. Analytics/Team will move to Analytics/Archive, and its pages will be marked as {{historical}}.

Great! Thanks. Are you planning to leave redirects when moving the pages, or should we minimise the number of redirects and fix any broken links as we discover them?

I was planning to leave redirects, but I'm open to not doing that if you think it's a better long-term solution for the overall health of the docs and wikitech!

I would definitely say to leave redirects, unless there's a strong reason not to. Moving without leaving a redirect is very unusual.

Epic status update:

  • I moved 456 pages from /Analytics and /Data_Engineering to their new homes as outlined above in https://phabricator.wikimedia.org/T350911#9886785.
  • I cleaned up the many redirects that were making these docs confusing to navigate (see the legacy state here).
  • Updated links on all the new Data Platform landing pages and some other key content pages, but otherwise the redirects will handle getting people to the right place.

Thanks to this project, today I achieved my one thousandth edit on Wikitech :-)

Continued cleanup after the big doc migration:

  • Created a new landing page for AQS docs, to accommodate on-wiki user docs located at Data_Platform/AQS, separate from the maintainer docs at Data_Platform/Systems/AQS.
  • Updated the content of the Analytics page. Its only remaining subpages are under /Archive.
  • Updated navigation template to make use of the more consolidated page structure, linking to key landing pages like /Data_Lake instead of sections of the higher-level landing pages like /Discover_data.

Ongoing/still to-do:

  • Ongoing work to add categories to pages so that page structure is no longer the only way that docs about the same topic can be discovered and/or collocated.
  • Documenting the documentation structure and maintenance process: filed some phab tasks for remaining work, and wrote down some key doc maintenance tips, but still some more work to do on this before marking this task as complete.

Amazing! @TBurmeister thank you so much for making all of this happen. I'm adding the epic tag because this has most definitely been an epic amount of work 😁