
Redesign Data Platform docs on Wikitech
Open, In Progress, Medium, Public

Description

This task covers technical writer support for the conversion of Data Engineering docs on Wikitech from a team-focused collection to a technical-product-focused collection for data platform(s).

This work depends partially on the migration of team-focused documentation from Wikitech to MediaWiki. The DPE teams will do that work, with tech writer support in the form of copyediting and advising.

Technical documentation work will likely include the following (to be scoped and defined more precisely over time):

  • Content audit of DPE docs on Wikitech
  • Collaborative design of a new, task-focused structure for the data platform documentation
  • Implementation of new documentation structure
  • Creation of one or more conceptual overviews for different data platform components

Event Timeline

TBurmeister changed the task status from Open to In Progress. Nov 9 2023, 7:29 PM
TBurmeister triaged this task as Medium priority.
TBurmeister added a project: Goal.
TBurmeister moved this task from In progress to Active projects on the Tech-Docs-Team board.
TBurmeister changed the status of subtask T350910: Copyedits for Data Platform team docs from Open to In Progress.

Status update:

  • Finished and shared a draft of the content strategy (Google Doc) highlighting my findings so far from the DPE docs content audit; identified design goals.
  • Started co-designing the information architecture for DPE docs on Wikitech; the main goals right now are:
    • Get aligned on the plan for how the docs on Wikitech will be evolving
    • Identify a meaningful set of subcollections that can improve the user experience for the two major audiences: data producers and data consumers.
  • Olja and other DPE team leads will meet in the coming days to get aligned on the overall docs strategy for Wikitech and for team docs on MediaWiki
  • Content audit will continue in parallel with planning/design meetings in the coming week.

Status update:

  • Olja created the next iteration of the information architecture draft (Google doc) for the DPE docs, and we had a working session to get it into a review-able state so that other stakeholders can provide feedback. Olja is coordinating the discussions about this content structure with her team/eng leads and comments are already happening on the doc; Tricia will support as needed.
  • We agreed to focus on finalizing the structure and content groupings now, and worry about presentation and formatting on-wiki after everyone is aligned on the underlying IA.
  • We agreed to abandon the binary separation of "data consumer" and "data producer" docs, since a non-trivial subset of the content is for power users and would be hard to fit into that rigid distinction.

Status update: work started on T350914. More details there!

To help with making sense of this large corpus of docs (with many confusing page redirects), and to power future landing page design, I've created some categories on Wikitech and started categorizing the pages. So far, I've created the following categories and started applying them to pages that were obvious candidates based on their current/legacy page structure:

Category:Data_domains

This is by no means a complete set of categories; it's just a start to help move work forward on this task, since we've sort of lost traction due to the overwhelming amount of content. The idea is that having meaningful groupings of pages based on category metadata will help make it easier to move away from the legacy and outdated page structure that is no longer serving us.

Messy details and ongoing work are tracked in this Google spreadsheet (https://docs.google.com/spreadsheets/d/1g_m-W__Hrg2zyxLhY8y2WbwHh0ZS3cMAdUrDBoskcks/edit?usp=sharing), but I'll continue to update this task and related Phab tasks with meaningful status updates so no one has to try to interpret that sheet :-)

It occurs to me that we should broaden "edits data" to "contribution data", since some kinds of contributions we would want to collect there aren't edits. Otherwise looks spiffy!

Good point! I added a parent category for Contribution data and nested Edits data under that.

Since work on this project has been stalled for a bit due to its large scope and complexity, today I built out a set of demo wiki pages in my sandbox, to try to help us feel out what our WIP content framework could look like when more fully built-out on wiki: https://wikitech.wikimedia.org/wiki/User:Triciaburmeister/Sandbox/Data_platform

My hope is that, if this feels workable, it will provide an easier set of buckets for us to work on moving existing content into, or at least a more concrete artifact for the teams to talk about and iterate on.

Status update:

  • Timeline established (Google doc) and phab subtasks filed under T350914 for landing page review of the 4 major sections of the new Data platform docs portal page (the main landing page)
  • Working group of subject matter experts created and we now have a dedicated Slack channel for coordinating this work
  • Draft of the "Discover data" landing page (1 of 4) is shared and undergoing review with a deadline of April 12.
  • Draft of the "Analyze data" landing page (2 of 4) is in progress with a deadline for sharing the draft for feedback on April 15

As part of replacing the outdated Data Engineering docs on Wikitech, I have made the following revisions after consultation with the DPE team/stakeholders:

Pages deleted:

Pages redirected and not moved:

Pages moved and redirected:

Still to do / requires discussion:

Decision from 12 June working session (see https://wikitech.wikimedia.org/wiki/Data_Engineering/TOC and this content analysis of page structure for reference):

  • Move the following to be subpages of Data_Platform, and add Category:Data_Platform:
    • Pages under Data_Engineering/, with the exception of /Systems
    • Pages under Analytics/, with the exception of /Systems and /Cluster
  • Move the following to be subpages of Data_Platform/Systems, and add Category:Data_Platform_systems
    • Pages under Data_Engineering/Systems
    • Pages under Analytics/Systems and Analytics/Cluster

This means that there will be the following page structure for all DPE and Data Platform docs on Wikitech:

  • Data_Platform_Engineering (team page / portal)
  • Data_Platform (landing page)
    • Data_Platform/Systems [and subpages]
    • Data_Platform/Data_Lake [and subpages]
    • Data_Platform/AQS [and subpages]
    • Data_Platform/Evaluations [and subpages]

Analytics/Archive will remain as-is. Analytics/Team will move to Analytics/Archive, and its pages will be marked as {{historical}}.
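The move rules from the 12 June decision, including the Analytics/Archive and Analytics/Team exceptions, can be read as an ordered list of prefix rewrites where the most specific prefix wins. The following is a minimal, hypothetical Python sketch of that reading, useful for checking a list of legacy titles against the plan before moving anything. Several details are assumptions not settled above: Analytics/Team lands at Analytics/Archive/Team, Analytics/Cluster pages keep a Cluster/ segment under Data_Platform/Systems, the bare top-level Data_Engineering and Analytics pages are not covered, and title collisions (for example, Data_Engineering/Systems and Analytics/Systems both mapping to Data_Platform/Systems) are not detected.

```python
from typing import NamedTuple, Optional


class Move(NamedTuple):
    new_title: str
    marker: str  # category to add, or template to apply


# Ordered (old_prefix, new_prefix, marker) rules; the first match wins,
# so more specific prefixes must come before general ones.
# new_prefix None means "leave the page where it is".
RULES: list[tuple[str, Optional[str], Optional[str]]] = [
    ("Analytics/Archive", None, None),
    # Assumption: target name for the Team pages inside the archive.
    ("Analytics/Team", "Analytics/Archive/Team", "{{historical}}"),
    ("Data_Engineering/Systems", "Data_Platform/Systems", "Category:Data_Platform_systems"),
    ("Analytics/Systems", "Data_Platform/Systems", "Category:Data_Platform_systems"),
    # Assumption: Cluster pages keep their own segment to avoid collisions.
    ("Analytics/Cluster", "Data_Platform/Systems/Cluster", "Category:Data_Platform_systems"),
    ("Data_Engineering", "Data_Platform", "Category:Data_Platform"),
    ("Analytics", "Data_Platform", "Category:Data_Platform"),
]


def matches(title: str, prefix: str) -> bool:
    """True for subpages of prefix, or for prefix itself if it is already a subpage."""
    return title.startswith(prefix + "/") or (title == prefix and "/" in prefix)


def plan_move(title: str) -> Optional[Move]:
    """Return the planned Move for a legacy title, or None if it stays put."""
    for old, new, marker in RULES:
        if matches(title, old):
            if new is None:
                return None
            # Swap the matched prefix and keep the rest of the subpage path.
            return Move(new + title[len(old):], marker)
    return None
```

For example (page titles here are hypothetical), `plan_move("Analytics/Systems/Superset")` returns `Move("Data_Platform/Systems/Superset", "Category:Data_Platform_systems")`, while `plan_move("Analytics/Archive/Old_notes")` returns `None`. Matching on `prefix + "/"` rather than a bare string prefix keeps a title like `Analytics/Systems_overview` from being treated as a subpage of `Analytics/Systems`.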