
Complete content audit and task mapping for Data Platform docs
Closed, Resolved, Public

Description

Complete a content audit (lightweight survey) of Data Engineering docs:
https://wikitech.wikimedia.org/wiki/Data_Engineering/TOC

  • Identify major docs that are relevant for key personas and critical user journeys
  • Start to align existing docs with a new task-focused information architecture for the Data Platform documentation
  • Start tracking key tasks and user goals that are missing documentation (not feasible due to lack of resourcing)

Event Timeline

TBurmeister moved this task from Next to In progress on the Tech-Docs-Team board.

Highlights of findings from the content audit so far (full details in the Content Strategy Google Doc):

[Image: Snapshot of analysis from content audit]

Main opportunities to improve content structure and findability

Identify sub-collections

[Work in progress, working on this with Olja and others to create a draft IA.]

Add navigation menus

Currently, there's only a nav template for the team docs (which are moving to MediaWiki). The only way to navigate the large corpus of other pages is through the TOC page, which has separate sections based on the legacy team structure, /Analytics and /Data_Engineering.

Each major sub-collection of pages within this large doc collection should have its own navigation menu. The information architecture we create will support navigation within and across those sub-collections.

Simplify and standardize subpage structure

Some sections of this doc collection have a deep subpage structure with many pages, while others are shallow (no subpages). This means that the page structure can only be used for navigation in the sections of the collection that have structure. For example, from a page like this one, you can use the breadcrumbs to navigate up and down the tree in the /Analytics/Data_Lake section and its subpages.

Other than links within individual pages, there's no easy way to navigate to or explore the shallow parts of the information space. One must either search, or rely on the TOC page – both of which require pre-existing knowledge of the infrastructure and its terminology in order to be used effectively.

Add category metadata

Currently, most semantic context is embedded in the page hierarchy, e.g. /Analytics/Data_Lake/Edits, /Data Engineering/Systems/Druid.
https://wikitech.wikimedia.org/wiki/Category:Data_stream seems useful, but it's unclear whether it is complete and maintained. Identifying relevant use cases for category metadata in these collections is still an open task. Categories will be considered as a supplemental way to navigate across these pages for use cases that the revised IA and new navigation menus don't cover.

TBurmeister changed the task status from Open to In Progress. Dec 7 2023, 4:22 PM

Status update:
Content audit work continues, currently focused on placing existing pages into the new collections that we've identified and outlined on the draft landing page https://wikitech.wikimedia.org/wiki/Data_Platform_Engineering.

I've been adding categories for some of the topics that have the most content sprawl across wikis, to try to make it easier to work with all the pages and hopefully streamline them. Categories added so far:

Status update: Did more work to add categories as page metadata so we can more freely design landing pages and other navigation that isn't tied to page/subpage structure. For example, some of the pages within one category might go in different sections of the landing page navigation, like a page on how to use Spark vs. a page on how to administer it. Also hoping that having these categories can make it easier to divide and conquer when it comes to reviewing and archiving outdated content.

Categories created or revised:

So far, I've categorized ~160 of roughly 285 pages (my count is approximate because of page redirects, so the actual number may be slightly higher or lower). Not all the categories are represented as metadata; some of them just align current pages with their thematic home in the new overall site structure, as outlined on https://wikitech.wikimedia.org/wiki/Data_Platform_Engineering.

I wasn't able to use the categories as I had hoped, because it turns out the DynamicPageList extension isn't installed on Wikitech. Instead, I put together a revised content outline that simply links to the categories and other key landing pages:

https://wikitech.wikimedia.org/wiki/User:Triciaburmeister/Sandbox/Data_platform

Marking this as resolved, since the work to add category metadata is done. The main deliverable for the "Simplify and standardize subpage structure" section of my original analysis is the landing pages project, now tracked in T350914.

TBurmeister updated the task description.
TBurmeister moved this task from In progress to Done on the Tech-Docs-Team board.