
Ensure wikis.txt is accurate or replace it with an alternative
Open, Needs Triage, Public

Description

During various Add-Link-Structured-Task maintenance changes, wikis.txt sometimes gets out of sync (it either misses a wiki or it contains a wiki which doesn't have published datasets). This breaks the linkrecommendation-internal-load-datasets k8s pod, which relies on wikis.txt to determine which wikis it should work on.

Since wikis.txt being out of sync was the cause of several issues with Add a link, we should probably create a more sustainable mechanism for keeping track of the list of wikis. There are two options:

  1. Keep using wikis.txt as the source of truth, but improve operating procedures to ensure its content is correct. This can be done by including disabled_wikis.txt as a docs-only counterpart and introducing a periodic check that verifies that wikis.txt + disabled_wikis.txt together match all wikis in the dataset directory.
  2. Remove wikis.txt and instead rely on the list of existing directories to determine which wikis have datasets (a Python equivalent of what I did with lftp at P50941)
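
To make the two options concrete, here is a minimal Python sketch. It is not the existing implementation: it assumes the dataset directory is reachable as a local path with one subdirectory per wiki, and that wikis.txt and disabled_wikis.txt list one wiki per line. The function names (`wikis_from_directories`, `read_wiki_list`, `check_wiki_lists`) are hypothetical. If the datasets are only reachable remotely, the directory listing would have to be replaced with whatever listing mechanism the server offers.

```python
from pathlib import Path


def wikis_from_directories(dataset_root):
    """Option 2: derive the wiki list from the subdirectories of dataset_root.

    Assumption: each published dataset lives in a directory named after its wiki.
    """
    return {p.name for p in Path(dataset_root).iterdir() if p.is_dir()}


def read_wiki_list(path):
    """Read a one-wiki-per-line file (assumed format), ignoring blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def check_wiki_lists(dataset_root, wikis_file, disabled_file):
    """Option 1: compare wikis.txt + disabled_wikis.txt with the dataset directories.

    Returns the wikis that have a dataset directory but appear in neither file,
    and the wikis that are listed but have no dataset directory.
    """
    published = wikis_from_directories(dataset_root)
    listed = read_wiki_list(wikis_file) | read_wiki_list(disabled_file)
    return {
        "missing_from_lists": sorted(published - listed),
        "missing_datasets": sorted(listed - published),
    }
```

A periodic job for option (1) could call `check_wiki_lists` and alert when either list in the result is non-empty. For option (2), the pod would call `wikis_from_directories` directly and wikis.txt would no longer be needed.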

Tasks in which wikis.txt out of sync was the underlying issue should be added as subtasks to this one.

Event Timeline

Restricted Application added a subscriber: Aklapper.

I'm inclined toward option (2) for simplicity.

Urbanecm_WMF moved this task from Inbox to Triaged on the Growth-Team board.