During various Add-Link-Structured-Task maintenance changes, wikis.txt sometimes gets out of sync: either a wiki is missing from it, or it lists a wiki that has no published datasets. This breaks the linkrecommendation-internal-load-datasets k8s pod, which relies on wikis.txt to determine which wikis it should work on.
Since an out-of-sync wikis.txt has been the cause of several issues with Add a link, we should probably create a more sustainable mechanism for keeping track of the list of wikis. There are two options:
- Keep using wikis.txt as the source of truth, but improve operating procedures to ensure its content is correct. This can be done by adding disabled_wikis.txt as a docs-only counterpart and introducing a periodic check that verifies that wikis.txt + disabled_wikis.txt together match all wikis in the dataset directory.
- Remove wikis.txt and instead rely on the list of existing directories to determine which datasets to load (a Python equivalent of what I did with lftp at P50941)
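
To make the two options more concrete, here is a minimal Python sketch. It is not the actual implementation: it assumes the dataset directory is reachable as a local path with one subdirectory per wiki, and that wikis.txt and disabled_wikis.txt contain one wiki ID per line. The function names (`read_wiki_list`, `wikis_with_datasets`, `check_sync`) are hypothetical.

```python
from pathlib import Path


def read_wiki_list(path):
    """Return the set of wiki IDs in a list file (assumed: one ID per line)."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def wikis_with_datasets(dataset_root):
    """Option 2: derive the wiki list from the directories that actually exist.

    Assumes every subdirectory of dataset_root is named after a wiki;
    plain files (such as wikis.txt itself) are ignored.
    """
    return {entry.name for entry in Path(dataset_root).iterdir() if entry.is_dir()}


def check_sync(dataset_root, wikis_file, disabled_file):
    """Option 1: compare wikis.txt + disabled_wikis.txt with the dataset directory."""
    listed = read_wiki_list(wikis_file) | read_wiki_list(disabled_file)
    published = wikis_with_datasets(dataset_root)
    return {
        # Datasets exist, but the wiki is in neither list.
        "missing_from_lists": sorted(published - listed),
        # The wiki is listed, but has no dataset directory.
        "listed_without_dataset": sorted(listed - published),
    }
```

A periodic job could call `check_sync(...)` and alert whenever either list in the result is non-empty. With option 2, the loader would call `wikis_with_datasets(...)` directly instead of reading wikis.txt.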
Tasks in which an out-of-sync wikis.txt was the underlying issue should be added as subtasks to this one.