
Fix argo.wikimedia.swiss database connection logic (cannot generate any report for dewiki - works for itwiki and few more wikis)
Closed, Resolved | Public | 32 Estimated Story Points

Description

Preamble

Ilario Valdelli reported that Argo WMCH jobs are no longer generated.

https://argo.wikimedia.swiss/

Problem

It seems this tool was designed around an undocumented feature of the Wiki Replicas: executing generic MySQL queries against arbitrary databases (e.g. dewiki_p, itwiki_p) over a single database connection to the itwiki cluster. This is no longer possible, since that cluster no longer hosts all the wikis relevant to us.

Precisely, itwiki_p is available via itwiki.analytics.db.svc.wikimedia.cloud, dewiki_p via dewiki.analytics.db.svc.wikimedia.cloud, etc. AFAIK the way other wikis are shared across the current 6 physical clusters is not reliable, so relying on s2.analytics.db.svc.wikimedia.cloud should not be recommended either.

https://lists.wikimedia.org/hyperkitty/list/cloud@lists.wikimedia.org/thread/YG6VLDX23HP6QEVERYS7HYUPUNPAQW2U/

https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#New_host_names
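
The per-wiki naming described above can be sketched as a small helper. This is only an illustration in Python (the language Argo itself uses is not stated here), and the function name `replica_host` is hypothetical:

```python
def replica_host(db_name: str) -> str:
    """Return the per-wiki analytics host for a replica database name.

    Example input: 'dewiki_p'. The '<wiki>.analytics.db.svc.wikimedia.cloud'
    pattern is the one quoted in this task; the helper itself is hypothetical.
    """
    suffix = "_p"
    if not db_name.endswith(suffix) or len(db_name) == len(suffix):
        raise ValueError(f"expected a name like 'dewiki_p', got {db_name!r}")
    wiki = db_name[: -len(suffix)]
    return f"{wiki}.analytics.db.svc.wikimedia.cloud"


if __name__ == "__main__":
    # Each database resolves to its own host instead of a shared one.
    for db in ("itwiki_p", "dewiki_p"):
        print(db, "->", replica_host(db))
```

Deriving the host from the database name avoids depending on which wikis happen to live on the same physical cluster.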

Plan

  • 2022-07-25 tool exploration (https://meta.wikimedia.org/wiki/Wikimedia_CH/Project/Argo_Wikimetrics)
    • 2022-07-25 server access
    • 2022-07-25 documenting application startup
    • 2022-07-25 documenting database connections to Wikitech
  • 2022-07-25 fix bastion access after its SSH fingerprint changed
  • 2022-07-25 fix deprecated usage of <project>.analytics.db.svc.eqiad.wmflabs and adopt <project>.analytics.db.svc.wikimedia.cloud
  • 2022-09-23 understand the best fix strategy (rewrite database connection logic)
  • import source code from Synapta's GitHub to Wikimedia GitLab

Proposed Solutions

  1. Manual patch (2 points): instantiate a lot of SSH tunnels from our server to the Wiki Replicas, one for each required DB connection
  2. Semi-manual patch (4 points): same as above but with a script generating the tunnels
  3. Rewrite (32 points): rewrite the tool so that it does not rely on a single connection for all databases, but instantiates the right connection for the right database (this requires a handover of the project)
  4. Improve Wikimedia Cloud for the benefit of all its users (hoping that our workaround will not be needed anymore): T318191: Evaluate opening the readonly Wiki Replicas to the WAN (since we already have user authentication)

I presented the first 3 solutions to Ilario Valdelli and he opted for solution n. 3 (probably because he loves things done right).
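
To illustrate what "the right connection for the right database" could look like, here is a minimal sketch, not the actual Argo code. The class `ReplicaConnections`, the `connect` callback and the endpoint values are all hypothetical; a real implementation would pass a function that calls its MySQL driver.

```python
from typing import Any, Callable, Dict, Tuple

Endpoint = Tuple[str, int]  # (host, port) where one database is reachable


class ReplicaConnections:
    """Open at most one connection per database, each on its own endpoint."""

    def __init__(self, endpoints: Dict[str, Endpoint],
                 connect: Callable[[str, int, str], Any]) -> None:
        self._endpoints = dict(endpoints)
        self._connect = connect            # hypothetical driver wrapper
        self._open: Dict[str, Any] = {}    # database name -> open connection

    def get(self, db_name: str) -> Any:
        if db_name not in self._endpoints:
            raise KeyError(f"no replica endpoint configured for {db_name!r}")
        if db_name not in self._open:
            host, port = self._endpoints[db_name]
            self._open[db_name] = self._connect(host, port, db_name)
        return self._open[db_name]


def fake_connect(host: str, port: int, db_name: str) -> str:
    # Stand-in for a real driver call, so the sketch runs without MySQL.
    return f"connection to {db_name} at {host}:{port}"


if __name__ == "__main__":
    # Assumed endpoints: local tunnel ports, one per database.
    pool = ReplicaConnections(
        {"itwiki_p": ("127.0.0.1", 13306), "dewiki_p": ("127.0.0.1", 13307)},
        fake_connect,
    )
    print(pool.get("dewiki_p"))
```

With this shape, a query for dewiki_p can never be sent over the itwiki connection, and an unconfigured database fails early instead of surfacing as a server-side error.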

Event Timeline

valerio.bozzolan created this task.

OK, one of the problems was that the SSH service called autossh-wmflabs needed to be run manually once in order to confirm the RSA fingerprint change.

https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/dev.toolforge.org

This part is fixed now.

OK, the problem is now identified at least.

Maybe because of a recent change in the Wiki Replicas, the tool cannot access other databases anymore. Example error in the log:

ERROR 1049 (42000): Unknown database 'dewiki_p'

This makes sense, since the server is hardcoded to:

itwiki.analytics.db.svc.eqiad.wmflabs:3306
valerio.bozzolan renamed this task from Investigate why argo.wikimedia.swiss does not work anymore to Fix argo.wikimedia.swiss database connection logic (cannot generate any report except for itwiki). Jul 25 2022, 4:22 PM
valerio.bozzolan updated the task description.
valerio.bozzolan renamed this task from Fix argo.wikimedia.swiss database connection logic (cannot generate any report except for itwiki) to Fix argo.wikimedia.swiss database connection logic (cannot generate any report for dewiki - works for itwiki and few more wikis). Jul 25 2022, 4:38 PM

I will start reserving the local ports 13306..13311 on the wmch-argo server for this.

{F35528511, size=full}
https://members.wikimedia.ch/wiki/Infrastructure/Argo
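
As a rough sketch of how reserved local ports could be paired with per-wiki tunnels, the snippet below builds `ssh -N -L` commands. Everything beyond the port range and host pattern quoted in this task is an assumption: the wiki list, the order in which ports are assigned, the bastion placeholder `user@bastion.example.org`, and the function names. The real service (autossh-wmflabs) may be configured differently.

```python
import shlex
from typing import Dict, List

REPLICA_PORT = 3306  # remote MySQL port, as in the hardcoded host above


def assign_local_ports(wikis: List[str], first_port: int = 13306) -> Dict[str, int]:
    """Give each wiki its own local port, counting up from first_port."""
    return {wiki: first_port + offset for offset, wiki in enumerate(wikis)}


def tunnel_command(wiki: str, local_port: int, bastion: str) -> List[str]:
    """Build an ssh command forwarding local_port to that wiki's replica host."""
    remote = f"{wiki}.analytics.db.svc.wikimedia.cloud"
    # -N: do not run a remote command; -L: local port forwarding.
    return ["ssh", "-N", "-L", f"{local_port}:{remote}:{REPLICA_PORT}", bastion]


if __name__ == "__main__":
    # Hypothetical wiki list and bastion; only printed, never executed.
    ports = assign_local_ports(["itwiki", "dewiki"])
    for wiki, port in ports.items():
        print(shlex.join(tunnel_command(wiki, port, "user@bastion.example.org")))
```

The tool would then connect to 127.0.0.1 on the port assigned to the wiki it needs, rather than to one shared replica host.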

I submitted a scheduled job 4 hours ago and it is still executing without any explosion. Big success.

valerio.bozzolan set the point value for this task to 32.
valerio.bozzolan set Final Story Points to 12.