[Scraper] Wmcloud Migration
Open, In Progress, High, Public, 2 Estimated Story Points

Description

  • Consider a custom URL/Domain name again
  • Implement code changes
  • Take down the Toolforge instance
  • Final DB migration: copy from Toolforge to WMCloud
  • Enable the scheduler in settings.ini on WMCloud

Acceptance Criteria
All wikibase-metadata data is now present on WMCloud, and all traffic is going there (whether through routine manual database migrations, a proxy, or a cron script).

Event Timeline

roti_WMDE renamed this task from [Scraper] Finallize wmcloud migration to [Scraper] wmcloud migration. (Sep 15 2025, 9:10 AM)
roti_WMDE claimed this task.
roti_WMDE renamed this task from [Scraper] wmcloud migration to [Scraper] Wmcloud Migration. (Sep 16 2025, 6:09 AM)
roti_WMDE updated the task description.
Leif_WMDE changed the task status from Open to In Progress. (Oct 9 2025, 10:04 AM)

My recommendation for completing this work would be these two tasks in this order:

  1. Create a script or configuration on the ToolForge instance that forwards all traffic arriving there to the WMCloud instance URL, and leave the ToolForge instance running indefinitely, or at least for 6 months to 1 year after the release of the new Wikibase version that incorporates the WMCloud URL.
  2. Migrate any data added since the last merge from the ToolForge database to the WMCloud database.
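For step 2, a catch-up copy could be as small as "copy every row whose id is higher than the highest id already on WMCloud". The sketch below shows that idea using Python's built-in sqlite3 purely for illustration; the database engine, the table names, and the assumption that each table has an integer `id` primary key are all hypothetical, and the real databases may differ. It only picks up new rows, so updates or deletions of existing rows would not be carried over.

```python
import sqlite3


def copy_new_rows(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """Copy rows from src whose id is above dst's current max id; return the count.

    Assumes (hypothetically) that `table` exists in both databases with the same
    columns and an integer `id` primary key. Table names cannot be bound as SQL
    parameters, so only pass known, trusted table names.
    """
    # Highest id already present on the destination (0 if the table is empty).
    (last_id,) = dst.execute(f'SELECT COALESCE(MAX(id), 0) FROM "{table}"').fetchone()

    # Fetch only the rows the destination has not seen yet, in id order.
    cur = src.execute(f'SELECT * FROM "{table}" WHERE id > ? ORDER BY id', (last_id,))
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()

    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # The `with` block commits on success and rolls back if any insert fails.
    with dst:
        dst.executemany(f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', rows)
    return len(rows)
```

Because it keys off the destination's current maximum id, running it again after a successful copy simply copies nothing.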

Re. #1 implementation: from brief research, the most minimal configuration might be to use the lighttpd backend available on ToolForge instances, but there are a few ways to accomplish this. Helpful refs: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd and https://redmine.lighttpd.net/projects/lighttpd/wiki/TutorialConfiguration
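As one of those "few ways", here is a minimal sketch of a standard WSGI app that answers every request with a permanent redirect to the same path on the new host. Assumptions: `TARGET_BASE` is a hypothetical placeholder for the real WMCloud URL; Toolforge's Python web service, as we understand it, serves a WSGI callable named `app` from `$HOME/www/python/src/app.py` (worth confirming on the Toolforge Python help page); and a redirect, not a reverse proxy, is acceptable.

```python
from urllib.parse import quote

# HYPOTHETICAL placeholder: replace with the real WMCloud instance URL.
TARGET_BASE = "https://example-instance.wmcloud.org"


def build_location(environ: dict) -> str:
    """Map the incoming path and query string onto TARGET_BASE."""
    # WSGI hands over PATH_INFO as a latin-1-decoded str; re-encode it first so
    # non-ASCII path bytes are percent-encoded exactly once.
    path = quote((environ.get("PATH_INFO") or "/").encode("latin-1"))
    query = environ.get("QUERY_STRING", "")
    return TARGET_BASE + path + ("?" + query if query else "")


def app(environ, start_response):
    """WSGI callable that redirects every request to the WMCloud host."""
    location = build_location(environ)
    # 308 (rather than 301) tells clients to repeat the same method and body,
    # so POST requests are not silently turned into GETs.
    start_response(
        "308 Permanent Redirect",
        [("Location", location), ("Content-Type", "text/plain; charset=utf-8")],
    )
    return [f"This service has moved to {location}\n".encode("utf-8")]
```

A redirect only helps clients that follow redirects; any client that does not would need a true reverse proxy instead.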

It looks like I don't actually have access to that ToolForge instance; I thought I did. @RickiJay-WMDE, I assume you do; if so, could you add me to the project? My shell username is lorenjohnson.

Either way, I have requested membership in the ToolForge project directly here, so if you don't have access, hopefully this will clear things up: https://toolsadmin.wikimedia.org/tools/membership/status/2085

Leif_WMDE triaged this task as High priority.
Leif_WMDE added a subscriber: RickiJay-WMDE.

The WMCloud instance had again run out of disk space. I found the root cause, though: the Docker images we build to package the code were never being pruned, so over 10 GB of old images had accumulated. When Docker rebuilds an image that has changed, it keeps the previous versions (as untagged, "dangling" images) until you explicitly remove them. The common manual fix is simply to run sudo docker image prune routinely. There may be a setting in that server's Docker configuration to make it prune automatically, or one could add a small script to a cron schedule to do it. I'm not sure which is best or standard, but for now we're out of the woods on disk space; going forward, just run that command regularly.
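For the cron option, the simplest form is a root crontab entry that runs docker image prune --force directly. A slightly more careful Python sketch is below. It uses only real `docker image prune` flags (`--force` to skip the confirmation prompt, `--filter until=...` to keep recently built images); the 24-hour default and the script's file name are our own assumptions, not anything already on the server.

```python
import shutil
import subprocess
import sys
from typing import Optional


def prune_command(until: Optional[str] = None) -> list:
    """Build a `docker image prune` invocation.

    Without --all, Docker removes only dangling images, i.e. the untagged old
    versions left behind when an image is rebuilt under the same tag.
    """
    cmd = ["docker", "image", "prune", "--force"]  # --force skips the y/N prompt
    if until:
        # Only prune images created before the given age, e.g. "24h".
        cmd += ["--filter", f"until={until}"]
    return cmd


def main(until: Optional[str] = "24h") -> int:
    """Run the prune and echo Docker's output so cron can log or mail it."""
    if shutil.which("docker") is None:
        print("docker not found on PATH", file=sys.stderr)
        return 1
    result = subprocess.run(prune_command(until), capture_output=True, text=True)
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode
```

Saved as, say, /opt/maintenance/prune_images.py (a hypothetical path), it could run weekly from root's crontab (sudo crontab -e; no sudo is needed inside the entry) with: 0 4 * * 0 cd /opt/maintenance && python3 -c "import sys, prune_images; sys.exit(prune_images.main())"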

However, even after clearing up the disk space issue and rebooting everything, I was not able to get things working again. The backend is the problem and the data isn't being pulled: according to the logs it comes up fine, but it isn't listening on port 8000. Maybe it's stuck on some migrations? @RickiJay-WMDE

Yay! We're back up! After clearing up the disk space issue, I was able to manually run all the migrations, and that seems to have brought things back to life.
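Since the logs reported the backend as up while nothing answered on port 8000, a direct port check after future restarts could save some guesswork. A minimal standard-library sketch; the host and port in the usage line below are assumptions based on the comment above (backend on port 8000 on the same machine). Note that it only confirms something accepts TCP connections, not that the app behind it is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 8000, timeout=120) after a reboot would return False if the backend is still stuck (for instance on pending migrations).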