Per T166732, the cuc_comment field will be removed, but we're afraid we'll miss the change because we don't have a good way to keep in touch with that work. So this task should remind us to check the schema before every sqoop and, once the change is applied, to update our sqoop query and deploy.
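For illustration, the pre-sqoop schema check could be as small as the sketch below. It is only a sketch: the function name is hypothetical, and the column list is assumed to come from something like a `DESCRIBE cu_changes` run against the replica, which is not shown here.

```python
def sqoop_query_needs_update(columns, legacy_column="cuc_comment"):
    """Return True once the legacy column is no longer in the table schema.

    `columns` is the list of column names currently present on cu_changes
    (assumed to be fetched separately, e.g. from a DESCRIBE query).
    """
    return legacy_column not in set(columns)


# Hypothetical column lists, for illustration only.
before_change = ["cuc_id", "cuc_comment"]
after_change = ["cuc_id", "cuc_comment_id"]  # assumed post-migration name

print(sqoop_query_needs_update(before_change))
print(sqoop_query_needs_update(after_change))
```

Running this before each sqoop would flag the moment the query has to change, instead of relying on someone noticing the upstream task.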
Changing sqoop to join a base table to the comment table in the sqoop SQL query has been tested for mediawiki-history and has led to unacceptable performance (the job takes too long).
We now sqoop the comment and actor tables from the analytics production replicas, which gives acceptable run times, and join the base tables to the actor and comment tables in Spark for mediawiki-history.
I suggest using the same approach for cu_changes: wait for the actor and comment tables to be present, then join to them.
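To make the suggested join concrete, here is a minimal sketch of its shape. It uses Python's built-in sqlite3 as a stand-in for Spark so it runs anywhere; it is not the actual mediawiki-history job. The column names (cuc_actor, cuc_comment_id, actor_name, comment_text) are assumptions about the post-migration schema, and the rows are made up.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with tiny, made-up sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cu_changes (cuc_id INTEGER, cuc_actor INTEGER, cuc_comment_id INTEGER);
        CREATE TABLE actor (actor_id INTEGER, actor_name TEXT);
        CREATE TABLE comment (comment_id INTEGER, comment_text TEXT);
        INSERT INTO cu_changes VALUES (1, 10, 100), (2, 11, 101), (3, 10, 999);
        INSERT INTO actor VALUES (10, 'Alice'), (11, 'Bob');
        INSERT INTO comment VALUES (100, 'fix typo'), (101, 'revert');
        """
    )
    return conn


def join_changes_with_actor_and_comment(conn):
    """Join the base table to actor and comment after import, not during it."""
    query = """
        SELECT c.cuc_id, a.actor_name, cm.comment_text
        FROM cu_changes c
        LEFT JOIN actor a ON a.actor_id = c.cuc_actor
        LEFT JOIN comment cm ON cm.comment_id = c.cuc_comment_id
        ORDER BY c.cuc_id
    """
    return conn.execute(query).fetchall()


for row in join_changes_with_actor_and_comment(build_demo_db()):
    print(row)
```

The left joins keep cu_changes rows whose comment or actor is missing from the imported tables (row 3 here), which seems safer than silently dropping them.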
Waiting for comments before starting to work in that direction.
I fixed a broken link: the task we need to follow is now T233004, and it looks like work is moving forward, so we'll need a patch here. I'm happy to take this but will wait for grooming to jump back in the dance.
@Nuria I also see a bug in Phabricator. You mentioned https://phabricator.wikimedia.org/T232531 with the full URL, which doesn't pick up the status of the task. If you use the T-shorthand, like T232531, Phab shows it crossed out, which would've helped catch the miscommunication.