In T355139#9466863, @xcollazo wrote:
clickstream_monthly_dag.py - The underlying job, ClickstreamBuilder.scala, will break as it reads pagelinks. You'd think the fix would be easy enough, as per Amir's email thread:
pl_namespace and pl_title columns of pagelinks table will be dropped and you will need to use pl_target_id joining with the linktarget table instead. This is basically identical to the templatelinks normalization that happened a year ago.
However, the caveat is that, as of now, the plan is to drop the columns on some sections while other sections have not yet been completely migrated. This means we will need code that understands which wiki is being fetched, and we will then have to come back and migrate the code again once the remaining sections are done. The rationale for making the changes this way is that some section migrations take a very long time. We need to monitor the outcome of this email thread.
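To make the normalization in the quote above concrete, here is a minimal sketch of reading link targets through linktarget instead of the dropped columns. It runs against an in-memory SQLite database purely for illustration; the sample rows are invented, the tables are reduced to the columns that matter here, and this is not the actual ClickstreamBuilder.scala query.

```python
import sqlite3

# In-memory stand-ins for the two MediaWiki tables, reduced to the
# columns relevant here. All sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE linktarget (
    lt_id        INTEGER PRIMARY KEY,
    lt_namespace INTEGER,
    lt_title     TEXT
);
CREATE TABLE pagelinks (
    pl_from      INTEGER,
    pl_target_id INTEGER
);
INSERT INTO linktarget VALUES (1, 0, 'Main_Page'), (2, 14, 'Physics');
INSERT INTO pagelinks VALUES (10, 1), (10, 2), (11, 1);
""")

# Recover the values that pl_namespace and pl_title used to hold by
# joining pl_target_id against linktarget.lt_id.
NORMALIZED_QUERY = """
SELECT pl.pl_from,
       lt.lt_namespace AS pl_namespace,
       lt.lt_title     AS pl_title
FROM pagelinks AS pl
JOIN linktarget AS lt ON lt.lt_id = pl.pl_target_id
ORDER BY pl.pl_from, lt.lt_id
"""

rows = conn.execute(NORMALIZED_QUERY).fetchall()
for row in rows:
    print(row)
```

Aliasing the joined columns back to pl_namespace and pl_title would, in principle, let downstream code keep its existing column names.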
Further:
In T355139#9477209, @xcollazo wrote:
Update from @Ladsgroup:
On Fri, Jan 19, 2024 at 4:08 PM Xabriel Collazo Mojica <xcollazo@wikimedia.org> wrote:
Amir,
To summarize: the only wiki that will soon get the old columns dropped is commonswiki, and the rest of the wikis will keep the old columns until the migration to the new columns is complete on all wikis, at which time there will be a communication.
Is this correct?
Yes, until further communication, only s4 (commonswiki and testcommonswiki) and testwiki (s3) will have their old columns removed.
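The per-wiki split confirmed in this reply could be captured in a small helper along these lines. This is only a sketch: the function and constant names are hypothetical, the unqualified table names are placeholders for wherever the job actually reads them, and the wiki list is copied from the reply above, so it would need updating on any further communication.

```python
# Wikis whose old pagelinks columns are being removed, per the reply above
# (s4: commonswiki and testcommonswiki; s3: testwiki only).
# Assumption: this set is maintained by hand until the migration completes.
WIKIS_WITHOUT_OLD_COLUMNS = frozenset(
    {"commonswiki", "testcommonswiki", "testwiki"}
)


def uses_linktarget(wiki_db: str) -> bool:
    """Return True if pagelinks for this wiki must be read via linktarget."""
    return wiki_db in WIKIS_WITHOUT_OLD_COLUMNS


def pagelinks_select(wiki_db: str) -> str:
    """Build a SELECT that yields pl_from, pl_namespace, pl_title for a wiki.

    Hypothetical helper: both branches expose the same output columns so
    that the calling code does not need to know which schema is in place.
    """
    if uses_linktarget(wiki_db):
        return (
            "SELECT pl.pl_from, lt.lt_namespace AS pl_namespace, "
            "lt.lt_title AS pl_title "
            "FROM pagelinks AS pl "
            "JOIN linktarget AS lt ON lt.lt_id = pl.pl_target_id"
        )
    return "SELECT pl_from, pl_namespace, pl_title FROM pagelinks"


print(pagelinks_select("commonswiki"))
print(pagelinks_select("enwiki"))
```

Keeping the list in one place means the second migration pass mentioned earlier reduces to deleting the old-schema branch once all sections are done.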