Add dry-run mode to Flow External Store migration script
Closed, Resolved · Public

Description

This will:

  • Output the current user content for each affected row
  • Insert into External Store
  • Output the flow_revision row that would be used in the update, without actually updating flow_revision.
  • Read the new ES URL using the ES API and check that it has the same content as the original. (We could also output the new version, but it seems simpler and sufficient to just output "Equal: true" or "Equal: false".)
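
The four steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual PHP maintenance script: `InMemoryExternalStore`, `dry_run_row`, and the way the `external` flag is appended are hypothetical stand-ins, and the URL format simply mimics the `DB://cluster/id` values seen in the script output.

```python
class InMemoryExternalStore:
    """Hypothetical stand-in for External Store (not the MediaWiki API)."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.blobs = {}
        self.next_id = 1

    def insert(self, content):
        # Store the blob and return a URL shaped like DB://<cluster>/<id>.
        blob_id = self.next_id
        self.next_id += 1
        self.blobs[blob_id] = content
        return f"DB://{self.cluster}/{blob_id}"

    def fetch(self, url):
        # Resolve a DB://<cluster>/<id> URL back to the stored blob.
        blob_id = int(url.rsplit("/", 1)[1])
        return self.blobs[blob_id]


def dry_run_row(store, old_content, old_flags):
    """Run the dry-run steps for one row without touching flow_revision."""
    # 1. Output the current user content.
    print(f"Old content: {old_content}")
    # 2. Insert into the (stand-in) External Store.
    url = store.insert(old_content)
    # 3. Output the row that would be written; no update is performed.
    would_be = {"rev_content": url, "rev_flags": f"{old_flags},external"}
    print(f"flow_revision columns would become: {would_be}")
    # 4. Read the content back through the store and compare.
    equal = store.fetch(url) == old_content
    print(f"Equal: {str(equal).lower()}")
    return would_be, equal


store = InMemoryExternalStore("cluster1")
dry_run_row(store, "Add content", "utf-8")
```

The sketch writes only to the stand-in store and returns the would-be row instead of applying it, which is the property the dry run is meant to guarantee for flow_revision.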
Mattflaschen-WMF updated the task description.
Mattflaschen-WMF raised the priority of this task to Needs Triage.
Restricted Application added subscribers: StudiesWorld, Aklapper. Nov 24 2015, 10:06 PM
Mattflaschen-WMF set Security to None.
Catrope triaged this task as High priority. Dec 19 2015, 1:31 AM

Change 265524 had a related patch set uploaded (by Mattflaschen):
WIP: Dry run for external store

https://gerrit.wikimedia.org/r/265524

@jcrespo Do you want to review the dry run patch, or should we just review it in-team?

Change 265524 merged by jenkins-bot:
Dry run for external store

https://gerrit.wikimedia.org/r/265524

Checked the script in wmflabs:

  • the 'dry-run' option has been added to the script-specific parameters
  • FlowExternalStoreMoveCluster.php does not do anything harmful when the cluster names are fake (in --to and --from):
extensions/Flow/maintenance/FlowExternalStoreMoveCluster.php --wiki='enwiki' --dry-run --from=first_cluster --to=another_cluster
Starting dry run

Dry run completed
  • with a real cluster name in --from=cluster1, the script detects when an incorrect name is given for the 'to' parameter:
 mwscript extensions/Flow/maintenance/FlowExternalStoreMoveCluster.php --wiki='enwiki' --dry-run --from=cluster1 --to=cluster2
Starting dry run

Starting dry run batch

Old content: Add content
LBFactoryMulti::newExternalLB: Unknown cluster "cluster2"

flow_revision columns would become:
array (
)
New content for ID t2wzag8vz21wpcew does not match prior content.
New content: 
Old content: Add content

Terminating dry run.
  • with --from=cluster1 --to=cluster1:
Starting dry run

Starting dry run batch

Old content: Add content
flow_revision columns would become:
array (
  'rev_content' => 'DB://cluster1/5581',
  'rev_flags' => 'utf-8,gzip,external,topic-title-wikitext',
)
New content for ID t2wzag8vz21wpcew does not match prior content.
New content: sLIQH??+I?+
Old content: Add content

Terminating dry run.
jmatazzoni closed this task as Resolved. May 2 2016, 9:59 PM

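Good catch, this is actually a bug. The dry run wrongly reports that the content doesn't match, because the newly stored content is compressed (note the gzip flag in rev_flags) and is compared without being decompressed first. It was not caught locally because content is not compressed there.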
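
A minimal sketch of that failure mode, in Python rather than the script's PHP: comparing the stored bytes directly against the original text fails once the blob is compressed, while decoding according to the flags first makes the comparison meaningful. The helper names are hypothetical, and treating the `gzip` flag as raw DEFLATE (as PHP's `gzdeflate` produces) is an assumption made for illustration.

```python
import zlib


def compress_raw_deflate(text):
    # Assumption: the 'gzip' flag means raw DEFLATE with no zlib/gzip header.
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(text.encode("utf-8")) + compressor.flush()


def decode_blob(blob, flags):
    # Undo compression only when the flags say the blob is compressed.
    if "gzip" in flags.split(","):
        blob = zlib.decompress(blob, -15)
    return blob.decode("utf-8")


def naive_equal(old_content, blob):
    # Buggy check: compares original text against the raw stored bytes.
    return old_content.encode("utf-8") == blob


def flag_aware_equal(old_content, blob, flags):
    # Corrected check: decode the stored bytes per the flags, then compare.
    return old_content == decode_blob(blob, flags)


flags = "utf-8,gzip,external,topic-title-wikitext"
blob = compress_raw_deflate("Add content")
print("naive:", naive_equal("Add content", blob))
print("flag-aware:", flag_aware_equal("Add content", blob, flags))
```

On an installation that stores content uncompressed, both checks agree, which is consistent with the bug not showing up locally.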
