
Broken reportupdater queries: edit count bucket label contains illegal characters
Closed, Resolved | Public | 3 Estimated Story Points

Description

This is breaking some of our daily metrics. We would like to fix it before calling the parent task T275757 "done".

Apr 01 05:02:13 an-launcher1002 reportupdater-templatedata[8907]: ValueError: Invalid metric name "MediaWiki.TemplateData.template.template-description-change.actionCount.byEditCount.100-999 edits.byWiki.bowiki"

Apr 01 05:01:58 an-launcher1002 reportupdater-templatewizard[8911]: ValueError: Invalid metric name "MediaWiki.TemplateWizard.save.byEditCount.1000+ edits.byWiki.afwiki"

Apr 01 07:28:07 an-launcher1002 reportupdater-codemirror[8627]: 2021-04-01 07:28:07,337 - CRITICAL - Invalid metric name "MediaWiki.CodeMirror.sessions.byEditor.wikitext_2010.byEnabled.true.byEditCount.1000+ edits.byWiki.adywiki"

Graphite can only handle the characters [a-zA-Z0-9_=.-]. We need to encode the space (" ") and plus ("+") characters in the edit count bucket metrics tag.
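The failure can be reproduced with a small check. This is a minimal sketch that assumes the character set quoted above is the whole rule; the helper name `invalid_characters` is hypothetical and this is not reportupdater's own validation code.

```python
import re

# Character set quoted in this task; "." also acts as Graphite's path separator.
ALLOWED = re.compile(r"[a-zA-Z0-9_=.\-]")


def invalid_characters(metric_name):
    """Return the distinct characters of metric_name outside the allowed set,
    in order of first appearance."""
    seen = []
    for ch in metric_name:
        if not ALLOWED.fullmatch(ch) and ch not in seen:
            seen.append(ch)
    return seen


# Metric name taken from the templatewizard log line above.
name = "MediaWiki.TemplateWizard.save.byEditCount.1000+ edits.byWiki.afwiki"
print(invalid_characters(name))  # prints ['+', ' ']
```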


Actions to take:

  • disable these jobs in puppet
  • purge graphite
  • Apply query fixes Ie37ec0bd29109 and I7d54d253e25.
  • Set job start dates to 2021-01-01: I18d4373b51fd20
  • purge output files for all queries in codemirror, templatedata, templatewizard, and visualeditor
  • reenable jobs: Ie778f5b7b797

Event Timeline

I'll apply this transformation at the output:

select replace(replace('100-999 edits', '+', ' or more'), ' ', '_');
select replace(replace('1000+ edits', '+', ' or more'), ' ', '_');

100-999_edits
1000_or_more_edits

It would be nice if I could reuse this function across Hive scripts. Is there an easy way to do that short of writing a UDF?
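For checking expected labels outside Hive, the nested replace() can be mirrored in Python. This is only a sketch: the function name `escape_edit_count_bucket` is hypothetical, and it does not answer the question of sharing code between Hive scripts.

```python
def escape_edit_count_bucket(label):
    """Mirror replace(replace(label, '+', ' or more'), ' ', '_')."""
    # Same order as the SQL: expand "+" first, then turn every space into "_".
    return label.replace("+", " or more").replace(" ", "_")


for label in ("100-999 edits", "1000+ edits"):
    print(escape_edit_count_bucket(label))
# prints:
# 100-999_edits
# 1000_or_more_edits
```

The order matters: replacing "+" first introduces new spaces, which the second replace then converts along with the original ones.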

Change 676297 had a related patch set uploaded (by Awight; author: Awight):

[analytics/reportupdater-queries@master] Escape edit count bucket for metrics tag name

https://gerrit.wikimedia.org/r/676297

awight added a project: Unplanned-Sprint-Work.
awight set the point value for this task to 1.

I wasn't sure whether this is unplanned or planned work, so I erred on the side of caution.

awight moved this task from Review to Watching (Stalled) on the WMDE-TechWish-Sprint-2021-03-31 board.

Change 676297 merged by Mforns:

[analytics/reportupdater-queries@master] Escape edit count bucket for metrics tag name

https://gerrit.wikimedia.org/r/676297

I just realized that the cached output files for all of these jobs have been generated with invalid data. Please remove them. I'm not sure when the bad data starts, but it can be recognized by strings like "1000+ edits" or "5-99 edits". These should now be "1000_or_more_edits" and "5-99_edits" respectively.
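A rough sketch of how stale rows could be spotted, assuming the output files are plain text with one row per line. The sample rows, their column layout, and the helper name `has_stale_label` are made up for illustration.

```python
import re

# Old-style labels end in a digit (optionally followed by "+"), a space, and "edits",
# e.g. "1000+ edits" or "5-99 edits". Escaped labels use "_" instead of the space.
STALE_LABEL = re.compile(r"\d\+? edits")


def has_stale_label(line):
    """Return True if the line still contains an unescaped edit count bucket."""
    return STALE_LABEL.search(line) is not None


# Hypothetical rows; the real column layout is not shown in this task.
rows = [
    "2021-03-01\tafwiki\t1000+ edits\t3",
    "2021-03-01\tafwiki\t5-99 edits\t7",
    "2021-04-02\tafwiki\t1000_or_more_edits\t3",
]
stale = [row for row in rows if has_stale_label(row)]
print(len(stale))  # prints 2
```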

So sorry for this extra complication!

After discussion, we think that some of the report data may have been pushed to Graphite already. To avoid duplicating data, we'll need to purge the metrics in Graphite and then remove the reportupdater output files to cause the job to re-run.

These are the affected metrics:

MediaWiki.CodeMirror.toggles.byEditor.wikitext_2010.byEnabled.$enabled.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.CodeMirror.toggles.byEditor.wikitext_2017.byEnabled.$enabled.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.CodeMirror.sessions.byEditor.wikitext_2010.byEnabled.$enabled.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.CodeMirror.sessions.byEditor.wikitext_2017.byEnabled.$enabled.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.CodeMirror.preferences.byPreference.wikitext_2010.byEnabled.true.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.CodeMirror.preferences.byPreference.wikitext_2017.byEnabled.true.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.CodeMirror.preferences.byPreference.CodeMirror.byEnabled.true.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.CodeMirror.preferences.byPreference.wikitext_2010_and_CodeMirror.byEnabled.true.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.CodeMirror.preferences.byPreference.wikitext_2017_and_CodeMirror.byEnabled.true.byEditCount.$edit_count_bucket.byWiki.$wiki

MediaWiki.TemplateData.dialog.created_and_saved_template.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.TemplateData.dialog.created_and_abandoned_template.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.TemplateData.dialog.edited_and_saved_template.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.TemplateData.dialog.edited_and_abandoned_template.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.TemplateData.template.$action.actionCount.byEditCount.$edit_count_bucket.byWiki.$wiki

MediaWiki.TemplateWizard.open.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.TemplateWizard.save.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.TemplateWizard.abort.byEditCount.$edit_count_bucket.byWiki.$wiki

MediaWiki.VisualEditor.templateDialog.open.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.VisualEditor.templateDialog.open.byMethod.menu.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.VisualEditor.templateDialog.open.byMethod.keyboard.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.VisualEditor.templateDialog.open.byMethod.edit_existing.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.VisualEditor.templateDialog.close.bySaved.$saved.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.VisualEditor.templateDialog.edit_parameter.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.VisualEditor.templateDialog.doc_click.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.VisualEditor.templateDialog.add_known_param.byEditSaved.$saved.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.VisualEditor.templateDialog.add_unknown_param.byEditSaved.$saved.byEditCount.$edit_count_bucket.byWiki.$wiki
MediaWiki.VisualEditor.session.byEditCount.$edit_count_bucket.byWiki.$wiki
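To illustrate how one of these templates expands into a concrete metric name, here is a sketch using Python's `string.Template`, whose `$name` placeholders happen to match the notation above. This is a stand-in: how reportupdater itself builds metric names is not shown in this task, and the helper name `expand_metric` is hypothetical.

```python
from string import Template


def expand_metric(template, **values):
    """Fill the $placeholders of a metric template; raises KeyError if one is missing."""
    return Template(template).substitute(**values)


name = expand_metric(
    "MediaWiki.TemplateWizard.save.byEditCount.$edit_count_bucket.byWiki.$wiki",
    edit_count_bucket="1000_or_more_edits",  # escaped label, valid for Graphite
    wiki="afwiki",
)
print(name)
# prints MediaWiki.TemplateWizard.save.byEditCount.1000_or_more_edits.byWiki.afwiki
```

With the unescaped label "1000+ edits" the same expansion produces the invalid name from the log excerpt in the task description.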

These are the reportupdater jobs:

codemirror/hive/sessions
codemirror/hive/toggles
templatedata/hive/actions
templatedata/hive/dialog
templatewizard/hive/template_wizard_exits
templatewizard/hive/template_wizard_opens
visualeditor/hive/template_dialog_opens
visualeditor/hive/template_dialog_opens_by_edit_count
visualeditor/hive/template_dialog_other_events
visualeditor/hive/template_dialog_parameters_by_edit_success
visualeditor/hive/visual_editor_sessions

I'm not sure how to manage the timing between the two purges, especially since I can't perform either one myself. Maybe it's best to:

  • disable these jobs in puppet
  • purge graphite
  • purge output files
  • reenable jobs

If others agree, I'll prepare a patch to do that.

Change 679390 had a related patch set uploaded (by Awight; author: Awight):

[operations/puppet@production] Temporarily disable some reportupdater jobs

https://gerrit.wikimedia.org/r/679390

I've taken screenshots of the historical CodeMirror Grafana boards, to be sure we have a baseline ahead of syntax highlighting deployments. Now it's safe to purge that data from Graphite.

The other features haven't been deployed yet, so it's fine to lose data before Feb 1. Metrics and output files should be purged entirely; no selectivity is required.

We want the raw aggregated data as well:

Change 679390 merged by Filippo Giunchedi:

[operations/puppet@production] Temporarily disable some reportupdater jobs

https://gerrit.wikimedia.org/r/679390

Change 680021 had a related patch set uploaded (by Awight; author: Awight):

[operations/puppet@production] [DNM] Revert "Temporarily disable some reportupdater jobs"

https://gerrit.wikimedia.org/r/680021

awight changed the point value for this task from 1 to 3.

Change 680267 had a related patch set uploaded (by Awight; author: Awight):

[analytics/reportupdater-queries@master] Update job start dates to only backfill existing data

https://gerrit.wikimedia.org/r/680267

Change 680270 had a related patch set uploaded (by Awight; author: Awight):

[analytics/reportupdater-queries@master] Escape edit count bucket for Graphite (sql query)

https://gerrit.wikimedia.org/r/680270

awight added a subscriber: mforns.

@mforns I think this is ready to go now. I've purged Graphite and prepared patches for the remaining steps (see task description).

Change 680270 merged by Mforns:

[analytics/reportupdater-queries@master] Escape edit count bucket for Graphite (sql query)

https://gerrit.wikimedia.org/r/680270

Change 680267 merged by Mforns:

[analytics/reportupdater-queries@master] Update job start dates to only backfill existing data

https://gerrit.wikimedia.org/r/680267

Change 680021 merged by Filippo Giunchedi:

[operations/puppet@production] Revert "Temporarily disable some reportupdater jobs"

https://gerrit.wikimedia.org/r/680021

awight claimed this task.
awight removed a project: Patch-For-Review.
awight updated the task description.

Thanks to the many people who helped with this!