
Scraper: track with production Prometheus
Closed, Resolved (Public)

Description

Follow the instructions for adding a new production metric, and point collection at stat1009 with the expected port and path, probably :9568/metrics.
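
For context, a minimal sketch of the job side of that contract, assuming (purely for illustration) a Python process instrumented with prometheus_client; the actual scraper implementation and its instruments may differ, and the metric/label names are borrowed from the example queries further down:

  from prometheus_client import Histogram, start_http_server

  # Illustrative histogram mirroring the metric names used in the queries below.
  summarize_page_duration = Histogram(
      "scraper_summarize_page_duration",
      "Time spent summarizing one page, in seconds",
      ["wiki"],
  )

  def summarize_page(wiki, page):
      # Observations land in the _bucket/_count/_sum series that Prometheus scrapes.
      with summarize_page_duration.labels(wiki=wiki).time():
          pass  # actual summarization work goes here

  if __name__ == "__main__":
      # Expose /metrics on the port production Prometheus is expected to poll.
      start_http_server(9568)
      # ... the long-running job keeps the process, and with it the endpoint, alive.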

Alternatively, it's also possible to set up ephemeral job push metrics, but these have the major drawback of not easily capturing runtime performance, which is the main raison d'être of our metrics.
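
For comparison, the push alternative would look roughly like this, again as an illustrative Python sketch using prometheus_client's Pushgateway support; the gateway host is a placeholder, not an existing service:

  from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

  registry = CollectorRegistry()
  # An end-of-run metric: fine for batch results, but it cannot capture runtime
  # performance while the job is still executing.
  last_run = Gauge(
      "scraper_last_completed_run_timestamp_seconds",
      "Unix timestamp of the last completed scraper run",
      registry=registry,
  )
  last_run.set_to_current_time()

  # "pushgateway.example:9091" is a placeholder host, not an existing service.
  push_to_gateway("pushgateway.example:9091", job="wmde_tewu_scraper", registry=registry)

A pushed snapshot only captures whatever the job records at push time, which is why the pull-based /metrics endpoint above is preferred.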

When done, update our internal documentation and remove the deprecated Prometheus process. That process might still be useful for local dev, so it can be ported to the code's README.

Related Objects

Status   | Subtype | Assigned   | Task
Resolved |         | None       |
Resolved |         | WMDE-Fisch |

Event Timeline

This will simplify how we share monitoring duty during the long-running scrape job.

Change #1023085 had a related patch set uploaded (by Awight; author: Awight):

[operations/puppet@production] Temporary monitoring for long-running analytics client job

https://gerrit.wikimedia.org/r/1023085

awight moved this task from Doing to Tech Review on the WMDE-TechWish-Sprint-2024-04-12 board.

Change #1023085 merged by Btullis:

[operations/puppet@production] Temporary monitoring for long-running analytics client job

https://gerrit.wikimedia.org/r/1023085

Change #1023152 had a related patch set uploaded (by Awight; author: Awight):

[operations/puppet@production] Revert "Temporary monitoring for long-running analytics client job"

https://gerrit.wikimedia.org/r/1023152

Change #1023424 had a related patch set uploaded (by Awight; author: Awight):

[operations/puppet@production] Include job for scraper monitoring

https://gerrit.wikimedia.org/r/1023424

Change #1023424 merged by Filippo Giunchedi:

[operations/puppet@production] Include job for scraper monitoring

https://gerrit.wikimedia.org/r/1023424

When the metrics land, they should appear on https://prometheus-eqiad.wikimedia.org/analytics/targets?search=wmde_tewu.

Some fun queries include:

  • rate(scraper_analyze_plugin_duration_bucket[2m])
  • rate(scraper_summarize_page_duration_bucket{le="+Inf"}[2m])
  • vm_memory_total
  • sum(scraper_summarize_page_duration_count)
  • scraper_summarize_page_duration_count
  • sum(scraper_process_wiki_duration_count)

Sneak preview for those playing at home:

Pages processed per second, by wiki:

{F48332149}

Not sure why, but the attached image is private.

WMDE-Fisch claimed this task.

Change #1023152 merged by Filippo Giunchedi:

[operations/puppet@production] Revert temporary monitoring for scraper

https://gerrit.wikimedia.org/r/1023152