
Move some analytics jobs to day time in Virginia
Open, Needs Triage, Public

Description

Analytics is the second most power-hungry cluster in production (just behind k8s), consuming 38 kW on average. Based on PJM grid data (436 g of CO2 per kWh, averaged over three months), this translates to roughly 397 kg of CO2 every day, or 145 tonnes of CO2 every year. That's roughly 350 people taking a trans-Atlantic flight (based on the ICAO calculator, FRA-JFK).
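The arithmetic behind these figures can be reproduced with a short sketch. It assumes a constant 38 kW draw and a flat 436 g/kWh intensity, both taken from the description; the function and variable names are made up for illustration.

```python
# Back-of-the-envelope check of the emission figures quoted in the description.
AVG_POWER_KW = 38        # average analytics cluster draw (from the task)
GRID_G_PER_KWH = 436     # PJM average carbon intensity (from the task)


def daily_co2_kg(power_kw: float, g_per_kwh: float) -> float:
    """CO2 emitted per day, in kg, for a constant power draw."""
    kwh_per_day = power_kw * 24
    return kwh_per_day * g_per_kwh / 1000


daily_kg = daily_co2_kg(AVG_POWER_KW, GRID_G_PER_KWH)
yearly_tonnes = daily_kg * 365 / 1000

print(f"{daily_kg:.1f} kg CO2 per day")            # 397.6 kg CO2 per day
print(f"{yearly_tonnes:.1f} tonnes CO2 per year")  # 145.1 tonnes CO2 per year
```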

Other clusters have an extremely stable consumption so we can't move them but analytics varies a lot:
https://grafana.wikimedia.org/d/f64mmDzMz/power-usage?orgId=1&viewPanel=110


The PJM grid is greener during the day: because of solar, the average drops to 395 g per kWh in winter, and I estimate it drops to ~336 g per kWh in summer. For every 10% of power consumption moved to day time, we save 3.7 kg of CO2 per day in winter and 9.4 kg of CO2 per day in summer. That translates to 2.4 tonnes of CO2 saved every year (assuming the average of summer and winter), or roughly 6 people taking a trans-Atlantic flight. Note that the impact will keep increasing, since several solar projects are connecting to PJM, including Fox Squirrel Solar (last part connected this December) and BayWa r.e. solar projects (to be finished by end of 2025).
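A rough sketch of the savings estimate, for anyone who wants to plug in other numbers. It assumes that shifted load would otherwise be emitted at the 436 g/kWh average, and it treats ~336 g/kWh as the summer day time figure; both are readings of the description, not measured values. With these rounded inputs the result is roughly 3.7 kg per day in winter, 9.1 kg per day in summer, and about 2.3 tonnes per year, so the 9.4 kg and 2.4 tonnes quoted above presumably come from slightly less rounded inputs.

```python
# Estimated CO2 saved by moving a fraction of the daily load to day time.
DAILY_KWH = 38 * 24      # 912 kWh per day at 38 kW average (from the task)
BASELINE_G_PER_KWH = 436  # all-day PJM average (from the task)
DAYTIME_G_PER_KWH = {"winter": 395, "summer": 336}  # summer value is an estimate


def daily_saving_kg(shifted_fraction: float, baseline_g: float, daytime_g: float,
                    daily_kwh: float = DAILY_KWH) -> float:
    """kg of CO2 saved per day when `shifted_fraction` of the load runs in day time."""
    shifted_kwh = daily_kwh * shifted_fraction
    return shifted_kwh * (baseline_g - daytime_g) / 1000


savings = {
    season: daily_saving_kg(0.10, BASELINE_G_PER_KWH, g)
    for season, g in DAYTIME_G_PER_KWH.items()
}
# Simple average of the two seasons, scaled to a full year.
yearly_tonnes = sum(savings.values()) / len(savings) * 365 / 1000

for season, kg in savings.items():
    print(f"{season}: {kg:.1f} kg CO2 saved per day")
print(f"yearly: {yearly_tonnes:.1f} tonnes CO2 saved")
```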

Event Timeline

Related: T371321: [Idea] Collect pageview data using client-side instrumentation. I'd expect that energy consumption would go down if we stopped searching for needles in the webrequest haystack.

Ottomata renamed this task from Move some anlaytics jobs to day time in Virginia to Move some analytics jobs to day time in Virginia. Feb 18 2025, 6:54 PM

FWIW, almost all of the power consumption in analytics infra (understandably) comes from the an-worker nodes. The presto cluster also accounts for a large share: https://grafana.wikimedia.org/goto/ytdntxGDR?orgId=1

Since the vast majority of analytic jobs are scheduled via Airflow, we could certainly change the schedule cron definitions of each DAG to achieve this, but it would require careful planning, as today we rely heavily on the @daily alias, which makes everything run at midnight UTC.

We could certainly pick one heavy job and do an experimental rescheduling as a start.
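As a starting point for such an experiment, here is a minimal sketch of deriving a UTC cron expression that could replace @daily for a single job. The 11:00 local target hour and the helper name are hypothetical, and how the string is passed to the DAG depends on the Airflow version and on our DAG wrappers, so that part is not shown.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

VIRGINIA = ZoneInfo("America/New_York")


def utc_cron_for_local_hour(local_hour: int, on_date: datetime) -> str:
    """Daily cron expression (in UTC) that fires at `local_hour` Virginia time,
    using the UTC offset in effect on `on_date`."""
    local = datetime(on_date.year, on_date.month, on_date.day,
                     local_hour, tzinfo=VIRGINIA)
    utc = local.astimezone(timezone.utc)
    return f"0 {utc.hour} * * *"


print(utc_cron_for_local_hour(11, datetime(2025, 1, 15)))  # 0 16 * * *  (EST)
print(utc_cron_for_local_hour(11, datetime(2025, 7, 15)))  # 0 15 * * *  (EDT)
```

A cron string fixed in UTC drifts by one hour in local time across daylight saving changes, which is probably acceptable for a window as wide as "day time".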

> Since the vast majority of analytic jobs are scheduled via Airflow, we could certainly change the schedule cron definitions of each DAG to achieve this, but it would require careful planning, as today we rely heavily on the @daily alias, which makes everything run at midnight UTC.

That seems to be 7pm Virginia time. If we push the default a bit earlier, that could automatically have the intended effect.
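A quick way to check the local start time, as a sketch using Python's standard zoneinfo module (the function name is made up, and the two dates are arbitrary winter and summer samples):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

VIRGINIA = ZoneInfo("America/New_York")


def local_start_of_daily_run(year: int, month: int, day: int) -> datetime:
    """Virginia local time at which a midnight-UTC (@daily) run starts."""
    midnight_utc = datetime(year, month, day, 0, 0, tzinfo=timezone.utc)
    return midnight_utc.astimezone(VIRGINIA)


winter = local_start_of_daily_run(2025, 1, 15)
summer = local_start_of_daily_run(2025, 7, 15)
print(winter.strftime("%H:%M %Z"))  # 19:00 EST
print(summer.strftime("%H:%M %Z"))  # 20:00 EDT
```

So midnight UTC is 7pm in winter and 8pm during daylight saving time, after sunset in both cases for most of the year.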

> We could certainly pick one heavy job and do an experimental rescheduling as a start.

That sounds good to me!