
Regional views for Foundation-level metrics
Closed, ResolvedPublic

Assigned To
Authored By
kzimmerman
Feb 28 2023, 5:54 PM
Referenced Files
F36994439: Regional_Active_Editors_annual.png
May 12 2023, 5:02 PM
F36994437: Regional_Active_Editors_1.jpeg
May 12 2023, 5:02 PM
F36994435: Regional_Active_Editors_0.jpeg
May 12 2023, 5:02 PM
F36994431: Regional_Unique_Devices_00.jpeg
May 12 2023, 5:02 PM
F36994328: Map_UniqueDevices3moRollingYoy.png
May 12 2023, 3:17 PM
F36992361: Map_ContentPerc23.png
May 11 2023, 3:23 PM
F36992360: Map_ContentPerc22.png
May 11 2023, 3:23 PM
F36992359: Map_EditorsPerc.png
May 11 2023, 3:23 PM

Description

In the May staff meeting (scheduled for May 4), we plan to present regional views of our metrics.

We will need timeline data for the following, broken down by region:

We may also need data for:

  • Net new content or total content, based on the region the content represents T332399

We would like to create:

Regions:
Per Jaime Anstee, use wmf_region from gdi.country_meta_data. Note that Kassia has said that Grantmaking will align with this definition for regions in the coming fiscal year (FY23-24)

We will also need visuals for this data. One possible approach is here: T329588

Timeline:
Slides due: April 28 (likely date) - we will need to have the data and visuals before this date
Presentation: May 4

(originally this task was created for the April meeting, but we were asked to delay our presentation due to scheduling constraints)

Event Timeline

mpopov triaged this task as High priority.
mpopov edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
mpopov subscribed.

Note: Use regional view from GDI's table

kzimmerman added a subscriber: Miriam.

Assigning to @Miriam as she'll be putting together the April staff meeting presentation and deciding on how to show the data.

Hi @Miriam I wanted to share this ticket which has some examples of charts we made for looking at country-level data: https://phabricator.wikimedia.org/T329588

I think if we are looking at region-level timeline data, we can do a graph with multiple lines. Do you know what time period we want to look at and which regions we want to include? We can also sync up for a meeting if that would be helpful.

Hi! I did a first pass on these. Here is a version of the Regional Active Editors Data on a 2x2 graph:

Regional_Active_Editors_2.jpeg (1×3 px, 325 KB)

Regional_Active_Editors_1.jpeg (1×3 px, 342 KB)

It seems a bit crowded to me and I'm going to keep experimenting with formatting.

It definitely was too crowded to have all eight regions on one chart:

Regional_Active_Editors_2x4.png (2×4 px, 801 KB)

Here are the Regional Unique Devices charts, with normalized y-axes:

Regional_Unique_Devices_2.jpeg (1×3 px, 276 KB)

Regional_Unique_Devices_1.jpeg (1×3 px, 298 KB)

@HXi-WMF: Those look really nice! A few suggestions for improvement:

  • In active editors only Sub-Saharan Africa chart has y-axis guide lines, but all regions should have those
  • I think the tick marks should be consistent between regions. Either they should all be in increments of 1K or all in increments of 500, so that changes are comparable. Right now a 1cm shift in one region is not necessarily the same as a 1cm shift in another.
  • I don't think normalized y-axis works for unique devices – it hides patterns in Sub-Saharan Africa, for example
  • What do you think about highlighting some of the recurring peaks & troughs? The peaks in active editors in North America, for example, seem to be consistently in early parts of each year (seems like February/March, maybe?)

@HXi-WMF for Unique devices, we should block-off/remove data from Feb 2021 to June 2022, similar to F36913649.

I have these updated graphs for Active Editors with standardized y-axis intervals but different ranges:

Regional_Active_Editors_1.jpeg (1×3 px, 304 KB)

Regional_Active_Editors_2.jpeg (1×3 px, 287 KB)

These are Unique Devices with the blocked off area and similar formatting. I put the blocked off area data note above the charts bc it seemed like too much text to do it on every single subplot:

Regional_Unique_Devices_1.jpeg (1×3 px, 298 KB)

Regional_Unique_Devices_2.jpeg (1×3 px, 278 KB)

Let me know your thoughts!

My question on highlighting is whether it should be done on more zoomed-in versions or on individual region graphs. I already think the combined charts can look a bit cluttered.

@HXi-WMF: I agree on highlighting in different versions and your reasoning.

Minor issue with active editors & unique devices in Sub-Saharan Africa. The y-axis range goes below 0 but 0 is the minimum possible value for those metrics.

Hi! I have updated versions of the regional view charts here:

  • The charts are now sorted by their total value over the time period.
  • The multi-figure generation is done programmatically and can adjust to however many columns the input dataset is, putting at most four charts per figure (in matplotlib, each "image" is a figure).
  • The y axis standardization is done so that all charts have the same y axis range of ticks and number of ticks. It centers the chart as much as possible while keeping the same tick intervals (by finding the closest median divisible by the standard tick interval). It sets the minimum possible tick to 0. Right now it doesn't add a 2/3rds buffer on largest range, and this may take me a little longer to implement because it is a bit complicated.
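One way to read the y-axis standardization described above, as a minimal sketch: every chart gets the same tick interval and the same number of ticks, each window is centered on the region's median snapped to a multiple of the interval, and the lowest tick is clamped at 0. The function name `standardized_ticks`, the step of 500, and the sample values are assumptions for illustration, not the actual script.

```python
import math
from statistics import median


def standardized_ticks(series_by_region, step):
    """Return {region: [tick, ...]} with equal tick spacing and tick count.

    Assumes all values are >= 0 (counts such as active editors).
    """
    # Center each region on its median, snapped to a multiple of `step`.
    centers = {
        region: round(median(values) / step) * step
        for region, values in series_by_region.items()
    }
    # Intervals needed on each side of the center so every region's data fits.
    half_intervals = max(
        math.ceil(
            max(max(values) - centers[region], centers[region] - min(values)) / step
        )
        for region, values in series_by_region.items()
    )
    half_intervals = max(half_intervals, 1)  # always show at least three ticks

    ticks = {}
    for region, center in centers.items():
        # Clamp the lowest tick at 0; the window keeps its width and shifts up.
        lower = max(center - half_intervals * step, 0)
        ticks[region] = [lower + i * step for i in range(2 * half_intervals + 1)]
    return ticks


# Hypothetical sample data, not real metrics.
sample = {"A": [1000, 1500, 2000], "B": [100, 200, 300]}
ticks = standardized_ticks(sample, step=500)
```

In matplotlib the resulting lists could be passed to `ax.set_yticks(...)` and used for `ax.set_ylim(ticks[0], ticks[-1])` on each subplot.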

Active Editors:

Regional_Active_Editors_0.jpeg (1×3 px, 331 KB)

Regional_Active_Editors_1.jpeg (1×3 px, 277 KB)

However, setting such a large y-axis range seems to make the Unclassed chart unreadable, because for Active Editors, Unclassed only contained about a dozen data points, all < 10 in value. The dates show up oddly here because the data points are discontinuous. I spent a bit of time on it but am not sure the fix is worth the work it would need:

Regional_Active_Editors_2.jpeg (1×3 px, 166 KB)

I was thinking of having charts keep original y axis ranges if they were really far below the standardized y axis range, so charts like Unclassed could still be readable:

Regional_Active_Editors_2.jpeg (1×3 px, 169 KB)

Unique Devices:

  • I changed the blocked off area to a hatch and added the hatch into the legend. This is done with a hatched box patch and not via matplotlib's legend functions bc the blocked off area is not an actual plot.
  • I think it makes even more sense here to keep the original y axis ranges if they are far below the standardized range because it allows us to read the Unknown and Unclassed data.
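A minimal sketch of the hatched blocked-off area with a manual legend entry, in case it helps anyone reproduce it. The month indices, the blocked range, and the labels are made up for illustration; the only matplotlib pieces used are `Axes.axvspan` and a proxy `Patch` appended to the legend handles.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Hypothetical data: 24 monthly points and an unreliable stretch in the middle.
months = list(range(24))
devices = [100 + 2 * m for m in months]
blocked_start, blocked_end = 8, 14

fig, ax = plt.subplots()
ax.plot(months, devices, label="Unique devices")

# Draw the blocked-off range as a hatched span with no fill.
ax.axvspan(blocked_start, blocked_end, facecolor="none",
           edgecolor="gray", hatch="///", linewidth=0)

# The span carries no label, so add a matching proxy patch to the legend.
handles, _ = ax.get_legend_handles_labels()
handles.append(Patch(facecolor="none", edgecolor="gray",
                     hatch="///", label="Data unavailable"))
ax.legend(handles=handles)

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```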

Regional_Unique_Devices_0.jpeg (1×3 px, 484 KB)

Regional_Unique_Devices_1.jpeg (1×3 px, 456 KB)

With the standardized y axis range:

Regional_Unique_Devices_2.jpeg (1×3 px, 291 KB)

With their original y axis range:

Regional_Unique_Devices_02.jpeg (1×3 px, 307 KB)

Hi @HXi-WMF and @Mayakp.wiki, thank you SO SO much and sorry for the late reply here!

Would it be possible to have those signals smoothed with a 3-months rolling window? This way, we can reliably compute and report YoY differences after smoothing out sporadic fluctuations.

What I would like to try for the March staff meeting is to report indicators of YoY change for the 8 regions, maybe mapping them on the world map in some way with colors (like green for positive growth, yellow for flat growth and red for negative growth - but probably there are better ways to do this).

Let me know, also happy to jump in a call if needed :)

@kzimmerman CC

Hi! Wanted to provide some updated view options. Everything below is for Active Editors. Maybe we can decide which ones we prefer based on these, and I will then apply the changes to Unique Devices.

Here are the 4-chart-page views with adjusted y-axis labels:

Regional_Active_Editors_0.jpeg (1×3 px, 327 KB)

Regional_Active_Editors_1.jpeg (1×3 px, 275 KB)

Regional_Active_Editors_2.jpeg (1×3 px, 171 KB)

I will try some additional formatting options on these in the next post, but didn't want to post about too many things at once. I will try bolded y-axis labels and a lighter x-axis.

Here are 8-chart-page views:

Regular:

Regional_Active_Editors_fullview.jpeg (1×3 px, 345 KB)

Rolling 3 Month Average:

Regional_Active_Editors_rolling.jpeg (1×3 px, 322 KB)

Annual (averaged):

Regional_Active_Editors_annual.jpeg (1×3 px, 276 KB)

Quarter (averaged):

Regional_Active_Editors_quarterly.jpeg (1×3 px, 325 KB)

Note that we use averages for all the aggregate views so that the y axis ranges stay consistent.

The script also generates all the charts individually and saves them as images. I added the YoY highlight and annotation to the individual charts for the regular view:

individual_Regional_Active_Editors_0.jpeg (1×3 px, 247 KB)

individual_Regional_Active_Editors_1.jpeg (1×3 px, 203 KB)

individual_Regional_Active_Editors_2.jpeg (1×3 px, 212 KB)

individual_Regional_Active_Editors_3.jpeg (1×3 px, 223 KB)

individual_Regional_Active_Editors_5.jpeg (1×3 px, 201 KB)

individual_Regional_Active_Editors_4.jpeg (1×3 px, 197 KB)

individual_Regional_Active_Editors_6.jpeg (1×3 px, 182 KB)

individual_Regional_Active_Editors_7.jpeg (1×3 px, 183 KB)

Here are versions of the multi-chart figures with the y-axis bolded to highlight the different values:

Four Charts per Figure:

Regional_Active_Editors_0.jpeg (1×3 px, 328 KB)

Regional_Active_Editors_1.jpeg (1×3 px, 276 KB)

Eight Charts per Figure:

Regional_Active_Editors_fullview.jpeg (1×3 px, 346 KB)

Regional_Active_Editors_rolling.jpeg (1×3 px, 323 KB)

Regional_Active_Editors_annual.jpeg (1×3 px, 277 KB)

Regional_Active_Editors_quarterly.jpeg (1×3 px, 326 KB)

Thank you @HXi-WMF!

After seeing the different views, let's go with the 8-chart image. The slides with 4 charts are a bit easier to read, but then splitting attention over the course of 2 images makes it more complicated again.

The annual average is incredibly helpful for seeing the trends. Per our discussion, please remove partial years because they're misleading (2023 looks better than it should, because so far we only have January and Feb data - and those months have higher numbers than, say, the summer months will have).
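For reference, dropping partial years before averaging could look roughly like this. It is a sketch under the assumption that the monthly data is keyed by `(year, month)`; the function name `full_year_averages` and the numbers are hypothetical.

```python
from collections import defaultdict


def full_year_averages(monthly):
    """Average monthly values per year, keeping only years with all 12 months."""
    by_year = defaultdict(list)
    for (year, _month), value in monthly.items():
        by_year[year].append(value)
    return {
        year: sum(values) / len(values)
        for year, values in sorted(by_year.items())
        if len(values) == 12  # partial years are skipped entirely
    }


# Hypothetical data: a complete 2022 and only January and February of 2023.
monthly = {(2022, m): 100 + m for m in range(1, 13)}
monthly.update({(2023, 1): 130, (2023, 2): 128})
annual = full_year_averages(monthly)
```

With this input, 2023 is left out of the result because only two of its months are present.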

I also think the individual images are helpful. Can you please update the y-axis to bold the minimum and maximum numbers?

Finally, can you do an update on unique devices with these changes?

Would it be possible to have those signals smoothed with a 3-months rolling window? This way, we can reliably compute and report YoY differences after smoothing out sporadic fluctuations.

@Miriam Hua did a view with the 3 month rolling window (see T330780#8776714) but I don't think it clarifies the trends enough to be worth the extra explanation needed around rolling averages. I also think the quarterly averages don't help much in seeing longer term trends.

I would suggest we keep a version with annual averages as well as monthly averages when we need to talk about more current data.

What I would like to try for the March staff meeting is to report indicators of YoY change for the 8 regions, maybe mapping them on the world map in some way with colors (like green for positive growth, yellow for flat growth and red for negative growth - but probably there are better ways to do this).

Hua updated the task description to capture this. @HXi-WMF regarding priority, the maps are higher priority & should be worked on first since they'll be used for the staff meeting

Hi! I did a first pass at the map in matplotlib. Please let me know any feedback you have and I will make versions with the other data (editor growth, unique devices etc.) tomorrow.

Map.png (1×3 px, 480 KB)

Some notes:

  • I implemented what we discussed about having borders around the regions.
  • I tried a version with fewer numbers on the colorbar but found it less informative because it was hard to extrapolate what colors in the middle meant. I can post an example with tomorrow's charts.
  • I turned down the alpha on the text labels — I think they look less clunky that way and you can see the map behind them.
  • I removed Antarctica to make the other continents more prominent.
  • I turned down the intensity of the colorscale to make text and border lines more visible.

@HXi-WMF can you maybe flip the color palette? Meaning yellow for the lower numbers and purple for higher numbers (similar to the draft). The way we interpret colors is light=less and dark=more.

Here are versions of the chart with the reader and editor data and percentage and Month over Month change values. I flipped the color palette as well.

I left the colorbar on absolute values for the charts showing percent of total, because I thought that added information. We can also switch that over to a simple percentage. I wasn't sure if we even wanted to keep that chart in at all.

World Population

Map_WorldPop.png (1×3 px, 489 KB)

World Population - % of Total

Map_WorldPopPerc.png (1×3 px, 503 KB)

Unique Devices

Map_UniqueDevices.png (1×3 px, 497 KB)

Map_UniqueDevicesPerc.png (1×3 px, 508 KB)

Map_UniqueDevicesMoM.png (1×3 px, 508 KB)

Active Editors

Map_Editors.png (1×3 px, 477 KB)

Map_EditorsPerc.png (1×3 px, 492 KB)

Map_EditorsMoM.png (1×3 px, 511 KB)

I will add in Content Data when it is updated in that task.

Thank you @HXi-WMF !

I find the color scheme to be appealing and the borders are helpful for differentiating regions.

One suggestion I have on the metrics would be to round them to 1-2 significant figures so they're easier to take in. And perhaps increase the alpha on the labels - most of the labels are easy to read, but the ones on top of the Philippines and Iceland are harder to read.

@Miriam in response to your comment in the google invitation — The plots have different numbers from the line plots because these are using March data and the line plots before were using February data fyi!

Hi @HXi-WMF! Thanks, would it be possible then to have the YoY difference over the 3-month rolling average values? Just to have more significant numbers that don't fluctuate so much between months. Thank you!

Hi! I made some of the changes we discussed. I didn't have time to implement everything requested. The main thing is that I added YoY of 3-month rolling averages, as Miriam suggested. It should be more stable, but is it harder to interpret?
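To make the calculation concrete, here is a small sketch of YoY change over a 3-month rolling average. It assumes a plain list of monthly values in chronological order; `rolling_mean`, `rolling_change_pct`, and the sample numbers are illustrative names and data, not the code behind the maps.

```python
def rolling_mean(values, window=3):
    """Trailing mean; the first window-1 entries are None (not enough history)."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


def rolling_change_pct(values, window=3, lag_months=12):
    """Percent change of the latest rolling mean vs the one lag_months earlier."""
    smoothed = rolling_mean(values, window)
    latest = smoothed[-1]
    earlier = smoothed[-1 - lag_months]
    if earlier is None:
        raise ValueError("not enough history for the requested window and lag")
    return 100 * (latest - earlier) / earlier


# Hypothetical series of 15 monthly values: the first and last three months
# are the two windows being compared.
values = [100, 100, 100] + [50] * 9 + [110, 110, 110]
change = rolling_change_pct(values)
```

Passing `lag_months=36` would compare against the same months three years earlier instead of one.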

Map_WorldPopPerc.png (1×3 px, 497 KB)

Map_WorldPop.png (1×3 px, 488 KB)

Map_UniqueDevices.png (1×3 px, 496 KB)

Map_UniqueDevicesPerc.png (1×3 px, 496 KB)

Map_UniqueDevices3moRollingYoy.png (1×3 px, 507 KB)

Map_Editors.png (1×3 px, 476 KB)

Map_EditorsPerc.png (1×3 px, 500 KB)

Map_Editors3moRollingYoy.png (1×3 px, 515 KB)

I'll work on some of the other changes we talked about for Monday including —

  • Standardizing the colorbar axis for percentage measures
  • A map with region names
  • Rounding to 2 significant digits (right now I have it rounding to 0 decimal places)

Here are views for content quality!

Map_Content.png (1×3 px, 475 KB)

Map_ContentPerc.png (1×3 px, 503 KB)

Map_ContentYoY.png (1×3 px, 494 KB)

Hi Hua, 2 things as you are running the final version of the plots:

  • as you are revising the percentages to the 2nd decimal digit, could you also add '+' in front of the ones with positive growth? Thank you so much!
  • For Unique devices, are you taking the % difference from 2020 or 2022? I think we should use 2020 as this is the last reliable data we have.

I made the requested updates.

One issue I ran into with standardizing the colorbar ranges for percentage charts: the range goes from [0 to 50%] for all the charts EXCEPT Unique Devices YoY, which is mostly negative. Because this one chart has a wider range, the colors on the other charts became indistinguishable when I expanded the range to [-25% to 50%]. To deal with this, I changed the colormap to one that goes through a wider range of colors. Let me know how this feels to you.
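A sketch of how a shared colorbar range can be enforced across maps in matplotlib: a single `Normalize` object with fixed `vmin`/`vmax` is reused for every chart, so the same percentage always gets the same color. The colormap name `plasma_r` and the helper `region_color` are assumptions for illustration only.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# One normalization shared by every percentage map: -25% to 50%.
shared_norm = Normalize(vmin=-25, vmax=50)
# Illustrative colormap choice; the reversed variant puts light colors at the low end.
cmap = plt.get_cmap("plasma_r")


def region_color(pct_change):
    """Map a percentage to an RGBA tuple using the shared range."""
    return cmap(shared_norm(pct_change))


low = float(shared_norm(-25))
high = float(shared_norm(50))
color_a = region_color(10)  # e.g. a region on one map
color_b = region_color(10)  # the same value on another map
```

The tradeoff described above shows up directly here: widening `vmin`/`vmax` squeezes the 0 to 50% values into a smaller slice of the colormap.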

Did we want 2 decimal places or 2 significant digits? I originally thought it was the latter but your last comment said the former. I think either is fine. I actually think rounded integers for percentages worked well too.

Map_RegionNames.png (1×3 px, 430 KB)

Map_WorldPop.png (1×3 px, 478 KB)

Map_WorldPopPerc.png (1×3 px, 507 KB)

Map_UniqueDevices.png (1×3 px, 488 KB)

Map_UniqueDevicesPerc.png (1×3 px, 503 KB)

Map_UniqueDevices3moRollingYoy.png (1×3 px, 533 KB)

Map_Editors.png (1×3 px, 470 KB)

Map_EditorsPerc.png (1×3 px, 507 KB)

Map_Editors3moRollingYoy.png (1×3 px, 534 KB)

Map_Content.png (1×3 px, 468 KB)

Map_ContentPerc.png (1×3 px, 503 KB)

Map_ContentYoY.png (1×3 px, 500 KB)

I just realized the Percent of Totals shouldn't have +/- signs!

Map_ContentPerc.png (1×3 px, 506 KB)

Map_UniqueDevicesPerc.png (1×3 px, 504 KB)

Map_EditorsPerc.png (1×3 px, 509 KB)

Map_WorldPopPerc.png (1×3 px, 509 KB)

@HXi-WMF can we recreate the Unique Devices comparison chart and compare the 3-month rolling average of Jan-March 2023 vs Jan-March 2020?
These are the % changes I'm expecting:

Central & Eastern Europe & Central Asia: 14.98%
East, Southeast Asia, & Pacific: -14.14%
Latin America & Caribbean: -14.75%
Middle East & North Africa: 12.25%
North America: -0.75%
Northern & Western Europe: -3.94%
South Asia: 22.07%
Sub-Saharan Africa: 33.41%

thanks and pls let me know if you have any questions!

Thank you for the updated 2020 vs 2023 chart, Hua!

Reassigning this to Hua - the work for the staff presentation is done (THANK YOU @HXi-WMF, @Miriam , @Mayakp.wiki , and @Iflorez !!), but we'll continue using this task for some related requests.

Hua, for Friday, can you please provide the following charts:

  • Map charts
    • Quality Articles % of total with March 2023 data
    • Quality Articles % of total with March 2022 data (make sure the range is standardized across these 2 charts)
    • Quality Articles counts with March 2023 data
    • Quality Articles counts with March 2022 data (make sure the range is standardized across these 2 charts)
  • Line charts (8 regions per image)
    • Monthly regional unique devices with blocked off areas & trend line (updated through March 2023) - trend line should be started from March 2018
    • Monthly regional active editors with trend line (updated through March 2023) - trend line should be started from March 2018
    • Annual monthly average of regional active editors (only include full years - 2018 through 2022) (given the data issues I don't think it makes sense to do an annual version of the unique devices)
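The request does not say what kind of trend line is wanted. Assuming an ordinary least-squares straight line fitted only from a chosen start month onward, a sketch could look like this (`linear_trend` and the sample values are hypothetical):

```python
def linear_trend(values, start=0):
    """Least-squares line through values[start:], using list positions as x.

    Returns (slope, intercept, fitted), where fitted covers positions start onward.
    """
    xs = list(range(start, len(values)))
    ys = values[start:]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [slope * x + intercept for x in xs]
    return slope, intercept, fitted


# Hypothetical series: the first point precedes the trend start and is ignored.
values = [5, 1, 3, 5, 7]
slope, intercept, fitted = linear_trend(values, start=1)
```

Here `start` would be the position of March 2018 in the monthly series, and `fitted` could be plotted against the same x positions as the data.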

For numbers, please round to 2 significant digits.

Hi! I have updated map charts here. I used the old plasma colormap — let me know how you feel / if it's too uniform. I looked at it again and thought it was okay but can play around with seeing if I can add in white too.

Additionally, I didn't standardize the colorbars for the Percent of Totals here. Because logically, the colors for the Percent of Total map should mirror the colors for the Total Values map — since it's just divided by a single total number. We can see that if we don't force all the Percent of Total maps to the same colorbar.

But all the Change over Time colorbars are set to -25% to 50%.

Map_RegionNames.png (1×3 px, 430 KB)

World Population

Map_WorldPop.png (1×3 px, 478 KB)

Map_WorldPopPerc.png (1×3 px, 500 KB)

Unique Devices

Map_UniqueDevices.png (1×3 px, 488 KB)

Map_UniqueDevicesPerc.png (1×3 px, 497 KB)

Map_UniqueDevices3moRollingYoy.png (1×3 px, 527 KB)

Editors

Map_Editors.png (1×3 px, 465 KB)

Map_EditorsPerc.png (1×3 px, 500 KB)

Map_Editors3moRollingYoy.png (1×3 px, 533 KB)

Content
These next two are set to the same 0-600K scale.

Map_Content22.png (1×3 px, 478 KB)

Map_Content23.png (1×3 px, 479 KB)

These next two colorbars are set to the same 0-50% scale as each other.

Map_ContentPerc22.png (1×3 px, 505 KB)

Map_ContentPerc23.png (1×3 px, 504 KB)

Map_ContentYoY.png (1×3 px, 495 KB)

Note for significant figures — For absolute values, if the number was 5M, I kept it as 5M instead of making it 5.0M because I felt the trailing 0 is irrelevant and distracting because most of the other numbers are large. But I did keep in trailing zeroes for the percentage values.
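The rounding rules in this note can be sketched as below: 2 significant digits everywhere, no trailing zero on abbreviated counts (5M rather than 5.0M), trailing zeros kept on percentages, and an optional "+" for positive growth. The helper names `round_sig`, `format_count`, and `format_pct` are made up for this example.

```python
import math


def round_sig(x, sig=2):
    """Round x to `sig` significant digits."""
    if x == 0:
        return 0
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)


def format_count(x, sig=2):
    """Abbreviate a count with K/M and drop trailing zeros (5M, not 5.0M)."""
    value = round_sig(x, sig)
    for threshold, suffix in ((1_000_000, "M"), (1_000, "K")):
        if abs(value) >= threshold:
            return f"{value / threshold:g}{suffix}"
    return f"{value:g}"


def format_pct(x, sig=2, signed=False):
    """Format a percentage, keeping trailing zeros (3.0%, not 3%)."""
    value = round_sig(x, sig)
    if value == 0:
        decimals = sig - 1
    else:
        decimals = max(sig - 1 - math.floor(math.log10(abs(value))), 0)
    sign = "+" if signed else ""
    return f"{value:{sign}.{decimals}f}%"


labels = [format_count(5_000_000), format_count(1_234_567), format_count(45_678)]
growth = [format_pct(14.98, signed=True), format_pct(-0.75, signed=True)]
share = format_pct(3.0)
```

The `signed` flag is meant for change-over-time labels only; percent-of-total labels would leave it off.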

Hi @HXi-WMF , this chart uses the incorrect comparison

Unique Devices

Map_UniqueDevices3moRollingYoy.png (1×3 px, 527 KB)

Can you update the Unique Devices map that compares 2020 vs 2023 to 2 significant digits?

Map_UniqueDevices3moRollingYoy.png (1×3 px, 532 KB)

map with correct values

Oh yes, I made some changes to the code and forgot to update Unique Devices comparison to 3 years instead of 1 year. Here is the fixed chart @Mayakp.wiki

Map_UniqueDevices3moRollingYoy.png (1×3 px, 529 KB)

Here are the line charts:

Unique Devices - March 2018-March 2023 with trendline:

Regional_Unique_Devices_00.jpeg (1×3 px, 497 KB)

Active Editors - March 2018-March 2023 with trendline:

Regional_Active_Editors_0.jpeg (1×3 px, 352 KB)

Regional_Active_Editors_1.jpeg (1×3 px, 170 KB)

Active Editors - Annual 2018-2022 (includes full year for each year, started from Jan 2018)

Regional_Active_Editors_annual.png (1×3 px, 229 KB)

kzimmerman moved this task from Doing to Done on the Product-Analytics (Kanban) board.