UserImpact: Fetch information for more articles when calculating most-viewed-articles data point
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Delete the relevant row in growthexperiments_user_impact
  • Visit Special:Homepage, make a note of the impact data generated
  • Delete the relevant row in growthexperiments_user_impact
  • Run php extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --force, make a note of the impact data generated

What happens?:

The impact data is not the same between the maintenance script and the on-demand generation.

What should have happened instead?:

It should be the same.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

Marking as high priority, as this seems to indicate a bug somewhere.

Comparison of job queue generated and on-demand generation:

{
  "@version": 5,
  "userId": 1,
  "userName": "Admin",
  "receivedThanksCount": 5,
  "editCountByNamespace": [
    530
  ],
  "editCountByDay": {
    "2021-04-07": 3,
    "2021-04-19": 2,
    "2021-04-21": 1,
    "2021-04-23": 4,
    "2021-04-24": 1,
    "2021-04-26": 7,
    "2021-04-27": 10,
    "2021-04-29": 3,
    "2021-04-30": 13,
    "2021-05-01": 6,
    "2021-05-03": 31,
    "2021-05-04": 1,
    "2021-05-05": 2,
    "2021-05-07": 17,
    "2021-05-11": 3,
    "2021-05-12": 6,
    "2021-05-19": 3,
    "2021-05-21": 4,
    "2021-05-26": 10,
    "2021-05-27": 8,
    "2021-05-28": 17,
    "2021-06-02": 5,
    "2021-06-09": 5,
    "2021-06-10": 9,
    "2021-06-22": 34,
    "2021-06-28": 1,
    "2021-07-02": 1,
    "2021-07-09": 1,
    "2021-08-12": 3,
    "2021-08-13": 3,
    "2021-08-17": 1,
    "2021-08-18": 1,
    "2021-08-24": 1,
    "2021-08-25": 1,
    "2021-08-27": 1,
    "2021-09-10": 9,
    "2021-09-15": 1,
    "2021-09-18": 1,
    "2021-09-21": 3,
    "2021-09-23": 3,
    "2021-09-24": 4,
    "2021-09-28": 4,
    "2021-09-30": 3,
    "2021-10-05": 3,
    "2021-10-20": 1,
    "2021-10-21": 11,
    "2021-10-22": 8,
    "2021-10-23": 21,
    "2021-10-28": 10,
    "2021-10-29": 6,
    "2021-10-30": 11,
    "2021-11-01": 5,
    "2021-11-02": 41,
    "2021-11-03": 2,
    "2021-11-04": 3,
    "2021-11-09": 5,
    "2021-11-10": 2,
    "2021-11-11": 22,
    "2021-11-12": 2,
    "2021-11-13": 1,
    "2021-11-24": 3,
    "2021-11-25": 3,
    "2022-05-31": 4,
    "2022-06-01": 2,
    "2022-06-03": 1,
    "2022-06-17": 4,
    "2022-06-24": 6,
    "2022-06-25": 1,
    "2022-07-01": 1,
    "2022-07-03": 3,
    "2022-07-04": 9,
    "2022-07-05": 4,
    "2022-07-06": 1,
    "2022-07-08": 3,
    "2022-07-19": 4,
    "2022-07-27": 2,
    "2022-09-08": 2,
    "2022-09-12": 2,
    "2022-09-15": 22,
    "2022-10-06": 2,
    "2022-10-20": 1,
    "2022-10-29": 4,
    "2022-11-02": 2,
    "2022-11-04": 3,
    "2022-11-05": 1,
    "2022-11-07": 4,
    "2022-11-08": 1,
    "2022-11-09": 3,
    "2022-11-11": 1,
    "2022-11-18": 5,
    "2022-11-19": 7,
    "2022-11-21": 4,
    "2022-11-22": 1,
    "2022-11-24": 1,
    "2022-11-25": 1,
    "2022-11-26": 1,
    "2022-11-29": 1,
    "2022-11-30": 1,
    "2022-12-01": 4,
    "2022-12-02": 2,
    "2022-12-06": 1,
    "2022-12-09": 1
  },
  "timeZone": [
    "ZoneInfo|780|Pacific/Auckland",
    780
  ],
  "newcomerTaskEditCount": 309,
  "lastEditTimestamp": 1670502764,
  "generatedAt": 1670502879,
  "longestEditingStreak": {
    "datePeriod": {
      "start": "2021-11-09",
      "end": "2021-11-13",
      "days": 5
    },
    "totalEditCountForPeriod": 32
  },
  "totalEditsCount": 530,
  "dailyTotalViews": {
    "2022-10-09": 3467,
    "2022-10-10": 4266,
    "2022-10-11": 4638,
    "2022-10-12": 4364,
    "2022-10-13": 3956,
    "2022-10-14": 3626,
    "2022-10-15": 3123,
    "2022-10-16": 3845,
    "2022-10-17": 4238,
    "2022-10-18": 4156,
    "2022-10-19": 4164,
    "2022-10-20": 4028,
    "2022-10-21": 3763,
    "2022-10-22": 3099,
    "2022-10-23": 3592,
    "2022-10-24": 4657,
    "2022-10-25": 5495,
    "2022-10-26": 4362,
    "2022-10-27": 3920,
    "2022-10-28": 4592,
    "2022-10-29": 3397,
    "2022-10-30": 4623,
    "2022-10-31": 4621,
    "2022-11-01": 4138,
    "2022-11-02": 4928,
    "2022-11-03": 4857,
    "2022-11-04": 4156,
    "2022-11-05": 3809,
    "2022-11-06": 4953,
    "2022-11-07": 5084,
    "2022-11-08": 5444,
    "2022-11-09": 5405,
    "2022-11-10": 4807,
    "2022-11-11": 4889,
    "2022-11-12": 4068,
    "2022-11-13": 3692,
    "2022-11-14": 4736,
    "2022-11-15": 4705,
    "2022-11-16": 4915,
    "2022-11-17": 5301,
    "2022-11-18": 4096,
    "2022-11-19": 3607,
    "2022-11-20": 4013,
    "2022-11-21": 4714,
    "2022-11-22": 5407,
    "2022-11-23": 4804,
    "2022-11-24": 5075,
    "2022-11-25": 4595,
    "2022-11-26": 3759,
    "2022-11-27": 4207,
    "2022-11-28": 5149,
    "2022-11-29": 5140,
    "2022-11-30": 5015,
    "2022-12-01": 5104,
    "2022-12-02": 4668,
    "2022-12-03": 3910,
    "2022-12-04": 4253,
    "2022-12-05": 4727,
    "2022-12-06": 4196,
    "2022-12-07": 4650
  },
  "recentEditsWithoutPageviews": [],
  "topViewedArticles": {
    "Test": {
      "firstEditDate": "2022-11-18",
      "newestEdit": "20221118135831",
      "views": {
        "2022-10-09": 0,
        "2022-10-10": 0,
        "2022-10-11": 0,
        "2022-10-12": 0,
        "2022-10-13": 0,
        "2022-10-14": 0,
        "2022-10-15": 0,
        "2022-10-16": 0,
        "2022-10-17": 0,
        "2022-10-18": 0,
        "2022-10-19": 0,
        "2022-10-20": 0,
        "2022-10-21": 0,
        "2022-10-22": 0,
        "2022-10-23": 0,
        "2022-10-24": 0,
        "2022-10-25": 0,
        "2022-10-26": 0,
        "2022-10-27": 0,
        "2022-10-28": 0,
        "2022-10-29": 0,
        "2022-10-30": 0,
        "2022-10-31": 0,
        "2022-11-01": 0,
        "2022-11-02": 0,
        "2022-11-03": 0,
        "2022-11-04": 0,
        "2022-11-05": 0,
        "2022-11-06": 0,
        "2022-11-07": 0,
        "2022-11-08": 0,
        "2022-11-09": 0,
        "2022-11-10": 0,
        "2022-11-11": 0,
        "2022-11-12": 0,
        "2022-11-13": 0,
        "2022-11-14": 0,
        "2022-11-15": 0,
        "2022-11-16": 0,
        "2022-11-17": 0,
        "2022-11-18": 1059,
        "2022-11-19": 727,
        "2022-11-20": 718,
        "2022-11-21": 1160,
        "2022-11-22": 1496,
        "2022-11-23": 1255,
        "2022-11-24": 1207,
        "2022-11-25": 987,
        "2022-11-26": 789,
        "2022-11-27": 775,
        "2022-11-28": 1408,
        "2022-11-29": 1420,
        "2022-11-30": 1288,
        "2022-12-01": 1359,
        "2022-12-02": 1403,
        "2022-12-03": 846,
        "2022-12-04": 872,
        "2022-12-05": 1485,
        "2022-12-06": 1403,
        "2022-12-07": 1978
      },
      "viewsCount": 23635,
      "pageviewsUrl": "https://pageviews.wmcloud.org/?project=en.wikipedia.org&userlang=en&start=2022-11-18&end=2022-12-07&pages=Test"
    },
    "Time": {
      "firstEditDate": "2022-11-30",
      "newestEdit": "20221201083405",
      "views": {
        "2022-10-09": 0,
        "2022-10-10": 0,
        "2022-10-11": 0,
        "2022-10-12": 0,
        "2022-10-13": 0,
        "2022-10-14": 0,
        "2022-10-15": 0,
        "2022-10-16": 0,
        "2022-10-17": 0,
        "2022-10-18": 0,
        "2022-10-19": 0,
        "2022-10-20": 0,
        "2022-10-21": 0,
        "2022-10-22": 0,
        "2022-10-23": 0,
        "2022-10-24": 0,
        "2022-10-25": 0,
        "2022-10-26": 0,
        "2022-10-27": 0,
        "2022-10-28": 0,
        "2022-10-29": 0,
        "2022-10-30": 0,
        "2022-10-31": 0,
        "2022-11-01": 0,
        "2022-11-02": 0,
        "2022-11-03": 0,
        "2022-11-04": 0,
        "2022-11-05": 0,
        "2022-11-06": 0,
        "2022-11-07": 0,
        "2022-11-08": 0,
        "2022-11-09": 0,
        "2022-11-10": 0,
        "2022-11-11": 0,
        "2022-11-12": 0,
        "2022-11-13": 0,
        "2022-11-14": 0,
        "2022-11-15": 0,
        "2022-11-16": 0,
        "2022-11-17": 0,
        "2022-11-18": 0,
        "2022-11-19": 0,
        "2022-11-20": 0,
        "2022-11-21": 0,
        "2022-11-22": 0,
        "2022-11-23": 0,
        "2022-11-24": 0,
        "2022-11-25": 0,
        "2022-11-26": 0,
        "2022-11-27": 0,
        "2022-11-28": 0,
        "2022-11-29": 0,
        "2022-11-30": 3369,
        "2022-12-01": 3402,
        "2022-12-02": 2931,
        "2022-12-03": 2823,
        "2022-12-04": 3118,
        "2022-12-05": 2944,
        "2022-12-06": 2473,
        "2022-12-07": 2335
      },
      "viewsCount": 23395,
      "pageviewsUrl": "https://pageviews.wmcloud.org/?project=en.wikipedia.org&userlang=en&start=2022-11-30&end=2022-12-07&pages=Time"
    },
    "Codex_Alimentarius": {
      "imageUrl": "https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Food_Safety_1.svg/40px-Food_Safety_1.svg.png",
      "firstEditDate": "2022-06-17",
      "newestEdit": "20221117205817",
      "views": {
        "2022-10-09": 102,
        "2022-10-10": 145,
        "2022-10-11": 158,
        "2022-10-12": 153,
        "2022-10-13": 129,
        "2022-10-14": 174,
        "2022-10-15": 169,
        "2022-10-16": 291,
        "2022-10-17": 246,
        "2022-10-18": 223,
        "2022-10-19": 254,
        "2022-10-20": 321,
        "2022-10-21": 169,
        "2022-10-22": 146,
        "2022-10-23": 145,
        "2022-10-24": 172,
        "2022-10-25": 146,
        "2022-10-26": 182,
        "2022-10-27": 178,
        "2022-10-28": 141,
        "2022-10-29": 141,
        "2022-10-30": 136,
        "2022-10-31": 138,
        "2022-11-01": 135,
        "2022-11-02": 196,
        "2022-11-03": 187,
        "2022-11-04": 162,
        "2022-11-05": 129,
        "2022-11-06": 134,
        "2022-11-07": 196,
        "2022-11-08": 168,
        "2022-11-09": 149,
        "2022-11-10": 164,
        "2022-11-11": 153,
        "2022-11-12": 155,
        "2022-11-13": 155,
        "2022-11-14": 162,
        "2022-11-15": 135,
        "2022-11-16": 142,
        "2022-11-17": 185,
        "2022-11-18": 110,
        "2022-11-19": 113,
        "2022-11-20": 132,
        "2022-11-21": 157,
        "2022-11-22": 133,
        "2022-11-23": 143,
        "2022-11-24": 122,
        "2022-11-25": 120,
        "2022-11-26": 93,
        "2022-11-27": 122,
        "2022-11-28": 164,
        "2022-11-29": 191,
        "2022-11-30": 183,
        "2022-12-01": 178,
        "2022-12-02": 132,
        "2022-12-03": 123,
        "2022-12-04": 172,
        "2022-12-05": 150,
        "2022-12-06": 177,
        "2022-12-07": 165
      },
      "viewsCount": 9646,
      "pageviewsUrl": "https://pageviews.wmcloud.org/?project=en.wikipedia.org&userlang=en&start=2022-10-09&end=2022-12-07&pages=Codex_Alimentarius"
    },
    "NAT64": {
      "firstEditDate": "2022-11-09",
      "newestEdit": "20221109085343",
      "views": {
        "2022-10-09": 0,
        "2022-10-10": 0,
        "2022-10-11": 0,
        "2022-10-12": 0,
        "2022-10-13": 0,
        "2022-10-14": 0,
        "2022-10-15": 0,
        "2022-10-16": 0,
        "2022-10-17": 0,
        "2022-10-18": 0,
        "2022-10-19": 0,
        "2022-10-20": 0,
        "2022-10-21": 0,
        "2022-10-22": 0,
        "2022-10-23": 0,
        "2022-10-24": 0,
        "2022-10-25": 0,
        "2022-10-26": 0,
        "2022-10-27": 0,
        "2022-10-28": 0,
        "2022-10-29": 0,
        "2022-10-30": 0,
        "2022-10-31": 0,
        "2022-11-01": 0,
        "2022-11-02": 0,
        "2022-11-03": 0,
        "2022-11-04": 0,
        "2022-11-05": 0,
        "2022-11-06": 0,
        "2022-11-07": 0,
        "2022-11-08": 0,
        "2022-11-09": 92,
        "2022-11-10": 90,
        "2022-11-11": 72,
        "2022-11-12": 51,
        "2022-11-13": 61,
        "2022-11-14": 100,
        "2022-11-15": 104,
        "2022-11-16": 106,
        "2022-11-17": 95,
        "2022-11-18": 111,
        "2022-11-19": 53,
        "2022-11-20": 31,
        "2022-11-21": 115,
        "2022-11-22": 100,
        "2022-11-23": 98,
        "2022-11-24": 77,
        "2022-11-25": 64,
        "2022-11-26": 53,
        "2022-11-27": 55,
        "2022-11-28": 99,
        "2022-11-29": 105,
        "2022-11-30": 120,
        "2022-12-01": 117,
        "2022-12-02": 103,
        "2022-12-03": 55,
        "2022-12-04": 59,
        "2022-12-05": 107,
        "2022-12-06": 98,
        "2022-12-07": 131
      },
      "viewsCount": 2522,
      "pageviewsUrl": "https://pageviews.wmcloud.org/?project=en.wikipedia.org&userlang=en&start=2022-11-09&end=2022-12-07&pages=NAT64"
    },
    "Uruguay_women's_national_football_team": {
      "firstEditDate": "2022-11-20",
      "newestEdit": "20221120214115",
      "views": {
        "2022-10-09": 0,
        "2022-10-10": 0,
        "2022-10-11": 0,
        "2022-10-12": 0,
        "2022-10-13": 0,
        "2022-10-14": 0,
        "2022-10-15": 0,
        "2022-10-16": 0,
        "2022-10-17": 0,
        "2022-10-18": 0,
        "2022-10-19": 0,
        "2022-10-20": 0,
        "2022-10-21": 0,
        "2022-10-22": 0,
        "2022-10-23": 0,
        "2022-10-24": 0,
        "2022-10-25": 0,
        "2022-10-26": 0,
        "2022-10-27": 0,
        "2022-10-28": 0,
        "2022-10-29": 0,
        "2022-10-30": 0,
        "2022-10-31": 0,
        "2022-11-01": 0,
        "2022-11-02": 0,
        "2022-11-03": 0,
        "2022-11-04": 0,
        "2022-11-05": 0,
        "2022-11-06": 0,
        "2022-11-07": 0,
        "2022-11-08": 0,
        "2022-11-09": 0,
        "2022-11-10": 0,
        "2022-11-11": 0,
        "2022-11-12": 0,
        "2022-11-13": 0,
        "2022-11-14": 0,
        "2022-11-15": 0,
        "2022-11-16": 0,
        "2022-11-17": 0,
        "2022-11-18": 0,
        "2022-11-19": 0,
        "2022-11-20": 26,
        "2022-11-21": 30,
        "2022-11-22": 33,
        "2022-11-23": 33,
        "2022-11-24": 104,
        "2022-11-25": 42,
        "2022-11-26": 46,
        "2022-11-27": 31,
        "2022-11-28": 124,
        "2022-11-29": 58,
        "2022-11-30": 41,
        "2022-12-01": 33,
        "2022-12-02": 75,
        "2022-12-03": 57,
        "2022-12-04": 26,
        "2022-12-05": 33,
        "2022-12-06": 28,
        "2022-12-07": 25
      },
      "viewsCount": 845,
      "pageviewsUrl": "https://pageviews.wmcloud.org/?project=en.wikipedia.org&userlang=en&start=2022-11-20&end=2022-12-07&pages=Uruguay_women%27s_national_football_team"
    }
  },
  "topViewedArticlesCount": 60043
}

It seems to come down to a different set of articles being selected as top-viewed. Probably because of the PageViewInfo request limit. I'll add some logging to make it easier to see when that is the case.
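One way to confirm that the two payloads differ only in which articles were selected is to compare the keys of topViewedArticles directly. The following is a minimal sketch, not part of GrowthExperiments: the helper names are hypothetical, and the two trimmed payloads are made-up stand-ins that only reuse article titles from the dump above.

```python
from typing import Dict, List


def top_viewed_titles(impact: dict) -> set:
    """Return the article titles listed under topViewedArticles."""
    return set(impact.get("topViewedArticles", {}))


def diff_top_viewed(job_impact: dict, on_demand_impact: dict) -> Dict[str, List[str]]:
    """Report which top-viewed titles appear in one payload but not the other."""
    job = top_viewed_titles(job_impact)
    on_demand = top_viewed_titles(on_demand_impact)
    return {
        "only_in_job": sorted(job - on_demand),
        "only_on_demand": sorted(on_demand - job),
        "in_both": sorted(job & on_demand),
    }


# Hypothetical, trimmed-down payloads (real ones carry full view data per title).
job_impact = {"topViewedArticles": {"Test": {}, "Time": {}, "NAT64": {}}}
on_demand_impact = {"topViewedArticles": {"Test": {}, "Codex_Alimentarius": {}, "NAT64": {}}}

difference = diff_top_viewed(job_impact, on_demand_impact)
print(difference)
```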

Change 866832 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Log when PageViewInfo fails to return data about a page

https://gerrit.wikimedia.org/r/866832

PageViewInfo limits the AQS requests to five per call, so whenever there are more than five articles with a cold PageViewInfo cache (and the cache expires daily, so it's pretty common), only the first five (in chronological order according to the user's edits, I think?) will have data; the next time you look at it, the next five will, and so on. Since the final (pageview-based) order is entirely unrelated to the initial order, you will get a somewhat random subset of the articles on each request until you have most of the data in the cache. I added some logging code, but I'm pretty sure this is the reason.

Not really sure what to do about it. We could fetch data for more pages in the maintenance script, probably, especially if we add some delay between the requests. In theory the nice approach would be to add an AQS API for per-user pageview data, and do the aggregation as an ETL pipeline on the analytics stack, but that would be a big ask for a minor element of an as-of-yet experimental new feature.
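The effect described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the function names, the in-memory cache, and the view counts are all hypothetical, and the real PageViewInfo service is more involved. It only models "cached pages are free, at most five uncached pages are fetched per call".

```python
REQUEST_LIMIT = 5  # assumed per-call limit on uncached fetches


def get_page_data(titles, cache, backend, limit=REQUEST_LIMIT):
    """Serve cached pages freely, but fetch at most `limit` uncached pages per call."""
    result = {}
    fetched = 0
    for title in titles:
        if title in cache:
            result[title] = cache[title]
        elif fetched < limit:
            cache[title] = backend[title]  # simulated backend request
            result[title] = cache[title]
            fetched += 1
        # Titles beyond the limit are silently left out of this call's result.
    return result


def top_viewed(page_data, n=5):
    """Pick the n titles with the most views among the pages that have data."""
    return sorted(page_data, key=page_data.get, reverse=True)[:n]


# Made-up view counts; the user's edit order is Article_0 .. Article_11.
views = [10, 20, 30, 40, 50, 900, 800, 700, 600, 500, 5, 1]
backend = {f"Article_{i}": v for i, v in enumerate(views)}
titles = list(backend)

cache = {}
first = top_viewed(get_page_data(titles, cache, backend))
second = top_viewed(get_page_data(titles, cache, backend))
third = top_viewed(get_page_data(titles, cache, backend))
print(first, second, third, sep="\n")
```

With a cold cache the first call can only rank the first five articles, so its "top viewed" list differs from the one produced once the cache has warmed up.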

Could we increase the limit? Having difficulty tracking down where the 5 article limit came from, but maybe something like 50 would handle most of our use cases?

The limit is set in $wgPageViewInfoWikimediaRequestLimit. I think 5 is an appropriate limit for a web request; we could just call PageViewService::getPageData() multiple times in a row from the maintenance script, though. The one thing to look out for is how it affects AQS request rate (but if that becomes a problem, we can just add some artificial delay).

The limit is set in $wgPageViewInfoWikimediaRequestLimit.

Sorry, I meant I had trouble tracking down why this number was selected as opposed to something else (10, 20, 50 etc).

I think 5 is an appropriate limit for a web request; we could just call PageViewService::getPageData() multiple times in a row from the maintenance script, though. The one thing to look out for is how it affects AQS request rate (but if that becomes a problem, we can just add some artificial delay).

That would fix the issue for regeneration from the maintenance script, but I think we would also need to allow calling PageViewService::getPageData() multiple times when data is regenerated via ImpactHooks.php, otherwise the user would see inconsistencies in their article list. (And if we are calling PageViewService::getPageData() when regenerating data from ImpactHooks.php, then maybe it makes more sense to bump the limit for PageViewInfoWikimediaRequestLimit globally or in our call to PageViewInfo?)

Change 866832 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Log when PageViewInfo fails to return data about a page

https://gerrit.wikimedia.org/r/866832

Sorry, I meant I had trouble tracking down why this number was selected as opposed to something else (10, 20, 50 etc).

I don't remember, but I don't think anything more than gut feeling went into it.

That would fix the issue for regeneration from the maintenance script, but I think we would also need to allow calling PageViewService::getPageData() multiple times when data is regenerated via ImpactHooks.php, otherwise the user would see inconsistencies in their article list. (And if we are calling PageViewService::getPageData() when regenerating data from ImpactHooks.php, then maybe it makes more sense to bump the limit for PageViewInfoWikimediaRequestLimit globally or in our call to PageViewInfo?)

In general, an extension should not change another extension's globals (preferably it shouldn't read them either), and in this case it wouldn't work reliably anyway, as the value gets copied into an object property when the service is initialized. It would make sense to allow the caller to supply the value in some way; I'm just not sure what that way should be (it doesn't really make sense in the interface, as it is specific to one of the implementations).
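The point about the value being copied at initialization can be illustrated with a generic sketch. The class and setting names below are hypothetical and do not mirror the PageViewInfo code; they only show why changing a setting after a service object exists has no effect on that object.

```python
# Hypothetical settings store standing in for an extension's configuration.
settings = {"RequestLimit": 5}


class LimitedService:
    """Toy service that copies its limit once, at construction time."""

    def __init__(self, settings):
        # The value is read here and never looked up again.
        self.request_limit = settings["RequestLimit"]


service = LimitedService(settings)

# Changing the setting afterwards does not reach the existing object...
settings["RequestLimit"] = 50
print(service.request_limit)

# ...only a service constructed after the change sees the new value.
late_service = LimitedService(settings)
print(late_service.request_limit)
```

Whether an override takes effect therefore depends on whether it runs before the service is first constructed, which is why it would not be reliable.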

The post-save hook is not that much of a problem since you only have one new article there, the rest are already cached. But the issue would still affect users who are outside of the maintenance script's conditions, and probably a stampede situation with the PageViewInfo cache (although that is fixable).

In general, an extension should not change another extension's globals (preferably it shouldn't read them either), and in this case it wouldn't work reliably anyway, as the value gets copied into an object property when the service is initialized.

Agreed; I meant that we could globally change the value, by adjusting PageViewInfoWikimediaRequestLimit in extension.json in the PageViewInfo extension, or via an override in InitialiseSettings.php in the mediawiki-config repo.

We have some data from the logging. Across the last 7 days, there are 22,157 notices logged. The spikes are from the job runner:

[Image: image.png]

Most of the entries are for larger numbers of titles, but there are several log entries where we failed to fetch page view info for < 5 articles.

(Not moving to in progress as it disappears from the "Top priorities by Jan 26th" column, but claiming as I am looking into this further.)

We aim to have all the impact module tasks in QA by Friday, so maintaining this column seems less important. Moving to in progress.

Change 881414 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] [WIP] Process more articles when fetching page view data

https://gerrit.wikimedia.org/r/881414

Change 881414 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Process more articles when fetching page view data

https://gerrit.wikimedia.org/r/881414

kostajh renamed this task from refreshUserImpact job and on-demand user impact generation should produce identical results to UserImpact: Fetch information for more articles when calculating most-viewed-articles data ponit. Jan 30 2023, 10:28 AM

I'm writing this comment to give a heads-up to maintainers of AQS (I believe that is Data-Engineering?) about a patch riding this week's train. (SRE triager: I didn't know which sub-tag of SRE, if any, was relevant, thanks in advance for applying the right one.)

Some background:

  • the refreshUserImpactJob runs daily and is also enqueued after:
    • a) a user receives thanks
    • b) a user makes an edit
    • c) the user visits Special:Homepage and we believe their user impact data is stale
  • among other things, the job attempts to fetch page view data (last 60 days) for articles that the user has edited
    • in wmf.20, this is limited to fetching data for 5 articles from AQS. The PageViewInfo extension, which interacts with AQS, allows 5 uncached requests to AQS per call. After PageViewInfo fetches data from AQS, it places it in the WANObjectCache. So if I have 10 articles and none of them are in the cache, I will get back page view data for 5 articles. If I request again, or if another user who has edited the same articles as me comes along, they will get 5 cached articles and 5 new articles.

What the patch changes:

  • The job will attempt to fetch data for up to 1,000 articles in the user's edit history. It will make a request to AQS via PageViewInfo, and if the number of articles with page view data is less than what was requested, it will continue to make requests until
    • a) it has data for all of the requested articles
    • b) the maximum limit of 1,000 articles is reached (GEUserImpactMaxArticlesToProcessForPageviews config switch)
    • c) the maximum execution time of 5 minutes is reached (GEUserImpactMaximumProcessTimeSeconds config switch)

In case we see an unsustainable increase in traffic to AQS, we can adjust the config switches to revert to the previous status quo.
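The retry loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the merged patch: the function and parameter names are hypothetical, the defaults simply mirror the 1,000-article and 5-minute limits mentioned above, and the "stop when a call returns nothing new" guard is an extra safety assumption that the description does not mention.

```python
import time


def fetch_pageviews(titles, get_page_data, max_articles=1000,
                    max_seconds=300.0, clock=time.monotonic):
    """Call get_page_data repeatedly until every wanted title has data,
    the article cap is reached, or the time budget runs out."""
    wanted = titles[:max_articles]          # b) cap on articles processed
    collected = {}
    deadline = clock() + max_seconds        # c) cap on processing time
    while len(collected) < len(wanted) and clock() < deadline:  # a) all data present
        missing = [t for t in wanted if t not in collected]
        new_data = get_page_data(missing)
        if not new_data:
            break  # assumption: give up if a call makes no progress
        collected.update(new_data)
    return collected


def limited_service(requested, limit=5):
    """Hypothetical stand-in for a service that returns data for at most 5 pages per call."""
    return {title: 0 for title in requested[:limit]}


titles = [f"Article_{i}" for i in range(12)]
all_data = fetch_pageviews(titles, limited_service)
capped = fetch_pageviews(titles, limited_service, max_articles=7)
print(len(all_data), len(capped))
```

Passing the clock in as a parameter is a design choice for this sketch only; it makes the time limit testable without actually waiting five minutes.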

Some additional information:

  • It's hard to track down exactly how many additional requests to AQS this will generate per day.
  • The growthexperiments_user_impact table is a decent proxy for how many users we are generating data for. That has 574 rows on cswiki, 202 on bnwiki, 1,365 on arwiki, and 3,045 on eswiki.
  • Putting that together, we have 5,186 users. The theoretical maximum for articles we could request page view data for from AQS in a job request would be 5,186,000. In practice, we would get nowhere near that number because most of these users are new editors with edits to a handful of articles, and there will be overlap between the articles, so some percentage of the requests would be to WANObjectCache and not AQS.
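The arithmetic behind these figures, using the row counts quoted above and the 1,000-article cap, can be checked in a few lines (the variable names are only illustrative):

```python
# Row counts in growthexperiments_user_impact, as quoted above.
rows_per_wiki = {"cswiki": 574, "bnwiki": 202, "arwiki": 1365, "eswiki": 3045}
max_articles_per_user = 1000  # the per-user article cap described above

total_users = sum(rows_per_wiki.values())
theoretical_max = total_users * max_articles_per_user

print(total_users)       # 5186
print(theoretical_max)   # 5186000
```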

Questions:

  • What dashboard(s) can Growth team engineers monitor to keep an eye on traffic as the patch reaches group2?
  • Are there any concerns from Data-Engineering or SRE about this patch?

Would it be worthwhile to also add/list stewards on the central https://www.mediawiki.org/wiki/Developers/Maintainers page?

IMO, yes, but will leave that to the stewards to add themselves.

The patch improved the situation (~450 entries per 24 hours, down from 2,927 entries per 24 hours in wmf.20):

{F36762440}

but also resulted in T328945: An earlier attempt to fetch page {page title} failed. To limit server load, retries have been blocked for 30 minutes.

kostajh renamed this task from UserImpact: Fetch information for more articles when calculating most-viewed-articles data ponit to UserImpact: Fetch information for more articles when calculating most-viewed-articles data point. Feb 6 2023, 5:56 PM

One thing to keep in mind is that the DC switch (scheduled for March 1 IIRC) will increase the response time of AQS quite significantly, so we should keep an eye on what happens.

Checked in wmf.25 (after the March 1, 2023 data center switchover): the "Failed to get page view data for" log message has 464 hits over 24 hours:

[Image: Screen Shot 2023-03-02 at 5.38.44 PM.png]

Also, the PageViewInfo channel on the logstash Growth Team dashboard shows the following error for test2wiki:
Failed fetching https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/test2.wikipedia.org/ ...
The rate is really low, but since test2wiki has zero tasks available (Special:NewcomerTasksInfo on test2wiki shows all zeroes), maybe test2wiki should be excluded from logging?

The user impact module is not really related to tasks. Also PageViewInfo is used for other things, not just GrowthExperiments (although the errors are all GrowthExperiments-related).

Thanks, @Tgr, for the clarification! Per our Slack conversation, I checked the two things, and both seem to be OK.

  • pageview stats are reliable (user impact data no longer changes on every page reload for a user with many edits, which previously happened because only a couple of pageview data points were fetched on each reload)
  • no AQS performance problems after the DC switch