Page MenuHomePhabricator

EventLogging sees too few distinct client IPs {oryx} [8 pts]
Closed, ResolvedPublic

Description

See the three examples below, for very different EL tables, which contain an implausibly low number of distinct client IP (hashes) in the EventLogging capsule:

  • 37 IPs in 111004 events
  • 37 IPs in 8630 events
  • 41 IPs in 94218 events

As @ori pointed out to me, this may be suspiciously close to the number of Varnish IPs (around 40?). Indeed, the EL table for visits to the Wikimedia blog (which is not behind Varnish) shows a more normal ratio:

  • 507 IPs in 3326 events
select COUNT(*)
from
log.MobileWebSectionUsage_14321266 WHERE timestamp LIKE '2015111800%';
+----------+
| COUNT(*) |
+----------+
|   111004 |
+----------+

mysql:research@analytics-store.eqiad.wmnet [(none)]> select COUNT(distinct clientIP) from log.MobileWebSectionUsage_14321266 WHERE timestamp LIKE '2015111800%';
+--------------------------+
| COUNT(distinct clientIP) |
+--------------------------+
|                       37 |
+--------------------------+


mysql:research@analytics-store.eqiad.wmnet [(none)]> SELECT COUNT(DISTINCT(clientIP)) FROM log.MobileWikiAppDailyStats_12637385 WHERE timestamp LIKE '20151118%';
+---------------------------+
| COUNT(DISTINCT(clientIP)) |
+---------------------------+
|                        37 |
+---------------------------+

mysql:research@analytics-store.eqiad.wmnet [(none)]> SELECT COUNT(*) FROM log.MobileWikiAppDailyStats_12637385 WHERE timestamp LIKE '20151118%';+----------+
| COUNT(*) |
+----------+
|     8630 |
+----------+


mysql:research@analytics-store.eqiad.wmnet [(none)]> SELECT COUNT(*) FROM log.UniversalLanguageSelector_7327441 WHERE timestamp LIKE '20151118%';
+----------+
| COUNT(*) |
+----------+
|    94218 |
+----------+

mysql:research@analytics-store.eqiad.wmnet [(none)]> SELECT COUNT(DISTINCT(clientIP)) FROM log.UniversalLanguageSelector_7327441 WHERE timestamp LIKE '20151118%';
+---------------------------+
| COUNT(DISTINCT(clientIP)) |
+---------------------------+
|                        41 |
+---------------------------+



SELECT COUNT(*) FROM log.WikimediaBlogVisit_5308166 WHERE timestamp LIKE '20151118%';
+----------+
| COUNT(*) |
+----------+
|     3326 |
+----------+

SELECT COUNT(DISTINCT(clientIP)) FROM log.WikimediaBlogVisit_5308166 WHERE timestamp LIKE '20151118%';
+---------------------------+
| COUNT(DISTINCT(clientIP)) |
+---------------------------+
|                       507 |
+---------------------------+

Event Timeline

Tbayer raised the priority of this task from to Needs Triage.
Tbayer updated the task description. (Show Details)
Tbayer added a project: Analytics-Backlog.
Tbayer added subscribers: Tbayer, ori.

Below is a closer look on how these numbers developed over time, for two of the three tables examined in the task. Seems something happened in June, around the time of the HTTPS-only rollout, and something else happened in September.

mysql:research@analytics-store.eqiad.wmnet [log]> SELECT LEFT(timestamp,8) AS date, COUNT(DISTINCT(clientIP)) AS ips FROM log.UniversalLanguageSelector_7327441 GROUP BY date ORDER BY date;
+----------+-------+
| date     | ips   |
+----------+-------+
| 20140210 |     9 |
| 20140211 |    11 |
| 20140212 |    22 |
| 20140213 |    59 |
| 20140214 |   265 |
| 20140215 |   271 |
| 20140216 |   264 |
| 20140217 |   332 |
| 20140218 |   994 |
| 20140219 |  4898 |
| 20140220 |  6622 |
| 20140221 | 44151 |
| 20140222 | 42857 |
| 20140223 | 51391 |
| 20140224 | 60435 |
| 20140225 | 61134 |
| 20140226 | 59657 |
| 20140227 | 56416 |
| 20140228 | 46384 |
| 20140301 | 42602 |
| 20140302 | 49928 |
| 20140303 | 57454 |
| 20140304 | 58813 |
| 20140305 | 59089 |
| 20140306 | 56665 |
| 20140307 | 46552 |
| 20140308 | 40782 |
| 20140309 | 50771 |
| 20140310 | 59904 |
| 20140311 | 61390 |
| 20140312 | 59677 |
| 20140313 | 57051 |
| 20140314 | 46506 |
| 20140315 | 41952 |
| 20140316 | 52058 |
| 20140317 | 58573 |
| 20140320 | 42960 |
| 20140321 | 45987 |
| 20140322 | 41747 |
| 20140323 | 50898 |
| 20140324 | 59332 |
| 20140325 | 59060 |
| 20140326 | 58452 |
| 20140327 | 55142 |
| 20140328 | 45558 |
| 20140329 | 40188 |
| 20140330 | 47143 |
| 20140331 | 56855 |
| 20140401 | 54606 |
| 20140402 | 56864 |
| 20140403 | 56317 |
| 20140404 | 46297 |
| 20140405 | 41122 |
| 20140406 | 49161 |
| 20140407 | 57819 |
| 20140408 | 56841 |
| 20140409 | 55781 |
| 20140410 | 53193 |
| 20140411 | 43812 |
| 20140412 | 38680 |
| 20140413 | 45784 |
| 20140414 | 53901 |
| 20140415 | 52050 |
| 20140416 | 49028 |
| 20140417 | 44221 |
| 20140418 | 40052 |
| 20140419 | 37392 |
| 20140420 | 40940 |
| 20140421 | 52198 |
| 20140422 | 56806 |
| 20140423 | 55791 |
| 20140424 | 53441 |
| 20140425 | 45463 |
| 20140426 | 40968 |
| 20140427 | 50616 |
| 20140428 | 57786 |
| 20140429 | 49814 |
| 20140430 | 46250 |
| 20140501 | 46561 |
| 20140502 | 51573 |
| 20140503 | 48466 |
| 20140504 | 57717 |
| 20140505 | 66571 |
| 20140506 | 68023 |
| 20140507 | 66828 |
| 20140508 | 58691 |
| 20140509 | 46276 |
| 20140510 | 36367 |
| 20140511 | 43359 |
| 20140512 | 50875 |
| 20140513 | 53879 |
| 20140514 | 51995 |
| 20140515 | 49253 |
| 20140516 | 44437 |
| 20140517 | 46315 |
| 20140518 | 56781 |
| 20140519 | 67814 |
| 20140520 | 67696 |
| 20140521 | 67297 |
| 20140522 | 62009 |
| 20140523 | 50236 |
| 20140524 | 43807 |
| 20140525 | 53006 |
| 20140526 | 61281 |
| 20140527 | 61425 |
| 20140528 | 59221 |
| 20140529 | 56385 |
| 20140530 | 47965 |
| 20140531 | 42766 |
| 20140601 | 50008 |
| 20140602 | 59709 |
| 20140603 | 59195 |
| 20140604 | 57533 |
| 20140605 | 56153 |
| 20140606 | 44120 |
| 20140607 | 38817 |
| 20140608 | 45451 |
| 20140609 | 54885 |
| 20140610 | 55261 |
| 20140611 | 53750 |
| 20140612 | 50196 |
| 20140613 | 41872 |
| 20140614 | 38084 |
| 20140615 | 44440 |
| 20140616 | 51704 |
| 20140617 | 51309 |
| 20140618 | 46616 |
| 20140619 | 44151 |
| 20140620 | 38457 |
| 20140621 | 33872 |
| 20140622 | 38907 |
| 20140623 | 44900 |
| 20140624 | 45442 |
| 20140625 | 44007 |
| 20140626 | 41863 |
| 20140627 | 36982 |
| 20140628 | 31562 |
| 20140629 | 36236 |
| 20140630 | 42024 |
| 20140701 | 41190 |
| 20140702 | 42057 |
| 20140703 | 41265 |
| 20140704 | 34799 |
| 20140705 | 31113 |
| 20140706 | 35411 |
| 20140707 | 42872 |
| 20140708 | 42076 |
| 20140709 | 39620 |
| 20140710 | 40228 |
| 20140711 | 37250 |
| 20140712 | 31544 |
| 20140713 | 33540 |
| 20140714 | 41593 |
| 20140715 | 42307 |
| 20140716 | 42458 |
| 20140717 | 40494 |
| 20140718 | 36108 |
| 20140719 | 30525 |
| 20140720 | 34909 |
| 20140721 | 42569 |
| 20140722 | 42704 |
| 20140723 | 40969 |
| 20140724 | 39597 |
| 20140725 | 35280 |
| 20140726 | 30112 |
| 20140727 | 32319 |
| 20140728 | 38395 |
| 20140729 | 40224 |
| 20140730 | 39485 |
| 20140731 | 38899 |
| 20140801 | 35109 |
| 20140802 | 30151 |
| 20140803 | 33420 |
| 20140804 | 41241 |
| 20140805 | 40131 |
| 20140806 | 39451 |
| 20140807 | 38409 |
| 20140808 | 34935 |
| 20140809 | 30104 |
| 20140810 | 32768 |
| 20140811 | 41341 |
| 20140812 | 41781 |
| 20140813 | 41584 |
| 20140814 | 38274 |
| 20140815 | 33558 |
| 20140816 | 30037 |
| 20140817 | 33603 |
| 20140818 | 41847 |
| 20140819 | 43328 |
| 20140820 | 43223 |
| 20140821 | 42379 |
| 20140822 | 38087 |
| 20140823 | 32295 |
| 20140824 | 36282 |
| 20140825 | 44387 |
| 20140826 | 45188 |
| 20140827 | 43502 |
| 20140828 | 41481 |
| 20140829 | 36613 |
| 20140830 | 31396 |
| 20140831 | 36021 |
| 20140901 | 43957 |
| 20140902 | 45999 |
| 20140903 | 46542 |
| 20140904 | 42761 |
| 20140905 | 38111 |
| 20140906 | 33159 |
| 20140907 | 39208 |
| 20140908 | 48155 |
| 20140909 | 50478 |
| 20140910 | 50075 |
| 20140911 | 47984 |
| 20140912 | 41657 |
| 20140913 | 35800 |
| 20140914 | 42294 |
| 20140915 | 50533 |
| 20140916 | 51136 |
| 20140917 | 52449 |
| 20140918 | 49797 |
| 20140919 | 41818 |
| 20140920 | 37054 |
| 20140921 | 45894 |
| 20140922 | 54278 |
| 20140923 | 54543 |
| 20140924 | 53924 |
| 20140925 | 51549 |
| 20140926 | 42722 |
| 20140927 | 37467 |
| 20140928 | 45785 |
| 20140929 | 54449 |
| 20140930 | 53960 |
| 20141001 | 52267 |
| 20141002 | 48713 |
| 20141003 | 40201 |
| 20141004 | 35486 |
| 20141005 | 44136 |
| 20141006 | 55996 |
| 20141007 | 57036 |
| 20141008 | 57507 |
| 20141009 | 54932 |
| 20141010 | 42949 |
| 20141011 | 36767 |
| 20141012 | 42286 |
| 20141013 | 57842 |
| 20141014 | 58808 |
| 20141015 | 57904 |
| 20141016 | 54739 |
| 20141017 | 45528 |
| 20141018 | 39526 |
| 20141019 | 45883 |
| 20141020 | 56749 |
| 20141021 | 55763 |
| 20141022 | 55154 |
| 20141023 | 52867 |
| 20141024 | 45532 |
| 20141025 | 38874 |
| 20141026 | 46839 |
| 20141027 | 57693 |
| 20141028 | 57892 |
| 20141029 | 57740 |
| 20141030 | 54621 |
| 20141031 | 45101 |
| 20141101 | 40461 |
| 20141102 | 47951 |
| 20141103 | 57716 |
| 20141104 | 60123 |
| 20141105 | 60407 |
| 20141106 | 57800 |
| 20141107 | 47889 |
| 20141108 | 42800 |
| 20141109 | 50386 |
| 20141110 | 59871 |
| 20141111 | 61198 |
| 20141112 | 61352 |
| 20141113 | 57834 |
| 20141114 | 47962 |
| 20141115 | 43186 |
| 20141116 | 53422 |
| 20141117 | 61431 |
| 20141118 | 61655 |
| 20141119 | 60973 |
| 20141120 | 57587 |
| 20141121 | 47689 |
| 20141122 | 42684 |
| 20141123 | 53065 |
| 20141124 | 61809 |
| 20141125 | 60182 |
| 20141126 | 58584 |
| 20141127 | 54321 |
| 20141128 | 44438 |
| 20141129 | 40858 |
| 20141130 | 51236 |
| 20141201 | 58699 |
| 20141202 | 58853 |
| 20141203 | 58609 |
| 20141204 | 54670 |
| 20141205 | 44622 |
| 20141206 | 39694 |
| 20141207 | 48756 |
| 20141208 | 53384 |
| 20141209 | 56499 |
| 20141210 | 55009 |
| 20141211 | 52504 |
| 20141212 | 42385 |
| 20141213 | 38383 |
| 20141214 | 47217 |
| 20141215 | 53185 |
| 20141216 | 51401 |
| 20141217 | 49049 |
| 20141218 | 44391 |
| 20141219 | 35247 |
| 20141220 | 30986 |
| 20141221 | 35648 |
| 20141222 | 38573 |
| 20141223 | 36449 |
| 20141224 | 30723 |
| 20141225 | 29876 |
| 20141226 | 32617 |
| 20141227 | 32450 |
| 20141228 | 34982 |
| 20141229 | 39097 |
| 20141230 | 37264 |
| 20141231 | 27573 |
| 20150101 | 29342 |
| 20150102 | 39228 |
| 20150103 | 39879 |
| 20150104 | 44380 |
| 20150105 | 50104 |
| 20150106 | 50864 |
| 20150107 | 17396 |
| 20150108 | 51855 |
| 20150109 | 45247 |
| 20150110 | 40691 |
| 20150111 | 48574 |
| 20150112 | 55551 |
| 20150113 | 55708 |
| 20150114 | 55651 |
| 20150115 | 52597 |
| 20150116 | 44776 |
| 20150117 | 39028 |
| 20150118 | 48145 |
| 20150119 | 56141 |
| 20150120 | 56271 |
| 20150121 | 55634 |
| 20150122 | 53577 |
| 20150123 | 44747 |
| 20150124 | 39548 |
| 20150125 | 45883 |
| 20150126 | 54341 |
| 20150127 | 55161 |
| 20150128 | 54840 |
| 20150129 | 51542 |
| 20150130 | 42898 |
| 20150131 | 37982 |
| 20150201 | 44943 |
| 20150202 | 53261 |
| 20150203 | 54194 |
| 20150204 | 53687 |
| 20150205 | 48493 |
| 20150206 | 44137 |
| 20150207 | 37856 |
| 20150208 | 45819 |
| 20150209 | 54314 |
| 20150210 | 53443 |
| 20150211 | 52882 |
| 20150212 | 50035 |
| 20150213 | 41109 |
| 20150214 | 34256 |
| 20150215 | 41227 |
| 20150216 | 49223 |
| 20150217 | 51083 |
| 20150218 | 50145 |
| 20150219 | 48393 |
| 20150220 | 41878 |
| 20150221 | 37083 |
| 20150222 | 44235 |
| 20150223 | 53923 |
| 20150224 | 54281 |
| 20150225 | 54274 |
| 20150226 | 50938 |
| 20150227 | 42052 |
| 20150228 | 37300 |
| 20150301 | 44920 |
| 20150302 | 53290 |
| 20150303 | 53524 |
| 20150304 | 53316 |
| 20150305 | 50955 |
| 20150306 | 41613 |
| 20150307 | 37123 |
| 20150308 | 43373 |
| 20150309 | 54137 |
| 20150310 | 53413 |
| 20150311 | 53406 |
| 20150312 | 51449 |
| 20150313 | 42985 |
| 20150314 | 37672 |
| 20150315 | 46369 |
| 20150316 | 55246 |
| 20150317 | 53819 |
| 20150318 | 47168 |
| 20150319 | 30460 |
| 20150320 | 26649 |
| 20150321 | 23715 |
| 20150322 | 28738 |
| 20150323 | 38333 |
| 20150324 | 33574 |
| 20150325 | 34382 |
| 20150326 | 47130 |
| 20150327 | 39776 |
| 20150328 | 33866 |
| 20150329 | 39920 |
| 20150330 | 48025 |
| 20150331 | 45341 |
| 20150401 | 40977 |
| 20150402 | 30881 |
| 20150403 | 32950 |
| 20150404 | 31653 |
| 20150405 | 34380 |
| 20150406 | 43598 |
| 20150407 | 48573 |
| 20150408 | 47806 |
| 20150409 | 44991 |
| 20150410 | 38374 |
| 20150411 | 33928 |
| 20150412 | 40223 |
| 20150413 | 49553 |
| 20150414 | 50305 |
| 20150415 | 48878 |
| 20150416 | 47047 |
| 20150417 | 40560 |
| 20150418 | 34910 |
| 20150419 | 42354 |
| 20150420 | 51223 |
| 20150421 | 50654 |
| 20150422 | 49916 |
| 20150423 | 41230 |
| 20150424 | 36107 |
| 20150425 | 28574 |
| 20150426 | 32878 |
| 20150427 | 42740 |
| 20150428 | 43230 |
| 20150429 | 45108 |
| 20150430 | 30196 |
| 20150501 | 32206 |
| 20150502 | 28907 |
| 20150503 | 36217 |
| 20150504 | 42771 |
| 20150505 | 47037 |
| 20150506 | 47606 |
| 20150507 | 45881 |
| 20150508 | 38670 |
| 20150509 | 33291 |
| 20150510 | 39794 |
| 20150511 | 50207 |
| 20150512 | 48912 |
| 20150513 | 47488 |
| 20150514 | 42258 |
| 20150515 | 35326 |
| 20150516 | 30668 |
| 20150517 | 36823 |
| 20150518 | 45273 |
| 20150519 | 45235 |
| 20150520 | 46261 |
| 20150521 | 44223 |
| 20150522 | 36338 |
| 20150523 | 30874 |
| 20150524 | 35416 |
| 20150525 | 36938 |
| 20150526 | 47188 |
| 20150527 | 45391 |
| 20150528 | 42579 |
| 20150529 | 33911 |
| 20150530 | 29156 |
| 20150531 | 27911 |
| 20150601 | 43494 |
| 20150602 | 42988 |
| 20150603 | 20695 |
| 20150604 | 33981 |
| 20150605 | 31555 |
| 20150606 | 27590 |
| 20150607 | 27485 |
| 20150608 | 40365 |
| 20150609 | 40099 |
| 20150610 | 35016 |
| 20150611 | 37374 |
| 20150612 | 16670 |
| 20150613 |  7321 |
| 20150614 |  6166 |
| 20150615 |  9010 |
| 20150616 |  1350 |
| 20150617 |    37 |
| 20150618 |    32 |
| 20150619 |    75 |
| 20150620 |    33 |
| 20150621 |    38 |
| 20150622 |   105 |
| 20150623 |    36 |
| 20150624 |    37 |
| 20150625 |    35 |
| 20150626 |   122 |
| 20150627 |    40 |
| 20150628 |    34 |
| 20150629 |    35 |
| 20150630 |    65 |
| 20150701 |    32 |
| 20150702 |    33 |
| 20150703 |    31 |
| 20150704 |    59 |
| 20150705 |    31 |
| 20150706 |    30 |
| 20150707 |    70 |
| 20150708 |    33 |
| 20150709 |    32 |
| 20150710 |    39 |
| 20150711 |    32 |
| 20150712 |    63 |
| 20150713 |    33 |
| 20150714 |    37 |
| 20150715 |    43 |
| 20150716 |    36 |
| 20150717 |    64 |
| 20150718 |    37 |
| 20150719 |    38 |
| 20150720 |    68 |
| 20150721 |    36 |
| 20150722 |    36 |
| 20150723 |    33 |
| 20150724 |    62 |
| 20150725 |    37 |
| 20150726 |    38 |
| 20150727 |    39 |
| 20150728 |   580 |
| 20150729 |  2005 |
| 20150730 |    66 |
| 20150731 |    35 |
| 20150801 |    40 |
| 20150802 |    37 |
| 20150803 |    66 |
| 20150804 |    37 |
| 20150805 |    36 |
| 20150806 |    34 |
| 20150807 |    34 |
| 20150808 |    66 |
| 20150809 |    33 |
| 20150810 |    37 |
| 20150811 |    35 |
| 20150812 |    94 |
| 20150813 |    61 |
| 20150814 |    35 |
| 20150815 |    34 |
| 20150816 |    44 |
| 20150817 |    35 |
| 20150818 |    37 |
| 20150819 |    33 |
| 20150820 |    32 |
| 20150821 |    64 |
| 20150822 |    30 |
| 20150823 |    35 |
| 20150824 |    37 |
| 20150825 |    36 |
| 20150826 |    35 |
| 20150827 |    37 |
| 20150828 |    35 |
| 20150829 |    39 |
| 20150830 |    32 |
| 20150831 |    65 |
| 20150901 |    43 |
| 20150902 |    40 |
| 20150903 |    44 |
| 20150904 |    79 |
| 20150905 |    43 |
| 20150906 |    44 |
| 20150907 |    44 |
| 20150908 |   159 |
| 20150909 |   114 |
| 20150910 |   596 |
| 20150911 |   454 |
| 20150912 |   454 |
| 20150913 |   473 |
| 20150914 |   454 |
| 20150915 |   456 |
| 20150916 |   382 |
| 20150917 |    34 |
| 20150918 |    38 |
| 20150919 |    34 |
| 20150920 |    34 |
| 20150921 |    62 |
| 20150922 |    37 |
| 20150923 |    39 |
| 20150924 |    46 |
| 20150925 |    43 |
| 20150926 |    44 |
| 20150927 |    45 |
| 20150928 |    48 |
| 20150929 |    39 |
| 20150930 |    42 |
| 20151001 |    41 |
| 20151002 |    46 |
| 20151003 |    43 |
| 20151004 |    45 |
| 20151005 |    42 |
| 20151006 |    43 |
| 20151007 |    45 |
| 20151008 |    44 |
| 20151009 |    41 |
| 20151010 |    42 |
| 20151011 |    48 |
| 20151012 |    43 |
| 20151013 |    41 |
| 20151014 |    39 |
| 20151015 |    46 |
| 20151016 |    40 |
| 20151017 |    46 |
| 20151018 |    46 |
| 20151019 |    40 |
| 20151020 |    42 |
| 20151021 |    42 |
| 20151022 |    43 |
| 20151023 |    41 |
| 20151024 |    42 |
| 20151025 |    42 |
| 20151026 |    38 |
| 20151027 |    43 |
| 20151028 |    44 |
| 20151029 |    43 |
| 20151030 |    42 |
| 20151031 |    39 |
| 20151101 |    41 |
| 20151102 |    42 |
| 20151103 |    41 |
| 20151104 |    44 |
| 20151105 |    40 |
| 20151106 |    43 |
| 20151107 |    41 |
| 20151108 |    39 |
| 20151109 |    43 |
| 20151110 |    41 |
| 20151111 |    44 |
| 20151112 |    40 |
| 20151113 |    46 |
| 20151114 |    41 |
| 20151115 |    42 |
| 20151116 |    42 |
| 20151117 |    43 |
| 20151118 |    41 |
| 20151119 |    42 |
| 20151120 |    37 |
+----------+-------+
mysql:research@analytics-store.eqiad.wmnet [log]> SELECT LEFT(timestamp,8) AS date, COUNT(DISTINCT(clientIP)) AS ips FROM log.MobileWikiAppDailyStats_12637385 GROUP BY date ORDER BY date;
+----------+-----+
| date     | ips |
+----------+-----+
| 20150717 |   1 |
| 20150720 |  53 |
| 20150721 |  29 |
| 20150722 |  29 |
| 20150723 |  29 |
| 20150724 |  58 |
| 20150725 |  29 |
| 20150726 |  29 |
| 20150727 |  29 |
| 20150728 |  29 |
| 20150729 |  30 |
| 20150730 |  58 |
| 20150731 |  29 |
| 20150801 |  29 |
| 20150802 |  29 |
| 20150803 |  53 |
| 20150804 |  29 |
| 20150805 |  29 |
| 20150806 |  29 |
| 20150807 |  29 |
| 20150808 |  58 |
| 20150809 |  29 |
| 20150810 |  29 |
| 20150811 |  29 |
| 20150812 |  87 |
| 20150813 |  59 |
| 20150814 |  29 |
| 20150815 |  30 |
| 20150816 |  29 |
| 20150817 |  30 |
| 20150818 |  29 |
| 20150819 |  29 |
| 20150820 |  29 |
| 20150821 |  58 |
| 20150822 |  29 |
| 20150823 |  29 |
| 20150824 |  29 |
| 20150825 |  29 |
| 20150826 |  29 |
| 20150827 |  29 |
| 20150828 |  29 |
| 20150829 |  29 |
| 20150830 |  29 |
| 20150831 |  57 |
| 20150901 |  37 |
| 20150902 |  37 |
| 20150903 |  37 |
| 20150904 |  74 |
| 20150905 |  37 |
| 20150906 |  37 |
| 20150907 |  37 |
| 20150908 | 135 |
| 20150909 | 105 |
| 20150910 | 470 |
| 20150911 | 439 |
| 20150912 | 440 |
| 20150913 | 442 |
| 20150914 | 442 |
| 20150915 | 431 |
| 20150916 | 377 |
| 20150917 |  29 |
| 20150918 |  29 |
| 20150919 |  29 |
| 20150920 |  29 |
| 20150921 |  58 |
| 20150922 |  29 |
| 20150923 |  29 |
| 20150924 |  37 |
| 20150925 |  37 |
| 20150926 |  37 |
| 20150927 |  37 |
| 20150928 |  37 |
| 20150929 |  37 |
| 20150930 |  37 |
| 20151001 |  37 |
| 20151002 |  37 |
| 20151003 |  37 |
| 20151004 |  37 |
| 20151005 |  37 |
| 20151006 |  37 |
| 20151007 |  37 |
| 20151008 |  37 |
| 20151009 |  37 |
| 20151010 |  37 |
| 20151011 |  37 |
| 20151012 |  37 |
| 20151013 |  37 |
| 20151014 |  37 |
| 20151015 |  37 |
| 20151016 |  38 |
| 20151017 |  37 |
| 20151018 |  38 |
| 20151019 |  37 |
| 20151020 |  37 |
| 20151021 |  38 |
| 20151022 |  37 |
| 20151023 |  37 |
| 20151024 |  37 |
| 20151025 |  37 |
| 20151026 |  37 |
| 20151027 |  37 |
| 20151028 |  37 |
| 20151029 |  37 |
| 20151030 |  37 |
| 20151031 |  37 |
| 20151101 |  37 |
| 20151102 |  37 |
| 20151103 |  37 |
| 20151104 |  37 |
| 20151105 |  37 |
| 20151106 |  37 |
| 20151107 |  37 |
| 20151108 |  37 |
| 20151109 |  37 |
| 20151110 |  37 |
| 20151111 |  37 |
| 20151112 |  37 |
| 20151113 |  37 |
| 20151114 |  37 |
| 20151115 |  37 |
| 20151116 |  37 |
| 20151117 |  37 |
| 20151118 |  37 |
| 20151119 |  38 |
| 20151120 |  31 |
+----------+-----+

Thanks for the detailed investigation, @Tbayer. Querying the WikimediaBlogVisit data was a very clever idea -- I wish I had thought of it :)

I think we should just drop this field and the associated code. I cannot recall a single case of it being used for its intended purpose, which is to catch spam. It was very consciously designed to be unusable for all other purposes.

I think we should just drop this field and the associated code. I cannot recall a single case of it being used for its intended purpose, which is to catch spam. It was very consciously designed to be unusable for all other purposes.

Well actually you used it yourself in blogreport.py (the aforementioned exception), for an estimate of daily uniques ;) But in general I agree; the fact that nobody had noticed for five months that the field was no longer generating valid data (probably) almost elsewhere does not fuel a burning conviction that the field is essential. With QuickSurveys though, it might become more relevant to be able to detect or mitigate "ballot stuffing".

BTW I tested two more tables from two other teams,
MultimediaViewerDuration_10427980 and Search_14361785. They show the same picture currently.

mysql:research@analytics-store.eqiad.wmnet [log]> SELECT COUNT(DISTINCT(clientIP)) FROM Search_14361785 WHERE timestamp LIKE '20151119%';
+---------------------------+
| COUNT(DISTINCT(clientIP)) |
+---------------------------+
|                        37 |
+---------------------------+

mysql:research@analytics-store.eqiad.wmnet [log]> SELECT COUNT(DISTINCT(clientIP)) FROM MultimediaViewerDuration_10427980 WHERE timestamp LIKE '20151119%';
+---------------------------+
| COUNT(DISTINCT(clientIP)) |
+---------------------------+
|                        37 |
+---------------------------+
Nuria subscribed.

Adding @Ottomata as we were talking about related ip chnages recently. I am in favour of removing the field entirely even with the awesome way we have to rotate the salt now.

@Nuria, which field are you referring to? clientIP in EL tables? If so, let's chat about it before removing that field since part of the research we are doing right now for the Reading team relies on that field and we need to make sure at least one alternative works well before removing this field.

@leila: Understood but see comments about this being broken on several tables since 20150616.

@Nuria, which field are you referring to? clientIP in EL tables? If so, let's chat about it before removing that field since part of the research we are doing right now for the Reading team relies on that field and we need to make sure at least one alternative works well before removing this field.

Research that relies on that field is probably bogus because of the issue @Tbayer identified. It looks highly likely at this point that starting from June, the hashed IP was an internal one, rather than the actual client IP.

BTW, we are soon removing ip and x_forwarded_for from the webrequest table, in favor of the X-Client-IP header that will be set on all varnish requests. We should make eventlogging use this header and set clientIp to that.

@Nuria and @ori, agreed. Please keep me in the loop if a discussion happens outside of this thread and T118557. We have an upcoming research in Q3 that relies heavily on the outcome of these discussions.

BTW, we are soon removing ip and x_forwarded_for from the webrequest table, in favor of the X-Client-IP header that will be set on all varnish requests. We should make eventlogging use this header and set clientIp to that.

Why not simply drop it?

@leila, @Ottomata: note that even then not all evenlogging requests go through varnish, only the ones that come from the javascript client, there are 4 clients for EL: IOs, android, php and javascript.

fgiunchedi triaged this task as Medium priority.Dec 7 2015, 2:52 PM
fgiunchedi subscribed.

@ori we need it for the reader research, for matching QuickSurvey responses to the webrequest logs.

@Ottomata, can we look into this ticket?

Yeah! @Nuria, how do we figure out if we should fix this, or drop IPs? It seems some want it dropped, but research needs them.

Should we just fix, or is there more discussion to be had?

@Ottomata:
I think it will be unfair to remove the IP if research is relying on it for near term projects, so let's fix it. salted IPs should be getting purged once we turn purging back on (this is pending the toku db updates) so they should only be held for 90 days max. FYI, that EL does not store IPs in the clear.

Moving item to kanban

Ok, I think to fix this, we just need to change the varnishkafka format string to send the X-Client-IP header in place of the %h host IP header.

madhuvishy renamed this task from EventLogging sees too few distinct client IPs to EventLogging sees too few distinct client IPs {oryx} [8 pts].Jan 11 2016, 6:59 PM
Nuria renamed this task from EventLogging sees too few distinct client IPs {oryx} [8 pts] to EventLogging sees too few distinct client IPs [8].Jan 11 2016, 6:59 PM
madhuvishy renamed this task from EventLogging sees too few distinct client IPs [8] to EventLogging sees too few distinct client IPs {oryx} [8 pts].Jan 11 2016, 7:08 PM

Change 263623 had a related patch set uploaded (by Ottomata):
Use X-Client-IP instead of %h for eventlogging varnishkafka instance

https://gerrit.wikimedia.org/r/263623

Change 263623 merged by Ottomata:
Use X-Client-IP instead of %h for eventlogging varnishkafka instance

https://gerrit.wikimedia.org/r/263623

Will this do XFF resolution or just the immediate client IP? (Vast fix either way, mind!)

Yes, it does XFF. It is the new canonical way of IDing client IPs, and is done in varnish for all requests.

Ok, should be deployed. Can someone verify that new data looks good?

@Ironholds or @Tbayer, can you confirm that clientIPs make more sense now?

@Ottomata I checked couple of tables that I knew and the diversity of hashed IPs looks healthy. I also looked at two of Tilman's examples and they definitely see more hashed IPs. It's hard for me to know if everything is correct, though, unless we look at a specific table, we generate some requests, and see if those are registered with the same hashed IP.
@Tbayer, do you think we are ready to call this task done?