Number of Wikipedia Zero log lines increasing drastically in mid-March 2014
It seems the number of log lines has increased a lot over the
last few weeks, from ~2.5M/day to ~3.3M/day [1].

Is this increase sane?

(Is the increase related to switching on HTTPS for zero?)


qchris@stat1002 0 21:01:43
cwd: ~
for i in /a/squid/archive/zero/zero.tsv.log-201403* ; do echo "$i: $(zcat $i | wc -l)" ; done
/a/squid/archive/zero/zero.tsv.log-20140301.gz: 2572772
/a/squid/archive/zero/zero.tsv.log-20140302.gz: 2550606
/a/squid/archive/zero/zero.tsv.log-20140303.gz: 2687931
/a/squid/archive/zero/zero.tsv.log-20140304.gz: 2749754
/a/squid/archive/zero/zero.tsv.log-20140305.gz: 2669759
/a/squid/archive/zero/zero.tsv.log-20140306.gz: 2733986
/a/squid/archive/zero/zero.tsv.log-20140307.gz: 2680985
/a/squid/archive/zero/zero.tsv.log-20140308.gz: 2517903
/a/squid/archive/zero/zero.tsv.log-20140309.gz: 2577466
/a/squid/archive/zero/zero.tsv.log-20140310.gz: 2845407
/a/squid/archive/zero/zero.tsv.log-20140311.gz: 2945301
/a/squid/archive/zero/zero.tsv.log-20140312.gz: 3010404
/a/squid/archive/zero/zero.tsv.log-20140313.gz: 2871820
/a/squid/archive/zero/zero.tsv.log-20140314.gz: 2880289
/a/squid/archive/zero/zero.tsv.log-20140315.gz: 2744255
/a/squid/archive/zero/zero.tsv.log-20140316.gz: 2771308
/a/squid/archive/zero/zero.tsv.log-20140317.gz: 2958194
/a/squid/archive/zero/zero.tsv.log-20140318.gz: 3192418
/a/squid/archive/zero/zero.tsv.log-20140319.gz: 3352401
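
To put the growth into numbers, the per-file counts above can be piped through a small filter. This is only a sketch: `relative_change` is a hypothetical helper name, and it assumes input lines of the form `<path>: <count>` exactly as printed by the loop above, with the first line used as the baseline.

```shell
# Hypothetical helper: print each file, its line count, and the
# percentage change relative to the first line of input.
# Expects lines shaped like "<path>: <count>".
relative_change() {
  awk -F': ' '
    NR == 1 { base = $2 }   # first day is the baseline
    { printf "%s %d %+.1f%%\n", $1, $2, ($2 - base) / base * 100 }
  '
}
```

Appending `| relative_change` to the `for` loop above would show each day relative to 2014-03-01.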

(In reply to Toby Negrin from comment #2)

Hi Dan -- can you please triage?



Sure, I'll investigate.

  • Dan

Dan asked me to review. I'll examine this over the next few days: I need tomorrow to think about it, then probably Monday to analyze and Tuesday to do a second pass.

Since numbers reported by our monitoring went to >5M today, I had a
quick look, just to make sure our infrastructure is not badly broken.

Lines for SSL requests have skyrocketed since yesterday.
Lines for carrier 470-01 have skyrocketed since yesterday.

So to me it currently does not look like a problem with the analytics

I don't know if this may be related: bug 62980

(In reply to Jesús Martínez Novo (Ciencia Al Poder) from comment #6)

I don't know if this may be related: bug 62980

Thanks for the pointer!

It's a bit subtle, but the difference between bug 62980 and this bug is
between plain number of log lines (this bug) and which of those log lines
get counted as page views (bug 62980).

So to me, they are separate things.

What's the current thinking here? Has there been any more investigation?

(In reply to Greg Grossmeier from comment #8)

What's the current thinking here? Has there been any more investigation?

We've played whack-a-mole, and will probably need to keep doing so, until the root cause is addressed with the operator.

Is someone still looking into this?

Can we identify which partner (X-CS) is responsible for the increase at that time? I can look into more details once I have that information.
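
One way to approach that question would be to count log lines per carrier code for the days around the jump. The sketch below is an assumption-heavy illustration: `count_by_carrier` is a hypothetical helper, the logs are assumed to be tab-separated, and the column holding the X-CS value is not stated anywhere in this ticket, so it has to be passed in by whoever knows the log layout.

```shell
# Hypothetical helper: count lines per carrier code in gzipped TSV logs.
# $1   = 1-based column that holds the X-CS value (an assumption; the
#        actual position depends on the log format)
# rest = one or more .gz log files
count_by_carrier() {
  col="$1"
  shift
  gzip -dc "$@" |
    awk -F'\t' -v col="$col" '
      { n[$col]++ }
      END { for (c in n) print n[c] "\t" c }
    ' |
    sort -rn   # busiest carriers first
}
```

Running it once per day (for example on the 20140317 and 20140319 files) and comparing the top entries would show which carrier accounts for the extra lines.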

Sorry, this ticket is quite old and refers to infrastructure we no longer use to count pageviews for Zero. Closing.