
Negative total number of bytes for German Wikipedia in 2001?
Open, Needs Triage · Public

Description

Why does https://stats.wikimedia.org/v2/#/de.wikipedia.org/content/net-bytes-difference/normal|table|All|page_type~content*non-content show that the content namespaces of the German Wikipedia total -256597 bytes (= 0 + 0 + 488 + 7286 + (-1884) + 151740 + 112580 + 33938 + 161258 + 65165 + (-9106) + (-778062)) as of 2001-12-01? Or am I misunderstanding something?
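For reference, the arithmetic above can be checked with a short sketch. It assumes the twelve values are the monthly net-bytes differences for 2001 in chronological order, as read from the table view; the variable names are illustrative only.

```python
from itertools import accumulate

# Net-bytes differences copied from the task description.
# Assumption: one value per month of 2001, in chronological order.
monthly_net_bytes = [
    0, 0, 488, 7286, -1884, 151740,
    112580, 33938, 161258, 65165, -9106, -778062,
]

# Running total after each month; the last entry is the overall sum.
running_total = list(accumulate(monthly_net_bytes))
total = running_total[-1]

print(running_total)
print(total)
```

The running total stays positive until the last value, so the single large negative entry is what pushes the sum below zero.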

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 9 2018, 10:16 AM
Cirdan renamed this task from Wikistats Bug to Negative total number of bytes for German Wikipedia in 2001?. · Sep 9 2018, 3:39 PM
Cirdan updated the task description.
Nuria added a subscriber: Nuria. · Sep 10 2018, 4:01 PM

Negative bytes are due to the deletion of revisions or pages. Is that your question?

Nuria moved this task from Incoming to Radar on the Analytics board. · Sep 10 2018, 4:28 PM

My understanding of the question is: how is it possible that the cumulative sum of net bytes since the beginning of de.wikipedia.org is negative?

My two cents in that respect: we have data going back to 2001, but we know it is incomplete. First, not all of the very old data (before 2005) is present, and second, the oldest data seems to actually be missing.
An example: this page's historical edits from 2001 only remove bytes. However, the previous versions of the content are missing from the DB.
This is my understanding of why the total number of bytes after one year can be negative.
We have a plan to try to reconcile very old sources of data (for instance that one), but it's not a priority as of now.
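To illustrate the effect described above, here is a minimal sketch with made-up numbers. It assumes the net-bytes metric is the sum of per-revision byte differences and that a revision missing from the DB simply contributes nothing; the revision sizes and the `byte_diffs` helper are hypothetical and not taken from the actual Wikistats pipeline.

```python
# Hypothetical page: created at 5000 bytes, then trimmed twice.
revision_sizes = [5000, 3000, 2000]


def byte_diffs(sizes):
    """Per-revision byte change, treating the page as empty before its first revision."""
    previous = [0] + sizes[:-1]
    return [size - prev for size, prev in zip(sizes, previous)]


# With the full history, the diffs add up to the current page size.
full_history = byte_diffs(revision_sizes)

# Assumption: the creation revision is missing from the DB,
# so only the two later (byte-removing) edits are counted.
surviving = full_history[1:]

print(sum(full_history))
print(sum(surviving))
```

With the creation edit present the page nets out to its current size, while dropping that one edit leaves only removals, so the sum goes negative even though the page never had a negative size.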

You understood my question correctly. I wanted to make a graph of the data in the de.wikipedia.org content namespaces and was very confused. Do you know of a metric that shows the total number of bytes?

The closest thing we have to a total number of bytes is the metric you checked. As explained previously, the negative values are due to incomplete historical data, and the effect should be negligible after 2005. For earlier periods, there is no easy solution as of now.