Add 'Pages Improved' data to downloadable csv
Closed, Resolved | Public | 3 Estimated Story Points


The Pages Improved downloadable reports give details on all Main space pages edited during an event. This report in csv (spreadsheet) format will provide event organizers with data that they can sort and recombine in order to create documents or other reports for partners, grantors, bosses, etc.

  • Metric definitions: See below, under "Definitions of Metrics"
  • Event details: In addition to the data/metrics, the CSV file will contain some descriptive information about the event and the report. See below under "Event details."
  • Report filename: When the user saves the report, the filename should follow this format: pages-improved_event-name

Report Content

Metrics/column names

  • The left-most column of the report will be a list of page titles.
  • The metrics in each row will be presented/calculated as they apply to the particular page listed in the left column. E.g., "Bytes changed" means bytes changed for that article (as opposed to the same figure on the Event Summary reports, where it totals the bytes changed for the whole event).
  • The default sort order will be by "Avg. daily pageviews", descending.
  • The column headings will be, in order (please use this approved wording):
  • Title
  • URL
  • Wiki
  • Edits during event [method defined in T206821]
  • Bytes changed during event [method defined in T206821]
  • Avg. daily pageviews [method defined in T206817]
  • Incoming links [method defined in T214219]
  • More page metrics [see below for formatting info]
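
As a rough illustration of the column order and default sort described above, here is a minimal Python sketch using only the standard library. The helper names (`build_csv`, `pageviews_sort_key`, `make_row`) and the sample rows are hypothetical, and placing "n/a" rows last is an assumption; the spec does not say where they should sort.

```python
import csv
import io

# Column headings in the approved order and wording from the list above.
COLUMNS = [
    "Title",
    "URL",
    "Wiki",
    "Edits during event",
    "Bytes changed during event",
    "Avg. daily pageviews",
    "Incoming links",
    "More page metrics",
]


def pageviews_sort_key(row):
    """Sort key for the default order; non-numeric values such as "n/a" sort last (assumption)."""
    value = row["Avg. daily pageviews"]
    return value if isinstance(value, (int, float)) else -1


def build_csv(rows):
    """Return CSV text with one row per page, sorted by pageviews, descending."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=pageviews_sort_key, reverse=True):
        writer.writerow(row)
    return buffer.getvalue()


def make_row(title, pageviews):
    """Build a placeholder row; every value here is made up for illustration."""
    row = dict.fromkeys(COLUMNS, "")
    row["Title"] = title
    row["Avg. daily pageviews"] = pageviews
    return row


sample_rows = [make_row("Page A", 15), make_row("Page B", 240), make_row("Page C", "n/a")]
report = build_csv(sample_rows)
print(report)
```

Keeping the headings in a single list means the header row and the data rows cannot drift out of order.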

Event details

At the bottom of the csv report, please list the following in the left column below a label that says: Event details

  • |Pages improved:|Eventname|
  • |Timezone:|Timezonecountry/City|
  • |Start date:|yyyy-mm-dd hh:mm|
  • |End date:|yyyy-mm-dd hh:mm|
  • |Last updated:|yyyy-mm-dd hh:mm|
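
The footer could be appended along these lines. This is only a sketch: the function names are hypothetical, the blank separator row before the label is an assumption, and the event values in the usage lines are invented.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"  # yyyy-mm-dd hh:mm, as specified above


def event_details_rows(event_name, timezone, start, end, last_updated):
    """Build the "Event details" rows: a label row, then label/value pairs."""
    return [
        ["Event details"],
        ["Pages improved:", event_name],
        ["Timezone:", timezone],
        ["Start date:", start.strftime(TIMESTAMP_FORMAT)],
        ["End date:", end.strftime(TIMESTAMP_FORMAT)],
        ["Last updated:", last_updated.strftime(TIMESTAMP_FORMAT)],
    ]


def append_event_details(csv_text, detail_rows):
    """Append the detail rows to existing CSV text."""
    buffer = io.StringIO()
    buffer.write(csv_text)
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([])  # blank separator row (assumption, not in the spec)
    writer.writerows(detail_rows)
    return buffer.getvalue()


# Invented example values.
details = event_details_rows(
    "Example event",
    "America/New_York",
    datetime(2019, 3, 1, 9, 0),
    datetime(2019, 3, 1, 17, 30),
    datetime(2019, 3, 2, 8, 5),
)
footer_demo = append_event_details("Title,URL\n", details)
print(footer_demo)
```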

Metric definitions & formatting

  • Title: page title of each Main space Page Improved (consistent with all active filters). This is an article-by-article listing of the pages counted in the Pages Improved metric from T205561.
  • URL of the page above.
  • Wiki where the article exists. Limited to the short list of wikis defined on the Event Setup screen for the event. For space reasons, label all Wikipedias only using the language name: "Spanish," "French," etc. (i.e., omit "Wikipedia"). List "Commons" and "Wikidata" as such.
  • Edits during event: The edit count to the article during the event period.
  • Bytes changed during event: The net bytes changed to the page during the event period. If the Bytes Changed is a negative number, please include a - (but don't use a + for positive numbers, since it screws up alignment in CSVs).
  • Avg. daily pageviews is an average over the preceding 30 days. If 30 days are not available, use the average of however many days are available. If a count is not available for some reason (e.g., during the first day of a page's existence), show "n/a", for "not available" rather than 0.
  • Incoming links: A count of links to the article (cumulative, i.e., since creation).
  • More page metrics: Provides a URL that links users to the XTools "Page History" page for that article. In CSV reports, just list the URL. In Wikitext reports, combine the URL with the word "more" to form links.
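
Two of the formatting rules above (the sign on "Bytes changed during event" and the fallback for "Avg. daily pageviews") can be sketched as small helpers. The function names are hypothetical, and rounding the average to a whole number is an assumption; the definitions above do not specify rounding.

```python
def format_bytes_changed(net_bytes):
    """Negative values keep their "-"; positive values get no "+"."""
    return str(net_bytes)


def format_avg_daily_pageviews(daily_counts, window=30):
    """Average the most recent `window` daily counts.

    If fewer than `window` days are available, average whatever exists.
    If no counts exist at all, return "n/a" rather than 0.
    """
    recent = daily_counts[-window:]
    if not recent:
        return "n/a"
    # Rounding to a whole number is an assumption for this sketch.
    return str(round(sum(recent) / len(recent)))


print(format_bytes_changed(-120))                 # negative keeps its sign
print(format_bytes_changed(350))                  # no "+" prefix
print(format_avg_daily_pageviews([10, 20, 30]))   # only 3 days available
print(format_avg_daily_pageviews([]))             # no data yet
```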

Data that are fixed at event close vs. data that continue to develop

Figures like Pageviews naturally continue to develop after the event is over. Other figures can be considered fixed once the event period is over; these could be stored and need never be calculated again. Here is a breakdown for this report:

Remain fixed

  • Wiki
  • Edits during event
  • Bytes changed during event

Continue to develop

  • Title
  • URL
  • Avg. daily pageviews
  • Incoming links
  • More page metrics [URL]
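
One way to act on this split, once an event has closed, is to reuse the stored values for the fixed columns and recompute only the rest. The sketch below assumes a hypothetical `fetch_live_values` callable that returns current values for the developing columns; nothing here reflects how the tool actually stores data.

```python
FIXED_COLUMNS = {"Wiki", "Edits during event", "Bytes changed during event"}
LIVE_COLUMNS = {
    "Title",
    "URL",
    "Avg. daily pageviews",
    "Incoming links",
    "More page metrics",
}


def refresh_row(stored_row, fetch_live_values):
    """Return a row that reuses stored fixed values and refetches the rest.

    `fetch_live_values` is a hypothetical callable: it takes the stored row
    and returns a dict with a current value for every column in LIVE_COLUMNS.
    """
    row = {name: stored_row[name] for name in FIXED_COLUMNS}
    live = fetch_live_values(stored_row)
    row.update({name: live[name] for name in LIVE_COLUMNS})
    return row
```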

Event Timeline

jmatazzoni created this task.
jmatazzoni removed the point value for this task.
jmatazzoni updated the task description.
jmatazzoni renamed this task from Implement 'Pages Improved' downloadable csv, version 1 to Implement 'Pages Improved' downloadable csv. Nov 29 2018, 11:15 PM
jmatazzoni updated the task description.

@Mooeypoo, have a look at this ticket, which defines the "Articles Improved" report. As I'd hoped, pretty much every metric here is also in the Articles Created report.

The only exception is Editors, which is similar to T208546, except that it lists the individual editors. I will write a "Method" ticket for that.

jmatazzoni renamed this task from Implement 'Pages Improved' downloadable csv to Add 'Pages Improved' data to downloadable csv. Dec 1 2018, 12:06 AM
jmatazzoni updated the task description.
jmatazzoni renamed this task from Add 'Pages Improved' data to downloadable csv to Add 'Pages Improved' data to downloadable csv (release II). Jan 15 2019, 11:39 PM
jmatazzoni renamed this task from Add 'Pages Improved' data to downloadable csv (release II) to Add 'Pages Improved' data to downloadable csv. Jan 30 2019, 10:47 PM
jmatazzoni updated the task description.

Desirable but not for MVP / Deprecated Metrics

This is a list of metrics we wanted for this report but that were judged out of scope for the first MVP.

  • Bytes changed subsequently
  • # of words changed subsequently
  • Still exists? [T206695]
  • Words added during event [T206690]
  • Words removed during event [T206690]
  • Net % change in words during event [T206690]
  • Article class (where available)

@MaxSem, I don't see this in the Download menu. Is there a url parameter to test?

@dom_walden, I proofed the csv report against the spec and can confirm that the language and formatting are as specified. That leaves you to concentrate on checking the accuracy.

I also checked to make sure that the number of Pages Improved and the number of Wikidata items reported here are consistent with the numbers reported in the Summary report. (We've decided to include the Wikidata items in this report, since we dropped the originally planned Wikidata report.)

@MaxSem I did spot some errors in how special characters in page titles are displayed. See below. Is this something that can be fixed easily? (The links for all these pages work fine, so no worries there.)

  • Juan Guaidó
  • Leopoldo López
  • Plan País
  • Humberto Calderón Berti

There should be an option in Open Office to choose an encoding when importing CSVs (choose UTF-8). If there isn't, I don't think we can do much. For comparison, in Numbers it works by default.
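
To illustrate why the import encoding matters, the sketch below decodes the UTF-8 bytes of one of the titles above in two ways. Latin-1 is used purely as an example of a wrong choice; the thread does not say which encoding Open Office actually assumed.

```python
title = "Juan Guaidó"
raw = title.encode("utf-8")  # the bytes a UTF-8 CSV file would contain

# Decoding with the wrong encoding garbles the accented character.
garbled = raw.decode("latin-1")

# Decoding as UTF-8 recovers the original title.
correct = raw.decode("utf-8")

print(garbled)  # Juan GuaidÃ³
print(correct)  # Juan Guaidó
```

The file itself is unchanged in both cases; only the interpretation of its bytes differs, which is why the fix belongs in the import dialog rather than in the report.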

@MaxSem I think the xtools links should not convert "/" to "%2F" (in the title of a page).

For example, program "dwalden Category filter" event "Cross-wiki" has 3 en.wikivoyage entries. Their xtools links take you to a 404 page.

This is the case in the wikitext and csv report.
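
For reference, this is how the two encodings of a subpage title differ when built with Python's `urllib.parse.quote`. The title is hypothetical, and whether the report builds its links this way is not stated in this ticket; the sketch only shows the difference between encoding and preserving the slash.

```python
from urllib.parse import quote

title = "Travel topics/Example subpage"  # hypothetical subpage title
page = title.replace(" ", "_")           # MediaWiki-style underscores

# With no safe characters, "/" is percent-encoded.
encodes_slash = quote(page, safe="")

# Marking "/" as safe leaves it literal in the path.
keeps_slash = quote(page, safe="/")

print(encodes_slash)  # Travel_topics%2FExample_subpage
print(keeps_slash)    # Travel_topics/Example_subpage
```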

Also, the csv and wikitext reports have a tab/spaces on each of the first lines. I don't know if this will cause problems. LibreOffice does not seem to care, but it did cause me problems with Python pandas: when I compared the titles in the csv report with the titles I got from the database, they were not recognised as equal.
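
A comparison like the one described can be made robust to stray leading whitespace by stripping each title before comparing. This is a minimal sketch using the standard `csv` module rather than pandas; the sample text and the `titles_from_csv` helper are hypothetical.

```python
import csv
import io


def titles_from_csv(text):
    """Return the set of titles from the first column, with whitespace stripped."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[0].strip() for row in reader if row}


# Hypothetical report text with a leading tab or space on some lines.
csv_text = "\tTitle,Wiki\n\tJuan Guaidó,Spanish\n Plan País,Spanish\n"
database_titles = {"Juan Guaidó", "Plan País"}

print("\tJuan Guaidó" == "Juan Guaidó")              # False: raw values differ
print(titles_from_csv(csv_text) == database_titles)  # True after stripping
```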

@MaxSem I think the xtools links should not convert "/" to "%2F" (in the title of a page).

Raised as T219779.

Thanks Dom. I've moved T219779 to the Sprint, so, @MaxSem, it does not need to be addressed in this ticket.

I ran the Pages Improved report for a number of different events across different wikis, including Commons and Wikidata, and for events with more than one wiki that were filtered by Categories, by Participants, and by both Categories and Participants.

I compared the output to data I had collected myself. The pages/items included in the report matched, as well as the metrics "Edits during event", "Bytes changed during event" and "Incoming links". There were minor discrepancies in "Avg. daily pageviews" (i.e. +/-1) for a small number of pages, probably due to issues outlined in T217704.
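
A check of this kind can be sketched as a per-page comparison with an allowed tolerance. The function name and the sample figures are invented for illustration, and the sketch assumes both sources contain the same set of titles.

```python
def pages_out_of_tolerance(report, reference, tolerance=0):
    """Return titles whose report value differs from the reference by more than `tolerance`.

    Both arguments map page title -> numeric metric value and are assumed
    to contain the same titles.
    """
    return sorted(
        title
        for title in report
        if abs(report[title] - reference[title]) > tolerance
    )


# Invented figures for illustration only.
report_pageviews = {"Page A": 10, "Page B": 21, "Page C": 5}
reference_pageviews = {"Page A": 10, "Page B": 20, "Page C": 9}

print(pages_out_of_tolerance(report_pageviews, reference_pageviews))               # exact match required
print(pages_out_of_tolerance(report_pageviews, reference_pageviews, tolerance=1))  # allow +/-1
```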

jmatazzoni moved this task from Product sign-off to Done on the Community-Tech-Sprint board.

Thanks for testing this thoroughly @dom_walden. It's a good feeling knowing you've been through it. Resolving.