Add 'Pages Created' data to downloadable Wikitext report
Closed, Resolved · Public · 2 Estimated Story Points

Description

Pages Created reports give details on all articles created during an event. This downloadable Wikitext version provides event organizers with a quick way to share information by posting it on wiki.

  • Metric definitions: This report presents a subset of the metrics defined in T206058, which describes the CSV version of this report. All metrics here use the same definitions.

Report Content

Metrics/layout

  • The left-most column of the report will be a list of (linked) page titles.
  • The metrics in each row will be presented/calculated as they apply to the particular article listed in the left column. E.g., "Bytes changed" means bytes changed for that article (as opposed to the same figure on the Event Summary reports, where it gives the total for the whole event).
  • Metrics formatted as links: Unlike the CSV version of this report, this Wikitext version presents certain fields as links:
    • "Title": combine page title with page URL to form links
    • "Creator": combine the username with the URL of the user page, on the same wiki as the created page.
    • "More page metrics": combine the URL with the word "more"
  • Sortable columns: Table columns will be sortable (see mockup).
  • Default sort: The default table sort will be by Avg. daily pageviews, descending.
  • TABLE MOCKUP: For wikitext styling, layout, etc., follow the link to a sample table (screenshot below).

[Screenshot of the table mockup: Screen Shot 2018-10-11 at 11.22.17 AM.png]

The column headings will be, in order (please use this approved wording):

  • Title [present as link, see above]
  • Creator [present as link, see above]
  • Wiki
  • Edits during event [method defined in T206821]
  • Bytes changed during event [method defined in T206820]
  • Pageviews, cumulative [method defined in T206817]
  • Avg. daily pageviews [method defined in T206817]
  • Incoming links [method defined in T214219]
  • More page metrics [present as link, see above]
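A minimal sketch of how the layout above could be emitted as wikitext, assuming a hypothetical `CreatedPage` record per row (the field names are illustrative, not Event Metrics' own). It uses the standard MediaWiki `wikitable sortable` classes so every column is sortable, external-link syntax `[URL text]` for the three linked fields, and orders rows by Avg. daily pageviews, descending, as the default sort.

```python
from dataclasses import dataclass


@dataclass
class CreatedPage:
    """One report row (hypothetical container; field names are illustrative)."""
    title: str
    page_url: str
    creator: str
    creator_url: str
    wiki: str
    edits: int
    bytes_changed: int
    pageviews: int
    avg_daily_pageviews: int
    incoming_links: int
    metrics_url: str


# Column headings in the approved order.
HEADERS = [
    "Title", "Creator", "Wiki", "Edits during event",
    "Bytes changed during event", "Pageviews, cumulative",
    "Avg. daily pageviews", "Incoming links", "More page metrics",
]


def build_wikitext_table(pages: list[CreatedPage]) -> str:
    # Default order: Avg. daily pageviews, descending (readers can still re-sort any column).
    ordered = sorted(pages, key=lambda p: p.avg_daily_pageviews, reverse=True)
    lines = ['{| class="wikitable sortable"', "! " + " !! ".join(HEADERS)]
    for p in ordered:
        cells = [
            f"[{p.page_url} {p.title}]",       # title linked to the page URL
            f"[{p.creator_url} {p.creator}]",  # username linked to the user page
            p.wiki,
            str(p.edits),
            str(p.bytes_changed),
            str(p.pageviews),
            str(p.avg_daily_pageviews),
            str(p.incoming_links),
            f"[{p.metrics_url} more]",         # the word "more" linked to the page-metrics URL
        ]
        lines.append("|-")
        lines.append("| " + " || ".join(cells))
    lines.append("|}")
    return "\n".join(lines)
```

Page URLs should use underscores rather than spaces, since a space ends the URL part of an external link.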

Descriptive info

At the top of the table, provide the following information. See mockup for design and layout.

  • Pages Created * Event name
  • Last updated yyyy-mm-dd hh:mm (timezone, as country/city) [Note: this is the time of the last Update, not the download, and the timezone of the event, not the user]
  • About these figures [links to help doc defined in T210896]
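The "Last updated" note is easy to get wrong, since it must use the time of the last Update and the event's timezone rather than the viewer's. A small sketch, assuming the Update time is stored as a timezone-aware UTC datetime and using a fixed UTC offset for the event (a real implementation would resolve the event's named zone so daylight saving time is handled). The function name and example time are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def last_updated_line(updated_at: datetime, event_utc_offset: timedelta, event_tz_name: str) -> str:
    """Format the time of the last Update in the event's timezone, not the viewer's.

    updated_at must be timezone-aware. A fixed offset is assumed for simplicity.
    """
    local = updated_at.astimezone(timezone(event_utc_offset))
    return f"Last updated {local:%Y-%m-%d %H:%M} ({event_tz_name})"


# Hypothetical: an Update run at 20:15 UTC on 2019-03-03 for an event in Pacific time (UTC-8 that day).
print(last_updated_line(datetime(2019, 3, 3, 20, 15, tzinfo=timezone.utc),
                        timedelta(hours=-8), "America/Los_Angeles"))
# prints: Last updated 2019-03-03 12:15 (America/Los_Angeles)
```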

Event Timeline

MusikAnimal moved this task from Ready to In Development on the Community-Tech-Sprint board.

Just as with the Event Summary report, it's easier to start with the wikitext version than the CSV, if only so that I don't have to keep opening spreadsheet software. So I'm taking this on before T206058.

Preliminary PR at https://github.com/wikimedia/eventmetrics/pull/185. I've requested review on GitHub but am leaving this task under "In development" as I'm still working on more PRs.

Avg. daily pageviews [method defined in T206817]

@jmatazzoni The existing method used in the Event Summary report gets the average pageviews over the past 30 days. I'm assuming the same should apply for the Pages Created report; however, in this case it's easier to get the overall average (cumulative pageviews / number of days since page creation). Would that be fine?

Actually let me get back to you on that... But if by chance you did want an overall average (and not the past 30 days), let me know.

A 30-day average is more desirable, since 1) in the long term it will give a better picture of the current rate at which pageviews accumulate, and 2) the figure will be comparable to the one for Pages Improved. If this is a problem we can discuss alternatives.
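To illustrate the difference being discussed, here is a sketch comparing the two candidate definitions on a hypothetical page that got a burst of views when it was created and few afterwards. The helper names are hypothetical, and days absent from the data are treated as zero views. The lifetime average stays inflated by the creation burst, while the 30-day figure reflects the current rate.

```python
from datetime import date, timedelta


def lifetime_average(daily_views: dict[date, int], created: date, today: date) -> float:
    """Cumulative pageviews divided by the number of days the page has existed (inclusive)."""
    days = (today - created).days + 1
    return sum(v for d, v in daily_views.items() if created <= d <= today) / days


def trailing_average(daily_views: dict[date, int], today: date, window: int = 30) -> float:
    """Pageviews over the last `window` days (today included), divided by `window`.

    Days with no entry in daily_views count as zero views.
    """
    start = today - timedelta(days=window - 1)
    return sum(v for d, v in daily_views.items() if start <= d <= today) / window


# Hypothetical page: 600 views on creation day, then only 30 views on the last day.
views = {date(2019, 1, 1): 600, date(2019, 3, 1): 30}
print(lifetime_average(views, date(2019, 1, 1), date(2019, 3, 1)))  # 10.5
print(trailing_average(views, date(2019, 3, 1)))                     # 1.0
```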

Ready for QA! The report is accessible at /programs/:programId/events/:eventId/pages-created, e.g. https://eventmetrics-dev.wmflabs.org/programs/81/events/141/pages-created

@MusikAnimal When I go to:
https://eventmetrics-dev.wmflabs.org/programs/46/events/65/pages-created

500: Internal Server Error
The server said: Unable to determine database name for domain '*.wikipedia'.

Looked at the Pages Created report for a few different events, including events with multiple wikis.

Default sort: The default table sort will be by Avg. daily pageviews, descending.

There is no default sort. All columns are sortable.

Title [present as link, see above]
Creator [present as link, see above]

Checked a few; they were correct, including the correct project.

Edits during event [method defined in T206821]

Page creation is counted as an edit. For example, this event only contains one revision, which is a page creation. Number of edits reported by pages-created = 1 (https://eventmetrics-dev.wmflabs.org/programs/118/events/262/pages-created).

Bytes changed during event [method defined in T206820]

I only checked this for cases where it was easy to determine, such as when the only edit to a page during an event was a page creation.

Pageviews, cumulative [method defined in T206817]

Accurate when compared with https://tools.wmflabs.org/pageviews

Avg. daily pageviews [method defined in T206817]

Accurate when compared with https://tools.wmflabs.org/pageviews.

Incoming links [method defined in T214219]

Accurate if we only want to list pages which are in namespace 0. This would be consistent with similar metrics (e.g. files in use).

However, I believe we are counting pages which redirect to the created page as incoming links. Do we want this?

For example, this report shows 5 Incoming links for "First Lady Michelle Obama (painting)". XTools shows 8 (although it includes other namespaces as well). Here is the corresponding "Pages that link to" list, showing that most of the incoming links are redirects: https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/First%20Lady%20Michelle%20Obama%20(painting)&hidetrans=1

More page metrics [present as link, see above]

URL appears to take you to the correct page information, including the correct project.

If an event created a page with the same title in different projects (e.g. https://de.wikipedia.org/wiki/Michelle_Obama and https://es.wikipedia.org/wiki/Michelle_Obama) it appeared to calculate the stats for the two pages correctly as well and the links in the tables were all correct.

Redirect pages (created when you move a page) are also not counted as created pages. I think this is correct (and is consistent with T216557).

Thank you for the thorough review!! :) I've got a follow-up PR at https://github.com/wikimedia/eventmetrics/pull/197

There is no default sort. All columns are sortable.

Good catch. This is fixed with the new PR.

Page creation is counted as an edit. For example, this event only contains one revision, which is a page creation. Number of edits reported by pages-created = 1 (https://eventmetrics-dev.wmflabs.org/programs/118/events/262/pages-created).

This I hope is the intended behaviour. Page creation requires an edit, after all. Pinging @jmatazzoni to be sure.

I believe we are counting pages which redirect to the created page as incoming links. Do we want this?

I am safely assuming not. Easy fix, included in the new PR.
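For spot-checking the corrected figure, the same filter (namespace 0, redirects excluded) can be expressed as a MediaWiki Action API `list=backlinks` query, which mirrors Special:WhatLinksHere with redirects hidden. This is only a QA aid under that assumption; Event Metrics computes the metric its own way (see T214219), and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def backlinks_query_url(wiki_domain: str, title: str) -> str:
    """Build an Action API request listing article-namespace, non-redirect backlinks."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "blnamespace": 0,                 # main/article namespace only
        "blfilterredir": "nonredirects",  # drop redirects that point at the page
        "bllimit": "max",
        "format": "json",
    }
    return f"https://{wiki_domain}/w/api.php?{urlencode(params)}"


print(backlinks_query_url("en.wikipedia.org", "First Lady Michelle Obama (painting)"))
```

Responses are paginated, so counting all backlinks means following the `continue` values returned with each batch.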

I ran a report for my Reverse Engineered Event, which is designed to have one of everything (@MusikAnimal, I made you an Organizer in case you want to use for testing). Here is the report.

It looks great! I do see a couple of things: Edits during event and Bytes changed during event are both 0 for all pages. I think that is wrong. E.g., here is the history for the article Gustavo Baquero. The event time period is 9 am to 12 pm PT on 2/5. During that time there were a half-dozen saves, all by the original creator. Same story for this article, which has one save after the creation, also by the creator.

Something is clearly awry. Looking into this now.

Silly developer error. The timestamps were not in UTC as they should be when querying the database. Thanks for catching this!

PR: https://github.com/wikimedia/eventmetrics/pull/201
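A minimal reproduction of this class of bug, assuming the event window is given in Pacific time (a fixed UTC-8 offset, correct for early February) and that revision timestamps are stored in MediaWiki's 14-digit `YYYYMMDDHHMMSS` UTC format. The revision timestamps and helper names are hypothetical. Formatting the local wall-clock times without converting to UTC shifts the window eight hours early, so edits made during the event fall outside it and every page reports 0.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Standard Time, UTC-8 (no daylight saving time in early February).
PST = timezone(timedelta(hours=-8))


def to_mw_timestamp(dt: datetime) -> str:
    """Convert an aware datetime to a 14-digit UTC timestamp (YYYYMMDDHHMMSS)."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def count_edits(rev_timestamps: list[str], start: str, end: str) -> int:
    """Count revisions in [start, end]; fixed-width digit strings compare chronologically."""
    return sum(1 for ts in rev_timestamps if start <= ts <= end)


event_start = datetime(2019, 2, 5, 9, 0, tzinfo=PST)
event_end = datetime(2019, 2, 5, 12, 0, tzinfo=PST)

# Hypothetical revisions saved at 9:30, 10:00 and 11:30 PT, stored in UTC.
revisions = ["20190205173000", "20190205180000", "20190205193000"]

# Correct: convert the event window to UTC before comparing.
correct = count_edits(revisions, to_mw_timestamp(event_start), to_mw_timestamp(event_end))

# Buggy: local wall-clock time formatted as if it were UTC.
buggy = count_edits(revisions,
                    event_start.strftime("%Y%m%d%H%M%S"),
                    event_end.strftime("%Y%m%d%H%M%S"))

print(correct, buggy)  # 3 0
```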

I ran a report for my Reverse Engineered Event...Edits during event and Bytes changed during event are both 0 for all pages...

The report now shows numbers for Edits during event and Bytes changed during event.

There is no default sort. All columns are sortable.

Good catch. This is fixed with the new PR.

Rows appear to be sorted.

I believe we are counting pages which redirect to the created page as incoming links. Do we want this?

I am safely assuming not. Easy fix, included in the new PR.

Re-ran this event and the figures I got in the first row were the same as the non-redirect, namespace 0 pages here: https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/First%20Lady%20Michelle%20Obama%20(painting)&hidetrans=1

@MusikAnimal @dom_walden I'm noticing a discrepancy in Avg. daily pageviews metrics between our calculation and Xtools. Here are some examples.

Article, Gustavo Baquero, for 2/5 - 3/3

Q61506256 for 2/5 - 3/3

This last one gives a pretty good clue as to what is happening. If you look at the Xtools graph, pageviews occurred in only 3 of the last 30 days. The grand total of 14 pvs / 3 = 4, kind of. So it appears we may not be averaging over the entire period but only for days where we have reports.

I think you're right. Event Metrics uses an API call like https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/wikidata/all-access/user/Q61506256/daily/20190205/20190303 to get the page view data. As far as I can see, getPageviewsPerArticle() assumes that this will return one item per day and does its calculations based on this, but this is not always the case (as in your example).

Good catch of a very subtle bug.
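A sketch of the fix under discussion, assuming the response items carry the daily `timestamp` (`YYYYMMDD00`) and `views` fields returned by the per-article endpoint. Because days with no views are simply absent, the divisor should be the number of days in the chosen period, not the number of items. The helper names and the per-day split are hypothetical; only the totals (14 views on 3 days) come from the example above. `start` and `end` stand for whichever averaging period is settled on.

```python
from datetime import date, datetime


def average_daily_views(items: list[dict], start: date, end: date) -> float:
    """Average views per day over every day in [start, end].

    Days missing from the response (zero views) still count toward the divisor.
    """
    num_days = (end - start).days + 1
    total = 0
    for item in items:
        day = datetime.strptime(item["timestamp"][:8], "%Y%m%d").date()
        if start <= day <= end:
            total += item["views"]
    return total / num_days


def buggy_average(items: list[dict]) -> float:
    # The subtle bug: dividing by the number of items returned, i.e. only days that had views.
    return sum(item["views"] for item in items) / len(items)


# Hypothetical items: 14 views spread over only 3 days of the 2019-02-05..2019-03-03 range.
items = [
    {"timestamp": "2019022100", "views": 6},
    {"timestamp": "2019022500", "views": 5},
    {"timestamp": "2019030100", "views": 3},
]
print(round(buggy_average(items), 2))                                            # 4.67
print(round(average_daily_views(items, date(2019, 2, 5), date(2019, 3, 3)), 2))  # 0.52
```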

Should the "period" by which we average be the number of days between the event start and the report run? Or should it be 30 days as a constant period?

In reply to @dom_walden's comment (T205502#5000862):

@MusikAnimal, please have a look at this issue. Should we handle this issue in this ticket, for the report, or should we reopen T206817, about pageviews?

This last one gives a pretty good clue as to what is happening. If you look at the Xtools graph, pageviews occurred in only 3 of the last 30 days. The grand total of 14 pvs / 3 = 4, kind of. So it appears we may not be averaging over the entire period but only for days where we have reports.

You mean Pageviews Analysis, not XTools :)

This is one of the known gotchas of the pageviews API. The bug is a side effect of our recent change to use the "per-article" strategy, where our average is computed based on how long the article existed. I can fix this now. How you want to handle it ticket-wise is up to you.

In reply to @MusikAnimal's comment (T205502#5002262):

Oh good. Glad it's fixable. Go ahead and fix it in this ticket, then. I'll move this to In Dev for you to work on. Thanks.

@MusikAnimal, I just noticed all the numbers in this wikitext table are left aligned. Numbers are usually right aligned, I think. Is it possible to set the alignment for the numbers columns (only)? I.e.:

Edits during event
Bytes changed during event
Pageviews, cumulative
Avg. daily pageviews
Incoming links

Done with https://github.com/wikimedia/eventmetrics/pull/210. Moving to Needs Review

PR merged. Numeric cells should now be right-aligned.
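For reference, one way to right-align only the numeric columns is a per-cell `style="text-align:right"` attribute, which wikitext allows before the cell content (`|| attributes | value`). This is a sketch with hypothetical helper names; PR 210 may implement it differently.

```python
# Columns whose cells hold numbers and should be right-aligned.
NUMERIC_COLUMNS = {
    "Edits during event",
    "Bytes changed during event",
    "Pageviews, cumulative",
    "Avg. daily pageviews",
    "Incoming links",
}


def render_cell(column: str, value: str) -> str:
    """Prefix numeric cells with a right-align style attribute; leave others unchanged."""
    if column in NUMERIC_COLUMNS:
        return f'style="text-align:right" | {value}'
    return value


def render_row(columns: list[str], values: list[str]) -> str:
    """Render one table row in wikitext's single-line `| a || b` form."""
    return "|-\n| " + " || ".join(render_cell(c, v) for c, v in zip(columns, values))


print(render_row(["Wiki", "Incoming links"], ["en.wikipedia", "5"]))
```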

This is pretty simple and solely about presentation, so going to jump straight to product sign-off.