Create a method for counting 'Pages with uploaded files'
Closed, Resolved | Public | 5 Estimated Story Points

Description

For this ticket, you will create a method for counting the unique articles on which Files Uploaded are placed. (This is a new metric for Event Metrics and was split off from T206819.)

Definitions of metrics that use this method

  • Unique pages with uploaded files: A count of the total number of non-duplicate articles all the Files Uploaded for the event are on. Used on the Event Summary reports (T205561 and T206692).
  • Pages using this file: The same metric but for an individual uploaded file. I.e., this provides a count of all the articles on which a single file is placed. As used in the Files Uploaded reports (T212547 and T214942).
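
The difference between the two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name count_pages and the sample links are hypothetical, and a page is identified here by a (wiki, page ID) pair. It is not the Event Metrics implementation.

```python
from collections import defaultdict


def count_pages(links):
    """Count pages per file and unique pages overall.

    links: iterable of (file_name, wiki, page_id) tuples, one per placement
    of an uploaded file on a main-namespace page (hypothetical input shape).
    """
    pages_by_file = defaultdict(set)
    for file_name, wiki, page_id in links:
        # A page is identified by wiki plus page ID; IDs repeat across wikis.
        pages_by_file[file_name].add((wiki, page_id))

    # "Pages using this file": one count per uploaded file.
    per_file = {name: len(pages) for name, pages in pages_by_file.items()}
    # "Unique pages with uploaded files": each page counted once for the event.
    unique_pages = len(set().union(*pages_by_file.values()))
    return per_file, unique_pages


# Made-up sample data: two files, three distinct pages.
links = [
    ("A.jpg", "enwiki", 1),
    ("A.jpg", "dewiki", 1),
    ("B.jpg", "enwiki", 1),  # enwiki page 1 already uses A.jpg
    ("B.jpg", "enwiki", 2),
]
per_file, unique_pages = count_pages(links)
print(per_file, unique_pages)
```

Note that the event-wide figure is not the sum of the per-file figures: a page that uses two uploaded files appears in both per-file counts but only once in the unique total.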

Requirements

This metric relies on the Files Uploaded metric and method, which are created in T206819.

  • All filetypes: These metrics will track images, video files, audio files and other upload types.
  • Uploads to Commons and local Wikipedias: We will track uploads to any Wikipedia and to Commons, as long as the wikis are specified as wikis of interest for the event.
  • Article pages on all wikis (not just those specified): The Main space articles counted, on the other hand, can be on any wiki; the wikis do not need to be specified as wikis of interest in setup.

Event Timeline

jmatazzoni renamed this task from Create a method for 'Pages with uploaded files' to Create a method for counting 'Pages with uploaded files'. Feb 5 2019, 10:53 PM
jmatazzoni updated the task description.

Wait, looks like we already have it - file-usage looks exactly like this?

This basically needs to provide two types of numbers: 1) on the Summary report, a count of all non-duplicate pages on which uploaded files are placed, and 2) on the Files Uploaded report, a count of pages for each individual file.

If you can get those from the existing methods, then great. But @MusikAnimal thought this was necessary, since it is a new metric. (What Grant Metrics had before was "Files in use," which grabs the other end of the stick, looking at how many files are on at least one page.)

file-usage is the number of files uploaded that are currently being used in articles. This task is about the number of unique pages that use those files, which is a new metric. You could probably expand the existing EventRepository::getFileUsage() method to also return a distinct count of gil_wiki and gil_page; that way we get both pieces of information from the same query.

Side note -- the comment for that method is wrong. It makes it sound like the method is doing "Pages with uploaded files", which is what it did back in the early days of Grant Metrics. I've got that fixed as part of https://github.com/wikimedia/eventmetrics/pull/176
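
To illustrate the suggestion above (one query returning both "Files in use" and the distinct count over gil_wiki and gil_page), here is a rough sketch in Python with an in-memory SQLite table. Everything here is an assumption for illustration: make_demo_db and file_usage_counts are hypothetical names, the rows are made up, and the column names follow the globalimagelinks table as I understand it. The real EventRepository::getFileUsage() is PHP and runs against the replicas. SQLite has no multi-column COUNT(DISTINCT ...), so string concatenation stands in for it.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for globalimagelinks (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE globalimagelinks ("
        "gil_wiki TEXT, gil_page INTEGER, "
        "gil_page_namespace_id INTEGER, gil_to TEXT)"
    )
    conn.executemany(
        "INSERT INTO globalimagelinks VALUES (?, ?, ?, ?)",
        [
            ("enwiki", 1, 0, "A.jpg"),
            ("enwiki", 1, 0, "B.jpg"),  # same page, second file: one page
            ("dewiki", 1, 0, "A.jpg"),  # same ID, other wiki: another page
            ("enwiki", 3, 0, "A.jpg"),
            ("enwiki", 2, 4, "A.jpg"),  # not namespace 0: ignored
            ("frwiki", 7, 0, "C.jpg"),  # not one of the event's files: ignored
        ],
    )
    return conn


def file_usage_counts(conn, file_names):
    """Return (files_in_use, unique_pages) for the given uploaded files."""
    file_names = list(file_names)
    if not file_names:
        return (0, 0)
    placeholders = ", ".join("?" for _ in file_names)
    sql = f"""
        SELECT COUNT(DISTINCT gil_to),
               COUNT(DISTINCT gil_wiki || ':' || gil_page)
        FROM globalimagelinks
        WHERE gil_page_namespace_id = 0
          AND gil_to IN ({placeholders})
    """
    return tuple(conn.execute(sql, file_names).fetchone())


print(file_usage_counts(make_demo_db(), ["A.jpg", "B.jpg", "Unused.jpg"]))
```

With the sample rows, two of the three files are in use and they sit on three distinct main-namespace pages, so both figures come out of a single pass over the table.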

Merged. This can be QA'd using the Event Summary report.

Unique pages with uploaded files: A count of the total number of non-duplicate articles all the Files Uploaded for the event are on. Used on the Event Summary reports (T205561 and T206692)
...
Article pages on all wikis (not just those specified): The Main space articles counted, on the other hand, can be on any wiki; the wikis do not need to be specified as wikis of interest in setup.

I tested https://eventmetrics-dev.wmflabs.org/programs/104/events/243/summary, which has images uploaded to Commons, by comparing it against a query on commonswiki_p.globalimagelinks. The figures I got were the same. It appears to be counting pages from various wikis, including Wikidata.

I also tested https://eventmetrics-dev.wmflabs.org/programs/46/events/65/summary, which has images uploaded to Commons and az.wiki. Querying azwiki_p.imagelinks shows 3 pages using the images uploaded to az.wiki during the event. commonswiki_p.globalimagelinks shows 26 pages from various wikis using images uploaded to Commons. The figure in the Event Summary is 29 (3 + 26), so it is consistent.

We are only counting pages in namespace 0, which is consistent with the "Files in use" method.

As an alternative oracle, I checked figures for https://eventmetrics-dev.wmflabs.org/programs/105/events/214 against what I could derive from the Edit List and what the "File:" pages tell you about "File usage on other wikis". Figures I got were the same.

Because this task refactored getFileUsage(), I checked for regressions in existing methods by comparing "Files in use" before and after updating the statistics for the above events (and a few others). I did not see this figure change.

These methods are also affected by the issue noted in T206819#4948234.

Pages using this file: The same metric but for an individual uploaded file. I.e., this provides a count of all the articles on which a single file is placed. As used in the Files Uploaded reports (T212547 and T214942)

I could not test this method, as no report uses it yet. @jmatazzoni should this remain in the QA column?

All filetypes: These metrics will track images, video files, audio files and other upload types.

I did not test this with video, audio, etc. I believe all uploaded files are treated the same: they appear in the image table, and links to them appear in the imagelinks table (see https://www.mediawiki.org/wiki/Help:Images).

In T215356#4971354, @dom_walden wrote:

Pages using this file: The same metric but for an individual uploaded file. I.e., this provides a count of all the articles on which a single file is placed. As used in the Files Uploaded reports (T212547 and T214942)

I could not test this method, as no report uses it yet. @jmatazzoni should this remain in the QA column?

No. Let's go ahead and clear this. Especially as Files Uploaded just dropped off the project list. If we bring it back, we can test all this then.

@dom_walden, if this looks good to you, I'm going to mark off on T206692 that "Unique pages with uploaded files" is in place in the Wikitext report.