[BUG] "Files in use" metric includes files used on non-mainspace pages
Closed, Resolved | Public | BUG REPORT

Description

What to do

  • Please restrict "Files in use" so that a file is "in use" only if it is placed on a main namespace page.
  • Please update the on-screen metric glosses as follows (I've updated the Help page already):
    • Avg. daily views to uploaded files: A 31-day average of daily pageviews to all main namespace pages on all wikis on which Uploaded files have been placed (compiled once per day).  If fewer than 31 days are available, the average is calculated for the available days.
    • Unique pages with uploaded files: A count of all non-duplicate main namespace pages, on all wikis, on which Files Uploaded have been placed.  
    • Uploaded files in use: A count of the Uploaded Files that have been placed on at least one main namespace page on any wiki.
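The averaging rule in the first gloss can be sketched roughly as follows. This is only an illustration of the wording above, not the tool's actual code: the function name `average_daily_views`, its parameters, and the choice to return 0.0 when no days are available are all assumptions.

```python
def average_daily_views(daily_views, window=31):
    """Average the most recent `window` daily pageview totals.

    `daily_views` is a list of per-day totals, oldest first (hypothetical
    input shape). If fewer than `window` days are available, the average
    is taken over the days that exist.
    """
    recent = daily_views[-window:]
    if not recent:
        # Assumption: with no data at all, report 0.0 rather than fail.
        return 0.0
    return sum(recent) / len(recent)


# Only three days of data: average over the available days.
short_history = average_daily_views([10, 20, 30])

# Forty days of data: only the last 31 days (values 10..40) are averaged.
long_history = average_daily_views(list(range(1, 41)))
```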

What is the problem? (background)

"Files in use" appears to count files as being in use if they are used on non-mainspace pages.

For example, this event reports 6 files uploaded and 1 in use. However, of the 6 files, the only one in use is https://commons.wikimedia.org/wiki/File%3AReichstag_building_main_entrance_01.jpg, which is used on a Wikipedia namespace page.

The getUsedFiles method in EventRepository.php does not include "gil_page_namespace_id = 0" or "il_from_namespace = 0", as other file metrics do.
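To illustrate the effect of the missing condition, here is a minimal sketch using an in-memory SQLite table. It is not the actual getUsedFiles query: the helper `count_files_in_use`, the reduced three-column table, and the wiki name `examplewiki` are made up for the example. Only the column name `gil_page_namespace_id` comes from the report above, and namespace ID 4 is the Project ("Wikipedia") namespace in MediaWiki.

```python
import sqlite3


def count_files_in_use(conn, filenames, mainspace_only=True):
    """Count distinct files linked from at least one page.

    Hypothetical helper: with mainspace_only=True it adds the
    `gil_page_namespace_id = 0` condition described in the report.
    """
    if not filenames:
        return 0
    placeholders = ",".join("?" for _ in filenames)
    sql = (
        "SELECT COUNT(DISTINCT gil_to) FROM globalimagelinks "
        f"WHERE gil_to IN ({placeholders})"
    )
    if mainspace_only:
        sql += " AND gil_page_namespace_id = 0"
    return conn.execute(sql, list(filenames)).fetchone()[0]


# Toy stand-in for the global usage table (reduced to three columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE globalimagelinks "
    "(gil_wiki TEXT, gil_page_namespace_id INTEGER, gil_to TEXT)"
)
# The file is used once, on a page in namespace 4 (Project/Wikipedia).
conn.execute(
    "INSERT INTO globalimagelinks VALUES (?, ?, ?)",
    ("examplewiki", 4, "Reichstag_building_main_entrance_01.jpg"),
)

files = ["Reichstag_building_main_entrance_01.jpg"]
observed = count_files_in_use(conn, files, mainspace_only=False)
expected = count_files_in_use(conn, files, mainspace_only=True)
```

Without the namespace condition the file counts as "in use" (the observed behavior); with it, the count drops to zero (the expected behavior).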

Steps to reproduce problem
  1. Make an event which only includes the file upload for https://commons.wikimedia.org/wiki/File%3AReichstag_building_main_entrance_01.jpg
  2. Calculate metrics

Expected behavior: Reports 0 files in use
Observed behavior: Reports 1 file in use

Event Timeline

Thanks for reporting this Dom, but the definition of this metric doesn't say anything about main namespace, and I think reporting all the pages where a file gets posted is fine for our purposes. Closing this.

@jmatazzoni I guess there was confusion from the start. The other file metrics ("avg. daily views to files uploaded", and "unique pages with files uploaded"), both only look at mainspace pages. This can be changed, but I'm letting you know in case you by chance wanted it limited to the mainspace at some point, but side-stepped this requirement for other reasons. Doesn't matter which way we go -- just comment here and I will change the code accordingly.

Thanks for pointing this inconsistency out Leon. Would there be a performance hit if we decided to include all the namespaces for all these metrics? I.e., if we made "avg. daily views to files uploaded", and "unique pages with files uploaded" both count all pages where a file is uploaded?

Would there be a performance hit if we decided to include all the namespaces for all these metrics? I.e., if we made "avg. daily views to files uploaded", and "unique pages with files uploaded" both count all pages where a file is uploaded?

Probably, though I doubt by much. There can't be but so many uses outside the mainspace, anyway (at least for your typical Commons-oriented event).

That said, shall I proceed with making all file metrics include all namespaces?

I don't know how much work this is but I suggest we get the meatier work out of the way first.

In T219379#5063690, @MusikAnimal wrote:

That said, shall I proceed with making all file metrics include all namespaces?

Not at this time. I'll make a ticket and we'll consider whether it's something we want to get to. Thanks.

You don't think it should be consistent? Or at least we should update our messaging. If time investment is the concern, changing "files in use" to look at only mainspace will take maybe 15 minutes. Reversing the logic for all metrics will take maybe an hour (won't go into technical details, still rather straightforward, just a bit more involved). Those are conservative estimates. My free time is available too until we hit our deadline... I want to make this perfect! :)

In T219379#5063473, @MusikAnimal wrote:

@jmatazzoni I guess there was confusion from the start. The other file metrics ("avg. daily views to files uploaded", and "unique pages with files uploaded"), both only look at mainspace pages...

In T219379#5063907, @MusikAnimal wrote:

You don't think it should be consistent? Or at least we should update our messaging. If time investment is the concern, changing "files in use" to look at only mainspace will take maybe 15 minutes...

Thinking about this and rereading the comments, the answer seems clear: if "Avg. daily views to files uploaded" is counting Main namespace pages only, let's standardize on that (because I don't want to mess with that tricky metric at this point). So, the answer to this issue is the simple one, as follows (I've also copied this to the Description, above):

  • Please restrict "Files in use" so that a file is "in use" only if it is placed on a main namespace page.
  • Please update the on-screen metric glosses as follows (I've updated the Help page already):
    • Avg. daily views to files uploaded: A 31-day average of daily pageviews to all main namespace pages on all wikis on which Files Uploaded have been placed (compiled once per day).  If fewer than 31 days are available, the average is calculated for the available days.
    • Unique pages with uploaded files: A count of all non-duplicate main namespace pages, on all wikis, on which Files Uploaded have been placed.  
    • Uploaded files in use: A count of the Files Uploaded that have been placed on at least one main namespace page on any wiki.

Does that cover it Leon?

MusikAnimal moved this task from Ready to In Development on the Community-Tech-Sprint board.

Does that cover it Leon?

Yes, looks good :) Thanks!

For two Commons events (one with categories and participants, the other with just categories), I compared the "Uploaded files in use", "Uploaded files" and "Unique pages with uploaded files" figures reported in the event summary to my own database queries.

I also checked that the "Uploaded files" figure was the same as the number of files appearing in the All edits report, although the code change should not have touched that metric.

I also took a small en.wikivoyage event with 13 uploaded files and, using the All edits report, checked each file had no namespace links when I visited its respective https://en.wikivoyage.org/wiki/File: page (although this can be out of date if the page's cache has not been updated recently).

@MusikAnimal this looks good except that the gloss for Unique pages with uploaded files wasn't changed. Here is the correct gloss:

  • Unique pages with uploaded files: A count of all non-duplicate main namespace pages, on all wikis, on which Files Uploaded have been placed.  

OMG, I gave you the wrong copy. Argh. Now I'm sorry. Should be "Uploaded Files," not Files Uploaded!

  • Unique pages with uploaded files: A count of all non-duplicate main namespace pages, on all wikis, on which Uploaded Files have been placed.

Approved and merged. Didn't seem like this needed QA really.