Consider restricting the 'usage' to certain namespaces
Open, Needs Triage · Public

Assigned To

None

Authored By

	JeanFred
	Aug 27 2020, 8:03 AM

Description

One of the four core metrics of the Wikiloves tool suite, as documented in Tool:Wikiloves#Metrics, is “Images used in the wikis: How many of these uploads are in use in the wikis”.

This is achieved by checking whether the image is in the gil_to column in the globalimagelinks table (code).

This method obviously treats all usages as equal, including transclusions in the Wikipedia: or Project: namespaces. If a competition were, for example, to build galleries of all uploaded pictures in the Wikipedia: namespace (per monument/location/uploader/etc.), this would lead to ~100% usage, which might be surprising.
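To make the effect concrete, here is a minimal sketch of the check with an optional namespace filter. It is not the tool's actual query: it uses an in-memory SQLite table as a stand-in for globalimagelinks, the column gil_page_namespace_id is assumed from the GlobalUsage schema, and the function name, wiki name and file names are made up for illustration.

```python
import sqlite3


def usage_ratio(conn, files, namespaces=None):
    """Return the share of `files` that have at least one global usage.

    If `namespaces` is given, only usages on pages whose namespace ID is
    in that list are counted.
    """
    if not files:
        return 0.0
    file_marks = ",".join("?" * len(files))
    sql = (
        "SELECT COUNT(DISTINCT gil_to) FROM globalimagelinks "
        f"WHERE gil_to IN ({file_marks})"
    )
    params = list(files)
    if namespaces is not None:
        ns_marks = ",".join("?" * len(namespaces))
        sql += f" AND gil_page_namespace_id IN ({ns_marks})"
        params += list(namespaces)
    (used,) = conn.execute(sql, params).fetchone()
    return used / len(files)


# Toy stand-in for the real table (assumed columns, hypothetical rows):
# every file sits in a Project-namespace gallery, only one is in an article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE globalimagelinks "
    "(gil_wiki TEXT, gil_page_namespace_id INTEGER, gil_to TEXT)"
)
conn.executemany(
    "INSERT INTO globalimagelinks VALUES (?, ?, ?)",
    [
        ("ukwiki", 4, "A.jpg"),
        ("ukwiki", 4, "B.jpg"),
        ("ukwiki", 4, "C.jpg"),
        ("ukwiki", 4, "D.jpg"),
        ("ukwiki", 0, "A.jpg"),
    ],
)
files = ["A.jpg", "B.jpg", "C.jpg", "D.jpg"]

print(usage_ratio(conn, files))       # all namespaces: 1.0
print(usage_ratio(conn, files, [0]))  # article namespace only: 0.25
```

With the gallery pages counted, every upload is "in use"; restricted to NS 0, only one in four is.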

Two questions:

Should the query discriminate on namespace when calculating the usage?
Which namespace should be used?

Some relevant previous work:

Glamorous will offer a tickbox “Namespaces − Show file usage in article namespace only”
Baglama will consider all namespaces

(Per its design, the wikiloves tool cannot make it a user-selected option.)

Event Timeline

JeanFred created this task. · Aug 27 2020, 8:03 AM

Restricted Application added a subscriber: Aklapper. · Aug 27 2020, 8:03 AM

This seems reasonable; indeed, WLE 2020 Ukraine achieving 100% usage is probably not what is expected :)

While I agree that the 100% example above is not what one might expect from the metric, I do want to point out that it _does_ follow the metric definition: whether the file is in use or not, which is a very clear-cut definition. So this is less an implementation issue than a definition issue.

We thus need a new definition, and personally I am uneasy about, essentially, defining which global use is worthy and which is not. It does sound dramatic, but it does boil down to that, and I do not think this sort of decision quite belongs in the hands of the implementer.

While it sounds simple (“only consider the Article namespace, i.e. NS 0”), I also think it is more complicated than that. With rTHER, I have the configuration for 117 datasets on various wikis, including where the harvesting bot should find the monuments lists − arguably, usage in the monuments lists can be considered 'worthy'.
78 datasets are indeed defined for NS 0¹ only; the others use a variety of namespaces. Here’s the breakdown:

 1 [0,100]
 1 [0,2]
 8 [0,4]
 1 [102]
11 [104]
 1 [118]
 7 [4]

Now, I had no idea what these namespaces are − it turns out:

104 is the “Page” Wikisource NS, and also the “Anexo” NS on es.wp.
108 is the Page NS on it.ws.
100 is the Portal NS on dewiki (probably 'unworthy'?) but also the “Appendix” NS on en.wikt (probably 'worthy'?)
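Because the same namespace number means different things on different wikis, any allowlist would have to be keyed per wiki rather than being one global list. A minimal sketch of that idea follows; the mapping, the names and the choice of which namespaces count as 'worthy' are illustrative assumptions only, not a proposed definition.

```python
# Fallback when a wiki has no explicit entry: article namespace only.
DEFAULT_NAMESPACES = frozenset({0})

# Hypothetical per-wiki overrides, using the examples from this comment.
WORTHY_NAMESPACES = {
    "eswiki": frozenset({0, 104}),        # 104 = "Anexo" on es.wp
    "enwiktionary": frozenset({0, 100}),  # 100 = "Appendix" on en.wikt
}


def counts_as_usage(wiki, namespace_id):
    """Return True if a usage in this namespace on this wiki should count."""
    return namespace_id in WORTHY_NAMESPACES.get(wiki, DEFAULT_NAMESPACES)


print(counts_as_usage("enwiktionary", 100))  # True: Appendix
print(counts_as_usage("dewiki", 100))        # False: Portal, no override
```

The same ID 100 is accepted on one wiki and rejected on another, which is exactly why a single "NS 0 plus a few others" rule does not carry across projects.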

Maybe I am making things more complicated than they need to be − most 'worthy usage' /is/ probably in NS0, and it’s clear that User: namespace should not be considered 'worthy'. But the question is more complicated than it seems, obvious answers turn out not to be so obvious, and I don’t necessarily want to be the one imposing my views. :-)

¹ My jq is rusty but this worked:

cat *.json | jq -c '.table, .namespaces'| grep -B 1 '\[0\]' | grep monuments | wc -l
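For the full breakdown rather than just the NS 0 count, a tally along these lines would do. This is a sketch that assumes each config is a JSON object with "table" and "namespaces" keys (as the jq filter above suggests); the table names below are invented.

```python
from collections import Counter


def namespace_breakdown(configs):
    """Count how many dataset configs use each combination of namespaces."""
    return Counter(tuple(cfg.get("namespaces", [])) for cfg in configs)


# Hypothetical configs standing in for the parsed *.json files.
configs = [
    {"table": "monuments_aa", "namespaces": [0]},
    {"table": "monuments_bb", "namespaces": [0]},
    {"table": "monuments_cc", "namespaces": [0, 4]},
    {"table": "monuments_dd", "namespaces": [104]},
]

for namespaces, count in sorted(namespace_breakdown(configs).items()):
    print(count, list(namespaces))
```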

Personally, I would just exclude a few of the namespaces or fork them into a separate percentage (i.e. % used on Project and User pages): Project space (which is 4?) and user space (which is 2?). Neither of them is supposed to be used for "content" in the browsable sense -- for the GLAM tools, it's less of a big deal that they track these secondary uses because the high-level metric that most of them want to report is pageviews - but in this case, the usage rate is way more important because the metric is an indicator of organizing success.
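A rough sketch of the "separate percentage" idea, assuming the MediaWiki default IDs (2 for User, 4 for Project). The function name and input shape are hypothetical, and reporting the second figure as "used only on User/Project pages" is one possible reading of the suggestion, not the only one.

```python
SECONDARY_NAMESPACES = frozenset({2, 4})  # User and Project namespaces


def split_usage(files, usages):
    """Return (% used on content pages, % used only on User/Project pages).

    `usages` is an iterable of (file_name, namespace_id) pairs.
    """
    wanted = set(files)
    if not wanted:
        return 0.0, 0.0
    primary, secondary = set(), set()
    for name, namespace_id in usages:
        if name not in wanted:
            continue
        if namespace_id in SECONDARY_NAMESPACES:
            secondary.add(name)
        else:
            primary.add(name)
    secondary_only = secondary - primary
    total = len(wanted)
    return 100 * len(primary) / total, 100 * len(secondary_only) / total


files = ["A.jpg", "B.jpg", "C.jpg", "D.jpg"]
usages = [("A.jpg", 0), ("A.jpg", 4), ("B.jpg", 4), ("C.jpg", 2)]
print(split_usage(files, usages))  # (25.0, 50.0)
```

Here one file in four is used in content, two more appear only on Project or User pages, and one is unused.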

JeanFred added a subscriber: Effeietsanders. · Aug 28 2020, 1:29 PM
