
Review and translate the Dutch National Library’s statistics procedures
Open, Low · Public · 2 Estimated Story Points


Review, translate and summarise the Dutch National Library’s statistics procedures into English, as we (Sandra and David) think they are of interest to other GLAMs and GLAM-wikimedians.

A further purpose is to compare them with what other GLAMs have reported as their needs with regard to a statistics solution for content on the Wikimedia platforms.

This ticket is part of WMIL's development of a prototype GLAM Statistics Dashboard. (WMIL doesn't use Phabricator so only my small advisory role in that project will be reflected here!)

Event Timeline

David_Haskiya_WMSE updated the task description.
David_Haskiya_WMSE set the point value for this task to 4.

Yes, very nice! I (obviously) support this idea, but you might indeed want to summarize it, because it is overly detailed in places.

You might also want to include a link to Detecting Wikipedia articles strongly based on single library collections, as this is based on the work done here.

Arne Wossink of the RKD did some extra work (a manual in Dutch) on further automating the process, so you might want to contact him to have that text translated as well.

David_Haskiya_WMSE renamed this task from Review and translate the Dutch National Library’s statistics procedures, to Review and translate the Dutch National Library’s statistics procedures. Jun 29 2020, 6:04 AM
David_Haskiya_WMSE updated the task description.
David_Haskiya_WMSE changed the point value for this task from 4 to 2.
SandraF_WMF added a comment. Edited Jun 29 2020, 8:33 AM

I have just summarized Olaf's procedure for a report as part of my work at WMNL, and that has made me quite aware that Olaf has built his (really great!) procedure based on what currently *can* be measured with the tools that are available.

It may also be very interesting to ask Olaf what he would *like* to measure (and what his management would like to know) if anything were possible and there were no technical limitations at all. This question may be especially interesting for metrics related to anything on Wikidata, as Wikidata GLAM-metrics needs have (as far as I know) not really been systematically inventoried yet.

OlafJanssen added a comment. Edited Jul 1 2020, 2:23 PM

Re: It may also be very interesting to ask Olaf what he would *like* to measure

One thing immediately comes to mind: with BaGLAMa 2 you measure the number of requests for Wikipedia pages containing images from (for example) This is a nice indicator, but it does not tell me

  1. if a given page has actually been viewed by the user (for some duration) and
  2. if a KB-image in that page has been seen/viewed with some level of attention. This matters most for images that appear in the lower parts of long pages: most users don't scroll and read all the way down, so the likelihood of 'bottom images' being seen and interacted with is quite low.

I wonder if it would be possible

  1. to detect the relative position of an image in a page (higher = more chance of being seen by the user)
  2. to detect that an image in a page has been interacted with / clicked on (e.g. from triggering the image preview like this)

  3. to convert this into some metric/indicator from which you can tell (with a bit more certainty than is currently possible from BaGLAMa 2) if an image has actually been viewed with some attention.
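To make the idea above a bit more concrete, here is a minimal sketch of what such an indicator could look like. Everything in it is an assumption for illustration: the function name, the linear decay of visibility with scroll depth, and the `floor` value are hypothetical and not based on any existing tool or on measured reading behaviour.

```python
def attention_weighted_views(page_views, image_offset, page_height,
                             clicks=0, floor=0.1):
    """Estimate how many page views plausibly included a look at the image.

    image_offset / page_height gives the relative vertical position of the
    image (0.0 = top of the page, 1.0 = bottom). All weights are assumptions.
    """
    if page_height <= 0:
        raise ValueError("page_height must be positive")

    # Clamp the relative position to the range [0, 1].
    relative_position = min(max(image_offset / page_height, 0.0), 1.0)

    # Assumed model: visibility decays linearly from 1.0 at the top of the
    # page to `floor` at the bottom.
    visibility = 1.0 - (1.0 - floor) * relative_position

    # A click on the image (e.g. opening the preview) is treated as a
    # certain view; the remaining page views are weighted by visibility.
    clicks = min(clicks, page_views)
    return clicks + (page_views - clicks) * visibility


# Same page views, different image positions (hypothetical numbers).
top_image = attention_weighted_views(1000, image_offset=0, page_height=5000)
bottom_image = attention_weighted_views(1000, image_offset=5000,
                                        page_height=5000, clicks=50)
```

A real indicator would need the decay curve to be calibrated against actual scroll-depth or click data, which is exactly the data that is not available right now.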

Next two things that might be improved:

  1. the current default External link search tool does not offer the possibility to filter by namespace.

There is however a (rather hidden) tool that can do that: and

It might be an idea to build that namespace filtering into the default tool.

  2. Also, the current tool assumes http if no protocol is specified. It took me some time to find out that specifying https gives a very different result set.
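Both points can be illustrated with the MediaWiki Action API, which, as far as I know, offers a `list=exturlusage` query with `euprotocol` and `eunamespace` parameters (worth double-checking against the API documentation). The sketch below only builds the request URLs and does not send them; the wiki endpoint and the domain `www.kb.nl` are example values chosen for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint: any Wikimedia wiki exposes its Action API at /w/api.php.
API_ENDPOINT = "https://nl.wikipedia.org/w/api.php"


def exturlusage_url(domain, protocol, namespace=0, limit=500):
    """Build an external-link-usage query for one protocol and one namespace."""
    params = {
        "action": "query",
        "list": "exturlusage",
        "euquery": domain,         # search pattern, given without a protocol
        "euprotocol": protocol,    # "http" and "https" are separate searches
        "eunamespace": namespace,  # 0 = main (article) namespace
        "eulimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


# One query per protocol, so that https links are not silently missed.
urls = {p: exturlusage_url("www.kb.nl", p) for p in ("http", "https")}
```

Running both queries and merging the results avoids the trap of only seeing the http links.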
OlafJanssen added a comment. Edited Jul 2 2020, 12:13 PM

Just for info: I just started work on 3 articles in English explaining the approach, tools and outcomes of my KB-statistics work. It'll be basically a written out & illustrated version of

The 3 articles will be published here on GitHub (so off-wiki) and will have a similar style to the article already there

I haven't set a deadline for completing the 3 articles yet; I'll work on them bit by bit over the coming months.

@OlafJanssen Thank you for this excellent input! I'll take a look next week (when I have another few hours to put into this project). Also, since you're planning to write up what you've been doing in English, I will only make a very brief summary of it myself. I've created a first draft document that tries to summarise requirements for metrics based on previous research. It's here (just ask for access and I'll give it to you) and your comments would be welcome.

I totally agree that the metric "views of Wikipedia articles in which a media file from our upload is present" is a weak and even potentially misleading metric. It should certainly not be "THE metric of success", but it's what can be measured, so I guess that's why it has de facto become that key metric. The number of actual media requests, from other Wikimedia projects and also from external sources, would be a more accurate key metric of use. Right now though, I think the WMF Analytics team only supports measuring media requests in a rather simple way: file by file. So checking all media requests for a Commons category is not possible, at least not easily. Also, I'm unsure whether it's possible to tell where a media request to Commons originates.
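If request counts really are only available file by file, a category-level figure would have to be assembled by looping over the files in the category. A minimal sketch of that aggregation, under stated assumptions: `fetch_count` is a hypothetical stand-in for whatever per-file lookup is actually available, and the file names and numbers are made up.

```python
from typing import Callable, Dict, Iterable


def category_media_requests(files: Iterable[str],
                            fetch_count: Callable[[str], int]) -> Dict:
    """Sum per-file media request counts into one category-level total."""
    # One lookup per file, because no category-level query is assumed to exist.
    per_file = {name: fetch_count(name) for name in files}
    return {"total": sum(per_file.values()), "per_file": per_file}


# Made-up counts standing in for real per-file lookups.
sample_counts = {"File:Example_A.jpg": 120, "File:Example_B.jpg": 30}
result = category_media_requests(sample_counts,
                                 lambda name: sample_counts[name])
```

For a large category this means one lookup per file, which is why "not easily" is probably the right way to put it.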

My former Europeana colleague James Morley did some experimenting with in-article usage stats that take the placement of the image in the article into account (as you also suggest). He's made a writeup here.

David_Haskiya_WMSE triaged this task as Low priority. Jul 6 2020, 6:47 AM
David_Haskiya_WMSE moved this task from This week to Backlog on the User-David_Haskiya_WMSE board.