Create metrics logging for Wikispeech
Closed, Resolved (Public)

Description

Log each request to wikispeech-listen to a new database table containing:

  • timestamp
  • segment index of page
  • segment hash
  • consumer url
  • page
  • time spent
    • synthesizing utterance
    • retrieving utterance from store
  • number of characters in segment
  • length in time of utterance speech
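
As an illustration only, the fields listed above could be modelled as a single record per request. This is a minimal sketch, not the schema used by the extension: the class name, field names, millisecond units, and the use of `None` for a step that did not run are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ListenMetricsEntry:
    """One logged wikispeech-listen request (hypothetical names and units)."""
    timestamp: datetime
    segment_index: int            # segment index of page
    segment_hash: str
    consumer_url: str
    page: str
    synthesize_ms: Optional[int]  # time spent synthesizing; None if not synthesized (assumption)
    retrieve_ms: Optional[int]    # time spent retrieving from store; None if not retrieved (assumption)
    segment_chars: int            # number of characters in segment
    utterance_ms: int             # length in time of utterance speech


def to_row(entry: ListenMetricsEntry) -> dict:
    """Flatten an entry into a plain dict, e.g. for a database insert."""
    row = asdict(entry)
    # Store the timestamp as an ISO 8601 string (assumed storage format).
    row["timestamp"] = entry.timestamp.isoformat()
    return row
```

Note that none of the fields identifies the person making the request; the record describes only the request and the segment being read.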

Event Timeline

Change 704802 had a related patch set uploaded (by Karl Wettin (WMSE); author: Karl Wettin (WMSE)):

[mediawiki/extensions/Wikispeech@master] [WIP] Log metrics from ApiWikispeechListen

https://gerrit.wikimedia.org/r/704802

Change 708543 had a related patch set uploaded (by Karl Wettin (WMSE); author: Karl Wettin (WMSE)):

[mediawiki/extensions/Wikispeech@master] [WIP] Database metrics journal for ApiWikispeechListen

https://gerrit.wikimedia.org/r/708543

Change 704802 merged by jenkins-bot:

[mediawiki/extensions/Wikispeech@master] Log metrics from ApiWikispeechListen

https://gerrit.wikimedia.org/r/704802

@Sebastian_Berlin-WMSE (and @kalle), is there any documentation about: 1) what these stats will be used for (i.e. why we are collecting them), 2) how they will be used (e.g. automated reports or regularly checking in on them), and 3) an argument for why this doesn't reveal any information about any individual using Wikispeech?

Change 708543 had a related patch set uploaded (by Karl Wettin (WMSE); author: Karl Wettin (WMSE)):

[mediawiki/extensions/Wikispeech@master] [WIP] Database metrics journal for ApiWikispeechListen

https://gerrit.wikimedia.org/r/708543

@kalle Should this one be abandoned?

I couldn't find any documents specifically about this.

  1. Some of this was specified in T140359 as part of the original project (as nice to have). I believe that this would be used for estimating server costs (2021-05-20 - Pris för drift av Wikispeech och Speechoid [Cost of operating Wikispeech and Speechoid]). I think @Jopparn wanted to have some numbers before proceeding with the rollout of Wikispeech and actively inviting more users (correct me if I'm wrong). It would also let us see and tell others how Wikispeech is used, e.g. "it's used X times per month for a total of Y hours of listening".
  2. As far as I know this hasn't been established. I've downloaded the logs myself and made some graphs by hand once or twice. I think we just wanted to make sure that the data was available so that we could use it later.
  3. No user-specific information is logged; it's just information about requests. I don't think we actually store anything that we couldn't already get from the combined logs of the consumer wiki and Speechoid.
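
A summary of the kind quoted in point 1 ("X times per month for a total of Y hours") could be computed from the logged entries roughly as follows. This is only a sketch: the function name and the input shape (pairs of an ISO 8601 timestamp string and an utterance length in milliseconds) are assumptions, not the format of the actual logs.

```python
from collections import defaultdict


def monthly_usage(entries):
    """Return {"YYYY-MM": (request_count, hours_of_speech)}.

    `entries` is an iterable of (iso_timestamp, utterance_ms) pairs
    (a hypothetical input shape chosen for this example).
    """
    totals = defaultdict(lambda: [0, 0])
    for timestamp, utterance_ms in entries:
        month = timestamp[:7]  # the "YYYY-MM" prefix of an ISO 8601 timestamp
        totals[month][0] += 1
        totals[month][1] += utterance_ms
    # Convert accumulated milliseconds to hours.
    return {
        month: (count, total_ms / 3_600_000)
        for month, (count, total_ms) in sorted(totals.items())
    }
```

Because only per-request values are summed, the result contains nothing beyond what the individual log entries already hold.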

Change 708543 abandoned by Lokal Profil:

[mediawiki/extensions/Wikispeech@master] [WIP] Database metrics journal for ApiWikispeechListen

Reason:

Superseded by 704802

https://gerrit.wikimedia.org/r/708543