Understanding first day: activate for Vietnamese Wikipedia
Closed, Resolved (Public)

Description

We are going to start recording data from Vietnamese Wikipedia into the EditorJourney schema, like we have done for Czech and Korean Wikipedias.

In order to start doing this, @kostajh needs to know the ID numbers of namespaces in Vietnamese Wikipedia that will be considered "sensitive" so that the URLs of page views in those namespaces can be obfuscated. This is the English version:

  • Article (0)
  • Article talk (1)
  • File (6)
  • File talk (7)
  • Portal (100)
  • Portal talk (101)
  • Draft (118)
  • Draft talk (119)

Here are the Vietnamese ones (that wiki doesn't have a Draft namespace).

  • We need to find the equivalents to these in Vietnamese, which @Trizek-WMF can do. Maybe it will be helpful if @kostajh lists the ones we are using for Czech and Korean.
  • After @Trizek-WMF has those, @kostajh can turn this on. Let's set a tentative date of Thursday, January 17.
  • Then @nettrom_WMF and/or @MMiller_WMF can make sure that events are flowing correctly to the data lake with their obfuscations and everything.

Event Timeline

@Trizek-WMF -- thank you. Could you list the actual IDs of the namespaces that we should obfuscate?

They are the same as on your list. Other unusual namespaces are modules (828), utilities/gadgets (2300) and utilities/gadgets definition (2302).

> They are the same as on your list.

@Trizek-WMF But not drafts, since that wiki doesn't have them. In summary it's:

  • Article (0)
  • Article talk (1)
  • File (6)
  • File talk (7)
  • Portal (100)
  • Portal talk (101)
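
For illustration, here is a minimal sketch of how a list of sensitive namespace IDs like the one above might be used to decide whether a page view needs obfuscating. This is not the EditorJourney extension's actual code or configuration: the names `SENSITIVE_NAMESPACES`, `is_sensitive`, and `loggable_title`, and the `[redacted]` placeholder, are all assumptions made for this example.

```python
# Hypothetical sketch only; not the actual EditorJourney implementation.
# Namespace IDs come from the summary list above (no Draft namespace on viwiki).
SENSITIVE_NAMESPACES = frozenset({0, 1, 6, 7, 100, 101})

# Assumed placeholder; the real obfuscation format is not described in this task.
REDACTED = "[redacted]"


def is_sensitive(namespace_id: int) -> bool:
    """Return True if page views in this namespace should be obfuscated."""
    return namespace_id in SENSITIVE_NAMESPACES


def loggable_title(namespace_id: int, title: str) -> str:
    """Return the title unchanged, or a placeholder for sensitive namespaces."""
    return REDACTED if is_sensitive(namespace_id) else title


if __name__ == "__main__":
    # Namespace 0 (Article) is on the list; 828 (Module) is not.
    print(loggable_title(0, "Example article"))
    print(loggable_title(828, "Module:Example"))
```

Keeping the IDs in a single set makes the per-wiki difference (here, the absence of Draft 118/119) a one-line change.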

Change 484289 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[operations/mediawiki-config@master] EditorJourney: Enable data collection for viwiki

https://gerrit.wikimedia.org/r/484289

@MMiller_WMF the patch to enable data collection is ready. Could we schedule it for tomorrow or Wednesday? That way, if any unforeseen problems arise, Thursday would be the fallback date. If we wait until Thursday and there's an issue, we would have to wait until Monday to try again.

@kostajh -- let's schedule this for Wednesday.

I'm pinging the community about it.

> let's schedule this for Wednesday.

Scheduled for the 10 AM PST SWAT window on Wednesday.

Change 484289 merged by jenkins-bot:
[operations/mediawiki-config@master] EditorJourney: Enable data collection for viwiki

https://gerrit.wikimedia.org/r/484289

Mentioned in SAL (#wikimedia-operations) [2019-01-16T17:18:16Z] <dcausse@deploy1001> Synchronized wmf-config/InitialiseSettings.php: EditorJourney: Enable data collection for viwiki T213348 (duration: 00m 52s)

I verified that data from Vietnamese Wikipedia is now also flowing into the Data Lake, with 57 events recorded so far. Closing this ticket, as the work is now complete.