
Get data from Training Modules
Closed, Resolved (Public)


We conducted another thank-you banner campaign. In this campaign we used the training modules again. To analyse the data we need the tracking tables of the training modules.

relevant time frame of the data: 01/01/2019 - 17/01/2019
to whom: @GoranSMilovanovic (please provide the data via email to maintain data security standards)
by when: beginning next week would be awesome (23/01/2019)

@Ragesoss: Could you please provide us with the data again?

@Christine_Domgoergen_WMDE: FYI

Event Timeline

I've just pulled the data, and will send it by email shortly.

For future reference, here's the slightly updated code which I ran this time:

require 'csv' # standard library; may already be loaded in the server console

module_ids = [40001, 40002, 40003, 40004]

# Header row, followed by one row per user per module
csv_data = [['username', 'training_module', 'last_slide_completed', 'module_completion_date', 'started_at', 'last_slide_completed_at']]

module_ids.each do |m_id|
  tm = TrainingModule.find(m_id)
  tmus = TrainingModulesUsers.where(training_module_id: m_id)
  tmus.each do |tmu|
    # tmu.user can be nil, hence the safe navigation operator
    csv_data << [tmu.user&.username, tm.slug, tmu.last_slide_completed, tmu.completed_at, tmu.created_at, tmu.updated_at]
  end
end

CSV.open('/home/ragesoss/wmde_training_data_2019-01.csv', 'wb') do |csv|
  csv_data.each do |line|
    csv << line
  end
end

There are two entries in this data set with no username, which somehow got logged to user 0 — possibly from a user who had cookies disabled, or a similar issue — and which I've left in the results.
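For anyone analysing the export, here is a minimal sketch of how the rows without a username could be separated from the rest, using only Ruby's standard CSV library. The helper name `split_anonymous_rows` and the sample data are hypothetical; only the `username` column name comes from the header row in the code above.

```ruby
require 'csv'

# Hypothetical helper: splits exported rows into those with a username and
# those without one (nil or blank), such as the entries logged to user 0.
def split_anonymous_rows(csv_text)
  rows = CSV.parse(csv_text, headers: true)
  anonymous, named = rows.partition { |row| row['username'].to_s.strip.empty? }
  { named: named, anonymous: anonymous }
end

# Made-up sample data, for illustration only.
sample_export = <<~DATA
  username,training_module,last_slide_completed
  ExampleUser,example-module,intro
  ,example-module,intro
DATA

result = split_anonymous_rows(sample_export)
puts "named: #{result[:named].size}, anonymous: #{result[:anonymous].size}"
```

Whether to drop or keep the anonymous rows is an analysis decision; the sketch only makes them easy to find.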

I have the dataset with me, so I guess we can close this ticket now.

@Ragesoss Thank you for explaining in detail how we can get the data, and thank you for providing it so quickly!

@GoranSMilovanovic Before we close the ticket: is it possible, with the information from Ragesoss, to get the data yourself, or are there any additional rights that you need? I would prefer not to burden Ragesoss with every data request.

@Ragesoss @Stefan_Schneider_WMDE

Currently I have no idea (a) where the data live, or (b) what exactly is done to get the relevant dataset, but if @Ragesoss can provide a concise tutorial (a Google Hangouts session, say), I am more than willing to learn.

Currently, the only way to pull that data is to have server access and run the above Ruby code from a server console. It's a quick task, so I think it'll be easier to keep doing it this way for now, but at some point we could build an easier way to generate this data on demand.
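As a rough idea of what an on-demand version might look like, here is a sketch that wraps the export in a reusable function. Everything in it is an assumption rather than existing dashboard code: `ProgressRecord` is a hypothetical stand-in for the rows read from the database, `progress_csv` is a made-up name, and the date filter on the start time is not part of the script above, which exported all rows.

```ruby
require 'csv'

EXPORT_HEADER = %w[username training_module last_slide_completed
                   module_completion_date started_at last_slide_completed_at].freeze

# Hypothetical stand-in for one user's progress through one module.
ProgressRecord = Struct.new(:username, :training_module, :last_slide_completed,
                            :completed_at, :created_at, :updated_at)

# Returns CSV text for the records whose start time (created_at) lies
# between from and to, inclusive. The filter is an assumed feature.
def progress_csv(records, from:, to:)
  CSV.generate do |csv|
    csv << EXPORT_HEADER
    records.each do |r|
      next unless (from..to).cover?(r.created_at)

      csv << [r.username, r.training_module, r.last_slide_completed,
              r.completed_at, r.created_at, r.updated_at]
    end
  end
end

# Made-up records, for illustration only.
records = [
  ProgressRecord.new('ExampleUser', 'example-module', 'intro', nil,
                     Time.utc(2019, 1, 5), Time.utc(2019, 1, 5)),
  ProgressRecord.new('LaterUser', 'example-module', 'intro', nil,
                     Time.utc(2019, 2, 1), Time.utc(2019, 2, 1))
]

puts progress_csv(records, from: Time.utc(2019, 1, 1),
                           to: Time.utc(2019, 1, 17, 23, 59, 59))
```

Keeping the database query outside the function means the CSV logic can be tried without server access; only the step that loads the records would still need the console.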

Ruby? No, thanks - R, Python, SQL, Spark, whatever, but not another programming language on my agenda.

@Ragesoss @GoranSMilovanovic Cool! Thanks for your feedback. Then we will keep it this way for now and I will close the ticket. Thx for providing the data again.