
(subtask) Dan's standard metrics
Closed, Resolved · Public · 13 Story Points

Description

from https://meta.wikimedia.org/wiki/Research:Standard_metrics

2.1 Newly registered user
2.2 New editor
2.4 Surviving new editor
3.2.4 Daily unique page creators
3.2.5 Daily unique media creators
4.5 Daily pages created
4.6 Daily media created

3.1.1 Rolling active editor
3.1.2 Rolling new active editor
3.1.3 Rolling surviving new active editor
3.1.4 Rolling recurring old active editor
3.1.5 Rolling re-activated editor

Event Timeline

Milimetric updated the task description. Nov 4 2016, 3:55 PM
Nuria set the point value for this task to 13. Nov 10 2016, 4:58 PM
Milimetric added a comment. Edited Nov 14 2016, 9:01 PM

Note: one user, id 3164 in simplewiki, has two newusers records in the logging table. The user history reconstruction walks backwards, so it first finds "newusers/autocreate", records that as the creation event, and then finds "newusers/create" and discards it. As a result, the user is self-created in MediaWiki but auto-created in the reconstructed history. The same user also has a different name depending on whether we parse it from log_title or from log_comment.

However, we decided not to work on it: the user's history is corrupted and should really be fixed in MediaWiki. Unless someone can explain why this is a problem, fixing it in MediaWiki seems simpler to me than changing our algorithm (and more universally beneficial).
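A minimal Python sketch of the "first creation found going backwards wins" behaviour described in the note. The `LogEvent` class and `pick_creation_event` function are hypothetical, and the sketch assumes the reconstruction sees one user's logging rows ordered by timestamp; it is not the actual reconstruction code, and the timestamps are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LogEvent:
    # Hypothetical, simplified view of one logging-table row.
    timestamp: str   # MediaWiki-style YYYYMMDDHHMMSS string
    log_type: str    # e.g. "newusers"
    log_action: str  # e.g. "create" or "autocreate"


def pick_creation_event(
    events: List[LogEvent],
) -> Tuple[Optional[LogEvent], List[LogEvent]]:
    """Walk a user's log events newest-first.

    The first "newusers" event met is kept as the creation event; every
    older "newusers" event is discarded.
    """
    kept: Optional[LogEvent] = None
    discarded: List[LogEvent] = []
    for event in sorted(events, key=lambda e: e.timestamp, reverse=True):
        if event.log_type != "newusers":
            continue
        if kept is None:
            kept = event  # first creation seen going backwards wins
        else:
            discarded.append(event)  # older creation records are dropped
    return kept, discarded


# Made-up timestamps: the "create" record is older than the "autocreate" one.
events = [
    LogEvent("20060401120000", "newusers", "create"),
    LogEvent("20060405120000", "newusers", "autocreate"),
]
kept, discarded = pick_creation_event(events)
print(kept.log_action)                    # autocreate
print([e.log_action for e in discarded])  # ['create']
```

Because the newer "autocreate" row is reached first, the older self-created "create" row never influences the reconstructed user.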

This comment was removed by Milimetric.

Weird thing found: some users in simplewiki, for example Rohini and Brookie, show this pattern:

  • create user
  • create another user with similar name
  • "usurp" event where the first created user is renamed to something like "User (usurped)" or "User (usurped)~simplewiki"

The create event for the second user doesn't exist in the logging table. The create event for the first user exists, but our algorithm links it to the " (usurped)" user. This probably doesn't reflect reality accurately, but very few users are in this situation (115 out of 500k+ users on simplewiki). It's also not obvious that the creation event should be linked to the second user instead.
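To illustrate how the surviving create event can end up on the " (usurped)" account, here is a simplified model that assumes the reconstruction replays rename events backwards over a name-to-user map. The event encoding, `attribute_creations`, and the "Example" user names are all hypothetical; the real pipeline is likely more involved.

```python
from typing import Dict, List, Tuple

# Hypothetical event encoding, listed newest first:
#   ("rename", old_name, new_name)  or  ("create", name)
Event = Tuple[str, ...]


def attribute_creations(
    current_names: Dict[int, str], events_newest_first: List[Event]
) -> List[Tuple[str, int]]:
    """Replay renames backwards and link each create event to a user id."""
    # name -> user id, starting from today's user names
    state = {name: uid for uid, name in current_names.items()}
    links: List[Tuple[str, int]] = []
    for event in events_newest_first:
        if event[0] == "rename":
            _, old_name, new_name = event
            uid = state.pop(new_name)
            # Undoing the rename gives the old name back to this user,
            # overwriting whoever currently holds that name.
            state[old_name] = uid
        elif event[0] == "create":
            _, name = event
            links.append((name, state[name]))
    return links


# User 1 was renamed to "Example (usurped)"; user 2 now holds "Example"
# but has no create record of its own.
current = {1: "Example (usurped)", 2: "Example"}
events = [
    ("rename", "Example", "Example (usurped)"),
    ("create", "Example"),
]
print(attribute_creations(current, events))  # [('Example', 1)]
```

Once the usurp rename is undone, the name "Example" points back at user 1, so the only create record is attributed to the usurped account and user 2 gets none.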

Another problem we should fix: archive records with a null ar_page are not generating revision/create history events. They must be getting discarded somewhere in the pipeline, and this is skewing early edit metrics. We should include them even when we don't have page information. An example:

 select event_entity,
        event_type,
        event_timestamp,
        event_user_creation_timestamp,
        event_user_id,
        event_user_is_created_by_self
   from mediawiki_history
  where wiki_db = 'simplewiki'
    and event_user_id in (3173, 3176, 3224, 3259, 3263, 3283, 3307)
    and substring(event_timestamp, 1, 6) = '200604'
  order by event_entity, event_type, event_user_id, event_timestamp
  limit 1000
;

And the archive records it should show:

 select *
   from archive
  where left(ar_timestamp, 6) = '200604'
    and ar_user in (3173, 3176, 3224, 3259, 3263, 3283, 3307);
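A sketch of the intended behaviour, assuming archive rows arrive as dictionaries keyed by MediaWiki column names: rows with a null `ar_page` still produce revision/create events, with the page id left empty instead of the row being dropped. `archive_to_revision_events` and its `keep_null_page` flag are hypothetical names for illustration, not part of the existing pipeline, and the rows below are made up.

```python
from typing import Any, Dict, List, Optional


def archive_to_revision_events(
    rows: List[Dict[str, Any]], keep_null_page: bool = True
) -> List[Dict[str, Any]]:
    """Turn archive rows into revision/create events.

    With keep_null_page=False, rows whose ar_page is NULL are skipped,
    which reproduces the symptom reported above. With True, they are
    kept and their page id is simply left empty.
    """
    events: List[Dict[str, Any]] = []
    for row in rows:
        page_id: Optional[int] = row.get("ar_page")
        if page_id is None and not keep_null_page:
            continue  # silently loses the user's early edits
        events.append({
            "event_entity": "revision",
            "event_type": "create",
            "event_timestamp": row["ar_timestamp"],
            "event_user_id": row["ar_user"],
            "page_id": page_id,  # None when the archive row has no page
        })
    return events


# Made-up archive rows; the second one has no ar_page.
rows = [
    {"ar_page": 10, "ar_timestamp": "20060401000000", "ar_user": 1},
    {"ar_page": None, "ar_timestamp": "20060402000000", "ar_user": 2},
]
print(len(archive_to_revision_events(rows, keep_null_page=False)))  # 1
print(len(archive_to_revision_events(rows)))                        # 2
```

Keeping the row with an empty page id preserves the user's edit count and timing, which is what the early edit metrics depend on.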
mforns moved this task from In Progress to Done on the Analytics-Kanban board. Nov 28 2016, 12:25 PM
Nuria closed this task as Resolved.Dec 16 2016, 6:02 PM