Page MenuHomePhabricator

rev_user and rev_user_text == NULL in wmf_raw.mediawiki_revision
Closed, ResolvedPublic

Description

I've been trying to get some user information from the revisions tables. I have tried different queries, but I'm always getting Null values both for rev_user and rev_user_text

For example, this query:

hive (wmf_raw)> SELECT DISTINCT rev_user FROM mediawiki_revision WHERE snapshot = '2020-02' AND wiki_db = 'enwiki';

returns

rev_user
NULL

Is this a bug or Am I doing something wrong?

Event Timeline

diego created this task.Jun 9 2020, 2:35 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 9 2020, 2:35 AM
diego renamed this task from rev_user and rev_user_text == NULL in wmf_raw.mediawiki_revision & to rev_user and rev_user_text == NULL in wmf_raw.mediawiki_revision.Jun 9 2020, 3:17 AM
diego updated the task description. (Show Details)
ashley added a subscriber: ashley.Jun 9 2020, 8:22 AM

Can't speak for the WMF's analytics setup etc. but in general mw:Actor migration is a thing, so yes, attempting to use rev_user and/or rev_user_text would indeed result in NULLs. You'll need to JOIN the actor table (and for the time being, the revision_actor_temp table as well).

I confirm @ashley's point. You should use actor_id to join to the actor table, or use wmf.mediawiki_history where that join is already done.

diego added a comment.Jun 9 2020, 9:59 PM

Thanks @ashley and @JAllemandou . I've solved the problem using wmf.mediawiki_history, so no emergencies from my side. However, is that the expected behavior for wmf_raw.mediawiki_revision? And in the case that it is, for the records, could you please provide an example of the join you are suggesting, I don't see an actor table in wmf_raw:

hive (wmf_raw)> show tables;
OK
tab_name
apiaction
cirrussearchrequestset
mediawiki_archive
mediawiki_change_tag
mediawiki_change_tag_def
mediawiki_content
mediawiki_content_models
mediawiki_imagelinks
mediawiki_ipblocks
mediawiki_ipblocks_restrictions
mediawiki_logging
mediawiki_page
mediawiki_page_restrictions
mediawiki_pagelinks
mediawiki_private_actor
mediawiki_private_comment
mediawiki_private_cu_changes
mediawiki_project_namespace_map
mediawiki_redirect
mediawiki_revision
mediawiki_slot_roles
mediawiki_slots
mediawiki_user
mediawiki_user_groups
mediawiki_wbc_entity_usage
webrequest
webrequest_sequence_stats
webrequest_sequence_stats_hourly
wikibase_wbt_item_terms
wikibase_wbt_property_terms
wikibase_wbt_term_in_lang
wikibase_wbt_text
wikibase_wbt_text_in_lang
wikibase_wbt_type

@diego : The actor table is in the above list: mediawiki_private_actor.
The table is named private as it comes from the production DB. The labsDB actor view is using joins preventing us to sqoop the data in a timely manner.

diego closed this task as Resolved.Jun 10 2020, 3:11 PM