Abuse filter analytics dashboard is broken
Closed, ResolvedPublic

Description

Publishing with Content Translation can be prevented by the multiple edit filters defined by each Wikipedia community.
An analytics dashboard was created to identify the edit filters that most often prevent translations from being published. It helps us learn which kinds of issues translations are most prone to, and improve the way the tool deals with them.

Currently the dashboard is broken:

[Screenshot: superset.wikimedia.org_superset_dashboard_cx-abuse-filter_(iPad Mini).png]

Event Timeline

Pginer-WMF triaged this task as Medium priority. Mar 3 2022, 12:03 PM
Pginer-WMF moved this task from Backlog to Priority Backlog on the Language-analytics board.

It looks like this error is caused by a permission problem on the dataset behind the dashboard: 'amire80.cx_abuse_filter_daily'.

@Amire80 - Since you are the owner of the dataset, can you please try running the following command:

hdfs dfs -chmod -R o+r /user/amire80/data/cx_abuse_filter_daily

This should give other users read access to the files in Hadoop and fix the Superset dashboard.

Update: @Amire80 was unable to fix this. He ran hdfs dfs -chmod -R o+r /user/amire80/data/cx_abuse_filter_daily to update the permissions and received the following error:

chmod: changing permissions of '/user/amire80/data/cx_abuse_filter_daily/year=2022/month=2/day=11': Permission denied. user=amire80 is not the owner of inode=/user/amire80/data/cx_abuse_filter_daily/year=2022/month=2/day=11

Reassigning to Data-Engineering to look into this.

odimitrijevic raised the priority of this task from Medium to High. Mar 9 2022, 7:06 PM
odimitrijevic moved this task from Incoming to Ops Week on the Data-Engineering board.
MNeisler added a subscriber: MNeisler.

Interesting, what is creating this data? I see that permissions on that table directory are:

drwxrwxr-x   - analytics wikidev          0 2022-01-02 03:00 /user/amire80/data/cx_abuse_filter_daily

How did this get owned by the analytics user in the first place?
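
As a rough illustration of how to read the listing above, here is a minimal sketch that pulls the owner, the group, and the "other" read bit out of one line of `hdfs dfs -ls` output. The function name `describe_hdfs_entry` is hypothetical, and the sketch assumes the usual column order (permissions, replication, owner, group, ...).

```shell
# Hypothetical helper: summarize one line of `hdfs dfs -ls` output.
# The body runs in a subshell so that `set -f` and the positional
# parameters do not leak into the caller.
describe_hdfs_entry() (
  set -f          # do not glob-expand fields while splitting the line
  set -- $1       # split on whitespace: perms, replication, owner, group, ...
  perms=$1
  owner=$3
  group=$4
  # The permission string is 1 type character + 3 owner + 3 group + 3 other,
  # so the 8th character is the "other" read bit.
  case "$perms" in
    ???????r*) other_read=yes ;;
    *)         other_read=no ;;
  esac
  echo "owner=$owner group=$group other_read=$other_read"
)

describe_hdfs_entry 'drwxrwxr-x   - analytics wikidev          0 2022-01-02 03:00 /user/amire80/data/cx_abuse_filter_daily'
# prints: owner=analytics group=wikidev other_read=yes
```

For this listing the owner is analytics rather than amire80, which fits the "is not the owner of inode" error reported earlier: only the owner of a path (or an HDFS superuser) can change its permissions.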

@Ottomata

I was not involved in the creation of this dataset so I'm not sure.

@nshahquinn-wmf or @Amire80, do you know the source of the amire80.cx_abuse_filter_daily dataset or how it came to be owned by the analytics user?

No, I don't, unfortunately!

It actually comes from this ReportUpdater query!

I changed the table ownership to analytics:analytics-privatedata-users and restricted read access to the group only, since the data contains precise event-related information (tokens, for instance).
With that change, the dashboard now works for me.
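
For reference, a change like the one described could be sketched as follows. This is not necessarily the exact set of commands that was run: the `o-rwx` mode is an assumption about what "group only" means here, and the `run` and `fix_table_perms` helpers are hypothetical. The sketch only prints the commands unless `DRY_RUN=0` is set, and a real run would need an account that is allowed to change ownership in HDFS.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: apply the ownership and permission change to a table directory.
fix_table_perms() {
  table_dir=$1
  # Set owner to analytics and group to analytics-privatedata-users, recursively.
  run hdfs dfs -chown -R analytics:analytics-privatedata-users "$table_dir"
  # Assumed mode: remove all access for "other"; owner and group bits stay as they are.
  run hdfs dfs -chmod -R o-rwx "$table_dir"
}

fix_table_perms /user/amire80/data/cx_abuse_filter_daily
```

Keeping dry-run as the default makes it possible to review the recursive commands before they touch every partition under the table directory.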

I expect the change to propagate to newly computed days and will monitor the next run of the query to confirm.

I'm worried that whatever is creating this data is not doing it as the correct user, and newly created data may continue to have this problem.

I double-checked today: the data generated this morning has the correct owner and permissions.