Review ORES extension for data leaks
Closed, Resolved; Public

Description

In https://phabricator.wikimedia.org/T123795#2360919, @jcrespo said:

A last reminder is that creating new tables, as of now, is blocked by me (as in, has to have my OK) to assess potential data leaks due to labs filtering issues:
https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change

We'd like to have a review of the ORES extension completed so that we can be unblocked for deployment to fawiki.

Event Timeline

Please share the gerrit commit link with the update tables script.

(It doesn't need to be a gerrit link, but I suppose the code creating the tables is in some repo somewhere?)

@jcrespo This is the schema change. Do you need anything else? The gerrit patch is here; I still need to update it, but I won't add anything related to the DB. I'm not sure whether I need to explicitly define the new tables in the mediawiki-config repo. Please correct me if I'm wrong.

I must note that these two tables contain only public data, and it would be great to have replicas of them in labs.

What are the privacy requirements of these score tables? Are they all public? Is their privacy dependent on the revision they refer to? What happens if the revision is hidden publicly? What if the revision is deleted? What if it is a revision of a non-public wiki?

This is all to determine whether these tables should be made available on labs in full, only partially, or not at all.

All of this data comes from a public endpoint (the ORES service), so there are no private dependencies. We do not support private wikis (yet) because the service (ores.wm.o) won't work for them, and AFAIK there is no plan to support such wikis. Long story short, it's just another caching layer for the ORES service.

As for edits that get deleted: the table stores nothing except the revid and the score that came from the ORES service (most probably the score is stored immediately after the edit is made, and at that point the edit is public on the wiki). It would be good to keep the score in the DB in such cases, for when an admin reviews special:deletedrevision. But if you think the score of a deleted edit is private data, I can write triggers to remove scores from the DB once their revisions are deleted. (I must note that we would then have a bigger problem, because ores.wm.o keeps the score in its redis cache too, so people could access the private data by simply hitting the service directly.)
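To make the trigger idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration only, not the extension's actual schema: the table names (`revision`, `score_cache`), the columns, the trigger name, and the `damaging` model label are all assumptions, and the production database is not SQLite.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with hypothetical tables and a purge trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical stand-in for the wiki's revision table.
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY);

        -- Hypothetical score cache: only a revision id and a score per model.
        CREATE TABLE score_cache (
            rev_id INTEGER NOT NULL,
            model  TEXT    NOT NULL,
            score  REAL    NOT NULL,
            PRIMARY KEY (rev_id, model)
        );

        -- When a revision row is removed, drop its cached scores too.
        CREATE TRIGGER purge_scores_on_delete
        AFTER DELETE ON revision
        BEGIN
            DELETE FROM score_cache WHERE rev_id = OLD.rev_id;
        END;
        """
    )
    return conn


def cached_scores(conn: sqlite3.Connection, rev_id: int) -> list:
    """Return (model, score) pairs cached for one revision."""
    return conn.execute(
        "SELECT model, score FROM score_cache WHERE rev_id = ? ORDER BY model",
        (rev_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    conn.execute("INSERT INTO revision (rev_id) VALUES (1)")
    conn.execute("INSERT INTO score_cache VALUES (1, 'damaging', 0.9)")
    print(cached_scores(conn, 1))   # scores present while the revision exists
    conn.execute("DELETE FROM revision WHERE rev_id = 1")
    print(cached_scores(conn, 1))   # scores gone after the revision is deleted
```

A trigger like this would only cover the local cache tables; as noted above, scores held by the service itself would need to be handled separately.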

jcrespo claimed this task.

With this information, this is ok to go. I also gave a quick look at the structure and it is ok.

Please keep me informed of changes related to the database, because it is not unusual to have problems with new extensions when they are deployed at large for the first time.