
Make (redacted) log_search table available on Wiki Replicas
Open, Medium, Public

Description

Since gerrit change 105443 (https://gerrit.wikimedia.org/r/#/c/105443/), the log_search table has been used to link log entries with protections. The log_search table is not visible on ToolLabs. Please make the table visible. Thanks.

If the table is blacklisted because of revision deletion or oversight, then just whitelist the rows with ls_field = 'pr_id'.

Event Timeline

Umherirrender raised the priority of this task to Needs Triage.
Umherirrender updated the task description.
Umherirrender subscribed.

Another solution is to skip the rows whose log_id points to an entry with log_type = 'suppress' in the logging table.

scfc triaged this task as Medium priority. Apr 6 2015, 10:37 AM
scfc moved this task from Backlog to Ready to be worked on on the Toolforge board.
scfc subscribed.

Is this still needed?

I find it still a good idea to have these data.

Instead of skipping log_type = 'suppress', there is now a whitelist of the log_type values shown on Labs.
The pr_id part still needs to be done.

jcrespo added subscribers: Bstorm, Bawolff, jcrespo.

log_search is marked as a private table. We need 3 things here:

  • Security and/or the owner of that functionality to clarify why that table does not need to be private
  • Cloud preparing the appropriate filtering
  • DBAs allowing replication to labs and copying existing tables

It is not clear to me who is in charge of that table, and that should be clarified first, but its being private right now means we cannot take this lightly. I am ready to do the DBA part as soon as the private data status is clear and approved.

I don't know if anyone really "owns" the table, so I'll put my MediaWiki-Platform-Team hat on and look at it.

Its purpose, as I understand it, is to be an indexed key-value store for associating miscellaneous data with log entries, generally used to find the log entry when you have some other ID. If the table is exposed at all, it should probably be filtered by a whitelist on ls_field with each value being individually reviewed before being added to the whitelist. We might also require that the corresponding logging table row (join on log_id = ls_log_id) is visible, and possibly also that it has log_deleted = 0 to avoid potential leaking of RevDel-hidden data via cross-reference with other tables.
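To make the key-value idea concrete, here is a minimal Python/SQLite sketch, assuming a simplified log_search with only the three ls_* columns named in this task. The composite primary key, the index name and the sample rows are illustrative assumptions, not the production schema. Finding log entries from some other ID then becomes a single indexed equality lookup on (ls_field, ls_value).

```python
import sqlite3

# Simplified, illustrative schema: only the three ls_* columns discussed here.
# The composite primary key and the secondary index are assumptions.
SCHEMA = """
CREATE TABLE log_search (
    ls_field  TEXT    NOT NULL,  -- kind of identifier, e.g. pr_id
    ls_value  TEXT    NOT NULL,  -- the identifier itself, stored as text
    ls_log_id INTEGER NOT NULL,  -- logging.log_id it is associated with
    PRIMARY KEY (ls_field, ls_value, ls_log_id)
);
CREATE INDEX ls_log_id_idx ON log_search (ls_log_id);
"""

def find_log_ids(conn, field, value):
    """Return the log_ids associated with (field, value), smallest first."""
    cur = conn.execute(
        "SELECT ls_log_id FROM log_search"
        " WHERE ls_field = ? AND ls_value = ? ORDER BY ls_log_id",
        (field, str(value)),
    )
    return [log_id for (log_id,) in cur]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Invented sample rows, for illustration only.
conn.executemany(
    "INSERT INTO log_search VALUES (?, ?, ?)",
    [("pr_id", "17", 1001), ("pr_id", "18", 1002), ("oldname", "ExampleUser", 1003)],
)
print(find_log_ids(conn, "pr_id", 17))
```

Because ls_value is text, numeric IDs are converted with str() before the lookup so that an integer and its string form match the same row.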

Looking at ls_field = 'pr_id' that has been specifically requested here, that is a reference to the page_restrictions.pr_id associated with a protection log entry, to allow for finding the log entry that created a particular row in page_restrictions. In the web UI, this is used on Special:ProtectedPages to populate the protecting user and reason columns. I'd think it should be ok to expose the row in the Cloud replicas as long as the corresponding logging table row has (log_deleted & 1) = 0. Although Special:ProtectedPages itself doesn't include that check (maybe it should?).
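The two visibility conditions mentioned so far are not the same: log_deleted = 0 rejects a log entry if any RevDel bit is set, while (log_deleted & 1) = 0 only requires the action/target bit to be clear. A small Python sketch of the difference, assuming the usual MediaWiki LogPage bit values (1 action, 2 comment, 4 user, 8 restricted); these constants should be checked against the MediaWiki version in use.

```python
# Assumed RevDel bit values for logging.log_deleted (MediaWiki LogPage constants).
DELETED_ACTION = 1      # action/target of the log entry hidden
DELETED_COMMENT = 2     # log reason hidden
DELETED_USER = 4        # performer hidden
DELETED_RESTRICTED = 8  # hidden from administrators too (suppression)

def action_visible(log_deleted: int) -> bool:
    """Looser rule proposed for pr_id rows: only the action bit must be clear."""
    return (log_deleted & DELETED_ACTION) == 0

def fully_visible(log_deleted: int) -> bool:
    """Stricter rule: no part of the log entry is RevDel-hidden."""
    return log_deleted == 0

for flags in (0, DELETED_COMMENT, DELETED_ACTION, DELETED_ACTION | DELETED_RESTRICTED):
    print(flags, action_visible(flags), fully_visible(flags))
```

Under the looser rule, a protection whose reason alone was hidden (log_deleted = 2) would still have its pr_id row exposed; under the stricter rule it would not.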

I don't know if anyone really "owns" the table

s/own/has any idea what the table is/

If the table is exposed at all

Do you have any doubts? Could it contain password hashes or private IPs? If things like that are clearly not going to be there, I can unlock replication and fine-tune the whitelist later.

I just found an old comment by @Bawolff :

log_search: The information contained within would probably be useful to tool labs users, however it might be complicated to expose safely

Basically, I want to be 100% (or 99.999%) sure we are doing things right, which is why I am asking several people for their opinion.

I can't see why there'd be password hashes in the table.

ls_value will contain IPs for rows with ls_field = 'target_author_ip'. Specifically, these rows will be associated with log entries for revision-deletions to allow for searching for revision-deletions of edits/logs attributed to a particular user.

On metawiki I see rows with ls_field = 'oldname' that look like they're holding the pre-rename name of a user to allow finding rename log entries.

The other ls_field values I see on enwiki, wikidatawiki, and metawiki seem like they'll mostly contain integers referencing some table's PK. A few hold timestamps, one holds OAuth consumer key hashes, and there's one that seems to correspond to values of change_tag.ct_tag.

OTOH, it's hard to say what someone might decide to add in the future. In general, anything where someone says "I have identifier X and I want to find log entries directly related to that identifier" seems fair game.
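Since new ls_field values can appear at any time, a reviewer might want a quick check for fields whose values parse as IP addresses before adding them to a whitelist. This is a hypothetical Python helper, not part of MediaWiki or the replica tooling, built on the standard ipaddress module; the sample rows use documentation-range addresses and invented values. It only flags candidates and does not replace reviewing each field by hand.

```python
import ipaddress

def fields_holding_ips(rows):
    """Given (ls_field, ls_value) pairs, return the set of ls_field names for
    which at least one ls_value parses as an IPv4 or IPv6 address."""
    flagged = set()
    for field, value in rows:
        try:
            ipaddress.ip_address(value)
        except ValueError:
            continue  # not an IP address; ignore this value
        flagged.add(field)
    return flagged

sample = [
    ("pr_id", "17"),
    ("target_author_ip", "192.0.2.10"),   # IPv4 documentation range
    ("target_author_ip", "2001:db8::1"),  # IPv6 documentation range
    ("oldname", "ExampleUser"),
]
print(sorted(fields_holding_ips(sample)))
```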

I agree with anomie's assessment. If ls_field is on a whitelist, and we are sure that ls_log_id references a log entry whose log_type is on the log_type whitelist and that is not revdel'ed, it should be fine to expose.
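The combined rule can be sketched as a filtering view. This uses SQLite driven from Python purely so that it runs anywhere; the whitelist tables, the view name log_search_public and the sample data are hypothetical, and the real Wiki Replicas would express the same conditions in their own view definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logging (
    log_id      INTEGER PRIMARY KEY,
    log_type    TEXT    NOT NULL,
    log_deleted INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE log_search (
    ls_field  TEXT    NOT NULL,
    ls_value  TEXT    NOT NULL,
    ls_log_id INTEGER NOT NULL,
    PRIMARY KEY (ls_field, ls_value, ls_log_id)
);
-- Hypothetical whitelists; each entry would be reviewed individually.
CREATE TABLE ls_field_whitelist (field TEXT PRIMARY KEY);
CREATE TABLE log_type_whitelist (log_type TEXT PRIMARY KEY);

-- Expose a log_search row only if all three conditions hold.
CREATE VIEW log_search_public AS
SELECT ls.ls_field, ls.ls_value, ls.ls_log_id
FROM log_search AS ls
JOIN logging AS l ON l.log_id = ls.ls_log_id
WHERE ls.ls_field IN (SELECT field FROM ls_field_whitelist)
  AND l.log_type IN (SELECT log_type FROM log_type_whitelist)
  AND l.log_deleted = 0;
""")
conn.execute("INSERT INTO ls_field_whitelist VALUES ('pr_id')")
conn.executemany("INSERT INTO log_type_whitelist VALUES (?)", [("protect",), ("delete",)])
conn.executemany("INSERT INTO logging VALUES (?, ?, ?)", [
    (1, "protect", 0),   # visible protection
    (2, "protect", 1),   # protection whose action is RevDel-hidden
    (3, "suppress", 0),  # log type not on the whitelist
    (4, "delete", 0),    # revision deletion
])
conn.executemany("INSERT INTO log_search VALUES (?, ?, ?)", [
    ("pr_id", "10", 1),
    ("pr_id", "11", 2),
    ("pr_id", "12", 3),
    ("target_author_ip", "192.0.2.10", 4),  # field not whitelisted
])
rows = conn.execute("SELECT * FROM log_search_public ORDER BY ls_log_id").fetchall()
print(rows)
```

Keeping the whitelists in tables rather than as literals inside the view means a newly reviewed ls_field value can be added without redefining the view.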

Thank you very much. I will now apply my changes and, when I am done, move it to Cloud.

jcrespo moved this task from Backlog to In progress on the DBA board.
jcrespo moved this task from In progress to Pending comment on the DBA board.
bd808 renamed this task from Make (redacted) log_search table available on ToolLabs to Make (redacted) log_search table available on Wiki Replicas. Feb 19 2020, 8:31 PM
LSobanski subscribed.

We need to check what has been done here and what actions are pending.

I found this after my curiosity made me check whether log_search is available on the Wiki Replicas. A couple of fields that might be considered non-private are:

  1. Revision IDs of edits that have been thanked. For those, ls_field = "thankid", with ls_value set to rev-{revid} reflecting the revision of the edit that was thanked. The logging table only stores the actor and recipient of the thank. Adding this would enable things like aggregation of Thanks usage based on characteristics of the edit made.
  2. The block ID of a block. These have ls_field = "ipb_id", with ls_value set to the ID of the block. This would enable looking up the relevant information about the block in the logging table.
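Both fields would be straightforward for tools to consume. A minimal Python sketch, assuming the value formats described above (rev-{revid} for thankid, a plain integer for ipb_id); the helper names are hypothetical, and values in any other format are treated as unparseable and return None.

```python
import re
from typing import Optional

_THANK_REV = re.compile(r"rev-([0-9]+)")
_BLOCK_ID = re.compile(r"[0-9]+")

def thanked_rev_id(ls_field: str, ls_value: str) -> Optional[int]:
    """Revision ID from a thankid row such as ('thankid', 'rev-123')."""
    if ls_field != "thankid":
        return None
    match = _THANK_REV.fullmatch(ls_value)
    return int(match.group(1)) if match else None

def block_id(ls_field: str, ls_value: str) -> Optional[int]:
    """Block ID from an ipb_id row such as ('ipb_id', '42')."""
    if ls_field != "ipb_id" or not _BLOCK_ID.fullmatch(ls_value):
        return None
    return int(ls_value)

# Invented sample rows, for illustration only.
for field, value in [("thankid", "rev-12345"), ("ipb_id", "42"), ("pr_id", "17")]:
    print(field, thanked_rev_id(field, value), block_id(field, value))
```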