Make (redacted) log_search table available on ToolLabs
Open, Normal, Public

Description

Since Gerrit change 105443 (https://gerrit.wikimedia.org/r/#/c/105443/), the log_search table is used to connect log entries with protections. The log_search table is not visible on ToolLabs. Please make it visible. Thanks.

If the table is blacklisted because of revision deletion or oversight, then just whitelist the rows with "ls_field = 'pr_id'".

Event Timeline

Umherirrender updated the task description.
Umherirrender raised the priority of this task from to Needs Triage.
Umherirrender added a subscriber: Umherirrender.

Another solution is to skip the rows whose ls_log_id points to a row in the logging table with log_type = 'suppress'.

scfc triaged this task as Normal priority.Apr 6 2015, 10:37 AM
scfc moved this task from Triage to Backlog on the Toolforge board.
scfc added a subscriber: scfc.

Is this still needed?

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.Nov 2 2017, 6:39 PM
1978Gage2001 moved this task from Triage to In progress on the DBA board.Dec 11 2017, 9:45 AM
Marostegui moved this task from In progress to Triage on the DBA board.Dec 11 2017, 11:07 AM

Is this still needed?

I still think it is a good idea to have this data.

Instead of skipping log_type = 'suppress', there is now a whitelist of log_type values that are shown on Labs.
The pr_id part still needs to be done.

jcrespo added subscribers: Bstorm, Bawolff, jcrespo.

log_search is marked as a private table. We need 3 things here:

  • Security and/or the owner of that functionality to clarify whether that table really needs to be private
  • Cloud preparing the appropriate filtering
  • DBAs allowing replication to labs and copying existing tables

It is not clear to me who is in charge of that table; that should be clarified first. But it being private right now means we cannot take this lightly. I am ready to do the DBA part as soon as the private data status is clear and approved.

Anomie added a subscriber: Anomie.Jun 15 2018, 5:02 PM

I don't know if anyone really "owns" the table, so I'll put my MediaWiki-Platform-Team hat on and look at it.

Its purpose, as I understand it, is to be an indexed key-value store for associating miscellaneous data with log entries, generally used to find the log entry when you have some other ID. If the table is exposed at all, it should probably be filtered by a whitelist on ls_field with each value being individually reviewed before being added to the whitelist. We might also require that the corresponding logging table row (join on log_id = ls_log_id) is visible, and possibly also that it has log_deleted = 0 to avoid potential leaking of RevDel-hidden data via cross-reference with other tables.
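The filtering suggested above can be sketched as a query. This is only an illustration against an in-memory SQLite database, not the actual replica view definition: the whitelist contents, the function name `visible_log_search_rows`, the sample rows, and the reduced table definitions are all assumptions made for the example.

```python
import sqlite3

# Hypothetical whitelist; each ls_field value would be reviewed before being added.
LS_FIELD_WHITELIST = ("pr_id",)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the MediaWiki tables (only the columns needed here).
    CREATE TABLE logging (log_id INTEGER PRIMARY KEY, log_type TEXT, log_deleted INTEGER);
    CREATE TABLE log_search (ls_field TEXT, ls_value TEXT, ls_log_id INTEGER);

    INSERT INTO logging VALUES (1, 'protect', 0), (2, 'protect', 1), (3, 'delete', 0);
    INSERT INTO log_search VALUES
        ('pr_id', '10', 1),                    -- kept: whitelisted field, log row not hidden
        ('pr_id', '11', 2),                    -- dropped: log row has log_deleted != 0
        ('target_author_ip', '192.0.2.1', 3);  -- dropped: field is not whitelisted
""")


def visible_log_search_rows(conn):
    """Return log_search rows that pass the ls_field whitelist and the log_deleted check."""
    placeholders = ", ".join("?" for _ in LS_FIELD_WHITELIST)
    sql = f"""
        SELECT ls.ls_field, ls.ls_value, ls.ls_log_id
        FROM log_search AS ls
        JOIN logging AS l ON l.log_id = ls.ls_log_id
        WHERE ls.ls_field IN ({placeholders})
          AND l.log_deleted = 0
        ORDER BY ls.ls_log_id
    """
    return conn.execute(sql, LS_FIELD_WHITELIST).fetchall()


rows = visible_log_search_rows(conn)
print(rows)
```

The inner join also drops any log_search row whose logging row is missing, which is one way to read the "corresponding logging table row is visible" requirement.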

Looking at ls_field = 'pr_id' that has been specifically requested here, that is a reference to the page_restrictions.pr_id associated with a protection log entry, to allow for finding the log entry that created a particular row in page_restrictions. In the web UI, this is used on Special:ProtectedPages to populate the protecting user and reason columns. I'd think it should be ok to expose the row in the Cloud replicas as long as the corresponding logging table row has (log_deleted & 1) = 0. Although Special:ProtectedPages itself doesn't include that check (maybe it should?).
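To make the pr_id lookup concrete, here is a rough sketch of going from a page_restrictions row to its protection log entry, with the (log_deleted & 1) = 0 check applied. It runs against an in-memory SQLite database; the `performer` and `reason` columns are simplified stand-ins for however the logging table stores the user and comment, and the function name and sample data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_restrictions (
        pr_id INTEGER PRIMARY KEY, pr_page INTEGER, pr_type TEXT, pr_level TEXT);
    -- performer/reason are hypothetical simplified columns, not the real schema.
    CREATE TABLE logging (
        log_id INTEGER PRIMARY KEY, log_type TEXT, log_deleted INTEGER,
        performer TEXT, reason TEXT);
    CREATE TABLE log_search (ls_field TEXT, ls_value TEXT, ls_log_id INTEGER);

    INSERT INTO page_restrictions VALUES (10, 100, 'edit', 'sysop'), (11, 101, 'move', 'sysop');
    INSERT INTO logging VALUES
        (1, 'protect', 0, 'ExampleAdmin', 'Persistent vandalism'),
        (2, 'protect', 1, 'OtherAdmin', 'hidden entry');
    INSERT INTO log_search VALUES ('pr_id', '10', 1), ('pr_id', '11', 2);
""")


def protection_log_entry(conn, pr_id):
    """Return (log_id, performer, reason) for the log entry behind pr_id, or None."""
    return conn.execute(
        """
        SELECT l.log_id, l.performer, l.reason
        FROM page_restrictions AS pr
        JOIN log_search AS ls
          ON ls.ls_field = 'pr_id' AND ls.ls_value = CAST(pr.pr_id AS TEXT)
        JOIN logging AS l ON l.log_id = ls.ls_log_id
        WHERE pr.pr_id = ?
          AND (l.log_deleted & 1) = 0
        """,
        (pr_id,),
    ).fetchone()


print(protection_log_entry(conn, 10))
print(protection_log_entry(conn, 11))
```

The second lookup returns nothing because its log entry has bit 1 set in log_deleted, so the protecting user and reason stay hidden.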

I don't know if anyone really "owns" the table

s/own/has any idea what the table is/

If the table is exposed at all

Do you have any doubt? Could that have hashes of passwords or private IPs? If things like that are clearly not going to be there, I can unlock replication and fine-tune the whitelist later.

I just found an old comment by @Bawolff :

log_search: The information contained within would probably be useful to tool labs users, however it might be complicated to expose safely

Basically, I want to be 100% (or 99.999%) sure we are doing things right, so I am asking several people for their opinion.

I can't see why there'd be password hashes in the table.

ls_value will contain IPs for rows with ls_field = 'target_author_ip'. Specifically, these rows will be associated with log entries for revision-deletions to allow for searching for revision-deletions of edits/logs attributed to a particular user.

On metawiki I see rows with ls_field = 'oldname' that look like they're holding the pre-rename name of a user to allow finding rename log entries.

The other ls_field values I see on enwiki, wikidatawiki, and metawiki seem like they'll mostly contain integers referencing some table's PK. A few hold timestamps, one holds OAuth consumer key hashes, and there's one that seems to correspond to values of change_tag.ct_tag.
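A survey like the one described above could be done with a simple aggregate over ls_field. The sketch below runs it against an in-memory SQLite database with made-up rows; on a real wiki it would have to run somewhere log_search is readable, and the function name `ls_field_counts` is just an illustration.

```python
import sqlite3


def ls_field_counts(conn):
    """Return (ls_field, row count) pairs, one per distinct ls_field value."""
    return conn.execute(
        "SELECT ls_field, COUNT(*) FROM log_search GROUP BY ls_field ORDER BY ls_field"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced stand-in for log_search with invented sample rows.
    CREATE TABLE log_search (ls_field TEXT, ls_value TEXT, ls_log_id INTEGER);
    INSERT INTO log_search VALUES
        ('pr_id', '10', 1),
        ('pr_id', '11', 2),
        ('target_author_ip', '192.0.2.1', 3),
        ('oldname', 'Example user', 4);
""")

counts = ls_field_counts(conn)
for field, n in counts:
    print(field, n)
```

Each ls_field value that turns up this way would still need an individual review of what ls_value holds before it goes on any whitelist.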

OTOH, it's hard to say what someone might decide to add in the future. In general, anything where someone says "I have identifier X and I want to find log entries directly related to that identifier" seems fair game.

I agree with Anomie's assessment. If ls_field is on a whitelist, and we are sure that ls_log_id references something on the log_type whitelist and is not revdel'ed, it should be fine to expose.

jcrespo claimed this task.Jun 15 2018, 5:44 PM

Thank you very much, I will now apply my changes and when I am done, move it to cloud.

Marostegui moved this task from Next to Backlog on the DBA board.Sep 12 2018, 5:23 AM
jcrespo moved this task from Backlog to In progress on the DBA board.Sep 18 2018, 11:07 AM
jcrespo moved this task from In progress to Next on the DBA board.
jcrespo removed jcrespo as the assignee of this task.