
AbuseFilter's filters could be wiki pages
Open, Needs Triage, Public

Description

Problem
AbuseFilter allows privileged users to create filters that execute on each edit. AbuseFilter duplicates a lot of the functionality of wiki pages, including revisions (history / change logs / recent changes / diffs), discussions (notes), and import/export, but it is also missing some of the functionality that wiki pages have and that filters could benefit from, such as open talk pages (T95737), the ability to watchlist filters (T62588), "view changes" before saving, and partial blocks.

Proposed Solution
Since everything is a wiki page, filters should also be a wiki page. This would remove a significant amount of code from AbuseFilter, and give filters additional functionality.

The filters should be stored with a custom extension (.filter?) and a custom content model.

AbuseFilter should use Multi-Content Revisions to store the filter code in the main slot and the metadata (serialized as JSON) in another slot.

Lastly, AbuseFilter should implement the userCan hook to control access to the filters (i.e. if a filter is marked as Private, it should prevent read for all unprivileged users of both the content and the discussion page). The private filters will also need to be filtered out of the database replicas (that are available on Toolforge and as dumps).

Current status: nice to have idea, but not feasible in the short term due to inability to restrict viewing of private filters

Event Timeline

This would also allow T62588 - should that be merged? I suggested this as a potential route there:

@Daimona a potential, though potentially controversial, solution would be to make AbuseFilter a namespace, with the "notes" section being the talk page. Editing in that namespace would require the current rights to edit abusefilters (like the edit requirements imposed on the MediaWiki, Gadget, Gadget-definition, etc. namespaces). Thus, they would no longer be "special pages" in the sense for watching changes, and would have talk pages. There is probably a more elegant solution though.

This would also allow T62588 - should that be merged? I suggested this as a potential route there:

Yes because you could watch the filters (and their talk pages) on your standard watch list.

@Daimona a potential, though potentially controversial, solution would be to make AbuseFilter a namespace, with the "notes" section being the talk page. Editing in that namespace would require the current rights to edit abusefilters (like the edit requirements imposed on the MediaWiki, Gadget, Gadget-definition, etc. namespaces). Thus, they would no longer be "special pages" in the sense for watching changes, and would have talk pages. There is probably a more elegant solution though.

exactly! :)

Points to discuss:

  • How will "notes" be implemented: using the talk page would be the easiest, but currently notes can only be edited by users who can edit filters, and using the talk page would either allow other users to edit notes or (by protecting the page to avoid this) defeat the purpose of the talk page. A potential route is linking each AbuseFilter to a talk page (for general discussion, open to all) and a notes page (a 3rd extra namespace, but with no associated talk page). Example: Wikinews links each article to a discussion page and an opinions page.
  • How should flags and tags be stored? Suggestion: similar to ProofreadPage. Looking at the content of a ProofreadPage page (example) shows the use of a custom tag: <pagequality level="3" user="DannyS712" />. Thus, something like <abusefilter flags="enabled, private" actions="warn, tag" message="abusefilter-warn" tag="test edits, example tag" /> could be used. Or, in JSON form, similar to MassMessage (example), or in separate slots.
  • How should filter descriptions vs ids be stored? Currently, filters are "named" based on a unique integer id (abuse filter 1, 2, etc) and then a text description is set. If the "pages" are now stored as AbuseFilter:1, AbuseFilter:2, etc with a description specified on the page, we lose the benefit of a namespace in terms of utilizing page names. However, if we use AbuseFilter:Test edits, AbuseFilter: LTA, etc then how are filter ids stored, and how are they connected to pages?
  • Content handler editor: This namespace will likely need a custom edit form (example: the custom edit form for mass message lists)
  • How will "notes" be implemented

Yeah there are a lot of different options. You could make the notes just a piece of metadata and store it with all the other metadata. Or have two talk pages (like you mention). Another option would be to use a template like how Wikidata does.

  • How should flags and tags be stored?

I would use Multi-Content Revisions. So the metadata would be stored separately from the filter itself and can be serialized in whatever format you want (e.g. JSON). This is how structured data on Commons works (which is probably overkill for what we need, but you can see how this works).

  • How should filter descriptions vs ids be stored?

As part of the migration, we could reverse the ids, so the filter id would be the page id. For instance AbuseFilter:Test edits might have a page id of 123456 so that would be the filter id. I think there are pros and cons to the id as the title (like Wikidata) and using the page id (like Structured Data on Commons)

  • Content handler editor

Yes. I imagine it would be a custom edit form like Wikidata or StructuredDataOnCommons

  • How will "notes" be implemented

On second thought, I think it might be best if the notes were a 3rd slot on the filter. This way it could support wikitext. So you'd have three slots:

  • Conditions (Code)
  • Metadata (JSON)
  • Notes (wikitext)
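To make the three-slot idea a bit more concrete, here is a minimal Python sketch (not MediaWiki code). The `FilterPage` class, the slot names, and the metadata keys are all hypothetical; only the split into conditions, JSON metadata, and wikitext notes comes from the list above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FilterPage:
    """Hypothetical model of one filter stored as a three-slot page."""
    conditions: str                               # filter code (main slot)
    metadata: dict = field(default_factory=dict)  # flags, actions, tags, ...
    notes: str = ""                               # wikitext notes

    def to_slots(self) -> dict:
        """Serialize each slot to the text that would be stored for it."""
        return {
            "main": self.conditions,
            "metadata": json.dumps(self.metadata, sort_keys=True),
            "notes": self.notes,
        }

    @classmethod
    def from_slots(cls, slots: dict) -> "FilterPage":
        """Rebuild the page object from stored slot text."""
        return cls(
            conditions=slots["main"],
            metadata=json.loads(slots["metadata"]),
            notes=slots["notes"],
        )


page = FilterPage(
    conditions='action == "edit" & added_lines rlike "test"',
    metadata={"enabled": True, "private": False, "actions": ["warn", "tag"]},
    notes="Catches test edits. ~~~~",
)
round_tripped = FilterPage.from_slots(page.to_slots())
```

Each slot round-trips independently, so the notes could be edited as plain wikitext without touching the filter code or the JSON.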

As part of the migration, we could reverse the ids, so the filter id would be the page id. For instance AbuseFilter:Test edits might have a page id of 123456 so that would be the filter id. I think there are pros and cons to the id as the title (like Wikidata) and using the page id (like Structured Data on Commons)

What would happen to current links to filters though? [[Special:AbuseFilter/1]] should redirect to wherever the AbuseFilter is moved, and I think filter ids should continue to be sequential. Maybe store that as another part of the metadata, and redirect the Special:AbuseFilter links (so that current links still work)

What would happen to current links to filters though? [[Special:AbuseFilter/1]] should redirect to wherever the AbuseFilter is moved, and I think filter ids should continue to be sequential. Maybe store that as another part of the metadata, and redirect the Special:AbuseFilter links (so that current links still work)

What we could do is add the page id to the current table. That way filters will have both the filter id (that they currently have) and the new page id. Doing that would allow referencing them in either direction.
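A rough sketch of that two-way lookup, using an in-memory SQLite table as a stand-in for the real one. The `af_page_id` column and the helper functions are assumptions made for illustration; nothing like them exists in AbuseFilter today.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# af_id is the existing sequential filter id; af_page_id is the
# hypothetical new column pointing at the filter's wiki page.
db.execute(
    "CREATE TABLE abuse_filter ("
    "af_id INTEGER PRIMARY KEY, af_page_id INTEGER UNIQUE)"
)
db.executemany(
    "INSERT INTO abuse_filter (af_id, af_page_id) VALUES (?, ?)",
    [(1, 123456), (2, 123460)],
)


def page_id_for_filter(filter_id):
    """Filter id -> page id, or None if the filter does not exist."""
    row = db.execute(
        "SELECT af_page_id FROM abuse_filter WHERE af_id = ?", (filter_id,)
    ).fetchone()
    return row[0] if row else None


def filter_id_for_page(page_id):
    """Page id -> filter id, or None if the page is not a filter."""
    row = db.execute(
        "SELECT af_id FROM abuse_filter WHERE af_page_id = ?", (page_id,)
    ).fetchone()
    return row[0] if row else None


def resolve_special_link(title):
    """Map an old 'Special:AbuseFilter/<id>' link to the new page id."""
    prefix = "Special:AbuseFilter/"
    if not title.startswith(prefix):
        return None
    suffix = title[len(prefix):]
    if not suffix.isdecimal():
        return None
    return page_id_for_filter(int(suffix))
```

With this, existing links keep working through the redirect, and filter ids can stay short and sequential regardless of what page ids look like.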

Well, this looks exciting, and would definitely open the door to lots of requested features + other improvements. However, it'd be a big change.
Also, there are other things to pay attention to; for instance, don't let filters run when a filter is changed. And surely something else.
Tomorrow I'll try to give my answers to some of the questions above.

This is something we can try, although I think we should first focus on higher-priority stuff - currently, there are several refactorings to the AF backend.

Is a team planning to work on this?

Is a team planning to work on this?

Not that I'm aware of. I was talking with @Mooeypoo about AbuseFilter yesterday and I thought I should document some of the thoughts somewhere. :)

  • How will "notes" be implemented

I like 3rd slot + wikitext (T227595#5318493), which would give e.g. the possibility to sign with a button.

  • How should flags and tags be stored?

JSON looks great, we already do that for filter imports.

  • How should filter descriptions vs ids be stored?

In T227595#5318444, @dbarratt wrote:
As part of the migration, we could reverse the ids, so the filter id would be the page id. For instance AbuseFilter:Test edits might have a page id of 123456 so that would be the filter id. I think there are pros and cons to the id as the title (like Wikidata) and using the page id (like Structured Data on Commons)

Please no. Every time I've heard something about filters, they tend to use the filter ID to refer to existing filters. Saying "filter 42" or even "filter 957" is good, but saying "filter 15765421" sucks. Especially if they're not consecutive. For the same reason, I'm not fully convinced by having page name == filter's name. Moreover, filter names are not unique, and it's not even guaranteed that all filters have a name (it became required only last year). Sure, we'd have the benefit of understanding immediately what a filter is. But I think the same goes for filter IDs, at least for people who manage AF daily. That said, I don't like "AbuseFilter: 1" etc., either. But I don't really have a better solution.

  • Content handler editor

I guess the existing code in AbuseFilterViewEdit can be easily adapted.


I also have another caveat: AF currently stores filter data in its own abuse_filter table, where values are inserted in different columns, and thus it's easier to query the table with several options. For instance, look at the search form on Special:AbuseFilter. Searching filters that are "non-deleted AND private AND disabled", and showing e.g. their hit count is almost instantaneous on a small-sized table (1000 rows on enwiki), but I feel it would become terrible to make the same query on whatever table the metadata gets inserted, especially if such metadata is stored in a non-atomic field.
And this stands unless we're able to turn the filter pages into wikipages while keeping the current DB structure, of course.
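To illustrate the concern, here is a small Python sketch comparing the two storage styles on an in-memory SQLite table. The column names are meant to mirror the existing abuse_filter table but should be treated as illustrative, and the JSON keys are made up for the example.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE abuse_filter ("
    "af_id INTEGER PRIMARY KEY, af_enabled INTEGER, af_hidden INTEGER, "
    "af_deleted INTEGER, af_hit_count INTEGER)"
)
rows = [
    (1, 1, 0, 0, 500),  # enabled, public
    (2, 0, 1, 0, 42),   # disabled, private
    (3, 0, 1, 1, 7),    # disabled, private, deleted
    (4, 0, 1, 0, 3),    # disabled, private
]
db.executemany("INSERT INTO abuse_filter VALUES (?, ?, ?, ?, ?)", rows)

# Dedicated columns: "non-deleted AND private AND disabled" is one WHERE
# clause that the database can evaluate (and index) by itself.
by_columns = db.execute(
    "SELECT af_id, af_hit_count FROM abuse_filter "
    "WHERE af_deleted = 0 AND af_hidden = 1 AND af_enabled = 0 "
    "ORDER BY af_id"
).fetchall()

# Metadata in one serialized field: every row has to be fetched and
# decoded in application code before it can be filtered.
blobs = [
    (fid, json.dumps({"enabled": bool(en), "private": bool(hid),
                      "deleted": bool(dele), "hits": hits}))
    for fid, en, hid, dele, hits in rows
]
by_blob = []
for fid, blob in blobs:
    meta = json.loads(blob)
    if not meta["deleted"] and meta["private"] and not meta["enabled"]:
        by_blob.append((fid, meta["hits"]))
```

Both approaches return the same filters here; the difference is that the second one scales with the number of rows decoded rather than with what an index can answer.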


Is a team planning to work on this?

Not that I'm aware of. I was talking with @Mooeypoo about AbuseFilter yesterday and I thought I should document some of the thoughts somewhere. :)

Thanks :) I was just asking because, as I said above, this would be a pretty big change, and I'm unsure if I can do that alone as volunteer.

And this stands unless we're able to turn the filter pages into wikipages while keeping the current DB structure, of course.

I don't see why not. IMHO it's fine to store data in more than one place. Technically you only need to put the "queryable" fields in the existing table(s), but removing the non-queryable fields can be a task for later. :)

Based on rereading the comments above, and to summarize/clear up any miscommunications/highlight still-unresolved questions, as well as to list my own views/assumptions on undiscussed points (highlighted), the current general plan seems to be:

  • Namespaces: create 2 new namespaces, AbuseFilter: and AbuseFilter talk:
    • The AbuseFilter talk: namespace will be a normal talk namespace, without any special requirements, content models, revision slots, etc
    • The AbuseFilter: namespace will be a custom namespace, with 3 slots per page:
      • The actual filter contents (written in AbuseFilter's own language but likely stored as a simple string)
      • A metadata section for filter metadata (throttle parameters, warnings, tags, and whether the filter is enabled, private, or deleted (assuming "deleted" filters should still be kept public; as an actual wikipage, they could be entirely deleted)) stored as JSON
      • A wikitext section for notes
        • Which supports automatic substitution of signatures (~~~~)
        • Categories? Templates?
    • The AbuseFilter: namespace will require the same permissions currently used
      • abusefilter-modify right is needed to edit, move, or create any filter
      • abusefilter-modify-restricted right is needed to affect any filter with restricted actions
      • abusefilter-modify-global right is needed to affect any filter with a global scope
      • abusefilter-view-private right is needed to view any filter that is marked as private
      • abusefilter-view right is needed to view any other (non-private) page
    • The AbuseFilter: namespace will not allow page protection
    • The AbuseFilter: namespace should not support changing the content model of pages, and pages in other namespaces should not be able to have the AbuseFilter's content model
  • Page and filter names and ids
    • [[Special:AbuseFilter/$1]], which currently links to the filter with filter id $1, will redirect to the AbuseFilter: page with the appropriate filter id
    • Filter ids will continue to be the primary key for AbuseFilters, and every new page created in the AbuseFilter: namespace will receive a filter id (as is done currently) in addition to a page id
    • Should AbuseFilter:$1 refer to the filter with the description $1?
      • In which case filter descriptions must be unique
      • Filters either cannot be renamed once created, or redirect logic must be added (to avoid breaking existing links)
    • ...or to the filter with the id $1
      • In which case page names are not descriptive of their contents
    • Pages should not be able to be moved into or out of the AbuseFilter: namespace (similar to how file pages cannot be moved to mainspace)
  • A new content handler will be created (or the current editor will be adapted) which will have the same fields as the current editors
    • But, if filters are stored as AbuseFilter:Description, the description field may need to be removed, since editing the description would be equivalent to moving the page
  • Import/Export
    • The dedicated "export this filter" function will be removed in favor of using MediaWiki's own Special:Export
      • If "deleted" filters are still visible, they should be allowed to be exported
    • The dedicated "import filter" function may be removed in favor of using MediaWiki's own Special:Import
  • Schema
    • The abuse_filter_history table will be deprecated in favor of using the standard method of accessing page history (though this will slow down queries searching for the history of filters)
    • A new field will be added to the abuse_filter table to correspond to the pageid of the AbuseFilter: page where the filter is specified (if filters are stored as AbuseFilter:Foo rather than with the filter id being the page name)
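The permission rules in the plan above could be sketched roughly as follows. The right names are the ones listed in the plan; the `required_rights` and `user_can` helpers and their flag parameters are hypothetical, and whether editing a private filter should also require abusefilter-view-private is left open here because the plan does not say.

```python
def required_rights(action, private=False, has_restricted_actions=False,
                    is_global=False):
    """Return the set of rights needed to perform `action` on a filter."""
    if action == "view":
        # Private filters need the stronger view right.
        if private:
            return {"abusefilter-view-private"}
        return {"abusefilter-view"}
    if action in ("edit", "move", "create"):
        rights = {"abusefilter-modify"}
        if has_restricted_actions:
            rights.add("abusefilter-modify-restricted")
        if is_global:
            rights.add("abusefilter-modify-global")
        return rights
    raise ValueError(f"unknown action: {action}")


def user_can(user_rights, action, **flags):
    """True if the user holds every right the action requires."""
    return required_rights(action, **flags) <= set(user_rights)
```

For example, `user_can({"abusefilter-modify"}, "edit", is_global=True)` is false, because the global right is missing.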
dbarratt renamed this task from AbuseFilter's filters should be wiki pages to AbuseFilter's filters could be wiki pages. Jul 17 2019, 10:03 PM
dbarratt changed the task status from Open to Stalled.

@DannyS712 That all seems great to me! I didn't even think about the import/export and how MediaWiki's could be used instead. :)

Also, I think the main slot should be used for the filter code itself, since that is the subject of the wikipage.

@dbarratt: As you set the task status to stalled, who exactly / specifically are you waiting for for further input?

@dbarratt: As you set the task status to stalled, who exactly / specifically are you waiting for for further input?

There are a lot of discussion points in T227595#5342379 that are looking for additional feedback.

Aklapper changed the task status from Stalled to Open. Jul 18 2019, 12:03 AM

You might mix up that a task itself is stalled (I do not see that here) vs that implementing what's proposed/discussed is stalled. Maybe Proposal covers that?

You might mix up that a task itself is stalled (I do not see that here) vs that implementing what's proposed/discussed is stalled. Maybe Proposal covers that?

Probably. :)

Lastly, AbuseFilter should implement the userCan hook to control access to the filters (i.e. if a filter is marked as Private, it should prevent read for all unprivileged users of both the content and the discussion page). The private filters will also need to be filtered out of the database replicas (that are available on Toolforge and as dumps).

You would need to do a lot more than that to make access control work correctly

Lastly, AbuseFilter should implement the userCan hook to control access to the filters (i.e. if a filter is marked as Private, it should prevent read for all unprivileged users of both the content and the discussion page). The private filters will also need to be filtered out of the database replicas (that are available on Toolforge and as dumps).

You would need to do a lot more than that to make access control work correctly

2 things. First, I don't think that the discussion page should be hidden, but more importantly second, is there a better place for further discussion to be held? Should they be held here on the ticket, on mediawiki, or somewhere else entirely?

Lastly, AbuseFilter should implement the userCan hook to control access to the filters (i.e. if a filter is marked as Private, it should prevent read for all unprivileged users of both the content and the discussion page). The private filters will also need to be filtered out of the database replicas (that are available on Toolforge and as dumps).

You would need to do a lot more than that to make access control work correctly

Yeah, this. There's a reason AbuseFilter filters aren't wiki pages, because having private ones isn't possible in MediaWiki. And if you want public and private filters to have the same interface, then wikipages are unfortunately a non-starter.

In T227595#5344447, @Legoktm wrote:
Yeah, this. There's a reason AbuseFilter filters aren't wiki pages, because having private ones isn't possible in MediaWiki.

Yeah, and there are other features that cannot be ported, like Special:AbuseLog into Special:Log (T21494).

Either way, for this specific task I can see benefits in implementing filters with some wikipage-like logic. And MCR would be perfect for that. Although, well, it's true that we cannot go too far from the current implementation.

Anyway, I believe this would be a nice-to-have for the future, but I'd recommend against doing it now. There's too much tech debt in the AF code base, and I'm currently trying to pay some of it (see e.g. r478232 and the ones down the dependency chain).

Yeah, this. There's a reason AbuseFilter filters aren't wiki pages, because having private ones isn't possible in MediaWiki.

I'm curious why that is. It seems like there are permission hooks to handle that? Is that not their purpose? Or would that mess with caching, etc.?

Partial read restrictions are rather poorly implemented in MediaWiki. They prevent direct page views, but there are lots of indirect ways to leak page contents. There has been little interest in fixing this in the past as it's not something that Wikimedia uses. There has been mixed interest from (corporate) third parties, with some really wanting it but a minority viewing the lack of functional read restrictions as a killer feature (due to perverse incentives in the corporate environments those users use MediaWiki in).

Anyways, I think there are good reasons to fix the way read restrictions work, even for Wikimedia's use case (it's security restricted, but T160266 expands on that reasoning. Basically I believe that having a more consistent access system makes tricky loopholes less likely). I think the primary requirement is that the common case of no restriction must not have any (non-trivial) additional performance overhead, especially in regards to how the parser cache operates.

Wow, thanks for the explanation. That's really helpful. I agree that there are use cases (like this one) where Wikimedia might benefit from better read restrictions.

Lastly, AbuseFilter should implement the userCan hook to control access to the filters (i.e. if a filter is marked as Private, it should prevent read for all unprivileged users of both the content and the discussion page). The private filters will also need to be filtered out of the database replicas (that are available on Toolforge and as dumps).

You would need to do a lot more than that to make access control work correctly

Yeah, this. There's a reason AbuseFilter filters aren't wiki pages, because having private ones isn't possible in MediaWiki. And if you want public and private filters to have the same interface, then wikipages are unfortunately a non-starter.

What about multi-content revisions where the main slot is publicly visible and contains only the currently public data, while other slots are private and contain the actual contents of the filter? See T21005: "private" info that should not be + usability issue with filter detail - the public main slot could contain some of that information
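A minimal Python sketch of that per-slot idea, assuming the main slot holds only public data. The slot names and the `visible_slots` helper are hypothetical, and the sketch only models the direct read path, not the indirect leaks mentioned earlier in the thread.

```python
# Slots that are always readable, even on private filters (assumption).
PUBLIC_SLOTS = {"main"}


def visible_slots(slots, is_private, user_rights):
    """Return only the slots this user may read."""
    if not is_private or "abusefilter-view-private" in user_rights:
        return dict(slots)
    return {name: text for name, text in slots.items()
            if name in PUBLIC_SLOTS}


slots = {
    "main": "Description: test edits. Status: enabled.",
    "conditions": 'added_lines rlike "test"',
    "notes": "Tuned after false positives. ~~~~",
}
anonymous_view = visible_slots(slots, is_private=True, user_rights=set())
privileged_view = visible_slots(
    slots, is_private=True, user_rights={"abusefilter-view-private"}
)
```

Note that this puts the public summary in the main slot, which differs from the earlier suggestion of keeping the filter code there.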

Lastly, AbuseFilter should implement the userCan hook to control access to the filters (i.e. if a filter is marked as Private, it should prevent read for all unprivileged users of both the content and the discussion page). The private filters will also need to be filtered out of the database replicas (that are available on Toolforge and as dumps).

You would need to do a lot more than that to make access control work correctly

Yeah, this. There's a reason AbuseFilter filters aren't wiki pages, because having private ones isn't possible in MediaWiki. And if you want public and private filters to have the same interface, then wikipages are unfortunately a non-starter.

Partial read restrictions are rather poorly implemented in MediaWiki. They prevent direct page views, but there are lots of indirect ways to leak page contents. There has been little interest in fixing this in the past as it's not something that Wikimedia uses. There has been mixed interest from (corporate) third parties, with some really wanting it but a minority viewing the lack of functional read restrictions as a killer feature (due to perverse incentives in the corporate environments those users use MediaWiki in).

Anyways, I think there are good reasons to fix the way read restrictions work, even for Wikimedia's use case (it's security restricted, but T160266 expands on that reasoning. Basically I believe that having a more consistent access system makes tricky loopholes less likely). I think the primary requirement is that the common case of no restriction must not have any (non-trivial) additional performance overhead, especially in regards to how the parser cache operates.

So it seems that, at least currently, implementing read restrictions is not possible. So where does that leave this task?

  1. Stalled waiting for read restrictions to be implemented generally, with the understanding that without them AbuseFilters will never be wiki pages
  2. Declined due to lack of read restrictions, without prejudice to reopening the merged tasks for categories, edit summaries, talk pages, watchlisting, etc
  3. Rescoped to not rely on read restrictions by only converting part of the features to use the regular wiki page features
  4. Rescoped to investigating the feasibility of creating a wrapper/interface/something that extends a wiki page to allow read restrictions for AbuseFilter specifically
  5. Something else...

As far as I'm concerned, this task is

  5. Something else: something which could/would be nice to have, a possibly cool feature overall. But nothing that we should implement in the near term, due to: the huge amount of work needed; the challenges we'd find (like read restrictions); the other (serious) bugs which we should solve first.

In other words, it should have lowest priority.
And of course, when the moment comes, we'll probably have to do 4. and fix any other blocker (like read restrictions).

An alternative solution is to encrypt the blobs of each AbuseFilter page with a specific per-filter key (stored in the abuse_filter table), so that any mechanism (including extensions) that reads content objects directly cannot expose private filters, and decryption of AbuseFilter content is handled in the AbuseFilter extension with a permission check. As AbuseFilter rules are also stored in the table, decryption will not be done on each edit.

When viewing an AbuseFilter note, the content will be generated (decrypted) on-the-fly (as AbuseFilter notes are not something frequently viewed; though they can still be cached), and no record will be kept in any link tables.

Daimona moved this task from Backlog to Stretch on the AbuseFilter (Overhaul-2020) board.
Daimona added subscribers: Base, Sunfyre.

Will be reconsidered once the architecture part is done.