The abuselog query API should allow filtering out some fields from details
Closed, Duplicate | Public

Description

For a typical abuse filter on English Wikipedia (these numbers are for filter 614, which targets meme vandalism), just 1,000 log entries consume about 170 MB of storage, and most of that is due to just three properties under details. Specifically:

  • The new_html property within details accounts for about 46% of the total size.
  • The old_wikitext and new_wikitext properties each account for about 20% to 21% of the total size.

If the analysis the user is doing doesn't require new_html, the bandwidth and storage can be cut almost in half. If old_wikitext and new_wikitext are also not needed, then the required size is only about 5% of the full size.

Therefore, it would be great if there were some way to query details and leave out specific properties (i.e., key/value pairs in the JSON) that aren't needed for the filter being debugged or analyzed. Leaving out those properties would greatly reduce the bandwidth required for log queries, and it would also make it easier (although by no means guaranteed) for log queries to stay below the maximum response size. This would improve the performance of the tools the edit filter team uses for debugging filters, analyzing false positives, etc.
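
To illustrate the intended effect, here is a minimal client-side sketch in Python of what such filtering would do to each log entry. The helper names (strip_details, json_size) and the sample entry are hypothetical and made up for illustration; they are not part of the abuselog API, and the sample sizes are not real measurements. Only the three property names come from the numbers above.

```python
import json


def strip_details(entries, exclude):
    """Return copies of log entries with the named keys removed from 'details'.

    Hypothetical helper for illustration only; not an AbuseFilter API call.
    """
    excluded = set(exclude)
    stripped = []
    for entry in entries:
        copy = dict(entry)  # shallow copy so the caller's entry is untouched
        details = entry.get("details")
        if isinstance(details, dict):
            copy["details"] = {
                key: value for key, value in details.items() if key not in excluded
            }
        stripped.append(copy)
    return stripped


def json_size(obj):
    """Size in bytes of obj when serialized as UTF-8 JSON."""
    return len(json.dumps(obj).encode("utf-8"))


# Made-up sample entry; real entries carry many more properties under details.
sample = [
    {
        "id": 1,
        "details": {
            "new_html": "<p>" + "x" * 460 + "</p>",
            "old_wikitext": "a" * 200,
            "new_wikitext": "b" * 200,
            "user_name": "Example",
        },
    }
]

slim = strip_details(sample, ["new_html", "old_wikitext", "new_wikitext"])
print(json_size(sample), json_size(slim))
```

Doing this on the client only saves storage, because the full details have already been transferred by then. The request here is for the API to do the equivalent on the server side so that the bandwidth and response-size savings apply as well.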