[Spike] investigate prevalence of spiders within dashboard data
Closed, Resolved | Public | 3 Estimated Story Points

Description

A research spike to see which data collection scripts tend to run into spiders and would benefit most from automata removal.

Event Timeline

Ironholds claimed this task.
Ironholds raised the priority of this task from to Needs Triage.
Ironholds updated the task description.
Ironholds subscribed.

For @Tfinc and others:

This is now done (results attached). I looked at a variety of EL schemas across our four areas of Analysis focus, and never found an automata rate > 1%. This is likely because automata tend not to execute JavaScript, and so rarely trigger client-side EL events.

We will obviously still have to do this detection for some sources, the Hive-based data for example, but for the vast majority of our data it looks unnecessary.
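
For anyone wanting to repeat the check on another schema, here is a minimal sketch of how a per-schema automata rate could be computed. It is not the script used for the attached results. The user-agent pattern, the function names and the `(schema, user_agent)` input shape are all assumptions made for illustration, and the 1% threshold is just the figure quoted above.

```python
import re
from collections import defaultdict

# Hypothetical heuristic: an illustrative pattern, not the detection rule
# actually used for the attached results.
AUTOMATA_PATTERN = re.compile(
    r"bot|spider|crawl|curl|wget|python-requests", re.IGNORECASE
)


def is_automata(user_agent):
    """Return True if the user agent looks automated; empty values count too."""
    return not user_agent or bool(AUTOMATA_PATTERN.search(user_agent))


def automata_rates(events):
    """Map each schema to the fraction of its events flagged as automata.

    `events` is an iterable of (schema, user_agent) pairs (assumed shape).
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for schema, user_agent in events:
        totals[schema] += 1
        if is_automata(user_agent):
            flagged[schema] += 1
    return {schema: flagged[schema] / totals[schema] for schema in totals}


def schemas_over_threshold(events, threshold=0.01):
    """Return the sorted schema names whose automata rate exceeds `threshold`."""
    rates = automata_rates(events)
    return sorted(name for name, rate in rates.items() if rate > threshold)
```

Schemas returned by `schemas_over_threshold` would be the candidates for automata removal. A user-agent regex is a crude filter, so a real check would probably want a maintained spider list instead.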

Deskana subscribed.