
Set up data storage to collect loosely structured data from CI
Open, Normal, Public

Description

The Release-Engineering-Team has decided to start collecting as much data as possible from CI, Code Health, Incidents, etc. To that end, we need to provision some storage for this data.

We don't expect the data to be very large because we will filter and summarize up front rather than storing bulk logs or artifacts. I expect the total to be well under 10 gigabytes.

After a couple of discussions within the team, we are leaning towards using Elasticsearch for the back end, because its query capabilities should make it easy to retrieve the data in a usable format for reporting and analysis.
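To make the "filter and summarize up front" idea concrete, here is a rough sketch of what a summarized CI record and a reporting query might look like. Nothing here is decided: the field names (`job`, `finished_at`, `duration_s`, `tests_passed`, `tests_failed`) and the shape of the raw build record are hypothetical, and the query body is only built as a dictionary, not sent to any cluster.

```python
import json
from collections import Counter


def summarize_build(raw):
    """Reduce a raw CI build record to a small summary document.

    The input layout and all field names are assumptions for illustration.
    """
    statuses = Counter(test["status"] for test in raw["tests"])
    return {
        "job": raw["job"],
        "finished_at": raw["finished_at"],
        "duration_s": raw["duration_s"],
        "tests_passed": statuses.get("pass", 0),
        "tests_failed": statuses.get("fail", 0),
    }


def failures_per_job_query(since):
    """Build an Elasticsearch search body: total failed tests per job.

    Uses a range query plus a terms aggregation with a nested sum
    aggregation. Assumes `job` is mapped as a keyword field.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {"range": {"finished_at": {"gte": since}}},
        "aggs": {
            "per_job": {
                "terms": {"field": "job"},
                "aggs": {"failed": {"sum": {"field": "tests_failed"}}},
            }
        },
    }


# Hypothetical raw record, standing in for whatever CI actually emits.
raw_build = {
    "job": "example-unit-tests",
    "finished_at": "2018-12-13T17:35:00Z",
    "duration_s": 412,
    "tests": [
        {"name": "test_a", "status": "pass"},
        {"name": "test_b", "status": "pass"},
        {"name": "test_c", "status": "fail"},
    ],
}

doc = summarize_build(raw_build)
query = failures_per_job_query("2018-12-01")
print(json.dumps(doc))
print(json.dumps(query))
```

Storing only documents like `doc` instead of full logs is what would keep the index small, and the aggregation shows the kind of reporting query we would want the back end to answer.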

Event Timeline

mmodell created this task. Dec 13 2018, 5:35 PM
Restricted Application added a project: Discovery-Search. Dec 13 2018, 5:35 PM
Restricted Application added a subscriber: Aklapper.

@EBernhardson: Would it be reasonable to store this data on the search cluster? We wanted to ask for your blessing first, since that would avoid setting up a separate Elasticsearch cluster for this tiny use case. So the question is whether you think it's reasonable and won't be a burden on the Discovery-Search team.

hashar added a subscriber: hashar. Dec 13 2018, 5:45 PM

For the CI logs and test results, there is the older task T78705, which has a lot of context. It is a subset of this task (T211904).

EBjune added a subscriber: EBjune. Dec 18 2018, 6:57 PM

@mmodell: EBernhardson is out through the end of the year and really needs to weigh in on this.

@EBjune: Thanks for the heads-up. I definitely want EBernhardson to weigh in. This is just exploratory work and it can wait.

EBjune triaged this task as Normal priority. Jan 3 2019, 6:11 PM