Page MenuHomePhabricator

Find out if Kibana/Elasticsearch allows queries based on the results of other queries
Closed, ResolvedPublic

Description

When going to https://wikimedia.biterg.io/goto/e20a34c2ae32075d3cab7851532fb20b (new contributors in Gerrit in the last 60 days), we have

  • a widget that shows a list of "New Authors" who joined Gerrit within the last 60 days, and
  • a widget that shows a list of all changesets ("gerrit_all_changesets") which were created within the last 60 days.

What I'd love to have however is a list that only displays changesets created by those "New Authors". (Currently you either have to filter by each author separately, or use all 'new author' names to construct a custom author_name:"Alice" OR author_name:"Bob" query.)

I'm pretty sure I have to create a new widget/visualization and a new query for this (and that we cannot use the existing "gerrit_all_changesets" here), but I do not know how yet.

Wondering if Bool Queries or Range Queries might be related. Or if I'm completely off.

Event Timeline

I think the problem here is going to be the data is stored in separate indices. The linked dashboard uses the gerrit_demo index which afaict only contains events for newly seen patch authors? The actual changesets are in a separate index called gerrit. Elasticsearch doesn't do joins of any kind, so there is no direct way to query this data. What I would typically suggest here is augmenting the data ingestion pipeline to include some sort of marker in the gerrit index that can be filtered on. Essentially it means pre-joining (denormalizing) the timestamp markers into the documents.

@EBernhardson: That makes sense - thanks a lot for your fast comment here (was thinking about CC'ing you on this task but first wanted to try digging myself)!

Indeed, the gerrit_demo index has the date fields first and first_all (and an internal _id field which is unfortunately not the Change-Id in Gerrit). And those two fields first and first_all do not exist in the gerrit index (note to myself: githash in the gerrit index is the Change-Id in Gerrit actually).

Hmm, so a potential approach might be merging the fields of the index gerrit_demo into the standard gerrit index once the C_Gerrit_Demo dashboard (it says "Demo" for a reason) gets productized at some point by Bitergia (who host that service and website for us) in T151161.
I'll ask Bitergia about their implementation plans in our (non-public) upstream support ticket at https://gitlab.com/Bitergia/c/Wikimedia/support/issues/16

So the current workaround (until these two indices are merged) to get the list of all changesets in Gerrit by new authors displayed on C_Gerrit_Demo is to

  • go to https://wikimedia.biterg.io/app/kibana#/dashboard/C_Gerrit_Demo
  • get all author names in the "New Authors" widget, either by pressing the Ctrl key in Firefox and marking the column with author names, or by installing some add-on in Chromium (untested), or by exporting to CSV via the link at the bottom of that widget and then removing all the other columns,
  • constructing a query from those author names; format is author_name:"ABC" OR author_name:"XYZ"
  • enter that query into the text search field on top by replacing the *
  • looking at the "gerrit_backlog_table" widget (for open changesets in Gerrit) or the "gerrit_all_changesets" widget (for any changesets in Gerrit)

Nota bene: The results do not exclude bot accounts until these indices are merged, and that some authors might be shown twice.

Investigation done, hence closing this task. Again thanks to Erik for the help.

Summarized in the parent task at T188244#4084927.