Page MenuHomePhabricator

Find out if Kibana/Elasticsearch allows queries based on the results of other queries
Closed, ResolvedPublic

Description

When going to https://wikimedia.biterg.io/goto/e20a34c2ae32075d3cab7851532fb20b (new contributors in Gerrit in the last 60 days), we have

  • a widget that shows a list of "New Authors" who joined Gerrit within the last 60 days, and
  • a widget that shows a list of all changesets ("gerrit_all_changesets") which were created within the last 60 days.

What I'd love to have however is a list that only displays changesets created by those "New Authors". (Currently you either have to filter by each author separately, or use all 'new author' names to construct a custom author_name:"Alice" OR author_name:"Bob" query.)

I'm pretty sure I have to create a new widget/visualization and a new query for this (and that we cannot use the existing "gerrit_all_changesets" here), but I do not know how yet.

Wondering if Bool Queries or Range Queries might be related. Or if I'm completely off.

Event Timeline

EBernhardson added a subscriber: EBernhardson.EditedMar 16 2018, 7:39 PM

I think the problem here is going to be the data is stored in separate indices. The linked dashboard uses the gerrit_demo index which afaict only contains events for newly seen patch authors? The actual changesets are in a separate index called gerrit. Elasticsearch doesn't do joins of any kind, so there is no direct way to query this data. What I would typically suggest here is augmenting the data ingestion pipeline to include some sort of marker in the gerrit index that can be filtered on. Essentially it means pre-joining (denormalizing) the timestamp markers into the documents.

@EBernhardson: That makes sense - thanks a lot for your fast comment here (was thinking about CC'ing you on this task but first wanted to try digging myself)!

Indeed, the gerrit_demo index has the date fields first and first_all (and an internal _id field which is unfortunately not the Change-Id in Gerrit). And those two fields first and first_all do not exist in the gerrit index (note to myself: githash in the gerrit index is the Change-Id in Gerrit actually).

Hmm, so a potential approach might be merging the fields of the index gerrit_demo into the standard gerrit index once the C_Gerrit_Demo dashboard (it says "Demo" for a reason) gets productized at some point by Bitergia (who host that service and website for us) in T151161.
I'll ask Bitergia about their implementation plans in our (non-public) upstream support ticket at https://gitlab.com/Bitergia/c/Wikimedia/support/issues/16

So the current workaround (until these two indices are merged) to get the list of all changesets in Gerrit by new authors displayed on C_Gerrit_Demo is to

  • go to https://wikimedia.biterg.io/app/kibana#/dashboard/C_Gerrit_Demo
  • get all author names in the "New Authors" widget, either by pressing the Ctrl key in Firefox and marking the column with author names, or by installing some add-on in Chromium (untested), or by exporting to CSV via the link at the bottom of that widget and then removing all the other columns,
  • constructing a query from those author names; format is author_name:"ABC" OR author_name:"XYZ"
  • enter that query into the text search field on top by replacing the *
  • looking at the "gerrit_backlog_table" widget (for open changesets in Gerrit) or the "gerrit_all_changesets" widget (for any changesets in Gerrit)

Nota bene: The results do not exclude bot accounts until these indices are merged, and that some authors might be shown twice.

Aklapper closed this task as Resolved.Mar 27 2018, 3:32 PM

Investigation done, hence closing this task. Again thanks to Erik for the help.

Summarized in the parent task at T188244#4084927.