Page MenuHomePhabricator

List not removed from search index when un-published
Closed, ResolvedPublicBUG REPORT

Description

Expected:

  1. Create a list & mark it as published
  2. Visit /published-lists and confirm list is there
  3. Edit list and remove published mark
  4. Visit /published-lists and confirm list is not there

It may have been me doing really strange things (using admin rights to edit another user's list), but I had step 4 fail by continuing to show the list. I "fixed" this in prod by recreating the search index, but obviously this is not an ideal solution. I have a hunch that we have setup the search index to only show published lists (which is correct behavior) but that this same setup prevents the index from updating when a list is marked as private again because the updated record no longer matches the query of lists to index.

Event Timeline

bd808 triaged this task as Medium priority.Mar 15 2022, 10:19 PM
bd808 created this task.
bd808 changed the subtype of this task from "Task" to "Bug Report".
bd808 moved this task from Backlog to Groomed/Ready on the Toolhub board.

yes, I also think rebuilding the index every time a list is unpublished doesn't seem right. I looked it up and it does seem like something the snippet below can help us with:

POST /my-index-000001/_delete_by_query
{
  "query": {
    "match": {
      "user.id": "elkbee"
    }
  }
}

If this looks like it's a good solution to you, we can work on crafting an exact query that will suit our usecase and then find out how to send the desired query through elasticsearch_dsl_drf

I think that all that is needed to send the delete request to Elasticsearch is:

from django_elasticsearch_dsl.registries import registry

registry.delete(instance_of_ToolList_that_is_no_longer_public)

This would likely be called from a post_save signal handler attached to the ToolList model from somewhere inside our search app.

Change 771939 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[wikimedia/toolhub@main] api: delete list document from es index when list is unpublished

https://gerrit.wikimedia.org/r/771939

Change 771939 merged by jenkins-bot:

[wikimedia/toolhub@main] api: delete list document from es index when list is unpublished

https://gerrit.wikimedia.org/r/771939

Change 771939 merged by jenkins-bot:

[wikimedia/toolhub@main] api: delete list document from es index when list is unpublished

https://gerrit.wikimedia.org/r/771939

There is a bug in this implementation that @Raymond_Ndibe and I did not catch prior to merging. The bug is that an unhandled elasticsearch.helpers.errors.BulkIndexError exception will be raised by registry.delete(instance) when the given list (instance) is not currently in the index. This will happen for all unpublished lists except when their published status was just toggled. There is no good way to tell what has changed about a model in a post_save hook, so unconditionally catching the possible exception may be the best option here.

Change 772939 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[wikimedia/toolhub@main] bug(search): Guard against BulkIndexError when deleting List from index

https://gerrit.wikimedia.org/r/772939

Change 772939 merged by jenkins-bot:

[wikimedia/toolhub@main] search: Guard against BulkIndexError when deleting List from index

https://gerrit.wikimedia.org/r/772939

Change 786342 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/deployment-charts@master] toolhub: Bump container version to 2022-04-21-215651-production

https://gerrit.wikimedia.org/r/786342

Change 786342 merged by jenkins-bot:

[operations/deployment-charts@master] toolhub: Bump container version to 2022-04-21-215651-production

https://gerrit.wikimedia.org/r/786342