Page MenuHomePhabricator

"Empty" facet values?
Closed, ResolvedPublicBUG REPORT

Description

From https://gerrit.wikimedia.org/r/c/wikimedia/toolhub/+/666524

Sometimes, I see empty values in the sidebar columns with a number shown next to them. Perhaps, we don't show empty values.

I have noticed this a few times too. I think it is actually a data cleaning problem on the backend, but I need to mess around a bit and catch a reproduction to be sure.

We need to catch this in a repeatable query so we can figure out if it is corrupt data, missing input validation, or a legitimate aggregation response that we should be dealing with differently in the UI layer.

Event Timeline

bd808 changed the subtype of this task from "Task" to "Bug Report".Mar 24 2021, 12:17 AM

This looks to be a "fun with Django" problem. Our toolhub.apps.toolinfo.models.Tool model contains multiple text/char fields which are marked with both blank=True meaning the the value is optional and null=True meaning that missing values should be stored in the database as NULL. The upstream documentation warns against this:

If a string-based field has null=True, that means it has two possible values for “no data”: NULL, and the empty string.

The "fun" comes when removing null=True and then interacting with the other representation we have for this model in toolhub.apps.search.documents.ToolDocument. The search index will happily store both blank strings and null/None values, but they are treated differently when searching. A blank string is treated as a present, but empty value. A null/None is treated as an absent value. Ideally we want the latter behavior for optional strings so that we can more easily run searches for them.

Poking around I think we have two possible fixes:

  • Use custom model fields which override get_db_prep_value to return None when the value is an empty string
  • Use custom document fields which override _serialize to return None when the value is an empty string

Change 675339 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):
[wikimedia/toolhub@main] db: Add support for forcing empty strings to be saved as null

https://gerrit.wikimedia.org/r/675339

Existing dbs can be "cleaned" with these manual SQL updates:

update toolinfo_tool set author = NULL where author = '';
update toolinfo_tool set repository = NULL where repository = '';
update toolinfo_tool set subtitle = NULL where subtitle = '';
update toolinfo_tool set openhub_id = NULL where openhub_id = '';
update toolinfo_tool set bot_username = NULL where bot_username = '';
update toolinfo_tool set replaced_by = NULL where replaced_by = '';
update toolinfo_tool set icon = NULL where icon = '';
update toolinfo_tool set license = NULL where license = '';
update toolinfo_tool set tool_type = NULL where tool_type = '';
update toolinfo_tool set api_url = NULL where api_url = '';
update toolinfo_tool set translate_url = NULL where translate_url = '';
update toolinfo_tool set bugtracker_url = NULL where bugtracker_url = '';
update toolinfo_tool set _schema = NULL where _schema = '';
update toolinfo_tool set _language = NULL where _language = '';

Change 675339 merged by jenkins-bot:

[wikimedia/toolhub@main] db: Add support for forcing empty strings to be saved as null

https://gerrit.wikimedia.org/r/675339

Change 678682 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[wikimedia/toolhub@main] api: Add support for forcing empty strings to be saved as null

https://gerrit.wikimedia.org/r/678682

Change 678682 merged by jenkins-bot:

[wikimedia/toolhub@main] api: Add support for forcing empty strings to be saved as null

https://gerrit.wikimedia.org/r/678682