Page MenuHomePhabricator

Historic revisions of toolinfo records with "author" changes fail after schema changes
Closed, ResolvedPublicBUG REPORT

Description

The schema changes for T293565: Allow multiple authors in toolinfo.json cause deserialization errors when attempting to view historic revisions which include modifications to the "author" field as single string.

web_1     | 2022-02-07T23:02:43Z [e2074d6d95d44430b9daff3177a582f9] django.request ERROR: Internal Server Error: /api/tools/mediawiki/revisions/1020/
web_1     | Traceback (most recent call last):
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/jsonfield/fields.py", line 46, in to_python
web_1     |     return checked_loads(value, **self.load_kwargs)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/jsonfield/json.py", line 18, in checked_loads
web_1     |     value = json.loads(value, **kwargs)
web_1     |   File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
web_1     |     return _default_decoder.decode(s)
web_1     |   File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
web_1     |     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
web_1     |   File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
web_1     |     raise JSONDecodeError("Expecting value", s, err.value) from None
web_1     | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
web_1     |
web_1     | During handling of the above exception, another exception occurred:
web_1     |
web_1     | Traceback (most recent call last):
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/django/core/serializers/python.py", line 143, in Deserializer
web_1     |     data[field.name] = field.to_python(field_value)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/jsonfield/fields.py", line 48, in to_python
web_1     |     raise ValidationError(_("Enter valid JSON."))
web_1     | django.core.exceptions.ValidationError: ['Enter valid JSON.']
web_1     |
web_1     | During handling of the above exception, another exception occurred:
web_1     |
web_1     | Traceback (most recent call last):
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/reversion/models.py", line 266, in _object_version
web_1     |     use_natural_foreign_keys=version_options.use_natural_foreign_keys))[0]
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/django/core/serializers/json.py", line 69, in Deserializer
web_1     |     yield from PythonDeserializer(objects, **options)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/django/core/serializers/python.py", line 145, in Deserializer
web_1     |     raise base.DeserializationError.WithData(e, d['model'], d.get('pk'), field_value)
web_1     | django.core.serializers.base.DeserializationError: ['Enter valid JSON.']: (toolinfo.tool:pk=782) field_value was 'Magnus Manske, Brion Vibber, Lee Daniel Crocker, Tim Starling, et al'
web_1     |
web_1     | During handling of the above exception, another exception occurred:
web_1     |
web_1     | Traceback (most recent call last):
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/django/core/handlers/exception.py", line 34, in inner
web_1     |     response = get_response(request)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/django/core/handlers/base.py", line 115, in _get_response
web_1     |     response = self.process_exception_by_middleware(e, request)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/django/core/handlers/base.py", line 113, in _get_response
web_1     |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
web_1     |     return view_func(*args, **kwargs)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/viewsets.py", line 125, in view
web_1     |     return self.dispatch(request, *args, **kwargs)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/views.py", line 509, in dispatch
web_1     |     response = self.handle_exception(exc)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/views.py", line 469, in handle_exception
web_1     |     self.raise_uncaught_exception(exc)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
web_1     |     raise exc
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/views.py", line 506, in dispatch
web_1     |     response = handler(request, *args, **kwargs)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/drf_spectacular/drainage.py", line 144, in wrapped_method
web_1     |     return method(self, request, *args, **kwargs)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/mixins.py", line 56, in retrieve
web_1     |     return Response(serializer.data)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/serializers.py", line 555, in data
web_1     |     ret = super().data
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/serializers.py", line 253, in data
web_1     |     self._data = self.to_representation(self.instance)
web_1     |   File "/srv/app/toolhub/apps/toolinfo/serializers.py", line 249, in to_representation
web_1     |     ret = super().to_representation(instance)
web_1     |   File "/srv/app/toolhub/apps/versioned/serializers.py", line 80, in to_representation
web_1     |     ret = super().to_representation(instance)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/serializers.py", line 509, in to_representation
web_1     |     attribute = field.get_attribute(instance)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/fields.py", line 457, in get_attribute
web_1     |     return get_attribute(instance, self.source_attrs)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/rest_framework/fields.py", line 97, in get_attribute
web_1     |     instance = getattr(instance, attr)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/django/utils/functional.py", line 80, in __get__
web_1     |     res = instance.__dict__[self.name] = self.func(instance)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/reversion/models.py", line 308, in field_dict
web_1     |     field_dict = self._local_field_dict
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/django/utils/functional.py", line 80, in __get__
web_1     |     res = instance.__dict__[self.name] = self.func(instance)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/reversion/models.py", line 286, in _local_field_dict
web_1     |     object_version = self._object_version
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/django/utils/functional.py", line 80, in __get__
web_1     |     res = instance.__dict__[self.name] = self.func(instance)
web_1     |   File "/opt/lib/poetry/toolhub-2uZo5AhP-py3.7/lib/python3.7/site-packages/reversion/models.py", line 269, in _object_version
web_1     |     "object_repr": self.object_repr,
web_1     | reversion.errors.RevertError: Could not load mediawiki version - incompatible version data.

Event Timeline

One way to fix this would be with a Django migration that edits the serialized data inside Version models much like the migration that is converting the existing Tool records to the new schema. It feels kind of gross to me to be editing historic revision information, but I don't think that we should really need to do this very often. Changing the type of an existing model field should be a very rare action.

One way to fix this would be with a Django migration that edits the serialized data inside Version models much like the migration that is converting the existing Tool records to the new schema. It feels kind of gross to me to be editing historic revision information, but I don't think that we should really need to do this very often. Changing the type of an existing model field should be a very rare action.

This makes sense. Like you said this is not much of a problem since we only need to do is once. And this unique situation is as a result of the way we handle certain fields (certain fields accept json blob within which there are several json key,value pairs)

bd808 changed the task status from Open to In Progress.Feb 8 2022, 9:06 PM
bd808 claimed this task.
bd808 moved this task from Backlog to In Progress on the Toolhub board.

The migration approach turns out to be trickier than it seemed like it should be. The challenge arises in how Django's migration system works and specifically how importing a model directly is not encouraged due to the need to use model instances which match the migration state. I ended up needing to manually create the equivalent QuerySet to what Version.objects.get_for_model(Tool) would normally create which looks something like:

ContentType = apps.get_model("contenttypes", "ContentType")
tool_ct = ContentType.objects.get(app_label="toolinfo", model="Tool")
Version = apps.get_model("reversion", "Version")
qs = Version.objects.filter(content_type=tool_ct)
for vers in qs.all():
    ...

Change 758001 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[wikimedia/toolhub@main] api: Convert toolinfo author from string to array of objects

https://gerrit.wikimedia.org/r/758001

Change 758001 merged by jenkins-bot:

[wikimedia/toolhub@main] api: Convert toolinfo author from string to array of objects

https://gerrit.wikimedia.org/r/758001

bd808 changed the task status from In Progress to Open.Feb 8 2022, 11:45 PM
bd808 moved this task from In Progress to Review on the Toolhub board.

Change 770638 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/deployment-charts@master] toolhub: Bump container version to 2022-03-15-002555-production

https://gerrit.wikimedia.org/r/770638

Change 770638 merged by jenkins-bot:

[operations/deployment-charts@master] toolhub: Bump container version to 2022-03-15-002555-production

https://gerrit.wikimedia.org/r/770638