Page MenuHomePhabricator

Updates to the section does not update the timestamp in cx_corpora table
Closed, ResolvedPublic

Description

Noticed while @Petar.petkovic reviewing https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/457410/

Here is the easy way to reproduce with going back to dashboard only once:

  1. Start fresh translation
  2. Translate one paragraph using Apertium and save ('Apertium' and 'source' entries are added to CX corpora and 20180911092929 is the timestamp)
  3. Switch to "Don't use MT" and save (entry for 'user' is added to CX corpora and timestamp is 20180911092949)
  4. Switch back to Apertium and save (previous 'Apertium' entry in the CX corpora does not update its timestamp, so 20180911092929 remains)
  5. Go back to dashboard and restore, to see blank section instead of one coming from Apertium

In 2016, to fix an issue related to database issue @Nikerabbit authored https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/273886. That patch intenationally avoided updating timestamps when translation units are updated.

That patch has a comment : "Preventing the duplicates in the first place is left as an follow-up". Looks like time for that follow up is arrived

Details

Related Gerrit Patches:
mediawiki/extensions/ContentTranslation : masterDon't convert timestamps for translation units to integer
mediawiki/extensions/ContentTranslation : masterAdd error checking to TranslationStorageManager::update

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 11 2018, 11:22 AM
Nikerabbit added a comment.EditedSep 12 2018, 7:25 AM

This line should update the timestamp of the unit on save, is this not the case?

This line should update the timestamp of the unit on save, is this not the case?

You're looking at an obsolete diff. Existing timestamp is reused since this change.

This line should update the timestamp of the unit on save, is this not the case?

You're looking at an obsolete diff. Existing timestamp is reused since this change.

The line is still the same after that patch too.

This line should update the timestamp of the unit on save, is this not the case?

You're looking at an obsolete diff. Existing timestamp is reused since this change.

The line is still the same after that patch too.

My browser stripped the line number, so I made some wrong guesses for the line you're talking about. The timestamp isn't updated in the scenario from description of the ticket. The cause might still not be well defined, but the outcome is clear.

Change 460010 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/ContentTranslation@master] Add error checking to TranslationStorageManager::update

https://gerrit.wikimedia.org/r/460010

That patch has a comment : "Preventing the duplicates in the first place is left as an follow-up". Looks like time for that follow up is arrived

We can do that by dropping cxc_timestamp from the unique index. I think that was my wishful thinking that we would at some point store history of these, but it only causing more issues now. We should also take advantage of cxc_id to simplify our query conditions for updates.

We most likely need to write a script that purges duplicates before we change the unique index.

Also, it looks like cxc_sequence_id is unused, and can be removed at the same time.

Change 460010 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Add error checking to TranslationStorageManager::update

https://gerrit.wikimedia.org/r/460010

Arrbee moved this task from Needs Triage to Bugs on the ContentTranslation board.Sep 18 2018, 6:40 AM

Ever since the patch was merged, I started getting lots of save failures, to the point it really affects my workflow, so I needed to check what is going on. Any attempt at updating the existing record in cx_corpora table fails, and basic steps I used for testing are:

  • Start translation and add one paragraph. Wait for save or press Ctrl+S
  • Make a change (adding one word is sufficient) and save
  • Make another change. Next attempt of saving will always fail, when updating of existing DB record is attempted

When cx_corpora record is getting updated, TranslationStorageManager::doFind finds the row by translation ID, section ID and origin. Using that row, instance of TranslationUnit is created. Inside of the constructor of that class, timestamp is converted to int. For example, timestamp obtained from DB was string in format "20181003090933". When that string gets converted to int, that number becomes 2,147,483,647 which is the biggest integer value on 32-bit systems like mine. That number is then fed to $db->timestamp in TranslationStorageManager::update method, which treats it like Unix timestamp, which results in year 2038.

Just removing integer conversion solves the problem of saving failures. The problem described in this ticket is also resolved, but in step 4, entry for 'user' origin is updated, even though 'Apertium' entry should only update timestamp.

The problem of save failures does not affect production, because 64-bit environment is used.

Change 464420 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/extensions/ContentTranslation@master] Don't convert timestamps for translation units to integer

https://gerrit.wikimedia.org/r/464420

Change 464420 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Don't convert timestamps for translation units to integer

https://gerrit.wikimedia.org/r/464420

Petar.petkovic closed this task as Resolved.Oct 17 2018, 4:25 PM
Petar.petkovic removed a subscriber: gerritbot.