Marking as resolved as this fix should address this. If logstash still shows the entries after this fix is deployed to WMF wikis, this can be re-opened.
Tue, Dec 5
I would suggest that QA isn't needed for this task and the way this is verified to work is that logstash is inspected next week to ensure these errors are no longer being triggered.
Mon, Dec 4
The last change (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/977215/) shouldn't need QA, so this can be marked as resolved.
Seems resolved based on logstash (other exceeds of limits are from other code).
Marking this as resolved as I don't see any more of these in the last few days.
This will need to wait until either:
- The three relevant changes (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/979167/, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/974687/, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/978155) are backported
- The 12-18th of December (depending on the specific wiki), so that the train with the above changes in has been normally carried out
Sun, Dec 3
Sat, Dec 2
Fri, Dec 1
Suggested QA steps (requires local wiki to test - cannot be tested on betawikis as CheckUser is not enabled):
- Install CheckUser, if required
- Make one testing edit using Google Chrome
- Open the DB
- Run the following SQL:
SELECT MIN(uachm_reference_id) FROM cu_useragent_clienthints_map;
- If the above query returns 1, then do the following (otherwise skip forward to step 6):
- Run the SQL listed after step 5C
- Make another testing edit
- Go back to step 4
TRUNCATE TABLE cu_changes; TRUNCATE TABLE cu_useragent_clienthints_map;
- Run the following SQL, replacing <lower uachm_reference_id> with a integer that is smaller than the one obtained in step 4. You may repeat this step with different values for <lower uachm_reference_id> each time as long as they are still smaller than the integer from step 4. However, do not run this more than 100 times.
INSERT INTO cu_useragent_clienthints_map (uachm_reference_id, uachm_reference_type, uachm_uach_id) VALUES (<lower uachm_reference_id>, 0, 1), (<lower uachm_reference_id>, 0, 1), (<lower uachm_reference_id>, 0, 1);
- Close the DB (or open a new console window)
- Run the purgeOldData.php maintenance script
- Verify that the output of the maintenance script has on each line after cu_changes, cu_log_event, or cu_private_event the text like Purged 0 rows and Y client hint mapping rows purged. where Y is the number of times you repeated step 6.
- Open the DB, if necessary
- Repeat step 4 and make sure the result was the same.
Thu, Nov 30
Suggested QA steps for betawikis:
- Make sure you have access to betawiki DBs, and if not get access
- Go to https://meta.wikimedia.beta.wmflabs.org/wiki/Special:SiteMatrix and choose a wikipedia from the list
- Go to that betawiki and load Special:ListFiles. Check that images appear in that list (ignore audio/video). If not, repeat step 2 to choose a different beta wikipedia.
- Connect to the betawiki over ssh (e.g. ssh deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud)
- Open the DB for your chosen wikipedia (e.g. sql dewiki)
- Run the following SQL and keep a note of the output:
SELECT COUNT(*) FROM mediamoderation_scan;
- Run the maintenance script (this can be done via mwscript extensions/MediaModeration/maintenance/importExistingFilesToScanTable.php --wiki=dewiki replacing dewiki with the name of the wiki you chosen
- Make sure the maintenance script runs without any errors
- Repeat steps 5 and 6.
- Verify that the second time you ran the query a larger count was present
One addendum to the QA steps is that you should use a midi file that hasn't been uploaded to the wiki before. This means choosing a new one from https://commons.wikimedia.org/wiki/Category:MIDI_files that you have not used for testing before.
Thanks @DatGuy for this patch. I'm closing this as resolved as the acceptance criteria have been completed.
@DatGuy can this be closed now?
QA should not be needed for the change being made to core (as this only affects method documentation).
Wed, Nov 29
The query is the first listed in the task description.
Tue, Nov 28
I propose the following to fix this:
- Make MediaModeration only accept files that have File::getMediaType return MEDIATYPE_BITMAP or MEDIATYPE_DRAWING
- Ensure that MediaModerationFileProcessor::canScanFile is always called before attempting to scan file (even if the DB has an entry for it)
- (potentially) Write a maintenance script to remove files from the scan table that do not have File::getMediaType return MEDIATYPE_BITMAP or MEDIATYPE_DRAWING.
Copying @Tchanders alternative implementation method from the declined subtask to here:
I think it's clear that these indexes will not be added for the time being. I will explore different ways to avoid the need for these indexes in the parent task.
Thanks for the thoughts and no worries on any delay.
Mon, Nov 27
I had not thought of this approach and in theory it seems better than trying to add new indexes. However, I see some more problems that would need solving:
- To make it possible to implement T170148, we could not rely on the fact that (within a small margin of error) the timestamp correlates to the ID associated with the entry in the cu_log_event table. This has received legal approval and is now technically possible to implement. IMO this would be useful for checkuser investigations
- If we ever implement searching by Client Hints data, these indexes will be needed. The equivalent task for user agent strings (T146837) seemed to be desired by CUs.
@Tchanders yes. As such I will move it to done.
Would welcome any thoughts from DBA on the proposed patch and in general about adding indexes.
With assistance from @kostajh, I was able to determine that a single query on wikidatawiki to determine if a given revision ID is in the cu_changes table can take over a minute to run. Considering this could be a maximum of 100 of these queries per maintenance script run and for the first few runs the script will be reaching the maximum, this will be too inefficient without an index.