Dreamy_Jazz (WBrown (WMF))
Engineering

User Details

User Since
May 1 2018, 4:55 PM (292 w, 1 d)
Availability
Available
IRC Nick
Dreamy_Jazz
LDAP User
Dreamy Jazz
MediaWiki User
Dreamy Jazz [ Global Accounts ]

Recent Activity

Today

Dreamy_Jazz closed T352694: MediaWiki\CheckUser\Maintenance\PurgeOldData::prune: no transaction to commit, something got out of sync as Resolved.

Marking as resolved as this fix should address this. If logstash still shows the entries after this fix is deployed to WMF wikis, this can be re-opened.

Thu, Dec 7, 11:24 AM · http-client-hints, MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, Wikimedia-production-error

Tue, Dec 5

Dreamy_Jazz updated the task description for T351409: Expand the database lookup service to get rows from the mediamoderation_scan table.
Tue, Dec 5, 1:57 PM · Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz added a comment to T352694: MediaWiki\CheckUser\Maintenance\PurgeOldData::prune: no transaction to commit, something got out of sync.

I would suggest that QA isn't needed for this task. Instead, the fix can be verified by inspecting logstash next week to ensure these errors are no longer being triggered.

Tue, Dec 5, 1:55 PM · http-client-hints, MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, Wikimedia-production-error
Dreamy_Jazz updated the task description for T352755: UserAgentClientHintsManager::deleteMappingRows can exceed TransactionProfiler limits by deleting over 1000 rows in one query.
Tue, Dec 5, 11:44 AM · Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz renamed T352755: UserAgentClientHintsManager::deleteMappingRows can exceed TransactionProfiler limits by deleting over 1000 rows in one query from UserAgentClientHintsManager::deleteMappingRows can exceed TransactionProfiler limits for deleting rows to UserAgentClientHintsManager::deleteMappingRows can exceed TransactionProfiler limits by deleting over 1000 rows in one query.
Tue, Dec 5, 11:42 AM · Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz created T352755: UserAgentClientHintsManager::deleteMappingRows can exceed TransactionProfiler limits by deleting over 1000 rows in one query.
Tue, Dec 5, 11:42 AM · Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz renamed T352754: Expectation (writes <= 0) not met by Special:Investigate when relaunching the tour from Expectation (writes <= 0) by MediaWiki::main not met (actual: 1) in trx #9c5e43899f: DELETE FROM `user_properties` WHERE up_user = '?' AND up_property = '?' to Expectation (writes <= 0) not met by Special:Investigate when relaunching the tour.
Tue, Dec 5, 11:33 AM · Trust and Safety Product Team, CheckUser
Dreamy_Jazz moved T352754: Expectation (writes <= 0) not met by Special:Investigate when relaunching the tour from General / Unsorted to Investigate on the CheckUser board.
Tue, Dec 5, 11:32 AM · Trust and Safety Product Team, CheckUser
Dreamy_Jazz created T352754: Expectation (writes <= 0) not met by Special:Investigate when relaunching the tour.
Tue, Dec 5, 11:32 AM · Trust and Safety Product Team, CheckUser

Mon, Dec 4

Dreamy_Jazz closed T350863: Create maintenance script to import all existing images to mediamoderation_scan table as Resolved.

The last change (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/977215/) shouldn't need QA, so this can be marked as resolved.

Mon, Dec 4, 8:21 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz closed T350863: Create maintenance script to import all existing images to mediamoderation_scan table, a subtask of T350865: Import all existing images to the mediamoderation_scan table on WMF wikis, as Resolved.
Mon, Dec 4, 8:21 PM · Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz closed T350863: Create maintenance script to import all existing images to mediamoderation_scan table, a subtask of T351546: [Sub EPIC] Set up rows , as Resolved.
Mon, Dec 4, 8:21 PM · Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz moved T352694: MediaWiki\CheckUser\Maintenance\PurgeOldData::prune: no transaction to commit, something got out of sync from Priority Backlog to Needs review on the Trust and Safety Product Sprint (Sprint Shamisen) board.
Mon, Dec 4, 5:12 PM · http-client-hints, MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, Wikimedia-production-error
Dreamy_Jazz edited projects for T352694: MediaWiki\CheckUser\Maintenance\PurgeOldData::prune: no transaction to commit, something got out of sync, added: Trust and Safety Product Sprint (Sprint Shamisen); removed Trust and Safety Product Sprint.
Mon, Dec 4, 5:12 PM · http-client-hints, MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, Wikimedia-production-error
Dreamy_Jazz added projects to T352694: MediaWiki\CheckUser\Maintenance\PurgeOldData::prune: no transaction to commit, something got out of sync: Trust and Safety Product Team, Trust and Safety Product Sprint.
Mon, Dec 4, 5:11 PM · http-client-hints, MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, Wikimedia-production-error
Dreamy_Jazz created T352694: MediaWiki\CheckUser\Maintenance\PurgeOldData::prune: no transaction to commit, something got out of sync.
Mon, Dec 4, 5:05 PM · http-client-hints, MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, Wikimedia-production-error
Dreamy_Jazz closed T342613: CompareService::getTotalEditsFromIp queries exceeding TransactionProfiler limits as Resolved.

Seems resolved based on logstash (the other cases of exceeded limits come from other code).

Mon, Dec 4, 5:01 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Anti-Harassment, CheckUser
Dreamy_Jazz closed T342613: CompareService::getTotalEditsFromIp queries exceeding TransactionProfiler limits, a subtask of T248926: Performance review of checkuser database queries [NOT READY], as Resolved.
Mon, Dec 4, 5:00 PM · CheckUser, Anti-Harassment
Dreamy_Jazz closed T346970: Wikimedia\RequestTimeout\RequestTimeoutException: The maximum execution time of 60 seconds was exceeded on Special:Investigate as Resolved.

Marking this as resolved as I don't see any more of these in the last few days.

Mon, Dec 4, 4:58 PM · Wikimedia-production-error, Anti-Harassment, CheckUser
Dreamy_Jazz added a comment to T350865: Import all existing images to the mediamoderation_scan table on WMF wikis.

This will need to wait until either:

  1. The three relevant changes (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/979167/, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/974687/, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/978155) are backported
  2. The 12-18th of December (depending on the specific wiki), by which point the train carrying the above changes will have rolled out as normal
Mon, Dec 4, 4:51 PM · Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz reopened T350863: Create maintenance script to import all existing images to mediamoderation_scan table as "Open".
Mon, Dec 4, 4:45 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz reopened T350863: Create maintenance script to import all existing images to mediamoderation_scan table, a subtask of T350865: Import all existing images to the mediamoderation_scan table on WMF wikis, as Open.
Mon, Dec 4, 4:45 PM · Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz reopened T350863: Create maintenance script to import all existing images to mediamoderation_scan table, a subtask of T351546: [Sub EPIC] Set up rows , as Open.
Mon, Dec 4, 4:45 PM · Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz closed T350863: Create maintenance script to import all existing images to mediamoderation_scan table, a subtask of T350865: Import all existing images to the mediamoderation_scan table on WMF wikis, as Resolved.
Mon, Dec 4, 4:44 PM · Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz closed T350863: Create maintenance script to import all existing images to mediamoderation_scan table, a subtask of T351546: [Sub EPIC] Set up rows , as Resolved.
Mon, Dec 4, 4:44 PM · Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz closed T350863: Create maintenance script to import all existing images to mediamoderation_scan table as Resolved.
Mon, Dec 4, 4:44 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz added a project to T352681: Mark Tor exit nodes in pagers: MediaWiki-extensions-TorBlock.
Mon, Dec 4, 3:31 PM · Patch-For-Review, MediaWiki-extensions-TorBlock, CheckUser
Dreamy_Jazz added a project to T352679: HistoryBlobUtilsTest::testUnserializeBadEmbedded Creation of dynamic property HistoryBlobStub::$bad is deprecated: MediaWiki-General.
Mon, Dec 4, 3:24 PM · MediaWiki-General, PHP 8.2 support
Dreamy_Jazz moved T352679: HistoryBlobUtilsTest::testUnserializeBadEmbedded Creation of dynamic property HistoryBlobStub::$bad is deprecated from Backlog to MediaWiki core on the PHP 8.2 support board.
Mon, Dec 4, 3:24 PM · MediaWiki-General, PHP 8.2 support
Dreamy_Jazz updated the task description for T352085: Make PHP 8.2 voting on development (master) branch of MW ecosystem (core, vendor, extensions, skins, libraries).
Mon, Dec 4, 3:23 PM · PHP 8.2 support
Dreamy_Jazz updated the task description for T352679: HistoryBlobUtilsTest::testUnserializeBadEmbedded Creation of dynamic property HistoryBlobStub::$bad is deprecated.
Mon, Dec 4, 3:22 PM · MediaWiki-General, PHP 8.2 support
Dreamy_Jazz created T352679: HistoryBlobUtilsTest::testUnserializeBadEmbedded Creation of dynamic property HistoryBlobStub::$bad is deprecated.
Mon, Dec 4, 3:22 PM · MediaWiki-General, PHP 8.2 support
Dreamy_Jazz updated the task description for T341829: Enable read new for the event table migration.
Mon, Dec 4, 3:05 PM · Patch-For-Review, Trust and Safety Product Team, Trust and Safety Product Sprint (Sprint Shamisen), Anti-Harassment, CheckUser
Dreamy_Jazz moved T341829: Enable read new for the event table migration from Priority Backlog to In Progress on the Trust and Safety Product Sprint (Sprint Shamisen) board.
Mon, Dec 4, 2:06 PM · Patch-For-Review, Trust and Safety Product Team, Trust and Safety Product Sprint (Sprint Shamisen), Anti-Harassment, CheckUser
Dreamy_Jazz updated the task description for T341829: Enable read new for the event table migration.
Mon, Dec 4, 12:41 PM · Patch-For-Review, Trust and Safety Product Team, Trust and Safety Product Sprint (Sprint Shamisen), Anti-Harassment, CheckUser
Dreamy_Jazz updated the task description for T341829: Enable read new for the event table migration.
Mon, Dec 4, 11:20 AM · Patch-For-Review, Trust and Safety Product Team, Trust and Safety Product Sprint (Sprint Shamisen), Anti-Harassment, CheckUser

Sun, Dec 3

Dreamy_Jazz moved T324907: Create separate tables for log events in CheckUser from Patches for review to General / Unsorted on the CheckUser board.
Sun, Dec 3, 8:17 PM · Epic, MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), MW-1.40-notes (1.40.0-wmf.22; 2023-02-06), Patch-For-Review, Anti-Harassment, Schema-change, CheckUser

Sat, Dec 2

Dreamy_Jazz closed T351945: Standardise IP display formatting as Resolved.
Sat, Dec 2, 4:50 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), CheckUser
Dreamy_Jazz claimed T351417: Create a service that creates LocalFile and ArchivedFile objects for files that have a given SHA-1.
Sat, Dec 2, 12:42 AM · Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)

Fri, Dec 1

Dreamy_Jazz moved T350681: Update purgeOldData.php maintenance script to look for and delete orphaned map rows from Needs review to Needs QA on the Trust and Safety Product Sprint (Sprint Shamisen) board.

Suggested QA steps (requires local wiki to test - cannot be tested on betawikis as CheckUser is not enabled):

  1. Install CheckUser, if required
  2. Make one testing edit using Google Chrome
  3. Open the DB
  4. Run the following SQL:
Get the oldest uachm_reference_id value
SELECT MIN(uachm_reference_id) FROM cu_useragent_clienthints_map;
  5. If the above query returns 1, then do the following (otherwise skip forward to step 6):
    A. Run the SQL listed after step 5C
    B. Make another testing edit
    C. Go back to step 4
SQL to run ONLY for step 5A
TRUNCATE TABLE cu_changes;
TRUNCATE TABLE cu_useragent_clienthints_map;
  6. Run the following SQL, replacing <lower uachm_reference_id> with an integer that is smaller than the one obtained in step 4. You may repeat this step with different values for <lower uachm_reference_id> each time as long as they are still smaller than the integer from step 4. However, do not run this more than 100 times.
Add simulated orphaned entries to cu_useragent_clienthints_map
INSERT INTO cu_useragent_clienthints_map (uachm_reference_id, uachm_reference_type, uachm_uach_id) VALUES (<lower uachm_reference_id>, 0, 1), (<lower uachm_reference_id>, 0, 1), (<lower uachm_reference_id>, 0, 1);
  7. Close the DB (or open a new console window)
  8. Run the purgeOldData.php maintenance script
  9. Verify that each line of the maintenance script's output after cu_changes, cu_log_event, or cu_private_event contains text like "Purged 0 rows and Y client hint mapping rows purged.", where Y is the number of times you repeated step 6.
  10. Open the DB, if necessary
  11. Repeat step 4 and make sure the result is the same as the first time.
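The flow of these QA steps can be rehearsed without a wiki. What follows is a minimal sketch in Python using an in-memory SQLite database. The two-table schema is cut down to the columns the steps touch, and purge_orphans is a hypothetical stand-in for what purgeOldData.php is expected to do; it is not the script's actual implementation.

```python
import sqlite3

# Simplified stand-in schema: only the columns the QA steps touch (assumption).
SCHEMA = """
CREATE TABLE cu_changes (
    cuc_id INTEGER PRIMARY KEY,
    cuc_this_oldid INTEGER NOT NULL
);
CREATE TABLE cu_useragent_clienthints_map (
    uachm_reference_id INTEGER NOT NULL,
    uachm_reference_type INTEGER NOT NULL,
    uachm_uach_id INTEGER NOT NULL
);
"""


def make_demo_db(revision_id):
    """Create an in-memory DB holding one edit and its client hints map row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO cu_changes (cuc_this_oldid) VALUES (?)", (revision_id,)
    )
    conn.execute(
        "INSERT INTO cu_useragent_clienthints_map VALUES (?, 0, 1)",
        (revision_id,),
    )
    return conn


def min_reference_id(conn):
    """Step 4: get the oldest uachm_reference_id value."""
    return conn.execute(
        "SELECT MIN(uachm_reference_id) FROM cu_useragent_clienthints_map"
    ).fetchone()[0]


def add_simulated_orphans(conn, lower_reference_id):
    """Step 6: insert the same three rows as the INSERT in the QA steps."""
    conn.executemany(
        "INSERT INTO cu_useragent_clienthints_map VALUES (?, 0, 1)",
        [(lower_reference_id,)] * 3,
    )


def purge_orphans(conn):
    """Hypothetical purge: delete map rows that have no cu_changes row."""
    cur = conn.execute(
        "DELETE FROM cu_useragent_clienthints_map "
        "WHERE uachm_reference_type = 0 AND NOT EXISTS ("
        "  SELECT 1 FROM cu_changes"
        "  WHERE cuc_this_oldid ="
        "    cu_useragent_clienthints_map.uachm_reference_id)"
    )
    return cur.rowcount
```

Calling make_demo_db(5), then add_simulated_orphans(conn, 2), then purge_orphans(conn) mirrors steps 4 to 11: the minimum reference ID drops after the insert and returns to its original value after the purge.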
Fri, Dec 1, 10:23 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz updated the task description for T350681: Update purgeOldData.php maintenance script to look for and delete orphaned map rows.
Fri, Dec 1, 9:58 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz moved T350681: Update purgeOldData.php maintenance script to look for and delete orphaned map rows from In Progress to Needs review on the Trust and Safety Product Sprint (Sprint Shamisen) board.
Fri, Dec 1, 5:17 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz claimed T351409: Expand the database lookup service to get rows from the mediamoderation_scan table.
Fri, Dec 1, 12:27 PM · Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz closed T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed, a subtask of T350323: Write an empty row to scan table on file upload, as Resolved.
Fri, Dec 1, 10:36 AM · Trust and Safety Product Sprint (Sprint Shamisen), MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz closed T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed as Resolved.
Fri, Dec 1, 10:36 AM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz closed T350323: Write an empty row to scan table on file upload, a subtask of T350863: Create maintenance script to import all existing images to mediamoderation_scan table, as Resolved.
Fri, Dec 1, 10:35 AM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz closed T350323: Write an empty row to scan table on file upload, a subtask of T350865: Import all existing images to the mediamoderation_scan table on WMF wikis, as Resolved.
Fri, Dec 1, 10:35 AM · Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz closed T350323: Write an empty row to scan table on file upload as Resolved.
Fri, Dec 1, 10:35 AM · Trust and Safety Product Sprint (Sprint Shamisen), MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz renamed T350323: Write an empty row to scan table on file upload from [M] Write an empty row to scan table on file upload to Write an empty row to scan table on file upload.
Fri, Dec 1, 10:35 AM · Trust and Safety Product Sprint (Sprint Shamisen), MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz updated the task description for T350323: Write an empty row to scan table on file upload.
Fri, Dec 1, 10:34 AM · Trust and Safety Product Sprint (Sprint Shamisen), MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)

Thu, Nov 30

Dreamy_Jazz added a comment to T350863: Create maintenance script to import all existing images to mediamoderation_scan table.

Suggested QA steps for betawikis:

  1. Make sure you have access to betawiki DBs, and if not get access
  2. Go to https://meta.wikimedia.beta.wmflabs.org/wiki/Special:SiteMatrix and choose a wikipedia from the list
  3. Go to that betawiki and load Special:ListFiles. Check that images appear in that list (ignore audio/video). If not, repeat step 2 to choose a different beta wikipedia.
  4. Connect to the betawiki over ssh (e.g. ssh deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud)
  5. Open the DB for your chosen wikipedia (e.g. sql dewiki)
  6. Run the following SQL and keep a note of the output:
SELECT COUNT(*) FROM mediamoderation_scan;
  7. Run the maintenance script (this can be done via mwscript extensions/MediaModeration/maintenance/importExistingFilesToScanTable.php --wiki=dewiki, replacing dewiki with the name of the wiki you chose)
  8. Make sure the maintenance script runs without any errors
  9. Repeat steps 5 and 6.
  10. Verify that the count returned the second time you ran the query is larger than the first
Thu, Nov 30, 9:03 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz added a comment to T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed.

One addendum to the QA steps is that you should use a MIDI file that hasn't been uploaded to the wiki before. This means choosing a new one from https://commons.wikimedia.org/wiki/Category:MIDI_files that you have not used for testing before.

Thu, Nov 30, 6:38 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz added a project to T347556: Some user creation entries have incorrect Special:Log url: Trust and Safety Product Team.
Thu, Nov 30, 6:14 PM · Trust and Safety Product Team, CheckUser, Anti-Harassment
Dreamy_Jazz closed T345135: Cache calls to Linker::userLink and Linker::userToolLinksRedContribs as Resolved.

Thanks @DatGuy for this patch. I'm closing this as resolved as the acceptance criteria have been completed.

Thu, Nov 30, 6:13 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Anti-Harassment, http-client-hints, CheckUser
Dreamy_Jazz updated the task description for T345135: Cache calls to Linker::userLink and Linker::userToolLinksRedContribs.
Thu, Nov 30, 6:13 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Anti-Harassment, http-client-hints, CheckUser
Dreamy_Jazz closed T345135: Cache calls to Linker::userLink and Linker::userToolLinksRedContribs, a subtask of T345134: Improve load times for Special:CheckUser's 'Get edits' mode, as Resolved.
Thu, Nov 30, 6:13 PM · Anti-Harassment, Performance Issue, http-client-hints, CheckUser
Dreamy_Jazz added a comment to T351945: Standardise IP display formatting.

@DatGuy can this be closed now?

Thu, Nov 30, 6:11 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), CheckUser
Dreamy_Jazz moved T350863: Create maintenance script to import all existing images to mediamoderation_scan table from Needs review to Needs QA on the Trust and Safety Product Sprint (Sprint Shamisen) board.

I will write suggested QA steps for betawikis by tomorrow. QA should wait until QA on T350323 and T352234 is completed.

Thu, Nov 30, 6:09 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz added a comment to T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed.

QA should not be needed for the change being made to core (as this only affects method documentation).

Thu, Nov 30, 6:05 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz moved T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed from Needs review to Needs QA on the Trust and Safety Product Sprint (Sprint Shamisen) board.
Thu, Nov 30, 5:53 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz updated the task description for T350863: Create maintenance script to import all existing images to mediamoderation_scan table.
Thu, Nov 30, 5:51 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)

Wed, Nov 29

Dreamy_Jazz added a comment to T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event.

In the opposite direction, I have been working to remove indexes that are only used by maint scripts, as these scripts don't have a time constraint and can split the work into chunks and do it differently (the one I will do soon is the index on cl_collation in categorylinks). So if it's possible to split deletion into chunks and go through them, it would be much better.

The issue I have is that the "atomic unit" for a query using the method that I went with takes over a minute. If the vslow group was used, could the query take over a minute to run?

Usually it's fine to run one-minute-long queries in maint scripts as long as it's only on vslow and it's not like 1 million one-minute-long queries. I also wonder if there is a way to chop it into, let's say, 60 queries that take 1s each. Drop me the query somewhere and I'll try to see what can be done.

The query is the first listed in the task description.
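To make the chunking suggestion concrete, here is a minimal sketch in Python with SQLite of deleting rows in bounded batches rather than in one large query. The helper delete_in_batches, the batch size, and the use of SQLite's rowid are all assumptions for illustration; the real maintenance script and production databases will differ.

```python
import sqlite3


def make_demo_db(total_rows):
    """In-memory map table with reference IDs 1..total_rows (simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cu_useragent_clienthints_map ("
        " uachm_reference_id INTEGER NOT NULL,"
        " uachm_reference_type INTEGER NOT NULL,"
        " uachm_uach_id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO cu_useragent_clienthints_map VALUES (?, 0, 1)",
        [(i,) for i in range(1, total_rows + 1)],
    )
    conn.commit()
    return conn


def delete_in_batches(conn, below_reference_id, batch_size=1000):
    """Delete rows with a lower reference ID, at most batch_size per query.

    Returns the number of rows deleted by each batch.
    """
    deleted_per_batch = []
    while True:
        # rowid is SQLite-specific; it stands in for a primary key here.
        cur = conn.execute(
            "DELETE FROM cu_useragent_clienthints_map WHERE rowid IN ("
            "  SELECT rowid FROM cu_useragent_clienthints_map"
            "  WHERE uachm_reference_id < ? LIMIT ?)",
            (below_reference_id, batch_size),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            break
        deleted_per_batch.append(cur.rowcount)
    return deleted_per_batch
```

Each batch is its own short transaction, so no single query touches more than batch_size rows. The trade-off is more round trips in exchange for bounded per-query work.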

Wed, Nov 29, 12:19 AM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints
Dreamy_Jazz renamed T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed from MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as renderable when TimedMediaHandler extension is installed to MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed.
Wed, Nov 29, 12:15 AM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz moved T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed from In Progress to Needs review on the Trust and Safety Product Sprint (Sprint Shamisen) board.
Wed, Nov 29, 12:15 AM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz added a parent task for T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed: T350323: Write an empty row to scan table on file upload.
Wed, Nov 29, 12:14 AM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz added a subtask for T350323: Write an empty row to scan table on file upload: T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed.
Wed, Nov 29, 12:14 AM · Trust and Safety Product Sprint (Sprint Shamisen), MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0)
Dreamy_Jazz changed the point value for T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed from 2 to 1.
Wed, Nov 29, 12:11 AM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler

Tue, Nov 28

Dreamy_Jazz claimed T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed.
Tue, Nov 28, 11:49 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz added a comment to T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed.

I propose the following to fix this:

  • Make MediaModeration only accept files that have File::getMediaType return MEDIATYPE_BITMAP or MEDIATYPE_DRAWING
  • Ensure that MediaModerationFileProcessor::canScanFile is always called before attempting to scan file (even if the DB has an entry for it)
  • (potentially) Write a maintenance script to remove files from the scan table that do not have File::getMediaType return MEDIATYPE_BITMAP or MEDIATYPE_DRAWING.
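The filtering rule in this proposal can be illustrated with a minimal sketch in Python. FakeFile, can_scan_file and files_to_remove_from_scan_table are hypothetical stand-ins for File, MediaModerationFileProcessor::canScanFile and the potential cleanup script. The string values of the constants are assumed to mirror their names, and none of this is the extension's actual code.

```python
from dataclasses import dataclass

# Stand-ins for MediaWiki's MEDIATYPE_* constants (values assumed).
MEDIATYPE_BITMAP = "BITMAP"
MEDIATYPE_DRAWING = "DRAWING"
MEDIATYPE_AUDIO = "AUDIO"
MEDIATYPE_VIDEO = "VIDEO"

# Only these media types would be accepted for scanning.
SCANNABLE_MEDIA_TYPES = frozenset({MEDIATYPE_BITMAP, MEDIATYPE_DRAWING})


@dataclass
class FakeFile:
    """Hypothetical stand-in for a File object."""

    name: str
    media_type: str

    def get_media_type(self):
        return self.media_type


def can_scan_file(file):
    """Allow-list check: accept bitmaps and drawings, reject everything else."""
    return file.get_media_type() in SCANNABLE_MEDIA_TYPES


def files_to_remove_from_scan_table(files):
    """Names of files a cleanup pass would drop from the scan table."""
    return [f.name for f in files if not can_scan_file(f)]
```

An allow-list means that media types introduced by other extensions, such as the audio and video handlers from TimedMediaHandler, are rejected by default rather than needing to be excluded one by one.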
Tue, Nov 28, 11:48 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz added projects to T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed: Trust and Safety Product Team, Trust and Safety Product Sprint (Sprint Shamisen).
Tue, Nov 28, 11:33 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz created T352234: MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed.
Tue, Nov 28, 11:33 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Patch-For-Review, Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MediaModeration (MediaModeration 2.0), TimedMediaHandler
Dreamy_Jazz updated subscribers of T350681: Update purgeOldData.php maintenance script to look for and delete orphaned map rows.

Copying @Tchanders' alternative implementation method from the declined subtask to here:

Do we need the index?

I'm wondering if we really need the new indexes. Since the CheckUser tables are quite large (I didn't have the permissions to check on production but they're some of the largest locally), it's a fair lift storing and recalculating the new indexes, so we should tread carefully when it comes to adding new ones.

From reading the code, it seems we want to know the smallest reference ID in the map table that isn't orphaned, working on the assumption that reference IDs for a given reference type increase over time. So can we do this the other way round:

  • Find the reference ID from the oldest row in the changes table, e.g. SELECT cuc_this_oldid FROM cu_changes WHERE cuc_this_oldid != 0 ORDER BY cuc_timestamp LIMIT 1
  • Then delete all map rows for the same type with a lower uachm_reference_id
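The two bullets above can be sketched as follows in Python with SQLite. The SELECT is the one quoted in the first bullet. The simplified schema, the demo rows, and the helper names are assumptions, and reference type 0 is assumed to correspond to cu_changes, as in the QA steps elsewhere in this feed.

```python
import sqlite3


def make_demo_db():
    """In-memory DB with a few changes and map rows (simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cu_changes (
            cuc_this_oldid INTEGER NOT NULL,
            cuc_timestamp TEXT NOT NULL
        );
        CREATE TABLE cu_useragent_clienthints_map (
            uachm_reference_id INTEGER NOT NULL,
            uachm_reference_type INTEGER NOT NULL,
            uachm_uach_id INTEGER NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO cu_changes VALUES (?, ?)",
        [(0, "20231101000000"), (40, "20231102000000"), (41, "20231103000000")],
    )
    conn.executemany(
        "INSERT INTO cu_useragent_clienthints_map VALUES (?, ?, ?)",
        # IDs 38 and 39 of type 0 are orphaned; the type 1 row must be kept.
        [(38, 0, 1), (39, 0, 1), (40, 0, 1), (41, 0, 1), (39, 1, 1)],
    )
    return conn


def oldest_reference_id(conn):
    """First bullet: reference ID from the oldest row in the changes table."""
    row = conn.execute(
        "SELECT cuc_this_oldid FROM cu_changes "
        "WHERE cuc_this_oldid != 0 ORDER BY cuc_timestamp LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def delete_map_rows_below(conn, reference_id, reference_type=0):
    """Second bullet: delete map rows of the same type with a lower ID."""
    cur = conn.execute(
        "DELETE FROM cu_useragent_clienthints_map "
        "WHERE uachm_reference_type = ? AND uachm_reference_id < ?",
        (reference_type, reference_id),
    )
    return cur.rowcount
```

Neither query filters on cuc_this_oldid, so this sketch does not rely on the indexes discussed above. It is only correct to the extent that reference IDs increase over time.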

Does the increasing reference ID assumption hold?

Not completely, I think due to sharding - e.g. here are the oldest on enwiki:

(enwiki)> SELECT cuc_this_oldid FROM cu_changes ORDER BY cuc_timestamp LIMIT 10;
+----------------+
| cuc_this_oldid |
+----------------+
|     1172847017 |
|     1172847022 |
|     1172847019 |
|     1172847030 |
|     1172847028 |
|     1172847021 |
|     1172847024 |
|     1172847026 |
|     1172847027 |
|     1172847018 |
+----------------+
10 rows in set (0.001 sec)

This affects both the implementation from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/977267 and the method suggested in this comment.

We could adapt the query above to look for the lowest ID in the first n rows sorted by timestamp instead, to ensure we almost definitely get the lowest ID. And/or we could decide we're OK with missing a very small amount of client hints data for the oldest changes, or with sometimes keeping a few data points for a few extra days.
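One way to read "the lowest ID in the first n rows sorted by timestamp" is sketched below in Python with SQLite. The helper name, the value of n, and the out-of-order demo data are assumptions chosen to show the effect; they are not taken from production.

```python
import sqlite3


def make_demo_db():
    """Changes whose IDs are slightly out of order relative to timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cu_changes ("
        " cuc_this_oldid INTEGER NOT NULL,"
        " cuc_timestamp TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO cu_changes VALUES (?, ?)",
        [
            (103, "20230901000001"),
            (101, "20230901000002"),
            (102, "20230901000003"),
            (104, "20230901000004"),
        ],
    )
    return conn


def lowest_id_in_oldest_rows(conn, n):
    """Lowest cuc_this_oldid among the n oldest rows by timestamp."""
    row = conn.execute(
        "SELECT MIN(cuc_this_oldid) FROM ("
        "  SELECT cuc_this_oldid FROM cu_changes"
        "  WHERE cuc_this_oldid != 0"
        "  ORDER BY cuc_timestamp LIMIT ?"
        ") AS oldest_rows",
        (n,),
    ).fetchone()
    return row[0]
```

With this demo data, n = 1 picks up 103 even though 101 and 102 are still live, while n = 3 finds 101. A larger n lowers the chance of deleting map rows that are not yet orphaned, at the cost of reading a few more rows.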

Tue, Nov 28, 4:20 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz moved T350681: Update purgeOldData.php maintenance script to look for and delete orphaned map rows from Needs review to In Progress on the Trust and Safety Product Sprint (Sprint Shamisen) board.
Tue, Nov 28, 4:20 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz closed T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event as Declined.

I think it's clear that these indexes will not be added for the time being. I will explore different ways to avoid the need for these indexes in the parent task.

Tue, Nov 28, 4:11 PM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints
Dreamy_Jazz updated the task description for T350681: Update purgeOldData.php maintenance script to look for and delete orphaned map rows.
Tue, Nov 28, 4:11 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz closed T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event, a subtask of T350681: Update purgeOldData.php maintenance script to look for and delete orphaned map rows, as Declined.
Tue, Nov 28, 4:10 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz added a comment to T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event.

Sorry I wanted to respond sooner but got sick (and today we had an outage). I actually wanted to say the exact same thing as Thalia. We are really short of space in core dbs and don't have much room to breathe. These tables are big already (e.g. in wikidata it's 12GB, which is not small but not super large either; in enwiki it's 6.4GB).

Thanks for the thoughts and no worries on any delay.

Tue, Nov 28, 4:05 PM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints

Mon, Nov 27

Dreamy_Jazz added a comment to T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event.

Do we need the index?

I'm wondering if we really need the new indexes. Since the CheckUser tables are quite large (I didn't have the permissions to check on production but they're some of the largest locally), it's a fair lift storing and recalculating the new indexes, so we should tread carefully when it comes to adding new ones.

From reading the code, it seems we want to know the smallest reference ID in the map table that isn't orphaned, working on the assumption that reference IDs for a given reference type increase over time. So can we do this the other way round:

  • Find the reference ID from the oldest row in the changes table, e.g. SELECT cuc_this_oldid FROM cu_changes WHERE cuc_this_oldid != 0 ORDER BY cuc_timestamp LIMIT 1
  • Then delete all map rows for the same type with a lower uachm_reference_id

Does the increasing reference ID assumption hold?

Not completely, I think due to sharding - e.g. here are the oldest on enwiki:

(enwiki)> SELECT cuc_this_oldid FROM cu_changes ORDER BY cuc_timestamp LIMIT 10;
+----------------+
| cuc_this_oldid |
+----------------+
|     1172847017 |
|     1172847022 |
|     1172847019 |
|     1172847030 |
|     1172847028 |
|     1172847021 |
|     1172847024 |
|     1172847026 |
|     1172847027 |
|     1172847018 |
+----------------+
10 rows in set (0.001 sec)

This affects both the implementation from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/977267 and the method suggested in this comment.

We could adapt the query above to look for the lowest ID in the first n rows sorted by timestamp instead, to ensure we almost definitely get the lowest ID. And/or we could decide we're OK with missing a very small amount of client hints data for the oldest changes, or with sometimes keeping a few data points for a few extra days.

I had not thought of this approach and in theory it seems better than trying to add new indexes. However, I see some more problems that would need solving:

  • Implementing T170148 would mean we could no longer rely on the timestamp correlating (within a small margin of error) with the ID of the associated entry in the cu_log_event table. T170148 has received legal approval and is now technically possible to implement. IMO it would be useful for checkuser investigations.
  • If we ever implement searching by Client Hints data, these indexes will be needed. The equivalent task for user agent strings (T146837) seemed to be desired by CUs.
Mon, Nov 27, 6:55 PM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints
Dreamy_Jazz closed T346044: Remove CheckUserUnionQueryBuilder, a subtask of T324907: Create separate tables for log events in CheckUser, as Resolved.
Mon, Nov 27, 5:31 PM · Epic, MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), MW-1.40-notes (1.40.0-wmf.22; 2023-02-06), Patch-For-Review, Anti-Harassment, Schema-change, CheckUser
Dreamy_Jazz closed T346044: Remove CheckUserUnionQueryBuilder, a subtask of T337159: Make PHPUnit dataProvider static in CheckUser tests, as Resolved.
Mon, Nov 27, 5:31 PM · Trust and Safety Product Sprint (Sprint Shamisen), MW-1.41-notes (1.41.0-wmf.27; 2023-09-19), Anti-Harassment (AHaT Sprint 32 - Baseball Cap), CheckUser
Dreamy_Jazz closed T346044: Remove CheckUserUnionQueryBuilder as Resolved.
Mon, Nov 27, 5:31 PM · Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), CheckUser
Dreamy_Jazz updated the task description for T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event.
Mon, Nov 27, 4:37 PM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints
Dreamy_Jazz moved T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event from Unsorted to Add / Create on the Schema-change board.
Mon, Nov 27, 4:35 PM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints
Dreamy_Jazz updated the task description for T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event.
Mon, Nov 27, 4:35 PM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints
Dreamy_Jazz updated the task description for T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event.
Mon, Nov 27, 4:34 PM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints
Dreamy_Jazz updated the task description for T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event.
Mon, Nov 27, 4:34 PM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints
Dreamy_Jazz closed T337159: Make PHPUnit dataProvider static in CheckUser tests, a subtask of T332865: PHPUnit data providers should be simple static functions that return plain data, as Resolved.
Mon, Nov 27, 4:25 PM · MW-1.41-notes (1.41.0-wmf.20; 2023-08-01), Patch-For-Review, MediaWiki-General
Dreamy_Jazz closed T337159: Make PHPUnit dataProvider static in CheckUser tests as Resolved.
Mon, Nov 27, 4:24 PM · Trust and Safety Product Sprint (Sprint Shamisen), MW-1.41-notes (1.41.0-wmf.27; 2023-09-19), Anti-Harassment (AHaT Sprint 32 - Baseball Cap), CheckUser
Dreamy_Jazz moved T337159: Make PHPUnit dataProvider static in CheckUser tests from Blocked/Stalled 🚧 to Done Q1 2023-2024 on the Anti-Harassment (AHaT Sprint 32 - Baseball Cap) board.

@Tchanders yes. As such I will move it to done.

Mon, Nov 27, 4:23 PM · Trust and Safety Product Sprint (Sprint Shamisen), MW-1.41-notes (1.41.0-wmf.27; 2023-09-19), Anti-Harassment (AHaT Sprint 32 - Baseball Cap), CheckUser
Dreamy_Jazz moved T337159: Make PHPUnit dataProvider static in CheckUser tests from Needs review to Done on the Trust and Safety Product Sprint (Sprint Shamisen) board.
Mon, Nov 27, 4:23 PM · Trust and Safety Product Sprint (Sprint Shamisen), MW-1.41-notes (1.41.0-wmf.27; 2023-09-19), Anti-Harassment (AHaT Sprint 32 - Baseball Cap), CheckUser
Dreamy_Jazz changed the point value for T350681: Update purgeOldData.php maintenance script to look for and delete orphaned map rows from 3 to 2.
Mon, Nov 27, 3:56 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz changed the point value for T350681: Update purgeOldData.php maintenance script to look for and delete orphaned map rows from 2 to 3.
Mon, Nov 27, 3:56 PM · MW-1.42-notes (1.42.0-wmf.9; 2023-12-12), Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team, CheckUser, http-client-hints
Dreamy_Jazz added a project to T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event: DBA.

Would welcome any thoughts from DBA on the proposed patch and in general about adding indexes.

Mon, Nov 27, 3:35 PM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints
Dreamy_Jazz closed T336049: Address unsafe regular expression in checkuser/cidr.js as Resolved.
Mon, Nov 27, 3:30 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), CheckUser
Dreamy_Jazz claimed T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event.
Mon, Nov 27, 1:00 PM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints
Dreamy_Jazz set the point value for T341829: Enable read new for the event table migration to 2.
Mon, Nov 27, 12:25 PM · Patch-For-Review, Trust and Safety Product Team, Trust and Safety Product Sprint (Sprint Shamisen), Anti-Harassment, CheckUser
Dreamy_Jazz added projects to T341829: Enable read new for the event table migration: Trust and Safety Product Sprint (Sprint Shamisen), Trust and Safety Product Team.
Mon, Nov 27, 12:25 PM · Patch-For-Review, Trust and Safety Product Team, Trust and Safety Product Sprint (Sprint Shamisen), Anti-Harassment, CheckUser
Dreamy_Jazz updated the task description for T341829: Enable read new for the event table migration.
Mon, Nov 27, 12:24 PM · Patch-For-Review, Trust and Safety Product Team, Trust and Safety Product Sprint (Sprint Shamisen), Anti-Harassment, CheckUser
Dreamy_Jazz updated the task description for T324907: Create separate tables for log events in CheckUser.
Mon, Nov 27, 11:30 AM · Epic, MW-1.41-notes (1.41.0-wmf.9; 2023-05-15), MW-1.40-notes (1.40.0-wmf.22; 2023-02-06), Patch-For-Review, Anti-Harassment, Schema-change, CheckUser
Dreamy_Jazz updated subscribers of T351944: Create indexes cuc_this_oldid for cu_changes and cule_log_id for cu_log_event.

With assistance from @kostajh, I was able to determine that a single query on wikidatawiki to check whether a given revision ID is in the cu_changes table can take over a minute to run. The maintenance script could issue up to 100 of these queries per run, and the first few runs will reach that maximum, so in the worst case a single run could spend more than 100 minutes on these queries alone. This will be too inefficient without an index.

Mon, Nov 27, 11:26 AM · DBA, Trust and Safety Product Sprint (Sprint Shamisen), Schema-change, Trust and Safety Product Team, http-client-hints