User Details
- User Since
- Apr 16 2024, 4:22 PM (112 w, 3 d)
- Availability
- Available
- LDAP User
- Amdrel
- MediaWiki User
- Amdrel
Yesterday
@JSengupta-WMF Thank you for the design input. I've created a patch that adds this notification.
Tue, Jun 2
This information can be stored in either the information template or structured data (or both?). Should we prioritize one over the other?
Fri, May 22
I've noticed that there's a retry button already at the bottom of the page that re-attempts all of the failed uploads, including ones that aren't fixable. Was this ticket raised to address visibility issues? I didn't notice the button initially as I had to scroll down to the bottom of the page when uploading a batch of files. Adding buttons to the individual file cards next to the "Remove" button would help with that.
Mon, May 18
@Soda My apologies, I retract what I said earlier. After testing again to create the steps I noticed that I missed that it got added to the right category. I must have glanced over it.
Fri, May 15
I was able to reproduce this with a local development setup of MediaWiki with UploadWizard installed.
May 11 2026
Feb 25 2026
The script finished and I found 135 videos in total that need re-uploads. Attached is a CSV file with the full list of titles and YouTube IDs. I also excluded AI-upscaled videos (a new YouTube feature). It's actually a bit fewer than I expected.
Feb 24 2026
Thank you for the examples! I'm also running a script that should get a (mostly) complete list of videos with higher resolutions available. I'll update the thread with the list once it's complete.
Feb 23 2026
I took a look at this query's 20 most recent results, and 15 of the 20 videos downloaded correctly. The remaining 5 videos are at a lower resolution than the original source. All 5 were downloaded manually by the user before being uploaded to video2commons, rather than fetched directly from YouTube by video2commons. Most likely, the manual download method these users relied on failed to fetch the full-resolution versions. I downloaded each of these videos using yt-dlp and got the full-resolution versions, as expected.
Feb 16 2026
Hiding non-CC-BY videos from the upload list could cause problems when uploading public domain content that we have the right to upload. One example I came across is government works, which, depending on the laws of the country, are often public domain, though not always. Videos uploaded by the U.S. federal government often don't have public domain labels on YouTube even though they may be fine to upload (e.g. videos on the White House channel). I think we should instead inform the user about the license of each video in the playlist and preselect any videos that have valid licenses. Here is an example of what this would look like:
Feb 9 2026
The workflows have run successfully. I logged into the instances to check the status of the workers and it also looks like it's restarting gracefully as we hoped it would.
Feb 3 2026
We've got a working deployment script for the encoders merged now. The next step is to automatically run it when code is merged.
Jan 16 2026
The infodump is appreciated! Though admittedly a lot of this is out of my depth. It's good to know what our options are on Cloud VPS.
Dec 18 2025
As far as restarting the instances gracefully goes, using the systemd command I suggested should save you from having to keep a login shell open for restarts while waiting for ffmpeg subprocesses to finish. However, that doesn't solve the problem of automatically deploying code to all of the instances at once (this is currently manual as far as I know).
Dec 5 2025
Not directly; however, there are a couple of things we can do. We can tweak settings sent to the encoder, such as look-ahead distance, and see if that results in less memory being used, but that may affect output quality and file size. We could also try lowering the number of threads for each ffmpeg process we spawn, and I think this is what we should try first. I noticed that we're currently creating 16 threads per process (matches core count, the default for libsvtav1), which is a bit excessive if there are multiple tasks being processed at once. As currently configured, a worker processing 3 tasks has 48 threads competing for 16 CPU cores. More threads result in more memory being used, so worker nodes that process more than 1 video concurrently are using more memory than needed.
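The thread arithmetic above can be sketched as follows. This is only an illustration, assuming one ffmpeg process per task and an even split of cores; the function names are hypothetical and are not part of video2commons.

```python
def total_encoder_threads(tasks: int, threads_per_process: int) -> int:
    """Total encoder threads when each task runs one ffmpeg process."""
    return tasks * threads_per_process


def threads_per_process_for(cores: int, tasks: int) -> int:
    """Split the cores evenly across concurrent tasks (at least 1 thread each)."""
    return max(1, cores // tasks)


if __name__ == "__main__":
    cores, tasks = 16, 3
    # Current behaviour described above: 16 threads for each of 3 tasks.
    print(total_encoder_threads(tasks, 16))
    # One possible per-process cap that avoids oversubscribing the cores.
    print(threads_per_process_for(cores, tasks))
```

With 16 cores and 3 concurrent tasks, an even split gives 5 threads per process (15 in total) instead of 48. Whether that actually lowers memory use enough would still need to be measured.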
AV1 transcoding can be a bit of a memory hog. encoding04 is currently processing a single job and it's using 12.0G of memory (RES). The video being converted is a 4k Apple ProRes 422 file. Processing just two of these files concurrently would result in OOM errors.
After looking deeper into the logs, I can see the OOM killer has been getting invoked around the time instances stop reporting to the dashboard. I also saw several log entries indicating network errors afterwards, though the instance did remain up. I see lots of Prometheus write errors after the OOM killer got to work.
I've gained access and I noticed a couple of things so far. The new cron job and old cron jobs aren't running due to the following errors:
Nov 26 2025
Nov 25 2025
Right now it appears that the video2commons application isn't able to tell which users are administrators or not. I've found that the LDAP API hosted at https://ldap.toolforge.org allows me to programmatically query members of the tools.video2commons group, and it corresponds to the maintainers list on toolsadmin. I can modify video2commons to check against that list to determine whether a user can restart tasks belonging to other users.
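The permission check described above could look roughly like this. It is a minimal sketch that assumes the maintainer list has already been fetched from the LDAP group; the fetch itself is left out, and `can_restart_task` and its parameters are hypothetical names, not existing video2commons code.

```python
from typing import AbstractSet


def can_restart_task(username: str, task_owner: str,
                     maintainers: AbstractSet[str]) -> bool:
    """Allow a restart if the user owns the task or is a tool maintainer.

    `maintainers` is assumed to hold the members of the
    tools.video2commons group, loaded elsewhere.
    """
    return username == task_owner or username in maintainers


if __name__ == "__main__":
    maintainers = frozenset({"ExampleMaintainer"})  # placeholder name
    print(can_restart_task("ExampleMaintainer", "SomeUploader", maintainers))
    print(can_restart_task("OtherUser", "SomeUploader", maintainers))
```

Caching the maintainer set for a short period would avoid querying the LDAP API on every request, at the cost of membership changes taking a little while to apply.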
Nov 24 2025
I may have been mistaken in my initial assessment. I did some testing with Celery and V2C locally and I was able to observe that initiating a restart was gracefully waiting for uploads to complete although we have no explicit signal handling. Celery seems to be handling this by default, however, there is one major caveat I encountered that may be relevant.
Hopefully that works. If it does, should we reconsider having AV1 as the default transcoding target in the longer term? At least while we only have 16 GiB of memory. Having to limit how many workers we have due to this change could potentially back up the tool a lot. If we change the default or hide the option, AV1 videos in an MP4 container should still upload fine since the video data can be remuxed by ffmpeg, which shouldn't take as much memory, though I'd like to verify that. I believe most recently uploaded YouTube videos at or over 1080p in size use AV1 with either an MP4 or WebM container.
Has this gotten worse in the last couple of weeks? AV1 transcoding uses a lot more memory than VP9 encoding, so that may be what's causing this.
Nov 21 2025
I opened a ticket in GitHub: https://github.com/toolforge/video2commons/issues/265
I opened a ticket: https://github.com/toolforge/video2commons/issues/264
Not yet! I have requested access just now: https://toolsadmin.wikimedia.org/tools/membership/status/2064
Nov 7 2025
When I download the example video you linked to I can see GPS coordinates in the global metadata labeled as LOCATION and LOCATION-eng when I probe the file with ffprobe. Here is the output that I get:
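For anyone who wants to script the same check, here is a minimal sketch. It assumes ffprobe is available on PATH; the helper names are hypothetical, and it does not reproduce the output from the example video.

```python
import json
import subprocess


def probe_format_tags(path: str) -> dict:
    """Return the container-level (global) metadata tags reported by ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout).get("format", {}).get("tags", {})


def location_tags(tags: dict) -> dict:
    """Keep only tags whose key starts with LOCATION (case-insensitive)."""
    return {k: v for k, v in tags.items() if k.upper().startswith("LOCATION")}
```

Calling `location_tags(probe_format_tags("video.webm"))` on a local copy (the file name is a placeholder) would return the LOCATION and LOCATION-eng entries if they are present in the global metadata.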
Nov 3 2025
Here are the transcoded videos: https://drive.google.com/file/d/11rW4A0XqAWhJWYjNBHwIBcYm9yRTu_F0/view?usp=sharing
Oct 30 2025
I cannot overwrite the files since they were not uploaded by me. I can provide the files to you if that's okay, though.
Oct 28 2025
In my testing when working with a single input and output file ffmpeg seemed to keep most global standard metadata.
Oct 22 2025
Is this something that can still be reproduced? I don't have access to the deployed V2C, but downloading and transcoding all of these videos with my local version of V2C doesn't result in black videos.
Jun 26 2025
It looks like this bug was introduced when we added short-circuit logic at the beginning of the round that checks if all remaining hopefuls are eligible to be elected or eliminated with the current remaining seat count. This change was introduced to work around the infinite loop bug, if I recall correctly (http://phabricator.wikimedia.org/T291821). If I instead move that logic to the end of the round (and update the local round object), then we get an elected message as well. There is a problem, though: the quota doesn't align with the new elected winner, which looks suspect. Is this fine? If not, would it be fine if we also added a note that says, "Candidate 'A' elected to fill remaining seats"? The same could also apply where we eliminate remaining hopefuls once we reach the seat limit.
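To make the proposal concrete, here is a rough sketch of an end-of-round short circuit that emits the suggested notes. It is written in Python purely for illustration and is not the SecurePoll implementation; `finish_round` and its return shape are hypothetical.

```python
from typing import List, Tuple


def finish_round(hopefuls: List[str],
                 seats_remaining: int) -> Tuple[List[str], List[str], List[str]]:
    """Return (elected, eliminated, notes) for an end-of-round short circuit."""
    if seats_remaining <= 0:
        # Seat limit reached: every remaining hopeful is eliminated.
        notes = [f"Candidate '{c}' eliminated: all seats filled" for c in hopefuls]
        return [], list(hopefuls), notes
    if len(hopefuls) <= seats_remaining:
        # Enough seats for everyone left, so elect them without meeting quota.
        notes = [f"Candidate '{c}' elected to fill remaining seats" for c in hopefuls]
        return list(hopefuls), [], notes
    # Otherwise the count continues normally in the next round.
    return [], [], []
```

The explicit note is what tells a reader that a candidate was elected below quota, which is the case that currently looks suspect on the tally page.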
Sounds good to me. I split the PR in two:
Jun 25 2025
I have a patch open for this: https://github.com/WikipediaLibrary/externallinks/pull/441
@Amdrel I'm looking through the order of operations for reviewing your PRs and just wanted to make sure I was reviewing them in the order you intended: Is this the correct order?
Jun 24 2025
We also need to run the new fill commands so the program totals continue to render properly. How far back do we want to go for archiving aggregates? Would aggregates older than a year work?
Jun 16 2025
It looks like all of the ballots have the same root cause.
Jun 13 2025
After investigating 12_9_4996_1067098093, I believe the new result may actually be more correct. Candidate 9 does appear to have slightly more votes because there is a single ballot with multiple candidates chosen (1 1 2 3 4 5 6 7 8 9 0 in the .blt file) that includes candidate 9. Candidate 9 has the same number of single votes as candidates 10, 11 and 12 do. The votes from the referenced ballot get added to candidate 9 during round 2. I suspect the previous implementation, which used floating point numbers, was unable to account for the extra low-ranked vote. I would have to debug the old version and compare to be sure. If this is the case, the fact that OpenSTV can reproduce this result with 15 points of precision may make sense, though 10 points of precision should have been enough, which seems odd to me.
Jun 10 2025
Sorry. I have edited my above comments to include links to the after results in a user page.
Jun 5 2025
The quota and votes for candidates A and B don't exactly match. I think this is possibly just because we need to update the precision on the tally page to reflect the above changes. @STran will know more.
Also notice how before we had a final round where Candidate A is explicitly elected. We no longer seem to do that and it might look as if they are being elected with ~509 votes when the quota is ~669.
Jun 3 2025
I looked into how the pagers work a bit more and it looks like we would have to render a table directly with all tallies instead of using a pager if we move everything to securepoll_properties.
If we're certain that adding another table will cause issues (albeit one that should be small) then I can explore using the properties table.
Jun 2 2025
May 29 2025
May 28 2025
May 27 2025
May 23 2025
I spent some time looking into this. I performed the investigation under the assumption that any new databases we might add would run on the same instance and disk as the current database, since everything for this project runs together on one instance to help lower costs.
May 21 2025
May 7 2025
May 6 2025
Apr 25 2025
As far as implementing tally modifiers in the UI goes, would it make more sense to create a new page for editing these separate from CreatePage? At least in the case of eliminated candidates it wouldn't make sense to include this during creation, though it would make sense if being edited.
Apr 24 2025
Apr 8 2025
I'm thinking it would make sense to move tallies into their own one-to-many table (election -> tallies) rather than keeping the results in securepoll_properties. We should be able to migrate all existing tallies into this new table.
Apr 7 2025
Apr 4 2025
Mar 27 2025
I implemented the calculation of monthly totals in my patch, and that solves the issue of having to pull most of the archives from object storage to calculate program-level totals. It is performant and doesn't result in much additional database bloat (~200MB, mostly coming from user totals). Filters still work as well.
Mar 19 2025
My monthly aggregate jobs are complete and I've got some stats on what these archives will look like. To work with the existing filters I have the archives split by organisation_id, collection_id, full_date and on_user_list (if false include all aggregates).
Mar 18 2025
I'm worried about how this is going to work with the date range filters on pages like /organisations/<pk> and /programs/<pk> if we pulled archive .json files over XHR client-side to be rendered. We need to be able to support any range of months for the chart and totals, so the archives need to be split up to enable that. If we have archives of aggregate data split by month and data from the last 6 years is requested for an organization's collection then that could be over 60 archives that need to be downloaded to generate the graph and calculate the totals below it. I don't know exactly how large the final archives are going to be yet file size wise since I'm still running the monthly aggregate jobs against my local data.
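To put a number on the concern above, here is a small sketch that lists the monthly archives a date range would touch. It assumes one archive per calendar month per collection; `months_in_range` is a hypothetical helper, not part of the codebase.

```python
from datetime import date
from typing import List


def months_in_range(start: date, end: date) -> List[str]:
    """List every month from start to end (inclusive) as YYYY-MM strings."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


if __name__ == "__main__":
    # A six-year range for a single collection.
    archives = months_in_range(date(2019, 4, 1), date(2025, 3, 31))
    print(len(archives))
```

A full six-year range comes out at 72 monthly archives for a single collection, before any further split by other filter fields.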
Mar 14 2025
Mar 12 2025
The CSV downloads for pages would have to pull from the aggregate archives since link stats per page are included, but for the website view it's grouped by project name instead.
I've marked the PR as ready to review. I briefly looked at write performance and haven't identified any unused indexes or any other issues with the writes themselves causing slowdowns. We're still bottlenecked by SELECT operations that precede writes to both aggregates and link events.
When we archive aggregate data and remove it from the tables it won't be viewable from the programs and organisations pages anymore. How far back are we planning to keep data? We could make a required date option for the aggregate archival commands so we have control over that regardless of what retention period we decide on.
Mar 11 2025
I uploaded a patch that makes the reason required only for vanish rejections while retaining the old behavior for renames.
Mar 5 2025
I have a work in progress PR here that adds a couple: https://github.com/WikipediaLibrary/externallinks/pull/417
Mar 4 2025
The patch I just added moves the execution of the automatic vanish to a job which can be configured to execute on Meta-Wiki.