
mfossati (Marco Fossati)
Software Engineer, Structured Content

User Details

User Since
Jan 6 2022, 7:27 PM (127 w, 6 d)
Availability
Available
LDAP User
Marco Fossati
MediaWiki User
MFossati (WMF) [ Global Accounts ]

Recent Activity

Yesterday

mfossati moved T360515: UploadWizard doesn't remember any more "release rights" step decisions from Code Review to Needs QA on the Structured-Data-Backlog (Current Work) board.
Wed, Jun 19, 1:43 PM · MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Regression, Structured-Data-Backlog (Current Work), UploadWizard

Thu, Jun 6

mfossati updated the task description for T364374: [L] Prepare image suggestions for a new set of Wikipedias.
Thu, Jun 6, 10:52 AM · Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Level-Image-Suggestions

Tue, Jun 4

mfossati moved T361045: [L] Improve the "use" step in the upload wizard from Code Review to Needs QA on the Structured-Data-Backlog (Current Work) board.
Tue, Jun 4, 5:38 PM · MW-1.43-notes (1.43.0-wmf.9; 2024-06-11), Structured-Data-Backlog (Current Work), UploadWizard

Mon, Jun 3

mfossati updated the task description for T364374: [L] Prepare image suggestions for a new set of Wikipedias.
Mon, Jun 3, 11:03 AM · Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Level-Image-Suggestions
mfossati closed T350007: [M] Adapt image suggestions to comply with breaking database schema changes as Resolved.
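from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
isu = spark.read.table('analytics_platform_eng.image_suggestions_suggestions')
alis = isu.where('section_index is null')  # article-level image suggestions (ALIS)
slis = isu.where('section_index is not null')  # section-level image suggestions (SLIS)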
Mon, Jun 3, 10:57 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
mfossati closed T350007: [M] Adapt image suggestions to comply with breaking database schema changes, a subtask of T340437: [EPIC] Data pipelines maintenance , as Resolved.
Mon, Jun 3, 10:57 AM · Epic, Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Topics, Essential-Work, Section-Level-Image-Suggestions
mfossati reopened T364374: [L] Prepare image suggestions for a new set of Wikipedias, a subtask of T340437: [EPIC] Data pipelines maintenance , as In Progress.
Mon, Jun 3, 10:56 AM · Epic, Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Topics, Essential-Work, Section-Level-Image-Suggestions
mfossati reopened T364374: [L] Prepare image suggestions for a new set of Wikipedias as "In Progress".
Mon, Jun 3, 10:56 AM · Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Level-Image-Suggestions
mfossati closed T364374: [L] Prepare image suggestions for a new set of Wikipedias as Resolved.
Mon, Jun 3, 10:55 AM · Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Level-Image-Suggestions
mfossati closed T364374: [L] Prepare image suggestions for a new set of Wikipedias, a subtask of T340437: [EPIC] Data pipelines maintenance , as Resolved.
Mon, Jun 3, 10:55 AM · Epic, Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Topics, Essential-Work, Section-Level-Image-Suggestions
mfossati updated the task description for T364374: [L] Prepare image suggestions for a new set of Wikipedias.
Mon, Jun 3, 10:15 AM · Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Level-Image-Suggestions

Fri, May 31

mfossati added a comment to T361045: [L] Improve the "use" step in the upload wizard.

@Etonkovidova @Sneha FYI, the patch is currently reverted, so the change won't show up on Beta until we re-merge it.

Fri, May 31, 2:12 PM · MW-1.43-notes (1.43.0-wmf.9; 2024-06-11), Structured-Data-Backlog (Current Work), UploadWizard
mfossati moved T366266: Make captions optional when inputting descriptions from Code Review to Needs QA on the Structured-Data-Backlog (Current Work) board.
Fri, May 31, 12:52 PM · MW-1.43-notes (1.43.0-wmf.8; 2024-06-04), Structured-Data-Backlog (Current Work)
mfossati changed the status of T364374: [L] Prepare image suggestions for a new set of Wikipedias from Open to In Progress.
Fri, May 31, 11:07 AM · Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Level-Image-Suggestions
mfossati changed the status of T364374: [L] Prepare image suggestions for a new set of Wikipedias, a subtask of T340437: [EPIC] Data pipelines maintenance , from Open to In Progress.
Fri, May 31, 11:07 AM · Epic, Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Topics, Essential-Work, Section-Level-Image-Suggestions
mfossati added a comment to T361061: [M] Update the 'other information' field in upload wizard.

@Etonkovidova @Sneha, I didn't add that horizontal line because another one shows up in case of multiple uploads, so I left it out.

Fri, May 31, 10:22 AM · MW-1.43-notes (1.43.0-wmf.8; 2024-06-04), Structured-Data-Backlog (Current Work), UploadWizard

Thu, May 30

mfossati changed the status of T362328: [L] Improve the date field in describe step of upload wizard from Open to In Progress.
Thu, May 30, 10:10 AM · Patch-For-Review, UploadWizard, Structured-Data-Backlog (Current Work)
mfossati changed the status of T362328: [L] Improve the date field in describe step of upload wizard, a subtask of T358765: [EPIC] Describe step UX improvements in the UW on Commons, from Open to In Progress.
Thu, May 30, 10:10 AM · Epic, UploadWizard, Structured-Data-Backlog (Current Work)
mfossati moved T361061: [M] Update the 'other information' field in upload wizard from Code Review to Needs QA on the Structured-Data-Backlog (Current Work) board.
Thu, May 30, 10:07 AM · MW-1.43-notes (1.43.0-wmf.8; 2024-06-04), Structured-Data-Backlog (Current Work), UploadWizard

Wed, May 29

mfossati moved T361045: [L] Improve the "use" step in the upload wizard from Code Review to Needs QA on the Structured-Data-Backlog (Current Work) board.
Wed, May 29, 5:58 PM · MW-1.43-notes (1.43.0-wmf.9; 2024-06-11), Structured-Data-Backlog (Current Work), UploadWizard
KStoller-WMF awarded T364374: [L] Prepare image suggestions for a new set of Wikipedias a Like token.
Wed, May 29, 1:23 PM · Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Level-Image-Suggestions
mfossati added a comment to T364374: [L] Prepare image suggestions for a new set of Wikipedias.

Hey @KStoller-WMF, chiming in while @AUgolnikova-WMF is out of office: yes, I'll pick up this ticket next week. Stay tuned!

Wed, May 29, 9:32 AM · Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Level-Image-Suggestions

Tue, May 28

mfossati updated the task description for T363707: UploadWizard homeButton mal formatted link.
Tue, May 28, 3:02 PM · MW-1.43-notes (1.43.0-wmf.8; 2024-06-04), Structured-Data-Backlog (Current Work), UploadWizard

Mon, May 27

mfossati moved T361061: [M] Update the 'other information' field in upload wizard from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
Mon, May 27, 2:49 PM · MW-1.43-notes (1.43.0-wmf.8; 2024-06-04), Structured-Data-Backlog (Current Work), UploadWizard

Wed, May 22

mfossati changed the status of T361061: [M] Update the 'other information' field in upload wizard from Open to In Progress.
Wed, May 22, 3:24 PM · MW-1.43-notes (1.43.0-wmf.8; 2024-06-04), Structured-Data-Backlog (Current Work), UploadWizard
mfossati changed the status of T361061: [M] Update the 'other information' field in upload wizard, a subtask of T358765: [EPIC] Describe step UX improvements in the UW on Commons, from Open to In Progress.
Wed, May 22, 3:24 PM · Epic, UploadWizard, Structured-Data-Backlog (Current Work)

May 15 2024

mfossati moved T361050: [XL] Improve how categories field is displayed in the upload wizard from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
May 15 2024, 2:33 PM · MW-1.43-notes (1.43.0-wmf.6; 2024-05-21), Structured-Data-Backlog (Current Work), UploadWizard
mfossati updated mfossati.
May 15 2024, 2:15 PM
mfossati updated the task description for T364551: [SPIKE] Send an image thumbnail to the logo detection service within Upload Wizard.
May 15 2024, 9:20 AM · Patch-For-Review, Structured-Data-Backlog (Current Work), Machine-Learning-Team

May 14 2024

mfossati added a comment to T363506: Pass image objects to the logo detection service.

We concluded that we will figure out the format after the team figures out the spike (accessing the image and sending a thumbnail to Lift Wing).

See T364551: [SPIKE] Send an image thumbnail to the logo detection service within Upload Wizard

I'd suggest we proceed with a base64 encoded image for now.

With binary being the preferred format, right?
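One way to weigh base64 against binary is payload size: base64 encodes every 3 bytes as 4 ASCII characters, so a base64 body is roughly a third larger than the raw bytes. The sketch below uses only the Python standard library; the placeholder byte string stands in for a real thumbnail and is an assumption, not actual image data.

```python
import base64
import math


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * math.ceil(n_bytes / 3)


# Placeholder bytes standing in for an encoded image thumbnail
raw = bytes(range(256)) * 4  # 1024 bytes
encoded = base64.b64encode(raw)

print(len(raw), len(encoded))  # 1024 1368
assert len(encoded) == base64_size(len(raw))
```

That overhead, applied to every image in a batch, is one reason binary would be preferable if the service can accept it.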

May 14 2024, 2:23 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati renamed T364551: [SPIKE] Send an image thumbnail to the logo detection service within Upload Wizard from [SPIKE] Resize an image file to 224x224 pixels within Upload Wizard to [SPIKE] Send an image thumbnail to the logo detection service within Upload Wizard.
May 14 2024, 2:21 PM · Patch-For-Review, Structured-Data-Backlog (Current Work), Machine-Learning-Team

May 13 2024

mfossati added a comment to T362749: Deploy logo-detection model-server to LiftWing staging.

Yes, Upload stash shouldn't be accessed directly or indirectly. It is internal to MediaWiki and private.

Having it private makes total sense from a user privacy point of view. This would also mean that sending image thumbnails from the stash to Lift Wing is out of the question.

I think that the logo detection service can be exposed through an internal endpoint, so it will be inside WMF’s infrastructure.
Moreover, when an image is sent to the upload stash, a set of checks already runs, including ones for existing duplicates and previously deleted duplicates.

May 13 2024, 2:23 PM · Machine-Learning-Team
mfossati added a comment to T362749: Deploy logo-detection model-server to LiftWing staging.

you can just send over the file to liftwing maybe? (we should consider alternative designs and so on).

See T363506: Pass image objects to the logo detection service.

May 13 2024, 1:16 PM · Machine-Learning-Team

May 10 2024

mfossati updated subscribers of T362749: Deploy logo-detection model-server to LiftWing staging.

@mfossati is there any other way to access the images in the upload stash other than using a cookie? Using a user cookie to access an API doesn't seem like the right way for a production application, from both a design and a security point of view. An API key/token would seem more appropriate (if there is such an option available).

I agree, and I've dug deeper into the current request being made to the Upload API: maybe the CSRF token is what we're looking for. See upload_file_in_chunks in the example request code. I can confirm that the Upload Wizard sends a token parameter in the request.
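For reference, MediaWiki's Action API hands out CSRF tokens via `action=query&meta=tokens&type=csrf`, returning the token under `query.tokens.csrftoken`. The sketch below only builds that request URL and parses a decoded response; it assumes the logged-in session (cookies) is handled elsewhere, and it does not show that a CSRF token grants read access to stashed files, which is exactly the open question here.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def csrf_token_url(api_url: str = API_URL) -> str:
    """Build an Action API URL that asks for a CSRF token."""
    params = {"action": "query", "meta": "tokens", "type": "csrf", "format": "json"}
    return f"{api_url}?{urlencode(params)}"


def extract_csrf_token(response: dict) -> str:
    """Pull the CSRF token out of a decoded JSON API response."""
    return response["query"]["tokens"]["csrftoken"]


print(csrf_token_url())
```

Without a logged-in session the API returns the anonymous token `+\`, so the session question remains either way.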

May 10 2024, 10:10 AM · Machine-Learning-Team
mfossati added a comment to T361049: [XL] Improve the file name, caption, and description fields.

(2) I have some problems testing these two AC:

  • Pre-fill the title using file name if it matches the descriptive criteria, if not leave it blank
  • Update the copy for the current error message for when the user has not entered a descriptive title as shown here.
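The first acceptance criterion boils down to a filter on the uploaded file name. A minimal sketch, assuming a hypothetical `NON_DESCRIPTIVE` pattern for camera-generated names (e.g. `IMG_1234`, `DSC01234`) and purely numeric names; the real rules on Commons live in on-wiki title blacklist messages and are not reproduced here.

```python
import os
import re

# Hypothetical pattern for non-descriptive, camera-generated file names;
# NOT the actual Commons title blacklist rules.
NON_DESCRIPTIVE = re.compile(r"^((IMG|DSC|DSCN|DSCF|PICT|P)[_\- ]?\d+|\d+)$", re.IGNORECASE)


def prefill_title(filename: str) -> str:
    """Return the file name without extension if it looks descriptive, else ''."""
    stem, _ext = os.path.splitext(filename)
    stem = stem.strip()
    if not stem or NON_DESCRIPTIVE.match(stem):
        return ""  # leave the title field blank
    return stem


for name in ("IMG_1234.jpg", "DSC01234.JPG", "Eiffel Tower at night.jpg"):
    print(repr(prefill_title(name)))
```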
May 10 2024, 8:45 AM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Structured-Data-Backlog (Current Work), UploadWizard
mfossati added a comment to T361049: [XL] Improve the file name, caption, and description fields.

(1) the scope of re-designing the Describe step presently doesn't include Additional information from the Figma mockup

Chiming in: this will be done in T361061: [M] Update the 'other information' field in upload wizard.

May 10 2024, 8:13 AM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Structured-Data-Backlog (Current Work), UploadWizard

May 9 2024

mfossati updated subscribers of T363506: Pass image objects to the logo detection service.

@mfossati I am in favor of passing the image object in some serialized form.
We would need the upload wizard to send a resized image (224x224) instead of the whole file.

I've opened T364551: [SPIKE] Send an image thumbnail to the logo detection service within Upload Wizard to investigate the feasibility of this solution.

May 9 2024, 3:36 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati added a comment to T363506: Pass image objects to the logo detection service.

@mfossati We noticed that the user can define the width in the url like in this example http://commons.wikimedia.org/w/index.php?title=Special:FilePath&file=Cambia_logo.png&width=224. If we can use this then it would be sufficient and we can stick with using urls in the request.

Hmm, I've just given it a try and I think it won't work for stashed images, which is a hard requirement for us.

@isarantopoulos @kevinbazira, I think I found how to get a thumbnail of a stashed image. There you go: https://commons.wikimedia.org/wiki/Special:UploadStash/thumb/1awuam969hko.2tkfbz.10893556.png/224px-1awuam969hko.2tkfbz.10893556.png, where 1awuam969hko.2tkfbz.10893556.png is the stash file key and the 224px- prefix sets the thumbnail width in pixels.
There's a caveat, though: the thumbnail seems to be generated on the fly at request time. Still not optimal, but it sounds like a workable solution.
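The URL pattern from the comment above can be generated from a stash file key. This sketch generalizes from that single example URL, so the pattern is an assumption for other file types; stashed files also remain private to the uploader, so fetching the URL still needs their session.

```python
from urllib.parse import quote

STASH_THUMB_BASE = "https://commons.wikimedia.org/wiki/Special:UploadStash/thumb"


def stash_thumb_url(file_key: str, width: int = 224) -> str:
    """Build a Special:UploadStash thumbnail URL for a stash file key.

    The '<width>px-' prefix on the last path segment sets the thumbnail width.
    """
    if width <= 0:
        raise ValueError("width must be a positive number of pixels")
    key = quote(file_key)  # percent-encode anything unsafe in the key
    return f"{STASH_THUMB_BASE}/{key}/{width}px-{key}"


print(stash_thumb_url("1awuam969hko.2tkfbz.10893556.png"))
```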

May 9 2024, 3:27 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati created T364551: [SPIKE] Send an image thumbnail to the logo detection service within Upload Wizard.
May 9 2024, 2:36 PM · Patch-For-Review, Structured-Data-Backlog (Current Work), Machine-Learning-Team
mfossati added a comment to T363506: Pass image objects to the logo detection service.

We would need the upload wizard to send a resized image (224x224) instead of the whole file.

I imagine we could tackle that from within the Upload Wizard with a JavaScript library. I can create a ticket to look into it if you think this would be the best solution.

May 9 2024, 9:14 AM · Machine-Learning-Team, Structured-Data-Backlog
mfossati awarded T363506: Pass image objects to the logo detection service a Mountain of Wealth token.
May 9 2024, 9:09 AM · Machine-Learning-Team, Structured-Data-Backlog
mfossati added a comment to T363506: Pass image objects to the logo detection service.

If one user sends a request with 50 image URLs and another sends a request with 50 serialized image objects, the latter is likely to exceed the server's request body size limit faster.

Thinking out loud: what about sending multiple requests if the limit is reached? I suspect that 50 uploads are an edge case: if that happens, we could split them across separate requests.
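Splitting uploads across requests can be a simple greedy grouping by size. A minimal sketch, assuming a hypothetical `max_body_bytes` limit and ignoring per-request envelope overhead (JSON wrapping, base64 inflation), which a real client would need to budget for.

```python
from typing import Iterable, List


def batch_by_size(payloads: Iterable[bytes], max_body_bytes: int) -> List[List[bytes]]:
    """Group payloads into batches whose total size stays within max_body_bytes.

    Each batch would become one request. A single payload larger than the
    limit cannot be split here, so it is rejected.
    """
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for payload in payloads:
        size = len(payload)
        if size > max_body_bytes:
            raise ValueError(f"payload of {size} bytes exceeds the {max_body_bytes}-byte limit")
        if current and current_size + size > max_body_bytes:
            batches.append(current)  # close the current request
            current, current_size = [], 0
        current.append(payload)
        current_size += size
    if current:
        batches.append(current)
    return batches


images = [b"x" * 400] * 5  # five 400-byte placeholder payloads
print([len(batch) for batch in batch_by_size(images, max_body_bytes=1000)])  # [2, 2, 1]
```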

May 9 2024, 8:59 AM · Machine-Learning-Team, Structured-Data-Backlog

May 8 2024

mfossati added a comment to T363506: Pass image objects to the logo detection service.

@mfossati We noticed that the user can define the width in the url like in this example http://commons.wikimedia.org/w/index.php?title=Special:FilePath&file=Cambia_logo.png&width=224. If we can use this then it would be sufficient and we can stick with using urls in the request.

Hmm, I've just given it a try and I think it won't work for stashed images, which is a hard requirement for us.

May 8 2024, 4:53 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati added a comment to T363506: Pass image objects to the logo detection service.

@isarantopoulos, totally agree, makes a lot of sense.

May 8 2024, 1:15 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati moved T361049: [XL] Improve the file name, caption, and description fields from Code Review to Needs QA on the Structured-Data-Backlog (Current Work) board.
May 8 2024, 8:32 AM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Structured-Data-Backlog (Current Work), UploadWizard

May 7 2024

mfossati moved T350007: [M] Adapt image suggestions to comply with breaking database schema changes from Doing to Verify on Production on the Structured-Data-Backlog (Current Work) board.

Fix deployed & pipeline resumed. Needs some monitoring.

May 7 2024, 2:33 PM · Structured-Data-Backlog (Current Work), Image-Suggestions
mfossati added a subtask for T340437: [EPIC] Data pipelines maintenance : T364374: [L] Prepare image suggestions for a new set of Wikipedias.
May 7 2024, 9:25 AM · Epic, Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Topics, Essential-Work, Section-Level-Image-Suggestions
mfossati added a parent task for T364374: [L] Prepare image suggestions for a new set of Wikipedias: T340437: [EPIC] Data pipelines maintenance .
May 7 2024, 9:25 AM · Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Level-Image-Suggestions
mfossati created T364374: [L] Prepare image suggestions for a new set of Wikipedias.
May 7 2024, 9:24 AM · Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Level-Image-Suggestions

May 6 2024

mfossati added a comment to T362749: Deploy logo-detection model-server to LiftWing staging.

@achou pointed out that files might not be accessible, since the upload stash docs state that files should not be public and should be writable/accessible only by the uploader.

@mfossati, if images uploaded to the stash are private to the user, how will the tool you build to do logo-detection be able to access these image URLs or serialized image objects and send them to the LiftWing API to get a prediction?

Great catch, I totally missed this!
I've only scratched the surface: it seems that requests to the stash URL must carry the logged-in user's session ID, which is stored in a cookie. We'll have to dig into the upload stash codebase to fully understand the mechanism. For now, I can see cookies like commonswikiSession that ring a bell.
That said, what if we went with T363506: Pass image objects to the logo detection service instead? Would that work without a logged-in user? Definitely an open question.

May 6 2024, 3:10 PM · Machine-Learning-Team

May 2 2024

mfossati added a comment to T363506: Pass image objects to the logo detection service.

We would need the upload wizard to send a resized image (224x224) instead of the whole file. Is that something you are already considering or think it would be easy to try?

We haven't thought of this yet, mainly because the pre-processing logic on the model side already handles resizing. That said, I agree it'd be better to send the 224x224 image object directly.

May 2 2024, 1:07 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati changed the status of T350007: [M] Adapt image suggestions to comply with breaking database schema changes from Open to In Progress.

Change deployed:

0: jdbc:hive2://analytics-hive.eqiad.wmnet:10> describe wmf_raw.mediawiki_pagelinks;
+--------------------------+-----------------------+--------------------------------------------------------------------------------------+
|         col_name         |       data_type       |                                       comment                                        |
+--------------------------+-----------------------+--------------------------------------------------------------------------------------+
| pl_from                  | bigint                | Key to the page_id of the page containing the link                                   |
| pl_from_namespace        | int                   | MediaWiki version:  ≥ 1.24 - page_namespace of the page containing the link          |
| pl_target_id             | bigint                | Foreign key to linktarget.                                                           |
| snapshot                 | string                | Versioning information to keep multiple datasets (YYYY-MM for regular labs imports)  |
| wiki_db                  | string                | The wiki_db project                                                                  |
|                          | NULL                  | NULL                                                                                 |
| # Partition Information  | NULL                  | NULL                                                                                 |
| # col_name               | data_type             | comment                                                                              |
|                          | NULL                  | NULL                                                                                 |
| snapshot                 | string                | Versioning information to keep multiple datasets (YYYY-MM for regular labs imports)  |
| wiki_db                  | string                | The wiki_db project                                                                  |
+--------------------------+-----------------------+--------------------------------------------------------------------------------------+
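With this schema, a pagelinks row no longer carries the target's namespace and title; it carries pl_target_id, which points to a row of MediaWiki's linktarget table (lt_id, lt_namespace, lt_title). Code that used to read the target title straight from pagelinks needs an extra join. The plain-Python sketch below illustrates that lookup on made-up in-memory rows; in the pipeline itself this would be a Spark join against the corresponding linktarget snapshot.

```python
from typing import Dict, List, NamedTuple, Tuple


class PageLink(NamedTuple):
    pl_from: int
    pl_from_namespace: int
    pl_target_id: int  # foreign key to linktarget.lt_id


class LinkTarget(NamedTuple):
    lt_id: int
    lt_namespace: int
    lt_title: str


def resolve_links(links: List[PageLink], targets: List[LinkTarget]) -> List[Tuple[int, int, str]]:
    """Inner-join links to link targets, returning (pl_from, namespace, title)."""
    by_id: Dict[int, LinkTarget] = {t.lt_id: t for t in targets}
    return [
        (link.pl_from, by_id[link.pl_target_id].lt_namespace, by_id[link.pl_target_id].lt_title)
        for link in links
        if link.pl_target_id in by_id  # inner join: drop links without a matching target
    ]


# Illustrative rows only
links = [PageLink(10, 0, 1), PageLink(11, 0, 2), PageLink(12, 0, 99)]
targets = [LinkTarget(1, 0, "Rome"), LinkTarget(2, 14, "Cities_in_Italy")]
print(resolve_links(links, targets))  # [(10, 0, 'Rome'), (11, 14, 'Cities_in_Italy')]
```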
May 2 2024, 9:11 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
mfossati changed the status of T350007: [M] Adapt image suggestions to comply with breaking database schema changes, a subtask of T340437: [EPIC] Data pipelines maintenance , from Open to In Progress.
May 2 2024, 9:11 AM · Epic, Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Topics, Essential-Work, Section-Level-Image-Suggestions

May 1 2024

mfossati added a comment to T361049: [XL] Improve the file name, caption, and description fields.

@Sneha :

We're asking the user to omit the file extension, so the errors in https://commons.wikimedia.org/wiki/MediaWiki:Titleblacklist-custom-space and https://commons.wikimedia.org/wiki/MediaWiki:Titleblacklist-custom-SVG-thumbnail are less likely to show up.

May 1 2024, 9:26 AM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Structured-Data-Backlog (Current Work), UploadWizard

Apr 30 2024

mfossati updated subscribers of T361049: [XL] Improve the file name, caption, and description fields.

@mfossati If I understood this correctly, it seems the dialog is currently pulling text from that first link, and any changes on that page would be reflected in our dialog?

Correct. It's an on-wiki system message.

Could we have a custom dialog with only example text that is not linked to any of these pages? It seems there are a lot of variations of these pages, so we can't confidently rely on one. We are not showing any links or additional text. We are only showing examples (which are unlikely to change).

Yes, we could, but those messages seem to come from the community (e.g., https://commons.wikimedia.org/w/index.php?title=MediaWiki:Titleblacklist-custom-filename&action=history), so I'd opt for keeping the process intact, i.e., propose the updates on wiki. @Sannita, what do you think?

Apr 30 2024, 6:47 PM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Structured-Data-Backlog (Current Work), UploadWizard
mfossati moved T363707: UploadWizard homeButton mal formatted link from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
Apr 30 2024, 11:07 AM · MW-1.43-notes (1.43.0-wmf.8; 2024-06-04), Structured-Data-Backlog (Current Work), UploadWizard
mfossati changed the status of T361050: [XL] Improve how categories field is displayed in the upload wizard from Open to In Progress.
Apr 30 2024, 9:28 AM · MW-1.43-notes (1.43.0-wmf.6; 2024-05-21), Structured-Data-Backlog (Current Work), UploadWizard
mfossati changed the status of T361050: [XL] Improve how categories field is displayed in the upload wizard, a subtask of T358765: [EPIC] Describe step UX improvements in the UW on Commons, from Open to In Progress.
Apr 30 2024, 9:27 AM · Epic, UploadWizard, Structured-Data-Backlog (Current Work)
mfossati changed the status of T363707: UploadWizard homeButton mal formatted link from Open to In Progress.
Apr 30 2024, 9:23 AM · MW-1.43-notes (1.43.0-wmf.8; 2024-06-04), Structured-Data-Backlog (Current Work), UploadWizard
mfossati moved T361049: [XL] Improve the file name, caption, and description fields from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.
  • Update the copy on the "view example" dialog as shown here

@Sneha, I think we need a Commons admin to update https://commons.wikimedia.org/wiki/MediaWiki:Titleblacklist-custom-filename. The patch has a temporary workaround so that we can test it, but I suggest removing it before merging.
FYI, the following custom messages also exist:

Apr 30 2024, 9:21 AM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Structured-Data-Backlog (Current Work), UploadWizard

Apr 29 2024

mfossati closed T352748: [SPIKE] Image classifier prototype, a subtask of T349641: [EPIC] MVP Logo machine detection in Upload Wizard , as Resolved.
Apr 29 2024, 3:59 PM · UploadWizard, Epic, Structured-Data-Backlog (Current Work)
mfossati closed T352748: [SPIKE] Image classifier prototype as Resolved.
Apr 29 2024, 3:59 PM · Structured-Data-Backlog (Current Work)
mfossati added a project to T363506: Pass image objects to the logo detection service: Machine-Learning-Team.
Apr 29 2024, 2:22 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati added a subtask for T358676: Host a logo detection model for Commons images: T363506: Pass image objects to the logo detection service.
Apr 29 2024, 2:21 PM · Structured-Data-Backlog (Current Work), Machine-Learning-Team
mfossati added a parent task for T363506: Pass image objects to the logo detection service: T358676: Host a logo detection model for Commons images.
Apr 29 2024, 2:21 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati added a project to T363505: Pass the maximum number of uploads to the logo detection service: Machine-Learning-Team.
Apr 29 2024, 2:21 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati added a subtask for T358676: Host a logo detection model for Commons images: T363505: Pass the maximum number of uploads to the logo detection service.
Apr 29 2024, 2:21 PM · Structured-Data-Backlog (Current Work), Machine-Learning-Team
mfossati added a parent task for T363505: Pass the maximum number of uploads to the logo detection service: T358676: Host a logo detection model for Commons images.
Apr 29 2024, 2:21 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati added a project to T363503: Ignored exception in the logo detection prototype: Machine-Learning-Team.
Apr 29 2024, 2:20 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati added a subtask for T358676: Host a logo detection model for Commons images: T363503: Ignored exception in the logo detection prototype.
Apr 29 2024, 2:19 PM · Structured-Data-Backlog (Current Work), Machine-Learning-Team
mfossati added a parent task for T363503: Ignored exception in the logo detection prototype: T358676: Host a logo detection model for Commons images.
Apr 29 2024, 2:19 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati added a parent task for T358676: Host a logo detection model for Commons images: T349641: [EPIC] MVP Logo machine detection in Upload Wizard .
Apr 29 2024, 2:17 PM · Structured-Data-Backlog (Current Work), Machine-Learning-Team
mfossati added a subtask for T349641: [EPIC] MVP Logo machine detection in Upload Wizard : T358676: Host a logo detection model for Commons images.
Apr 29 2024, 2:17 PM · UploadWizard, Epic, Structured-Data-Backlog (Current Work)

Apr 26 2024

mfossati moved T361049: [XL] Improve the file name, caption, and description fields from Code Review to Doing on the Structured-Data-Backlog (Current Work) board.

Back on it.

Apr 26 2024, 3:15 PM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Structured-Data-Backlog (Current Work), UploadWizard
mfossati closed T347566: [M] Send an alert in case of no ALIS or SLIS as Resolved.

Closing, see T347569#9747385.

Apr 26 2024, 10:27 AM · Structured-Data-Backlog (Current Work), Section-Level-Image-Suggestions, Image-Suggestions
mfossati closed T347566: [M] Send an alert in case of no ALIS or SLIS, a subtask of T340437: [EPIC] Data pipelines maintenance , as Resolved.
Apr 26 2024, 10:26 AM · Epic, Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Topics, Essential-Work, Section-Level-Image-Suggestions
mfossati closed T347569: [L] Block search indices update and Cassandra tasks in case of no ALIS or SLIS data as Resolved.

Checked raw counts of the last 5 snapshots:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Weekly snapshots and image suggestions tables to check
snapshots = ('2024-03-18', '2024-03-25', '2024-04-01', '2024-04-08', '2024-04-15')
tables = ('image_suggestions_instanceof_cache', 'image_suggestions_lead_image_data', 'image_suggestions_search_index_delta', 'image_suggestions_search_index_full', 'image_suggestions_suggestions', 'image_suggestions_title_cache', 'image_suggestions_wikidata_data')
for s in snapshots:
    print(s)
    for t in tables:
        print(t)
        # Row count of table t restricted to snapshot s
        ddf = spark.read.table(f'analytics_platform_eng.{t}').where(f"snapshot='{s}'")
        print(ddf.count())
    print()
2024-03-18
image_suggestions_instanceof_cache
5405468
image_suggestions_lead_image_data
8046032
image_suggestions_search_index_delta
6984930
image_suggestions_search_index_full
74311759
image_suggestions_suggestions
369129698
image_suggestions_title_cache
5228917
image_suggestions_wikidata_data
104657563
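Counts like these are what a guard for empty ALIS or SLIS data would inspect. A minimal sketch, assuming the rule "block downstream tasks if any checked count is zero"; the helper name is hypothetical and this is not the pipeline's actual logic. ALIS/SLIS counts could be passed the same way, e.g. {"alis": alis.count(), "slis": slis.count()}.

```python
from typing import Dict, List


def empty_tables(counts: Dict[str, int]) -> List[str]:
    """Return the names of tables (or subsets) with no rows in a snapshot."""
    return sorted(name for name, count in counts.items() if count == 0)


# Counts reported above for the 2024-03-18 snapshot
counts_2024_03_18 = {
    "image_suggestions_instanceof_cache": 5405468,
    "image_suggestions_lead_image_data": 8046032,
    "image_suggestions_search_index_delta": 6984930,
    "image_suggestions_search_index_full": 74311759,
    "image_suggestions_suggestions": 369129698,
    "image_suggestions_title_cache": 5228917,
    "image_suggestions_wikidata_data": 104657563,
}

blocked = empty_tables(counts_2024_03_18)
print(blocked)  # [] -> nothing to block for this snapshot
```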
Apr 26 2024, 10:25 AM · Structured-Data-Backlog (Current Work), SDAW-MediaSearch, Section-Level-Image-Suggestions, Image-Suggestions
mfossati closed T347569: [L] Block search indices update and Cassandra tasks in case of no ALIS or SLIS data, a subtask of T340437: [EPIC] Data pipelines maintenance , as Resolved.
Apr 26 2024, 10:24 AM · Epic, Structured-Data-Backlog (Current Work), Image-Suggestions, Section-Topics, Essential-Work, Section-Level-Image-Suggestions

Apr 25 2024

mfossati created T363506: Pass image objects to the logo detection service.
Apr 25 2024, 5:15 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati created T363505: Pass the maximum number of uploads to the logo detection service.
Apr 25 2024, 5:10 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati created T363503: Ignored exception in the logo detection prototype.
Apr 25 2024, 5:02 PM · Machine-Learning-Team, Structured-Data-Backlog
mfossati awarded T362749: Deploy logo-detection model-server to LiftWing staging a Burninate token.
Apr 25 2024, 9:05 AM · Machine-Learning-Team
mfossati added a comment to T362749: Deploy logo-detection model-server to LiftWing staging.

@mfossati, when a model-server is deployed within the WMF k8s infrastructure it has to be configured to enable it to access external resources like wikimedia, wikipedia, and wikidata (see details here). Is it possible for the Structured content team to provide sample URLs from the commons upload stash? This will enable us to configure the logo-detection model-server to access them from LiftWing. Thanks in advance.

Hey @kevinbazira, here's what a public stash URL would look like: https://commons.wikimedia.org/wiki/Special:UploadStash/file/1avpfxdmdb4c.deuia.10893556.png. The only variable part would be the file key, i.e., 1avpfxdmdb4c.deuia.10893556.png.
Not 100% sure, but I guess you can use http://localhost:6500/wiki/Special:UploadStash/file/1avpfxdmdb4c.deuia.10893556.png, with commons.wikimedia.org as the Host header.
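The local-endpoint-plus-Host-header idea can be sketched with the standard library. The localhost:6500 endpoint is the unverified guess from the comment above; the code only builds the request and does not send it.

```python
from urllib.request import Request

# Internal endpoint and port as guessed above; unverified.
LOCAL_BASE = "http://localhost:6500/wiki/Special:UploadStash/file"


def stash_file_request(file_key: str) -> Request:
    """Build (but do not send) a GET request for a stash file via the local endpoint."""
    return Request(f"{LOCAL_BASE}/{file_key}", headers={"Host": "commons.wikimedia.org"})


req = stash_file_request("1avpfxdmdb4c.deuia.10893556.png")
print(req.full_url)
print(req.get_header("Host"))
```

Actually sending it, e.g. with urllib.request.urlopen(req), would still run into the access question discussed later: stashed files are only accessible to the uploader.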

Apr 25 2024, 8:45 AM · Machine-Learning-Team

Apr 17 2024

mfossati moved T361049: [XL] Improve the file name, caption, and description fields from Doing to Code Review on the Structured-Data-Backlog (Current Work) board.

Submitted a draft patch that needs extra pairs of eyes. Moving to code review.

Apr 17 2024, 6:16 PM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Structured-Data-Backlog (Current Work), UploadWizard

Apr 16 2024

mfossati added a comment to T350007: [M] Adapt image suggestions to comply with breaking database schema changes.

Migration will complete in roughly one week and old columns will be dropped in two weeks: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/Y4C7W4TEC7DXXTY3HKDBG7HB56QBRXPY/
If this all lands in April, wmf_raw.mediawiki_pagelinks/snapshot=2024-04 will contain the breaking changes.

Apr 16 2024, 11:34 AM · Structured-Data-Backlog (Current Work), Image-Suggestions
mfossati placed T293878: [L] Gather labeled data relevant to synonyms up for grabs.
Apr 16 2024, 9:09 AM · WMF-Inspiration-Week-2022-ML-Collab, Structured-Data-Backlog

Apr 10 2024

mfossati updated the task description for T358756: [SPIKE] Determine general classes of available Commons images.
Apr 10 2024, 11:15 AM · Structured-Data-Backlog
mfossati closed T361254: Build the logo detection demo page as Resolved.

Published at https://commons.wikimedia.org/wiki/Commons:WMF_support_for_Commons/Upload_Wizard_Improvements/Logo_detection, closing. Thanks @Sannita for your work!

Apr 10 2024, 11:15 AM · UploadWizard, Structured-Data-Backlog (Current Work)
mfossati closed T361254: Build the logo detection demo page, a subtask of T349641: [EPIC] MVP Logo machine detection in Upload Wizard , as Resolved.
Apr 10 2024, 11:14 AM · UploadWizard, Epic, Structured-Data-Backlog (Current Work)
mfossati added a parent task for T362218: Diff blog post on Commons images analysis: T357587: [Research EPIC] Media quality investigation on Commons FY23/24.
Apr 10 2024, 9:55 AM · Commons, Structured-Data-Backlog
mfossati added a subtask for T357587: [Research EPIC] Media quality investigation on Commons FY23/24: T362218: Diff blog post on Commons images analysis.
Apr 10 2024, 9:55 AM · Epic, Structured-Data-Backlog (Current Work)
mfossati created T362218: Diff blog post on Commons images analysis.
Apr 10 2024, 9:51 AM · Commons, Structured-Data-Backlog

Apr 9 2024

mfossati changed the status of T361049: [XL] Improve the file name, caption, and description fields from Open to In Progress.
Apr 9 2024, 10:41 AM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Structured-Data-Backlog (Current Work), UploadWizard
mfossati changed the status of T361049: [XL] Improve the file name, caption, and description fields, a subtask of T358765: [EPIC] Describe step UX improvements in the UW on Commons, from Open to In Progress.
Apr 9 2024, 10:41 AM · Epic, UploadWizard, Structured-Data-Backlog (Current Work)

Apr 8 2024

mfossati added a comment to T358756: [SPIKE] Determine general classes of available Commons images.

Outcome of a quick investigation on available pre-trained models that may fit our use case:

  • it seems that pre-training is generally done on standard benchmark datasets; check out this list
  • Keras offers models pre-trained on the following datasets:
    dataset             tasks                           # classes                  fit
    ImageNet-1k [1, 2]  image classification            1,000
    COCO                object detection, segmentation  80 (objects) + 91 (stuff)  TODO: try out a model
    SA-1B               segmentation                    none                       unlikely
  • it may be worth looking for models trained on CIFAR-100, whose 100 classes are grouped into 20 superclasses
  • we still need to explore the models on Hugging Face
Apr 8 2024, 3:22 PM · Structured-Data-Backlog
mfossati updated the task description for T358756: [SPIKE] Determine general classes of available Commons images.
Apr 8 2024, 1:18 PM · Structured-Data-Backlog

Apr 5 2024

mfossati added a comment to T350007: [M] Adapt image suggestions to comply with breaking database schema changes.

According to T345771#9526320:

  • The old columns have been dropped in testwiki and will be dropped soon (this and next week) on commonswiki and testcommonswiki.
    • The remaining wikis will keep the old schema until all wikis have been migrated (or at least almost all of them, if Wikidata turns out to take far too long).
Apr 5 2024, 1:32 PM · Structured-Data-Backlog (Current Work), Image-Suggestions

Apr 4 2024

mfossati awarded T358676: Host a logo detection model for Commons images a Burninate token.
Apr 4 2024, 8:57 AM · Structured-Data-Backlog (Current Work), Machine-Learning-Team
mfossati added a comment to T358676: Host a logo detection model for Commons images.

The prototype looks good to me, I'm excited to see this effort move to the next level!
@kevinbazira, I've especially appreciated the tightness of our development iterations 😄.

Apr 4 2024, 8:56 AM · Structured-Data-Backlog (Current Work), Machine-Learning-Team
mfossati added a comment to P58917 logo-detection: prototype for JSON input, preprocess with Keras, and return JSON output.

@kevinbazira, I can confirm that inputs and outputs are fine.
FYI, I've fixed the expected type of the image dataset, so please use the latest commit.

Apr 4 2024, 8:52 AM · Machine-Learning-Team