
Add Wikimedia Commons data namespace support to Cat-a-lot
Closed, Resolved, Public

Description

In February 2025, a categorization feature for the data namespace was added to Wikimedia Commons (T242596). These categories are defined in the page's JSON data using the mediawikiCategories parameter. Pages in this namespace do not support wikitext, and adding wikitext throws the error "content format text/x-wiki is not supported by the content model Tabular.JsonConfig".

Currently, this means that category tools like HotCat or Cat-a-lot do not work on these pages, so support for the JSON content models should be added to these tools.

Example pages which are using mediawikiCategories parameter:

The parameter is used in maps, charts, and tabular data on Wikimedia Commons. From Cat-a-lot's perspective, all of these are JSON-formatted data with a mediawikiCategories value.

Event Timeline

Zache renamed this task from Add mediawikiCategories parameter support to Cat-a-lot to Add Wikimedia Commons datanamespace support to Cat-a-lot. Mar 6 2025, 12:28 PM
Zache moved this task from Requires more info to Outreachy (Round 30) on the gadget-Cat-a-lot board.
Zache updated the task description. (Show Details)

Hi @Zache!
I wanted to let you know that I’m interested in working on this. This seems like a challenging but exciting task, and I believe it would be a great opportunity for me to learn new things.

I’ve started familiarizing myself with the available resources and I’m currently working on a strategy to approach the task. At this stage, I’m focusing on understanding how Cat-a-lot currently handles categorization for wikitext pages and how the mediawikiCategories parameter works in the data namespace. Once I have a clearer plan, I’ll share it with you for feedback before diving into implementation.

I’m particularly motivated by the potential impact this task could have on the community. Adding support for the data namespace would make Cat-a-lot more versatile and useful for a wider range of users.

If you have any resources or advice that I should know about before getting started, please share them. I’d greatly appreciate it!

Thank you and go for it. Yes, it would be useful to have this supported in Cat-a-lot. I updated the ticket with some example links.

Hey! I wanted to share an update on my progress. With the help of some documentation, I’ve completed the initial step: detecting the content model of the page. Cat-a-lot can now identify JSON-based pages in the Data namespace.

  1. What I’ve Done
    • Added logic to detect JSON content models (Tabular.JsonConfig, Map.JsonConfig, Chart.JsonConfig).
    • Updated the switch (ns) block to include the Data namespace (486).
    • Tested on example JSON-based pages and confirmed Cat-a-lot initializes correctly. Cat-a-lot didn’t initialize on JSON pages initially because the Data namespace wasn’t included in the code.
  2. Next Steps
    • Fetch and parse JSON data to extract mediawikiCategories (currently working on this).
    • Allow users to modify the mediawikiCategories array.
    • Update the UI for JSON-based pages.

I believe this approach aligns with the project's goal. Let me know if you have any suggestions!

Hi @Zache ,
I wanted to share my progress and some challenges I've encountered while implementing JSON support in Cat-a-lot. This has proven to be more complex than I initially anticipated, but I'm making steady progress by tackling issues systematically.

diff for reference

Current Progress

  • Reliably identify JSON content models and initialize Cat-a-lot on Data namespace pages.
  • Implemented fetchJsonData() to retrieve and parse JSON content
  • Successfully extract mediawikiCategories from parsed JSON
  • Maintained consistent UI while internally routing JSON/wikitext edits differently

Challenges I’m Working Through

  • Tracking file objects through async operations (API calls → edits → UI updates)
  • Solved initial reference errors but the flow feels fragile
  • Translating between MediaWiki’s JSON structure and Cat-a-lot’s wikitext-based logic
  • The UI shows the edit as successful for JSON pages, but nothing actually gets edited. I'm trying to pinpoint the reason for this.

A question that's been bugging me, and that I need your input on:

  • Would it be better to:
    1. Fully isolate JSON handling in a separate module, or
    2. Continue adapting the existing wikitext logic (current approach)?

I’m committed to seeing this through, but I’d greatly appreciate your perspective on these points. The complexity has been a great learning opportunity, though confusing at times, and I'm working through it steadily.

Hi @adiba_anjum

I would split the task and start with the changes related to the editCategories() code path, get that working first, and submit it to production. After that I would focus on handling namespace 486.

About code

If I read the code correctly, the code paths would be contained in the following functions, so I would follow the same idea you have already implemented.

  • getContent()
    • editCategories()
      • editJsonCategories()
        • saveJsonData()
          • this.doAPICall()
      • editWikitextCategories()
        • this.doAPICall()

Both cases, JSON and wikitext editing, could share the same success handler, so the result handling would be the same.

fetchJsonData()
In this function you do not need to request the JSON content, because the getContent() request already has the JSON data in the content variable, the same way as with wikitext. JSON parsing is still required, though, as is error handling for the case where mediawikiCategories doesn't exist.
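As a rough illustration of the parsing step described above, here is a minimal sketch. The helper name parseJsonCategories is hypothetical and not part of Cat-a-lot; it assumes the page text is already available as a string, and it chooses to create an empty mediawikiCategories array when the property is missing.

```javascript
// Hypothetical helper (not part of Cat-a-lot): parse page text that was
// already fetched and make sure a mediawikiCategories array is available.
function parseJsonCategories( content ) {
	let jsonData;
	try {
		jsonData = JSON.parse( content );
	} catch ( e ) {
		// Malformed JSON: return null and let the caller report the error.
		return null;
	}
	if ( jsonData === null || typeof jsonData !== 'object' || Array.isArray( jsonData ) ) {
		// Valid JSON, but not an object that can hold mediawikiCategories.
		return null;
	}
	if ( !Array.isArray( jsonData.mediawikiCategories ) ) {
		// Assumption: a missing array is treated as "no categories yet".
		jsonData.mediawikiCategories = [];
	}
	return jsonData;
}
```

Returning null instead of throwing keeps the caller's control flow the same as for other "nothing to do" cases; whether that fits the gadget's error reporting is a design choice.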

The UI shows edit being successful for JSON pages but nothing actually gets edited. Trying to pin point the reason for this

Removing categories worked. I think the reason nothing actually gets edited could be that you have implemented only the "add" and "remove" modes, but not, for example, the "copy" and "move" modes. If the API response contains the value "nochange":"", it means that the new version was identical to the old version.

This is not the cleanest example, but maybe it is enough to get the idea. You could keep the code path for JSON and wikitext the same and only separate the code that edits the content. Something like this:
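A small sketch of how the "nochange" marker could be checked. The helper name isNoChange is hypothetical, and the response shape (an edit object carrying a nochange key) is an assumption based on the value quoted above.

```javascript
// Hypothetical helper: true when an action=edit response reports that the
// saved text was identical to the previous revision.
// Assumed response shape: { edit: { result: 'Success', nochange: '' } }
function isNoChange( response ) {
	return Boolean( response && response.edit && 'nochange' in response.edit );
}
```

Checking for the presence of the key rather than its value matters here, because the quoted value is an empty string, which is falsy in JavaScript.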

editCategories: function ( result, file, targetcat, mode ) {
	if ( !result || !result.query ) {
		// Happens on unstable wifi connections..
		this.connectionError.push( file[ 0 ] );
		this.updateCounter();
		return;
	}
	var page = CAL._getPageQuery( result );
	if ( !page || page.ns === 2 ) { return; }
	var id = page.revisions && page.revisions[ 0 ];
	if ( !id ) { return; }

	const timestamp = id.timestamp;
	this.starttimestamp = result.curtimestamp;

	// Proper JSON content model detection
	const isJson = [ 'Tabular.JsonConfig', 'Map.JsonConfig', 'Chart.JsonConfig' ]
		.includes( file.contentModel );

	console.log( 'Editing file:', file[ 0 ], 'Content model:', file.contentModel, 'Is JSON:', isJson );

	// Assumption: both editors return an object { nochange, text, summary },
	// so everything after this point is shared by JSON and wikitext.
	const edit = isJson ?
		this.editJsonCategories( result, file, targetcat, mode ) :
		this.editWikitextCategories( result, file, targetcat, mode );

	if ( edit.nochange ) {
		this.notFound.push( file[ 0 ] );
		this.updateCounter();
		return;
	}

	var data = {
		action: 'edit',
		assert: 'user',
		summary: edit.summary,
		title: file[ 0 ],
		text: edit.text,
		contentmodel: file.contentModel,
		bot: true,
		starttimestamp: this.starttimestamp,
		basetimestamp: timestamp,
		watchlist: this.settings.watchlist,
		tags: this.changeTag,
		token: this.edittoken
	};
	if ( this.settings.minor ) {
		// boolean parameters are quirky, see
		// https://commons.wikimedia.org/w/api.php?action=help&modules=main#main/datatype/boolean
		data.minor = true;
	}

	// Same success handler for both content models
	this.doAPICall( data, function ( r ) {
		delete CAL.XHR[ file[ 0 ] ];
		CAL.markAsDone( file[ 1 ], mode, targetcat );
		CAL.updateUndoCounter( r );
	} );
},

Also, as an FYI, there is a tool named Special:ApiSandbox where you can test API queries and see the results: Example query.

Hey, thank you for the detailed feedback! I’ve started restructuring the code as you suggested, focusing first on editCategories() to ensure the editing logic works reliably before handling namespace detection. Here’s my current approach:

  1. Phase 1: Unified Editing Flow
    • Modified editCategories() to detect content type (JSON/wikitext).
    • Split editing logic into editJsonCategories() and editWikitextCategories().
    • Both now return the same values for consistency.
  2. Next Steps
    • Implement other modes for JSON.
    • Verify if nochange detection works (testing with Special:ApiSandbox, thanks a lot for suggesting this!).
    • Once editing is stable, I’ll revisit Data namespace initialization.

I’ll share a patch once I’ve tested these changes thoroughly. Thanks again for the guidance!

Hi @Zache, I'm excited to share that I've successfully implemented JSON Data namespace support for Cat-a-lot! After thorough testing, all operations now work correctly for both wikitext and JSON content model pages.

Key Changes Made:

  1. Content Model Detection:
    • Enhanced getMarkedLabels() to include content model information
    • Modified getContent() to preserve content model info
  2. Dual Processing Paths:
    • Refactored editCategories() as a router function that:
      • Detects content type (JSON/wikitext)
      • Routes to appropriate editor function
    • Kept existing editWikitextCategories() unchanged
    • Created new editJsonCategories() for JSON operations
  3. JSON-Specific Implementation:
    • Added complete support for all operations in JSON:
      • Add: Pushes to mediawikiCategories array
      • Remove: Filters from array
      • Copy: Duplicates category objects with new names
      • Move: Removes source and adds target category
    • Implemented proper JSON parsing/serialization
    • Added content model parameter to API calls
  4. Error Handling:
    • Added specific handling for JSON parse errors
    • Improved category existence checking
  5. Edge Case Handling:
    • Case sensitivity in category names
    • Missing mediawikiCategories arrays
    • Invalid JSON structures
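The four operations listed above could be sketched roughly as follows. This is an illustration under assumptions, not the merged implementation: the function name applyJsonMode and its parameters are hypothetical, each entry is assumed to be an object with a name and an optional sort property, and names are compared exactly (no normalization).

```javascript
// Hypothetical sketch of the four modes on a mediawikiCategories array.
// Assumed entry shape: { name: string, sort?: string }.
// Returns a new array and leaves the input untouched.
function applyJsonMode( categories, mode, sourcecat, targetcat ) {
	const source = categories.find( ( c ) => c.name === sourcecat );
	const hasTarget = categories.some( ( c ) => c.name === targetcat );
	const withoutSource = categories.filter( ( c ) => c.name !== sourcecat );
	// Copy of the source entry under the new name (keeps its other properties).
	const renamed = source ? [ Object.assign( {}, source, { name: targetcat } ) ] : [];

	switch ( mode ) {
		case 'add':
			return hasTarget ? categories.slice() : categories.concat( [ { name: targetcat } ] );
		case 'remove':
			return withoutSource;
		case 'copy':
			return hasTarget ? categories.slice() : categories.concat( renamed );
		case 'move':
			if ( sourcecat === targetcat ) { return categories.slice(); }
			return hasTarget ? withoutSource : withoutSource.concat( renamed );
		default:
			throw new Error( 'Unknown mode: ' + mode );
	}
}
```

Working on a copy makes it easy to compare the result with the original array and skip the save when nothing changed.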

I'd appreciate your review of the implementation when convenient.

For reference:
diff of the change

A fix has been proposed in subtask T390252. Please visit it to find the testing procedure and all other related info.

I checked the code. I will continue here with the code review comments and leave T390252 for testing-related comments.

Diff complexity
From a review point of view, it is a good idea to keep feature-related changes as simple as possible and to submit code cleanup, syntax updates, etc. separately. Otherwise, the diff becomes too complex to see what has actually changed and how those changes affect the program. For example, here is your original change, and this is how it looks when the try/catch and whitespace changes are removed (diff). Both are somewhat complex, but in the second one only the meaningful changes remain.

Fetching contentmodel
You can add fetching the contentmodel to the API call done in getContent(), so you don't need to make a separate API call for it (diff). Also, as the server returns the data as part of the revision data, we can trust that it is always available if there is revision text (i.e., we do not need to supply a default value).
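A hedged sketch of what requesting the content model together with the revision text might look like. The function names are hypothetical and the real parameters of getContent() may differ; the sketch assumes the revision format in which contentmodel appears directly on the revision object.

```javascript
// Hypothetical sketch: query parameters that ask for the content model
// in the same request as the revision text. Not the actual getContent() code.
function buildContentQuery( titles ) {
	return {
		action: 'query',
		format: 'json',
		prop: 'revisions',
		// contentmodel is requested alongside content and timestamp
		rvprop: 'content|timestamp|contentmodel',
		curtimestamp: 1,
		titles: titles.join( '|' )
	};
}

// Read the content model from the first revision of a page object.
// Assumption: it is exposed as revision.contentmodel.
function getContentModel( page ) {
	const rev = page && page.revisions && page.revisions[ 0 ];
	return rev ? rev.contentmodel : undefined;
}
```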

editCategories
If you keep both in the same doAPICall(), the code stays simpler, because the different content models don't need separate handlers, etc.

Modifying getMarkedLabels
getMarkedLabels() is used externally by Gadget-ACDC, so its output format cannot be changed without updating that gadget as well.

editJsonCategories()

The categories coming from jsonData.mediawikiCategories are written by humans and are not canonical. Some cases need to be handled when comparing category names.

  • The first character of the name is case-insensitive: "categoryname" and "Categoryname" are the same.
  • Spaces and underscores are the same: "This is categoryname" and "This_is_categoryname" are the same.
  • Whitespace before and after the name is not significant and should be stripped: " Categoryname " becomes "Categoryname".

The targetcat from the HTML title attribute most likely follows the form where the first character is upper case and spaces are replaced with underscores.
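The three comparison rules above can be illustrated with a small plain JavaScript sketch. The helper names are hypothetical, and this is a simplification: it covers only the rules listed here, whereas MediaWiki's own title handling does more.

```javascript
// Hypothetical helper: bring a human-written category name into one
// comparable form (spaces instead of underscores, trimmed, first letter upper case).
function normalizeCategoryName( name ) {
	const cleaned = String( name )
		.replace( /_/g, ' ' ) // underscores and spaces are equivalent
		.replace( /\s+/g, ' ' ) // collapse runs of whitespace
		.trim(); // leading/trailing whitespace is not significant
	// Only the first character is case-insensitive.
	return cleaned.charAt( 0 ).toUpperCase() + cleaned.slice( 1 );
}

// Compare two names after normalization.
function sameCategory( a, b ) {
	return normalizeCategoryName( a ) === normalizeCategoryName( b );
}
```

Comparing normalized forms on both sides avoids having to know whether targetcat arrives with spaces or with underscores.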

Sortkeys
It may be necessary to add support for sortkeys, as there is support for them on the wikitext side. For example, sortkeys are parsed in regexCatBuilder (see line 770) and utilized when new categories are added to wikitext. However, based on the code alone, it's unclear to me how this should work. We should ask someone who actively uses Cat-a-lot to clarify how sortkeys are supposed to work when categories are moved / copied.

Here is my refactored version as a reference. There are some variable name changes and some checks of yours that I didn't include because I wanted to keep the diff simpler, but they were generally good ones and I could have included them.

Hi, thank you for your detailed review and the refactored version. It is incredibly helpful in clarifying the key improvements needed. I appreciate the time you took to highlight areas for simplification and optimization.
I’ll revise the code based on your feedback.

editJsonCategories()

The categories coming from jsonData.mediawikiCategories are written by humans and are not canonical. Some cases need to be handled when comparing category names.

  • The first character of the name is case-insensitive: "categoryname" and "Categoryname" are the same.
  • Spaces and underscores are the same: "This is categoryname" and "This_is_categoryname" are the same.
  • Whitespace before and after the name is not significant and should be stripped: " Categoryname " becomes "Categoryname".

The targetcat from the HTML title attribute most likely follows the form where the first character is upper case and spaces are replaced with underscores.

Similarly to T386783, I suggest using mw.Title for normalization.

Hey @Zache, I wanted to share my current progress on implementing JSON category support and thank you for your guidance. Your refactored version was extremely helpful in:

  1. Simplifying the architecture by integrating content model detection into getContent()
  2. Maintaining clean separation between JSON/wikitext handlers while keeping editCategories() unified
  3. Preserving external compatibility, especially with getMarkedLabels()

Core functionality complete:

  • Add/Remove/Copy/Move operations for JSON categories
  • Proper category normalization using mw.Title (thanks @Tacsipacsi for the suggestion!)
  • Fixed infinite loader issues for edge cases (adding existing/removing non-existent JSON categories)

Actively researching:

  • Sort key support in JSON categories (investigating consistent implementation with wikitext behavior)
  • Additional edge case handling (malformed JSON, empty arrays)

Would you recommend any specific approach for: Sort key storage format in JSON?

I'll continue refining the implementation based on your suggestions. Thank you again for the clear refactoring example—it dramatically improved the code structure.

diff for reference

Hey! I've been examining how Cat-a-lot handles sort keys in both wikitext and JSON pages and following are my key observations:

  • Sort keys allow you to control how pages are sorted within a category. By default, pages are sorted alphabetically by their title. However, in wikitext pages, you can specify an alternative sorting key using the pipe character (|) in the category link.
  • Example:
    • Basic category link: [[Category:Animals]]
    • With sort key: [[Category:Animals|Elephant]] - will be sorted under "E", as if the page title were "Elephant"
    • With only whitespace: [[Category:Animals| ]] - will be sorted before everything else, with whitespace as the sort key

JSON Pages (Data Namespace):

  • Categories are stored in the mediawikiCategories array in the JSON structure
  • Each category is an object with name and optionally sort properties
  • Example:
{
  "mediawikiCategories": [
    {"name": "Animals", "sort": "Elephant"}
  ]
}

Current Behavior:

  • Sort keys are preserved exactly as written during copy/move operations on both wikitext and JSON pages.
  • Leading/trailing whitespace in sort keys is maintained.
  • Cat-a-lot UI provides no way to handle or modify sort keys

Proposed Changes

  • Normalize Sort Keys:
    • Strip leading/trailing whitespace from sort keys (same as category names)
    • Consistency with category name handling
    • Most users don't intend to sort under whitespace characters
    • Explicit whitespace sorting still possible with empty sort key
    • Example:
      • [[Category:Animals| Elephant ]] → [[Category:Animals|Elephant]]
      • [[Category:Animals| ]] → [[Category:Animals|]] (still sorts under whitespace)
  • UI Integration:
    • The Cat-a-lot UI may need updates to allow specifying sort keys when adding categories
    • User may want to modify the sort key through the UI itself when making edits.

Would you agree this would be a worthwhile improvement? I'm happy to provide more details or prepare a patch if this seems reasonable.

Hey! I've been examining how Cat-a-lot handles sort keys in both wikitext and JSON pages and following are my key observations:

Thank you!

Proposed Changes

  • Normalize Sort Keys:
    • Strip leading/trailing whitespace from sort keys (same as category names)
    • Consistency with category name handling
    • Most users don't intend to sort under whitespace characters
    • Explicit whitespace sorting still possible with empty sort key
    • Example:
      • [[Category:Animals| Elephant ]] → [[Category:Animals|Elephant]]
      • [[Category:Animals| ]] → [[Category:Animals|]] (still sorts under whitespace)
  • UI Integration:
    • The Cat-a-lot UI may need updates to allow specifying sort keys when adding categories
    • User may want to modify the sort key through the UI itself when making edits.

Would you agree this would be a worthwhile improvement? I'm happy to provide more details or prepare a patch if this seems reasonable.

Yes, I think it is good to implement sortkey handling for the JSON namespace too.

Note: in wikitext these two behave differently:

" " sorts category before everything else:

[[Category:Foobar| ]] -> expands to Category:Foobar with sortkey " "

null sortkey removes any default sortkey and uses the category name as sortkey:

[[Category:Foobar|]] -> expands to Category:Foobar with sortkey "Foobar"

Hi, thank you for your feedback and for clarifying the distinction between whitespace and null sort keys in wikitext. I appreciate your guidance on this nuanced behavior.

I initially struggled to understand the practical use cases for sort keys, but examining examples like Category:Authors of fantastique literature (sorted under Category:Authors|Fantastique) helped clarify their value. This demonstrates how sort keys can:

  • Provide more logical organization than default title sorting
  • Maintain contextual information in the category name while optimizing display order

Thank you for explaining the specific behaviors:

  • Explicit space sort keys (| ): Force items to the top of category listings
  • Empty sort keys (|): Revert to default title-based sorting

For the JSON implementation, I'll ensure:

  • Preservation of these special cases exactly as they appear
  • Normalization of only incidental whitespace in regular sort keys (e.g., | Key → |Key)
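The normalization rule described above might be sketched like this. The helper name normalizeSortKey is hypothetical, and whether empty and whitespace-only values carry the same meaning in the JSON sort property as they do in wikitext is an assumption carried over from the discussion.

```javascript
// Hypothetical helper for the "sort" property of a mediawikiCategories entry.
function normalizeSortKey( sortKey ) {
	if ( sortKey === undefined || sortKey === null ) {
		// No sort key at all: nothing to normalize.
		return undefined;
	}
	if ( sortKey.trim() === '' ) {
		// Empty and whitespace-only keys are special cases: keep them as written.
		return sortKey;
	}
	// Regular key: strip incidental leading/trailing whitespace.
	return sortKey.trim();
}
```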

Regarding UI Integration for sort keys: Should this be a separate enhancement? Looking forward to your feedback. For now, I'll focus exclusively on the normalization changes for JSON handling.

As a practical example of default sortkeys, Wikimedia Commons commonly uses the keyword DEFAULTSORT to set sorting in "lastname, firstname" order when the category name is formatted as "firstname lastname."

Example:

Thank you for sharing this practical example - it really helps illustrate the real-world importance of sort keys.

Regarding UI Integration for sort keys: Should this be a separate enhancement? Looking forward to your feedback. For now, I'll focus exclusively on the normalization changes for JSON handling.

I think it's a good idea to follow the existing functionality that is used for wikitext, which doesn't include any UI-specific features. If we add UI elements to JSON, we would need to add them to wikitext editing as well to maintain consistency.

Instead, once this is ready, you could proceed with the second part, which was left out when the focus was kept on the editCategories() code path.

Got it! Thanks for the guidance.

Just to confirm my understanding:

  1. Immediate Focus:
    • Finalize the editCategories() code path for JSON content models with proper sort key handling.
    • Ensure all operations (add/remove/copy/move) work flawlessly.
  2. Next Steps:
    • Once the above is production-ready, I'll:
      • Explicitly handle Namespace 486 (Data:)
      • Maintain the existing JSON content model checks as the primary gate

This aligns with your suggestion in this comment, right?

Hey! I’ve added the implementation for Data namespace support in Cat-a-lot, incorporating your feedback. Here’s a brief overview of the changes:

  • The gadget now correctly handles NS 486 pages while preserving sort keys exactly as provided.
  • All operations (add/remove/copy/move) work as expected.
  • Maintained all existing functionality for both JSON and wikitext pages.

Let me know if you’d like any adjustments or further details. Thanks!

diff for reference

@adiba_anjum Thank you for the new version. Some notes.

Namespace checks
About namespace detection and validation in editJsonCategories() (lines 932-943), as well as other similar checks: it's not necessary to perform additional validation here to confirm that the content returned by the server is valid. This also applies when the function returns data for saving: as the JSON data originates from the server, we can trust that it is valid. If there is a format error, the server will throw an error when the page is saved, and it will be caught by the error handler implemented in T390242.

Because of this, the error handling in the earlier version was adequate, i.e. the version that had just a try/catch for JSON parse errors and created jsonData.mediawikiCategories = [] if it didn't exist.

However, in addition to these, there could be a namespace check in editCategories() at line 893 to confirm that the namespace number is 486 as expected and, if not, to throw a visible error. This would only serve to surface errors if there is something unexpected in the server configuration.
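A minimal sketch of such a guard, assuming the page object exposes the namespace number as page.ns (as in the editCategories() example earlier in this thread). The function name assertDataNamespace is hypothetical, and how the error is made visible to the user is left to the caller.

```javascript
// Data namespace number and JSON content models as named in this thread.
const DATA_NAMESPACE = 486;
const JSON_CONTENT_MODELS = [ 'Tabular.JsonConfig', 'Map.JsonConfig', 'Chart.JsonConfig' ];

// Hypothetical guard: a JSON content model outside namespace 486 points to
// something unexpected in the server configuration, so fail loudly.
function assertDataNamespace( page, contentModel ) {
	if ( JSON_CONTENT_MODELS.includes( contentModel ) && page.ns !== DATA_NAMESPACE ) {
		throw new Error(
			'Unexpected namespace ' + page.ns + ' for content model ' + contentModel
		);
	}
}
```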

Some errors in the code

  • meta in jsonData is required. (Is there a meta parameter in the Data namespace specs?)
  • The second mediawikiCategories initialization at line 957 is not needed, as mediawikiCategories is already initialized at line 954.

Hi, I've implemented your feedback on the Data namespace support for Cat-a-lot. Here's how I addressed each point:

  1. Removed Redundant Validation
    • Removed the additional JSON structure validation since the server handles this during saving (T390242)
    • Kept only the essential try-catch for JSON parsing and mediawikiCategories initialization
  2. Simplified Array Initialization
    • Eliminated the duplicate mediawikiCategories array check in the Data namespace validation block
    • Maintained just the single initialization at the start of editJsonCategories()
  3. Fixed Namespace Detection
    • Used page.ns from the API response to properly detect the edited page's actual namespace in editCategories()
  4. Preserved Core Functionality
    • Kept all category operations (add/remove/copy/move) and sort key handling working as before

The changes resulted in cleaner code while maintaining all functionality. Let me know if you'd like any adjustments or if there's additional feedback. Thanks again for your guidance!

diff for reference

Aklapper renamed this task from Add Wikimedia Commons datanamespace support to Cat-a-lot to Add Wikimedia Commons data namespace support to Cat-a-lot. Apr 10 2025, 5:50 AM

Thank you! I checked the code, tested it, and found it works nicely. The only thing that comes to my mind is that sortkeys could follow the source category sortkey. If there is no sortkey defined in source category, then it would omit it instead of setting an empty "" sortkey.

Hey, Thanks for testing the code! I've implemented the sort key changes that you suggested. The copy/move operations now:

  • Preserve non-empty sort keys exactly
  • Omit the sort key when the source category has none, or when it is empty or undefined

My reason for the initial approach was that I wanted to maintain explicit structure consistency (always having a sort field), but I see now that omitting it entirely is cleaner and matches the expected behavior.
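For illustration, the sort key rule for copy/move could look roughly like this. The helper name buildTargetEntry is hypothetical, and the entry shape (name plus optional sort) is taken from the JSON example earlier in this thread.

```javascript
// Hypothetical helper: build the target entry for a copy/move operation.
// The sort key follows the source category; it is left out entirely when
// the source has no sort key or an empty one.
function buildTargetEntry( source, targetName ) {
	const entry = { name: targetName };
	if ( source && typeof source.sort === 'string' && source.sort !== '' ) {
		entry.sort = source.sort; // preserved exactly as written
	}
	return entry;
}
```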

Let me know if you'd like any adjustments. Really appreciate your guidance!

diff for reference

This is merged into the main code.