
Replace space with tabs in JSON indentation autoformatting
Open, Needs Triage, Public, Feature

Description

Hello. Any time a JSON contentmodel page is saved, it is autoformatted. The result is extremely large, because all the indentation is made of spaces. I suggest replacing the spaces with tabs, so that the redundant part shrinks to a quarter of its size.
For example, this short piece from a small 103 kB file:

{
    "stations": [
        {
            "number": 1,
            "x": 1036,
            "y": 771,
...

has 48 spaces, which can be replaced with 12 tabs. Thank you.
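The arithmetic above can be checked with a short sketch. This is an illustration in Python, not MediaWiki's own code; the `data` sample and the `leading_whitespace` helper are hypothetical, and the sample is closed off with the brackets that the excerpt elides.

```python
import json

# Hypothetical sample mirroring the excerpt above (the real file is much larger).
data = {"stations": [{"number": 1, "x": 1036, "y": 771}]}

with_spaces = json.dumps(data, indent=4)    # four spaces per nesting level
with_tabs = json.dumps(data, indent="\t")   # one tab per nesting level


def leading_whitespace(text: str) -> int:
    """Count the indentation characters at the start of every line."""
    return sum(len(line) - len(line.lstrip(" \t")) for line in text.splitlines())


# The first six lines correspond to the excerpt quoted in the description.
excerpt_spaces = "\n".join(with_spaces.splitlines()[:6])
excerpt_tabs = "\n".join(with_tabs.splitlines()[:6])

print(leading_whitespace(excerpt_spaces), leading_whitespace(excerpt_tabs))
print(len(with_spaces) - len(with_tabs), "characters saved on the whole sample")
```

Only the indentation shrinks by a factor of four; keys, values, and newlines are unchanged, so the saving for a whole page depends on how deeply nested its data is.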

Event Timeline

I imagine space was chosen as the default because it's not easy to type a tab character in a browser.

Confirmed that there is some algorithm that adjusts JSON whitespace when you save a JSON page onwiki. I tested it out.

I checked the JSON spec and it does not specify whether tabs or spaces are required. Any whitespace, or none at all, is allowed, it sounds like.
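A quick way to see this is to parse the same object with different whitespace. The sketch below uses Python's standard `json` module purely as an illustration; the three sample strings are made up for the example.

```python
import json

# The same object written with space indentation, tab indentation, and no whitespace.
with_spaces = '{\n    "number": 1,\n    "x": 1036\n}'
with_tabs = '{\n\t"number": 1,\n\t"x": 1036\n}'
compact = '{"number":1,"x":1036}'

parsed = [json.loads(text) for text in (with_spaces, with_tabs, compact)]

# Whitespace between tokens is insignificant, so all three parse to equal values.
all_equal = all(value == parsed[0] for value in parsed)
print(all_equal)
```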

I'm neutral on this. Just providing some info.

I imagine space was chosen as the default because it's not easy to type a tab character in a browser.

Actually, it does not matter what the user types, ACE inserts defaults, which I'm asking to change.

It matters if you want to keep what the person types in sync with what is stored in the database. That could arguably be more intuitive than constantly changing spaces to tabs, or avoiding mixing tabs and spaces on the edit page.

I imagine space was chosen as the default because it's not easy to type a tab character in a browser.

But at the same time, both CodeMirror and CodeEditor/Ace (default code editors in MediaWiki) use tabs by default, so it makes little sense that this would be justification for it.

Well, I really want to convert most of this to JSON. But it's too big because of the spaces, which makes it impossible to update frequently on tablets or mobile devices. So it will wait until this task is resolved.

Change 987191 had a related patch set uploaded (by Majavah; author: Majavah):

[mediawiki/core@master] Indent JsonContent using tabs

https://gerrit.wikimedia.org/r/987191

Change 987194 had a related patch set uploaded (by Majavah; author: Majavah):

[mediawiki/extensions/AbuseFilter@master] ActionVariablesIntegrationTest: Support JsonContent using tabs

https://gerrit.wikimedia.org/r/987194

Change 987194 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] ActionVariablesIntegrationTest: Support JsonContent using tabs

https://gerrit.wikimedia.org/r/987194

Change 987191 merged by jenkins-bot:

[mediawiki/core@master] Indent JsonContent using tabs

https://gerrit.wikimedia.org/r/987191

FWIW, it reduced the size of the blocked domains list by 20%: https://en.wikipedia.beta.wmflabs.org/w/index.php?title=MediaWiki:BlockedExternalDomains.json&action=history

which is nice because hitting page size limit was brought up as a concern back then.

Change 986665 had a related patch set uploaded (by Reedy; author: Majavah):

[mediawiki/extensions/AbuseFilter@REL1_41] ActionVariablesIntegrationTest: Support JsonContent using tabs

https://gerrit.wikimedia.org/r/986665

Change 987466 had a related patch set uploaded (by Reedy; author: Majavah):

[mediawiki/extensions/AbuseFilter@REL1_40] ActionVariablesIntegrationTest: Support JsonContent using tabs

https://gerrit.wikimedia.org/r/987466

Change 987467 had a related patch set uploaded (by Reedy; author: Majavah):

[mediawiki/core@REL1_41] Indent JsonContent using tabs

https://gerrit.wikimedia.org/r/987467

Change 987468 had a related patch set uploaded (by Reedy; author: Majavah):

[mediawiki/core@REL1_40] Indent JsonContent using tabs

https://gerrit.wikimedia.org/r/987468

Change 987469 had a related patch set uploaded (by Reedy; author: Majavah):

[mediawiki/core@REL1_39] Indent JsonContent using tabs

https://gerrit.wikimedia.org/r/987469

Change 986665 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@REL1_41] ActionVariablesIntegrationTest: Support JsonContent using tabs

https://gerrit.wikimedia.org/r/986665

Change 987466 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@REL1_40] ActionVariablesIntegrationTest: Support JsonContent using tabs

https://gerrit.wikimedia.org/r/987466

Change 987469 merged by jenkins-bot:

[mediawiki/core@REL1_39] Indent JsonContent using tabs

https://gerrit.wikimedia.org/r/987469

Change 987467 merged by jenkins-bot:

[mediawiki/core@REL1_41] Indent JsonContent using tabs

https://gerrit.wikimedia.org/r/987467

Change 987468 merged by jenkins-bot:

[mediawiki/core@REL1_40] Indent JsonContent using tabs

https://gerrit.wikimedia.org/r/987468

Re: Tech News/User-notice - What wording would you suggest as the content? My best guess is something like:

Recent changes

  • Pages that use the JSON contentmodel will now use tabs instead of spaces. This will help with some page size limit issues.

Please confirm or improve that. Thanks!

I would suggest something like

Pages that use the JSON contentmodel will now use tabs instead of spaces for autoindentation. This will significantly reduce the page size.

The first sentence, because otherwise it will look like a "name":"John Doe" -> "name":"John\tDoe" conversion, or even like asking the users to insert tabs manually. The second one explains the large impact more straightforwardly.
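The distinction drawn here, that only indentation changes and string values do not, can be shown with a small sketch. It uses Python's `json` module as a stand-in for the autoformatter and is not the MediaWiki implementation; the `source` string is the example from the comment above.

```python
import json

source = '{"name":"John Doe"}'

# Re-serialize with tab indentation, as an autoformatter would.
reformatted = json.dumps(json.loads(source), indent="\t")

# The tab appears only as indentation; the space inside the string is untouched.
print(repr(reformatted))
```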

Storing tabs instead of spaces in the database is an improvement, but couldn’t we just store no whitespace, automatically adding whitespace only in the edit window? It’s important to have an easy-to-understand, nicely formatted structure in the edit window when and if the page is edited manually, but it’s just unnecessary bytes when it’s stored in the database or queried for machine-use (from Lua, through the API etc.).

Yes, that would be the ideal solution. But it would probably require more work from devs.

but couldn’t we just store no whitespace, automatically adding whitespace only in the edit window

That would add code complexity to other areas, such as the code for the window or the code for converting between content models. Seems unintuitive, so could create bugs in the future when someone assumes it works the normal way for their patch but it actually works another way.
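The store-compact, format-on-edit idea discussed above could look roughly like the sketch below. It is written in Python for illustration only; `to_storage` and `to_edit_window` are hypothetical names, not MediaWiki functions, and the sketch ignores the integration points (content model conversion, diffs, the API) where the extra complexity mentioned above would arise.

```python
import json


def to_storage(text: str) -> str:
    """Hypothetical save step: drop all insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def to_edit_window(stored: str) -> str:
    """Hypothetical display step: re-indent with tabs for human editing."""
    return json.dumps(json.loads(stored), indent="\t", ensure_ascii=False)


# Text as a user might submit it from the edit window.
edited = '{\n\t"stations": [\n\t\t{"number": 1, "x": 1036, "y": 771}\n\t]\n}'

stored = to_storage(edited)     # what would be written to the database
shown = to_edit_window(stored)  # what the next editor would see

print(stored)
print(len(stored), "bytes stored,", len(shown), "bytes shown")
```

One consequence of this design is that the stored text and the displayed text always differ, so every reader of the raw content has to know which form it is getting.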

At en:Wikipedia:Village pump (technical)#Tech News: 2024-03, the first bullet point says that tab characters are now used for indentation for JSON content model. Apparently not for tabular data at commons (Page content model: Tabular.JsonConfig). I recently created c:Data:Sandbox/CS1/Identifier limits.tab using an external text editor that uses tab characters for indenting. When I saved that page at commons, all of the tab characters were replaced with space characters.

Shouldn't indenting also apply to Page content model: Tabular.JsonConfig?