Page MenuHomePhabricator

streamconfigs action API query module returns everything(?) as associative arrays
Closed, ResolvedPublic

Description

While working on T318832: Modify the JS Client to be used non MW powered sites, I noticed that the streamconfigs action API query module returns everything(?) as associative arrays. This means that the lists in streams[].producers.metrics_platform_client.events and .provide_values are encoded as associative arrays, e.g. consider the following (truncated) config for the media.edit_attempt stream:

{
    "streams": {
        "mediawiki.edit_attempt": {
            "producers": {
                "metrics_platform_client": {
                    "events": {
                        "0": "eas."
                    },
                    "provide_values": {
                        "0": "agent_client_platform_family"
                    }
                }
            }
        }
    }
}

In JS, Java, and Swift, those properties will be deserialized to maps rather than lists.

Notes
  1. This behavior was introduced in T259917: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API

Event Timeline

Change 856592 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/EventStreamConfig@master] API: Do not tream MPC config as objects exclusively

https://gerrit.wikimedia.org/r/856592

Change 856621 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/EventStreamConfig@master] API: Only serialize streams as objects

https://gerrit.wikimedia.org/r/856621

Interesting. Ideally PHP would just do the right thing and distinguish between integer indexed arrays and associative arrays (objects).

What if...we checked the keys or the first key of the array and used that to infer the type? If the first key is 0, then we assume it is an integer indexed array, otherwise its an associative array?

Interesting. Ideally PHP would just do the right thing and distinguish between integer indexed arrays and associative arrays (objects).

For the most part PHP does the Right Thing™.

What if...we checked the keys or the first key of the array and used that to infer the type? If the first key is 0, then we assume it is an integer indexed array, otherwise its an associative array?

That's already happening. The question is what should you do when the array is empty. This was the question in T259917: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API, to which the answer was to force all parts of the stream config to be serialised as objects. This seemed overly prescriptive to me, hence this task and my patches.

I've submitted two patches:

  1. https://gerrit.wikimedia.org/r/856592 – This only fixes the issue for the MPC producer config. I'm not happy with this solution as it puts a magic string related to the Metrics Platform inside of EventStreamConfig. I suspect that more patches like this will be submitted as we add more producers and consumers
  1. https://gerrit.wikimedia.org/r/856621 – This restricts the fix for T259917: Streams with empty configs should be rendered as {} in the JSON returned by StreamConfig API such that the top-level and first-level (i.e. stream-level) configs are always serialised as objects. We will continue to return API responses like:
{
  "streams": {}
}

{
  "streams": {
    "foo": {}
  }
}

but anything beyond that is up to the MediaWiki Action API to serialise appropriately.

Oo, I see. Right empty I remember now.

So with solution 2., if a stream config setting is an array and is empty, it will serialize to a list [].

Let's do it.

Change 856592 abandoned by Phuedx:

[mediawiki/extensions/EventStreamConfig@master] API: Do not treat MPC config as objects exclusively

Reason:

I2de828679a4bd88f088a3697c5a97a94c6ec477b

https://gerrit.wikimedia.org/r/856592

Change 856621 merged by jenkins-bot:

[mediawiki/extensions/EventStreamConfig@master] API: Only serialize streams as objects

https://gerrit.wikimedia.org/r/856621

From my last comment on Gerrit:

For posterity, my testing process for this was as follows:

  1. Configured my local wiki with a copy of the production metawiki config from the operations/mediawiki-config repo
  2. curl'ed https://meta.wikimedia.org/w/api.php?action=streamconfigs&format=json&constraints=destination_event_service=eventgate-analytics-external
  3. curled http://localhost:8080/w/api.php?action=streamconfigs&format=json&constraints=destination_event_service=eventgate-analytics-external
  4. diffed the output of 2 and 3 and observed that there was no difference
  5. Applied this patch
  6. curled the URL in 3 again
  7. diffed the output of 3 and 6 and observed that the difference was limited to the Metrics Platform Client producer config
  8. Recorded the diff at https://phabricator.wikimedia.org/P41878

Note well that I used jd -set to get the diffs in 4 and 6.

Following up on the above, I captured the output of https://he.wikimedia.org/w/api.php?action=streamconfigs&format=json&constraints=destination_event_service=eventgate-analytics-external before and after the rollout of the new -wmf branch to group1 wikis, diffed them, and confirmed that the only the metrics_platform_client producer configs changed (see below):

$ jd -set he_before.json he_after.json
@ ["streams","mediawiki.edit_attempt","producers","metrics_platform_client","events"]
- {"0":"eas."}
+ ["eas."]
@ ["streams","mediawiki.edit_attempt","producers","metrics_platform_client","provide_values"]
- {"0":"agent_client_platform_family","1":"page_id","10":"performer_session_id","11":"performer_pageview_id","12":"performer_edit_count","2":"page_title","3":"page_namespace","4":"page_revision_id","5":"mediawiki_version","6":"mediawiki_is_debug_mode","7":"mediawiki_database","8":"performer_is_logged_in","9":"performer_id"}
+ ["agent_client_platform_family","page_id","page_title","page_namespace","page_revision_id","mediawiki_version","mediawiki_is_debug_mode","mediawiki_database","performer_is_logged_in","performer_id","performer_session_id","performer_pageview_id","performer_edit_count"]
@ ["streams","mediawiki.visual_editor_feature_use","producers","metrics_platform_client","events"]
- {"0":"vefu."}
+ ["vefu."]
@ ["streams","mediawiki.visual_editor_feature_use","producers","metrics_platform_client","provide_values"]
- {"0":"agent_client_platform_family","1":"mediawiki_database","2":"performer_id","3":"performer_edit_count"}
+ ["agent_client_platform_family","mediawiki_database","performer_id","performer_edit_count"]
@ ["streams","mediawiki.web_ui.interactions","producers","metrics_platform_client","curation","mediawiki_skin","in"]
- {"0":"minerva","1":"vector","2":"vector-2022"}
+ ["minerva","vector","vector-2022"]
@ ["streams","mediawiki.web_ui.interactions","producers","metrics_platform_client","events"]
- {"0":"web.ui.","1":"web_ui."}
+ ["web.ui.","web_ui."]
@ ["streams","mediawiki.web_ui.interactions","producers","metrics_platform_client","provide_values"]
- {"0":"page_namespace","1":"performer_is_logged_in","2":"performer_session_id","3":"performer_pageview_id","4":"performer_edit_count_bucket","5":"mediawiki_skin","6":"mediawiki_database"}
+ ["page_namespace","performer_is_logged_in","performer_session_id","performer_pageview_id","performer_edit_count_bucket","mediawiki_skin","mediawiki_database"]