Page MenuHomePhabricator

Gerritbot turns "+" into space, thus breaking most Gerrit URLs
Closed, ResolvedPublic

Description

Example at T232903#7000843:

[mediawiki/core@master] Fix the styling of the installer and deprecate its RL module

Reason:

See https://gerrit.wikimedia.org/r/c/mediawiki/core/ /679503

https://gerrit.wikimedia.org/r/674457

I think we found and fixed this once before, so perhaps this is a regression from T93331 (/cc @hashar), given the code hasn't changed much recently aside from that one afaik.

Event Timeline

Still broken :)

[mediawiki/extensions/FixedHeaderTable@master] Remove deprecated $wgParser variable

https://gerrit.wikimedia.org/r/584117

Reason:

Supeseded by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/FixedHeaderTable/ /592442

The change for T93331 is to wrap the abandoning reason with a Phabricator literal block marker: %%% . The code at https://gerrit.wikimedia.org/r/c/operations/puppet/+/675479/2/modules/gerrit/templates/its/PatchSetAbandoned.soy.erb can be summarized as:

Reason:{\n}
- {$reason}
+ %%%{$reason}%%%{\n}

The raw markup at T160811#7038582 show that the + sign in the URL got replaced by a space :

Reason:
%%%Supeseded by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/FixedHeaderTable/ /592442%%%

Sounds like urldecoding kicking in somehow? :-\

The metadata for the review in Gerrit can be retrieved with git fetch origin refs/changes/17/584117/meta and then:

$ git show fc80d03e6a697192570a85c9399b289fa9c5536c
commit fc80d03e6a697192570a85c9399b289fa9c5536c
Author: Gerrit User 34 <34@e9e9afe9-4712-486d-8885-f54b72dd1951>
Date:   Tue Apr 27 15:29:13 2021 +0000

    Update patch set 1
    
    Abandoned
    
    Supeseded by https://gerrit.wikimedia.org/r/c/mediawiki/extensions/FixedHeaderTable/+/592442
    
    Patch-set: 1
    Status: abandoned
    Tag: autogenerated:gerrit:abandon

Doesn't tell though what in the stack got it to change but I highly suspect the soy templating system (which is Google Closure Template) to kick in. As to why it would trigger when wrapped with %%% that is a mystery. But maybe it is unrelated.

I went to elevate a bunch of logging level in Gerrit. Loggers can be found using:

gerrit logging ls | grep \\.its\\.
com.googlesource.gerrit.plugins.its.base.its.ItsConfig: INFO
com.googlesource.gerrit.plugins.its.base.util.CommitMessageFetcher: INFO
com.googlesource.gerrit.plugins.its.base.util.IssueExtractor: INFO
com.googlesource.gerrit.plugins.its.base.validation.ItsValidateComment: INFO
com.googlesource.gerrit.plugins.its.base.workflow.ActionController: INFO
com.googlesource.gerrit.plugins.its.base.workflow.ActionExecutor: INFO
com.googlesource.gerrit.plugins.its.base.workflow.AddSoyComment: INFO
com.googlesource.gerrit.plugins.its.base.workflow.ItsRulesProjectCacheImpl: INFO
com.googlesource.gerrit.plugins.its.base.workflow.ItsRulesProjectCacheRefresher: INFO
com.googlesource.gerrit.plugins.its.base.workflow.RuleBase: INFO
com.googlesource.gerrit.plugins.its.phabricator.PhabricatorItsFacade: INFO
com.googlesource.gerrit.plugins.its.phabricator.PhabricatorModule: INFO
com.googlesource.gerrit.plugins.its.phabricator.conduit.ConduitConnection: INFO

We can then promote one to eg TRACE:

gerrit logging set TRACE com.googlesource.gerrit.plugins.its.base.workflow.AddSoyComment

When sending a patch https://gerrit.wikimedia.org/r/683803 attached to T281552 the + sign is not shallowed. In the log we got:

TRACE com.googlesource.gerrit.plugins.its.base.workflow.AddSoyComment : Rendered template /var/lib/gerrit2/review_site/etc/its/templates/PatchSetCreated.soy to:
Change 683803 had a related patch set uploaded (by Hashar; author: Hashar):
%%%[test/gerrit-ping@master] __version__%20%7C%20https%3A%2F%2Fexample.org%2F%2B%2Ffoo.html%%%
https://gerrit.wikimedia.org/r/683803

The payload urldecode to [test/gerrit-ping@master]__version__ | https://example.org/+/foo.html, it matches what is added as a comment at T281552#7048248 and render properly.

I then abandoned the change with the text:

__version__ | https://example.org/+/foo.html  , abandon for T280197 testing

That results in the + to disappear at T281552#7048256

In the Gerrit log, the HTTP thread formatted the template:

[HTTP POST /r/changes/test%2Fgerrit-ping~683803/abandon (hashar from [2a01:e0a:96e:e070:3c86:5938:96d3:4cb9])] TRACE com.googlesource.gerrit.plugins.its.base.workflow.AddSoyComment : Rendered template /var/lib/gerrit2/review_site/etc/its/templates/PatchSetAbandoned.soy to:
Change 683803 **abandoned** by Hashar:
%%%[test/gerrit-ping@master] __version__%20%7C%20https%3A%2F%2Fexample.org%2F%2B%2Ffoo.html%%%
Reason:
%%%__version__ | https://example.org/+/foo.html  , abandon for T280197 testing%%%
https://gerrit.wikimedia.org/r/683803

The rendering of PatchSetAbandoned.soy does not urlencode the reason whereas the PatchSetCreated.soy (it uses {$escapedSubject|escapeUri}).

The content sent to Phabricator is:

TRACE com.googlesource.gerrit.plugins.its.phabricator.conduit.ConduitConnection
Calling phabricator method maniphest.edit with the parameters {"__conduit__":{"token":"api-XXXXXXX"},"objectIdentifier":281552,"transactions":[{"type":"comment","value":"Change 683803 **abandoned** by Hashar:\n%%%[test/gerrit-ping@master] __version__%20%7C%20https%3A%2F%2Fexample.org%2F%2B%2Ffoo.html%%%\nReason:\n%%%__version__ | https://example.org/+/foo.html  , abandon for T280197 testing%%%\nhttps://gerrit.wikimedia.org/r/683803"}]}
[CONTEXT PLUGIN="gerrit" PLUGIN="its-phabricator" project="test/gerrit-ping" ]

Once urldecoding the value and replacing \n with newlines:

Change 683803 **abandoned** by Hashar:
%%%[test/gerrit-ping@master] __version__ | https://example.org/+/foo.html%%%
Reason:
%%%__version__ | https://example.org/ /foo.html  , abandon for T280197 testing%%%
https://gerrit.wikimedia.org/r/683803

And a second log message:

DEBUG com.googlesource.gerrit.plugins.its.phabricator.PhabricatorItsFacade Added comment
Change 683803 **abandoned** by Hashar:
%%%[test/gerrit-ping@master] __version__%20%7C%20https%3A%2F%2Fexample.org%2F%2B%2Ffoo.html%%%
Reason:
%%%__version__ | https://example.org/+/foo.html  , abandon for T280197 testing%%%
https://gerrit.wikimedia.org/r/683803 to bug 281552
[CONTEXT PLUGIN="gerrit" PLUGIN="its-phabricator" project="test/gerrit-ping" ]

My pure guess is we need to pipe the reason soy variable to escapeUri, and probably other fields would need the same.

Change 683810 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: escape URI in Phabricator comment

https://gerrit.wikimedia.org/r/683810

Change 683810 merged by Jbond:

[operations/puppet@production] gerrit: escape URI in Phabricator comment

https://gerrit.wikimedia.org/r/683810

Tried again and there is a server side error:

TRACE com.googlesource.gerrit.plugins.its.base.workflow.AddSoyComment
Rendered template /var/lib/gerrit2/review_site/etc/its/templates/PatchSetAbandoned.soy to:
Change 683803 **abandoned** by Hashar:
%%%[test/gerrit-ping@master] __version__%20%7C%20https%3A%2F%2Fexample.org%2F%2B%2Ffoo.html%%%
Reason:
%%%Abandonning%20after%20https%3A%2F%2Fgerrit.wikimedia.org%2Fr%2Fc%2Foperations%2Fpuppet%2F%2B%2F683810%20got%20deployed%%%
https://gerrit.wikimedia.org/r/683803

Which decodes to:

%%andonning after https://gerrit.wikimedia.org/r/c/operations/puppet/+/683810 got deployed%%%

And its-phabricator is not happy:

TRACE com.googlesource.gerrit.plugins.its.phabricator.conduit.ConduitConnection
callCapsule.error_info: Attempting to construct a query using a non-utf8 string when utf8 is expected. Use the `%B` conversion to escape binary strings data.
[CONTEXT PLUGIN="gerrit" PLUGIN="its-phabricator" project="test/gerrit-ping" ]

Which comes from Phabricator conduit :(

Change 683879 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] Revert "gerrit: escape URI in Phabricator comment"

https://gerrit.wikimedia.org/r/683879

Change 683879 merged by Jbond:

[operations/puppet@production] Revert "gerrit: escape URI in Phabricator comment"

https://gerrit.wikimedia.org/r/683879

My message had Abandonning ... , the template thus has %%%Abandonning and I guess that %A is interpreted as well as other %. I am not quite sure I understand what is going, but I suspect it might that its-phabricator does not url encode the payload when sending to Phabricator conduit. I am wondering what would happen if one abandon a change with some urlencoded string. And indeed!

I have abandoned a change with the reason %28%2B%29 and that shows up at T281552#7048989 as: (+)

com.googlesource.gerrit.plugins.its.base.workflow.AddSoyComment
Rendered template /var/lib/gerrit2/review_site/etc/its/templates/PatchSetAbandoned.soy to:

Change 683803 **abandoned** by Hashar:
%%%[test/gerrit-ping@master] __version__%20%7C%20https%3A%2F%2Fexample.org%2F%2B%2Ffoo.html%%%
Reason:
%%%%28%2B%29%%%
https://gerrit.wikimedia.org/r/683803
com.googlesource.gerrit.plugins.its.phabricator.conduit.ConduitConnection
Calling phabricator method maniphest.edit with the parameters
{"__conduit__":
  {"token":"XXXXX"}
  ,"objectIdentifier":281552,
  "transactions":[
    {"type":"comment",
     "value":"Change 683803 **abandoned** by Hashar:\n%%%[test/gerrit-ping@master] __version__%20%7C%20https%3A%2F%2Fexample.org%2F%2B%2Ffoo.html%%%\nReason:\n%%%%28%2B%29%%%\nhttps://gerrit.wikimedia.org/r/683803"
    }
  ]
}
com.googlesource.gerrit.plugins.its.phabricator.PhabricatorItsFacade
Added comment Change 683803 **abandoned** by Hashar:
%%%[test/gerrit-ping@master] __version__%20%7C%20https%3A%2F%2Fexample.org%2F%2B%2Ffoo.html%%%
Reason:
%%%%28%2B%29%%%
https://gerrit.wikimedia.org/r/683803 to bug 281552
thcipriani triaged this task as Low priority.
thcipriani added subscribers: mmodell, thcipriani.

Assigning to @mmodell to verify behavior on phab side

TLDR

If I abandon a change in Gerrit with the content %28%2B%29 it is shown as is in Gerrit but in Phabricator the comment has somehow been decoded and show up as (+). So the comment got url decoded, either by Gerrit its-phabricator / its-base plugin due to the Soy templates being too smart, or Phabricator conduit expect the payload to be urlencoded (that sounds weird).

its-phabricator code to send to conduit is:

conduit/ConduitConnection.java
  /**
   * Calls a conduit method with some parameters
   *
   * @param method The name of the method that should get called
   * @param params A map of parameters to pass to the call
   * @return The call's result, if there has been no error
   * @throws ConduitException
   */
  JsonElement call(String method, Map<String, Object> params, String token)
      throws ConduitException {
    String methodUrl = apiUrlBase + method;

    HttpPost httppost = new HttpPost(methodUrl);

    if (token != null) {
      Map<String, Object> conduitParams = new HashMap<>();
      conduitParams.put("token", token);
      params.put("__conduit__", conduitParams);
    }

    String json = gson.toJson(params);

    logger.atFinest().log("Calling phabricator method %s with the parameters %s", method, json);
    httppost.setEntity(new StringEntity("params=" + json, StandardCharsets.UTF_8));

    try (CloseableHttpResponse response = getClient().execute(httppost)) {
...

So it merely takes the HashMap of parameters, json encode those and pass the result in the POST as params=<json blob> which is not URL encoded!

I had a quick glance at similar code in another Java project https://github.com/jenkinsci/phabricator-plugin/blob/master/src/main/java/com/uber/jenkins/phabricator/conduit/ConduitAPIClient.java :

    /**
     * Post a URL-encoded "params" key with a JSON-encoded body as per the Conduit API
...

        List<NameValuePair> formData = new ArrayList<NameValuePair>();
        formData.add(new BasicNameValuePair("params", params.toString()));

        UrlEncodedFormEntity entity = new UrlEncodedFormEntity(formData, "UTF-8");
        post.setEntity(entity);

So at least that code uses org.apache.http.client.entity.UrlEncodedFormEntity and I guess its-phabricator should do the same :-]

I have send a tentative patch upstream to url encode POST made to conduit: https://gerrit-review.googlesource.com/c/plugins/its-phabricator/+/305522 But I don't know how to write tests in Java and I don't have any idea how to test the code I wrote :-\

I have send a tentative patch upstream to url encode POST made to conduit: https://gerrit-review.googlesource.com/c/plugins/its-phabricator/+/305522

Tested against in https://phab.wmflabs.org/T39#951 (well that URL is probably only temporary ... but meh. Better than nothing) and it worked as expected.

But I don't know how to write tests in Java and I don't have any idea how to test the code I wrote :-\

Not sure a unit test is useful here. But I tested manually and replacing {$escapedSubject|escapeUri} by {$subject} (using @hashar's patch on a local Gerrit) makes URLs that contain + work, escapes ' and " and \ correctly.

The patch got merged upstream. Thanks!

I don't know when the patch will be deployed to our gerrit instance. Should I resolve this now or leave it open until that happens?

The patch got merged upstream: https://gerrit-review.googlesource.com/c/plugins/its-phabricator/+/305522 but it is not picked up on a our deployment since it is solely in the master branch:

$ git branch -r --contains d6a9913
  origin/HEAD -> origin/master
  origin/master

We probably just need to merge master into stable-3.2 , then updating the plugins ( https://wikitech.wikimedia.org/wiki/Gerrit/Upgrade#Fetch_our_additional_plugins ) will pick it up.

To deploy the plugin update, we will have to adjust the soy templates that uses escapeUri since that would then be redundant and result in double escaping.

Will look at deploying that after Gerrit is upgraded to 3.2.11

Change 705499 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit@wmf/stable-3.2] [WMF] its-phabricator: Urlencode POST to conduit

https://gerrit.wikimedia.org/r/705499

Change 705503 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: remove escapeUri

https://gerrit.wikimedia.org/r/705503

Change 705499 merged by jenkins-bot:

[operations/software/gerrit@wmf/stable-3.2] [WMF] its-phabricator: Urlencode POST to conduit

https://gerrit.wikimedia.org/r/705499

Change 705650 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit@deploy/wmf/stable-3.2] Update its-phabricator: Urlencode POST to conduit

https://gerrit.wikimedia.org/r/705650

Change 705650 merged by Hashar:

[operations/software/gerrit@deploy/wmf/stable-3.2] Update its-phabricator: Urlencode POST to conduit

https://gerrit.wikimedia.org/r/705650

Mentioned in SAL (#wikimedia-operations) [2021-07-21T14:19:19Z] <hashar@deploy1002> Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # T280197

Mentioned in SAL (#wikimedia-operations) [2021-07-21T14:19:28Z] <hashar@deploy1002> Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # T280197 (duration: 00m 09s)

Gerrit reload the plugins automatically

[2021-07-21T14:20:06.975+0000]
[PluginScanner] INFO  com.google.gerrit.server.plugins.PluginLoader : Unloading plugin its-phabricator, version ab92a8d
[PluginScanner] INFO  com.google.gerrit.server.plugins.PluginLoader : Reloaded plugin its-phabricator, version 1c1d86a

I don't know about the soy template we will find out.

Change 705503 merged by Dzahn:

[operations/puppet@production] gerrit%3A%20remove%20escapeUri

https://gerrit.wikimedia.org/r/705503

@Dzahn assisted with the deployment of the new Soy template. Changes are taken in account immediately without having to restart Gerrit.

I went to the test change https://gerrit.wikimedia.org/r/c/test/gerrit-ping/+/683803 and restore/abandoned it with two different messages:

Test T280197 abandon: https://example.org/+/foo.html which results in:

...
Reason:

Test T280197 abandon: https://example.org/+/foo.html

%28%2B%29 which results in:

There is no more double encoding! An URL containing a + shows up properly.

The reason given by the user is posted verbatim to phabricator (by using %%% which prevent Phabricator from parsing it). That was done in https://gerrit.wikimedia.org/r/c/operations/puppet/+/675479 for T93331

Maybe the human given reason should be parsed? In which case we would:

--- a/modules/gerrit/templates/its/PatchSetAbandoned.soy.erb
+++ b/modules/gerrit/templates/its/PatchSetAbandoned.soy.erb
@@ -31,7 +31,7 @@
   Change {$changeNumber} **abandoned** by {$abandonerName}:{\n}
   %%%[{$project}@{$branch}] {$escapedSubject}%%%{\n}
   Reason:{\n}
-    %%%{$reason}%%%{\n}
+    {$reason}{\n}
 
   https://<%= @host %>/r/{$changeNumber}
 {/template}

Change 767521 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: use raw subject for Phabricator comments

https://gerrit.wikimedia.org/r/767521

Reopening since escapedSubject cause double quotes to show up with a backslashes in Phabricator. It is no more needed since we urlencode when doing the POST and we use %%% in Phabricator to prevent interpretation.

Change 767521 merged by Dzahn:

[operations/puppet@production] gerrit: use raw subject for Phabricator comments

https://gerrit.wikimedia.org/r/767521