Page MenuHomePhabricator

[components-api] allow specifying `source_repo`+`ref` for the config
Open, HighPublic

Description

This might mean using a structure for the config, maybe something like:

source_repo:
  url: http://gitlab.....
  ref: main
...

This has to be tested to make sure it bypasses the caches of github, for example by setting up a github pipeline that does a deployment right after a merge, and making sure it's getting the latest commit, and not the previous "cached" one.

Event Timeline

dcaro triaged this task as High priority.Aug 25 2025, 8:33 AM

It would be a breaking change, but perhaps:

source:
  url:
source:
  repo_url: 
  branch: main

That would follow the same structure as the other elements, where 1 key has multiple object types with it's own keys, and avoids needing to deal with e.g. source_url and source_repo both being specified.

The URL should probably be validated to either https:// or git:// with branch being optional (default to the remote).

I have a significant preference on also being able to specify path, which allows keeping all the files in 1 branch (e.g. https://github.com/cluebotng/component-configs) rather than needing to create a branch per tool (this is mostly so it's easy for a human to see what changed).

It might be worthwhile to perform the operation as an archive (apparently not supported by GitHub) or sparse checkout, to avoid very large repos from needing to be cloned.

I made https://phabricator.wikimedia.org/T402790 as it's not directly related to this, but implementation of this is perhaps a good time for consideration.

It's something that has niggled at me in multiple angles during trying to get this working. Feel free to un-link if you want to keep this issue clean.

+1 for both, though being non-backwards compatible we will have to support both syntaxes for a while

First stab at this:

https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/121 (validate repo url - not strictly required but generally makes sense)
https://gitlab.wikimedia.org/damian/components-api/-/merge_requests/1 (actual logic - on-top of 121, using new validated type)
https://gitlab.wikimedia.org/damian/components-api/-/merge_requests/2 (if happy using new lib, use it in the existing places - ontop of 1)

Will do some testing when lima-kilo finishes building.

Key question for now is if you are happy to bring in GitPython or want to stick to subprocess (can make a class around subprocess if needed)

I had to make some changes due to no writeable tmp directory, but the above appears to work (at least locally).

Some testing output added to https://gitlab.wikimedia.org/damian/components-api/-/merge_requests/1.

Marking all the above and not draft and leaving for review (assume the MRs will get re-created for the pipeline to run).

To close this out only https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/merge_requests/132 is pending, the rest of the links referenced above are either merged or superseded.

I struggle to see why we need an handle configurations saved in a git repo as a different from ordinary url, given that the files in a particular branch in both gitlab and github (and most git servers) can be defind as a simple url.
What am I missing?
typical example is https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/raw/replace_destination_image_comparison_with_image_name/LICENSE?ref_type=heads
the above link can be read by anything, we do not need to know it's in a git repo in branch replace_destination_image_comparison_with_image_name

@Raymond_Ndibe please read the history on the related tickets, specifically https://phabricator.wikimedia.org/T401868

Non-git http interfaces have full rights to cache, which they do, which combined with the inability to specify a specific ref, makes the http interface useless.

Honestly the code was written more than 2 weeks ago, this ticket has been open more than a month... so the time for a discussion on if it should be done seems a little time wasting at this point, especially given this ticket was for the literal implementation.