
Figure out cloning of github repo to gerrit
Closed, Resolved | Public | 3 Estimated Story Points

Description

For the blubberization of Speechoid, each service needs to have its repo cloned to Gerrit.

A joint concern for all of these is how to deal with syncing, i.e. do we need to review each sync, or is syncing fully automatic? If the latter, where does the sync job live?

Event Timeline

Lokal_Profil set the point value for this task to 12. Mar 11 2020, 1:34 PM

Had a chat with some people on #wikimedia-tech@freenode about this. Nobody really has a good answer, but they asked me to take a step back and explain why we need this. I explained that the short story is about blubberization, which was accepted. Then I started to think about the long story. I'm uncertain. Is it about security policies? That Blubber in CI can only access some places? Doesn't that mean we're basically trying to bypass the security policies if we automatically merge external projects owned by someone else (e.g. Mishkal) into Gerrit? And that we will indeed have to manually review any changes prior to merge?

So yes, the background is security concerns: the Blubber scripts are run as part of CI on the Wikimedia infrastructure and only have access to things in the Wikimedia Docker registry.

Based on that thinking (and sort of reflecting the task description), I think manual syncing will probably be the way forward, where we create a Gerrit patch for the upstream changes we wish to incorporate. Possibly some of these would be subject to security review if the upstream source is not trusted to deal with such things themselves. If the upstream repo is versioned, then our patches would likely correspond to new releases.
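To make the manual sync idea concrete, here is a minimal sketch of what one sync could look like, written as a Python helper that only builds the list of git commands rather than running them. The upstream URL, the tag name and the `sync-upstream-<tag>` branch name are all hypothetical, and squashing the release into one commit is just one possible choice, not something that has been agreed on. It also assumes the Gerrit commit-msg hook is installed locally so the commit gets a Change-Id.

```python
import shlex
from typing import List


def plan_upstream_sync(upstream_url: str, tag: str, branch: str = "master") -> List[List[str]]:
    """Return the git commands for proposing one upstream release as a Gerrit change.

    Nothing is executed here; the caller decides whether to run or only print the plan.
    """
    topic_branch = f"sync-upstream-{tag}"  # hypothetical naming scheme
    return [
        # Fetch only the chosen release tag from upstream.
        ["git", "fetch", upstream_url, f"refs/tags/{tag}:refs/tags/{tag}"],
        # Start from the current state of the Gerrit-hosted branch.
        ["git", "checkout", "-b", topic_branch, f"origin/{branch}"],
        # Stage the upstream changes as one commit so reviewers see a single patch.
        ["git", "merge", "--squash", tag],
        ["git", "commit", "-m", f"Sync with upstream release {tag}"],
        # Pushing to refs/for/<branch> asks Gerrit to open a change for review
        # instead of updating the branch directly.
        ["git", "push", "origin", f"HEAD:refs/for/{branch}"],
    ]


if __name__ == "__main__":
    # Hypothetical upstream URL and tag, printed as a dry run.
    for cmd in plan_upstream_sync("https://example.org/upstream/project.git", "v1.2.0"):
        print(shlex.join(cmd))
```

Printing the plan first keeps the human in the loop: whoever does the sync can read the commands, run them by hand, and the resulting Gerrit change is where any security review would happen.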

@kalle Could you take a look among the existing Gerrit projects and see if anything similar already exists (a local mirror of a well-known external project)? If so, the recent changes should give us an idea of how they have dealt with it.

I can't seem to find any project on gerrit.wikimedia.org that is a clone of an external project. Nor has anyone I've talked to heard of this before. Do we have a contact in the infrastructure team that works with CI, or someone who knows someone there, whom I could reach out to?

Think I've managed to get in touch with someone in the release engineering team to have a chat about what's acceptable and what's not.

The short answer from release engineers on IRC is that they will only accept base images produced and hosted by WMF. This we know, and everything else is rather fuzzy. I decided to contact Finland-based Lars Wirzenius of the release engineering team, explained a bit, and hope he has time for a short meeting to sort things out.

Here is the mail I sent (translated from Swedish):

Hi Lars,

My name is Kalle and I work at Wikimedia Sverige on the Wikispeech project.

As part of getting closer to an acceptable beta release, I have started writing Blubber configuration for the various services in our project. We are very unsure what requirements apply to, for example, the contents of our Docker images and the parts developed by third parties.

Are you someone we could talk to briefly about this? If not, do you know of anyone else within WMF you could refer us to? I have tried a bit in #wikimedia-releng@freenode, but I think a short phone call or video meeting would be very valuable for us.

Here follows a short summary of our thoughts and what we have done with Blubber so far:

https://github.com/karlwettin/blubberize-mary-tts-stts
https://github.com/karlwettin/blubberize-mishkal
https://github.com/karlwettin/blubberize-pronlex
https://github.com/karlwettin/blubberize-wikispeech-mockup

These are four separate services developed in various languages. Each repo contains scripts that download source code and dependencies from GitHub repos, build binaries from Java and Go, and prepare small databases (all services are stateless, though), which are then assembled into a Docker image via Blubber.

Is this acceptable from WMF's perspective? From the information I have come across, all projects must live in Gerrit and follow a given standard. The projects above follow no such standard at all.

The problem is that we do not own all the source code ourselves. For example, Mishkal is a third-party Python project that we depend on for Arabic phonetics from our patched version of the speech synthesizer Mary TTS, which we do not own either. I cannot find any examples in Gerrit of projects that are clones of projects on e.g. GitHub. That makes me think that either we are doing something new, or I have not found the right documentation on dependencies on third-party code.

This leads me to wonder how much WMF trusts third parties. Can we Blubberize an external git master and run it in WMF's infrastructure? Or must we carefully review the third-party code before each release of our own and refer to, for example, a specific tag we have chosen to trust?

Many thanks in advance,

Got a response and was forwarded to Dan Duvall (dduvall@wikimedia.org) and Tyler Cipriani (tcipriani@wikimedia.org), whom I've now contacted.

Dan will help!

Hi Karl,

Thanks for reaching out! I can definitely help you out with writing Blubber files. I'm a bit swamped this week between home duties and deployments but I will take a closer look Friday or early next week and get back to you.

Kindly,
Dan
kalle changed the point value for this task from 12 to 3. Apr 2 2020, 8:58 AM

I will speak to Dan on Monday, April 27 at 18:30.

See T251107

We should probably get going with creating Gerrit repos for all services, fetching the upstream projects and adding our Blubber scripts to them. But before we do that, we should read https://wikitech.wikimedia.org/wiki/PipelineLib/Guides/How_to_configure_CI_for_your_project