
Gather data on how much code development activity takes place canonically in Phabricator Diffusion
Open, Low, Public

Description

This task is a followup to T212904: Create a Tech Contributors Map to document major audiences and their goals.
This task is related to T237470: Create and maintain somehow a list of repos mastered in GitHub (and in Phabricator Diffusion) when it comes to the "Diffusion" part in its task summary.

We seem to have >300 repositories which use Phabricator Diffusion as their canonical location. As far as I know, we do not have stats on how much development activity happens in these repos.

The first question is probably: how do we tell which commits were not "imported" but are canonical in Diffusion?

Event Timeline

Aklapper triaged this task as Low priority. Dec 31 2019, 10:31 PM
Aklapper created this task.

Have not taken a really close look, so just dropping stuff here:

bd808 added a subscriber: bd808. Dec 31 2019, 11:25 PM

The isImporting attribute seems to be about active imports, not mirroring. The wording on the Conduit API page is not clear, but listing all of the repos where this flag is false yields a random grab bag of repos.

$ python3 diffusion_stats.py
extension-Athena
extension-Survey
MediaWiki Extensions
extension-ApprovedRevs
extension-Flow
wikimedia-cz-tracker
extension-JsonConfig
extension-LdapAuthentication
MediaWiki
extension-Html2Wiki
extension-GWToolset
extension-ReadingLists
extension-Newsletter
extension-TimedMediaHandler
education-program-dashboard
extension-WikimediaMaintenance
extension-Collection
extension-LiquidThreads
extension-Scribunto
integration-config
extension-Jade
mediawiki-services-mobileapps
CirrusSearch
extension-Duplicator
extension-TwitterLogin
analytics-quarry-web
Gerrit Code Review
labs-tools-wikibugs2
mediawiki/oauthclient-php
extension-WikiSEO
Wikimedia Puppet
extension-Genealogy
extension-AnonPrivacy
extension-Cognate
tool-libraryupgrader
labs/libraryupgrader
translatewiki
labs-tools-shorturls
tool-reviewers
tool-phpcs
tool-newusers
tool-ldap
tool-mwpackages
operations-software
extension-WikiLove
extension-GlobalCssJs
labs-tools-watch-translations
labs-tools-commons-mass-description
operations-software-tools-webservice
The official Wikipedia iOS application
tool-xtools
wikidata-query-gui
wikipedia-gadgets-ProveIt
Phabricator Translations
VisualEditor
Huggle
extension-DataTransfer
extension-Drafts
labs-tool-extdist
extension-RegexBlock
extensions-Mailgun
extension-GeoData
extension-EditSimilar
extension-Cargo
extension-Wikibase
extension-LogEntry
extension-AbsenteeLandlord
mediawiki/extensions/WikidataPageBanner
extension-WikimediaEvents
extension-TranslateSvg
extension-AdminLinks
extension-SimpleSAMLphp
extension-UserPageViewTracker
extension-PagedTiffHandler
extension-NSFileRepo
extension-MultiLanguageManager
extension-Math
extension-MachineVision
extension-Lingo
extension-LDAPGroups
extension-ActiveAbstract
extension-UserAgreement
extension-MarkAsHelpful
extension-BlueSpiceVisualDiff
extension-SocialLogin
extension-Cite
extension-QuizGame
extension-MintyDocs
extension-NewUserMessage
extension-Interwiki
labs-tools-bookreader
extension-CollaborationKit
labs-private
cloud-instance-puppet
extension-GoogleAuthenticator
extension-Calendar
extension-GettingStarted
extension-LifeWeb
extension-Examples
extension-PageTriage
diffusion_stats.py
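import phabricator

# Connect to the Wikimedia Phabricator Conduit API.
phab = phabricator.Phabricator(
    host="https://phabricator.wikimedia.org/api/",
    token="<PUT YOUR API KEY HERE>",
)
# Refresh the client's list of available Conduit methods from the server.
phab.update_interfaces()

# Print the name of every active repository that is not currently importing.
# Note: only the first page of results is read; pagination is not handled.
for r in phab.diffusion.repository.search(queryKey="active")["data"]:
    if not r["fields"]["isImporting"]:
        print(r["fields"]["name"])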

Wondering if @mmodell has any idea how to approach this (UI or API wise) when it comes to finding which code repos are canonical in Diffusion (I guess/hope I'll find out pulling the latest changes and their authors myself). If not, that is totally fine too. Just worth a shot I thought, before I spend more time on this.

Phabricator repos have associated URIs, which can be set to "observe". Any repo without an observe URI is "hosted in Diffusion". The diffusion.repository.search API supports the "uris" attachment.

So what you'd need to do is query the diffusion.repository.search API, pass { "attachments": { "uris": true } }, and then look for the URI with io set to "observe".

Here's an example of the api response:

{
  "result": {
    "data": [

      {
      "id": 459,
      "type": "REPO",
      "phid": "PHID-REPO-...",
      "fields": {
        "name": "extension-Quiz",
        "vcs": "git",
        "callsign": "EQUI",
        "shortName": null,
        "status": "active",
        "isImporting": false,
        "almanacServicePHID": null,
        "refRules": {
          "fetchRules": [],
          "trackRules": [],
          "permanentRefRules": []
        },
        "defaultBranch": "master",
        "description": {
          "raw": "MediaWiki extension Quiz"
        },
        "spacePHID": null,
        "dateCreated": 1417810068,
        "dateModified": 1565042963
      },
      "attachments": {
        "uris": {
          "uris": [{
            "id": "4363",
            "type": "RURI",
            "phid": "PHID-RURI-orccdbwoccehdy7zjas3",
            "fields": {
              "repositoryPHID": "PHID-REPO-hmrzsn5mcavq3xghlp6p",
              "uri": {
                "raw": "https:\/\/gerrit-replica.wikimedia.org\/r\/mediawiki\/extensions\/Quiz",
                "display": "https:\/\/gerrit-replica.wikimedia.org\/r\/mediawiki\/extensions\/Quiz",
                "effective": "https:\/\/gerrit-replica.wikimedia.org\/r\/mediawiki\/extensions\/Quiz",
                "normalized": "gerrit-replica.wikimedia.org\/r\/mediawiki\/extensions\/Quiz"
              },
              "io": {
                "raw": "observe",
                "default": "none",
                "effective": "observe"
              },
              "display": {
                "raw": "visible",
                "default": "never",
                "effective": "visible"
              },
              "credentialPHID": null,
              "disabled": false,
              "builtin": {
                "protocol": null,
                "identifier": null
              },
              "dateCreated": "1463620767",
              "dateModified": "1565042963"
            }
          }
         ]
       }
     }

The key part is result.data[].attachments.uris.uris[].fields.io.raw
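A minimal Python sketch of reading that value, assuming the response has already been fetched and decoded into dicts shaped like the example above. The helper names io_modes and has_observe_uri are hypothetical and not part of any Phabricator library; the sample entry is a trimmed-down stand-in that reuses only fields shown in the example.

```python
def io_modes(repo):
    """Return the raw io value of every URI attached to one repository entry."""
    uris = repo.get("attachments", {}).get("uris", {}).get("uris", [])
    return [u["fields"]["io"]["raw"] for u in uris]


def has_observe_uri(repo):
    """True if at least one attached URI has io.raw set to "observe"."""
    return "observe" in io_modes(repo)


# Trimmed-down stand-in for one element of result.data[].
sample = [
    {
        "fields": {"name": "extension-Quiz"},
        "attachments": {"uris": {"uris": [
            {"fields": {"io": {"raw": "observe"}, "disabled": False}},
        ]}},
    },
]

for repo in sample:
    print(repo["fields"]["name"], io_modes(repo), has_observe_uri(repo))
```

Note that the io value sits under the fields key of each URI entry, which is easy to miss when walking the nested structure by hand.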

Thanks for the pointers.

This does not seem to work, or I'm doing something wrong. Assuming that the raw value should not be observe,
$:acko\> echo '{ "attachments": { "uris": true } }' | /var/www/html/phab/arcanist/bin/arc call-conduit --conduit-uri https://phabricator.wikimedia.org/ --conduit-token "cli-xxxxxxxxxxxxxxx" diffusion.repository.search | jq '.response.data[] | select(.attachments.uris.uris[].fields.io.raw != "observe").fields.name'
gives me a list of some repos which are canonical in Gerrit, same when using == "observe" instead. Meh.

@Aklapper: observe is only active if the uri is enabled, so perhaps some have disabled uris set to observe.

... gives me a list of some repos which are canonical in Gerrit, same when using == "observe" instead. Meh.

I spent some time fiddling with the jq example that you posted and I think the problem is the following:

It seems that select(.attachments.uris.uris[].fields.io.raw != "observe") passes any repo that has any URI where io is something other than "observe". However, all of the repos have at least one URI that matches != "observe", so they all get included in the output of select().
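The difference between "some URI is not observe" and "no URI is observe" can be sketched in Python on made-up data. The two repository entries below (observed-example and hosted-example) and their URI values are invented for illustration. Skipping URIs whose disabled field is true follows the earlier suggestion in this thread and is an assumption about what should count, not confirmed behaviour.

```python
def active_io(repo):
    """Raw io values of the repo's URIs, ignoring disabled ones."""
    uris = repo["attachments"]["uris"]["uris"]
    return [u["fields"]["io"]["raw"] for u in uris if not u["fields"]["disabled"]]


def any_not_observe(repo):
    # Roughly what the jq filter tested: one non-observe URI is enough to pass.
    return any(io != "observe" for io in active_io(repo))


def none_observe(repo):
    # Probably the intended test: no enabled URI observes a remote.
    return all(io != "observe" for io in active_io(repo))


def uri(io, disabled=False):
    """Build a minimal URI entry (hypothetical helper for the sample data)."""
    return {"fields": {"io": {"raw": io}, "disabled": disabled}}


repos = [
    {"fields": {"name": "observed-example"},
     "attachments": {"uris": {"uris": [uri("observe"), uri("default")]}}},
    {"fields": {"name": "hosted-example"},
     "attachments": {"uris": {"uris": [uri("default"), uri("observe", disabled=True)]}}},
]

print([r["fields"]["name"] for r in repos if any_not_observe(r)])
print([r["fields"]["name"] for r in repos if none_observe(r)])
```

With this made-up data, the first line lists both repositories, while the second lists only hosted-example.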

I may be wrong about this but that's what seems to be happening from my admittedly brief attempt at this problem. Unfortunately I don't know jq well enough to make it do the right thing without spending significantly more time at it. I might take a stab at just doing it in php or python (or even javascript) later but for now I need to work on other things.

Thanks mmodell for the pointers, and especially for that last link which is very helpful. (Still trying to turn those web UI search results into some API call.)

Some notes for myself:
Using echo '{ "attachments": { "uris": true } }' | /var/www/html/phab/arcanist/bin/arc call-conduit --conduit-uri https://phabricator.wikimedia.org/ --conduit-token "api-xxxx" diffusion.repository.search | jq '.response.data[]' and looking at the output, every repo has several subelements in attachments 🡒 uris 🡒 uris[]. Looking manually for a repo that is canonical in Diffusion and for anything special about it, "name": "tool-newusers" looked like a candidate. It looks like I need to check whether attachments 🡒 uris 🡒 uris[] 🡒 fields 🡒 io 🡒 raw is default for the first subelement of attachments 🡒 uris 🡒 uris[]. (Other values could be none, observe, or mirror for mirrored repos.)

Call: echo '{ "after":"0", "attachments": { "uris": true } }' | /var/www/html/phab/arcanist/bin/arc call-conduit --conduit-uri https://phabricator.wikimedia.org/ --conduit-token "api-xxxx" diffusion.repository.search | jq '.response.data[] | select(.attachments.uris.uris[0].fields.io.raw == "default").fields.name'

Next: Somehow iterate by setting the after cursor value (because the pagination limit is 100 results). Then find out how to get activity for each repo via diffusion.commit.search, or something like it. I have no clue how to find a proper "project" or "repository" value; nothing I tried worked.
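One way the cursor iteration might look in Python, as a sketch only. It assumes each response carries a "cursor" entry whose "after" value is passed back as the after parameter of the next call and is empty on the last page; that shape is an assumption here and should be checked against a real response. The names search_all and fake_search are hypothetical, and fake_search merely stands in for the real API so the loop can be tried offline.

```python
def search_all(search, **params):
    """Collect the data of every page from a paginated search-style call.

    `search` is any callable that accepts keyword arguments and returns a
    dict-like response with "data" and "cursor" keys (assumed shape).
    """
    results = []
    after = None
    while True:
        page_params = dict(params)
        if after is not None:
            page_params["after"] = after
        response = search(**page_params)
        results.extend(response["data"])
        after = response["cursor"]["after"]
        if not after:  # no further pages
            return results


def fake_search(after=None, **_ignored):
    """Offline stand-in for the API: two pages of invented repo names."""
    pages = {
        None: {"data": ["repo-a", "repo-b"], "cursor": {"after": "2"}},
        "2": {"data": ["repo-c"], "cursor": {"after": None}},
    }
    return pages[after]


print(search_all(fake_search, attachments={"uris": True}))
```

With the Python client used in diffusion_stats.py, passing phab.diffusion.repository.search as the search argument would presumably work the same way, but that is untested.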