
[SPIKE] Explore data sources that we currently have to establish baselines for Metrics Platform metrics
Closed, Resolved · Public · 5 Estimated Story Points · Spike

Description

Background

From T352816:

As a way to measure the Metrics Platform's progress in achieving our goals, we determined a set of measurable success criteria:

  • decrease the time to data by X%
  • decrease the amount of engineering days required to create instruments by X%

We now need to establish tooling or processes to collect this information in an ongoing way, as well as to establish baselines.

AC

  • There is a publicly-accessible list of data sources that we could use to establish baselines for Metrics Platform metrics
  • Each list entry should include:
    • Whether the data are public or private
    • An example query or an example script to extract the required data
  • Sense-check all of the above with @nettrom_WMF

Possible Data Sources

  1. Gerrit – How can we extract the timestamp of the first commit associated with an instrumentation task? Is there other/more information available?
  2. Phabricator – Can we use Phabricator comments to estimate how long it took to build an instrument? Is there other/more information available?
  3. Surveys – R&DS ran a survey about this in 2020-21 (see https://wikimedia.slack.com/archives/C01DFMX6QLB/p1704816837945919)
  4. ???

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". Jan 12 2024, 3:19 PM
phuedx renamed this task from [SPIKE] Document data sources that we currently have to establish baselines for Metrics Platform metrics to [SPIKE] Explore data sources that we currently have to establish baselines for Metrics Platform metrics. Jan 12 2024, 3:36 PM
phuedx updated the task description.
phuedx set the point value for this task to 5. Jan 16 2024, 10:46 AM
cjming subscribed.

A notable challenge in establishing baselines for Metrics Platform metrics is how much manual curation/intervention will be needed to identify how long the development of a given instrument took from start to finish. Instrument building, configuration, deployment, iteration, and bug-fixing often span several tasks/patches/merge-requests that may or may not be accurately linked: categorized under the same epic or parent task in the case of Phabricator, or referencing the same Phabricator task in commit metadata in the case of Gerrit/GitLab/GitHub.

Assuming that, moving forward, we are able to classify tasks with distinct, meaningful tags, and that partner product teams (PMs, engineers) apply these classifications consistently, we can in theory leverage the Phabricator and Gerrit APIs to capture timelines.


Data Sources

Phabricator

Accessibility:

While most WMF work boards are visible to the public, using the APIs requires authentication. To make method calls via curl, one must generate an API token under User > Settings > Conduit API Tokens (https://phabricator.wikimedia.org/settings/user/<username>/page/apitokens/).

Approach:

Using Phabricator's Conduit API, specifically the maniphest.search API method, we can query its endpoint https://phabricator.wikimedia.org/api/maniphest.search using prebuilt and custom queries/constraints (many options and examples are provided in the documentation).

For example, we could classify and identify instrumentation tasks by leveraging a custom tag and/or combinations of tags. By experimenting with the Call Method form on the maniphest.search method page, one can pass in JSON parameters to generate the query and the method call results (e.g. enter {"projects":["metrics_platform_backlog", "wikilambda_front-end"]} in the constraints field). The corresponding curl command is provided as part of the response.

The following query searches for tasks tagged with both "Metrics Platform Backlog" and "Wikilambda Front-end":

curl -k -s -X POST https://phabricator.wikimedia.org/api/maniphest.search \
-d api.token=api-token \
-d 'queryKey=all' \
-d 'constraints[projects][0]=metrics_platform_backlog' \
-d 'constraints[projects][1]=wikilambda_front-end'

The above query returns the following JSON response:

{
  "result": {
    "data": [
      {
        "id": 341748,
        "type": "TASK",
        "phid": "PHID-TASK-46quvfnlg76oedwtbeta",
        "fields": {
          "name": "Wikifunctions metrics: Track inventory of functions, implementations, tests",
          "description": {
            "raw": "## Description\n\nArrange to periodically count up functions/implementations/tests in Wikifunctions content, and make the counts available for dashboarding.  Initially, these counts will be obtained from `wikilambda_zobject_function_join` and `wikilambda_zobject_labels` tables.\n\n---\n\n## Completion checklist\n\n* [ ] Before closing this task, review one by one the checklist available here: https://www.mediawiki.org/wiki/Abstract_Wikipedia_team/Definition_of_Done#Front-end_Task/Bug_Completion_Checklist"
          },
          "authorPHID": "PHID-USER-v7df3kwnhakxh7cgcmxk",
          "ownerPHID": "PHID-USER-v7df3kwnhakxh7cgcmxk",
          "status": {
            "value": "resolved",
            "name": "Resolved",
            "color": "green"
          },
          "priority": {
            "value": 80,
            "name": "High",
            "color": "red"
          },
          "points": null,
          "subtype": "default",
          "closerPHID": "PHID-USER-zdschdmx5qepfqbs4rd7",
          "dateClosed": 1695856560,
          "spacePHID": "PHID-SPCE-6l6g5p53yi3mypnlpxjw",
          "dateCreated": 1689203623,
          "dateModified": 1695856560,
          "policy": {
            "view": "public",
            "interact": "public",
            "edit": "users"
          },
          "custom.deadline.due": null,
          "custom.train.status": null,
          "custom.train.backup": null,
          "custom.external_reference": null,
          "custom.release.version": null,
          "custom.release.date": null,
          "custom.security_topic": "default",
          "custom.risk.summary": null,
          "custom.risk.impacted": null,
          "custom.risk.rating": null,
          "custom.requestor.affiliation": null,
          "custom.error.reqid": null,
          "custom.error.stack": null,
          "custom.error.url": null,
          "custom.error.id": null,
          "custom.points.final": null,
          "custom.deadline.start": null
        },
        "attachments": {}
      },
      {
        "id": 350497,
        "type": "TASK",
        "phid": "PHID-TASK-aphgednarfnmfhzpvz3v",
        "fields": {
          "name": "Update the WikiLambda instrumentation to use core interaction events",
          "description": {
            "raw": "Epic: {T346285}\n\n== Background\n\nIn {T350495}, we deployed the latest version of the JS client library. We can now migrate the WikiLambda instrumentation to use core interaction events.\n\nSince the the WikiLambda instrumentation has a lot of instrumentation-specific (domain-specific?) custom data, this will provide a good case study in creating and using a new schema that extends the core interaction event schema with the JS client library.\n\n== AC\n\n[] Wherever possible, the data logged by [[ https://codesearch.wmcloud.org/search/?q=dispatchEvent%5C%28&files=resources%5C%2Fext%5C.&excludeFiles=&repos=Extension%3AWikiLambda | the existing instrument ]] is mapped to the `{ action, action_subtype, action_source, action_context }` 4-tuple\n[] Repeated fields are identified and a schema fragment is created\n** e.g. `/fragment/mediawiki/wikilambda/common/1.0.0`\n[] If a schema fragment is needed (it probably is), then a new schema is created\n** e.g. `/analytics/mediawiki/product_metrics/wikilambda/ui_actions/1.0.0`\n[] The instrument is updated to use the above (keeping the previous one for now)\n[] The new stream configuration is added (keeping the previous one for now)\n\n== Notes\n\n1. Example mappings that @sfaci and I worked through on 24/1/15:\n\n<table>\n  <tr>\n    <th>Event name</th>\n    <th>Custom data</th>\n    <th>Mapping</th>\n  </tr>\n  <tr>\n    <td>`wf.ui.editFunction.load`</td>\n    <td>\n* `isnewzobject`\n* `zobjectid`\n* `zlang`\n</td>\n    <td>\n* `action=edit_function`\n* `action_subtype=create` or `=edit` depending on `$isnewzobject`\n* `action_source=$zobjectid`\n* `action_context=$zlang`\n</td>\n</table>"
          },
          "authorPHID": "PHID-USER-w3pd7vqenmta6vpmhwcn",
          "ownerPHID": "PHID-USER-57xhwarqgbogyqgco52e",
          "status": {
            "value": "open",
            "name": "Open",
            "color": null
          },
          "priority": {
            "value": 50,
            "name": "Medium",
            "color": "orange"
          },
          "points": "5",
          "subtype": "default",
          "closerPHID": null,
          "dateClosed": null,
          "spacePHID": "PHID-SPCE-6l6g5p53yi3mypnlpxjw",
          "dateCreated": 1699028702,
          "dateModified": 1706190916,
          "policy": {
            "view": "public",
            "interact": "public",
            "edit": "users"
          },
          "custom.deadline.due": null,
          "custom.train.status": null,
          "custom.train.backup": null,
          "custom.external_reference": null,
          "custom.release.version": null,
          "custom.release.date": null,
          "custom.security_topic": "default",
          "custom.risk.summary": null,
          "custom.risk.impacted": null,
          "custom.risk.rating": null,
          "custom.requestor.affiliation": null,
          "custom.error.reqid": null,
          "custom.error.stack": null,
          "custom.error.url": null,
          "custom.error.id": null,
          "custom.points.final": null,
          "custom.deadline.start": null
        },
        "attachments": {}
      }
    ],
    "maps": {},
    "query": {
      "queryKey": "all"
    },
    "cursor": {
      "limit": 100,
      "after": null,
      "before": null,
      "order": null
    }
  },
  "error_code": null,
  "error_info": null
}

Note that two tasks were returned by the above query. We can then chain further queries using the data returned by maniphest.search.

Based on how we filter results, we can take relevant phids and use the transaction.search method to read the transaction record for a particular object.
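
As a rough sketch, the same call can also be scripted. This is illustrative only: the method name, the objectIdentifier parameter, and the response shape come from the Conduit documentation and the examples in this comment, while the token value is a placeholder.

import requests

API_TOKEN = "api-token"  # placeholder: generate one under Conduit API Tokens

response = requests.post(
    "https://phabricator.wikimedia.org/api/transaction.search",
    data={
        "api.token": API_TOKEN,
        "objectIdentifier": "PHID-TASK-aphgednarfnmfhzpvz3v",
    },
)
# Conduit wraps the payload shown below in a top-level "result" object.
for xact in response.json()["result"]["data"]:
    print(xact["type"], xact["dateCreated"], len(xact["comments"]), "comment(s)")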

For example, by entering the phid of task "Update the WikiLambda instrumentation to use core interaction events", PHID-TASK-aphgednarfnmfhzpvz3v, in the objectIdentifier field of the transaction.search call method form, the method call result is (showing just a partial view of the JSON response):

{
  "data": [
    {
      "id": 9488499,
      "phid": "PHID-XACT-TASK-pzsuqlhrdfaaqic",
      "type": "comment",
      "authorPHID": "PHID-USER-57xhwarqgbogyqgco52e",
      "objectPHID": "PHID-TASK-aphgednarfnmfhzpvz3v",
      "dateCreated": 1706190916,
      "dateModified": 1706190916,
      "groupID": "7sckm6ah7pkqdfn4zsj62l7jh5shluwy",
      "comments": [
        {
          "id": 2265805,
          "phid": "PHID-XCMT-yqmqmphjzxcczibwk4r7",
          "version": 1,
          "authorPHID": "PHID-USER-57xhwarqgbogyqgco52e",
          "dateCreated": 1706190916,
          "dateModified": 1706190916,
          "removed": false,
          "content": {
            "raw": "At the time of writing this, the instrument code, schemas and stream configuration are aligned with the new mapping strategy. \nDetails about the WIP MRs that need to be reviewed are in https://phabricator.wikimedia.org/T350497#9478332"
          }
        }
      ],
      "fields": {}
    },
    {
      "id": 9485551,
      "phid": "PHID-XACT-TASK-h5s37pd257vzeyt",
      "type": null,
      "authorPHID": "PHID-USER-57xhwarqgbogyqgco52e",
      "objectPHID": "PHID-TASK-aphgednarfnmfhzpvz3v",
      "dateCreated": 1706110751,
      "dateModified": 1706110751,
      "groupID": "ca6eemkhw6nttb6cqvux36frvxsnhb4v",
      "comments": [],
      "fields": {}
    },
    {
      "id": 9485065,
      "phid": "PHID-XACT-TASK-sr6dt5jjxuste2n",
      "type": "comment",
      "authorPHID": "PHID-USER-57xhwarqgbogyqgco52e",
      "objectPHID": "PHID-TASK-aphgednarfnmfhzpvz3v",
      "dateCreated": 1706104876,
      "dateModified": 1706104876,
      "groupID": "qokyxtxscievke5ufqo4irmsewnk6adi",
      "comments": [
        {
          "id": 2264939,
          "phid": "PHID-XCMT-edyuqfj27m6swncs7rvh",
          "version": 1,
          "authorPHID": "PHID-USER-57xhwarqgbogyqgco52e",
          "dateCreated": 1706104876,
          "dateModified": 1706104876,
          "removed": false,
          "content": {
            "raw": "At the time of writting this, the [[  https://docs.google.com/spreadsheets/d/1WiPR-uzGlJ2LJ5DKks3WYRGV0W2uOl6l2WqwfdqZbrc | mapping strategy approach ]] has already been modified taking into consideration what I mentioned in my previous comment."
          }
        }
      ],
      "fields": {}
    },
...

While the Conduit API indicates that transaction.search is the preferred method for drilling into a particular task's transactions, it oddly doesn't seem to provide information about when a task is moved between columns. However, there is a frozen method (slated for eventual deprecation), maniphest.gettasktransactions, that does provide this information.

Using the id (350497) returned by the earlier maniphest.search query for the Metrics Platform Backlog and Wikilambda Front-end tags, we can retrieve the task's movement across the workboard:

curl https://phabricator.wikimedia.org/api/maniphest.gettasktransactions \
    -d api.token=api-token \
    -d 'ids[0]=350497'

which returns (abbreviated response):

{
  "350497": [
    {
      "taskID": "350497",
      "title": "Sfaci added a comment.",
      "transactionID": "9488499",
      "transactionPHID": "PHID-XACT-TASK-pzsuqlhrdfaaqic",
      "transactionType": "core:comment",
      "oldValue": null,
      "newValue": null,
      "meta": {
        "remarkup.control": [],
        "core.groupID": "7sckm6ah7pkqdfn4zsj62l7jh5shluwy"
      },
      "comments": "At the time of writing this, the instrument code, schemas and stream configuration are aligned with the new mapping strategy. \nDetails about the WIP MRs that need to be reviewed are in https://phabricator.wikimedia.org/T350497#9478332",
      "authorPHID": "PHID-USER-57xhwarqgbogyqgco52e",
      "dateCreated": "1706190916"
    },
    {
      "taskID": "350497",
      "title": "Sfaci moved this task from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 08) board.",
      "transactionID": "9485551",
      "transactionPHID": "PHID-XACT-TASK-h5s37pd257vzeyt",
      "transactionType": "core:columns",
      "oldValue": null,
      "newValue": [
        {
          "columnPHID": "PHID-PCOL-a7ggi4ezzmv4wsdsxl3m",
          "afterPHIDs": [],
          "beforePHIDs": [],
          "boardPHID": "PHID-PROJ-a7rphgd5kx4twu4bw46n",
          "fromColumnPHIDs": {
            "PHID-PCOL-x6gzdj3oep6dvmszrcy7": "PHID-PCOL-x6gzdj3oep6dvmszrcy7"
          }
        }
      ],
      "meta": {
        "core.groupID": "ca6eemkhw6nttb6cqvux36frvxsnhb4v"
      },
      "comments": null,
      "authorPHID": "PHID-USER-57xhwarqgbogyqgco52e",
      "dateCreated": "1706110751"
    },
...

In the second object of the "350497" array, note the transactionType, the title, the columnPHID key in the newValue array, and the dateCreated:

  • "transactionType": "core:columns" << query would look for this type
  • "title": "Sfaci moved this task from In Process to Code Review / Tech Input on the Data Products (Data Products Sprint 08) board."
  • "columnPHID": "PHID-PCOL-a7ggi4ezzmv4wsdsxl3m"
  • "dateCreated": "1706110751"

We can then use another method to get the name of the new column and its associated workboard (or parse it from the title).

Using project.column.search and its Call Method form, we can populate the constraints field with the columnPHID value from the previous query: {"phids":["PHID-PCOL-a7ggi4ezzmv4wsdsxl3m"]}.
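
A minimal scripted equivalent (same placeholder token as before; the constraint key mirrors the bracket syntax used in the earlier curl examples):

import requests

API_TOKEN = "api-token"  # placeholder

response = requests.post(
    "https://phabricator.wikimedia.org/api/project.column.search",
    data={
        "api.token": API_TOKEN,
        "constraints[phids][0]": "PHID-PCOL-a7ggi4ezzmv4wsdsxl3m",
    },
)
fields = response.json()["result"]["data"][0]["fields"]
print(f'{fields["name"]} on {fields["project"]["name"]}')
# -> Code Review / Tech Input on Data Products Sprint 08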

The Method Call Result from project.column.search returns:

{
  "data": [
    {
      "id": 28614,
      "type": "PCOL",
      "phid": "PHID-PCOL-a7ggi4ezzmv4wsdsxl3m",
      "fields": {
        "name": "Code Review / Tech Input",
        "proxyPHID": null,
        "project": {
          "id": 6924,
          "phid": "PHID-PROJ-a7rphgd5kx4twu4bw46n",
          "name": "Data Products Sprint 08"
        },
        "status": "0",
        "isDefaultColumn": false,
        "sequence": "6",
        "dateCreated": 1704904218,
        "dateModified": 1704904218,
        "policy": {
          "view": "public",
          "edit": "users"
        }
      },
      "attachments": {}
    }
  ],
  "maps": {},
  "query": {
    "queryKey": null
  },
  "cursor": {
    "limit": 100,
    "after": null,
    "before": null,
    "order": null
  }
}

The above response gives us the project's name ("Data Products Sprint 08") and ids, as well as the column name, "Code Review / Tech Input". Combined with the transaction date from the previous query, we can now date the task's move into that column.

Taken together, the API methods and responses above provide a proof of concept: provided relevant tasks are tagged consistently, we can craft a timeline of the development of a given instrument by chaining queries to Phabricator's Conduit API.
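
To make the chaining concrete, below is a sketch of how the calls above could be stitched together into a column-movement timeline for a single task. It is untested proof-of-concept code: the method names and response shapes match the examples in this comment, but pagination, error handling, and caching of column lookups are omitted.

import requests
from datetime import datetime, timezone

PHAB = "https://phabricator.wikimedia.org/api"
API_TOKEN = "api-token"  # placeholder


def conduit(method, params):
    """POST to a Conduit method and return its "result" payload."""
    resp = requests.post(f"{PHAB}/{method}", data={"api.token": API_TOKEN, **params})
    resp.raise_for_status()
    body = resp.json()
    if body.get("error_code"):
        raise RuntimeError(body["error_info"])
    return body["result"]


def column_name(column_phid):
    """Resolve a column PHID to "<column> (<board>)" via project.column.search."""
    fields = conduit("project.column.search",
                     {"constraints[phids][0]": column_phid})["data"][0]["fields"]
    return f'{fields["name"]} ({fields["project"]["name"]})'


def board_timeline(task_id):
    """Yield (datetime, description) pairs for a task's moves between columns."""
    result = conduit("maniphest.gettasktransactions", {"ids[0]": str(task_id)})
    for xact in result[str(task_id)]:
        if xact["transactionType"] != "core:columns":
            continue
        when = datetime.fromtimestamp(int(xact["dateCreated"]), tz=timezone.utc)
        for move in xact["newValue"]:
            yield when, f'moved to {column_name(move["columnPHID"])}'


for when, what in sorted(board_timeline(350497)):
    print(when.date().isoformat(), what)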


Gerrit

Accessibility:

As with Phabricator, patches are publicly visible, but using the Gerrit REST API requires authentication. To make calls via curl, one should generate HTTP credentials in their profile settings (https://gerrit.wikimedia.org/r/settings/#HTTPCredentials).

Approach:

In conjunction with the Phabricator Conduit API, we can find all patches related to Phabricator tasks (a task is a "Bug" in Gerrit's terminology) that are identified by specific project tags, as in the maniphest.search query in the Phabricator section above.

Once we have a Phabricator task number (e.g. T123456), we can extract the dates at which associated Gerrit patches are opened and merged, thereby creating a timeline of the actual work done for a Phab task.

For example, take the id from the first maniphest.search query above, 350497 (associated with phid PHID-TASK-aphgednarfnmfhzpvz3v). This is the task number that forms the URI of the task, i.e. https://phabricator.wikimedia.org/T350497. Using T350497, we can call the Query Changes endpoint to get all associated Gerrit patches:

curl --user cjming -s 'https://gerrit.wikimedia.org/r/a/changes/?q=bug:T350497'

which returns:

[
  {
    "id": "schemas%2Fevent%2Fsecondary~master~I30df3a64206a5737d65373b1af15aefb39a473a0",
    "project": "schemas/event/secondary",
    "branch": "master",
    "topic": "T350497",
    "attention_set": {},
    "removed_from_attention_set": {
      "10705": {
        "account": {
          "_account_id": 10705
        },
        "last_update": "2024-01-22 18:46:37.000000000",
        "reason": "Change was marked work in progress"
      }
    },
    "hashtags": [],
    "change_id": "I30df3a64206a5737d65373b1af15aefb39a473a0",
    "subject": "[DNM] Update the WikiLambda instrumentation to use core interaction events",
    "status": "NEW",
    "created": "2024-01-22 18:42:39.000000000",
    "updated": "2024-01-25 13:48:32.000000000",
    "submit_type": "MERGE_IF_NECESSARY",
    "mergeable": false,
    "insertions": 627,
    "deletions": 0,
    "total_comment_count": 0,
    "unresolved_comment_count": 0,
    "work_in_progress": true,
    "has_review_started": true,
    "meta_rev_id": "d0c06de01f2b7d241bdbfabbc39d3eac50ba81ca",
    "_number": 992224,
    "owner": {
      "_account_id": 10705
    },
    "requirements": [],
    "submit_records": [
      {
        "rule_name": "gerrit~DefaultSubmitRule",
        "status": "NOT_READY",
        "labels": [
          {
            "label": "Verified",
            "status": "OK",
            "applied_by": {
              "_account_id": 75
            }
          },
          {
            "label": "Code-Review",
            "status": "NEED"
          }
        ]
      }
    ]
  },
  {
    "id": "mediawiki%2Fextensions%2FWikiLambda~master~I4f0eae50865ab8d14ca55fe3f2fb4862af24f8c4",
    "project": "mediawiki/extensions/WikiLambda",
    "branch": "master",
    "topic": "T350497",
    "hashtags": [],
    "change_id": "I4f0eae50865ab8d14ca55fe3f2fb4862af24f8c4",
    "subject": "[DNM] Update the WikiLambda instrumentation to use core interaction events",
    "status": "NEW",
    "created": "2024-01-22 18:45:53.000000000",
    "updated": "2024-01-25 13:46:28.000000000",
    "submit_type": "MERGE_IF_NECESSARY",
    "mergeable": false,
    "insertions": 87,
    "deletions": 0,
    "total_comment_count": 0,
    "unresolved_comment_count": 0,
    "work_in_progress": true,
    "has_review_started": true,
    "meta_rev_id": "9ad1e949e7b0fb941d2ad96314cd1aad08db4e95",
    "_number": 992226,
    "owner": {
      "_account_id": 10705
    },
    "requirements": [],
    "submit_records": [
      {
        "rule_name": "gerrit~DefaultSubmitRule",
        "status": "NOT_READY",
        "labels": [
          {
            "label": "Verified",
            "status": "REJECT",
            "applied_by": {
              "_account_id": 75
            }
          },
          {
            "label": "Code-Review",
            "status": "NEED"
          }
        ]
      }
    ]
  },
  {
    "id": "operations%2Fmediawiki-config~master~I3b1d73e1aa6e66fdcc400d3f2f6337ec981ff5e7",
    "project": "operations/mediawiki-config",
    "branch": "master",
    "topic": "T350497",
    "hashtags": [],
    "change_id": "I3b1d73e1aa6e66fdcc400d3f2f6337ec981ff5e7",
    "subject": "[DNM] Update the WikiLambda instrumentation to use core interaction events",
    "status": "NEW",
    "created": "2024-01-22 18:34:45.000000000",
    "updated": "2024-01-24 12:12:34.000000000",
    "submit_type": "REBASE_IF_NECESSARY",
    "mergeable": false,
    "insertions": 30,
    "deletions": 1,
    "total_comment_count": 0,
    "unresolved_comment_count": 0,
    "work_in_progress": true,
    "has_review_started": true,
    "meta_rev_id": "44e29cd515a67bad9ae1a8c39a11cec222f22ebe",
    "_number": 992223,
    "owner": {
      "_account_id": 10705
    },
    "requirements": [],
    "submit_records": [
      {
        "rule_name": "gerrit~DefaultSubmitRule",
        "status": "NOT_READY",
        "labels": [
          {
            "label": "Verified",
            "status": "OK",
            "applied_by": {
              "_account_id": 75
            }
          },
          {
            "label": "Code-Review",
            "status": "NEED"
          }
        ]
      }
    ]
  }
]

Note that there are three patch objects in this response; in this case, each has status NEW with a created date. Since these patches are still in review, only created and updated timestamps are available.

If/when a patch gets merged, its status will be MERGED and a submitted key will be included in the response, holding the timestamp of when the patch was merged.
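
As a sketch of what extracting those timestamps could look like (placeholder HTTP credentials; note that Gerrit prefixes its JSON responses with a ")]}'" line that must be stripped before parsing):

import json
from datetime import datetime

import requests

GERRIT = "https://gerrit.wikimedia.org/r/a"
AUTH = ("username", "http-password")  # placeholder HTTP credentials


def changes_for_task(task):
    """Fetch all Gerrit changes that reference a Phabricator task via bug:Txxxxxx."""
    resp = requests.get(f"{GERRIT}/changes/", params={"q": f"bug:{task}"}, auth=AUTH)
    resp.raise_for_status()
    # Gerrit prepends ")]}'" to JSON responses to prevent XSSI; drop that first line.
    return json.loads(resp.text.split("\n", 1)[1])


def parse_ts(value):
    """Gerrit timestamps look like "2024-01-22 18:42:39.000000000"."""
    return datetime.strptime(value[:19], "%Y-%m-%d %H:%M:%S")


for change in changes_for_task("T350497"):
    created = parse_ts(change["created"])
    if change["status"] == "MERGED":
        days_open = (parse_ts(change["submitted"]) - created).days
        print(change["project"], f"merged after {days_open} day(s)")
    else:
        print(change["project"], change["status"], "created", created.date())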

By combining queries from both the Phabricator Conduit API and the Gerrit Rest API, we can extract data for tasks and patches to build a timeline for a given instrument assuming said tasks and patches are accurately tagged and labeled.


Surveys

As noted in the description of this task, surveys can be used to approximate median development times of instruments, as well as to gather other pertinent information.

The results and analysis of an R&DS survey from 2020-21 were written up in a report in which instrumentation build times and pain points were documented anecdotally.

Running surveys regularly as we build out the Metrics Platform would provide valuable insights for Data Products and for WMF decision-makers writ large: a way to measurably determine the success of our efforts and how they are being received by stakeholders.


Tagging, Comments on Work Boards

Another possible approach to capturing data for Metrics Platform baselines could be working with our partner product teams, as they build instruments, to log pertinent data during the development cycle. Information like actual engineering/development time per task/patch could help gauge relative increases/decreases in cumulative working hours, in addition to the spans of time (e.g. sprints, weeks, months) from kickoff to data analysis.

Using the Phabricator and Gerrit APIs can help us empirically determine spans of time for instrument development, but not cumulative engineering time. We would need to coordinate with product teams to find some way to log how long a discrete task takes to accomplish. This could be as simple as asking engineers to comment on Phab tasks using some convention that enables us to query for and collect that data over time (see the sketch below).
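
For illustration only, a hypothetical convention might be a comment line like "time-spent: 2.5d"; reported values could then be harvested from the transaction.search results shown earlier:

import re

# Hypothetical convention: engineers add "time-spent: <n>d" to a task comment.
TIME_SPENT = re.compile(r"time-spent:\s*([0-9.]+)\s*d", re.IGNORECASE)


def engineering_days(transactions):
    """Sum the days reported under the convention across a task's comments."""
    total = 0.0
    for xact in transactions:
        for comment in xact.get("comments") or []:
            match = TIME_SPENT.search(comment["content"]["raw"])
            if match:
                total += float(match.group(1))
    return total

The totals would still be self-reported approximations, but they would at least be queryable at any time rather than reconstructed after the fact.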


Considerations

Given the following success criteria of Metrics Platform:

  1. decrease the time to data by X%
  2. decrease the number of custom fields across instruments by X%
  3. decrease the amount of engineering days required to create instruments by X%

We will likely need to use a combination of the data sources articulated above to measure the outlined goals. The first and third criteria can reasonably be measured using the data extracted via the Phabricator and Gerrit APIs, in conjunction with a process for capturing engineering hours per task. The second criterion can be readily measured by comparing the usage of base schemas and custom schemas (presumably derived from base MP schemas) per instrument; this will be further explored in a related spike, T354955.

One thing to note: while this investigation focused on Gerrit as the presumed repository for code, many teams have migrated or are in the process of migrating to GitLab, or host their code on GitHub (e.g. Android). It's reasonable to assume that both GitHub and GitLab have robust APIs that enable similar types of querying to extract the data we need.

Regarding how to synthesize/visualize the data extracted from Phab and Gerrit/GitLab/GitHub, we could build a dashboard with a form that takes in tags and/or task ids, runs the backend queries to pull in tasks/patches/PRs/MRs, and renders tables or timelines of all related work. There is precedent for this kind of tool: see https://data.releng.team/, which runs on ToolForge/CloudVPS. Other, simpler (though less user-friendly and more manual) options include pushing the data out to a spreadsheet, or manual querying/reporting on tag/task-id-related work.

If we go the route of building a tool/dashboard, the impact could be far greater than just Metrics Platform's use case. Any team could query using custom tags or task ids and see the output of a table/timeline that shows how long development took.

Next Steps

  • Agree (or disagree) on an implementation for capturing baselines:
    • devise a tagging system to enable accurate querying
    • develop processes to collect data about cumulative engineering time on tasks
    • determine the output format for the data extraction and what infrastructure to use
    • write scripts to build out dashboards/spreadsheets/reporting using Phab/Code APIs
  • Pending the above, write up tickets for the implementation work and for how all this data will be collected, tracked, and analyzed.

Wow, awesome write-up @cjming! It helps a lot with understanding and commenting.

Here are my thoughts:

Assumptions:

  • I assume that establishing baselines means gathering metrics for instruments and MP datasets that have already been developed and productionized. Is that correct?
  • I assume that people will be creating between 10 and 20 instruments per year, at least in the near future. Is that realistic?

Phab and Gerrit as sources

I can see how we could successfully use the Phab and Gerrit APIs in conjunction with a convention, i.e. specific tags, comments, etc., to capture MP metrics from now on!
We'd be adding some (minimal) work for developers, who would have to remember to add the proper tags and comments in these tools, but I think that's fine.
On the other hand, I imagine using them to establish a baseline could be difficult. Even if we manually add all the tags, comments, etc.,
we could find that the expected order of things was not followed, or that a task was not created with the correct timestamps, or...?
Also, as mentioned in the write-up, it seems some of the information cannot be gathered via Phab/Gerrit integration, and we'd probably have to ask around.
Lastly, I'm thinking that maybe this system is a bit prone to inconsistencies? Like, if someone forgets to log a step properly, will it be easy to correct it later?

Surveys as source

It seems to me that surveys would be a simple way to gather information about existing instruments to build the MP baseline metrics.
They would go directly to the person/people who know the answer (and the hidden details of the task, i.e. calendar time, external blockers, weirdnesses, etc.) and would "force" them to shape the answers in the expected format.
We'd be adding some work to developers/stakeholders, who would have to fill in the survey after productionizing the instrument. But that might also make them feel attended to? Like a satisfaction survey?
This would be a more low-tech solution: we probably wouldn't have the data in the data lake, but rather in a spreadsheet. Would that be OK?
We should have a way to automate sending surveys after instrument productionization, right? Otherwise, this would bring some manual work for us.
It feels to me that this could be a robust solution, since it's decoupled from the way people use Phab/Gerrit. It can also be easily repeated in case of failure.


My thoughts are leaning towards using surveys both for gathering baselines and ongoing metrics, because:

  • It's easier to implement.
  • It's less error-prone, so I imagine it will give us less maintenance work.
  • There aren't a lot of instruments per year, so the manual work doesn't seem huge.
  • Surveys can collect information about cumulative working hours and also qualitative data (i.e. ease of use?).

But this is just me, please disagree!

Thanks for your thoughtful response @mforns! Replies inline

  • I assume that establishing baselines means gathering metrics for instruments and MP datasets that have already been developed and productionized. Is that correct?

AIUI we're making a distinction between past and future instruments -- moving forward, how do we establish baselines vs. how do we assess this data for past/current instruments already in production? I believe this ticket is about establishing baselines starting now (presumably we'll have to do some work to capture this for previous efforts - TBD).

  • I assume that people will be creating between 10 and 20 instruments per year, at least in the near future. Is that realistic?

I actually don't know the number per year, past or projected - it's a good, relevant question and something that could also be a useful metric. I'll spin off a spike ticket in the backlog for it.

Phab and Gerrit as sources

I imagine using them to establish a baseline could be difficult. Even if we manually add all the tags, comments, etc...

Concur -- the effectiveness of this approach depends heavily on accurate tagging, which could be a significant challenge. At the same time, I think it'd be an interesting experiment (and in theory it shouldn't take too long) to stand up a tool that takes tags or task ids as parameters and spits out tables/timelines of how long something took from start to finish using the Phab/code APIs. To your later point, though, I think capturing cumulative engineering hours for an instrument might be easier to get/approximate via surveys, since we don't have a convention in place yet to get this data during active development.

maybe this system is a bit prone to inconsistencies? Like, if someone forgets to log a step properly, will it be easy to correct it later?

Aye - as noted above, at best it's an approximation (which is better than nothing?). I do think it would be easy to correct after the fact if we go the tool/form route -- we could query the APIs at any time for tags that are appended later.

Surveys as source

It seems to me that surveys would be a simple way to gather information about existing instruments to build the MP baseline metrics.

Agree - I didn't spend much time exploring surveys because I have a bias toward thinking that automating this data collection would be better. But especially for assessing cumulative, perceived engineering time on instrument development (where nothing is in place to get this now), it might be an easier, less arduous, if anecdotal, approach.

They would go directly to the person/people who know the answer (and the hidden details of the task, i.e. calendar time, external blockers, weirdnesses, etc.) and would "force" them to shape the answers in the expected format.
We'd be adding some work to developers/stakeholders, who would have to fill in the survey after productionizing the instrument. But that might also make them feel attended to? Like a satisfaction survey?

I actually like this idea -- it's not like we're onboarding 100s or even 10s of product teams at the moment -- it's manageable enough that we could do this on a case-by-case basis fairly readily.

This would be a more low-tech solution: we probably wouldn't have the data in the data lake, but rather in a spreadsheet. Would that be OK?

I think this is ok - maybe we publish this somewhere so it's publicly accessible. It's likely not up to me but I'd lobby for it.

We should have a way to automate sending surveys after instrument productionization, right? Otherwise, this would bring some manual work for us.
It feels to me that this could be a robust solution, since it's decoupled from the way people use Phab/Gerrit. It can also be easily repeated in case of failure.

This is an interesting idea -- maybe we can still leverage the Gerrit APIs to trigger sending a survey (i.e. if changes that include the MP base schemas are merged in the secondary repo and the corresponding stream config is deployed, we can have a survey sent to the committing author). Or again, because we're only onboarding a few teams at any one time, we could handle this manually for the time being.

My thoughts are leaning towards using surveys both for gathering baselines and ongoing metrics, because:

  • It's easier to implement.
  • It's less error-prone, so I imagine it will give us less maintenance work.
  • There aren't a lot of instruments per year, so the manual work doesn't seem huge.
  • Surveys can collect information about cumulative working hours and also qualitative data (i.e. ease of use?).

But this is just me, please disagree!

I generally agree! But I do think it would be an insightful and interesting (and hopefully not too difficult - this could be wishful thinking) exercise for us to track instrument development times from start to finish using available APIs.

I'll throw some related tickets in the backlog which we can discuss and take up or pass on in our next sprint.

hi @nettrom_WMF 👋 -- just curious if you have any thoughts about this (since you are tagged in one of the ACs)?

As noted in previous comments, I'm inclined to do some combo of automated data collection via available APIs (to capture start/end dates of instrument tickets/patches/MRs) and surveys (to capture/approximate cumulative engineering time per instrument) to come up with some sort of baseline metrics for Metrics Platform effectiveness/reception/impact over time. Any insight from your perspective is most welcome.

It's an amazing analysis @cjming!!

I just wanted to contribute some thoughts that came to mind after reading your big analysis and the discussion with Marcel. His point of view made me aware of some points I hadn't realized before and, long story short, I think similarly.

At first I was absolutely in favour of API usage to get data automatically. I tend to be in favour of automatic things, but I didn't realize that we are often working on a couple of tasks at the same time, so the information we can get from Phabricator/Gerrit ends up being imprecise. One important point is how much time engineers really spend working on instrumentation tasks, and that information cannot be obtained from the API with accuracy: while working on the instrument, you have likely been discussing something for a while in Slack or a meeting, and/or working on something else, in meetings, etc. So, if we also need to capture that information (and some other "more human" details) manually, then maybe it doesn't make sense to make the effort to build something that automatically captures more, but less accurate, information. That's why I think surveys are the best way to collect the information we want.

And, unfortunately, I think we would have the same problem with historical information if we decide to do something similar for already existing instruments.

Thanks, @cjming for the deep dive and the extensive notes.

I'm inclined to agree with @mforns that a survey would be the easiest to stand up and start collecting data. And, like @cjming, I do think it's worthwhile spending a sprint trying to build an end-to-end prototype in order to figure out what we would need instrument owners to do and then getting feedback. Maybe it'd be low friction after all? 🤞

So let's start with surveys and see if we can schedule a spike to build an end-to-end prototype?

This is an interesting idea -- maybe we can still leverage the Gerrit APIs to trigger sending a survey (i.e. if there are changes that include the MP base schemas that are merged in the secondary repo and corresponding stream config is deployed, we can have a survey be sent to the committing author).

Neato! I think we could break this down into an automation part – detecting new streams and their owners – and a manual part – sending the instrument owner(s) a survey.

Tagging @MSantos because I learned he is also making some tools to mine phab and gerrit data on the MW eng side to learn more about dev workflows... maybe we can converge or learn from both efforts?

Chatted with Clare about the difficulty of capturing engineering days and decided on a couple of adjustments to the metrics:

decrease the time to data by X%

Capture the number of calendar days for an instrumentation task, from the Phab ticket moving to In Progress through first data collection
T356608: Build a tool to provide instrumentation development timelines

decrease the number of custom fields across instruments by X%

T356610: Write a script to capture custom data properties counts in secondary schemas

[NEW] increase developer satisfaction by X%

T356609: Build a survey for Metrics Platform adopters

decrease the number of engineering steps required to create instruments by X%

Capture the number of interactions in Gerrit and Phab on an instrumentation task... @cjming does this seem feasible to do? It would be great to learn more about where precisely the most friction occurs, and to demonstrate that the client libraries and other shared tools we produce ease that pain. If yes, we will need to add a spike to sprint 10 to figure out how we will accomplish this.