
Upgrade ROCm to 4.5
Open, Needs Triage, Public


With we will be able to use tensorflow-io and tensorflow-rocm (the io package contains functionality such as an HDFS client, and it was created for the release of tensorflow 2.6).

We were not able to upgrade to ROCm 4.3.1 due to this problem, but we should now be able to upgrade to something like 4.5 once upstream releases the new tensorflow-io package containing the fix (likely version 0.23).

Event Timeline

Change 738615 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Import new ROCm version 4.5.1

From it seems that hsa-ext-rocr-dev is not a concern anymore, so we can simplify the deployment procedure even further.

Change 738615 merged by Elukey:

[operations/puppet@production] Import new ROCm version 4.5

Mentioned in SAL (#wikimedia-operations) [2021-11-15T15:15:16Z] <elukey> reprepro --delete clearvanished on apt1001 to clean-up thirdparty/amd-rocm38 (buster and stretch) - T295661

Change 738947 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] aptrepo: update amd-rocm45 component's suite

Change 738947 merged by Elukey:

[operations/puppet@production] aptrepo: update amd-rocm45 component's suite

Mentioned in SAL (#wikimedia-operations) [2021-11-15T15:24:29Z] <elukey> import AMD ROCm 4.5 in thirdparty/amd-rocm45 for buster-wikimedia - T295661

ROCm 4.5 imported in apt. Next steps:

  • Wait for the release of the pypi package tensorflow-io
  • Test the new suite on one node (will need the help of @Miriam)
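For the test on one node, a quick sanity check could look like the sketch below. It is only a rough idea, not a tested procedure: the helper names `probe_module` and `check_stack` are made up for illustration, and it assumes that tensorflow-rocm installs under the usual `tensorflow` module name and that tensorflow-io is importable as `tensorflow_io`. It tries to import both packages and lists the GPUs that TensorFlow can see.

```python
import importlib


def probe_module(name):
    """Try to import a module; return (version or None, error string or None)."""
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # ImportError, or a failure loading a shared library
        return None, f"{type(exc).__name__}: {exc}"
    return getattr(module, "__version__", "unknown"), None


def check_stack():
    """Collect a small report about the TensorFlow stack on this node."""
    report = {}
    # Assumption: tensorflow-rocm is imported as "tensorflow",
    # and tensorflow-io as "tensorflow_io".
    for name in ("tensorflow", "tensorflow_io"):
        version, error = probe_module(name)
        report[name] = {"version": version, "error": error}

    gpus = []
    if report["tensorflow"]["error"] is None:
        try:
            import tensorflow as tf

            # Lists the GPU devices that TensorFlow managed to initialise.
            gpus = [d.name for d in tf.config.list_physical_devices("GPU")]
        except Exception as exc:
            report["tensorflow"]["error"] = f"{type(exc).__name__}: {exc}"
    report["gpus"] = gpus
    return report


if __name__ == "__main__":
    for key, value in check_stack().items():
        print(key, value)
```

If both imports succeed and the `gpus` list is not empty, the new suite is probably in decent shape on that node; an error on `tensorflow_io` would suggest that the installed release does not yet work with tensorflow-rocm.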

Time flies, and both ROCm and tensorflow-io have had several releases. is out and contains the pull request that I made for tensorflow-io (to allow tensorflow-rocm), so in theory we could test ROCm 4.5 and see if we can proceed (even if 5.x has already been released).

@Miriam do you have any preference? Nothing really urgent :)