
wikimedia/discovery/analytics: replace git-fat with git-lfs
Closed, Invalid (Public)

Description

chad@wmf3179 analytics % cat .gitattributes
*.whl filter=fat -text
*.jar filter=fat -text

chad@wmf3179 analytics % find . -name '*.whl' -ls -exec cat {} \; -o -name '*.jar' -ls -exec cat {} \; 
 28980522      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/decorator-5.0.9-py3-none-any.whl
#$# git-fat e0c9be2a8af8c22ed30c73ff8138ac8461099a7a                 8901
 28982159      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/idna-2.10-py2.py3-none-any.whl
#$# git-fat 999b6718b4d789d8ca0d2ddf7c07826154291825                58811
 28982368      0 lrwxrwxrwx   1 ebernhardson ebernhardson       49 Sep  1 09:04 ./artifacts/rdf-spark-tools-latest-jar-with-dependencies.jar -> rdf-spark-tools-0.3.114-jar-with-dependencies.jar
#$# git-fat dc9ec28bbf636718020316dccb8f3ded9066e250             24748809
 28982355      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/networkx-1.11-py2.py3-none-any.whl
#$# git-fat 3209bca45fb613b7a4507cff1927b1fd44622e6c              1317927
 28982157      0 lrwxrwxrwx   1 ebernhardson ebernhardson       37 Sep  1 09:04 ./artifacts/glent-latest-jar-with-dependencies.jar -> glent-0.2.6-jar-with-dependencies.jar
#$# git-fat f0f391b831f3f09da31b4138f3e1d553d39fe6ae             40995064
 28982359      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/prometheus_client-0.11.0-py2.py3-none-any.whl
#$# git-fat 30fef728e9993f3ea69c0b71525f6362508ecc9d                56435
 28982364      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/python_json_logger-2.0.1-py34-none-any.whl
#$# git-fat 3726718fd7272fdc4b1f8fae6ebbe7a861662869                 7374
 28982156      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/glent-0.2.6-jar-with-dependencies.jar
#$# git-fat f0f391b831f3f09da31b4138f3e1d553d39fe6ae             40995064
 28982369      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/requests-2.25.1-py2.py3-none-any.whl
#$# git-fat b1009d9fd6acadc64e1a3cecb6f0083fe047e753                61216
 28982371      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/setuptools-57.0.0-py3-none-any.whl
#$# git-fat 0b0fcb339be89ae1b6360dbfb2be2075ae9f84c9               821665
 28982362      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/pyrsistent-0.17.3-cp37-cp37m-linux_x86_64.whl
#$# git-fat b83fc6cfcacc712024f2803ef0035fb08f5c0c6f                98627
 28982158      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/hyperopt-0.1.2-py3-none-any.whl
#$# git-fat 4eaf5f249a184a12bfe3e4fbbe37d39fd4192f90               115233
 28982374      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/tqdm-4.61.0-py2.py3-none-any.whl
#$# git-fat 80cc9df9545b54fe3e18b790d6c5187b65aad762                75783
 28982352      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/mock-4.0.3-py3-none-any.whl
#$# git-fat 89e027f3561efa6fb1dc9ab30ec60e507695bb76                28536
 28980352      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/attrs-21.2.0-py2.py3-none-any.whl
#$# git-fat a72511421b1aca19cc12b17e2859cf755e0a1ca3                53716
 28982365      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/python_logstash-0.4.6-py3-none-any.whl
#$# git-fat 04e52db2cb1f3e55ca2bb2b5b24a9ff9e5f2bda0                 8150
 28982354      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/mwapi-0.5.1-py2.py3-none-any.whl
#$# git-fat 03d878921284b9c2f6af86f7ba8923e29f782a92                10639
 28980355      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/chardet-4.0.0-py2.py3-none-any.whl
#$# git-fat e9eb83c71c09b3c8249bd7d6d2619b65fff03874               178743
 28982154      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/elasticsearch-hadoop-7.10.2.jar
#$# git-fat d752857f3fb54f51f4bc353075a2aabc2843cc1b              1021515
 28980354      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/certifi-2021.5.30-py2.py3-none-any.whl
#$# git-fat 2fcaa39108a9c99700c6f3f4198fcaa47b8ed707               145532
 28982372      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/six-1.15.0-py2.py3-none-any.whl
#$# git-fat 8730d16507db66e828c696ecc7cb785e557900bb                10963
 28982358      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/pip-21.1.2-py3-none-any.whl
#$# git-fat 296a5082c1e300e302d2d11a447bd92ce20d3d9d              1547997
 28982153      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/elasticsearch-5.5.3-py2.py3-none-any.whl
#$# git-fat 0007d4e42ed7bf76489549eabbf437a1d6c328b5               119268
 28982356      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/numpy-1.20.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
#$# git-fat 07ebc9f06abf992c1f15e6cd430d8867a3a45fd0             15307196
 28982353      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/more_itertools-8.7.0-py3-none-any.whl
#$# git-fat 859eff022eea6153860536e0a943c6967e507d33                48425
 28982360      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/py4j-0.10.9.2-py2.py3-none-any.whl
#$# git-fat 8e97d429ab19777c9cc934a36ffe8699081e7455               198796
 28982379      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/wmf_mjolnir-1.1-py3-none-any.whl
#$# git-fat bd1d3cac73e3f9b8c712885010192ac1268e6802               155623
 28982367      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/rdf-spark-tools-0.3.114-jar-with-dependencies.jar
#$# git-fat dc9ec28bbf636718020316dccb8f3ded9066e250             24748809
 28982370      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/scipy-1.6.3-cp37-cp37m-manylinux1_x86_64.whl
#$# git-fat db19626ba45d0b8f81792c89e457d1f6fc817a34             27390562
 28982378      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/wheel-0.36.2-py2.py3-none-any.whl
#$# git-fat 9e78f9fc756bc09c02c717fae6610cfd6d6a0fe7                35046
 28980353      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/certifi-2020.12.5-py2.py3-none-any.whl
#$# git-fat 7dff15a2066b8809c8772a243991bd1a25740ec3               147526
 28982350      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/jsonschema-3.2.0-py2.py3-none-any.whl
#$# git-fat 13a9abc0b85f73adfea760809110f4520118e1a4                56305
 28982361      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/pymongo-3.11.4-cp37-cp37m-manylinux2014_x86_64.whl
#$# git-fat b126006fdaa0044fff3f61a6d3e116e62a5359b3               512687
 28982380      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/xgboost-0.90-py2.py3-none-manylinux1_x86_64.whl
#$# git-fat 94a40d56f0fd37ed4683c775b455a7264d259037            142822578
 28982377      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/urllib3-1.26.5-py2.py3-none-any.whl
#$# git-fat effca0b8a9f0a0d7e546c880da06e9972357b742               138144
 28982155      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/future-0.18.2-py3-none-any.whl
#$# git-fat 58b165a584aa5236e44651894736ef781d92f387               491059
 28980351      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl
#$# git-fat 4d04149ec1b0035d5d828dd861009039b54069f5               636647
 28982363      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/python_dateutil-2.8.1-py2.py3-none-any.whl
#$# git-fat 3005ff67df93ee276fb8631e17c677df852254ad               227183
 28982357      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/oresapi-0.1.0-py2.py3-none-any.whl
#$# git-fat 95f1700b17dfc9a3d7294f79fd1891f7f192caf1                 8060
 28982373      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/six-1.16.0-py2.py3-none-any.whl
#$# git-fat 79e6f2e4f9e24898f1896df379871b9c9922f147                11053
 28982152      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/docopt-0.6.2-py2.py3-none-any.whl
#$# git-fat 15032b3ee3c325e618abb8468116c2c6be633e0e                13704
 28980523      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/dnspython-1.16.0-py2.py3-none-any.whl
#$# git-fat 9c44f537aa5fcaa2a3b6529bba9c59fc4dae8c50               188353
 28982351      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/kafka_python-1.4.7-py2.py3-none-any.whl
#$# git-fat 2d5dee2f09d2ad3e67addaee9a923dd2751c3a10               266121
 28982375      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/typing_extensions-3.10.0.0-py3-none-any.whl
#$# git-fat 6bb39b4a1d4882bb6889c4830c44a7c22eae5bc5                26127
 28982366      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/python_snappy-0.6.0-cp37-cp37m-manylinux2010_x86_64.whl
#$# git-fat 0a3c96080d53c90097f3979de90207b340fcb451                55288
 28982160      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/importlib_metadata-4.4.0-py3-none-any.whl
#$# git-fat 1fa9299575d630882893a204172782254cb993fa                17263
 28982151      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/dnspython-2.0.0-py3-none-any.whl
#$# git-fat 01e7db5fa5fca5b7ee00c45e2fdbb2f209c1b744               208262
 28982376      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/urllib3-1.26.3-py2.py3-none-any.whl
#$# git-fat bc1f2e29068a85cefc6c7652ae77eea287e0c9d8               137023
 28982381      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./artifacts/zipp-3.4.1-py3-none-any.whl
#$# git-fat 93ce9842312d434a7f2270ec9b02d903e22d7017                 5191
 26498286      4 -rw-rw-r--   1 ebernhardson ebernhardson       74 Sep  1 09:04 ./.mvn/wrapper/maven-wrapper.jar
#$# git-fat 99c11907918309fe94d7e7574a144c7c08077dd4                50710
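
For anyone reading the listing above: each regular file is a 74-byte git-fat placeholder rather than the real artifact. The placeholder looks like a single line made of the `#$# git-fat` marker, a 40-character hex digest, and what appears to be the byte size of the real object. A minimal sketch for pulling those two fields out, assuming that layout holds for every stub (`parse_fat_stub` is a hypothetical helper name, not part of git-fat):

```shell
# Hypothetical helper: read git-fat placeholder lines on stdin and print
# "<digest> <size>" for each line that starts with the "#$# git-fat" marker.
parse_fat_stub() {
  awk '$1 == "#$#" && $2 == "git-fat" { print $3, $4 }'
}

# Example using the first placeholder from the listing above.
printf '%s\n' '#$# git-fat e0c9be2a8af8c22ed30c73ff8138ac8461099a7a                 8901' \
  | parse_fat_stub
```

Something like `cat artifacts/*.whl | parse_fat_stub` should then give one digest/size pair per placeholder, which could be handy for checking what a migration has to carry over.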

Event Timeline

I can't explain why the find command above doesn't find them, but the above list is missing a number of things:

find . -name '*.whl' 
./artifacts/decorator-5.0.9-py3-none-any.whl
./artifacts/idna-2.10-py2.py3-none-any.whl
./artifacts/networkx-1.11-py2.py3-none-any.whl
./artifacts/prometheus_client-0.11.0-py2.py3-none-any.whl
./artifacts/python_json_logger-2.0.1-py34-none-any.whl
./artifacts/requests-2.25.1-py2.py3-none-any.whl
./artifacts/setuptools-57.0.0-py3-none-any.whl
./artifacts/pyrsistent-0.17.3-cp37-cp37m-linux_x86_64.whl
./artifacts/hyperopt-0.1.2-py3-none-any.whl
./artifacts/tqdm-4.61.0-py2.py3-none-any.whl
./artifacts/mock-4.0.3-py3-none-any.whl
./artifacts/attrs-21.2.0-py2.py3-none-any.whl
./artifacts/python_logstash-0.4.6-py3-none-any.whl
./artifacts/mwapi-0.5.1-py2.py3-none-any.whl
./artifacts/chardet-4.0.0-py2.py3-none-any.whl
./artifacts/certifi-2021.5.30-py2.py3-none-any.whl
./artifacts/six-1.15.0-py2.py3-none-any.whl
./artifacts/pip-21.1.2-py3-none-any.whl
./artifacts/elasticsearch-5.5.3-py2.py3-none-any.whl
./artifacts/numpy-1.20.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl
./artifacts/more_itertools-8.7.0-py3-none-any.whl
./artifacts/py4j-0.10.9.2-py2.py3-none-any.whl
./artifacts/wmf_mjolnir-1.1-py3-none-any.whl
./artifacts/scipy-1.6.3-cp37-cp37m-manylinux1_x86_64.whl
./artifacts/wheel-0.36.2-py2.py3-none-any.whl
./artifacts/certifi-2020.12.5-py2.py3-none-any.whl
./artifacts/jsonschema-3.2.0-py2.py3-none-any.whl
./artifacts/pymongo-3.11.4-cp37-cp37m-manylinux2014_x86_64.whl
./artifacts/xgboost-0.90-py2.py3-none-manylinux1_x86_64.whl
./artifacts/urllib3-1.26.5-py2.py3-none-any.whl
./artifacts/future-0.18.2-py3-none-any.whl
./artifacts/PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl
./artifacts/python_dateutil-2.8.1-py2.py3-none-any.whl
./artifacts/oresapi-0.1.0-py2.py3-none-any.whl
./artifacts/six-1.16.0-py2.py3-none-any.whl
./artifacts/docopt-0.6.2-py2.py3-none-any.whl
./artifacts/dnspython-1.16.0-py2.py3-none-any.whl
./artifacts/kafka_python-1.4.7-py2.py3-none-any.whl
./artifacts/typing_extensions-3.10.0.0-py3-none-any.whl
./artifacts/python_snappy-0.6.0-cp37-cp37m-manylinux2010_x86_64.whl
./artifacts/importlib_metadata-4.4.0-py3-none-any.whl
./artifacts/dnspython-2.0.0-py3-none-any.whl
./artifacts/urllib3-1.26.3-py2.py3-none-any.whl
./artifacts/zipp-3.4.1-py3-none-any.whl

find . -name '*.whl' -or -name '*.jar' -ls -exec cat {} \;

This should instead be (I didn't expect this either):

find . -name '*.whl' -ls -exec cat {} \; -o -name '*.jar' -ls -exec cat {} \;
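
The difference comes down to operator precedence in find: the implicit "and" between a test and the actions that follow it binds tighter than -o / -or, so in the first form -ls and -exec only apply to the '*.jar' branch. And because the expression contains an explicit action, the '*.whl' matches are not printed by default either. A small sketch that reproduces this in a scratch directory (the file names `a.whl` and `b.jar` are made up for the demo, and `-print` stands in for `-ls -exec cat {} \;`):

```shell
# Scratch directory with one wheel and one jar (hypothetical file names).
demo=$(mktemp -d)
touch "$demo/a.whl" "$demo/b.jar"

# Ungrouped: parsed as  -name '*.whl'  -o  ( -name '*.jar' -a -print ),
# so only the jar is printed.
ungrouped=$(cd "$demo" && find . -name '*.whl' -o -name '*.jar' -print | sort)

# Grouped: the parentheses make the action apply to both patterns.
grouped=$(cd "$demo" && find . \( -name '*.whl' -o -name '*.jar' \) -print | sort)

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$ungrouped" "$grouped"
rm -rf "$demo"
```

Grouping the tests with `\( ... \)` is an alternative to repeating the actions on each branch as in the corrected command; both should list the same files.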

Weird! Thanks for spotting that.

find is indeed tricky from time to time, well done on figuring out the issue.

I have an alternative for listing the files that have the filter: fat git attribute. It works for any repo and saves one from having to list the file extensions to manage. It also handles the hypothetical case of a .gitattributes in a subdirectory that sets the filter for additional file extensions besides the ones defined in /.gitattributes:

git ls-files | git check-attr --stdin filter | grep -oP '(.*)(?=: filter: fat)'
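
Note that `grep -P` needs a grep built with PCRE support, which the stock grep on macOS does not have. A possible variant of the same pipeline using sed instead, as a sketch only (`list_fat_files` is a hypothetical function name; it assumes no tracked path needs quoting in the `git check-attr` output):

```shell
# Hypothetical helper: print tracked paths whose "filter" attribute is "fat".
# git check-attr emits lines like "path: filter: fat"; sed strips the suffix
# and prints only the lines where the substitution matched.
list_fat_files() {
  git ls-files | git check-attr --stdin filter | sed -n 's/: filter: fat$//p'
}

# Only run it when the current directory is inside a git work tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  list_fat_files
fi
```

Run from the top of the work tree, this should print the same paths as the grep version.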

Gehel subscribed.

The Search Platform team will just be watching; I don't think we need to do any additional work on this. But ping us if needed!

Change 1010249 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[All-Projects@refs/meta/config] Enable LFS for wikimedia/discovery/analytics

https://gerrit.wikimedia.org/r/1010249

Change 1010249 merged by Ahmon Dancy:

[All-Projects@refs/meta/config] Enable LFS for wikimedia/discovery/analytics

https://gerrit.wikimedia.org/r/1010249

Change 1010297 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[wikimedia/discovery/analytics@master] Migrate from git-fat to git-lfs

https://gerrit.wikimedia.org/r/1010297

I tried to move this forward today but my change https://gerrit.wikimedia.org/r/c/wikimedia/discovery/analytics/+/1010297 was rejected by CI, saying that the repo has been archived. Does that mean this ticket can be closed?

Hrm, the repo is still active in Gerrit, but I see that the commit that moved it to archived in integration/config is for T346176: Archive wikimedia/discovery/analytics, so it sounds like we just need to archive it in Gerrit too, so folks can't push.

I marked it read-only and set the repo description to [ARCHIVED].

Change 1010297 abandoned by Ahmon Dancy:

[wikimedia/discovery/analytics@master] Migrate from git-fat to git-lfs

Reason:

task declined

https://gerrit.wikimedia.org/r/1010297