Thu, Mar 21
Tue, Mar 19
Seems to work, so closing (and unassigning myself).
Apparently I was still assigned to this; unassigning. (or we could close this -- maybe crashing isn't so bad, as this did lead to the issue being noticed)
Unassigning; if anyone wants to pick this up they are more than welcome, but it's not on my radar to fix.
Looking at the logs, my hypothesis is the following:
Sun, Mar 17
- become mytoolname: sudo -niu tools.mytoolname
- crontab -l: ssh $(cat /etc/toollabs-cronhost) crontab -l
Sat, Mar 16
@MarioFinale: as a workaround, I would suggest ignoring these jobs, and resubmitting them under a different name. Once the NFS issues are resolved (or the host is rebooted), they should disappear.
Attempted a force-umount using sudo umount -fr /mnt/nfs/labstore-secondary-tools-project, but this only resulted in umount.nfs4: /mnt/nfs/labstore-secondary-tools-project: device is busy.
I've done some initial investigation, but I'm unable to find the root cause. Some observations:
Thanks for taking a look at this. I migrated the bot to Stretch and Python 3 around that time, so that is likely the origin of the issue. I'll take a look in a bit more detail.
Fri, Mar 15
Thu, Mar 14
Notes for gerrit-reviewer-bot.
Wed, Mar 13
tsreports was surprisingly easy, but it was not converted to Python 3, so it will disappear when Python 2's deprecation comes along (2020... maybe 2022 for Stretch). I added a sitenotice with steps to convert queries to Quarry instead.
Thu, Mar 7
gerrit-patch-uploader done -- it's insanely fast now!
Wed, Mar 6
RTB is now also converted.
Wikibugs is now fully Trusty-free! Next up: RTB. Hopefully as painless...
👍 after upgrading BS4 it seems to run smoothly again
Less luck there:
File "/data/project/wikibugs/wikibugs2/wikibugs.py", line 8, in <module> from bs4 import BeautifulSoup File "/mnt/nfs/labstore-secondary-tools-project/wikibugs/py35-stretch/lib/python3.5/site-packages/bs4/__init__.py", line 30, in <module> from .builder import builder_registry, ParserRejectedMarkup File "/mnt/nfs/labstore-secondary-tools-project/wikibugs/py35-stretch/lib/python3.5/site-packages/bs4/builder/__init__.py", line 308, in <module> from . import _htmlparser File "/mnt/nfs/labstore-secondary-tools-project/wikibugs/py35-stretch/lib/python3.5/site-packages/bs4/builder/_htmlparser.py", line 7, in <module> from html.parser import ( ImportError: cannot import name 'HTMLParseError'
✔ wb2-irc works, now testing wb2-phab.
😊 emoji test!
Testing IRC bot
Taxonomy has been re-added as a crontab job. Now looking at the rest of the processes.
Mon, Mar 4
Contact converted to k8s uwsgi webservice. Notes:
Sun, Mar 3
Took a look at tvpmelder -- this seems more of a headache:
Sat, Mar 2
I have changed the overall structure of the project directory a bit. There are now two shared virtualenvs:
@MarcoAurelio mentioned on IRC that this also happens on login, and I could reproduce it as well:
Last login: Sat Mar 2 12:10:40 2019 from [...]
groups: cannot find name for group ID 50062
valhallasw@tools-sgebastion-07:~$ id
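That groups warning appears whenever a GID has no resolvable entry in the group database. As a hypothetical reproduction (the real trigger here was presumably a failing LDAP lookup, not a genuinely missing entry), the same situation is visible from Python:

```python
import grp

# Sketch: pick a GID with no name in the local group database and look
# it up, reproducing the 'cannot find name for group ID' condition.
known_gids = {g.gr_gid for g in grp.getgrall()}
gid = next(g for g in range(50000, 60000) if g not in known_gids)
try:
    grp.getgrgid(gid)
except KeyError:
    print("cannot find name for group ID", gid)
```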
/mnt/nfs/labstore1003-scratch also seems accessible again, so the problem should resolve itself now, but why this happened still needs some investigation.
labstore1006 seems to have recovered:
I'm not sure whether the Puppet issues are related -- the puppetmaster does not (always?) start correctly when it needs to respond to requests:
sudo mount -o remount /mnt/nfs/labstore1003-scratch does not help to bring the scratch mount back online. Note that the mount parameters changed from
The two problematic mounts are set to soft,timeo=300, which means any activity should time out after 30 seconds. But this is not consistent with what I see: ls does not time out, and there are many prometheus processes older than 30 seconds.
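The 30-second figure above follows from timeo being expressed in tenths of a second (per nfs(5)). A trivial check, just to make the units explicit:

```python
# nfs(5): 'timeo' is in tenths of a second, so timeo=300 is a 30 s
# timeout per RPC attempt before a soft mount reports an error
# (after the configured number of 'retrans' retries).
timeo = 300
timeout_seconds = timeo / 10
print(timeout_seconds)  # 30.0
```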
Something strange is going on here. Taking tools-exec-1401 as an example:
Feb 20 2019
virtualenv -p python3 py35-stretch
~/py35-stretch/bin/pip install -r requirements.txt
# Successfully installed PyYAML-3.13 asyncio-redis-0.12.3 atomicwrites-1.3.0 attrs-18.2.0 beautifulsoup4-4.3.2 certifi-2018.11.29 chardet-3.0.4 click-6.7 docopt-0.6.2 fab-1.4.1 idna-2.8 irc3-0.5.1 more-itertools-6.0.0 pathlib2-2.3.3 pluggy-0.8.1 py-1.7.0 pytest-4.3.0 redis-2.10.3 requests-2.21.0 six-1.12.0 urllib3-1.24.1 venusian-1.2.0
Feb 18 2019
Feb 17 2019
- Runs as a uWSGI Python tool, 2.7.6
- should probably be converted to a uWSGI k8s tool, possibly on Python 3?
Feb 15 2019
Jan 28 2019
Reading through the bs4 changelog: https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/CHANGELOG I think this should be resolved in 4.4+. The warning seems gone with 4.7.1, so let's go to that version :-)
Jan 27 2019
Jan 22 2019
Sorry to hear that your bot was offline for 4 months. In general, a job that runs for so long is not an issue -- for example, Wikibugs regularly runs for months at a time without job resubmission.
Jan 20 2019
I fiddled around a bit with docker to try and find the minimal dependencies needed to pip install py3exiv2:
Looking at the stack trace, the user is running a very old version of pywikibot; the use of textlib.replace_links was removed in November 2015 (8a7c42f5).
Hi @Inaki-LL , thank you for reporting this. I'm moving this task to the VisualEditor project; I think the maintainers of that project should be able to determine whether this is expected and if not, what can be done about it.
Jan 16 2019
Unfortunately, PPAs cannot be used due to security concerns, but I think the following should work:
Dec 17 2018
Dec 14 2018
There are a number of 'nan' values imported. This seems to be due to the somewhat hacky filtering for unpaywall entries with arXiv identifiers. For example, there may be a link to http://cds.cern.ch/record/681502/files/arXiv:hep-ph_0411095.pdf, but not to the corresponding arXiv page. In those cases, unpaywall_doi_to_arxiv_zonder_initial_sorted would not contain an arXiv identifier, and pandas would substitute it with 'nan' (which is also an odd choice...)
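A minimal sketch of that mechanism, with hypothetical names and data (the real lookup table is unpaywall_doi_to_arxiv_zonder_initial_sorted): when the key is missing, the value becomes NaN, which then stringifies to the literal 'nan'. The same effect without pandas:

```python
import math

# Hypothetical miniature of the DOI -> arXiv id lookup table.
doi_to_arxiv = {"10.1234/example": "hep-ph/0411095"}

def lookup(doi):
    # pandas substitutes NaN for missing values; mimic that here.
    return doi_to_arxiv.get(doi, float("nan"))

value = lookup("10.1234/absent")   # entry missing: only a PDF link existed
print(math.isnan(value))           # True -- the missing id became NaN
print(str(value))                  # 'nan' -- which is what ends up imported
```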