Wed, Jan 16
Unfortunately, PPAs cannot be used due to security concerns, but I think the following should work:
Dec 17 2018
Dec 14 2018
There are a number of 'nan' values imported. This seems to be due to the somewhat hacky filtering for unpaywall entries with arXiv identifiers. For example, there may be a link to http://cds.cern.ch/record/681502/files/arXiv:hep-ph_0411095.pdf, but not to the corresponding arXiv page. In those cases, unpaywall_doi_to_arxiv_zonder_initial_sorted would not contain an arXiv identifier, and pandas would substitute it with 'nan' (which is also an odd choice...)
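A minimal sketch of a stricter filter, assuming the input is a list of (doi, url) pairs: only accept identifiers from URLs that are actually hosted on arxiv.org, and drop rows without an identifier before they reach the import. The regular expression and the helper names are my own assumptions, not the code that was used for the import.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: new-style (1501.00001) and old-style (hep-ph/0411095)
# identifiers, optionally followed by a version suffix and ".pdf".
ARXIV_PATH = re.compile(
    r"^/(?:abs|pdf)/(\d{4}\.\d{4,5}|[a-z-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?(?:\.pdf)?$"
)


def arxiv_id_from_url(url):
    """Return the arXiv identifier if url points at arxiv.org, else None."""
    parsed = urlparse(url)
    if parsed.hostname not in ("arxiv.org", "www.arxiv.org"):
        return None
    match = ARXIV_PATH.match(parsed.path)
    return match.group(1) if match else None


def drop_missing(rows):
    """Keep only (doi, arxiv_id) pairs that carry a real identifier."""
    return [
        (doi, arxiv_id)
        for doi, arxiv_id in rows
        if arxiv_id and str(arxiv_id).lower() != "nan"
    ]
```

With this, the cds.cern.ch link above yields None instead of a bogus value, and drop_missing() removes both None and any 'nan' placeholders.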
Update imports should also take care not to re-import manually removed statements such as https://www.wikidata.org/w/index.php?title=Q23042979&type=revision&diff=805387135&oldid=798100426
The arXiv identifier dataset has now been fully imported. Next question is how to keep this up to date with new entries in the unpaywall dataset / new dois/Qids being added to Wikidata.
Dec 13 2018
Dec 2 2018
Note for the next time: I forgot the quotation marks around the arXiv identifier. Those should be added in the merged.apply() step.
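For reference, a small sketch of what the quoting step could look like. It assumes the tab-separated QuickStatements V1 format, where string values go in double quotes, and P818 as the arXiv ID property; the function name is hypothetical.

```python
def qs_statement(qid, arxiv_id, prop="P818"):
    """Build one tab-separated QuickStatements V1 line with a quoted string value."""
    if '"' in arxiv_id:
        # A stray quote would break the statement, so refuse it outright.
        raise ValueError("identifier must not contain a double quote")
    return f'{qid}\t{prop}\t"{arxiv_id}"'
```

Something like this could be called per row from the apply() step, so that the quotes cannot be forgotten again.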
Split off second part of unpaywall data set using
Dec 1 2018
Just so I'm clear, we're doing this so users don't have to configure a ProxyCommand or the equivalent for Putty (plink.exe), right?
The use case of the HBA system is to allow users to connect to other hosts once they are connected via an ssh terminal. Although this is possible with tunnelling, this is not trivial to set up (especially on Windows).
Nov 14 2018
No longer relevant
Or: use a server password (see https://freenode.net/kb/answer/registration#logging-in).
Entry Nickname/Host Flags
18:49 ----- ---------------------- -----
18:49 1 James_F +AFRefiorstv [modified 3w 2d 0h ago]
18:49 2 bearND +Aiortv [modified 3w 2d 0h ago]
18:49 3 tgr +Aiortv [modified 3w 2d 0h ago]
18:49 4 mholloway +Aiortv [modified 3w 2d 0h ago]
18:49 5 mateusbs17 +Aiortv [modified 3w 2d 0h ago]
18:49 6 joakino +Afiortv [modified 3w 2d 0h ago]
18:49 7 hip +Aiortv [modified 3w 1d 2h ago]
18:49 ----- ---------------------- -----
On the Gerrit patchset, @Krenair mentioned:
Channel op needs to get FreeNode staff to whitelist it from Sigyn
Nov 8 2018
^ was reported as:
tools.wikibugs@tools-bastion-03:~/wikibugs2$ python manage.py deploy wb2-irc
/data/project/wikibugs/wikibugs2$ git rev-list HEAD --max-count=1
91c67baef2dc4e89df861e7dbde9096955d5362a
/data/project/wikibugs/wikibugs2$ git reset --hard origin/master
HEAD is now at 91c67ba Escape comment blocks
/data/project/wikibugs/wikibugs2$ git pull
Already up-to-date.
/data/project/wikibugs$ qmod -rj wb2-irc
Pushed rescheduling of job 1273560 on host tools-exec-1408.eqiad.wmflabs
Nov 7 2018
That depends... I imagine the errors will be smaller in Amsterdam (I think in the case of this windmill, one database points to the windmill itself and one to the street).
This is what Wikibugs submitted to the IRC server:
The matching could be done based on RD coordinates (x_coord, y_coord). For example, the first entry has RD coordinates x=247988m y=443906m (= 51°58'32.8"N 6°44'26.2"E in WGS84). The corresponding coordinates in the Rijksmonumentendatabase (https://cultureelerfgoed.nl/monumenten/526018) are
x=247999m y=443876m -- not exactly the same, but note that the address is also different (Vredenseweg 102 vs Bataafseweg 18).
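Because RD coordinates are in metres, matching the two databases can be a plain Euclidean distance check. A rough sketch using the two coordinate pairs above; the 50 m tolerance is an assumption that would need tuning (as the address difference shows, the two databases may point at different spots for the same monument).

```python
import math


def rd_distance(a, b):
    """Straight-line distance in metres between two RD (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_match(a, b, tolerance_m=50.0):
    """Treat two entries as the same monument if they lie within tolerance_m."""
    return rd_distance(a, b) <= tolerance_m


first_entry = (247988, 443906)    # RD coordinates of the first entry
rijksmonument = (247999, 443876)  # same monument in the Rijksmonumentendatabase

print(round(rd_distance(first_entry, rijksmonument), 1))  # 32.0
```

So the two entries are about 32 m apart, which a tolerance of a few tens of metres would still catch.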
Nov 6 2018
Nov 5 2018
See T208118 / https://www.wikidata.org/wiki/Wikidata:Dataset_Imports/Unpaywall for details of what I'm working on. Currently my focus is on importing arXiv identifiers, but adding links to institutional repositories (which more often host the actual published version) is a natural extension.
Nov 2 2018
tsreports-dev is most likely a test I did -- I can't think of a reason why tsreports would need to do something with incoming emails. wikibugs is from a previous era where wikibugs reported events based on incoming emails... but that was in the bugzilla era.
The 10.1267/HÄMO04040279 series is a set of no longer working dois -- not sure if they have ever worked as 10.1267 is the prefix for a Japanese publisher -- not a German one. So likely there has been some kind of scraping/OCR issue there. Note that the original pubmed data specifies the dois as Hämo01234567 -- the capitalization is due to the wikidata importer.
The unpaywall arXiv-matched data set does not contain any non-ascii entries. The Inventaire dataset contains a few:
https://www.doi.org/doi_handbook/2_Numbering.html#2.4 specifies dois should be case-insensitive for ASCII dois, and the doi system will reject any dois that already exist in a different case. Next question: are non-ASCII dois a thing?
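Given that, one way to compare dois from different sources is to fold only the ASCII letters to upper case and leave everything else alone. This is a sketch under that assumption; the function names are made up, and how non-ASCII characters should be treated is exactly the open question above.

```python
def normalize_doi(doi):
    """Upper-case ASCII letters only; non-ASCII characters are left untouched."""
    return "".join(c.upper() if c.isascii() else c for c in doi.strip())


def same_doi(a, b):
    """Compare two dois case-insensitively with respect to ASCII letters."""
    return normalize_doi(a) == normalize_doi(b)
```

With this, PHYSREVLETT and PhysRevLett variants compare equal, while Hämo... and HÄMO... would still be treated as different, because the ä is deliberately not case-folded.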
Import documented at https://www.wikidata.org/wiki/Wikidata:Dataset_Imports/Unpaywall
Oct 31 2018
imagecopy.py displays a graphical interface to verify which images should be transferred. As PAWS is a web based environment, this interface cannot be displayed.
Oct 29 2018
A further import is a little bit more complicated -- it's probably a good idea to skip existing arXiv identifiers instead of letting QS loop over all of them. We can query this import using
Oct 27 2018
QS import status: https://tools.wmflabs.org/quickstatements/#/batch/4905
One problem that still needs to be solved is the upper/lower case structure of dois. I don't know if we can generally assume they are case-insensitive, as that might be publisher dependent. At the same time, many sources do have different capitalization (e.g. PHYSREVLETT vs PhysRevLett).
Pipeline as of now:
Accidentally imported the wrong database (epsg_datum instead of epsg_coordinatereferencesystem). The new import (https://tools.wmflabs.org/mix-n-match/#/catalog/1935) seems much better, although the automatic matching could use some assistance.
Oct 26 2018
https://tools.wmflabs.org/mix-n-match/#/catalog/1934 ; waiting for automatic processing
- See https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:CreateAccount / https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:UserLogin for how this is solved on the beta cluster.
Oct 18 2018
Sep 30 2018
To add some context to the current situation -- most of the email sent to firstname.lastname@example.org is:
- from openbugbounty.org
- about a volunteer maintained tool hosted on tools.wmflabs.org
Sep 27 2018
Sep 7 2018
Aug 25 2018
@Paladox: I'm not sure if this is a breakage due to a Gerrit upgrade or whether this is due to some other reason. I couldn't figure out a simple fix to the problem (I think these events likely should not be reported, and some other events /should/?), and I think the WIP reporting is useful but not critical, so I have backed out the whole WIP reporting for now.
Aug 18 2018
Aug 16 2018
That looks correct to me. Thank you for resubmitting the job!
Aug 5 2018
Jul 2 2018
Jul 1 2018
I'm not convinced it's a good idea to completely hide WIP patchsets from IRC.
Having redis makes the integration with grrrrit-wm much easier, so it makes sense to keep the architecture as-is.
Jun 30 2018
Manually truncated. The logs are rotated; the error log is written by SGE and not trivial to auto-rotate. The size (few MB) is also not much of an issue, even when viewed with a browser, so I think a manual process is fine.
Jun 20 2018
@Xqt: could you add a breakpoint/print statement to
File "C:\Program Files (x86)\Python36-32\lib\logging\__init__.py", line 994, in emit
    stream.write(msg)
to figure out which stream this is going to? Is this the console or is it trying to log to a file?
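If editing the standard library file is inconvenient, something along these lines might also answer the question. It is a sketch using only the standard logging module; describe_handlers is a hypothetical helper name, and it only looks at the handlers attached directly to the given logger (the root logger by default).

```python
import logging


def describe_handlers(logger=None):
    """Return (handler class name, destination) for each handler on the logger."""
    if logger is None:
        logger = logging.getLogger()  # root logger
    described = []
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler):
            # FileHandler records the absolute path it writes to.
            destination = handler.baseFilename
        elif isinstance(handler, logging.StreamHandler):
            stream = handler.stream
            # sys.stdout / sys.stderr normally expose a name such as "<stderr>".
            destination = getattr(stream, "name", repr(stream))
        else:
            destination = "(not a stream handler)"
        described.append((type(handler).__name__, destination))
    return described
```

Printing describe_handlers() just before the failing call should show whether the message is headed for the console or for a file.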
Jun 11 2018
Jun 8 2018
Jun 7 2018
9955966 0.30036 lighttpd-w tools.wikibu r 06/06/2018 18:28:30 webgrid-lighttpd@tools-webgrid 1
9956833 0.30035 wb2-phab tools.wikibu r 06/06/2018 18:45:38 email@example.com 1
9958476 0.30032 wb2-grrrri tools.wikibu r 06/06/2018 19:30:39 firstname.lastname@example.org 1
9958477 0.30032 wb2-irc tools.wikibu r 06/06/2018 19:30:41 email@example.com 1
Jun 6 2018
Did some debugging; the issue seems to be that we always set family=wikipedia and code=en in generate_user_files.py; see the updated description.
You should have access now.
Jun 5 2018
I couldn't get the redirects to work, but you're more than welcome to fiddle with the settings yourself. What is your readthedocs username?
Jun 4 2018
The SMB provider has been merged, so I think this is resolved.