Sep 13 2020
@Legoktm since it is hard deprecated, shouldn't it be removed from tests/phpunit/MediaWikiIntegrationTestCase.php in core?
Thanks @zhuyifei1999 ! Have you been able to work on this any?
Apr 25 2020
AND it looks like this task can finally be closed :)
Thanks for merging this @Aklapper !
@Yaron_Koren could you please take a look at this? Thanks!
Thanks again for the ping @Aklapper
@Clump could you please take a look at this? Thanks!
@Aklapper thanks for the ping. Committed :)
Thanks for the prod @Aklapper . This one appears to have since been resolved by someone else.
Apr 10 2020
It took several hours (didn't immediately report this), but it just decided to work. Bizarre.
Apr 7 2020
There is currently an ongoing discussion very closely relating to this on Meta that appears to be garnering consensus in the other direction/against this ticket/proposal.
Apr 4 2020
The "vanilla" requests code that would accomplish this (using the same format as API:Users) is:
Mar 15 2020
Mar 12 2020
I am using the latest sseclient. I assume that it isn't "real", as event streams shouldn't be 504ing constantly? That wouldn't make sense & would mean that recent changes should be broken etc.? This script is connecting to it through pywikibot's site_rc_listener().
Mar 11 2020
This is happening only a few seconds after I start the script.
Going to resolve this then, given that it appears unavoidable. At least now that it throws an exception, it can be handled (i.e. skip to the next item in the queue).
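For anyone following along, the "skip to the next item" handling could look roughly like the sketch below. It is only an illustration: `InvalidTitleError` and `make_file_page` are hypothetical stand-ins for the real exception and for building a FilePage object, and a plain `deque` stands in for the queue.

```python
import logging
from collections import deque

logger = logging.getLogger("rcworker")


class InvalidTitleError(Exception):
    """Hypothetical stand-in for the exception raised on a bad title."""


def make_file_page(title):
    """Hypothetical stand-in for building a FilePage object from a title."""
    if not title.startswith("File:"):
        raise InvalidTitleError(title)
    return {"title": title}


def drain(queue, process):
    """Process every queued title, skipping those that cannot be converted."""
    skipped = []
    while queue:
        title = queue.popleft()
        # Log first, so the offending title is recorded even if conversion fails.
        logger.info("Processing %r", title)
        try:
            page = make_file_page(title)
        except InvalidTitleError:
            logger.warning("Skipping invalid title %r", title)
            skipped.append(title)
            continue  # move on to the next item instead of crashing
        process(page)
    return skipped
```

Returning the skipped titles makes it easy to review afterwards which items failed.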
Mar 8 2020
Mar 6 2020
Mar 5 2020
Mar 2 2020
I just discovered that rcwatcher.py crashed at some point within the past couple of days. Interesting.
Traceback (most recent call last):
  File "rcwatcher.py", line 65, in <module>
    main()
  File "rcwatcher.py", line 57, in main
    run_watcher()
  File "rcwatcher.py", line 41, in run_watcher
    for change in rc:
  File "/usr/local/lib/python3.8/site-packages/pywikibot/comms/eventstreams.py", line 291, in __iter__
    self.source = EventSource(**self.sse_kwargs)
  File "/home/thesanddoctor/sseclient/sseclient.py", line 48, in __init__
    self._connect()
  File "/home/thesanddoctor/sseclient/sseclient.py", line 63, in _connect
    self.resp.raise_for_status()
  File "/home/ccc/.local/lib/python3.8/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://stream.wikimedia.org/v2/stream/recentchange
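One possible way to keep the watcher alive through a 429 would be to retry the connection with an increasing delay. The sketch below is an assumption-laden illustration, not what rcwatcher.py currently does: `TooManyRequests` is a hypothetical stand-in for the HTTPError above, and `connect` is whatever callable opens the stream.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 error from the stream."""


def connect_with_backoff(connect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each 429."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable only so the delays can be checked without actually waiting.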
Feb 28 2020
@Dvorapa I saw that myself & just updated my above comment prior to seeing your response. It appears to have slipped through; I will have to add a catch for that. Others still relevant though.
Since updating to the latest master version of sseclient (post-fix merges) more of the workers crashed than usual (4 of the 5). 3 of the 4 crashes were due to the same issue.
Feb 26 2020
@zhuyifei1999 requests has been updated & the workers/feeder all restarted. I have re-started test_rc.py and will post back here if anything crashes. If it is good in a few days/week or something like that I think we could consider this resolved. Thanks for your help so far!
Feb 25 2020
@zhuyifei1999 Yes, first and third are two separate customers. The first and second are working with the same customer. The third (code) is just a plain/direct printing of the file name straight from recent changes listener and trying to turn it into a file page (until it fails).
@zhuyifei1999 the first and second tracebacks are from "production" worker instances that pop items off of the same redis queue (all fed by a single instance of rcwatcher.py), so they wouldn't get the same image. So it isn't feasible that they would crash all at once. They basically get images first-come, first-served from recent changes.
@Dvorapa python 3(.8)
@Dvorapa all encoding is set to UTF-8.
@zhuyifei1999 For both of these, it is worth noting that I have not updated to the latest version with the change in behaviour that this task merged.
Feb 24 2020
Thanks @Yaron_Koren !
Feb 23 2020
@zhuyifei1999 Unknown at this point. Implemented and running alongside it now. If/when either it or any of the 5 workers crashes, I will report back here.
@zhuyifei1999 The only log currently available is as follows (and linked above):
Feb 22 2020
@Dvorapa Commons. The issue appears to happen at random. I have improved the ordering of my logs so next time it happens it should hopefully actually tell me the file name at issue (configured to log the file name before trying to make a FilePage object out of it, hopefully it will do that before crashing). Given that the files are only run from recent changes if they are new uploads, this isn't something easily repeatable and does appear to happen at random. I will update here when I have more logs. Thank you for your patch to make the exception catchable.
@Dvorapa The script grabs the file from recent changes using site_rc_listener (script, ImageObj) and then sends it to rcworker (linked above) using redis. rcworker then creates a pwb FilePage object out of the title from the recent changes log and processes the file. So site_rc_listener must be what is giving it the invalid image titles? Something just doesn't add up here for me, as it doesn't make sense why the script is being given invalid image titles by pwb's site_rc_listener.
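To make the feeder side of that flow concrete, here is a rough sketch of filtering recent-changes events down to new uploads before queueing their titles. It is not the actual rcwatcher.py code: the event keys (`type`, `log_type`, `log_action`, `namespace`, `title`) are assumed to follow the recent changes event format, and a plain `deque` stands in for the redis queue.

```python
from collections import deque

FILE_NAMESPACE = 6  # MediaWiki's File: namespace


def is_new_upload(change):
    """Return True if a recent-changes event looks like a fresh upload."""
    return (
        change.get("type") == "log"
        and change.get("log_type") == "upload"
        and change.get("log_action") == "upload"
        and change.get("namespace") == FILE_NAMESPACE
    )


def feed(changes, queue):
    """Push titles of new uploads onto the queue; return how many were queued."""
    queued = 0
    for change in changes:
        if is_new_upload(change):
            queue.append(change["title"])
            queued += 1
    return queued
```

With this split, the worker only ever sees the raw title string, so whatever the listener emits is exactly what the FilePage conversion receives.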
Merged. Thanks @Tgr !
With @Tgr 's help, a new patch set has been uploaded that is functional. Just awaiting review.
@Dvorapa But what would cause it to return it then when looking at images? I am sort of confused here.
@Yaron_Koren could you please take a look?
Feb 21 2020
@zhuyifei1999 Do you think that such a raise could be made? The problem that I see with both handlings though is that the titles are not "invalid" as they are the valid image titles on the wiki(?). I am also having this issue when it comes to my Commons Corruption Checking task.
Feb 16 2020
Feb 7 2020
Feb 6 2020
Jan 31 2020
@DannyS712 is there anything further needing doing here (specifically this subtask) or is this good to close?
Jan 26 2020
Stopped mine manually. Looks like it is down here too. British Columbia if it matters.
traceroute en.wikipedia.org
traceroute to dyna.wikimedia.org (188.8.131.52), 64 hops max, 52 byte packets
 1 192.168.1.254 (192.168.1.254) 5.488 ms 5.123 ms 1.664 ms
 2 10.31.128.1 (10.31.128.1) 1091.086 ms 910.696 ms 982.800 ms
 3 184.108.40.206 (220.127.116.11) 1120.761 ms 686.399 ms 999.062 ms
 4 18.104.22.168 (22.214.171.124) 1003.838 ms 63.769 ms 1001.126 ms
 5 ae7.cs2.sea1.us.zip.zayo.com (126.96.36.199) 999.608 ms 133.960 ms 908.272 ms
 6 ae3.cs2.sjc2.us.eth.zayo.com (188.8.131.52) 940.233 ms 980.503 ms 1124.357 ms
 7 ae27.cr2.sjc2.us.zip.zayo.com (184.108.40.206) 151.749 ms 36.743 ms 139.132 ms
 8 ae11.mpr4.sfo3.us.zip.zayo.com (220.127.116.11) 479.827 ms 938.800 ms 1046.789 ms
 9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
31 * * *
32 * * *
33 * * *
34 * * *
35 * * *
36 * * *
37 * * *
38 * * *
39 * * *
40 * * *
41 * * *
42 * * *
43 * * *
44 * * *
45 * * *
46 * * *
47 * * *
48 * * *
49 * * *
50 * * *