User Details
- User Since: Oct 24 2014, 11:42 PM (519 w, 3 d)
- Availability: Available
- LDAP User: Betacommand
- MediaWiki User: Unknown
Jan 17 2024
I've reverted the unilateral change in policy about source code publication. Providing an OSI license with the code has met the toolserver and labs/cloud requirement for the last 18 years. There are reasons not to publish code publicly; if you want to change the policy, this is not the correct way to do it. And I'm not going to debate it in a ticket either.
The code should be publicly accessible
Per RFC 2119
- SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
I have reviewed publishing my code and there are a number of valid reasons for not making it accessible to the world. I am not going to re-hash them.
- I don't use Git, I use SVN.
- The code is hosted in a private repo to avoid leaking credentials/non-public information.
- None of the listed WSGI/Flask/FastAPI options match the plain CGI format that is currently used on labs (see the sketch below).
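For context, roughly what that plain CGI format looks like, as a minimal illustrative sketch (not taken from any actual tool):
```
#!/usr/bin/env python3
# Minimal plain-CGI script of the kind run from a tool's cgi-bin/ directory.
# The web server invokes it once per request; it writes an HTTP header block
# followed by the body to stdout. No WSGI/Flask/FastAPI involved.
import os

print("Content-Type: text/plain")
print()  # blank line terminates the header block
print("Hello from a plain CGI tool")
print("Query string:", os.environ.get("QUERY_STRING", ""))
```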
Oct 20 2023
Jan 3 2023
@Tgr Not sure if cleanupUsersWithNoId.php will actually do anything, but I figured I'd document everything.
Jan 2 2023
I am not on 1.39 yet, and I would NOT call this already documented. update.php should either invoke cleanupUsersWithNoId.php itself or refuse to run; failing to do that makes for a destructive upgrade, which multiple users have encountered.
I spent the better part of the day rolling back those changes. After restoring both the original database and directory structure, I updated the git checkout to REL1_35, updated composer, ran cleanupUsersWithNoId.php, and then followed that with update.php.
Jan 1 2023
You are correct, I was referring to the meta repos. Normally you just set the REL branch correctly and check out the entire repo, then enable extensions on an as-needed basis. That avoids having to do 10+ separate checkouts.
REL1_39 was only branched on core, not on skins or extensions, and there hasn't been a REL branch for those since REL1_35. When attempting to update to the most current LTS it is not possible to match the core checkout with the extension/skin checkouts. This will cause compatibility issues at some point.
Oct 7 2022
I tried moving to k8s when it was first introduced. It went about as well as a cat in a room full of rocking chairs.
Nov 7 2020
Either way, commons does need a reparse of a LARGE number of pages. I took a look at anything with a date before 20151106000000 and it returned 1,652,571 pages; limiting it to anything older than just one year returned 10,722,335 pages. With the improvements to the parser and the introduction of a number of tracking categories, we should really be reparsing pages at least once a year, if not every 6 months at a minimum.
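Roughly the kind of replica query behind those counts; assuming here (my assumption, not stated above) that the timestamp checked was page_touched, and using the usual Toolforge replica conventions:
```
# Sketch: count Commons pages whose page_touched predates a cutoff.
# Assumptions: the column checked was page_touched, and the replica is
# reachable under the usual Toolforge naming with ~/replica.my.cnf creds.
import os
import pymysql

conn = pymysql.connect(
    host="commonswiki.analytics.db.svc.wikimedia.cloud",
    database="commonswiki_p",
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT COUNT(*) FROM page WHERE page_touched < %s",
        ("20151106000000",),
    )
    print(cur.fetchone()[0])
conn.close()
```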
Oct 17 2020
I've been following this for a while. How does this interact with SUL? Is the login action only tracked on the home wiki, or on all wikis you visit?
Oct 18 2019
Also, I think you have to do it as a tool, not as yourself.
Feb 16 2019
Feb 9 2019
Jan 8 2019
Nov 3 2018
Looks like this is still happening.
Apr 9 2018
Not sure why T117801 was merged. It was about exposing the registered email address for an account, not user-to-user emails (useful for cases where the same email address is used to register multiple accounts).
Apr 3 2018
Dec 27 2017
Dec 24 2017
On most wikis the use of invisible non-breaking spaces is discouraged. I think the real issue here is the spike in non-breaking spaces being added in the first place. All wikEd is doing is highlighting the root issue; wikEd itself is not the problem.
Dec 18 2017
Dec 17 2017
There is an issue with SSLv3 in older versions of Python, which was fixed in .8 or .9, which is why I was investigating k8s.
Oct 16 2017
Oct 14 2017
Oct 3 2017
The other thing you can do when working with external links is to use the linksearch feature (be sure to check for both the http and https versions of the URL). I've been using it for years, and it should be a lot more reliable than using the search index.
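Linksearch is also exposed through the API as list=exturlusage, so a both-protocols check could look roughly like this (example.com is a placeholder domain, and continuation handling is omitted for brevity):
```
# Rough sketch: query the linksearch backend (list=exturlusage) for both
# protocols. example.com is a placeholder; continuation is not handled.
import requests

API = "https://en.wikipedia.org/w/api.php"

def linksearch(domain):
    pages = set()
    for protocol in ("http", "https"):
        params = {
            "action": "query",
            "list": "exturlusage",
            "euquery": "*." + domain,  # same wildcard syntax as Special:LinkSearch
            "euprotocol": protocol,
            "eulimit": "max",
            "format": "json",
        }
        data = requests.get(API, params=params).json()
        for hit in data.get("query", {}).get("exturlusage", []):
            pages.add(hit["title"])
    return pages

print(len(linksearch("example.com")))
```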
Sep 22 2017
I've done some large-scale processing, but needing more than 4GB of RAM is a sign of bad programming. Fix that first; don't try to come back later to reduce usage, because by then it's too late.
Sep 8 2017
Would love to see this feature implemented. With large watchlists, people disable the bolding/unbolding for readability purposes. Being able to set a timestamp and only get changes newer than that would also help when edits are grouped by page. As of now it means manually noting the load timestamp of your watchlist, refreshing, and then finding the previous time and reviewing changes since then.
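A rough sketch of the current API-side workaround, using list=watchlist with a saved timestamp (the timestamp and fields are illustrative, and real use needs an authenticated session):
```
# Rough sketch: fetch only watchlist changes newer than a saved timestamp
# via list=watchlist. A real run needs a logged-in session; the timestamp
# below is illustrative.
import requests

API = "https://en.wikipedia.org/w/api.php"
last_checked = "2017-09-01T00:00:00Z"  # saved at the end of the previous check

params = {
    "action": "query",
    "list": "watchlist",
    "wldir": "newer",         # enumerate forward from wlstart
    "wlstart": last_checked,  # only changes at or after this time
    "wllimit": "max",
    "wlprop": "title|timestamp|user|comment",
    "format": "json",
}
session = requests.Session()  # would carry login cookies in real use
data = session.get(API, params=params).json()
for change in data.get("query", {}).get("watchlist", []):
    print(change["timestamp"], change["title"])
```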
Jun 10 2017
Apr 14 2017
PipVideoJs
Why not just keep a continuous thread going with a sleep(60)? That way you minimize the number of jobs and don't lose functionality.
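A minimal sketch of that pattern (do_work() is a placeholder for the actual task):
```
# One long-running job that loops with sleep(60) instead of submitting a
# new grid job every minute. do_work() stands in for the real task.
import time

def do_work():
    pass  # placeholder for the actual periodic work

while True:
    do_work()
    time.sleep(60)  # wait a minute before the next pass
```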
and PhpTagsDebugger
and Notifications
Looks like DetectLanguage is also throwing the same issue, and I suspect there will be a half dozen or so others that need to be fixed too.
Apr 13 2017
Mar 20 2017
@BurstPower here is a list of all pages in enwiki's mainspace minus redirects: http://tools.wmflabs.org/betacommand-dev/reports/en_articles.txt
Mar 9 2017
Sorry if that wasn't clear in the initial post. The reason I was thinking of a combined total was to include the JobQueue as part of the value, so that it represents the current server lag. Returning the max of either would also work.
I haven't looked at the actual code, but the formula looks correct. My thought would be to call that formula in getMaxLag() and, if it has a positive value greater than 1, add it to the maxlag total, so that the final value would be ServerLag + JobQueueLag = MaxLag.
Most bots use a maxlag check of 5-10 seconds, so we could pick a baseline for each wiki based on its average job queue size and, for every X items over that value, add a few seconds to the lag parameter. We would want to scale the check based on wiki size. Take enwiki (these are made-up numbers, not fact-checked): say it has an average job queue of 1 million items; for every 100,000 over that, add 1 second to the maxlag value, so that when the job queue reaches 150% of normal, bots back off until it is down to normal levels.
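To make that concrete, a sketch of the scaling using the illustrative enwiki numbers above (a baseline of 1 million jobs and 1 extra second per 100,000 over it):
```
# Sketch of the proposed scaling: reported maxlag = replication lag plus a
# penalty based on how far the job queue is above its per-wiki baseline.
# The numbers mirror the made-up enwiki figures above.
JOBQUEUE_BASELINE = 1_000_000   # assumed "normal" queue size for the wiki
STEP_SIZE = 100_000             # queue items per extra second of lag
SECONDS_PER_STEP = 1

def job_queue_lag(queue_size):
    excess = max(0, queue_size - JOBQUEUE_BASELINE)
    return (excess // STEP_SIZE) * SECONDS_PER_STEP

def effective_max_lag(server_lag_seconds, queue_size):
    # ServerLag + JobQueueLag = MaxLag, as described above
    return server_lag_seconds + job_queue_lag(queue_size)

# At 150% of the normal queue (1.5M items), a bot running with maxlag=5
# sees an extra 5 seconds of reported lag and starts backing off.
print(effective_max_lag(1, 1_500_000))  # -> 6
```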
Mar 8 2017
Mar 7 2017
Mar 5 2017
Suggestion: when the job queue gets too high, set the maxlag value higher; most bots use that as a throttle.
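For reference, the bot-side throttle typically works like this: pass maxlag with every API request and back off when the server reports it is exceeded (the 5-second values here are illustrative):
```
# Bot-side maxlag throttle: send maxlag with every API request and back
# off when the server answers with a maxlag error. Values are illustrative.
import time
import requests

API = "https://en.wikipedia.org/w/api.php"

def api_get(params, maxlag=5):
    params = dict(params, maxlag=maxlag, format="json")
    while True:
        data = requests.get(API, params=params).json()
        if data.get("error", {}).get("code") == "maxlag":
            time.sleep(5)  # server is lagged; wait instead of piling on
            continue
        return data

print(api_get({"action": "query", "meta": "siteinfo"})["query"]["general"]["sitename"])
```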
Feb 25 2017
Dec 15 2016
I would say this is functioning as intended: all /wiki/ URLs are in the short URL form, and short URLs do not handle any URL parameters. index.php is the correct access point for URLs that need parameters passed (e.g. /w/index.php?title=Example&action=history rather than /wiki/Example).
Dec 6 2016
I know jsub already has an option to log to a specific directory (I've been using this for a while to avoid SGE logs flooding my $HOME). Perhaps set a global option using a username variable?
Oct 21 2016
It was every six months and happened via a login/script; going to 4 times a year would be kind of obnoxious for most. I have things configured so they are mostly hands-off for a reason.
Oct 17 2016
Quite; it would help with tracking down the remaining non-Wikidata entries and addressing/fixing those.
Aug 19 2016
Aug 5 2016
I saw a note that commons was working, which is why I asked.
Which wikis are currently affected?
Aug 2 2016
@jcrespo Ok, looks like we have been on two different pages. I thought you were trying to make this a labs-only "feature", whereas you just want a proof of concept/viability check. I know you are doing quite a bit to improve labs' quality of service, and this wasn't a dig at that; rather it was planning for the issues that labs has historically had, and trying to limit the long-term impact of those issues. If you want to get it implemented on labs, go ahead. I'll reach out to my Wayback Machine contacts and see what the best process moving forward would be for mass archivals.
Production issues may occur, but getting those fixed is fairly easy: either purging or null-editing the page in question is enough. Labs drift, on the other hand, requires a full re-import (which would lose all of the custom timestamps added over time).
@jcrespo Nothing against labs, but T138967 is a constant and persistent issue which would make the data useless upon the next re-import. Keeping this in MediaWiki allows more consistent data and fewer chances of corruption. Yes, page-blanking vandalism would cause minor variances, but that isn't a significant issue in the big picture. Yes, having a complete historical reference might be nice; it's just not practical at this time. The categorylinks table already has the same kind of timestamp that I'm looking to add to the el table, and it enables tools like http://tools.wmflabs.org/betacommand-dev/reports/CATCSD.html for patrolling admins.
One of the most useful use cases is having a bot that feeds URLs to archiving services such as the Wayback Machine or WebCite in a manner that is actually feasible. Right now the only way to do a task like that would be to keep an offline duplicate of the el table and compare it during each run (which eats a LOT of disk space, CPU, and time). Having an incremental reference point would mean that only links newer than the last run would need to be processed, and that when retrieving copies of the archived version of a link we would have a time reference to work with, without needing to parse the history of the article.
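A sketch of what that incremental pass could look like with a hypothetical el_timestamp column (modelled on cl_timestamp in categorylinks; nothing like it exists in the el table today):
```
# Sketch of the incremental archiving pass described above. el_timestamp is
# a HYPOTHETICAL column (modelled on categorylinks.cl_timestamp); it does
# not exist in the externallinks table today.
import os
import pymysql

LAST_RUN = "20160801000000"  # timestamp saved at the end of the previous run

conn = pymysql.connect(
    host="enwiki.analytics.db.svc.wikimedia.cloud",
    database="enwiki_p",
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT el_from, el_to FROM externallinks WHERE el_timestamp > %s",
        (LAST_RUN,),
    )
    for page_id, url in cur.fetchall():
        pass  # feed url to the archiving service here
conn.close()
```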
Aug 1 2016
Jun 29 2016
Jun 27 2016
Jun 26 2016
Won't happen; the number of cases where sensitive information is contained in deleted revisions prevents mass automated access.
Jun 25 2016
Recommend closing as invalid. The information the user wants is being pulled from Wikidata via a module; see https://en.wikipedia.org/wiki/Template:Authority_control
May 31 2016
Not sure how I got listed as a maintainer, I am not involved with this project.
Apr 17 2016
Apr 13 2016
I would say 2 messages total, one for IPs and one for users. If further customization is needed, the messages can be updated on-wiki to tweak the output.
Apr 6 2016
Jan 20 2016
Jan 4 2016
Dec 30 2015
Yeah, sorry for the delay in getting back to this. I have a dump from a few months after this, but it doesn't look like the revision is in it.
Dec 10 2015
Dec 9 2015
Dec 5 2015
I might have a few tricks for recovering this revision; give me a day or two and I'll see what I can do.
Dec 3 2015
I think that 1.27 should ship with 5.3 or 5.4 and then move to 5.6 after the LTS. Given how difficult it is to get third-party hosting providers to update PHP, we shouldn't use our LTS to move to a cutting-edge version. Let's use 1.28 for that.
Dec 1 2015
```
tools.betacommand-dev@tools-bastion-01:~$ mysql --defaults-file="${HOME}"/replica.my.cnf -h labsdb1001.eqiad.wmnet plwikisource_p -e "SELECT count(page_id) FROM page WHERE page_namespace=100 AND page_is_redirect=0"
+----------------+
| count(page_id) |
+----------------+
|         241003 |
+----------------+
tools.betacommand-dev@tools-bastion-01:~$ mysql --defaults-file="${HOME}"/replica.my.cnf -h labsdb1002.eqiad.wmnet plwikisource_p -e "SELECT count(page_id) FROM page WHERE page_namespace=100 AND page_is_redirect=0"
+----------------+
| count(page_id) |
+----------------+
|         241003 |
+----------------+
tools.betacommand-dev@tools-bastion-01:~$ mysql --defaults-file="${HOME}"/replica.my.cnf -h labsdb1003.eqiad.wmnet plwikisource_p -e "SELECT count(page_id) FROM page WHERE page_namespace=100 AND page_is_redirect=0"
+----------------+
| count(page_id) |
+----------------+
|         240973 |
+----------------+
```
Nov 29 2015
Nov 23 2015
I know this is more work, but I think the community would prefer that you do a fresh import of the data. Having an unknown volume of data corruption in a table is not a good idea. Starting with a fresh copy would ensure the reliability of the data in question.
Nov 22 2015
Yeah, http://tools.wmflabs.org/betacommand-dev/cgi-bin/replag is showing 28047 behind
Nov 20 2015
I must agree with the others: the "improvements" that were made are not good. Depending on activity I switch between the 1, 2, and 6 hour options, or even 12-24, on a regular basis. What was 1 click is now 3 each time. Why are you making the UI more complex to use?