GitHub: https://github.com/he7d3r
User Details
- User Since
- Oct 6 2014, 11:25 PM (600 w, 3 d)
- Availability
- Available
- IRC Nick
- he7d3r
- LDAP User
- He7d3r
- MediaWiki User
- He7d3r [ Global Accounts ]
Apr 22 2025
I followed these steps a few moments ago (using Brave browser on Ubuntu):
- Go to https://pt.wikipedia.org/wiki/Categoria:Teoria_dos_n%C3%B3s?uselang=en&skin=vector
- Click on Add Links on the Languages section of the sidebar
- Copy/paste "enwiki" to the "Language" field
- Copy/paste "Category:Knot theory" to the Page field
- Press ENTER and wait for the "Link with page" confirmation screen
- Press ENTER again to save
Jun 27 2024
Nope, so I went ahead and just disabled the rules.
Mar 24 2024
Thank you! This worked:
```
const p = mw.util.addPortlet( 'foo', 'Foo portlet', '#p-interaction' );
mw.util.addPortletLink( 'foo', '#', 'New portlet link' );
mw.util.addPortletLink( 'foo', '#', 'New portlet link 2' );
if ( p ) {
	p.parentNode.appendChild( p );
}
```
Mar 13 2024
That is great! Thank you! 😃
Mar 4 2024
Mar 2 2024
Aug 10 2023
Jul 2 2023
I believe the initial plan was to first finish the task T113682: Finish conversion from LQT to Flow on pt.wikibooks and then that would allow archiving the old LiquidThreads topics by leaving them read-only (current task).
Feb 23 2023
Aug 20 2022
Apr 10 2022
Having said all that, the ideal solution would be for abuse filters to provide more contextual feedback (T174554) and to allow defining some rules as safe to check in real time. This would allow all kinds of editors (not only Content Translation) to provide early, contextual guidance to users.
Feb 6 2022
What is blocking this?
I'm unable to reproduce this at the moment (tried on Firefox and Brave browsers)
Aug 29 2021
Jul 18 2021
Jul 16 2021
Apr 28 2021
Apr 19 2021
I'm also seeing translated content disappear when it is inside templates. E.g.: the paragraph inside "Citation needed span" in
https://en.wikipedia.org/wiki/Quadratic_voting?oldid=1012353281
Even if I make some edit after that paragraph, and wait for it to be saved, the English content is restored and the translation is lost.
Mar 27 2021
I'm ok with closing it as declined.
Mar 25 2021
Jan 25 2021
Jan 5 2021
I believe fixing T134678: ContentTranslation should generate an AbuseFilter log whenever it shows a warning for the users would help debugging this.
Dec 23 2020
This happened in most/all of my recent translations using the tool, and it also happens for the translation which is already in progress, at
https://pt.wikipedia.org/wiki/Special:ContentTranslation?page=Group+extension&from=en&to=pt&targettitle=Extens%C3%A3o+de+grupo
Dec 20 2020
I had to use the same workaround again for https://pt.wikipedia.org/w/index.php?diff=60053113
Dec 15 2020
For some reason I get 404 Not Found from https://cxserver.wikimedia.org/v2/page/en/Word2vec while https://cxserver.wikimedia.org/v1/page/en/Word2vec seems to work fine
Almost three years later, I'm still unable to start a new translation of https://en.wikipedia.org/wiki/Word2vec
Now it says:
Loading the saved translation...
followed by
The "Word2vec" page could not be found in English Wikipedia
which is incorrect, given the link above is working.
Dec 10 2020
Dec 9 2020
Dec 6 2020
Oct 21 2020
Oct 20 2020
Oct 13 2020
@Danilo made these tools for that:
https://ptwikis.toolforge.org/FiltroIP
https://ptwikis.toolforge.org/Filtros:180
Oct 12 2020
Oct 10 2020
Oct 5 2020
Indeed, now the main page says it is "Enabled, throttled":
https://pt.wikipedia.org/wiki/Special:AbuseFilter?offset=179&limit=1&uselang=en
I wonder if this deserves a higher priority now, given that https://gerrit.wikimedia.org/r/609773 implemented some kind of metadata (T254074: Implement the reverted edit tag). This feature should help with part of the concerns raised at
Aug 27 2020
Jul 22 2020
Jul 16 2020
Jul 7 2020
Jun 30 2020
Jun 27 2020
Jun 23 2020
@Danilo generated the following table comparing articlequality scores for the latest version of all articles to the scores which would be produced by the Python script which is/was used to make bot assessments:
```
MariaDB [s51206__ptwikis]> SELECT pe_qualidade,
    ->   SUM(pe_qores = 0) ORES_0, SUM(pe_qores = 1) ORES_1,
    ->   SUM(pe_qores = 2) ORES_2, SUM(pe_qores = 3) ORES_3,
    ->   SUM(pe_qores = 4) ORES_4, SUM(pe_qores = 5) ORES_5,
    ->   SUM(pe_qores = 6) ORES_6
    -> FROM page_extra GROUP BY pe_qualidade ORDER BY pe_qualidade;
+--------------+--------+--------+--------+--------+--------+--------+--------+
| pe_qualidade | ORES_0 | ORES_1 | ORES_2 | ORES_3 | ORES_4 | ORES_5 | ORES_6 |
+--------------+--------+--------+--------+--------+--------+--------+--------+
|            0 |  68218 |      0 |      0 |      0 |      0 |      0 |      0 |
|            1 |      3 | 618819 | 204187 |  27523 |   1847 |   3562 |   3261 |
|            2 |      0 |   5565 |  69323 |  24777 |   1390 |   7496 |    350 |
|            3 |      0 |     71 |    472 |  14361 |   1861 |   7412 |    572 |
|            4 |      0 |      5 |     10 |   2948 |   2361 |   2978 |   1208 |
|            5 |      0 |      0 |     16 |     59 |    136 |   1056 |    161 |
|            6 |      0 |      0 |      0 |     35 |    190 |    188 |    782 |
+--------------+--------+--------+--------+--------+--------+--------+--------+
7 rows in set (3.70 sec)
```
(the label is set to zero if the quality is unknown, possibly due to the page being deleted)
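As a rough sanity check (this computation is mine, not from the thread), the overall agreement between the existing labels and the ORES predictions can be read off the diagonal of the table above, after dropping row/column 0 (unknown quality):

```python
# Confusion matrix from the query above: rows are pe_qualidade (0-6),
# columns are the ORES articlequality prediction (0-6).
matrix = [
    [68218, 0, 0, 0, 0, 0, 0],
    [3, 618819, 204187, 27523, 1847, 3562, 3261],
    [0, 5565, 69323, 24777, 1390, 7496, 350],
    [0, 71, 472, 14361, 1861, 7412, 572],
    [0, 5, 10, 2948, 2361, 2978, 1208],
    [0, 0, 16, 59, 136, 1056, 161],
    [0, 0, 0, 35, 190, 188, 782],
]

# Drop row/column 0 (unknown quality, e.g. deleted pages) before comparing.
known = [row[1:] for row in matrix[1:]]
total = sum(sum(row) for row in known)
agreement = sum(known[i][i] for i in range(len(known))) / total
print(f"agreement on known-quality pages: {agreement:.3f}")
```

This gives the exact-match rate only; it ignores near misses (e.g. a level-1 article predicted as level 2), which the off-diagonal cells show are common.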
Jun 19 2020
Jun 18 2020
Jun 14 2020
I confirmed this by replacing my global.js by console.log( 'Started global.js.' ); and then loading
https://pt.wikipedia.org/wiki/Special:BlankPage?debug=true
There should be a log in the console, but it was not there.
Jun 5 2020
May 23 2020
I've submitted https://github.com/wikimedia/articlequality/pull/132
May 22 2020
Updated info (as of commit c3a66b0 plus the specific changes which define each of the tests):
```
accuracy (micro=0.8, macro=0.861):
    1      2      3      4      5      6
-----  -----  -----  -----  -----  -----
0.781  0.827  0.877  0.899  0.875  0.908
```
```
$ cat datasets/ptwiki.labelings.20200301.remove_bots.json | json2tsv wp10 | sort | uniq -c
 145657 1
  32807 2
   6177 3
   2346 4
   1646 5
   1542 6
$ cat datasets/ptwiki.balanced_labelings.9k_2020.remove_bots.json | json2tsv wp10 | sort | uniq -c
   1500 1
   1500 2
   1500 3
   1500 4
   1500 5
   1328 6
```
```
accuracy (micro=0.81, macro=0.875):
    1      2      3      4      5      6
-----  -----  -----  -----  -----  -----
0.799  0.806  0.867  0.915  0.928  0.933
```
```
$ cat datasets/ptwiki.labelings.20200301.since_2014.json | json2tsv wp10 | sort | uniq -c
   7537 1
   3346 2
   1276 3
    690 4
    653 5
    684 6
$ cat datasets/ptwiki.balanced_labelings.9k_2020.since_2014.json | json2tsv wp10 | sort | uniq -c
   1500 1
   1500 2
   1276 3
    690 4
    653 5
    684 6
```
May 19 2020
@Halfak: Oops... I missed the -v flag when I used grep to remove the bot assessments. So, instead of considering only human assessments, I extracted only the bot assessments! Once I add that flag, the number of assessments by humans seems more reasonable:
```
$ cat datasets/ptwiki.labelings.20200301.user.json | grep -v -P '"user": "[^"]*([Bb][Oo][Tt]|[Rr][Oo][Bb][ÔôOo])[^"]*"' | json2tsv wp10 | sort | uniq -c
  28403 1
  13343 2
   5329 3
   2209 4
   1458 5
   1281 6
```
In this case, the likely explanation for such a high accuracy is that the bots' assessments are very predictable (the rules are hardcoded in their DNA, i.e. their code ;-).
For future reference: there is now a translation at https://pt.wikipedia.org/wiki/User:EpochFail/ArticleQuality
May 18 2020
@GoEThe: in case you have any suggestions on better images for this purpose, we can try changing them. @Halfak suggested the https://commons.wikimedia.org/wiki/Category:OOUI_icons as a good source of icons we could use.
@GoEThe: I see you've installed the version of the script I mentioned at T246667#6079484. Did you have the chance to test it on Special:Newpages? Is it good enough for us to publicize it to other users?
PS: I didn't change the thresholds in the Makefile, so the samples were not as balanced as might be wanted:
(Note: by mistake I forgot the -v flag in the grep above, so the results for the first case are inverted; that is, they contain bot_only instead of no_bots.)
```
$ cat datasets/ptwiki.balanced_labelings.9k_2020.no_bots.json | json2tsv wp10 | sort | uniq -c
   1500 1
   1500 2
    759 3
     20 4
     95 5
    203 6
$ cat datasets/ptwiki.balanced_labelings.9k_2020.since_2014.json | json2tsv wp10 | sort | uniq -c
   1500 1
   1500 2
   1247 3
    674 4
    630 5
    654 6
```
May 15 2020
Halfak fixed this in https://github.com/wikimedia/wikilabels/pull/263
May 12 2020
May 11 2020
I've tried to do this:
```
diff --git a/Dockerfile b/Dockerfile
index 5fc2d0d..19f39a5 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -5,7 +5,8 @@ RUN apt-get update && apt-get install -y \
     g++ \
     python3-dev \
     libmemcached-dev \
-    libz-dev
+    libz-dev \
+    memcached
```
May 10 2020
After patching¹ the extractor to also collect user names, I found that these are the top 10 users who added/modified the most assessments:
```
FMTbot          91366
Rei-bot         25155
BotStats        15830
Fabiano Tatsch  14829
Leandro Drudo    4660
GoEThe           3172
Burmeister       3128
Rei-artur        2965
FilRBot          2444
VítoR Valente    1895
```
Then I produced² the following graphs showing the number of labels added/modified by bots³ per year, for each of the six quality levels. There are many quality 1 and 2 assessments made by bots.
May 9 2020
Here are some graphs showing the evolution of the assessments extracted from ptwiki:
May 8 2020
It occurred to me that some of these expressions are also used by Salebot¹, with the difference that in the bot config² users assign a score to each word/regex indicating how much it contributes towards classifying an edit as needing to be reverted. This allows it to "ignore" words which are common in good edits, unless there are too many of them.
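That scoring idea can be sketched roughly as follows (the words, scores, and threshold here are made up for illustration; this is not Salebot's actual config format):

```python
import re

# Hypothetical scored expressions: negative scores suggest vandalism,
# positive scores mark words that are common in good-faith edits.
SCORED_PATTERNS = [
    (re.compile(r'\bidiota\b', re.IGNORECASE), -5),
    (re.compile(r'\bkkk+\b', re.IGNORECASE), -3),
    (re.compile(r'\breferências\b', re.IGNORECASE), +2),
]
REVERT_THRESHOLD = -5  # hypothetical cutoff

def score_edit(added_text: str) -> int:
    """Sum the score of every pattern match in the text added by an edit."""
    total = 0
    for pattern, score in SCORED_PATTERNS:
        total += score * len(pattern.findall(added_text))
    return total

def should_revert(added_text: str) -> bool:
    """Flag the edit only when the accumulated score crosses the threshold."""
    return score_edit(added_text) <= REVERT_THRESHOLD
```

With this weighting a single mildly suspicious word does not trigger a revert, but several matches accumulate, which gives the "ignore unless there are too many of them" behaviour described above.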
May 7 2020
See https://github.com/wikimedia/articlequality/pull/127 for a possible solution.
May 6 2020
It is not uncommon for a good-faith edit to add a new expression (or a badly written regex) to such lists and thereby break, to some extent, the tools which use them (e.g. by increasing their false positives).
Here are some examples of existing lists, of varying quality and formats, used by other tools:
- https://pt.wikipedia.org/wiki/WP:Software/Anti-vandal_tool/badwords
- https://pt.wikipedia.org/wiki/User:Alchimista/Aleph_Bot/Express%C3%B5es
- https://pt.wikipedia.org/wiki/User:Salebot/Config
- https://pt.wikipedia.org/wiki/WP:Projetos/AntiVandalismo/Express%C3%B5es_problem%C3%A1ticas
- https://pt.wikipedia.org/wiki/User:O_Politizador/curiosidades_apagadas/Lista_de_express%C3%B5es_idiom%C3%A1ticas_com_palavras_de_baixo_cal%C3%A3o
I've updated the patch.
Is this still wanted nowadays?
May 5 2020
While the dumps are processed, we could store the <id> of the talk pages instead of their <title>s. Then, an API query such as
https://pt.wikipedia.org/w/api.php?action=query&format=json&prop=info&pageids=18363&formatversion=2&inprop=subjectid
will return the <id> of the associated subject page (the one whose text we are interested in). This should work even when pages are moved, since page moves do not change the pageid (though this is not guaranteed if the page is deleted and restored).
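For illustration, the lookup above could be scripted like this (a sketch using only the standard library; pageid 18363 is the example from the URL above):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = 'https://pt.wikipedia.org/w/api.php'

def subject_id_query_url(talk_page_id: int) -> str:
    """Build the API URL that resolves a talk pageid to its subject pageid."""
    params = urlencode({
        'action': 'query',
        'format': 'json',
        'formatversion': 2,
        'prop': 'info',
        'inprop': 'subjectid',
        'pageids': talk_page_id,
    })
    return f'{API}?{params}'

def extract_subject_id(response: dict) -> int:
    """Pull 'subjectid' out of a formatversion=2 query response."""
    return response['query']['pages'][0]['subjectid']

def subject_page_id(talk_page_id: int) -> int:
    """Resolve a talk pageid to the pageid of its subject page."""
    with urlopen(subject_id_query_url(talk_page_id)) as resp:
        return extract_subject_id(json.load(resp))
```

Note that `subjectid` is only present in the response when the queried page actually is a talk page.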
May 1 2020
Could the number of labels per article have a negative impact on the quality of the model?
These are the frequencies of the number of labels per page in the full set and in the 9k sample:
```
$ cat ptwiki.labelings.20200301.json | json2tsv page_title | sort | uniq -c | cut -c-8 | sort | uniq -c
 181477 1
   3042 2
    517 3
    100 4
     19 5
      2 6
      2 7
```
Apr 28 2020
The following pull request is related to improving the articlequality model: https://github.com/wikimedia/articlequality/pull/122
See https://github.com/wikimedia/articlequality/pull/122 for another possible explanation for the problem:
I didn't re-extract the data or retrain the model after the change to verify its impact on the metrics, but I believe it might help by improving the dataset quality.
@Halfak: do you have a quick way to get the number of assessments by each user in the ptwiki.balanced_labelings.*_2020.json dataset which was used for the articlequality model? Are we getting labels from a diverse set of users, or mostly from just a few users?
Apr 27 2020
It could be. For example, @Darwinius noticed that images loaded from Wikidata are not counted:
https://www.mediawiki.org/wiki/ORES/Issues/Article_quality?diff=3804470
I wouldn't be surprised if there was some obscure problem with feature extraction.
@GoEThe: Correct me if I'm mistaken, but I believe a reasonable number of new articles containing vandalism or spam would include expressions such as the words_to_watch mentioned by Halfak. For reference, the expressions are listed at
https://github.com/wikimedia/revscoring/blob/76c737f2998bbba5b5dd942823f43383f1a4b47e/revscoring/languages/portuguese.py#L153-L189
That is odd. Does this tuning report reflect only the changes to the ptwiki features, or does it also include the other articles added to the dataset, as mentioned at T246667#6067366?
Apr 23 2020
@Halfak: what should be considered as a true positive in these multi-class classification problems? (when filling the template misclassification report at mw:ORES/Issues/Article quality)
Would a "featured article" be a "positive" case for the articlequality model, and any other level is a "negative"? Or something else?
What about the draftquality model? (In this case there does not even seem to be any implicit order between the classes, e.g. nothing like OK < spam < unsuitable.)

