
He7d3r (Helder)
Research

User Details

User Since
Oct 6 2014, 11:25 PM (344 w, 9 h)
Availability
Available
IRC Nick
he7d3r
LDAP User
He7d3r
MediaWiki User
He7d3r

Recent Activity

Wed, Apr 28

He7d3r updated the task description for T281346: Cannot start or resume a translation for articles with spaces or non-ascii characters in the title.
Wed, Apr 28, 11:53 AM · MW-1.37-notes (1.37.0-wmf.4; 2021-05-04), User-notice, Patch-For-Review, Language-Team (Language-2021-April-June), ContentTranslation
He7d3r created T281354: Unable to open any translation in progress.
Wed, Apr 28, 11:37 AM · ContentTranslation

Mon, Apr 19

He7d3r added a comment to T250382: Content is not saved.

I'm also seeing translated content disappear when it is inside templates. E.g.: the paragraph inside "Citation needed span" in
https://en.wikipedia.org/wiki/Quadratic_voting?oldid=1012353281
Even if I make some edit after that paragraph, and wait for it to be saved, the English content is restored and the translation is lost.

Mon, Apr 19, 5:10 PM · ContentTranslation

Apr 1 2021

GrounderUK awarded T63547: Make [[Special:WhatLinksHere]] and [[Special:RecentChangesLinked]] work with links which use [[Special:MyLanguage]] a Doubloon token.
Apr 1 2021, 10:33 PM · I18n, MediaWiki-General

Mar 27 2021

He7d3r added a comment to T93608: Switch ProofreadPage to use indicator tags.

I'm ok with closing it as declined.

Mar 27 2021, 12:41 PM · Patch-For-Review, ProofreadPage
He7d3r added a comment to T269632: Abuse Filter Graphs on ptwikis.toolforge.org loads jquery from external website.

It looks like there are quite a few other Content-Security-Policy violations for this tool: https://csp-report.toolforge.org/search?ft=ptwikis

Mar 27 2021, 12:14 PM · Privacy, Tools

Mar 25 2021

He7d3r created T278417: Allow editing source article on Special:ContentTranslation.
Mar 25 2021, 10:10 AM · ContentTranslation

Feb 7 2021

He7d3r awarded T30563: WikiEditor - Streamlining and customizing the groups or modules loaded by default should require less manual scripting a Heartbreak token.
Feb 7 2021, 10:07 PM · WikiEditor

Feb 3 2021

He7d3r awarded T209874: CX2: Unable to type diacritics in the translation a Heartbreak token.
Feb 3 2021, 10:42 AM · VisualEditor, ContentTranslation

Jan 25 2021

He7d3r created T272857: Schema code samples popup appears under the JSON table.
Jan 25 2021, 2:46 PM · Analytics-Radar, CSS, Analytics-EventLogging

Jan 24 2021

He7d3r awarded T155541: [Epic] Article importance prediction model a Love token.
Jan 24 2021, 10:34 PM · Research, Machine-Learning-Team, artificial-intelligence

Jan 14 2021

Krinkle awarded T29531: Implement link anchors to line numbers on syntax-highlighted pages (e.g. .css, .js) an Orange Medal token.
Jan 14 2021, 2:21 AM · MW-1.36-notes (1.36.0-wmf.26; 2021-01-12), User-notice, SyntaxHighlight

Jan 11 2021

SD0001 awarded T29531: Implement link anchors to line numbers on syntax-highlighted pages (e.g. .css, .js) a Love token.
Jan 11 2021, 4:55 PM · MW-1.36-notes (1.36.0-wmf.26; 2021-01-12), User-notice, SyntaxHighlight

Jan 5 2021

He7d3r added a comment to T207842: CX2: Abuse filter unexpectedly triggered.

I believe fixing T134678: ContentTranslation should generate an AbuseFilter log whenever it shows a warning for the users would help with debugging this.

Jan 5 2021, 9:07 PM · ContentTranslation

Dec 23 2020

He7d3r created T270781: ContentTranslation should not close/collapse the issue box after the user clicks on "Mark as resolved".
Dec 23 2020, 5:33 PM · ContentTranslation
He7d3r updated the task description for T270780: ContentTranslation: Resolved problems are not resolved anymore when the translation is reloaded.
Dec 23 2020, 5:22 PM · ContentTranslation
He7d3r added a comment to T270780: ContentTranslation: Resolved problems are not resolved anymore when the translation is reloaded.

This happened in most/all of my recent translations using the tool, and it also happens for the translation which is already in progress, at
https://pt.wikipedia.org/wiki/Special:ContentTranslation?page=Group+extension&from=en&to=pt&targettitle=Extens%C3%A3o+de+grupo

Dec 23 2020, 5:21 PM · ContentTranslation
He7d3r created T270780: ContentTranslation: Resolved problems are not resolved anymore when the translation is reloaded.
Dec 23 2020, 5:19 PM · ContentTranslation

Dec 20 2020

He7d3r added a comment to T269840: Unable to publish translation due to "Uncaught TypeError: parentDomElement is null".

I had to use the same workaround again for https://pt.wikipedia.org/w/index.php?diff=60053113

Dec 20 2020, 11:03 AM · ContentTranslation

Dec 15 2020

He7d3r added a comment to T183824: ContentTranslator tries to load an old published translation when starting a new translation of the same article.

For some reason I get a 404 Not Found error from https://cxserver.wikimedia.org/v2/page/en/Word2vec, while https://cxserver.wikimedia.org/v1/page/en/Word2vec seems to work fine.

Dec 15 2020, 10:11 AM · ContentTranslation
He7d3r added a comment to T183824: ContentTranslator tries to load an old published translation when starting a new translation of the same article.

Almost three years later, I'm still unable to start a new translation of https://en.wikipedia.org/wiki/Word2vec
Now it says:

Loading the saved translation...

followed by

The "Word2vec" page could not be found in English Wikipedia

which is incorrect, given that the link above works.

Dec 15 2020, 10:09 AM · ContentTranslation

Dec 10 2020

He7d3r created T269840: Unable to publish translation due to "Uncaught TypeError: parentDomElement is null".
Dec 10 2020, 11:48 AM · ContentTranslation

Dec 9 2020

He7d3r updated the task description for T269524: Critical error: Content translation failed to load due to internal error.
Dec 9 2020, 1:03 PM · ContentTranslation

Dec 6 2020

He7d3r created T269524: Critical error: Content translation failed to load due to internal error.
Dec 6 2020, 12:25 PM · ContentTranslation

Oct 21 2020

He7d3r claimed T176711: jQueryMsg should generate external links with 'external' CSS class.
Oct 21 2020, 9:12 AM · Growth-Team-Filtering, MW-1.36-notes (1.36.0-wmf.16; 2020-11-03), Growth-Team, MediaWiki-extensions-GuidedTour, good first task, JavaScript, I18n

Oct 20 2020

He7d3r added a comment to T264490: ContentTranslation adds duplicate 'Category:' prefix.

Still happening: https://pt.wikipedia.org/wiki/Simetria_de_reflex%C3%A3o?diff=59627837#footer

Oct 20 2020, 9:49 AM · Language-Team (Language-2021-April-June), ContentTranslation
He7d3r created T265985: "Uncaught TypeError: item is null" when clicking on category translation.
Oct 20 2020, 9:43 AM · JavaScript, ContentTranslation

Oct 13 2020

He7d3r added a comment to T264940: Track metrics on Portuguese Wikipedia relating to IP-editing turn off.

@Danilo made these tools for that:
https://ptwikis.toolforge.org/FiltroIP
https://ptwikis.toolforge.org/Filtros:180

Oct 13 2020, 12:11 PM · Product-Analytics (Kanban), Anti-Harassment

Oct 12 2020

He7d3r created T265270: ContentTranslation2 adds invalid categories (Localized ns+English ns+Category name).
Oct 12 2020, 10:38 AM · ContentTranslation

Oct 10 2020

He7d3r added a comment to T264940: Track metrics on Portuguese Wikipedia relating to IP-editing turn off.

Number of edits: "Edits per day" graph tool. I created that tool with a query similar to the one I used to get the active users. The graph shows that IPs used to make approximately 1700 edits per day. After registration became mandatory, edits by new users rose by approximately 700 per day (from ~700 to ~1400), which suggests that about 700 of the daily edits that used to be made by IPs are now made by newly registered users, and about 1000 are no longer being made.
(...)
Quality of edits with ORES: https://quarry.wmflabs.org/query/48860. I used the ORES damaging model to estimate the proportion of damaging edits. The data shows that it decreased from approx. 18% to approx. 7%. That suggests that the approx. 1000 edits per day that are no longer being made by IPs were worse than the approx. 700 that are now made by newly registered users.
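The edit-count reasoning in the quote reduces to two subtractions. A minimal sketch, assuming (as the quote does) that the entire rise in new-user edits comes from former IP editors; the variable names are illustrative and the figures are the approximate daily values quoted:

```python
# Approximate daily figures from the quoted comment (edits per day).
ip_edits_before = 1700        # edits by IPs before registration became mandatory
new_user_edits_before = 700   # edits by newly registered users, before
new_user_edits_after = 1400   # edits by newly registered users, after

# Assumption: every extra new-user edit replaces an edit formerly made by an IP.
migrated = new_user_edits_after - new_user_edits_before
# Whatever part of the former IP volume did not migrate is no longer made.
lost = ip_edits_before - migrated

print(f"migrated: ~{migrated}/day, no longer made: ~{lost}/day")
```

If some of the rise came from people who would have registered anyway, the "migrated" figure is an upper bound and the "lost" figure a lower bound.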

Oct 10 2020, 11:35 AM · Product-Analytics (Kanban), Anti-Harassment

Oct 5 2020

He7d3r added a comment to T264622: $wgAbuseFilterEmergencyDisableThreshold is ignored.

There you go, the stats were reset and the filter was throttled. Likely some caching issue. T264629 could probably help.

Indeed, now the main page says it is "Enabled, throttled":
https://pt.wikipedia.org/wiki/Special:AbuseFilter?offset=179&limit=1&uselang=en

Oct 5 2020, 5:54 PM · AbuseFilter
He7d3r updated subscribers of T13664: Add User Preference Option to hide reverted edits from Watchlist and Page History.

I wonder if this deserves a higher priority now, given that https://gerrit.wikimedia.org/r/609773 implemented some kind of metadata (T254074: Implement the reverted edit tag). This feature should address some of the concerns raised at

...
As for Huggle, not only is there a chronic problem of people willing to waste their volunteer time operating it instead of creating content, but it's a bad solution as well, since it fills up the history of the articles with revert after revert, polluting it and making it much less readable.
...

Oct 5 2020, 5:39 PM · MediaWiki-User-preferences
He7d3r awarded T254074: Implement the reverted edit tag a Like token.
Oct 5 2020, 5:29 PM · User-notice, MW-1.35-notes (1.35.0-wmf.41; 2020-07-14), Patch-For-Review, Product-Analytics, MediaWiki-Page-editing
He7d3r awarded T13664: Add User Preference Option to hide reverted edits from Watchlist and Page History a Like token.
Oct 5 2020, 5:24 PM · MediaWiki-User-preferences
He7d3r created T264622: $wgAbuseFilterEmergencyDisableThreshold is ignored.
Oct 5 2020, 2:24 PM · AbuseFilter
He7d3r awarded T261133: Ban IP edits on pt.wiki a Dislike token.
Oct 5 2020, 12:39 PM · Growth-Team, Anti-Harassment, Wikimedia-Site-requests

Oct 3 2020

Krinkle awarded T256732: [[Special:Notifications]] uses deprecated $.trimByteLength an Orange Medal token.
Oct 3 2020, 10:51 PM · MW-1.36-notes (1.36.0-wmf.12; 2020-10-05; NEVER DEPLOYED), Technical-Debt, Growth-Team, JavaScript, Notifications

Oct 1 2020

Quiddity awarded T63547: Make [[Special:WhatLinksHere]] and [[Special:RecentChangesLinked]] work with links which use [[Special:MyLanguage]] a Doubloon token.
Oct 1 2020, 4:58 PM · I18n, MediaWiki-General

Sep 19 2020

DannyS712 awarded T157218: Special:Log should display all logs a user has the rights to see (instead of only public logs) a Like token.
Sep 19 2020, 5:58 PM · Platform Engineering, MediaWiki-Logevents, AbuseFilter, SpamBlacklist, TitleBlacklist

Aug 27 2020

He7d3r added a watcher for Outreach-Programs-Projects: He7d3r.
Aug 27 2020, 11:32 PM

Aug 9 2020

Pppery awarded T63547: Make [[Special:WhatLinksHere]] and [[Special:RecentChangesLinked]] work with links which use [[Special:MyLanguage]] a Like token.
Aug 9 2020, 12:35 AM · I18n, MediaWiki-General

Jul 28 2020

Yair_rand awarded T63547: Make [[Special:WhatLinksHere]] and [[Special:RecentChangesLinked]] work with links which use [[Special:MyLanguage]] a Doubloon token.
Jul 28 2020, 9:32 PM · I18n, MediaWiki-General

Jul 22 2020

He7d3r updated the task description for T253938: Future proof addPortletLink and work towards a standard mw-portlet class for all menus across all skins.
Jul 22 2020, 2:30 PM · MW-1.36-notes (1.36.0-wmf.12; 2020-10-05; NEVER DEPLOYED), Patch-For-Review, Readers-Web-Backlog (Kanbanana-FY-2020-21), MediaWiki-Core-Skin-Architecture, Timeless, Vector

Jul 16 2020

He7d3r created T258149: Show source article quality at Special:ContentTranslation's "translations in progress", "suggestions" and "for later" lists.
Jul 16 2020, 10:52 AM · ORES, Machine-Learning-Team, ContentTranslation

Jul 15 2020

He7d3r awarded T254352: Each filter should have a talk page a Love token.
Jul 15 2020, 9:22 PM · AbuseFilter

Jul 7 2020

He7d3r merged task T61688: Dissappeared content of categories cannot be gathered again into T6366: Category history should show past members.
Jul 7 2020, 6:10 PM · MediaWiki-Categories
He7d3r merged task T36269: (un)categorization actions should get logged into T6366: Category history should show past members.
Jul 7 2020, 6:10 PM · MediaWiki-Categories
He7d3r merged task T36597: Category history into T6366: Category history should show past members.
Jul 7 2020, 6:10 PM · MediaWiki-Categories
He7d3r merged task T7484: include page categorization/decategorization event in the related category watch list into T6366: Category history should show past members.
Jul 7 2020, 6:10 PM · MediaWiki-Categories
He7d3r merged task T7526: It should be possible to see the chronology of the additions and removals of articles in a given category into T6366: Category history should show past members.
Jul 7 2020, 6:10 PM · MediaWiki-Categories
He7d3r merged tasks T7526: It should be possible to see the chronology of the additions and removals of articles in a given category, T7484: include page categorization/decategorization event in the related category watch list, T36597: Category history, T36269: (un)categorization actions should get logged, T61688: Dissappeared content of categories cannot be gathered again into T6366: Category history should show past members.
Jul 7 2020, 6:10 PM · MediaWiki-Categories

Jun 30 2020

He7d3r created T256732: [[Special:Notifications]] uses deprecated $.trimByteLength.
Jun 30 2020, 10:37 AM · MW-1.36-notes (1.36.0-wmf.12; 2020-10-05; NEVER DEPLOYED), Technical-Debt, Growth-Team, JavaScript, Notifications

Jun 27 2020

He7d3r added a comment to T256534: "Uncaught Error: Syntax error, unrecognized expression: ." when clicking on paragraph.

Jun 27 2020, 2:18 PM · Language-Team (Language-2020-July-September), MW-1.36-notes (1.36.0-wmf.2; 2020-07-28), Patch-For-Review, ContentTranslation
He7d3r created T256534: "Uncaught Error: Syntax error, unrecognized expression: ." when clicking on paragraph.
Jun 27 2020, 2:11 PM · Language-Team (Language-2020-July-September), MW-1.36-notes (1.36.0-wmf.2; 2020-07-28), Patch-For-Review, ContentTranslation

Jun 26 2020

He7d3r awarded T134681: ContentTranslation should not validate single sections against abuse filters intended for full pages a Heartbreak token.
Jun 26 2020, 9:15 PM · WorkType-Maintenance, AbuseFilter, ContentTranslation
He7d3r awarded T134678: ContentTranslation should generate an AbuseFilter log whenever it shows a warning for the users a Heartbreak token.
Jun 26 2020, 9:12 PM · WorkType-NewFunctionality, AbuseFilter, ContentTranslation

Jun 23 2020

He7d3r updated subscribers of T250809: Review model performance for ptwiki 'articlequality' and 'draftquality'.

@Danilo generated the following table comparing articlequality scores for the latest version of all articles to the scores which would be produced by the Python script which is/was used to make bot assessments:

MariaDB [s51206__ptwikis]> SELECT pe_qualidade, SUM(pe_qores = 0) ORES_0, SUM(pe_qores = 1) ORES_1, SUM(pe_qores = 2) ORES_2, SUM(pe_qores = 3) ORES_3, SUM(pe_qores = 4) ORES_4, SUM(pe_qores = 5) ORES_5, SUM(pe_qores = 6) ORES_6 FROM page_extra GROUP BY pe_qualidade ORDER BY pe_qualidade;
+--------------+--------+--------+--------+--------+--------+--------+--------+
| pe_qualidade | ORES_0 | ORES_1 | ORES_2 | ORES_3 | ORES_4 | ORES_5 | ORES_6 |
+--------------+--------+--------+--------+--------+--------+--------+--------+
|            0 |  68218 |      0 |      0 |      0 |      0 |      0 |      0 |
|            1 |      3 | 618819 | 204187 |  27523 |   1847 |   3562 |   3261 |
|            2 |      0 |   5565 |  69323 |  24777 |   1390 |   7496 |    350 |
|            3 |      0 |     71 |    472 |  14361 |   1861 |   7412 |    572 |
|            4 |      0 |      5 |     10 |   2948 |   2361 |   2978 |   1208 |
|            5 |      0 |      0 |     16 |     59 |    136 |   1056 |    161 |
|            6 |      0 |      0 |      0 |     35 |    190 |    188 |    782 |
+--------------+--------+--------+--------+--------+--------+--------+--------+
7 rows in set (3.70 sec)

(the label is set to zero if the quality is unknown, possibly due to the page being deleted)
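One way to summarize the table is the fraction of pages on its diagonal, i.e. where the script label and the ORES label coincide. A minimal sketch with the counts copied from the table; the `agreement` helper is hypothetical, and it measures exact agreement only, saying nothing about which of the two labellers is right:

```python
# Rows: pe_qualidade (script label 0..6); columns: ORES_0..ORES_6.
confusion = [
    [68218, 0, 0, 0, 0, 0, 0],
    [3, 618819, 204187, 27523, 1847, 3562, 3261],
    [0, 5565, 69323, 24777, 1390, 7496, 350],
    [0, 71, 472, 14361, 1861, 7412, 572],
    [0, 5, 10, 2948, 2361, 2978, 1208],
    [0, 0, 16, 59, 136, 1056, 161],
    [0, 0, 0, 35, 190, 188, 782],
]


def agreement(matrix, skip_unknown=True):
    """Fraction of pages where both labels are equal (diagonal / total).

    With skip_unknown=True, label 0 ("quality unknown") is dropped from
    both rows and columns before computing the ratio.
    """
    start = 1 if skip_unknown else 0
    size = len(matrix)
    diagonal = sum(matrix[i][i] for i in range(start, size))
    total = sum(matrix[i][j] for i in range(start, size) for j in range(start, size))
    return diagonal / total


print(f"exact agreement (known labels only): {agreement(confusion):.3f}")
print(f"exact agreement (including label 0): {agreement(confusion, skip_unknown=False):.3f}")
```

Excluding label 0 matters because every unknown page lands in cell (0, 0) and would otherwise inflate the agreement.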

Jun 23 2020, 8:57 PM · Machine-Learning-Team (Active Tasks), ORES, artificial-intelligence

Jun 19 2020

He7d3r added a comment to T157271: Web-based AutoWikiBrowser alternative.

See also: https://en.wikipedia.org/wiki/User:Joeytje50/JWB

Jun 19 2020, 7:33 PM · Wikimedia-Hackathon-2017, Community-Wishlist-Survey-2016, AutoWikiBrowser

Jun 18 2020

He7d3r added a comment to T255796: Twinkle gadget broken on Telugu Wikipedia.

Possibly related to https://github.com/azatoth/twinkle/commit/8f7b2f367276c6cf8e0ef78b82d9957415221780

Jun 18 2020, 5:19 PM · Reading-Web-Local-Wiki-Issues
He7d3r updated the task description for T252447: Notify gadget users to update Vector scripts and styles.
Jun 18 2020, 11:22 AM · Tech-Ambassadors, Readers-Web-Backlog (Tracking), Desktop Improvements, Vector (Vector (Tracking)), User-notice

Jun 14 2020

He7d3r added a comment to T255367: Global script is not loaded on debug=true.

I confirmed this by replacing the contents of my global.js with console.log( 'Started global.js.' ); and then loading
https://pt.wikipedia.org/wiki/Special:BlankPage?debug=true
The message should have appeared in the console, but it did not.

Jun 14 2020, 2:01 PM · Performance-Team, MediaWiki-ResourceLoader, GlobalCssJs

Jun 5 2020

He7d3r added a comment to T246668: Create follow-up edit quality campaign for ptwikipedia.

Progress (100% done, -19 labels left):


https://labels.wmflabs.org/stats/ptwiki/93

Jun 5 2020, 2:02 PM · Machine-Learning-Team (Active Tasks), editquality-modeling, Wikilabels, artificial-intelligence

May 23 2020

He7d3r added a comment to T250809: Review model performance for ptwiki 'articlequality' and 'draftquality'.

I've submitted https://github.com/wikimedia/articlequality/pull/132

May 23 2020, 10:59 AM · Machine-Learning-Team (Active Tasks), ORES, artificial-intelligence

May 22 2020

He7d3r committed rOWC05578ef2334f: Build new ptwiki model with data since 2014 (authored by He7d3r).
Build new ptwiki model with data since 2014
May 22 2020, 11:44 PM
He7d3r committed rOWC340b621c5ac0: Update class sizes and pop-rates (authored by He7d3r).
Update class sizes and pop-rates
May 22 2020, 9:04 PM
He7d3r committed rOWC1346f67c6478: Update Makefile to remove revisions older than 2014 (authored by He7d3r).
Update Makefile to remove revisions older than 2014
May 22 2020, 8:00 PM
He7d3r added a comment to T250809: Review model performance for ptwiki 'articlequality' and 'draftquality'.

Updated info (as of commit c3a66b0 plus the specific changes which define each of the tests):

accuracy (micro=0.8, macro=0.861):
	    1      2      3      4      5      6
	-----  -----  -----  -----  -----  -----
	0.781  0.827  0.877  0.899  0.875  0.908
$ cat datasets/ptwiki.labelings.20200301.remove_bots.json | json2tsv wp10 | sort | uniq -c
 145657 1
  32807 2
   6177 3
   2346 4
   1646 5
   1542 6
$ cat datasets/ptwiki.balanced_labelings.9k_2020.remove_bots.json | json2tsv wp10 | sort | uniq -c
   1500 1
   1500 2
   1500 3
   1500 4
   1500 5
   1328 6
accuracy (micro=0.81, macro=0.875):
	    1      2      3      4      5      6
	-----  -----  -----  -----  -----  -----
	0.799  0.806  0.867  0.915  0.928  0.933
$ cat datasets/ptwiki.labelings.20200301.since_2014.json | json2tsv wp10 | sort | uniq -c
   7537 1
   3346 2
   1276 3
    690 4
    653 5
    684 6
$ cat datasets/ptwiki.balanced_labelings.9k_2020.since_2014.json | json2tsv wp10 | sort | uniq -c
   1500 1
   1500 2
   1276 3
    690 4
    653 5
    684 6
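The reported macro values are consistent with the unweighted mean of the six per-class accuracies in each report. A minimal check, with the per-class values copied from the two accuracy tables; the micro value is not reproduced here because it depends on class weights that are not shown:

```python
from statistics import mean

# Per-class accuracies (classes 1..6) copied from the two reports.
remove_bots = [0.781, 0.827, 0.877, 0.899, 0.875, 0.908]
since_2014 = [0.799, 0.806, 0.867, 0.915, 0.928, 0.933]


def macro_accuracy(per_class):
    """Unweighted mean of per-class accuracies: every class counts equally."""
    return mean(per_class)


for name, accuracies in [("remove_bots", remove_bots), ("since_2014", since_2014)]:
    print(name, round(macro_accuracy(accuracies), 3))
```

Because each class counts equally, the macro figure is not dominated by the very large class 1 in the unbalanced datasets.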
May 22 2020, 5:45 PM · Machine-Learning-Team (Active Tasks), ORES, artificial-intelligence
He7d3r created T253388: Automatically create task on Phabricator based on Issues from Github repositories.
May 22 2020, 3:59 PM · Technical-Tool-Request, User-Majavah

May 21 2020

He7d3r committed rOWCa75e9327a258: Remove bots assessments from dataset (authored by He7d3r).
Remove bots assessments from dataset
May 21 2020, 8:50 PM
He7d3r committed rOWCae7bacd2d60c: Fix AttributeError when revision.user is None (authored by He7d3r).
Fix AttributeError when revision.user is None
May 21 2020, 6:08 PM

May 20 2020

He7d3r committed rOWCd45172394f86: Convert page id to string explicitly (authored by He7d3r).
Convert page id to string explicitly
May 20 2020, 8:44 PM
He7d3r committed rOWCeb97707eee3a: Convert page id to string explicitly (authored by He7d3r).
Convert page id to string explicitly
May 20 2020, 8:38 PM
He7d3r committed rOWC4a5095cff48c: Remove unused user (authored by He7d3r).
Remove unused user
May 20 2020, 2:31 PM
He7d3r committed rOWC4b7381456f0d: Add user to tests (authored by He7d3r).
Add user to tests
May 20 2020, 2:26 PM
He7d3r committed rOWC8c9042633b58: Remove bots assessments from dataset (authored by He7d3r).
Remove bots assessments from dataset
May 20 2020, 10:08 AM

May 19 2020

He7d3r committed rOWCac1c8df235e4: Remove bots assessments from dataset (authored by He7d3r).
Remove bots assessments from dataset
May 19 2020, 9:45 PM
He7d3r added a comment to T250809: Review model performance for ptwiki 'articlequality' and 'draftquality'.

@Halfak: Oops... I missed the -v flag when I used grep to remove the bot assessments. So, instead of considering only human assessments, I extracted only the bot assessments! Once I added that flag, the number of assessments by humans seems more reasonable:

$ cat datasets/ptwiki.labelings.20200301.user.json |grep -v -P '"user": "[^"]*([Bb][Oo][Tt]|[Rr][Oo][Bb][ÔôOo])[^"]*"' | json2tsv wp10 | sort | uniq -c
  28403 1
  13343 2
   5329 3
   2209 4
   1458 5
   1281 6

In this case, the explanation for such a high accuracy is likely that the bots' assessments are very predictable (it is hardcoded in their DNA code ;-).
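The same filter can be written without grep, which avoids the inverted-flag mistake described above. A minimal sketch, assuming each line of the labelings file is a JSON object with a "user" field (as the grep pattern implies); `is_bot` and `human_labelings` are hypothetical helpers, and the regex is the one from the grep command:

```python
import json
import re

# Same pattern as the grep -P expression, applied to the user name only.
BOT_PATTERN = re.compile(r"[Bb][Oo][Tt]|[Rr][Oo][Bb][ÔôOo]")


def is_bot(user):
    """True if the user name looks like a bot ("bot", "robô", "robo", ...)."""
    return user is not None and BOT_PATTERN.search(user) is not None


def human_labelings(lines):
    """Yield labelings NOT made by bot-like users (the grep -v case)."""
    for line in lines:
        observation = json.loads(line)
        # A missing/None user does not match, just as grep would keep it.
        if not is_bot(observation.get("user")):
            yield observation
```

Checked against the top contributors listed in the May 10 comment, this pattern flags FMTbot, Rei-bot, BotStats and FilRBot and keeps the human accounts.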

May 19 2020, 8:41 PM · Machine-Learning-Team (Active Tasks), ORES, artificial-intelligence
He7d3r added a comment to T209387: Update documentation for ArticleQuality.js.

For future reference: there is now a translation at https://pt.wikipedia.org/wiki/User:EpochFail/ArticleQuality

May 19 2020, 10:57 AM · artificial-intelligence, articlequality-modeling, Machine-Learning-Team (Active Tasks)

May 18 2020

He7d3r added a comment to T246667: Build draft quality model for ptwikipedia.

@GoEThe: in case you have any suggestions for better images for this purpose, we can try changing them. @Halfak suggested https://commons.wikimedia.org/wiki/Category:OOUI_icons as a good source of icons we could use.

May 18 2020, 10:28 PM · Machine-Learning-Team (Active Tasks), editquality-modeling, Wikilabels, artificial-intelligence
He7d3r added a comment to T246667: Build draft quality model for ptwikipedia.

@GoEThe: I see you've installed the version of the script I mentioned at T246667#6079484. Did you have the chance to test it on Special:Newpages? Is it good enough for us to publicize it for other users?

May 18 2020, 9:01 PM · Machine-Learning-Team (Active Tasks), editquality-modeling, Wikilabels, artificial-intelligence
He7d3r added a comment to T250809: Review model performance for ptwiki 'articlequality' and 'draftquality'.

PS: I didn't change the thresholds in the Makefile, so the samples were not as balanced as one might want:
(Note: By mistake, I forgot the -v flag in the grep above, so the results for the first case are inverted, that is, they contain bot_only, instead of no_bots)

$ cat datasets/ptwiki.balanced_labelings.9k_2020.no_bots.json | json2tsv wp10 | sort | uniq -c
   1500 1
   1500 2
    759 3
     20 4
     95 5
    203 6
$ cat datasets/ptwiki.balanced_labelings.9k_2020.since_2014.json | json2tsv wp10 | sort | uniq -c
   1500 1
   1500 2
   1247 3
    674 4
    630 5
    654 6
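Samples stay unbalanced when some classes have fewer observations than the per-class target, because those classes are kept whole. A minimal sketch of per-class capping, assuming a target of 1500 per class; `balanced_sample` is hypothetical and is not the articlequality Makefile. It reproduces the since_2014 balanced counts reported on May 22, but not every reported count (e.g. class 6 of remove_bots went from 1542 to 1328), so the real pipeline presumably has extra steps:

```python
import random
from collections import Counter, defaultdict


def balanced_sample(labelings, per_class=1500, label_key="wp10", seed=0):
    """Keep at most `per_class` randomly chosen observations of each label."""
    by_label = defaultdict(list)
    for observation in labelings:
        by_label[observation[label_key]].append(observation)
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    sample = []
    for label, group in sorted(by_label.items()):
        # Classes smaller than the cap are kept whole, hence the imbalance.
        sample.extend(rng.sample(group, min(per_class, len(group))))
    return sample


# since_2014 class sizes from the May 22 comment.
sizes = {1: 7537, 2: 3346, 3: 1276, 4: 690, 5: 653, 6: 684}
demo = [{"wp10": label} for label, n in sizes.items() for _ in range(n)]
print(sorted(Counter(o["wp10"] for o in balanced_sample(demo)).items()))
```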
May 18 2020, 5:51 PM · Machine-Learning-Team (Active Tasks), ORES, artificial-intelligence
He7d3r added a comment to T250809: Review model performance for ptwiki 'articlequality' and 'draftquality'.

Wow. This is really awesome. I wonder what would happen if we retrained the models on recent data only. In enwiki we found that the definition of quality changed over time. There should be plenty of observations after 2014 to give us a good signal.

May 18 2020, 5:49 PM · Machine-Learning-Team (Active Tasks), ORES, artificial-intelligence

May 15 2020

He7d3r reassigned T250704: Internal links on comment/summary point to Wikilabels instead of the target wiki from He7d3r to Halfak.
May 15 2020, 11:13 AM · Machine-Learning-Team, Wikilabels
He7d3r closed T250704: Internal links on comment/summary point to Wikilabels instead of the target wiki, a subtask of T252280: Improve Wikilabels UI, as Resolved.
May 15 2020, 11:12 AM · Machine-Learning-Team (Active Tasks), Wikilabels, Wikimedia-Hackathon-2020
He7d3r closed T250704: Internal links on comment/summary point to Wikilabels instead of the target wiki as Resolved.

Halfak fixed this in https://github.com/wikimedia/wikilabels/pull/263

May 15 2020, 11:12 AM · Machine-Learning-Team, Wikilabels

May 12 2020

He7d3r added a comment to T252441: Wikilabels: SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY.

See https://github.com/wikimedia/wikilabels/pull/264

May 12 2020, 12:09 PM · Machine-Learning-Team (Active Tasks), Wikilabels

May 11 2020

He7d3r added a comment to T252441: Wikilabels: SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY.

I've tried to do this:

diff --git a/Dockerfile b/Dockerfile
index 5fc2d0d..19f39a5 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -5,7 +5,8 @@ RUN apt-get update && apt-get install -y \
     g++ \
     python3-dev \
     libmemcached-dev \
-    libz-dev
+    libz-dev \
+    memcached
May 11 2020, 7:21 PM · Machine-Learning-Team (Active Tasks), Wikilabels
He7d3r created T252441: Wikilabels: SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY.
May 11 2020, 6:51 PM · Machine-Learning-Team (Active Tasks), Wikilabels

May 10 2020

He7d3r added a comment to T250809: Review model performance for ptwiki 'articlequality' and 'draftquality'.

After patching¹ the extractor to also collect user names, I found that these are the top 10 users who added/modified the most assessments:

FMTbot            91366
Rei-bot           25155
BotStats          15830
Fabiano Tatsch    14829
Leandro Drudo      4660
GoEThe             3172
Burmeister         3128
Rei-artur          2965
FilRBot            2444
VítoR Valente      1895

Then I produced² the following graphs showing the number of labels added/modified by bots³ by year, for each of the six quality levels. There are many quality 1 and 2 assessments made by bots.

May 10 2020, 10:22 PM · Machine-Learning-Team (Active Tasks), ORES, artificial-intelligence

May 9 2020

He7d3r added a comment to T250809: Review model performance for ptwiki 'articlequality' and 'draftquality'.

Here are some graphs showing the evolution of the assessments extracted from ptwiki:

May 9 2020, 5:45 PM · Machine-Learning-Team (Active Tasks), ORES, artificial-intelligence

May 8 2020

He7d3r added a comment to T251171: Add `words_to_watch` to articlequality and draftquality models in ptwiki.

It occurred to me that some of these expressions are also used by Salebot¹, with the difference that in the bot config² users assign a score to each word/regex indicating how much it contributes towards classifying an edit as needing to be reverted. This allows it to "ignore" words which are common in good edits, unless there are too many of them.
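The weighted-score idea can be sketched in a few lines. All patterns, weights and the threshold below are hypothetical illustrations, not Salebot's real configuration: each match adds its weight, so a low-weight word that is common in good edits only pushes an edit over the threshold when it appears many times.

```python
import re

# Hypothetical (pattern, weight) pairs, for illustration only.
WEIGHTS = [
    (re.compile(r"\bkkk+\b", re.IGNORECASE), 5),  # laughter, rarely in good edits
    (re.compile(r"\bvc\b", re.IGNORECASE), 2),    # informal abbreviation of "você"
    (re.compile(r"\bmuito\b", re.IGNORECASE), 1), # common word, low weight
]


def score(text):
    """Sum of weight * number of matches over all patterns."""
    return sum(weight * len(pattern.findall(text)) for pattern, weight in WEIGHTS)


def needs_revert(text, threshold=10):
    """Flag the text when its accumulated score reaches the (hypothetical) threshold."""
    return score(text) >= threshold


print(needs_revert("O texto ficou muito bom."))  # one low-weight match
print(needs_revert(" ".join(["muito"] * 10)))    # too many of them
```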

May 8 2020, 12:08 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence

May 7 2020

He7d3r added a comment to T252152: Extracted labels might not be accurate when there are multiple reverts.

See https://github.com/wikimedia/articlequality/pull/127 for a possible solution.

May 7 2020, 6:58 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling
He7d3r created T252152: Extracted labels might not be accurate when there are multiple reverts.
May 7 2020, 6:53 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling

May 6 2020

He7d3r added a comment to T158916: Store/read informals, badwords, stopwords and other language assets on a wiki page.

It is not uncommon for a good-faith edit to add a new expression (or a badly written regex) to such lists and thereby break (to some extent) the tools that use them (e.g. by increasing their false positives).
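A tool reading such a list from a wiki page could reject broken entries instead of failing or matching everything. A minimal sketch, assuming a hypothetical one-regex-per-line format with "#" comments; it drops entries that do not compile and entries that match the empty string (e.g. "kkk|"), since the latter would match every edit:

```python
import re


def load_patterns(lines):
    """Compile one regex per line; return (patterns, rejected entries)."""
    patterns, rejected = [], []
    for lineno, line in enumerate(lines, start=1):
        entry = line.strip()
        if not entry or entry.startswith("#"):  # hypothetical comment syntax
            continue
        try:
            compiled = re.compile(entry, re.IGNORECASE)
        except re.error as exc:
            rejected.append((lineno, entry, str(exc)))
            continue
        if compiled.search("") is not None:
            # Matches the empty string, so it would match any text at all.
            rejected.append((lineno, entry, "matches the empty string"))
            continue
        patterns.append(compiled)
    return patterns, rejected


patterns, rejected = load_patterns(["kkk+", "(unclosed", "", "# comment", "kkk|", "vc"])
for lineno, entry, reason in rejected:
    print(f"line {lineno}: {entry!r} rejected ({reason})")
```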

May 6 2020, 6:30 PM · artificial-intelligence, revscoring, Machine-Learning-Team
He7d3r added a comment to T158916: Store/read informals, badwords, stopwords and other language assets on a wiki page.

Here are some examples of existing lists, of varying quality and formats, used by other tools:

May 6 2020, 6:24 PM · artificial-intelligence, revscoring, Machine-Learning-Team
He7d3r added a comment to T251608: Text fetched by articlequality's `fetch_text` might not match the talk page label (for moved pages).

I've updated the patch.

May 6 2020, 6:09 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling
He7d3r added a comment to T158916: Store/read informals, badwords, stopwords and other language assets on a wiki page.

Is this still wanted nowadays?

May 6 2020, 5:44 PM · artificial-intelligence, revscoring, Machine-Learning-Team

May 5 2020

He7d3r created T251904: New Wikitext Editor: Unable to add new group of tools to VisualEditor's toolbar .
May 5 2020, 2:43 PM · VisualEditor
He7d3r added a comment to T251608: Text fetched by articlequality's `fetch_text` might not match the talk page label (for moved pages).

While the dumps are processed, we could store the <id> of the talk pages instead of their <title>s. Then, an API query such as
https://pt.wikipedia.org/w/api.php?action=query&format=json&prop=info&pageids=18363&formatversion=2&inprop=subjectid
will return the <id> of the associated subject page (the one whose text we are interested in). This should work when pages are moved, since page moves do not change the pageid (but it is not guaranteed if the page is deleted and restored).
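The lookup above can be sketched as building the same query for a batch of talk page ids and reading `subjectid` from each page in the formatversion=2 response (where `query.pages` is a list). The helper names and the sample response values are hypothetical; no network call is made here:

```python
from urllib.parse import urlencode

API = "https://pt.wikipedia.org/w/api.php"


def subject_id_url(talk_page_ids):
    """Build the prop=info query asking for the subject page of each talk page id."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "info",
        "inprop": "subjectid",
        "pageids": "|".join(str(pid) for pid in talk_page_ids),
    }
    return f"{API}?{urlencode(params)}"


def subject_ids(response):
    """Map talk page id -> subject page id (None if the API gave none)."""
    return {page["pageid"]: page.get("subjectid") for page in response["query"]["pages"]}


# Made-up response shaped like formatversion=2 output.
sample = {"query": {"pages": [{"pageid": 18363, "ns": 1, "subjectid": 12345}]}}
print(subject_id_url([18363]))
print(subject_ids(sample))
```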

May 5 2020, 11:12 AM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling

May 1 2020

He7d3r created T251608: Text fetched by articlequality's `fetch_text` might not match the talk page label (for moved pages).
May 1 2020, 3:00 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling