
Filtering Special:NewPages by page size
Closed, ResolvedPublic

Description

Please add a new option to filter [[Special:Newpages]] by page size. That way, pages with a very small size (candidates for deletion as empty or lacking content) and pages with a very large size (possible copyright violations) could be found easily. (Hundreds of pages are created every day on the largest wikis, and checking them all manually is very tiring.)
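The requested filter boils down to a range condition on the page length (the merged query later in this task uses `page_len <= 1000`). A minimal, self-contained sketch with SQLite; the table layout loosely mirrors MediaWiki's `page` table, but the column subset, sample rows, and thresholds here are illustrative only:

```python
import sqlite3

# Toy stand-in for MediaWiki's `page` table (page_title, page_len in bytes).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_title TEXT, page_len INTEGER)")
conn.executemany(
    "INSERT INTO page VALUES (?, ?)",
    [("Empty_stub", 12), ("Normal_article", 4500), ("Possible_copyvio", 250000)],
)

# Pages of at most 1000 bytes: likely empty or contentless, deletion candidates.
small = conn.execute(
    "SELECT page_title FROM page WHERE page_len <= 1000"
).fetchall()

# Pages of at least 100 kB: worth checking for copyright violations.
large = conn.execute(
    "SELECT page_title FROM page WHERE page_len >= 100000"
).fetchall()

print(small)  # [('Empty_stub',)]
print(large)  # [('Possible_copyvio',)]
```

In the real feature the same `page_len` condition is simply appended to the `Special:NewPages` query's WHERE clause, as shown in the EXPLAIN output below.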

Details

Reference
bz10817

Related Objects

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:52 PM
bzimport set Reference to bz10817.
bzimport added a subscriber: Unknown Object (MLST).
Meno25 created this task. Aug 6 2007, 2:00 PM

chughakshay16 wrote:

Posted a patch for review in Gerrit; here is the link: https://gerrit.wikimedia.org/r/#/c/8036/

The patch needs only minor code style changes and a commit message that explains what the patch does, rather than referring to a previous version of the same patch.

Akshay: The patch has been inactive for a while, so I'll reset the assignee of this bug report. Please assign it to yourself again if you plan to resume work on your patch and fix the remaining small issues (which would be very welcome, as it would get this bug fixed!).

Change 8036 had a related patch set uploaded by Krinkle:
Filter Special:Newpages by page size

https://gerrit.wikimedia.org/r/8036

Change 8036 had a related patch set uploaded (by TTO):
New HTMLForm size filter field; add size filter to Special:Newpages

https://gerrit.wikimedia.org/r/8036

TTO added a project: DBA. Jun 24 2016, 12:18 PM
TTO edited subscribers, added: jcrespo, TTO; removed: wikibugs-l-list.

The patch requires attention from Jaime.

jcrespo moved this task from Triage to Next on the DBA board. Jun 24 2016, 2:51 PM
jcrespo moved this task from Next to Done on the DBA board. Oct 26 2016, 3:36 PM

I've tested this query on all shards and several wikis, with special attention to all the large wikis (enwiki, dewiki, commons, wikidata, frwiki, zhwiki), on both the regular traffic slaves and the special slaves. I found this query to be as good as it gets; detailed results follow (I only show the results for enwiki because the others match them):

MariaDB db1073.eqiad.wmnet enwiki > EXPLAIN SELECT /* IndexPager::buildQueryInfo (NewPagesPager) Bawolff */ rc_namespace,rc_title,rc_cur_id,rc_user,rc_user_text,rc_comment,rc_timestamp,rc_patrolled,rc_id,rc_deleted,page_len AS `length`,page_latest AS `rev_id`,rc_this_oldid,page_namespace,page_title,(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags` FROM `recentchanges` INNER JOIN `page` ON ((page_id=rc_cur_id)) WHERE rc_new = '1' AND (page_len <= 1000) AND rc_namespace = '0' AND page_is_redirect = '0' ORDER BY rc_timestamp DESC LIMIT 51\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: recentchanges
         type: ref
possible_keys: rc_namespace_title,rc_cur_id,new_name_timestamp,rc_ns_usertext,rc_name_type_patrolled_timestamp
          key: new_name_timestamp
      key_len: 5
          ref: const,const
         rows: 116688
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: page
         type: eq_ref
possible_keys: PRIMARY,page_len,page_redirect_namespace_len
          key: PRIMARY
      key_len: 4
          ref: enwiki.recentchanges.rc_cur_id
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: change_tag
         type: ref
possible_keys: change_tag_rc_tag
          key: change_tag_rc_tag
      key_len: 5
          ref: enwiki.recentchanges.rc_id
         rows: 1
        Extra: Using index
3 rows in set (0.01 sec)

51 rows in set (0.07 sec)

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 353   |
| Handler_read_last          | 0     |
| Handler_read_next          | 9     |
| Handler_read_prev          | 300   |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.00 sec)

MariaDB db1055.eqiad.wmnet enwiki > EXPLAIN SELECT /* IndexPager::buildQueryInfo (NewPagesPager) Bawolff */ rc_namespace,rc_title,rc_cur_id,rc_user,rc_user_text,rc_comment,rc_timestamp,rc_patrolled,rc_id,rc_deleted,page_len AS `length`,page_latest AS `rev_id`,rc_this_oldid,page_namespace,page_title,(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags` FROM `recentchanges` INNER JOIN `page` ON ((page_id=rc_cur_id)) WHERE rc_new = '1' AND (page_len <= 1000) AND rc_namespace = '0' AND page_is_redirect = '0' ORDER BY rc_timestamp DESC LIMIT 51\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: recentchanges
         type: ref
possible_keys: rc_namespace_title,rc_cur_id,new_name_timestamp,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp
          key: new_name_timestamp
      key_len: 5
          ref: const,const
         rows: 122688
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: page
         type: eq_ref
possible_keys: PRIMARY,page_len,page_redirect_namespace_len
          key: PRIMARY
      key_len: 4
          ref: enwiki.recentchanges.rc_cur_id
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: change_tag
         type: ref
possible_keys: ct_rc_id,change_tag_rc_tag
          key: ct_rc_id
      key_len: 5
          ref: enwiki.recentchanges.rc_id
         rows: 1
        Extra: Using index
3 rows in set (0.02 sec)

51 rows in set (0.00 sec)

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 353   |
| Handler_read_last          | 0     |
| Handler_read_next          | 9     |
| Handler_read_prev          | 300   |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.01 sec)

Change 8036 merged by jenkins-bot:
New HTMLForm size filter field; add size filter to Special:Newpages

https://gerrit.wikimedia.org/r/8036

TTO closed this task as Resolved. Oct 27 2016, 8:28 AM
TTO claimed this task.

Only took 9 years to get 100 lines of new code into MediaWiki...

matej_suchanek updated the task description.