insource should search article text on non-wikitext pages. Probably.
Closed, Resolved, Public

Description

Searching for wg in the MediaWiki namespace on wiktionary finds common.js, but searching for insource:wg doesn't. It probably should. It doesn't because non-wikitext pages don't _have_ a source field. Maybe they should, and the source should just _be_ the text.

Manybubbles updated the task description.
Manybubbles raised the priority of this task from to Needs Triage.
Manybubbles added a subscriber: Manybubbles.
Restricted Application added a subscriber: Aklapper. Feb 1 2015, 8:18 PM
ksmith triaged this task as Low priority. Apr 2 2015, 5:50 PM
ksmith set Security to None.
Manybubbles moved this task from Needs triage to Search on the Discovery board. May 7 2015, 7:59 PM
Krinkle added subscribers: Pathoschild, Nemo_bis, MZMcBride.
Krinkle added a subscriber: Krinkle.

Yeah, this would be useful. Especially when looking for CSS class names.

A plain word search for a CSS class typically doesn't find most of the uses in wiki pages. insource does. In practice, one has to check both to be sure.

insource:/plainlinksneverexpand/

This finds all articles that use it (e.g. class="foo plainlinksneverexpand"). But not any stylesheets that apply to it.

"plainlinksneverexpand"

This finds none of the articles, but does find (some?) stylesheet pages that apply to it.
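To make the "check both" workflow concrete, here is a minimal sketch that builds the two searches side by side. It assumes the standard MediaWiki API search module (action=query, list=search, srsearch, srnamespace); the helper name search_url and the choice of namespace 8 (MediaWiki) are illustrative only.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def search_url(term, namespace=8, insource=False):
    """Build an API search URL; insource=True wraps the term as insource:/term/."""
    query = f"insource:/{term}/" if insource else f'"{term}"'
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srnamespace": namespace,
        "format": "json",
    }
    return API + "?" + urlencode(params)

# Both queries are needed today: one sees wikitext source, the other sees indexed text.
plain = search_url("plainlinksneverexpand")
source = search_url("plainlinksneverexpand", insource=True)
print(plain)
print(source)
```

Fetching both URLs and taking the union of the returned titles would approximate a complete list of uses, at least until insource also covers stylesheet and script pages.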

[...] non-wikitext pages don't _have_ a source field. Maybe they should and the source should just _be_ the text.

It looks like non-wikitext pages now have a _source field.

@EBernhardson: Is there somewhere that I can query to see what data Elasticsearch has for (for example) the "MediaWiki:Common.js" page on the English Wikipedia (https://en.wikipedia.org/wiki/MediaWiki:Common.js)? I know there's https://en.wikipedia.org/wiki/MediaWiki:Common.js?action=cirrusdump, but I want to see the data from Elasticsearch, not the data from MediaWiki. Can I query Elasticsearch's data somewhere currently?

If there's no public end-point for this data (I just briefly looked at T109715 again), can you query this information, please? I'd really like to track down why this bug is happening and fix it.

My guess is that _source was added to ?action=cirrusdump and that we haven't regenerated the indices to account for this.

ori added a subscriber: ori. Edited Dec 24 2015, 5:46 AM

My guess is that _source was added to ?action=cirrusdump and that we haven't regenerated the indices to account for this.

No -- it's the 'source_text' subkey of '_source' that is (still) missing from the serialized representation of non-wikitext articles.

Is there somewhere that I can query to see what data Elasticsearch has for (for example) the "MediaWiki:Common.js" page on the English Wikipedia

If I am reading the source code correctly, ?action=cirrusDump responses do come from Elasticsearch, though I do not know if they represent all the data that is available.

Change 260903 had a related patch set uploaded (by Ori.livneh):
For source code pages, index page contents as source_text

https://gerrit.wikimedia.org/r/260903

Deskana moved this task from Backlog to Needs review on the Discovery-Search (Current work) board.
Deskana assigned this task to ori.
Deskana added a subscriber: Deskana.

Thanks for the patch, @ori. We'll get it reviewed.

In T88247#1903093, @ori wrote:

My guess is that _source was added to ?action=cirrusdump and that we haven't regenerated the indices to account for this.

No -- it's the 'source_text' subkey of '_source' that is (still) missing from the serialized representation of non-wikitext articles.

Oh. That's a little tricky. Pretty-printing the output of https://en.wikipedia.org/wiki/MediaWiki:Common.js?action=cirrusdump made this a bit clearer:

[
  {
    "_index": "enwiki_general_1432193140",
    "_type": "page",
    "_id": "763577",
    "_version": [
      
    ],
    "_source": {
      "namespace": 8,
      "title": "Common.js",
      "timestamp": "2015-12-10T20:05:12Z",
      "text": "\/**  * Keep code in MediaWiki:Common.js to a minimum as it is unconditionally  * loaded for all users on every wiki page. If possible create a gadget that is  * enabled by default instead of adding it here (since gadgets are fully  * optimized ResourceLoader modules with possibility to add dependencies etc.)  *  * Since Common.js isn't a gadget, there is no place to declare its  * dependencies, so we have to lazy load them with mw.loader.using on demand and  * then execute the rest in the callback. In most cases these dependencies will  * be loaded (or loading) already and the callback will not be delayed. In case a  * dependency hasn't arrived yet it'll make sure those are loaded before this.  *\/  \/* global mw, $, importStylesheet, importScript *\/ \/* jshint strict:false, browser:true *\/  mw.loader.using( ['mediawiki.user', 'mediawiki.util', 'mediawiki.notify', 'jquery.client'] ).done( function () { \/* Begin of mw.loader.using callback *\/  \/**  * Main Page layout fixes  *  * Description: Adds an additional link to the complete list of languages available.  * Maintainers: [[User:AzaToth]], [[User:R. 
Koot]], [[User:Alex Smotrov]]  *\/ if ( mw.config.get( 'wgPageName' ) === 'Main_Page' || mw.config.get( 'wgPageName' ) === 'Talk:Main_Page' ) {     $( function () {         mw.util.addPortletLink( 'p-lang', '\/\/meta.wikimedia.org\/wiki\/List_of_Wikipedias',             'Complete list', 'interwiki-completelist', 'Complete list of Wikipedias' );     } ); }  \/**  * Redirect User:Name\/skin.js and skin.css to the current skin's pages  * (unless the 'skin' page really exists)  * @source: http:\/\/www.mediawiki.org\/wiki\/Snippets\/Redirect_skin.js  * @rev: 2  *\/ if ( mw.config.get( 'wgArticleId' ) === 0 && mw.config.get( 'wgNamespaceNumber' ) === 2 ) {     var titleParts = mw.config.get( 'wgPageName' ).split( '\/' );     \/* Make sure there was a part before and after the slash        and that the latter is 'skin.js' or 'skin.css' *\/     if ( titleParts.length == 2 ) {         var userSkinPage = titleParts.shift() + '\/' + mw.config.get( 'skin' );         if ( titleParts.slice( -1 ) == 'skin.js' ) {             window.location.href = mw.util.getUrl( userSkinPage + '.js' );         } else if ( titleParts.slice( -1 ) == 'skin.css' ) {             window.location.href = mw.util.getUrl( userSkinPage + '.css' );         }     } }  \/**  * Map addPortletLink to mw.util  * @deprecated: Use mw.util.addPortletLink instead.  *\/ mw.log.deprecate( window, 'addPortletLink', mw.util.addPortletLink, 'Use mw.util.addPortletLink instead' );  \/**  * Extract a URL parameter from the current URL  * @deprecated: Use mw.util.getParamValue with proper escaping  *\/ mw.log.deprecate( window, 'getURLParamValue', mw.util.getParamValue, 'Use mw.util.getParamValue instead' );  \/**  * Test if an element has a certain class  * @deprecated:  Use $(element).hasClass() instead.  
*\/ mw.log.deprecate( window, 'hasClass', function ( element, className ) {     return $( element ).hasClass( className ); }, 'Use jQuery.hasClass() instead' );  \/**  * @source www.mediawiki.org\/wiki\/Snippets\/Load_JS_and_CSS_by_URL  * @rev 6  *\/ var extraCSS = mw.util.getParamValue( 'withCSS' ),     extraJS = mw.util.getParamValue( 'withJS' );  if ( extraCSS ) {     if ( extraCSS.match( \/^MediaWiki:[^&=%#]*\\.css$\/ ) ) {         importStylesheet( extraCSS );     } else {         mw.notify( 'Only pages from the MediaWiki namespace are allowed.', { title: 'Invalid withCSS value' } );     } }  if ( extraJS ) {     if ( extraJS.match( \/^MediaWiki:[^&=%#]*\\.js$\/ ) ) {         importScript( extraJS );     } else {         mw.notify( 'Only pages from the MediaWiki namespace are allowed.', { title: 'Invalid withJS value' } );     } }  \/**  * Import more specific scripts if necessary  *\/ if ( mw.config.get( 'wgAction' ) === 'edit' || mw.config.get( 'wgAction' ) === 'submit' || mw.config.get( 'wgCanonicalSpecialPageName' ) === 'Upload' ) {     \/* scripts specific to editing pages *\/     importScript( 'MediaWiki:Common.js\/edit.js' ); } else if ( mw.config.get( 'wgCanonicalSpecialPageName' ) === 'Watchlist' ) {     \/* watchlist scripts *\/     importScript( 'MediaWiki:Common.js\/watchlist.js' ); }  \/**  * Fix for Windows XP Unicode font rendering  *\/ if ( navigator.appVersion.search(\/windows nt 5\/i) !== -1 ) {     mw.util.addCSS( '.IPA { font-family: \"Lucida Sans Unicode\", \"Arial Unicode MS\"; } ' +                 '.Unicode { font-family: \"Arial Unicode MS\", \"Lucida Sans Unicode\"; } ' ); }  \/**  * WikiMiniAtlas  *  * Description: WikiMiniAtlas is a popup click and drag world map.  *              This script causes all of our coordinate links to display the WikiMiniAtlas popup button.  *              The script itself is located on meta because it is used by many projects.  *              See [[Meta:WikiMiniAtlas]] for more information.  
* Maintainers: [[User:Dschwen]]  *\/ ( function () {     var require_wikiminiatlas = false;     var coord_filter = \/geohack\/;     $( function () {         $( 'a.external.text' ).each( function( key, link ) {             if ( link.href && coord_filter.exec( link.href ) ) {                 require_wikiminiatlas = true;                 \/\/ break from loop                 return false;             }         } );         if ( $( 'div.kmldata' ).length ) {             require_wikiminiatlas = true;         }         if ( require_wikiminiatlas ) {             mw.loader.load( '\/\/meta.wikimedia.org\/w\/index.php?title=MediaWiki:Wikiminiatlas.js&action=raw&ctype=text\/javascript' );         }     } ); } )();  \/**  * Collapsible tables  *  * Allows tables to be collapsed, showing only the header. See [[Wikipedia:NavFrame]].  *  * @version 2.0.3 (2014-03-14)  * @source https:\/\/www.mediawiki.org\/wiki\/MediaWiki:Gadget-collapsibleTables.js  * @author [[User:R. Koot]]  * @author [[User:Krinkle]]  * @deprecated Since MediaWiki 1.20: Use class=\"mw-collapsible\" instead which  * is supported in MediaWiki core.  
*\/  var autoCollapse = 2; var collapseCaption = 'hide'; var expandCaption = 'show'; var tableIndex = 0;  function collapseTable( tableIndex ) {     var Button = document.getElementById( 'collapseButton' + tableIndex );     var Table = document.getElementById( 'collapsibleTable' + tableIndex );      if ( !Table || !Button ) {         return false;     }      var Rows = Table.rows;     var i;      if ( Button.firstChild.data === collapseCaption ) {         for ( i = 1; i = autoCollapse && $( NavigationBoxes[i] ).hasClass( 'autocollapse' ) )         ) {             collapseTable( i );         }         else if ( $( NavigationBoxes[i] ).hasClass ( 'innercollapse' ) ) {             var element = NavigationBoxes[i];             while ((element = element.parentNode)) {                 if ( $( element ).hasClass( 'outercollapse' ) ) {                     collapseTable ( i );                     break;                 }             }         }     } }  mw.hook( 'wikipage.content' ).add( createCollapseButtons );  \/**  * Dynamic Navigation Bars (experimental)  *  * Description: See [[Wikipedia:NavFrame]].  
* Maintainers: UNMAINTAINED  *\/  \/* set up the words in your language *\/ var NavigationBarHide = '[' + collapseCaption + ']'; var NavigationBarShow = '[' + expandCaption + ']'; var indexNavigationBar = 0;  \/**  * Shows and hides content and picture (if available) of navigation bars  * Parameters:  *     indexNavigationBar: the index of navigation bar to be toggled  **\/ window.toggleNavigationBar = function ( indexNavigationBar, event ) {     var NavToggle = document.getElementById( 'NavToggle' + indexNavigationBar );     var NavFrame = document.getElementById( 'NavFrame' + indexNavigationBar );     var NavChild;      if ( !NavFrame || !NavToggle ) {         return false;     }      \/* if shown now *\/     if ( NavToggle.firstChild.data === NavigationBarHide ) {         for ( NavChild = NavFrame.firstChild; NavChild != null; NavChild = NavChild.nextSibling ) {             if ( $( NavChild ).hasClass( 'NavContent' ) || $( NavChild ).hasClass( 'NavPic' ) ) {                 NavChild.style.display = 'none';             }         }     NavToggle.firstChild.data = NavigationBarShow;      \/* if hidden now *\/     } else if ( NavToggle.firstChild.data === NavigationBarShow ) {         for ( NavChild = NavFrame.firstChild; NavChild != null; NavChild = NavChild.nextSibling ) {             if ( $( NavChild ).hasClass( 'NavContent' ) || $( NavChild ).hasClass( 'NavPic' ) ) {                 NavChild.style.display = 'block';             }         }         NavToggle.firstChild.data = NavigationBarHide;     }      event.preventDefault(); };  \/* adds show\/hide-button to navigation bars *\/ function createNavigationBarToggleButton( $content ) {     var NavChild;     \/* iterate over all -elements *\/     var $divs = $content.find( 'div' );     $divs.each( function ( i, NavFrame ) {         \/* if found a navigation bar *\/         if ( $( NavFrame ).hasClass( 'NavFrame' ) ) {              indexNavigationBar++;             var NavToggle = document.createElement( 'a' );     
        NavToggle.className = 'NavToggle';             NavToggle.setAttribute( 'id', 'NavToggle' + indexNavigationBar );             NavToggle.setAttribute( 'href', '#' );             $( NavToggle ).on( 'click', $.proxy( window.toggleNavigationBar, window, indexNavigationBar ) );              var isCollapsed = $( NavFrame ).hasClass( 'collapsed' );             \/**              * Check if any children are already hidden.  This loop is here for backwards compatibility:              * the old way of making NavFrames start out collapsed was to manually add style=\"display:none\"              * to all the NavPic\/NavContent elements.  Since this was bad for accessibility (no way to make              * the content visible without JavaScript support), the new recommended way is to add the class              * \"collapsed\" to the NavFrame itself, just like with collapsible tables.              *\/             for ( NavChild = NavFrame.firstChild; NavChild != null && !isCollapsed; NavChild = NavChild.nextSibling ) {                 if ( $( NavChild ).hasClass( 'NavPic' ) || $( NavChild ).hasClass( 'NavContent' ) ) {                     if ( NavChild.style.display === 'none' ) {                         isCollapsed = true;                     }                 }             }             if ( isCollapsed ) {                 for ( NavChild = NavFrame.firstChild; NavChild != null; NavChild = NavChild.nextSibling ) {                     if ( $( NavChild ).hasClass( 'NavPic' ) || $( NavChild ).hasClass( 'NavContent' ) ) {                         NavChild.style.display = 'none';                     }                 }             }             var NavToggleText = document.createTextNode( isCollapsed ? 
NavigationBarShow : NavigationBarHide );             NavToggle.appendChild( NavToggleText );              \/* Find the NavHead and attach the toggle link (Must be this complicated because Moz's firstChild handling is borked) *\/             for( var j = 0; j < NavFrame.childNodes.length; j++ ) {                 if ( $( NavFrame.childNodes[j] ).hasClass( 'NavHead' ) ) {                     NavToggle.style.color = NavFrame.childNodes[j].style.color;                     NavFrame.childNodes[j].appendChild( NavToggle );                 }             }             NavFrame.setAttribute( 'id', 'NavFrame' + indexNavigationBar );         }     } ); }  mw.hook( 'wikipage.content' ).add( createNavigationBarToggleButton );  \/**  * Uploadwizard_newusers  * Switches in a message for non-autoconfirmed users at [[Wikipedia:Upload]]  *  * Maintainers: [[User:Krimpet]]  *\/ function uploadwizard_newusers() {     if ( mw.config.get( 'wgNamespaceNumber' ) === 4 && mw.config.get( 'wgTitle' ) === 'Upload' && mw.config.get( 'wgAction' ) === 'view' ) {         var oldDiv = document.getElementById( 'autoconfirmedusers' ),             newDiv = document.getElementById( 'newusers' );         if ( oldDiv && newDiv ) {             var userGroups = mw.config.get( 'wgUserGroups' );             if ( userGroups ) {                 for ( var i = 0; i < userGroups.length; i++ ) {                     if ( userGroups[i] === 'autoconfirmed' ) {                         oldDiv.style.display = 'block';                         newDiv.style.display = 'none';                         return;                     }                 }             }             oldDiv.style.display = 'none';             newDiv.style.display = 'block';             return;         }     } }  $(uploadwizard_newusers);  \/**  * Magic editintros ****************************************************  *  * Description: Adds editintros on disambiguation pages and BLP pages.  
* Maintainers: [[User:RockMFR]]  *\/ function addEditIntro( name ) {     $( '.mw-editsection, #ca-edit' ).find( 'a' ).each( function ( i, el ) {         el.href = $( this ).attr( 'href' ) + '&editintro=' + name;     } ); }  if ( mw.config.get( 'wgNamespaceNumber' ) === 0 ) {     $( function () {         if ( document.getElementById( 'disambigbox' ) ) {             addEditIntro( 'Template:Disambig_editintro' );         }     } );      $( function () {         var cats = mw.config.get('wgCategories');         if ( !cats ) {             return;         }         if ( $.inArray( 'Living people', cats ) !== -1 || $.inArray( 'Possibly living people', cats ) !== -1 ) {             addEditIntro( 'Template:BLP_editintro' );         }     } ); }  \/* End of mw.loader.using callback *\/ } ); \/* DO NOT ADD CODE BELOW THIS LINE *\/",
      "text_bytes": 15715,
      "category": [
        
      ],
      "template": [
        
      ],
      "heading": [
        
      ],
      "outgoing_link": [
        
      ],
      "external_link": [
        
      ],
      "incoming_links": 813,
      "redirect": [
        
      ],
      "namespace_text": "MediaWiki",
      "file_text": [
        
      ],
      "auxiliary_text": [
        
      ],
      "source_text": null,
      "opening_text": null,
      "language": "en",
      "version": 694675296,
      "version_type": "external"
    }
  }
]

When pretty-printed, it's easier to see that source_text is null. Like a fool, I was looking at _source['text'] instead of _source['source_text'].
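For anyone repeating this check, a small sketch along these lines would list the null keys directly instead of relying on eyeballing the pretty-printed output. The sample below is a trimmed-down stand-in for the dump above (the long "text" value is shortened), and null_fields is a made-up helper name.

```python
import json

# Trimmed-down stand-in for the cirrusdump output; only a few keys are kept.
dump = '''
[{"_id": "763577",
  "_source": {"title": "Common.js",
              "text": "/** Keep code in MediaWiki:Common.js to a minimum ... */",
              "source_text": null,
              "opening_text": null,
              "language": "en"}}]
'''

def null_fields(raw):
    """Return, per document id, the sorted names of _source keys whose value is null."""
    return {
        doc["_id"]: sorted(k for k, v in doc["_source"].items() if v is None)
        for doc in json.loads(raw)
    }

print(null_fields(dump))
```

Run against the real dump, this would flag source_text (and opening_text) while leaving the populated text key out of the list.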

Thank you for https://gerrit.wikimedia.org/r/260903! I poked around the CirrusSearch Git repo earlier, in addition to skimming https://wikitech.wikimedia.org/wiki/Search#CirrusSearch and https://noc.wikimedia.org/conf/CirrusSearch-common.php.txt and such.

I had found where we special-case JavaScript and CSS pages to clear out the heading, template, category, etc. arrays in includes/BuildDocument/PageDataBuilder.php, but I missed where we hardcoded only pages with a wikitext content model to populate (or set to null) the source_text key via buildSourceTextToIndex() in includes/BuildDocument/PageTextBuilder.php.

Like Lego commented on the Gerrit change, I get a weird feeling about the default behavior here. A hardcoded list of content models doesn't seem great. (Grepping through MediaWiki core, I see we also have CONTENT_MODEL_TEXT, for what it's worth.) Maybe the list logic should be inverted or the default behavior should be tweaked? I'm still trying to conceptualize the design decision here. The docstring reads:

Some sorts of content (basically wikitext) have expanded and unexpanded forms.

This doesn't really seem true of CSS, JavaScript, and JSON pages, though it's possible I'm misunderstanding what expanded and unexpanded mean in this context. That said, I'd much rather see this task resolved than worry about a docstring, so if https://gerrit.wikimedia.org/r/260903 gets us there, I'm fine with ignoring the potential code smell.

Nemo_bis raised the priority of this task from Low to High. Dec 24 2015, 10:12 AM
EBernhardson added a comment. Edited Dec 28 2015, 7:02 PM

cirrusdump is the full content of what we store in Elasticsearch; you can compare against

curl search.svc.eqiad.wmnet:9200/enwiki_general/page/763577?pretty

I'm fairly certain that only storing source_text in certain instances is a performance optimization, perhaps premature, but it's hard to say. ContentModel::getTextForSearchIndex() is already used to populate the text field in PageTextBuilder::buildTextToIndex(). The two options off the top of my head would be to either add source_text to everything, or construct the queries that use source_text to be something more like source_text contains 'foo' OR (source_text is null and text contains 'foo'). I spent some time playing with this, but the performance is pretty bad and the adjustments to the highlighter query are non-trivial.
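The second option can be read as the following predicate, shown here as a toy in-memory sketch rather than an Elasticsearch query. The document dicts and the name insource_matches are hypothetical, and plain substring matching stands in for the real analyzed match.

```python
def insource_matches(doc, needle):
    """source_text contains needle OR (source_text is null AND text contains needle)."""
    source_text = doc.get("source_text")
    if source_text is not None:
        return needle in source_text
    # Fall back to the rendered text only when no source_text was indexed.
    return needle in (doc.get("text") or "")

# Illustrative documents: a wikitext page and a JavaScript page without source_text.
wikitext_page = {"source_text": "{{foo|class=plainlinks}}", "text": "rendered foo"}
js_page = {"source_text": None, "text": "mw.config.get( 'wgPageName' )"}

print(insource_matches(wikitext_page, "plainlinks"))
print(insource_matches(js_page, "wgPageName"))
```

Note that the fallback applies per document, which is why the real query needs a missing-field filter on one branch and cannot simply search both fields everywhere.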

It is likely much simpler to just stuff the text into source_text and have duplicated data. We would have to change not only the main insource query, but also the highlight query to flip between the two fields. I haven't specifically looked, but I imagine the % of duplicated content is small, as most things are wikitext anyway.
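As a rough illustration of the "stuff the text into source_text" approach, the index-time decision might look like the sketch below. This is not the CirrusSearch (PHP) implementation; build_source_text and SOURCE_CODE_MODELS are hypothetical names, and the model identifiers are assumed to be the MediaWiki content model strings for JavaScript, CSS, and JSON.

```python
# Hypothetical list of content models whose indexed text is already their source.
SOURCE_CODE_MODELS = {"javascript", "css", "json"}

def build_source_text(content_model, raw_wikitext, text):
    """Pick the value to index as source_text for one page."""
    if content_model == "wikitext":
        # Wikitext has distinct unexpanded (source) and expanded (text) forms.
        return raw_wikitext
    if content_model in SOURCE_CODE_MODELS:
        # Source-code pages have a single form, so the text is duplicated.
        return text
    # Everything else keeps the current behavior: no source_text.
    return None

print(build_source_text("javascript", None, "var autoCollapse = 2;"))
```

With this in place the insource and highlight queries stay unchanged, at the cost of storing the same string in two fields for the (presumably small) share of non-wikitext pages.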

The thing is though, since this data is already in the text field we don't need it for the above 'wg' example query, you can just search for it directly with no special syntax:

https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=wg*&fulltext=Search&ns8=1&profile=advanced

Note that even if we copied the data into the source_text field, you would still need to use wg* as opposed to wg, as by default we match whole words (plus stemming, etc.). I'm uncertain whether offering the data via insource: is beneficial when it's already available in a standard text search, but it does seem more obvious to (some) users. Thoughts?
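The wg versus wg* distinction can be shown with a toy model of whole-word matching. This sketch only splits on non-word characters and lowercases; it ignores stemming and everything else the real Elasticsearch analyzers do, and the function names are made up.

```python
import re

def tokens(text):
    """Split text into lowercase word tokens (a crude stand-in for an analyzer)."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_]+", text)]

def matches(query, text):
    """Whole-word match by default; a trailing * turns the query into a prefix match."""
    words = tokens(text)
    if query.endswith("*"):
        prefix = query[:-1].lower()
        return any(w.startswith(prefix) for w in words)
    return query.lower() in words

snippet = "if ( mw.config.get( 'wgPageName' ) === 'Main_Page' ) {"
print(matches("wg", snippet))   # False: no token is exactly "wg"
print(matches("wg*", snippet))  # True: "wgpagename" starts with "wg"
```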

I had to expand the above and saw Krinkle offered a better example usage, where both insource and regular search end up being required to find something. That is a more compelling use case than the wg* example.

Deskana lowered the priority of this task from High to Low. Dec 28 2015, 8:52 PM

This task is still low priority for Discovery.

cirrusdump is the full content of what we store in Elasticsearch; you can compare against

curl search.svc.eqiad.wmnet:9200/enwiki_general/page/763577?pretty

Good to know. Thank you for this info.

It is likely much simpler to just stuff the text into source_text and have duplicated data.

Yes.

We would have to change not only the main insource query, but also the highlight query to flip between the two fields. I haven't specifically looked, but I imagine the % of duplicated content is small, as most things are wikitext anyway.

I'm not sure I understood this part.

The thing is though, since this data is already in the text field we don't need it for the above 'wg' example query, you can just search for it directly with no special syntax:

https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=wg*&fulltext=Search&ns8=1&profile=advanced

Here's an example that has me really confused:

Sample affected page: en:User:Fæ/monobook.js contains rmflinks, but en:Special:Search/rmflinks doesn't show the page. This makes script maintenance difficult.

I tried modifying your example: https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=rmflinks*&fulltext=Search&ns2=1&profile=advanced. Still no luck.

Looking at https://en.wikipedia.org/wiki/User:F%C3%A6/monobook.js?action=cirrusdump, I don't see mention of "rmflinks", but I clearly see it at https://en.wikipedia.org/wiki/User:Fæ/monobook.js?action=edit. This may be a separate issue, but this type of behavior is why I'm still very wary of the search index, particularly when its results are being used in important determinations (such as deprecating or killing code).

This is also why I'm still annoyed at needing to ask shell users to run mwgrep; text search via Special:Search is still (inexplicably) missing results. :-(

EBernhardson added a comment. Edited Dec 29 2015, 12:55 AM

We would have to change not only the main insource query, but also the highlight query to flip between the two fields. I haven't specifically looked, but I imagine the % of duplicated content is small, as most things are wikitext anyway.

I'm not sure I understood this part.

The query we issue to Elasticsearch for a simple 'insource:wg*' looks like this. It is slightly simplified from what happens with the production configuration, since it comes from a bare-bones mediawiki-vagrant configuration:

{
    "_source": [
        "id",
        "title",
        "namespace",
        "redirect.*",
        "timestamp",
        "text_bytes"
    ],
    "fields": "text.word_count",
    "query": {
        "filtered": {
            "query": {
                "match_all": {

                }
            },
            "filter": {
                "query": {
                    "safer": {
                        "phrase": {
                            "phrase_too_large_action": "convert_to_term_queries"
                        },
                        "query": {
                            "query_string": {
                                "query": "wg*",
                                "fields": [
                                    "source_text.plain"
                                ],
                                "default_operator": "AND",
                                "allow_leading_wildcard": false,
                                "fuzzy_prefix_length": 2,
                                "rewrite": "top_terms_boost_1024"
                            }
                        }
                    }
                }
            }
        }
    },
    "highlight": {
        "pre_tags": [
            "<span class=\"searchmatch\">"
        ],
        "post_tags": [
            "<\/span>"
        ],
        "fields": {
            "source_text.plain": {
                "type": "experimental",
                "number_of_fragments": 1,
                "fragmenter": "scan",
                "fragment_size": 150,
                "options": {
                    "top_scoring": true,
                    "boost_before": {
                        "20": 2,
                        "50": 1.8,
                        "200": 1.5,
                        "1000": 1.2
                    },
                    "max_fragments_scored": 5000
                },
                "no_match_size": 150,
                "highlight_query": {
                    "bool": {
                        "should": [
                            {
                                "safer": {
                                    "phrase": {
                                        "phrase_too_large_action": "convert_to_term_queries"
                                    },
                                    "query": {
                                        "query_string": {
                                            "query": "wg*",
                                            "fields": [
                                                "source_text.plain"
                                            ],
                                            "default_operator": "AND",
                                            "allow_leading_wildcard": false,
                                            "fuzzy_prefix_length": 2,
                                            "rewrite": "top_terms_boost_1024"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        }
    },
    "size": 20,
    "rescore": [
        {
            "window_size": 8192,
            "query": {
                "query_weight": 1,
                "rescore_query_weight": 1,
                "score_mode": "multiply",
                "rescore_query": {
                    "function_score": {
                        "functions": [
                            {
                                "field_value_factor_with_default": {
                                    "field": "incoming_links",
                                    "modifier": "log2p",
                                    "missing": 0
                                }
                            }
                        ]
                    }
                }
            }
        }
    ],
    "stats": [
        "full_text"
    ]
}
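One thing the request above makes visible is that the same query_string clause, and the same field name, appear in more than one place. The sketch below rebuilds just those parts as Python dicts to show the coupling; the helper names are invented for illustration, while the keys and values are copied from the request.

```python
import json

def query_string_clause(query, field):
    """The query_string block used by both the filter and the highlight_query."""
    return {
        "query_string": {
            "query": query,
            "fields": [field],
            "default_operator": "AND",
            "allow_leading_wildcard": False,
            "fuzzy_prefix_length": 2,
            "rewrite": "top_terms_boost_1024",
        }
    }

def safer(clause):
    """Wrap a clause in the 'safer' query seen in the request."""
    return {
        "safer": {
            "phrase": {"phrase_too_large_action": "convert_to_term_queries"},
            "query": clause,
        }
    }

field = "source_text.plain"
clause = safer(query_string_clause("wg*", field))

# The field name shows up three times: in the filter, as the highlight key,
# and inside the highlight_query.
request_fragment = {
    "query": {"filtered": {"query": {"match_all": {}}, "filter": {"query": clause}}},
    "highlight": {"fields": {field: {"highlight_query": {"bool": {"should": [clause]}}}}},
}
print(json.dumps(request_fragment, indent=4))
```

Switching a search between source_text and text therefore means changing every one of those places consistently, which is the extra work referred to in the earlier comment.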

To do the OR query, it would have to look like this (roughly; I don't have the version I tested earlier, so this is off the top of my head):

{
    "_source": [
        "id",
        "title",
        "namespace",
        "redirect.*",
        "timestamp",
        "text_bytes"
    ],
    "fields": "text.word_count",
    "query": {
        "filtered": {
            "query": {
                "match_all": {

                }
            },
            "filter": {
                "query": {
                    "safer": {
                        "phrase": {
                            "phrase_too_large_action": "convert_to_term_queries"
                        },
                        "bool": {
                            "should":[
                                { 
                                    "query": {
                                        "query_string": {
                                            "query": "wg*",
                                            "fields": [
                                                "source_text.plain"
                                            ],
                                            "default_operator": "AND",
                                            "allow_leading_wildcard": false,
                                            "fuzzy_prefix_length": 2,
                                            "rewrite": "top_terms_boost_1024"
                                        }
                                    } 
                                },
                                {
                                    "query": {
                                        "filtered": {
                                            "filter": {
                                                "missing": {
                                                    "field": "source_text"
                                                }
                                            },
                                            "query": {
                                                "query_string": {
                                                    "query": "wg*",
                                                    "fields": [
                                                        "text.plain"
                                                    ],
                                                    "default_operator": "AND",
                                                    "allow_leading_wildcard": false,
                                                    "fuzzy_prefix_length": 2,
                                                    "rewrite": "top_terms_boost_1024"
                                                }
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        }
    },
    "highlight": {
        "pre_tags": [
            "<span class=\"searchmatch\">"
        ],
        "post_tags": [
            "<\/span>"
        ],
        "fields": {
            "source_text.plain": {
                "type": "experimental",
                "number_of_fragments": 1,
                "fragmenter": "scan",
                "fragment_size": 150,
                "options": {
                    "top_scoring": true,
                    "boost_before": {
                        "20": 2,
                        "50": 1.8,
                        "200": 1.5,
                        "1000": 1.2
                    },
                    "max_fragments_scored": 5000
                },
                "no_match_size": 150,
                "highlight_query": {
                    "bool": {
                        "should": [
                            {
                                "safer": {
                                    "phrase": {
                                        "phrase_too_large_action": "convert_to_term_queries"
                                    },
                                    "query": {
                                        "filtered": {
                                            "filter": {
                                                "exists": {
                                                    "field": "source_text"
                                                }
                                            },
                                            "query": {
                                                "query_string": {
                                                    "query": "wg*",
                                                    "fields": [
                                                        "source_text.plain"
                                                    ],
                                                    "default_operator": "AND",
                                                    "allow_leading_wildcard": false,
                                                    "fuzzy_prefix_length": 2,
                                                    "rewrite": "top_terms_boost_1024"
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            },
            "text.plain": {
                "type": "experimental",
                "number_of_fragments": 1,
                "fragmenter": "scan",
                "fragment_size": 150,
                "options": {
                    "top_scoring": true,
                    "boost_before": {
                        "20": 2,
                        "50": 1.8,
                        "200": 1.5,
                        "1000": 1.2
                    },
                    "max_fragments_scored": 5000
                },
                "no_match_size": 150,
                "highlight_query": {
                    "bool": {
                        "should": [
                            {
                                "safer": {
                                    "phrase": {
                                        "phrase_too_large_action": "convert_to_term_queries"
                                    },
                                    "query": {
                                        "filtered": {
                                            "filter": {
                                                "missing": {
                                                    "field": "source_text"
                                                }
                                            },
                                            "query": {
                                                "query_string": {
                                                    "query": "wg*",
                                                    "fields": [
                                                        "text.plain"
                                                    ],
                                                    "default_operator": "AND",
                                                    "allow_leading_wildcard": false,
                                                    "fuzzy_prefix_length": 2,
                                                    "rewrite": "top_terms_boost_1024"
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        }
    },
    "size": 20,
    "rescore": [
        {
            "window_size": 8192,
            "query": {
                "query_weight": 1,
                "rescore_query_weight": 1,
                "score_mode": "multiply",
                "rescore_query": {
                    "function_score": {
                        "functions": [
                            {
                                "field_value_factor_with_default": {
                                    "field": "incoming_links",
                                    "modifier": "log2p",
                                    "missing": 0
                                }
                            }
                        ]
                    }
                }
            }
        }
    ],
    "stats": [
        "full_text"
    ]
}

Each field that gets highlighted has its own query that feeds the highlighter, so highlighting either the source_text or the text field requires duplicating that work.
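To make the duplication concrete, here is a minimal Python sketch that generates the two per-field highlight entries from one template. The helper names `highlight_query` and `build_highlight` are hypothetical, and only a subset of the options from the dump is reproduced; this illustrates the shape of the request and is not how CirrusSearch actually builds it.

```python
def highlight_query(field, presence_filter, user_query):
    """Build the per-field highlight_query; only the field name and the
    exists/missing filter differ between the two copies."""
    return {
        "bool": {
            "should": [
                {
                    "safer": {
                        "phrase": {
                            "phrase_too_large_action": "convert_to_term_queries"
                        },
                        "query": {
                            "filtered": {
                                "filter": {
                                    presence_filter: {"field": "source_text"}
                                },
                                "query": {
                                    "query_string": {
                                        "query": user_query,
                                        "fields": [field],
                                        "default_operator": "AND",
                                        "allow_leading_wildcard": False,
                                    }
                                },
                            }
                        },
                    }
                }
            ]
        }
    }


def build_highlight(user_query):
    """Return a 'highlight' section with one entry per highlighted field."""
    fields = {}
    # source_text.plain is used when source_text exists, text.plain otherwise.
    for field, presence_filter in (
        ("source_text.plain", "exists"),
        ("text.plain", "missing"),
    ):
        fields[field] = {
            "type": "experimental",
            "number_of_fragments": 1,
            "fragmenter": "scan",
            "fragment_size": 150,
            "no_match_size": 150,
            "highlight_query": highlight_query(field, presence_filter, user_query),
        }
    return {"fields": fields}


highlight = build_highlight("wg*")
```

Both entries carry a full copy of the user query, which is the duplicated work described above.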

EBernhardson added a comment.EditedDec 29 2015, 1:24 AM

Looking at https://en.wikipedia.org/wiki/User:F%C3%A6/monobook.js?action=cirrusdump, I don't see mention of "rmflinks", but I clearly see it at https://en.wikipedia.org/wiki/User:Fæ/monobook.js?action=edit. This may be a separate issue, but this type of behavior is why I'm still very wary of the search index, particularly when its results are being used in important determinations (such as deprecating or killing code).

This is also why I'm still annoyed at needing to ask shell users to run mwgrep; text search via Special:Search is still (inexplicably) missing results. :-(

I don't see how these are related to this ticket; please create a new one or your complaints will be lost in some closed ticket. The particular case with monobook.js looks to be related to Sanitizer::stripAllTags(). It will likely be low priority, as I'm not qualified to dig through and adjust our MediaWiki-wide security-related code.

I don't see how these are related to this ticket; please create a new one or your complaints will be lost in some closed ticket.

I don't think anyone cares whether insource: or just a regular text search works. The problem is that doing a search currently gives the appearance of working, by providing 9 results, but it's inexplicably missing certain pages. I've filed T122566 to track this issue. I mentioned it here because the comment was left on a task that got merged with this task.

The particular case with monobook.js looks to be related to Sanitizer::stripAllTags(). It will likely be low priority, as I'm not qualified to dig through and adjust our MediaWiki-wide security-related code.

So if source_text were populated for these "special" CSS and JavaScript wiki pages, presumably at least insource: queries would start working after a re-index? That would be a pretty big win compared to the current situation where both regular text searches and insource: searches fail for pages such as https://en.wikipedia.org/wiki/User:F%C3%A6/monobook.js.

Change 260903 merged by jenkins-bot:
For source code pages, index page contents as source_text

https://gerrit.wikimedia.org/r/260903
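As a rough illustration of what the change title describes, the sketch below populates source_text with the raw page contents for source code pages. It is written in Python for brevity and is not the merged patch: the function `build_search_doc` and the `SOURCE_CODE_MODELS` set are hypothetical names, and which content models count as "source code" is an assumption here.

```python
# Assumption: these content model IDs are treated as source code pages.
SOURCE_CODE_MODELS = {"javascript", "css"}


def build_search_doc(content_model, raw_content, rendered_text):
    """Return a simplified search document for one page.

    'text' always holds the rendered/stripped text. 'source_text' holds the
    raw contents for wikitext and (after the change) for source code pages.
    """
    doc = {"text": rendered_text, "source_text": None}
    if content_model == "wikitext" or content_model in SOURCE_CODE_MODELS:
        doc["source_text"] = raw_content
    return doc


js_doc = build_search_doc("javascript", "var x = wgPageName;", "var x = wgPageName;")
other_doc = build_search_doc("some-other-model", "raw", "rendered")
```

Documents that were indexed before such a change would keep their old shape until they are re-indexed.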

Hmmm, https://en.wikipedia.org/wiki/MediaWiki:Common.js?action=cirrusdump doesn't seem to have a source_text key at all currently:

[
  {
    "_index": "enwiki_general_1432193140",
    "_type": "page",
    "_id": "763577",
    "_version": [
      
    ],
    "_source": {
      "namespace": 8,
      "title": "Common.js",
      "timestamp": "2015-12-10T20:05:12Z",
      "text": "\/**  * Keep code in MediaWiki:Common.js to a minimum as it is unconditionally  * loaded for all users on every wiki page. If possible create a gadget that is  * enabled by default instead of adding it here (since gadgets are fully  * optimized ResourceLoader modules with possibility to add dependencies etc.)  *  * Since Common.js isn't a gadget, there is no place to declare its  * dependencies, so we have to lazy load them with mw.loader.using on demand and  * then execute the rest in the callback. In most cases these dependencies will  * be loaded (or loading) already and the callback will not be delayed. In case a  * dependency hasn't arrived yet it'll make sure those are loaded before this.  *\/  \/* global mw, $, importStylesheet, importScript *\/ \/* jshint strict:false, browser:true *\/  mw.loader.using( ['mediawiki.user', 'mediawiki.util', 'mediawiki.notify', 'jquery.client'] ).done( function () { \/* Begin of mw.loader.using callback *\/  \/**  * Main Page layout fixes  *  * Description: Adds an additional link to the complete list of languages available.  * Maintainers: [[User:AzaToth]], [[User:R. 
Koot]], [[User:Alex Smotrov]]  *\/ if ( mw.config.get( 'wgPageName' ) === 'Main_Page' || mw.config.get( 'wgPageName' ) === 'Talk:Main_Page' ) {     $( function () {         mw.util.addPortletLink( 'p-lang', '\/\/meta.wikimedia.org\/wiki\/List_of_Wikipedias',             'Complete list', 'interwiki-completelist', 'Complete list of Wikipedias' );     } ); }  \/**  * Redirect User:Name\/skin.js and skin.css to the current skin's pages  * (unless the 'skin' page really exists)  * @source: http:\/\/www.mediawiki.org\/wiki\/Snippets\/Redirect_skin.js  * @rev: 2  *\/ if ( mw.config.get( 'wgArticleId' ) === 0 && mw.config.get( 'wgNamespaceNumber' ) === 2 ) {     var titleParts = mw.config.get( 'wgPageName' ).split( '\/' );     \/* Make sure there was a part before and after the slash        and that the latter is 'skin.js' or 'skin.css' *\/     if ( titleParts.length == 2 ) {         var userSkinPage = titleParts.shift() + '\/' + mw.config.get( 'skin' );         if ( titleParts.slice( -1 ) == 'skin.js' ) {             window.location.href = mw.util.getUrl( userSkinPage + '.js' );         } else if ( titleParts.slice( -1 ) == 'skin.css' ) {             window.location.href = mw.util.getUrl( userSkinPage + '.css' );         }     } }  \/**  * Map addPortletLink to mw.util  * @deprecated: Use mw.util.addPortletLink instead.  *\/ mw.log.deprecate( window, 'addPortletLink', mw.util.addPortletLink, 'Use mw.util.addPortletLink instead' );  \/**  * Extract a URL parameter from the current URL  * @deprecated: Use mw.util.getParamValue with proper escaping  *\/ mw.log.deprecate( window, 'getURLParamValue', mw.util.getParamValue, 'Use mw.util.getParamValue instead' );  \/**  * Test if an element has a certain class  * @deprecated:  Use $(element).hasClass() instead.  
*\/ mw.log.deprecate( window, 'hasClass', function ( element, className ) {     return $( element ).hasClass( className ); }, 'Use jQuery.hasClass() instead' );  \/**  * @source www.mediawiki.org\/wiki\/Snippets\/Load_JS_and_CSS_by_URL  * @rev 6  *\/ var extraCSS = mw.util.getParamValue( 'withCSS' ),     extraJS = mw.util.getParamValue( 'withJS' );  if ( extraCSS ) {     if ( extraCSS.match( \/^MediaWiki:[^&=%#]*\\.css$\/ ) ) {         importStylesheet( extraCSS );     } else {         mw.notify( 'Only pages from the MediaWiki namespace are allowed.', { title: 'Invalid withCSS value' } );     } }  if ( extraJS ) {     if ( extraJS.match( \/^MediaWiki:[^&=%#]*\\.js$\/ ) ) {         importScript( extraJS );     } else {         mw.notify( 'Only pages from the MediaWiki namespace are allowed.', { title: 'Invalid withJS value' } );     } }  \/**  * Import more specific scripts if necessary  *\/ if ( mw.config.get( 'wgAction' ) === 'edit' || mw.config.get( 'wgAction' ) === 'submit' || mw.config.get( 'wgCanonicalSpecialPageName' ) === 'Upload' ) {     \/* scripts specific to editing pages *\/     importScript( 'MediaWiki:Common.js\/edit.js' ); } else if ( mw.config.get( 'wgCanonicalSpecialPageName' ) === 'Watchlist' ) {     \/* watchlist scripts *\/     importScript( 'MediaWiki:Common.js\/watchlist.js' ); }  \/**  * Fix for Windows XP Unicode font rendering  *\/ if ( navigator.appVersion.search(\/windows nt 5\/i) !== -1 ) {     mw.util.addCSS( '.IPA { font-family: \"Lucida Sans Unicode\", \"Arial Unicode MS\"; } ' +                 '.Unicode { font-family: \"Arial Unicode MS\", \"Lucida Sans Unicode\"; } ' ); }  \/**  * WikiMiniAtlas  *  * Description: WikiMiniAtlas is a popup click and drag world map.  *              This script causes all of our coordinate links to display the WikiMiniAtlas popup button.  *              The script itself is located on meta because it is used by many projects.  *              See [[Meta:WikiMiniAtlas]] for more information.  
* Maintainers: [[User:Dschwen]]  *\/ ( function () {     var require_wikiminiatlas = false;     var coord_filter = \/geohack\/;     $( function () {         $( 'a.external.text' ).each( function( key, link ) {             if ( link.href && coord_filter.exec( link.href ) ) {                 require_wikiminiatlas = true;                 \/\/ break from loop                 return false;             }         } );         if ( $( 'div.kmldata' ).length ) {             require_wikiminiatlas = true;         }         if ( require_wikiminiatlas ) {             mw.loader.load( '\/\/meta.wikimedia.org\/w\/index.php?title=MediaWiki:Wikiminiatlas.js&action=raw&ctype=text\/javascript' );         }     } ); } )();  \/**  * Collapsible tables  *  * Allows tables to be collapsed, showing only the header. See [[Wikipedia:NavFrame]].  *  * @version 2.0.3 (2014-03-14)  * @source https:\/\/www.mediawiki.org\/wiki\/MediaWiki:Gadget-collapsibleTables.js  * @author [[User:R. Koot]]  * @author [[User:Krinkle]]  * @deprecated Since MediaWiki 1.20: Use class=\"mw-collapsible\" instead which  * is supported in MediaWiki core.  
*\/  var autoCollapse = 2; var collapseCaption = 'hide'; var expandCaption = 'show'; var tableIndex = 0;  function collapseTable( tableIndex ) {     var Button = document.getElementById( 'collapseButton' + tableIndex );     var Table = document.getElementById( 'collapsibleTable' + tableIndex );      if ( !Table || !Button ) {         return false;     }      var Rows = Table.rows;     var i;      if ( Button.firstChild.data === collapseCaption ) {         for ( i = 1; i = autoCollapse && $( NavigationBoxes[i] ).hasClass( 'autocollapse' ) )         ) {             collapseTable( i );         }         else if ( $( NavigationBoxes[i] ).hasClass ( 'innercollapse' ) ) {             var element = NavigationBoxes[i];             while ((element = element.parentNode)) {                 if ( $( element ).hasClass( 'outercollapse' ) ) {                     collapseTable ( i );                     break;                 }             }         }     } }  mw.hook( 'wikipage.content' ).add( createCollapseButtons );  \/**  * Dynamic Navigation Bars (experimental)  *  * Description: See [[Wikipedia:NavFrame]].  
* Maintainers: UNMAINTAINED  *\/  \/* set up the words in your language *\/ var NavigationBarHide = '[' + collapseCaption + ']'; var NavigationBarShow = '[' + expandCaption + ']'; var indexNavigationBar = 0;  \/**  * Shows and hides content and picture (if available) of navigation bars  * Parameters:  *     indexNavigationBar: the index of navigation bar to be toggled  **\/ window.toggleNavigationBar = function ( indexNavigationBar, event ) {     var NavToggle = document.getElementById( 'NavToggle' + indexNavigationBar );     var NavFrame = document.getElementById( 'NavFrame' + indexNavigationBar );     var NavChild;      if ( !NavFrame || !NavToggle ) {         return false;     }      \/* if shown now *\/     if ( NavToggle.firstChild.data === NavigationBarHide ) {         for ( NavChild = NavFrame.firstChild; NavChild != null; NavChild = NavChild.nextSibling ) {             if ( $( NavChild ).hasClass( 'NavContent' ) || $( NavChild ).hasClass( 'NavPic' ) ) {                 NavChild.style.display = 'none';             }         }     NavToggle.firstChild.data = NavigationBarShow;      \/* if hidden now *\/     } else if ( NavToggle.firstChild.data === NavigationBarShow ) {         for ( NavChild = NavFrame.firstChild; NavChild != null; NavChild = NavChild.nextSibling ) {             if ( $( NavChild ).hasClass( 'NavContent' ) || $( NavChild ).hasClass( 'NavPic' ) ) {                 NavChild.style.display = 'block';             }         }         NavToggle.firstChild.data = NavigationBarHide;     }      event.preventDefault(); };  \/* adds show\/hide-button to navigation bars *\/ function createNavigationBarToggleButton( $content ) {     var NavChild;     \/* iterate over all -elements *\/     var $divs = $content.find( 'div' );     $divs.each( function ( i, NavFrame ) {         \/* if found a navigation bar *\/         if ( $( NavFrame ).hasClass( 'NavFrame' ) ) {              indexNavigationBar++;             var NavToggle = document.createElement( 'a' );     
        NavToggle.className = 'NavToggle';             NavToggle.setAttribute( 'id', 'NavToggle' + indexNavigationBar );             NavToggle.setAttribute( 'href', '#' );             $( NavToggle ).on( 'click', $.proxy( window.toggleNavigationBar, window, indexNavigationBar ) );              var isCollapsed = $( NavFrame ).hasClass( 'collapsed' );             \/**              * Check if any children are already hidden.  This loop is here for backwards compatibility:              * the old way of making NavFrames start out collapsed was to manually add style=\"display:none\"              * to all the NavPic\/NavContent elements.  Since this was bad for accessibility (no way to make              * the content visible without JavaScript support), the new recommended way is to add the class              * \"collapsed\" to the NavFrame itself, just like with collapsible tables.              *\/             for ( NavChild = NavFrame.firstChild; NavChild != null && !isCollapsed; NavChild = NavChild.nextSibling ) {                 if ( $( NavChild ).hasClass( 'NavPic' ) || $( NavChild ).hasClass( 'NavContent' ) ) {                     if ( NavChild.style.display === 'none' ) {                         isCollapsed = true;                     }                 }             }             if ( isCollapsed ) {                 for ( NavChild = NavFrame.firstChild; NavChild != null; NavChild = NavChild.nextSibling ) {                     if ( $( NavChild ).hasClass( 'NavPic' ) || $( NavChild ).hasClass( 'NavContent' ) ) {                         NavChild.style.display = 'none';                     }                 }             }             var NavToggleText = document.createTextNode( isCollapsed ? 
NavigationBarShow : NavigationBarHide );             NavToggle.appendChild( NavToggleText );              \/* Find the NavHead and attach the toggle link (Must be this complicated because Moz's firstChild handling is borked) *\/             for( var j = 0; j < NavFrame.childNodes.length; j++ ) {                 if ( $( NavFrame.childNodes[j] ).hasClass( 'NavHead' ) ) {                     NavToggle.style.color = NavFrame.childNodes[j].style.color;                     NavFrame.childNodes[j].appendChild( NavToggle );                 }             }             NavFrame.setAttribute( 'id', 'NavFrame' + indexNavigationBar );         }     } ); }  mw.hook( 'wikipage.content' ).add( createNavigationBarToggleButton );  \/**  * Uploadwizard_newusers  * Switches in a message for non-autoconfirmed users at [[Wikipedia:Upload]]  *  * Maintainers: [[User:Krimpet]]  *\/ function uploadwizard_newusers() {     if ( mw.config.get( 'wgNamespaceNumber' ) === 4 && mw.config.get( 'wgTitle' ) === 'Upload' && mw.config.get( 'wgAction' ) === 'view' ) {         var oldDiv = document.getElementById( 'autoconfirmedusers' ),             newDiv = document.getElementById( 'newusers' );         if ( oldDiv && newDiv ) {             var userGroups = mw.config.get( 'wgUserGroups' );             if ( userGroups ) {                 for ( var i = 0; i < userGroups.length; i++ ) {                     if ( userGroups[i] === 'autoconfirmed' ) {                         oldDiv.style.display = 'block';                         newDiv.style.display = 'none';                         return;                     }                 }             }             oldDiv.style.display = 'none';             newDiv.style.display = 'block';             return;         }     } }  $(uploadwizard_newusers);  \/**  * Magic editintros ****************************************************  *  * Description: Adds editintros on disambiguation pages and BLP pages.  
* Maintainers: [[User:RockMFR]]  *\/ function addEditIntro( name ) {     $( '.mw-editsection, #ca-edit' ).find( 'a' ).each( function ( i, el ) {         el.href = $( this ).attr( 'href' ) + '&editintro=' + name;     } ); }  if ( mw.config.get( 'wgNamespaceNumber' ) === 0 ) {     $( function () {         if ( document.getElementById( 'disambigbox' ) ) {             addEditIntro( 'Template:Disambig_editintro' );         }     } );      $( function () {         var cats = mw.config.get('wgCategories');         if ( !cats ) {             return;         }         if ( $.inArray( 'Living people', cats ) !== -1 || $.inArray( 'Possibly living people', cats ) !== -1 ) {             addEditIntro( 'Template:BLP_editintro' );         }     } ); }  \/* End of mw.loader.using callback *\/ } ); \/* DO NOT ADD CODE BELOW THIS LINE *\/",
      "text_bytes": 15715,
      "category": [
        
      ],
      "template": [
        
      ],
      "heading": [
        
      ],
      "outgoing_link": [
        
      ],
      "external_link": [
        
      ],
      "incoming_links": 813,
      "redirect": [
        
      ],
      "namespace_text": "MediaWiki",
      "file_text": [
        
      ],
      "auxiliary_text": [
        
      ],
      "opening_text": null,
      "language": "en",
      "version": 694675296,
      "version_type": "external"
    }
  }
]

But https://en.wiktionary.org/wiki/MediaWiki:Common.js?action=cirrusdump has a source_text key, set to null currently, until we re-index:

[
  {
    "_index": "enwiktionary_general_1415232726",
    "_type": "page",
    "_id": "872865",
    "_version": [
      
    ],
    "_source": {
      "namespace": 8,
      "title": "Common.js",
      "timestamp": "2015-10-14T13:39:51Z",
      "text": "'use strict'; \/* Any JavaScript here will be loaded for all users on every page load. *\/ \/\/ {{documentation}}  \/*jshint shadow:true, undef:true, latedef:true, unused:true, es3:true *\/ \/*global jQuery, mw, importScript, importStylesheet *\/  if (!Array.prototype.indexOf) \/\/ IE Array.prototype.indexOf = function (needle, fromIndex) {  return jQuery.inArray(needle, this, fromIndex); };  \/** [[WT:PREFS]] v2.0 **\/ try { (function () {  var prefs; try {  prefs = window.localStorage.getItem('AGprefs'); } catch (e) {  prefs = jQuery.cookie('AGprefs'); }  prefs = prefs && jQuery.parseJSON(prefs);  if (mw.config.get('wgUserGroups').indexOf('autoconfirmed') !== -1)  return;  if (mw.config.get('wgUserGroups').indexOf('user') === -1) {  \/\/ XXX: [[Wiktionary:Preferences\/V2]] is just a temporary page   mw.loader.using(['mediawiki.util'], function () {   mw.util.addPortletLink('p-personal', mw.util.getUrl('Wiktionary:Preferences\/V2'),    'Preferences', 'pt-agprefs', 'Personalise Wiktionary (settings are kept per-browser).', '',    document.getElementById('pt-createaccount'));  });    if ((mw.config.get('wgAction') === 'view') && (mw.config.get('wgPageName') === 'Wiktionary:Preferences\/V2')) {   mw.loader.load('ext.gadget.AGprefs'); \/\/ [[MediaWiki:Gadget-AGprefs.js]]  } }  if (!prefs)  return;  mw.loader.state('the_pope_is_an_atheist_woman_alien', 'missing'); for (var key in prefs.modules) {  if (prefs.modules[key]) {   mw.loader.load([key]);  } else {   \/\/ unavoidable race condition. 
to prevent it, every enabled-by-default gadget should have \"site\" as a dependency   if (mw.loader.getState(key) !== 'ready') {    mw.loader.moduleRegistry[key].dependencies.push('the_pope_is_an_atheist_woman_alien');    mw.loader.state(key, 'missing');   } else {    \/\/ XXX    mw.log.warn(key + \" could not be disabled; make sure it has 'site' declared as a dependency\");   }  } }  for (var key in prefs.sheets) {  importStylesheet('MediaWiki:Gadget-' + key); }  for (var key in prefs.scripts) {  importScript('MediaWiki:Gadget-' + key); }  if (mw.config.get('wgUserGroups').indexOf('user') !== -1) mw.loader.using(['mediawiki.notify', 'mediawiki.api'], function () {  var changes = [];  for (var key in prefs.gadgets)   changes.push('gadget-' + key + '=' + (prefs.gadgets[key] ? '1' : '0'));   (new mw.Api()).postWithToken('options', {   action: 'options',   change: changes.join('|')  }).then(function () {   jQuery.cookie('AGprefs', null);   try { window.localStorage.removeItem('AGprefs'); } catch (e) { \/* *\/ }   mw.notify(    jQuery('Your per-browser preferences have been migrated' +    'From now on, you should use your user preferences page. 
' +    'Preferences will no longer apply after you log out.')   );  }); });  })(); } catch (e) { mw.log.warn(e); }  mw.loader.using('mediawiki.util').done(function(){  \/** &withmodule= query parameter **\/  if (mw.util.getParamValue('withmodule'))   mw.loader.load(mw.util.getParamValue('withmodule').split(','));    \/** &preloadtext= and &preloadminor= **\/  if (mw.config.get('wgAction') === 'edit')  jQuery(document).ready(function() {   var wpTextbox1 = document.getElementById('wpTextbox1');   var wpMinoredit = document.getElementById('wpMinoredit');   if (!wpTextbox1)    return;    var preloadtext = mw.util.getParamValue('preloadtext');   var preloadminor = mw.util.getParamValue('preloadminor');     if (preloadtext && !wpTextbox1.value)    wpTextbox1.value = preloadtext;   if ((preloadminor !== null) && wpMinoredit)    wpMinoredit.checked = !\/^(0|false|no|)$\/i.test(preloadminor);  });   \/** Monthly subpages; see {{discussion recent months}} **\/  \/*  See also: [[Special:AbuseFilter\/43]]  *\/  if (\/^Wiktionary:(Beer_parlour|Grease_pit|Tea_room|Etymology_scriptorium|Information_desk)$\/.test(mw.config.get('wgPageName')))  jQuery(document).ready(function() {   var nNSR = document.getElementById('new-section-redirect').getElementsByTagName('a')[0];   var caAddSection = document.getElementById('ca-addsection');   if (!caAddSection) {    caAddSection = mw.util.addPortletLink(mw.config.get('skin') === 'vector' ? 
'p-views' : 'p-cactions',     nNSR.href, '+', 'ca-addsection', \"Start a new section\", '+', document.getElementById('ca-history')    );   } else {    caAddSection.getElementsByTagName('a')[0].href = nNSR.href;   }  }); });  \/** [[Special:PrefixIndex\/Unsupported titles]] **\/ if ((mw.config.get('wgAction') === 'view') && \/^Unsupported_titles\\\/\/.test(mw.config.get('wgPageName'))) jQuery(document).ready(function () {  var titleMap = {   'Left_curly_bracket'      : '{',   'Right_curly_bracket'     : '}',   'Left_square_bracket'     : '[',   'Right_square_bracket'    : ']',   'Less_than_sign'          : '<',   'Greater_than_sign'       : '>',   'Double_colon'            : '::',   'Colon_equals'            : ':=',   'Colon_left_paren'        : ':(',   'Colon_right_paren'       : ':)',   'Less_than_greater_than'  : '<>',   'Less_than_three'         : '<3',   'Colon_hyphen_left_paren' : ':-(',   'Colon_hyphen_right_paren': ':-)',   'Vertical_line'           : '|',   'Vertical_line_space_vertical_line' : '| |',   'C_sharp'                 : 'C#',   'Number_sign'             : '#',   'Number_sign_space_number_sign'     : '# #',   'Colon'                   : ':',   'Double_period'           : '..',   'Full_stop'               : '.',   'Low_line'                : '_',   'Replacement_character'   : '\\ufffd',   'Square_brackets'         : '[ ]',   'Curly_brackets'          : '{ }',   'Square_bracketed_ellipsis'   : '[\u2026]',   'Low_line_space_low_line'   : '_ _',      'Thai_name_of_Bangkok'    : '\u0e01\u0e23\u0e38\u0e07\u0e40\u0e17\u0e1e\u0e21\u0e2b\u0e32\u0e19\u0e04\u0e23 \u0e2d\u0e21\u0e23\u0e23\u0e31\u0e15\u0e19\u0e42\u0e01\u0e2a\u0e34\u0e19\u0e17\u0e23\u0e4c \u0e21\u0e2b\u0e34\u0e19\u0e17\u0e23\u0e32\u0e22\u0e38\u0e18\u0e22\u0e32\u0e21\u0e2b\u0e32\u0e14\u0e34\u0e25\u0e01\u0e20\u0e1e \u0e19\u0e1e\u0e23\u0e31\u0e15\u0e19\u0e4c\u0e23\u0e32\u0e0a\u0e18\u0e32\u0e19\u0e35\u0e1a\u0e38\u0e23\u0e35\u0e23\u0e21\u0e22\u0e4c 
\u0e2d\u0e38\u0e14\u0e21\u0e23\u0e32\u0e0a\u0e19\u0e34\u0e40\u0e27\u0e28\u0e19\u0e4c\u0e21\u0e2b\u0e32\u0e2a\u0e16\u0e32\u0e19 \u0e2d\u0e21\u0e23\u0e1e\u0e34\u0e21\u0e32\u0e19\u0e2d\u0e27\u0e15\u0e32\u0e23\u0e2a\u0e16\u0e34\u0e15 \u0e2a\u0e31\u0e01\u0e01\u0e30\u0e17\u0e31\u0e15\u0e15\u0e34\u0e22\u0e30\u0e27\u0e34\u0e29\u0e13\u0e38\u0e01\u0e23\u0e23\u0e21\u0e1b\u0e23\u0e30\u0e2a\u0e34\u0e17\u0e18\u0e34\u0e4c',   'Ancient_Greek_dish'      : '\u03bb\u03bf\u03c0\u03b1\u03b4\u03bf\u03c4\u03b5\u03bc\u03b1\u03c7\u03bf\u03c3\u03b5\u03bb\u03b1\u03c7\u03bf\u03b3\u03b1\u03bb\u03b5\u03bf\u03ba\u03c1\u03b1\u03bd\u03b9\u03bf\u03bb\u03b5\u03b9\u03c8\u03b1\u03bd\u03bf\u03b4\u03c1\u03b9\u03bc\u03c5\u03c0\u03bf\u03c4\u03c1\u03b9\u03bc\u03bc\u03b1\u03c4\u03bf\u03c3\u03b9\u03bb\u03c6\u03b9\u03bf\u03ba\u03b1\u03c1\u03b1\u03b2\u03bf\u03bc\u03b5\u03bb\u03b9\u03c4\u03bf\u03ba\u03b1\u03c4\u03b1\u03ba\u03b5\u03c7\u03c5\u03bc\u03b5\u03bd\u03bf\u03ba\u03b9\u03c7\u03bb\\u00AD\u03b5\u03c0\u03b9\u03ba\u03bf\u03c3\u03c3\u03c5\u03c6\u03bf\u03c6\u03b1\u03c4\u03c4\u03bf\u03c0\u03b5\u03c1\u03b9\u03c3\u03c4\u03b5\u03c1\u03b1\u03bb\u03b5\u03ba\u03c4\u03c1\u03c5\u03bf\u03bd\u03bf\u03c0\u03c4\u03bf\u03ba\u03b5\u03c6\u03b1\u03bb\u03bb\u03b9\u03bf\u03ba\u03b9\u03b3\u03ba\u03bb\u03bf\u03c0\u03b5\u03bb\u03b5\u03b9\u03bf\u03bb\u03b1\u03b3\u1ff3\u03bf\u03c3\u03b9\u03c1\u03b1\u03b9\u03bf\u03b2\u03b1\u03c6\u03b7\u03c4\u03c1\u03b1\u03b3\u03b1\u03bd\u03bf\u03c0\u03c4\u03b5\u03c1\u03cd\u03b3\u03c9\u03bd',      'Ideographic_space'       : '[ideographic space]',   'Space'                   : '[space]',   'Ogham_space'             : '[Ogham space]',    ''                        : ''  };  var newTitle = titleMap[mw.config.get('wgPageName').replace(\/^Unsupported_titles\\\/\/, '')] ||   (mw.config.get('wgTitle').replace(\/^Unsupported titles\\\/\/, ''));    var titleTag = document.getElementsByTagName('title')[0];  titleTag.innerHTML = titleTag.innerHTML.replace(\/^.*(?= -)\/, newTitle.replace(\/]+>\/g, ''));  
document.getElementById('firstHeading').innerHTML = newTitle; });  \/\/  if (mw.config.get('wgCanonicalSpecialPageName') == 'Badtitle') {  var m, rxArticlePath = new RegExp('^' + mw.config.get('wgArticlePath').replace('$1', '(.*)') + '$');  var title;  if ((m = rxArticlePath.exec(location.pathname))) {   title = decodeURIComponent(m[1]);  } else {   title = mw.util.getParamValue('title');  }    \/\/ not all titles are listed, because not all actually trigger the \"bad title\" message  var revTitleMap = {   '': 'Unsupported titles\/Greater than sign',   '{': 'Unsupported titles\/Left curly bracket',   '}': 'Unsupported titles\/Right curly bracket',   '[': 'Unsupported titles\/Left square bracket',   ']': 'Unsupported titles\/Right square bracket',   '_': 'Unsupported titles\/Low line',   ' ': 'Unsupported titles\/Space',   ':': 'Unsupported titles\/Colon',   '.': 'Unsupported titles\/Full stop',   '|': 'Unsupported titles\/Vertical line',      '::': 'Unsupported titles\/Double colon',   '': 'Unsupported titles\/Less than greater than',   '<3': 'Unsupported titles\/Less than three',      '[ ]': 'Unsupported titles\/Square brackets',   '{ }': 'Unsupported titles\/Curly brackets',    '\\ufffd': 'Unsupported titles\/Replacement character'  };  if (revTitleMap[title]) {   location.href = mw.util.getUrl(revTitleMap[title]);  } }  \/\/ The rest of the scripts are at [[MediaWiki:Gadget-legacy.js]]. \/\/ Most of them should be converted into gadgets as time and resources allow.",
      "text_bytes": 8516,
      "category": [
        
      ],
      "heading": [
        
      ],
      "outgoing_link": [
        
      ],
      "incoming_links": 86,
      "redirect": [
        
      ],
      "template": [
        
      ],
      "external_link": [
        
      ],
      "namespace_text": "MediaWiki",
      "file_text": [
        
      ],
      "auxiliary_text": [
        
      ],
      "source_text": null,
      "opening_text": null,
      "language": "en",
      "version": 34556571,
      "version_type": "external"
    }
  }
]

Weird.
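The difference between the two dumps (key absent on enwiki, key present but null on enwiktionary) can be checked mechanically. A minimal Python sketch, assuming the dump has the list-of-documents shape shown above; `source_text_status` is a hypothetical helper and the sample strings are trimmed stand-ins for the real cirrusdump output.

```python
import json


def source_text_status(cirrusdump_json):
    """Classify the source_text field of the first document in a cirrusdump."""
    docs = json.loads(cirrusdump_json)
    source = docs[0]["_source"]
    if "source_text" not in source:
        return "absent"
    if source["source_text"] is None:
        return "null"
    return "populated"


# Trimmed stand-ins for the two dumps above.
enwiki_like = '[{"_source": {"title": "Common.js", "text": "..."}}]'
enwiktionary_like = '[{"_source": {"title": "Common.js", "source_text": null}}]'

statuses = (source_text_status(enwiki_like), source_text_status(enwiktionary_like))
```

To use it on a live page, the input would be the body returned by the `?action=cirrusdump` URL.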

ksmith moved this task from Search to On Sprint Board on the Discovery board.Jan 14 2016, 5:42 PM
Restricted Application added a subscriber: Luke081515. · View Herald TranscriptJan 15 2016, 5:46 PM

The above patch was merged and seems to have been deployed, but the insource query listed in the description still doesn't work. Do we need to do a reindex, or something else?

I've kicked off the reindex of all non-wikitext pages on all wikis in a tmux session on terbium; unsure how long it will take. Will check back on it tomorrow.

Currently indexing dewikisource (goes in alphabetical order). I expect this to complete by Monday or so.

Currently indexing dewikisource (goes in alphabetical order). I expect this to complete by Monday or so.

Thanks! I'm going to leave this as open for now, until I can verify that the example query given above starts to work.

EBernhardson updated the task description. (Show Details)Jan 23 2016, 7:29 PM

Indexing is still in progress, but it's made it past enwiktionary. It turns out the query in the description was incorrect: because insource does pattern matching, it needed to be wg* rather than wg. Query results are now as expected.
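A simplified model of why wg finds nothing while wg* does: a bare term has to equal a whole token, whereas a trailing wildcard matches any token with that prefix. The Python sketch below uses a crude word tokenizer and lowercasing as stand-ins for the real analyzer, so it only approximates what Elasticsearch does; `matches` is a hypothetical helper.

```python
import re
from fnmatch import fnmatchcase


def matches(query, text):
    """Return True if any token of text matches the (possibly wildcard) query."""
    # Assumption: tokens are runs of word characters, compared in lowercase.
    tokens = [token.lower() for token in re.findall(r"\w+", text)]
    pattern = query.lower()
    return any(fnmatchcase(token, pattern) for token in tokens)


snippet = "if ( mw.config.get( 'wgPageName' ) === 'Main_Page' ) {"

exact = matches("wg", snippet)      # no token is exactly "wg"
prefix = matches("wg*", snippet)    # "wgpagename" starts with "wg"
```

Under this model, `exact` is False and `prefix` is True, which lines up with the behavior reported above.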

Deskana closed this task as Resolved.Jan 23 2016, 9:30 PM

Thanks! This is a small use case, but a few JavaScript/CSS pages can have a huge impact sometimes. :)