[Migrated] Title case for citations
Closed, Declined (Public)

Description

Most people assume that you've got to keep the case that the source is using, but the MOS advises changing this to standard title case. So, may I suggest pushing a citation's title parameter through:

<source lang="csharp">
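using System.Globalization;

public static string ProperCase(string TextToFormat)
{
    // Only convert titles that are entirely upper case.
    if (TextToFormat.ToUpper() == TextToFormat) {
        // ToTitleCase leaves all-caps words untouched, so lower-case the text first.
        return new CultureInfo("en").TextInfo.ToTitleCase(TextToFormat.ToLower());
    } else {
        return TextToFormat;
    }
}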
</source>

to fix the most in-your-face, block-caps titles. I don't think that would leave you with any false positives. Cheers, - @Jarry1250 16:17, 20 June 2009 (UTC)

Event Timeline

Reguyla raised the priority of this task from to Needs Triage.
Reguyla updated the task description.
Reguyla added a project: AutoWikiBrowser.
Reguyla moved this task to General fixes on the AutoWikiBrowser board.

@Reedy, 19:50, 22 June 2009 (UTC) wrote:
Well, except for the non-English wikis ;)

@Jarry1250 (UTC) wrote:
Oh yeah :) I meant to change that "en" for some variable, but I couldn't be arsed to find out which was the right one.

@Jarry1250, 09:16, 28 June 2009 (UTC) wrote:
Maybe also external link titles? Harder to grab though, I would think.

@Rjwilmsi 10:27, 28 June 2009 (UTC) wrote:
When do you propose to convert the case of citation titles? Just when all in uppercase? Do you have some example articles?

@Jarry1250 10:31, 28 June 2009 (UTC) wrote:
Well, in a perfect world, a citation title of "EXAMPLE: Lorem ipsum" would be converted as well, but the false positive/pointless edit rate would be too high I fear. So yes, just when all in uppercase for maximum efficiency. I would like to see this as a general fix if possible, though I haven't tested the FP rate myself yet. I shall set about finding you an example now.

@Jarry1250 10:37, 28 June 2009 (UTC) wrote:
Ten random pages gave me https://en.wikipedia.org/wiki/Gaynor_Cawley ("BIOGRAPHY") and https://en.wikipedia.org/wiki/Mustafa_Ahmed_Hamlily which includes a partial one (ref #12).

@Dispenser 12:13, 28 June 2009 (UTC) wrote:
I've had some experience programming reflinks with this; you can get most of the cases right. Here are some edge cases:

  • Newspaper Archive: MINOR STORY OF THE DAY; MAN BITES DOG
  • 65_PDF.pdf
  • SPACE PROBE 56T LAUNCHES
  • A.I.D.S. EPIDEMIC STILL SPREADING
  • FOREIGN AIDS STILL MISSING
  • ATLAS USER EQUIPMENT INTRODUCTION
  • FIRST ROBOTICS GIVES HOPE
  • NAVSTAR GPS
  • J P PENNY

Those are some examples I can think of off the top of my head. It is also a good idea to apply it to the author/first/last/publisher fields as well.

@Jarry1250 18:58, 30 June 2009 (UTC) wrote:
Hey, thanks Dispenser. As written, the code doesn't touch .pdf (lowercase), capitalises "Of", and turns GPS into "Gps". The rest it gets right; hopefully, a few tweaks and it should be ready to roll.

@Jarry1250 20:16, 30 June 2009 (UTC) wrote:
Here's a much improved function for converting to useful title case, which is more palatable than block caps (I personally prefer sentence case, but that would be more controversial / less widely deployable). It works on all the examples above (and some more I invented), with the exception of acronyms that could be words (UNICEF, etc.). GPS has no vowels, and is therefore easy to capitalise.

<source lang="csharp">
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static string ProperCase(string TextToFormat)
{
    // Short words that stay lower case unless they start the title
    List<String> smalls = new List<String> { "and", "of", "the", "but", "in", "to", "a", "an" };

    // Only convert titles that are entirely upper case
    if (TextToFormat.ToUpper() == TextToFormat)
    {
        TextToFormat = new CultureInfo("en").TextInfo.ToTitleCase(TextToFormat.ToLower());

        // Ignore the first word, so that a leading small word keeps its capital
        String FirstBit = "";
        if (TextToFormat.Contains(" "))
        {
            int Index = TextToFormat.IndexOf(" ");
            FirstBit = TextToFormat.Substring(0, Index);
            TextToFormat = TextToFormat.Substring(Index);
        }

        // Lower-case each small word that has a non-alphanumeric character on both sides
        foreach (String small in smalls)
        {
            TextToFormat = Regex.Replace(TextToFormat, "([^a-zA-Z0-9])" + small + "([^a-zA-Z0-9])", "$1" + small + "$2", RegexOptions.IgnoreCase);
        }
        TextToFormat = FirstBit + TextToFormat;

        String[] Bits = TextToFormat.Split(" ".ToCharArray());
        for (int i = 0; i < Bits.Length; i++)
        {
            // Capitalise consonant-only words, plus a few obvious ones
            if (Regex.IsMatch(Bits[i], "^([BCDFGHJKLMNPQRSTVWXZ]{2,}|UK|USA)$", RegexOptions.IgnoreCase))
            {
                Bits[i] = Bits[i].ToUpper();
            }
        }
        return String.Join(" ", Bits);
    }
    else
    {
        return TextToFormat;
    }
}
</source>

@Dispenser 18:37, 2 July 2009 (UTC) wrote:
Maybe you should use a dictionary from a spellchecker to ensure words like GNU, LIDAR, and CBDTPA stay uppercased? You might also be able to capitalize names such as Ted Stevens.
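
The word-list idea above could be sketched roughly as follows. This is only an illustration: the class name `AcronymAwareTitleCase`, the method `Convert` and the hard-coded `KnownAcronyms` set are hypothetical, and a real list would presumably be loaded from a spellchecker dictionary rather than typed in.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class AcronymAwareTitleCase
{
    // Hypothetical hard-coded list standing in for a spellchecker dictionary.
    private static readonly HashSet<string> KnownAcronyms =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "GNU", "LIDAR", "CBDTPA", "GPS", "UNICEF" };

    public static string Convert(string title)
    {
        // Only touch titles that are entirely upper case, as in the thread's functions.
        if (title.ToUpper() != title) return title;

        TextInfo textInfo = new CultureInfo("en").TextInfo;
        string[] words = title.Split(' ');
        for (int i = 0; i < words.Length; i++)
        {
            // Compare without surrounding punctuation so that "GPS," still matches.
            string core = words[i].Trim(',', '.', ';', ':', '"', '(', ')');
            if (!KnownAcronyms.Contains(core))
            {
                // Not a listed acronym: convert this word to title case.
                words[i] = textInfo.ToTitleCase(words[i].ToLower());
            }
        }
        return string.Join(" ", words);
    }
}
```

Listed acronyms are left exactly as they appear in the block-caps title, and every other word is title-cased. Dotted forms such as "A.I.D.S." would not match a list entry of "AIDS" and would need separate handling.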

@Jarry1250 18:41, 2 July 2009 (UTC) wrote:
Yeah... it's a question of how much in the way of resources one chooses to give over to such a minor (albeit intensely annoying to me) thing as capitalisation... hopefully the major acronyms can be hardcoded, and the rest left to the individual editors to catch. As the default Is This Sort Of Capitalisation, We Needn't Worry About Names Of People.

@ClickRick 18:47, 2 July 2009 (UTC) wrote:
And no matter how much effort we throw at the problem, there will always be "yet another exception", e.g. https://en.wikipedia.org/wiki/CAT_scan
The answer has to be that this will be a computer-assisted process, not an entirely automated one.

@Rjwilmsi 15:38, 8 July 2009 (UTC) wrote:
This could be implemented as a general fix that users would have to explicitly turn on via the options menu (off by default) and that could be disabled for bots. The question then is just which fields this is required on beyond the 'title=' field of a citation template.

@Kumioko 00:47, 26 August 2011 (UTC) wrote:
I think this was added to AWB. Not sure though but we might be able to archive this one.

@Magioladitis 22:41, 21 August 2012 (UTC) wrote:
We haven't added it yet.

@Magioladitis 23:30, 24 May 2013 (UTC) wrote:
Please ask for a bot to do this.

@GoingBatty 19:07, 16 September 2013 (UTC) wrote:
@Ohconfucius - Does one of your scripts fix the capitalization in the |title= field of citation templates? (Maybe https://en.wikipedia.org/wiki/User:Ohconfucius/script/formatgeneral ?) Thanks!

@Magioladitis 23:40, 16 September 2013 (UTC) wrote:
I guess the one of Jarry1250 is fine for an example module?

@Ohconfucius 17 September 2013 (UTC) wrote:
I have for some time thought such downcasing would be good for the 'pedia but lack the know-how to implement such a fix. I have already built some case conversions into my formatting script to downcase certain combinations (mainly prepositions of fewer than 5 letters, per the MOS), and these are not restricted to titles. Most of my fixes are highly specific cases, such as "(\w )O(f|n|r) A(n? \w)" and "(\w )A(nd|t) T(he \w)", or "([Rr]unner)[\- ][Uu](?:ps)\b". I also downcase commonly capitalised "B(oard (?:of |))D(irectors?\W)" and "N(on[\s\-])[Ee](xecutive )D(irectors?\W)". I constantly make a note of exceptions, so the rate of FP is tending to zero. My sources script (https://en.wikipedia.org/wiki/User:Ohconfucius/script/Sources) also aligns cases for sources to the relevant WP article, even to camelcase where necessary, such as "AllMusic".
Sites frequently treat metadata as some sort of dustbin, and some often have FULLCAPS metadata. I suspect part of the fix could lie in https://en.wikipedia.org/wiki/WP:Reflinks not importing overcapitalised metadata 'as is'. The problem isn't going away and few people bother to fix these when they do occur; there may be the belief that we should cite sources verbatim, including the errant formatting. As suggested by @Dispenser above, I assume that we can easily ask for specific cases such as acronyms that are not ordinary words to be exempted ("CAT scan" and not "CAT").

@Ohconfucius 17 September 2013 (UTC) wrote:
I've just asked Jarry to help me incorporate this code (https://en.wikipedia.org/w/index.php?title=User_talk:Jarry1250&diff=prev&oldid=573247580).

Rjwilmsi changed the task status from Open to Stalled. Jun 19 2015, 12:15 PM

This would need to show clear MOS support and a process to identify acronyms and other exceptions before it could be added to main AWB.

https://en.wikipedia.org/wiki/Help:Citation_Style_1#Titles_and_chapters gives guidance to not use ALL CAPS. Maybe create a custom module that editors could choose to install, which could be used when they are manually checking their edits before saving?

@GoingBatty the custom module above does not work for you?

@Magioladitis - I asked the question because I didn't see any discussion of adding this code to AWB's custom module page. I tried to load the 30 June 2009 code as a custom module into SVN 11229 and received a compilation error stating:

Line 10, col 11: [CS0535] 'AutoWikiBrowser.CustomModules.CustomModule' does not implement interface member 'WikiFunctions.Plugin.IModule.ProcessArticle(string, string, int, out string, out bool)'

The custom module still needs the basic structure: it has to implement the required interface functions, whether or not you have extra functions of your own.

This comment was removed by Reedy.

@GoingBatty This is not a custom module. This is a function that could be used in a custom module and applied in a given text.
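
As a rough illustration of how such a function might sit inside a module, here is a minimal sketch. It assumes the method signature quoted in the CS0535 error above; the parameter names, the local stand-in interface `IModuleSketch`, the class name `TitleCaseModuleSketch` and the simplified `|title=` regex are all hypothetical. In AWB itself the class would implement `WikiFunctions.Plugin.IModule` rather than this stand-in.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Local stand-in that copies only the signature quoted in the CS0535 error;
// the parameter names are assumptions.
public interface IModuleSketch
{
    string ProcessArticle(string articleText, string articleTitle, int wikiNamespace,
                          out string summary, out bool skip);
}

public class TitleCaseModuleSketch : IModuleSketch
{
    // Matches "|title=VALUE" up to the next pipe, closing brace or newline.
    // Simplification: values containing nested templates or piped links are not handled.
    private static readonly Regex TitleParam =
        new Regex(@"(\|\s*title\s*=\s*)([^|}\n]+)");

    public string ProcessArticle(string articleText, string articleTitle, int wikiNamespace,
                                 out string summary, out bool skip)
    {
        // Run every title value through ProperCase, keeping the "|title=" prefix as is.
        string result = TitleParam.Replace(articleText,
            m => m.Groups[1].Value + ProperCase(m.Groups[2].Value));

        // Skip the page when nothing changed.
        skip = result == articleText;
        summary = skip ? "" : "title case for block-caps citation titles";
        return result;
    }

    // The simple 20 June 2009 function from the task description.
    private static string ProperCase(string text)
    {
        return text.ToUpper() == text
            ? new CultureInfo("en").TextInfo.ToTitleCase(text.ToLower())
            : text;
    }
}
```

The 30 June 2009 version of `ProperCase` could be dropped in place of the simple one without changing the surrounding structure.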

@Reguyla - Would it meet your needs if someone created an AWB custom module for you for this task?

@Magioladitis or @Reedy - If so, would one of you be willing to create the AWB custom module?

Thanks!

Magioladitis set Security to None.

User account disabled, hence boldly declining this task. Feel free to set the status of this task to "Open" via the Add Action...Change Status dropdown if someone is interested in this and can answer the previous questions.