Archives for: September 2013

Fun with strings

Next up are a few fun facts about strings. My software development path went from C/C++ -> Java -> C#. In Java, strings always felt awkward. It felt strange to have to use .charAt to get at each character instead of indexing into the string like a char array, and it felt odd to have to use .equals to compare two strings. I know that complaint doesn't entirely make sense: you can't really compare C strings with == either (it compares pointers, so you need strcmp). C# took strings and made working with them a lot easier.
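To make the == point concrete, here is a minimal sketch (assuming a C# 9+ top-level program; the variable names are just for illustration). It builds two strings with the same contents as separate objects: == and Equals compare the characters, while a reference comparison shows they are still two distinct instances of a reference type.

```csharp
using System;

// Two strings with the same contents, built so they are separate objects.
string literal = "hello";
string built = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

// string overloads == (and Equals) to compare characters, not references.
Console.WriteLine(literal == built);                       // True
Console.WriteLine(literal.Equals(built));                  // True

// Underneath, they are still two distinct reference-type instances.
Console.WriteLine(object.ReferenceEquals(literal, built)); // False
Console.WriteLine((object)literal == (object)built);       // False: object's == compares references

// string is just the C# keyword alias for System.String.
Console.WriteLine(typeof(string) == typeof(String));       // True
```

Casting both sides to object is a quick way to see the reference comparison that == on string deliberately hides.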

  1. In Visual Studio, string is colored as a keyword (like int or bool) rather than as a class, which almost gives the impression that it is a primitive. But string is just an alias for System.String, which is a reference type. (Thanks, Burton! :))
  2. You can iterate over a string like an array, using its indexer to get each char.
    string word = "asdfqwer";
    for (int i = 0; i < word.Length; i++)
    {
    	char letter = word[i];
    	Debug.WriteLine(letter);
    }
  3. Since string implements IEnumerable<char>, you can treat it as such, for example in a foreach loop.
    string word = "asdfqwer";
    foreach(char letter in word)
    {
    	Debug.WriteLine(letter);
    }
  4. And since it's an IEnumerable<char>, you can use LINQ on it (with using System.Linq).
    string word = "asdfqwer";
    //doesn't work: prints the iterator's type name, e.g. System.Linq.Enumerable+<ReverseIterator>d__a0`1[System.Char]
    Debug.WriteLine(word.Reverse());
    //works but is ugly, and allocates a new string for every character
    Debug.WriteLine(word.Aggregate("", (acc, c) => c + acc));
    //works and is still a bit ugly, but probably the best solution
    Debug.WriteLine(new string(word.Reverse().ToArray()));
  5. string natively supports Unicode. The first example below is a grapheme: a base character plus a combining modifier (here, U+0308 COMBINING DIAERESIS) that together display as a single character. The second is the regular precomposed character U+00E4, which looks identical.
    string grapheme = "\u0061\u0308";
    Debug.WriteLine(grapheme);
    Debug.WriteLine(grapheme.Length);
    
    string singleChar = "\u00e4";
    Debug.WriteLine(singleChar);
    Debug.WriteLine(grapheme.IndexOf(singleChar));
    Debug.WriteLine(grapheme[0] == singleChar[0]);

    Output:

    ä (displayed as one character, but it is two chars: 'a' followed by U+0308)
    2
    ä
    0
    False
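  6. But be careful with Unicode. grapheme.IndexOf(singleChar) returns 0, because IndexOf(string) uses a culture-sensitive comparison that treats the two forms as equivalent. You might therefore think that grapheme[0] == singleChar[0], but it is not: grapheme[0] is the plain 'a' (U+0061), while singleChar[0] is 'ä' (U+00E4).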
  7. Strings are immutable: once you create a string, it cannot be changed. Any operation that appears to modify a string actually returns a new string instance.
    //create first string
    string word = "asdfqwer";
    //create another string
    word += "zxcv";
    //and another one
    word = word.Substring(0, 8);
  8. For this reason, if you are doing a lot of string concatenation (for example, in a loop), you may benefit from using a StringBuilder.
    Microsoft's warning on this:
    Although the StringBuilder class generally offers better performance than the String class, you should not automatically replace String with StringBuilder whenever you want to manipulate strings. Performance depends on the size of the string, the amount of memory to be allocated for the new string, the system on which your app is executing, and the type of operation. You should be prepared to test your app to determine whether StringBuilder actually offers a significant performance improvement.
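Items 5 and 6 come down to the fact that Length and the indexer work on UTF-16 chars, not on what a reader sees as one character. The sketch below is a minimal illustration, assuming a C# 9+ top-level program and that the runtime's globalization support (NLS on Windows, ICU elsewhere) is available for normalization. It uses two real .NET APIs the post doesn't mention: StringInfo.LengthInTextElements to count user-perceived characters, and String.Normalize to bring both spellings of ä into the same form before an ordinal comparison.

```csharp
using System;
using System.Globalization;
using System.Text;

string decomposed = "\u0061\u0308"; // 'a' + combining diaeresis
string precomposed = "\u00e4";      // 'ä' as a single code point

// Length counts UTF-16 chars, not user-perceived characters.
Console.WriteLine(decomposed.Length);  // 2
Console.WriteLine(precomposed.Length); // 1

// StringInfo counts text elements, so the combining sequence counts as one.
Console.WriteLine(new StringInfo(decomposed).LengthInTextElements); // 1

// An ordinal comparison looks at the raw chars, so the two forms differ...
Console.WriteLine(string.Equals(decomposed, precomposed, StringComparison.Ordinal)); // False

// ...until both are in the same normalization form.
// Form C (NFC) composes 'a' + U+0308 into the single char U+00E4.
string normalized = decomposed.Normalize(NormalizationForm.FormC);
Console.WriteLine(string.Equals(normalized, precomposed, StringComparison.Ordinal)); // True
Console.WriteLine(normalized.Length); // 1
```

Normalizing input once, at the boundary where it enters your program, is usually simpler than remembering to use culture-sensitive comparisons everywhere.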
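Item 8's suggestion can be sketched as follows, again as a C# 9+ top-level program. JoinWithConcat and JoinWithBuilder are hypothetical helper names used only for this illustration. Both build the same comma-separated string; the difference is that += creates a new string on every step, while StringBuilder appends into a growable buffer and produces a string once. This sketch does not measure anything, so per Microsoft's caution above, benchmark your own workload before switching.

```csharp
using System;
using System.Text;

// Concatenating in a loop: each += allocates a new string and copies
// everything built so far, so total copying grows roughly quadratically.
static string JoinWithConcat(int count)
{
    string result = "";
    for (int i = 0; i < count; i++)
    {
        result += i.ToString();
        result += ",";
    }
    return result;
}

// StringBuilder appends into an internal buffer that grows as needed
// and only creates the final string when ToString() is called.
static string JoinWithBuilder(int count)
{
    var sb = new StringBuilder();
    for (int i = 0; i < count; i++)
    {
        sb.Append(i).Append(',');
    }
    return sb.ToString();
}

string viaConcat = JoinWithConcat(5);
string viaBuilder = JoinWithBuilder(5);
Console.WriteLine(viaConcat);               // 0,1,2,3,4,
Console.WriteLine(viaConcat == viaBuilder); // True
```

For a handful of concatenations, plain + is perfectly fine (and often clearer); the builder pays off when the number of appends is large or unknown up front.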

Anyway, that's all I wanted to get out today, though there is a lot more. Strings are one of those classes we all use on a regular basis, so hopefully there was something here you didn't know.

Thanks,
Brian

Addendum: Prior to .NET 4.0, you could change the value of string.Empty via reflection. This no longer works as of .NET 4.0. I would never recommend doing this, but it is kind of funny.

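//needs: using System.Diagnostics; using System.Reflection;
//on .NET Core 3.0 and later, SetValue throws FieldAccessException for static readonly fields
typeof(string)
.GetField("Empty", BindingFlags.Static | BindingFlags.Public)
.SetValue(null, "foo");

//printed "foo" before .NET 4.0
Debug.WriteLine(string.Empty);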