LINQ group by and GroupBy

I initially started using LINQ because it made it easy to order the objects in a list without having to write a Comparer. Just write your lambda expression and BOOM! List sorted.

I want to take this thought a step further, and as implied by the post title, do a group by.

To start, here is an orderby on n % 2 that gives us the even numbers first and then the odd ones:

int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

var orderedNumbers = from n in numbers
                     orderby n % 2 == 0 descending
                     select n;

foreach (var g in orderedNumbers)
{
    Console.Write("{0},", g);
}

This is all pretty straightforward: order by whether the number modded by 2 is 0, evens first, and we get the numbers 4,8,6,2,0,5,1,3,9,7.
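The same ordering can also be written with method syntax. Here is a minimal sketch; the class and method names (ParityOrdering, EvensFirst) are made up for illustration and are not part of LINQ:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ParityOrdering
{
    // Evens first, then odds. OrderByDescending is a stable sort, so the
    // numbers keep their original relative order inside each half.
    public static List<int> EvensFirst(IEnumerable<int> numbers)
    {
        // The key is a bool; descending puts true (even) ahead of false (odd).
        return numbers.OrderByDescending(n => n % 2 == 0).ToList();
    }
}
```

Calling ParityOrdering.EvensFirst(numbers) on the array above should return the same sequence as the query expression.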

But what if I want to simply have two lists, one with evens and one with odds? That’s where group by comes in.

int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

var numberGroups = from n in numbers
                   group n by n % 2 into g
                   select new { Remainder = g.Key, Numbers = g };

foreach (var g in numberGroups)
{
    // Remainder is the group key: 0 for evens, 1 for odds.
    if (g.Remainder == 0)
        Console.WriteLine("Even Numbers:");
    else
        Console.WriteLine("Odd Numbers:");
    foreach (var n in g.Numbers)
    {
        Console.WriteLine(n);
    }
}

with the output:

Odd Numbers:
5
1
3
9
7
Even Numbers:
4
8
6
2
0

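What’s happening here is that the group clause produces a sequence of groups, each one an IGrouping<int, int> with a Key and the numbers that share that key, and the select wraps each group in an anonymous type. It behaves a bit like a dictionary, but it isn't one: it is a lazily evaluated sequence (at runtime it shows up as a System.Linq.Enumerable.WhereSelectEnumerableIterator over System.Linq.IGrouping<int, int>, which is an implementation detail).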

It is important to note that the key everything is grouped on is the value of the expression right after the “by”.
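To make the key a little more visible, here is a rough sketch of the same grouping in method syntax, working with the IGrouping objects directly. The ParityGroups class and its Describe method are hypothetical helpers, not anything from the framework:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ParityGroups
{
    // Returns one line per group in the form "key: member,member,...".
    public static List<string> Describe(IEnumerable<int> numbers)
    {
        // Method-syntax form of "group n by n % 2".
        IEnumerable<IGrouping<int, int>> groups = numbers.GroupBy(n => n % 2);

        var lines = new List<string>();
        foreach (IGrouping<int, int> g in groups)
        {
            // g.Key is the value of the expression after "by";
            // enumerating g yields the members of that group.
            lines.Add(g.Key + ": " + string.Join(",", g));
        }
        return lines;
    }
}
```

Groups come back in the order their keys are first seen, which is why the odd numbers show up first for this particular array.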

Taking this one simple step further, let’s group a bunch of words by their first letter. The following doesn’t work quite right:

string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };

var wordGroups = from w in words
                 group w by w[0] into g
                 select new { FirstLetter = g.Key.ToString().ToLower(), Words = g };

foreach (var g in wordGroups)
{
    Console.WriteLine("Words that start with the letter '{0}':", g.FirstLetter);
    foreach (var w in g.Words)
    {
        Console.WriteLine(w);
    }
}

giving us the output:

Words that start with the letter 'b':
blueberry
Words that start with the letter 'c':
Chimpanzee
Words that start with the letter 'a':
abacus
apple
Words that start with the letter 'b':
Banana
Words that start with the letter 'c':
cheese

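That’s because there is a bit of a red herring here. Remember that the value after the “by” is what is used to group by. The ToLower() in the select only changes what gets printed, not how the words are grouped. In our case w[0] for Chimpanzee is “C”, not “c”, so it lands in its own group. If we change it to: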

string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };

var wordGroups = from w in words
                 group w by w[0].ToString().ToLower() into g
                 select new { FirstLetter = g.Key, Words = g };  // the key is already a lowercase string

foreach (var g in wordGroups)
{
    Console.WriteLine("Words that start with the letter '{0}':", g.FirstLetter);
    foreach (var w in g.Words)
    {
        Console.WriteLine(w);
    }
}

then we get the results we expect with:

Words that start with the letter 'b':
blueberry
Banana
Words that start with the letter 'c':
Chimpanzee
cheese
Words that start with the letter 'a':
abacus
apple
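If you would rather keep a char key instead of converting to a string, something along these lines should also work. This is only a sketch: FirstLetterGroups and ByFirstLetter are names I made up, and the empty-string filter is an extra assumption that the sample data above doesn't actually need:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FirstLetterGroups
{
    // Groups words by their lower-cased first character.
    public static Dictionary<char, List<string>> ByFirstLetter(IEnumerable<string> words)
    {
        var groups = from w in words
                     where !string.IsNullOrEmpty(w)          // w[0] would throw on an empty string
                     group w by char.ToLowerInvariant(w[0]); // case is folded in the key itself

        // Materialize the lazy groups into a real dictionary.
        return groups.ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

The important part is the same as before: the case folding happens in the expression after the “by”, so “Banana” and “blueberry” end up under the same key.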

Taking this even one step further we can throw an orderby above the group and order things alphabetically:

var wordGroups = from w in words
                 orderby w[0].ToString().ToLower()
                 group w by w[0].ToString().ToLower() into g
                 select new { FirstLetter = g.Key.ToString().ToLower(), Words = g };
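Another option is to group first and then order the groups by their key. A minimal sketch, with SortedLetterGroups and Describe as hypothetical names chosen just for this example:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SortedLetterGroups
{
    // Returns one line per group, ordered by first letter, e.g. "a: abacus,apple".
    public static List<string> Describe(IEnumerable<string> words)
    {
        var wordGroups = from w in words
                         group w by w[0].ToString().ToLower() into g
                         orderby g.Key   // sort the groups, not the individual words
                         select new { FirstLetter = g.Key, Words = g };

        return wordGroups
            .Select(g => g.FirstLetter + ": " + string.Join(",", g.Words))
            .ToList();
    }
}
```

Here the orderby sits after the “into”, so it sorts the three groups rather than the six words. Within each group the words stay in their original order.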

So let’s now make this a bit over-the-top complex. Given the classes:

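public class Customer
{
    public string CompanyName { get; set; }   // used by the query below
    public List<Order> Orders { get; set; }
}

public class Order
{
    public DateTime OrderDate { get; set; }   // named to match o.OrderDate in the query below
    public int Total { get; set; }
}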

let’s take a customer list and, for each customer, group the orders by year and then by month:

List<Customer> customers = GetCustomerList();

var customerOrderGroups = from c in customers
                          select new
                          {
                              c.CompanyName,
                              YearGroups = from o in c.Orders
                                           group o by o.OrderDate.Year into yg
                                           select new
                                           {
                                               Year = yg.Key,
                                               MonthGroups = from o in yg
                                                             group o by o.OrderDate.Month into mg
                                                             select new { Month = mg.Key, Orders = mg }
                                           }
                          };

Whew! That took a lot to copy and paste from MSDN’s sample library! ;)
As mentioned previously, the important part here is that the key for each group is the value after the “by”. This just creates a bunch of groupings nested inside one another, each keyed on the value after its own “by”.
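To see the nesting in action without the sample library, here is a self-contained sketch that runs the same query shape over a tiny made-up data set. Everything in SampleCustomers (the company name "Acme", the dates, the totals) is invented for illustration, and NestedGroupingDemo and MonthlyTotals are hypothetical names; SampleCustomers simply stands in for GetCustomerList():

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public DateTime OrderDate { get; set; }
    public int Total { get; set; }
}

public class Customer
{
    public string CompanyName { get; set; }
    public List<Order> Orders { get; set; }
}

public static class NestedGroupingDemo
{
    // Made-up data standing in for GetCustomerList().
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer
            {
                CompanyName = "Acme",
                Orders = new List<Order>
                {
                    new Order { OrderDate = new DateTime(2008, 1, 5),  Total = 10 },
                    new Order { OrderDate = new DateTime(2008, 1, 20), Total = 20 },
                    new Order { OrderDate = new DateTime(2008, 3, 2),  Total = 30 },
                    new Order { OrderDate = new DateTime(2009, 7, 14), Total = 40 }
                }
            }
        };
    }

    // Walks the nested groups and flattens them into lines
    // of the form "Company Year-Month: sum of totals".
    public static List<string> MonthlyTotals(IEnumerable<Customer> customers)
    {
        var groups = from c in customers
                     select new
                     {
                         c.CompanyName,
                         YearGroups = from o in c.Orders
                                      group o by o.OrderDate.Year into yg
                                      select new
                                      {
                                          Year = yg.Key,
                                          MonthGroups = from o in yg
                                                        group o by o.OrderDate.Month into mg
                                                        select new { Month = mg.Key, Orders = mg }
                                      }
                     };

        var lines = new List<string>();
        foreach (var customer in groups)
            foreach (var year in customer.YearGroups)
                foreach (var month in year.MonthGroups)
                    lines.Add(customer.CompanyName + " " + year.Year + "-" + month.Month
                              + ": " + month.Orders.Sum(o => o.Total));
        return lines;
    }
}
```

Each foreach peels off one level of grouping: customer, then year, then month. The innermost group is still just a sequence of Order objects, so anything you can do to a list (Sum, Count, another group by) works on it.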

The GroupBy method that is part of LINQ can also take an IEqualityComparer. Given the comparer:

public class AnagramEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return getCanonicalString(x) == getCanonicalString(y);
    }

    public int GetHashCode(string obj)
    {
        return getCanonicalString(obj).GetHashCode();
    }

    private string getCanonicalString(string word)
    {
        char[] wordChars = word.ToCharArray();
        Array.Sort<char>(wordChars);
        return new string(wordChars);
    }
}

we can find all the matching anagrams. This is possible because the IEqualityComparer compares words based on a sorted array of characters. If you take “meat” and “team” they both become “aemt” when sorted by their characters.

string[] anagrams = { "from", "salt", "earn", "last", "near", "form" };

var orderGroups = anagrams.GroupBy(
                      w => w.Trim(),
                      a => a.ToUpper(),
                      new AnagramEqualityComparer()
                  );

foreach (var group in orderGroups)
{
    Console.WriteLine("For the word \"{0}\" we found matches to:", group.Key);
    foreach (var word in group)
    {
        Console.WriteLine(word);
    }
}

As with the inline LINQ, the first argument here selects the key and the second selects what to put into each group. The last argument is the IEqualityComparer I mentioned earlier. We don’t get double entries: the comparer treats “last” as equal to “salt”, so it joins the existing group (which keeps “salt”, the first key it saw) and there is no reason to add a new key.
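For comparison, you can get similar groups without a custom comparer by making the sorted letters themselves the key. This is a different technique from the comparer above, sketched here under made-up names (AnagramGroups, ByCanonicalKey):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnagramGroups
{
    // Groups words under their sorted letters, e.g. "from" and "form" under "fmor".
    public static Dictionary<string, List<string>> ByCanonicalKey(IEnumerable<string> words)
    {
        return words
            .GroupBy(
                w => new string(w.Trim().OrderBy(ch => ch).ToArray()), // key: letters in sorted order
                w => w.ToUpper())                                      // element: what goes in the group
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

The trade-off is in the keys. With the comparer the key stays a real word (“from”), while here it is the sorted letters (“fmor”), which is less friendly to print but needs no extra class.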

That’s all for now.

Brian
