I initially started using LINQ because it made it easy to order the objects in a list without having to write a Comparer. Just write your lambda expression and BOOM!, list sorted.
I want to take this thought a step further, and as implied by the post title, do a group by.
To start, here is an orderby on n % 2 that gives us a list of the even numbers followed by the odd numbers:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

// false sorts before true, so "descending" puts the even numbers first
var orderedNumbers =
    from n in numbers
    orderby n % 2 == 0 descending
    select n;

foreach (var g in orderedNumbers)
{
    Console.Write("{0},", g);
}
This is all pretty straightforward: order by whether the number mod 2 is 0, descending, and we get the numbers 4,8,6,2,0,5,1,3,9,7.
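The same ordering can be written with the extension methods instead of the query syntax. Here is a minimal sketch; the class name EvenFirstDemo and the helper EvensFirst are names I made up for illustration. It relies on two things: false sorts before true, and the LINQ to Objects sort is stable, so numbers with the same parity keep their original order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EvenFirstDemo
{
    // Hypothetical helper: even numbers first, original order kept within
    // each half because OrderByDescending performs a stable sort.
    public static int[] EvensFirst(IEnumerable<int> numbers)
    {
        return numbers.OrderByDescending(n => n % 2 == 0).ToArray();
    }

    public static void Main()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

        // Prints 4,8,6,2,0,5,1,3,9,7
        Console.WriteLine(string.Join(",", EvensFirst(numbers)));
    }
}
```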
But what if I want to simply have two lists, one with evens and one with odds? That’s where group by comes in.
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

var numberGroups =
    from n in numbers
    group n by n % 2 into g
    select new { Remainder = g.Key, Numbers = g };

foreach (var g in numberGroups)
{
    // A remainder of 0 means the group holds the even numbers
    if (g.Remainder == 0)
        Console.WriteLine("Even Numbers:");
    else
        Console.WriteLine("Odd Numbers:");

    foreach (var n in g.Numbers)
    {
        Console.WriteLine(n);
    }
}
with the output:
Odd Numbers:
5
1
3
9
7
Even Numbers:
4
8
6
2
0
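What’s happening here is that LINQ is using anonymous types to build something that behaves a bit like a dictionary. It isn’t actually one: the query is a lazily evaluated sequence with one anonymous object per group, and each Numbers property holds a System.Linq.IGrouping<int, int>, which pairs a key with the elements that share it. The groups come out in the order their keys first appear, which is why the odd numbers are listed first.
The important thing to note is that the key everything is grouped on is the expression after the “by”.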
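If you do want dictionary-style access by key, ToLookup is one way to get it. This is only a sketch; ParityGroupingDemo and GroupByRemainder are names I invented for the example. The lambda passed to ToLookup plays the same role as the expression after the “by” in the query syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParityGroupingDemo
{
    // Hypothetical helper: the key selector (n % 2) is the method-syntax
    // counterpart of the expression after "by".
    public static ILookup<int, int> GroupByRemainder(IEnumerable<int> numbers)
    {
        return numbers.ToLookup(n => n % 2);
    }

    public static void Main()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
        ILookup<int, int> byRemainder = GroupByRemainder(numbers);

        // Index by key, much like a dictionary of lists.
        Console.WriteLine("Even: " + string.Join(",", byRemainder[0]));
        Console.WriteLine("Odd: " + string.Join(",", byRemainder[1]));
    }
}
```

Unlike the query above, ToLookup runs immediately rather than lazily, and indexing a key that isn’t present gives back an empty sequence instead of throwing.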
Taking this one simple step further, let’s group a bunch of words. The following doesn’t work quite right:
string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };

var wordGroups =
    from w in words
    group w by w[0] into g
    select new { FirstLetter = g.Key.ToString().ToLower(), Words = g };

foreach (var g in wordGroups)
{
    Console.WriteLine("Words that start with the letter '{0}':", g.FirstLetter);
    foreach (var w in g.Words)
    {
        Console.WriteLine(w);
    }
}
giving us the output:
Words that start with the letter 'b':
blueberry
Words that start with the letter 'c':
Chimpanzee
Words that start with the letter 'a':
abacus
apple
Words that start with the letter 'b':
Banana
Words that start with the letter 'c':
cheese
That’s because there is a bit of a red herring here. Remember that the value after the “by” is what is used to group by. In our case w[0] for Chimpanzee is “C”, not “c”, and the output hides that because the select lowercases the key only for display. If we change it to:
string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };

var wordGroups =
    from w in words
    group w by w[0].ToString().ToLower() into g
    // the key is already a lowercase string, so it can be used as-is
    select new { FirstLetter = g.Key, Words = g };

foreach (var g in wordGroups)
{
    Console.WriteLine("Words that start with the letter '{0}':", g.FirstLetter);
    foreach (var w in g.Words)
    {
        Console.WriteLine(w);
    }
}
then we get the results we expect with:
Words that start with the letter 'b':
blueberry
Banana
Words that start with the letter 'c':
Chimpanzee
cheese
Words that start with the letter 'a':
abacus
apple
Taking this one step further, we can put an orderby above the group so that the groups come out in alphabetical order:
var wordGroups =
    from w in words
    orderby w[0].ToString().ToLower()
    group w by w[0].ToString().ToLower() into g
    select new { FirstLetter = g.Key, Words = g };
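For comparison, here is roughly the same thing in method syntax, as a sketch rather than a drop-in replacement. FirstLetterGroupingDemo and GroupByFirstLetter are made-up names, and I’m assuming none of the words is empty, since w[0] would throw on an empty string. It differs from the query above in two ways: it sorts the groups after grouping instead of sorting the words first, and it lowercases with char.ToLowerInvariant so the key stays a char.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstLetterGroupingDemo
{
    // Hypothetical helper: group on the lowercased first letter, then
    // order the groups by that key. Assumes no word is empty.
    public static IEnumerable<IGrouping<char, string>> GroupByFirstLetter(IEnumerable<string> words)
    {
        return words
            .GroupBy(w => char.ToLowerInvariant(w[0]))
            .OrderBy(g => g.Key);
    }

    public static void Main()
    {
        string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };

        foreach (var g in GroupByFirstLetter(words))
        {
            Console.WriteLine("Words that start with the letter '{0}':", g.Key);
            foreach (var w in g)
            {
                Console.WriteLine(w);
            }
        }
    }
}
```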
So let’s now make this a bit over-the-top complex. Given the classes:
public class Customer
{
    public string CompanyName { get; set; }
    public List<Order> Orders { get; set; }
}

public class Order
{
    public DateTime OrderDate { get; set; }
    public int Total { get; set; }
}
let’s group a customer list by customer, then by year, then by month:
List<Customer> customers = GetCustomerList();

var customerOrderGroups =
    from c in customers
    select new
    {
        c.CompanyName,
        YearGroups =
            from o in c.Orders
            group o by o.OrderDate.Year into yg
            select new
            {
                Year = yg.Key,
                MonthGroups =
                    from o in yg
                    group o by o.OrderDate.Month into mg
                    select new { Month = mg.Key, Orders = mg }
            }
    };
Whew! That took a lot to copy and paste from MSDN’s sample library! 😉
As mentioned previously, the important part here is that the key for each group is the expression after the “by”. The query just builds a set of groupings nested inside one another, each keyed on the value after its own “by”.
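If the nested shape is more than you need, one alternative is to group on a composite key. The sketch below is my own illustration, not part of the MSDN sample: SampleOrder, YearMonthGroupingDemo and TotalsByYearMonth are hypothetical names, and the order data is made up. It works because anonymous types compare by value, so two orders from the same year and month produce equal keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an order; only the fields the example needs.
public class SampleOrder
{
    public DateTime OrderDate { get; set; }
    public int Total { get; set; }
}

public static class YearMonthGroupingDemo
{
    // Returns one (year, month, summed total) tuple per distinct year/month pair.
    public static List<Tuple<int, int, int>> TotalsByYearMonth(IEnumerable<SampleOrder> orders)
    {
        var query =
            from o in orders
            group o by new { o.OrderDate.Year, o.OrderDate.Month } into g
            orderby g.Key.Year, g.Key.Month
            select Tuple.Create(g.Key.Year, g.Key.Month, g.Sum(x => x.Total));

        return query.ToList();
    }

    public static void Main()
    {
        // Made-up sample data.
        var orders = new List<SampleOrder>
        {
            new SampleOrder { OrderDate = new DateTime(2023, 1, 5), Total = 10 },
            new SampleOrder { OrderDate = new DateTime(2024, 1, 10), Total = 3 },
            new SampleOrder { OrderDate = new DateTime(2023, 2, 1), Total = 7 },
            new SampleOrder { OrderDate = new DateTime(2023, 1, 20), Total = 5 }
        };

        foreach (var t in TotalsByYearMonth(orders))
        {
            Console.WriteLine("{0}-{1:00}: {2}", t.Item1, t.Item2, t.Item3);
        }
    }
}
```

The trade-off is that you get one flat list keyed on year and month rather than a tree you can walk level by level.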
The GroupBy method that is part of LINQ can also take an IEqualityComparer. Given the comparer:
public class AnagramEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return getCanonicalString(x) == getCanonicalString(y);
    }

    public int GetHashCode(string obj)
    {
        return getCanonicalString(obj).GetHashCode();
    }

    // Sorts the characters so that all anagrams map to the same string
    private string getCanonicalString(string word)
    {
        char[] wordChars = word.ToCharArray();
        Array.Sort<char>(wordChars);
        return new string(wordChars);
    }
}
we can find all the matching anagrams. This is possible because the IEqualityComparer compares words based on a sorted array of characters. If you take “meat” and “team” they both become “aemt” when sorted by their characters.
string[] anagrams = { "from", "salt", "earn", "last", "near", "form" };

var orderGroups = anagrams.GroupBy(
    w => w.Trim(),                 // key selector
    a => a.ToUpper(),              // element selector
    new AnagramEqualityComparer()  // decides which keys count as equal
);

foreach (var group in orderGroups)
{
    Console.WriteLine("For the word \"{0}\" we found matches to:", group.Key);
    foreach (var word in group)
    {
        Console.WriteLine(word);
    }
}
As with the inline LINQ, the first argument selects the key and the second selects what goes into each group’s list. The last argument is the IEqualityComparer I mentioned earlier. We don’t get double entries, since “last” matches “salt” under the comparer and there is therefore no reason to add a new key. Each group’s key is the first word seen for that group, so the keys here are “from”, “salt” and “earn”.
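If writing a comparer feels like overkill, another option is to do the canonicalizing in the key selector itself. This is a sketch of that alternative, not what the code above does: AnagramKeyDemo, Canonical and GroupAnagrams are names I made up. The visible difference is that each key becomes the sorted-letter string (for example “alst”) rather than the first word seen.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnagramKeyDemo
{
    // Hypothetical helper: sorts a word's characters so that all
    // anagrams map to the same string ("salt" and "last" -> "alst").
    public static string Canonical(string word)
    {
        char[] chars = word.ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }

    // Groups words under their canonical (sorted-letter) form.
    public static Dictionary<string, List<string>> GroupAnagrams(IEnumerable<string> words)
    {
        return words
            .GroupBy(w => Canonical(w.Trim()))
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    public static void Main()
    {
        string[] anagrams = { "from", "salt", "earn", "last", "near", "form" };

        foreach (var pair in GroupAnagrams(anagrams))
        {
            Console.WriteLine("{0}: {1}", pair.Key, string.Join(", ", pair.Value));
        }
    }
}
```

The comparer version is still the better fit when you want to keep a real word as the key or reuse the same equality rule elsewhere, for example in a HashSet or Dictionary.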
That’s all for now.
Brian