I initially started using LINQ because it made it easy to order the objects in a list without having to write a Comparer. Just write your lambda expression and BOOM! List sorted.
I want to take this thought a step further, and as implied by the post title, do a group by.
To start, here is an orderby on n % 2 that gives us a list of the even numbers followed by the odd numbers:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var orderedNumbers = from n in numbers
                     orderby n % 2 == 0 descending
                     select n;
foreach (var g in orderedNumbers)
{
    Console.Write("{0},", g);
}
This is all pretty straightforward: we order by whether each number mod 2 is 0, descending, so the evens come first, and we get the numbers 4,8,6,2,0,5,1,3,9,7.
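For comparison, here is a rough sketch of the same ordering written with method syntax instead of a query expression. The class and method names (OrderByDemo, EvensThenOdds) are made up for this example. It relies on OrderByDescending being a stable sort, which is why the numbers keep their original relative order within the even half and the odd half.

```csharp
using System;
using System.Linq;

public static class OrderByDemo
{
    // Hypothetical helper: evens first, then odds.
    // The sort key is a bool; descending puts true (even) before false (odd).
    public static int[] EvensThenOdds(int[] numbers)
    {
        return numbers.OrderByDescending(n => n % 2 == 0).ToArray();
    }

    public static void Main()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
        Console.WriteLine(string.Join(",", EvensThenOdds(numbers)));
        // prints 4,8,6,2,0,5,1,3,9,7
    }
}
```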
But what if I want to simply have two lists, one with evens and one with odds? That’s where group by comes in.
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var numberGroups = from n in numbers
                   group n by n % 2 into g
                   select new { Remainder = g.Key, Numbers = g };
foreach (var g in numberGroups)
{
    // Remainder is the group key: 0 for evens, 1 for odds
    if (g.Remainder == 0)
        Console.WriteLine("Even Numbers:");
    else
        Console.WriteLine("Odd Numbers:");
    foreach (var n in g.Numbers)
    {
        Console.WriteLine(n);
    }
}
with the output:
Odd Numbers:
5
1
3
9
7
Even Numbers:
4
8
6
2
0
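What’s happening here is that LINQ groups the numbers into something that works like a dictionary of lists: each group is a System.Linq.IGrouping<int, int> that carries a Key plus the numbers sharing that key, and the select wraps each group in an anonymous type. (What you actually get back is a lazily evaluated System.Linq.Enumerable.WhereSelectEnumerableIterator over System.Linq.IGrouping<int, int> groups, not a real Dictionary.)
The important thing to note is that everything is keyed on the expression that follows the “by”.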
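To make the dictionary comparison concrete, here is a minimal sketch that turns the groups into an actual Dictionary using GroupBy and ToDictionary. The names GroupByDemo and ByRemainder are invented for the example; the point is only that the expression after the “by” becomes the dictionary key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByDemo
{
    // Hypothetical helper: materialize the groups into a real dictionary
    // keyed on the remainder (0 = even, 1 = odd).
    public static Dictionary<int, List<int>> ByRemainder(int[] numbers)
    {
        return numbers
            .GroupBy(n => n % 2)                        // same key as "group n by n % 2"
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    public static void Main()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
        var groups = ByRemainder(numbers);
        Console.WriteLine("Odd: " + string.Join(" ", groups[1]));   // Odd: 5 1 3 9 7
        Console.WriteLine("Even: " + string.Join(" ", groups[0]));  // Even: 4 8 6 2 0
    }
}
```

Unlike the query expression, ToDictionary runs the grouping immediately, so this is a snapshot rather than a deferred query.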
Taking this one simple step further, let’s group a bunch of words. The following doesn’t work quite right:
string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };
var wordGroups = from w in words
                 group w by w[0] into g
                 select new { FirstLetter = g.Key.ToString().ToLower(), Words = g };
foreach (var g in wordGroups)
{
    Console.WriteLine("Words that start with the letter '{0}':", g.FirstLetter);
    foreach (var w in g.Words)
    {
        Console.WriteLine(w);
    }
}
giving us the output:
Words that start with the letter 'b':
blueberry
Words that start with the letter 'c':
Chimpanzee
Words that start with the letter 'a':
abacus
apple
Words that start with the letter 'b':
Banana
Words that start with the letter 'c':
cheese
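That’s because there is a bit of a red herring here. Remember that the value after the “by” is what the query groups on. In our case w[0] for Chimpanzee is 'C', not 'c', so the groups are formed on the case-sensitive character and only the displayed letter is lowercased in the select. If we change it to: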
string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };
var wordGroups = from w in words
                 group w by w[0].ToString().ToLower() into g
                 select new { FirstLetter = g.Key.ToString().ToLower(), Words = g };
foreach (var g in wordGroups)
{
    Console.WriteLine("Words that start with the letter '{0}':", g.FirstLetter);
    foreach (var w in g.Words)
    {
        Console.WriteLine(w);
    }
}
then we get the results we expect with:
Words that start with the letter 'b':
blueberry
Banana
Words that start with the letter 'c':
Chimpanzee
cheese
Words that start with the letter 'a':
abacus
apple
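As an aside, the key does not have to become a string to be case-insensitive. The sketch below is one possible variation, not the query used above: it keeps the key as a char by calling char.ToLowerInvariant. The names WordGroupDemo and Describe are made up for the example.

```csharp
using System;
using System.Linq;

public static class WordGroupDemo
{
    // Hypothetical helper: one "letter: words" line per group,
    // grouping on the lowercased first character.
    public static string[] Describe(string[] words)
    {
        var lines = from w in words
                    group w by char.ToLowerInvariant(w[0]) into g
                    select g.Key + ": " + string.Join(", ", g);
        return lines.ToArray();
    }

    public static void Main()
    {
        string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };
        foreach (var line in Describe(words))
        {
            Console.WriteLine(line);
        }
        // b: blueberry, Banana
        // c: Chimpanzee, cheese
        // a: abacus, apple
    }
}
```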
Taking this even one step further we can throw an orderby above the group and order things alphabetically:
var wordGroups = from w in words
                 orderby w[0].ToString().ToLower()
                 group w by w[0].ToString().ToLower() into g
                 select new { FirstLetter = g.Key.ToString().ToLower(), Words = g };
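Ordering before grouping works because the groups come out in the order their keys are first seen. A variation worth sketching (my own alternative, not the query above) is to sort the groups themselves by putting the orderby after the “into”, which sorts only the handful of keys rather than every word. OrderedWordGroupDemo and OrderedKeys are names invented for the example.

```csharp
using System;
using System.Linq;

public static class OrderedWordGroupDemo
{
    // Hypothetical helper: group first, then order the groups by their key.
    public static string[] OrderedKeys(string[] words)
    {
        var keys = from w in words
                   group w by w[0].ToString().ToLower() into g
                   orderby g.Key
                   select g.Key;
        return keys.ToArray();
    }

    public static void Main()
    {
        string[] words = { "blueberry", "Chimpanzee", "abacus", "Banana", "apple", "cheese" };
        Console.WriteLine(string.Join(",", OrderedKeys(words)));
        // prints a,b,c
    }
}
```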
So let’s now make this a bit over-the-top complex. Given the classes:
public class Customer
{
    public string CompanyName { get; set; }
    public List<Order> Orders { get; set; }
}
public class Order
{
    public DateTime OrderDate { get; set; }
    public int Total { get; set; }
}
let’s group a customer list by customer, then by year, then by month:
List<Customer> customers = GetCustomerList();
var customerOrderGroups = from c in customers
                          select new
                          {
                              c.CompanyName,
                              YearGroups = from o in c.Orders
                                           group o by o.OrderDate.Year into yg
                                           select new
                                           {
                                               Year = yg.Key,
                                               MonthGroups = from o in yg
                                                             group o by o.OrderDate.Month into mg
                                                             select new { Month = mg.Key, Orders = mg }
                                           }
                          };
Whew! That took a lot to copy and paste from MSDN’s sample library! 😉
As mentioned previously, the important part here is that the keys are the expressions after each “by”. This just creates a bunch of nested, dictionary-like groups: each customer holds year groups, each year group holds month groups, and each month group holds the orders.
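To see the nesting in action, here is a self-contained sketch that runs the same query over made-up data and walks the three levels. SampleCustomers is a hypothetical stand-in for GetCustomerList(), the company name and orders are invented, and the classes assume CompanyName and OrderDate properties, since that is what the query reads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public DateTime OrderDate { get; set; }
    public int Total { get; set; }
}

public class Customer
{
    public string CompanyName { get; set; }
    public List<Order> Orders { get; set; }
}

public static class NestedGroupDemo
{
    // Hypothetical stand-in for GetCustomerList(); the data is invented.
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer
            {
                CompanyName = "Contoso",
                Orders = new List<Order>
                {
                    new Order { OrderDate = new DateTime(2008, 1, 15), Total = 10 },
                    new Order { OrderDate = new DateTime(2008, 1, 20), Total = 20 },
                    new Order { OrderDate = new DateTime(2008, 3, 2), Total = 30 },
                    new Order { OrderDate = new DateTime(2009, 7, 4), Total = 40 }
                }
            }
        };
    }

    // Flatten the customer / year / month groups into one line per month group.
    public static List<string> Summarize(List<Customer> customers)
    {
        var groups = from c in customers
                     select new
                     {
                         c.CompanyName,
                         YearGroups = from o in c.Orders
                                      group o by o.OrderDate.Year into yg
                                      select new
                                      {
                                          Year = yg.Key,
                                          MonthGroups = from o in yg
                                                        group o by o.OrderDate.Month into mg
                                                        select new { Month = mg.Key, Orders = mg }
                                      }
                     };

        var lines = new List<string>();
        foreach (var c in groups)
            foreach (var y in c.YearGroups)
                foreach (var m in y.MonthGroups)
                    lines.Add(string.Format("{0}/{1}/{2}: {3} order(s)",
                        c.CompanyName, y.Year, m.Month, m.Orders.Count()));
        return lines;
    }

    public static void Main()
    {
        foreach (var line in Summarize(SampleCustomers()))
        {
            Console.WriteLine(line);
        }
        // Contoso/2008/1: 2 order(s)
        // Contoso/2008/3: 1 order(s)
        // Contoso/2009/7: 1 order(s)
    }
}
```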
The GroupBy method that is part of LINQ can also take an IEqualityComparer. Given the comparer:
public class AnagramEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return getCanonicalString(x) == getCanonicalString(y);
    }
    public int GetHashCode(string obj)
    {
        return getCanonicalString(obj).GetHashCode();
    }
    private string getCanonicalString(string word)
    {
        char[] wordChars = word.ToCharArray();
        Array.Sort<char>(wordChars);
        return new string(wordChars);
    }
}
we can find all the matching anagrams. This is possible because the IEqualityComparer compares words based on a sorted array of characters. If you take “meat” and “team” they both become “aemt” when sorted by their characters.
string[] anagrams = { "from", "salt", "earn", "last", "near", "form" };
var orderGroups = anagrams.GroupBy(
    w => w.Trim(),
    a => a.ToUpper(),
    new AnagramEqualityComparer()
);
foreach (var group in orderGroups)
{
    // Escape the inner quotes so the format string compiles
    Console.WriteLine("For the word \"{0}\" we found matches to:", group.Key);
    foreach (var word in group)
    {
        Console.WriteLine(word);
    }
}
Like the inline LINQ, the first argument selects the key and the second selects what to put into each group’s list. The last argument is the IEqualityComparer I mentioned earlier. We don’t get duplicate keys: “last” matches the existing “salt” key under the comparer, so it joins that group instead of starting a new one.
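If writing a comparer feels like too much ceremony, one alternative (a sketch of a different approach, not the method shown above) is to group directly on the sorted-characters string. The trade-off is that the key becomes the canonical form, such as "alst", rather than the first word seen. AnagramKeyDemo, Canonical, and Describe are names made up for the example.

```csharp
using System;
using System.Linq;

public static class AnagramKeyDemo
{
    // Sort a word's characters so that all anagrams map to the same string.
    public static string Canonical(string word)
    {
        char[] chars = word.ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }

    // Hypothetical helper: one "key: words" line per anagram group.
    public static string[] Describe(string[] words)
    {
        return words
            .GroupBy(w => Canonical(w))
            .Select(g => g.Key + ": " + string.Join(", ", g))
            .ToArray();
    }

    public static void Main()
    {
        string[] anagrams = { "from", "salt", "earn", "last", "near", "form" };
        foreach (var line in Describe(anagrams))
        {
            Console.WriteLine(line);
        }
        // fmor: from, form
        // alst: salt, last
        // aenr: earn, near
    }
}
```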
That’s all for now.
Brian