Archives for : November2013

Part One: Starting with MVVM
Part Two: The MVVM solution structure and basic framework
Part Three: Base Classes
Part 4: Sampler View, View Model and Model
Part 5: Running and working with the TPL samples
Part 6: Parallel.For Sample

In the last post we discussed where using a Parallel.For isn’t effective. The answer is fairly straightforward, Parallel.For (and by extension Parallel.ForEach) isn’t effective when you can’t give it enough work. Spinning off threads from the Thread Pool has its own overhead and if you can’t give the threads enough work it doesn’t make sense. Today we are going to discuss using Parallel.For effectively and what you have to change to convert from using a for to a Parallel.For.

GreyScaleSample.Run()

public override void Run(System.Drawing.Bitmap bmp = null, Action<string> UpdateLog = null)
{
	if(bmp == null)
		throw new InvalidOperationException("Bitmap must be defined.");
	
	Stopwatch s = new Stopwatch();
	s.Start();

	System.Drawing.Imaging.BitmapData bmData = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, System.Drawing.Imaging.PixelFormat.Format24bppRgb);
	int stride = bmData.Stride;
	System.IntPtr Scan0 = bmData.Scan0;
	unsafe
	{
		byte* p = (byte*)(void*)Scan0;
		byte red, green, blue;

		for (int y = 0; y < bmp.Height; ++y)
		{
			for (int x = 0; x < bmp.Width; ++x)
			{
				blue = p[0];
				green = p[1];
				red = p[2];

				p[0] = p[1] = p[2] = (byte)(.299 * red
					+ .587 * green
					+ .114 * blue);

				p += 3;
			}
		}
	}
	bmp.UnlockBits(bmData);

	s.Stop();
	RunTime = s.Elapsed;
}

In the above sample we iterate over the image, starting at the first row (which is Scan0 but is redefined as “p” for pixel just for clarity of the code) and then iterating over the columns in that row. A bitmap is made up of a long byte array where every three bytes is the blue, green and red colors (which seems opposite of what we expect) that make up a pixel. The width of the row is defined by the stride but this is really the same thing as the width of the bitmap. We get the RGB values and reset the pixels to the gray value of the color. We then increment the pixel by 3 (since it represents the 3 bytes of RGB) and move on to the next one.

There is some messy pointer stuff here but all-in-all the code should be clear in what we’re doing.

GreyScaleParallelSample.Run()

public override void Run(System.Drawing.Bitmap bmp = null, Action<string> UpdateLog = null)
{
	if(bmp == null)
		throw new InvalidOperationException("Bitmap must be defined.");
	
	Stopwatch s = new Stopwatch();
	s.Start();

	System.Drawing.Imaging.BitmapData bmData = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, System.Drawing.Imaging.PixelFormat.Format24bppRgb);
	int stride = bmData.Stride;
	System.IntPtr Scan0 = bmData.Scan0;
	unsafe
	{
		byte* start = (byte*)(void*)Scan0;

		int height = bmp.Height;
		int width = bmp.Width;

		Parallel.For(0, height, y =>
		{
			byte* p = start + (y * stride);
			for (int x = 0; x < width; ++x)
			{
				byte blue = p[0];
				byte green = p[1];
				byte red = p[2];

				p[0] = p[1] = p[2] = (byte)(.299 * red
					+ .587 * green
					+ .114 * blue);

				p += 3;
			}
		});
	}
	bmp.UnlockBits(bmData);

	s.Stop();
	RunTime = s.Elapsed;
}

In the Parallel.For sample things are a bit different and these differences are important.

First off we have to remember that each loop of the Parallel.For is a seperate thread. As such there can’t be any variables that will be modified that are common between the loops(at least not without using Interlocked but that’s a different post). Imagine if the pointer to the pixel was common between the threads like it is in the first sample. If the thread pool spawns off 10 threads they would all have that same initial value for the pixel. This is problematic and as such the code is changed here to recalculate the pixel at the start of the row at the beginning of each iteration.

Second we move the declaration of the bytes for blue, green and red into the inner loop. This was only done originally merely for more evident code is isn’t really a functional change.

GreyScaleDoubleParallelSample.Run()

public override void Run(System.Drawing.Bitmap bmp = null, Action<string> UpdateLog = null)
{
	if(bmp == null)
		throw new InvalidOperationException("Bitmap must be defined.");
	
	Stopwatch s = new Stopwatch();
	s.Start();

	System.Drawing.Imaging.BitmapData bmData = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, System.Drawing.Imaging.PixelFormat.Format24bppRgb);
	int stride = bmData.Stride;
	System.IntPtr Scan0 = bmData.Scan0;
	unsafe
	{
		byte* start = (byte*)(void*)Scan0;

		int height = bmp.Height;
		int width = bmp.Width;

		Parallel.For(0, height, y =>
		{
			Parallel.For(0, width, x =>
			{
				byte* p = (start + (y * stride)) + (x * 3);
				byte blue = p[0];
				byte green = p[1];
				byte red = p[2];

				p[0] = p[1] = p[2] = (byte)(.299 * red
					+ .587 * green
					+ .114 * blue);
			});
		});
	}
	bmp.UnlockBits(bmData);

	s.Stop();
	RunTime = s.Elapsed;
}

Finally we have a sample that works pretty much like LineParallelSample.Run() (except here we’re setting the pixel to gray instead of black). The code spins off a thread for each row and then within that thread spins off a thread for setting each pixel. Again, we have to move the pixel declaration internal to the inner Parallel.For since this value will be modified and must be unique to each thread.

Running the samples you will get results similar to:

Reseting Image
Starting Grey Scale Sample
Completed Grey Scale Sample
Grey Scale Sample ran in 00:00:00.0268376

Reseting Image
Starting Grey Scale Parallel Sample
Completed Grey Scale Parallel Sample
Grey Scale Parallel Sample ran in 00:00:00.0020127

Reseting Image
Starting Grey Scale Double Parallel Sample
Completed Grey Scale Double Parallel Sample
Grey Scale Double Parallel Sample ran in 00:00:00.0037469

This is the results run with the included image of my son which is 93KB, a small image.

I have another image I test against which is ~8MB. This results in:

Reseting Image
Starting Grey Scale Sample
Completed Grey Scale Sample
Grey Scale Sample ran in 00:00:02.2118701

Reseting Image
Starting Grey Scale Parallel Sample
Completed Grey Scale Parallel Sample
Grey Scale Parallel Sample ran in 00:00:00.1626661

Reseting Image
Starting Grey Scale Double Parallel Sample
Completed Grey Scale Double Parallel Sample
Grey Scale Double Parallel Sample ran in 00:00:00.2232706

You can see by these results the Parallel.For sample runs nearly 14 times faster. This is major. Now looking at the Parallel.For sample and the Double Parallel.For sample the results are actually detrimental in this case. Running the sample, as with the ParallelLine sample you don’t get any benefit to adding the interal Parallel.For just to set a pixel. Again, depending on how you use the Parallel.For, you may have an example where you can give the internal threads enough work that it may be beneficial, just not here.

Up next I’m going to add two new models showing Matrix multiplication. This sample is actually similar to the GreyScale samples here but I added it to the original source because we do a lot of matrix operations and I wanted to so a clear, real-world example that was directly applicable to the work we do.

Thanks,
Brian

The Task Parallel Library Sampler – Part 6: Parallel.For Sample

Part One: Starting with MVVM
Part Two: The MVVM solution structure and basic framework
Part Three: Base Classes
Part 4: Sampler View, View Model and Model
Part 5: Running and working with the TPL samples

In the solution directory Models, you will find the LineSample and LineParallelSample models. These are fairly straight forward samples.

LineSample.Run()

public override void Run(System.Drawing.Bitmap bmp = null, Action<string> UpdateLog = null)
{
	if(bmp == null)
		throw new InvalidOperationException("Bitmap must be defined.");
	
	double X1 = 0, Y1 = 0;
	double X2 = bmp.Width - 1, Y2 = bmp.Height - 1;

	Stopwatch s = new Stopwatch();
	s.Start();

	double slope = (Y2 - Y1) / (X2 - X1);
	double beta = Y1 - (slope * X1);

	System.Drawing.Imaging.BitmapData bmData = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, System.Drawing.Imaging.PixelFormat.Format24bppRgb);
	int stride = bmData.Stride;
	System.IntPtr Scan0 = bmData.Scan0;
	unsafe
	{
		byte* startPos = (byte*)(void*)Scan0;
		for (int x = (int)X1; x <= X2; x++)
		{
			GeneralMathOperations.DrawPixelByPointSlope(slope, beta, stride, startPos, x);
		}
	}
	bmp.UnlockBits(bmData);

	s.Stop();
	RunTime = s.Elapsed;
}

We start a stop watch. Then calculate our slope and beta for the point slope formula. Since we want to use the image so that we can compare the same operation between the two samples we have to use some pointer operations that makes all this a lot easier. Then we run through a for loop that just moves from the top left to the bottom right and sets each pixel in a line to black. We then unlock the image, stop the stop watch and then set our RunTime to the time it took for the loop to run.

LineParallelSample.Run()

public override void Run(System.Drawing.Bitmap bmp = null, Action UpdateLog = null)
{
	if(bmp == null)
		throw new InvalidOperationException("Bitmap must be defined.");

	double X1 = 0, Y1 = 0;
	double X2 = bmp.Width - 1, Y2 = bmp.Height - 1;

	Stopwatch s = new Stopwatch();
	s.Start();

	double slope = (Y2 - Y1) / (X2 - X1);
	double beta = Y1 - (slope * X1);

	System.Drawing.Imaging.BitmapData bmData = bmp.LockBits(new System.Drawing.Rectangle(0, 0, bmp.Width, bmp.Height), System.Drawing.Imaging.ImageLockMode.ReadWrite, System.Drawing.Imaging.PixelFormat.Format24bppRgb);
	int stride = bmData.Stride;
	System.IntPtr Scan0 = bmData.Scan0;
	unsafe
	{
		byte* startPos = (byte*)(void*)Scan0;
		Parallel.For((int)X1, (int)X2 + 1, x =>
		{
			GeneralMathOperations.DrawPixelByPointSlope(slope, beta, stride, startPos, x);
		});
	}
	bmp.UnlockBits(bmData);

	s.Stop();
	RunTime = s.Elapsed;
}

In LineParallelSample the only difference is the Parallel.For at line 21. Parallel.For’s first parameter is where to start, the second parameter is where to end and the third parameter is the body of the action. It’s important to remember that the “from” is inclusive meaning it includes the value and the “to” is exclusive meaning it goes to this amount doesn’t include it. That is why in LineSample we run to x <= X2 but in LineParallelSample we run to X2 + 1 which ends up being x < X2 + 1. We want to make sure we include that last pixel. What is really interesting is the results. Running the samples a few times with the included image (one of my son) you will see similar results such as these:

Reseting Image
Starting Line Sample
Completed Line Sample
Line Sample ran in 00:00:00.0009594

Reseting Image
Starting Parallel Line Sample
Completed Parallel Line Sample
Parallel Line Sample ran in 00:00:00.0009714

Changing to a much larger image you end up with similar results.
So, what is so exciting? Well, the Parallel.For doesn’t help. But, but… well, that can’t be right.
Ah, however, it is right. When using the Parallel.For and Parallel.ForEach you have to remember that there is a bit of overhead when managing the threads, spinning up the threads and context switching. I wrote this sample to explicitly show that the TPL isn’t a magic bullet. In order to maximize your use of the TPL you have to give each thread enough work. In this sample all it is doing is drawing a single point. This is pretty simple to do and the overhead of the threading doesn’t justify using the TPL in this instance.

In the next post we’ll go over the three grey scale samples where using a Parallel.For and Parallel.ForEach make a huge difference.

Thanks,
Brian

Part One: Starting with MVVM
Part Two: The MVVM solution structure and basic framework
Part Three: Base Classes
Part 4: Sampler View, View Model and Model

There is a new version of the solution available.

Finally, before getting in the actual TPL samples there is one last thing to cover. How do we run the samples? Well, as previously discussed the Submit button in SamplerView is bound to the SamplerViewModel.Submit().

public async void Submit()
{
	CurrentState = "Running";
	await Sampler.RunSamples();
	CurrentState = "Completed";
}

Using async/await, the model then runs the samples in the model via Sampler.RunSamples(). The async/await is critical here so the UI isn’t locked while the samples are run.

Here is Sampler.RunSamples()

public async Task RunSamples()
{
	await Task.Run(() =>
	{
		System.Drawing.Bitmap bmp = null;
		foreach (var sample in Samples)
		{
			if(!sample.IsEnabled)
				continue;

			if (sample.ImageRequired)
			{
				UpdateResults("Reseting Image");
				ResetDestinationImage();
				bmp = GetBitmapFromDestinationImage();
			}
			UpdateResults("Starting " + sample.SampleName);
			sample.Run(bmp, UpdateResults);
			UpdateResults("Completed " + sample.SampleName);
			UpdateResults(sample.SampleName + " ran in " + sample.RunTime.ToString() + Environment.NewLine);
		}

		if (bmp != null)
		{
			SetDestinationImageFromBitmap(bmp);
		}
	});
}

private void UpdateResults(string result)
{
	Results += result + Environment.NewLine;
}

The async keyword in the method declaration is what allows the ViewModel to run this method asynchronously. In order to define a method as asynchronous, it must contain an await call. To do this I call Task.Run with an await so the framework knows to spin off a thread and wait for it to return. There’s a bit more than that but I’ll discuss async/await in more detail in the TPL sample for it.

The really interesting thing here is passing the “UpdateResults” method into the run method of the sample model. You’ll recall in the “base classes” post that the abstract sample model takes a bitmap and Action<string>. The SamplerView has a text box that is bound to the Results property of our Sampler model. This way we can get real-time updating from the samples as they run. Since their running in their own thread (via the await Task.Run) it won’t block the UI and the binding takes care of invoking the update to the text so that it happens on the main thread without having to worry about updating the UI on the wrong thread.

And that’s it. It’s pretty simple. Next up I’ll go over the first three samples included in the solution that demonstrate when to and when not to use Parallel.For (and by extension Parallel.ForEach).

Thanks,
Brian