Data vs Task Parallelism in C#

Parallelism is an important concept in programming: it splits work into pieces and runs those pieces on separate threads at the same time. In .NET/C#, the System.Threading and System.Threading.Tasks namespaces provide the classes and methods you need to implement parallelism in your applications.

In C#, you can use several types of parallelism to optimize your code and increase efficiency, but the two most important are data parallelism and task parallelism.

Data Parallelism

Data parallelism is about dividing a large data set into smaller chunks and processing those chunks in parallel, applying the same operation to every element. It is a common technique when a large amount of data needs to be processed.

Let’s say you have an array with 100’000’000 elements. If you were to fill it using a for loop, it would look like this:

using System;
using System.Diagnostics;


// Create a new Stopwatch object
Stopwatch stopwatch = new Stopwatch();

// Start the stopwatch
stopwatch.Start();

int[] data = new int[100000000];
// Run the for loop
for (int i = 0; i < data.Length; i++)
{
    // Code to be executed in the loop
    data[i] = i;
}

// Stop the stopwatch
stopwatch.Stop();

// Get the elapsed time as a TimeSpan
TimeSpan elapsedTime = stopwatch.Elapsed;

// Display the elapsed time in milliseconds
Console.WriteLine("Elapsed time: {0} ms", elapsedTime.TotalMilliseconds);

The code above declares an array with 100’000’000 items and uses a for loop to assign every value. When I run it, it takes around 950 ms to accomplish the task.

By using Data parallelism, you can improve the performance of the code above. For that, you need to use Parallel.For instead of a for-loop. If I were to convert the code above to use parallelism, it would look like the below:

using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Create a new Stopwatch object
Stopwatch stopwatch = new Stopwatch();

// Start the stopwatch
stopwatch.Start();

int[] data = new int[100000000];

// Run the Parallel.For
Parallel.For(0, data.Length, i =>
{
    // Same assignment as the sequential loop, so the two timings are comparable
    data[i] = i;
});

// Stop the stopwatch
stopwatch.Stop();

// Get the elapsed time as a TimeSpan
TimeSpan elapsedTime = stopwatch.Elapsed;

// Display the elapsed time in milliseconds
Console.WriteLine("Elapsed time: {0} ms", elapsedTime.TotalMilliseconds);

With Parallel.For, the same task takes around 720 ms, which is less than the 950 ms the for loop needs. The gain is modest because the work per element is tiny, so the overhead of coordinating the parallel loop is significant relative to the work itself. In more complex scenarios, where each element requires real computation and can be processed independently of the others, Parallel.For can be far more performant.
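When the loop body is as small as a single assignment, one way to reduce the per-index overhead is to hand each worker a contiguous range of indices and run an ordinary for loop inside it. The sketch below does this with Partitioner.Create from System.Collections.Concurrent. The array size and variable names are illustrative assumptions, and no timings are claimed because they vary by machine.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Smaller array than in the article so the sketch runs quickly (illustrative size)
int[] data = new int[10_000_000];

// Split the index space [0, data.Length) into contiguous ranges
var ranges = Partitioner.Create(0, data.Length);

// Each worker receives a (fromInclusive, toExclusive) range and loops over it sequentially
Parallel.ForEach(ranges, range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        data[i] = i;
    }
});

Console.WriteLine($"First: {data[0]}, last: {data[data.Length - 1]}");
```

The delegate is now invoked once per range instead of once per element, which usually matters most when the work per element is very cheap.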

Another example of using Parallel.For would be processing large images more quickly by applying filters, transformations, or other operations to each pixel concurrently.
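As a minimal sketch of the image idea, the code below inverts a grayscale image one row per iteration. The image here is a hypothetical byte array filled with a gradient, not a real image file, and the dimensions and variable names are assumptions made for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical 8-bit grayscale image stored row by row (dimensions are illustrative)
int width = 1920;
int height = 1080;
byte[] pixels = new byte[width * height];

// Fill with a horizontal gradient (0 on the left, 255 on the right) so there is something to transform
for (int i = 0; i < pixels.Length; i++)
{
    pixels[i] = (byte)(((i % width) * 255) / (width - 1));
}

// Invert the image: rows do not depend on each other, so each row can be processed in parallel
Parallel.For(0, height, y =>
{
    int rowStart = y * width;
    for (int x = 0; x < width; x++)
    {
        pixels[rowStart + x] = (byte)(255 - pixels[rowStart + x]);
    }
});

Console.WriteLine($"Top-left pixel: {pixels[0]}, top-right pixel: {pixels[width - 1]}");
```

Parallelizing over rows rather than individual pixels keeps each iteration large enough that the loop overhead stays small.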

Data parallelism is also really useful for large data set processing. For example, you can use it to speed up Monte Carlo simulations, which run a large number of random experiments to estimate statistical properties.
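A classic Monte Carlo experiment is estimating pi by counting how many random points in the unit square fall inside the quarter circle. The sketch below uses the Parallel.For overload with thread-local state, so each worker keeps its own counter and adds it to the shared total only once. The chunk count, sample count, and the choice to seed each Random with the chunk index are simplifying assumptions for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sizes: 64 chunks of 100,000 random points each
int chunks = 64;
int samplesPerChunk = 100_000;
long totalHits = 0;

Parallel.For<long>(0, chunks,
    // Each worker thread starts with its own local hit counter
    () => 0L,
    (chunk, loopState, localHits) =>
    {
        // System.Random instances are not thread-safe, so each chunk creates its own
        // (seeding with the chunk index is a simplification for this sketch)
        var random = new Random(chunk);
        for (int i = 0; i < samplesPerChunk; i++)
        {
            double x = random.NextDouble();
            double y = random.NextDouble();
            if (x * x + y * y <= 1.0)
            {
                localHits++;
            }
        }
        return localHits;
    },
    // Merge each worker's local counter into the shared total
    localHits => Interlocked.Add(ref totalHits, localHits));

double piEstimate = 4.0 * totalHits / ((double)chunks * samplesPerChunk);
Console.WriteLine($"Estimated pi: {piEstimate}");
```

Keeping a local counter per worker avoids contending on the shared total for every sample; Interlocked.Add is only called when a worker finishes its share.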

Task Parallelism

Task parallelism is the other type, and it is about running multiple, potentially unrelated tasks concurrently. It is useful when you have multiple independent tasks that you want to execute simultaneously without having them wait for each other. As an everyday analogy, think of listening to a podcast while typing a blog post.

An example of task parallelism looks like this:

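using System;
using System.Threading;
using System.Threading.Tasks;

// Start three independent tasks on the thread pool
Task task1 = Task.Run(() => PerformTask("Task 1"));
Task task2 = Task.Run(() => PerformTask("Task 2"));
Task task3 = Task.Run(() => PerformTask("Task 3"));

// Wait for all three tasks to complete
await Task.WhenAll(task1, task2, task3);

static void PerformTask(string taskName)
{
    for (int i = 0; i < 3; i++)
    {
        Console.WriteLine($"{taskName} is executing step {i + 1}");
        // Simulate 500 ms of work
        Thread.Sleep(500);
    }
}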

In the example above, three tasks are created and executed concurrently using Task.Run. The PerformTask method is called with a different task name for each task. Inside PerformTask, a for loop writes the task name and the current execution step (1, 2, or 3) to the console.

The await Task.WhenAll(task1, task2, task3) line ensures that the Main method waits for all three tasks to complete before exiting the program.

The output of the code above interleaves the lines from the three tasks, and the exact order can vary from run to run.
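Task.WhenAll can also collect return values. As a small sketch, the code below starts three computations that each return an int and awaits them together. SumUpTo is a hypothetical helper written for this example.

```csharp
using System;
using System.Threading.Tasks;

// Start three independent computations; each returns a value
Task<int> sum1 = Task.Run(() => SumUpTo(1_000));
Task<int> sum2 = Task.Run(() => SumUpTo(2_000));
Task<int> sum3 = Task.Run(() => SumUpTo(3_000));

// WhenAll completes once every task has finished and yields the results in argument order
int[] results = await Task.WhenAll(sum1, sum2, sum3);

Console.WriteLine(string.Join(", ", results)); // prints "500500, 2001000, 4501500"

// Hypothetical helper: adds the integers from 1 to n
static int SumUpTo(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
    {
        total += i;
    }
    return total;
}
```

The results array follows the order in which the tasks were passed to WhenAll, regardless of which task finished first.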

Now, this is just a simple example of logging information, but task parallelism is widely used in C# because it allows you to do multiple things at the same time without blocking the thread and keeping the user waiting a long time for the app to respond.

A good example is file download or file processing. Task parallelism allows you to download multiple files without waiting for one download to finish before starting the next. The same goes for uploads: you can start uploading several files at the same time instead of waiting for the first upload to finish before starting the second.
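The download scenario can be sketched without touching the network by simulating each download with Task.Delay. In the code below, DownloadAsync, the file names, and the returned sizes are all made up for illustration; a real implementation would call an actual download API instead.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical file names; no real network access happens in this sketch
string[] fileNames = { "report.pdf", "photo.jpg", "data.csv" };

Stopwatch stopwatch = Stopwatch.StartNew();

// Start every download before awaiting any of them, so they overlap in time
Task<int>[] downloads = fileNames.Select(name => DownloadAsync(name)).ToArray();
int[] sizes = await Task.WhenAll(downloads);

stopwatch.Stop();

for (int i = 0; i < fileNames.Length; i++)
{
    Console.WriteLine($"{fileNames[i]}: {sizes[i]} bytes");
}
Console.WriteLine($"Total time: {stopwatch.ElapsedMilliseconds} ms");

// Simulated download: waits 500 ms without blocking a thread, then returns a made-up size
static async Task<int> DownloadAsync(string fileName)
{
    await Task.Delay(500);
    return fileName.Length * 1024;
}
```

Because all three simulated downloads wait at the same time, the total should be close to the duration of one download rather than the sum of all three.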
