Update
It was just pointed out to me in a comment by Jimi that DownloadFileAsync is an event-driven call and is not awaitable. There is, however, a WebClient.DownloadFileTaskAsync version, which is the appropriate one to use in this example: it is awaitable and returns a Task.
Downloads the specified resource to a local file as an asynchronous
operation using a task object.
Original answer
I know I could run multiple threads or even parallel but what's the
best way
Yes, you can make it parallel and stay in control of the resources you use.
I'm not too worried about speed as long as it isn't as slow as right
now, but I don't want to overpower the device's resources such as CPU
trying to speed it up
You should be able to achieve this and configure this fairly well.
OK, so there are many ways to do this. Here are some things to think about:
- You have 1000s of IO bound tasks (as opposed to CPU bound tasks)
- With this many files, you want some sort of parallelism and to be able to configure the number of concurrent tasks.
- You will want to do this in an async / await pattern so you're not tying up threads while waiting on IO, or smashing your CPU
Some immediate solutions:
- Tasks and Task.WhenAll (the awaitable counterpart of WaitAll) in an async / await pattern. This is a great approach; however, it's a little bit trickier to limit concurrent tasks.
- You have Parallel.ForEach and Parallel.For. These have a nice way to limit concurrent workloads, but they're just not suited to IO bound tasks.
- Or another option you might consider is TPL Dataflow (part of the Task Parallel Library). I have come to like these libraries a lot lately as they can give you the best of both worlds.
Please note: there are many other approaches.
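To make the first option above a bit more concrete, here is a minimal sketch of Tasks plus Task.WhenAll with a SemaphoreSlim acting as the concurrency limit. The Throttled class and its ForEachAsync method are names I made up for illustration (they are not part of the framework), and like the rest of this answer the code is untested, so treat it as a starting point only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Hypothetical helper: runs body for every item, with at most
    // maxConcurrency invocations in flight at any one time.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> body)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                // Wait for a free slot without blocking a thread
                await gate.WaitAsync();
                try
                {
                    await body(item);
                }
                finally
                {
                    // Free the slot even if body throws
                    gate.Release();
                }
            }).ToList(); // ToList materializes the query so each task is started exactly once

            // Completes when every item has finished (or rethrows a failure)
            await Task.WhenAll(tasks);
        }
    }
}
```

For the download case you would pass something like an async lambda that creates a WebClient and awaits DownloadFileTaskAsync for the given item, with maxConcurrency playing the same role as MaxDegreeOfParallelism does elsewhere in this answer.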
So Parallel.ForEach uses the thread pool. Moreover, IO bound operations will block those threads waiting for a device to respond and tie up resources. A general rule of thumb here is:
- If you have CPU bound code, Parallel.ForEach is appropriate;
- Though if you have IO bound code, asynchrony is appropriate.
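For contrast, here is a rough sketch of the CPU bound side of that rule of thumb, where Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism is a good fit. The CpuBoundDemo class and SumOfSquares method are made-up names purely for illustration, and the work itself is a stand-in for whatever CPU heavy processing you actually have.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CpuBoundDemo
{
    // Hypothetical example: squares each input on up to maxParallel
    // worker threads and adds the results together.
    public static long SumOfSquares(int[] inputs, int maxParallel)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        long total = 0;

        Parallel.ForEach(inputs, options, n =>
        {
            long square = (long)n * n;          // the CPU bound "work"
            Interlocked.Add(ref total, square); // thread-safe accumulation
        });

        return total;
    }
}
```

Nothing here waits on a device, so every thread the loop occupies is doing real work. Swap the body for a blocking download and those same threads would mostly sit idle, which is why the IO bound case calls for async / await instead.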
In this case, downloading a file is clearly I/O, there is a DownloadFileTaskAsync version, and there are 1000 files to download, so you are best off using the async / await pattern and some type of limit on concurrent tasks.
Here is a very basic example of how you might achieve this:
Given
public class WorkLoad
{
    public string Url { get; set; }
    public string FileName { get; set; }
}
Dataflow example
// Requires: using System.Threading.Tasks.Dataflow;
// (the System.Threading.Tasks.Dataflow NuGet package)
public async Task DoWorkLoads(List<WorkLoad> workloads)
{
    var options = new ExecutionDataflowBlockOptions
    {
        // add pepper and salt to taste
        MaxDegreeOfParallelism = 50
    };

    // create an action block
    var block = new ActionBlock<WorkLoad>(MyMethodAsync, options);

    // Queue them up
    foreach (var workLoad in workloads)
        block.Post(workLoad);

    // signal that no more items are coming, then wait for them to finish
    block.Complete();
    await block.Completion;
}
...
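// Notice we are using the async / await pattern
public async Task MyMethodAsync(WorkLoad workLoad)
{
    try
    {
        Console.WriteLine("Downloading: " + workLoad.Url);

        // One WebClient per download: a single WebClient instance
        // does not support concurrent I/O operations.
        using (var client = new WebClient())
        {
            // DownloadFileTaskAsync is the awaitable version (returns a Task)
            await client.DownloadFileTaskAsync(workLoad.Url, workLoad.FileName);
        }
    }
    catch (Exception)
    {
        // probably best to add some error checking somehow
    }
}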
Summary
This approach gives you asynchrony, it also gives you MaxDegreeOfParallelism, it doesn't waste resources, and it lets IO be IO.
Disclaimer: Dataflow may not be where you want to be; however, I just thought I'd give you some more information.
Disclaimer 2: the above code has not been tested. I would seriously consider researching this technology first and doing your own due diligence thoroughly.
Loosely related demo here