Smart Software Solutions Inc 108 S Pierre St.
Pierre, SD 57501
605-222-3403
sales@smartsoftwareinc.com


Playing with Parallel I/O - Part 2

Published 1 year ago

I'm back for Part 2 in my series on parallel I/O. A quick recap: in Part 1 we focused on copying a file in single-threaded and multithreaded programs. Even with our worker thread model and bigger block sizes, the single thread won each round. Our two remaining bottlenecks appear to be context switching and disk seeking. Disk seeking is certainly the lowest-hanging fruit, so let's see what happens if we attempt to eliminate it.

The simplest way to reduce disk seeking is to spread the work across different disks/devices. So I wrote two test programs. The first creates a single thread that iterates over the devices and writes a file to each. The second creates a thread for each device, with each thread writing a file to its own device. To save time, I'm starting with the worker thread model from the beginning. For file and block sizes, I'm using the same rules as Part 1: a 250MB file with 4K, 64K, and 256K blocks.

With the tests ready to go, I started setting up my test environment. I decided four was a reasonable number of devices for testing. On my VM, I added three additional virtual disks and mounted each to a unique directory. Before starting my 100-iteration test, I did a quick single-iteration run to verify the setup. As soon as I hit enter I realized this wasn't going to work, and sure enough the single thread finished first. Even though my virtual machine is writing to individual virtual disks, those disks all still live on the same physical disk on my host machine. There is a positive to this mishap: it let me satisfy an additional test case. After I finished Part 1, I was curious whether writing multiple files to the same disk would produce a multithreaded benefit even though the single-file test didn't. Here are the results.

4KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         4096         65536    1109
4         4096         65536    3938

64KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         65536        4096     813
4         65536        4096     4064

256KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         262144       1024     805
4         262144       1024     4921

So even having each thread write its own file is still slower than one thread writing all the files. What if we up the number of files and/or threads? For this test I want to keep the work even across the devices, so the thread count needs to be a multiple of 4, my device count. So with 8 threads and 12 files per device, each device gets 2 threads, each writing 6 files. With 12 threads and 12 files per device, that's 3 threads per device, each writing 4 files. Here we go.
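The even-split rule above can be captured in a small partition helper. This is a hypothetical sketch (the names are mine, not from the series), and it reads the article's counts as files per device; it requires the thread count to divide evenly by the device count, and the per-device file count to divide evenly among that device's threads.

```python
def partition(devices, total_threads, files_per_device):
    """Return one (device, files_to_write) work item per thread.

    Requires total_threads to be a multiple of len(devices), and
    files_per_device to divide evenly among each device's threads.
    """
    threads_per_device = total_threads // len(devices)
    if total_threads % len(devices) or files_per_device % threads_per_device:
        raise ValueError("work does not divide evenly across devices")
    files_per_thread = files_per_device // threads_per_device
    return [(dev, files_per_thread)
            for dev in devices
            for _ in range(threads_per_device)]

# 8 threads, 12 files per device, 4 devices:
# 2 threads per device, each writing 6 files.
work = partition(["d0", "d1", "d2", "d3"], 8, 12)
```

With 12 threads the same call yields 3 threads per device at 4 files each, matching the article's arithmetic.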

4KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         4096         65536    1436
4         4096         65536    24411
8         4096         65536    76599
12        4096         65536    82198

64KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         65536        4096     9432
4         65536        4096     25164
8         65536        4096     72946
12        65536        4096     83906

256KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         262144       1024     9415
4         262144       1024     2114
8         262144       1024     77281
12        262144       1024     86382

I think it's now safe to conclude that when using a single disk, it's going to be very difficult to improve performance with a multithreaded I/O approach.

I now have a test environment that uses three physical machines, still with four total devices. On the actual test machine, I'm using the local disk and a mounted instance of tmpfs. tmpfs is a file system that lives entirely in RAM, which lets me simulate two devices on the same machine with only one physical disk. The two remaining devices are an SMB share and an NFS share, each hosted on one of the remaining physical machines. I think we are now ready to run some tests. Just a heads up: since SMB and NFS introduce network latency, our times are going to increase significantly. Luckily, our focus is on the comparison between thread counts. For our first test, we will copy one file to each device using 1 thread and then 4 threads.

4KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         4096         65536    52812
4         4096         65536    46763

64KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         65536        4096     53082
4         65536        4096     47035

256KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         262144       1024     53389
4         262144       1024     47351

Finally! The multithreaded test was about 12% faster. Let's see how it scales. I'll increase the number of files to 12 and try again with 1, 4, 8, and 12 threads.

4KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         4096         65536    631845
4         4096         65536    552663
8         4096         65536    551202
12        4096         65536    551391

64KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         65536        4096     636366
4         65536        4096     559615
8         65536        4096     551014
12        65536        4096     550098

256KB Blocks
Threads   Block Size   Blocks   Time(ms)
1         262144       1024     632776
4         262144       1024     552279
8         262144       1024     550813
12        262144       1024     551547

So again, the multithreaded tests were about 12% faster in each case, which came out to roughly a minute per test. It's worth mentioning that the total data set for the test was 3GB. What if it were 30GB or 300GB? If that 12% scales linearly, you have yourself massive time savings. Also note that the time continued to drop as threads were added beyond 4. It's not as significant as the jump from one thread to several, but enough that the extra thread management is worth contemplating. I wish I had quick access to a physical machine with four local disks, because I think the reason we see gains beyond 4 threads is that the network latency hides the disk seek and context switching time. Think Big O.
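The latency-hiding effect is easy to demonstrate in isolation. The sketch below (my own illustration, not part of the test suite) uses time.sleep as a stand-in for a network round-trip: sequentially, the waits add up; with one thread per request, they overlap.

```python
import threading
import time

LATENCY = 0.1  # stand-in for one network round-trip, in seconds

def fake_remote_write():
    time.sleep(LATENCY)  # the thread blocks, but the CPU is free meanwhile

def sequential(n):
    """n writes back to back: total time is roughly n * LATENCY."""
    start = time.perf_counter()
    for _ in range(n):
        fake_remote_write()
    return time.perf_counter() - start

def concurrent(n):
    """n writes on n threads: the waits overlap, so roughly one LATENCY."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_remote_write) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

This is the same reason the thread-count gains only showed up once SMB and NFS entered the picture: the per-request wait has to be long enough, and independent enough, for the overlap to outweigh the context-switching cost.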

So what have we learned from these tests? It really comes down to what you are trying to accomplish. If you are handling I/O to multiple disks and/or machines, it is worth spending the additional time to implement a multithreaded approach. If you are writing to a single local drive, stick to the KISS principle.

AUTHOR Ian May

Ian May has been with Smart Software Solutions since April of 2007.  He has a B.S. in Computer Science from the University of South Dakota.  Ian has experience working with various web platforms including PHP, J2EE, and ASP.net.  His main focus is in Windows and Linux platform development.  His knowledge and skill set cover both kernel and user mode development.

Ian enjoys spending time with his wife and kids.  And if there is any free time outside of that, he loves reading and writing code.