Press the gas pedal of a Venom GT car to the max and you can reach a speed of over 400 km/h. Ask Andy Roddick to show you a fast serve and you will hear the sound that a tennis ball makes when flying at a speed of almost 250 km/h.
Now imagine not just two, not tens, but thousands of acceleration pedals being pressed to the floor at the same time. All in parallel! Imagine thousands of powerful tennis serves and the sound that they make. All in parallel!
No, Clarkson will not be your trainer for this workshop, nor will Andy Roddick explain to you the secret recipe for the perfect forehand.
But, as we still have to quench our thirst for speed and performance, we’ll take a look together at the hundreds or thousands of cores that your computer probably has and we will teach you the fundamentals for starting your own experiments with parallel burning cores.
Throughout the course you will learn the basics of the OpenCL parallel programming paradigm, with a focus on GPUs. While getting familiar with the OpenCL concepts, you will add OpenCL functionality to an existing image processing application written in C and port its algorithms to run on the GPU.
Can you make it run faster? How much faster?
September 12th - September 16th 2015.
Date | Time | Room |
---|---|---|
September 12th 2015 | 10:00-13:00 | EG304 |
September 13th 2015 | 10:00-13:00 | EG304 |
September 14th 2015 | 18:00-20:30 | EG304 |
September 15th 2015 | 18:00-20:30 | EG304 |
September 16th 2015 | 18:00-20:30 | EG304 |
If you are interested in learning the fundamentals of OpenCL or simply eager to take a first step in the world of parallel programming with GPUs, then you're definitely part of the target audience. You are expected to be familiar with computer architecture and have good C programming knowledge.
To register for this workshop, please fill in the form. Please just be yourself and give honest, simple answers. We want to get a better idea of what you already know and what you would like to learn, and also to polish the last details of the training materials according to your requirements and preferences. For any questions regarding this workshop, please feel free to contact the trainer.
Registration is now closed.
The workshop is organized by ROSEdu in partnership with StreamComputing.
We, the people at StreamComputing, are crazy about speed and performance. We specialize in optimizing software by means of GPUs, multi-core CPUs, FPGAs or any other kind of hardware that usually lies around unused by normal applications. When people need faster code, that's when we come in.
E-mail: anca@streamcomputing.eu
For some of the participants the lab sessions were simply not enough. So after the workshop we had no other option but to have a small competition for them. The participants were given a functional implementation of an algorithm in C and OpenCL. There were two goals: to get the best possible performance out of the OpenCL kernel and to get the best overall speedup for the entire application. All participants had to use the same machine and the same GPU.
And the winners are (…drumroll…): Cristi Alexandru Vasile and Costin Giorgian Papuc! Congratulations! The runner-up, with very close performance, is Alexandru Grad.
Here are the results of our winners:
Name | Input Size | Overall Speedup | Kernel Speedup |
---|---|---|---|
Cristi Alexandru Vasile | 16K | 28.22X | 2.31X |
Cristi Alexandru Vasile | 64K | 26.16X | 2.29X |
Cristi Alexandru Vasile | 144K | 25.97X | 2.31X |
Cristi Alexandru Vasile | 256K | 25.81X | 2.51X |
Costin Giorgian Papuc | 16K | 29.18X | 2.32X |
Costin Giorgian Papuc | 64K | 26.86X | 2.29X |
Costin Giorgian Papuc | 144K | 26.98X | 2.29X |
Costin Giorgian Papuc | 256K | 26.27X | 2.36X |
The overall speedup is measured as the ratio between the execution time of the C implementation and the execution time of the OpenCL implementation. The measured execution time of the OpenCL implementation includes the time for allocating buffers on the device, transferring the data to the device and transferring the results back to the host. However, it does not include the time needed for initializing the OpenCL context and building the OpenCL kernel. The C implementation is single-threaded and does not make use of SIMD instructions.
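The overall speedup described above can be sketched in a few lines of C. This is only an illustration of the ratio, not the code used in the competition: the `opencl_timing` struct, its field names and both function names are hypothetical, and the times are assumed to have been measured elsewhere with a wall-clock timer.

```c
/* Hypothetical breakdown of the measured OpenCL run, in seconds.
 * Context initialization and kernel build time are deliberately
 * left out, as in the measurement described in the text. */
typedef struct {
    double alloc_s;     /* allocating buffers on the device */
    double upload_s;    /* transferring data host -> device */
    double kernel_s;    /* executing the kernel             */
    double download_s;  /* transferring data device -> host */
} opencl_timing;

/* Total OpenCL time that counts towards the overall speedup. */
double opencl_total(const opencl_timing *t)
{
    return t->alloc_s + t->upload_s + t->kernel_s + t->download_s;
}

/* Overall speedup = time of the C implementation divided by the
 * total OpenCL time. Returns 0.0 if either time is not positive,
 * so callers never divide by zero. */
double overall_speedup(double c_time_s, const opencl_timing *t)
{
    double total = opencl_total(t);
    if (c_time_s <= 0.0 || total <= 0.0) {
        return 0.0;
    }
    return c_time_s / total;
}
```

For example, with made-up timings of 100 s for the C implementation and 0.5 s, 1 s, 2 s and 0.5 s for the four OpenCL phases, `overall_speedup` returns 25.0. Because transfers and allocation are part of the denominator, this figure is not comparable to a speedup computed from kernel execution time alone.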