Your hardware is the backbone of your computing power. Do you understand it?
To get the most out of your hardware in a parallel computing setting, there are certain subtleties to be aware of. This post unpacks them, starting with the difference between physical and logical cores.
Physical vs Logical
The cores of a modern multicore CPU come in two different flavours: what we refer to as a “core” can be either physical or logical.
A physical core is a distinct hardware component. It may correspond to one or more logical cores.
For instance, an Intel® Xeon® Gold 6130 processor has 16 cores and 32 threads.
In distributed computing, this can be very misleading. On Linux, using htop on a machine equipped with two of those processors shows 64 different CPUs. However, those 64 slots are backed by only 32 physical cores, so they do not all contribute equal computing power.
Hyperthreading is a technology that allows a physical core to run multiple threads. We refer to each of these threads as a logical core. This is great for our applications because we can avoid bottlenecks, but it is important to keep in mind that these threads use the same underlying hardware to perform their calculations. This means that there is a hard limit on the speedup Hyperthreading alone can give us.
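To see the difference on your own machine, here is a minimal sketch that compares the logical core count with the physical one. It assumes an x86 Linux system whose /proc/cpuinfo lists "physical id" and "core id" fields; the function names are our own and purely illustrative. On other platforms the physical count simply comes back as None.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text."""
    cores = set()
    physical_id = core_id = None
    # Each logical CPU is described by one block; blocks are separated by blank lines.
    # The extra "" at the end flushes the last block.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


def describe_cpus(cpuinfo_path="/proc/cpuinfo"):
    """Return (logical, physical); physical is None if it cannot be determined."""
    logical = os.cpu_count()  # counts logical cores, like the slots shown in htop
    try:
        with open(cpuinfo_path) as handle:
            physical = count_physical_cores(handle.read()) or None
    except OSError:
        physical = None
    return logical, physical
```

Calling describe_cpus() on a machine with Hyperthreading enabled should report a logical count larger than the physical one.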
Assume we need 1 second to perform 1 trillion calculations on a single physical core. If our algorithm is perfect, meaning it already keeps that core fully busy, running two threads on the same physical core is actually more likely to slow our program down than to speed it up. This is because the threading overhead in the core itself outweighs the bottleneck that we avoid. In other words, no matter how we code our algorithm, these 1 trillion calculations still need to be performed, and the physical core can only process a fixed number of calculations per second. This is simply physics.
Again, by simple physics, employing multiple physical cores gives us more computing bandwidth. Intuitively, two physical cores can process twice as much information as one, regardless of the number of logical cores available. This means that, if we play our cards right, we can split these 1 trillion calculations into two big chunks that can be calculated concurrently by the two physical cores. If we do this perfectly, we can now complete our calculations in 0.5 seconds rather than 1.
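The idea of splitting the work into big chunks can be sketched as follows. This is only an illustration under our own assumptions: the workload (a sum of squares) is a stand-in, all function names are hypothetical, and the operating system decides where the two worker processes run, so they are not guaranteed to land on two distinct physical cores.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(total, parts):
    """Split range(total) into `parts` contiguous (start, stop) chunks of near-equal size."""
    base, extra = divmod(total, parts)
    chunks = []
    start = 0
    for index in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + base + (1 if index < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def sum_of_squares(bounds):
    """Stand-in workload: sum i*i over one chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(total, workers=2):
    """Run one chunk per worker process and combine the partial results."""
    chunks = split_into_chunks(total, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

When running this as a script, call parallel_sum_of_squares from inside an `if __name__ == "__main__":` guard, which is required on platforms that start worker processes by spawning a fresh interpreter.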
Why is this Important?
High-performance computing software, like Octeract Engine, employs both threads and multicore processing to crunch the numbers quickly.
However, simply because there are a lot of numbers that need to be crunched, we need actual physical cores to provide the muscle behind the calculations. Threading techniques are mostly used to avoid bottlenecks and keep things running smoothly, rather than doing any of the actual heavy lifting.
In practice, this means that if we, for example, use a processor with 32 physical cores and 64 logical ones, we will start seeing diminishing returns once we exceed 32 cores. Our benchmarks with Octeract Engine indicate that we see linear speedups in most problems if we only run on physical cores. However, once we start using logical cores, the most we can gain by doubling the number of cores in use is 20%.
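To put numbers on this, here is a rough back-of-the-envelope model. It is an assumption on our part, not a benchmark result: it takes the linear speedup on physical cores at face value, treats the 20% figure as a best-case ceiling, and linearly interpolates between the two. The function name and parameters are hypothetical.

```python
def estimated_speedup(workers, physical_cores, logical_bonus=0.20):
    """Best-case speedup estimate for `workers` threads on a 2-way hyperthreaded CPU.

    Assumed model: linear scaling up to the physical core count, then at most
    `logical_bonus` extra once every logical core is in use.
    """
    if workers < 1 or physical_cores < 1:
        raise ValueError("workers and physical_cores must be at least 1")
    if workers <= physical_cores:
        return float(workers)
    logical_cores = 2 * physical_cores
    # Workers beyond the logical core count are assumed to add nothing.
    extra = min(workers, logical_cores) - physical_cores
    return physical_cores * (1.0 + logical_bonus * extra / physical_cores)
```

Under this model, going from 32 to 64 workers on a 32-core processor raises the estimated speedup from 32 to at most 38.4, whereas doubling the physical cores would have doubled it.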
Interesting, so what should I be doing?
If you are running an application like Octeract Engine that can actually exploit your hardware to its fullest potential, you need to know your hardware. Look up your CPU online and see how many physical cores it has and whether it supports Hyperthreading.
For a smooth experience, we recommend running the Engine on two fewer cores than the number of physical cores your machine has (number of physical cores minus 2). This will keep two physical cores available at all times so that you can keep using your system smoothly while your optimisation problem is being solved.
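As a small sketch, the rule of thumb above can be written as a helper. The function name is hypothetical, and the floor of one core for very small machines is our own assumption; how the resulting value is passed to the Engine is not shown here.

```python
def recommended_workers(physical_cores, reserved=2):
    """Number of cores to give the solver, keeping `reserved` physical cores free."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    # Assumption: never go below one core, even on very small machines.
    return max(1, physical_cores - reserved)
```

For the two-processor Xeon Gold 6130 machine mentioned earlier, with 32 physical cores in total, this gives 30.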