Intel’s Xeon Phi SE10P (red) beat Nvidia’s Tesla C2050 and K20 GPUs (light and dark green, respectively) in 18 out of 22 tests. The Xeon Phi also beat dual Xeon X5680s (each with six cores for 12 cores total, light blue) and dual Xeon E5-2670s (each with eight cores for 16 total, dark blue) in 15 out of 22 tests. Source: Ohio State
Nvidia’s “Cuda” cores on its Tesla coprocessor, on the other hand, do not even try to emulate the x86 instruction set, opting instead for more economical instructions that allow it to cram many more cores on a chip.
As a result, Nvidia’s Tesla has roughly 40 times as many cores (2,496) as Intel’s Xeon Phi (60). The question then becomes: is it worth rewriting x86 parallel software for Nvidia’s Cuda in order to gain access to the thousands of additional cores Tesla offers over Xeon Phi?
To find the answer, Ohio State narrowed the question to the types of parallel programs scientific researchers run regularly. For the test, researchers chose the parallel operations routinely performed on large sparse matrices. Variously called eigensolvers, linear solvers and graph-mining algorithms, these applications express vast parallelism as wide, dense vectors multiplied by large sparse matrices.
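The core kernel behind all of these workloads is sparse matrix-times-dense-vector multiplication. As an illustration, here is a minimal sketch of that kernel using the common compressed sparse row (CSR) layout; the CSR choice and the function name are assumptions for the example, not details from the Ohio State study. Each row's dot product is independent, which is exactly the parallelism both coprocessors try to exploit.

```python
import numpy as np

def spmv_csr(data, indices, indptr, x):
    """Sparse matrix * dense vector, matrix stored in CSR form.

    data:    nonzero values, row by row
    indices: column index of each nonzero
    indptr:  indptr[r]..indptr[r+1] delimits row r's nonzeros
    """
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for row in range(n_rows):  # rows are independent: parallelize here
        start, end = indptr[row], indptr[row + 1]
        y[row] = np.dot(data[start:end], x[indices[start:end]])
    return y

# Example 3x3 matrix: [[2, 0, 0], [0, 3, 1], [0, 0, 4]]
data = np.array([2.0, 3.0, 1.0, 4.0])
indices = np.array([0, 1, 2, 2])
indptr = np.array([0, 1, 3, 4])
x = np.ones(3)
print(spmv_csr(data, indices, indptr, x))  # [2. 4. 4.]
```

Because the nonzeros per row vary, the per-row work is irregular, which is part of why this class of kernels stresses wide-SIMD and many-core designs differently.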
The results? Xeon Phi outperformed even the fastest Tesla coprocessor, the K20 with 2,496 cores each running at 0.7 GHz, while using only 61 cores each running at 1.1 GHz.