Multiple Processes
The simplest test -- running multiple instances of compute-heavy processes.
Artificial Neural Network
The FANN fixed-point ANN is an ideal test of the capabilities of the T2000 to sustain multiple compute-bound integer tasks. The gcc 4.0 compiler was used on the T2000 and the Xeon 3.2x2; gcc 3.3.6 was used on the Athlon 2600. This tests a 16-4-16 bit ANN implementation of an encoder/decoder (a "MUX"), running the fixed-point ANN for 10000 iterations for each of the 16 test cases. All times are in milliseconds.

In this fully integer test, we see that the T2000 achieves throughput peaking at about 7 times that of the Athlon 2600. The dual Xeon 3.2 kicked butt, though. I suspect that this is because the test is very sensitive to branch prediction, speculative loading, or caching specifics, since it performs intensive calculations using very small integer arrays.

Tests on the T2000 using the Sun C 5.8 compiler that came with Solaris 10 showed poorer performance than with gcc 4.0.2. The gap could also have something to do with the way Solaris handles multiple processes; results using a single multi-threaded executable rather than running multiple separate executables (or running Linux on the T2000) could be different.
| threads | T2000 (ms) | Athlon 2600 (ms) | Xeon 3.2x2 (ms) |
|---|---|---|---|
| 1 | 3650 | 1390 | 310 |
| 2 | 3630 | 2790 | 320 |
| 4 | 3640 | 5560 | 490 |
| 8 | 3730 | 11140 | 990 |
| 16 | 4430 | 22490 | 1940 |
| 32 | 6570 | 45210 | 3860 |
| 64 | 13480 | 90260 | 7720 |
Smart Pointers
My reference-counting C++ "smart pointer" implementation
has a suite of unit tests which is a repeatable set of compute-bound
mostly integer operations, with a large cache footprint. I think
it is a realistic test of the T2000's ability to perform many parallel
tasks. This test runs the unit test suite, in parallel 1, 2, 4,
8, 16, 32 and 64 times. Results are in milliseconds of elapsed
time for all processes to complete (lower is better).
| threads | T2000 (ms) | Athlon 2600 (ms) | Xeon 3.2x2 (ms) |
|---|---|---|---|
| 1 | 3300 | 880 | 480 |
| 2 | 3300 | 1620 | 530 |
| 4 | 3300 | 3180 | 890 |
| 8 | 3400 | 6360 | 2130 |
| 16 | 4300 | 12720 | 3580 |
| 32 | 6300 | 25480 | 7160 |
| 64 | 12380 | 50910 | 13950 |
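The measurement procedure described above (launch N copies at once, then record the elapsed time until all of them finish) could be sketched roughly as follows. This is not the harness actually used for these numbers; the function names `time_parallel` and `sweep` and the stand-in workload are hypothetical, chosen only for illustration.

```python
import subprocess
import sys
import time


def time_parallel(cmd, n):
    """Launch n copies of cmd at once; return elapsed ms until all have exited."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(n)
    ]
    for p in procs:
        p.wait()  # the clock stops only when the slowest copy finishes
    return (time.perf_counter() - start) * 1000.0


def sweep(cmd, counts=(1, 2, 4, 8, 16, 32, 64)):
    """Time the workload at each process count; returns {count: elapsed_ms}."""
    return {n: time_parallel(cmd, n) for n in counts}


if __name__ == "__main__":
    # Hypothetical stand-in workload: a small integer loop in a child interpreter.
    # A real run would point this at the unit test executable instead.
    workload = [sys.executable, "-c", "sum(i * i for i in range(200000))"]
    for n, ms in sweep(workload, counts=(1, 2, 4)).items():
        print(f"{n:>3} processes: {ms:8.0f} ms")
```

Timing the whole batch, rather than each process separately, is what makes a flat column in the table meaningful: if the elapsed time does not grow as N doubles, the machine is absorbing the extra copies in parallel.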
It is clear that while each T2000 core is slower than the Athlon 2600, the fact that there are 8 cores and 32 hardware threads quickly allows the T2000 to outstrip the performance of the single-core Athlon. In fact, elapsed time is completely flat right up to 8 threads, and then rises only gradually up to 32 threads (6300 ms versus 3300 ms for a single run). Only when we get to the 64-thread test does the timing begin to grow linearly with the number of threads.

As the number of threads increases, it becomes clear that the throughput of the T2000 is about quadruple that of the Athlon 2600 on this test. It does not completely dominate the dual 3.2GHz Xeon, however; they come out pretty close in throughput, and the Xeon easily trounces the T2000 in the lower thread count tests.
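The throughput comparison can be checked directly from the Smart Pointers table. The sketch below copies the table's elapsed times and computes runs completed per second and the ratio between machines; the helper names `throughput` and `speedup` are my own, not part of any benchmark tool.

```python
# Elapsed milliseconds from the Smart Pointers table, keyed by parallel run count.
T2000 = {1: 3300, 2: 3300, 4: 3300, 8: 3400, 16: 4300, 32: 6300, 64: 12380}
ATHLON = {1: 880, 2: 1620, 4: 3180, 8: 6360, 16: 12720, 32: 25480, 64: 50910}
XEON = {1: 480, 2: 530, 4: 890, 8: 2130, 16: 3580, 32: 7160, 64: 13950}


def throughput(times):
    """Test-suite runs completed per second at each parallel run count."""
    return {n: n / (ms / 1000.0) for n, ms in times.items()}


def speedup(machine, baseline, n):
    """How many times sooner `machine` finishes n parallel runs than `baseline`."""
    return baseline[n] / machine[n]


if __name__ == "__main__":
    for n in sorted(T2000):
        print(
            f"{n:>2} runs: T2000 {throughput(T2000)[n]:5.2f}/s, "
            f"vs Athlon x{speedup(T2000, ATHLON, n):.2f}, "
            f"vs Xeon x{speedup(T2000, XEON, n):.2f}"
        )
```

At 64 parallel runs the ratio against the Athlon 2600 works out to 50910 / 12380, roughly 4.1, and against the dual Xeon to 13950 / 12380, roughly 1.1. With a single run the Athlon ratio is 880 / 3300, roughly 0.27, which is the per-core deficit mentioned above.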