
Multiple Processes

by Perry Kundert last modified 2006-05-18 06:50

The simplest test -- running multiple instances of compute-heavy processes.
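A harness for this kind of test can be very small. The following is only a sketch of the idea, not the script used for the measurements on this page: the function name `time_parallel_runs` and the placeholder workload are assumptions made for illustration. It starts every copy of a command before waiting on any of them, then reports the elapsed wall-clock time in milliseconds once the last one has exited.

```python
import subprocess
import sys
import time


def time_parallel_runs(command, copies):
    """Start `copies` instances of `command` at once.

    Returns the elapsed wall-clock time, in milliseconds, until all exit.
    """
    start = time.perf_counter()
    # Launch every copy before waiting on any, so they run concurrently.
    procs = [subprocess.Popen(command) for _ in range(copies)]
    for proc in procs:
        proc.wait()
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    # Placeholder workload (an assumption): a short integer loop run in a
    # child Python interpreter. Substitute the real test executable here.
    workload = [sys.executable, "-c", "sum(i * i for i in range(200000))"]
    for copies in (1, 2, 4):
        elapsed = time_parallel_runs(workload, copies)
        print(copies, round(elapsed), "ms")
```

Because the copies are separate operating system processes, the scheduler is free to spread them across all available cores and hardware threads, which is the behaviour these tests are meant to exercise.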

Artificial Neural Network

The FANN fixed-point ANN is an ideal test of the capabilities of the T2000 to sustain multiple compute-bound integer tasks.  The gcc 4.0 compiler was used on the T2000 and the Xeon 3.2x2; gcc 3.3.6 was used on the Athlon 2600.  This tests a 16-4-16 bit ANN implementation of an encoder/decoder (a "MUX"), running the fixed-point ANN for 10000 iterations for each of the 16 test cases.  All times are in milliseconds.
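To give a feel for what "fixed-point" means here, the sketch below computes one neuron's weighted sum using only integer operations. It is not FANN's code: the 8 fractional bits, the names `to_fixed` and `neuron_sum`, and the omission of the activation function are all assumptions made to keep the illustration short.

```python
FRACTION_BITS = 8          # assumed: 8 fractional bits, i.e. a scale of 256
ONE = 1 << FRACTION_BITS   # the fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to its fixed-point integer representation."""
    return int(round(x * ONE))


def neuron_sum(inputs, weights):
    """Integer-only weighted sum of fixed-point inputs and weights."""
    acc = 0
    for value, weight in zip(inputs, weights):
        # Each product carries 2 * FRACTION_BITS fractional bits.
        acc += value * weight
    # Shift once at the end to return to FRACTION_BITS fractional bits.
    return acc >> FRACTION_BITS


inputs = [to_fixed(x) for x in (1.0, 0.0, 0.5)]
weights = [to_fixed(w) for w in (0.5, 0.25, 1.0)]
print(neuron_sum(inputs, weights) / ONE)   # 1.0*0.5 + 0.0*0.25 + 0.5*1.0
```

The inner loop is nothing but integer multiplies, adds and a shift over a handful of array elements, which is the kind of work this test repeats many thousands of times.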

In this fully integer test, the T2000's throughput peaks at about 7 times that of the Athlon 2600.  The dual Xeon 3.2 kicked butt, though.  I suspect that this is because the test is very sensitive to branch prediction, speculative loading, or caching specifics, since it performs intensive calculations using very small integer arrays.  Tests on the T2000 using the Sun C 5.8 compiler that came with Solaris 10 showed poorer performance than with gcc 4.0.2.  The result could also have something to do with the way Solaris handles multiple processes; results using a single multi-threaded executable rather than multiple separate executables (or running Linux on the T2000) could be different.
threads    T2000    Athlon 2600    Xeon 3.2x2
      1     3650           1390           310
      2     3630           2790           320
      4     3640           5560           490
      8     3730          11140           990
     16     4430          22490          1940
     32     6570          45210          3860
     64    13480          90260          7720
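The "about 7 times" figure can be checked directly from the table above. This short calculation is a sketch for illustration; the names `ANN_MS`, `throughput` and `t2000_vs_athlon` are mine, and the numbers are simply the elapsed times from the table.

```python
# Elapsed ms from the table: instances -> (T2000, Athlon 2600, Xeon 3.2x2)
ANN_MS = {
    1: (3650, 1390, 310),
    2: (3630, 2790, 320),
    4: (3640, 5560, 490),
    8: (3730, 11140, 990),
    16: (4430, 22490, 1940),
    32: (6570, 45210, 3860),
    64: (13480, 90260, 7720),
}


def throughput(instances, elapsed_ms):
    """Completed test instances per second."""
    return instances * 1000.0 / elapsed_ms


def t2000_vs_athlon(instances):
    """T2000 throughput divided by Athlon 2600 throughput."""
    t2000_ms, athlon_ms, _xeon_ms = ANN_MS[instances]
    return throughput(instances, t2000_ms) / throughput(instances, athlon_ms)


for n in ANN_MS:
    print(n, round(t2000_vs_athlon(n), 2))
```

With a single instance the ratio is below 1 (the Athlon finishes first), and it climbs to its peak of just under 7 at 32 instances.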



Smart Pointers

My reference-counting C++ "smart pointer" implementation has a suite of unit tests that forms a repeatable set of compute-bound, mostly integer operations with a large cache footprint.  I think it is a realistic test of the T2000's ability to perform many parallel tasks.  This test runs the unit test suite in parallel 1, 2, 4, 8, 16, 32 and 64 times.  Results are in milliseconds of elapsed time for all processes to complete (lower is better).

threads    T2000    Athlon 2600    Xeon 3.2x2
      1     3300            880           480
      2     3300           1620           530
      4     3300           3180           890
      8     3400           6360          2130
     16     4300          12720          3580
     32     6300          25480          7160
     64    12380          50910         13950

It is clear that while each T2000 core is slower than the Athlon 2600, the 8 cores and 32 hardware threads quickly allow the T2000 to outstrip the single-core Athlon.  In fact, elapsed time is completely flat right up to 8 parallel tests, and then grows only slowly up to 32 (3400 ms at 8, 6300 ms at 32).  Only at the 64-thread test does the timing begin to grow linearly with the number of threads.  At the high thread counts, the throughput of the T2000 is about quadruple that of the Athlon 2600 on this test.  It does not completely dominate the dual 3.2GHz Xeon, however; they come out pretty close in throughput, and the Xeon easily trounces the T2000 at the lower thread counts.
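One way to see where the T2000 runs out of spare hardware threads is to look at how much the elapsed time grows each time the number of parallel tests doubles. A factor near 1.0 means the extra work was absorbed for free; a factor near 2.0 means the machine is saturated and time is growing linearly. The sketch below uses the T2000 smart pointer timings from this page; the names `T2000_MS` and `growth_factors` are assumptions for illustration.

```python
# (parallel tests, elapsed ms) for the T2000 on the smart pointer suite.
T2000_MS = [
    (1, 3300), (2, 3300), (4, 3300), (8, 3400),
    (16, 4300), (32, 6300), (64, 12380),
]


def growth_factors(rows):
    """Elapsed-time ratio between each row and the one before it."""
    return [
        (rows[i][0], rows[i][1] / rows[i - 1][1])
        for i in range(1, len(rows))
    ]


for tests, factor in growth_factors(T2000_MS):
    print(tests, round(factor, 2))
```

The factors stay at or near 1.0 through 8 tests, creep up at 16 and 32, and only approach 2.0 at 64, which is twice the number of hardware threads.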

