Performance of Various Computers Using Standard Linear Equations Software

Jack J. Dongarra*

Computer Science Department
University of Tennessee
Knoxville, TN 37996-1301

and

Mathematical Sciences Section
Oak Ridge National Laboratory
Oak Ridge, TN 37831

CS-89-85

January 18, 2001

* Electronic mail address: dongarra@cs.utk.edu. An up-to-date version of this report can be found at http://www.netlib.org/benchmark/performance.ps. This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract DE-AC05-96OR22464, and in part by the Science Alliance, a state-supported program at the University of Tennessee.

Abstract

This report compares the performance of different computer systems in solving dense systems of linear equations. The comparison involves approximately a hundred computers, ranging from a Cray Y-MP to scientific workstations such as the Apollo and Sun to IBM PCs.
1 Introduction and Objectives

The timing information presented here should in no way be used to judge the overall performance of a computer system. The results reflect only one problem area: solving dense systems of equations. This report provides performance information on a wide assortment of computers ranging from the home-used PC up to the most powerful supercomputers. The information has been collected over a period of time and will undergo change as new machines are added and as hardware and software systems improve. The programs used to generate this data can easily be obtained over the Internet. While we make every attempt to verify the results obtained from users and vendors, errors are bound to exist and should be brought to our attention. We encourage users to obtain the programs and run the routines on their machines, reporting any discrepancies with the numbers listed here.

The first table reports three numbers for each machine listed (in some cases the numbers are missing because of lack of data). All performance numbers reflect arithmetic performed in full precision (usually 64-bit), unless noted. On some machines full precision may be single precision, such as on the Cray, or double precision, such as on the IBM. The first number is for the LINPACK [1] benchmark program for a matrix of order 100 in a Fortran environment. The second number is for solving a system of equations of order 1000, with no restriction on the method or its implementation. The third number is the theoretical peak performance of the machine.

LINPACK programs can be characterized as having a high percentage of floating-point arithmetic operations. The routines involved in this timing study, SGEFA and SGESL, use column-oriented algorithms. That is, the programs usually reference array elements sequentially down a column, not across a row. Column orientation is important in increasing efficiency because of the way Fortran stores arrays. Most floating-point operations in LINPACK take place in a set of subprograms, the Basic Linear Algebra Subprograms (BLAS) [3], which are called repeatedly throughout the calculation. These BLAS, referred to now as the Level 1 BLAS, reference one-dimensional arrays, rather than two-dimensional arrays.
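To make the column orientation concrete, the fragment below sketches the elimination step at the heart of a LINPACK-style factorization. The loop bounds and variable names are illustrative rather than copied from the SGEFA source; the point is that each SAXPY call walks straight down one column of A, which is contiguous in Fortran's column-major storage.

*     Sketch of elimination step K in a column-oriented factorization.
*     Each SAXPY updates one whole column of the trailing submatrix,
*     so memory is traversed down columns, matching Fortran storage.
*     Names and bounds are illustrative, not the actual SGEFA code.
      DO 30 J = K+1, N
         T = A(K,J)
         CALL SAXPY(N-K, T, A(K+1,K), 1, A(K+1,J), 1)
   30 CONTINUE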
In the first case, the problem size is relatively small (order 100), and no changes were made to the LINPACK software. Moreover, no attempt was made to use special hardware features or to exploit vector capabilities or multiple processors. (The compilers on some machines may, of course, generate optimized code that itself accesses special features.) Thus, many high-performance machines may not have reached their asymptotic execution rates.

In the second case, the problem size is larger (matrix of order 1000), and modifying or replacing the algorithm and software was permitted to achieve as high an execution rate as possible. Thus, the hardware had more opportunity for reaching near-asymptotic rates. An important constraint, however, was that all optimized programs maintain the same relative accuracy as standard techniques, such as the Gaussian elimination used in LINPACK. Furthermore, the driver program (supplied with the LINPACK benchmark) had to be run to ensure that the same problem is solved. The driver program sets up the matrix, calls the routines to solve the problem, verifies that the answers are correct, and computes the total number of operations to solve the problem (independent of the method) as 2n^3/3 + 2n^2, where n = 1000.

The last column is based not on an actual program run, but on a paper computation to determine the theoretical peak Mflop/s rate for the machine. This is the number manufacturers often cite; it represents an upper bound on performance. That is, the manufacturer guarantees that programs will not exceed this rate, a sort of "speed of light" for a given computer. The theoretical peak performance is determined by counting the number of floating-point additions and multiplications (in full precision) that can be completed during a period of time, usually the cycle time of the machine. As an example, the Cray Y-MP/8 has a cycle time of 6 ns. During a cycle the results of both an addition and a multiplication can be completed, so a single processor has a peak of

    (2 operations / 1 cycle) * (1 cycle / 6 ns) = 333 Mflop/s.

On the Cray Y-MP/8 there are 8 processors; thus, the peak performance is 8 * 333 = 2667 Mflop/s.

The information in this report is presented to users to provide a range of performance for the various computers and to show the effects of typical Fortran programming and the results that can be obtained through careful programming. The maximum rate of execution is given for comparison.

The column labeled "Computer" gives the name of the computer hardware on which the program was run. In some cases we have indicated the number of processors in the configuration and, in some cases, the cycle time of the processor in nanoseconds. The column labeled "LINPACK Benchmark" gives the operating system and compiler used. The run was based on two routines from LINPACK: SGEFA and SGESL were used for single precision, and DGEFA and DGESL were used for double precision. These routines perform standard LU decomposition with partial pivoting and backsubstitution. The timing was done on a matrix of order 100, where no changes are allowed to the Fortran programs. The column labeled "TPP" (Toward Peak Performance) gives the results of hand optimization; the problem size was of order 1000. The same matrix was used to solve the system of equations, and the results were checked for accuracy by calculating a residual for the problem, ||Ax - b|| / (||A|| ||x||). The final column, labeled "Theoretical Peak," gives the maximum rate of execution based on the cycle time of the hardware.

The term Mflop/s, used as a rate of execution, stands for millions of floating-point operations completed per second. For solving a system of n equations, 2n^3/3 + 2n^2 operations are performed
(we count both additions and multiplications). The information in the tables was compiled over a period of time. Subsequent systems software and hardware changes may alter the timings to some extent.

One further note: The following tables should not be taken too seriously. In multiprogramming environments it is often difficult to reliably measure the execution time of a single program. We trust that anyone actually evaluating machines and operating systems will gather more reliable and more representative data.
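The rates reported in the tables follow directly from this fixed operation count. The small function below is a sketch of that bookkeeping; it is not part of the LINPACK distribution, and only N (the order of the system) and TIME (the measured seconds) are inputs.

      DOUBLE PRECISION FUNCTION RATE(N, TIME)
*     Mflop/s as defined in this report: the operation count is fixed
*     at 2n**3/3 + 2n**2 regardless of the method actually used.
*     This helper is illustrative and not part of the LINPACK codes.
      INTEGER N
      DOUBLE PRECISION TIME, OPS
      OPS = 2.0D0*DBLE(N)**3/3.0D0 + 2.0D0*DBLE(N)**2
      RATE = OPS/(1.0D6*TIME)
      RETURN
      END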
2 A Look at Parallel Processing

While collecting the data presented in Table 1, we were able to experiment with parallel processing on a number of computer systems. For these experiments, we used either the standard LINPACK algorithm or an algorithm based on matrix-matrix [2] techniques. In the case of the LINPACK algorithm, the loop around the SAXPY can be performed in parallel. In the matrix-matrix implementation the matrix product can be split into submatrices and performed in parallel. In either case, the parallelism follows a simple fork-and-join model where each processor gets some number of operations to perform. For a problem of size 1000, we expect a high degree of parallelism. Thus, it is not surprising that we get such high efficiency (see Table 2). The actual percentage of parallelism, of course, depends on the algorithm and on the speed of the uniprocessor on the parallel part relative to the speed of the uniprocessor on the non-parallel part.
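As an illustration of the fork-and-join model, the sketch below parallelizes the loop around the SAXPY with an OpenMP directive. OpenMP is used here purely for exposition; the experiments reported above relied on each vendor's own tasking mechanisms.

*     Fork-and-join parallelism over the SAXPY loop of elimination
*     step K: the column updates for J = K+1, ..., N are independent,
*     so they are forked across processors and joined after the loop.
*     OpenMP is illustrative; the experiments used vendor tasking.
!$OMP PARALLEL DO PRIVATE(J, T) SHARED(A, K, N)
      DO 40 J = K+1, N
         T = A(K,J)
         CALL SAXPY(N-K, T, A(K+1,K), 1, A(K+1,J), 1)
   40 CONTINUE
!$OMP END PARALLEL DO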
3 Highly Parallel Computing

With the arrival of massively parallel computers there is a need to benchmark such machines on problems that make sense. The problem size and rules for the runs reflected in Tables 1 and 2 do not permit massively parallel computers to demonstrate their potential performance. The basic flaw is that the problem size is too small. To provide a forum for comparing such machines, the following benchmark was run on a number of massively parallel machines. The benchmark involves solving a system of linear equations (as was done in Tables 1 and 2). However, in this case the problem size is allowed to increase, and the performance numbers reflect the largest problem run on the machine. The ground rules are as follows: Solve systems of linear equations by some method, allow the size of the problem to vary, and measure the execution time for each size problem. In computing the floating-point execution rate, use 2n^3/3 + 2n^2 operations independent of the actual method used. (If you choose to do Gaussian elimination, partial pivoting must be used.) Compute and report a residual for the accuracy of the solution as ||Ax - b|| / (||A|| ||x||).

The columns in Table 3 are defined as follows:

Rmax: the performance in Gflop/s for the largest problem run on a machine.
Nmax: the size of the largest problem run on a machine.
N1/2: the size where half the Rmax execution rate is achieved.
Rpeak: the theoretical peak performance in Gflop/s for the machine.

In addition, the number of processors and the cycle time are listed.
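A schematic of such a run, assuming a hypothetical solver SOLVE and the user-supplied timer SECOND described in Section 4, might look as follows. Rmax and Nmax fall out directly; N1/2 is then read off as the smallest size whose rate reaches half of Rmax.

*     Schematic of the Section 3 procedure (SOLVE is hypothetical;
*     SECOND is the timer of Section 4): time each problem size,
*     rate it with the fixed operation count, and keep the best
*     rate (RMAX, in Gflop/s) and the size that achieved it (NMAX).
      INTEGER I, N, NMAX, NSIZES, SIZES(100)
      REAL T1, T2, SECOND
      DOUBLE PRECISION R, RMAX
      RMAX = 0.0D0
      DO 50 I = 1, NSIZES
         N = SIZES(I)
         T1 = SECOND()
         CALL SOLVE(N)
         T2 = SECOND()
         R = (2.0D0*DBLE(N)**3/3.0D0 + 2.0D0*DBLE(N)**2)
     $       /(1.0D9*DBLE(T2-T1))
         IF (R .GT. RMAX) THEN
            RMAX = R
            NMAX = N
         END IF
   50 CONTINUE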
4 Obtaining the Software and Running the Benchmarks

The software used to generate the data for this report can be obtained by sending electronic mail to
netlib@ornl.gov
4.1 LINPACK Benchmark

The first results listed in Table 1 involved no hand optimization of the LINPACK benchmark. To receive the single-precision software for this benchmark, in the mail message to netlib@ornl.gov type: send linpacks from benchmark. To receive the double-precision software for the LINPACK benchmark, type: send linpackd from benchmark.

To run the timing programs, one must supply a real function SECOND which returns the time in seconds from some fixed starting time. There is only one ground rule for running this benchmark: No changes are to be made to the Fortran source code, not even changes in the comments. The compiler and operating system must be generally available. Results from a beta version of a compiler are allowed; however, the standard compiler results must also be listed.
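The distribution deliberately leaves SECOND to the user, since the portable way to read a clock varies by system. One possible implementation on a Fortran 90 compiler, given only as an example, is:

      REAL FUNCTION SECOND()
*     Example SECOND: seconds elapsed from a fixed starting point,
*     using the standard Fortran 90 intrinsic SYSTEM_CLOCK. Any
*     routine with this interface will do; this one is illustrative.
      INTEGER COUNT, RATE
      CALL SYSTEM_CLOCK(COUNT, RATE)
      SECOND = REAL(COUNT)/REAL(RATE)
      RETURN
      END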
4.2 Toward Peak Performance

The second set of results listed in Table 1 reflects user optimization of the software. To receive the single-precision software for the column labeled "Toward Peak Performance," in the mail message to netlib@ornl.gov type: send 1000s from benchmark. To receive the double-precision software, type: send 1000d from benchmark.

The ground rules for running this benchmark are as follows: Replacements or modifications are allowed in the routine LU. The user is allowed to supply any method for the solution of the system of equations. The Mflop/s rate will be computed based on the operation count for LU decomposition. In all cases, the main driver routine, with its test matrix generator and residual check, must be used.

This report is updated from time to time. A fax copy of this report can be supplied; for details, contact the author. To obtain a PostScript copy of the report, send mail to netlib@ornl.gov and in the message type: send performance from benchmark. To have results verified, please send the output of the runs to

Jack Dongarra
Computer Science Department
University of Tennessee
Knoxville, TN 37996-1301
Email: dongarra@cs.utk.edu

There is a "Frequently Asked Questions" file for the LINPACK benchmark and Top500 at http://www.netlib.org/utk/people/JackDongarra/faq-linpack.html.
Table 1: Performance in Solving a System of Linear Equations

Computer | OS/Compiler | LINPACK Benchmark n=100 (Mflop/s) | TPP Best Effort n=1000 (Mflop/s) | Theoretical Peak (Mflop/s)
Fujitsu VPP5000/1 (1 proc. 3.33 ns) | frt -Wv,-r128 -Of -KA32 | 1156 | 8784 | 9600
Cray T932 (32 proc. 2.2 ns) | | - | 29360 | 57600
Cray T928 (28 proc. 2.2 ns) | | - | 28340 | 50400
Cray T924 (24 proc. 2.2 ns) | | - | 26170 | 43200
Cray T916 (16 proc. 2.2 ns) | | - | 19980 | 28800
Cray T916 (8 proc. 2.2 ns) | | - | 10880 | 14400
Cray T94 (4 proc. 2.2 ns) | f90 -O3,inline2 | 1129 | 5735 | 7200
Cray T94 (3 proc. 2.2 ns) | f90 -O3,inline2 | 1029 | 4387 | 5400
Cray T94 (2 proc. 2.2 ns) | f90 -O3,inline2 | 962 | 2998 | 3600
NEC SX-5/16 (16 proc. 4.0 ns) | | - | 45030 | 64000
NEC SX-5/8 (8 proc. 4.0 ns) | | - | 32570 | 64000
NEC SX-5/4 (4 proc. 4.0 ns) | | - | 19220 | 32000
NEC SX-5/2 (2 proc. 4.0 ns) | | - | 11150 | 16000
NEC SX-5/1 (1 proc. 4.0 ns) | R9.1 -pi -wf"-prob use" | 856 | 7280 | 8000
Fujitsu VPP800/1 (1 proc. 4.0 ns) | frt -Wv,-r128 -Of -KA32 | 813 | 7091 | 8000
Cray SV1-1-32 (31 proc. 300 MHz) | | - | 10910 | 37200
Cray SV1-1-32 (28 proc. 300 MHz) | | - | 10770 | 33600
Cray SV1-1-32 (24 proc. 300 MHz) | | - | 10420 | 28800
Cray SV1-1-32 (20 proc. 300 MHz) | | - | 9945 | 24000
Cray SV1-1-32 (16 proc. 300 MHz) | f90 -O3, inline2 | 751 | 9156 | 19200
Cray SV1-1-32 (12 proc. 300 MHz) | f90 -O3, inline2 | 748 | 7837 | 14000
Cray SV1-1-32 (8 proc. 300 MHz) | f90 -O3, inline2 | 710 | 6055 | 9600
Cray T94 (1 proc. 2.2 ns) | f90 -O3,inline2 | 705 | 1603 | 1800
Cray T3E 1350F (16 proc. 375 MHz) | | - | 3204 | 24000
Cray T3E 1350F (12 proc. 375 MHz) | | - | 2716 | 18000
Cray T3E 1350F (8 proc. 375 MHz) | | - | 2518 | 12000
Cray T3E 1350F (6 proc. 375 MHz) | | - | 2199 | 9000
Cray T3E 1350F (4 proc. 375 MHz) | | - | 1797 | 6000
Cray T3E 1350F (2 proc. 375 MHz) | | - | 1197 | 3000
Cray T3E 1350F (1 proc. 375 MHz) | f90 ver. 3.5 -O3,inline2 | 591 | 728 | 1500
Cray SV1-1-32 (4 proc. 300 MHz) | f90 -O3, inline2 | 596 | 3574 | 4800
NEC SX-4/32 (32 proc. 8.0 ns) | | - | 31060 | 64000
NEC SX-4/24 (24 proc. 8.0 ns) | | - | 27440 | 48000
NEC SX-4/16 (16 proc. 8.0 ns) | | - | 21470 | 32000
NEC SX-4/8 (8 proc. 8.0 ns) | | - | 12780 | 16000
NEC SX-4/4 (4 proc. 8.0 ns) | | - | 6780 | 8000
NEC SX-4/2 (2 proc. 8.0 ns) | | - | 3570 | 4000
NEC SX-4/1 (1 proc. 8.0 ns) | R6.1 -fopp f=x inline | 578 | 1944 | 2000
Compaq Server DS20e (2 proc. 667 MHz) | | - | 1923 | 2668
Compaq Server DS20e (667 MHz) | | 558 | 1025 | 1334
Compaq Server ES40 (4 proc. 667 MHz) | | - | 3804 | 5336
Compaq Server ES40 (2 proc. 667 MHz) | | - | 1923 | 2668
Compaq Server ES40 (1 proc. 667 MHz) | kf77 -fkapargs='-inline=daxpy -ur=12 -ur2=320' -O5 -tune ev5 -assume nounderscore | 561 | 1031 | 1334
Cray SV1-1-32 (2 proc. 300 MHz) | | - | 1959 | 2400
Cray SV1-1-32 (1 proc. 300 MHz) | f90 -O3, inline2 | 549 | 1028 | 1200
NEC SX-4B/2 (2 proc. 8.8 ns) | | - | 3246 | 3636
NEC SX-4B/1 (1 proc. 8.8 ns) | R7.1 -fopp f=x inline | 524 | 1767 | 1818
IBM RS/6000 44P-170 (450 MHz) | -O -Q -qfloat=hsflt -qarch=pwr3 -qtune=pwr3 -Pv -Wp,-ea478,-g1 | 503 | 1440 | 1800
NEC SX-4/Ce (1 proc.) | R7.1 -fopp f=x inline | 500 | 980 | 1000
Cray C90 (16 proc. 4.2 ns) | CF77 5.0 -Zp -Wd-e68 | 479 | 10780 | 15238
HP SuperDome (16 proc. 552 MHz) | | - | 12220 | 35328
HP SuperDome (8 proc. 552 MHz) | | - | 8055 | 17664
HP SuperDome (4 proc. 552 MHz) | | - | 4319 | 8832
HP SuperDome (2 proc. 552 MHz) | | - | 2506 | 4416
HP SuperDome (1 proc. 552 MHz) | f77 +O3 +Oinline=daxpy | 470 | 1497 | 2208
Cray C90 (8 proc. 4.2 ns) | CF77 5.0 -Zp -Wd-e68 | 468 | 6175 | 7619
HP N4000 (8 proc. 550 MHz) | | - | 7762 | 17600
HP N4000 (4 proc. 550 MHz) | | - | 4494 | 8800
HP N4000 (2 proc. 550 MHz) | | - | 2662 | 4400
HP N4000 (1 proc. 550 MHz) | f77 +O3 +Oinline=daxpy | 468 | 1583 | 2200
NEC SX-4/16A (16 proc. 8.0 ns) | | - | 20620 | 32000
NEC SX-4/8A (8 proc. 8.0 ns) | | - | 12490 | 16000
NEC SX-4/4A (4 proc. 8.0 ns) | | - | 6692 | 8000
NEC SX-4/2A (2 proc. 8.0 ns) | | - | 3525 | 4000
NEC SX-4/1A (1 proc. 8.0 ns) | R7.1 -fopp f=x inline | 467 | 1929 | 2000
NEC SX-4B/2A (2 proc. 8.8 ns) | | - | 3204 | 3636
HP V2600 (16 proc. 550 MHz) | | - | 9068 | 35200
HP V2600 (8 proc. 550 MHz) | | - | 6323 | 17600
HP V2600 (4 proc. 550 MHz) | | - | 3448 | 8800
HP V2600 (2 proc. 550 MHz) | | - | 2030 | 4400
Hewlett-Packard V2600 (550 MHz) | f77 +O3 +Oinline=daxpy | 465 | 1221 | 2200
Compaq 8400 6/575 (8 proc. 1.7 ns) | | - | 5305 | 9600
Compaq 8400 6/575 (6 proc. 1.7 ns) | | - | 4085 | 6900
Compaq 8400 6/575 (4 proc. 1.7 ns) | | - | 3003 | 4600
Compaq 8400 6/575 (2 proc. 1.7 ns) | | - | 1615 | 2300
Compaq 8400 6/575 (1 proc. 1.7 ns) | kf77 -fkapargs='-inline=daxpy -ur=12' -tune ev6 -O5 | 460 | 847 | 1150
NEC SX-4B/e (1 proc. 8.8 ns) | R7.1 -fopp f=x inline | 454 | 890 | 909
Compaq AlphaServer DS20 (500 MHz) | kf77 -fkapargs='-inline=daxpy -ur=12' -tune ev6 -O5 | 440 | - | 1000
Compaq 8200 6/575 (6 proc. 1.7 ns) | | - | 3981 | 6900
Compaq 8200 6/575 (4 proc. 1.7 ns) | | - | 3003 | 4600
Compaq 8200 6/575 (2 proc. 1.7 ns) | | - | 1615 | 2300
Compaq 8200 6/575 (1 proc. 1.7 ns) | kf77 -fkapargs='-inline=daxpy -ur=12' -tune ev6 -O5 | 431 | 831 | 1150
NEC SX-4B/1A (1 proc. 8.8 ns) | R7.1 -fopp f=x inline | 427 | 1753 | 1818
IBM RS/6K 44P-270 (4 proc. 375 MHz) | | - | 3879 | 6000
IBM RS/6K 44P-270 (2 proc. 375 MHz) | | - | 2101 | 3000
IBM RS/6K 44P-270 (1 proc. 375 MHz) | -O -Q -qfloat=hsflt -qarch=pwr3 -qtune=pwr3 -Pv -Wp,-ea478,-g1 | 426 | 1109 | 1500
IBM RS/6K 7026-B08 (4 proc. 375 MHz) | | - | 3879 | 6000
IBM RS/6K 7026-B08 (2 proc. 375 MHz) | | - | 2101 | 3000
IBM RS/6K 7026-B08 (1 proc. 375 MHz) | -O -Q -qfloat=hsflt -qarch=pwr3 -qtune=pwr3 -Pv -Wp,-ea478,-g1 | 426 | 1109 | 1500
IBM eServer pSeries 640 (4 proc. 375 MHz, 4MB L2) | | - | 3879 | 6000
IBM eServer pSeries 640 (2 proc. 375 MHz, 4MB L2) | | - | 2101 | 3000
IBM eServer pSeries 640 (1 proc. 375 MHz, 4MB L2) | -O -Q -qfloat=hsflt -qarch=pwr3 -qtune=pwr3 -Pv -Wp,-ea478,-g1 | 426 | 1109 | 1500
IBM RS/6K 44P-270 (4 proc. 375 MHz, 4MB L2) | | - | 3879 | 6000
IBM RS/6K 44P-270 (2 proc. 375 MHz, 4MB L2) | | - | 2101 | 3000
IBM RS/6K 44P-270 (1 proc. 375 MHz, 4MB L2) | -O -Q -qfloat=hsflt -qarch=pwr3 -qtune=pwr3 -Pv -Wp,-ea478,-g1 | 426 | 1109 | 1500
IBM eServer pSeries 640 (4 proc. 375 MHz, 8MB L2) | | - | 3902 | 6000
IBM eServer pSeries 640 (2 proc. 375 MHz, 8MB L2) | | - | 2180 | 3000
IBM eServer pSeries 640 (1 proc. 375 MHz, 8MB L2) | -O -Q -qfloat=hsflt -qarch=pwr3 -qtune=pwr3 -Pv -Wp,-ea478,-g1 | 426 | 1234 | 1500
IBM RS/6K 44P-270 (4 proc. 375 MHz, 8MB L2) | | - | 3902 | 6000
IBM RS/6K 44P-270 (2 proc. 375 MHz, 8MB L2) | | - | 2180 | 3000
IBM RS/6K 44P-270 (1 proc. 375 MHz, 8MB L2) | -O -Q -qfloat=hsflt -qarch=pwr3 -qtune=pwr3 -Pv -Wp,-ea478,-g1 | 426 | 1234 | 1500
IBM RS/6K SP Power3 (16 proc. 375 MHz) | | - | 7699 | 24000
IBM RS/6K SP Power3 (12 proc. 375 MHz) | | - | 7187 | 18000
IBM RS/6K SP Power3 (8 proc. 375 MHz) | | - | 5928 | 12000
IBM RS/6K SP Power3 (4 proc. 375 MHz) | | - | 3728 | 6000
IBM RS/6K SP Power3 (1 proc. 375 MHz) | -O -Q -qfloat=hsflt -qarch=pwr3 -qtune=pwr3 -Pv -Wp,-ea478,-g1 | 424 | 1208 | 1500
Cray 3-128 (4 proc. 2.11 ns) | CSOS 1.0 level 129 | 421 | 2862 | 3792
Hitachi S-3800/480 (4 proc. 2 ns) | | - | 20640 | 32000
Hitachi S-3800/380 (3 proc. 2 ns) | | - | 16880 | 24000
Hitachi S-3800/280 (2 proc. 2 ns) | | - | 12190 | 16000
Hitachi S-3800/180 (1 proc. 2 ns) | OSF/1 MJ FORTRAN:V03-00 | 408 | 6431 | 8000
IBM RS/6K SP (4 proc. 375 MHz) | | - | 3700 | 6000
IBM RS/6K SP (2 proc. 375 MHz) | | - | 2166 | 3000
IBM RS/6K SP (1 proc. 375 MHz) | xlf 6.1.0.3 -O3 -Q -qfloat=hsflt -qarch=pwr3 -qtune=pwr3 -Pv -Wp,-ea478,-g1 | 409 | 1236 | 1500
Cray 3-128 (2 proc. 2.11 ns) | CSOS 1.0 level 129 | 393 | 1622 | 1896
Cray C90 (4 proc. 4.2 ns) | CF77 5.0 -Zp -Wd-e68 | 388 | 3275 | 3810
Cray C90 (2 proc. 4.2 ns) | CF77 5.0 -Zp -Wd-e68 | 387 | 1703 | 1905
Cray C90 (1 proc. 4.2 ns) | CF77 5.0 -Zp -Wd-e68 | 387 | 902 | 952
HP N4000 (8 proc. 440 MHz) | | - | 6410 | 14080
HP N4000 (4 proc. 440 MHz) | | - | 3724 | 7040
HP N4000 (2 proc. 440 MHz) | | - | 2212 | 3520
HP N4000 (1 proc. 440 MHz) | f77 +O3 +Oinline=daxpy | 375 | 1290 | 1760
HP V2500 (16 proc. 440 MHz) | | - | 8217 | 28160
HP V2500 (12 proc. 440 MHz) | | - | 6914 | 21120
HP V2500 (8 proc. 440 MHz) | | - | 5111 | 14080
HP V2500 (4 proc. 440 MHz) | | - | 3041 | 7040
HP V2500 (2 proc. 440 MHz) | | - | 1751 | 3520
HP V2500 (1 proc. 440 MHz) | f77 +O3 +Oinline=daxpy | 375 | 1047 | 1760
NEC SX-3/44R (4 proc. 2.5 ns) | | - | 15120 | 25600
NEC SX-3/42R (4 proc. 2.5 ns) | | - | 8950 | 12800
NEC SX-3/41R (4 proc. 2.5 ns) | | - | 4815 | 6400
NEC SX-3/34R (3 proc. 2.5 ns) | | - | 12730 | 19200
NEC SX-3/32R (3 proc. 2.5 ns) | | - | 6718 | 9600
NEC SX-3/31R (3 proc. 2.5 ns) | | - | 3638 | 4800
NEC SX-3/24R (2 proc. 2.5 ns) | | - | 9454 | 12800
NEC SX-3/22R (2 proc. 2.5 ns) | | - | 5116 | 6400
NEC SX-3/21R (2 proc. 2.5 ns) | | - | 2627 | 3200
NEC SX-3/14R (1 proc. 2.5 ns) | f77sx 040 R2.2 -pi*:* | 368 | 5199 | 6400
NEC SX-3/12R (1 proc. 2.5 ns) | f77sx 040 R2.2 -pi*:* | 368 | 2757 | 3200
Cray 3-128 (1 proc. 2.11 ns) | CSOS 1.0 level 129 | 327 | 876 | 948
IBM RS6000/397 (160 MHz Thin Node) | -qarch=pwr2 -qhot -O3 -Pv -Wp,-ea478,-g1 | 315 | 532 | 640
Compaq XP1000 (500 MHz) | kf77 -tune ev6 -O5 -fkapargs='-inline=daxpy -ur=12' | 335 | - | 1000
NEC SX-3/44 (4 proc. 2.9 ns) | | - | 13420 | 22000
NEC SX-3/24 (2 proc. 2.9 ns) | | - | 8149 | 11000
NEC SX-3/42 (4 proc. 2.9 ns) | | - | 7752 | 11000
NEC SX-3/22 (2 proc. 2.9 ns) | | - | 4404 | 5500
NEC SX-3/14 (1 proc. 2.9 ns) | f77sx 020 R1.13 -pi*:* | 314 | 4511 | 5500
NEC SX-3/12 (1 proc. 2.9 ns) | f77sx 020 R1.13 -pi*:* | 313 | 2283 | 2750
DEC 8400 5/625 (8 proc. 612 MHz) | | - | 3608 | 9792
DEC 8400 5/625 (4 proc. 612 MHz) | | - | 2377 | 4896
DEC 8400 5/625 (2 proc. 612 MHz) | | - | 1375 | 2448
DEC 8400 5/625 (1 proc. 612 MHz) | f77 -O5 -fast | 287 | 764 | 1224
Cray Y-MP/832 (8 proc. 6 ns) | CF77 4.0 -Zp -Wd-e68 | 275 | 2144 | 2667
Compaq AlphaServer DS20 (500 MHz) | -fast -O5 -arch ev6 -tune ev6 | 270 | - | 1000
DEC 8200 5/625 (8 proc. 612 MHz) | | - | 2696 | 9792
DEC 8200 5/625 (4 proc. 612 MHz) | | - | 2313 | 4896
DEC 8200 5/625 (2 proc. 612 MHz) | | - | 1366 | 2448
DEC 8200 5/625 (1 proc. 612 MHz) | f77 -O5 -fast | 268 | 750 | 1224
IBM RS6K/595 (135 MHz Wide Node) | -qarch=pwr2 -qhot -O3 -Pv -Wp,-ea478,-g1 | 265 | 440 | 540
IBM RS6K SP Power3 SMP (8 proc. 222 MHz) | | - | 3516 | 7104
IBM RS6K SP Power3 SMP (6 proc. 222 MHz) | | - | 3014 | 5328
IBM RS6K SP Power3 SMP (4 proc. 222 MHz) | | - | 2153 | 3552
IBM RS6K SP Power3 SMP (2 proc. 222 MHz) | | - | 1247 | 1776
AMD Athlon (600 MHz) | g77 -O3 -s -funroll-loops -fomit-frame-pointer | 260 | 557 | 1200
IBM RS6K SP Power3 SMP (1 proc. 222 MHz) | -O3 -Q -qfloat=hsflt -qarch=pwr3 -qtune=pwr3 -bnso -bI:/lib/syscalls.exp -Pv | 250 | 684 | 888
Fujitsu VP2600/10 (3.2 ns) | FORTRAN77 EX/VP V11L10 | 249 | 4009 | 5000
DEC 500/500 (1 proc. 500 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | 235 | 590 | 1000
IBM P2SC (120 MHz Thin Node) | -qarch=pwr2 -qhot -O3 -Pv -Wp,-ea478,-g1 | 233 | 406 | 480
DEC PersonalWorkstation 600 | -O5 -fast -tune ev56 -inline all -speculate all | 227 | - | 1200
Cray Y-MP/832 (4 proc. 6 ns) | CF77 4.0 -Zp -Wd-e68 | 226 | 1159 | 1333
Sun Ultra 80 (4 proc. 450 MHz) | | - | 2062 | 3600
Sun Ultra 80 (3 proc. 450 MHz) | | - | 1615 | 2700
Sun Ultra 80 (2 proc. 450 MHz) | | - | 1172 | 1800
Sun Ultra 80 (450 MHz, 4MB L2) | -fast -xO5 -xarch=v8plusa -xchip=ultra | 208 | 607 | 900
Fujitsu VPP500/1 (1 proc. 10 ns) | FORTRAN77EX/VP V12L20 | 206 | 1490 | 1600
DEC 8400 5/440 (8 proc. 440 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | - | 3112 | 7040
DEC 8100 5/440 (4 proc. 440 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | - | 1945 | 3520
DEC 8100 5/440 (2 proc. 440 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | - | 1090 | 1760
DEC 8100 5/440 (1 proc. 440 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | 205 | 588 | 880
Cray Y-MP M98 (8 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 204 | 1733 | 2666
Fujitsu VX/1 (1 proc. 7 ns) | Fortran90/VP V10L10 | 203 | 1936 | 2200
Fujitsu VPP300/1 (1 proc. 7 ns) | Fortran90/VP V10L10 | 203 | 1936 | 2200
Fujitsu VPP700/1 (1 proc. 7 ns) | Fortran90/VP V10L10 | 203 | 1936 | 2200
Fujitsu VP2200/10 (3.2 ns) | FORTRAN77 EX/VP V12L10 | 203 | 1048 | 1250
HP Exemplar V-Class (16 proc. 240 MHz) | +O3 +Oinline=daxpy | - | 5935 | 15360
HP Exemplar V-Class (14 proc. 240 MHz) | +O3 +Oinline=daxpy | - | 5394 | 13440
HP Exemplar V-Class (12 proc. 240 MHz) | +O3 +Oinline=daxpy | - | 5202 | 11520
HP Exemplar V-Class (10 proc. 240 MHz) | +O3 +Oinline=daxpy | - | 4585 | 9600
HP Exemplar V-Class (8 proc. 240 MHz) | +O3 +Oinline=daxpy | - | 4125 | 7680
HP Exemplar V-Class (6 proc. 240 MHz) | +O3 +Oinline=daxpy | - | 3350 | 4760
HP Exemplar V-Class (4 proc. 240 MHz) | +O3 +Oinline=daxpy | - | 2414 | 3840
HP Exemplar V-Class (2 proc. 240 MHz) | +O3 +Oinline=daxpy | - | 1260 | 1920
HP Exemplar V-Class (1 proc. 240 MHz) | HP-UX 11.0 +O3 +Oinline=daxpy | 203 | 743 | 960
Cray 2S/4-128 (4 proc. 4.1 ns) | CSOS 1.0 level 129 | 202 | 1406 | 1951
NEC SX-3/11R (1 proc. 2.5 ns) | f77sx 040 R2.2 -pi*:* | 202 | 1418 | 1600
NEC SX-3/1LR (1 proc. 2.5 ns) | f77sx 040 R2.2 -pi*:* | 201 | 767 | 800
Hewlett-Packard C240 (236 MHz) | +O3 +Oinline=daxpy | 197 | 667 | 944
DEC 500/400 (1 proc. 400 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | 189 | 449 | 800
DEC 4100 5/400 (4 proc. 400 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | - | 1821 | 3200
DEC 4100 5/400 (2 proc. 400 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | - | 1001 | 1600
DEC 4100 5/400 (1 proc. 400 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | 189 | 531 | 800
DEC 1000A 5/400 (1 proc. 400 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | 187 | 440 | 800
Sun HPC 450 (400 MHz, 4 proc.) | | - | 1841 | 3200
Sun HPC 450 (400 MHz, 2 proc.) | | - | 1050 | 1600
Sun HPC 450 (400 MHz, 4MB L2) | -fast -xO5 -xarch=v8plusa -xchip=ultra | 183 | 552 | 800
Cray Y-MP/832 (2 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 181 | 604 | 667
Cray X-MP/416 (4 proc. 8.5 ns) | CF77 4.0 -Zp -Wd-e68 | 178 | 822 | 940
Cray Y-MP M98 (4 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 177 | 1114 | 1333
SGI Origin 2000 (300 MHz, 16 proc.) | | - | 3970 | 9600
SGI Origin 2000 (300 MHz, 8 proc.) | | - | 3032 | 4800
SGI Origin 2000 (300 MHz, 4 proc.) | | - | 1957 | 2400
SGI Origin 2000 (300 MHz, 2 proc.) | | - | 1074 | 1200
SGI Origin 2000 (300 MHz, 1 proc.) | f77 -IPA -O3 -n32 -mips4 -r10000 -call_shared -TENV:X=4 -OPT:IEEE_arithmetic=3:roundoff=3 -LNO:blocking=off:ou_max=6:pf2=0 -INLINE:array_bounds | 173 | 553 | 600
NEC SX-3/11 (1 proc. 2.9 ns) | f77sx 020 R1.13 -pi*:* | 173 | 1223 | 1370
NEC SX-3/1L (1 proc. 2.9 ns) | f77sx 020 R1.13 -pi*:* | 171 | 661 | 680
Fujitsu VP2400/10 (4 ns) | FORTRAN77 EX/VP V11L10 | 170 | 1688 | 2000
HP Exemplar V-Class (16 proc. 200 MHz) | HP-UX 11.0 | - | 4832 | 12800
HP Exemplar V-Class (14 proc. 200 MHz) | HP-UX 11.0 | - | 4442 | 11200
HP Exemplar V-Class (12 proc. 200 MHz) | HP-UX 11.0 | - | 4109 | 8400
HP Exemplar V-Class (10 proc. 200 MHz) | HP-UX 11.0 | - | 3506 | 8000
HP Exemplar V-Class (8 proc. 200 MHz) | HP-UX 11.0 | - | 3206 | 6400
HP Exemplar V-Class (6 proc. 200 MHz) | HP-UX 11.0 | - | 2608 | 4200
HP Exemplar V-Class (4 proc. 200 MHz) | HP-UX 11.0 | - | 1912 | 3200
HP Exemplar V-Class (2 proc. 200 MHz) | HP-UX 11.0 | - | 1082 | 1600
HP Exemplar V-Class (1 proc. 200 MHz) | HP-UX 11.0 +O3 +Oinline=daxpy | 169 | 613 | 800
Cray 2S/4-128 (2 proc. 4.1 ns) | CSOS 1.0 level 129 | 167 | 741 | 976
Hewlett-Packard C200 (200 MHz) | +O3 +Oinline=daxpy | 166 | 550 | 800
DEC 8400 5/350 (1 proc. 350 MHz) | kf77 -fkapargs='-inline=daxpy -ur3=100' -tune ev5 -O5 -assume nounderscore | 164 | 510 | 700
DEC 8400 5/300 (8 proc. 300 MHz) | | - | 2282 | 4800
DEC 8400 5/300 (6 proc. 300 MHz) | | - | 1902 | 3600
DEC 8400 5/300 (4 proc. 300 MHz) | | - | 1351 | 2400
DEC 8400 5/300 (2 proc. 300 MHz) | | - | 757 | 1200
Cray Y-MP/832 (1 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 161 | 324 | 333
Convex C4/XA-4 (4 proc.) (7.41 ns) | fc9.0.0.5 -tm c4 -O3 -ds -ep 4 -is . | 160 | 2531 | 3240
Hewlett-Packard K460-EG (180 MHz) | +Oall +Oinline=daxpy | 158 | 510 | 720
Hewlett-Packard C180-XP (180 MHz) | +Oall +Oinline=daxpy | 158 | 480 | 720
HP Exemplar S-Class (16 proc.) | SPP-UX 5.2 | - | 4609 | 11520
HP Exemplar S-Class (14 proc.) | SPP-UX 5.2 | - | 4217 | 10080
HP Exemplar S-Class (12 proc.) | SPP-UX 5.2 | - | 4019 | 8640
HP Exemplar S-Class (10 proc.) | SPP-UX 5.2 | - | 3389 | 7200
HP Exemplar S-Class (8 proc.) | SPP-UX 5.2 | - | 2979 | 5760
HP Exemplar S-Class (6 proc.) | SPP-UX 5.2 | - | 2305 | 4320
HP Exemplar S-Class (4 proc.) | SPP-UX 5.2 | - | 1629 | 2880
HP Exemplar S-Class (2 proc.) | SPP-UX 5.2 | - | 967 | 1440
HP Exemplar S-Class (1 proc.) | SPP-UX 5.2 +Oall +Oinline=daxpy | 156 | 545 | 720
Sun UltraSPARC II (30 proc. 336 MHz) | | - | 5187 | 20160
Sun UltraSPARC II (24 proc. 336 MHz) | | - | 4755 | 16128
Sun UltraSPARC II (16 proc. 336 MHz) | | - | 3981 | 10752
Sun UltraSPARC II (14 proc. 336 MHz) | | - | 3721 | 9408
Sun UltraSPARC II (8 proc. 336 MHz) | | - | 2481 | 5376
Sun UltraSPARC II (6 proc. 336 MHz) | | - | 1990 | 4032
Sun UltraSPARC II (4 proc. 336 MHz) | | - | 1438 | 2688
Sun UltraSPARC II (2 proc. 336 MHz) | | - | 843 | 1344
Sun UltraSPARC II (1 proc. 336 MHz) | -fast -xO5 -xarch=v8plusa -xchip=ultra -o | 154 | 461 | 672
Cray Y-MP M98 (2 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 154 | 596 | 666
DEC AlphaStation 600 5/333 (333 MHz) | -fkapargs='-inline=daxpy -ur3=100' -tune ev5 -O5 | 153 | - | 666
Convex C4/XA-3 (3 proc.) (7.41 ns) | fc9.0.0.5 -tm c4 -O3 -ds -ep 3 -is . | 151 | 1933 | 2430
Cray Y-MP M98 (1 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 150 | 307 | 333
Cray Y-MP M92 (2 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 145 | 550 | 666
Cray Y-MP M92 (1 proc. 6 ns) | CF77 5.0 -Zp -Wd-e68 | 145 | 332 | 333
Cray X-MP/416 (2 proc. 8.5 ns) | CF77 5.0 -Zp -Wd-e68 | 143 | 426 | 470
IBM RS/6000-R24 (71.5 MHz) | v3.1.1 xlf -Pv -Wp,-me,-ew -O3 -qarch=pwrx -qtune=pwrx -qhot -qhsflt -qnosave | 142 | 246 | 284
DEC AlphaStation (433 MHz) | f90 -O | 141 | - | 866
Hewlett-Packard C160 (160 MHz) | +Oall +Oinline=daxpy | 140 | 421 | 640
IBM POWER2-990 (71.5 MHz) | -O -Pv -Wp,-ea478,-g1 -qarch=pwrx | 140 | 254 | 286
DEC 4100 5/300 (4 proc. 300 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | - | 1287 | 2400
DEC 4100 5/300 (2 proc. 300 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | - | 734 | 1200
DEC 4100 5/300 (1 proc. 300 MHz) | kf77 -inline=daxpy -ur=3 -fast -O5 -tune ev5 | 140 | 420 | 600
DEC 8400 5/350 (8 proc. 350 MHz) | | - | 2853 | 5600
DEC 8400 5/350 (6 proc. 350 MHz) | | - | 2313 | 4200
DEC 8400 5/350 (4 proc. 350 MHz) | | - | 1678 | 2800
DEC 8400 5/350 (2 proc. 350 MHz) | | - | 938 | 1400
DEC 8400 5/300 (1 proc. 300 MHz) | -inline=daxpy -ur=3 -fast -O5 -tune ev5 | 140 | 411 | 600
DEC 8200 5/300 (6 proc. 300 MHz) | | - | 1821 | 3600
DEC 8200 5/300 (4 proc. 300 MHz) | | - | 1317 | 2400
DEC 8200 5/300 (2 proc. 300 MHz) | | - | 752 | 1200
DEC 8200 5/300 (1 proc. 300 MHz) | -inline=daxpy -ur=3 -fast -O5 -tune ev5 | 140 | 411 | 600
IBM RS/6000-59H (66 MHz) | v3.1.1 xlf -Pv -Wp,-me,-ew -O3 -qarch=pwrx -qtune=pwrx -qhot -qhsflt -qnosave | 132 | 230 | 264
IBM POWER2 model 590 (66 MHz) | -O -Pv -Wp,-ea478,-g1 -qarch=pwrx | 130 | 236 | 264
Convex C4/XA-2 (2 proc.) (7.41 ns) | fc9.0.0.5 -tm c4 -O3 -ds -ep 2 -is . | 129 | 1335 | 1620
Cray J916 (16 proc. 10 ns) | CF77 6.0 -Zp -Wd-e68 | - | 2471 | 3200
Cray J916 (12 proc. 10 ns) | CF77 6.0 -Zp -Wd-e68 | - | 2046 | 2400
Cray J916 (8 proc. 10 ns) | CF77 6.0 -Zp -Wd-e68 | - | 1439 | 1600
Cray J916 (7 proc. 10 ns) | CF77 6.0 -Zp -Wd-e68 | 129 | 1254 | 1400
Fujitsu VP2200/10 (4 ns) | FORTRAN77 EX/VP V11L10 | 127 | 842 | 1000
Cray J932 (32 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | - | 4486 | 6400
Cray J932 (28 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | - | 4235 | 5600
Cray J932 (24 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | - | 3775 | 4800
Cray J932 (20 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | - | 3238 | 4000
Cray J932 (16 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | - | 2709 | 3200
Cray J932 (12 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | - | 2029 | 2400
Cray J932 (8 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | - | 1425 | 1600
Cray J932 (7 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | 126 | 1221 | 1400
SGI POWER CHALLENGE (90 MHz, 16 proc.) | | - | 3240 | 5760
SGI POWER CHALLENGE (90 MHz, 8 proc.) | | - | 2045 | 2880
SGI POWER CHALLENGE (90 MHz, 4 proc.) | | - | 1124 | 1440
SGI POWER CHALLENGE (90 MHz, 2 proc.) | | - | 569 | 720
SGI POWER CHALLENGE (90 MHz, 1 proc.) | -non_shared -OPT:IEEE_arithmetic=3:roundoff=3 -TENV:X=4 -col120 -WK,-ur=12,-ur2=200 -WK,-so=3,-ro=3,-o=5 -WK,-inline=daxpy:dscal:idamax -SWP:max_pair_candidates=2 -SWP:strict_ivdep=false | 126 | 308 | 360
Cray J916 (4 proc. 10 ns) | CF77 6.0 -Zp -Wd-e68 | 121 | 743 | 800
Cray X-MP/416 (1 proc. 8.5 ns) | CF77 5.0 -Zp -Wd-e68 | 121 | 218 | 235
Cray 2S/4-128 (1 proc. 4.1 ns) | CSOS 1.0 level 129 | 120 | 384 | 488
DEC 2100 5/250 (4 proc. 250 MHz) | -inline=daxpy -ur=3 -fast -O5 -tune ev5 | - | 1022 | 2000
DEC 2100 5/250 (2 proc. 250 MHz) | -inline=daxpy -ur=3 -fast -O5 -tune ev5 | - | 578 | 1000
DEC 2100 5/250 (1 proc. 250 MHz) | -inline=daxpy -ur=3 -fast -O5 -tune ev5 | 119 | 317 | 500
Cray J932 (4 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | 117 | 730 | 800
SGI Origin 2000 (195 MHz, 16 proc.) | | - | 3146 | 6240
SGI Origin 2000 (195 MHz, 8 proc.) | | - | 2182 | 3120
SGI Origin 2000 (195 MHz, 4 proc.) | | - | 1292 | 1560
SGI Origin 2000 (195 MHz, 2 proc.) | | - | 667 | 780
SGI Origin 2000 (195 MHz, 1 proc.) | -n32 -mips4 -Ofast=ip27 -TENV:X=4 -LNO:blocking=off:ou_max=6:pf2=0 | 114 | 344 | 390
IBM RS/6000 F50 (332 MHz, 4 proc.) | | - | 1049 | 2656
IBM RS/6000 F50 (332 MHz, 3 proc.) | | - | 842 | 1992
IBM RS/6000 F50 (332 MHz, 2 proc.) | | - | 599 | 1328
IBM RS/6000 F50 (332 MHz, 1 proc.) | -O -qhot -qarch=ppc -qfloat=hsflt -Pv -Wp,-ea478,-g1 -bnso -bI:/lib/syscalls.exp -bnodelcsect | 116 | 317 | 664
Fujitsu VP2100/10 (4 ns) | FORTRAN77 EX/VP V11L10 | 112 | 445 | 500
Cray J916 (2 proc. 10 ns) | CF77 6.0 -Zp -Wd-e68 | 111 | 380 | 400
Sun Ultra HPC 6000 (250 MHz, 30 proc.) | | - | 4755 | 15000
Sun Ultra HPC 6000 (250 MHz, 24 proc.) | | - | 4389 | 12000
Sun Ultra HPC 6000 (250 MHz, 16 proc.) | | - | 3493 | 8000
Sun Ultra HPC 6000 (250 MHz, 14 proc.) | | - | 3112 | 7000
Sun Ultra HPC 6000 (250 MHz, 8 proc.) | | - | 2038 | 4000
Sun Ultra HPC 6000 (250 MHz, 6 proc.) | | - | 1607 | 3000
Sun Ultra HPC 6000 (250 MHz, 4 proc.) | | - | 1126 | 2000
Sun Ultra HPC 6000 (250 MHz, 1MB L2) | -fast -native -xarch=v8plusa -xsafe=mem -dalign -libmil -xO5 -fsimple=2 -stackvar -xcache=16/32/1:512/64/1 -xchip=ultra -xdepend -xlibmil -xlibmopt -Qoption cg -Qms_pipe+float_loop_ld=16 -xcrossfile | 110 | 376 | 500
Cray J932 (2 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | 109 | - | 400
Hitachi S-820/80 (4 ns) | FORT77/HAP V23-0C | 107 | 2171 | 3000
Cray J916 (1 proc. 10 ns) | CF77 6.0 -Zp -Wd-e68 | 106 | 203 | 200
Cray J932 (1 proc. 10 ns) | cf77 (6.0) -Zp -Wd-68 | 104 | 202 | 200
Cray 2S/8-128 (8 proc. 4.1 ns) | CF77 4.0 -Zp -Wd-e68 | 102 | - | 3902
IBM POWER2 model 58H (55 MHz) | -O -Pv -Wp,-ea478,-g1 -qarch=pwrx | 101 | 197 | 220
SGI POWER CHALLENGE (75 MHz, 18 proc.) | | - | 3227 | 5400
SGI POWER CHALLENGE (75 MHz, 16 proc.) | | - | 3033 | 4800
SGI POWER CHALLENGE (75 MHz, 14 proc.) | | - | 2775 | 4200
SGI POWER CHALLENGE (75 MHz, 12 proc.) | | - | 2499 | 3600
SGI POWER CHALLENGE (75 MHz, 10 proc.) | | - | 2167 | 3000
SGI POWER CHALLENGE (75 MHz, 8 proc.) | | - | 1818 | 2400
SGI POWER CHALLENGE (75 MHz, 6 proc.) | | - | 1421 | 1800
SGI POWER CHALLENGE (75 MHz, 4 proc.) | | - | 993 | 1200
SGI POWER CHALLENGE (75 MHz, 2 proc.) | | - | 505 | 600
SGI POWER CHALLENGE (75 MHz, 1 proc.) | -non_shared -OPT:IEEE_arithmetic=3:roundoff=3 -TENV:X=4 -col120 -WK,-ur=12,-ur2=200 -WK,-so=3,-ro=3,-o=5 -WK,-inline=daxpy:dscal:idamax -SWP:max_pair_candidates=2 -SWP:strict_ivdep=false | 104 | 261 | 300
Convex C4/XA-1 (1 proc.) (7.41 ns) | fc9.0.0.5 -tm c4 -O2 -is . | 99 | 705 | 810
Intel Pentium II Xeon (450 MHz) | g77 -funroll-all-loops -O3 | 98 | 295 | 450
ETA 10-G (1 proc. 7 ns) | ETAV/FTN200 | 93 | 496 | 571
Convex C-3880 (8 proc.) (16.7 ns) | fc7.0 -tm c38 -O3 -ep 8 -ds -is . | 86 | 795 | 960
IBM ES/9000-982 VF (8 proc. 7.1 ns) | VAST-2/VS Fortran V2R5 | - | 2278 | 4507
IBM ES/9000-972 VF (7 proc. 7.1 ns) | VAST-2/VS Fortran V2R5 | - | 2072 | 3944
IBM ES/9000-962 VF (6 proc. 7.1 ns) | VAST-2/VS Fortran V2R5 | - | 1923 | 3380
IBM ES/9000-952 VF (5 proc. 7.1 ns) | VAST-2/VS Fortran V2R5 | - | 1681 | 2817
IBM ES/9000-942 VF (4 proc. 7.1 ns) | VAST-2/VS Fortran V2R5 | - | 1377 | 2254
IBM ES/9000-831 VF (3 proc. 7.1 ns) | VAST-2/VS Fortran V2R5 | - | 1082 | 1690
IBM ES/9000-821 VF (2 proc. 7.1 ns) | VAST-2/VS Fortran V2R5 | - | 767 | 1127
IBM ES/9000-711 VF (1 proc. 7.1 ns) | VAST-2/VS Fortran V2R5 | 86 | 422 | 563
HALstation 300 model 350 (118 MHz) | -Kfast -Keval -KGREG -Kgs -KV8PLUS -X7 -Kpreex -Kpreload -Kfuse -x FLDFLAGS= -dn | 85 | 177 | 236
SUN-Ultra 1 mod. 170 | f77 v4.0 -fast -O4 | 76 | - | -
Convex C-3840 (4 proc.) (16.7 ns) | fc7.0 -tm c38 -O3 -ep 4 -ds -is . | 75 | 425 | 480
HALstation 300 model 330 (101 MHz) | -Kfast -Keval -KGREG -Kgs -KV8PLUS -X7 -Kpreex -Kpreload -Kfuse -x FLDFLAGS= -dn | 72 | 153 | 202
SGI CHALLENGE/Onyx (6.6 ns, 36 proc.) | | - | 557 | 2700
SGI CHALLENGE/Onyx (6.6 ns, 32 proc.) | | - | 539 | 2400
SGI CHALLENGE/Onyx (6.6 ns, 28 proc.) | | - | 531 | 2100
SGI CHALLENGE/Onyx (6.6 ns, 24 proc.) | | - | 499 | 1800
SGI CHALLENGE/Onyx (6.6 ns, 20 proc.) | | - | 474 | 1500
SGI CHALLENGE/Onyx (6.6 ns, 18 proc.) | | - | 458 | 1350
SGI CHALLENGE/Onyx (6.6 ns, 16 proc.) | | - | 431 | 1200
SGI CHALLENGE/Onyx (6.6 ns, 14 proc.) | | - | 393 | 1050
SGI CHALLENGE/Onyx (6.6 ns, 12 proc.) | | - | 374 | 900
SGI CHALLENGE/Onyx (6.6 ns, 10 proc.) | | - | 338 | 750
SGI CHALLENGE/Onyx (6.6 ns, 8 proc.) | IRIX 5.2, f77, -O2 -mips2 -Wo,-loopunroll,8 -Olimit2000 -Wf,-dchacheopt -jmpopt -non_shared -pfa keep -WK,-ipa=daxpy:saxpy,-ur=1,-mc=100 | 73 | 311 | 600
Convex C-3830 (3 proc.) (16.7 ns) | fc7.0 -tm c38 -O3 -ep 3 -ds -is . | 71 | 327 | 360
Sun UltraSPARC 1 (24 proc. 167 MHz) | | - | 3566 | 8000
Sun UltraSPARC 1 (20 proc. 167 MHz) | | - | 3170 | 6667
Sun UltraSPARC 1 (16 proc. 167 MHz) | | - | 2761 | 5333
Sun UltraSPARC 1 (12 proc. 167 MHz) | | - | 2238 | 4000
Sun UltraSPARC 1 (8 proc. 167 MHz) | | - | 1607 | 2667