Somebody asked how one may count the number of floating point operations in a MATLAB program. @cise.ufl.edu>

Prior to version 6, one used to be able to do this with the command`flops`

, but this command is no longer available with the newer versions of MATLAB.`flops`

is a relic from the LINPACK days of MATLAB (LINPACK has since been replaced by LAPACK). With the use of LAPACK in MATLAB, it will be more approrpiate to use`tic`

and`toc`

to count elapsed CPU time instead (cf.`tic`

,`toc`

).

If you're interested to know why`flops`

is obsolete, you may wish to read the exchanges in NA digest regarding`flops`

.

Nevertheless, if you feel that youreallydo need a command to count floating point operations in MATLAB, what you can do is to install Tom Minka's Lightspeed MATLAB toolbox and use the flops counting operations therein.

@cise.ufl.edu>

To count flops, we need to first know what they are. What is a flop? @cise.ufl.edu>

LAPACK is not the only place where the question "what is a flop?" is

relevant. Sparse matrix codes are another. Multifrontal and supernodal

factorization algorithms store L and U (and intermediate submatrices, for

the multifrontal method) as a set of dense submatrices. It's more

efficient that way, since the dense BLAS can be used within the dense

submatrices. It is often better explicitly store some of the numerical

zeros, so that one ends up with fewer frontal matrices or supernodes.

So what happens when I compute zero times zero plus zero? Is that a flop

(or two flops)? I computed it, so one could argue that it counts. But it

was useless, so one could argue that it shouldn't count. Computing it

allowed me to use more BLAS-3, so I get a faster algorithm that happens to

do some useless flops. How do I compare the "mflop rate" of two

algorithms that make different decisions on what flops to perform and

which of those to include in the "flop count"?

A somewhat better measure would be to compare the two algorithms based an

external count. For example, the "true" flop counts for sparse LU

factorization can be computed in Matlab from the pattern of L and U as:

[L,U,P] = lu (A) ;

Lnz = full (sum (spones (L))) - 1 ; % off diagonal nz in cols of L

Unz = full (sum (spones (U')))' - 1 ; % off diagonal nz in rows of U

flops = 2*Lnz*Unz + sum (Lnz) ;

The same can be done on the LU factors found by any other factorization

code. This does count a few spurious flops, namely the computation a_ij +

l_ik*u_kj is always counted as two flops, even if a_ij is initially zero.

However, even with this "better" measure, the algorithm that does more

flops can be much faster. You're better off picking the algorithm with

the smallest memory space requirements (which is not always the smallest

nnz (L+U)) and/or fastest run time.

So my vote is to either leave out the the flop count, or at most return a

reasonable agreed-upon estimate (like the "true flop count" for LU, above)

that is somewhat independent of algorithmic details. Matrix multiply, for

example, should report 2*n^3, as Cleve states in his Winter 2000

newsletter, even though "better" methods with fewer flops (Strassen's

method) are available.

Tim Davis

University of Florida

davis@cise.ufl.edu