MyraMath: Tutorial for /multifrontal solvers (part 1)

This tutorial covers the basic steps for using the direct solvers in /multifrontal to solve A*X=B when A is sparse and structurally symmetric. Unlike the dense case, sparse matrix decompositions generally can't be performed in-place due to fill-in (the creation of additional nonzeroes in the L and U factors). Careful reordering of A can greatly reduce fill-in and is a key step in efficient sparse direct solution.
After reordering comes a symbolic factorization step, wherein the patterns of the L and U factors are tabulated along with predictions of memory and flop cost. If the need arises to solve multiple matrices with identical nonzero patterns and similar numerical properties, the reordering and symbolic factorization can be precomputed once and then reused.
Numeric factorization is the most time-consuming part of the direct solution process, and is internally multithreaded for improved performance. Once numeric factorization is complete, the decomposed A can be used to solve for arbitrary/multiple right-hand sides B with relatively little effort. This can be the chief advantage of using a sparse direct solver like those in /multifrontal.
Reordering unknowns prior to sparse factorization can have a dramatic impact on the total storage and flop requirements. Multifrontal solvers (like all the ones in MyraMath) tend to perform best with nested dissection (ND) orderings. ND is a divide and conquer strategy in which a top-level "separator" of unknowns is identified, whose removal will split the remainder into two unconnected subgroups, "left" and "right". When the unknowns are reordered into (left, right, separator), no structural fill-in can propagate between the left and right groups. The algorithm then recurses, dissecting the left group further, dissecting the right group further, and so forth.
For Patterns arising from structured problems (like 2D/3D Cartesian grids), ND is extremely easy to perform because the separator is always a line/plane of unknowns at the "middle" of the grid. For the more important case of general unstructured Patterns, ND is more expensive and considerably harder to implement. MyraMath provides orderings for both cases: the 2D/3D structured cases are implemented by bisect2() and bisect3() (in sparse/laplacian2.h and sparse/laplacian3.h), while the unstructured case is implemented by reorder() (in multifrontal/symbolic/reorder.h).
The excerpt below (tutorial/multifrontal/reorder.cpp) applies both algorithms to a 7x7 structured grid. (However, see Sidenote 1.)
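In case the excerpt is unavailable, the workflow it demonstrates looks roughly like the sketch below. This is a hypothetical reconstruction, not the shipped file: the exact constructor/helper names and signatures are assumptions, only the routine names bisect2() and reorder() and the header paths come from the text above.

```cpp
// Hypothetical sketch (signatures assumed, not verified against MyraMath).
#include "sparse/laplacian2.h"             // bisect2() per the text above
#include "multifrontal/symbolic/reorder.h" // reorder() per the text above

void example()
{
  // Structured case: separators are grid midlines, so only the grid
  // dimensions are needed (argument list assumed).
  Permutation p1 = bisect2(7, 7);

  // Unstructured case: works from a Pattern alone (construction of the
  // 7x7 grid Pattern 'A' is elided here).
  Permutation p2 = reorder(A);
}
```

Either Permutation is then handed to the downstream symbolic/numeric factorization stages.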
For the sake of completeness, MyraMath also contains reordering algorithms based on other (non-ND) principles. There are two variants of the minimum-degree (MD) algorithm: amd() (approximate minimum degree) and mmd() (multiple minimum degree). When reordering/solving sparse systems that arise from a 2D/3D discretization of a partial differential equation, these algorithms are typically beaten by nested dissection. But they are sometimes useful for other/non-geometric sparse systems (analysis of circuit networks or unstructured graph Laplacians, for instance). See multifrontal/symbolic/amd.h and multifrontal/symbolic/mmd.h for details.
The output of the bisect2()/bisect3() and reorder() routines is a Permutation, which should be submitted to downstream symbolic/numeric factorization stages. It is anticipated that end users might wish to inject their own ordering, perhaps by calling into third-party reordering packages like Metis, AMD or Scotch. In that case, you can construct a Permutation from their output, a (permuted) index vector. See Permutation::from_perm() and Permutation::from_iperm() for details.
In an effort to streamline the numeric factorization and backsolution phases, sparse direct solvers perform extensive symbolic analysis that (i) explicitly tabulates all the fill-in for the L and U factors, (ii) aggregates unknowns with similar connectivity into "supernodes" for improved BLAS3 utilization, and (iii) performs dependency analysis for the purpose of exposing parallelism. In MyraMath, the symbolic factorization is encapsulated in a data structure called the AssemblyTree. An AssemblyTree is constructed from a Pattern and a Permutation, typically the output of bisect2()/bisect3() or reorder(). The code excerpt below (tutorial/multifrontal/symbolic.cpp) shows the workflow to produce an AssemblyTree. Alternatively, you can access the AssemblyTree of an existing factorization (to reuse on a second A with the same Pattern and similar numerical properties).
The AssemblyTree is primarily for internal use. The only member functions of any interest to an end user are probably .n_words() and .n_work(), which return space and time cost estimates for A=L*L' or A=L*U. For instance, the output of this example program indicates that the cost of direct factorization of a sparse matrix arising from discretizing a PDE in three dimensions scales with the unknown count n as O(n^2): each time the problem size doubles, the flop count quadruples.
There are currently seven solver classes inside /multifrontal, each applicable to a different type of linear A*X=B system:
SparseRCholeskySolver, for real symmetric positive definite A, within multifrontal/rcholesky/solver.h
SparseZCholeskySolver, for complex hermitian positive definite A, within multifrontal/zcholesky/solver.h
SparseRLDLTSolver, for real symmetric indefinite A, within multifrontal/rldlt/solver.h
SparseZLDLHSolver, for complex hermitian indefinite A, within multifrontal/zldlh/solver.h
SparseZLDLTSolver, for complex symmetric A, within multifrontal/zldlt/solver.h
SparseLUSolver, for symmetric-pattern, nonsymmetric-valued A, within multifrontal/lu/solver.h
SparseNormalSolver, solves the normal equations for nonsymmetric A with full column rank, within multifrontal/normal/solver.h
Generally speaking, solvers higher on this list are simpler, more efficient or more robust than solvers lower on the list. Pick the simplest solver that is still suitable for your A. Note that MyraMath currently offers little functionality for systems with nonsymmetric nonzero patterns. For such systems, try using SuperLU or UMFPACK instead.
Constructing any solver requires the input SparseMatrix A, and you can also optionally specify a Permutation for reordering (from a third-party package, for example) or a precomputed AssemblyTree (when solving multiple A's with the same Pattern, for example). The code excerpt below illustrates these cases (tutorial/multifrontal/solver.cpp):
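In case the excerpt is unavailable, the three construction variants it covers look roughly as follows. This is a hypothetical sketch, not the shipped file: the overload set and argument order are assumptions, only the class name, header path, and the three inputs (SparseMatrix, Permutation, AssemblyTree) come from the surrounding text.

```cpp
// Hypothetical sketch (overloads assumed, not verified against MyraMath).
#include "multifrontal/rcholesky/solver.h"

void construct(const SparseMatrix& A, const Permutation& p, const AssemblyTree& tree)
{
  SparseRCholeskySolver s1(A);       // default path: reorder + symbolic + numeric
  SparseRCholeskySolver s2(A, p);    // inject your own Permutation (e.g. from Metis)
  SparseRCholeskySolver s3(A, tree); // reuse a precomputed AssemblyTree
}
```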
A variety of control and tuning parameters can be injected into a solver constructor call through its trailing (and defaulted) multifrontal::Options pack. In the code excerpt above, example3d() uses this mechanism to pass in a callback function for progress visualization. For further details, see Multifrontal Options.
Each of the solver classes listed above offers a general-purpose .solve() method for solving op(A)*X=B or X*op(A)=B. Regardless of the op or side parameters, .solve() always has in-place semantics: you pass in B and the routine overwrites it with X. Below, tutorial/multifrontal/solve.cpp shows how to use .solve():
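In case the excerpt is unavailable, the in-place calling convention described above looks roughly like this hypothetical sketch (argument names and the exact .solve() signature are assumptions; only the method name and its in-place semantics come from the text):

```cpp
// Hypothetical sketch (signature assumed, not verified against MyraMath).
#include "multifrontal/rcholesky/solver.h"

void backsolve(const SparseRCholeskySolver& solver, Matrix& B)
{
  // On entry B holds the right-hand sides; on exit the same storage
  // holds X, the solution of A*X = B. No separate output buffer exists.
  solver.solve(B);
}
```

Because the factorization is already done, this call is cheap and can be repeated for as many right-hand sides as needed.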
Generally speaking, backsolution exhibits poorer thread-parallel scalability than factorization, so .solve() runs with only a single thread by default. This can be overridden through Options, but appreciable speedup should only be expected when the number of right-hand sides is comparable to the internal blocksize of the solver itself (a few dozen); at that point, the algorithm crosses over from BLAS2-dominated (memory bound) to BLAS3-dominated (compute bound).
External data can be injected into these A*X=B solvers by using a SparseMatrixRange to wrap around the external A stored in CSC format, and a MatrixRange to wrap around the external B stored in column-major format. The desired solution X will overwrite B. The example code below (tutorial/multifrontal/external1.cpp) solves the same system as before, but uses raw C-arrays as mock "external" data:
Continue to Tutorial for /multifrontal solvers (part 2), or go back to API Tutorials.