MyraMath: /multifrontal solvers (part 2)

This tutorial covers more advanced use cases for the sparse direct solvers in /multifrontal. It is recommended that you read the Tutorial for /multifrontal solvers (part 1) first.
As mentioned in part 1, effective sparse direct solvers rely upon reordering the system for reduced fill-in. Unfortunately, pre-selecting the elimination order is in conflict with the need for dynamic partial pivoting for numerical stability. Pivoting strategies for sparse direct solvers do exist and are implemented in MyraMath, but are somewhat compromised compared to their dense equivalents.
The poorer overall stability of sparse direct solvers leads one to pursue solution improvement strategies based on either backwards refinement or preconditioned Krylov schemes. Most of the solvers in /multifrontal offer a no-fuss .refine() method based on backwards refinement. The signature for .refine() is exactly the same as .solve(), enabling drop-in replacement. Use .refine() liberally.
MyraMath aims to support finite element substructuring methods, the design of block preconditioners for multiphysics models, and constrained optimization. Within such contexts, the need often arises to compute Schur complements of the form S=B'*inv(A)*B or S=B'*inv(A)*C where the A,B,C operands are sparse. There are multifrontal techniques for computing such products, and each solver exposes .schur(B) and .schur(B,C) methods for doing so.
Another niche calculation explored here is the explicit computation of specific entries of inv(A). Generally speaking, any interesting SparseMatrix has a dense inverse, so it is intractable to tabulate every entry of inv(A). However, computing a subset of entries (for instance, every diagonal entry of inv(A), or a nearest-neighbor stencil restricted to a boundary) is not unreasonable and might arise in the design of advanced preconditioners. All the solvers in /multifrontal offer .inverse() methods for these types of calculations.
Lastly, MyraMath's solvers offer a specialized .partialsolve() routine. It is essentially a sparsity-exploiting variant of the regular .solve() method, and is especially useful in the context of substructuring and model order reduction.
Though sparse direct solvers are somewhat less stable than their dense equivalents, applying a few steps of backwards refinement can help make them more robust. Most of the solvers in /multifrontal provide a .refine() method with the exact same signature/semantics as the .solve() method. That is, pass in B, and .refine() will overwrite it with X such that A*X=B. Note that .refine() supports the same op and side parameters as .solve(), too. The excerpt below (tutorial/multifrontal/refine.cpp) uses .refine() on a sparse indefinite A=LDL' decomposition and shows how it can offer improved accuracy over the basic .solve() method:
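In the spirit of that excerpt, a minimal sketch follows. The header paths, the SparseLDLt solver name, and the assembly/right-hand-side helpers are assumptions for illustration, not MyraMath's confirmed API; only the .solve()/.refine() call pattern comes from the description above.

    // Minimal sketch: headers, SparseLDLt, and helpers are hypothetical names.
    #include <myramath/sparse/SparseMatrix.h>      // assumed header
    #include <myramath/dense/Matrix.h>             // assumed header
    #include <myramath/multifrontal/SparseLDLt.h>  // assumed LDL' solver header

    int main()
    {
      SparseMatrix<double> A = assemble_indefinite_system();  // hypothetical helper
      Matrix<double> B = random_matrix(A.nrows(), 4);         // hypothetical helper

      SparseLDLt<double> solver(A);  // factor A = L*D*L' once

      Matrix<double> X1 = B;
      solver.solve(X1);              // basic backsolution, X1 <- inv(A)*B

      Matrix<double> X2 = B;
      solver.refine(X2);             // identical signature, adds backwards refinement

      // Expect norm(A*X2-B) <= norm(A*X1-B), typically down to working precision.
      return 0;
    }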
Generally speaking, .refine() is slower than .solve(), because the former calls the latter on each iteration and has the added burden of forward multiplying by the original A to form residuals. Typically only a handful of iterations is required to drive the residual to working precision, and the improved robustness is worth the cost. But some experimentation is encouraged, because there are many problem-specific idiosyncrasies that influence solver performance (spectral properties, scaling/equilibration, working precision, required accuracy, etc.). Finally, note that the Cholesky decompositions for positive definite systems (SparseRCholesky and SparseZCholesky) don't actually have a .refine() method. They are naturally very stable, and the basic .solve() method is sufficient.
Like .solve(), the .refine() method defaults to using a single thread, but this behavior can be overridden through Options.
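Purely as an illustrative sketch (the Options field name and the overload shown here are assumptions, not documented API; only the Options type itself is named above), requesting multithreaded refinement might look like:

    Options options;            // construct default Options
    options.nthreads = 4;       // assumed field name: request 4 worker threads
    solver.refine(X, options);  // assumed overload accepting an Options argument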
In some contexts, the need arises to manipulate Schur complements of the form S=B'*inv(A)*B or S=B'*inv(A)*C where the A,B,C operands are sparse. If only the "action" y=B'*(A\(C*x)) is required, it can be applied by cascading three steps: a sparse matvec using gemm(C), direct backsolution by a .solve() method, and then a sparse matvec using gemm(B'). This approach is commonplace, as it enables S to be solved using Krylov techniques.
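A sketch of that three-step cascade follows. The free-function gemm() and transpose() signatures are assumptions inferred from the text above, not confirmed MyraMath API.

    // Applies y = B'*(inv(A)*(C*x)) without ever tabulating S.
    // Signatures are assumed for illustration.
    template<class Solver>
    Vector<double> apply_schur_action(const Solver& solver,
                                      const SparseMatrix<double>& B,
                                      const SparseMatrix<double>& C,
                                      const Vector<double>& x)
    {
      Vector<double> t = gemm(C, x);   // step 1: sparse matvec, t = C*x
      solver.solve(t);                 // step 2: direct backsolution, t = inv(A)*t
      return gemm(transpose(B), t);    // step 3: sparse matvec, y = B'*t
    }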
Unfortunately, if S is very ill-conditioned the situation is messier. It might be necessary to explicitly tabulate S to enable (i) further algebraic preconditioning or (ii) a direct S=L*U decomposition. For the record, although clever multifrontal techniques exist for explicitly tabulating S, it is still a costly calculation. But as long as the size of S is no bigger than the root separator of A, factoring A and tabulating S have the same asymptotic costs. This is often the case in substructuring methods for FEM, where S represents the static condensation of 3D geometry to a 2D boundary.
The example program below (tutorial/multifrontal/schur.cpp) demonstrates how to use the .schur() method of a solver on a 2D structured problem.
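In outline, that workflow looks something like the sketch below. The laplacian2d/restriction helpers and the SparseLDLt solver name are assumptions; the .schur(B)/.schur(B,C) return types follow the discussion that comes next.

    int nx = 64, ny = 64;
    SparseMatrix<double> A = laplacian2d(nx, ny);        // hypothetical 2D assembly helper
    SparseMatrix<double> B = restrict_to_boundary(A);    // hypothetical sparse coupling operators
    SparseMatrix<double> C = restrict_to_interface(A);

    SparseLDLt<double> solver(A);                 // factor A once
    LowerMatrix<double> S1 = solver.schur(B);     // one triangle of B'*inv(A)*B
    Matrix<double>      S2 = solver.schur(B, C);  // fully populated B'*inv(A)*C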
Generally speaking, S is dense even when the A,B,C operands are sparse. The only structure of S that is exploited is symmetry, in the sense that the single-argument .schur(B) method will return just one triangle of S=B'*inv(A)*B as a LowerMatrix (costing half the memory and half the flops). The two-argument .schur(B,C) method does not make any symmetry assumptions and always returns a fully populated Matrix representing S=B'*inv(A)*C.
The need to compute selected entries of inv(A) occasionally arises. For example, in linear least squares problems, the variance of each unknown (a measure of merit of the fit) is encoded in the diagonal entries of the inverse of the precision matrix (the precision matrix being itself the inverse of the covariance matrix). Note that computing the (i,j) entry of inv(A) is closely related to the calculation of a Schur complement. If one defines Ei as the (sparse) column vector with a 1 in the i'th position and zeros everywhere else, then inv(A)(i,j) = Ei'*inv(A)*Ej. The process of constructing Ei/Ej and then forming the Schur complement Ei'*inv(A)*Ej is encapsulated within the .inverse() routine. The example below (tutorial/multifrontal/inverse.cpp) shows how to use it.
If multiple entries of inv(A) are needed, it might be possible to reduce duplication of work by computing them concurrently. Every solver provides several .inverse() routines, each of which works upon a different "stencil" of (i,j) indices:

.inverse(int ij), returns the ij'th diagonal entry of inv(A)
.inverse(int i, int j), returns the (i,j)'th entry of inv(A)
.inverse(intRange i, intRange j), returns a tensor product of entries from inv(A), as a dense Matrix
.inverse(intRange ij), returns a symmetric tensor product of entries from inv(A), as a dense LowerMatrix

There are space/time tradeoffs between these routines. Some end-user experimentation is encouraged to determine which one is best for a specific problem.
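A sketch of all four stencils follows. The SparseLDLt solver name and the intRange constructor shown (assumed to be a half-open index range) are illustrative assumptions; the overload list above is the only confirmed part.

    SparseLDLt<double> solver(A);        // factor A once, reuse across queries

    int k = 7, i = 3, j = 5;
    double dk  = solver.inverse(k);      // k'th diagonal entry of inv(A)
    double zij = solver.inverse(i, j);   // (i,j)'th entry of inv(A)

    intRange rows(0, 10), cols(20, 30);  // assumed half-open index ranges
    Matrix<double>      Zrc = solver.inverse(rows, cols);  // dense tensor product of entries
    LowerMatrix<double> Zrr = solver.inverse(rows);        // symmetric stencil, one triangle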
The last routine discussed here, .partialsolve(), performs sparsity-aware backsolution. It is closely related to both the .solve() and .inverse() methods discussed already. Much like .solve(), the .partialsolve() method applies the action of Z=inv(A) to forcing data B and populates solution data X. However, .partialsolve() is also like .inverse(), in that it allows the caller to specify restrictions i and j, solving just over a subset of indices Z(i,j). If the cardinalities |i| and |j| of the index sets are smaller than the number of unknowns n of A, considerable flop savings can be realized. Furthermore, .partialsolve() can do this in less space than X=gemm(solver.inverse(i,j),B). That's because .partialsolve() never tabulates an explicit dense Matrix representation of Z(i,j); instead it just applies the action of Z(i,j) directly to B, using only a small (Vector-sized) amount of additional workspace.
The typical use case for .partialsolve() is within a substructuring framework, where A represents a volume discretization of a PDE and the user wishes to impose forcing data upon one surface (as encoded by the j indices) but only observe the solution data upon another surface (as encoded by the i indices). The example below (tutorial/multifrontal/partialsolve.cpp) shows this workflow in action.
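In outline, it might look like the sketch below. The argument order of .partialsolve() is an assumption (only the restriction semantics are described above), as are the Matrix constructor and accessors.

    // Sketch (assumed signature): force surface j, observe surface i.
    template<class Solver>
    Matrix<double> observe(const Solver& solver,
                           const intRange& i, const intRange& j,
                           const Matrix<double>& B)  // |j| x nrhs forcing data
    {
      Matrix<double> X(i.size(), B.ncols());  // |i| x nrhs solution data
      solver.partialsolve(i, j, X, B);        // assumed order: X = inv(A)(i,j)*B
      return X;
    }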
Note that .partialsolve() is not without drawbacks. It is intrinsically a BLAS2-like operation and cannot achieve the same flop rates or thread-parallel efficiencies as other BLAS3-like routines, even though it requires fewer flops overall. This is not unlike the performance differences between gemv() and gemm(). As the number of columns of B and X grows, computing the .inverse(i,j) explicitly and applying it via gemm() will beat out multiple .partialsolve(i,j) calls across each right hand side. Similarly, as |i| and |j| increase towards n, the flop expense of .partialsolve() will approach that of .solve(), and might even be slower in practice due to additional symbolic overheads and data movement. For best results, some user experimentation is encouraged with these routines (.solve(), .inverse() and .partialsolve(), all three) on representative datasets.
Through the use of various Range types, the inplace versions of all these routines can take external data as inputs and write into external data as outputs. The example program below (tutorial/multifrontal/external2.cpp) forms a Schur complement of the form S = B'*inv(A)*C by calling solver(A).schur_inplace(S,B,C), where A,B,C are SparseMatrixRange's on top of CSC-formatted C-style arrays, and S is a MatrixRange on top of a column-major formatted C-style array.
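Schematically, the wrapping step might look like the sketch below. The Range constructors shown are assumptions about MyraMath's API; only the type names and the .schur_inplace(S,B,C) call appear in the description above.

    // Sketch: wrap caller-owned CSC/column-major arrays, then form S = B'*inv(A)*C.
    void schur_into(double* Svals, int n, int nb, int nc,
                    const int* Ap, const int* Ai, const double* Ax,
                    const int* Bp, const int* Bi, const double* Bx,
                    const int* Cp, const int* Ci, const double* Cx)
    {
      SparseMatrixRange<double> Ar(n, n,  Ap, Ai, Ax);  // assumed constructors
      SparseMatrixRange<double> Br(n, nb, Bp, Bi, Bx);
      SparseMatrixRange<double> Cr(n, nc, Cp, Ci, Cx);
      MatrixRange<double>       Sr(Svals, nb, nc);      // column-major, nb x nc

      SparseLDLt<double> solver(Ar);      // factor A directly from external arrays
      solver.schur_inplace(Sr, Br, Cr);   // writes S = B'*inv(A)*C into Svals
    }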
Similar inplace variants of the .inverse() and .partialsolve() methods also exist; consult the source-level documentation for further details.
Go back to API Tutorials