MyraMath
Tutorial for /multifrontal solvers (part 2)

Overview

This tutorial covers more advanced use cases for the sparse direct solvers in /multifrontal. It is recommended that you read the Tutorial for /multifrontal solvers (part 1) first.

As mentioned in part 1, effective sparse direct solvers rely upon reordering the system for reduced fill-in. Unfortunately, pre-selecting the elimination order is in conflict with the need for dynamic partial pivoting for numerical stability. Pivoting strategies for sparse direct solvers do exist and are implemented in MyraMath, but are somewhat compromised compared to their dense equivalents.

The poorer overall stability of sparse direct solvers leads one to pursue solution improvement strategies based on either iterative refinement or preconditioned Krylov schemes. Most of the solvers in /multifrontal offer a no-fuss .refine() method based on iterative refinement. The signature of .refine() is exactly the same as .solve(), enabling drop-in replacement. Use .refine() liberally.

MyraMath aims to support finite element substructuring methods, the design of block preconditioners for multiphysics models, and constrained optimization. Within such contexts, the need often arises to compute Schur complements of the form S=B'*inv(A)*B or S=B'*inv(A)*C where the A,B,C operands are sparse. There are multifrontal techniques for computing such products, and each solver exposes .schur(B) and .schur(B,C) methods for doing so.

Another niche calculation explored here is the explicit computation of specific entries of inv(A). Generally speaking, any interesting SparseMatrix has a dense inverse, so it is intractable to tabulate every entry of inv(A). However, computing a subset of entries (for instance, every diagonal entry of inv(A), or a nearest-neighbor stencil restricted to a boundary) is not unreasonable and might arise in the design of advanced preconditioners. All the solvers in /multifrontal offer .inverse() methods for these types of calculations.

Lastly, MyraMath's solvers offer a specialized .partialsolve() routine. It is essentially a sparsity-exploiting variant of the regular .solve() method, and is especially useful in the context of substructuring and model order reduction.

Below are links to the sections of this page:

Iterative Refinement
Schur Complements
Computing Entries of inv(A)
Partial Solve Routines
Appendix: Working with external data

Iterative Refinement

Though sparse direct solvers are somewhat less stable than their dense equivalents, applying a few steps of iterative refinement can help make them more robust. Most of the solvers in /multifrontal provide a .refine() method with the exact same signature/semantics as the .solve() method. That is, pass in B and .refine() will overwrite it with X such that A*X=B. Note that .refine() supports the same op and side parameters as .solve(), too. The excerpt below (tutorial/multifrontal/refine.cpp) uses .refine() on a sparse indefinite A=LDL' decomposition and shows how it can offer improved accuracy over the basic .solve() method:

// (NOTE: extraction dropped most of the original include lines; all paths
//  below except <myramath/sparse/gemm.h> and <tests/myratest.h> are reconstructed.)
#include <myramath/dense/Matrix.h>
#include <myramath/dense/frobenius.h>
#include <myramath/sparse/SparseMatrix.h>
#include <myramath/sparse/gemm.h>
#include <myramath/multifrontal/SparseRLDLTSolver.h>
#include <myramath/utility/stlprint.h>
#include <tests/laplacian2.h>

#include <tests/myratest.h>

#include <iostream>

using namespace myra;

ADD_TEST("multifrontal_refine","[multifrontal]") // int main()
  {
  // Make a symmetric indefinite A by shifting a laplacian.
  // Use single precision to emphasize the need for refine()
  SparseMatrix<float> A = laplacian2<float>(100,100);
  int N = A.size().first;
  for (int n = 0; n < N; ++n)
    A(n,n) -= 2;
  SparseRLDLTSolver<float> solver(A);
  // Make random X, form B = A*X
  auto X = Matrix<float>::random(N,5);
  auto B = A*X;
  // Check residual after using .solve()
  auto X_solve = B;
  solver.solve(X_solve);
  std::cout << "|A*X-B| (solve) = " << frobenius(A*X_solve-B)/frobenius(B) << std::endl;
  // Check residual after using .refine()
  auto X_refine = B;
  auto history = solver.refine(X_refine);
  std::cout << "|A*X-B| (refine) = " << frobenius(A*X_refine-B)/frobenius(B) << std::endl;
  using myra_stlprint::operator<<;
  std::cout << "Residual history = " << history << std::endl;
  }
|A*X-B| (solve) = 2.39045e-06
|A*X-B| (refine) = 7.5068e-08
Residual history = [ 2.39054e-06 7.40593e-08 ] (2)

Generally speaking, .refine() is slower than .solve(), because the former calls the latter on each iteration and has the added burden of forward multiplying by the original A to form residuals. Typically only a handful of iterations is required to drive the residual to working precision, and the improved robustness is worth the cost. But some experimentation is encouraged, because there are many problem-specific idiosyncrasies that influence solver performance (spectral properties, scaling/equilibration, working precision, required accuracy, etc). Finally, note that the Cholesky solvers for positive definite systems (SparseRCholeskySolver and SparseZCholeskySolver) don't actually have a .refine() method. They are naturally very stable, and the basic .solve() method is sufficient.
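To make the mechanics concrete, here is a minimal sketch of one refinement loop, written in terms of the types used in the excerpt above. It is only a schematic of the textbook algorithm, not MyraMath's internal implementation, and it assumes Matrix supports the +/- arithmetic seen elsewhere on this page:

// Schematic of iterative refinement (not MyraMath's internals).
// Assumes A*X, B-R and X+R compile for these SparseMatrix/Matrix operands.
Matrix<float> X = B;                 // start from the basic solve, X = inv(A)*B
solver.solve(X);
for (int iteration = 0; iteration < 3; ++iteration) // a handful of passes
  {
  Matrix<float> R = B - A*X;         // residual, via a forward multiply by A
  solver.solve(R);                   // correction, R = inv(A)*(B - A*X)
  X = X + R;                         // improved iterate
  }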

Like .solve(), the .refine() method defaults to using a single thread, but this behavior can be overridden through Options.
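For instance, a minimal sketch (assuming .refine() accepts the same trailing side/op/options arguments that .solve() takes in the partialsolve example later on this page):

// Hypothetical multithreaded refinement via an Options pack.
multifrontal::Options options = multifrontal::Options::create();
options.set_nthreads(4);            // override the single-thread default
solver.refine(B,'L','N',options);   // overwrites B with X such that A*X=B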

Schur Complements

In some contexts, the need arises to manipulate Schur complements of the form S=B'*inv(A)*B or S=B'*inv(A)*C where the A,B,C operands are sparse. If only the "action" y=B'*(A\(C*x)) is required, it can be applied by cascading three steps: a sparse matvec using gemm(C), direct backsolution by a .solve() method, and then a sparse matvec using gemm(B'). This approach is commonplace, as it enables S to be solved using Krylov techniques.
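A minimal sketch of that cascade follows (for clarity it densifies B in the last step; in practice the sparse gemm() overloads would be used for B'*t as well):

// Apply y = B'*(A\(C*x)) without ever tabulating S.
// Assumes x is a dense Matrix column and solver was constructed from A.
Matrix<double> t = C*x;                           // sparse matvec, t = C*x
solver.solve(t);                                  // backsolution, t = inv(A)*t
Matrix<double> y = transpose(B.make_Matrix())*t;  // y = B'*t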

Unfortunately, if S is very ill-conditioned the situation is messier. It might be necessary to explicitly tabulate S to enable (i) further algebraic preconditioning or (ii) a direct S=L*U decomposition. For the record, although clever multifrontal techniques exist for explicitly tabulating S, it is still a costly calculation. But as long as the size of S is no bigger than the root separator of A, factoring A and tabulating S have the same asymptotic costs. This is often the case in substructuring methods for FEM, where S represents the static condensation of 3D geometry to a 2D boundary.

The example program below (tutorial/multifrontal/schur.cpp) demonstrates how to use the .schur() method of a solver on a 2D structured problem.

// (NOTE: extraction dropped most of the original include lines; all paths
//  below except <myramath/dense/gemm.h> and <tests/myratest.h> are reconstructed.)
#include <myramath/dense/Matrix.h>
#include <myramath/dense/LowerMatrix.h>
#include <myramath/dense/gemm.h>
#include <myramath/dense/transpose.h>
#include <myramath/dense/inverse.h>
#include <myramath/dense/frobenius.h>
#include <myramath/sparse/SparseMatrix.h>
#include <myramath/sparse/SparseMatrixBuilder.h>
#include <myramath/sparse/PatternBuilder.h>
#include <myramath/multifrontal/SparseRCholeskySolver.h>
#include <myramath/utility/random.h>
#include <tests/laplacian2.h>

#include <tests/myratest.h>

#include <iostream>

using namespace myra;

// Layout of unknowns for laplacian2(7,7):
//
// 0  7 14 21 28 35 42
// 1  8 15 22 29 36 43
// 2  9  .  .  .  . 44
// 3 10  .  .  .  . 45
// 4 11  .  .  .  . 46
// 5 12  .  .  .  . 47
// 6 13 20 27 34 41 48

ADD_TEST("multifrontal_schur","[multifrontal]") // int main()
  {
  // Form a laplacian2() matrix A and a solver for it.
  SparseMatrix<double> A = laplacian2<double>(7,7);
  SparseRCholeskySolver<double> solver(A); // (reconstructed line, dropped by extraction)
  // Form matrix B that only has nonzeros on the "top wall" of the grid, unknowns 0:7:42
  SparseMatrixBuilder<double> B(49,7); // (reconstructed line; builder type/signature assumed)
  for (int nz = 0; nz < 30; ++nz)
    {
    int i = random_int(7)*7;
    int j = random_int(7);
    double b = random<double>();
    B(i,j) = b;
    }
  // Visualize the Pattern's of A and B (using PatternBuilder, it prints prettier).
  std::cout << "A = " << std::endl << PatternBuilder(A.pattern()) << std::endl;
  std::cout << "B = " << std::endl << PatternBuilder(B.pattern()) << std::endl;
  // Form B'*inv(A)*B using solver.schur()
  LowerMatrix<double> S1 = solver.schur(B.make_SparseMatrix());
  // Form B'*inv(A)*B using dense kernels, compare.
  Matrix<double> AA = A.make_Matrix();
  Matrix<double> BB = B.make_Matrix();
  Matrix<double> S2 = transpose(BB)*inverse(AA)*BB;
  // Compare.
  std::cout << "S1 = B'*inv(A)*B (sparse) = " << S1 << std::endl;
  std::cout << "S2 = B'*inv(A)*B (dense) = " << S2 << std::endl;
  double error = frobenius(S1.make_Matrix('T')-S2);
  std::cout << "|S1-S2| = " << error << std::endl;
  };
A =
size 49 by 49 PatternBuilder:
[ x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - x x - - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ x - - - - - - x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - x - - - - - x x - - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - x - - - - - - x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - x - - - - - x x - - - - - - x - - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - x - - - - - - x x - - - - - x - - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - x - - - - - x x - - - - - - x - - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - x - - - - - - x x - - - - - x - - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x - - - - - - x - - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - - x x - - - - - x - - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - - x - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x - - - - - - x ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - - x x - - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x - ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x x ]
[ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - x - - - - - x x ]
B =
size 49 by 7 PatternBuilder:
[ - - - - x - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - x - - - x x ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - x - - x x - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - x x x x x ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ x - x - - x - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ x x - - x - x ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - x x x - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
[ - - - - - - - ]
S1 = B'*inv(A)*B (sparse) = size 7 by 7 LowerMatrix of double:
[ 0.168059 * * * * * * ]
[ -0.164305 0.853895 * * * * * ]
[ 0.001834 -0.0156 0.0062586 * * * * ]
[ 0.0589205 0.0332125 -0.02354 0.402193 * * * ]
[ -0.0350081 0.144599 -0.0243758 0.255877 0.625003 * * ]
[ 0.0830003 -0.481278 0.0257226 0.197717 -0.0350709 0.656239 * ]
[ 0.0740291 -0.241743 0.0148005 -0.0938633 -0.151868 0.137609 0.122292 ]
S2 = B'*inv(A)*B (dense) = size 7 by 7 Matrix of double:
[ 0.168059 -0.164305 0.001834 0.0589205 -0.0350081 0.0830003 0.0740291 ]
[ -0.164305 0.853895 -0.0156 0.0332125 0.144599 -0.481278 -0.241743 ]
[ 0.001834 -0.0156 0.0062586 -0.02354 -0.0243758 0.0257226 0.0148005 ]
[ 0.0589205 0.0332125 -0.02354 0.402193 0.255877 0.197717 -0.0938633 ]
[ -0.0350081 0.144599 -0.0243758 0.255877 0.625003 -0.0350709 -0.151868 ]
[ 0.0830003 -0.481278 0.0257226 0.197717 -0.0350709 0.656239 0.137609 ]
[ 0.0740291 -0.241743 0.0148005 -0.0938633 -0.151868 0.137609 0.122292 ]
|S1-S2| = 4.64229e-16

Generally speaking, S is dense even when the A,B,C operands are sparse. The only structure of S that is exploited is symmetry, in the sense that the single-argument .schur(B) method returns just one triangle of S=B'*inv(A)*B as a LowerMatrix (costing half the memory and half the flops). The two-argument .schur(B,C) method makes no symmetry assumptions and always returns a fully populated Matrix representing S=B'*inv(A)*C.
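In code, the distinction looks like the sketch below (assuming solver was constructed from A, and B,C are SparseMatrix operands of compatible dimensions):

// Symmetric form: one triangle of S = B'*inv(A)*B, returned as a LowerMatrix.
LowerMatrix<double> S_sym = solver.schur(B);
// Unsymmetric form: fully populated S = B'*inv(A)*C, returned as a Matrix.
Matrix<double> S_gen = solver.schur(B,C);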

Computing Entries of inv(A)

The need to compute selected entries of inv(A) occasionally arises. For example, in linear least squares solution, the variance of each unknown (a measure of merit of the fit) sits on the diagonal of the covariance matrix, which is the inverse of the normal equations matrix (sometimes referred to as the "precision matrix"). Note that computing the (i,j) entry of inv(A) is closely related to the calculation of a Schur complement. If one defines Ei as the (sparse) column vector with a 1 in the i'th position and zeros everywhere else, then inv(A)(i,j) = Ei'*inv(A)*Ej. The process of constructing Ei/Ej and then forming the Schur complement Ei'*inv(A)*Ej is encapsulated within the .inverse() routine. The example below (tutorial/multifrontal/inverse.cpp) shows how to use it.

// (NOTE: extraction dropped most of the original include lines; all paths
//  below except <myramath/dense/diag.h> and <tests/myratest.h> are reconstructed.)
#include <myramath/dense/Matrix.h>
#include <myramath/dense/DiagonalMatrix.h>
#include <myramath/dense/diag.h>
#include <myramath/dense/inverse.h>
#include <myramath/dense/frobenius.h>
#include <myramath/sparse/SparseMatrix.h>
#include <myramath/multifrontal/SparseRCholeskySolver.h>
#include <tests/laplacian2.h>

#include <tests/myratest.h>

#include <iostream>

using namespace myra;

ADD_TEST("multifrontal_inverse","[multifrontal]") // int main()
  {
  // Form a laplacian2() matrix A and a solver for it.
  SparseMatrix<double> A = laplacian2<double>(5,5);
  SparseRCholeskySolver<double> solver(A); // (reconstructed line, dropped by extraction)
  // Fill a DiagonalMatrix D1 with every diagonal entry of inv(A)
  DiagonalMatrix<double> D1 = diag(inverse(A.make_Matrix()));
  // Repeat calculation into D2, but using solver.inverse() method.
  DiagonalMatrix<double> D2(25); // (reconstructed line; constructor signature assumed)
  for (int n = 0; n < D2.size(); ++n)
    D2(n) = solver.inverse(n);
  // Compare two results.
  std::cout << "D1 = diag(inv(A)) (dense ) = " << D1 << std::endl;
  std::cout << "D2 = diag(inv(A)) (sparse) = " << D2 << std::endl;
  double error = frobenius(D1-D2);
  std::cout << "|D1-D2| = " << error << std::endl;
  };
D1 = diag(inv(A)) (dense ) = size 25 DiagonalMatrix of double: [ 0.421266 0.332583 0.324223 0.332583 0.421266 0.332583 0.267952 0.262229 0.267952 0.332583 0.324223 0.262229 0.257087 0.262229 0.324223 0.332583 0.267952 0.262229 0.267952 0.332583 0.421266 0.332583 0.324223 0.332583 0.421266 ]
D2 = diag(inv(A)) (sparse) = size 25 DiagonalMatrix of double: [ 0.421266 0.332583 0.324223 0.332583 0.421266 0.332583 0.267952 0.262229 0.267952 0.332583 0.324223 0.262229 0.257087 0.262229 0.324223 0.332583 0.267952 0.262229 0.267952 0.332583 0.421266 0.332583 0.324223 0.332583 0.421266 ]
|D1-D2| = 5.43896e-16
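As an aside, the Ei/Ej equivalence described above can be checked by hand. A sketch, reusing the (assumed) SparseMatrixBuilder type from the schur example; for the 5x5 grid above (25 unknowns), the result should match the 0.257087 entry at position 12:

// inv(A)(12,12) expressed as the 1x1 Schur complement Ei'*inv(A)*Ei.
SparseMatrixBuilder<double> Ei(25,1);
Ei(12,0) = 1.0;                    // 12'th canonical basis vector
LowerMatrix<double> s = solver.schur(Ei.make_SparseMatrix());
// s is 1x1, and should agree with solver.inverse(12).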

If multiple entries of inv(A) are needed, it might be possible to reduce duplication of work by computing them concurrently. Every solver provides several .inverse() overloads, each working upon a different "stencil" of (i,j) indices: a single entry, or a dense block over index subsets i and j (both forms appear on this page, and are summarized in the sketch below).

There are space/time tradeoffs between these routines. Some end-user experimentation is encouraged to determine which one is best for a specific problem.
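The two stencils exercised in this tutorial's examples (return types inferred from how the examples use them):

// Single entry: a scalar, inv(A)(n,n), as in inverse.cpp above.
double znn = solver.inverse(n);
// Block stencil: a dense Matrix, inv(A)(Ri,Rj), where Ri,Rj are
// std::vector<int> index sets, as in partialsolve.cpp below.
Matrix<double> Zij = solver.inverse(Ri,Rj,options);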

Partial Solve Routines

The last routine discussed here, .partialsolve(), performs sparsity-aware backsolution. It is closely related to both the .solve() and .inverse() methods discussed already. Much like .solve(), the .partialsolve() method applies the action of Z=inv(A) to forcing data B and populates solution data X. However, .partialsolve() is also like .inverse(), in that it allows the caller to specify restrictions i and j, solving just over a subset of indices Z(i,j). If the cardinalities |i| and |j| are smaller than the number of unknowns n of A, considerable flop savings can be realized. Furthermore, .partialsolve() can do this in less space than X=gemm(solver.inverse(i,j),B). That's because .partialsolve() never tabulates an explicit dense Matrix representation of Z(i,j); instead, it applies the action of Z(i,j) directly to B with only a small (vector-sized) amount of additional workspace.
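Distilled from the example below, a single left-sided call looks like the following sketch (Ri/Rj are std::vector<int> index sets, and the trailing side/op/options arguments mirror .solve()):

// X = inv(A)(Ri,Rj)*B, without tabulating the dense block inv(A)(Ri,Rj).
Matrix<double> X = solver.partialsolve(Ri,Rj,B,'L','N',options);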

The typical use case for .partialsolve() is within a substructuring framework, where A represents a volume discretization of a PDE and the user wishes to impose forcing data upon one surface (as encoded by the j indices) but only observe the solution data upon another surface (as encoded by the i indices). The example below (tutorial/multifrontal/partialsolve.cpp) shows this workflow in action.

// (NOTE: extraction dropped most of the original include lines; all paths
//  below except <myramath/dense/gemm.h> and <tests/myratest.h> are reconstructed.)
#include <myramath/dense/Matrix.h>
#include <myramath/dense/gemm.h>
#include <myramath/dense/frobenius.h>
#include <myramath/sparse/SparseMatrix.h>
#include <myramath/multifrontal/Options.h>
#include <myramath/multifrontal/SparseRCholeskySolver.h>
#include <myramath/utility/Permutation.h>
#include <myramath/utility/ProgressMeter.h>
#include <tests/laplacian2.h>

#include <tests/myratest.h>

#include <algorithm>
#include <iostream>
#include <vector>

using namespace myra;

ADD_TEST("multifrontal_partialsolve","[multifrontal]") // int main()
  {
  // Make a ProgressMeter and some non-default Options, just to get flop counts printed.
  ProgressMeter progress = make_TextProgressMeter();
  typedef multifrontal::Options Options;
  Options options = Options::create().set_progress(progress);
  int T = options.nthreads;

  // Form a laplacian2() matrix A and a solver Z for it.
  int I = 1000;
  int J = 1000;
  Natural2D N(I,J);
  SparseMatrix<double> A = laplacian2<double>(I,J);
  Permutation perm = bisect2(I,J);
  SparseRCholeskySolver<double> Z(A,perm,options);
  std::cout << std::endl;

  // Form restriction Rj at left wall (i=0)
  std::vector<int> Rj;
  for (int j = 0; j < J; ++j)
    Rj.push_back( N(0,j) );
  std::sort(Rj.begin(),Rj.end());

  // Form restriction Ri at top wall (j=0)
  std::vector<int> Ri;
  for (int i = 0; i < I; ++i)
    Ri.push_back( N(i,0) );
  std::sort(Ri.begin(),Ri.end());

  // Form random B, for exercising partialsolve('L'eft)
  auto B = Matrix<double>::random(J,1);

  // Compute X = Ri*Z*Rj'*B, by scattering from B, performing a full solve(), gathering into X
  std::cout << "Calling X0 = Z.solve('L').." << std::endl;
  auto B_full = Matrix<double>::zeros(I*J,1);
  for (int j = 0; j < J; ++j)
    B_full.row(Rj[j]) += B.row(j);
  options.set_nthreads(1);
  Z.solve(B_full,'L','N',options);
  options.set_nthreads(T);
  const Matrix<double>& X_full = B_full;
  auto X0 = Matrix<double>::zeros(I,1);
  for (int i = 0; i < I; ++i)
    X0.row(i) += X_full.row(Ri[i]);
  std::cout << std::endl;

  // Compute X = Z(Ri,Rj)*B using inverse().
  std::cout << "Calling X1 = Z.inverse('L').." << std::endl;
  Matrix<double> X1 = Z.inverse(Ri,Rj,options)*B;
  std::cout << std::endl;

  // Compute X = Z(Ri,Rj)*B using partialsolve(). Should be the fastest.
  std::cout << "Calling X2 = Z.partialsolve('L').." << std::endl;
  options.set_nthreads(1);
  Matrix<double> X2 = Z.partialsolve(Ri,Rj,B,'L','N',options);
  options.set_nthreads(T);
  std::cout << std::endl;

  // Examine errors.
  std::cout << "|X1-X0| = " << frobenius(X1-X0) << std::endl;
  std::cout << "|X2-X0| = " << frobenius(X2-X0) << std::endl;
  std::cout << std::endl;

  // Form random C, for exercising partialsolve('R'ight)
  auto C = Matrix<double>::random(1,J);

  // Compute Y = C*Ri*Z*Rj', by scattering from C, performing a full solve(), gathering into Y
  std::cout << "Calling Y0 = Z.solve('R').." << std::endl;
  auto C_full = Matrix<double>::zeros(1,I*J);
  for (int i = 0; i < I; ++i)
    C_full.column(Ri[i]) += C.column(i);
  options.set_nthreads(1);
  Z.solve(C_full,'R','N',options);
  options.set_nthreads(T);
  const Matrix<double>& Y_full = C_full;
  auto Y0 = Matrix<double>::zeros(1,J);
  for (int j = 0; j < J; ++j)
    Y0.column(j) += Y_full.column(Rj[j]);
  std::cout << std::endl;

  // Compute Y = C*Z(Ri,Rj) using inverse().
  std::cout << "Calling Y1 = Z.inverse('R').." << std::endl;
  Matrix<double> Y1 = C*Z.inverse(Ri,Rj,options);
  std::cout << std::endl;

  // Compute Y = C*Z(Ri,Rj) using partialsolve(). Should be the fastest.
  std::cout << "Calling Y2 = Z.partialsolve('R').." << std::endl;
  options.set_nthreads(1);
  Matrix<double> Y2 = Z.partialsolve(Ri,Rj,C,'R','N',options);
  options.set_nthreads(T);
  std::cout << std::endl;

  // Examine errors.
  std::cout << "|Y1-Y0| = " << frobenius(Y1-Y0) << std::endl;
  std::cout << "|Y2-Y0| = " << frobenius(Y2-Y0) << std::endl;
  std::cout << std::endl;
  };
Sparse A = L*L' factor(1000000 unknowns, work = 16237011413):
[----------------------------------------------------------------] (4718 ms)
Calling X0 = Z.solve('L')..
SequentialJobGraph{LambdaJob + solvel(1000000/1 unknowns/rhs, work = 127959091) + solveu(1000000/1 unknowns/rhs, work = 127959091) + LambdaJob}:
[----------------------------------------------------------------] (1589 ms)
Calling X1 = Z.inverse('L')..
SequentialJobGraph{ParallelJobGraph{Multifrontal SchurSolve(1000000/1000 unknowns/rhs, work = 2417874541) + Multifrontal SchurSolve(1000000/1000 unknowns/rhs, work = 2384687143)} + Multifrontal SchurGemm(1000000/1000/1000 unknowns/i/j, work = 1430271993) + DeleteJobGraph + DeleteJobGraph + DeleteJobGraph + DeleteJobGraph}:
[----------------------------------------------------------------] (597 ms)
Calling X2 = Z.partialsolve('L')..
SequentialJobGraph{LambdaJob + partialsolvel(1000000/1 unknowns/rhs, work = 6303001) + LambdaJob + partialsolveu(1000000/1 unknowns/rhs, work = 7775053) + LambdaJob + DeleteJobGraph + DeleteJobGraph + DeleteJobGraph}:
[----------------------------------------------------------------] (73 ms)
|X1-X0| = 7.16732e-18
|X2-X0| = 0
Calling Y0 = Z.solve('R')..
SequentialJobGraph{LambdaJob + solveu_right(1000000/1 unknowns/lhs, work = 127959091) + solvel_right(1000000/1 unknowns/lhs, work = 127959091) + LambdaJob}:
[----------------------------------------------------------------] (1542 ms)
Calling Y1 = Z.inverse('R')..
SequentialJobGraph{ParallelJobGraph{Multifrontal SchurSolve(1000000/1000 unknowns/rhs, work = 2417874541) + Multifrontal SchurSolve(1000000/1000 unknowns/rhs, work = 2384687143)} + Multifrontal SchurGemm(1000000/1000/1000 unknowns/i/j, work = 1430271993) + DeleteJobGraph + DeleteJobGraph + DeleteJobGraph + DeleteJobGraph}:
[----------------------------------------------------------------] (564 ms)
Calling Y2 = Z.partialsolve('R')..
SequentialJobGraph{LambdaJob + partialsolveu_right(1/1000000 lhs/unknowns, work = 7775053) + LambdaJob + partialsolvel_right(1/1000000 lhs/unknowns, work = 6303001) + LambdaJob + DeleteJobGraph + DeleteJobGraph + DeleteJobGraph}:
[----------------------------------------------------------------] (91 ms)
|Y1-Y0| = 6.50655e-17
|Y2-Y0| = 0

Note that .partialsolve() is not without drawbacks. It is intrinsically a BLAS2-like operation and cannot achieve the same flop rates or thread-parallel efficiencies as other BLAS3-like routines, even though it requires fewer flops overall. This is not unlike the performance differences between gemv() and gemm(). As the number of columns of B and X grows, computing the .inverse(i,j) explicitly and applying it via gemm() will beat out issuing .partialsolve() calls for each right hand side. Similarly, as |i| and |j| increase towards n, the flop expense of .partialsolve() approaches that of .solve(), and might even be slower in practice due to additional symbolic overheads and data movement. For best results, some experimentation with all three routines (.solve(), .inverse() and .partialsolve()) on representative datasets is encouraged.

Appendix: Working with external data.

Through the use of various Range types, the inplace versions of all these routines can take external data as inputs and write into external data as outputs. The example program below (tutorial/multifrontal/external2.cpp) forms a Schur complement of the form S = B'*inv(A)*C by constructing a solver from A and calling solver.schur_inplace(B,C,S), where A,B,C are SparseMatrixRange's on top of CSC-formatted C-style arrays, and S is a MatrixRange on top of a column-major formatted C-style array.

// (NOTE: extraction dropped most of the original include lines; all paths
//  below except <myramath/dense/gemm.h> and <tests/myratest.h> are reconstructed.)
#include <myramath/dense/Matrix.h>
#include <myramath/dense/gemm.h>
#include <myramath/dense/transpose.h>
#include <myramath/dense/inverse.h>
#include <myramath/dense/frobenius.h>
#include <myramath/sparse/SparseMatrix.h>
#include <myramath/multifrontal/SparseRCholeskySolver.h>

#include <tests/myratest.h>

#include <iostream>

using namespace myra;

// Using external data, forms the schur complement S = B'*inv(A)*C, where:
//
//     [ 3 - 0 0 - 0 0 0 0 0 0 0 0 0 0 0 ]
//     [ - 4 - 0 0 - 0 0 0 0 0 0 0 0 0 0 ]
//     [ 0 - 4 - 0 0 - 0 0 0 0 0 0 0 0 0 ]
//     [ 0 0 - 3 0 0 0 - 0 0 0 0 0 0 0 0 ]
//     [ - 0 0 0 4 - 0 0 - 0 0 0 0 0 0 0 ]
//     [ 0 - 0 0 - 5 - 0 0 - 0 0 0 0 0 0 ]
//     [ 0 0 - 0 0 - 5 - 0 0 - 0 0 0 0 0 ]
//     [ 0 0 0 - 0 0 - 4 0 0 0 - 0 0 0 0 ]
// A = [ 0 0 0 0 - 0 0 0 4 - 0 0 - 0 0 0 ]
//     [ 0 0 0 0 0 - 0 0 - 5 - 0 0 - 0 0 ]
//     [ 0 0 0 0 0 0 - 0 0 - 5 - 0 0 - 0 ]
//     [ 0 0 0 0 0 0 0 - 0 0 - 4 0 0 0 - ]
//     [ 0 0 0 0 0 0 0 0 - 0 0 0 3 - 0 0 ]
//     [ 0 0 0 0 0 0 0 0 0 - 0 0 - 4 - 0 ]
//     [ 0 0 0 0 0 0 0 0 0 0 - 0 0 - 4 - ]
//     [ 0 0 0 0 0 0 0 0 0 0 0 - 0 0 - 3 ]
//
// (Size 16x16, symmetric positive definite)
// (Note "-" means -1)
//
//     [ 1 0 0 0 ]      [ 1 0 0 0 0 1 0 ]
//     [ 0 8 0 5 ]      [ 0 0 2 0 0 0 8 ]
//     [ 0 0 0 7 ]      [ 7 0 0 4 9 4 0 ]
//     [ 0 2 0 1 ]      [ 0 7 4 0 0 6 4 ]
//     [ 0 0 5 2 ]      [ 0 4 0 0 3 4 0 ]
//     [ 0 0 0 0 ]      [ 3 0 0 0 0 0 0 ]
//     [ 0 0 0 0 ]      [ 0 6 7 0 6 0 0 ]
// B = [ 0 0 0 0 ], C = [ 0 0 0 6 0 8 3 ]
//     [ 5 0 0 4 ]      [ 6 0 0 0 0 0 5 ]
//     [ 0 0 0 0 ]      [ 0 0 0 0 0 0 0 ]
//     [ 0 0 0 9 ]      [ 8 0 0 0 0 0 2 ]
//     [ 6 0 0 0 ]      [ 0 2 0 0 0 2 0 ]
//     [ 0 7 0 0 ]      [ 0 0 3 1 7 3 0 ]
//     [ 0 0 0 5 ]      [ 0 6 0 0 0 9 1 ]
//     [ 0 0 0 7 ]      [ 0 0 0 0 8 7 0 ]
//     [ 3 0 4 0 ]      [ 6 0 0 7 0 0 4 ]
//
//     (Size 16x4)      (Size 16x7)

ADD_TEST("multifrontal_external2","[multifrontal]") // int main()
  {
  // Encode A in CSC format.
  int A_strides[17] = {0,3,7,11,14,18,23,28,32,36,41,46,50,53,57,61,64};
  int A_indices[64] = {0,1,4,0,1,2,5,1,2,3,6,2,3,7,0,4,5,8,1,4,5,6,9,2,5,
                       6,7,10,3,6,7,11,4,8,9,12,5,8,9,10,13,6,9,10,11,14,
                       7,10,11,15,8,12,13,9,12,13,14,10,13,14,15,11,14,15};
  double A_values[64] = {3,-1,-1,-1,4,-1,-1,-1,4,-1,-1,-1,3,-1,-1,4,-1,-1,-1,-1,5,
                         -1,-1,-1,-1,5,-1,-1,-1,-1,4,-1,-1,4,-1,-1,-1,-1,5,-1,-1,-1,
                         -1,5,-1,-1,-1,-1,4,-1,-1,3,-1,-1,-1,4,-1,-1,-1,4,-1,-1,-1,3};
  SparseMatrixRange<double> A(A_strides,A_indices,A_values,16,16);
  // Encode B in CSC format.
  int B_strides[5] = {0,4,7,9,18};
  int B_indices[18] = {0,8,11,15,1,3,12,4,15,1,2,3,4,8,10,13,14,15};
  double B_values[18] = {1,5,6,3,8,2,7,5,4,5,7,1,2,4,9,5,7};
  SparseMatrixRange<double> B(B_strides,B_indices,B_values,16,4);
  // Encode C in CSC format.
  int C_strides[8] = {0,6,11,15,19,24,33,40};
  int C_indices[40] = {0,2,5,8,10,15,3,4,6,11,13,1,3,6,12,2,7,12,15,2,4,
                       6,12,14,0,2,3,4,7,11,12,13,14,1,3,7,8,10,13,15};
  double C_values[40] = {1,7,3,6,8,6,7,4,6,2,6,2,4,7,3,4,6,1,7,9,
                         3,6,7,8,1,4,6,4,8,2,3,9,7,8,4,3,5,2,1,4};
  SparseMatrixRange<double> C(C_strides,C_indices,C_values,16,7);
  // Encode the output S in column-major format, it should be 4x7
  double S_values[28] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
  MatrixRange<double> S(S_values,4,7,4);
  // Form solver for A, calculate B'*inv(A)*C
  SparseRCholeskySolver<double> solver(A); // (reconstructed line, dropped by extraction)
  solver.schur_inplace(B,C,S);
  std::cout << "S (sparse) = " << std::endl << S << std::endl;
  // For reference/comparison, repeat the same calculation in dense arithmetic.
  Matrix<double> S2 = transpose(B.make_Matrix())*inverse(A.make_Matrix())*C.make_Matrix();
  std::cout << "S (dense) = " << std::endl << S2 << std::endl;
  // Compare results.
  double error = frobenius(S-S2);
  std::cout << "error = " << error << std::endl;
  }
S (sparse) =
size 4 by 7 Matrix of double:
[ 35.8544 17.6676 8.44455 22.1638 21.5178 33.8206 27.4344 ]
[ 24.3857 22.8748 24.343 12.7857 41.9849 43.5827 38.1817 ]
[ 20.4934 12.5403 4.09292 14.6859 15.4052 20.3202 15.1026 ]
[ 90.8162 59.1858 35.3601 39.0162 97.5147 116.433 68.9376 ]
S (dense) =
size 4 by 7 Matrix of double:
[ 35.8544 17.6676 8.44455 22.1638 21.5178 33.8206 27.4344 ]
[ 24.3857 22.8748 24.343 12.7857 41.9849 43.5827 38.1817 ]
[ 20.4934 12.5403 4.09292 14.6859 15.4052 20.3202 15.1026 ]
[ 90.8162 59.1858 35.3601 39.0162 97.5147 116.433 68.9376 ]
error = 3.36572e-14

Similar inplace variants of the .inverse() and .partialsolve() methods also exist; consult the source-level documentation for further details.

Go back to API Tutorials