MyraMath: /multifrontal solvers (part 2)

This tutorial covers more advanced use cases for the sparse direct solvers in /multifrontal. It is recommended that you read the Tutorial for /multifrontal solvers (part 1) first.
As mentioned in part 1, effective sparse direct solvers rely upon reordering the system for reduced fill-in. Unfortunately, pre-selecting the elimination order is in conflict with the need for dynamic partial pivoting for numerical stability. Pivoting strategies for sparse direct solvers do exist and are implemented in MyraMath, but are somewhat compromised compared to their dense equivalents.
The poorer overall stability of sparse direct solvers leads one to pursue solution improvement strategies based on either backwards refinement or preconditioned Krylov schemes. Most of the solvers in /multifrontal offer a no-fuss .refine() method based on backwards refinement. The signature for .refine() is exactly the same as .solve(), enabling drop-in replacement. Use .refine() liberally.
MyraMath aims to support finite element substructuring methods, the design of block preconditioners for multiphysics models, and constrained optimization. Within such contexts, the need often arises to compute Schur complements of the form S=B'*inv(A)*B or S=B'*inv(A)*C where the A,B,C operands are sparse. There are multifrontal techniques for computing such products, and each solver exposes .schur(B) and .schur(B,C) methods for doing so.
Another niche calculation explored here is the explicit computation of specific entries of inv(A). Generally speaking, any interesting SparseMatrix has a dense inverse, so it is intractable to tabulate every entry of inv(A). However, computing a subset of entries (for instance, every diagonal entry of inv(A), or a nearest-neighbor stencil restricted to a boundary) is not unreasonable and might arise in the design of advanced preconditioners. All the solvers in /multifrontal offer .inverse() methods for these types of calculations.
Lastly, MyraMath's solvers offer a specialized .partialsolve() routine. It is essentially a sparsity-exploiting variant of the regular .solve() method, and is especially useful in the context of substructuring and model order reduction.
Though sparse direct solvers are somewhat less stable than their dense equivalents, applying a few steps of backwards refinement can help make them more robust. Most of the solvers in /multifrontal provide a .refine() method with the exact same signature/semantics as the .solve() method. That is, pass in B, and .refine() will overwrite it with X such that A*X=B. Note that .refine() supports the same op and side parameters as .solve(), too. The excerpt below (tutorial/multifrontal/refine.cpp) uses .refine() on a sparse indefinite A=LDL' decomposition and shows how it can offer improved accuracy over the basic .solve() method:
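In the spirit of that excerpt, a minimal sketch follows. The header paths, the SparseLDLt solver name, and the assembly/right-hand-side helpers are assumptions for illustration, not MyraMath's confirmed API; only the .solve()/.refine() call pattern comes from the description above.

    // Minimal sketch: headers, SparseLDLt, and helpers are hypothetical names.
    #include <myramath/sparse/SparseMatrix.h>      // assumed header
    #include <myramath/dense/Matrix.h>             // assumed header
    #include <myramath/multifrontal/SparseLDLt.h>  // assumed LDL' solver header

    int main()
    {
      SparseMatrix<double> A = assemble_indefinite_system();  // hypothetical helper
      Matrix<double> B = random_matrix(A.nrows(), 4);         // hypothetical helper

      SparseLDLt<double> solver(A);  // factor A = L*D*L' once

      Matrix<double> X1 = B;
      solver.solve(X1);              // basic backsolution, X1 <- inv(A)*B

      Matrix<double> X2 = B;
      solver.refine(X2);             // identical signature, adds backwards refinement

      // Expect norm(A*X2-B) <= norm(A*X1-B), typically down to working precision.
      return 0;
    }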
Generally speaking, .refine() is slower than .solve(), because the former calls the latter on each iteration and has the added burden of forward multiplying by the original A to form residuals. Typically only a handful of iterations is required to drive the residual to working precision, and the improved robustness is worth the cost. But some experimentation is encouraged, because there are many problem-specific idiosyncrasies that influence solver performance (spectral properties, scaling/equilibration, working precision, required accuracy, etc.). Finally, note that the Cholesky decompositions for positive definite systems (SparseRCholesky and SparseZCholesky) don't actually have a .refine() method. They are naturally very stable, and the basic .solve() method is sufficient.
Like .solve(), the .refine() method defaults to using a single thread, but this behavior can be overridden through Options.
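Purely as an illustrative sketch (the Options field name and the overload shown here are assumptions, not documented API; only the Options type itself is named above), requesting multithreaded refinement might look like:

    Options options;            // construct default Options
    options.nthreads = 4;       // assumed field name: request 4 worker threads
    solver.refine(X, options);  // assumed overload accepting an Options argument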
In some contexts, the need arises to manipulate Schur complements of the form S=B'*inv(A)*B or S=B'*inv(A)*C where the A,B,C operands are sparse. If only the "action" y=B'*(A\(C*x)) is required, it can be applied by cascading three steps: a sparse matvec using gemm(C), direct backsolution by a .solve() method, and then a sparse matvec using gemm(B'). This approach is commonplace, as it enables S to be solved using Krylov techniques.
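A sketch of that three-step cascade follows. The free-function gemm() and transpose() signatures are assumptions inferred from the text above, not confirmed MyraMath API.

    // Applies y = B'*(inv(A)*(C*x)) without ever tabulating S.
    // Signatures are assumed for illustration.
    template<class Solver>
    Vector<double> apply_schur_action(const Solver& solver,
                                      const SparseMatrix<double>& B,
                                      const SparseMatrix<double>& C,
                                      const Vector<double>& x)
    {
      Vector<double> t = gemm(C, x);   // step 1: sparse matvec, t = C*x
      solver.solve(t);                 // step 2: direct backsolution, t = inv(A)*t
      return gemm(transpose(B), t);    // step 3: sparse matvec, y = B'*t
    }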
Unfortunately, if S is very ill-conditioned the situation is messier. It might be necessary to explicitly tabulate S to enable (i) further algebraic preconditioning or (ii) a direct S=L*U decomposition. For the record, although clever multifrontal techniques exist for explicitly tabulating S, it is still a costly calculation. But as long as the size of S is no bigger than the root separator of A, factoring A and tabulating S have the same asymptotic costs. This is often the case in substructuring methods for FEM, where S represents the static condensation of 3D geometry to a 2D boundary.
The example program below (tutorial/multifrontal/schur.cpp) demonstrates how to use the .schur() method of a solver on a 2D structured problem.
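In outline, that workflow looks something like the sketch below. The laplacian2d/restriction helpers and the SparseLDLt solver name are assumptions; the .schur(B)/.schur(B,C) return types follow the discussion that comes next.

    int nx = 64, ny = 64;
    SparseMatrix<double> A = laplacian2d(nx, ny);        // hypothetical 2D assembly helper
    SparseMatrix<double> B = restrict_to_boundary(A);    // hypothetical sparse coupling operators
    SparseMatrix<double> C = restrict_to_interface(A);

    SparseLDLt<double> solver(A);                 // factor A once
    LowerMatrix<double> S1 = solver.schur(B);     // one triangle of B'*inv(A)*B
    Matrix<double>      S2 = solver.schur(B, C);  // fully populated B'*inv(A)*C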
Generally speaking, S is dense even when the A,B,C operands are sparse. The only structure of S that is exploited is symmetry, in the sense that the single-argument .schur(B) method will return just one triangle of S=B'*inv(A)*B as a LowerMatrix (costing half the memory and half the flops). The two-argument .schur(B,C) method does not make any symmetry assumptions and always returns a fully populated Matrix representing S=B'*inv(A)*C.
The need to compute selected entries of inv(A) occasionally arises. For example, in linear least squares problems, the variance of each unknown (a measure of merit of the fit) is encoded in the diagonal entries of the inverse of the precision matrix (the precision matrix being itself the inverse of the covariance matrix). Note that computing the (i,j) entry of inv(A) is closely related to the calculation of a Schur complement. If one defines Ei as the (sparse) column vector with a 1 in the i'th position and zeros everywhere else, then inv(A)(i,j) = Ei'*inv(A)*Ej. The process of constructing Ei/Ej and then forming the Schur complement Ei'*inv(A)*Ej is encapsulated within the .inverse() routine. The example below (tutorial/multifrontal/inverse.cpp) shows how to use it.
If multiple entries of inv(A) are needed, it might be possible to reduce duplication of work by computing them concurrently. Every solver provides several .inverse() routines, each of which works upon a different "stencil" of (i,j) indices:

.inverse(int ij), returns the ij'th diagonal entry of inv(A)
.inverse(int i, int j), returns the (i,j)'th entry of inv(A)
.inverse(intRange i, intRange j), returns a tensor product of entries from inv(A), as a dense Matrix
.inverse(intRange ij), returns a symmetric tensor product of entries from inv(A), as a dense LowerMatrix

There are space/time tradeoffs between these routines. Some end-user experimentation is encouraged to determine which one is best for a specific problem.
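A sketch of all four stencils follows. The SparseLDLt solver name and the intRange constructor shown (assumed to be a half-open index range) are illustrative assumptions; the overload list above is the only confirmed part.

    SparseLDLt<double> solver(A);        // factor A once, reuse across queries

    int k = 7, i = 3, j = 5;
    double dk  = solver.inverse(k);      // k'th diagonal entry of inv(A)
    double zij = solver.inverse(i, j);   // (i,j)'th entry of inv(A)

    intRange rows(0, 10), cols(20, 30);  // assumed half-open index ranges
    Matrix<double>      Zrc = solver.inverse(rows, cols);  // dense tensor product of entries
    LowerMatrix<double> Zrr = solver.inverse(rows);        // symmetric stencil, one triangle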
The last routine discussed here, .partialsolve(), performs sparsity-aware backsolution. It is closely related to both the .solve() and .inverse() methods discussed already. Much like .solve(), the .partialsolve() method applies the action of Z=inv(A) to forcing data B and populates solution data X. However, .partialsolve() is also like .inverse(), in that it allows the caller to specify restrictions i and j, solving just over a subset of indices Z(i,j). If the cardinalities |i| and |j| of the index sets are smaller than the number of unknowns n of A, considerable flop savings can be realized. Furthermore, .partialsolve() can do this in less space than X=gemm(solver.inverse(i,j),B). That's because .partialsolve() never tabulates an explicit dense Matrix representation of Z(i,j); instead it just applies the action of Z(i,j) directly to B, using only a small (Vector-sized) amount of additional workspace.
The typical use case for .partialsolve() is within a substructuring framework, where A represents a volume discretization of a PDE and the user wishes to impose forcing data upon one surface (as encoded by the j indices) but only observe the solution data upon another surface (as encoded by the i indices). The example below (tutorial/multifrontal/partialsolve.cpp) shows this workflow in action.
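In outline, it might look like the sketch below. The argument order of .partialsolve() is an assumption (only the restriction semantics are described above), as are the Matrix constructor and accessors.

    // Sketch (assumed signature): force surface j, observe surface i.
    template<class Solver>
    Matrix<double> observe(const Solver& solver,
                           const intRange& i, const intRange& j,
                           const Matrix<double>& B)  // |j| x nrhs forcing data
    {
      Matrix<double> X(i.size(), B.ncols());  // |i| x nrhs solution data
      solver.partialsolve(i, j, X, B);        // assumed order: X = inv(A)(i,j)*B
      return X;
    }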
Note that .partialsolve() is not without drawbacks. It is intrinsically a BLAS2-like operation and cannot achieve the same flop rates or thread-parallel efficiencies as other BLAS3-like routines, even though it requires fewer flops overall. This is not unlike the performance differences between gemv() and gemm(). As the number of columns of B and X grows, computing the .inverse(i,j) explicitly and applying it via gemm() will beat out multiple .partialsolve(i,j) calls across each right hand side. Similarly, as |i| and |j| increase towards n, the flop expense of .partialsolve() will approach that of .solve(), and might even be slower in practice due to additional symbolic overheads and data movement. For best results, some user experimentation is encouraged with these routines (.solve(), .inverse() and .partialsolve(), all three) on representative datasets.
Through the use of various Range types, the inplace versions of all these routines can take external data as inputs and write into external data as outputs. The example program below (tutorial/multifrontal/external2.cpp) forms a Schur complement of the form S = B'*inv(A)*C by calling solver(A).schur_inplace(S,B,C), where A,B,C are SparseMatrixRange's on top of CSC-formatted C-style arrays, and S is a MatrixRange on top of a column-major formatted C-style array.
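Schematically, the wrapping step might look like the sketch below. The Range constructors shown are assumptions about MyraMath's API; only the type names and the .schur_inplace(S,B,C) call appear in the description above.

    // Sketch: wrap caller-owned CSC/column-major arrays, then form S = B'*inv(A)*C.
    void schur_into(double* Svals, int n, int nb, int nc,
                    const int* Ap, const int* Ai, const double* Ax,
                    const int* Bp, const int* Bi, const double* Bx,
                    const int* Cp, const int* Ci, const double* Cx)
    {
      SparseMatrixRange<double> Ar(n, n,  Ap, Ai, Ax);  // assumed constructors
      SparseMatrixRange<double> Br(n, nb, Bp, Bi, Bx);
      SparseMatrixRange<double> Cr(n, nc, Cp, Ci, Cx);
      MatrixRange<double>       Sr(Svals, nb, nc);      // column-major, nb x nc

      SparseLDLt<double> solver(Ar);      // factor A directly from external arrays
      solver.schur_inplace(Sr, Br, Cr);   // writes S = B'*inv(A)*C into Svals
    }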
Similar inplace variants of the .inverse() and .partialsolve() methods also exist; consult the source-level documentation for further details.
Go back to API Tutorials