The /dense folder contains basic containers for tabulated numeric data, like Matrix and Vector, as well as specialized containers for numeric data with exploitable nonzero structure, like DiagonalMatrix and LowerMatrix. Layered on top of these containers are various Range types, which are views into numeric data owned by some other underlying container. Algorithms complete the picture by performing useful linear algebraic operations (arithmetic, decompositions, linear solution, eigensolution) upon Range's.
This layered approach (containers/ranges/algorithms) imitates the design of the C++ Standard Template Library (containers/iterators/algorithms) and is intended to provide a measure of container-agnosticism to algorithms. Note that most of the heavy lifting within /dense is performed by the underlying BLAS and LAPACK libraries.
Of all the containers in the library, Matrix is the one you will probably use the most. It tabulates numeric data into an IxJ rectangle. The internal storage is column-major to remain backwards compatible with Fortran, LAPACK and the BLAS. Matrix indices always start at zero. In fact, every container in the library uses a 0-based indexing convention. Matrix has many member functions for generating a subview (a MatrixRange) over it. Consider tutorial/dense/range.cpp, listed below, which shows some of them in action:
A = size 3 by 3 Matrix of double:
[ 1 2 3 ]
[ 4 5 6 ]
[ 7 8 9 ]
A.row(0) = size 1 by 3 Matrix of double:
[ 1 2 3 ]
A.column(1) = size 3 by 1 Matrix of double:
[ 2 ]
[ 5 ]
[ 8 ]
A.left(2) = size 3 by 2 Matrix of double:
[ 1 2 ]
[ 4 5 ]
[ 7 8 ]
A.window(1,3,1,3) = size 2 by 2 Matrix of double:
[ 5 6 ]
[ 8 9 ]
A.top(2) = size 2 by 3 Matrix of double:
[ 1 2 3 ]
[ 4 5 6 ]
A.top(2).left(2) = size 2 by 2 Matrix of double:
[ 1 2 ]
[ 4 5 ]
A.top(2).left(2) = size 2 by 2 Matrix of double:
[ 1 2 ]
[ 4 0 ]
A = size 3 by 3 Matrix of double:
[ 1 2 3 ]
[ 4 0 6 ]
[ 7 8 9 ]
The last few lines of this example show that:
MatrixRange's can be further partitioned into other MatrixRange's by method chaining.
Accessors for peeking and poking scalar values use "function call syntax", A(i,j) = ..whatever..
Modifying values in a MatrixRange causes change in the underlying Matrix. MatrixRange's are views, not deep copies.
BLAS1 and BLAS2 Routines
The next container of interest is Vector, which has an associated VectorRange for representing subviews. The usual BLAS1 vector operations like inner products (dense/dot.h), norms (dense/euclidean.h) and addition (dense/axpy.h) are present and delegate directly to the BLAS. Note that all these algorithms accept VectorRange's, so it's easy to use "just a piece" of a Vector as an operand. Of course, any Vector is implicitly convertible to a VectorRange over all of its contents, so algorithms work with regular Vector's too. The listing below, tutorial/dense/axpy.cpp, shows these algorithms in action.
x = size 3 Vector of double:
[ 1 2 3 ]
x = size 3 Vector of double:
[ 4 5 6 ]
euclidean(x) = 3.74166
dot(x,y) = 32
z = size 3 Vector of double:
[ 14 19 24 ]
2*x+3*y = size 3 Vector of double:
[ 14 19 24 ]
w = size 6 Vector of double:
[ 1 2 3 4 5 6 ]
w(0:3)'w(3:6) = 32
w = size 6 Vector of double:
[ 2 4 6 8 5 6 ]
Vector's can be initialized from MatrixRange's that have size Nx1 or 1xN, such as the .row(i) and .column(j) subviews of a Matrix. When it comes to the question "is this a row Vector, or a column Vector?", the class is agnostic. It's just an N-vector. The "row-ness" or "column-ness" is only deduced from context. For instance, in the code excerpt below (tutorial/dense/gemv.cpp), the gemv(A,x) call will treat x as a column vector (size 3x1), while gemv(x,A) will treat x as a row vector (size 1x3).
A = size 3 by 3 Matrix of double:
[ 1 4 7 ]
[ 2 5 8 ]
[ 3 6 9 ]
x = size 3 Vector of double:
[ 7 8 9 ]
A*x = size 3 Vector of double:
[ 102 126 150 ]
x*A = size 3 Vector of double:
[ 50 122 194 ]
BLAS3 Routines
BLAS3 routines, which perform O(n^3) work upon O(n^2) data, typically provide the highest level of performance on modern CPU's due to improved data reuse from cache. The most well-known routine in this class is dense/gemm.h, which performs a fused multiply-add C=alpha*op(A)*op(B)+beta*C for MatrixRange's C, A and B. To avoid the need for data movement and temporaries, gemm() allows for various modifiers ("operations") to be applied to the A and B inputs. Each is denoted by a single character argument:
'T'ranspose an input
'C'onjugate an input
'H'ermitian transpose an input (that is, transpose and conjugate it)
Do 'N'othing to an input, just use it as-is
For the record, it's worth pointing out that this library uses different conventions from the BLAS for "op" modifiers: the BLAS has no option for just conjugating, and instead uses 'C' to denote the conjugate transpose (whereas this library calls that op='H', for hermitian). Use care. The example below (tutorial/dense/gemm.cpp) exercises all of these options for complex operands and serves as your reference for expected behavior.
A = size 2 by 2 Matrix of std::complex<double>:
[ (1,2) (5,6) ]
[ (3,4) (7,8) ]
B = size 2 by 2 Matrix of std::complex<double>:
[ (9,10) (13,14) ]
[ (11,12) (15,16) ]
A*B = size 2 by 2 Matrix of std::complex<double>:
[ (-28,154) (-36,210) ]
[ (-32,238) (-40,326) ]
transpose(A)*B = size 2 by 2 Matrix of std::complex<double>:
[ (-26,108) (-34,148) ]
[ (-34,276) (-42,380) ]
hermitian(A)*B = size 2 by 2 Matrix of std::complex<double>:
[ (110,-16) (150,-24) ]
[ (278,-8) (382,-16) ]
conjugate(A)*B = size 2 by 2 Matrix of std::complex<double>:
[ (156,-14) (212,-22) ]
[ (240,-10) (328,-18) ]
Another noteworthy BLAS3 routine is dense/syrk.h, which is worth exploring because it leads to a new container type. Short for "symmetric rank-K" update, B=syrk(A) is semantically equivalent to B=gemm(A,A,'T') but performs roughly half the work by exploiting the fact that the output is known to be symmetric. The BLAS-provided syrk() routines write into a square output matrix, but will touch only one of its triangles ('U'pper or 'L'ower) as specified by the caller.
Typically the untouched triangle of syrk()'s output just goes along for the ride as wasted space. However, LAPACK working note #199 describes an alternative that is waste-free and BLAS3-friendly called "rectangular packed format", which is encapsulated here as a specialized container named LowerMatrix. The code excerpt below (tutorial/dense/syrk.cpp) explores the relationships between calling (i) gemm(A,A,'T') on a Matrix, (ii) syrk() on the 'L'ower triangle of a Matrix, and (iii) syrk() directly on a LowerMatrix:
B1 = gemm_inplace(A,'N',A,'T') = size 3 by 3 Matrix of double:
[ 66 78 90 ]
[ 78 93 108 ]
[ 90 108 126 ]
B2 = syrk_inplace('L',A) = size 3 by 3 Matrix of double:
[ 66 0 0 ]
[ 78 93 0 ]
[ 90 108 126 ]
B3 = syrk_inplace(A) = size 3 by 3 LowerMatrix of double:
[ 66 * * ]
[ 78 93 * ]
[ 90 108 126 ]
syrk(A) = size 3 by 3 LowerMatrix of double:
[ 66 * * ]
[ 78 93 * ]
[ 90 108 126 ]
syrk(A).make_Matrix('T') = size 3 by 3 Matrix of double:
[ 66 78 90 ]
[ 78 93 108 ]
[ 90 108 126 ]
The BLAS3 routine dense/trsm.h is very relevant for linear system solution using LU-like decompositions. It performs triangular system backsolution across multiple right hand sides. Just like syrk(), trsm() can operate upon either the 'U'pper/'L'ower triangle of a Matrix, or a LowerMatrix directly. Note that trsm() is highly reconfigurable: it can solve L*X=B "from the left" or X*L=B "from the right", and can also apply an op modifier ('N','T','H' or 'C') to L during the process.
The remaining BLAS3 routines are:
Hermitian rank-K updates, see dense/herk.h, very similar to syrk()
Multiplication by a triangular matrix, see dense/trmm.h, essentially the reverse operation to trsm()
A specialized multiplication routine for a symmetric matrix, see dense/symm.h
A specialized multiplication routine for a hermitian matrix, see dense/hemm.h
LAPACK Routines
Building on the high performance basic blocks of the BLAS, LAPACK provides a collection of matrix decompositions that can be used for (i) linear system solution, (ii) least squares solution and (iii) spectral representations. LAPACK provides a huge amount of functionality; this library only exposes the routines that are relevant for sparse linear system solution. Many of the routines operate "in-place" and overwrite their input data. The example program below (tutorial/dense/potrf.cpp) uses potrf_inplace() ("positive triangular factorization", the Cholesky algorithm) to solve an A*X=B system. LowerMatrix plays a large role in this example because the system A is symmetric.
G = size 3 by 3 Matrix of double:
[ 1 1 1 ]
[ 1 2 3 ]
[ 1 3 6 ]
A = syrk(G) = size 3 by 3 LowerMatrix of double:
[ 3 * * ]
[ 6 14 * ]
[ 10 25 46 ]
X = size 3 by 2 Matrix of double:
[ 4 6 ]
[ 3 2 ]
[ 5 1 ]
B = symm(A,X) = size 3 by 2 Matrix of double:
[ 80 40 ]
[ 191 89 ]
[ 345 156 ]
L = potrf(A) = size 3 by 3 LowerMatrix of double:
[ 1.73205 * * ]
[ 3.4641 1.41421 * ]
[ 5.7735 3.53553 0.408248 ]
A\B = L'\L\B = size 3 by 2 Matrix of double:
[ 4 6 ]
[ 3 2 ]
[ 5 1 ]
frobenius(A\B-X) = 1.42081e-12
For the sake of completeness, a nonsymmetric example (tutorial/dense/getrf.cpp) based on getrf_inplace() (a pivoted LU decomposition) is shown below. The notable changes are (i) the pivoting step results in row swaps that have to be accounted for during backsolution, and (ii) the lack of symmetry means that both the 'L'ower and 'U'pper triangles of the decomposition must be stored, leading to different-looking calls to trsm_inplace(). Since pivoting requires additional integer-valued storage to record the row swaps, MyraMath introduces another view-like type (an intRange) to represent this parameter.
Beyond decompositions for linear solution, wrappers are also provided for the spectral representations mentioned earlier:
Eigen-decomposition of a real symmetric matrix, see dense/syev.h
Eigen-decomposition of a real symmetric tridiagonal matrix, see dense/stev.h
Eigen-decomposition of a complex hermitian matrix, see dense/heev.h
Eigen-decomposition of a general matrix, see dense/geevd.h
Singular value decomposition of a general matrix, see dense/gesvd.h
This list may grow as needs require.
Solver Classes
LAPACK and the BLAS expose a fixed set of functions that can be pipelined together for solving linear algebra problems. That design is (for the most part) imitated by the routines in /dense. However, one area where slight embellishments have been made is the introduction of "Solver" classes, which encapsulate all the steps (factorization, pivot reordering, backsolution) associated with a given decomposition. Consider tutorial/dense/solver.cpp, below. It solves the same linear system as tutorial/dense/getrf.cpp but at a slightly higher level (using LUSolver instead of getrf_inplace(), trsm_inplace(), etc.).
A = size 3 by 3 Matrix of double:
[ 1 4 9 ]
[ 3 6 7 ]
[ 5 8 2 ]
X = size 3 by 2 Matrix of double:
[ 10 13 ]
[ 11 14 ]
[ 12 15 ]
B = A*X = size 3 by 2 Matrix of double:
[ 162 204 ]
[ 180 228 ]
[ 162 207 ]
A\B = size 3 by 2 Matrix of double:
[ 10 13 ]
[ 11 14 ]
[ 12 15 ]
frobenius(A\B-X) = 2.7231e-14
On toy examples the benefits of LUSolver vs getrf_inplace() are admittedly debatable, but it could in principle save you a few moments thinking about (i) "what was the sequence of trsm() calls, again?" and (ii) "how do I undo the row interchanges, again?"
The real motivation for introducing the Solver concept now is that it sets the stage for all the sparse /multifrontal solvers to come (where there is no universally accepted "LAPACK/BLAS"-like fixed function layer). Each /multifrontal solver encapsulates related functionality (reordering, factoring, solving, refinement, and more) into a single Solver entity.
Appendix: Working with external data
To expand on an idea mentioned earlier, the main purpose of the "iterator" concept in the C++ STL is to decouple std::algorithms from the std::containers upon which they operate. All of the "Range" classes in this library are intended to serve a similar role. They allow algorithms (like gemm() or gemv()) to operate upon data other than our own Matrix and Vector containers, provided that the external data is "layout compatible" with the assumptions of the Range (column major, fixed leading dimension between columns). In practice, this is almost always the case, because it's really BLAS/LAPACK/Fortran interoperability that sets these layout conventions for everybody. MyraMath just complies with the conventions, and provides MatrixRange to help you get your own compliant data into MyraMath algorithms.
This appendix example (tutorial/dense/external.cpp) shows how to use MatrixRange to feed "external" data into MyraMath's gemm_inplace() wrapper. Here, a C-style array of double[]'s serves as mock "external" data. To instantiate a MatrixRange, you must provide essentially the same parameters that you'd pass into the BLAS anyway: (i) a pointer to a buffer of numbers, (ii) size metadata, (iii) "leading dimension" metadata.
A = size 3 by 3 Matrix of double:
[ 1 3 5 ]
[ 4 6 8 ]
[ 9 7 2 ]
B = size 3 by 3 Matrix of double:
[ 0.1 3.4 6.7 ]
[ 1.2 4.5 7.8 ]
[ 2.3 5.6 8.9 ]
C = size 3 by 3 Matrix of double:
[ 0 0 0 ]
[ 0 0 0 ]
[ 0 0 0 ]
C = gemm(A,B) = size 3 by 3 Matrix of double:
[ 15.2 44.9 74.6 ]
[ 26 85.4 144.8 ]
[ 13.9 73.3 132.7 ]
C_values[0] = 15.2
C_values[1] = 26
C_values[2] = 13.9
C_values[3] = 44.9
C_values[4] = 85.4
C_values[5] = 73.3
C_values[6] = 74.6
C_values[7] = 144.8
C_values[8] = 132.7
If there were no intermediary like MatrixRange, and gemm_inplace() only accepted concrete Matrix objects, this wouldn't have been possible. At least, not without copying to/from temporaries for B and C. Similar idioms are used for getting "external" data in de-facto-standard sparse formats (like compressed-sparse-column) into MyraMath's /sparse routines and /multifrontal solvers. Furthermore, the /dense MatrixRange's and VectorRange's still have roles to play even in /sparse-land, because they serve as the interface types for right-hand-sides and solutions.
Appendix: A note about const-correctness
As mentioned earlier, a MatrixRange is a view into data "owned" by some other container (a slice/window of a Matrix for example, or possibly external data). MatrixRange is a "mutable" view which permits modification of the underlying data. For "immutable" (const) views, there is a complementary class called CMatrixRange. This design is similar to the containers from the C++ STL, which present both (mutable) iterator's and (immutable) const_iterator's. Typically both range types are passed around as const&; when reading signatures, just remember that the mutability of the underlying data is baked into the type itself (MatrixRange vs. CMatrixRange), not the const& qualifier. As an example, consider the signature of gemm_inplace(), which updates C+=A*B:
void gemm_inplace(
  const MatrixRange<float>& C,   // C's data is mutable
  const CMatrixRange<float>& A,  // A's data is const
  const CMatrixRange<float>& B); // B's data is const
Because C is modified, it must be a (mutable) MatrixRange. But since A and B are passed as (immutable) CMatrixRange's, the routine is not permitted to modify them. Note that a MatrixRange can always be implicitly converted into a CMatrixRange, because it's always safe to "add constness" to a variable. The reverse conversion, "casting away const" from CMatrixRange to MatrixRange, is stylistically discouraged but can be accomplished by explicitly calling the .remove_const() member function of CMatrixRange.