JetStream Matrix Structures

The matrix structures in JetStream can be difficult to figure out, hence the existence of this page. We will do our best to explain them.

//"We used to think that if we knew one, we knew two, because one and one are two. We are finding that we must learn a great deal more about 'and'."// -- Sir Arthur Eddington

=BCR and CSR= There are two formats in which the LHS is stored in JetStream: Block-Compressed Row (BCR) and Compressed Sparse Row (CSR). The matrix is filled out in BCR format for readability and converted to CSR format for efficiency in the ILU factorization and linear solver. Since everything other than the row and column scaling is performed in BCR format, this document mostly describes the BCR format of the matrix.

=Matrix Size and Structure= //“Too much of anything is bad, but too much good whiskey is barely enough.”// -- Mark Twain

There are currently two matrices used by JetStream: **Pmat** holds the preconditioner and **Jmat** holds the full Jacobian. Both are of type **DistMat** (see the preamble of pKISlib/pKISlib.f90 for the full data structure). Pmat is always allocated, but Jmat is allocated only if needed, typically for optimization, since we normally use Jacobian-free matrix-vector products. Pmat has a smaller stencil than Jmat, mainly due to the approximation of the fourth-difference dissipation by a lower-order localization. When referring to the stencil size locally, we typically use the local variable **stclsize**. The corresponding module variables for Pmat and Jmat are **paStencil** and **asStencil**, respectively. For second-order flow solves:
||~ stclsize ||~ Euler ||~ Laminar ||~ RANS ||
||= paStencil ||= 7 ||= 16 ||= 16 ||
||= asStencil ||= 13 ||= 52 ||= 52 ||

The "good stuff" (i.e. the actual matrix entries, not whiskey) in each matrix is stored in **mat%as**. This is a real dp array allocated as **mat%as(mat%bsize,mat%bsize,msize)**. Here, **mat%bsize** is the block size (often represented by **nVar** in the codes): 5 for Euler/laminar and 6 for RANS. Recall that there are nVar equations at each grid point, which are a function of the nVar variables at the current node plus the nVar variables at stclsize-1 neighbouring nodes. We additionally append **mat%bnbnd** equations for the internal interface nodes (see Additional Notes). In total, this gives **msize = stclsize*(mat%bnloc + mat%bnbnd)**, where **mat%bnloc** is the total number of grid points over all local blocks on the current process: **mat%bnloc = sum_{ib=1}^nBlk (blk(ib)%nNodes)**, where **blk(ib)%nNodes** is the number of nodes on block ib and **nBlk** is the number of blocks //per process//.
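As a quick sanity check, the sizing above can be sketched in Python (this is an illustration of the formulas in the text, not JetStream code; the node counts and bnbnd value below are made up):

```python
# Sketch of the mat%as sizing logic described above.
# Block node counts and bnbnd are hypothetical example values.

def msize(stclsize, block_node_counts, bnbnd):
    """Number of bsize-by-bsize blocks stored in mat%as."""
    bnloc = sum(block_node_counts)        # mat%bnloc: nodes over all local blocks
    return stclsize * (bnloc + bnbnd)     # msize = stclsize*(bnloc + bnbnd)

# Example: Euler preconditioner (paStencil = 7)
stclsize = 7
blocks = [1000, 1200]                     # blk(ib)%nNodes for two local blocks
bnbnd = 80                                # interface-node equation count (made up)
print(msize(stclsize, blocks, bnbnd))     # -> 15960
```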

=Organization of Internal Block-Elements= //"The secret of all victory lies in the organization of the non-obvious."// -- Marcus Aurelius

//"Organization charts and fancy titles count for next to nothing."// -- Colin Powell

Now that we have some idea of the shape of the matrix and what is stored in it, let's see how to access it. All local grid points on each block are assigned an index from 1 to bnloc, and this index is stored in **blk%indx**. If there are multiple local blocks, the indexing continues across all local blocks. Local ghost blocks also carry indexing in blk%indx, and this indexing also starts at 1. Using the short-hand **node = blk(ib)%indx(j,k,m)**, the matrix row of nVar-by-nVar blocks corresponding to internal grid point (j,k,m) on block ib begins at **ptr = (node - 1)*stclsize + 1** and terminates at **ptrb = node*stclsize**. This does not account for the SAT terms, however; they will be discussed later.

We now know which row corresponds to which grid point, but not how data is organized along each individual row. The basic structure is to start with the diagonal term, then proceed with the j-components in ascending order, then the k-components, and so on. For example, let **ptr = (node - 1)*stclsize + 1 + off** index the matrix row block-elements, and assume that stclsize=7, as with paStencil for the Euler equations. Then for //internal nodes//, the values of **off** are as follows:

||~  ||~ Diagonal ||~ j - 1 ||~ j + 1 ||~ k - 1 ||~ k + 1 ||~ m - 1 ||~ m + 1 ||
||= **off** ||= 0 ||= 1 ||= 2 ||= 3 ||= 4 ||= 5 ||= 6 ||

If we are including cross-derivatives and a wider stencil (for example, viscous Jmat), then this numbering is extended. The following masterpiece shows the numbering in the j-k plane:

              18
               |
          7 -- 4 -- 8
          |    |    |
    15 -- 1 -- 0 -- 2 -- 16
          |    |    |
          9 -- 3 -- 10
               |
              17

Often, we represent the direction of differentiation by the local variable **di**. In this case, we can easily get the offset **off = 2*di** for the high side and **off = 2*di - 1** for the low side. This appears quite often in the codes.
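The mapping from (node, di, side) to a block-element index for an internal node can be sketched as follows (a plain Python illustration of the formulas above, not JetStream code; the function name is made up):

```python
# Illustration of ptr = (node - 1)*stclsize + 1 + off for internal nodes,
# with off = 2*di for the high side and off = 2*di - 1 for the low side.
# (di = 1, 2, 3 for the j, k, m directions; off = 0 is the diagonal.)

def block_index(node, stclsize, di=None, high=None):
    """Index into mat%as(:,:,index) for an internal node's row."""
    if di is None:                          # diagonal block
        off = 0
    else:
        off = 2 * di if high else 2 * di - 1
    return (node - 1) * stclsize + 1 + off

stclsize = 7                                # paStencil for Euler
print(block_index(3, stclsize))                    # diagonal of node 3 -> 15
print(block_index(3, stclsize, di=1, high=False))  # j-1 neighbour -> 16
print(block_index(3, stclsize, di=2, high=True))   # k+1 neighbour -> 19
```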

If the equation to which the block-element row of the matrix corresponds is at a block interface, the above indexing is modified. Since the SAT terms are stored at the end of the matrix (possibly for easy access for the Schur complement method), the matrix index corresponding to non-existent nodes across the interface is removed and subsequent indices are reduced to account for it. To continue our stclsize=7 example, suppose we are at an interface where the j+1 node is not present but all other nodes are. Then **off=1** would still correspond to j-1, but **off=2** would now correspond to k-1, **off=3** to k+1, and so on. To indicate this, we use the variable **blk%stcl(j,k,m)**, which stores a binary representation of the stencil as a 32-bit integer. For our example, the integer would be the binary number **blk%stcl(j,k,m) = 101111**. To use this value we use the function **nbrpos(a,b)**, located in common/Common_Mod.f90. This function returns the number of non-zero binary digits in **a** up to the position defined by **b**. So, to finalize our example, if we want the high-side offset for the position corresponding to direction **di**, we would use **off = nbrpos(blk%stcl(j,k,m),di*2)**.
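A plausible Python equivalent of nbrpos, sketched from the description above (this is an assumption, not a transcription of the Fortran; bit positions are assumed to count from 1 at the least-significant bit):

```python
# Count the nonzero binary digits of a up to (and including) bit position b,
# counting positions from 1 at the least-significant bit.  This mirrors the
# described behaviour of nbrpos(a,b) in common/Common_Mod.f90 (assumption).

def nbrpos(a, b):
    mask = (1 << b) - 1                # keep only the lowest b bits
    return bin(a & mask).count("1")

stcl = 0b101111                        # example stencil: one neighbour absent
print(nbrpos(stcl, 2))                 # bits 1..2 of 101111 -> 2
print(nbrpos(stcl, 6))                 # all six bits -> 5 (one node absent)
```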

=Organization of SAT Terms= Finally, we come to the SAT terms. We still use **node = blk%indx(j,k,m)** but we now also need to apply an offset.

For an **internal interface SAT**, there are two terms involved: one corresponding to the local interface node and one corresponding to the other block on the same process with which it shares the interface. For the local interface node, we use the usual **ptr = (node - 1)*stclsize + 1** for the pointer, placing this term in the correct place in the matrix. The part of the SAT corresponding to the ghost block, however, requires an additional offset. This offset is determined using information from the Interface data type: **off = iface%off2(jit1+1,jit2+1) + nbrpos(blk2%stcl(j2,k2,m2),stclsize)**. In this expression, **jit1** and **jit2** are two of the (j,k,m) indices. Only two of (j,k,m) are needed since the interface is two-dimensional, so the third index is fixed. blk2 and (j2,k2,m2) are the ghost block and its corresponding (j,k,m) indices. To see how j, k, and m relate to jit1 and jit2, take a look at fillExternalSATLHS. See the Additional Notes section for more info on **iface%off1** and **iface%off2**.

When filling out an internal interface SAT, we also need to fill out the SAT terms for the corresponding equation on the second block. In this case, the second block is treated as the main block and the first block becomes the ghost block. The expression is the same, but the indices are reversed: **node = blk2%indx(j2,k2,m2)**, **ptr = (node - 1)*stclsize + 1**, **off = iface%off1(jit1+1,jit2+1) + nbrpos(blk%stcl(j,k,m),stclsize)**, where **ptr** gives the location of the local SAT and **ptr + off** gives the location of the ghost block SAT.

The portions of the **external interface SATs** corresponding to the ghost blocks are not stored with their equations but are placed at the bottom of the matrix. The component of the SAT term local to the process is again stored using the familiar **ptr = (node - 1)*stclsize + 1**. The portion of the SAT corresponding to the ghost block again requires an offset, which in this instance is **off = mat%bnbnd + iface%off1(jit1+1,jit2+1)**. As noted in the Additional Notes, for Jmat we need to use **iface%offV1** in place of **iface%off1**.

=CSR Format= //"CSR format is confusing as f* * *"// --David Brown

After the matrix is filled out, it is converted to CSR format. In CSR format, the matrix is treated as a general matrix with bsize-by-bsize block elements and we are no longer interested in any relationship to the grid nodes. This is the format used for the ILU decomposition and the linear solvers, which don't care about the CFD problem. The indexing in CSR format differs from the usual (i,j) coordinate system used for matrices. The two additional arrays used to index the CSR matrix are also stored in the DistMat derived type: **mat%ia** and **mat%ja**.

The basic idea of CSR format is that the first array **mat%ia** stores, for each row, the position in **mat%ja** at which that row begins (so differencing successive entries gives the number of nonzero elements per row), and the second array **mat%ja** stores the actual column index of each entry. You can google CSR format to get some examples. However, there are some notable differences between the way we store our matrices and how the internet thinks they should be stored. Mainly:
 * 1) Our mat%ia array starts at 1 instead of 0 (Fortran-style one-based indexing).
 * 2) Instead of pointing to individual elements in the matrix, our indices point to the bsize-by-bsize block elements.
Note that saying the indices point to block locations is not the same as saying the blocks are treated as elements: the mat%ja array points to the first element of each block, but it does so by referring to its value in the actual (theoretical) matrix.

Here is an example of some output from an actual 3D Euler test case. Obviously the actual arrays are huge, but here are the first few elements:
 * **mat%ia** = [1, 5, 10, 15, 20, ... ]
 * **mat%ja** = [1, 16, 11, 6, 6, 31, 26, 21, 1, 11, ... ]
From **mat%ia**, the first row has 4 blocks (corresponding to a corner on the boundary), the second row has 5 blocks (an edge), the third has 5, and since we are evidently moving along an edge, many more rows will have 5 after that. Once we finish with the edge, we will start to fill out the rest of the surface and the rows will begin to have 6 blocks per row, and when we get into the interior it will be 7. From **mat%ja**, the blocks begin at the indices given above, and we know from mat%ia that there are 4 nonzero blocks in the first row, so the actual columns filled in the first row of the matrix are 1-5, 16-20, 11-15, and 6-10. These blocks correspond to **mat%as(:,:,1:4)**. In the second row of blocks (which would be the sixth row in the actual matrix), the columns would be 6-10, 31-35, 26-30, 21-25, and 1-5. Note also that the indexing is not global; it starts at 1 again on each process.
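The bookkeeping in that example can be reproduced with a few lines of Python (a sketch using the arrays printed above; the 1-based values are kept as in the Fortran, and the helper function is made up):

```python
# Decode which column ranges are filled in a given block-row, using the
# 1-based block-CSR arrays from the example above.

ia = [1, 5, 10, 15, 20]            # row pointers (block units, 1-based)
ja = [1, 16, 11, 6, 6, 31, 26, 21, 1, 11]
bsize = 5                          # Euler/laminar block size

def row_column_ranges(row):
    """Column ranges (inclusive) covered by the blocks of block-row `row`."""
    start, end = ia[row - 1], ia[row]            # blocks start..end-1 (1-based)
    return [(c, c + bsize - 1) for c in ja[start - 1:end - 1]]

print(row_column_ranges(1))   # [(1, 5), (16, 20), (11, 15), (6, 10)]
print(row_column_ranges(2))   # [(6, 10), (31, 35), (26, 30), (21, 25), (1, 5)]
```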

If you've gotten this far, you probably need some **Advil**, so take a moment to go and get some. The next thing to consider is the interfaces and SAT terms. As mentioned before, the terms corresponding to the external interface nodes are stored after the terms corresponding to the internal nodes, and the SAT terms are stored last. If you print out the full mat%ja array, you will see a lot of zeros stored near the end of the array. At first there will be only a few zeros (every seventh term for the Euler equations), where the interface equations are, and then many more (6/7 of the entries will be 0 for the Euler equations) once we presumably get to the SAT terms. However, when using CSR format, every element of mat%ja is accessed, and indexing an array with 0 would give a segmentation fault. So why are there zeros in mat%ja? (I hope you took that Advil...)

The answer is that these values stored at the end of mat%ja are not actually used by the solver. If you extract only the nonzero elements of the portion of mat%ja corresponding to the external interface equations and SAT terms, you will notice that this exact pattern appears earlier in the array. Most likely mat%ja was initialized with zeros and the zeros were then parsed out, writing the smaller array on top of the larger one; the elements at the end of the array were never removed, they are simply ignored.

With regard to the actual indexing of the SAT terms, the mat%ia follows consecutively from the previous terms as if the SAT terms were additional equations on the matrix (which in reality they are not). The mat%ja array on the other hand restarts from 1 and treats the SAT array like an additional block. This simplifies the indexing of external vectors a bit, as indicated below.

One of the advantages of CSR format is that it is easy to write matrix-vector products, because the relationship between the matrix and vector is easily established: to relate an element of the matrix to the appropriate vector element, all we need is the column index. For example, **mat%as(:,:,k)** would be multiplied by **x(mat%ja(k):mat%ja(k)+bsize-1)**. This indexing can also be used for external matvecs, i.e. the part of the matvec corresponding to the SAT term. See the parallelMatVec routine in pKISlib for an example.
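A minimal sketch of that block matvec in Python (dense blocks stored in a list and 1-based indices as above; this illustrates the indexing only and is not the parallelMatVec routine):

```python
import numpy as np

# Block-CSR matvec: y = A*x, where block k of A multiplies
# x(ja(k) : ja(k)+bsize-1) in 1-based terms, as described above.

def block_csr_matvec(ia, ja, blocks, x, bsize):
    nrows = len(ia) - 1
    y = np.zeros(nrows * bsize)
    for row in range(1, nrows + 1):
        for k in range(ia[row - 1], ia[row]):    # 1-based block indices
            col = ja[k - 1]                      # first column of block k
            y[(row - 1)*bsize : row*bsize] += (
                blocks[k - 1] @ x[col - 1 : col - 1 + bsize])
    return y

# Tiny 2x2-block example with bsize = 2
ia = [1, 2, 4]                        # row 1: one block, row 2: two blocks
ja = [1, 3, 1]                        # column starts of each block
blocks = [np.eye(2), 2*np.eye(2), np.eye(2)]
x = np.array([1.0, 2.0, 3.0, 4.0])
print(block_csr_matvec(ia, ja, blocks, x, 2))   # [ 1.  2.  7. 10.]
```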

=Additional Notes=
 * **icol**: Supposedly, this is a column indexing array. Currently it is an array containing its own indices, equivalent to the identity transformation and thus pointless. Jason Hicken originally experimented with different matrix structures, and this is most likely a remnant from that. It appears often in the LHS routines; you can safely ignore it.
 * **mat%bnbnd**: The sum of all interface nodes internal to the matrix (read that carefully). Internal interfaces add 2 to the count per interface node, external interfaces add 1 per interface node, and boundaries (wall, freestream, and symmetry planes) are not counted. "Corner" nodes are not double-counted.
 * **iface%off1** and **iface%off2**: When calculating the full Jacobian (Jmat), the SAT stencil can be larger, and we need to use the offset **iface%offV1** in place of **iface%off1**, and similarly for **iface%off2**. In the current version of the code, however, the values of these terms are the same, and this is unlikely to change in the future. It is also worth mentioning that internal interface SATs are not used in the Schur complement method, so ptr+off still corresponds to a block-element internal to the matrix, i.e. within the first stclsize*bnloc block-elements.

=That's All= That's about it. Feel free to add to this page anything that you think will be useful.