The matrix structures in JetStream can be difficult to figure out, hence the existence of this page. We will do our best to explain them.
"We used to think that if we knew one, we knew two, because one and one are two. We are finding that we must learn a great deal more about 'and'."
-- Sir Arthur Eddington
BCR and CSR
There are two formats in which the LHS is stored in JetStream: Block-Compressed Row (BCR) and Compressed Sparse Row (CSR). The matrix is filled out in BCR format for readability and converted to CSR format for efficiency in the ILU factorization and linear solver. Since everything except the row and column scaling is performed in BCR format, most of this document applies to the BCR format of the matrix.
Matrix Size and Structure
“Too much of anything is bad, but too much good whiskey is barely enough.”
-- Mark Twain
There are currently two matrices used by JetStream: Pmat holds the preconditioner and Jmat holds the full Jacobian. They are both of type DistMat (see the preamble of pKISlib/pKISlib.f90 for the full data structure). Pmat is always allocated, but Jmat is only allocated if needed, typically only for optimization, since we normally use Jacobian-free matrix-vector products. Pmat contains a smaller stencil than Jmat, mainly due to the approximation of the fourth-difference dissipation by a lower-order localization. Locally, the stencil size is usually held in a variable called stclsize; the corresponding module variables for Pmat and Jmat are paStencil and asStencil, respectively. For second-order flow solves:
stclsize     Euler   Laminar   RANS
paStencil      7       16       16
asStencil     13       52       52
The "good stuff" (i.e. the actual matrix entries, not whiskey) in each matrix is stored in mat%as(:,:,:). This is a real dp array allocated as mat%as(mat%bsize,mat%bsize,msize). Here, mat%bsize is the block size (often represented by nVar in the codes): 5 for Euler/laminar and 6 for RANS. Recall that there are nVar equations at each grid point, each a function of the nVar variables at the current node plus the nVar variables at each of the stclsize-1 neighbouring nodes. We additionally append mat%bnbnd equations for the internal interface nodes (see Additional Notes). In total, this gives msize = stclsize*(mat%bnloc + mat%bnbnd), where mat%bnloc is the total number of grid points over all local blocks on the current process: mat%bnloc = sum_{ib=1}^nBlk blk(ib)%nNodes, where blk(ib)%nNodes is the number of nodes on block ib and nBlk is the number of blocks on the process.
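As a sanity check, the allocation arithmetic above can be sketched as follows. This is not JetStream code (Python is used purely for illustration), and the node and interface counts are made-up numbers:

```python
# Illustrative sketch of the BCR allocation arithmetic (not JetStream code).
# Numbers for a second-order Euler preconditioner (Pmat): bsize = 5, stclsize = 7.
bsize = 5          # nVar: 5 for Euler/laminar, 6 for RANS
stclsize = 7       # paStencil for Euler

# Hypothetical per-block node counts on this process (blk(ib)%nNodes)
nNodes = [1000, 800]
bnloc = sum(nNodes)    # total local grid points (mat%bnloc)
bnbnd = 120            # made-up count of interface nodes (mat%bnbnd)

msize = stclsize * (bnloc + bnbnd)
print(msize)           # third dimension of mat%as(bsize, bsize, msize) -> 13440
```

The resulting array holds msize blocks of bsize-by-bsize entries each.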
Organization of Internal Block-Elements
"The secret of all victory lies in the organization of the non-obvious."
-- Marcus Aurelius
"Organization charts and fancy titles count for next to nothing."
-- Colin Powell
Now that we have some idea of the shape of the matrix and what is stored in it, let's see how to access it. All local grid points on each block are assigned an index from 1 to bnloc, and this index is stored in blk%indx. If there are multiple local blocks, the indexing continues across all local blocks. Local ghost blocks also contain indexing in blk%indx, and this indexing also starts at 1. Using the short-hand node = blk(ib)%indx(j,k,m), the pointer to the matrix row of nVar by nVar blocks corresponding to internal grid point (j,k,m) on block ib begins at ptr = (node - 1)*stclsize + 1 and terminates at ptrb = node*stclsize. This does not account for the SAT terms, however. They will be discussed later.
We now know which row corresponds to which grid point, but not how data is organized along each individual row. The basic structure is to start with the diagonal term, then proceed with the j-components in ascending order, then the k components, etc. For example, let ptr = (node - 1)*stclsize + 1 + off index the matrix row block-elements. Assume, for this example, that stclsize=7, as with paStencil for the Euler equations. Then for internal nodes, the values of off are as follows:
                18
                 |
           7 --- 4 --- 8
           |     |     |
    15 --- 1 --- 0 --- 2 --- 16
           |     |     |
           9 --- 3 --- 10
                 |
                17
Often, we represent the direction of differentiation by the local variable di. In this case, we can easily get the offset off = 2*di for the high side and off = 2*di - 1 for the low side. This appears quite often in the codes.
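Putting the row pointer and the directional offsets together, the block-element lookup for an internal node can be sketched as below. Python is used purely for illustration; the function names are made up, but the arithmetic mirrors the Fortran expressions above:

```python
# Sketch of BCR row/offset indexing for an internal node (illustration only).
stclsize = 7   # paStencil for the Euler equations

def row_range(node):
    """First and last block-element indices of the matrix row for grid point `node`."""
    ptr = (node - 1) * stclsize + 1
    ptrb = node * stclsize
    return ptr, ptrb

def internal_offset(di, high):
    """Offset within the row: di = 1, 2, 3 for the j, k, m directions."""
    return 2 * di if high else 2 * di - 1

ptr, ptrb = row_range(3)             # row for the grid point with blk%indx == 3
print(ptr, ptrb)                     # 15 21 -> blocks mat%as(:,:,15:21)
off = internal_offset(2, high=True)  # k+1 neighbour
print(ptr + off)                     # 19 -> block coupling this equation to k+1
```

The diagonal block always sits at ptr itself (off = 0).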
If the grid point to which a block-element row corresponds lies at a block interface, the above indexing is modified. Since the SAT terms are stored at the end of the matrix (possibly for easy access by the Schur complement method), the matrix index corresponding to the non-existent node across the interface is removed and the later indices are reduced to account for it. To continue our stclsize=7 example, suppose we are at an interface where the j+1 node is not present but all other nodes are. Then off=1 would still correspond to j-1, but off=2 would now correspond to k-1, off=3 to k+1, etc. To indicate this, we use the variable blk%stcl(j,k,m), which stores a binary representation of the stencil in a 32-bit integer. For our example, the integer would be the binary number blk%stcl(j,k,m) = 101111. To use this value we call the function nbrpos(a,b), located in common/Common_Mod.f90, which returns the number of nonzero bits in a up to the position given by b. So, to finish the example, the high-side offset for the position corresponding to direction di is off = nbrpos(blk%stcl(j,k,m),di*2).
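The behaviour of nbrpos can be sketched as a masked popcount. Note the bit layout here is an assumption: we take bit i (counted from the least significant bit) to mark whether stencil slot i is present, so the text's pattern 101111 (written with position 1 leftmost) becomes the integer 0b111101. The actual Fortran ordering may differ:

```python
# Sketch of nbrpos for interface rows (illustration only; bit ordering assumed:
# bit i, counted from the least significant bit, marks whether slot i is present).
def nbrpos(a, b):
    """Number of set bits in `a` at positions 1..b."""
    return bin(a & ((1 << b) - 1)).count("1")

# Example from the text: j+1 (slot 2) absent, all other slots present.
stcl = 0b111101   # slots 6..1 = 1,1,1,1,0,1

print(nbrpos(stcl, 2 * 2))      # high side, di = 2 (k+1): off = 3
print(nbrpos(stcl, 2 * 2 - 1))  # low side,  di = 2 (k-1): off = 2
```

This reproduces the compressed offsets in the example above: with j+1 removed, k-1 lands at off=2 and k+1 at off=3.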
Organization of SAT Terms
Finally, we come to the SAT terms. We still use node = blk%indx(j,k,m) but we now also need to apply an offset.
For an internal interface SAT, there are two terms involved: one corresponding to the local interface node and one corresponding to the other block on the same process with which it shares the interface. For the local interface node, we use the usual ptr = (node - 1)*stclsize + 1 for the pointer, placing this in the correct place in the matrix. The part of the SAT corresponding to the ghost block, however, requires an additional offset, determined using information from the Interface data type: off = iface%off2(jit1+1,jit2+1) + nbrpos(blk2%stcl(j2,k2,m2),stclsize). In this expression, jit1 and jit2 are two of the (j,k,m) indices; only two are needed since the interface is two-dimensional, so the third index is fixed. blk2 and (j2,k2,m2) are the ghost block and its corresponding (j,k,m) indices. To see how j, k, and m relate to jit1 and jit2, take a look at fillExternalSATLHS or Diablo_developers_guide.pdf. See the Additional Notes section for more on iface%off1 and iface%off2.
When filling out an internal interface SAT, we also need to fill out the SAT terms for the corresponding equation on the second block. In this case, the second block is treated as the main block and the first block becomes the ghost block. The expression is the same, but the indices are reversed: node = blk2(j2,k2,m2), ptr = (node - 1)*stclsize + 1, off = iface%off1(jit1+1,jit2+1) + nbrpos(blk%stcl(j,k,m),stclsize), where ptr gives the location of the local SAT and ptr + off gives the location of the ghost block SAT.
The portions of the external interface SATs corresponding to the ghost blocks are not stored alongside their equations but are placed at the bottom of the matrix. The component of the SAT term local to the process is again stored using the familiar ptr = (node - 1)*stclsize + 1. The portion of the SAT corresponding to the ghost block again requires an offset, in this instance off = mat%bnbnd + iface%off1(jit1+1,jit2+1). For Jmat, we need to use iface%offV1 in place of iface%off1 (see Additional Notes).
CSR Format
"CSR format is confusing as f* * *"
--David Brown
After the matrix is filled out, it is converted to CSR format. In CSR format, the matrix is treated as a general matrix with bsize by bsize block elements, and we are no longer interested in any relationship to the grid nodes. This is the format used for the ILU decomposition and the linear solvers, which don't care about the CFD problem. The indexing in CSR format differs from the usual (i,j) coordinate system used for matrices: two additional arrays used to index the matrix, mat%ia(:) and mat%ja(:), are also stored in the DistMat derived type.
The basic idea of CSR format is that the first array, mat%ia, stores a pointer to the start of each row (so mat%ia(i+1) - mat%ia(i) gives the number of nonzero entries in row i), while the second array, mat%ja, stores the column index of each entry. You can google CSR format to get some examples. However, there are some notable differences between the way we store our matrices and the way the internet thinks they should be stored. Mainly:
- Our mat%ia array is 1-based (the usual Fortran convention) rather than the 0-based indexing shown in many references.
- Instead of pointing to individual elements in the matrix, our indices point to the bsize by bsize block elements.
Saying that the indices point to block locations is not the same as saying that the blocks are treated as elements. The mat%ja array points to the first element of each block, but it does so by referring to its column in the actual (theoretical) matrix.
Here is an example of some output from an actual 3D Euler test case. Obviously the actual arrays are huge, but here are the first few elements:
mat%ia = [1, 5, 10, 15, 20, ... ]
mat%ja = [1, 16, 11, 6, 6, 31, 26, 21, 1, 11, ... ]
From mat%ia, the first row has 4 blocks (corresponding to a corner on the boundary), the second row has 5 blocks (an edge), the third has 5, and since we are moving along an edge many more will have 5 after that. Once we finish with the edge, we start to fill out the rest of the surface and the rows begin to have 6 blocks per row; once we get into the interior it will be 7. From mat%ja, the blocks begin at the indices given above, and we know from mat%ia that there are 4 nonzero blocks in the first row, so the actual columns filled in the first row of the matrix are 1-5, 16-20, 11-15, and 6-10. These blocks correspond to mat%as(:,:,1:4). In the second row of blocks (which would be the sixth row of the actual matrix), the columns would be 6-10, 31-35, 26-30, 21-25, and 1-5. Note also that the indexing is not global; it starts at 1 again on each process.
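The decoding just described can be sketched using the sample values above. Python is used purely for illustration (remember the arrays are 1-based in the Fortran, so we subtract 1 when slicing):

```python
# Sketch: decoding the block-CSR index arrays from the 3D Euler example.
bsize = 5
ia = [1, 5, 10, 15, 20]                  # first few entries of mat%ia
ja = [1, 16, 11, 6, 6, 31, 26, 21, 1]    # first few entries of mat%ja

# Number of nonzero blocks in each row: consecutive differences of ia.
nblocks = [ia[i + 1] - ia[i] for i in range(len(ia) - 1)]
print(nblocks)        # [4, 5, 5, 5]

# Column ranges touched by the first row's blocks.
row1 = ja[ia[0] - 1 : ia[1] - 1]         # ja entries for row 1
ranges = [(c, c + bsize - 1) for c in row1]
print(ranges)         # [(1, 5), (16, 20), (11, 15), (6, 10)]
```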
If you've gotten this far, you probably need some Advil, so take a moment to go and get some. The next thing we will consider is the interfaces and SAT terms. As mentioned before, the terms corresponding to the external interface nodes are stored after the terms corresponding to the internal nodes, and the SAT terms are stored last. If you print out the full mat%ja array, you will see that a lot of zeros are stored near the end of the array. At first there are only a few zeros (every seventh term for the Euler equations), where the interface equations are, and then many more (6/7 of the entries are 0 for the Euler equations) once we presumably reach the SAT terms. However, when using CSR format, every element of mat%ja is accessed, and using an index of 0 in an array would give a segmentation fault. So why are there zeros in mat%ja? (I hope you took that Advil...)
The answer is that these values stored at the end of mat%ja are not actually used by the solver. If you extract only the nonzero elements of the portion of mat%ja corresponding to the external interface equations and SAT terms, you will notice that this exact pattern appears earlier in the matrix. Most likely the mat%ja array was initialized with zeros and then the zeros were parsed out, writing the smaller array on top of the larger one; the elements at the end of the array were never removed, they are simply ignored.
With regard to the actual indexing of the SAT terms, the mat%ia follows consecutively from the previous terms as if the SAT terms were additional equations on the matrix (which in reality they are not). The mat%ja array on the other hand restarts from 1 and treats the SAT array like an additional block. This simplifies the indexing of external vectors a bit, as indicated below.
One of the advantages of CSR format is that it is easy to write matrix-vector products because the relationship between the matrix and vector is easily established - to relate the element of a matrix to the appropriate vector element, all we need is the column index. For example, mat%as(:,:,k) would be multiplied by x(mat%ja(k):mat%ja(k)+bsize-1) .
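The matvec pattern above can be sketched as follows. This is an illustration in the spirit of mat%as/mat%ia/mat%ja, not the actual parallelMatVec routine (and Python slices are 0-based, unlike the 1-based Fortran arrays, so we subtract 1 from the stored indices):

```python
import numpy as np

# Sketch of a block-CSR matvec (illustration only, not JetStream code).
def block_csr_matvec(blocks, ia, ja, x, bsize):
    """y = A @ x for a matrix stored as bsize-by-bsize blocks.

    blocks[k] is the k-th block; ja[k] is the column (1-based, in the full
    matrix) of that block's first element; ia[i] points at the first block
    of row i (1-based)."""
    nrows = len(ia) - 1
    y = np.zeros(nrows * bsize)
    for i in range(nrows):
        for k in range(ia[i] - 1, ia[i + 1] - 1):   # blocks of row i
            col = ja[k] - 1                          # 0-based first column
            y[i * bsize:(i + 1) * bsize] += blocks[k] @ x[col:col + bsize]
    return y

# Tiny 2x2-block example: two block rows, three nonzero blocks.
blocks = [np.eye(2), 2 * np.eye(2), np.eye(2)]
ia = [1, 2, 4]           # row i owns blocks ia[i]..ia[i+1]-1 (1-based)
ja = [1, 1, 3]           # first matrix column of each block (1-based)
x = np.array([1.0, 2.0, 3.0, 4.0])
print(block_csr_matvec(blocks, ia, ja, x, 2))   # [1. 2. 5. 8.]
```

Note how the inner loop does exactly what the text describes: block k multiplies the slice of x starting at column mat%ja(k).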
This indexing can also be used for external matvecs, i.e. the part of the matvec corresponding to the SAT term. See the parallelMatVec routine in pKISlib for an example.
Additional Notes
icol: Supposedly, this is a column indexing array. Currently it is an array containing its own indices, equivalent to the identity transformation and thus pointless. I know that Jason Hicken originally experimented with different matrix structures, and this is most likely a remnant from that. It appears often in the LHS routines. You can really just ignore it.
mat%bnbnd: The sum of all interface nodes internal to the matrix (read that carefully). Internal interfaces add 2 to the count per interface node, external interfaces add 1 to the count per interface node, and boundary (wall, freestream, and symmetry planes) are not counted. "Corner" nodes are not double-counted.
iface%off1 and iface%off2: When calculating the full Jacobian (Jmat), the SAT stencil can be larger and we need to use the offset iface%offV1 in place of iface%off1, and similarly for iface%off2. In the current version of the code, however, the values of these terms are the same, and this is unlikely to change in the future. It is also worth mentioning that internal interface SATs are not used in the Schur complement method, so ptr+off still corresponds to a block-element internal to the matrix, i.e. within the first stclsize*bnloc block-elements.
That's All
That's about it. Feel free to add to this page anything that you think will be useful.