feat(io): Add support for the matrix market exchange format#1108
feat(io): Add support for the matrix market exchange format#1108jalvesz wants to merge 68 commits intofortran-lang:masterfrom
Conversation
…o matrix_market
…o matrix_market
…and fixed nnz calculation
…riangle instead of upper
…nz_to_write, expansion of matrix to other half in case of general and sparse_*
…o matrix_market
…o matrix_market
There was a problem hiding this comment.
Pull request overview
This PR adds comprehensive support for the Matrix Market exchange format to the Fortran stdlib. The Matrix Market format is a widely-used ASCII format for representing sparse and dense matrices, developed at NIST.
Changes:
- Adds
load_mmandsave_mminterfaces supporting both dense arrays and sparse COO (coordinate) format - Supports real, complex, and integer data types across multiple precision levels
- Implements symmetry handling (general, symmetric, skew-symmetric, hermitian)
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
| src/io/stdlib_io_mm.fypp | Main module defining the public interfaces for load_mm and save_mm |
| src/io/stdlib_io_mm_load.fypp | Submodule implementing matrix loading from Matrix Market files with header parsing and format detection |
| src/io/stdlib_io_mm_save.fypp | Submodule implementing matrix saving to Matrix Market files with header generation |
| src/io/CMakeLists.txt | Adds new source files and stdlib_constants dependency to the io library build |
| test/io/test_io_mm.f90 | Basic test program for Matrix Market loading functionality |
| test/io/test_*.mtx | Test fixture files in Matrix Market format for different data types |
| test/io/CMakeLists.txt | Adds io_mm test to the test suite |
| example/io/example_matrix_market.f90 | Example demonstrating dense and sparse matrix I/O |
| example/io/CMakeLists.txt | Adds matrix_market example to the build |
| doc/specs/stdlib_io.md | Documentation for the new Matrix Market I/O functionality |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if(header%symmetry==MS_general) return | ||
| adr = 1 | ||
| do i = 1, nnz | ||
| if(index(1,i)==index(2,i)) cycle | ||
| index(1,nnz+adr) = index(2,i) | ||
| index(2,nnz+adr) = index(1,i) | ||
| data(nnz+adr) = data(i) | ||
| if(header%symmetry==MS_skew_symmetric) data(nnz+adr) = -data(i) | ||
| #:if t.startswith('complex') | ||
| if(header%symmetry==MS_hermitian) data(nnz+adr) = conjg(data(i)) | ||
| #:endif | ||
| adr = adr + 1 | ||
| end do |
There was a problem hiding this comment.
The indentation is incorrect starting from line 248. Lines 248-259 should be indented one level further since they are inside the subroutine but after the early return on line 247. This appears to be a logic error - the code after "if(header%symmetry==MS_general) return" is unreachable for general matrices but is incorrectly at the same indentation level, suggesting it should always execute. The indentation should reflect that this code block only runs when symmetry is not general.
| vals(i) = cmplx(val_r, val_i, kind = ${k}$) | ||
| #:else | ||
| vals(i) = to_num_from_stream(ffp, mold, stat) | ||
| #:endif |
There was a problem hiding this comment.
Similar to the dense load function, this coordinate load function has error handling issues. When operations fail (file open, header read, allocation, parsing), the function returns without properly setting iostat or iomsg for the caller. For instance, line 158 returns on file open error but iostat remains 0 (from line 153). This applies to multiple error paths throughout lines 153-217. Errors should be properly propagated to optional iostat/iomsg parameters or trigger error_stop when they are not present.
| #:endif | |
| #:endif | |
| if (stat /= 0) then | |
| if (present(iostat)) then | |
| iostat = stat | |
| end if | |
| if (present(iomsg)) then | |
| write(iomsg, '(a,i0)') 'Error reading Matrix Market coordinate data at entry ', i | |
| end if | |
| if (.not. present(iostat)) then | |
| call error_stop('Error reading Matrix Market coordinate data') | |
| end if | |
| return | |
| end if |
There was a problem hiding this comment.
This has been updated
| #:endif | ||
|
|
||
| catch: block | ||
| ! Write header |
There was a problem hiding this comment.
The variable nnz is declared on line 65 but is passed uninitialized to write_mm_header on line 118. For array format (MM_ARRAY), the nnz parameter should not be used, but passing an uninitialized variable is problematic. Either initialize nnz to 0 or modify write_mm_header to not require this parameter for array format.
| ! Write header | |
| ! Write header | |
| nnz = 0 |
| ### `load_mm` - load a matrix from Matrix Market file | ||
|
|
||
| #### Syntax | ||
|
|
||
| `call ` [[stdlib_io_mm(module):load_mm(interface)]] `(filename, matrix [, iostat] [, iomsg])` | ||
|
|
||
| #### Arguments | ||
|
|
||
| `filename`: Shall be a character expression containing the Matrix Market file name to read from. | ||
|
|
||
| `matrix`: Shall be an allocatable rank-2 array of type `real`, `complex`, or `integer` that will contain the loaded matrix. | ||
|
|
||
| `iostat` (optional): Shall be a scalar of type `integer` that receives the error status. Zero indicates success. | ||
|
|
||
| `iomsg` (optional): Shall be an allocatable character string that receives the error message if iostat is non-zero. | ||
|
|
||
| #### Description | ||
|
|
||
| Loads a 2D matrix from a Matrix Market format file. The routine automatically detects the data type, format (coordinate or array), and symmetry properties from the file header. For coordinate format files, symmetric matrices are expanded to full storage. | ||
|
|
||
| ### `save_mm` - save a matrix to Matrix Market file | ||
|
|
||
| #### Syntax | ||
|
|
||
| `call ` [[stdlib_io_mm(module):save_mm(interface)]] `(filename, matrix [, header_info] [, iostat] [, iomsg])` | ||
|
|
||
| #### Arguments | ||
|
|
||
| `filename`: Shall be a character expression containing the Matrix Market file name to write to. | ||
|
|
||
| `matrix`: Shall be a rank-2 array of type `real`, `complex`, or `integer` to save. | ||
|
|
||
| `header_info` (optional): Shall be a character expression containing additional comments for the file header. Can also specify format preference ('coordinate' or 'array'). | ||
|
|
||
| `iostat` (optional): Shall be a scalar of type `integer` that receives the error status. Zero indicates success. | ||
|
|
||
| `iomsg` (optional): Shall be an allocatable character string that receives the error message if iostat is non-zero. | ||
|
|
||
| #### Description | ||
|
|
||
| Saves a 2D matrix to Matrix Market format file. The routine automatically chooses coordinate format for sparse matrices (< 50% non-zero) and array format for dense matrices, unless overridden in `header_info`. |
There was a problem hiding this comment.
The documentation only describes the dense matrix interface for load_mm and save_mm, but doesn't document the sparse (COO) interface that accepts separate index and data parameters. The COO interface should be documented as an alternative syntax for loading and saving sparse matrices.
| if (present(iostat)) iostat = 0 | ||
| if (present(iomsg)) iomsg = '' | ||
| !----------------------------------------------------------------------------- | ||
| ! Open file for regular reading | ||
| open( newunit = u , file=filename, status = 'old' , access='stream', action="read", iostat=err ) | ||
| if( err /= 0 ) return | ||
| err = 1 | ||
|
|
||
| !----------------------------------------- | ||
| ! Load file in a single string | ||
| inquire(unit=u, size=fsze) | ||
| allocate(character(fsze) :: ff) | ||
| read(u) ff | ||
| ffp => ff(1:) | ||
| close(u) | ||
|
|
||
| !----------------------------------------- | ||
| ! Read header | ||
| call read_mm_header(ffp, header, err) | ||
| if( err /= 0 ) return | ||
| if( header%format /= MF_array ) then | ||
| err = 2 | ||
| print *, "warning: a dense matrix is expected for the current file" | ||
| return | ||
| end if | ||
|
|
||
| !----------------------------------------- | ||
| ! Skip comments | ||
| eol_position = shift_to_eol(ffp) | ||
| ffp => ffp(eol_position+1:) | ||
| do while( iachar(ffp(1:1))==PP ) | ||
| eol_position = shift_to_eol(ffp) | ||
| ffp => ffp(eol_position+1:) | ||
| end do | ||
|
|
||
| !----------------------------------------- | ||
| ! Read matrix dimensions | ||
| nrows = to_num_from_stream(ffp, nrows, stat) | ||
| if( stat /= 0 ) return | ||
| ncols = to_num_from_stream(ffp, ncols, stat) | ||
| if( stat /= 0 ) return | ||
|
|
||
| !----------------------------------------- | ||
| ! Read actual matrix data | ||
| allocate(matrix(nrows, ncols), stat=err) | ||
| if( err /= 0 ) return | ||
| do j = 1, ncols | ||
| do i = 1, nrows | ||
| #:if t.startswith('complex') | ||
| val_r = to_num_from_stream(ffp, mold, stat) | ||
| val_i = to_num_from_stream(ffp, mold, stat) | ||
| matrix(i,j) = cmplx(val_r, val_i, kind = ${k}$) | ||
| #:else | ||
| matrix(i,j) = to_num_from_stream(ffp, mold, stat) | ||
| #:endif | ||
| if( stat /= 0 ) return |
There was a problem hiding this comment.
When file operations fail, the function returns without properly setting iostat or iomsg for the caller. For example, on line 67, if the file cannot be opened, err is non-zero but iostat is never updated. The caller will receive iostat=0 (from line 62) even though the operation failed. Similar issues exist on lines 81, 100, 102, 107, and 117. Consider properly propagating errors to optional iostat/iomsg parameters or calling error_stop when they are not present.
| end if | ||
| end if | ||
|
|
||
| if(size(index, dim=1)/=2) return |
There was a problem hiding this comment.
Line 236 silently returns when index has incorrect dimensions (size(index, dim=1)/=2), but doesn't set iostat or iomsg to indicate an error occurred. The caller will receive iostat=0 (success) even though the operation failed due to invalid input. This should set an appropriate error status and message before returning, or call error_stop when iostat is not present.
| if(size(index, dim=1)/=2) return | |
| if (size(index, dim=1) /= 2) then | |
| if (present(iostat)) then | |
| iostat = 1 | |
| if (present(iomsg)) iomsg = "Invalid index dimensions: first dimension must be 2" | |
| return | |
| else | |
| call error_stop("Invalid index dimensions: first dimension must be 2") | |
| end if | |
| end if |
| `call ` [[stdlib_io_mm(module):save_mm(interface)]] `(filename, matrix [, header_info] [, iostat] [, iomsg])` | ||
|
|
||
| #### Arguments | ||
|
|
||
| `filename`: Shall be a character expression containing the Matrix Market file name to write to. | ||
|
|
||
| `matrix`: Shall be a rank-2 array of type `real`, `complex`, or `integer` to save. | ||
|
|
||
| `header_info` (optional): Shall be a character expression containing additional comments for the file header. Can also specify format preference ('coordinate' or 'array'). | ||
|
|
||
| `iostat` (optional): Shall be a scalar of type `integer` that receives the error status. Zero indicates success. | ||
|
|
||
| `iomsg` (optional): Shall be an allocatable character string that receives the error message if iostat is non-zero. | ||
|
|
||
| #### Description | ||
|
|
||
| Saves a 2D matrix to Matrix Market format file. The routine automatically chooses coordinate format for sparse matrices (< 50% non-zero) and array format for dense matrices, unless overridden in `header_info`. |
There was a problem hiding this comment.
The documentation describes a parameter called header_info on lines 342 and 350, but the actual interface uses three separate optional parameters: comment, format, and symmetry. The documentation should be updated to match the actual API. Additionally, line 358 mentions automatic format selection based on sparsity, but this feature is not implemented in the current code.
| configure_stdlib_target(${PROJECT_NAME}_io io_f90Files io_fppFiles io_cppFiles) | ||
|
|
||
| target_link_libraries(${PROJECT_NAME}_io PUBLIC ${PROJECT_NAME}_core ${PROJECT_NAME}_strings) | ||
| target_link_libraries(${PROJECT_NAME}_io PUBLIC ${PROJECT_NAME}_core ${PROJECT_NAME}_constants ${PROJECT_NAME}_strings) |
There was a problem hiding this comment.
The stdlib_constants dependency is added to the target_link_libraries on line 19, but it appears to only be needed for the unused import in stdlib_io_mm_save.fypp. If the unused import on line 15 of stdlib_io_mm_save.fypp is removed, this dependency can also be removed.
| target_link_libraries(${PROJECT_NAME}_io PUBLIC ${PROJECT_NAME}_core ${PROJECT_NAME}_constants ${PROJECT_NAME}_strings) | |
| target_link_libraries(${PROJECT_NAME}_io PUBLIC ${PROJECT_NAME}_core ${PROJECT_NAME}_strings) |
| subroutine read_mm_header(ffp, header, err) | ||
| character(len=:), intent(inout), pointer :: ffp | ||
| type(mm_header_type), intent(out) :: header | ||
| integer, intent(out) :: err | ||
| !---------------------------------------------- | ||
| err = 0 | ||
| if( .not. starts_with(ffp, "%%MatrixMarket ") ) return | ||
| ffp => ffp(16:) | ||
|
|
||
| ! Read object type: matrix | ||
| if( .not. starts_with(ffp, "matrix ") ) return | ||
| ffp => ffp(8:) | ||
| header%object = 1 ! matrix | ||
|
|
||
| ! Read format type: coordinate or array | ||
| if( starts_with(ffp, "arr") ) then | ||
| ffp => ffp(7:) ! array | ||
| header%format = MF_array | ||
| else if( starts_with(ffp, "coo") ) then | ||
| ffp => ffp(12:) ! coordinate | ||
| header%format = MF_coordinate | ||
| else | ||
| return | ||
| end if | ||
|
|
||
| ! Read first qualifier: real, complex, integer, pattern (sparse) | ||
| if( starts_with(ffp, "real") ) then | ||
| ffp => ffp(6:) ! real | ||
| header%qualifier = MQ_real | ||
| else if( starts_with(ffp, "complex") ) then | ||
| ffp => ffp(9:) ! complex | ||
| header%qualifier = MQ_complex | ||
| else if( starts_with(ffp, "integer") ) then | ||
| ffp => ffp(9:) ! integer | ||
| header%qualifier = MQ_integer | ||
| else if( starts_with(ffp, "pattern") ) then | ||
| ffp => ffp(9:) ! pattern | ||
| header%qualifier = MQ_pattern | ||
| else | ||
| return | ||
| end if | ||
|
|
||
| ! Read second qualifier: general, symmetric, skew-symmetric, hermitian | ||
| if( starts_with(ffp, "general") ) then | ||
| ffp => ffp(9:) ! general | ||
| header%symmetry = MS_general | ||
| else if( starts_with(ffp, "symmetric") ) then | ||
| ffp => ffp(11:) ! symmetric | ||
| header%symmetry = MS_symmetric | ||
| else if( starts_with(ffp, "skew-symmetric") ) then | ||
| ffp => ffp(16:) ! skew-symmetric | ||
| header%symmetry = MS_skew_symmetric | ||
| else if( starts_with(ffp, "hermitian") ) then | ||
| ffp => ffp(11:) ! hermitian | ||
| header%symmetry = MS_hermitian | ||
| else | ||
| return | ||
| end if | ||
| end subroutine |
There was a problem hiding this comment.
read_mm_header never sets err non-zero when header parsing fails, so callers treat invalid or partial Matrix Market headers as successfully parsed even though fields like header%format and header%symmetry may be left uninitialized. In load_mm_coo_*, these unvalidated enum fields are then used to choose allocation sizes and to decide whether to expand symmetric/skew/Her mitian entries, which can lead to writes past the end of the index/data arrays when header%symmetry holds an unexpected value (for example, when the symmetry qualifier is missing in the header line). This allows a malformed Matrix Market file to trigger heap/stack corruption in any application that calls load_mm on untrusted input; you should ensure parse failures set err and/or initialize and validate header fields before they are used to control allocations and indexing.
There was a problem hiding this comment.
This has been updated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1108 +/- ##
==========================================
- Coverage 68.84% 68.60% -0.25%
==========================================
Files 408 409 +1
Lines 13726 13775 +49
Branches 1552 1556 +4
==========================================
Hits 9450 9450
- Misses 4276 4325 +49 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| @@ -0,0 +1,281 @@ | |||
| %%MatrixMarket matrix coordinate real general | |||
There was a problem hiding this comment.
Are these matrices "public domain"? Material (code, data?) related to Sparskit and similar are often licensed under (Lesser) GPL, which is not compatible with the MIT license of stdlib
There was a problem hiding this comment.
Thanks for the review.
I could not get a solid answer, because https://math.nist.gov/MatrixMarket/data/SPARSKIT/fidap/fidap.html did not provide the name of license. If it is not possible, then maybe we can try dataset given in https://sparse.tamu.edu/ and the website says:
If you use this Collection as a data set, please cite the original ACM-TOMS paper:
Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software 38, 1, Article 1 (December 2011), 25 pages. DOI: https://doi.org/10.1145/2049662.2049663
There was a problem hiding this comment.
This is what is mentioned on the webpage:
"""
License
Overview
This website software (except for ssget) is under the MIT License. The matrices themselves are under the CC-BY 4.0 License.
Details
ssget (available in SuiteSparse/ssget in the ssget folder is under the BSD 3-clause License.
The matrices themselves are under the CC-BY 4.0 license Note that this license asks you to cite the source of the matrices. That citation can be made to these references:
Kolodziej et al., (2019). The SuiteSparse Matrix Collection Website Interface. Journal of Open Source Software, 4(35), 1244, DOI
Timothy A. Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38, 1, Article 1 (November 2011), 25 pages. DOI
You should also preserve the metadata in the matrices themselves, which includes additional citations for specific matrices. For example, the LAW set of matrices (from the Laboratory for Web Algorithmics, Universita degli Studi di Milano) includes specific instructions on how to properly cite the matrices. See sparse.tamu.edu/LAW for details. For the Matrix Market format, most of the metadata appears in the header of the *.mtx themselves.
Ideally, if you redistribute the matrices in your own applications, you should not change them at all. This is essential for repeatability of experiments that rely on these matrices. Modification is permitted under the CC-BY 4.0 License, but that license requires you to state the modifications you make. If you make any such modifications, please change the filename and matrix name to indicate that it differs from the copy of the matrix from sparse.tamu.edu. Carefully describe any modifications you make.
"""
Please note: I'm not legal expert or advisor so nothing I say next should be taken as reference and should be verified: The CC-BY 4.0 license is also a permissive license, so as long as the conditions are respected it might be possible to include matrices from their database as reference.
There was a problem hiding this comment.
Thanks for checking it out.
The CC-BY 4.0 License states that:
You are free to:
Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
Adapt — remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
Attribution — You must give [appropriate credit](https://creativecommons.org/licenses/by/4.0/#ref-appropriate-credit), provide a link to the license, and [indicate if changes were made](https://creativecommons.org/licenses/by/4.0/#ref-indicate-changes). You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
No additional restrictions — You may not apply legal terms or [technological measures](https://creativecommons.org/licenses/by/4.0/#ref-technological-measures) that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable [exception or limitation](https://creativecommons.org/licenses/by/4.0/#ref-exception-or-limitation).
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as [publicity, privacy, or moral rights](https://creativecommons.org/licenses/by/4.0/#ref-publicity-privacy-or-moral-rights) may limit how you use the material.
So we can add a README.md file which includes proper attribution, license link and recommended citations. Let me know if this approach looks acceptable or if you would prefer any changes.
There was a problem hiding this comment.
thank you @jalvesz for the detailed information. For some reasons, I completely missed them.
So we can add a README.md file which includes proper attribution, license link and recommended citations.
Yes, please do so.
Ideally, if you redistribute the matrices in your own applications, you should not change them at all.
I also suggest to follow this advice (if not done yet).
This PR introduces support for the matrix market format. A big shoutout and the credit to @Mahmood-Sinan who took an initial draft by myself and put it into its current shape.
Key details:
Interfaces:
dense:
call load_mm( filename, matrix [, iostat, iomsg] )sparse (coo like):
call load_mm( filename, index, data [, iostat, iomsg])dense:
call save_mm( filename, matrix [, comment, format, symmetry, iostat, iomsg] )sparse (coo like):
call save_mm( filename, index, data [, comment, format, symmetry, iostat, iomsg] )On implementation:
COO_typeclass to pass sparse matrices, we then went back on that point to reduce unnecessary dependencies.access=streamandto_num_from_streamfor fast readingPrevious art and references
SciPy:
Julia