Stack overflow vulnerability in HDF5 1.10.3
Loginsoft-2018-17439
September 24, 2018
CVE Number
CVE-2018-17439
CWE
CWE-121: Stack-based Buffer Overflow
Product Details
HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of data types, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.
URL: https://www.hdfgroup.org/downloads
Vulnerable Versions
HDF5 1.10.3
Vulnerability Details
An issue was discovered in the HDF HDF5 1.10.3 library. There is a stack-based buffer overflow in the function H5S_extent_get_dims() in H5S.c. Specifically, this issue occurs while converting an HDF5 file to a GIF file.
SYNOPSIS
Datasets: Very similar to `NumPy` arrays, they are homogeneous collections of data elements, with an immutable datatype and (hyper)rectangular shape.
Attributes:
– shape
– size
– dtype
Chunk/chunking – Chunking refers to a storage layout where a dataset is partitioned into fixed-size multi-dimensional chunks.
Raw data chunk cache – Issuing many small writes from the application to data within a chunk would perform poorly, so a raw data chunk cache layer was added to improve performance. By default, the chunk cache stores 521 chunks or 1 MB of data.
Ref: http://docs.h5py.org/en/latest/high/dataset.html
H5D__open_name() opens an existing dataset by name: it looks up the dataset object and checks that the object found is of the correct type. If it is valid, it accesses the dataset by calling H5D_open() [1], which internally calls H5D__open_oid(). That function opens the dataset object, loads the datatype and dataspace information, caches the dataspace info, and reads the layout/pline/efl message information.
To read the layout/pline/efl message information, H5D__layout_oh_read() [2] is called. It invokes H5D__chunk_init(), which initializes the raw data chunk cache for a dataset and is usually called when the dataset is initialized; this is the culprit. While computing the scaled dimension info, it divides the dataset's current dimensions dset->shared->curr_dims[u] by the dataset layout chunk dimension dset->shared->layout.u.chunk.dim[u] [3]. If the chunk dimension is zero, this is a division by zero that raises a floating-point exception (SIGFPE).
Fix –
As part of the fix, a check was added to ensure that the dataset layout chunk dimension is non-zero before the division is performed.
https://www.hdfgroup.org/2018/07/hdfview-3-0-pre-release-newsletter-163/
```
if(dset->shared->layout.u.chunk.dim[u] == 0)
    HGOTO_ERROR(H5E_DATASET, H5E_BADVALUE, FAIL, "chunk size must be > 0, dim = %u ", u)
rdcc->scaled_dims[u] = dset->shared->curr_dims[u] / dset->shared->layout.u.chunk.dim[u];
```
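The guard described in this fix can be illustrated outside the library. The following is a minimal sketch, not HDF5 code: the function name compute_scaled_dims and its parameters are hypothetical, and the integer types are assumptions chosen for the example. It rejects a zero chunk dimension before performing the same division.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of HDF5): computes
 * scaled[u] = curr_dims[u] / chunk_dims[u] for every dimension.
 * Returns 0 on success, or -1 without dividing if any chunk
 * dimension is zero. */
int compute_scaled_dims(const uint64_t *curr_dims, const uint32_t *chunk_dims,
                        size_t ndims, uint64_t *scaled)
{
    for (size_t u = 0; u < ndims; u++) {
        /* Guard: a zero chunk dimension would trigger SIGFPE below. */
        if (chunk_dims[u] == 0)
            return -1;
        scaled[u] = curr_dims[u] / chunk_dims[u];
    }
    return 0;
}
```

A caller would treat the -1 return as a malformed file and stop processing the dataset, which mirrors the error path taken by HGOTO_ERROR in the patch.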
Analysis
Backtrace :
Proof of concept
./h5stat -A -T -G -D -S $POC
-A prints attribute information
-T prints dataset’s datatype metadata
-G prints file space information for groups’ metadata
-D prints file space information for dataset’s metadata
Vulnerability Details
A SIGFPE signal is raised in the function H5D__chunk_set_info_real() of H5Dchunk.c in the HDF HDF5 1.10.3 library during an attempted parse of a crafted HDF file, because of incorrect protection against division by zero. This issue is different from CVE-2018-11207.
SYNOPSIS
An issue similar to CVE-2018-15672 was discovered in the H5D__chunk_set_info_real() function in src/H5Dchunk.c.
The function H5D__layout_oh_read() invokes H5D__chunk_init(), which initializes the raw data chunk cache for a dataset and is called when the dataset is initialized. It then computes the scaled dimension information and sets the number of chunks in the dataset by calling H5D__chunk_set_info() [1], passing the dataset (dset).
H5D__chunk_set_info_real() [2] then sets the base layout information. While computing the number of chunks in each dataset dimension, it performs an unchecked division: the current dimension curr_dims[u] is added to the layout dimension layout->dim[u], 1 is subtracted, and the result is divided by layout->dim[u] [3]. If the layout dimension is zero, this is a division by zero that raises a floating-point exception.
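The computation described above is a ceiling division: the number of chunks needed to cover each dimension. A minimal sketch of it with a zero check follows; count_chunks and its parameter names are hypothetical and the integer types are assumptions, so this shows the arithmetic rather than the library's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of HDF5): for each dimension, computes
 * the number of chunks needed to cover it,
 *   nchunks[u] = (curr_dims[u] + chunk_dims[u] - 1) / chunk_dims[u],
 * i.e. curr_dims[u] / chunk_dims[u] rounded up.
 * Returns 0 on success, or -1 if any chunk dimension is zero. */
int count_chunks(const uint64_t *curr_dims, const uint32_t *chunk_dims,
                 size_t ndims, uint64_t *nchunks)
{
    for (size_t u = 0; u < ndims; u++) {
        /* Without this check, a crafted file with a zero chunk
         * dimension makes the division below raise SIGFPE. */
        if (chunk_dims[u] == 0)
            return -1;
        nchunks[u] = (curr_dims[u] + chunk_dims[u] - 1) / chunk_dims[u];
    }
    return 0;
}
```

For example, a dimension of 10 elements with a chunk size of 4 needs (10 + 4 - 1) / 4 = 3 chunks.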
Analysis
Backtrace
Proof of concept
./h5dump -H $POC
-H Prints the header but displays no data.
Vulnerability Details
A SIGFPE signal is raised in the function H5D__create_chunk_file_map_hyper() of H5Dchunk.c in the HDF HDF5 through 1.10.3 library during an attempted parse of a crafted HDF file, because of incorrect protection against division by zero. It could allow a remote denial of service attack.
SYNOPSIS
H5Dread() [1] reads part of a dataset from the file into the application's memory buffer. It internally calls H5D__read() [2] to read the raw data.
H5D__chunk_io_init() performs any initialization needed before I/O on the raw data. Inside H5D__chunk_io_init(), a check determines whether the file selection is a hyperslab selection, and if so H5D__create_chunk_file_map_hyper() [3] is called. That function builds the file selection for each chunk and creates all chunk selections in the file. It gets the number of elements selected in the file and the bounding box of the selection, then sets the initial chunk location and hyperslab size, which is where things go wrong.
There, the offset of the low bound of the file selection sel_start[u] is divided by the chunk dimension of the dataset layout fm->layout->u.chunk.dim[u] [4]. If that chunk dimension is zero, this is a division by zero that raises a floating-point exception.
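A minimal sketch of this step with a zero check is shown below. It assumes the division is used to round the selection's low bound down to the start of its containing chunk; chunk_aligned_start and its parameters are hypothetical names, not the library's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of HDF5): rounds each selection start
 * coordinate down to the first element of the chunk that contains it,
 *   start_coords[u] = (sel_start[u] / chunk_dims[u]) * chunk_dims[u].
 * Returns 0 on success, or -1 if any chunk dimension is zero. */
int chunk_aligned_start(const uint64_t *sel_start, const uint32_t *chunk_dims,
                        size_t ndims, uint64_t *start_coords)
{
    for (size_t u = 0; u < ndims; u++) {
        /* Reject a zero chunk dimension before dividing by it. */
        if (chunk_dims[u] == 0)
            return -1;
        start_coords[u] = (sel_start[u] / chunk_dims[u]) * chunk_dims[u];
    }
    return 0;
}
```

With a selection starting at element 13 and a chunk size of 5, the containing chunk starts at (13 / 5) * 5 = 10.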
Analysis
Backtrace
Proof of concept
h5dump -r -d BAG_root/metadata $POC
-r prints 1-byte integer datasets as ASCII.
-d dumps a dataset from a group in an HDF5 file.
Timeline
Vendor Disclosure: 2018-09-24
Patch Release: 2018-09-25
Public Disclosure: 2018-09-26
Credit
Discovered by ACE Team – Loginsoft