C-Blosc2: A simple, compressed, fast and persistent data store library for C¶
The Blosc Development Team
What is it?¶
Blosc is a high performance compressor optimized for binary data (i.e. floating point numbers, integers and booleans). It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a plain memcpy() call. Blosc's main goal is not just to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations.
C-Blosc2 is the new major version of C-Blosc, with full support for 64-bit containers, filter pipelining, new filters, new codecs and dictionaries for improved compression ratio. The new 64-bit data containers support both sparse (super-chunks) and sequential (frames) storage, either in-memory or on-disk. The frame is a very simple sequential format, meant either for persistence or for sending data to other processes or machines. Finally, frames can be annotated with user-provided meta-information (metalayers, usermeta). More info about the improved capabilities of C-Blosc2 can be found in this talk.
C-Blosc2 tries hard to be backward compatible with both the C-Blosc1 API and its in-memory format. Furthermore, if you use just the C-Blosc1 API, you are guaranteed to generate compressed data containers that can be read with C-Blosc2, while getting the benefit of better performance, for example by leveraging the accelerated versions of codecs present in Intel's IPP (LZ4 is supported now and others will follow).
C-Blosc2 is currently in beta stage, so it is not ready to be used in production yet. That said, the beta stage means that the API has been declared frozen, so your programs are guaranteed to continue working with future versions of the library. If you want to collaborate in this development, you are welcome: we need help in the different areas listed in the ROADMAP; also, be sure to read our DEVELOPING-GUIDE. Blosc is distributed under the BSD license.
Meta-compression and other advantages over existing compressors¶
C-Blosc2 is not like other compressors: it should rather be called a meta-compressor, because it can use different compressors and filters (programs that generally improve compression ratio). At any rate, it can also be called a compressor because it already comes with several compressors and filters, so it can work as a regular compressor out of the box.
Currently C-Blosc2 comes with support for BloscLZ (a compressor heavily based on FastLZ), LZ4 and LZ4HC, Zstd, Lizard and Zlib (via miniz), as well as highly optimized shuffle and bitshuffle filters (they can use SSE2, AVX2, NEON or ALTIVEC instructions, if available; for info on how shuffling works, see slide 17 of http://www.slideshare.net/PyData/blosc-py-data-2014).
Blosc is in charge of coordinating the different compressors and filters so that they can leverage the blocking technique as well as multi-threaded execution automatically. This means that every codec and filter in the pipeline runs efficiently on modern CPUs, even if it was not initially designed for blocking or multi-threading.
Another important aspect of C-Blosc2 is that it splits large datasets into smaller containers called chunks, which are basically Blosc1 containers. For maximum performance, these chunks are meant to fit in the LLC (Last Level Cache) of CPUs. In practice, this means that to leverage C-Blosc2 containers effectively, the user should ask C-Blosc2 to decompress a chunk, consume it before it hits main memory, and then proceed with the next chunk (as in any streaming operation). We call this process Streamed Compressed Computing, and it effectively prevents uncompressed data from traveling to RAM, saving precious time on modern architectures where RAM access is very expensive compared with CPU speeds.
As said, C-Blosc2 adds a powerful mechanism for adding different metalayers on top of its containers. Caterva is a sibling library that adds such a metalayer specifying not only the dimensionality of a dataset, but also the dimensionality of the chunks inside the dataset. In addition, Caterva adds machinery for retrieving arbitrary multi-dimensional slices (aka hyper-slices) out of the multi-dimensional containers in the most efficient way. Hence, Caterva brings the convenience of multi-dimensional containers to your application very easily. For more info, check out the Caterva documentation.
Compiling the C-Blosc2 library with CMake¶
Blosc can be built, tested and installed using CMake. The following procedure describes a typical CMake build.
Create the build directory inside the sources and move into it:
$ cd c-blosc2-sources
$ mkdir build
$ cd build
Now run CMake configuration and optionally specify the installation directory (e.g. ‘/usr’ or ‘/usr/local’):
$ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..
CMake allows you to configure Blosc in many different ways, like preferring internal or external sources for compressors or enabling/disabling them. Please note that configuration can also be performed using UI tools provided by CMake (ccmake or cmake-gui):
$ ccmake ..     # run a curses-based interface
$ cmake-gui ..  # run a graphical interface
Build, test and install Blosc:
$ cmake --build .
$ ctest
$ cmake --build . --target install
The static and dynamic versions of the Blosc library, together with the header files, will be installed into the specified CMAKE_INSTALL_PREFIX.
Once you have compiled your Blosc library, you can easily link your apps with it as shown in the examples/ directory.
Handling support for codecs (LZ4, LZ4HC, Zstd, Lizard, Zlib)¶
C-Blosc2 comes with full sources for LZ4, LZ4HC, Zstd, Lizard and Zlib, so in general you should not worry about not having (or CMake not finding) these libraries on your system: by default the included sources are compiled and bundled into the C-Blosc2 library. This means you can count on complete support for all the codecs in every Blosc deployment (unless you explicitly exclude support for some of them).
If you want to force Blosc to use external libraries instead of the included compression sources:
$ cmake -DPREFER_EXTERNAL_LZ4=ON ..
You can also disable support for some compression libraries:
$ cmake -DDEACTIVATE_ZSTD=ON ..
C-Blosc2 is meant to support all platforms where a C99-compliant C compiler can be found. The most tested ones are Intel (Linux, Mac OSX and Windows) and ARM (Linux), but exotic ones such as the IBM Blue Gene/Q embedded "A2" processor are reported to work too.
For Windows, you will need at least VS2015 or higher on x86 and x64 targets (i.e. ARM is not supported on Windows).
For Mac OSX, make sure that you have installed the command line developer tools. You can always install them with:
$ xcode-select --install
Support for the LZ4 optimized version in Intel IPP¶
C-Blosc2 comes with support for a highly optimized version of the LZ4 codec present in Intel IPP; if the CMake machinery in C-Blosc2 discovers IPP installed on your system, it will be used automatically by default. Here is a way to easily install Intel IPP on Ubuntu machines:
$ wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ sudo sh -c 'echo deb https://apt.repos.intel.com/ipp all main > /etc/apt/sources.list.d/intel-ipp.list'
$ sudo apt-get update && sudo apt-get install intel-ipp-64bit-2019.X  # replace .X by the latest version
Check Intel IPP website for instructions on how to install it for other platforms.
There is an official mailing list for Blosc at:
Blosc2 Frame Format¶
Blosc (as of version 2.0.0) has a frame format that allows storing different data chunks sequentially, either in-memory or on-disk.
The frame is composed of a header, a chunks section and a trailer, all of which are variable-length and stored sequentially:
+---------+--------+---------+
|  header | chunks | trailer |
+---------+--------+---------+
These are described below.
The header of a frame is encoded via msgpack and follows this format:
|-0-|-1-|-2-|-3-|-4-|-5-|-6-|-7-|-8-|-9-|-A-|-B-|-C-|-D-|-E-|-F-|-10|-11|-12|-13|-14|-15|-16|-17|
| 9X| aX|          "b2frame\0"          | d2|  header_size  | cf|          frame_size           |
|---|---|-------------------------------|---|---------------|---|-------------------------------|
  ^   ^    ^                              ^                   ^
  |   |    |                              |                   |
  |   |    |                              |                   +--[msgpack] uint64
  |   |    |                              +--[msgpack] int32
  |   |    +---magic number, currently "b2frame"
  |   +------[msgpack] str with 8 elements
  +---[msgpack] fixarray with X=0xD (13) elements

|-18|-19|-1A|-1B|-1C|-1D|-1E|-1F|-20|-21|-22|-23|-24|-25|-26|-27|-28|-29|-2A|-2B|-2C|-2D|-2E|
| a4|_f0|_f1|_f2|_f3| d3|       uncompressed_size       | d3|        compressed_size        |
|---|---|---|---|---|---|-------------------------------|---|-------------------------------|
  ^   ^   ^   ^   ^   ^                                   ^
  |   |   |   |   |   |                                   +--[msgpack] int64
  |   |   |   |   |   +--[msgpack] int64
  |   |   |   |   +--reserved flags
  |   |   |   +--codec_flags (see below)
  |   |   +--reserved flags
  |   +--general_flags (see below)
  +--[msgpack] str with 4 elements (flags)

|-2F|-30|-31|-32|-33|-34|-35|-36|-37|-38|-39|-3A|-3B|-3C|-3D|-3E|-3F|
| d2|   type_size   | d2|  chunk_size   | d1| tcomp | d1|tdecomp| cX|
|---|---------------|---|---------------|---|-------|---|-------|---|
  ^                   ^                   ^     ^     ^     ^     ^
  |                   |                   |     |     |     |     +--[msgpack] bool for has_usermeta
  |                   |                   |     |     |     +--number of threads for decompression
  |                   |                   |     |     +--[msgpack] int16
  |                   |                   |     +--number of threads for compression
  |                   |                   +--[msgpack] int16
  |                   +--[msgpack] int32
  +--[msgpack] int32
Next comes the info about the filter pipeline. There is room for a pipeline up to 8 slots deep, with one byte reserved for every filter code and another byte for a possible associated meta-info:
|-40|-41|-42|-43|-44|-45|-46|-47|-48|-49|-4A|-4B|-4C|-4D|-4E|-4F|-50|-51|
| d2| X |         filter_codes          |          filter_meta          |
|---|---|-------------------------------|-------------------------------|
  ^   ^
  |   |
  |   +--number of filters
  +--[msgpack] fixext 16
In addition, a frame can be completed with meta-information about the stored data; these data blocks are called metalayers, and it is up to the user to store whatever data they want there, with the only (strong) suggestion that it be in msgpack format. Here is the format for the case where some metalayers exist:
|-52|-53|-54|-55|-56|-----------------------
| 93| cd|  idx  | de| map_of_metalayers
|---|---|-------|---|-----------------------
  ^   ^    ^      ^
  |   |    |      |
  |   |    |      +--[msgpack] map 16 with N keys
  |   |    +--size of the map (index) of offsets
  |   +--[msgpack] uint16
  +--[msgpack] fixarray with 3 elements
- map of metalayers
This is a msgpack-formatted map for the different metalayers. The keys will be a string (0xa0 + namelen) with the name of the metalayer, followed by an int32 (0xd2) with the offset of the value of this metalayer. The actual value will be encoded as a bin32 (0xc6) value later in the frame.
header_size: (int32) Size of the header of the frame (including metalayers).
frame_size: (uint64) Size of the whole frame (including compressed data).
general_flags: (uint8) General flags.
- Enumerated for chunk offsets: chunks of fixed length (0) or variable length (1).
(uint8) Filter flags that are the defaults for all the chunks in storage.
- bit 0: If set, blocks are not split in sub-blocks.
- bit 1: If set, the filter pipeline is described in bits 3 to 6; else in the _filter_pipeline system metalayer.
- bit 2
- bit 3: Whether the shuffle filter has been applied or not.
- bit 4: Whether the internal buffer is a pure memcpy or not.
- bit 5: Whether the bitshuffle filter has been applied or not.
- bit 6: Whether the delta codec has been applied or not.
- bit 7
codec_flags: (uint8) Compressor enumeration (defaults for all the chunks in storage).
- Enumerated for codecs (up to 16).
- Compression level (up to 16).
reserved: (uint8) Space reserved.
uncompressed_size: (int64) Size of uncompressed data in the frame (excluding metadata).
compressed_size: (int64) Size of compressed data in the frame (excluding metadata).
type_size: (int32) Size of each item.
chunk_size: (int32) Size of each data chunk. 0 if the chunk size is not fixed.
tcomp: (int16) Number of threads for compression. If 0, the value in the compression context (cctx) is used.
tdecomp: (int16) Number of threads for decompression. If 0, the value in the decompression context (dctx) is used.
Here are the actual data chunks, stored sequentially:
+========+========+========+===========+
| chunk0 | chunk1 |  ...   | chunk idx |
+========+========+========+===========+
The different chunks are described in the chunk format document. The chunk idx is an index for the different chunks in this section: it is made up of the 64-bit offsets to the different chunks, compressed into a new chunk that follows the regular Blosc chunk format.
The trailer contains data that can change in size, mainly the usermeta chunk:
|-0-|-1-|-2-|-3-|-4-|-5-|-6-|====================|---|---------------|---|---|=================|
| 9X| aX| c6|  usermeta_len |   usermeta_chunk   | ce|  trailer_len  | d8|fpt|   fingerprint   |
|---|---|---|---------------|====================|---|---------------|---|---|=================|
  ^   ^   ^        ^                               ^        ^          ^   ^
  |   |   |        |                               |        |          |   +--fingerprint type
  |   |   |        |                               |        |          +--[msgpack] fixext 16
  |   |   |        |                               |        +--trailer length (network endian)
  |   |   |        |                               +--[msgpack] uint32 for trailer length
  |   |   |        +--[msgpack] usermeta length (network endian)
  |   |   +--[msgpack] bin32 for usermeta
  |   +--[msgpack] int8 for trailer version
  +--[msgpack] fixarray with X=4 elements
usermeta_len: (int32) The length of the usermeta chunk.
usermeta_chunk: (varlen) The usermeta chunk (a Blosc chunk).
trailer_len: (uint32) Size of the trailer of the frame (including the usermeta chunk).
fpt: (int8) Fingerprint type: 0 -> no fp; 1 -> 32-bit; 2 -> 64-bit; 3 -> 128-bit.
fingerprint: (uint128) Fixed storage space for the fingerprint, padded to the left.