Message ID | 20200325195849.407900-4-nickrterrell@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | Add support for ZSTD-compressed kernel and initramfs |
Hi!

On Wed, Mar 25, 2020 at 12:58:44PM -0700, Nick Terrell wrote:
> From: Nick Terrell <terrelln@fb.com>
>
> * Add unzstd() and the zstd decompress interface.

Here I do not understand why you limit the window size to 8MB even when
you read a larger value from the header. I do not see a reason why there
should be such a limitation in the first place and if there should be,
why it differs from ZSTD_WINDOWLOG_MAX.

I removed that limitation to be able to test it in my environment and I
found the performance is worse than with my patch by roughly 20% (on
i7-3520M), which is a major drawback considering the main motivation
to use zstd is the decompression speed. I will test on arm as well and
share the result tomorrow.

  Petr
> On Mar 26, 2020, at 9:47 AM, Petr Malat <oss@malat.biz> wrote:
>
> Hi!
> On Wed, Mar 25, 2020 at 12:58:44PM -0700, Nick Terrell wrote:
>> From: Nick Terrell <terrelln@fb.com>
>> * Add unzstd() and the zstd decompress interface.
> Here I do not understand why you limit the window size to 8MB even when
> you read a larger value from the header. I do not see a reason why there
> should be such a limitation in the first place and if there should be,
> why it differs from ZSTD_WINDOWLOG_MAX.

When we are doing streaming decompression (either flush or fill is provided)
we have to allocate memory proportional to the window size. We want to
bound that memory so we don’t accidentally allocate too much memory.
When we are doing a single-pass decompression (neither flush nor fill
are provided) the window size doesn’t matter, and we only have to allocate
a fixed amount of memory ~192 KB.

The zstd spec [0] specifies that all decoders should allow window sizes
up to 8 MB. Additionally, the zstd CLI won’t produce window sizes greater
than 8 MB by default. The window size is controlled by the compression
level, and can be explicitly set.

I would expect larger window sizes to be beneficial for compression ratio,
though there are diminishing returns. I would expect that for kernel image
compression larger window sizes are beneficial, since it is decompressed
with a single pass. For initramfs decompression, I would expect that limiting
the window size could help decompression speed, since it uses streaming
decompression, so unzstd() has to allocate a buffer of window size bytes.

> I removed that limitation to be able to test it in my environment and I
> found the performance is worse than with my patch by roughly 20% (on
> i7-3520M), which is a major drawback considering the main motivation
> to use zstd is the decompression speed. I will test on arm as well and
> share the result tomorrow.
> Petr

What do you mean by that? Can you share with me the test you ran?
Is this for kernel decompression or initramfs decompression?

Best,
Nick

[0] https://tools.ietf.org/html/rfc8478#section-3.1.1.1.2
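For readers following the window-size discussion, here is a minimal sketch of how a decoder could derive the window size from the Window_Descriptor byte of a frame header, following the formula in RFC 8478 section 3.1.1.1.2 (the reference [0] above). The helper names `zstd_window_size()` and `window_within_limit()` and the `WINDOW_LIMIT` macro are hypothetical and exist only for this illustration; they are not part of the patch or of the zstd API. The limit simply mirrors the 8 MB bound being debated. Note that, per the RFC, frames with Single_Segment_Flag set carry no Window_Descriptor at all, which this sketch does not handle.

```c
#include <stdint.h>

/* Assumption: mirrors the 8 MB bound discussed in this thread. */
#define WINDOW_LIMIT (1ULL << 23)

/*
 * Decode a Window_Descriptor byte as described in RFC 8478:
 *   windowLog  = 10 + Exponent        (Exponent = bits 7-3)
 *   windowBase = 1 << windowLog
 *   windowAdd  = (windowBase / 8) * Mantissa   (Mantissa = bits 2-0)
 * Hypothetical helper, not a zstd library function.
 */
static uint64_t zstd_window_size(uint8_t descriptor)
{
	unsigned int exponent = descriptor >> 3;
	unsigned int mantissa = descriptor & 7;
	uint64_t window_base = 1ULL << (10 + exponent);
	uint64_t window_add = (window_base / 8) * mantissa;

	return window_base + window_add;
}

/* Returns 1 if a streaming decoder bounded at WINDOW_LIMIT would accept it. */
static int window_within_limit(uint8_t descriptor)
{
	return zstd_window_size(descriptor) <= WINDOW_LIMIT;
}
```

With this decoding, a descriptor of 0x68 (exponent 13, mantissa 0) is exactly 8 MB and passes the check, while 0x88 (exponent 17) is 128 MB and would be rejected by an 8 MB bound.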
Hi!

On Thu, Mar 26, 2020 at 07:03:54PM +0000, Nick Terrell wrote:
> >> * Add unzstd() and the zstd decompress interface.
> > Here I do not understand why you limit the window size to 8MB even when
> > you read a larger value from the header. I do not see a reason why there
> > should be such a limitation in the first place and if there should be,
> > why it differs from ZSTD_WINDOWLOG_MAX.
>
> When we are doing streaming decompression (either flush or fill is provided)
> we have to allocate memory proportional to the window size. We want to
> bound that memory so we don't accidentally allocate too much memory.
> When we are doing a single-pass decompression (neither flush nor fill
> are provided) the window size doesn't matter, and we only have to allocate
> a fixed amount of memory ~192 KB.
>
> The zstd spec [0] specifies that all decoders should allow window sizes
> up to 8 MB. Additionally, the zstd CLI won't produce window sizes greater
> than 8 MB by default. The window size is controlled by the compression
> level, and can be explicitly set.

Yes, one needs to pass the --ultra option to zstd to produce an incompatible
archive, but that doesn't justify limiting this in the kernel,
especially if one is able to read the needed window size from the header
when allocating the memory. At the time when initramfs is extracted,
there usually is memory available as it's before any processes are
started and this memory is reclaimed after the decompression.

If, on the other hand, a user makes an initramfs for a memory constrained
system, he limits the window size while compressing the archive and
the small window size will be announced in the header.

The only scenario where using the hard-coded limit makes sense is the
case where the window size is not available (I'm not sure if it's mandatory
to provide it). That's how my code works - if the size is available,
it uses the provided value, if not it uses 1 << ZSTD_WINDOWLOG_MAX.

I would also agree a fixed limit would make sense if user (or network)
provided data were used, but in this case only the system owner is able
to provide an initramfs. If one is able to change initramfs, he can render
the system unusable simply by providing a corrupted file. He doesn't have
to bother making the window bigger than the available memory.

> I would expect larger window sizes to be beneficial for compression ratio,
> though there are diminishing returns. I would expect that for kernel image
> compression larger window sizes are beneficial, since it is decompressed
> with a single pass. For initramfs decompression, I would expect that limiting
> the window size could help decompression speed, since it uses streaming
> decompression, so unzstd() has to allocate a buffer of window size bytes.

Yes, larger window improves the compression ratio, see here a comparison
between level 19 and 22 on my testing x86-64 initramfs:
  30775022 rootfs.cpio.zst-19
  28755429 rootfs.cpio.zst-22
These 7% can be noticeable when one has a slow storage, e.g. a flash memory
on SPI bus.

> > I removed that limitation to be able to test it in my environment and I
> > found the performance is worse than with my patch by roughly 20% (on
> > i7-3520M), which is a major drawback considering the main motivation
> > to use zstd is the decompression speed. I will test on arm as well and
> > share the result tomorrow.
> > Petr
>
> What do you mean by that? Can you share with me the test you ran?
> Is this for kernel decompression or initramfs decompression?

Initramfs - you can apply my v2 patch on v5.5 and try with your test data.

I have tested your patch also on ARMv7 platform and there the degradation
was 8%.

  Petr
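As a quick check of the size comparison quoted above, the relative saving of level 22 over level 19 can be recomputed from the two byte counts. The helper below is a hypothetical illustration (its name is not from either patch set); with the two sizes from this message it yields a reduction of about 6.6%, consistent with the roughly 7% stated.

```c
/*
 * Relative size reduction, in percent, when an archive shrinks from
 * `before` bytes to `after` bytes. Hypothetical helper for illustration;
 * assumes before > 0 and after <= before.
 */
static double size_reduction_percent(unsigned long before, unsigned long after)
{
	return 100.0 * (double)(before - after) / (double)before;
}
```

For example, `size_reduction_percent(30775022UL, 28755429UL)` compares rootfs.cpio.zst-19 against rootfs.cpio.zst-22.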
> On Mar 26, 2020, at 1:16 PM, Petr Malat <oss@malat.biz> wrote:
>
> Hi!
> On Thu, Mar 26, 2020 at 07:03:54PM +0000, Nick Terrell wrote:
>>>> * Add unzstd() and the zstd decompress interface.
>>> Here I do not understand why you limit the window size to 8MB even when
>>> you read a larger value from the header. I do not see a reason why there
>>> should be such a limitation in the first place and if there should be,
>>> why it differs from ZSTD_WINDOWLOG_MAX.
>>
>> When we are doing streaming decompression (either flush or fill is provided)
>> we have to allocate memory proportional to the window size. We want to
>> bound that memory so we don't accidentally allocate too much memory.
>> When we are doing a single-pass decompression (neither flush nor fill
>> are provided) the window size doesn't matter, and we only have to allocate
>> a fixed amount of memory ~192 KB.
>>
>> The zstd spec [0] specifies that all decoders should allow window sizes
>> up to 8 MB. Additionally, the zstd CLI won't produce window sizes greater
>> than 8 MB by default. The window size is controlled by the compression
>> level, and can be explicitly set.
> Yes, one needs to pass the --ultra option to zstd to produce an incompatible
> archive, but that doesn't justify limiting this in the kernel,
> especially if one is able to read the needed window size from the header
> when allocating the memory. At the time when initramfs is extracted,
> there usually is memory available as it's before any processes are
> started and this memory is reclaimed after the decompression.

I’m happy to increase this limit. I set it to 8 MB to be conservative, but
I am happy to increase it to 128 MB == 1 << ZSTD_WINDOWLOG_MAX.
I will submit a new version with that change.

> If, on the other hand, a user makes an initramfs for a memory constrained
> system, he limits the window size while compressing the archive and
> the small window size will be announced in the header.
>
> The only scenario where using the hard-coded limit makes sense is the
> case where the window size is not available (I'm not sure if it's mandatory
> to provide it). That's how my code works - if the size is available,
> it uses the provided value, if not it uses 1 << ZSTD_WINDOWLOG_MAX.
>
> I would also agree a fixed limit would make sense if user (or network)
> provided data were used, but in this case only the system owner is able
> to provide an initramfs. If one is able to change initramfs, he can render
> the system unusable simply by providing a corrupted file. He doesn't have
> to bother making the window bigger than the available memory.

That makes sense to me.

>> I would expect larger window sizes to be beneficial for compression ratio,
>> though there are diminishing returns. I would expect that for kernel image
>> compression larger window sizes are beneficial, since it is decompressed
>> with a single pass. For initramfs decompression, I would expect that limiting
>> the window size could help decompression speed, since it uses streaming
>> decompression, so unzstd() has to allocate a buffer of window size bytes.
> Yes, larger window improves the compression ratio, see here a comparison
> between level 19 and 22 on my testing x86-64 initramfs:
>   30775022 rootfs.cpio.zst-19
>   28755429 rootfs.cpio.zst-22
> These 7% can be noticeable when one has a slow storage, e.g. a flash memory
> on SPI bus.
>
>>> I removed that limitation to be able to test it in my environment and I
>>> found the performance is worse than with my patch by roughly 20% (on
>>> i7-3520M), which is a major drawback considering the main motivation
>>> to use zstd is the decompression speed. I will test on arm as well and
>>> share the result tomorrow.
>>> Petr
>>
>> What do you mean by that? Can you share with me the test you ran?
>> Is this for kernel decompression or initramfs decompression?
> Initramfs - you can apply my v2 patch on v5.5 and try with your test data.
>
> I have tested your patch also on ARMv7 platform and there the degradation
> was 8%.

Are you comparing the performance of an 8 MB window size vs a 128 MB
window size?

> Petr
On Thu, Mar 26, 2020 at 09:13:54PM +0000, Nick Terrell wrote:
> >> What do you mean by that? Can you share with me the test you ran?
> >> Is this for kernel decompression or initramfs decompression?
> > Initramfs - you can apply my v2 patch on v5.5 and try with your test data.
> >
> > I have tested your patch also on ARMv7 platform and there the degradation
> > was 8%.
>
> Are you comparing the performance of an 8 MB window size vs a 128 MB
> window size?

No, I use the same initramfs file with 2 different kernels for the test. I have
removed
  if (params.windowSize > ZSTD_WINDOWSIZE_MAX)
    goto out;
from your code.

  Petr
> On Mar 26, 2020, at 2:44 PM, Petr Malat <oss@malat.biz> wrote:
>
> On Thu, Mar 26, 2020 at 09:13:54PM +0000, Nick Terrell wrote:
>>>> What do you mean by that? Can you share with me the test you ran?
>>>> Is this for kernel decompression or initramfs decompression?
>>> Initramfs - you can apply my v2 patch on v5.5 and try with your test data.
>>>
>>> I have tested your patch also on ARMv7 platform and there the degradation
>>> was 8%.
>>
>> Are you comparing the performance of an 8 MB window size vs a 128 MB
>> window size?
> No, I use the same initramfs file with 2 different kernels for the test. I have
> removed
>   if (params.windowSize > ZSTD_WINDOWSIZE_MAX)
>     goto out;
> from your code.

Thanks for the clarification. I will try to reproduce the speed difference
you’ve measured before submitting v4 (that deletes the windowSize bound).

Initramfs passes the whole input buffer (doesn’t use fill), but does use
flush. Zstd always decompresses into an internal buffer, then copies into
the ZSTD_outBuffer. That means the only functional difference between the
two versions for initramfs should be that I will call flush() 4 KB at a
time, and you call flush() 128 KB at a time. Naively I wouldn’t expect this
to matter too much, but I will measure.

Best,
Nick
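To give a feel for the 4 KB versus 128 KB difference described above, the sketch below estimates how many flush() calls are needed to emit a given amount of decompressed data. It assumes every call hands over a completely full output buffer, so the result is a lower bound rather than what either patch set actually does; `flush_calls()` is a hypothetical helper, not a function from the kernel or from zstd.

```c
#include <stddef.h>

/*
 * Lower bound on the number of flush() calls needed to emit `total`
 * bytes through an output buffer of `buf_size` bytes, assuming each
 * call flushes a full buffer (the last call may be partial).
 * Hypothetical helper; buf_size must be non-zero.
 */
static size_t flush_calls(size_t total, size_t buf_size)
{
	return (total + buf_size - 1) / buf_size;
}
```

For 100 MiB of decompressed output this gives 25600 calls with a 4 KB buffer and 800 calls with a 128 KB buffer, a factor of 32 in per-call overhead.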
> On Mar 26, 2020, at 1:16 PM, Petr Malat <oss@malat.biz> wrote:
>
> Hi!
> On Thu, Mar 26, 2020 at 07:03:54PM +0000, Nick Terrell wrote:
>>>> * Add unzstd() and the zstd decompress interface.
>>> Here I do not understand why you limit the window size to 8MB even when
>>> you read a larger value from the header. I do not see a reason why there
>>> should be such a limitation in the first place and if there should be,
>>> why it differs from ZSTD_WINDOWLOG_MAX.
>>
>> When we are doing streaming decompression (either flush or fill is provided)
>> we have to allocate memory proportional to the window size. We want to
>> bound that memory so we don't accidentally allocate too much memory.
>> When we are doing a single-pass decompression (neither flush nor fill
>> are provided) the window size doesn't matter, and we only have to allocate
>> a fixed amount of memory ~192 KB.
>>
>> The zstd spec [0] specifies that all decoders should allow window sizes
>> up to 8 MB. Additionally, the zstd CLI won't produce window sizes greater
>> than 8 MB by default. The window size is controlled by the compression
>> level, and can be explicitly set.
> Yes, one needs to pass the --ultra option to zstd to produce an incompatible
> archive, but that doesn't justify limiting this in the kernel,
> especially if one is able to read the needed window size from the header
> when allocating the memory. At the time when initramfs is extracted,
> there usually is memory available as it's before any processes are
> started and this memory is reclaimed after the decompression.
>
> If, on the other hand, a user makes an initramfs for a memory constrained
> system, he limits the window size while compressing the archive and
> the small window size will be announced in the header.
>
> The only scenario where using the hard-coded limit makes sense is the
> case where the window size is not available (I'm not sure if it's mandatory
> to provide it). That's how my code works - if the size is available,
> it uses the provided value, if not it uses 1 << ZSTD_WINDOWLOG_MAX.
>
> I would also agree a fixed limit would make sense if user (or network)
> provided data were used, but in this case only the system owner is able
> to provide an initramfs. If one is able to change initramfs, he can render
> the system unusable simply by providing a corrupted file. He doesn't have
> to bother making the window bigger than the available memory.
>
>> I would expect larger window sizes to be beneficial for compression ratio,
>> though there are diminishing returns. I would expect that for kernel image
>> compression larger window sizes are beneficial, since it is decompressed
>> with a single pass. For initramfs decompression, I would expect that limiting
>> the window size could help decompression speed, since it uses streaming
>> decompression, so unzstd() has to allocate a buffer of window size bytes.
> Yes, larger window improves the compression ratio, see here a comparison
> between level 19 and 22 on my testing x86-64 initramfs:
>   30775022 rootfs.cpio.zst-19
>   28755429 rootfs.cpio.zst-22
> These 7% can be noticeable when one has a slow storage, e.g. a flash memory
> on SPI bus.
>
>>> I removed that limitation to be able to test it in my environment and I
>>> found the performance is worse than with my patch by roughly 20% (on
>>> i7-3520M), which is a major drawback considering the main motivation
>>> to use zstd is the decompression speed. I will test on arm as well and
>>> share the result tomorrow.
>>> Petr
>>
>> What do you mean by that? Can you share with me the test you ran?
>> Is this for kernel decompression or initramfs decompression?
> Initramfs - you can apply my v2 patch on v5.5 and try with your test data.
>
> I have tested your patch also on ARMv7 platform and there the degradation
> was 8%.

Thanks again for measuring the speed differences between the two patchsets!
I’ve found that the difference in performance between our two patchsets is
caused by the output buffer size. I expect this is due to calling flush() more
often, since that is a complex state machine in initramfs’s use case.

I’ve measured the speed of this patch set (v3), compared against this patch
set with a 128 KB buffer size (ZSTD_DStreamOutSize()), vs Petr’s patchset.

I’m measuring on an Intel i9-9900K with turbo disabled on CPU 0. I’m booting
the kernel using QEMU. To measure the initramfs decompression speed I look at
the difference in timestamp between “Unpacking initramfs…” and “Freeing
initrd memory”. The initramfs is compressed using level 19, but results for
level 22 are similar. Times are reported in seconds. I ran each test 3 times
and took the median time, but the results are very stable. On ARM the
initramfs is 26 MB. On x86-64 the initramfs is 97 MB.

Arch   v3     v3 (128 KB)   Petr
Arm    1.67   1.52          1.55
x64    1.76   1.69          1.66

The results for my patch are slightly better on ARM, yours are slightly better
on x86. In v4 of my patchset, which I will send out tonight, I will increase
ZSTD_IOBUF_SIZE to 128 KB (as well as remove the 8 MB window size limit).
Please let me know if your results align with mine on v4.

I’ve also measured the x86_64 zstd kernel decompression speed using our two
patchsets. I measured it by the timing between the “Decompressing Linux…”
message and the “Parsing ELF” message with this script [0]. I used the same
technique for measurement as above.

The kernel I am testing is compressed at level 19 with my patchset and at
level 19 with a window size of 4 MB with your patchset. I found that my
patchset takes 70 ms to decompress and yours takes 318 ms. Your patchset also
uses 4 MB of heap memory, where mine only needs 192 KB.

The difference is caused by two things:

1. memcpy() is replaced by __builtin_memcpy() in patch 1 of my set. This is
   the core of the decompression hot loop, and without it the compiler can’t
   inline memcpy.
2. My patchset calls decompress_single() when neither flush nor fill are
   provided, like when decompressing the kernel. This saves the 4 MB of
   memory, as well as speeds up decompression a little bit.

Best,
Nick Terrell

[0] https://gist.github.com/terrelln/9bd53321a669f62683c608af8944fbc2
diff --git a/include/linux/decompress/unzstd.h b/include/linux/decompress/unzstd.h
new file mode 100644
index 000000000000..56d539ae880f
--- /dev/null
+++ b/include/linux/decompress/unzstd.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef LINUX_DECOMPRESS_UNZSTD_H
+#define LINUX_DECOMPRESS_UNZSTD_H
+
+int unzstd(unsigned char *inbuf, long len,
+	   long (*fill)(void*, unsigned long),
+	   long (*flush)(void*, unsigned long),
+	   unsigned char *output,
+	   long *pos,
+	   void (*error_fn)(char *x));
+#endif
diff --git a/lib/Kconfig b/lib/Kconfig
index bc7e56370129..11de5fa09a52 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -336,6 +336,10 @@ config DECOMPRESS_LZ4
 	select LZ4_DECOMPRESS
 	tristate
 
+config DECOMPRESS_ZSTD
+	select ZSTD_DECOMPRESS
+	tristate
+
 #
 # Generic allocator support is selected if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index 611872c06926..09ad45ba6883 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -160,6 +160,7 @@ lib-$(CONFIG_DECOMPRESS_LZMA) += decompress_unlzma.o
 lib-$(CONFIG_DECOMPRESS_XZ) += decompress_unxz.o
 lib-$(CONFIG_DECOMPRESS_LZO) += decompress_unlzo.o
 lib-$(CONFIG_DECOMPRESS_LZ4) += decompress_unlz4.o
+lib-$(CONFIG_DECOMPRESS_ZSTD) += decompress_unzstd.o
 
 obj-$(CONFIG_TEXTSEARCH) += textsearch.o
 obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
diff --git a/lib/decompress.c b/lib/decompress.c
index 857ab1af1ef3..ab3fc90ffc64 100644
--- a/lib/decompress.c
+++ b/lib/decompress.c
@@ -13,6 +13,7 @@
 #include <linux/decompress/inflate.h>
 #include <linux/decompress/unlzo.h>
 #include <linux/decompress/unlz4.h>
+#include <linux/decompress/unzstd.h>
 
 #include <linux/types.h>
 #include <linux/string.h>
@@ -37,6 +38,9 @@
 #ifndef CONFIG_DECOMPRESS_LZ4
 # define unlz4 NULL
 #endif
+#ifndef CONFIG_DECOMPRESS_ZSTD
+# define unzstd NULL
+#endif
 
 struct compress_format {
 	unsigned char magic[2];
@@ -52,6 +56,7 @@ static const struct compress_format compressed_formats[] __initconst = {
 	{ {0xfd, 0x37}, "xz", unxz },
 	{ {0x89, 0x4c}, "lzo", unlzo },
 	{ {0x02, 0x21}, "lz4", unlz4 },
+	{ {0x28, 0xb5}, "zstd", unzstd },
 	{ {0, 0}, NULL, NULL }
 };
 
diff --git a/lib/decompress_unzstd.c b/lib/decompress_unzstd.c
new file mode 100644
index 000000000000..a6b391b47ab8
--- /dev/null
+++ b/lib/decompress_unzstd.c
@@ -0,0 +1,338 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Important notes about in-place decompression
+ *
+ * At least on x86, the kernel is decompressed in place: the compressed data
+ * is placed to the end of the output buffer, and the decompressor overwrites
+ * most of the compressed data. There must be enough safety margin to
+ * guarantee that the write position is always behind the read position.
+ *
+ * The safety margin for ZSTD with a 128 KB block size is calculated below.
+ * Note that the margin with ZSTD is bigger than with GZIP or XZ!
+ *
+ * The worst case for in-place decompression is that the beginning of
+ * the file is compressed extremely well, and the rest of the file is
+ * uncompressible. Thus, we must look for worst-case expansion when the
+ * compressor is encoding uncompressible data.
+ *
+ * The structure of the .zst file in case of a compressed kernel is as follows.
+ * Maximum sizes (as bytes) of the fields are in parentheses.
+ *
+ *    Frame Header: (18)
+ *    Blocks: (N)
+ *    Checksum: (4)
+ *
+ * The frame header and checksum overhead is at most 22 bytes.
+ *
+ * ZSTD stores the data in blocks. Each block has a header whose size is
+ * 3 bytes. After the block header, there is up to 128 KB of payload.
+ * The maximum uncompressed size of the payload is 128 KB. The minimum
+ * uncompressed size of the payload is never less than the payload size
+ * (excluding the block header).
+ *
+ * The assumption, that the uncompressed size of the payload is never
+ * smaller than the payload itself, is valid only when talking about
+ * the payload as a whole. It is possible that the payload has parts where
+ * the decompressor consumes more input than it produces output. Calculating
+ * the worst case for this would be tricky. Instead of trying to do that,
+ * let's simply make sure that the decompressor never overwrites any bytes
+ * of the payload which it is currently reading.
+ *
+ * Now we have enough information to calculate the safety margin. We need
+ *   - 22 bytes for the .zst file format headers;
+ *   - 3 bytes per every 128 KiB of uncompressed size (one block header per
+ *     block); and
+ *   - 128 KiB (biggest possible zstd block size) to make sure that the
+ *     decompressor never overwrites anything from the block it is currently
+ *     reading.
+ *
+ * We get the following formula:
+ *
+ *    safety_margin = 22 + uncompressed_size * 3 / 131072 + 131072
+ *                 <= 22 + (uncompressed_size >> 15) + 131072
+ */
+
+/*
+ * Preboot environments #include "path/to/decompress_unzstd.c".
+ * All of the source files we depend on must be #included.
+ * zstd's only source dependency is xxhash, which has no source
+ * dependencies.
+ *
+ * zstd and xxhash avoid declaring themselves as modules
+ * when ZSTD_PREBOOT and XXH_PREBOOT are defined.
+ */
+#ifdef STATIC
+# define ZSTD_PREBOOT
+# define XXH_PREBOOT
+# include "xxhash.c"
+# include "zstd/entropy_common.c"
+# include "zstd/fse_decompress.c"
+# include "zstd/huf_decompress.c"
+# include "zstd/zstd_common.c"
+# include "zstd/decompress.c"
+#endif
+
+#include <linux/decompress/mm.h>
+#include <linux/kernel.h>
+#include <linux/zstd.h>
+
+/* 8 MB maximum window size */
+#define ZSTD_WINDOWSIZE_MAX	(1 << 23)
+/* Size of the input and output buffers in multi-call mode */
+#define ZSTD_IOBUF_SIZE		4096
+
+static int INIT handle_zstd_error(size_t ret, void (*error)(char *x))
+{
+	const int err = ZSTD_getErrorCode(ret);
+
+	if (!ZSTD_isError(ret))
+		return 0;
+
+	switch (err) {
+	case ZSTD_error_memory_allocation:
+		error("ZSTD decompressor ran out of memory");
+		break;
+	case ZSTD_error_prefix_unknown:
+		error("Input is not in the ZSTD format (wrong magic bytes)");
+		break;
+	case ZSTD_error_dstSize_tooSmall:
+	case ZSTD_error_corruption_detected:
+	case ZSTD_error_checksum_wrong:
+		error("ZSTD-compressed data is corrupt");
+		break;
+	default:
+		error("ZSTD-compressed data is probably corrupt");
+		break;
+	}
+	return -1;
+}
+
+/*
+ * Handle the case where we have the entire input and output in one segment.
+ * We can allocate less memory (no circular buffer for the sliding window),
+ * and avoid some memcpy() calls.
+ */
+static int INIT decompress_single(const u8 *in_buf, long in_len, u8 *out_buf,
+				  long out_len, long *in_pos,
+				  void (*error)(char *x))
+{
+	const size_t wksp_size = ZSTD_DCtxWorkspaceBound();
+	void *wksp = large_malloc(wksp_size);
+	ZSTD_DCtx *dctx = ZSTD_initDCtx(wksp, wksp_size);
+	int err;
+	size_t ret;
+
+	if (dctx == NULL) {
+		error("Out of memory while allocating ZSTD_DCtx");
+		err = -1;
+		goto out;
+	}
+	/*
+	 * Find out how large the frame actually is, there may be junk at
+	 * the end of the frame that ZSTD_decompressDCtx() can't handle.
+	 */
+	ret = ZSTD_findFrameCompressedSize(in_buf, in_len);
+	err = handle_zstd_error(ret, error);
+	if (err)
+		goto out;
+	in_len = (long)ret;
+
+	ret = ZSTD_decompressDCtx(dctx, out_buf, out_len, in_buf, in_len);
+	err = handle_zstd_error(ret, error);
+	if (err)
+		goto out;
+
+	if (in_pos != NULL)
+		*in_pos = in_len;
+
+	err = 0;
+out:
+	if (wksp != NULL)
+		large_free(wksp);
+	return err;
+}
+
+static int INIT __unzstd(unsigned char *in_buf, long in_len,
+			 long (*fill)(void*, unsigned long),
+			 long (*flush)(void*, unsigned long),
+			 unsigned char *out_buf, long out_len,
+			 long *in_pos,
+			 void (*error)(char *x))
+{
+	ZSTD_inBuffer in;
+	ZSTD_outBuffer out;
+	ZSTD_frameParams params;
+	void *in_allocated = NULL;
+	void *out_allocated = NULL;
+	void *wksp = NULL;
+	size_t wksp_size;
+	ZSTD_DStream *dstream;
+	int err;
+	size_t ret;
+
+	if (out_len == 0)
+		out_len = LONG_MAX; /* no limit */
+
+	if (fill == NULL && flush == NULL)
+		/*
+		 * We can decompress faster and with less memory when we have a
+		 * single chunk.
+		 */
+		return decompress_single(in_buf, in_len, out_buf, out_len,
+					 in_pos, error);
+
+	/*
+	 * If in_buf is not provided, we must be using fill(), so allocate
+	 * a large enough buffer. If it is provided, it must be at least
+	 * ZSTD_IOBUF_SIZE large.
+	 */
+	if (in_buf == NULL) {
+		in_allocated = malloc(ZSTD_IOBUF_SIZE);
+		if (in_allocated == NULL) {
+			error("Out of memory while allocating input buffer");
+			err = -1;
+			goto out;
+		}
+		in_buf = in_allocated;
+		in_len = 0;
+	}
+	/* Read the first chunk, since we need to decode the frame header. */
+	if (fill != NULL)
+		in_len = fill(in_buf, ZSTD_IOBUF_SIZE);
+	if (in_len < 0) {
+		error("ZSTD-compressed data is truncated");
+		err = -1;
+		goto out;
+	}
+	/* Set the first non-empty input buffer. */
+	in.src = in_buf;
+	in.pos = 0;
+	in.size = in_len;
+	/* Allocate the output buffer if we are using flush(). */
+	if (flush != NULL) {
+		out_allocated = malloc(ZSTD_IOBUF_SIZE);
+		if (out_allocated == NULL) {
+			error("Out of memory while allocating output buffer");
+			err = -1;
+			goto out;
+		}
+		out_buf = out_allocated;
+		out_len = ZSTD_IOBUF_SIZE;
+	}
+	/* Set the output buffer. */
+	out.dst = out_buf;
+	out.pos = 0;
+	out.size = out_len;
+
+	/*
+	 * We need to know the window size to allocate the ZSTD_DStream.
+	 * Since we are streaming, we need to allocate a buffer for the sliding
+	 * window. The window size varies from 1 KB to ZSTD_WINDOWSIZE_MAX
+	 * (8 MB), so it is important to use the actual value so as not to
+	 * waste memory when it is smaller.
+	 */
+	ret = ZSTD_getFrameParams(&params, in.src, in.size);
+	err = handle_zstd_error(ret, error);
+	if (err)
+		goto out;
+	if (ret != 0) {
+		error("ZSTD-compressed data has an incomplete frame header");
+		err = -1;
+		goto out;
+	}
+	if (params.windowSize > ZSTD_WINDOWSIZE_MAX) {
+		error("ZSTD-compressed data has too large a window size");
+		err = -1;
+		goto out;
+	}
+
+	/*
+	 * Allocate the ZSTD_DStream now that we know how much memory is
+	 * required.
+	 */
+	wksp_size = ZSTD_DStreamWorkspaceBound(params.windowSize);
+	wksp = large_malloc(wksp_size);
+	dstream = ZSTD_initDStream(params.windowSize, wksp, wksp_size);
+	if (dstream == NULL) {
+		error("Out of memory while allocating ZSTD_DStream");
+		err = -1;
+		goto out;
+	}
+
+	/*
+	 * Decompression loop:
+	 * Read more data if necessary (error if no more data can be read).
+	 * Call the decompression function, which returns 0 when finished.
+	 * Flush any data produced if using flush().
+	 */
+	if (in_pos != NULL)
+		*in_pos = 0;
+	do {
+		/*
+		 * If we need to reload data, either we have fill() and can
+		 * try to get more data, or we don't and the input is truncated.
+		 */
+		if (in.pos == in.size) {
+			if (in_pos != NULL)
+				*in_pos += in.pos;
+			in_len = fill ? fill(in_buf, ZSTD_IOBUF_SIZE) : -1;
+			if (in_len < 0) {
+				error("ZSTD-compressed data is truncated");
+				err = -1;
+				goto out;
+			}
+			in.pos = 0;
+			in.size = in_len;
+		}
+		/* Returns zero when the frame is complete. */
+		ret = ZSTD_decompressStream(dstream, &out, &in);
+		err = handle_zstd_error(ret, error);
+		if (err)
+			goto out;
+		/* Flush all of the data produced if using flush(). */
+		if (flush != NULL && out.pos > 0) {
+			if (out.pos != flush(out.dst, out.pos)) {
+				error("Failed to flush()");
+				err = -1;
+				goto out;
+			}
+			out.pos = 0;
+		}
+	} while (ret != 0);
+
+	if (in_pos != NULL)
+		*in_pos += in.pos;
+
+	err = 0;
+out:
+	if (in_allocated != NULL)
+		free(in_allocated);
+	if (out_allocated != NULL)
+		free(out_allocated);
+	if (wksp != NULL)
+		large_free(wksp);
+	return err;
+}
+
+#ifndef ZSTD_PREBOOT
+STATIC int INIT unzstd(unsigned char *buf, long len,
+		       long (*fill)(void*, unsigned long),
+		       long (*flush)(void*, unsigned long),
+		       unsigned char *out_buf,
+		       long *pos,
+		       void (*error)(char *x))
+{
+	return __unzstd(buf, len, fill, flush, out_buf, 0, pos, error);
+}
+#else
+STATIC int INIT __decompress(unsigned char *buf, long len,
+			     long (*fill)(void*, unsigned long),
+			     long (*flush)(void*, unsigned long),
+			     unsigned char *out_buf, long out_len,
+			     long *pos,
+			     void (*error)(char *x))
+{
+	return __unzstd(buf, len, fill, flush, out_buf, out_len, pos, error);
+}
+#endif
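The in-place safety margin derived in the header comment of lib/decompress_unzstd.c can be written out as a small calculation. This is only a sketch of the formula from that comment: the function names and the `MARGIN_*` macros are hypothetical and do not appear in the patch, and `unsigned long long` is used here simply to avoid overflow in the multiplication for large sizes.

```c
/* Values taken from the comment in the patch; the macro names are hypothetical. */
#define MARGIN_FRAME_OVERHEAD	22ULL		/* 18-byte frame header + 4-byte checksum */
#define MARGIN_BLOCK_MAX	131072ULL	/* largest zstd block: 128 KiB */

/* Exact form: 3 bytes of block header per 128 KiB of uncompressed data. */
static unsigned long long
zstd_safety_margin_exact(unsigned long long uncompressed_size)
{
	return MARGIN_FRAME_OVERHEAD
		+ uncompressed_size * 3 / MARGIN_BLOCK_MAX
		+ MARGIN_BLOCK_MAX;
}

/*
 * Upper bound from the comment: 3 / 131072 <= 4 / 131072 == 1 / 32768,
 * so the per-block term is replaced by a shift of 15 bits.
 */
static unsigned long long
zstd_safety_margin_bound(unsigned long long uncompressed_size)
{
	return MARGIN_FRAME_OVERHEAD
		+ (uncompressed_size >> 15)
		+ MARGIN_BLOCK_MAX;
}
```

For a 32 MiB uncompressed image the exact form gives 131862 bytes and the shift-based bound gives 132118 bytes, so the simplification costs only a few hundred bytes of extra margin.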