Message ID | 20211117201459.1194876-3-nickrterrell@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Fix stack usage on parisc & improve code size bloat | expand |
Hi Nick, On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <nickrterrell@gmail.com> wrote: > `zstd_opt.c` contains the match finder for the highest compression > levels. These levels are already very slow, and are unlikely to be used > in the kernel. If they are used, they shouldn't be used in latency > sensitive workloads, so slowing them down shouldn't be a big deal. > > This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0]. > I've also opened an issue upstream [1] so that we can properly tackle > the code size issue in `zstd_opt.c` for all users, and can hopefully > remove this hack in the next zstd version we import. > > Bloat-o-meter output on x86-64: > > ``` > > ../scripts/bloat-o-meter vmlinux.old vmlinux > add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266) > Function old new delta > ZSTD_compressBlock_opt_generic.constprop - 7559 +7559 > ZSTD_insertBtAndGetAllMatches - 6304 +6304 > ZSTD_insertBt1 - 1731 +1731 > ZSTD_storeSeq - 693 +693 > ZSTD_BtGetAllMatches - 255 +255 > ZSTD_updateRep - 128 +128 > ZSTD_updateTree 96 99 +3 > ZSTD_insertAndFindFirstIndexHash3 81 - -81 > ZSTD_setBasePrices.constprop 98 - -98 > ZSTD_litLengthPrice.constprop 138 - -138 > ZSTD_count 362 181 -181 > ZSTD_count_2segments 1407 938 -469 > ZSTD_insertBt1.constprop 2689 - -2689 > ZSTD_compressBlock_btultra2 19990 423 -19567 > ZSTD_compressBlock_btultra 19633 15 -19618 > ZSTD_initStats_ultra 19825 - -19825 > ZSTD_compressBlock_btopt 20374 12 -20362 > ZSTD_compressBlock_btopt_extDict 29984 12 -29972 > ZSTD_compressBlock_btultra_extDict 30718 15 -30703 > ZSTD_compressBlock_btopt_dictMatchState 32689 12 -32677 > ZSTD_compressBlock_btultra_dictMatchState 33574 15 -33559 > Total: Before=6611828, After=6418562, chg -2.92% > ``` > > [0] https://lkml.org/lkml/2021/11/14/189 > [1] https://github.com/facebook/zstd/issues/2862 > > Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> > Signed-off-by: Nick Terrell <terrelln@fb.com> Thanks for your patch! Impact on lib/zstd/zstd_compress.ko for atari_defconfig: add/remove: 5/4 grow/shrink: 1/9 up/down: 15392/-167214 (-151822) Nice! Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> > --- a/lib/zstd/compress/zstd_opt.c > +++ b/lib/zstd/compress/zstd_opt.c > @@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_ > */ > U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock; > ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot); > - } > + } > ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes); > } > ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock); This change is unrelated. With that removed: Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
> On Nov 17, 2021, at 11:52 PM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Nick, > > On Wed, Nov 17, 2021 at 9:08 PM Nick Terrell <nickrterrell@gmail.com> wrote: >> `zstd_opt.c` contains the match finder for the highest compression >> levels. These levels are already very slow, and are unlikely to be used >> in the kernel. If they are used, they shouldn't be used in latency >> sensitive workloads, so slowing them down shouldn't be a big deal. >> >> This saves 188 KB of the 288 KB regression reported by Geert Uytterhoeven [0]. >> I've also opened an issue upstream [1] so that we can properly tackle >> the code size issue in `zstd_opt.c` for all users, and can hopefully >> remove this hack in the next zstd version we import. >> >> Bloat-o-meter output on x86-64: >> >> ``` >>> ../scripts/bloat-o-meter vmlinux.old vmlinux >> add/remove: 6/5 grow/shrink: 1/9 up/down: 16673/-209939 (-193266) >> Function old new delta >> ZSTD_compressBlock_opt_generic.constprop - 7559 +7559 >> ZSTD_insertBtAndGetAllMatches - 6304 +6304 >> ZSTD_insertBt1 - 1731 +1731 >> ZSTD_storeSeq - 693 +693 >> ZSTD_BtGetAllMatches - 255 +255 >> ZSTD_updateRep - 128 +128 >> ZSTD_updateTree 96 99 +3 >> ZSTD_insertAndFindFirstIndexHash3 81 - -81 >> ZSTD_setBasePrices.constprop 98 - -98 >> ZSTD_litLengthPrice.constprop 138 - -138 >> ZSTD_count 362 181 -181 >> ZSTD_count_2segments 1407 938 -469 >> ZSTD_insertBt1.constprop 2689 - -2689 >> ZSTD_compressBlock_btultra2 19990 423 -19567 >> ZSTD_compressBlock_btultra 19633 15 -19618 >> ZSTD_initStats_ultra 19825 - -19825 >> ZSTD_compressBlock_btopt 20374 12 -20362 >> ZSTD_compressBlock_btopt_extDict 29984 12 -29972 >> ZSTD_compressBlock_btultra_extDict 30718 15 -30703 >> ZSTD_compressBlock_btopt_dictMatchState 32689 12 -32677 >> ZSTD_compressBlock_btultra_dictMatchState 33574 15 -33559 >> Total: Before=6611828, After=6418562, chg -2.92% >> ``` >> >> [0] https://lkml.org/lkml/2021/11/14/189 >> [1] https://github.com/facebook/zstd/issues/2862 >> >> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> >> Signed-off-by: Nick Terrell <terrelln@fb.com> > > Thanks for your patch! > > Impact on lib/zstd/zstd_compress.ko for atari_defconfig: > > add/remove: 5/4 grow/shrink: 1/9 up/down: 15392/-167214 (-151822) > > Nice! > > Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> > >> --- a/lib/zstd/compress/zstd_opt.c >> +++ b/lib/zstd/compress/zstd_opt.c >> @@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_ >> */ >> U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock; >> ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot); >> - } >> + } >> ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes); >> } >> ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock); > > This change is unrelated. With that removed: > Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Thanks for the review! I will remove this change before submitting the PR. Unless you have a strong preference, I won’t put up another version of the patch set with just this change. Best, Nick Terrell > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds
diff --git a/lib/zstd/common/compiler.h b/lib/zstd/common/compiler.h index a1a051e4bce6..f5a9c70a228a 100644 --- a/lib/zstd/common/compiler.h +++ b/lib/zstd/common/compiler.h @@ -16,6 +16,7 @@ *********************************************************/ /* force inlining */ +#if !defined(ZSTD_NO_INLINE) #if (defined(__GNUC__) && !defined(__STRICT_ANSI__)) || defined(__cplusplus) || defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L /* C99 */ # define INLINE_KEYWORD inline #else @@ -24,6 +25,12 @@ #define FORCE_INLINE_ATTR __attribute__((always_inline)) +#else + +#define INLINE_KEYWORD +#define FORCE_INLINE_ATTR + +#endif /* On MSVC qsort requires that functions passed into it use the __cdecl calling conversion(CC). diff --git a/lib/zstd/compress/zstd_opt.c b/lib/zstd/compress/zstd_opt.c index 04337050fe9a..09483f518dc3 100644 --- a/lib/zstd/compress/zstd_opt.c +++ b/lib/zstd/compress/zstd_opt.c @@ -8,6 +8,18 @@ * You may select, at your option, one of the above-listed licenses. */ +/* + * Disable inlining for the optimal parser for the kernel build. + * It is unlikely to be used in the kernel, and where it is used + * latency shouldn't matter because it is very slow to begin with. + * We prefer a ~180KB binary size win over faster optimal parsing. + * + * TODO(https://github.com/facebook/zstd/issues/2862): + * Improve the code size of the optimal parser in general, so we + * don't need this hack for the kernel build. + */ +#define ZSTD_NO_INLINE 1 + #include "zstd_compress_internal.h" #include "hist.h" #include "zstd_opt.h" @@ -894,7 +906,7 @@ static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_ */ U32 posOvershoot = currPosInBlock - optLdm->endPosInBlock; ZSTD_optLdm_skipRawSeqStoreBytes(&optLdm->seqStore, posOvershoot); - } + } ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, remainingBytes); } ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock);