[00/11] btrfs: add zstd compression level support

Message ID: 20190128212437.11597-1-dennis@kernel.org

Dennis Zhou Jan. 28, 2019, 9:24 p.m. UTC
Hi everyone,

This is a respin of [1], which aims to add zstd compression level
support. V3 moves away from using set_level() to resize workspaces in
favor of simply allocating a workspace of the appropriate level and
using a timer to reclaim unused workspaces.

Zstd compression requires a different amount of memory for each level of
compression. The prior patches introduce indirection so that each
compression type can manage its workspaces independently. The final
patch uses this indirection to implement compression level support for
zstd.

As mentioned above, a requirement that differentiates zstd from zlib is
that higher levels of compression require more memory. To manage this,
each compression level has its own queue of workspaces. A global LRU is
used to help with reclaim. To guarantee forward progress, a max level
workspace is preallocated and hidden from the LRU.
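
For illustration, the pieces described above could be tied together in a
manager along these lines (a sketch with assumed names based on this
description, not necessarily the patch's exact layout):

/*
 * Illustrative sketch only: names are assumptions based on the cover
 * letter, not necessarily the patch's exact code.  Kernel context
 * assumed (linux/list.h, linux/spinlock.h, linux/wait.h, linux/timer.h).
 */
#define ZSTD_BTRFS_MAX_LEVEL    15

struct zstd_workspace {
        void *mem;
        size_t size;
        unsigned int level;
        unsigned long last_used;        /* jiffies when last put back */
        struct list_head list;          /* position in idle_ws[level - 1] */
        struct list_head lru_list;      /* position in the global LRU */
};

struct zstd_workspace_manager {
        spinlock_t lock;
        struct list_head lru_list;      /* global LRU for reclaim */
        struct list_head idle_ws[ZSTD_BTRFS_MAX_LEVEL]; /* one queue per level */
        unsigned long active_map;       /* bitmap of levels with idle workspaces */
        wait_queue_head_t wait;         /* sleep here under memory pressure */
        struct timer_list timer;        /* periodic reclaim */
};

/*
 * The preallocated max level workspace sits in idle_ws[ZSTD_BTRFS_MAX_LEVEL - 1]
 * and is never placed on lru_list, so reclaim can't touch it.
 */
static struct zstd_workspace_manager wsm;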

When getting a workspace, a bitmap is used to identify the levels that
have idle workspaces, scanning upward from the requested level. If a
workspace of a higher level is found, it is used, but its last_used time
and its place in the LRU are not updated. This provides a mechanism to
decrease memory utilization, as we only keep around workspaces that are
sized appropriately for the in-use compression levels.
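
A sketch of that scan-up, with the assumed names from the sketch above
(for_each_set_bit_from() is the kernel's bitmap iterator):

/* Sketch based on the description above; not the patch verbatim. */
static struct list_head *zstd_find_workspace(unsigned int level)
{
        struct zstd_workspace *workspace;
        struct list_head *ws;
        int i = level - 1;

        spin_lock(&wsm.lock);
        /* scan upward from the requested level for a populated queue */
        for_each_set_bit_from(i, &wsm.active_map, ZSTD_BTRFS_MAX_LEVEL) {
                if (!list_empty(&wsm.idle_ws[i])) {
                        ws = wsm.idle_ws[i].next;
                        workspace = list_entry(ws, struct zstd_workspace, list);
                        list_del_init(ws);
                        if (list_empty(&wsm.idle_ws[i]))
                                clear_bit(i, &wsm.active_map);
                        /*
                         * Only an exact-level hit leaves the LRU; a borrowed
                         * higher-level workspace keeps its last_used time and
                         * LRU position so it can still age out.
                         */
                        if (workspace->level == level)
                                list_del(&workspace->lru_list);
                        spin_unlock(&wsm.lock);
                        return ws;
                }
        }
        spin_unlock(&wsm.lock);

        return NULL;
}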

By knowing which compression levels have available workspaces, we can
recycle workspaces rather than always create new ones, as well as take
advantage of the preallocated max level workspace for forward progress.
If we hit memory pressure, we sleep on the max level workspace. We
continue to rescan in case we can use a smaller workspace, but
eventually we should either obtain the max level workspace or be able to
allocate one again should memory pressure subside. The memory
requirement for decompression is the same as for level 1, so
decompression can use any available workspace.

The number of workspaces is bounded by the workqueue's concurrency
limit, which is currently 2 (per-cpu). In addition, a reclaim timer is
used to free inactive/improperly sized workspaces. The reclaim timer is
set to 67s to avoid colliding with transaction commits (every 30s), and
it attempts to reclaim any unused workspace older than 45s.
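
A sketch of that reclaim path, using the constants from this cover
letter (again, names are assumptions):

#define ZSTD_BTRFS_RECLAIM_JIFFIES      (67 * HZ)  /* avoid 30s commit cadence */
#define ZSTD_BTRFS_WORKSPACE_AGE        (45 * HZ)  /* reclaim if idle this long */

static void zstd_reclaim_timer_fn(struct timer_list *timer)
{
        unsigned long reclaim_threshold = jiffies - ZSTD_BTRFS_WORKSPACE_AGE;
        struct list_head *pos, *next;

        spin_lock(&wsm.lock);
        /* walk from the oldest end of the LRU */
        list_for_each_prev_safe(pos, next, &wsm.lru_list) {
                struct zstd_workspace *victim =
                        container_of(pos, struct zstd_workspace, lru_list);
                unsigned int level = victim->level;

                /* the LRU is ordered, so stop at the first recent entry */
                if (time_after(victim->last_used, reclaim_threshold))
                        break;

                list_del(&victim->lru_list);
                list_del(&victim->list);
                kvfree(victim->mem);
                kfree(victim);

                if (list_empty(&wsm.idle_ws[level - 1]))
                        clear_bit(level - 1, &wsm.active_map);
        }
        if (!list_empty(&wsm.lru_list))
                mod_timer(&wsm.timer, jiffies + ZSTD_BTRFS_RECLAIM_JIFFIES);
        spin_unlock(&wsm.lock);
}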

Repeating the experiment from v2 [1], the Silesia corpus was copied to a
btrfs filesystem 10 times and then read back after dropping the caches.
The btrfs filesystem was on an SSD.

Level   Ratio   Compression (MB/s)  Decompression (MB/s)
1       2.658        438.47                910.51
2       2.744        364.86                886.55
3       2.801        336.33                828.41
4       2.858        286.71                886.55
5       2.916        212.77                556.84
6       2.363        119.82                990.85
7       3.000        154.06                849.30
8       3.011        159.54                875.03
9       3.025        100.51                940.15
10      3.033        118.97                616.26
11      3.036         94.19                802.11
12      3.037         73.45                931.49
13      3.041         55.17                835.26
14      3.087         44.70                716.78
15      3.126         37.30                878.84

[1] https://lore.kernel.org/linux-btrfs/20181031181108.289340-1-terrelln@fb.com/

This patchset contains the following 11 patches:
  0001-btrfs-add-macros-for-compression-type-and-level.patch
  0002-btrfs-rename-workspaces_list-to-workspace_manager.patch
  0003-btrfs-manage-heuristic-workspace-as-index-0.patch
  0004-btrfs-unify-compression-ops-with-workspace_manager.patch
  0005-btrfs-add-helper-methods-for-workspace-manager-init-.patch
  0006-btrfs-add-compression-interface-in-get-put-_workspac.patch
  0007-btrfs-move-to-fn-pointers-for-get-put-workspaces.patch
  0008-btrfs-plumb-level-through-the-compression-interface.patch
  0009-btrfs-change-set_level-to-bound-the-level-passed-in.patch
  0010-btrfs-zstd-use-the-passed-through-level-instead-of-d.patch
  0011-btrfs-add-zstd-compression-level-support.patch

0001 adds macros for type_level conversion. 0002 renames workspaces_list
to workspace_manager. 0003 moves back to managing the heuristic
workspaces as the index 0 compression level. 0004-0007 unify operations
with the workspace_manager, with 0007 moving to compression types owning
their workspace_manager. 0008-0010 plumb the level through the
compression workspace-getting interface and convert set_level() to be a
bounding function rather than one that sets the level on a workspace
(sketched below). 0011 adds zstd compression level support.
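
For illustration, the bounding set_level() for zstd could look roughly
like this (a sketch; the exact signature in the patch may differ):

/* Sketch: clamp the requested level instead of configuring a workspace. */
static unsigned int zstd_set_level(unsigned int level)
{
        if (!level)
                return ZSTD_BTRFS_DEFAULT_LEVEL;        /* default is 3 */

        return min_t(unsigned int, level, ZSTD_BTRFS_MAX_LEVEL); /* cap at 15 */
}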

This patchset is on top of kdave#master d73aba1115cf.

diffstats below:

Dennis Zhou (11):
  btrfs: add macros for compression type and level
  btrfs: rename workspaces_list to workspace_manager
  btrfs: manage heuristic workspace as index 0
  btrfs: unify compression ops with workspace_manager
  btrfs: add helper methods for workspace manager init and cleanup
  btrfs: add compression interface in (get/put)_workspace()
  btrfs: move to fn pointers for get/put workspaces
  btrfs: plumb level through the compression interface
  btrfs: change set_level() to bound the level passed in
  btrfs: zstd use the passed through level instead of default
  btrfs: add zstd compression level support

 fs/btrfs/compression.c  | 251 ++++++++++++++++++--------------------
 fs/btrfs/compression.h  |  39 +++++-
 fs/btrfs/ioctl.c        |   2 +-
 fs/btrfs/lzo.c          |  31 ++++-
 fs/btrfs/super.c        |  10 +-
 fs/btrfs/tree-checker.c |   4 +-
 fs/btrfs/zlib.c         |  45 +++++--
 fs/btrfs/zstd.c         | 261 ++++++++++++++++++++++++++++++++++++++--
 8 files changed, 485 insertions(+), 158 deletions(-)

Thanks,
Dennis

Comments

David Sterba Jan. 29, 2019, 5:18 p.m. UTC | #1
On Mon, Jan 28, 2019 at 04:24:26PM -0500, Dennis Zhou wrote:
> As mentioned above, a requirement that differs zstd from zlib is that
> higher levels of compression require more memory. To manage this, each
> compression level has its own queue of workspaces. A global LRU is used
> to help with reclaim. To guarantee forward progress, a max level
> workspace is preallocated and hidden from the LRU.

Here I'd like to bring up what was mentioned in the previous iteration:
the workspace sizes.

Level   Compression Memory
1       0.8 MB
2       1.0 MB
3       1.3 MB
4       0.9 MB
5       1.4 MB
6       1.5 MB
7       1.4 MB
8       1.8 MB
9       1.8 MB
10      1.8 MB
11      1.8 MB
12      1.8 MB
13      2.3 MB
14      2.6 MB
15      2.6 MB

and decompression needs the memory of level 1. The sizes can be grouped
into, say, 3 classes; I'm not sure that we'd really need 15 distinct
workspaces. The reclaim mechanism helps, but I'd rather keep a smaller
number of workspaces that covers average use.

The default level is 3, that's 1.3 MiB, and that also covers levels 1, 2
and 4. For 5 to 12 it's 1.8 MiB, and the rest is 2.6 MiB.

> btrfs filesystem 10 times and then read back after dropping the caches.
> The btrfs filesystem was on an SSD.
> 
> Level   Ratio   Compression (MB/s)  Decompression (MB/s)
> 1       2.658        438.47                910.51
> 2       2.744        364.86                886.55
> 3       2.801        336.33                828.41
> 4       2.858        286.71                886.55
> 5       2.916        212.77                556.84
> 6       2.363        119.82                990.85
> 7       3.000        154.06                849.30
> 8       3.011        159.54                875.03
> 9       3.025        100.51                940.15
> 10      3.033        118.97                616.26
> 11      3.036         94.19                802.11
> 12      3.037         73.45                931.49
> 13      3.041         55.17                835.26
> 14      3.087         44.70                716.78
> 15      3.126         37.30                878.84

From my casual user's perspective, I'd use level 1 for speed, 7 for a
better ratio, and 15 for the best compression. Anything else does not
look good, though the results would vary based on the data set. I
assume that the Silesia corpus serves as a good approximation of the
worst-case average.

The levels 7-14 show a particularly obvious pattern: the same ratio, but
the speed gets worse with each level. Taking the default level into
account, (my) recommended levels would be 1, 3, 7, 15.

I went through the patches and they look mostly OK. I don't like the
indirections, but at the moment that's an implementation detail, as I'd
like to agree on the overall approach first.

We might need a few revisions or cleanup rounds to converge to an
efficient solution, the advantage here is that it's all in-memory and
without compatibility concerns once the level support for zstd is in and
works.

For that reason, I'm not opposed to the current version of the patchset.
Given the timing in the development schedule it's really close to the
code freeze, but the functionality has a narrow scope, so I'm
tentatively counting on it for 5.1.
Nick Terrell Jan. 29, 2019, 9:12 p.m. UTC | #2
> On Jan 29, 2019, at 9:18 AM, David Sterba <dsterba@suse.cz> wrote:
> 
> On Mon, Jan 28, 2019 at 04:24:26PM -0500, Dennis Zhou wrote:
>> As mentioned above, a requirement that differs zstd from zlib is that
>> higher levels of compression require more memory. To manage this, each
>> compression level has its own queue of workspaces. A global LRU is used
>> to help with reclaim. To guarantee forward progress, a max level
>> workspace is preallocated and hidden from the LRU.
> 
> Here I'd like to bring up what was mentioned in previous iteration, the
> workspace sizes.
> 
> Level   Compression Memory
> 1       0.8 MB
> 2       1.0 MB
> 3       1.3 MB
> 4       0.9 MB
> 5       1.4 MB
> 6       1.5 MB
> 7       1.4 MB
> 8       1.8 MB
> 9       1.8 MB
> 10      1.8 MB
> 11      1.8 MB
> 12      1.8 MB
> 13      2.3 MB
> 14      2.6 MB
> 15      2.6 MB
> 
> and decompression needs memory of level 1. The sizes can be grouped
> together to say 3 sizes, I'm not sure that we'd really need 15 distinct
> workspaces. The reclaim mechanism helps, but I'd rather keep a smaller
> number of workspaces that covers average use.
> 
> Default level is 3, that's 1.3 MiB, that also covers level 1, 2 and 4.
> For 5 to 12 it's 1.8 and the rest is 2.6 MiB.
> 
>> btrfs filesystem 10 times and then read back after dropping the caches.
>> The btrfs filesystem was on an SSD.
>> 
>> Level   Ratio   Compression (MB/s)  Decompression (MB/s)
>> 1       2.658        438.47                910.51
>> 2       2.744        364.86                886.55
>> 3       2.801        336.33                828.41
>> 4       2.858        286.71                886.55
>> 5       2.916        212.77                556.84
>> 6       2.363        119.82                990.85
>> 7       3.000        154.06                849.30
>> 8       3.011        159.54                875.03
>> 9       3.025        100.51                940.15
>> 10      3.033        118.97                616.26
>> 11      3.036         94.19                802.11
>> 12      3.037         73.45                931.49
>> 13      3.041         55.17                835.26
>> 14      3.087         44.70                716.78
>> 15      3.126         37.30                878.84
> 
> From my casual user's perspective, I'd use the level 1 for speed, 7 for
> better ratio and 15 for the best compression. Anything else does not
> look good, though the results would vary based on the data set. I
> assume that the silesia corpus serves as a good approximation of the
> worst case average.
>
> The levels 7-14 strike particularly obvious pattern: same ratio but the
> speed gets worse with each level. Taking the default level into account,
> (my) recommended levels would be 1, 3, 7, 15.

Silesia is used because it is a standard corpus, and I'd call it about
average, but there is a lot of variance and extreme edge-case data. The
intermediate strategies will change in effectiveness on different types
of data. For example, the lower levels are generally more effective on
text, and you want slightly higher levels for non-text data, because
they can find shorter matches.

Upstream zstd also shifts its levels around, and the memory usage of
each level, from time to time, and I am going to update zstd in the
kernel this next year, since we are slowing down development. The shifts
will be small, though.

It could make sense to map the levels into size classes, since that
could reduce memory spikes, at the cost of higher steady-state memory
usage. I'm not familiar with the machinery used in these patches, so I
can't actually say much. I would probably use levels 1, 3, 7 (after it
is made monotonic), 12, and 15. You might skip 7, but leave 12.
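
For instance, a hypothetical mapping into the three size classes grouped
earlier in the thread (1-4, 5-12, 13-15) might look like this; purely
illustrative, not part of the posted patches:

/* Hypothetical level -> size-class mapping; not in the posted patches. */
static unsigned int zstd_size_class(unsigned int level)
{
        if (level <= 4)
                return 4;       /* ~1.3 MB workspace class */
        if (level <= 12)
                return 12;      /* ~1.8 MB workspace class */
        return 15;              /* ~2.6 MB workspace class */
}

Allocating a workspace for the top level of the class would let any
level in that class reuse the same workspace.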

> I went through the patches, looks mostly ok, I don't like the
> indirections but at the moment it's an implementation detail as I'd like
> to agree on the overall approach first.
> 
> We might need a few revisions or cleanup rounds to converge to an
> efficient solution, the advantage here is that it's all in-memory and
> without compatibility concerns once the level support for zstd is in and
> works.
> 
> For that reason, I'm not opposed to the current version of the patchset.
> Given the time in development schedule, it's really close to code
> freeze, but the functionality has a narrow scope so I'm tentatively
> counting with it for 5.1.
Dennis Zhou Jan. 30, 2019, 5:40 p.m. UTC | #3
Hi David,

On Tue, Jan 29, 2019 at 06:18:30PM +0100, David Sterba wrote:
> On Mon, Jan 28, 2019 at 04:24:26PM -0500, Dennis Zhou wrote:
> > As mentioned above, a requirement that differs zstd from zlib is that
> > higher levels of compression require more memory. To manage this, each
> > compression level has its own queue of workspaces. A global LRU is used
> > to help with reclaim. To guarantee forward progress, a max level
> > workspace is preallocated and hidden from the LRU.
> 
> Here I'd like to bring up what was mentioned in previous iteration, the
> workspace sizes.
> 
> Level   Compression Memory
> 1       0.8 MB
> 2       1.0 MB
> 3       1.3 MB
> 4       0.9 MB
> 5       1.4 MB
> 6       1.5 MB
> 7       1.4 MB
> 8       1.8 MB
> 9       1.8 MB
> 10      1.8 MB
> 11      1.8 MB
> 12      1.8 MB
> 13      2.3 MB
> 14      2.6 MB
> 15      2.6 MB
> 
> and decompression needs memory of level 1. The sizes can be grouped
> together to say 3 sizes, I'm not sure that we'd really need 15 distinct
> workspaces. The reclaim mechanism helps, but I'd rather keep a smaller
> number of workspaces that covers average use.
> 
> Default level is 3, that's 1.3 MiB, that also covers level 1, 2 and 4.
> For 5 to 12 it's 1.8 and the rest is 2.6 MiB.
> 

I realize the current implementation doesn't guarantee a monotonic
memory requirement across levels. I've added that, and below are the
updated memory requirements per level (the computation is sketched after
the table). I've updated the commit to include this too.

Level     Memory (KB)
1            780 
2           1004
3           1260
4           1260
5           1388
6           1516
7           1516
8           1772
9           1772
10          1772
11          1772
12          1772
13          2284
14          2547
15          2547
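
The computation is a running maximum over the per-level bounds, roughly
as follows (a sketch; zstd_get_btrfs_parameters() and the workspace
bound helpers come from fs/btrfs/zstd.c and the kernel's zstd API, while
the array name is an assumption):

/* Sketch: take a running max so a higher level never needs less memory. */
static size_t zstd_ws_mem_sizes[ZSTD_BTRFS_MAX_LEVEL];

static void zstd_calc_ws_mem_sizes(void)
{
        size_t max_size = 0;
        unsigned int level;

        for (level = 1; level <= ZSTD_BTRFS_MAX_LEVEL; level++) {
                ZSTD_parameters params =
                        zstd_get_btrfs_parameters(level, ZSTD_BTRFS_MAX_INPUT);
                size_t level_size = max_t(size_t,
                        ZSTD_CStreamWorkspaceBound(params.cParams),
                        ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT));

                max_size = max_t(size_t, max_size, level_size);
                zstd_ws_mem_sizes[level - 1] = max_size;
        }
}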

> > btrfs filesystem 10 times and then read back after dropping the caches.
> > The btrfs filesystem was on an SSD.
> > 
> > Level   Ratio   Compression (MB/s)  Decompression (MB/s)
> > 1       2.658        438.47                910.51
> > 2       2.744        364.86                886.55
> > 3       2.801        336.33                828.41
> > 4       2.858        286.71                886.55
> > 5       2.916        212.77                556.84
> > 6       2.363        119.82                990.85
> > 7       3.000        154.06                849.30
> > 8       3.011        159.54                875.03
> > 9       3.025        100.51                940.15
> > 10      3.033        118.97                616.26
> > 11      3.036         94.19                802.11
> > 12      3.037         73.45                931.49
> > 13      3.041         55.17                835.26
> > 14      3.087         44.70                716.78
> > 15      3.126         37.30                878.84
> 
> From my casual user's perspective, I'd use the level 1 for speed, 7 for
> better ratio and 15 for the best compression. Anything else does not
> look good, though the results would vary based on the data set. I
> assume that the silesia corpus serves as a good approximation of the
> worst case average.
> 
> The levels 7-14 strike particularly obvious pattern: same ratio but the
> speed gets worse with each level. Taking the default level into account,
> (my) recommended levels would be 1, 3, 7, 15.
> 

I do see why we want to limit the number of levels, as the memory
requirements do kind of bucket themselves. But this means that when zstd
gets updated, we'd have to reevaluate the compression levels btrfs
supports, and I'm not sure it's a great idea to have that dependency.
I imagine we could offer some level of guidance, but it really would be
up to the user to figure out what works best for them.

The reclaim mechanism only keeps workspaces around if they are being
used by the appropriate level. So the memory overhead is actively used
memory, and if not, it is reclaimed after at most ~2 minutes. I also
scan up before allocating a workspace, so that should help limit the
number of workspaces in circulation.

> I went through the patches, looks mostly ok, I don't like the
> indirections but at the moment it's an implementation detail as I'd like
> to agree on the overall approach first.
> 
> We might need a few revisions or cleanup rounds to converge to an
> efficient solution, the advantage here is that it's all in-memory and
> without compatibility concerns once the level support for zstd is in and
> works.
> 
> For that reason, I'm not opposed to the current version of the patchset.
> Given the time in development schedule, it's really close to code
> freeze, but the functionality has a narrow scope so I'm tentatively
> counting with it for 5.1.

Thanks,
Dennis
David Sterba Jan. 31, 2019, 2:04 p.m. UTC | #4
On Wed, Jan 30, 2019 at 12:40:59PM -0500, Dennis Zhou wrote:
> Hi David,
> 
> On Tue, Jan 29, 2019 at 06:18:30PM +0100, David Sterba wrote:
> > On Mon, Jan 28, 2019 at 04:24:26PM -0500, Dennis Zhou wrote:
> > > As mentioned above, a requirement that differs zstd from zlib is that
> > > higher levels of compression require more memory. To manage this, each
> > > compression level has its own queue of workspaces. A global LRU is used
> > > to help with reclaim. To guarantee forward progress, a max level
> > > workspace is preallocated and hidden from the LRU.
> > 
> > Here I'd like to bring up what was mentioned in previous iteration, the
> > workspace sizes.
> > 
> > Level   Compression Memory
> > 1       0.8 MB
> > 2       1.0 MB
> > 3       1.3 MB
> > 4       0.9 MB
> > 5       1.4 MB
> > 6       1.5 MB
> > 7       1.4 MB
> > 8       1.8 MB
> > 9       1.8 MB
> > 10      1.8 MB
> > 11      1.8 MB
> > 12      1.8 MB
> > 13      2.3 MB
> > 14      2.6 MB
> > 15      2.6 MB
> > 
> > and decompression needs memory of level 1. The sizes can be grouped
> > together to say 3 sizes, I'm not sure that we'd really need 15 distinct
> > workspaces. The reclaim mechanism helps, but I'd rather keep a smaller
> > number of workspaces that covers average use.
> > 
> > Default level is 3, that's 1.3 MiB, that also covers level 1, 2 and 4.
> > For 5 to 12 it's 1.8 and the rest is 2.6 MiB.
> > 
> 
> I realize the current implementation doesn't have a monotonic memory
> requirement guarantee. I've added that, and below is updated memory
> requirements per level. I've updated the commit to include this too.
> 
> Level     Memory (KB)
> 1            780 
> 2           1004
> 3           1260
> 4           1260
> 5           1388
> 6           1516
> 7           1516
> 8           1772
> 9           1772
> 10          1772
> 11          1772
> 12          1772
> 13          2284
> 14          2547
> 15          2547
> 
> > > btrfs filesystem 10 times and then read back after dropping the caches.
> > > The btrfs filesystem was on an SSD.
> > > 
> > > Level   Ratio   Compression (MB/s)  Decompression (MB/s)
> > > 1       2.658        438.47                910.51
> > > 2       2.744        364.86                886.55
> > > 3       2.801        336.33                828.41
> > > 4       2.858        286.71                886.55
> > > 5       2.916        212.77                556.84
> > > 6       2.363        119.82                990.85
> > > 7       3.000        154.06                849.30
> > > 8       3.011        159.54                875.03
> > > 9       3.025        100.51                940.15
> > > 10      3.033        118.97                616.26
> > > 11      3.036         94.19                802.11
> > > 12      3.037         73.45                931.49
> > > 13      3.041         55.17                835.26
> > > 14      3.087         44.70                716.78
> > > 15      3.126         37.30                878.84
> > 
> > From my casual user's perspective, I'd use the level 1 for speed, 7 for
> > better ratio and 15 for the best compression. Anything else does not
> > look good, though the results would vary based on the data set. I
> > assume that the silesia corpus serves as a good approximation of the
> > worst case average.
> > 
> > The levels 7-14 strike particularly obvious pattern: same ratio but the
> > speed gets worse with each level. Taking the default level into account,
> > (my) recommended levels would be 1, 3, 7, 15.
> > 
> 
> I do see why we want to limit the number of levels as the memory
> requirements do kind of bucket themselves. But, this means when zstd
> gets updated, we'd have to reevaluate the compression levels btrfs
> supports. I'm not sure it's a great idea to have that dependency.
> I imagine we could offer some level of guidance, but it really would be
> up to the user to figure out what works best for them.

If it was not clear, I did not mean to have only 4 levels; keep all 15,
the same as there are 9 for zlib. Guidelines would be desirable, but I
don't want to make the decision for the user about which level to pick.
So we don't disagree.

> The reclaim mechanism only keeps workspaces around if they are being
> used by the appropriate level. So, the memory overhead is actively used
> memory and if not, it is reclaimed after at most ~2 minutes later. I
> also scan up before allocating a workspace, so that should help limit
> the number of workspaces in circulation.

We'd need to observe that in practice before doing refinements; simpler
logic is better for the start.

There's some penalty caused by the allocation if there are no workspaces
at all, as the amount of memory is quite large for the kernel. This
could also stress the memory subsystem, because the memory has to be
either contiguous or vmalloced. And as the memory is released soon
after, all of that work might need to be done again and again. So more
than one preallocated workspace could be good, but the number of levels
does not make it easy to choose which one.
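
For illustration, the per-workspace allocation could soften that by
letting the large buffer fall back to vmalloc via kvmalloc() (a sketch
with the assumed names from the earlier sketches):

/* Sketch: kvmalloc() tries kmalloc first, then falls back to vmalloc. */
static struct list_head *zstd_alloc_workspace(unsigned int level)
{
        struct zstd_workspace *ws;

        ws = kzalloc(sizeof(*ws), GFP_KERNEL);
        if (!ws)
                return ERR_PTR(-ENOMEM);

        ws->size = zstd_ws_mem_sizes[level - 1];
        ws->level = level;
        ws->mem = kvmalloc(ws->size, GFP_KERNEL);
        if (!ws->mem) {
                kfree(ws);
                return ERR_PTR(-ENOMEM);
        }

        INIT_LIST_HEAD(&ws->list);
        INIT_LIST_HEAD(&ws->lru_list);
        return &ws->list;
}
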
Dennis Zhou Jan. 31, 2019, 3:56 p.m. UTC | #5
On Thu, Jan 31, 2019 at 03:04:36PM +0100, David Sterba wrote:
> On Wed, Jan 30, 2019 at 12:40:59PM -0500, Dennis Zhou wrote:
> > Hi David,
> > 
> > On Tue, Jan 29, 2019 at 06:18:30PM +0100, David Sterba wrote:
> > > On Mon, Jan 28, 2019 at 04:24:26PM -0500, Dennis Zhou wrote:
> > > > As mentioned above, a requirement that differs zstd from zlib is that
> > > > higher levels of compression require more memory. To manage this, each
> > > > compression level has its own queue of workspaces. A global LRU is used
> > > > to help with reclaim. To guarantee forward progress, a max level
> > > > workspace is preallocated and hidden from the LRU.
> > > 
> > > Here I'd like to bring up what was mentioned in previous iteration, the
> > > workspace sizes.
> > > 
> > > Level   Compression Memory
> > > 1       0.8 MB
> > > 2       1.0 MB
> > > 3       1.3 MB
> > > 4       0.9 MB
> > > 5       1.4 MB
> > > 6       1.5 MB
> > > 7       1.4 MB
> > > 8       1.8 MB
> > > 9       1.8 MB
> > > 10      1.8 MB
> > > 11      1.8 MB
> > > 12      1.8 MB
> > > 13      2.3 MB
> > > 14      2.6 MB
> > > 15      2.6 MB
> > > 
> > > and decompression needs memory of level 1. The sizes can be grouped
> > > together to say 3 sizes, I'm not sure that we'd really need 15 distinct
> > > workspaces. The reclaim mechanism helps, but I'd rather keep a smaller
> > > number of workspaces that covers average use.
> > > 
> > > Default level is 3, that's 1.3 MiB, that also covers level 1, 2 and 4.
> > > For 5 to 12 it's 1.8 and the rest is 2.6 MiB.
> > > 
> > 
> > I realize the current implementation doesn't have a monotonic memory
> > requirement guarantee. I've added that, and below is updated memory
> > requirements per level. I've updated the commit to include this too.
> > 
> > Level     Memory (KB)
> > 1            780 
> > 2           1004
> > 3           1260
> > 4           1260
> > 5           1388
> > 6           1516
> > 7           1516
> > 8           1772
> > 9           1772
> > 10          1772
> > 11          1772
> > 12          1772
> > 13          2284
> > 14          2547
> > 15          2547
> > 
> > > > btrfs filesystem 10 times and then read back after dropping the caches.
> > > > The btrfs filesystem was on an SSD.
> > > > 
> > > > Level   Ratio   Compression (MB/s)  Decompression (MB/s)
> > > > 1       2.658        438.47                910.51
> > > > 2       2.744        364.86                886.55
> > > > 3       2.801        336.33                828.41
> > > > 4       2.858        286.71                886.55
> > > > 5       2.916        212.77                556.84
> > > > 6       2.363        119.82                990.85
> > > > 7       3.000        154.06                849.30
> > > > 8       3.011        159.54                875.03
> > > > 9       3.025        100.51                940.15
> > > > 10      3.033        118.97                616.26
> > > > 11      3.036         94.19                802.11
> > > > 12      3.037         73.45                931.49
> > > > 13      3.041         55.17                835.26
> > > > 14      3.087         44.70                716.78
> > > > 15      3.126         37.30                878.84
> > > 
> > > From my casual user's perspective, I'd use the level 1 for speed, 7 for
> > > better ratio and 15 for the best compression. Anything else does not
> > > look good, though the results would vary based on the data set. I
> > > assume that the silesia corpus serves as a good approximation of the
> > > worst case average.
> > > 
> > > The levels 7-14 strike particularly obvious pattern: same ratio but the
> > > speed gets worse with each level. Taking the default level into account,
> > > (my) recommended levels would be 1, 3, 7, 15.
> > > 
> > 
> > I do see why we want to limit the number of levels as the memory
> > requirements do kind of bucket themselves. But, this means when zstd
> > gets updated, we'd have to reevaluate the compression levels btrfs
> > supports. I'm not sure it's a great idea to have that dependency.
> > I imagine we could offer some level of guidance, but it really would be
> > up to the user to figure out what works best for them.
> 
> If it was not clear, I did not mean to have only 4 levels, keep all 15
> same as there are 9 for zlib. The guildelines would be desirable and I
> don't want to make decision for the user which level to pick. So we
> don't disagree.
> 

I see, that was my misunderstanding.

> > The reclaim mechanism only keeps workspaces around if they are being
> > used by the appropriate level. So, the memory overhead is actively used
> > memory and if not, it is reclaimed after at most ~2 minutes later. I
> > also scan up before allocating a workspace, so that should help limit
> > the number of workspaces in circulation.
> 
> We'd need to observe that in practice before doing refinements, simpler
> logic is better for the start.
> 
> There's some penalty caused by the allocation if there are no workspaces
> at all, as the amount of memory is quite large for kernel.
> This could stress the memory subsystem also because the memory has to be
> either contiguous or vmalloced. As the memory is released soon, all the
> work might need to be done again and again. So, more than one
> preallocated workspace could be good but the number of levels does not
> make it easy to choose which one.

That makes sense. I don't have an answer for how to balance the number
of workspaces, but I am happy to iterate on this as we get more data.

If no one has any other comments on the series after another day or so,
I can send v2 with the handful of things people have mentioned and the
monotonic memory requirement patch.

Thanks,
Dennis