[v2,0/2] btrfs: remove the inode_need_compress() call in

Message ID 20200804072548.34001-1-wqu@suse.com (mailing list archive)

Message

Qu Wenruo Aug. 4, 2020, 7:25 a.m. UTC
This is an attempt to remove the inode_need_compress() call in
compress_file_extent().

As compress_file_extent() can race with an inode ioctl or a bad
compression ratio, causing a NULL pointer dereference for @pages, it is
natural to try removing that inode_need_compress() call to eliminate the
race completely.

However, that is not so easy; we have the following problems:

- We still need to check @pages anyway
  That @pages check is for kcalloc() failure, so all we really gain is
  one level of indentation removed from the if (inode_need_compress())
  block.
  Everything else stays the same (in fact it gets worse, see the
  problems below)

- Behavior change
  Before the change, every async_chunk does its own check of the
  INODE_NO_COMPRESS flag.
  If we hit any bad compression ratio, all incoming async_chunks will
  fall back to plain (uncompressed) writes.

  But if we remove that inode_need_compress() check, we still try to
  compress, potentially wasting CPU time.

- Still a race between compression disable and the NULL pointer dereference
  There is a hidden race, mostly exposed by the btrfs/071 test case: we
  have "compress_type = fs_info->compress_type", so we can still hit the
  case where compress_type is NONE (caused by remount -o nocompress), and
  then btrfs_compress_pages() will return -E2BIG without modifying
  @nr_pages.

  Then later, when we clean up @pages, we try to access pages[i]->mapping,
  triggering a NULL pointer dereference.

  This will be addressed in the first patch though.
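The cleanup hazard above can be sketched as a tiny userspace model (every
name and value here is a stand-in for illustration, not the kernel API):
when the compress callee bails out of its default case without touching
*nr_pages, only a zero-initialized counter keeps the cleanup loop from
walking NULL page slots.

```c
#include <stdlib.h>

#define MODEL_NR 4

/* Userspace model of btrfs_compress_pages(): hitting the default
 * (COMPRESS_NONE) case bails out early and, crucially, does NOT touch
 * *nr_pages. */
int model_compress_pages(int compress_type, void **pages,
                         unsigned long *nr_pages)
{
    if (compress_type == 0)            /* COMPRESS_NONE: default case */
        return -7;                     /* stands in for -E2BIG */
    for (unsigned long i = 0; i < MODEL_NR; i++)
        pages[i] = malloc(16);         /* stands in for allocated pages */
    *nr_pages = MODEL_NR;
    return 0;
}

/* The fix boils down to: initialize nr_pages to 0 before the call, so
 * the cleanup loop only walks pages that were really allocated.
 * Returns how many slots the cleanup touched. */
unsigned long cleanup_demo(int compress_type)
{
    void *pages[MODEL_NR] = { 0 };
    unsigned long nr_pages = 0;        /* patch 1: start from 0 */
    unsigned long walked = 0;

    model_compress_pages(compress_type, pages, &nr_pages);
    for (unsigned long i = 0; i < nr_pages; i++) {
        free(pages[i]);                /* the real code dereferences
                                        * pages[i]->mapping here */
        walked++;
    }
    return walked;
}
```

With an uninitialized nr_pages, the error path would walk garbage-length
worth of NULL slots; with the zero initialization it walks none.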

Changelog:
v2:
- Fix a bad commit merge
  The merge garbled the commit message of the first patch, though the
  content is still correct.

Qu Wenruo (2):
  btrfs: prevent NULL pointer dereference in compress_file_range() when
    btrfs_compress_pages() hits default case
  btrfs: inode: don't re-evaluate inode_need_compress() in
    compress_file_extent()

 fs/btrfs/compression.c |  10 ++--
 fs/btrfs/inode.c       | 106 ++++++++++++++++++++---------------------
 2 files changed, 59 insertions(+), 57 deletions(-)

Comments

David Sterba Aug. 31, 2020, 9:25 p.m. UTC | #1
On Tue, Aug 04, 2020 at 03:25:46PM +0800, Qu Wenruo wrote:
> This is an attempt to remove the inode_need_compress() call in
> compress_file_extent().
> 
> As compress_file_extent() can race with an inode ioctl or a bad
> compression ratio, causing a NULL pointer dereference for @pages, it is
> natural to try removing that inode_need_compress() call to eliminate the
> race completely.
> 
> However, that is not so easy; we have the following problems:
> 
> - We still need to check @pages anyway
>   That @pages check is for kcalloc() failure, so all we really gain is
>   one level of indentation removed from the if (inode_need_compress())
>   block.
>   Everything else stays the same (in fact it gets worse, see the
>   problems below)
> 
> - Behavior change
>   Before the change, every async_chunk does its own check of the
>   INODE_NO_COMPRESS flag.
>   If we hit any bad compression ratio, all incoming async_chunks will
>   fall back to plain (uncompressed) writes.
> 
>   But if we remove that inode_need_compress() check, we still try to
>   compress, potentially wasting CPU time.

Hm, any way I read it, this does not follow the intended design of the
compression decisions. The whole idea is to have 2 modes:
force-compress, which overrides everything, potentially wasting CPU
cycles. This is the fallback for when the automatic nocompress logic
bails out too early.

To fix that, the normal 'compress' mode plus heuristics should decide
whether compression should happen at all, and once it should not, flip
the nocompress bit. This is still not ideal, as there is no "learning"
that some file ranges compress worse while we still want to keep file
compression on and just skip the bad ranges, as opposed to one bad
range causing compression to be skipped on the whole file.

And your code throws away the location where the heuristic can give the
hint.

As we have several ways to request the compression (mount option, defrag
ioctl, property), the decision needs to be more fine grained.

1. btrfs_run_delalloc_range

- check if the compression is allowed at all (not if there are nodatacow
  or nodatasum bits)
- check if there's explicit request (defrag, compress-force)
- skip it if the nocompress bit is set
- the rest are the weak tests, either mount option or inode flags/props

At this point we don't need to call the heuristics.
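The decision order in point 1 could be sketched as a small userspace
helper (the flag names are hypothetical, not the real kernel bits); the
only load-bearing part is the order of the tests:

```c
#include <stdbool.h>

/* Hypothetical flag bits modeling the inode/mount state. */
enum {
    F_NODATACOW  = 1 << 0,
    F_NODATASUM  = 1 << 1,
    F_NOCOMPRESS = 1 << 2,   /* flipped after bad compression ratios */
    F_EXPLICIT   = 1 << 3,   /* defrag ioctl or compress-force */
    F_WEAK       = 1 << 4,   /* mount option or inode flags/props */
};

/* Decision order for btrfs_run_delalloc_range as described above:
 * hard exclusions first, then explicit requests, then the nocompress
 * bit, then the weak tests. No heuristic runs at this point. */
bool should_try_compress(unsigned int flags)
{
    if (flags & (F_NODATACOW | F_NODATASUM))
        return false;            /* compression not allowed at all */
    if (flags & F_EXPLICIT)
        return true;             /* explicit request overrides the rest */
    if (flags & F_NOCOMPRESS)
        return false;            /* skip: bit set by earlier bailouts */
    return (flags & F_WEAK) != 0;
}
```

Note how the explicit-request test sits above the nocompress test, so
force-compress keeps compressing even after bad ratios, matching the
two-mode design.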

2. compress_file_range

The range is split to the 128k chunks and each is evaluated separately.
Here we know we want to compress the range because of all the checks in
1 passed, so the only remaining decisions are based on:

- heuristics: try the given range and bail out eventually, or be more
  clever and gather some longer-term statistics before flipping the
  nocompress bit
- check the nocompress bit that could have been set in a previous
  iteration

So this is the plan; you can count how many things are not implemented.
I have a prototype for the heuristic learning, but that's not the
important part; first the test conditions need to be shuffled around to
follow the above logic.
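A minimal userspace sketch of point 2, assuming a pluggable heuristic
and the naive flip-on-first-bad-chunk policy (the "learning" variant
would only change when the bit flips; all names are illustrative):

```c
#include <stdbool.h>

#define CHUNK_SZ (128UL * 1024)

/* Heuristic callback: does this 128k chunk look compressible? */
typedef bool (*heuristic_fn)(unsigned long offset);

/* Per-chunk loop for a range already cleared for compression by the
 * checks in step 1. Returns how many chunks were actually compressed. */
unsigned long compress_chunks(unsigned long start, unsigned long end,
                              bool *nocompress, heuristic_fn looks_good)
{
    unsigned long compressed = 0;

    for (unsigned long cur = start; cur < end; cur += CHUNK_SZ) {
        if (*nocompress)          /* bit set by a previous iteration */
            continue;
        if (!looks_good(cur)) {   /* heuristic bails on this chunk */
            *nocompress = true;   /* naive policy: flip on first miss */
            continue;
        }
        compressed++;             /* here we would really compress */
    }
    return compressed;
}

/* Sample heuristic: only the first two 128k chunks look compressible. */
bool first_256k_good(unsigned long offset)
{
    return offset < 2 * CHUNK_SZ;
}

/* Run five chunks through the loop with a fresh nocompress bit. */
unsigned long chunk_demo(void)
{
    bool nocompress = false;
    return compress_chunks(0, 5 * CHUNK_SZ, &nocompress, first_256k_good);
}
```

In this sketch the third chunk flips the bit and the remaining chunks
are skipped outright, which is exactly the per-file "one bad range"
behavior the longer-term statistics would soften.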

> - Still a race between compression disable and the NULL pointer dereference
>   There is a hidden race, mostly exposed by the btrfs/071 test case: we
>   have "compress_type = fs_info->compress_type", so we can still hit the
>   case where compress_type is NONE (caused by remount -o nocompress), and
>   then btrfs_compress_pages() will return -E2BIG without modifying
>   @nr_pages

Maybe we should add one last sanity check before btrfs_compress_pages
is called, in case defrag or a prop change happens after the checks.

>   Then later, when we clean up @pages, we try to access pages[i]->mapping,
>   triggering a NULL pointer dereference.
> 
>   This will be addressed in the first patch though.

The patch makes it more robust so this is ok. The unexpected changes of
the compress type due to remount or prop changes could be avoided by
saving the type + level at the beginning of compress_file_range.
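The suggested snapshot could look roughly like this userspace model
(struct and field names are illustrative, not the real fs_info layout):
read the racy settings once at the top, then hand the worker an
immutable copy.

```c
/* Model of the fs_info fields that a concurrent remount or property
 * change can rewrite under us. */
struct fs_settings {
    int compress_type;    /* 0 stands in for COMPRESS_NONE */
    int compress_level;
};

/* Per-range snapshot taken at the start of compress_file_range(). */
struct compress_ctx {
    int type;
    int level;
};

/* Copy the settings once; every later step uses the snapshot, so a
 * remount -o nocompress can no longer flip the type to NONE mid-run. */
struct compress_ctx snapshot_settings(const struct fs_settings *fs)
{
    struct compress_ctx ctx = {
        .type  = fs->compress_type,
        .level = fs->compress_level,
    };
    return ctx;
}

/* Demo: the snapshot survives a concurrent settings change. */
int snapshot_demo(void)
{
    struct fs_settings fs = { .compress_type = 2, .compress_level = 3 };
    struct compress_ctx ctx = snapshot_settings(&fs);

    fs.compress_type = 0;   /* concurrent "remount -o nocompress" */
    return ctx.type;        /* worker keeps using the snapshot */
}
```

The snapshot does not make the decision any smarter, it just guarantees
that one compression pass sees one consistent type + level.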

David Sterba April 19, 2021, 7:34 p.m. UTC | #2
On Tue, Aug 04, 2020 at 03:25:46PM +0800, Qu Wenruo wrote:
> This is an attempt to remove the inode_need_compress() call in
> compress_file_extent().
> 
> [...]
> 
> Qu Wenruo (2):
>   btrfs: prevent NULL pointer dereference in compress_file_range() when
>     btrfs_compress_pages() hits default case

Patch 1 added to misc-next, due to this bug report
https://bugzilla.kernel.org/show_bug.cgi?id=212331