diff mbox series

btrfs: inode: Fix NULL pointer dereference if inode doesn't need compression

Message ID 20200728083926.19518-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: inode: Fix NULL pointer dereference if inode doesn't need compression | expand

Commit Message

Qu Wenruo July 28, 2020, 8:39 a.m. UTC
[BUG]
There is a bug report of NULL pointer dereference caused in
compress_file_extent():

  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
  NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs]
  LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs]
  Call Trace:
  [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable)
  [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs]
  [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs]
  [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0
  [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660
  [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0
  [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c
  ---[ end trace f16954aa20d822f6 ]---

[CAUSE]
For the following execution route of compress_file_range(), it's
possible to hit NULL pointer dereference:

 compress_file_extent()
 |- pages = NULL;
 |- start = async_chunk->start = 0;
 |- end = async_chunk = 4095;
 |- nr_pages = 1;
 |- inode_need_compress() == false; <<< Possible, see later explanation
 |  Now, we have nr_pages = 1, pages = NULL
 |- cont:
 |- 		ret = cow_file_range_inline();
 |- 		if (ret <= 0) {
 |-		for (i = 0; i < nr_pages; i++) {
 |-			WARN_ON(pages[i]->mapping);	<<< Crash

To enter above call execution branch, we need the following race:

    Thread 1 (chattr)     |            Thread 2 (writeback)
--------------------------+------------------------------
                          | btrfs_run_delalloc_range
                          | |- inode_need_compress = true
                          | |- cow_file_range_async()
btrfs_ioctl_set_flag()    |
|- binode_flags |=        |
   BTRFS_INODE_NOCOMPRESS |
                          | compress_file_range()
                          | |- inode_need_compress = false
                          | |- nr_page = 1 while pages = NULL
                          | |  Then hit the crash

[FIX]
This patch will fix it by checking @pages before doing accessing it.
This patch is only designed as a hot fix and easy to backport.

More elegant fix may make btrfs only check inode_need_compress() once to
avoid such race, but that would be another story.

Fixes: 4d3a800ebb12 ("btrfs: merge nr_pages input and output parameter in compress_pages")
Reported-by: Luciano Chavez <chavez@us.ibm.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Comments

David Sterba July 28, 2020, 1:19 p.m. UTC | #1
On Tue, Jul 28, 2020 at 04:39:26PM +0800, Qu Wenruo wrote:
> [BUG]
> There is a bug report of NULL pointer dereference caused in
> compress_file_extent():
> 
>   Oops: Kernel access of bad area, sig: 11 [#1]
>   LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>   Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
>   NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs]
>   LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs]
>   Call Trace:
>   [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable)
>   [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs]
>   [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs]
>   [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0
>   [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660
>   [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0
>   [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c
>   ---[ end trace f16954aa20d822f6 ]---
> 
> [CAUSE]
> For the following execution route of compress_file_range(), it's
> possible to hit NULL pointer dereference:
> 
>  compress_file_extent()
>  |- pages = NULL;
>  |- start = async_chunk->start = 0;
>  |- end = async_chunk = 4095;
>  |- nr_pages = 1;
>  |- inode_need_compress() == false; <<< Possible, see later explanation
>  |  Now, we have nr_pages = 1, pages = NULL
>  |- cont:
>  |- 		ret = cow_file_range_inline();
>  |- 		if (ret <= 0) {
>  |-		for (i = 0; i < nr_pages; i++) {
>  |-			WARN_ON(pages[i]->mapping);	<<< Crash
> 
> To enter above call execution branch, we need the following race:
> 
>     Thread 1 (chattr)     |            Thread 2 (writeback)
> --------------------------+------------------------------
>                           | btrfs_run_delalloc_range
>                           | |- inode_need_compress = true
>                           | |- cow_file_range_async()
> btrfs_ioctl_set_flag()    |
> |- binode_flags |=        |
>    BTRFS_INODE_NOCOMPRESS |
>                           | compress_file_range()
>                           | |- inode_need_compress = false
>                           | |- nr_page = 1 while pages = NULL
>                           | |  Then hit the crash
> 
> [FIX]
> This patch will fix it by checking @pages before doing accessing it.
> This patch is only designed as a hot fix and easy to backport.
> 
> More elegant fix may make btrfs only check inode_need_compress() once to
> avoid such race, but that would be another story.

Yeah it gets mistakenly called twice.

> Fixes: 4d3a800ebb12 ("btrfs: merge nr_pages input and output parameter in compress_pages")

How does this patch cause the bug?
Qu Wenruo July 28, 2020, 1:26 p.m. UTC | #2
On 2020/7/28 下午9:19, David Sterba wrote:
> On Tue, Jul 28, 2020 at 04:39:26PM +0800, Qu Wenruo wrote:
>> [BUG]
>> There is a bug report of NULL pointer dereference caused in
>> compress_file_extent():
>>
>>   Oops: Kernel access of bad area, sig: 11 [#1]
>>   LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>>   Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
>>   NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs]
>>   LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs]
>>   Call Trace:
>>   [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable)
>>   [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs]
>>   [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs]
>>   [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0
>>   [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660
>>   [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0
>>   [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c
>>   ---[ end trace f16954aa20d822f6 ]---
>>
>> [CAUSE]
>> For the following execution route of compress_file_range(), it's
>> possible to hit NULL pointer dereference:
>>
>>  compress_file_extent()
>>  |- pages = NULL;
>>  |- start = async_chunk->start = 0;
>>  |- end = async_chunk = 4095;
>>  |- nr_pages = 1;
>>  |- inode_need_compress() == false; <<< Possible, see later explanation
>>  |  Now, we have nr_pages = 1, pages = NULL
>>  |- cont:
>>  |- 		ret = cow_file_range_inline();
>>  |- 		if (ret <= 0) {
>>  |-		for (i = 0; i < nr_pages; i++) {
>>  |-			WARN_ON(pages[i]->mapping);	<<< Crash
>>
>> To enter above call execution branch, we need the following race:
>>
>>     Thread 1 (chattr)     |            Thread 2 (writeback)
>> --------------------------+------------------------------
>>                           | btrfs_run_delalloc_range
>>                           | |- inode_need_compress = true
>>                           | |- cow_file_range_async()
>> btrfs_ioctl_set_flag()    |
>> |- binode_flags |=        |
>>    BTRFS_INODE_NOCOMPRESS |
>>                           | compress_file_range()
>>                           | |- inode_need_compress = false
>>                           | |- nr_page = 1 while pages = NULL
>>                           | |  Then hit the crash
>>
>> [FIX]
>> This patch will fix it by checking @pages before doing accessing it.
>> This patch is only designed as a hot fix and easy to backport.
>>
>> More elegant fix may make btrfs only check inode_need_compress() once to
>> avoid such race, but that would be another story.
> 
> Yeah it gets mistakenly called twice.
> 
>> Fixes: 4d3a800ebb12 ("btrfs: merge nr_pages input and output parameter in compress_pages")
> 
> How does this patch cause the bug?
> 
Sorry, I should explain more on that.

In fact it takes me quite some time to find the proper culprit.

Before that commit, we have @nr_pages_ret initialized to 0 in
compress_file_extent().

If inode_need_compress() returned false in that function, we continue to
the same inline file extent insert,.

But in free_pages_out: tag, we use @nr_pages_nr to free pages, which is
still 0, as it only get initialized to proper values after
btrfs_compress_pages() call, which we skipped due to
inode_need_compress() returned false.

Then free_pages_out: tag will not execute the WARN_ON() and put_pages()
calls, just skip to kfree(pages). And kfree() can handle NULL pointers
without any problem.

Thus a completely sane looking cleanup in fact caused the NULL pointer
dereference regression for race cases.

Thanks,
Qu
David Sterba July 28, 2020, 1:41 p.m. UTC | #3
On Tue, Jul 28, 2020 at 09:26:43PM +0800, Qu Wenruo wrote:
> >> Fixes: 4d3a800ebb12 ("btrfs: merge nr_pages input and output parameter in compress_pages")
> > 
> > How does this patch cause the bug?
> > 
> Sorry, I should explain more on that.
> 
> In fact it takes me quite some time to find the proper culprit.
> 
> Before that commit, we have @nr_pages_ret initialized to 0 in
> compress_file_extent().
> 
> If inode_need_compress() returned false in that function, we continue to
> the same inline file extent insert,.
> 
> But in free_pages_out: tag, we use @nr_pages_nr to free pages, which is
> still 0, as it only get initialized to proper values after
> btrfs_compress_pages() call, which we skipped due to
> inode_need_compress() returned false.
> 
> Then free_pages_out: tag will not execute the WARN_ON() and put_pages()
> calls, just skip to kfree(pages). And kfree() can handle NULL pointers
> without any problem.
> 
> Thus a completely sane looking cleanup in fact caused the NULL pointer
> dereference regression for race cases.

Thanks. It was not obvious so I was expecting a bit more convoluted way
to hit the bug.
Nikolay Borisov Aug. 2, 2020, 7:16 p.m. UTC | #4
On 28.07.20 г. 11:39 ч., Qu Wenruo wrote:
> [BUG]
> There is a bug report of NULL pointer dereference caused in
> compress_file_extent():
> 
>   Oops: Kernel access of bad area, sig: 11 [#1]
>   LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>   Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
>   NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs]
>   LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs]
>   Call Trace:
>   [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable)
>   [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs]
>   [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs]
>   [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0
>   [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660
>   [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0
>   [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c
>   ---[ end trace f16954aa20d822f6 ]---
> 
> [CAUSE]
> For the following execution route of compress_file_range(), it's
> possible to hit NULL pointer dereference:
> 
>  compress_file_extent()
>  |- pages = NULL;
>  |- start = async_chunk->start = 0;
>  |- end = async_chunk = 4095;
>  |- nr_pages = 1;
>  |- inode_need_compress() == false; <<< Possible, see later explanation
>  |  Now, we have nr_pages = 1, pages = NULL
>  |- cont:
>  |- 		ret = cow_file_range_inline();
>  |- 		if (ret <= 0) {
>  |-		for (i = 0; i < nr_pages; i++) {
>  |-			WARN_ON(pages[i]->mapping);	<<< Crash
> 
> To enter above call execution branch, we need the following race:
> 
>     Thread 1 (chattr)     |            Thread 2 (writeback)
> --------------------------+------------------------------
>                           | btrfs_run_delalloc_range
>                           | |- inode_need_compress = true
>                           | |- cow_file_range_async()
> btrfs_ioctl_set_flag()    |
> |- binode_flags |=        |
>    BTRFS_INODE_NOCOMPRESS |
>                           | compress_file_range()
>                           | |- inode_need_compress = false
>                           | |- nr_page = 1 while pages = NULL
>                           | |  Then hit the crash
> 
> [FIX]
> This patch will fix it by checking @pages before doing accessing it.
> This patch is only designed as a hot fix and easy to backport.
> 
> More elegant fix may make btrfs only check inode_need_compress() once to
> avoid such race, but that would be another story.

So why not do the elegant fix in the first place rather than adding
cruft like this hotfix which later has to be cleaned up when the
'proper' fix lands?

> 
> Fixes: 4d3a800ebb12 ("btrfs: merge nr_pages input and output parameter in compress_pages")
> Reported-by: Luciano Chavez <chavez@us.ibm.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 611b3412fbfd..9988d754e465 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -653,12 +653,18 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
>  						     page_error_op |
>  						     PAGE_END_WRITEBACK);
>  
> -			for (i = 0; i < nr_pages; i++) {
> -				WARN_ON(pages[i]->mapping);
> -				put_page(pages[i]);
> +			/*
> +			 * Ensure we only free the compressed pages if we have
> +			 * them allocated, as we can still reach here with
> +			 * inode_need_compress() == false.
> +			 */
> +			if (pages) {
> +				for (i = 0; i < nr_pages; i++) {
> +					WARN_ON(pages[i]->mapping);
> +					put_page(pages[i]);
> +				}
> +				kfree(pages);
>  			}
> -			kfree(pages);
> -
>  			return 0;
>  		}
>  	}
>
Qu Wenruo Aug. 2, 2020, 11:14 p.m. UTC | #5
On 2020/8/3 上午3:16, Nikolay Borisov wrote:
>
>
> On 28.07.20 г. 11:39 ч., Qu Wenruo wrote:
>> [BUG]
>> There is a bug report of NULL pointer dereference caused in
>> compress_file_extent():
>>
>>   Oops: Kernel access of bad area, sig: 11 [#1]
>>   LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>>   Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
>>   NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs]
>>   LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs]
>>   Call Trace:
>>   [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable)
>>   [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs]
>>   [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs]
>>   [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0
>>   [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660
>>   [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0
>>   [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c
>>   ---[ end trace f16954aa20d822f6 ]---
>>
>> [CAUSE]
>> For the following execution route of compress_file_range(), it's
>> possible to hit NULL pointer dereference:
>>
>>  compress_file_extent()
>>  |- pages = NULL;
>>  |- start = async_chunk->start = 0;
>>  |- end = async_chunk = 4095;
>>  |- nr_pages = 1;
>>  |- inode_need_compress() == false; <<< Possible, see later explanation
>>  |  Now, we have nr_pages = 1, pages = NULL
>>  |- cont:
>>  |- 		ret = cow_file_range_inline();
>>  |- 		if (ret <= 0) {
>>  |-		for (i = 0; i < nr_pages; i++) {
>>  |-			WARN_ON(pages[i]->mapping);	<<< Crash
>>
>> To enter above call execution branch, we need the following race:
>>
>>     Thread 1 (chattr)     |            Thread 2 (writeback)
>> --------------------------+------------------------------
>>                           | btrfs_run_delalloc_range
>>                           | |- inode_need_compress = true
>>                           | |- cow_file_range_async()
>> btrfs_ioctl_set_flag()    |
>> |- binode_flags |=        |
>>    BTRFS_INODE_NOCOMPRESS |
>>                           | compress_file_range()
>>                           | |- inode_need_compress = false
>>                           | |- nr_page = 1 while pages = NULL
>>                           | |  Then hit the crash
>>
>> [FIX]
>> This patch will fix it by checking @pages before doing accessing it.
>> This patch is only designed as a hot fix and easy to backport.
>>
>> More elegant fix may make btrfs only check inode_need_compress() once to
>> avoid such race, but that would be another story.
>
> So why not do the elegant fix in the first place rather than adding
> cruft like this hotfix which later has to be cleaned up when the
> 'proper' fix lands?

For backport purpose.

This is reported from one vendor kernel, not upstream.
Thus backport is definitely required.

Thanks,
Qu
>
>>
>> Fixes: 4d3a800ebb12 ("btrfs: merge nr_pages input and output parameter in compress_pages")
>> Reported-by: Luciano Chavez <chavez@us.ibm.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/inode.c | 16 +++++++++++-----
>>  1 file changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 611b3412fbfd..9988d754e465 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -653,12 +653,18 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
>>  						     page_error_op |
>>  						     PAGE_END_WRITEBACK);
>>
>> -			for (i = 0; i < nr_pages; i++) {
>> -				WARN_ON(pages[i]->mapping);
>> -				put_page(pages[i]);
>> +			/*
>> +			 * Ensure we only free the compressed pages if we have
>> +			 * them allocated, as we can still reach here with
>> +			 * inode_need_compress() == false.
>> +			 */
>> +			if (pages) {
>> +				for (i = 0; i < nr_pages; i++) {
>> +					WARN_ON(pages[i]->mapping);
>> +					put_page(pages[i]);
>> +				}
>> +				kfree(pages);
>>  			}
>> -			kfree(pages);
>> -
>>  			return 0;
>>  		}
>>  	}
>>
Qu Wenruo Aug. 4, 2020, 6:41 a.m. UTC | #6
On 2020/8/3 上午3:16, Nikolay Borisov wrote:
>
>
> On 28.07.20 г. 11:39 ч., Qu Wenruo wrote:
>> [BUG]
>> There is a bug report of NULL pointer dereference caused in
>> compress_file_extent():
>>
>>   Oops: Kernel access of bad area, sig: 11 [#1]
>>   LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>>   Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
>>   NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs]
>>   LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs]
>>   Call Trace:
>>   [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable)
>>   [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs]
>>   [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs]
>>   [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0
>>   [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660
>>   [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0
>>   [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c
>>   ---[ end trace f16954aa20d822f6 ]---
>>
>> [CAUSE]
>> For the following execution route of compress_file_range(), it's
>> possible to hit NULL pointer dereference:
>>
>>  compress_file_extent()
>>  |- pages = NULL;
>>  |- start = async_chunk->start = 0;
>>  |- end = async_chunk = 4095;
>>  |- nr_pages = 1;
>>  |- inode_need_compress() == false; <<< Possible, see later explanation
>>  |  Now, we have nr_pages = 1, pages = NULL
>>  |- cont:
>>  |- 		ret = cow_file_range_inline();
>>  |- 		if (ret <= 0) {
>>  |-		for (i = 0; i < nr_pages; i++) {
>>  |-			WARN_ON(pages[i]->mapping);	<<< Crash
>>
>> To enter above call execution branch, we need the following race:
>>
>>     Thread 1 (chattr)     |            Thread 2 (writeback)
>> --------------------------+------------------------------
>>                           | btrfs_run_delalloc_range
>>                           | |- inode_need_compress = true
>>                           | |- cow_file_range_async()
>> btrfs_ioctl_set_flag()    |
>> |- binode_flags |=        |
>>    BTRFS_INODE_NOCOMPRESS |
>>                           | compress_file_range()
>>                           | |- inode_need_compress = false
>>                           | |- nr_page = 1 while pages = NULL
>>                           | |  Then hit the crash
>>
>> [FIX]
>> This patch will fix it by checking @pages before doing accessing it.
>> This patch is only designed as a hot fix and easy to backport.
>>
>> More elegant fix may make btrfs only check inode_need_compress() once to
>> avoid such race, but that would be another story.
>
> So why not do the elegant fix in the first place rather than adding
> cruft like this hotfix which later has to be cleaned up when the
> 'proper' fix lands?

Tried that "elegant" method, but there are several hidden problems:

- We still need to check @pages anyway
  That @pages check is for kcalloc() failure, so what we really get is
just removing
  one indent from the if (inode_need_compress()).
  Everything else is still the same (in fact, even worse, see below
problems)

- Behavior change
  Before that change, every async_chunk does their check on
INODE_NO_COMPRESS flags.
  If we hit any bad compression ratio, all incoming async_chunk will
fall back to plain text
  write.

  But if we remove that inode_need_compress() check, then we still try
to compress, and
  lead to potentially wasted CPU times.

- Still race between compression disable
  There is a hidden race, mostly exposed by btrfs/071 test case, that we
have
  "compress_type = fs_info->compress_type", so we can still hit case
where that
  compress_type is NONE, and cause btrfs_compress_pages() return -E2BIG.

  This would leave @pages uninitialized, and cause NULL pointer
dereferences again.
  I would fix this bug anyway, since it's still possible to hit even
without the
  inode_need_compress() removal.

I will still try to send out the patch, but the benefit is really small.

I hope some refactor can happen for compress_file_range(), which is
really far from elegant, and race-prone already.

Thanks,
Qu
>
>>
>> Fixes: 4d3a800ebb12 ("btrfs: merge nr_pages input and output parameter in compress_pages")
>> Reported-by: Luciano Chavez <chavez@us.ibm.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/inode.c | 16 +++++++++++-----
>>  1 file changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 611b3412fbfd..9988d754e465 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -653,12 +653,18 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
>>  						     page_error_op |
>>  						     PAGE_END_WRITEBACK);
>>
>> -			for (i = 0; i < nr_pages; i++) {
>> -				WARN_ON(pages[i]->mapping);
>> -				put_page(pages[i]);
>> +			/*
>> +			 * Ensure we only free the compressed pages if we have
>> +			 * them allocated, as we can still reach here with
>> +			 * inode_need_compress() == false.
>> +			 */
>> +			if (pages) {
>> +				for (i = 0; i < nr_pages; i++) {
>> +					WARN_ON(pages[i]->mapping);
>> +					put_page(pages[i]);
>> +				}
>> +				kfree(pages);
>>  			}
>> -			kfree(pages);
>> -
>>  			return 0;
>>  		}
>>  	}
>>
David Sterba Aug. 25, 2020, 3:03 p.m. UTC | #7
On Mon, Aug 03, 2020 at 07:14:22AM +0800, Qu Wenruo wrote:
> On 2020/8/3 上午3:16, Nikolay Borisov wrote:
> > On 28.07.20 г. 11:39 ч., Qu Wenruo wrote:
> >> [BUG]
> >> There is a bug report of NULL pointer dereference caused in
> >> compress_file_extent():
> >>
> >>   Oops: Kernel access of bad area, sig: 11 [#1]
> >>   LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> >>   Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
> >>   NIP [c008000006dd4d34] compress_file_range.constprop.41+0x75c/0x8a0 [btrfs]
> >>   LR [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs]
> >>   Call Trace:
> >>   [c000000c69093b00] [c008000006dd4d1c] compress_file_range.constprop.41+0x744/0x8a0 [btrfs] (unreliable)
> >>   [c000000c69093bd0] [c008000006dd4ebc] async_cow_start+0x44/0xa0 [btrfs]
> >>   [c000000c69093c10] [c008000006e14824] normal_work_helper+0xdc/0x598 [btrfs]
> >>   [c000000c69093c80] [c0000000001608c0] process_one_work+0x2c0/0x5b0
> >>   [c000000c69093d10] [c000000000160c38] worker_thread+0x88/0x660
> >>   [c000000c69093db0] [c00000000016b55c] kthread+0x1ac/0x1c0
> >>   [c000000c69093e20] [c00000000000b660] ret_from_kernel_thread+0x5c/0x7c
> >>   ---[ end trace f16954aa20d822f6 ]---
> >>
> >> [CAUSE]
> >> For the following execution route of compress_file_range(), it's
> >> possible to hit NULL pointer dereference:
> >>
> >>  compress_file_extent()
> >>  |- pages = NULL;
> >>  |- start = async_chunk->start = 0;
> >>  |- end = async_chunk = 4095;
> >>  |- nr_pages = 1;
> >>  |- inode_need_compress() == false; <<< Possible, see later explanation
> >>  |  Now, we have nr_pages = 1, pages = NULL
> >>  |- cont:
> >>  |- 		ret = cow_file_range_inline();
> >>  |- 		if (ret <= 0) {
> >>  |-		for (i = 0; i < nr_pages; i++) {
> >>  |-			WARN_ON(pages[i]->mapping);	<<< Crash
> >>
> >> To enter above call execution branch, we need the following race:
> >>
> >>     Thread 1 (chattr)     |            Thread 2 (writeback)
> >> --------------------------+------------------------------
> >>                           | btrfs_run_delalloc_range
> >>                           | |- inode_need_compress = true
> >>                           | |- cow_file_range_async()
> >> btrfs_ioctl_set_flag()    |
> >> |- binode_flags |=        |
> >>    BTRFS_INODE_NOCOMPRESS |
> >>                           | compress_file_range()
> >>                           | |- inode_need_compress = false
> >>                           | |- nr_page = 1 while pages = NULL
> >>                           | |  Then hit the crash
> >>
> >> [FIX]
> >> This patch will fix it by checking @pages before doing accessing it.
> >> This patch is only designed as a hot fix and easy to backport.
> >>
> >> More elegant fix may make btrfs only check inode_need_compress() once to
> >> avoid such race, but that would be another story.
> >
> > So why not do the elegant fix in the first place rather than adding
> > cruft like this hotfix which later has to be cleaned up when the
> > 'proper' fix lands?
> 
> For backport purpose.
> 
> This is reported from one vendor kernel, not upstream.
> Thus backport is definitely required.

Agreed, minimal fixes are desired when possible. For example this patch
got to from 4.14 up to 5.8, with minimal or no conflicts.
diff mbox series

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 611b3412fbfd..9988d754e465 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -653,12 +653,18 @@  static noinline int compress_file_range(struct async_chunk *async_chunk)
 						     page_error_op |
 						     PAGE_END_WRITEBACK);
 
-			for (i = 0; i < nr_pages; i++) {
-				WARN_ON(pages[i]->mapping);
-				put_page(pages[i]);
+			/*
+			 * Ensure we only free the compressed pages if we have
+			 * them allocated, as we can still reach here with
+			 * inode_need_compress() == false.
+			 */
+			if (pages) {
+				for (i = 0; i < nr_pages; i++) {
+					WARN_ON(pages[i]->mapping);
+					put_page(pages[i]);
+				}
+				kfree(pages);
 			}
-			kfree(pages);
-
 			return 0;
 		}
 	}