diff mbox series

btrfs: Fix avail_in bytes for s390 zlib HW compression path

Message ID 20241212135000.1926110-1-zaslonko@linux.ibm.com (mailing list archive)
State New
Headers show
Series btrfs: Fix avail_in bytes for s390 zlib HW compression path | expand

Commit Message

Mikhail Zaslonko Dec. 12, 2024, 1:50 p.m. UTC
Since the input data length passed to zlib_compress_folios() can be
arbitrary, always setting strm.avail_in to a multiple of PAGE_SIZE may
cause read-in bytes to exceed the input range. Currently this triggers
an assert in btrfs_compress_folios() on the debug kernel. But it may
potentially lead to data corruption.
Fix strm.avail_in calculation for S390 hardware acceleration path.

Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Fixes: fd1e75d0105d ("btrfs: make compression path to be subpage compatible")
---
 fs/btrfs/zlib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Ilya Leoshkevich Dec. 12, 2024, 3:37 p.m. UTC | #1
On Thu, 2024-12-12 at 14:50 +0100, Mikhail Zaslonko wrote:
> Since the input data length passed to zlib_compress_folios() can be
> arbitrary, always setting strm.avail_in to a multiple of PAGE_SIZE
> may
> cause read-in bytes to exceed the input range. Currently this
> triggers
> an assert in btrfs_compress_folios() on the debug kernel. But it may
> potentially lead to data corruption.
> Fix strm.avail_in calculation for S390 hardware acceleration path.
> 
> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> Fixes: fd1e75d0105d ("btrfs: make compression path to be subpage
> compatible")
> ---
>  fs/btrfs/zlib.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Qu Wenruo Dec. 12, 2024, 8:37 p.m. UTC | #2
在 2024/12/13 00:20, Mikhail Zaslonko 写道:
> Since the input data length passed to zlib_compress_folios() can be
> arbitrary, always setting strm.avail_in to a multiple of PAGE_SIZE may
> cause read-in bytes to exceed the input range. Currently this triggers
> an assert in btrfs_compress_folios() on the debug kernel. But it may
> potentially lead to data corruption.

Mind to provide the real world ASSERT() call trace?

AFAIK the range passed into btrfs_compress_folios() should always have
its start/length aligned to sector size.

Since s390 only supports 4K page size, that means the range is always
aligned to page size, and the existing code is also doing full page copy
anyway, thus I see no problem with the existing read.

Thanks,
Qu

> Fix strm.avail_in calculation for S390 hardware acceleration path.
>
> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> Fixes: fd1e75d0105d ("btrfs: make compression path to be subpage compatible")
> ---
>   fs/btrfs/zlib.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
> index ddf0d5a448a7..c9e92c6941ec 100644
> --- a/fs/btrfs/zlib.c
> +++ b/fs/btrfs/zlib.c
> @@ -174,10 +174,10 @@ int zlib_compress_folios(struct list_head *ws, struct address_space *mapping,
>   					copy_page(workspace->buf + i * PAGE_SIZE,
>   						  data_in);
>   					start += PAGE_SIZE;
> -					workspace->strm.avail_in =
> -						(in_buf_folios << PAGE_SHIFT);
>   				}
>   				workspace->strm.next_in = workspace->buf;
> +				workspace->strm.avail_in = min(bytes_left,
> +							       in_buf_folios << PAGE_SHIFT);
>   			} else {
>   				unsigned int pg_off;
>   				unsigned int cur_len;
Mikhail Zaslonko Dec. 13, 2024, 9:34 a.m. UTC | #3
Hello Qu,

On 12.12.2024 21:37, Qu Wenruo wrote:
> 
> 
> 在 2024/12/13 00:20, Mikhail Zaslonko 写道:
>> Since the input data length passed to zlib_compress_folios() can be
>> arbitrary, always setting strm.avail_in to a multiple of PAGE_SIZE may
>> cause read-in bytes to exceed the input range. Currently this triggers
>> an assert in btrfs_compress_folios() on the debug kernel. But it may
>> potentially lead to data corruption.
> 
> Mind to provide the real world ASSERT() call trace?

Here is the call trace triggered by one of our tests (wasn't sure whether to include it to the commit message):

[ 2928.542300] BTRFS: device fsid 98138b99-a1bc-47cd-9b77-9e64fbba11de devid 1 transid 55 /dev/dasdc1 (94:9) scanned by mount (2000)                            
[ 2928.543029] BTRFS info (device dasdc1): first mount of filesystem 98138b99-a1bc-47cd-9b77-9e64fbba11de                                                       
[ 2928.543051] BTRFS info (device dasdc1): using crc32c (crc32c-vx) checksum algorithm                                                                          
[ 2928.543058] BTRFS info (device dasdc1): using free-space-tree                                                                                                
[ 2964.842146] assertion failed: *total_in <= orig_len, in fs/btrfs/compression.c:1041                                                                          
[ 2964.842226] ------------[ cut here ]------------                                                                                                             
[ 2964.842229] kernel BUG at fs/btrfs/compression.c:1041!                                                                                                       
[ 2964.842306] monitor event: 0040 ilc:2 [#1] PREEMPT SMP                                                                                                       
[ 2964.842314] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat n
f_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables pkey_pckmo s390_trng rng_core vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel drm loop i2c_co
re dm_multipath drm_panel_orientation_quirks nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock lcs ctcm fsm zfcp scsi_transport_fc ghash_s390 prn
g chacha_s390 aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkey autof
s4 ecdsa_generic ecc                                                                                                                                            
[ 2964.842387] CPU: 16 UID: 0 PID: 325 Comm: kworker/u273:3 Not tainted 6.13.0-20241204.rc1.git6.fae3b21430ca.300.fc41.s390x+debug #1                           
[ 2964.842406] Hardware name: IBM 3931 A01 703 (z/VM 7.4.0)                                                                                                     
[ 2964.842409] Workqueue: btrfs-delalloc btrfs_work_helper                                                                                                      
[ 2964.842420] Krnl PSW : 0704d00180000000 0000021761df6538 (btrfs_compress_folios+0x198/0x1a0)                                                                 
[ 2964.842426]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3                                                                          
[ 2964.842430] Krnl GPRS: 0000000080000000 0000000000000001 0000000000000047 0000000000000000                                                                   
[ 2964.842433]            0000000000000006 ffffff01757bb000 000001976232fcc0 000000000000130c                                                                   
[ 2964.842436]            000001976232fcd0 000001976232fcc8 00000118ff4a0e30 0000000000000001                                                                   
[ 2964.842438]            00000111821ab400 0000011100000000 0000021761df6534 000001976232fb58                                                                   
[ 2964.842446] Krnl Code: 0000021761df6528: c020006f5ef4        larl    %r2,0000021762be2310                                                                    
[ 2964.842446]            0000021761df652e: c0e5ffbd09d5        brasl   %r14,00000217615978d8                                                                   
[ 2964.842446]           #0000021761df6534: af000000            mc      0,0                                                                                     
[ 2964.842446]           >0000021761df6538: 0707                bcr     0,%r7                                                                                   
[ 2964.842446]            0000021761df653a: 0707                bcr     0,%r7                                                                                   
[ 2964.842446]            0000021761df653c: 0707                bcr     0,%r7                                                                                   
[ 2964.842446]            0000021761df653e: 0707                bcr     0,%r7                                                                                   
[ 2964.842446]            0000021761df6540: c004004bb7ec        brcl    0,000002176276d518                                                                      
[ 2964.842463] Call Trace:                                                                                                                                      
[ 2964.842465]  [<0000021761df6538>] btrfs_compress_folios+0x198/0x1a0                                                                                          
[ 2964.842468] ([<0000021761df6534>] btrfs_compress_folios+0x194/0x1a0)                                                                                         
[ 2964.842708]  [<0000021761d97788>] compress_file_range+0x3b8/0x6d0                                                                                            
[ 2964.842714]  [<0000021761dcee7c>] btrfs_work_helper+0x10c/0x160                                                                                              
[ 2964.842718]  [<0000021761645760>] process_one_work+0x2b0/0x5d0                                                                                               
[ 2964.842724]  [<000002176164637e>] worker_thread+0x20e/0x3e0                                                                                                  
[ 2964.842728]  [<000002176165221a>] kthread+0x15a/0x170                                                                                                        
[ 2964.842732]  [<00000217615b859c>] __ret_from_fork+0x3c/0x60                                                                                                  
[ 2964.842736]  [<00000217626e72d2>] ret_from_fork+0xa/0x38                                                                                                     
[ 2964.842744] INFO: lockdep is turned off.                                                                                                                     
[ 2964.842746] Last Breaking-Event-Address:                                                                                                                     
[ 2964.842748]  [<0000021761597924>] _printk+0x4c/0x58                                                                                                          
[ 2964.842755] Kernel panic - not syncing: Fatal exception: panic_on_oops                                                                                       

Let me know if I can provide any other details.

> 
> AFAIK the range passed into btrfs_compress_folios() should always have
> its start/length aligned to sector size.

Based on my tests, btrfs_compress_folios() input length (total_out) is not always a
multiple of PAGE_SIZE. One can see this when writing less than 4K of data to an
empty btrfs file.

> 
> Since s390 only supports 4K page size, that means the range is always
> aligned to page size, and the existing code is also doing full page copy
> anyway, thus I see no problem with the existing read.

The code is doing full page copy to the workspace buffer for further compression. But the
number of bytes actually processed by zlib_deflate() is controlled by strm.avail_in
parameter.

> 
> Thanks,
> Qu
> 
>> Fix strm.avail_in calculation for S390 hardware acceleration path.
>>
>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
>> Fixes: fd1e75d0105d ("btrfs: make compression path to be subpage compatible")
>> ---
>>   fs/btrfs/zlib.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
>> index ddf0d5a448a7..c9e92c6941ec 100644
>> --- a/fs/btrfs/zlib.c
>> +++ b/fs/btrfs/zlib.c
>> @@ -174,10 +174,10 @@ int zlib_compress_folios(struct list_head *ws, struct address_space *mapping,
>>                       copy_page(workspace->buf + i * PAGE_SIZE,
>>                             data_in);
>>                       start += PAGE_SIZE;
>> -                    workspace->strm.avail_in =
>> -                        (in_buf_folios << PAGE_SHIFT);
>>                   }
>>                   workspace->strm.next_in = workspace->buf;
>> +                workspace->strm.avail_in = min(bytes_left,
>> +                                   in_buf_folios << PAGE_SHIFT);
>>               } else {
>>                   unsigned int pg_off;
>>                   unsigned int cur_len;
>
Qu Wenruo Dec. 13, 2024, 9:42 a.m. UTC | #4
在 2024/12/13 20:04, Zaslonko Mikhail 写道:
> Hello Qu,
>
> On 12.12.2024 21:37, Qu Wenruo wrote:
>>
>>
>> 在 2024/12/13 00:20, Mikhail Zaslonko 写道:
>>> Since the input data length passed to zlib_compress_folios() can be
>>> arbitrary, always setting strm.avail_in to a multiple of PAGE_SIZE may
>>> cause read-in bytes to exceed the input range. Currently this triggers
>>> an assert in btrfs_compress_folios() on the debug kernel. But it may
>>> potentially lead to data corruption.
>>
>> Mind to provide the real world ASSERT() call trace?
>
> Here is the call trace triggered by one of our tests (wasn't sure whether to include it to the commit message):
>
> [ 2928.542300] BTRFS: device fsid 98138b99-a1bc-47cd-9b77-9e64fbba11de devid 1 transid 55 /dev/dasdc1 (94:9) scanned by mount (2000)
> [ 2928.543029] BTRFS info (device dasdc1): first mount of filesystem 98138b99-a1bc-47cd-9b77-9e64fbba11de
> [ 2928.543051] BTRFS info (device dasdc1): using crc32c (crc32c-vx) checksum algorithm
> [ 2928.543058] BTRFS info (device dasdc1): using free-space-tree
> [ 2964.842146] assertion failed: *total_in <= orig_len, in fs/btrfs/compression.c:1041
> [ 2964.842226] ------------[ cut here ]------------
> [ 2964.842229] kernel BUG at fs/btrfs/compression.c:1041!
> [ 2964.842306] monitor event: 0040 ilc:2 [#1] PREEMPT SMP
> [ 2964.842314] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat n
> f_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables pkey_pckmo s390_trng rng_core vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel drm loop i2c_co
> re dm_multipath drm_panel_orientation_quirks nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock lcs ctcm fsm zfcp scsi_transport_fc ghash_s390 prn
> g chacha_s390 aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkey autof
> s4 ecdsa_generic ecc
> [ 2964.842387] CPU: 16 UID: 0 PID: 325 Comm: kworker/u273:3 Not tainted 6.13.0-20241204.rc1.git6.fae3b21430ca.300.fc41.s390x+debug #1
> [ 2964.842406] Hardware name: IBM 3931 A01 703 (z/VM 7.4.0)
> [ 2964.842409] Workqueue: btrfs-delalloc btrfs_work_helper
> [ 2964.842420] Krnl PSW : 0704d00180000000 0000021761df6538 (btrfs_compress_folios+0x198/0x1a0)
> [ 2964.842426]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
> [ 2964.842430] Krnl GPRS: 0000000080000000 0000000000000001 0000000000000047 0000000000000000
> [ 2964.842433]            0000000000000006 ffffff01757bb000 000001976232fcc0 000000000000130c
> [ 2964.842436]            000001976232fcd0 000001976232fcc8 00000118ff4a0e30 0000000000000001
> [ 2964.842438]            00000111821ab400 0000011100000000 0000021761df6534 000001976232fb58
> [ 2964.842446] Krnl Code: 0000021761df6528: c020006f5ef4        larl    %r2,0000021762be2310
> [ 2964.842446]            0000021761df652e: c0e5ffbd09d5        brasl   %r14,00000217615978d8
> [ 2964.842446]           #0000021761df6534: af000000            mc      0,0
> [ 2964.842446]           >0000021761df6538: 0707                bcr     0,%r7
> [ 2964.842446]            0000021761df653a: 0707                bcr     0,%r7
> [ 2964.842446]            0000021761df653c: 0707                bcr     0,%r7
> [ 2964.842446]            0000021761df653e: 0707                bcr     0,%r7
> [ 2964.842446]            0000021761df6540: c004004bb7ec        brcl    0,000002176276d518
> [ 2964.842463] Call Trace:
> [ 2964.842465]  [<0000021761df6538>] btrfs_compress_folios+0x198/0x1a0
> [ 2964.842468] ([<0000021761df6534>] btrfs_compress_folios+0x194/0x1a0)
> [ 2964.842708]  [<0000021761d97788>] compress_file_range+0x3b8/0x6d0
> [ 2964.842714]  [<0000021761dcee7c>] btrfs_work_helper+0x10c/0x160
> [ 2964.842718]  [<0000021761645760>] process_one_work+0x2b0/0x5d0
> [ 2964.842724]  [<000002176164637e>] worker_thread+0x20e/0x3e0
> [ 2964.842728]  [<000002176165221a>] kthread+0x15a/0x170
> [ 2964.842732]  [<00000217615b859c>] __ret_from_fork+0x3c/0x60
> [ 2964.842736]  [<00000217626e72d2>] ret_from_fork+0xa/0x38
> [ 2964.842744] INFO: lockdep is turned off.
> [ 2964.842746] Last Breaking-Event-Address:
> [ 2964.842748]  [<0000021761597924>] _printk+0x4c/0x58
> [ 2964.842755] Kernel panic - not syncing: Fatal exception: panic_on_oops
>
> Let me know if I can provide any other details.

OK, I see the reason.

In compress_file_ranges() we initialize the original length according
the i_size.

The behavior is different from non-compressed write, as non-compressed
write is always sector aligned, for unaligned i_size, it just zero out
the remaining part and still submit the full sector).

Please include this call trace

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
>
>>
>> AFAIK the range passed into btrfs_compress_folios() should always have
>> its start/length aligned to sector size.
>
> Based on my tests, btrfs_compress_folios() input length (total_out) is not always a
> multiple of PAGE_SIZE. One can see this when writing less than 4K of data to an
> empty btrfs file.
>
>>
>> Since s390 only supports 4K page size, that means the range is always
>> aligned to page size, and the existing code is also doing full page copy
>> anyway, thus I see no problem with the existing read.
>
> The code is doing full page copy to the workspace buffer for further compression. But the
> number of bytes actually processed by zlib_deflate() is controlled by strm.avail_in
> parameter.
>
>>
>> Thanks,
>> Qu
>>
>>> Fix strm.avail_in calculation for S390 hardware acceleration path.
>>>
>>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
>>> Fixes: fd1e75d0105d ("btrfs: make compression path to be subpage compatible")
>>> ---
>>>    fs/btrfs/zlib.c | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
>>> index ddf0d5a448a7..c9e92c6941ec 100644
>>> --- a/fs/btrfs/zlib.c
>>> +++ b/fs/btrfs/zlib.c
>>> @@ -174,10 +174,10 @@ int zlib_compress_folios(struct list_head *ws, struct address_space *mapping,
>>>                        copy_page(workspace->buf + i * PAGE_SIZE,
>>>                              data_in);
>>>                        start += PAGE_SIZE;
>>> -                    workspace->strm.avail_in =
>>> -                        (in_buf_folios << PAGE_SHIFT);
>>>                    }
>>>                    workspace->strm.next_in = workspace->buf;
>>> +                workspace->strm.avail_in = min(bytes_left,
>>> +                                   in_buf_folios << PAGE_SHIFT);
>>>                } else {
>>>                    unsigned int pg_off;
>>>                    unsigned int cur_len;
>>
>
diff mbox series

Patch

diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index ddf0d5a448a7..c9e92c6941ec 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -174,10 +174,10 @@  int zlib_compress_folios(struct list_head *ws, struct address_space *mapping,
 					copy_page(workspace->buf + i * PAGE_SIZE,
 						  data_in);
 					start += PAGE_SIZE;
-					workspace->strm.avail_in =
-						(in_buf_folios << PAGE_SHIFT);
 				}
 				workspace->strm.next_in = workspace->buf;
+				workspace->strm.avail_in = min(bytes_left,
+							       in_buf_folios << PAGE_SHIFT);
 			} else {
 				unsigned int pg_off;
 				unsigned int cur_len;