diff mbox series

[v2] mm/swap: consider max pages in iomap_swapfile_add_extent

Message ID 6dac22b2a8a22254fc538054cb42a32e53d2482b.1629355682.git.xuyu@linux.alibaba.com (mailing list archive)
State New
Headers show
Series [v2] mm/swap: consider max pages in iomap_swapfile_add_extent | expand

Commit Message

Xu Yu Aug. 19, 2021, 6:52 a.m. UTC
When the max pages (last_page in the swap header + 1) is smaller than
the total pages (inode size) of the swapfile, iomap_swapfile_activate
overwrites sis->max with total pages.

However, frontswap_map is allocated using max pages. When test and clear
the sis offset, which is larger than max pages, of frontswap_map in
__frontswap_invalidate_page(), neighbors of frontswap_map may be
overwritten, i.e., slab is polluted.

This fixes the issue by considering the limitation of max pages of swap
info in iomap_swapfile_add_extent().

To reproduce the case, compile kernel with slub RED ZONE, then run test:
$ sudo stress-ng -a 1 -x softlockup,resources -t 1h --metrics --times \
 --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \
 --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \
 --temp-path /root/tmpdir/stress-ng/

We'll get the following error log usually within 5 minutes.

[ 1151.015141] =============================================================================
[ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
[ 1151.017486] -----------------------------------------------------------------------------
[ 1151.017486]
[ 1151.018997] Disabling lock debugging due to kernel taint
[ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc
[ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816
[ 1151.022538]  __slab_alloc+0xe/0x20
[ 1151.023069]  __kmalloc_node+0xfd/0x4b0
[ 1151.023704]  __do_sys_swapon+0xcf6/0x1170
[ 1151.024346]  do_syscall_64+0x33/0x40
[ 1151.024925]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041
[ 1151.026889]  kfree+0x276/0x2b0
[ 1151.027405]  put_cred_rcu+0xa1/0xc0
[ 1151.027949]  rcu_do_batch+0x17d/0x410
[ 1151.028566]  rcu_core+0x14e/0x2b0
[ 1151.029084]  __do_softirq+0x101/0x29e
[ 1151.029645]  asm_call_irq_on_stack+0x12/0x20
[ 1151.030381]  do_softirq_own_stack+0x37/0x40
[ 1151.031037]  do_softirq.part.15+0x2b/0x30
[ 1151.031710]  __local_bh_enable_ip+0x4b/0x50
[ 1151.032412]  copy_fpstate_to_sigframe+0x111/0x360
[ 1151.033197]  __setup_rt_frame+0xce/0x480
[ 1151.033809]  arch_do_signal+0x1a3/0x250
[ 1151.034463]  exit_to_user_mode_prepare+0xcf/0x110
[ 1151.035242]  syscall_exit_to_user_mode+0x27/0x190
[ 1151.035970]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201
[ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000
[ 1151.038323]
[ 1151.039683] Redzone  000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
[ 1151.041180] Object   000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 1151.042714] Redzone  0000000084e43932: 00 00 00 c0 cc cc cc cc                          ........
[ 1151.044120] Padding  000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
[ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G    B             5.10.50+ #7
[ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 1151.048633] Call Trace:
[ 1151.049072]  dump_stack+0x57/0x6a
[ 1151.049585]  check_bytes_and_report+0xed/0x110
[ 1151.050320]  check_object+0x1eb/0x290
[ 1151.050924]  ? __x64_sys_swapoff+0x39a/0x540
[ 1151.051646]  free_debug_processing+0x151/0x350
[ 1151.052333]  __slab_free+0x21a/0x3a0
[ 1151.052938]  ? _cond_resched+0x2d/0x40
[ 1151.053529]  ? __vunmap+0x1de/0x220
[ 1151.054139]  ? __x64_sys_swapoff+0x39a/0x540
[ 1151.054796]  ? kfree+0x276/0x2b0
[ 1151.055307]  kfree+0x276/0x2b0
[ 1151.055832]  __x64_sys_swapoff+0x39a/0x540
[ 1151.056466]  do_syscall_64+0x33/0x40
[ 1151.057084]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1151.057866] RIP: 0033:0x150340b0ffb7
[ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d.
[ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
[ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7
[ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860
[ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd
[ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094
[ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c
[ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc
[ 1151.066413]
[ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed

Fixes: 0e6895ba00b7 ("ext4: implement swap_activate aops using iomap")
Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
v1->v2:
- update the commit log
- add Reviewed-by of Darrick J. Wong
---
 fs/iomap/swapfile.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Christoph Hellwig Aug. 19, 2021, 8:25 a.m. UTC | #1
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong Aug. 19, 2021, 4:48 p.m. UTC | #2
On Thu, Aug 19, 2021 at 02:52:05PM +0800, Xu Yu wrote:
> When the max pages (last_page in the swap header + 1) is smaller than
> the total pages (inode size) of the swapfile, iomap_swapfile_activate
> overwrites sis->max with total pages.
> 
> However, frontswap_map is allocated using max pages. When test and clear
> the sis offset, which is larger than max pages, of frontswap_map in
> __frontswap_invalidate_page(), neighbors of frontswap_map may be
> overwritten, i.e., slab is polluted.
> 
> This fixes the issue by considering the limitation of max pages of swap
> info in iomap_swapfile_add_extent().
> 
> To reproduce the case, compile kernel with slub RED ZONE, then run test:
> $ sudo stress-ng -a 1 -x softlockup,resources -t 1h --metrics --times \
>  --verify -v -Y /root/tmpdir/stress-ng/stress-statistic-12.yaml \
>  --log-file /root/tmpdir/stress-ng/stress-logfile-12.txt \
>  --temp-path /root/tmpdir/stress-ng/
> 
> We'll get the following error log usually within 5 minutes.
> 
> [ 1151.015141] =============================================================================
> [ 1151.016489] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
> [ 1151.017486] -----------------------------------------------------------------------------
> [ 1151.017486]
> [ 1151.018997] Disabling lock debugging due to kernel taint
> [ 1151.019873] INFO: 0x0000000084e43932-0x0000000098d17cae @offset=7392. First byte 0x0 instead of 0xcc
> [ 1151.021303] INFO: Allocated in __do_sys_swapon+0xcf6/0x1170 age=43417 cpu=9 pid=3816
> [ 1151.022538]  __slab_alloc+0xe/0x20
> [ 1151.023069]  __kmalloc_node+0xfd/0x4b0
> [ 1151.023704]  __do_sys_swapon+0xcf6/0x1170
> [ 1151.024346]  do_syscall_64+0x33/0x40
> [ 1151.024925]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1151.025749] INFO: Freed in put_cred_rcu+0xa1/0xc0 age=43424 cpu=3 pid=2041
> [ 1151.026889]  kfree+0x276/0x2b0
> [ 1151.027405]  put_cred_rcu+0xa1/0xc0
> [ 1151.027949]  rcu_do_batch+0x17d/0x410
> [ 1151.028566]  rcu_core+0x14e/0x2b0
> [ 1151.029084]  __do_softirq+0x101/0x29e
> [ 1151.029645]  asm_call_irq_on_stack+0x12/0x20
> [ 1151.030381]  do_softirq_own_stack+0x37/0x40
> [ 1151.031037]  do_softirq.part.15+0x2b/0x30
> [ 1151.031710]  __local_bh_enable_ip+0x4b/0x50
> [ 1151.032412]  copy_fpstate_to_sigframe+0x111/0x360
> [ 1151.033197]  __setup_rt_frame+0xce/0x480
> [ 1151.033809]  arch_do_signal+0x1a3/0x250
> [ 1151.034463]  exit_to_user_mode_prepare+0xcf/0x110
> [ 1151.035242]  syscall_exit_to_user_mode+0x27/0x190
> [ 1151.035970]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1151.036795] INFO: Slab 0x000000003b9de4dc objects=44 used=9 fp=0x00000000539e349e flags=0xfffffc0010201
> [ 1151.038323] INFO: Object 0x000000004855ba01 @offset=7376 fp=0x0000000000000000
> [ 1151.038323]
> [ 1151.039683] Redzone  000000008d0afd3d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
> [ 1151.041180] Object   000000004855ba01: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [ 1151.042714] Redzone  0000000084e43932: 00 00 00 c0 cc cc cc cc                          ........
> [ 1151.044120] Padding  000000000864c042: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
> [ 1151.045615] CPU: 5 PID: 3816 Comm: stress-ng Tainted: G    B             5.10.50+ #7
> [ 1151.046846] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [ 1151.048633] Call Trace:
> [ 1151.049072]  dump_stack+0x57/0x6a
> [ 1151.049585]  check_bytes_and_report+0xed/0x110
> [ 1151.050320]  check_object+0x1eb/0x290
> [ 1151.050924]  ? __x64_sys_swapoff+0x39a/0x540
> [ 1151.051646]  free_debug_processing+0x151/0x350
> [ 1151.052333]  __slab_free+0x21a/0x3a0
> [ 1151.052938]  ? _cond_resched+0x2d/0x40
> [ 1151.053529]  ? __vunmap+0x1de/0x220
> [ 1151.054139]  ? __x64_sys_swapoff+0x39a/0x540
> [ 1151.054796]  ? kfree+0x276/0x2b0
> [ 1151.055307]  kfree+0x276/0x2b0
> [ 1151.055832]  __x64_sys_swapoff+0x39a/0x540
> [ 1151.056466]  do_syscall_64+0x33/0x40
> [ 1151.057084]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1151.057866] RIP: 0033:0x150340b0ffb7
> [ 1151.058481] Code: Unable to access opcode bytes at RIP 0x150340b0ff8d.
> [ 1151.059537] RSP: 002b:00007fff7f4ee238 EFLAGS: 00000246 ORIG_RAX: 00000000000000a8
> [ 1151.060768] RAX: ffffffffffffffda RBX: 00007fff7f4ee66c RCX: 0000150340b0ffb7
> [ 1151.061904] RDX: 000000000000000a RSI: 0000000000018094 RDI: 00007fff7f4ee860
> [ 1151.063033] RBP: 00007fff7f4ef980 R08: 0000000000000000 R09: 0000150340a672bd
> [ 1151.064135] R10: 00007fff7f4edca0 R11: 0000000000000246 R12: 0000000000018094
> [ 1151.065253] R13: 0000000000000005 R14: 000000000160d930 R15: 00007fff7f4ee66c
> [ 1151.066413] FIX kmalloc-16: Restoring 0x0000000084e43932-0x0000000098d17cae=0xcc
> [ 1151.066413]
> [ 1151.067890] FIX kmalloc-16: Object at 0x000000004855ba01 not freed
> 
> Fixes: 0e6895ba00b7 ("ext4: implement swap_activate aops using iomap")
> Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
> Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Applied, thanks.  I've reworked the second paragraph slightly:

"However, frontswap_map is a swap page state bitmap allocated using the
initial sis->max page count read from the swap header.  If swapfile
activation increases sis->max, it's possible for the frontswap code to
walk off the end of the bitmap, thereby corrupting kernel memory."

--D

> ---
> v1->v2:
> - update the commit log
> - add Reviewed-by of Darrick J. Wong
> ---
>  fs/iomap/swapfile.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index a5e478de1417..2ceea45aefd8 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -30,11 +30,16 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>  {
>  	struct iomap *iomap = &isi->iomap;
>  	unsigned long nr_pages;
> +	unsigned long max_pages;
>  	uint64_t first_ppage;
>  	uint64_t first_ppage_reported;
>  	uint64_t next_ppage;
>  	int error;
>  
> +	if (unlikely(isi->nr_pages >= isi->sis->max))
> +		return 0;
> +	max_pages = isi->sis->max - isi->nr_pages;
> +
>  	/*
>  	 * Round the start up and the end down so that the physical
>  	 * extent aligns to a page boundary.
> @@ -47,6 +52,7 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
>  	if (first_ppage >= next_ppage)
>  		return 0;
>  	nr_pages = next_ppage - first_ppage;
> +	nr_pages = min(nr_pages, max_pages);
>  
>  	/*
>  	 * Calculate how much swap space we're adding; the first page contains
> -- 
> 2.20.1.2432.ga663e714
>
diff mbox series

Patch

diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index a5e478de1417..2ceea45aefd8 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -30,11 +30,16 @@  static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
 {
 	struct iomap *iomap = &isi->iomap;
 	unsigned long nr_pages;
+	unsigned long max_pages;
 	uint64_t first_ppage;
 	uint64_t first_ppage_reported;
 	uint64_t next_ppage;
 	int error;
 
+	if (unlikely(isi->nr_pages >= isi->sis->max))
+		return 0;
+	max_pages = isi->sis->max - isi->nr_pages;
+
 	/*
 	 * Round the start up and the end down so that the physical
 	 * extent aligns to a page boundary.
@@ -47,6 +52,7 @@  static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi)
 	if (first_ppage >= next_ppage)
 		return 0;
 	nr_pages = next_ppage - first_ppage;
+	nr_pages = min(nr_pages, max_pages);
 
 	/*
 	 * Calculate how much swap space we're adding; the first page contains