[1/2] mm: fix null-ptr-deref in kswapd_is_running()

Message ID 20220824071909.192535-1-wangkefeng.wang@huawei.com (mailing list archive)
State New

Commit Message

Kefeng Wang Aug. 24, 2022, 7:19 a.m. UTC
kswapd_run() and kswapd_stop() can set pgdat->kswapd to NULL, which
can race with kswapd_is_running() in kcompactd():

kswapd_run/stop()	kcompactd()
			  kswapd_is_running()
				if (pgdat->kswapd) // load non-NULL pgdat->kswapd
  pgdat->kswapd = NULL
				task_is_running(pgdat->kswapd) // NULL pointer dereference
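
The interleaving above can be replayed deterministically in userspace. This is only a minimal sketch, not kernel code: `struct task`, `struct node`, `clear_kswapd()` and `second_load_sees_null()` are hypothetical stand-ins, and the callback plays the role of the other CPU running `pgdat->kswapd = NULL` between the two loads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task { int running; };          /* stand-in for task_struct */
struct node { struct task *kswapd; };  /* stand-in for pg_data_t */

/*
 * Hypothetical hook: plays the role of the other CPU executing
 * "pgdat->kswapd = NULL" between the two loads.
 */
static void clear_kswapd(struct node *n)
{
	n->kswapd = NULL;
}

/*
 * Models the check-then-use pattern: the pointer is loaded once for the
 * NULL check and again for the state read. Returns true if the second
 * load observed NULL, i.e. the point where the kernel would fault.
 */
static bool second_load_sees_null(struct node *n,
				  void (*interleave)(struct node *))
{
	assert(n != NULL);

	if (n->kswapd) {			/* first load: non-NULL */
		if (interleave)
			interleave(n);		/* writer runs here */
		return n->kswapd == NULL;	/* second load */
	}
	return false;
}
```

Calling `second_load_sees_null(&n, clear_kswapd)` on a node with a valid `kswapd` pointer returns true, which is the window the KASAN report corresponds to. Passing NULL as the callback models the case where no writer interleaves.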

KASAN reports the null-ptr-deref shown below:

  vmscan: Failed to start kswapd on node 0
  ...
  BUG: KASAN: null-ptr-deref in kcompactd+0x440/0x504
  Read of size 8 at addr 0000000000000024 by task kcompactd0/37

  CPU: 0 PID: 37 Comm: kcompactd0 Kdump: loaded Tainted: G           OE     5.10.60 #1
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  Call trace:
   dump_backtrace+0x0/0x394
   show_stack+0x34/0x4c
   dump_stack+0x158/0x1e4
   __kasan_report+0x138/0x140
   kasan_report+0x44/0xdc
   __asan_load8+0x94/0xd0
   kcompactd+0x440/0x504
   kthread+0x1a4/0x1f0
   ret_from_fork+0x10/0x18

For the race between kswapd_run() and kcompactd(), store the result of
kthread_run() in a temporary variable and assign it to pgdat->kswapd only
if kthread_run() returns a valid task_struct.
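
A userspace sketch of the publish-on-success idea, under stated assumptions: `err_ptr()` and `is_err()` are simplified stand-ins for the kernel's ERR_PTR()/IS_ERR(), and `spawn()` and `node_run()` are hypothetical helpers, not kernel APIs. The shared field goes straight from NULL to a valid pointer and never holds an error value that then has to be reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct task { int running; };          /* stand-in for task_struct */
struct node { struct task *kswapd; };  /* stand-in for pg_data_t */

/* Simplified stand-ins for the kernel's ERR_PTR()/IS_ERR() encoding. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long err)
{
	return (void *)err;
}

static inline bool is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical thread-creation helper: a task on success, an encoded error on failure. */
static struct task *spawn(struct task *storage, bool fail)
{
	assert(storage != NULL);

	if (fail)
		return err_ptr(-12);	/* same value as -ENOMEM on Linux */
	storage->running = 1;
	return storage;
}

/* Publish the new task to the shared field only after creation succeeded. */
static void node_run(struct node *n, struct task *storage, bool fail)
{
	struct task *t;

	if (n->kswapd)
		return;

	t = spawn(storage, fail);
	if (is_err(t))
		return;			/* n->kswapd was never written */
	n->kswapd = t;
}
```

With `fail` set, `n->kswapd` stays NULL throughout, so a concurrent reader never sees a non-NULL value that is about to be cleared.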

For the race between kswapd_stop() and kcompactd(), call kcompactd_stop()
before kswapd_stop().

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/memory_hotplug.c | 2 +-
 mm/vmscan.c         | 8 +++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

Comments

David Hildenbrand Aug. 24, 2022, 7:56 a.m. UTC | #1
On 24.08.22 09:19, Kefeng Wang wrote:
> The kswapd_run/stop() will set pgdat->kswapd to NULL, which
> could race with kswapd_is_running() in kcompactd(),
> 
> kswapd_run/stop()	kcompactd()
> 			  kswapd_is_running()
> 				if (pgdat->kswapd) // load non-NULL pgdat->kswapd
>   pgdat->kswapd = NULL
> 				task_is_running(pgdat->kswapd) // NULL pointer dereference
> 
> KASAN reports the null-ptr-deref shown below:
> 
>   vmscan: Failed to start kswapd on node 0
>   ...
>   BUG: KASAN: null-ptr-deref in kcompactd+0x440/0x504
>   Read of size 8 at addr 0000000000000024 by task kcompactd0/37
> 
>   CPU: 0 PID: 37 Comm: kcompactd0 Kdump: loaded Tainted: G           OE     5.10.60 #1
>   Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
>   Call trace:
>    dump_backtrace+0x0/0x394
>    show_stack+0x34/0x4c
>    dump_stack+0x158/0x1e4
>    __kasan_report+0x138/0x140
>    kasan_report+0x44/0xdc
>    __asan_load8+0x94/0xd0
>    kcompactd+0x440/0x504
>    kthread+0x1a4/0x1f0
>    ret_from_fork+0x10/0x18
> 
> For race between kswapd_run() and kcompactd(), adding a temporary value
> when create a kthread, and only set it to pgdat->kswapd if kthread_run()
> return successful task_struct to fix the issue.
> 
> For race between kswapd_stop() and kcompactd(), let's call kcompactd_stop()
> before kswapd_stop() to fix the issue.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/memory_hotplug.c | 2 +-
>  mm/vmscan.c         | 8 +++++---
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index fad6d1f2262a..2fd45ccbce45 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1940,8 +1940,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  
>  	node_states_clear_node(node, &arg);
>  	if (arg.status_change_nid >= 0) {
> -		kswapd_stop(node);
>  		kcompactd_stop(node);
> +		kswapd_stop(node);
>  	}

This looks fragile: it could easily break again in the future when
people work on this code without being aware of this ordering
requirement, or once there are other (future?) kswapd_is_running() users.
We at least need a comment explaining that the order here matters and why.

But I do wonder if we can't handle it in a cleaner, more obvious way.

kswapd_run()/kswapd_stop() should have a proper way to synchronize
with kswapd_is_running(). It's just a matter of finding a suitable locking
primitive :)
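
One possible shape of such synchronization, as a userspace sketch only. The review does not name a primitive, so the pthread mutex, the `lock` field, and the helpers `node_set_kswapd()` and `node_kswapd_is_running()` are all assumptions made for illustration. The point is that the NULL check and the state read happen under the same lock the writers take.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct task { int running; };		/* stand-in for task_struct */

struct node {				/* stand-in for pg_data_t */
	pthread_mutex_t lock;		/* hypothetical lock guarding kswapd */
	struct task *kswapd;
};

/* Writer side: what the run/stop paths would do when changing the pointer. */
static void node_set_kswapd(struct node *n, struct task *t)
{
	assert(n != NULL);

	pthread_mutex_lock(&n->lock);
	n->kswapd = t;
	pthread_mutex_unlock(&n->lock);
}

/* Reader side: check and dereference inside one critical section. */
static bool node_kswapd_is_running(struct node *n)
{
	bool ret;

	assert(n != NULL);

	pthread_mutex_lock(&n->lock);
	ret = n->kswapd && n->kswapd->running;
	pthread_mutex_unlock(&n->lock);
	return ret;
}
```

Because the pointer cannot change between the check and the use, the call-order dependency in offline_pages() would no longer be what keeps the reader safe. Whether a sleeping lock is acceptable in the real call paths is a separate question that this sketch does not answer.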

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index fad6d1f2262a..2fd45ccbce45 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1940,8 +1940,8 @@  int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 
 	node_states_clear_node(node, &arg);
 	if (arg.status_change_nid >= 0) {
-		kswapd_stop(node);
 		kcompactd_stop(node);
+		kswapd_stop(node);
 	}
 
 	writeback_set_ratelimit();
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b2b1431352dc..08c6497f76c3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4642,16 +4642,18 @@  unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 void kswapd_run(int nid)
 {
 	pg_data_t *pgdat = NODE_DATA(nid);
+	struct task_struct *t;
 
 	if (pgdat->kswapd)
 		return;
 
-	pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
-	if (IS_ERR(pgdat->kswapd)) {
+	t = kthread_run(kswapd, pgdat, "kswapd%d", nid);
+	if (IS_ERR(t)) {
 		/* failure at boot is fatal */
 		BUG_ON(system_state < SYSTEM_RUNNING);
 		pr_err("Failed to start kswapd on node %d\n", nid);
-		pgdat->kswapd = NULL;
+	} else {
+		pgdat->kswapd = t;
 	}
 }