
mm/vmscan: fix data races at kswapd_classzone_idx

Message ID 1582649726-15474-1-git-send-email-cai@lca.pw (mailing list archive)
State New, archived
Series: mm/vmscan: fix data races at kswapd_classzone_idx

Commit Message

Qian Cai Feb. 25, 2020, 4:55 p.m. UTC
pgdat->kswapd_classzone_idx could be accessed concurrently in
wakeup_kswapd(). Plain writes and reads without any lock protection
result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
well as saving a branch (compilers might well optimize the original code
in an unintentional way anyway). The data races were reported by KCSAN,

 BUG: KCSAN: data-race in wakeup_kswapd / wakeup_kswapd

 write to 0xffff9f427ffff2dc of 4 bytes by task 7454 on cpu 13:
  wakeup_kswapd+0xf1/0x400
  wakeup_kswapd at mm/vmscan.c:3967
  wake_all_kswapds+0x59/0xc0
  wake_all_kswapds at mm/page_alloc.c:4241
  __alloc_pages_slowpath+0xdcc/0x1290
  __alloc_pages_slowpath at mm/page_alloc.c:4512
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x16e/0x6f0
  __handle_mm_fault+0xcd5/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 1 lock held by mtest01/7454:
  #0: ffff9f425afe8808 (&mm->mmap_sem#2){++++}, at:
 do_page_fault+0x143/0x6f9
 do_user_addr_fault at arch/x86/mm/fault.c:1405
 (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
 irq event stamp: 6944085
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x34c/0x57c
 irq_exit+0xa2/0xc0

 read to 0xffff9f427ffff2dc of 4 bytes by task 7472 on cpu 38:
  wakeup_kswapd+0xc8/0x400
  wake_all_kswapds+0x59/0xc0
  __alloc_pages_slowpath+0xdcc/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x16e/0x6f0
  __handle_mm_fault+0xcd5/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 1 lock held by mtest01/7472:
  #0: ffff9f425a9ac148 (&mm->mmap_sem#2){++++}, at:
 do_page_fault+0x143/0x6f9
 irq event stamp: 6793561
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x34c/0x57c
 irq_exit+0xa2/0xc0

Signed-off-by: Qian Cai <cai@lca.pw>
---
 mm/vmscan.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

Comments

Matthew Wilcox Feb. 26, 2020, 1:29 a.m. UTC | #1
On Tue, Feb 25, 2020 at 11:55:26AM -0500, Qian Cai wrote:
> -	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
> -		pgdat->kswapd_classzone_idx = classzone_idx;
> -	else
> -		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
> -						  classzone_idx);
> +	if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
> +	    READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
> +		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
> +
>  	pgdat->kswapd_order = max(pgdat->kswapd_order, order);

Doesn't this line have the exact same problem you're "solving" above?

Also, why would you do this crazy "f(READ_ONCE(x)) || g(READ_ONCE(x))"?
Surely you should be stashing READ_ONCE(x) into a local variable?

And there are a _lot_ of places which access kswapd_classzone_idx
without a lock.  Are you sure this patch is sufficient?
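
For illustration, a minimal sketch of the local-variable rework suggested above,
applied to the quoted hunk. This is an assumption about the shape, not code posted
in this series; curr_idx is a name introduced here and would be declared at the top
of wakeup_kswapd():

	/* One snapshot: plain re-reads of a racy field may return different values. */
	enum zone_type curr_idx = READ_ONCE(pgdat->kswapd_classzone_idx);

	if (curr_idx == MAX_NR_ZONES || curr_idx < classzone_idx)
		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);

	/* The unlocked kswapd_order update has the same problem, so treat it too. */
	if (READ_ONCE(pgdat->kswapd_order) < order)
		WRITE_ONCE(pgdat->kswapd_order, order);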
Qian Cai Feb. 26, 2020, 1:48 a.m. UTC | #2
> On Feb 25, 2020, at 8:29 PM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Tue, Feb 25, 2020 at 11:55:26AM -0500, Qian Cai wrote:
>> -	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
>> -		pgdat->kswapd_classzone_idx = classzone_idx;
>> -	else
>> -		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
>> -						  classzone_idx);
>> +	if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
>> +	    READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
>> +		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
>> +
>> 	pgdat->kswapd_order = max(pgdat->kswapd_order, order);
> 
> Doesn't this line have the exact same problem you're "solving" above?

Good catch. I missed that.

> 
> Also, why would you do this crazy "f(READ_ONCE(x)) || g(READ_ONCE(x))"?
> Surely you should be stashing READ_ONCE(x) into a local variable?

Not a bad idea.

> 
> And there are a _lot_ of places which access kswapd_classzone_idx
> without a lock.  Are you sure this patch is sufficient?

I am not sure, but KCSAN did not complain about other places after an extended
period of testing, so I will look into them if other places show up later.
Andrew Morton Feb. 26, 2020, 2:11 a.m. UTC | #3
On Tue, 25 Feb 2020 11:55:26 -0500 Qian Cai <cai@lca.pw> wrote:

> pgdat->kswapd_classzone_idx could be accessed concurrently in
> wakeup_kswapd(). Plain writes and reads without any lock protection
> result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
> well as saving a branch (compilers might well optimize the original code
> in an unintentional way anyway). The data races were reported by KCSAN,
> 
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3961,11 +3961,10 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
>  		return;
>  	pgdat = zone->zone_pgdat;
>  
> -	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
> -		pgdat->kswapd_classzone_idx = classzone_idx;
> -	else
> -		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
> -						  classzone_idx);
> +	if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
> +	    READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
> +		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
> +
>  	pgdat->kswapd_order = max(pgdat->kswapd_order, order);
>  	if (!waitqueue_active(&pgdat->kswapd_wait))
>  		return;

This is very partial, isn't it?  The above code itself is racy against
other code which manipulates ->kswapd_classzone_idx, and the
manipulation in allow_direct_reclaim() is performed by threads other
than kswapd, so it needs the READ_ONCE treatment too and is still racy
with that?

I guess occasional races here don't really matter, but a grossly wrong
read from load tearing might matter.  In which case shouldn't we be
defending against them in all cases where non-kswapd threads read this
field?
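
For context, the allow_direct_reclaim() manipulation referred to here is the spot
where a throttled allocator clamps kswapd_classzone_idx before waking kswapd. A
sketch of the same READ|WRITE_ONCE treatment there, illustrative only and
paraphrasing the surrounding code rather than quoting it:

	/* kswapd must be awake if processes are being throttled */
	if (!wmark_ok && waitqueue_active(&pgdat->kswapd_wait)) {
		/* Clamp via an explicit check to avoid a plain racy read inside min(). */
		if (READ_ONCE(pgdat->kswapd_classzone_idx) > ZONE_NORMAL)
			WRITE_ONCE(pgdat->kswapd_classzone_idx, ZONE_NORMAL);
		wake_up_interruptible(&pgdat->kswapd_wait);
	}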
Qian Cai Feb. 26, 2020, 2:26 a.m. UTC | #4
> On Feb 25, 2020, at 9:11 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Tue, 25 Feb 2020 11:55:26 -0500 Qian Cai <cai@lca.pw> wrote:
> 
>> pgdat->kswapd_classzone_idx could be accessed concurrently in
>> wakeup_kswapd(). Plain writes and reads without any lock protection
>> result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
>> well as saving a branch (compilers might well optimize the original code
>> in an unintentional way anyway). The data races were reported by KCSAN,
>> 
>> ...
>> 
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -3961,11 +3961,10 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
>> 		return;
>> 	pgdat = zone->zone_pgdat;
>> 
>> -	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
>> -		pgdat->kswapd_classzone_idx = classzone_idx;
>> -	else
>> -		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
>> -						  classzone_idx);
>> +	if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
>> +	    READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
>> +		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
>> +
>> 	pgdat->kswapd_order = max(pgdat->kswapd_order, order);
>> 	if (!waitqueue_active(&pgdat->kswapd_wait))
>> 		return;
> 
> This is very partial, isn't it?  The above code itself is racy against
> other code which manipulates ->kswapd_classzone_idx, and the
> manipulation in allow_direct_reclaim() is performed by threads other
> than kswapd, so it needs the READ_ONCE treatment too and is still racy
> with that?

Right, I suppose allow_direct_reclaim() could use some treatment too.

> 
> I guess occasional races here don't really matter, but a grossly wrong
> read from load tearing might matter.  In which case shouldn't we be
> defending against them in all cases where non-kswapd threads read this
> field?
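
One way to defend every reader at once, sketched here purely as an illustration
(the helper name and placement are hypothetical, not something posted in this
thread), is to take a single READ_ONCE() snapshot and resolve the MAX_NR_ZONES
sentinel in one place:

	/* Hypothetical helper: one snapshot per caller, sentinel resolved centrally. */
	static enum zone_type kswapd_classzone_idx_snapshot(pg_data_t *pgdat,
							    enum zone_type prev_idx)
	{
		enum zone_type curr_idx = READ_ONCE(pgdat->kswapd_classzone_idx);

		return curr_idx == MAX_NR_ZONES ? prev_idx : curr_idx;
	}

Readers on the kswapd side would then work on the returned value instead of
re-reading the field.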

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 876370565455..400950734e5a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3961,11 +3961,10 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
 		return;
 	pgdat = zone->zone_pgdat;
 
-	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
-		pgdat->kswapd_classzone_idx = classzone_idx;
-	else
-		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
-						  classzone_idx);
+	if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
+	    READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
+		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
+
 	pgdat->kswapd_order = max(pgdat->kswapd_order, order);
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;