Message ID: 1582649726-15474-1-git-send-email-cai@lca.pw (mailing list archive)
State: New, archived
Series: mm/vmscan: fix data races at kswapd_classzone_idx
On Tue, Feb 25, 2020 at 11:55:26AM -0500, Qian Cai wrote:
> -	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
> -		pgdat->kswapd_classzone_idx = classzone_idx;
> -	else
> -		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
> -						  classzone_idx);
> +	if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
> +	    READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
> +		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
> +
> 	pgdat->kswapd_order = max(pgdat->kswapd_order, order);

Doesn't this line have the exact same problem you're "solving" above?

Also, why would you do this crazy "f(READ_ONCE(x)) || g(READ_ONCE(x))"?
Surely you should be stashing READ_ONCE(x) into a local variable?

And there are a _lot_ of places which access kswapd_classzone_idx
without a lock.  Are you sure this patch is sufficient?
> On Feb 25, 2020, at 8:29 PM, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Feb 25, 2020 at 11:55:26AM -0500, Qian Cai wrote:
>> -	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
>> -		pgdat->kswapd_classzone_idx = classzone_idx;
>> -	else
>> -		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
>> -						  classzone_idx);
>> +	if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
>> +	    READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
>> +		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
>> +
>> 	pgdat->kswapd_order = max(pgdat->kswapd_order, order);
>
> Doesn't this line have the exact same problem you're "solving" above?

Good catch. I missed that.

> Also, why would you do this crazy "f(READ_ONCE(x)) || g(READ_ONCE(x))"?
> Surely you should be stashing READ_ONCE(x) into a local variable?

Not a bad idea.

> And there are a _lot_ of places which access kswapd_classzone_idx
> without a lock.  Are you sure this patch is sufficient?

I am not sure, but KCSAN did not complain about other places after an
extended period of testing, so I will look into other places if they
show up later.
On Tue, 25 Feb 2020 11:55:26 -0500 Qian Cai <cai@lca.pw> wrote:

> pgdat->kswapd_classzone_idx could be accessed concurrently in
> wakeup_kswapd(). Plain writes and reads without any lock protection
> result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
> well as saving a branch (compilers might well optimize the original code
> in an unintentional way anyway). The data races were reported by KCSAN,
>
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3961,11 +3961,10 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
> 		return;
> 	pgdat = zone->zone_pgdat;
>
> -	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
> -		pgdat->kswapd_classzone_idx = classzone_idx;
> -	else
> -		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
> -						  classzone_idx);
> +	if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
> +	    READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
> +		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
> +
> 	pgdat->kswapd_order = max(pgdat->kswapd_order, order);
> 	if (!waitqueue_active(&pgdat->kswapd_wait))
> 		return;

This is very partial, isn't it?  The above code itself is racy against
other code which manipulates ->kswapd_classzone_idx, and the
manipulation in allow_direct_reclaim() is performed by threads other
than kswapd, so it needs the READ_ONCE() treatment as well and is
still racy even with that?

I guess occasional races here don't really matter, but a grossly wrong
read from load tearing might matter.  In which case shouldn't we be
defending against them in all cases where non-kswapd threads read this
field?
> On Feb 25, 2020, at 9:11 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 25 Feb 2020 11:55:26 -0500 Qian Cai <cai@lca.pw> wrote:
>
>> pgdat->kswapd_classzone_idx could be accessed concurrently in
>> wakeup_kswapd(). Plain writes and reads without any lock protection
>> result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
>> well as saving a branch (compilers might well optimize the original code
>> in an unintentional way anyway). The data races were reported by KCSAN,
>>
>> ...
>>
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -3961,11 +3961,10 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
>> 		return;
>> 	pgdat = zone->zone_pgdat;
>>
>> -	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
>> -		pgdat->kswapd_classzone_idx = classzone_idx;
>> -	else
>> -		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
>> -						  classzone_idx);
>> +	if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
>> +	    READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
>> +		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
>> +
>> 	pgdat->kswapd_order = max(pgdat->kswapd_order, order);
>> 	if (!waitqueue_active(&pgdat->kswapd_wait))
>> 		return;
>
> This is very partial, isn't it?  The above code itself is racy against
> other code which manipulates ->kswapd_classzone_idx and the
> manipulation in allow_direct_reclaim() is performed by threads other
> than kswapd and so need the READ_ONCE treatment and is still racy with
> that?

Right, I suppose allow_direct_reclaim() could use some treatment too.

> I guess occasional races here don't really matter, but a grossly wrong
> read from load tearing might matter.  In which case shouldn't we be
> defending against them in all cases where non-kswapd threads read this
> field?
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 876370565455..400950734e5a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3961,11 +3961,10 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
 		return;
 	pgdat = zone->zone_pgdat;
 
-	if (pgdat->kswapd_classzone_idx == MAX_NR_ZONES)
-		pgdat->kswapd_classzone_idx = classzone_idx;
-	else
-		pgdat->kswapd_classzone_idx = max(pgdat->kswapd_classzone_idx,
-						  classzone_idx);
+	if (READ_ONCE(pgdat->kswapd_classzone_idx) == MAX_NR_ZONES ||
+	    READ_ONCE(pgdat->kswapd_classzone_idx) < classzone_idx)
+		WRITE_ONCE(pgdat->kswapd_classzone_idx, classzone_idx);
+
 	pgdat->kswapd_order = max(pgdat->kswapd_order, order);
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;
pgdat->kswapd_classzone_idx could be accessed concurrently in
wakeup_kswapd(). Plain writes and reads without any lock protection
result in data races. Fix them by adding a pair of READ|WRITE_ONCE() as
well as saving a branch (compilers might well optimize the original code
in an unintentional way anyway). The data races were reported by KCSAN,

 BUG: KCSAN: data-race in wakeup_kswapd / wakeup_kswapd

 write to 0xffff9f427ffff2dc of 4 bytes by task 7454 on cpu 13:
  wakeup_kswapd+0xf1/0x400
  wakeup_kswapd at mm/vmscan.c:3967
  wake_all_kswapds+0x59/0xc0
  wake_all_kswapds at mm/page_alloc.c:4241
  __alloc_pages_slowpath+0xdcc/0x1290
  __alloc_pages_slowpath at mm/page_alloc.c:4512
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x16e/0x6f0
  __handle_mm_fault+0xcd5/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 1 lock held by mtest01/7454:
  #0: ffff9f425afe8808 (&mm->mmap_sem#2){++++}, at: do_page_fault+0x143/0x6f9
  do_user_addr_fault at arch/x86/mm/fault.c:1405
  (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
 irq event stamp: 6944085
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x34c/0x57c
 irq_exit+0xa2/0xc0

 read to 0xffff9f427ffff2dc of 4 bytes by task 7472 on cpu 38:
  wakeup_kswapd+0xc8/0x400
  wake_all_kswapds+0x59/0xc0
  __alloc_pages_slowpath+0xdcc/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x16e/0x6f0
  __handle_mm_fault+0xcd5/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 1 lock held by mtest01/7472:
  #0: ffff9f425a9ac148 (&mm->mmap_sem#2){++++}, at: do_page_fault+0x143/0x6f9
 irq event stamp: 6793561
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x34c/0x57c
 irq_exit+0xa2/0xc0

Signed-off-by: Qian Cai <cai@lca.pw>
---
 mm/vmscan.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)