[-next,v2] mm: mark an intentional data race in page_zonenum
diff mbox series

Message ID 20200206183000.913-1-cai@lca.pw
State New
Headers show
Series
  • [-next,v2] mm: mark an intentional data race in page_zonenum
Related show

Commit Message

Qian Cai Feb. 6, 2020, 6:30 p.m. UTC
page->flags could be accessed concurrently as noticied by KCSAN,

 BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page

 write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
  page_cpupid_xchg_last+0x51/0x80
  page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
  wp_page_reuse+0x3e/0xc0
  wp_page_reuse at mm/memory.c:2453
  do_wp_page+0x472/0x7b0
  do_wp_page at mm/memory.c:2798
  __handle_mm_fault+0xcb0/0xd00
  handle_pte_fault at mm/memory.c:4049
  (inlined by) __handle_mm_fault at mm/memory.c:4163
  handle_mm_fault+0xfc/0x2f0
  handle_mm_fault at mm/memory.c:4200
  do_page_fault+0x263/0x6f9
  do_user_addr_fault at arch/x86/mm/fault.c:1465
  (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
  page_fault+0x34/0x40

 read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
  put_page+0x15a/0x1f0
  page_zonenum at include/linux/mm.h:923
  (inlined by) is_zone_device_page at include/linux/mm.h:929
  (inlined by) page_is_devmap_managed at include/linux/mm.h:948
  (inlined by) put_page at include/linux/mm.h:1023
  wp_page_copy+0x571/0x930
  wp_page_copy at mm/memory.c:2615
  do_wp_page+0x107/0x7b0
  __handle_mm_fault+0xcb0/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G        W  O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

Both the read and write are done only with the non-exclusive mmap_sem
held. Since the read only check for a specific bit range (up to 3 bits)
in the flag but the write here never change those 3 bits, so load
tearing would be harmless here. Thus, just mark it as an intentional
data races using the data_race() macro which is designed for those
situations [1].

[1] https://lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/

Signed-off-by: Qian Cai <cai@lca.pw>
---

v2: update the commit log.

 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Matthew Wilcox Feb. 6, 2020, 8:20 p.m. UTC | #1
On Thu, Feb 06, 2020 at 01:30:00PM -0500, Qian Cai wrote:
> Both the read and write are done only with the non-exclusive mmap_sem
> held. Since the read only check for a specific bit range (up to 3 bits)
> in the flag but the write here never change those 3 bits, so load
> tearing would be harmless here. Thus, just mark it as an intentional
> data races using the data_race() macro which is designed for those
> situations [1].

This changelog makes me think you don't really understand the situation.

A page never changes its zone number.  The zone number happens to be
stored in the same word as other bits which are modified, but the zone
number bits will never be modified by any other write.  So we can accept
a reload of the zone bits after an intervening write and we don't need
to use READ_ONCE().
Qian Cai Feb. 6, 2020, 8:25 p.m. UTC | #2
> On Feb 6, 2020, at 3:20 PM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Thu, Feb 06, 2020 at 01:30:00PM -0500, Qian Cai wrote:
>> Both the read and write are done only with the non-exclusive mmap_sem
>> held. Since the read only check for a specific bit range (up to 3 bits)
>> in the flag but the write here never change those 3 bits, so load
>> tearing would be harmless here. Thus, just mark it as an intentional
>> data races using the data_race() macro which is designed for those
>> situations [1].
> 
> This changelog makes me think you don't really understand the situation.
> 
> A page never changes its zone number.  The zone number happens to be
> stored in the same word as other bits which are modified, but the zone
> number bits will never be modified by any other write.  So we can accept
> a reload of the zone bits after an intervening write and we don't need
> to use READ_ONCE().

Maybe your explanation is better, but I did try to explain the same thing.
I’ll let Andrew to decide if he would like to update the commit log a bit
with your wording.
Andrew Morton Feb. 10, 2020, 2:20 a.m. UTC | #3
On Thu, 6 Feb 2020 15:25:41 -0500 Qian Cai <cai@lca.pw> wrote:

> 
> 
> > On Feb 6, 2020, at 3:20 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > On Thu, Feb 06, 2020 at 01:30:00PM -0500, Qian Cai wrote:
> >> Both the read and write are done only with the non-exclusive mmap_sem
> >> held. Since the read only check for a specific bit range (up to 3 bits)
> >> in the flag but the write here never change those 3 bits, so load
> >> tearing would be harmless here. Thus, just mark it as an intentional
> >> data races using the data_race() macro which is designed for those
> >> situations [1].
> > 
> > This changelog makes me think you don't really understand the situation.
> > 
> > A page never changes its zone number.  The zone number happens to be
> > stored in the same word as other bits which are modified, but the zone
> > number bits will never be modified by any other write.  So we can accept
> > a reload of the zone bits after an intervening write and we don't need
> > to use READ_ONCE().
> 
> Maybe your explanation is better, but I did try to explain the same thing.
> I’ll let Andrew to decide if he would like to update the commit log a bit
> with your wording.

Using data_race() here seems misleading - there is no race, but we're
using data_race() to suppress a false positive warning from KCSAN, yes?

That's a bit hacky.  If this happens rarely then perhaps adding a
suitable comment in page_zonenum() which explains this will suffice. 
But if we keep on abusing data_race() in this fashion then it would be
better to add a new macro for this purpose.
Qian Cai Feb. 10, 2020, 2:41 a.m. UTC | #4
> On Feb 9, 2020, at 9:20 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> Using data_race() here seems misleading - there is no race, but we're
> using data_race() to suppress a false positive warning from KCSAN, yes?

It is a data race in the sense of compilers, i.e., KCSAN is a compiler instrumentation, so here the load and store are both in word-size, but code here is only interested in 3 bits which are never changed. Thus, it is a harmless data race.

Marco also mentioned,

“Various options were considered, and based on feedback from Linus,
decided 'data_race(..)' is the best option:”

lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/

Paul also said,

”People will get used to the name more quickly than they will get used
to typing the extra seven characters.  Here is the current comment header:

/*
 * data_race(): macro to document that accesses in an expression may conflict with
 * other concurrent accesses resulting in data races, but the resulting
 * behaviour is deemed safe regardless.
 *
 * This macro *does not* affect normal code generation, but is a hint to tooling
 * that data races here should be ignored.
 */

I will be converting this to docbook form.

In addition, in the KCSAN documentation:

* KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that
  any data races due to accesses in ``expr`` should be ignored and resulting
  behaviour when encountering a data race is deemed safe.”
Andrew Morton Feb. 10, 2020, 4:06 a.m. UTC | #5
On Sun, 9 Feb 2020 21:41:56 -0500 Qian Cai <cai@lca.pw> wrote:

> 
> 
> > On Feb 9, 2020, at 9:20 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > Using data_race() here seems misleading - there is no race, but we're
> > using data_race() to suppress a false positive warning from KCSAN, yes?
> 
> It is a data race in the sense of compilers, i.e., KCSAN is a compiler instrumentation, so here the load and store are both in word-size, but code here is only interested in 3 bits which are never changed. Thus, it is a harmless data race.
> 
> Marco also mentioned,
> 
> “Various options were considered, and based on feedback from Linus,
> decided 'data_race(..)' is the best option:”
> 
> lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/
> 
> Paul also said,
> 
> ”People will get used to the name more quickly than they will get used
> to typing the extra seven characters.  Here is the current comment header:
> 
> /*
>  * data_race(): macro to document that accesses in an expression may conflict with
>  * other concurrent accesses resulting in data races, but the resulting
>  * behaviour is deemed safe regardless.
>  *
>  * This macro *does not* affect normal code generation, but is a hint to tooling
>  * that data races here should be ignored.
>  */
> 
> I will be converting this to docbook form.
> 
> In addition, in the KCSAN documentation:
> 
> * KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that
>   any data races due to accesses in ``expr`` should be ignored and resulting
>   behaviour when encountering a data race is deemed safe.”

OK.  But I believe page_zonenum() still deserves a comment explaining
that there is no race and explaining why we're using data_race()
anyway.  Otherwise the use of data_race() is simply misleading.
Marco Elver Feb. 10, 2020, 7:58 a.m. UTC | #6
On Mon, 10 Feb 2020 at 05:06, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sun, 9 Feb 2020 21:41:56 -0500 Qian Cai <cai@lca.pw> wrote:
>
> >
> >
> > > On Feb 9, 2020, at 9:20 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > Using data_race() here seems misleading - there is no race, but we're
> > > using data_race() to suppress a false positive warning from KCSAN, yes?
> >
> > It is a data race in the sense of compilers, i.e., KCSAN is a compiler instrumentation, so here the load and store are both in word-size, but code here is only interested in 3 bits which are never changed. Thus, it is a harmless data race.
> >
> > Marco also mentioned,
> >
> > “Various options were considered, and based on feedback from Linus,
> > decided 'data_race(..)' is the best option:”
> >
> > lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/
> >
> > Paul also said,
> >
> > ”People will get used to the name more quickly than they will get used
> > to typing the extra seven characters.  Here is the current comment header:
> >
> > /*
> >  * data_race(): macro to document that accesses in an expression may conflict with
> >  * other concurrent accesses resulting in data races, but the resulting
> >  * behaviour is deemed safe regardless.
> >  *
> >  * This macro *does not* affect normal code generation, but is a hint to tooling
> >  * that data races here should be ignored.
> >  */
> >
> > I will be converting this to docbook form.
> >
> > In addition, in the KCSAN documentation:
> >
> > * KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that
> >   any data races due to accesses in ``expr`` should be ignored and resulting
> >   behaviour when encountering a data race is deemed safe.”
>
> OK.  But I believe page_zonenum() still deserves a comment explaining
> that there is no race and explaining why we're using data_race()
> anyway.  Otherwise the use of data_race() is simply misleading.
>

I have a better suggestion for page_zonenum(), pending a patch if it
makes sense:
  http://lkml.kernel.org/r/CANpmjNNaHAnKCMLb+Njs3AhEoJT9O6-Yh63fcNcVTjBbNQiEPg@mail.gmail.com
If that makes more sense, the patch here could eventually be replaced.

Thanks,
-- Marco

Patch
diff mbox series

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..cafccad584c2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -920,7 +920,7 @@  vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
 
 static inline enum zone_type page_zonenum(const struct page *page)
 {
-	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
+	return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);
 }
 
 #ifdef CONFIG_ZONE_DEVICE