Message ID | 20200206183000.913-1-cai@lca.pw (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [-next,v2] mm: mark an intentional data race in page_zonenum | expand |
On Thu, Feb 06, 2020 at 01:30:00PM -0500, Qian Cai wrote: > Both the read and write are done only with the non-exclusive mmap_sem > held. Since the read only check for a specific bit range (up to 3 bits) > in the flag but the write here never change those 3 bits, so load > tearing would be harmless here. Thus, just mark it as an intentional > data races using the data_race() macro which is designed for those > situations [1]. This changelog makes me think you don't really understand the situation. A page never changes its zone number. The zone number happens to be stored in the same word as other bits which are modified, but the zone number bits will never be modified by any other write. So we can accept a reload of the zone bits after an intervening write and we don't need to use READ_ONCE().
> On Feb 6, 2020, at 3:20 PM, Matthew Wilcox <willy@infradead.org> wrote: > > On Thu, Feb 06, 2020 at 01:30:00PM -0500, Qian Cai wrote: >> Both the read and write are done only with the non-exclusive mmap_sem >> held. Since the read only check for a specific bit range (up to 3 bits) >> in the flag but the write here never change those 3 bits, so load >> tearing would be harmless here. Thus, just mark it as an intentional >> data races using the data_race() macro which is designed for those >> situations [1]. > > This changelog makes me think you don't really understand the situation. > > A page never changes its zone number. The zone number happens to be > stored in the same word as other bits which are modified, but the zone > number bits will never be modified by any other write. So we can accept > a reload of the zone bits after an intervening write and we don't need > to use READ_ONCE(). Maybe your explanation is better, but I did try to explain the same thing. I’ll let Andrew to decide if he would like to update the commit log a bit with your wording.
On Thu, 6 Feb 2020 15:25:41 -0500 Qian Cai <cai@lca.pw> wrote: > > > > On Feb 6, 2020, at 3:20 PM, Matthew Wilcox <willy@infradead.org> wrote: > > > > On Thu, Feb 06, 2020 at 01:30:00PM -0500, Qian Cai wrote: > >> Both the read and write are done only with the non-exclusive mmap_sem > >> held. Since the read only check for a specific bit range (up to 3 bits) > >> in the flag but the write here never change those 3 bits, so load > >> tearing would be harmless here. Thus, just mark it as an intentional > >> data races using the data_race() macro which is designed for those > >> situations [1]. > > > > This changelog makes me think you don't really understand the situation. > > > > A page never changes its zone number. The zone number happens to be > > stored in the same word as other bits which are modified, but the zone > > number bits will never be modified by any other write. So we can accept > > a reload of the zone bits after an intervening write and we don't need > > to use READ_ONCE(). > > Maybe your explanation is better, but I did try to explain the same thing. > I’ll let Andrew to decide if he would like to update the commit log a bit > with your wording. Using data_race() here seems misleading - there is no race, but we're using data_race() to suppress a false positive warning from KCSAN, yes? That's a bit hacky. If this happens rarely then perhaps adding a suitable comment in page_zonenum() which explains this will suffice. But if we keep on abusing data_race() in this fashion then it would be better to add a new macro for this purpose.
> On Feb 9, 2020, at 9:20 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > > Using data_race() here seems misleading - there is no race, but we're > using data_race() to suppress a false positive warning from KCSAN, yes? It is a data race in the sense of compilers, i.e., KCSAN is a compiler instrumentation, so here the load and store are both in word-size, but code here is only interested in 3 bits which are never changed. Thus, it is a harmless data race. Marco also mentioned, “Various options were considered, and based on feedback from Linus, decided 'data_race(..)' is the best option:” lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/ Paul also said, ”People will get used to the name more quickly than they will get used to typing the extra seven characters. Here is the current comment header: /* * data_race(): macro to document that accesses in an expression may conflict with * other concurrent accesses resulting in data races, but the resulting * behaviour is deemed safe regardless. * * This macro *does not* affect normal code generation, but is a hint to tooling * that data races here should be ignored. */ I will be converting this to docbook form. In addition, in the KCSAN documentation: * KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that any data races due to accesses in ``expr`` should be ignored and resulting behaviour when encountering a data race is deemed safe.”
On Sun, 9 Feb 2020 21:41:56 -0500 Qian Cai <cai@lca.pw> wrote: > > > > On Feb 9, 2020, at 9:20 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > > > > Using data_race() here seems misleading - there is no race, but we're > > using data_race() to suppress a false positive warning from KCSAN, yes? > > It is a data race in the sense of compilers, i.e., KCSAN is a compiler instrumentation, so here the load and store are both in word-size, but code here is only interested in 3 bits which are never changed. Thus, it is a harmless data race. > > Marco also mentioned, > > “Various options were considered, and based on feedback from Linus, > decided 'data_race(..)' is the best option:” > > lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/ > > Paul also said, > > ”People will get used to the name more quickly than they will get used > to typing the extra seven characters. Here is the current comment header: > > /* > * data_race(): macro to document that accesses in an expression may conflict with > * other concurrent accesses resulting in data races, but the resulting > * behaviour is deemed safe regardless. > * > * This macro *does not* affect normal code generation, but is a hint to tooling > * that data races here should be ignored. > */ > > I will be converting this to docbook form. > > In addition, in the KCSAN documentation: > > * KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that > any data races due to accesses in ``expr`` should be ignored and resulting > behaviour when encountering a data race is deemed safe.” OK. But I believe page_zonenum() still deserves a comment explaining that there is no race and explaining why we're using data_race() anyway. Otherwise the use of data_race() is simply misleading.
On Mon, 10 Feb 2020 at 05:06, Andrew Morton <akpm@linux-foundation.org> wrote: > > On Sun, 9 Feb 2020 21:41:56 -0500 Qian Cai <cai@lca.pw> wrote: > > > > > > > > On Feb 9, 2020, at 9:20 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > Using data_race() here seems misleading - there is no race, but we're > > > using data_race() to suppress a false positive warning from KCSAN, yes? > > > > It is a data race in the sense of compilers, i.e., KCSAN is a compiler instrumentation, so here the load and store are both in word-size, but code here is only interested in 3 bits which are never changed. Thus, it is a harmless data race. > > > > Marco also mentioned, > > > > “Various options were considered, and based on feedback from Linus, > > decided 'data_race(..)' is the best option:” > > > > lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/ > > > > Paul also said, > > > > ”People will get used to the name more quickly than they will get used > > to typing the extra seven characters. Here is the current comment header: > > > > /* > > * data_race(): macro to document that accesses in an expression may conflict with > > * other concurrent accesses resulting in data races, but the resulting > > * behaviour is deemed safe regardless. > > * > > * This macro *does not* affect normal code generation, but is a hint to tooling > > * that data races here should be ignored. > > */ > > > > I will be converting this to docbook form. > > > > In addition, in the KCSAN documentation: > > > > * KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that > > any data races due to accesses in ``expr`` should be ignored and resulting > > behaviour when encountering a data race is deemed safe.” > > OK. But I believe page_zonenum() still deserves a comment explaining > that there is no race and explaining why we're using data_race() > anyway. Otherwise the use of data_race() is simply misleading. > I have a better suggestion for page_zonenum(), pending a patch if it makes sense: http://lkml.kernel.org/r/CANpmjNNaHAnKCMLb+Njs3AhEoJT9O6-Yh63fcNcVTjBbNQiEPg@mail.gmail.com If that makes more sense, the patch here could eventually be replaced. Thanks, -- Marco
diff --git a/include/linux/mm.h b/include/linux/mm.h index 52269e56c514..cafccad584c2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -920,7 +920,7 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf); static inline enum zone_type page_zonenum(const struct page *page) { - return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK; + return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK); } #ifdef CONFIG_ZONE_DEVICE
page->flags could be accessed concurrently as noticied by KCSAN, BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3: page_cpupid_xchg_last+0x51/0x80 page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11) wp_page_reuse+0x3e/0xc0 wp_page_reuse at mm/memory.c:2453 do_wp_page+0x472/0x7b0 do_wp_page at mm/memory.c:2798 __handle_mm_fault+0xcb0/0xd00 handle_pte_fault at mm/memory.c:4049 (inlined by) __handle_mm_fault at mm/memory.c:4163 handle_mm_fault+0xfc/0x2f0 handle_mm_fault at mm/memory.c:4200 do_page_fault+0x263/0x6f9 do_user_addr_fault at arch/x86/mm/fault.c:1465 (inlined by) do_page_fault at arch/x86/mm/fault.c:1539 page_fault+0x34/0x40 read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69: put_page+0x15a/0x1f0 page_zonenum at include/linux/mm.h:923 (inlined by) is_zone_device_page at include/linux/mm.h:929 (inlined by) page_is_devmap_managed at include/linux/mm.h:948 (inlined by) put_page at include/linux/mm.h:1023 wp_page_copy+0x571/0x930 wp_page_copy at mm/memory.c:2615 do_wp_page+0x107/0x7b0 __handle_mm_fault+0xcb0/0xd00 handle_mm_fault+0xfc/0x2f0 do_page_fault+0x263/0x6f9 page_fault+0x34/0x40 Reported by Kernel Concurrency Sanitizer on: CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Both the read and write are done only with the non-exclusive mmap_sem held. Since the read only check for a specific bit range (up to 3 bits) in the flag but the write here never change those 3 bits, so load tearing would be harmless here. Thus, just mark it as an intentional data races using the data_race() macro which is designed for those situations [1]. [1] https://lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/ Signed-off-by: Qian Cai <cai@lca.pw> --- v2: update the commit log. include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)