Message ID: 1580995070-25139-1-git-send-email-cai@lca.pw (mailing list archive)
State: New, archived
Series: mm: fix a data race in put_page()
On 06.02.20 14:17, Qian Cai wrote:
> page->flags could be accessed concurrently as noticed by KCSAN,
>
> BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page
>
> write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
>  page_cpupid_xchg_last+0x51/0x80
>  page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
>  wp_page_reuse+0x3e/0xc0
>  wp_page_reuse at mm/memory.c:2453
>  do_wp_page+0x472/0x7b0
>  do_wp_page at mm/memory.c:2798
>  __handle_mm_fault+0xcb0/0xd00
>  handle_pte_fault at mm/memory.c:4049
>  (inlined by) __handle_mm_fault at mm/memory.c:4163
>  handle_mm_fault+0xfc/0x2f0
>  handle_mm_fault at mm/memory.c:4200
>  do_page_fault+0x263/0x6f9
>  do_user_addr_fault at arch/x86/mm/fault.c:1465
>  (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
>  page_fault+0x34/0x40
>
> read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
>  put_page+0x15a/0x1f0
>  page_zonenum at include/linux/mm.h:923
>  (inlined by) is_zone_device_page at include/linux/mm.h:929
>  (inlined by) page_is_devmap_managed at include/linux/mm.h:948
>  (inlined by) put_page at include/linux/mm.h:1023
>  wp_page_copy+0x571/0x930
>  wp_page_copy at mm/memory.c:2615
>  do_wp_page+0x107/0x7b0
>  __handle_mm_fault+0xcb0/0xd00
>  handle_mm_fault+0xfc/0x2f0
>  do_page_fault+0x263/0x6f9
>  page_fault+0x34/0x40
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6
> Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
>
> Both the read and write are done only with the non-exclusive mmap_sem
> held. Since the read checks for specific bits (up to three bits for
> now) in the flags word, load tearing could in theory trigger a logic bug.
>
> To fix it, one could introduce put_page_lockless() in those places, but
> that could be overkill, and difficult to use. Thus, just add
> READ_ONCE() for the read in page_zonenum() for now, where it should not
> affect performance or correctness, with the small trade-off that
> compilers might generate less efficient code in some places.
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>  include/linux/mm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 52269e56c514..f8529aa971c0 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -920,7 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
>
>  static inline enum zone_type page_zonenum(const struct page *page)
>  {
> -	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> +	return (READ_ONCE(page->flags) >> ZONES_PGSHIFT) & ZONES_MASK;

I can understand why other bits/flags might change, but not the zone
number? Nobody should be changing that without heavy locking (out of
memory hot(un)plug code). Or am I missing something? Can load tearing
actually produce an issue if these 3 bits will never change?
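To make the access under discussion concrete, here is a minimal userspace model of the bit extraction page_zonenum() performs. The shift and mask values below are illustrative stand-ins; the real ZONES_PGSHIFT and ZONES_MASK depend on the kernel configuration (NUMA, sparsemem, number of zones).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only -- not the real kernel constants. */
#define ZONES_PGSHIFT 62
#define ZONES_MASK    0x3ULL

/*
 * Userspace model of page_zonenum(): the zone number occupies a few
 * high bits of page->flags. In the kernel, READ_ONCE() forces the
 * compiler to emit a single, untorn load of the whole flags word
 * before the shift-and-mask picks out the zone bits.
 */
static inline unsigned int zonenum_model(uint64_t flags)
{
    return (unsigned int)((flags >> ZONES_PGSHIFT) & ZONES_MASK);
}
```

The point of the report is that the whole word is loaded even though only these few bits are consumed, so KCSAN sees a word-sized plain read racing with a word-sized marked write.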
On Thu, 2020-02-06 at 14:33 +0100, David Hildenbrand wrote:
> On 06.02.20 14:17, Qian Cai wrote:
> > page->flags could be accessed concurrently as noticed by KCSAN,
> >
> > [snip: full KCSAN report and patch]
>
> I can understand why other bits/flags might change, but not the zone
> number? Nobody should be changing that without heavy locking (out of
> memory hot(un)plug code). Or am I missing something? Can load tearing
> actually produce an issue if these 3 bits will never change?

I think you are right that, in this case, the writer
page_cpupid_xchg_last() will not change those 3 bits. I don't have an
answer as to whether other writers might race here. Until someone can
chime in, probably from the ZONE_DEVICE side, to prove otherwise, I'll
just update the previous patch to mark it as a harmless data race.
On Thu 06-02-20 14:33:22, David Hildenbrand wrote:
> On 06.02.20 14:17, Qian Cai wrote:
> > page->flags could be accessed concurrently as noticed by KCSAN,
> >
> > [snip: full KCSAN report and patch]
>
> I can understand why other bits/flags might change, but not the zone
> number? Nobody should be changing that without heavy locking (out of
> memory hot(un)plug code). Or am I missing something? Can load tearing
> actually produce an issue if these 3 bits will never change?

I don't think the problem is real. The question is how to make KCSAN
happy in a way that doesn't silence other possibly useful things it can
find, and also which makes it most obvious to the reader what's going
on... IMHO, using READ_ONCE() fulfills these targets nicely: it is free
performance-wise in this case, it silences the checker without impacting
other races on page->flags, and it's kind of obvious we don't want the
load torn in this case, so it makes sense to the reader (although a
comment may be nice).

								Honza
On 06.02.20 15:55, Jan Kara wrote:
> On Thu 06-02-20 14:33:22, David Hildenbrand wrote:
>> On 06.02.20 14:17, Qian Cai wrote:
>>> [snip: full KCSAN report and patch]
>>
>> I can understand why other bits/flags might change, but not the zone
>> number? Nobody should be changing that without heavy locking (out of
>> memory hot(un)plug code). Or am I missing something? Can load tearing
>> actually produce an issue if these 3 bits will never change?
>
> I don't think the problem is real. The question is how to make KCSAN
> happy in a way that doesn't silence other possibly useful things it can
> find, and also which makes it most obvious to the reader what's going
> on... IMHO, using READ_ONCE() fulfills these targets nicely: it is free
> performance-wise in this case, it silences the checker without impacting
> other races on page->flags, and it's kind of obvious we don't want the
> load torn in this case, so it makes sense to the reader (although a
> comment may be nice).
Yes, I can understand that - but reading the other comments, I think I
am not the only one who doesn't like to see the code getting uglified
(sorry, but that's what it is) and made harder to read just to make some
code checker happy. Having said that, I don't really care. But I do
think that expressing "this is not possible, but only to make $TOOL
happy" in the commit message is worth having. The patch description
should be reworked.
> On Feb 6, 2020, at 9:55 AM, Jan Kara <jack@suse.cz> wrote:
>
> I don't think the problem is real. The question is how to make KCSAN
> happy in a way that doesn't silence other possibly useful things it can
> find, and also which makes it most obvious to the reader what's going
> on... IMHO, using READ_ONCE() fulfills these targets nicely: it is free
> performance-wise in this case, it silences the checker without impacting
> other races on page->flags, and it's kind of obvious we don't want the
> load torn in this case, so it makes sense to the reader (although a
> comment may be nice).

Actually, using the data_race() macro there would fulfill the same
purpose too, i.e., silence the splat here but still keep searching for
other races.
On 2/6/20 7:23 AM, Qian Cai wrote:
>> On Feb 6, 2020, at 9:55 AM, Jan Kara <jack@suse.cz> wrote:
>>
>> [snip]
>
> Actually, using the data_race() macro there would fulfill the same
> purpose too, i.e., silence the splat here but still keep searching for
> other races.

Yes, but both READ_ONCE() and data_race() would be saying untrue things
about this code, and that somewhat offends my sense of perfection... :)

* READ_ONCE(): this field need not be restricted to being read only
  once, so the name is immediately wrong. We're using side effects of
  READ_ONCE().

* data_race(): there is no race on the N bits worth of page zone number
  data. There is only a perceived race, due to tools that look at
  word-level granularity.

I'd propose one or both of the following:

a) Hope that Marcus has an idea to enhance KCSAN so as to support this
   model of access, and/or

b) Add a new, better-named macro to indicate what's going on. Initial
   bikeshed-able candidates:

   READ_RO_BITS()
   READ_IMMUTABLE_BITS()
   ...etc...

thanks,
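For what it's worth, one hypothetical shape for such a macro is sketched below. This is not an existing kernel API, just an illustration of the idea being proposed; READ_ONCE() is modeled here with a plain volatile access, and the name and parameters are pure bikeshed material.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's READ_ONCE(). */
#define READ_ONCE_MODEL(x) (*(const volatile __typeof__(x) *)&(x))

/*
 * Hypothetical READ_IMMUTABLE_BITS(): read a word whose *other* bits
 * may see concurrent writes, keeping only bits that never change after
 * initialization. The single marked load is what keeps the extraction
 * tear-free; the name documents that the kept bits are immutable.
 */
#define READ_IMMUTABLE_BITS(x, shift, mask) \
    ((READ_ONCE_MODEL(x) >> (shift)) & (mask))
```

Under this shape, page_zonenum() would become `READ_IMMUTABLE_BITS(page->flags, ZONES_PGSHIFT, ZONES_MASK)`, which reads as a statement of intent rather than a side effect of READ_ONCE().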
On 2/6/20 3:34 PM, John Hubbard wrote:
> On 2/6/20 7:23 AM, Qian Cai wrote:
> [snip]
>
> I'd propose one or both of the following:
>
> a) Hope that Marcus has an idea to enhance KCSAN so as to support this
>    model of access, and/or

arggh, make that "Marco Elver"! Really sorry for the typo on your name.

thanks,
> On Feb 6, 2020, at 6:34 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>
> * data_race(): there is no race on the N bits worth of page zone number
>   data. There is only a perceived race, due to tools that look at
>   word-level granularity.

I'd like to think about this from the compiler's level. There is a data
race from the compiler's load-and-store point of view. It is just that,
in this case, it is harmless to the actual code logic.
> On Feb 6, 2020, at 6:34 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>
> [snip]
>
> a) Hope that Marcus has an idea to enhance KCSAN so as to support this
>    model of access, and/or

A similar thing was brought up before, i.e., anything compared to zero
is immune to load-tearing issues, but it is rather difficult to
implement in the compiler, so it was settled to use data_race(),

https://lore.kernel.org/lkml/CANpmjNN8J1oWtLPHTgCwbbtTuU_Js-8HD=cozW5cYkm8h-GTBg@mail.gmail.com/#r

> b) Add a new, better-named macro to indicate what's going on. Initial
>    bikeshed-able candidates:
>
>    READ_RO_BITS()
>    READ_IMMUTABLE_BITS()
>    ...etc...

Actually, Linus might hate those kinds of complications rather than a
simple data_race() macro,

https://lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/
On 2/6/20 4:18 PM, Qian Cai wrote:
>> On Feb 6, 2020, at 6:34 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>> [snip]
>>
>> a) Hope that Marco (I've fixed the typo in his name. --jh) has an idea
>>    to enhance KCSAN so as to support this model of access, and/or
>
> A similar thing was brought up before, i.e., anything compared to zero
> is immune to load-tearing issues, but it is rather difficult to
> implement in the compiler, so it was settled to use data_race(),
>
> https://lore.kernel.org/lkml/CANpmjNN8J1oWtLPHTgCwbbtTuU_Js-8HD=cozW5cYkm8h-GTBg@mail.gmail.com/#r

Thanks for that link to the previous discussion, good context.

>> b) Add a new, better-named macro to indicate what's going on. Initial
>>    bikeshed-able candidates:
>>
>>    READ_RO_BITS()
>>    READ_IMMUTABLE_BITS()
>>    ...etc...
>
> Actually, Linus might hate those kinds of complications rather than a
> simple data_race() macro,
>
> https://lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/

Another good link. However, my macros above haven't been proposed yet,
and I'm perfectly comfortable proposing something that Linus *might* (or
might not!) hate. No point in guessing about it, IMHO.

If you want, I'll be happy to put on my flame suit and post a patchset
proposing READ_IMMUTABLE_BITS() (or a better-named thing, if someone has
another name idea). :)

thanks,
> On Feb 6, 2020, at 7:27 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>
> [snip]
>
> If you want, I'll be happy to put on my flame suit and post a patchset
> proposing READ_IMMUTABLE_BITS() (or a better-named thing, if someone
> has another name idea). :)

BTW, the current comment says (note that it speaks of "accesses": in
this case the whole word is read, even though only those 3 bits are of
interest to us),

/*
 * data_race(): macro to document that accesses in an expression may
 * conflict with other concurrent accesses resulting in data races, but
 * the resulting behaviour is deemed safe regardless.
 *
 * This macro *does not* affect normal code generation, but is a hint to
 * tooling that data races here should be ignored.
 */

Marco might have more to say.
On Fri, 7 Feb 2020 at 01:55, Qian Cai <cai@lca.pw> wrote:
> [snip]

From the other thread:

On Fri, 7 Feb 2020 at 00:18, John Hubbard <jhubbard@nvidia.com> wrote:
>
> Yes. I'm grasping at straws now, but... what about the idiom that
> page_zonenum() uses: a set of bits that are "always" (after a certain
> early point) read-only? What are your thoughts on that?

Without annotations it's hard to tell. The problem is that the compiler
can still emit a word-sized load, even if you're just checking 1 bit.
The instrumentation emitted for KCSAN only cares about loads/stores,
where the access size is in number of bytes, not bits, since that's what
the compiler has to emit. So, strictly speaking, these are data races:
concurrent reads/writes where at least one access is plain.

With the above caveat out of the way, we already have the following
defaults in KCSAN (after similar requests):

1. KCSAN ignores same-value stores, i.e. races with writes that appear
   to write the same value do not result in data race reports.

2. KCSAN does not demand that aligned writes (including things like
   'var++' if there are no concurrent writers) up to word size be marked
   (with READ_ONCE etc.), assuming there is no way for the compiler to
   screw these up. [I still recommend writes be marked though, if at all
   possible, because I'm still not entirely convinced it's always safe!]

So, because of (2), KCSAN will not complain if you have something like
'flags |= SOME_FLAG' (where the read is marked). Because of (1), it'll
still complain about 'flags & SOME_FLAG', though, since the load is not
marked, and KCSAN only sees a word-sized access (assuming flags is a
long) where a bit changed.

I don't think it's possible to easily convey to KCSAN which bits of an
access you only care about, so that we could extend (1). Ideas?
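A sketch of the two access patterns described above, for readers following along. SOME_FLAG, the helper names, and the READ_ONCE model are illustrative, not kernel code:

```c
#include <assert.h>

#define SOME_FLAG (1UL << 5)
#define READ_ONCE_MODEL(x) (*(const volatile __typeof__(x) *)&(x))

/*
 * Rule (2): an aligned, word-sized read-modify-write whose load side is
 * marked. KCSAN does not demand further annotation for this pattern.
 */
static void set_flag(unsigned long *flags)
{
    *flags = READ_ONCE_MODEL(*flags) | SOME_FLAG;
}

/*
 * Rule (1) does not help here: this plain word-sized load still gets
 * reported, because KCSAN only sees that *some* bit of the word changed
 * concurrently, not that the caller tests a single bit of it.
 */
static int test_flag_plain(const unsigned long *flags)
{
    return (*flags & SOME_FLAG) != 0;
}
```

Both functions touch the same word; the difference KCSAN cares about is purely whether the load is marked.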
> >> A similar thing was brought up before, i.e., anything compared to zero is immune to load-tearing > >> issues, but it is rather difficult to implement it in the compiler, so it was settled to use data_race(), > >> > >> https://lore.kernel.org/lkml/CANpmjNN8J1oWtLPHTgCwbbtTuU_Js-8HD=cozW5cYkm8h-GTBg@mail.gmail.com/#r > >> > > > > Thanks for that link to the previous discussion, good context. > > > >>> > >>> b) Add a new, better-named macro to indicate what's going on. Initial bikeshed-able > >>> candidates: > >>> > >>> READ_RO_BITS() > >>> READ_IMMUTABLE_BITS() > >>> ...etc... > >>> This could work, but 'READ_BITS()' is enough, if KCSAN's same-value filter is default on anyway. Although my preference is also to avoid more macros if possible. > >> Actually, Linus might hate those kinds of complication rather than a simple data_race() macro, > >> > >> https://lore.kernel.org/linux-fsdevel/CAHk-=wg5CkOEF8DTez1Qu0XTEFw_oHhxN98bDnFqbY7HL5AB2g@mail.gmail.com/ > >> > > > > Another good link. However, my macros above haven't been proposed yet, and I'm perfectly > > comfortable proposing something that Linus *might* (or might not!) hate. No point in > > guessing about it, IMHO. > > > > If you want, I'll be happy to put on my flame suit and post a patchset proposing > > READ_IMMUTABLE_BITS() (or a better-named thing, if someone has another name idea). :) > > > > BTW, the current comment said (note, it is called “access” which in this case it does read the whole word > rather than those 3 bits, even though it is only those bits are of interested for us), > > /* > * data_race(): macro to document that accesses in an expression may conflict with > * other concurrent accesses resulting in data races, but the resulting > * behaviour is deemed safe regardless. > * > * This macro *does not* affect normal code generation, but is a hint to tooling > * that data races here should be ignored. > */ > > Macro might have more to say. 
I agree that data_race() would be the most suitable here, since technically
it's still a data race. It just so happens that we end up "throwing away" most
of the bits of the access, and just care about a few bits.

Thanks,
-- Marco
On 2/7/20 5:17 AM, Marco Elver wrote:
...
>> Yes. I'm grasping at straws now, but...what about the idiom that page_zonenum()
>> uses: a set of bits that are "always" (after a certain early point) read-only?
>> What are your thoughts on that?
>
> Without annotations it's hard to tell. The problem is that the
> compiler can still emit a word-sized load, even if you're just
> checking 1 bit. The instrumentation emitted for KCSAN only cares about
> loads/stores, where access size is in number of bytes and not bits,
> since that's what the compiler has to emit. So, strictly speaking
> these are data races: concurrent reads / writes where at least one
> access is plain.
>
> With the above caveat out of the way, we already have the following
> defaults in KCSAN (after similar requests):
> 1. KCSAN ignores same-value stores, i.e. races with writes that appear
> to write the same value do not result in data race reports.
> 2. KCSAN does not demand aligned writes (including things like 'var++'
> if there are no concurrent writers) up to word size to be marked (with
> READ_ONCE etc.), assuming there is no way for the compiler to screw
> these up. [I still recommend writes to be marked though, if at all
> possible, because I'm still not entirely convinced it's always safe!]
>
> So, because of (2), KCSAN will not complain if you have something like
> 'flags |= SOME_FLAG' (where the read is marked). Because of (1), it'll
> still complain about 'flags & SOME_FLAG' though, since the load is not
> marked, and only sees this is a word-sized access (assuming flags is a
> long) where a bit changed.
>
> I don't think it's possible to easily convey to KCSAN which bits of an
> access you only care about, so that we could extend (1). Ideas?

I'm drawing a blank as far as easy ways forward, so I'm just going to accept
the compiler word-level constraints as a "given". I was hoping maybe you had
some magic available, just checking.
:)

> [...]
>
> This could work, but 'READ_BITS()' is enough, if KCSAN's same-value
> filter is default on anyway. Although my preference is also to avoid
> more macros if possible.

So it looks like we're probably stuck with having to annotate the code. Given
that, there is a balance between how many macros, and how much commenting. For
example, if there is a single macro (data_race, for example), then we'll need to
add comments for the various cases, explaining which data_race situation is
happening.

That's still true, but to a lesser extent if more macros are added. In this
case, I suspect that READ_BITS() makes the commenting easier and shorter. So
I'd tentatively lean toward adding it, but what do others on the list think?

thanks,
> On Feb 8, 2020, at 8:44 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>
> So it looks like we're probably stuck with having to annotate the code. Given
> that, there is a balance between how many macros, and how much commenting. For
> example, if there is a single macro (data_race, for example), then we'll need to
> add comments for the various cases, explaining which data_race situation is
> happening.

On the other hand, it is perfectly fine not to comment on each data_race();
most of the time, people can run git blame to learn more details. Actually, no
maintainers from various subsystems have asked for comments so far.

> That's still true, but to a lesser extent if more macros are added. In this case,
> I suspect that READ_BITS() makes the commenting easier and shorter. So I'd tentatively
> lean toward adding it, but what do others on the list think?

Even reading bits could be dangerous in the presence of data races, and
confusing at best, so I am not really sure what the value of introducing this
new macro would be. People who want to understand the code correctly still need
to read the commit logs.

This flags->zonenum case is such a special one that I have not seen it come up
regularly over the last few weeks of digging through KCSAN reports, so even if
READ_BITS() is worth adding, there are more, equally important macros that would
need to be added together with it to be useful initially. For example,
HARMLESS_COUNTERS(), READ_SINGLE_BIT(), READ_IMMUTABLE_BITS() etc., which is
exactly the kind of proliferation Linus said he wanted to avoid.
On 2/8/20 7:10 PM, Qian Cai wrote:
>
>> On Feb 8, 2020, at 8:44 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> So it looks like we're probably stuck with having to annotate the code. Given
>> that, there is a balance between how many macros, and how much commenting. For
>> example, if there is a single macro (data_race, for example), then we'll need to
>> add comments for the various cases, explaining which data_race situation is
>> happening.
>
> On the other hand, it is perfect fine of not commenting on each data_race() that most of times, people could run git blame to learn more details. Actually, no maintainers from various of subsystems asked for commenting so far.

Well, maybe I'm looking at this wrong. I was thinking that one should attempt to
understand the code on the screen, and that's generally best--but here, maybe
"data_race" is just something that means "tool cruft", really. So mentally we
would move toward visually filtering out the data_race "key word".

I really don't like it but at least there is a significant benefit from the tool
that probably makes it worth the visual noise.

Blue sky thoughts for The Far Future: It would be nice if the tools got a lot
better--maybe in the direction of C language extensions, even if only used in
this project at first.

thanks,
On Sun, 9 Feb 2020 at 08:15, John Hubbard <jhubbard@nvidia.com> wrote: > > On 2/8/20 7:10 PM, Qian Cai wrote: > > > > > >> On Feb 8, 2020, at 8:44 PM, John Hubbard <jhubbard@nvidia.com> wrote: > >> > >> So it looks like we're probably stuck with having to annotate the code. Given > >> that, there is a balance between how many macros, and how much commenting. For > >> example, if there is a single macro (data_race, for example), then we'll need to > >> add comments for the various cases, explaining which data_race situation is > >> happening. > > > > On the other hand, it is perfect fine of not commenting on each data_race() that most of times, people could run git blame to learn more details. Actually, no maintainers from various of subsystems asked for commenting so far. > > > > Well, maybe I'm looking at this wrong. I was thinking that one should attempt to > understand the code on the screen, and that's generally best--but here, maybe > "data_race" is just something that means "tool cruft", really. So mentally we > would move toward visually filtering out the data_race "key word". One thing to note is that 'data_race()' points out concurrency, and that somebody has deemed that the code won't break even with data races. Somebody trying to understand or modify the code should ensure this will still be the case. So, 'data_race()' isn't just tool cruft. It's documentation for something that really isn't obvious from the code alone. Whenever we see a READ_ONCE or other marked access it is obvious to the reader that there are concurrent accesses happening. I'd argue that for intentional data races, we should convey similar information, to avoid breaking the code (of course KCSAN would tell you, but only after the change was done). Even moreso, since changes to code involving 'data_race()' will need re-verification that the data races are still safe. 
> I really don't like it but at least there is a significant benefit from the tool
> that probably makes it worth the visual noise.
>
> Blue sky thoughts for The Far Future: It would be nice if the tools got a lot
> better--maybe in the direction of C language extensions, even if only used in
> this project at first.

Still thinking about this. What we want to convey is that, while there are
races on the particular variable, nobody should be modifying the bits here.
Adding a READ_ONCE (or data_race()) would miss a harmful race where somebody
modifies these bits, so in principle I agree. However, I think the tool can't
automatically tell (even if we had compiler extensions to give us the bits
accessed) which bits we care about, because we might have something like:

  int foo_bar = READ_ONCE(flags) >> FOO_BAR_SHIFT; // need the READ_ONCE because of FOO bits
  .. (foo_bar & FOO_MASK) ..  // FOO bits can be modified concurrently
  .. (foo_bar & BAR_MASK) ..  // nobody should modify BAR bits concurrently though!

What we want is to assert that nobody touches a particular set of bits. KCSAN
has recently gotten ASSERT_EXCLUSIVE_{WRITER,ACCESS} macros which help assert
properties of concurrent code, where bugs won't manifest as data races. Along
those lines, I can see the value in doing an exclusivity check on a bitmask of
a variable. I don't know how much a READ_BITS macro could help, since it's
probably less ergonomic to have to say something like:
READ_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT) >> ZONES_PGSHIFT.

Here is an alternative. Let's say KCSAN gives you this:

/* ... Assert that the bits set in mask are not written concurrently;
 * they may still be read concurrently.
 * The access that immediately follows is assumed to access those
 * bits and safe w.r.t. data races.
 *
 * For example, this may be used when certain bits of @flags may
 * only be modified when holding the appropriate lock,
 * but other bits may still be modified locklessly.
 * ... */
#define ASSERT_EXCLUSIVE_BITS(flags, mask) ....

Then we can write page_zonenum as follows:

static inline enum zone_type page_zonenum(const struct page *page)
{
+	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
}

This will accomplish the following:
1. The current code is not touched, and we do not have to verify that
   the change is correct without KCSAN.
2. We're not introducing a bunch of special macros to read bits in various ways.
3. KCSAN will assume that the access is safe, and no data race report
   is generated.
4. If somebody modifies ZONES bits concurrently, KCSAN will tell you
   about the race.
5. We're documenting the code.

Anything I missed?

Thanks,
-- Marco

> thanks,
> --
> John Hubbard
> NVIDIA
>
> >> That's still true, but to a lesser extent if more macros are added. In this case,
> >> I suspect that READ_BITS() makes the commenting easier and shorter. So I'd tentatively
> >> lead towards adding it, but what do others on the list think?
> >
> > Even read bits could be dangerous from data races and confusing at best, so I am not really sure what the value of introducing this new macro. People who like to understand it correctly still need to read the commit logs.
> >
> > This flags->zonenum is such a special case that I don't really see it regularly for the last few weeks digging KCSAN reports, so even if it is worth adding READ_BITS(), there are more equally important macros need to be added together to be useful initially. For example, HARMLESS_COUNTERS(), READ_SINGLE_BIT(), READ_IMMUTATABLE_BITS() etc which Linus said exactly wanted to avoid.
> On Feb 10, 2020, at 2:48 AM, Marco Elver <elver@google.com> wrote:
>
> Here is an alternative. Let's say KCSAN gives you this:
>
> /* ... Assert that the bits set in mask are not written
>  * concurrently; they may still be read concurrently.
>  * The access that immediately follows is assumed to access those
>  * bits and safe w.r.t. data races.
>  *
>  * For example, this may be used when certain bits of @flags may
>  * only be modified when holding the appropriate lock,
>  * but other bits may still be modified locklessly.
>  * ... */
> #define ASSERT_EXCLUSIVE_BITS(flags, mask) ....
>
> Then we can write page_zonenum as follows:
>
> static inline enum zone_type page_zonenum(const struct page *page)
> {
> +	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> 	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> }
>
> This will accomplish the following:
> 1. The current code is not touched, and we do not have to verify that
>    the change is correct without KCSAN.
> 2. We're not introducing a bunch of special macros to read bits in various ways.
> 3. KCSAN will assume that the access is safe, and no data race report
>    is generated.
> 4. If somebody modifies ZONES bits concurrently, KCSAN will tell you
>    about the race.
> 5. We're documenting the code.
>
> Anything I missed?

I don't know. Having to write the same line twice does not feel any better to
me than data_race() with the occasional comment.
On Mon, 10 Feb 2020 at 13:16, Qian Cai <cai@lca.pw> wrote:
>
> > On Feb 10, 2020, at 2:48 AM, Marco Elver <elver@google.com> wrote:
> >
> > [...]
> >
> > Anything I missed?
>
> I don't know. Having to write the same line twice does not feel any better to me than data_race() with the occasional comment.

Point 4 above: While data_race() will cause KCSAN to not report the data race,
now you might be missing a real bug: if somebody concurrently modifies the bits
accessed, you want to know about it! Either way, it's up to you to add the
ASSERT_EXCLUSIVE_BITS, but just remember that if you decide to silence it with
data_race(), you need to be sure there are no concurrent writers to those bits.
There is no way to automatically infer, all over the kernel, which bits we care
about, and the most reliable approach is to be explicit about it. I don't see a
problem with it per se.

Thanks,
-- Marco
On Mon, 2020-02-10 at 13:58 +0100, Marco Elver wrote:
> On Mon, 10 Feb 2020 at 13:16, Qian Cai <cai@lca.pw> wrote:
> >
> > [...]
> >
> > I don't know. Having to write the same line twice does not feel any better to me than data_race() with the occasional comment.
>
> Point 4 above: While data_race() will cause KCSAN to not report the
> data race, now you might be missing a real bug: if somebody
> concurrently modifies the bits accessed, you want to know about it!
> Either way, it's up to you to add the ASSERT_EXCLUSIVE_BITS, but just
> remember that if you decide to silence it with data_race(), you need
> to be sure there are no concurrent writers to those bits.

Right, in this case there are no concurrent writers to those bits, so adding a
comment should be sufficient. However, I'll keep ASSERT_EXCLUSIVE_BITS() in
mind for other places.

> There is no way to automatically infer all over the kernel which bits
> we care about, and the most reliable is to be explicit about it. I
> don't see a problem with it per se.
>
> Thanks,
> -- Marco
On Mon, 10 Feb 2020 at 14:36, Qian Cai <cai@lca.pw> wrote:
>
> On Mon, 2020-02-10 at 13:58 +0100, Marco Elver wrote:
> >
> > [...]
> >
> > Either way, it's up to you to add the ASSERT_EXCLUSIVE_BITS, but just
> > remember that if you decide to silence it with data_race(), you need
> > to be sure there are no concurrent writers to those bits.
>
> Right, in this case there are no concurrent writers to those bits, so adding a
> comment should be sufficient. However, I'll keep ASSERT_EXCLUSIVE_BITS() in
> mind for other places.

Right now there are no concurrent writers to those bits. But somebody might
introduce a bug that will write them, even though they shouldn't have. With
ASSERT_EXCLUSIVE_BITS() you can catch that. Once I have the patches for this
out, I would consider adding it here for this reason.

> > There is no way to automatically infer all over the kernel which bits
> > we care about, and the most reliable is to be explicit about it. I
> > don't see a problem with it per se.
> >
> > Thanks,
> > -- Marco
On Mon, 2020-02-10 at 14:38 +0100, Marco Elver wrote:
> On Mon, 10 Feb 2020 at 14:36, Qian Cai <cai@lca.pw> wrote:
> >
> > [...]
>
> Right now there are no concurrent writers to those bits. But somebody
> might introduce a bug that will write them, even though they shouldn't
> have. With ASSERT_EXCLUSIVE_BITS() you can catch that. Once I have the
> patches for this out, I would consider adding it here for this reason.

Surely, we could add many of those to catch theoretical issues. I can think of
more, like ASSERT_HARMLESS_COUNTERS(), because of the worry that one day
someone might change the code from using counters for printing information to
making important MM heuristic decisions. Then, we might end up in the
too-many-macros situation again. The list goes on: ASSERT_COMPARE_ZERO_NOLOOP(),
ASSERT_SINGLE_BIT(), etc.

On the other hand, maybe we should take the more pragmatic approach that if
there is strong evidence that developers could easily make mistakes in a
certain place, then we add a new macro there, so that the next time Joe
Developer wants a new macro, he/she has to provide the same strong
justification?

> > There is no way to automatically infer all over the kernel which bits
> > we care about, and the most reliable is to be explicit about it. I
> > don't see a problem with it per se.
> >
> > Thanks,
> > -- Marco
On Mon, 10 Feb 2020 at 14:55, Qian Cai <cai@lca.pw> wrote:
>
> On Mon, 2020-02-10 at 14:38 +0100, Marco Elver wrote:
> >
> > [...]
> > > > > > > > > > I don’t know. Having to write the same line twice does not feel me any better than data_race() with commenting occasionally. > > > > > > > > Point 4 above: While data_race() will ignore cause KCSAN to not report > > > > the data race, now you might be missing a real bug: if somebody > > > > concurrently modifies the bits accessed, you want to know about it! > > > > Either way, it's up to you to add the ASSERT_EXCLUSIVE_BITS, but just > > > > remember that if you decide to silence it with data_race(), you need > > > > to be sure there are no concurrent writers to those bits. > > > > > > Right, in this case, there is no concurrent writers to those bits, so I'll add a > > > comment should be sufficient. However, I'll keep ASSERT_EXCLUSIVE_BITS() in mind > > > for other places. > > > > Right now there are no concurrent writers to those bits. But somebody > > might introduce a bug that will write them, even though they shouldn't > > have. With ASSERT_EXCLUSIVE_BITS() you can catch that. Once I have the > > patches for this out, I would consider adding it here for this reason. > > Surely, we could add many of those to catch theoretical issues. I can think of > more like ASSERT_HARMLESS_COUNTERS() because the worry about one day someone > might change the code to use counters from printing out information to making > important MM heuristic decisions. Then, we might end up with those too many > macros situation again. The list goes on, ASSERT_COMPARE_ZERO_NOLOOP(), > ASSERT_SINGLE_BIT() etc. I'm sorry, but the above don't assert any quantifiable properties in the code. What we want is to be able to catch bugs that violate the *current* properties of the code *today*. A very real property of the code *today* is that nobody should modify zonenum without taking a lock. If you mark the access here, there is no tool that can help you. I'm trying to change that. 
The fact that we have bits that can be modified locklessly and some that can't is an inconvenience, but can be solved. Makes sense? Thanks, -- Marco > On the other hand, maybe to take a more pragmatic approach that if there are > strong evidences that developers could easily make mistakes in a certain place, > then we could add a new macro, so the next time Joe developer wants to a new > macro, he/she has to provide the same strong justifications? > > > > > > > > > > > There is no way to automatically infer all over the kernel which bits > > > > we care about, and the most reliable is to be explicit about it. I > > > > don't see a problem with it per se. > > > > > > > > Thanks, > > > > -- Marco
On Mon, 2020-02-10 at 15:12 +0100, Marco Elver wrote:
> On Mon, 10 Feb 2020 at 14:55, Qian Cai <cai@lca.pw> wrote:
> >
> > [...]
> >
> > Surely, we could add many of those to catch theoretical issues. I can think of
> > more like ASSERT_HARMLESS_COUNTERS() because the worry about one day someone
> > might change the code to use counters from printing out information to making
> > important MM heuristic decisions. Then, we might end up with those too many
> > macros situation again. The list goes on, ASSERT_COMPARE_ZERO_NOLOOP(),
> > ASSERT_SINGLE_BIT() etc.
>
> I'm sorry, but the above don't assert any quantifiable properties in the code.
>
> What we want is to be able to catch bugs that violate the *current*
> properties of the code *today*.
A very real property of the code > *today* is that nobody should modify zonenum without taking a lock. If > you mark the access here, there is no tool that can help you. I'm > trying to change that. > > The fact that we have bits that can be modified locklessly and some > that can't is an inconvenience, but can be solved. > > Makes sense? OK, go ahead adding it if you really feel like. I'd hope this is not the Pandora's box where people will eventually find more way to assert quantifiable properties in the code only to address theoretical issues... > > Thanks, > -- Marco > > > On the other hand, maybe to take a more pragmatic approach that if there are > > strong evidences that developers could easily make mistakes in a certain place, > > then we could add a new macro, so the next time Joe developer wants to a new > > macro, he/she has to provide the same strong justifications? > > > > > > > > > > > > > > > There is no way to automatically infer all over the kernel which bits > > > > > we care about, and the most reliable is to be explicit about it. I > > > > > don't see a problem with it per se. > > > > > > > > > > Thanks, > > > > > -- Marco
On Mon, 2020-02-10 at 08:48 +0100, Marco Elver wrote:
> On Sun, 9 Feb 2020 at 08:15, John Hubbard <jhubbard@nvidia.com> wrote:
> >
> > On 2/8/20 7:10 PM, Qian Cai wrote:
> > >
> > > > On Feb 8, 2020, at 8:44 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> > > >
> > > > So it looks like we're probably stuck with having to annotate the code. Given
> > > > that, there is a balance between how many macros, and how much commenting. For
> > > > example, if there is a single macro (data_race, for example), then we'll need to
> > > > add comments for the various cases, explaining which data_race situation is
> > > > happening.
> > >
> > > On the other hand, it is perfectly fine to not comment on each data_race();
> > > most of the time, people could run git blame to learn more details. Actually,
> > > no maintainers from various subsystems have asked for commenting so far.
> >
> > Well, maybe I'm looking at this wrong. I was thinking that one should attempt to
> > understand the code on the screen, and that's generally best--but here, maybe
> > "data_race" is just something that means "tool cruft", really. So mentally we
> > would move toward visually filtering out the data_race "key word".
>
> One thing to note is that 'data_race()' points out concurrency, and
> that somebody has deemed that the code won't break even with data
> races. Somebody trying to understand or modify the code should ensure
> this will still be the case. So, 'data_race()' isn't just tool cruft.
> It's documentation for something that really isn't obvious from the
> code alone.
>
> Whenever we see a READ_ONCE or other marked access it is obvious to
> the reader that there are concurrent accesses happening. I'd argue
> that for intentional data races, we should convey similar information,
> to avoid breaking the code (of course KCSAN would tell you, but only
> after the change was done). Even more so, since changes to code
> involving 'data_race()' will need re-verification that the data races
> are still safe.
>
> > I really don't like it but at least there is a significant benefit from the tool
> > that probably makes it worth the visual noise.
> >
> > Blue sky thoughts for The Far Future: It would be nice if the tools got a lot
> > better--maybe in the direction of C language extensions, even if only used in
> > this project at first.
>
> Still thinking about this. What we want to convey is that, while
> there are races on the particular variable, nobody should be modifying
> the bits here. Adding a READ_ONCE (or data_race()) would miss a
> harmful race where somebody modifies these bits, so in principle I
> agree. However, I think the tool can't automatically tell (even if we
> had compiler extensions to give us the bits accessed) which bits we
> care about, because we might have something like:
>
> int foo_bar = READ_ONCE(flags) >> FOO_BAR_SHIFT; // need the READ_ONCE because of FOO bits
> .. (foo_bar & FOO_MASK) .. // FOO bits can be modified concurrently
> .. (foo_bar & BAR_MASK) .. // nobody should modify BAR bits concurrently though!
>
> What we want is to assert that nobody touches a particular set of
> bits. KCSAN has recently gotten ASSERT_EXCLUSIVE_{WRITER,ACCESS}
> macros which help assert properties of concurrent code, where bugs
> won't manifest as data races. Along those lines, I can see the value
> in doing an exclusivity check on a bitmask of a variable.
>
> I don't know how much a READ_BITS macro could help, since it's
> probably less ergonomic to have to say something like:
> READ_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT) >> ZONES_PGSHIFT.
>
> Here is an alternative:
>
> Let's say KCSAN gives you this:
> /* ... Assert that the bits set in mask are not written
> concurrently; they may still be read concurrently.
> The access that immediately follows is assumed to access those
> bits and is safe w.r.t. data races.
>
> For example, this may be used when certain bits of @flags may
> only be modified when holding the appropriate lock,
> but other bits may still be modified locklessly.
> ...
> */
> #define ASSERT_EXCLUSIVE_BITS(flags, mask) ....
>
> Then we can write page_zonenum as follows:
>
> static inline enum zone_type page_zonenum(const struct page *page)
> {
> +	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> 	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> }

Actually, it seems we still need to write it like this, if I understand correctly:

	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
	return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);

On the other hand, if you really worry that this thing could go wrong, it might be better to use READ_ONCE() in the first place, where it will be more future-proof, with the trade-off that it might generate less efficient code in some places?

Alternatively, is there a way to write it as this?

	return ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);

Kind of ugly but it's probably cleaner.

>
> This will accomplish the following:
> 1. The current code is not touched, and we do not have to verify that
> the change is correct without KCSAN.
> 2. We're not introducing a bunch of special macros to read bits in various ways.
> 3. KCSAN will assume that the access is safe, and no data race report
> is generated.
> 4. If somebody modifies ZONES bits concurrently, KCSAN will tell you
> about the race.
> 5. We're documenting the code.
>
> Anything I missed?
>
> Thanks,
> -- Marco
>
> > thanks,
> > --
> > John Hubbard
> > NVIDIA
> >
> > > > That's still true, but to a lesser extent if more macros are added. In this case,
> > > > I suspect that READ_BITS() makes the commenting easier and shorter. So I'd tentatively
> > > > lean towards adding it, but what do others on the list think?
> > >
> > > Even read bits could be dangerous from data races and confusing at best, so I
> > > am not really sure what the value of introducing this new macro is. People who
> > > like to understand it correctly still need to read the commit logs.
> > >
> > > This flags->zonenum is such a special case that I don’t really see it regularly
> > > for the last few weeks digging KCSAN reports, so even if it is worth adding
> > > READ_BITS(), there are more equally important macros that need to be added
> > > together to be useful initially. For example, HARMLESS_COUNTERS(),
> > > READ_SINGLE_BIT(), READ_IMMUTATABLE_BITS(), etc., which is exactly what Linus
> > > said he wanted to avoid.
> > >
On Mon, 10 Feb 2020 at 17:23, Qian Cai <cai@lca.pw> wrote:
>
> On Mon, 2020-02-10 at 08:48 +0100, Marco Elver wrote:
> > On Sun, 9 Feb 2020 at 08:15, John Hubbard <jhubbard@nvidia.com> wrote:
> > >
> > > On 2/8/20 7:10 PM, Qian Cai wrote:
> > > >
> > > > > On Feb 8, 2020, at 8:44 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> > > > >
> > > > > So it looks like we're probably stuck with having to annotate the code. Given
> > > > > that, there is a balance between how many macros, and how much commenting. For
> > > > > example, if there is a single macro (data_race, for example), then we'll need to
> > > > > add comments for the various cases, explaining which data_race situation is
> > > > > happening.
> > > >
> > > > On the other hand, it is perfectly fine to not comment on each data_race();
> > > > most of the time, people could run git blame to learn more details. Actually,
> > > > no maintainers from various subsystems have asked for commenting so far.
> > >
> > > Well, maybe I'm looking at this wrong. I was thinking that one should attempt to
> > > understand the code on the screen, and that's generally best--but here, maybe
> > > "data_race" is just something that means "tool cruft", really. So mentally we
> > > would move toward visually filtering out the data_race "key word".
> >
> > One thing to note is that 'data_race()' points out concurrency, and
> > that somebody has deemed that the code won't break even with data
> > races. Somebody trying to understand or modify the code should ensure
> > this will still be the case. So, 'data_race()' isn't just tool cruft.
> > It's documentation for something that really isn't obvious from the
> > code alone.
> >
> > Whenever we see a READ_ONCE or other marked access it is obvious to
> > the reader that there are concurrent accesses happening. I'd argue
> > that for intentional data races, we should convey similar information,
> > to avoid breaking the code (of course KCSAN would tell you, but only
> > after the change was done). Even more so, since changes to code
> > involving 'data_race()' will need re-verification that the data races
> > are still safe.
> >
> > > I really don't like it but at least there is a significant benefit from the tool
> > > that probably makes it worth the visual noise.
> > >
> > > Blue sky thoughts for The Far Future: It would be nice if the tools got a lot
> > > better--maybe in the direction of C language extensions, even if only used in
> > > this project at first.
> >
> > Still thinking about this. What we want to convey is that, while
> > there are races on the particular variable, nobody should be modifying
> > the bits here. Adding a READ_ONCE (or data_race()) would miss a
> > harmful race where somebody modifies these bits, so in principle I
> > agree. However, I think the tool can't automatically tell (even if we
> > had compiler extensions to give us the bits accessed) which bits we
> > care about, because we might have something like:
> >
> > int foo_bar = READ_ONCE(flags) >> FOO_BAR_SHIFT; // need the READ_ONCE because of FOO bits
> > .. (foo_bar & FOO_MASK) .. // FOO bits can be modified concurrently
> > .. (foo_bar & BAR_MASK) .. // nobody should modify BAR bits concurrently though!
> >
> > What we want is to assert that nobody touches a particular set of
> > bits. KCSAN has recently gotten ASSERT_EXCLUSIVE_{WRITER,ACCESS}
> > macros which help assert properties of concurrent code, where bugs
> > won't manifest as data races. Along those lines, I can see the value
> > in doing an exclusivity check on a bitmask of a variable.
> >
> > I don't know how much a READ_BITS macro could help, since it's
> > probably less ergonomic to have to say something like:
> > READ_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT) >> ZONES_PGSHIFT.
> >
> > Here is an alternative:
> >
> > Let's say KCSAN gives you this:
> > /* ... Assert that the bits set in mask are not written
> > concurrently; they may still be read concurrently.
> > The access that immediately follows is assumed to access those
> > bits and is safe w.r.t. data races.
> >
> > For example, this may be used when certain bits of @flags may
> > only be modified when holding the appropriate lock,
> > but other bits may still be modified locklessly.
> > ...
> > */
> > #define ASSERT_EXCLUSIVE_BITS(flags, mask) ....
> >
> > Then we can write page_zonenum as follows:
> >
> > static inline enum zone_type page_zonenum(const struct page *page)
> > {
> > +	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> > 	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
> > }
>
> Actually, it seems we still need to write it like this, if I understand correctly:
>
> 	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
> 	return data_race((page->flags >> ZONES_PGSHIFT) & ZONES_MASK);

No, I designed it so you won't need 'data_race()' if you don't want to. I'll send the patches shortly.

> On the other hand, if you really worry that this thing could go wrong, it might
> be better to use READ_ONCE() in the first place, where it will be more
> future-proof, with the trade-off that it might generate less efficient code in
> some places?

The READ_ONCE() I'd still advocate for, but KCSAN won't complain if the pattern is as written above.

> Alternatively, is there a way to write it as this?
>
> 	return ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);

It's an ASSERT; without KCSAN it should do nothing, so this is wrong. Also, this won't work because you're no longer returning the same value. I thought about this for READ_BITS, but you'd need (I wrote earlier in the thread that it likely won't be suitable):

	READ_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT) >> ZONES_PGSHIFT

to get the equivalent result (notice this will result in a redundant shift). Because we have all kinds of permutations and variants of how to extract the same bits out of some flags, it's cleaner to have one 'ASSERT_EXCLUSIVE_BITS' and just give it the bits you care about.

Thanks,
-- Marco

> Kind of ugly but it's probably cleaner.
>
> >
> > This will accomplish the following:
> > 1. The current code is not touched, and we do not have to verify that
> > the change is correct without KCSAN.
> > 2. We're not introducing a bunch of special macros to read bits in various ways.
> > 3. KCSAN will assume that the access is safe, and no data race report
> > is generated.
> > 4. If somebody modifies ZONES bits concurrently, KCSAN will tell you
> > about the race.
> > 5. We're documenting the code.
> >
> > Anything I missed?
> >
> > Thanks,
> > -- Marco
> >
> > > thanks,
> > > --
> > > John Hubbard
> > > NVIDIA
> > >
> > > > > That's still true, but to a lesser extent if more macros are added. In this case,
> > > > > I suspect that READ_BITS() makes the commenting easier and shorter. So I'd tentatively
> > > > > lean towards adding it, but what do others on the list think?
> > > >
> > > > Even read bits could be dangerous from data races and confusing at best, so I
> > > > am not really sure what the value of introducing this new macro is. People who
> > > > like to understand it correctly still need to read the commit logs.
> > > >
> > > > This flags->zonenum is such a special case that I don’t really see it regularly
> > > > for the last few weeks digging KCSAN reports, so even if it is worth adding
> > > > READ_BITS(), there are more equally important macros that need to be added
> > > > together to be useful initially. For example, HARMLESS_COUNTERS(),
> > > > READ_SINGLE_BIT(), READ_IMMUTATABLE_BITS(), etc., which is exactly what Linus
> > > > said he wanted to avoid.
> > > >
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..f8529aa971c0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -920,7 +920,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
 static inline enum zone_type page_zonenum(const struct page *page)
 {
-	return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
+	return (READ_ONCE(page->flags) >> ZONES_PGSHIFT) & ZONES_MASK;
 }
 
 #ifdef CONFIG_ZONE_DEVICE
page->flags could be accessed concurrently as noticed by KCSAN,

 BUG: KCSAN: data-race in page_cpupid_xchg_last / put_page

 write (marked) to 0xfffffc0d48ec1a00 of 8 bytes by task 91442 on cpu 3:
  page_cpupid_xchg_last+0x51/0x80
  page_cpupid_xchg_last at mm/mmzone.c:109 (discriminator 11)
  wp_page_reuse+0x3e/0xc0
  wp_page_reuse at mm/memory.c:2453
  do_wp_page+0x472/0x7b0
  do_wp_page at mm/memory.c:2798
  __handle_mm_fault+0xcb0/0xd00
  handle_pte_fault at mm/memory.c:4049
  (inlined by) __handle_mm_fault at mm/memory.c:4163
  handle_mm_fault+0xfc/0x2f0
  handle_mm_fault at mm/memory.c:4200
  do_page_fault+0x263/0x6f9
  do_user_addr_fault at arch/x86/mm/fault.c:1465
  (inlined by) do_page_fault at arch/x86/mm/fault.c:1539
  page_fault+0x34/0x40

 read to 0xfffffc0d48ec1a00 of 8 bytes by task 94817 on cpu 69:
  put_page+0x15a/0x1f0
  page_zonenum at include/linux/mm.h:923
  (inlined by) is_zone_device_page at include/linux/mm.h:929
  (inlined by) page_is_devmap_managed at include/linux/mm.h:948
  (inlined by) put_page at include/linux/mm.h:1023
  wp_page_copy+0x571/0x930
  wp_page_copy at mm/memory.c:2615
  do_wp_page+0x107/0x7b0
  __handle_mm_fault+0xcb0/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 69 PID: 94817 Comm: systemd-udevd Tainted: G W O L 5.5.0-next-20200204+ #6
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

Both the read and write are done only with the non-exclusive mmap_sem held. Since the read will check for specific bits (up to three bits for now) in the flags, load tearing could in theory trigger a logic bug.

To fix it, we could introduce put_page_lockless() in those places, but that could be overkill, and difficult to use. Thus, just add READ_ONCE() for the read in page_zonenum() for now, where it should not affect performance or correctness, with the small trade-off that compilers might generate less efficient code in some places.
Signed-off-by: Qian Cai <cai@lca.pw>
---
 include/linux/mm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)