Message ID: 20200130025133.5232-1-cai@lca.pw (mailing list archive)
State: New, archived
Series: mm/util: fix a data race in __vm_enough_memory()
On Wed, Jan 29, 2020 at 09:51:33PM -0500, Qian Cai wrote:
> "vm_committed_as.count" could be accessed concurrently as reported by
> KCSAN,
>
> read to 0xffffffff923164f8 of 8 bytes by task 1268 on cpu 38:
>  __vm_enough_memory+0x43/0x280 mm/util.c:801
>  mmap_region+0x1b2/0xb90 mm/mmap.c:1726
>  do_mmap+0x45c/0x700
>  vm_mmap_pgoff+0xc0/0x130
>  vm_mmap+0x71/0x90
>  elf_map+0xa1/0x1b0
>  load_elf_binary+0x9de/0x2180
>  search_binary_handler+0xd8/0x2b0
>  __do_execve_file+0xb61/0x1080
>  __x64_sys_execve+0x5f/0x70
>  do_syscall_64+0x91/0xb47
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> write to 0xffffffff923164f8 of 8 bytes by task 1265 on cpu 41:
>  percpu_counter_add_batch+0x83/0xd0 lib/percpu_counter.c:91
>  exit_mmap+0x178/0x220 include/linux/mman.h:68
>  mmput+0x10e/0x270
>  flush_old_exec+0x572/0xfe0
>  load_elf_binary+0x467/0x2180
>  search_binary_handler+0xd8/0x2b0
>  __do_execve_file+0xb61/0x1080
>  __x64_sys_execve+0x5f/0x70
>  do_syscall_64+0x91/0xb47
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Since only the read is operating as lockless, fix it by using
> READ_ONLY() for it to avoid any possible false warning due to load

You mean READ_ONCE ...

>  {
>  	long allowed;
>
> -	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
> +	VM_WARN_ONCE(READ_ONCE(vm_committed_as.count) <
>  		     -(s64)vm_committed_as_batch * num_online_cpus(),

I'm really not a fan of exposing the internals of a percpu_counter
outside the percpu_counter.h file.  Why shouldn't this be fixed by
putting the READ_ONCE() inside percpu_counter_read()?
> On Jan 29, 2020, at 11:20 PM, Matthew Wilcox <willy@infradead.org> wrote:
>
> I'm really not a fan of exposing the internals of a percpu_counter outside
> the percpu_counter.h file.  Why shouldn't this be fixed by putting the
> READ_ONCE() inside percpu_counter_read()?

It is because not all places suffer from a data race.  For example, in
__wb_update_bandwidth(), the read is protected by a lock.  I was a bit
worried that blindly adding READ_ONCE() inside percpu_counter_read()
might have unexpected side effects; for example, READ_ONCE() is
unnecessary for a volatile variable.  So I thought to keep the change
minimal, with the trade-off of exposing a bit of internal detail as you
mentioned.

However, I have also copied the percpu maintainers to see if they have
any preference.
On Thu, 30 Jan 2020 at 12:50, Qian Cai <cai@lca.pw> wrote:
>
> > On Jan 29, 2020, at 11:20 PM, Matthew Wilcox <willy@infradead.org> wrote:
> >
> > I'm really not a fan of exposing the internals of a percpu_counter outside
> > the percpu_counter.h file. Why shouldn't this be fixed by putting the
> > READ_ONCE() inside percpu_counter_read()?
>
> It is because not all places suffer from a data race. For example, in
> __wb_update_bandwidth(), it was protected by a lock. I was a bit worry
> about blindly adding READ_ONCE() inside percpu_counter_read() might has
> unexpected side-effect. For example, it is unnecessary to have
> READ_ONCE() for a volatile variable. So, I thought just to keep the
> change minimal with a trade off by exposing a bit internal details as
> you mentioned.
>
> However, I had also copied the percpu maintainers to see if they have
> any preferences?

I would not add READ_ONCE to percpu_counter_read(), given the writes
(increments) are not atomic either, so not much is gained.

Notice that this is inside a WARN_ONCE, so you may argue that a data
race here doesn't matter to the correct behaviour of the system
(except if you have panic_on_warn on).

For the warning to trigger, vm_committed_as must decrease. Assume that
a data race (assuming bad compiler optimizations) can somehow
accomplish this, then the load or write must cause a transient value
to somehow be less than a stable value. My hypothesis is this is very
unlikely.

Given the fact this is a WARN_ONCE, and the fact that a transient
decrease in the value is unlikely, you may consider
'VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as)) <
...)'. That way you won't modify percpu_counter_read and still catch
unintended races elsewhere.

[ Note that the 'data_race()' macro is still only in -next, -tip, and
-rcu. ]

Thanks,
-- Marco
On Thu, 30 Jan 2020 13:35:18 +0100 Marco Elver <elver@google.com> wrote:

> On Thu, 30 Jan 2020 at 12:50, Qian Cai <cai@lca.pw> wrote:
> >
> > > On Jan 29, 2020, at 11:20 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > I'm really not a fan of exposing the internals of a percpu_counter outside
> > > the percpu_counter.h file. Why shouldn't this be fixed by putting the
> > > READ_ONCE() inside percpu_counter_read()?
> >
> > [...]
> >
> > However, I had also copied the percpu maintainers to see if they have
> > any preferences?
>
> I would not add READ_ONCE to percpu_counter_read(), given the writes
> (increments) are not atomic either, so not much is gained.
>
> [...]
>
> Given the fact this is a WARN_ONCE, and the fact that a transient
> decrease in the value is unlikely, you may consider
> 'VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as)) <
> ...)'. That way you won't modify percpu_counter_read and still catch
> unintended races elsewhere.

That, or add an alternative version of per_cpu_counter_read() to the
percpu API.  A very carefully commented version!
> On Jan 30, 2020, at 9:18 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 30 Jan 2020 13:35:18 +0100 Marco Elver <elver@google.com> wrote:
>
>> [...]
>>
>> Given the fact this is a WARN_ONCE, and the fact that a transient
>> decrease in the value is unlikely, you may consider
>> 'VM_WARN_ONCE(data_race(percpu_counter_read(&vm_committed_as)) <
>> ...)'. That way you won't modify percpu_counter_read and still catch
>> unintended races elsewhere.
>
> That, or add an alternative version of per_cpu_counter_read() to the
> percpu API. A very carefully commented version!

I sent a patch to use data_race(), which should be sufficient:

https://lore.kernel.org/linux-mm/20200130145649.1240-1-cai@lca.pw/
diff --git a/mm/util.c b/mm/util.c
index 988d11e6c17c..58cd8f28651c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -798,7 +798,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 {
 	long allowed;
 
-	VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <
+	VM_WARN_ONCE(READ_ONCE(vm_committed_as.count) <
 		     -(s64)vm_committed_as_batch * num_online_cpus(),
 			"memory commitment underflow");
"vm_committed_as.count" could be accessed concurrently as reported by KCSAN, read to 0xffffffff923164f8 of 8 bytes by task 1268 on cpu 38: __vm_enough_memory+0x43/0x280 mm/util.c:801 mmap_region+0x1b2/0xb90 mm/mmap.c:1726 do_mmap+0x45c/0x700 vm_mmap_pgoff+0xc0/0x130 vm_mmap+0x71/0x90 elf_map+0xa1/0x1b0 load_elf_binary+0x9de/0x2180 search_binary_handler+0xd8/0x2b0 __do_execve_file+0xb61/0x1080 __x64_sys_execve+0x5f/0x70 do_syscall_64+0x91/0xb47 entry_SYSCALL_64_after_hwframe+0x49/0xbe write to 0xffffffff923164f8 of 8 bytes by task 1265 on cpu 41: percpu_counter_add_batch+0x83/0xd0 lib/percpu_counter.c:91 exit_mmap+0x178/0x220 include/linux/mman.h:68 mmput+0x10e/0x270 flush_old_exec+0x572/0xfe0 load_elf_binary+0x467/0x2180 search_binary_handler+0xd8/0x2b0 __do_execve_file+0xb61/0x1080 __x64_sys_execve+0x5f/0x70 do_syscall_64+0x91/0xb47 entry_SYSCALL_64_after_hwframe+0x49/0xbe Since only the read is operating as lockless, fix it by using READ_ONLY() for it to avoid any possible false warning due to load tearing. Signed-off-by: Qian Cai <cai@lca.pw> --- mm/util.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)