Message ID: 20200324153643.15527-4-will@kernel.org (mailing list archive)
State: New, archived
Series: Improve list integrity checking
On Tue, Mar 24, 2020 at 4:37 PM Will Deacon <will@kernel.org> wrote:
> Some list predicates can be used locklessly even with the non-RCU list
> implementations, since they effectively boil down to a test against
> NULL. For example, checking whether or not a list is empty is safe even
> in the presence of a concurrent, tearing write to the list head pointer.
> Similarly, checking whether or not an hlist node has been hashed is safe
> as well.
>
> Annotate these lockless list predicates with data_race() and READ_ONCE()
> so that KCSAN and the compiler are aware of what's going on. The writer
> side can then avoid having to use WRITE_ONCE() in the non-RCU
> implementation.
[...]
>  static inline int list_empty(const struct list_head *head)
>  {
> -	return READ_ONCE(head->next) == head;
> +	return data_race(READ_ONCE(head->next) == head);
>  }
[...]
>  static inline int hlist_unhashed(const struct hlist_node *h)
>  {
> -	return !READ_ONCE(h->pprev);
> +	return data_race(!READ_ONCE(h->pprev));
>  }

This is probably valid in practice for hlist_unhashed(), which compares
with NULL, as long as the most significant byte of all kernel pointers is
non-zero; but I think list_empty() could realistically return false
positives in the presence of a concurrent tearing store? This could break
the following code pattern:

/* optimistic lockless check */
if (!list_empty(&some_list)) {
	/* slowpath */
	mutex_lock(&some_mutex);
	list_for_each(tmp, &some_list) {
		...
	}
	mutex_unlock(&some_mutex);
}

(I'm not sure whether patterns like this appear commonly though.)
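[Editorial note: to make the tearing concern concrete, below is a minimal
userspace sketch, not kernel code; the addresses, the 32-bit split of the
store, and the little-endian layout are purely illustrative assumptions.
It shows how a torn pointer store can make the "head->next == head" test
in list_empty() momentarily succeed even though the list is never empty.]

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	uint64_t head = 0xffff888011112222ULL;	/* address of the list head */
	uint64_t oldp = 0xffff8880aaaaaaaaULL;	/* current first node       */
	uint64_t newp = 0xffff888111112222ULL;	/* node about to replace it */
	uint64_t next = oldp;			/* models head->next        */

	/*
	 * Writer: plain (unmarked) store of newp, torn into two 32-bit
	 * halves, low half landing first (little-endian assumed).
	 */
	memcpy(&next, &newp, 4);
	/*
	 * A reader testing "next == head" at this point sees the list as
	 * empty, although head->next never actually pointed back at head.
	 */
	printf("torn:  %#llx  looks empty: %d\n",
	       (unsigned long long)next, next == head);

	memcpy((char *)&next + 4, (char *)&newp + 4, 4);
	printf("final: %#llx  looks empty: %d\n",
	       (unsigned long long)next, next == head);
	return 0;
}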
On Tue, 24 Mar 2020 at 16:37, Will Deacon <will@kernel.org> wrote: > > Some list predicates can be used locklessly even with the non-RCU list > implementations, since they effectively boil down to a test against > NULL. For example, checking whether or not a list is empty is safe even > in the presence of a concurrent, tearing write to the list head pointer. > Similarly, checking whether or not an hlist node has been hashed is safe > as well. > > Annotate these lockless list predicates with data_race() and READ_ONCE() > so that KCSAN and the compiler are aware of what's going on. The writer > side can then avoid having to use WRITE_ONCE() in the non-RCU > implementation. > > Cc: Paul E. McKenney <paulmck@kernel.org> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Marco Elver <elver@google.com> > Signed-off-by: Will Deacon <will@kernel.org> > --- > include/linux/list.h | 10 +++++----- > include/linux/list_bl.h | 5 +++-- > include/linux/list_nulls.h | 6 +++--- > include/linux/llist.h | 2 +- > 4 files changed, 12 insertions(+), 11 deletions(-) > > diff --git a/include/linux/list.h b/include/linux/list.h > index 4fed5a0f9b77..4d9f5f9ed1a8 100644 > --- a/include/linux/list.h > +++ b/include/linux/list.h > @@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list, > */ > static inline int list_empty(const struct list_head *head) > { > - return READ_ONCE(head->next) == head; > + return data_race(READ_ONCE(head->next) == head); Double-marking should never be necessary, at least if you want to make KCSAN happy. From what I gather there is an unmarked write somewhere, correct? In that case, KCSAN will still complain because if it sees a race between this read and the other write, then at least one is still plain (the write). Then, my suggestion would be to mark the write with data_race() and just leave this as a READ_ONCE(). Having a data_race() somewhere only makes KCSAN stop reporting the race if the paired access is also marked (be it with data_race() or _ONCE, etc.). Alternatively, if marking the write is impossible, you can surround the access with kcsan_disable_current()/kcsan_enable_current(). Or, as a last resort, just leaving as-is is fine too, because KCSAN's default config (still) has KCSAN_ASSUME_PLAIN_WRITES_ATOMIC selected. Thanks, -- Marco > } > > /** > @@ -524,7 +524,7 @@ static inline void list_splice_tail_init(struct list_head *list, > */ > #define list_first_entry_or_null(ptr, type, member) ({ \ > struct list_head *head__ = (ptr); \ > - struct list_head *pos__ = READ_ONCE(head__->next); \ > + struct list_head *pos__ = data_race(READ_ONCE(head__->next)); \ > pos__ != head__ ? list_entry(pos__, type, member) : NULL; \ > }) > > @@ -772,13 +772,13 @@ static inline void INIT_HLIST_NODE(struct hlist_node *h) > * hlist_unhashed - Has node been removed from list and reinitialized? > * @h: Node to be checked > * > - * Not that not all removal functions will leave a node in unhashed > + * Note that not all removal functions will leave a node in unhashed > * state. For example, hlist_nulls_del_init_rcu() does leave the > * node in unhashed state, but hlist_nulls_del() does not. 
> */ > static inline int hlist_unhashed(const struct hlist_node *h) > { > - return !READ_ONCE(h->pprev); > + return data_race(!READ_ONCE(h->pprev)); > } > > /** > @@ -787,7 +787,7 @@ static inline int hlist_unhashed(const struct hlist_node *h) > */ > static inline int hlist_empty(const struct hlist_head *h) > { > - return !READ_ONCE(h->first); > + return data_race(!READ_ONCE(h->first)); > } > > static inline void __hlist_del(struct hlist_node *n) > diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h > index ae1b541446c9..1804fdb84dda 100644 > --- a/include/linux/list_bl.h > +++ b/include/linux/list_bl.h > @@ -51,7 +51,7 @@ static inline void INIT_HLIST_BL_NODE(struct hlist_bl_node *h) > > static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h) > { > - return !h->pprev; > + return data_race(!READ_ONCE(h->pprev)); > } > > static inline struct hlist_bl_node *hlist_bl_first(struct hlist_bl_head *h) > @@ -71,7 +71,8 @@ static inline void hlist_bl_set_first(struct hlist_bl_head *h, > > static inline bool hlist_bl_empty(const struct hlist_bl_head *h) > { > - return !((unsigned long)READ_ONCE(h->first) & ~LIST_BL_LOCKMASK); > + unsigned long first = data_race((unsigned long)READ_ONCE(h->first)); > + return !(first & ~LIST_BL_LOCKMASK); > } > > static inline void hlist_bl_add_head(struct hlist_bl_node *n, > diff --git a/include/linux/list_nulls.h b/include/linux/list_nulls.h > index 3a9ff01e9a11..fa51a801bf32 100644 > --- a/include/linux/list_nulls.h > +++ b/include/linux/list_nulls.h > @@ -60,18 +60,18 @@ static inline unsigned long get_nulls_value(const struct hlist_nulls_node *ptr) > * hlist_nulls_unhashed - Has node been removed and reinitialized? > * @h: Node to be checked > * > - * Not that not all removal functions will leave a node in unhashed state. > + * Note that not all removal functions will leave a node in unhashed state. > * For example, hlist_del_init_rcu() leaves the node in unhashed state, > * but hlist_nulls_del() does not. > */ > static inline int hlist_nulls_unhashed(const struct hlist_nulls_node *h) > { > - return !READ_ONCE(h->pprev); > + return data_race(!READ_ONCE(h->pprev)); > } > > static inline int hlist_nulls_empty(const struct hlist_nulls_head *h) > { > - return is_a_nulls(READ_ONCE(h->first)); > + return data_race(is_a_nulls(READ_ONCE(h->first))); > } > > static inline void hlist_nulls_add_head(struct hlist_nulls_node *n, > diff --git a/include/linux/llist.h b/include/linux/llist.h > index 2e9c7215882b..c7f042b73899 100644 > --- a/include/linux/llist.h > +++ b/include/linux/llist.h > @@ -186,7 +186,7 @@ static inline void init_llist_head(struct llist_head *list) > */ > static inline bool llist_empty(const struct llist_head *head) > { > - return READ_ONCE(head->first) == NULL; > + return data_race(READ_ONCE(head->first) == NULL); > } > > static inline struct llist_node *llist_next(struct llist_node *node) > -- > 2.20.1 >
On Tue, Mar 24, 2020 at 05:20:45PM +0100, Jann Horn wrote: > On Tue, Mar 24, 2020 at 4:37 PM Will Deacon <will@kernel.org> wrote: > > Some list predicates can be used locklessly even with the non-RCU list > > implementations, since they effectively boil down to a test against > > NULL. For example, checking whether or not a list is empty is safe even > > in the presence of a concurrent, tearing write to the list head pointer. > > Similarly, checking whether or not an hlist node has been hashed is safe > > as well. > > > > Annotate these lockless list predicates with data_race() and READ_ONCE() > > so that KCSAN and the compiler are aware of what's going on. The writer > > side can then avoid having to use WRITE_ONCE() in the non-RCU > > implementation. > [...] > > static inline int list_empty(const struct list_head *head) > > { > > - return READ_ONCE(head->next) == head; > > + return data_race(READ_ONCE(head->next) == head); > > } > [...] > > static inline int hlist_unhashed(const struct hlist_node *h) > > { > > - return !READ_ONCE(h->pprev); > > + return data_race(!READ_ONCE(h->pprev)); > > } > > This is probably valid in practice for hlist_unhashed(), which > compares with NULL, as long as the most significant byte of all kernel > pointers is non-zero; but I think list_empty() could realistically > return false positives in the presence of a concurrent tearing store? > This could break the following code pattern: > > /* optimistic lockless check */ > if (!list_empty(&some_list)) { > /* slowpath */ > mutex_lock(&some_mutex); > list_for_each(tmp, &some_list) { > ... > } > mutex_unlock(&some_mutex); > } > > (I'm not sure whether patterns like this appear commonly though.) I would hope not as the list could go "empty" before the lock is grabbed. That pattern would be wrong. thanks, greg k-h
On Tue, Mar 24, 2020 at 5:26 PM Greg KH <greg@kroah.com> wrote: > On Tue, Mar 24, 2020 at 05:20:45PM +0100, Jann Horn wrote: > > On Tue, Mar 24, 2020 at 4:37 PM Will Deacon <will@kernel.org> wrote: > > > Some list predicates can be used locklessly even with the non-RCU list > > > implementations, since they effectively boil down to a test against > > > NULL. For example, checking whether or not a list is empty is safe even > > > in the presence of a concurrent, tearing write to the list head pointer. > > > Similarly, checking whether or not an hlist node has been hashed is safe > > > as well. > > > > > > Annotate these lockless list predicates with data_race() and READ_ONCE() > > > so that KCSAN and the compiler are aware of what's going on. The writer > > > side can then avoid having to use WRITE_ONCE() in the non-RCU > > > implementation. > > [...] > > > static inline int list_empty(const struct list_head *head) > > > { > > > - return READ_ONCE(head->next) == head; > > > + return data_race(READ_ONCE(head->next) == head); > > > } > > [...] > > > static inline int hlist_unhashed(const struct hlist_node *h) > > > { > > > - return !READ_ONCE(h->pprev); > > > + return data_race(!READ_ONCE(h->pprev)); > > > } > > > > This is probably valid in practice for hlist_unhashed(), which > > compares with NULL, as long as the most significant byte of all kernel > > pointers is non-zero; but I think list_empty() could realistically > > return false positives in the presence of a concurrent tearing store? > > This could break the following code pattern: > > > > /* optimistic lockless check */ > > if (!list_empty(&some_list)) { > > /* slowpath */ > > mutex_lock(&some_mutex); > > list_for_each(tmp, &some_list) { > > ... > > } > > mutex_unlock(&some_mutex); > > } > > > > (I'm not sure whether patterns like this appear commonly though.) > > > I would hope not as the list could go "empty" before the lock is > grabbed. That pattern would be wrong. If the list becomes empty in between, the loop just iterates over nothing, and the effect is no different from what you'd get if you had bailed out before. But sure, you have to be aware that that can happen.
On Tue, Mar 24, 2020 at 03:36:25PM +0000, Will Deacon wrote: > diff --git a/include/linux/list.h b/include/linux/list.h > index 4fed5a0f9b77..4d9f5f9ed1a8 100644 > --- a/include/linux/list.h > +++ b/include/linux/list.h > @@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list, > */ > static inline int list_empty(const struct list_head *head) > { > - return READ_ONCE(head->next) == head; > + return data_race(READ_ONCE(head->next) == head); > } list_empty() isn't lockless safe, that's what we have list_empty_careful() for.
On Tue, Mar 24, 2020 at 5:51 PM Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Mar 24, 2020 at 03:36:25PM +0000, Will Deacon wrote: > > diff --git a/include/linux/list.h b/include/linux/list.h > > index 4fed5a0f9b77..4d9f5f9ed1a8 100644 > > --- a/include/linux/list.h > > +++ b/include/linux/list.h > > @@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list, > > */ > > static inline int list_empty(const struct list_head *head) > > { > > - return READ_ONCE(head->next) == head; > > + return data_race(READ_ONCE(head->next) == head); > > } > > list_empty() isn't lockless safe, that's what we have > list_empty_careful() for. That thing looks like it could also use some READ_ONCE() sprinkled in...
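[Editorial note: the list_empty_careful() implementation Peter refers to is
quoted in full by Paul later in the thread; a READ_ONCE()-annotated variant
along the lines Jann hints at might look like the sketch below. This is only
an illustration of the remark, not part of Will's series.]

static inline int list_empty_careful(const struct list_head *head)
{
	struct list_head *next = READ_ONCE(head->next);

	return (next == head) && (next == READ_ONCE(head->prev));
}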
On Tue, Mar 24, 2020 at 05:38:30PM +0100, Jann Horn wrote: > On Tue, Mar 24, 2020 at 5:26 PM Greg KH <greg@kroah.com> wrote: > > On Tue, Mar 24, 2020 at 05:20:45PM +0100, Jann Horn wrote: > > > On Tue, Mar 24, 2020 at 4:37 PM Will Deacon <will@kernel.org> wrote: > > > > Some list predicates can be used locklessly even with the non-RCU list > > > > implementations, since they effectively boil down to a test against > > > > NULL. For example, checking whether or not a list is empty is safe even > > > > in the presence of a concurrent, tearing write to the list head pointer. > > > > Similarly, checking whether or not an hlist node has been hashed is safe > > > > as well. > > > > > > > > Annotate these lockless list predicates with data_race() and READ_ONCE() > > > > so that KCSAN and the compiler are aware of what's going on. The writer > > > > side can then avoid having to use WRITE_ONCE() in the non-RCU > > > > implementation. > > > [...] > > > > static inline int list_empty(const struct list_head *head) > > > > { > > > > - return READ_ONCE(head->next) == head; > > > > + return data_race(READ_ONCE(head->next) == head); > > > > } > > > [...] > > > > static inline int hlist_unhashed(const struct hlist_node *h) > > > > { > > > > - return !READ_ONCE(h->pprev); > > > > + return data_race(!READ_ONCE(h->pprev)); > > > > } > > > > > > This is probably valid in practice for hlist_unhashed(), which > > > compares with NULL, as long as the most significant byte of all kernel > > > pointers is non-zero; but I think list_empty() could realistically > > > return false positives in the presence of a concurrent tearing store? > > > This could break the following code pattern: > > > > > > /* optimistic lockless check */ > > > if (!list_empty(&some_list)) { > > > /* slowpath */ > > > mutex_lock(&some_mutex); > > > list_for_each(tmp, &some_list) { > > > ... > > > } > > > mutex_unlock(&some_mutex); > > > } > > > > > > (I'm not sure whether patterns like this appear commonly though.) > > > > > > I would hope not as the list could go "empty" before the lock is > > grabbed. That pattern would be wrong. > > If the list becomes empty in between, the loop just iterates over > nothing, and the effect is no different from what you'd get if you had > bailed out before. But sure, you have to be aware that that can > happen. Doh, yeah, so it is safe, crazy, but safe :)
On Tue, Mar 24, 2020 at 5:59 PM Greg KH <greg@kroah.com> wrote: > On Tue, Mar 24, 2020 at 05:38:30PM +0100, Jann Horn wrote: > > On Tue, Mar 24, 2020 at 5:26 PM Greg KH <greg@kroah.com> wrote: > > > On Tue, Mar 24, 2020 at 05:20:45PM +0100, Jann Horn wrote: > > > > On Tue, Mar 24, 2020 at 4:37 PM Will Deacon <will@kernel.org> wrote: > > > > > Some list predicates can be used locklessly even with the non-RCU list > > > > > implementations, since they effectively boil down to a test against > > > > > NULL. For example, checking whether or not a list is empty is safe even > > > > > in the presence of a concurrent, tearing write to the list head pointer. > > > > > Similarly, checking whether or not an hlist node has been hashed is safe > > > > > as well. > > > > > > > > > > Annotate these lockless list predicates with data_race() and READ_ONCE() > > > > > so that KCSAN and the compiler are aware of what's going on. The writer > > > > > side can then avoid having to use WRITE_ONCE() in the non-RCU > > > > > implementation. > > > > [...] > > > > > static inline int list_empty(const struct list_head *head) > > > > > { > > > > > - return READ_ONCE(head->next) == head; > > > > > + return data_race(READ_ONCE(head->next) == head); > > > > > } > > > > [...] > > > > > static inline int hlist_unhashed(const struct hlist_node *h) > > > > > { > > > > > - return !READ_ONCE(h->pprev); > > > > > + return data_race(!READ_ONCE(h->pprev)); > > > > > } > > > > > > > > This is probably valid in practice for hlist_unhashed(), which > > > > compares with NULL, as long as the most significant byte of all kernel > > > > pointers is non-zero; but I think list_empty() could realistically > > > > return false positives in the presence of a concurrent tearing store? > > > > This could break the following code pattern: > > > > > > > > /* optimistic lockless check */ > > > > if (!list_empty(&some_list)) { > > > > /* slowpath */ > > > > mutex_lock(&some_mutex); > > > > list_for_each(tmp, &some_list) { > > > > ... > > > > } > > > > mutex_unlock(&some_mutex); > > > > } > > > > > > > > (I'm not sure whether patterns like this appear commonly though.) > > > > > > > > > I would hope not as the list could go "empty" before the lock is > > > grabbed. That pattern would be wrong. > > > > If the list becomes empty in between, the loop just iterates over > > nothing, and the effect is no different from what you'd get if you had > > bailed out before. But sure, you have to be aware that that can > > happen. > > Doh, yeah, so it is safe, crazy, but safe :) Here's an example of that pattern, I think (which I think is technically incorrect if what peterz said is accurate?): /** * waitqueue_active -- locklessly test for waiters on the queue * @wq_head: the waitqueue to test for waiters * * returns true if the wait list is not empty * * NOTE: this function is lockless and requires care, incorrect usage _will_ * lead to sporadic and non-obvious failure. * * Use either while holding wait_queue_head::lock or when used for wakeups * with an extra smp_mb() like:: * * CPU0 - waker CPU1 - waiter * * for (;;) { * @cond = true; prepare_to_wait(&wq_head, &wait, state); * smp_mb(); // smp_mb() from set_current_state() * if (waitqueue_active(wq_head)) if (@cond) * wake_up(wq_head); break; * schedule(); * } * finish_wait(&wq_head, &wait); * * Because without the explicit smp_mb() it's possible for the * waitqueue_active() load to get hoisted over the @cond store such that we'll * observe an empty wait list while the waiter might not observe @cond. 
* * Also note that this 'optimization' trades a spin_lock() for an smp_mb(), * which (when the lock is uncontended) are of roughly equal cost. */ static inline int waitqueue_active(struct wait_queue_head *wq_head) { return !list_empty(&wq_head->head); } void signalfd_cleanup(struct sighand_struct *sighand) { wait_queue_head_t *wqh = &sighand->signalfd_wqh; /* * The lockless check can race with remove_wait_queue() in progress, * but in this case its caller should run under rcu_read_lock() and * sighand_cachep is SLAB_TYPESAFE_BY_RCU, we can safely return. */ if (likely(!waitqueue_active(wqh))) return; /* wait_queue_entry_t->func(POLLFREE) should do remove_wait_queue() */ wake_up_poll(wqh, EPOLLHUP | POLLFREE); } and __add_wait_queue() just uses plain list_add(&wq_entry->entry, &wq_head->head) under a lock.
[mutt crashed while I was sending this; apologies if you receive it twice] On Tue, Mar 24, 2020 at 05:56:15PM +0100, Jann Horn wrote: > On Tue, Mar 24, 2020 at 5:51 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Tue, Mar 24, 2020 at 03:36:25PM +0000, Will Deacon wrote: > > > diff --git a/include/linux/list.h b/include/linux/list.h > > > index 4fed5a0f9b77..4d9f5f9ed1a8 100644 > > > --- a/include/linux/list.h > > > +++ b/include/linux/list.h > > > @@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list, > > > */ > > > static inline int list_empty(const struct list_head *head) > > > { > > > - return READ_ONCE(head->next) == head; > > > + return data_race(READ_ONCE(head->next) == head); > > > } > > > > list_empty() isn't lockless safe, that's what we have > > list_empty_careful() for. > > That thing looks like it could also use some READ_ONCE() sprinkled in... Crikey, how did I miss that? I need to spend some time understanding the ordering there. So it sounds like the KCSAN splats relating to list_empty() and loosely referred to by 1c97be677f72 ("list: Use WRITE_ONCE() when adding to lists and hlists") are indicative of real bugs and we should actually restore list_empty() to its former glory prior to 1658d35ead5d ("list: Use READ_ONCE() when testing for empty lists"). Alternatively, assuming list_empty_careful() does what it says on the tin, we could just make that the default. Will
On Tue, Mar 24, 2020 at 05:23:30PM +0100, Marco Elver wrote: > On Tue, 24 Mar 2020 at 16:37, Will Deacon <will@kernel.org> wrote: > > > > Some list predicates can be used locklessly even with the non-RCU list > > implementations, since they effectively boil down to a test against > > NULL. For example, checking whether or not a list is empty is safe even > > in the presence of a concurrent, tearing write to the list head pointer. > > Similarly, checking whether or not an hlist node has been hashed is safe > > as well. > > > > Annotate these lockless list predicates with data_race() and READ_ONCE() > > so that KCSAN and the compiler are aware of what's going on. The writer > > side can then avoid having to use WRITE_ONCE() in the non-RCU > > implementation. > > > > Cc: Paul E. McKenney <paulmck@kernel.org> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Marco Elver <elver@google.com> > > Signed-off-by: Will Deacon <will@kernel.org> > > --- > > include/linux/list.h | 10 +++++----- > > include/linux/list_bl.h | 5 +++-- > > include/linux/list_nulls.h | 6 +++--- > > include/linux/llist.h | 2 +- > > 4 files changed, 12 insertions(+), 11 deletions(-) > > > > diff --git a/include/linux/list.h b/include/linux/list.h > > index 4fed5a0f9b77..4d9f5f9ed1a8 100644 > > --- a/include/linux/list.h > > +++ b/include/linux/list.h > > @@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list, > > */ > > static inline int list_empty(const struct list_head *head) > > { > > - return READ_ONCE(head->next) == head; > > + return data_race(READ_ONCE(head->next) == head); > > Double-marking should never be necessary, at least if you want to make > KCSAN happy. From what I gather there is an unmarked write somewhere, > correct? In that case, KCSAN will still complain because if it sees a > race between this read and the other write, then at least one is still > plain (the write). > > Then, my suggestion would be to mark the write with data_race() and > just leave this as a READ_ONCE(). Having a data_race() somewhere only > makes KCSAN stop reporting the race if the paired access is also > marked (be it with data_race() or _ONCE, etc.). > > Alternatively, if marking the write is impossible, you can surround > the access with kcsan_disable_current()/kcsan_enable_current(). Or, as > a last resort, just leaving as-is is fine too, because KCSAN's default > config (still) has KCSAN_ASSUME_PLAIN_WRITES_ATOMIC selected. Right, it looks like this is a bit of a smoking gun and we need to decide on whether list_empty() is actually usable without synchronisation first. Based on the outcome of that discussion, I'll update this patch accordingly. The main thing I want to avoid is marking parts of the non-RCU list implementation with data_race() or {READ,WRITE}_ONCE() because that's a sure-fire way to hide real bugs. Cheers, Will
On Tue, Mar 24, 2020 at 09:32:01PM +0000, Will Deacon wrote: > [mutt crashed while I was sending this; apologies if you receive it twice] > > On Tue, Mar 24, 2020 at 05:56:15PM +0100, Jann Horn wrote: > > On Tue, Mar 24, 2020 at 5:51 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > On Tue, Mar 24, 2020 at 03:36:25PM +0000, Will Deacon wrote: > > > > diff --git a/include/linux/list.h b/include/linux/list.h > > > > index 4fed5a0f9b77..4d9f5f9ed1a8 100644 > > > > --- a/include/linux/list.h > > > > +++ b/include/linux/list.h > > > > @@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list, > > > > */ > > > > static inline int list_empty(const struct list_head *head) > > > > { > > > > - return READ_ONCE(head->next) == head; > > > > + return data_race(READ_ONCE(head->next) == head); > > > > } > > > > > > list_empty() isn't lockless safe, that's what we have > > > list_empty_careful() for. > > > > That thing looks like it could also use some READ_ONCE() sprinkled in... > > Crikey, how did I miss that? I need to spend some time understanding the > ordering there. > > So it sounds like the KCSAN splats relating to list_empty() and loosely > referred to by 1c97be677f72 ("list: Use WRITE_ONCE() when adding to lists > and hlists") are indicative of real bugs and we should actually restore > list_empty() to its former glory prior to 1658d35ead5d ("list: Use > READ_ONCE() when testing for empty lists"). Alternatively, assuming > list_empty_careful() does what it says on the tin, we could just make that > the default. The list_empty_careful() function (suitably annotated) returns false if the list is non-empty, including when it is in the process of becoming either empty or non-empty. It would be fine for the lockless use cases I have come across. Thanx, Paul
On Tue, Mar 24, 2020 at 05:23:30PM +0100, Marco Elver wrote: > On Tue, 24 Mar 2020 at 16:37, Will Deacon <will@kernel.org> wrote: > > Some list predicates can be used locklessly even with the non-RCU list > > implementations, since they effectively boil down to a test against > > NULL. For example, checking whether or not a list is empty is safe even > > in the presence of a concurrent, tearing write to the list head pointer. > > Similarly, checking whether or not an hlist node has been hashed is safe > > as well. > > > > Annotate these lockless list predicates with data_race() and READ_ONCE() > > so that KCSAN and the compiler are aware of what's going on. The writer > > side can then avoid having to use WRITE_ONCE() in the non-RCU > > implementation. > > > > Cc: Paul E. McKenney <paulmck@kernel.org> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Marco Elver <elver@google.com> > > Signed-off-by: Will Deacon <will@kernel.org> > > --- > > include/linux/list.h | 10 +++++----- > > include/linux/list_bl.h | 5 +++-- > > include/linux/list_nulls.h | 6 +++--- > > include/linux/llist.h | 2 +- > > 4 files changed, 12 insertions(+), 11 deletions(-) > > > > diff --git a/include/linux/list.h b/include/linux/list.h > > index 4fed5a0f9b77..4d9f5f9ed1a8 100644 > > --- a/include/linux/list.h > > +++ b/include/linux/list.h > > @@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list, > > */ > > static inline int list_empty(const struct list_head *head) > > { > > - return READ_ONCE(head->next) == head; > > + return data_race(READ_ONCE(head->next) == head); > > Double-marking should never be necessary, at least if you want to make > KCSAN happy. From what I gather there is an unmarked write somewhere, > correct? In that case, KCSAN will still complain because if it sees a > race between this read and the other write, then at least one is still > plain (the write). Ok, then I should drop the data_race() annotation and stick to READ_ONCE(), I think (but see below). > Then, my suggestion would be to mark the write with data_race() and > just leave this as a READ_ONCE(). Having a data_race() somewhere only > makes KCSAN stop reporting the race if the paired access is also > marked (be it with data_race() or _ONCE, etc.). The problem with taking that approach is that it ends up much of the list implementation annotated with either WRITE_ONCE() or data_race(), meaning that concurrent, racy list operations will no longer be reported by KCSAN. I think that's a pretty big deal and I'm strongly against annotating the internals of library code such as this because it means that buggy callers will largely go undetected. The situation we have here is that some calls, e.g. hlist_empty() are safe even in the presence of a racy write and I'd like to suppress KCSAN reports without annotating the writes at all. > Alternatively, if marking the write is impossible, you can surround > the access with kcsan_disable_current()/kcsan_enable_current(). Or, as > a last resort, just leaving as-is is fine too, because KCSAN's default > config (still) has KCSAN_ASSUME_PLAIN_WRITES_ATOMIC selected. Hmm, I suppose some bright spark will want to change the default at the some point though, no? ;) I'll look at using kcsan_disable_current()/kcsan_enable_current(), thanks. Will
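[Editorial note: purely as an illustration, applying the
kcsan_disable_current()/kcsan_enable_current() helpers Marco mentioned to
list_empty() might look roughly like the sketch below; this is one reading
of the suggestion, not code proposed anywhere in the thread.]

static inline int list_empty(const struct list_head *head)
{
	int ret;

	kcsan_disable_current();
	ret = READ_ONCE(head->next) == head;
	kcsan_enable_current();

	return ret;
}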
On Tue, 31 Mar 2020 at 15:10, Will Deacon <will@kernel.org> wrote: > > On Tue, Mar 24, 2020 at 05:23:30PM +0100, Marco Elver wrote: > > On Tue, 24 Mar 2020 at 16:37, Will Deacon <will@kernel.org> wrote: > > > Some list predicates can be used locklessly even with the non-RCU list > > > implementations, since they effectively boil down to a test against > > > NULL. For example, checking whether or not a list is empty is safe even > > > in the presence of a concurrent, tearing write to the list head pointer. > > > Similarly, checking whether or not an hlist node has been hashed is safe > > > as well. > > > > > > Annotate these lockless list predicates with data_race() and READ_ONCE() > > > so that KCSAN and the compiler are aware of what's going on. The writer > > > side can then avoid having to use WRITE_ONCE() in the non-RCU > > > implementation. > > > > > > Cc: Paul E. McKenney <paulmck@kernel.org> > > > Cc: Peter Zijlstra <peterz@infradead.org> > > > Cc: Marco Elver <elver@google.com> > > > Signed-off-by: Will Deacon <will@kernel.org> > > > --- > > > include/linux/list.h | 10 +++++----- > > > include/linux/list_bl.h | 5 +++-- > > > include/linux/list_nulls.h | 6 +++--- > > > include/linux/llist.h | 2 +- > > > 4 files changed, 12 insertions(+), 11 deletions(-) > > > > > > diff --git a/include/linux/list.h b/include/linux/list.h > > > index 4fed5a0f9b77..4d9f5f9ed1a8 100644 > > > --- a/include/linux/list.h > > > +++ b/include/linux/list.h > > > @@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list, > > > */ > > > static inline int list_empty(const struct list_head *head) > > > { > > > - return READ_ONCE(head->next) == head; > > > + return data_race(READ_ONCE(head->next) == head); > > > > Double-marking should never be necessary, at least if you want to make > > KCSAN happy. From what I gather there is an unmarked write somewhere, > > correct? In that case, KCSAN will still complain because if it sees a > > race between this read and the other write, then at least one is still > > plain (the write). > > Ok, then I should drop the data_race() annotation and stick to READ_ONCE(), > I think (but see below). > > > Then, my suggestion would be to mark the write with data_race() and > > just leave this as a READ_ONCE(). Having a data_race() somewhere only > > makes KCSAN stop reporting the race if the paired access is also > > marked (be it with data_race() or _ONCE, etc.). > > The problem with taking that approach is that it ends up much of the > list implementation annotated with either WRITE_ONCE() or data_race(), > meaning that concurrent, racy list operations will no longer be reported > by KCSAN. I think that's a pretty big deal and I'm strongly against > annotating the internals of library code such as this because it means > that buggy callers will largely go undetected. > > The situation we have here is that some calls, e.g. hlist_empty() are > safe even in the presence of a racy write and I'd like to suppress KCSAN > reports without annotating the writes at all. > > > Alternatively, if marking the write is impossible, you can surround > > the access with kcsan_disable_current()/kcsan_enable_current(). Or, as > > a last resort, just leaving as-is is fine too, because KCSAN's default > > config (still) has KCSAN_ASSUME_PLAIN_WRITES_ATOMIC selected. > > Hmm, I suppose some bright spark will want to change the default at the some > point though, no? ;) I'll look at using > kcsan_disable_current()/kcsan_enable_current(), thanks. 
I think this will come up again (it did already come up in some other patch I reviewed, and Paul also mentioned it), so it seems best to change data_race() to match the intuitive semantics of just completely ignoring the access marked with it. I.e. marking accesses racing with accesses marked with data_race() is now optional: https://lkml.kernel.org/r/20200331193233.15180-1-elver@google.com In which case, the original patch you had here works just fine. Thanks, -- Marco
On Wed, Apr 01, 2020 at 08:34:36AM +0200, Marco Elver wrote: > On Tue, 31 Mar 2020 at 15:10, Will Deacon <will@kernel.org> wrote: > > On Tue, Mar 24, 2020 at 05:23:30PM +0100, Marco Elver wrote: > > > Then, my suggestion would be to mark the write with data_race() and > > > just leave this as a READ_ONCE(). Having a data_race() somewhere only > > > makes KCSAN stop reporting the race if the paired access is also > > > marked (be it with data_race() or _ONCE, etc.). > > > > The problem with taking that approach is that it ends up much of the > > list implementation annotated with either WRITE_ONCE() or data_race(), > > meaning that concurrent, racy list operations will no longer be reported > > by KCSAN. I think that's a pretty big deal and I'm strongly against > > annotating the internals of library code such as this because it means > > that buggy callers will largely go undetected. > > > > The situation we have here is that some calls, e.g. hlist_empty() are > > safe even in the presence of a racy write and I'd like to suppress KCSAN > > reports without annotating the writes at all. > > > > > Alternatively, if marking the write is impossible, you can surround > > > the access with kcsan_disable_current()/kcsan_enable_current(). Or, as > > > a last resort, just leaving as-is is fine too, because KCSAN's default > > > config (still) has KCSAN_ASSUME_PLAIN_WRITES_ATOMIC selected. > > > > Hmm, I suppose some bright spark will want to change the default at the some > > point though, no? ;) I'll look at using > > kcsan_disable_current()/kcsan_enable_current(), thanks. > > I think this will come up again (it did already come up in some other > patch I reviewed, and Paul also mentioned it), so it seems best to > change data_race() to match the intuitive semantics of just completely > ignoring the access marked with it. I.e. marking accesses racing with > accesses marked with data_race() is now optional: > https://lkml.kernel.org/r/20200331193233.15180-1-elver@google.com /me goes look. Thanks! > In which case, the original patch you had here works just fine. Ah yes, so now data_race(READ_ONCE(...)) does make sense as a combination. It's tempting to wrap that up as an accessor, but actually forcing people to spell it out might not be a bad thing after all. Will
On Mon, Mar 30, 2020 at 04:13:15PM -0700, Paul E. McKenney wrote: > On Tue, Mar 24, 2020 at 09:32:01PM +0000, Will Deacon wrote: > > [mutt crashed while I was sending this; apologies if you receive it twice] > > > > On Tue, Mar 24, 2020 at 05:56:15PM +0100, Jann Horn wrote: > > > On Tue, Mar 24, 2020 at 5:51 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Tue, Mar 24, 2020 at 03:36:25PM +0000, Will Deacon wrote: > > > > > diff --git a/include/linux/list.h b/include/linux/list.h > > > > > index 4fed5a0f9b77..4d9f5f9ed1a8 100644 > > > > > --- a/include/linux/list.h > > > > > +++ b/include/linux/list.h > > > > > @@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list, > > > > > */ > > > > > static inline int list_empty(const struct list_head *head) > > > > > { > > > > > - return READ_ONCE(head->next) == head; > > > > > + return data_race(READ_ONCE(head->next) == head); > > > > > } > > > > > > > > list_empty() isn't lockless safe, that's what we have > > > > list_empty_careful() for. > > > > > > That thing looks like it could also use some READ_ONCE() sprinkled in... > > > > Crikey, how did I miss that? I need to spend some time understanding the > > ordering there. > > > > So it sounds like the KCSAN splats relating to list_empty() and loosely > > referred to by 1c97be677f72 ("list: Use WRITE_ONCE() when adding to lists > > and hlists") are indicative of real bugs and we should actually restore > > list_empty() to its former glory prior to 1658d35ead5d ("list: Use > > READ_ONCE() when testing for empty lists"). Alternatively, assuming > > list_empty_careful() does what it says on the tin, we could just make that > > the default. > > The list_empty_careful() function (suitably annotated) returns false if > the list is non-empty, including when it is in the process of becoming > either empty or non-empty. It would be fine for the lockless use cases > I have come across. Hmm, I had a look at the implementation and I'm not at all convinced that it's correct. First of all, the comment above it states: * NOTE: using list_empty_careful() without synchronization * can only be safe if the only activity that can happen * to the list entry is list_del_init(). Eg. it cannot be used * if another CPU could re-list_add() it. but it seems that people disregard this note and instead use it as a general-purpose lockless test, taking a lock and rechecking if it returns non-empty. It would also mean we'd have to keep the WRITE_ONCE() in INIT_LIST_HEAD, which is something that I've been trying to remove. In the face of something like a concurrent list_add(); list_add_tail() sequence, then the tearing writes to the head->{prev,next} pointers could cause list_empty_careful() to indicate that the list is momentarily empty. I've started looking at whether we can use a NULL next pointer to indicate an empty list, which might allow us to kill the __list_del_clearprev() hack at the same time, but I've not found enough time to really get my teeth into it yet. Will
On Fri, Apr 24, 2020 at 06:39:33PM +0100, Will Deacon wrote: > On Mon, Mar 30, 2020 at 04:13:15PM -0700, Paul E. McKenney wrote: > > On Tue, Mar 24, 2020 at 09:32:01PM +0000, Will Deacon wrote: > > > [mutt crashed while I was sending this; apologies if you receive it twice] > > > > > > On Tue, Mar 24, 2020 at 05:56:15PM +0100, Jann Horn wrote: > > > > On Tue, Mar 24, 2020 at 5:51 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > > > On Tue, Mar 24, 2020 at 03:36:25PM +0000, Will Deacon wrote: > > > > > > diff --git a/include/linux/list.h b/include/linux/list.h > > > > > > index 4fed5a0f9b77..4d9f5f9ed1a8 100644 > > > > > > --- a/include/linux/list.h > > > > > > +++ b/include/linux/list.h > > > > > > @@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list, > > > > > > */ > > > > > > static inline int list_empty(const struct list_head *head) > > > > > > { > > > > > > - return READ_ONCE(head->next) == head; > > > > > > + return data_race(READ_ONCE(head->next) == head); > > > > > > } > > > > > > > > > > list_empty() isn't lockless safe, that's what we have > > > > > list_empty_careful() for. > > > > > > > > That thing looks like it could also use some READ_ONCE() sprinkled in... > > > > > > Crikey, how did I miss that? I need to spend some time understanding the > > > ordering there. > > > > > > So it sounds like the KCSAN splats relating to list_empty() and loosely > > > referred to by 1c97be677f72 ("list: Use WRITE_ONCE() when adding to lists > > > and hlists") are indicative of real bugs and we should actually restore > > > list_empty() to its former glory prior to 1658d35ead5d ("list: Use > > > READ_ONCE() when testing for empty lists"). Alternatively, assuming > > > list_empty_careful() does what it says on the tin, we could just make that > > > the default. > > > > The list_empty_careful() function (suitably annotated) returns false if > > the list is non-empty, including when it is in the process of becoming > > either empty or non-empty. It would be fine for the lockless use cases > > I have come across. > > Hmm, I had a look at the implementation and I'm not at all convinced that > it's correct. First of all, the comment above it states: > > * NOTE: using list_empty_careful() without synchronization > * can only be safe if the only activity that can happen > * to the list entry is list_del_init(). Eg. it cannot be used > * if another CPU could re-list_add() it. Huh. This thing is unchanged since 2.6.12-rc2, back in 2005: static inline int list_empty_careful(const struct list_head *head) { struct list_head *next = head->next; return (next == head) && (next == head->prev); } I can imagine compiler value-caching optimizations that would cause trouble, for example, if a previous obsolete fetch from head->prev was lying around in a register, causing this function to say "not empty" when it was in fact empty. Of course, if obsolete values for both head->next and head->prev were lying around, pretty much anything could happen. > but it seems that people disregard this note and instead use it as a > general-purpose lockless test, taking a lock and rechecking if it returns > non-empty. It would also mean we'd have to keep the WRITE_ONCE() in > INIT_LIST_HEAD, which is something that I've been trying to remove. > > In the face of something like a concurrent list_add(); list_add_tail() > sequence, then the tearing writes to the head->{prev,next} pointers could > cause list_empty_careful() to indicate that the list is momentarily empty. 
> > I've started looking at whether we can use a NULL next pointer to indicate > an empty list, which might allow us to kill the __list_del_clearprev() hack > at the same time, but I've not found enough time to really get my teeth into > it yet. In the delete-only case, I kind of get it, other than the potential for optimization. Once the list becomes empty, it will forever remain empty. And the additional test of head->prev avoids this returning true while the deletion is half done (again, aside from the potential for optimization). If insertions are allowed, the thing I haven't quite figured out yet is what is being gained by the additional check of head->prev. After all, if updates are not excluded, the return value can become obsolete immediately anyhow. Yes, it could be used as a heuristic, but it could report empty immediately before a list_add(), so there would need to either be a careful wakeup protocol or a periodic poll of the list. Or am I missing a trick here? Thanx, Paul
diff --git a/include/linux/list.h b/include/linux/list.h
index 4fed5a0f9b77..4d9f5f9ed1a8 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -279,7 +279,7 @@ static inline int list_is_last(const struct list_head *list,
  */
 static inline int list_empty(const struct list_head *head)
 {
-	return READ_ONCE(head->next) == head;
+	return data_race(READ_ONCE(head->next) == head);
 }
 
 /**
@@ -524,7 +524,7 @@ static inline void list_splice_tail_init(struct list_head *list,
  */
 #define list_first_entry_or_null(ptr, type, member) ({ \
 	struct list_head *head__ = (ptr); \
-	struct list_head *pos__ = READ_ONCE(head__->next); \
+	struct list_head *pos__ = data_race(READ_ONCE(head__->next)); \
 	pos__ != head__ ? list_entry(pos__, type, member) : NULL; \
 })
 
@@ -772,13 +772,13 @@ static inline void INIT_HLIST_NODE(struct hlist_node *h)
  * hlist_unhashed - Has node been removed from list and reinitialized?
  * @h: Node to be checked
  *
- * Not that not all removal functions will leave a node in unhashed
+ * Note that not all removal functions will leave a node in unhashed
  * state. For example, hlist_nulls_del_init_rcu() does leave the
  * node in unhashed state, but hlist_nulls_del() does not.
  */
 static inline int hlist_unhashed(const struct hlist_node *h)
 {
-	return !READ_ONCE(h->pprev);
+	return data_race(!READ_ONCE(h->pprev));
 }
 
 /**
@@ -787,7 +787,7 @@ static inline int hlist_unhashed(const struct hlist_node *h)
  */
 static inline int hlist_empty(const struct hlist_head *h)
 {
-	return !READ_ONCE(h->first);
+	return data_race(!READ_ONCE(h->first));
 }
 
 static inline void __hlist_del(struct hlist_node *n)
diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h
index ae1b541446c9..1804fdb84dda 100644
--- a/include/linux/list_bl.h
+++ b/include/linux/list_bl.h
@@ -51,7 +51,7 @@ static inline void INIT_HLIST_BL_NODE(struct hlist_bl_node *h)
 
 static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h)
 {
-	return !h->pprev;
+	return data_race(!READ_ONCE(h->pprev));
 }
 
 static inline struct hlist_bl_node *hlist_bl_first(struct hlist_bl_head *h)
@@ -71,7 +71,8 @@ static inline void hlist_bl_set_first(struct hlist_bl_head *h,
 
 static inline bool hlist_bl_empty(const struct hlist_bl_head *h)
 {
-	return !((unsigned long)READ_ONCE(h->first) & ~LIST_BL_LOCKMASK);
+	unsigned long first = data_race((unsigned long)READ_ONCE(h->first));
+	return !(first & ~LIST_BL_LOCKMASK);
 }
 
 static inline void hlist_bl_add_head(struct hlist_bl_node *n,
diff --git a/include/linux/list_nulls.h b/include/linux/list_nulls.h
index 3a9ff01e9a11..fa51a801bf32 100644
--- a/include/linux/list_nulls.h
+++ b/include/linux/list_nulls.h
@@ -60,18 +60,18 @@ static inline unsigned long get_nulls_value(const struct hlist_nulls_node *ptr)
  * hlist_nulls_unhashed - Has node been removed and reinitialized?
  * @h: Node to be checked
  *
- * Not that not all removal functions will leave a node in unhashed state.
+ * Note that not all removal functions will leave a node in unhashed state.
  * For example, hlist_del_init_rcu() leaves the node in unhashed state,
  * but hlist_nulls_del() does not.
  */
 static inline int hlist_nulls_unhashed(const struct hlist_nulls_node *h)
 {
-	return !READ_ONCE(h->pprev);
+	return data_race(!READ_ONCE(h->pprev));
 }
 
 static inline int hlist_nulls_empty(const struct hlist_nulls_head *h)
 {
-	return is_a_nulls(READ_ONCE(h->first));
+	return data_race(is_a_nulls(READ_ONCE(h->first)));
 }
 
 static inline void hlist_nulls_add_head(struct hlist_nulls_node *n,
diff --git a/include/linux/llist.h b/include/linux/llist.h
index 2e9c7215882b..c7f042b73899 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -186,7 +186,7 @@ static inline void init_llist_head(struct llist_head *list)
  */
 static inline bool llist_empty(const struct llist_head *head)
 {
-	return READ_ONCE(head->first) == NULL;
+	return data_race(READ_ONCE(head->first) == NULL);
 }
 
 static inline struct llist_node *llist_next(struct llist_node *node)
Some list predicates can be used locklessly even with the non-RCU list
implementations, since they effectively boil down to a test against
NULL. For example, checking whether or not a list is empty is safe even
in the presence of a concurrent, tearing write to the list head pointer.
Similarly, checking whether or not an hlist node has been hashed is safe
as well.

Annotate these lockless list predicates with data_race() and READ_ONCE()
so that KCSAN and the compiler are aware of what's going on. The writer
side can then avoid having to use WRITE_ONCE() in the non-RCU
implementation.

Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/linux/list.h       | 10 +++++-----
 include/linux/list_bl.h    |  5 +++--
 include/linux/list_nulls.h |  6 +++---
 include/linux/llist.h      |  2 +-
 4 files changed, 12 insertions(+), 11 deletions(-)