| Message ID | 20250107232458.GA1860@templeofstupid.com |
|---|---|
| State | Accepted, archived |
| Delegated to: | Mikulas Patocka |
| Series | dm thin: make get_first_thin use rcu-safe list first function |
On Wed, Jan 8, 2025 at 7:52 AM Krister Johansen <kjlx@templeofstupid.com> wrote:
>
> The documentation in rculist.h explains the absence of list_empty_rcu()
> and cautions programmers against relying on a list_empty() ->
> list_first() sequence in RCU-safe code. This is because each of these
> functions performs its own READ_ONCE() of the list head. This can lead
> to a situation where list_empty() sees a valid list entry, but the
> subsequent list_first() sees a different view of the list head after a
> modification.
>
> In the case of dm-thin, this author had a production box crash from a GP
> fault in the process_deferred_bios path. This function saw a valid list
> head in get_first_thin(), but when it subsequently dereferenced that and
> turned it into a thin_c, it got the inside of the struct pool, since the
> list was now empty and referring to itself. The kernel on which this
> occurred printed both a warning about a refcount_t being saturated and a
> UBSAN error for an out-of-bounds cpuid access in the queued spinlock,
> prior to the fault itself. When the resulting kdump was examined, it was
> possible to see another thread patiently waiting in thin_dtr's
> synchronize_rcu.
>
> The thin_dtr call managed to pull the thin_c out of the active thins
> list (and have it be the last entry in the active_thins list) at just
> the wrong moment, which led to this crash.
>
> Fortunately, the fix here is straightforward. Switch the
> get_first_thin() function to use list_first_or_null_rcu(), which
> performs just a single READ_ONCE() and returns NULL if the list is
> already empty.
>
> This was run against the devicemapper test suite's thin-provisioning
> suites for delete and suspend, and no regressions were observed.
>
> Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
> Fixes: b10ebd34ccca ("dm thin: fix rcu_read_lock being held in code that can sleep")
> Cc: stable@vger.kernel.org
> ---
>  drivers/md/dm-thin.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index bf0f9dddd146..05cf4e3f2bbe 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -2332,10 +2332,9 @@ static struct thin_c *get_first_thin(struct pool *pool)
>  	struct thin_c *tc = NULL;
>
>  	rcu_read_lock();
> -	if (!list_empty(&pool->active_thins)) {
> -		tc = list_entry_rcu(pool->active_thins.next, struct thin_c, list);
> +	tc = list_first_or_null_rcu(&pool->active_thins, struct thin_c, list);
> +	if (tc)
>  		thin_get(tc);
> -	}
>  	rcu_read_unlock();
>
>  	return tc;
> --
> 2.25.1
>

Acked-by: Ming-Hung Tsai <mtsai@redhat.com>
```diff
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index bf0f9dddd146..05cf4e3f2bbe 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2332,10 +2332,9 @@ static struct thin_c *get_first_thin(struct pool *pool)
 	struct thin_c *tc = NULL;
 
 	rcu_read_lock();
-	if (!list_empty(&pool->active_thins)) {
-		tc = list_entry_rcu(pool->active_thins.next, struct thin_c, list);
+	tc = list_first_or_null_rcu(&pool->active_thins, struct thin_c, list);
+	if (tc)
 		thin_get(tc);
-	}
 	rcu_read_unlock();
 
 	return tc;
```
The documentation in rculist.h explains the absence of list_empty_rcu() and cautions programmers against relying on a list_empty() -> list_first() sequence in RCU-safe code. This is because each of these functions performs its own READ_ONCE() of the list head. This can lead to a situation where list_empty() sees a valid list entry, but the subsequent list_first() sees a different view of the list head after a modification.

In the case of dm-thin, this author had a production box crash from a GP fault in the process_deferred_bios path. This function saw a valid list head in get_first_thin(), but when it subsequently dereferenced that and turned it into a thin_c, it got the inside of the struct pool, since the list was now empty and referring to itself. The kernel on which this occurred printed both a warning about a refcount_t being saturated and a UBSAN error for an out-of-bounds cpuid access in the queued spinlock, prior to the fault itself. When the resulting kdump was examined, it was possible to see another thread patiently waiting in thin_dtr's synchronize_rcu.

The thin_dtr call managed to pull the thin_c out of the active thins list (and have it be the last entry in the active_thins list) at just the wrong moment, which led to this crash.

Fortunately, the fix here is straightforward. Switch the get_first_thin() function to use list_first_or_null_rcu(), which performs just a single READ_ONCE() and returns NULL if the list is already empty.

This was run against the devicemapper test suite's thin-provisioning suites for delete and suspend, and no regressions were observed.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Fixes: b10ebd34ccca ("dm thin: fix rcu_read_lock being held in code that can sleep")
Cc: stable@vger.kernel.org
---
 drivers/md/dm-thin.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)