dm thin: make get_first_thin use rcu-safe list first function

Message ID: 20250107232458.GA1860@templeofstupid.com
State: Accepted, archived
Delegated to: Mikulas Patocka

Commit Message

Krister Johansen Jan. 7, 2025, 11:24 p.m. UTC
The documentation in rculist.h explains the absence of list_empty_rcu()
and cautions programmers against relying on a list_empty() ->
list_first() sequence in RCU-safe code.  This is because each of these
functions performs its own READ_ONCE() of the list head.  As a result,
list_empty() can see a valid list entry while the subsequent
list_first() sees the list head as it looks after a concurrent
modification.
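
The two-load window described above can be replayed deterministically in
userspace.  The following is only a sketch under stated assumptions: the
list helpers are simplified stand-ins for the kernel's, and the `between`
hook is a hypothetical device that models a concurrent list_del_rcu()
landing between the two loads.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail_entry(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Like list_del_rcu(): unlink the node but leave its own pointers alone. */
static void list_del_entry(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void remove_node(void *arg) { list_del_entry(arg); }

/* Two-load pattern: the emptiness check and the fetch each read head->next. */
static struct list_head *first_two_loads(struct list_head *head,
					 void (*between)(void *), void *arg)
{
	if (head->next == head)		/* load #1 */
		return NULL;
	if (between)
		between(arg);		/* hypothetical hook: concurrent removal */
	return head->next;		/* load #2: may now be the head itself */
}

/* Returns 1 if the two-load pattern handed back the list head itself. */
static int two_loads_returns_head(void)
{
	struct list_head head, node;

	list_init(&head);
	list_add_tail_entry(&node, &head);
	assert(head.next == &node);	/* list holds exactly one entry */

	return first_two_loads(&head, remove_node, &node) == &head;
}
```

When the only entry is removed inside the window, the second load yields
the head, which is not embedded in any entry at all.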

In the case of dm-thin, this author had a production box crash from a GP
fault in the process_deferred_bios path.  This function saw a valid list
head in get_first_thin() but when it subsequently dereferenced that and
turned it into a thin_c, it got the inside of the struct pool, since the
list was now empty and referring to itself.  The kernel on which this
occurred printed both a warning about a refcount_t being saturated, and
a UBSAN error for an out-of-bounds cpuid access in the queued spinlock,
prior to the fault itself.  When the resulting kdump was examined, it
was possible to see another thread patiently waiting in thin_dtr's
synchronize_rcu.
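
To illustrate why the bogus thin_c ends up pointing into the pool, here
is a small userspace sketch.  The struct layouts are hypothetical and
heavily trimmed (the real definitions in dm-thin.c differ), and the
address is computed with integer arithmetic rather than the kernel's
container_of().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head { struct list_head *next, *prev; };

/* Hypothetical layouts for illustration only. */
struct pool {
	char other_state[64];
	struct list_head active_thins;
};

struct thin_c {
	struct list_head list;
	int refcount;
};

/* Address a container_of()-style conversion would compute for 'member'. */
static uintptr_t bogus_thin_addr(const struct list_head *member)
{
	return (uintptr_t)member - offsetof(struct thin_c, list);
}

/* Returns 1 if treating the empty head as an entry lands inside the pool. */
static int bogus_thin_is_inside_pool(void)
{
	struct pool pool;
	uintptr_t start = (uintptr_t)&pool;
	uintptr_t addr;

	/* An empty list: the head refers to itself. */
	pool.active_thins.next = &pool.active_thins;
	pool.active_thins.prev = &pool.active_thins;
	assert(pool.active_thins.next == &pool.active_thins);

	addr = bogus_thin_addr(pool.active_thins.next);
	return addr >= start && addr < start + sizeof(pool);
}
```

With these assumed layouts, any field read through the bogus pointer
(such as a reference count) is really some unrelated part of the pool.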

The thin_dtr call managed to pull the thin_c out of the active_thins
list (and have it be the last entry in that list) at just the wrong
moment, which led to this crash.

Fortunately, the fix here is straightforward.  Switch the
get_first_thin() function to use list_first_or_null_rcu(), which
performs just a single READ_ONCE() and returns NULL if the list is
already empty.
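
The single-load idea can be sketched in userspace as follows.  This is
not the kernel macro: READ_ONCE() is modeled by a plain load, the list
helpers are simplified stand-ins, and the `between` hook is a
hypothetical way to place a removal right after the load.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail_entry(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Like list_del_rcu(): unlink the node but leave its own pointers alone. */
static void list_del_entry(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void remove_node(void *arg) { list_del_entry(arg); }

/* Single-load pattern: head->next is read once and then only compared. */
static struct list_head *first_or_null(struct list_head *head,
				       void (*between)(void *), void *arg)
{
	struct list_head *next = head->next;	/* the only load */

	if (between)
		between(arg);	/* hypothetical hook: concurrent removal */
	return next != head ? next : NULL;
}

/* Returns 1 if the result is always a real entry or NULL, never the head. */
static int single_load_never_returns_head(void)
{
	struct list_head head, node;
	struct list_head *got;

	list_init(&head);
	got = first_or_null(&head, NULL, NULL);
	assert(got == NULL);			/* empty list yields NULL */

	list_add_tail_entry(&node, &head);
	got = first_or_null(&head, remove_node, &node);
	return got == &node && head.next == &head;
}
```

Even when the entry is unlinked right after the load, the caller gets the
entry it actually observed rather than the head.  Whether that entry is
still safe to use depends on the RCU read-side section, which this sketch
does not model.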

This was run against the devicemapper test suite's thin-provisioning
suites for delete and suspend and no regressions were observed.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Fixes: b10ebd34ccca ("dm thin: fix rcu_read_lock being held in code that can sleep")
Cc: stable@vger.kernel.org
---
 drivers/md/dm-thin.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Ming Hung Tsai Jan. 8, 2025, 9:42 a.m. UTC | #1

Acked-by: Ming-Hung Tsai <mtsai@redhat.com>

Patch

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index bf0f9dddd146..05cf4e3f2bbe 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2332,10 +2332,9 @@  static struct thin_c *get_first_thin(struct pool *pool)
 	struct thin_c *tc = NULL;
 
 	rcu_read_lock();
-	if (!list_empty(&pool->active_thins)) {
-		tc = list_entry_rcu(pool->active_thins.next, struct thin_c, list);
+	tc = list_first_or_null_rcu(&pool->active_thins, struct thin_c, list);
+	if (tc)
 		thin_get(tc);
-	}
 	rcu_read_unlock();
 
 	return tc;