
[v7] KEYS: Add a list for unreferenced keys

Message ID 20250407023918.29956-1-jarkko@kernel.org (mailing list archive)
State New
Series [v7] KEYS: Add a list for unreferenced keys

Commit Message

Jarkko Sakkinen April 7, 2025, 2:39 a.m. UTC
From: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>

Add an isolated list of unreferenced keys to be queued for deletion, and
try to pin the keys in the garbage collector before processing anything.
Skip unpinnable keys.

Use this list for blocking the reaping process during the teardown:

1. First off, the keys added to `key_graveyard` are snapshotted, and the
   list is flushed. This is the very last step in `key_put()`.
2. `key_put()` reaches zero. This will mark the key as busy for the garbage
   collector.
3. `key_garbage_collector()` will try to increase the refcount, which won't go
   above zero. Whenever this happens, the key will be skipped.
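
Condensed from the diff below, the interplay between the two sides looks
roughly like this (illustrative sketch, not a literal hunk):

	/* key_put(): on the final put, park the key in the graveyard. */
	if (refcount_dec_and_test(&key->usage)) {
		spin_lock_irqsave(&key_graveyard_lock, flags);
		list_add_tail(&key->graveyard_link, &key_graveyard);
		spin_unlock_irqrestore(&key_graveyard_lock, flags);
		schedule_work(&key_gc_work);
	}

	/* key_garbage_collector(): pin each key before touching it. */
	if (!refcount_inc_not_zero(&key->usage)) {
		/* Usage already hit zero: key_put() owns it now, skip it. */
		key = NULL;
		gc_state |= KEY_GC_REAP_AGAIN;
		goto skip_dead_key;
	}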

Cc: stable@vger.kernel.org # v6.1+
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
---
v7:
- Fixed multiple definitions (from rebasing, sorry).
v6:
- Rebase went wrong in v5.
v5:
- Rebased on top of v6.15-rc
- Updated commit message to explain how spin lock and refcount
  isolate the time window in key_put().
v4:
- Pin the key while processing key type teardown. Skip dead keys.
- Revert key_gc_graveyard back to key_gc_unused_keys.
- Rewrote the commit message.
- The "unsigned long flags" declaration somehow did not make it into the
  previous patch (sorry).
v3:
- Using spin_lock() fails since key_put() is executed inside IRQs.
  Using spin_lock_irqsave() would not work either, given the lock is
  acquired for /proc/keys. Therefore, use a separate lock for the
  graveyard, and splice key_graveyard before reaping key_serial_tree.
v2:
- Rename key_gc_unused_keys as key_gc_graveyard, and re-document the
  function.
---
 include/linux/key.h      |  7 ++-----
 security/keys/gc.c       | 40 ++++++++++++++++++++++++----------------
 security/keys/internal.h |  5 +++++
 security/keys/key.c      |  7 +++++--
 4 files changed, 36 insertions(+), 23 deletions(-)

Comments

Marek Szyprowski April 7, 2025, 10:25 a.m. UTC | #1
On 07.04.2025 04:39, Jarkko Sakkinen wrote:
> From: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
>
> Add an isolated list of unreferenced keys to be queued for deletion, and
> try to pin the keys in the garbage collector before processing anything.
> Skip unpinnable keys.
>
> Use this list for blocking the reaping process during the teardown:
>
> 1. First off, the keys added to `keys_graveyard` are snapshotted, and the
>     list is flushed. This the very last step in `key_put()`.
> 2. `key_put()` reaches zero. This will mark key as busy for the garbage
>     collector.
> 3. `key_garbage_collector()` will try to increase refcount, which won't go
>     above zero. Whenever this happens, the key will be skipped.
>
> Cc: stable@vger.kernel.org # v6.1+
> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>

This patch landed in today's linux-next as commit b0d023797e3e ("keys: 
Add a list for unreferenced keys"). In my tests I found that it triggers 
the following lockdep issue:

================================
WARNING: inconsistent lock state
6.15.0-rc1-next-20250407 #15630 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/3/32 [HC0[0]:SC1[1]:HE1:SE0] takes:
c13fdd68 (key_serial_lock){+.?.}-{2:2}, at: key_put+0x74/0x128
{SOFTIRQ-ON-W} state was registered at:
   lock_acquire+0x134/0x384
   _raw_spin_lock+0x38/0x48
   key_alloc+0x2fc/0x4d8
   keyring_alloc+0x40/0x90
   system_trusted_keyring_init+0x50/0x7c
   do_one_initcall+0x68/0x314
   kernel_init_freeable+0x1c0/0x224
   kernel_init+0x1c/0x12c
   ret_from_fork+0x14/0x28
irq event stamp: 234
hardirqs last  enabled at (234): [<c0cb7060>] 
_raw_spin_unlock_irqrestore+0x5c/0x60
hardirqs last disabled at (233): [<c0cb6dd0>] 
_raw_spin_lock_irqsave+0x64/0x68
softirqs last  enabled at (42): [<c013bcd8>] handle_softirqs+0x328/0x520
softirqs last disabled at (47): [<c013bf10>] run_ksoftirqd+0x40/0x68

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(key_serial_lock);
   <Interrupt>
     lock(key_serial_lock);

  *** DEADLOCK ***

1 lock held by ksoftirqd/3/32:
  #0: c137c040 (rcu_callback){....}-{0:0}, at: rcu_core+0x4d0/0x14a4

stack backtrace:
CPU: 3 UID: 0 PID: 32 Comm: ksoftirqd/3 Not tainted 
6.15.0-rc1-next-20250407 #15630 PREEMPT
Hardware name: Samsung Exynos (Flattened Device Tree)
Call trace:
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x68/0x88
  dump_stack_lvl from print_usage_bug.part.0+0x24c/0x270
  print_usage_bug.part.0 from mark_lock.part.0+0xc20/0x129c
  mark_lock.part.0 from __lock_acquire+0xafc/0x2970
  __lock_acquire from lock_acquire+0x134/0x384
  lock_acquire from _raw_spin_lock+0x38/0x48
  _raw_spin_lock from key_put+0x74/0x128
  key_put from put_cred_rcu+0x20/0xd0
  put_cred_rcu from rcu_core+0x478/0x14a4
  rcu_core from handle_softirqs+0x130/0x520
  handle_softirqs from run_ksoftirqd+0x40/0x68
  run_ksoftirqd from smpboot_thread_fn+0x17c/0x330
  smpboot_thread_fn from kthread+0x138/0x25c
  kthread from ret_from_fork+0x14/0x28
Exception stack(0xf090dfb0 to 0xf090dff8)
...

To fix this issue I had to switch all calls taking the key_serial_lock and 
key_graveyard_lock spinlocks to the irqsave/irqrestore variants (in 
security/keys/key.c and security/keys/gc.c), but I'm not sure if this is 
the desired solution.
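
Roughly, the conversion follows the usual pattern (illustrative sketch 
only, not the exact hunks I used):

	unsigned long flags;

	/* Before: a plain spin_lock() is unsafe here, because the same lock
	 * is also taken from softirq context (put_cred_rcu -> key_put in the
	 * report above), so a softirq arriving while the lock is held on
	 * this CPU deadlocks.
	 */
	spin_lock(&key_serial_lock);
	rb_erase(&key->serial_node, &key_serial_tree);
	spin_unlock(&key_serial_lock);

	/* After: disable local interrupts for the critical section so the
	 * softirq cannot re-enter the lock on this CPU.
	 */
	spin_lock_irqsave(&key_serial_lock, flags);
	rb_erase(&key->serial_node, &key_serial_tree);
	spin_unlock_irqrestore(&key_serial_lock, flags);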

> ---
> v7:
> - Fixed multiple definitions (from rebasing, sorry).
> v6:
> - Rebase went wrong in v5.
> v5:
> - Rebased on top of v6.15-rc
> - Updated commit message to explain how spin lock and refcount
>    isolate the time window in key_put().
> v4:
> - Pin the key while processing key type teardown. Skip dead keys.
> - Revert key_gc_graveyard back key_gc_unused_keys.
> - Rewrote the commit message.
> - "unsigned long flags" declaration somehow did make to the previous
>    patch (sorry).
> v3:
> - Using spin_lock() fails since key_put() is executed inside IRQs.
>    Using spin_lock_irqsave() would neither work given the lock is
>    acquired for /proc/keys. Therefore, separate the lock for
>    graveyard and key_graveyard before reaping key_serial_tree.
> v2:
> - Rename key_gc_unused_keys as key_gc_graveyard, and re-document the
>    function.
> ---
>   include/linux/key.h      |  7 ++-----
>   security/keys/gc.c       | 40 ++++++++++++++++++++++++----------------
>   security/keys/internal.h |  5 +++++
>   security/keys/key.c      |  7 +++++--
>   4 files changed, 36 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/key.h b/include/linux/key.h
> index ba05de8579ec..c50659184bdf 100644
> --- a/include/linux/key.h
> +++ b/include/linux/key.h
> @@ -195,10 +195,8 @@ enum key_state {
>   struct key {
>   	refcount_t		usage;		/* number of references */
>   	key_serial_t		serial;		/* key serial number */
> -	union {
> -		struct list_head graveyard_link;
> -		struct rb_node	serial_node;
> -	};
> +	struct list_head	graveyard_link; /* key->usage == 0 */
> +	struct rb_node		serial_node;
>   #ifdef CONFIG_KEY_NOTIFICATIONS
>   	struct watch_list	*watchers;	/* Entities watching this key for changes */
>   #endif
> @@ -236,7 +234,6 @@ struct key {
>   #define KEY_FLAG_ROOT_CAN_INVAL	7	/* set if key can be invalidated by root without permission */
>   #define KEY_FLAG_KEEP		8	/* set if key should not be removed */
>   #define KEY_FLAG_UID_KEYRING	9	/* set if key is a user or user session keyring */
> -#define KEY_FLAG_FINAL_PUT	10	/* set if final put has happened on key */
>   
>   	/* the key type and key description string
>   	 * - the desc is used to match a key against search criteria
> diff --git a/security/keys/gc.c b/security/keys/gc.c
> index f27223ea4578..0a3beb68633c 100644
> --- a/security/keys/gc.c
> +++ b/security/keys/gc.c
> @@ -189,6 +189,7 @@ static void key_garbage_collector(struct work_struct *work)
>   	struct rb_node *cursor;
>   	struct key *key;
>   	time64_t new_timer, limit, expiry;
> +	unsigned long flags;
>   
>   	kenter("[%lx,%x]", key_gc_flags, gc_state);
>   
> @@ -206,21 +207,35 @@ static void key_garbage_collector(struct work_struct *work)
>   
>   	new_timer = TIME64_MAX;
>   
> +	spin_lock_irqsave(&key_graveyard_lock, flags);
> +	list_splice_init(&key_graveyard, &graveyard);
> +	spin_unlock_irqrestore(&key_graveyard_lock, flags);
> +
> +	list_for_each_entry(key, &graveyard, graveyard_link) {
> +		spin_lock(&key_serial_lock);
> +		kdebug("unrefd key %d", key->serial);
> +		rb_erase(&key->serial_node, &key_serial_tree);
> +		spin_unlock(&key_serial_lock);
> +	}
> +
>   	/* As only this function is permitted to remove things from the key
>   	 * serial tree, if cursor is non-NULL then it will always point to a
>   	 * valid node in the tree - even if lock got dropped.
>   	 */
>   	spin_lock(&key_serial_lock);
> +	key = NULL;
>   	cursor = rb_first(&key_serial_tree);
>   
>   continue_scanning:
> +	key_put(key);
>   	while (cursor) {
>   		key = rb_entry(cursor, struct key, serial_node);
>   		cursor = rb_next(cursor);
> -
> -		if (test_bit(KEY_FLAG_FINAL_PUT, &key->flags)) {
> -			smp_mb(); /* Clobber key->user after FINAL_PUT seen. */
> -			goto found_unreferenced_key;
> +		/* key_get(), unless zero: */
> +		if (!refcount_inc_not_zero(&key->usage)) {
> +			key = NULL;
> +			gc_state |= KEY_GC_REAP_AGAIN;
> +			goto skip_dead_key;
>   		}
>   
>   		if (unlikely(gc_state & KEY_GC_REAPING_DEAD_1)) {
> @@ -274,6 +289,7 @@ static void key_garbage_collector(struct work_struct *work)
>   		spin_lock(&key_serial_lock);
>   		goto continue_scanning;
>   	}
> +	key_put(key);
>   
>   	/* We've completed the pass.  Set the timer if we need to and queue a
>   	 * new cycle if necessary.  We keep executing cycles until we find one
> @@ -286,6 +302,10 @@ static void key_garbage_collector(struct work_struct *work)
>   		key_schedule_gc(new_timer);
>   	}
>   
> +	spin_lock(&key_graveyard_lock);
> +	list_splice_init(&key_graveyard, &graveyard);
> +	spin_unlock(&key_graveyard_lock);
> +
>   	if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2) ||
>   	    !list_empty(&graveyard)) {
>   		/* Make sure that all pending keyring payload destructions are
> @@ -328,18 +348,6 @@ static void key_garbage_collector(struct work_struct *work)
>   	kleave(" [end %x]", gc_state);
>   	return;
>   
> -	/* We found an unreferenced key - once we've removed it from the tree,
> -	 * we can safely drop the lock.
> -	 */
> -found_unreferenced_key:
> -	kdebug("unrefd key %d", key->serial);
> -	rb_erase(&key->serial_node, &key_serial_tree);
> -	spin_unlock(&key_serial_lock);
> -
> -	list_add_tail(&key->graveyard_link, &graveyard);
> -	gc_state |= KEY_GC_REAP_AGAIN;
> -	goto maybe_resched;
> -
>   	/* We found a restricted keyring and need to update the restriction if
>   	 * it is associated with the dead key type.
>   	 */
> diff --git a/security/keys/internal.h b/security/keys/internal.h
> index 2cffa6dc8255..4e3d9b322390 100644
> --- a/security/keys/internal.h
> +++ b/security/keys/internal.h
> @@ -63,9 +63,14 @@ struct key_user {
>   	int			qnbytes;	/* number of bytes allocated to this user */
>   };
>   
> +extern struct list_head key_graveyard;
> +extern spinlock_t key_graveyard_lock;
> +
>   extern struct rb_root	key_user_tree;
>   extern spinlock_t	key_user_lock;
>   extern struct key_user	root_key_user;
> +extern struct list_head	key_graveyard;
> +extern spinlock_t	key_graveyard_lock;
>   
>   extern struct key_user *key_user_lookup(kuid_t uid);
>   extern void key_user_put(struct key_user *user);
> diff --git a/security/keys/key.c b/security/keys/key.c
> index 7198cd2ac3a3..7511f2017b6b 100644
> --- a/security/keys/key.c
> +++ b/security/keys/key.c
> @@ -22,6 +22,8 @@ DEFINE_SPINLOCK(key_serial_lock);
>   
>   struct rb_root	key_user_tree; /* tree of quota records indexed by UID */
>   DEFINE_SPINLOCK(key_user_lock);
> +LIST_HEAD(key_graveyard);
> +DEFINE_SPINLOCK(key_graveyard_lock);
>   
>   unsigned int key_quota_root_maxkeys = 1000000;	/* root's key count quota */
>   unsigned int key_quota_root_maxbytes = 25000000; /* root's key space quota */
> @@ -658,8 +660,9 @@ void key_put(struct key *key)
>   				key->user->qnbytes -= key->quotalen;
>   				spin_unlock_irqrestore(&key->user->lock, flags);
>   			}
> -			smp_mb(); /* key->user before FINAL_PUT set. */
> -			set_bit(KEY_FLAG_FINAL_PUT, &key->flags);
> +			spin_lock_irqsave(&key_graveyard_lock, flags);
> +			list_add_tail(&key->graveyard_link, &key_graveyard);
> +			spin_unlock_irqrestore(&key_graveyard_lock, flags);
>   			schedule_work(&key_gc_work);
>   		}
>   	}

Best regards
Jarkko Sakkinen April 7, 2025, 11:23 a.m. UTC | #2
On Mon, Apr 07, 2025 at 12:25:11PM +0200, Marek Szyprowski wrote:
> On 07.04.2025 04:39, Jarkko Sakkinen wrote:
> > From: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
> >
> > Add an isolated list of unreferenced keys to be queued for deletion, and
> > try to pin the keys in the garbage collector before processing anything.
> > Skip unpinnable keys.
> >
> > Use this list for blocking the reaping process during the teardown:
> >
> > 1. First off, the keys added to `keys_graveyard` are snapshotted, and the
> >     list is flushed. This the very last step in `key_put()`.
> > 2. `key_put()` reaches zero. This will mark key as busy for the garbage
> >     collector.
> > 3. `key_garbage_collector()` will try to increase refcount, which won't go
> >     above zero. Whenever this happens, the key will be skipped.
> >
> > Cc: stable@vger.kernel.org # v6.1+
> > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
> This patch landed in today's linux-next as commit b0d023797e3e ("keys: 
> Add a list for unreferenced keys"). In my tests I found that it triggers 
> the following lockdep issue:
> 
> ================================
> WARNING: inconsistent lock state
> 6.15.0-rc1-next-20250407 #15630 Not tainted
> --------------------------------
> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> ksoftirqd/3/32 [HC0[0]:SC1[1]:HE1:SE0] takes:
> c13fdd68 (key_serial_lock){+.?.}-{2:2}, at: key_put+0x74/0x128
> {SOFTIRQ-ON-W} state was registered at:
>    lock_acquire+0x134/0x384
>    _raw_spin_lock+0x38/0x48
>    key_alloc+0x2fc/0x4d8
>    keyring_alloc+0x40/0x90
>    system_trusted_keyring_init+0x50/0x7c
>    do_one_initcall+0x68/0x314
>    kernel_init_freeable+0x1c0/0x224
>    kernel_init+0x1c/0x12c
>    ret_from_fork+0x14/0x28
> irq event stamp: 234
> hardirqs last  enabled at (234): [<c0cb7060>] 
> _raw_spin_unlock_irqrestore+0x5c/0x60
> hardirqs last disabled at (233): [<c0cb6dd0>] 
> _raw_spin_lock_irqsave+0x64/0x68
> softirqs last  enabled at (42): [<c013bcd8>] handle_softirqs+0x328/0x520
> softirqs last disabled at (47): [<c013bf10>] run_ksoftirqd+0x40/0x68

OK, what went to -next went there by accident and has been removed,
sorry. I think it was the very first version of this patch.

Thanks for informing anyhow!

BR, Jarkko
Jarkko Sakkinen April 7, 2025, 12:08 p.m. UTC | #3
On Mon, Apr 07, 2025 at 02:23:49PM +0300, Jarkko Sakkinen wrote:
> On Mon, Apr 07, 2025 at 12:25:11PM +0200, Marek Szyprowski wrote:
> > On 07.04.2025 04:39, Jarkko Sakkinen wrote:
> > > From: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
> > >
> > > Add an isolated list of unreferenced keys to be queued for deletion, and
> > > try to pin the keys in the garbage collector before processing anything.
> > > Skip unpinnable keys.
> > >
> > > Use this list for blocking the reaping process during the teardown:
> > >
> > > 1. First off, the keys added to `keys_graveyard` are snapshotted, and the
> > >     list is flushed. This the very last step in `key_put()`.
> > > 2. `key_put()` reaches zero. This will mark key as busy for the garbage
> > >     collector.
> > > 3. `key_garbage_collector()` will try to increase refcount, which won't go
> > >     above zero. Whenever this happens, the key will be skipped.
> > >
> > > Cc: stable@vger.kernel.org # v6.1+
> > > Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
> > This patch landed in today's linux-next as commit b0d023797e3e ("keys: 
> > Add a list for unreferenced keys"). In my tests I found that it triggers 
> > the following lockdep issue:
> > 
> > ================================
> > WARNING: inconsistent lock state
> > 6.15.0-rc1-next-20250407 #15630 Not tainted
> > --------------------------------
> > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> > ksoftirqd/3/32 [HC0[0]:SC1[1]:HE1:SE0] takes:
> > c13fdd68 (key_serial_lock){+.?.}-{2:2}, at: key_put+0x74/0x128
> > {SOFTIRQ-ON-W} state was registered at:
> >    lock_acquire+0x134/0x384
> >    _raw_spin_lock+0x38/0x48
> >    key_alloc+0x2fc/0x4d8
> >    keyring_alloc+0x40/0x90
> >    system_trusted_keyring_init+0x50/0x7c
> >    do_one_initcall+0x68/0x314
> >    kernel_init_freeable+0x1c0/0x224
> >    kernel_init+0x1c/0x12c
> >    ret_from_fork+0x14/0x28
> > irq event stamp: 234
> > hardirqs last  enabled at (234): [<c0cb7060>] 
> > _raw_spin_unlock_irqrestore+0x5c/0x60
> > hardirqs last disabled at (233): [<c0cb6dd0>] 
> > _raw_spin_lock_irqsave+0x64/0x68
> > softirqs last  enabled at (42): [<c013bcd8>] handle_softirqs+0x328/0x520
> > softirqs last disabled at (47): [<c013bf10>] run_ksoftirqd+0x40/0x68
> 
> OK what went to -next went there by accident and has been removed,
> sorry. I think it was like the very first version of this patch.
> 
> Thanks for informing anyhow!


Testing branch: https://web.git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git/log/?h=keys-graveyard

I updated my -next branch this morning, so it should be fixed soon...

> 
> BR, Jarkko

BR, Jarkko
Marek Szyprowski April 7, 2025, 12:42 p.m. UTC | #4
On 07.04.2025 14:08, Jarkko Sakkinen wrote:
> On Mon, Apr 07, 2025 at 02:23:49PM +0300, Jarkko Sakkinen wrote:
>> On Mon, Apr 07, 2025 at 12:25:11PM +0200, Marek Szyprowski wrote:
>>> On 07.04.2025 04:39, Jarkko Sakkinen wrote:
>>>> From: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
>>>>
>>>> Add an isolated list of unreferenced keys to be queued for deletion, and
>>>> try to pin the keys in the garbage collector before processing anything.
>>>> Skip unpinnable keys.
>>>>
>>>> Use this list for blocking the reaping process during the teardown:
>>>>
>>>> 1. First off, the keys added to `keys_graveyard` are snapshotted, and the
>>>>      list is flushed. This the very last step in `key_put()`.
>>>> 2. `key_put()` reaches zero. This will mark key as busy for the garbage
>>>>      collector.
>>>> 3. `key_garbage_collector()` will try to increase refcount, which won't go
>>>>      above zero. Whenever this happens, the key will be skipped.
>>>>
>>>> Cc: stable@vger.kernel.org # v6.1+
>>>> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
>>> This patch landed in today's linux-next as commit b0d023797e3e ("keys:
>>> Add a list for unreferenced keys"). In my tests I found that it triggers
>>> the following lockdep issue:
>>>
>>> ================================
>>> WARNING: inconsistent lock state
>>> 6.15.0-rc1-next-20250407 #15630 Not tainted
>>> --------------------------------
>>> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
>>> ksoftirqd/3/32 [HC0[0]:SC1[1]:HE1:SE0] takes:
>>> c13fdd68 (key_serial_lock){+.?.}-{2:2}, at: key_put+0x74/0x128
>>> {SOFTIRQ-ON-W} state was registered at:
>>>     lock_acquire+0x134/0x384
>>>     _raw_spin_lock+0x38/0x48
>>>     key_alloc+0x2fc/0x4d8
>>>     keyring_alloc+0x40/0x90
>>>     system_trusted_keyring_init+0x50/0x7c
>>>     do_one_initcall+0x68/0x314
>>>     kernel_init_freeable+0x1c0/0x224
>>>     kernel_init+0x1c/0x12c
>>>     ret_from_fork+0x14/0x28
>>> irq event stamp: 234
>>> hardirqs last  enabled at (234): [<c0cb7060>]
>>> _raw_spin_unlock_irqrestore+0x5c/0x60
>>> hardirqs last disabled at (233): [<c0cb6dd0>]
>>> _raw_spin_lock_irqsave+0x64/0x68
>>> softirqs last  enabled at (42): [<c013bcd8>] handle_softirqs+0x328/0x520
>>> softirqs last disabled at (47): [<c013bf10>] run_ksoftirqd+0x40/0x68
>> OK what went to -next went there by accident and has been removed,
>> sorry. I think it was like the very first version of this patch.
>>
>> Thanks for informing anyhow!
>
> Testing branch: https://web.git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git/log/?h=keys-graveyard
>
> I updated my next this morning so should be fixed soon...

I've just checked that branch and it still triggers the lockdep issue. The 
following change is needed to fix it:

diff --git a/security/keys/gc.c b/security/keys/gc.c
index 0a3beb68633c..b22dc93eb4b4 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -302,9 +302,9 @@ static void key_garbage_collector(struct work_struct *work)
                 key_schedule_gc(new_timer);
         }

-       spin_lock(&key_graveyard_lock);
+       spin_lock_irqsave(&key_graveyard_lock, flags);
         list_splice_init(&key_graveyard, &graveyard);
-       spin_unlock(&key_graveyard_lock);
+       spin_unlock_irqrestore(&key_graveyard_lock, flags);

         if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2) ||
             !list_empty(&graveyard)) {

Best regards
Jarkko Sakkinen April 7, 2025, 12:49 p.m. UTC | #5
On Mon, Apr 07, 2025 at 02:42:34PM +0200, Marek Szyprowski wrote:
> On 07.04.2025 14:08, Jarkko Sakkinen wrote:
> > On Mon, Apr 07, 2025 at 02:23:49PM +0300, Jarkko Sakkinen wrote:
> >> On Mon, Apr 07, 2025 at 12:25:11PM +0200, Marek Szyprowski wrote:
> >>> On 07.04.2025 04:39, Jarkko Sakkinen wrote:
> >>>> From: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
> >>>>
> >>>> Add an isolated list of unreferenced keys to be queued for deletion, and
> >>>> try to pin the keys in the garbage collector before processing anything.
> >>>> Skip unpinnable keys.
> >>>>
> >>>> Use this list for blocking the reaping process during the teardown:
> >>>>
> >>>> 1. First off, the keys added to `keys_graveyard` are snapshotted, and the
> >>>>      list is flushed. This the very last step in `key_put()`.
> >>>> 2. `key_put()` reaches zero. This will mark key as busy for the garbage
> >>>>      collector.
> >>>> 3. `key_garbage_collector()` will try to increase refcount, which won't go
> >>>>      above zero. Whenever this happens, the key will be skipped.
> >>>>
> >>>> Cc: stable@vger.kernel.org # v6.1+
> >>>> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
> >>> This patch landed in today's linux-next as commit b0d023797e3e ("keys:
> >>> Add a list for unreferenced keys"). In my tests I found that it triggers
> >>> the following lockdep issue:
> >>>
> >>> ================================
> >>> WARNING: inconsistent lock state
> >>> 6.15.0-rc1-next-20250407 #15630 Not tainted
> >>> --------------------------------
> >>> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> >>> ksoftirqd/3/32 [HC0[0]:SC1[1]:HE1:SE0] takes:
> >>> c13fdd68 (key_serial_lock){+.?.}-{2:2}, at: key_put+0x74/0x128
> >>> {SOFTIRQ-ON-W} state was registered at:
> >>>     lock_acquire+0x134/0x384
> >>>     _raw_spin_lock+0x38/0x48
> >>>     key_alloc+0x2fc/0x4d8
> >>>     keyring_alloc+0x40/0x90
> >>>     system_trusted_keyring_init+0x50/0x7c
> >>>     do_one_initcall+0x68/0x314
> >>>     kernel_init_freeable+0x1c0/0x224
> >>>     kernel_init+0x1c/0x12c
> >>>     ret_from_fork+0x14/0x28
> >>> irq event stamp: 234
> >>> hardirqs last  enabled at (234): [<c0cb7060>]
> >>> _raw_spin_unlock_irqrestore+0x5c/0x60
> >>> hardirqs last disabled at (233): [<c0cb6dd0>]
> >>> _raw_spin_lock_irqsave+0x64/0x68
> >>> softirqs last  enabled at (42): [<c013bcd8>] handle_softirqs+0x328/0x520
> >>> softirqs last disabled at (47): [<c013bf10>] run_ksoftirqd+0x40/0x68
> >> OK what went to -next went there by accident and has been removed,
> >> sorry. I think it was like the very first version of this patch.
> >>
> >> Thanks for informing anyhow!
> >
> > Testing branch: https://web.git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git/log/?h=keys-graveyard
> >
> > I updated my next this morning so should be fixed soon...
> 
> I've just checked that branch and it still triggers lockdep issue. The 
> following change is needed to get it fixed:
> 
> diff --git a/security/keys/gc.c b/security/keys/gc.c
> index 0a3beb68633c..b22dc93eb4b4 100644
> --- a/security/keys/gc.c
> +++ b/security/keys/gc.c
> @@ -302,9 +302,9 @@ static void key_garbage_collector(struct work_struct *work)
>                  key_schedule_gc(new_timer);
>          }
> 
> -       spin_lock(&key_graveyard_lock);
> +       spin_lock_irqsave(&key_graveyard_lock, flags);
>          list_splice_init(&key_graveyard, &graveyard);
> -       spin_unlock(&key_graveyard_lock);
> +       spin_unlock_irqrestore(&key_graveyard_lock, flags);
> 
>          if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2) ||
>              !list_empty(&graveyard)) {

Oh, it used to be like this. I managed to mess things up during the rebase:

https://lore.kernel.org/keyrings/Z-682XjIjxjAZ9j-@kernel.org/T/#m4a0db2526abb549df3871dec23056350556d4556

Thanks for spotting this, I'll revert it to how it was in v4.

> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
> 
> 
BR, Jarkko

Patch

diff --git a/include/linux/key.h b/include/linux/key.h
index ba05de8579ec..c50659184bdf 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -195,10 +195,8 @@  enum key_state {
 struct key {
 	refcount_t		usage;		/* number of references */
 	key_serial_t		serial;		/* key serial number */
-	union {
-		struct list_head graveyard_link;
-		struct rb_node	serial_node;
-	};
+	struct list_head	graveyard_link; /* key->usage == 0 */
+	struct rb_node		serial_node;
 #ifdef CONFIG_KEY_NOTIFICATIONS
 	struct watch_list	*watchers;	/* Entities watching this key for changes */
 #endif
@@ -236,7 +234,6 @@  struct key {
 #define KEY_FLAG_ROOT_CAN_INVAL	7	/* set if key can be invalidated by root without permission */
 #define KEY_FLAG_KEEP		8	/* set if key should not be removed */
 #define KEY_FLAG_UID_KEYRING	9	/* set if key is a user or user session keyring */
-#define KEY_FLAG_FINAL_PUT	10	/* set if final put has happened on key */
 
 	/* the key type and key description string
 	 * - the desc is used to match a key against search criteria
diff --git a/security/keys/gc.c b/security/keys/gc.c
index f27223ea4578..0a3beb68633c 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -189,6 +189,7 @@  static void key_garbage_collector(struct work_struct *work)
 	struct rb_node *cursor;
 	struct key *key;
 	time64_t new_timer, limit, expiry;
+	unsigned long flags;
 
 	kenter("[%lx,%x]", key_gc_flags, gc_state);
 
@@ -206,21 +207,35 @@  static void key_garbage_collector(struct work_struct *work)
 
 	new_timer = TIME64_MAX;
 
+	spin_lock_irqsave(&key_graveyard_lock, flags);
+	list_splice_init(&key_graveyard, &graveyard);
+	spin_unlock_irqrestore(&key_graveyard_lock, flags);
+
+	list_for_each_entry(key, &graveyard, graveyard_link) {
+		spin_lock(&key_serial_lock);
+		kdebug("unrefd key %d", key->serial);
+		rb_erase(&key->serial_node, &key_serial_tree);
+		spin_unlock(&key_serial_lock);
+	}
+
 	/* As only this function is permitted to remove things from the key
 	 * serial tree, if cursor is non-NULL then it will always point to a
 	 * valid node in the tree - even if lock got dropped.
 	 */
 	spin_lock(&key_serial_lock);
+	key = NULL;
 	cursor = rb_first(&key_serial_tree);
 
 continue_scanning:
+	key_put(key);
 	while (cursor) {
 		key = rb_entry(cursor, struct key, serial_node);
 		cursor = rb_next(cursor);
-
-		if (test_bit(KEY_FLAG_FINAL_PUT, &key->flags)) {
-			smp_mb(); /* Clobber key->user after FINAL_PUT seen. */
-			goto found_unreferenced_key;
+		/* key_get(), unless zero: */
+		if (!refcount_inc_not_zero(&key->usage)) {
+			key = NULL;
+			gc_state |= KEY_GC_REAP_AGAIN;
+			goto skip_dead_key;
 		}
 
 		if (unlikely(gc_state & KEY_GC_REAPING_DEAD_1)) {
@@ -274,6 +289,7 @@  static void key_garbage_collector(struct work_struct *work)
 		spin_lock(&key_serial_lock);
 		goto continue_scanning;
 	}
+	key_put(key);
 
 	/* We've completed the pass.  Set the timer if we need to and queue a
 	 * new cycle if necessary.  We keep executing cycles until we find one
@@ -286,6 +302,10 @@  static void key_garbage_collector(struct work_struct *work)
 		key_schedule_gc(new_timer);
 	}
 
+	spin_lock(&key_graveyard_lock);
+	list_splice_init(&key_graveyard, &graveyard);
+	spin_unlock(&key_graveyard_lock);
+
 	if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2) ||
 	    !list_empty(&graveyard)) {
 		/* Make sure that all pending keyring payload destructions are
@@ -328,18 +348,6 @@  static void key_garbage_collector(struct work_struct *work)
 	kleave(" [end %x]", gc_state);
 	return;
 
-	/* We found an unreferenced key - once we've removed it from the tree,
-	 * we can safely drop the lock.
-	 */
-found_unreferenced_key:
-	kdebug("unrefd key %d", key->serial);
-	rb_erase(&key->serial_node, &key_serial_tree);
-	spin_unlock(&key_serial_lock);
-
-	list_add_tail(&key->graveyard_link, &graveyard);
-	gc_state |= KEY_GC_REAP_AGAIN;
-	goto maybe_resched;
-
 	/* We found a restricted keyring and need to update the restriction if
 	 * it is associated with the dead key type.
 	 */
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 2cffa6dc8255..4e3d9b322390 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -63,9 +63,14 @@  struct key_user {
 	int			qnbytes;	/* number of bytes allocated to this user */
 };
 
+extern struct list_head key_graveyard;
+extern spinlock_t key_graveyard_lock;
+
 extern struct rb_root	key_user_tree;
 extern spinlock_t	key_user_lock;
 extern struct key_user	root_key_user;
+extern struct list_head	key_graveyard;
+extern spinlock_t	key_graveyard_lock;
 
 extern struct key_user *key_user_lookup(kuid_t uid);
 extern void key_user_put(struct key_user *user);
diff --git a/security/keys/key.c b/security/keys/key.c
index 7198cd2ac3a3..7511f2017b6b 100644
--- a/security/keys/key.c
+++ b/security/keys/key.c
@@ -22,6 +22,8 @@  DEFINE_SPINLOCK(key_serial_lock);
 
 struct rb_root	key_user_tree; /* tree of quota records indexed by UID */
 DEFINE_SPINLOCK(key_user_lock);
+LIST_HEAD(key_graveyard);
+DEFINE_SPINLOCK(key_graveyard_lock);
 
 unsigned int key_quota_root_maxkeys = 1000000;	/* root's key count quota */
 unsigned int key_quota_root_maxbytes = 25000000; /* root's key space quota */
@@ -658,8 +660,9 @@  void key_put(struct key *key)
 				key->user->qnbytes -= key->quotalen;
 				spin_unlock_irqrestore(&key->user->lock, flags);
 			}
-			smp_mb(); /* key->user before FINAL_PUT set. */
-			set_bit(KEY_FLAG_FINAL_PUT, &key->flags);
+			spin_lock_irqsave(&key_graveyard_lock, flags);
+			list_add_tail(&key->graveyard_link, &key_graveyard);
+			spin_unlock_irqrestore(&key_graveyard_lock, flags);
 			schedule_work(&key_gc_work);
 		}
 	}