diff mbox series

[1/3] ima: Remove inode lock

Message ID 20241008165732.2603647-1-roberto.sassu@huaweicloud.com (mailing list archive)
State Handled Elsewhere
Delegated to: Paul Moore
Headers show
Series [1/3] ima: Remove inode lock | expand

Commit Message

Roberto Sassu Oct. 8, 2024, 4:57 p.m. UTC
From: Roberto Sassu <roberto.sassu@huawei.com>

Move out the mutex in the ima_iint_cache structure to a new structure
called ima_iint_cache_lock, so that a lock can be taken regardless of
whether or not inode integrity metadata are stored in the inode.

Introduce ima_inode_security() to simplify accessing the new structure in
the inode security blob.

Move the mutex initialization and annotation in the new function
ima_inode_alloc_security() and introduce ima_iint_lock() and
ima_iint_unlock() to respectively lock and unlock the mutex.

Finally, expand the critical region in process_measurement() guarded by
iint->mutex up to where the inode was locked, use only one iint lock in
__ima_inode_hash(), since the mutex is now in the inode security blob, and
replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
---
 security/integrity/ima/ima.h      | 26 ++++++++---
 security/integrity/ima/ima_api.c  |  4 +-
 security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
 security/integrity/ima/ima_main.c | 39 +++++++---------
 4 files changed, 104 insertions(+), 42 deletions(-)

Comments

Paul Moore Oct. 9, 2024, 3:36 p.m. UTC | #1
On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
>
> From: Roberto Sassu <roberto.sassu@huawei.com>
>
> Move out the mutex in the ima_iint_cache structure to a new structure
> called ima_iint_cache_lock, so that a lock can be taken regardless of
> whether or not inode integrity metadata are stored in the inode.
>
> Introduce ima_inode_security() to simplify accessing the new structure in
> the inode security blob.
>
> Move the mutex initialization and annotation in the new function
> ima_inode_alloc_security() and introduce ima_iint_lock() and
> ima_iint_unlock() to respectively lock and unlock the mutex.
>
> Finally, expand the critical region in process_measurement() guarded by
> iint->mutex up to where the inode was locked, use only one iint lock in
> __ima_inode_hash(), since the mutex is now in the inode security blob, and
> replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().
>
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> ---
>  security/integrity/ima/ima.h      | 26 ++++++++---
>  security/integrity/ima/ima_api.c  |  4 +-
>  security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
>  security/integrity/ima/ima_main.c | 39 +++++++---------
>  4 files changed, 104 insertions(+), 42 deletions(-)

I'm not an IMA expert, but it looks reasonable to me, although
shouldn't this carry a stable CC in the patch metadata?

Reviewed-by: Paul Moore <paul@paul-moore.com>
Paul Moore Oct. 9, 2024, 3:37 p.m. UTC | #2
On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@paul-moore.com> wrote:
> On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote:
> >
> > From: Roberto Sassu <roberto.sassu@huawei.com>
> >
> > Move out the mutex in the ima_iint_cache structure to a new structure
> > called ima_iint_cache_lock, so that a lock can be taken regardless of
> > whether or not inode integrity metadata are stored in the inode.
> >
> > Introduce ima_inode_security() to simplify accessing the new structure in
> > the inode security blob.
> >
> > Move the mutex initialization and annotation in the new function
> > ima_inode_alloc_security() and introduce ima_iint_lock() and
> > ima_iint_unlock() to respectively lock and unlock the mutex.
> >
> > Finally, expand the critical region in process_measurement() guarded by
> > iint->mutex up to where the inode was locked, use only one iint lock in
> > __ima_inode_hash(), since the mutex is now in the inode security blob, and
> > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().
> >
> > Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> > ---
> >  security/integrity/ima/ima.h      | 26 ++++++++---
> >  security/integrity/ima/ima_api.c  |  4 +-
> >  security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
> >  security/integrity/ima/ima_main.c | 39 +++++++---------
> >  4 files changed, 104 insertions(+), 42 deletions(-)
>
> I'm not an IMA expert, but it looks reasonable to me, although
> shouldn't this carry a stable CC in the patch metadata?
>
> Reviewed-by: Paul Moore <paul@paul-moore.com>

Sorry, one more thing ... did you verify this patchset resolves the
syzbot problem?  I saw at least one reproducer.
Shu Han Oct. 9, 2024, 3:59 p.m. UTC | #3
> Finally, expand the critical region in process_measurement() guarded by
> iint->mutex up to where the inode was locked, use only one iint lock in
> __ima_inode_hash(), since the mutex is now in the inode security blob, and
> replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().

I am not familiar with this, so the following statement may be inaccurate:

I suspect that modifying the `i_flags` field through
`inode->i_flags |= S_IMA;` in `ima_inode_get` may cause a
race, as this patch removes the write lock for inodes in
process_measurement().

For example, swapon() adds the S_SWAPFILE tag under inode write lock's
protection.

Perhaps this initialization tag(`S_IMA`) can also be moved into inode's
security blob.
Roberto Sassu Oct. 9, 2024, 4:24 p.m. UTC | #4
On Wed, 2024-10-09 at 11:36 -0400, Paul Moore wrote:
> On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote:
> > 
> > From: Roberto Sassu <roberto.sassu@huawei.com>
> > 
> > Move out the mutex in the ima_iint_cache structure to a new structure
> > called ima_iint_cache_lock, so that a lock can be taken regardless of
> > whether or not inode integrity metadata are stored in the inode.
> > 
> > Introduce ima_inode_security() to simplify accessing the new structure in
> > the inode security blob.
> > 
> > Move the mutex initialization and annotation in the new function
> > ima_inode_alloc_security() and introduce ima_iint_lock() and
> > ima_iint_unlock() to respectively lock and unlock the mutex.
> > 
> > Finally, expand the critical region in process_measurement() guarded by
> > iint->mutex up to where the inode was locked, use only one iint lock in
> > __ima_inode_hash(), since the mutex is now in the inode security blob, and
> > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().
> > 
> > Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> > ---
> >  security/integrity/ima/ima.h      | 26 ++++++++---
> >  security/integrity/ima/ima_api.c  |  4 +-
> >  security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
> >  security/integrity/ima/ima_main.c | 39 +++++++---------
> >  4 files changed, 104 insertions(+), 42 deletions(-)
> 
> I'm not an IMA expert, but it looks reasonable to me, although
> shouldn't this carry a stable CC in the patch metadata?
> 
> Reviewed-by: Paul Moore <paul@paul-moore.com>

Thanks, will add in the new version.

Roberto
Roberto Sassu Oct. 9, 2024, 4:25 p.m. UTC | #5
On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote:
> On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@paul-moore.com> wrote:
> > On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
> > <roberto.sassu@huaweicloud.com> wrote:
> > > 
> > > From: Roberto Sassu <roberto.sassu@huawei.com>
> > > 
> > > Move out the mutex in the ima_iint_cache structure to a new structure
> > > called ima_iint_cache_lock, so that a lock can be taken regardless of
> > > whether or not inode integrity metadata are stored in the inode.
> > > 
> > > Introduce ima_inode_security() to simplify accessing the new structure in
> > > the inode security blob.
> > > 
> > > Move the mutex initialization and annotation in the new function
> > > ima_inode_alloc_security() and introduce ima_iint_lock() and
> > > ima_iint_unlock() to respectively lock and unlock the mutex.
> > > 
> > > Finally, expand the critical region in process_measurement() guarded by
> > > iint->mutex up to where the inode was locked, use only one iint lock in
> > > __ima_inode_hash(), since the mutex is now in the inode security blob, and
> > > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().
> > > 
> > > Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> > > ---
> > >  security/integrity/ima/ima.h      | 26 ++++++++---
> > >  security/integrity/ima/ima_api.c  |  4 +-
> > >  security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
> > >  security/integrity/ima/ima_main.c | 39 +++++++---------
> > >  4 files changed, 104 insertions(+), 42 deletions(-)
> > 
> > I'm not an IMA expert, but it looks reasonable to me, although
> > shouldn't this carry a stable CC in the patch metadata?
> > 
> > Reviewed-by: Paul Moore <paul@paul-moore.com>
> 
> Sorry, one more thing ... did you verify this patchset resolves the
> syzbot problem?  I saw at least one reproducer.

Uhm, could not reproduce the deadlock with the reproducer. However,
without the patch I have a lockdep warning, and with I don't.

I asked syzbot to try the patches. Let's see.

Thanks

Roberto
Roberto Sassu Oct. 9, 2024, 4:32 p.m. UTC | #6
On Wed, 2024-10-09 at 18:25 +0200, Roberto Sassu wrote:
> On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote:
> > On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@paul-moore.com> wrote:
> > > On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
> > > <roberto.sassu@huaweicloud.com> wrote:
> > > > 
> > > > From: Roberto Sassu <roberto.sassu@huawei.com>
> > > > 
> > > > Move out the mutex in the ima_iint_cache structure to a new structure
> > > > called ima_iint_cache_lock, so that a lock can be taken regardless of
> > > > whether or not inode integrity metadata are stored in the inode.
> > > > 
> > > > Introduce ima_inode_security() to simplify accessing the new structure in
> > > > the inode security blob.
> > > > 
> > > > Move the mutex initialization and annotation in the new function
> > > > ima_inode_alloc_security() and introduce ima_iint_lock() and
> > > > ima_iint_unlock() to respectively lock and unlock the mutex.
> > > > 
> > > > Finally, expand the critical region in process_measurement() guarded by
> > > > iint->mutex up to where the inode was locked, use only one iint lock in
> > > > __ima_inode_hash(), since the mutex is now in the inode security blob, and
> > > > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().
> > > > 
> > > > Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> > > > ---
> > > >  security/integrity/ima/ima.h      | 26 ++++++++---
> > > >  security/integrity/ima/ima_api.c  |  4 +-
> > > >  security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
> > > >  security/integrity/ima/ima_main.c | 39 +++++++---------
> > > >  4 files changed, 104 insertions(+), 42 deletions(-)
> > > 
> > > I'm not an IMA expert, but it looks reasonable to me, although
> > > shouldn't this carry a stable CC in the patch metadata?
> > > 
> > > Reviewed-by: Paul Moore <paul@paul-moore.com>
> > 
> > Sorry, one more thing ... did you verify this patchset resolves the
> > syzbot problem?  I saw at least one reproducer.
> 
> Uhm, could not reproduce the deadlock with the reproducer. However,
> without the patch I have a lockdep warning, and with I don't.
> 
> I asked syzbot to try the patches. Let's see.

@bpf: could you please manually trigger the tests in a PR? Next time
will add the bpf-next tag (or I can send a PR directly from Github).

This patch affects the BPF LSM, the bpf_ima_file_hash() and
bpf_ima_inode_hash() helpers.

Thanks

Roberto
Roberto Sassu Oct. 9, 2024, 4:43 p.m. UTC | #7
On Wed, 2024-10-09 at 18:25 +0200, Roberto Sassu wrote:
> On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote:
> > On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@paul-moore.com> wrote:
> > > On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
> > > <roberto.sassu@huaweicloud.com> wrote:
> > > > 
> > > > From: Roberto Sassu <roberto.sassu@huawei.com>
> > > > 
> > > > Move out the mutex in the ima_iint_cache structure to a new structure
> > > > called ima_iint_cache_lock, so that a lock can be taken regardless of
> > > > whether or not inode integrity metadata are stored in the inode.
> > > > 
> > > > Introduce ima_inode_security() to simplify accessing the new structure in
> > > > the inode security blob.
> > > > 
> > > > Move the mutex initialization and annotation in the new function
> > > > ima_inode_alloc_security() and introduce ima_iint_lock() and
> > > > ima_iint_unlock() to respectively lock and unlock the mutex.
> > > > 
> > > > Finally, expand the critical region in process_measurement() guarded by
> > > > iint->mutex up to where the inode was locked, use only one iint lock in
> > > > __ima_inode_hash(), since the mutex is now in the inode security blob, and
> > > > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().
> > > > 
> > > > Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> > > > ---
> > > >  security/integrity/ima/ima.h      | 26 ++++++++---
> > > >  security/integrity/ima/ima_api.c  |  4 +-
> > > >  security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
> > > >  security/integrity/ima/ima_main.c | 39 +++++++---------
> > > >  4 files changed, 104 insertions(+), 42 deletions(-)
> > > 
> > > I'm not an IMA expert, but it looks reasonable to me, although
> > > shouldn't this carry a stable CC in the patch metadata?
> > > 
> > > Reviewed-by: Paul Moore <paul@paul-moore.com>
> > 
> > Sorry, one more thing ... did you verify this patchset resolves the
> > syzbot problem?  I saw at least one reproducer.
> 
> Uhm, could not reproduce the deadlock with the reproducer. However,
> without the patch I have a lockdep warning, and with I don't.
> 
> I asked syzbot to try the patches. Let's see.

I actually got a different lockdep warning:

[  904.603365] ======================================================
[  904.604264] WARNING: possible circular locking dependency detected
[  904.605141] 6.12.0-rc2+ #20 Not tainted
[  904.605697] ------------------------------------------------------
[  904.606577] systemd/1 is trying to acquire lock:
[  904.607227] ffff88810e5c2580 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x62/0x6b0
[  904.608290] 
[  904.608290] but task is already holding lock:
[  904.609105] ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
[  904.610429] 
[  904.610429] which lock already depends on the new lock.
[  904.610429] 
[  904.611574] 
[  904.611574] the existing dependency chain (in reverse order) is:
[  904.612628] 
[  904.612628] -> #2 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}:
[  904.613681]        __mutex_lock+0xaf/0x760
[  904.614266]        mutex_lock_nested+0x27/0x40
[  904.614897]        ima_iint_lock+0x24/0x40
[  904.615490]        process_measurement+0x176/0xef0
[  904.616168]        ima_file_mmap+0x98/0x120
[  904.616767]        security_mmap_file+0x408/0x560
[  904.617444]        __do_sys_remap_file_pages+0x2fa/0x4c0
[  904.618194]        __x64_sys_remap_file_pages+0x29/0x40
[  904.618937]        x64_sys_call+0x6e8/0x4550
[  904.619546]        do_syscall_64+0x71/0x180
[  904.620155]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  904.620952] 
[  904.620952] -> #1 (&mm->mmap_lock){++++}-{4:4}:
[  904.621813]        __might_fault+0x6f/0xb0
[  904.622400]        _copy_to_iter+0x12e/0xa80
[  904.623009]        seq_read_iter+0x593/0x6b0
[  904.623629]        proc_reg_read_iter+0x31/0xe0
[  904.624276]        vfs_read+0x256/0x3d0
[  904.624822]        ksys_read+0x6d/0x160
[  904.625372]        __x64_sys_read+0x1d/0x30
[  904.625964]        x64_sys_call+0x1068/0x4550
[  904.626594]        do_syscall_64+0x71/0x180
[  904.627188]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  904.627975] 
[  904.627975] -> #0 (&p->lock){+.+.}-{4:4}:
[  904.628787]        __lock_acquire+0x17f3/0x2320
[  904.629432]        lock_acquire+0xf2/0x420
[  904.630013]        __mutex_lock+0xaf/0x760
[  904.630596]        mutex_lock_nested+0x27/0x40
[  904.631225]        seq_read_iter+0x62/0x6b0
[  904.631831]        kernfs_fop_read_iter+0x1ef/0x2c0
[  904.632599]        __kernel_read+0x113/0x350
[  904.633206]        integrity_kernel_read+0x23/0x40
[  904.633902]        ima_calc_file_hash_tfm+0x14e/0x230
[  904.634621]        ima_calc_file_hash+0x97/0x250
[  904.635281]        ima_collect_measurement+0x4be/0x530
[  904.636008]        process_measurement+0x7c0/0xef0
[  904.636689]        ima_file_check+0x65/0x80
[  904.637295]        security_file_post_open+0xb1/0x1b0
[  904.638008]        path_openat+0x216/0x1280
[  904.638605]        do_filp_open+0xab/0x140
[  904.639185]        do_sys_openat2+0xba/0x120
[  904.639805]        do_sys_open+0x4c/0x80
[  904.640366]        __x64_sys_openat+0x23/0x30
[  904.640992]        x64_sys_call+0x2575/0x4550
[  904.641616]        do_syscall_64+0x71/0x180
[  904.642207]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  904.643003] 
[  904.643003] other info that might help us debug this:
[  904.643003] 
[  904.644149] Chain exists of:
[  904.644149]   &p->lock --> &mm->mmap_lock --> &ima_iint_lock_mutex_key[depth]
[  904.644149] 
[  904.645763]  Possible unsafe locking scenario:
[  904.645763] 
[  904.646614]        CPU0                    CPU1
[  904.647264]        ----                    ----
[  904.647909]   lock(&ima_iint_lock_mutex_key[depth]);
[  904.648617]                                lock(&mm->mmap_lock);
[  904.649479]                                lock(&ima_iint_lock_mutex_key[depth]);
[  904.650543]   lock(&p->lock);
[  904.650974] 
[  904.650974]  *** DEADLOCK ***
[  904.650974] 
[  904.651826] 1 lock held by systemd/1:
[  904.652376]  #0: ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
[  904.653759] 
[  904.653759] stack backtrace:
[  904.654391] CPU: 2 UID: 0 PID: 1 Comm: systemd Not tainted 6.12.0-rc2+ #20
[  904.655360] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[  904.656497] Call Trace:
[  904.656856]  <TASK>
[  904.657166]  dump_stack_lvl+0x134/0x1a0
[  904.657728]  dump_stack+0x14/0x30
[  904.658206]  print_circular_bug+0x38d/0x450
[  904.658812]  check_noncircular+0xed/0x120
[  904.659396]  ? srso_return_thunk+0x5/0x5f
[  904.659972]  ? srso_return_thunk+0x5/0x5f
[  904.660569]  __lock_acquire+0x17f3/0x2320
[  904.661145]  lock_acquire+0xf2/0x420
[  904.661664]  ? seq_read_iter+0x62/0x6b0
[  904.662217]  ? srso_return_thunk+0x5/0x5f
[  904.662886]  __mutex_lock+0xaf/0x760
[  904.663408]  ? seq_read_iter+0x62/0x6b0
[  904.663961]  ? seq_read_iter+0x62/0x6b0
[  904.664525]  ? srso_return_thunk+0x5/0x5f
[  904.665098]  ? mark_lock+0x4e/0x750
[  904.665610]  ? mutex_lock_nested+0x27/0x40
[  904.666194]  ? find_held_lock+0x3a/0x100
[  904.666770]  mutex_lock_nested+0x27/0x40
[  904.667337]  seq_read_iter+0x62/0x6b0
[  904.667869]  kernfs_fop_read_iter+0x1ef/0x2c0
[  904.668536]  __kernel_read+0x113/0x350
[  904.669079]  integrity_kernel_read+0x23/0x40
[  904.669698]  ima_calc_file_hash_tfm+0x14e/0x230
[  904.670349]  ? __lock_acquire+0xc32/0x2320
[  904.670937]  ? srso_return_thunk+0x5/0x5f
[  904.671525]  ? __lock_acquire+0xfbb/0x2320
[  904.672113]  ? srso_return_thunk+0x5/0x5f
[  904.672693]  ? srso_return_thunk+0x5/0x5f
[  904.673280]  ? lock_acquire+0xf2/0x420
[  904.673818]  ? kernfs_iop_getattr+0x4a/0xb0
[  904.674424]  ? srso_return_thunk+0x5/0x5f
[  904.674997]  ? find_held_lock+0x3a/0x100
[  904.675564]  ? srso_return_thunk+0x5/0x5f
[  904.676156]  ? srso_return_thunk+0x5/0x5f
[  904.676740]  ? srso_return_thunk+0x5/0x5f
[  904.677322]  ? local_clock_noinstr+0x9/0xb0
[  904.677923]  ? srso_return_thunk+0x5/0x5f
[  904.678502]  ? srso_return_thunk+0x5/0x5f
[  904.679075]  ? lock_release+0x4e2/0x570
[  904.679639]  ima_calc_file_hash+0x97/0x250
[  904.680227]  ima_collect_measurement+0x4be/0x530
[  904.680901]  ? srso_return_thunk+0x5/0x5f
[  904.681496]  ? srso_return_thunk+0x5/0x5f
[  904.682070]  ? __kernfs_iattrs+0x4a/0x140
[  904.682658]  ? srso_return_thunk+0x5/0x5f
[  904.683242]  ? process_measurement+0x7c0/0xef0
[  904.683876]  ? srso_return_thunk+0x5/0x5f
[  904.684462]  process_measurement+0x7c0/0xef0
[  904.685078]  ? srso_return_thunk+0x5/0x5f
[  904.685654]  ? srso_return_thunk+0x5/0x5f
[  904.686228]  ? _raw_spin_unlock_irqrestore+0x5d/0xd0
[  904.686938]  ? srso_return_thunk+0x5/0x5f
[  904.687523]  ? srso_return_thunk+0x5/0x5f
[  904.688098]  ? srso_return_thunk+0x5/0x5f
[  904.688672]  ? local_clock_noinstr+0x9/0xb0
[  904.689273]  ? srso_return_thunk+0x5/0x5f
[  904.689846]  ? srso_return_thunk+0x5/0x5f
[  904.690430]  ? srso_return_thunk+0x5/0x5f
[  904.691005]  ? srso_return_thunk+0x5/0x5f
[  904.691583]  ? srso_return_thunk+0x5/0x5f
[  904.692180]  ? local_clock_noinstr+0x9/0xb0
[  904.692841]  ? srso_return_thunk+0x5/0x5f
[  904.693419]  ? srso_return_thunk+0x5/0x5f
[  904.693990]  ? lock_release+0x4e2/0x570
[  904.694544]  ? srso_return_thunk+0x5/0x5f
[  904.695115]  ? kernfs_put_active+0x5d/0xc0
[  904.695708]  ? srso_return_thunk+0x5/0x5f
[  904.696286]  ? kernfs_fop_open+0x376/0x6b0
[  904.696872]  ima_file_check+0x65/0x80
[  904.697409]  security_file_post_open+0xb1/0x1b0
[  904.698058]  path_openat+0x216/0x1280
[  904.698589]  do_filp_open+0xab/0x140
[  904.699106]  ? srso_return_thunk+0x5/0x5f
[  904.699693]  ? lock_release+0x554/0x570
[  904.700264]  ? srso_return_thunk+0x5/0x5f
[  904.700836]  ? do_raw_spin_unlock+0x76/0x140
[  904.701450]  ? srso_return_thunk+0x5/0x5f
[  904.702021]  ? _raw_spin_unlock+0x3f/0xa0
[  904.702606]  ? srso_return_thunk+0x5/0x5f
[  904.703178]  ? alloc_fd+0x1ca/0x3b0
[  904.703685]  do_sys_openat2+0xba/0x120
[  904.704223]  ? file_free+0x8d/0x110
[  904.704729]  do_sys_open+0x4c/0x80
[  904.705221]  __x64_sys_openat+0x23/0x30
[  904.705784]  x64_sys_call+0x2575/0x4550
[  904.706337]  do_syscall_64+0x71/0x180
[  904.706863]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  904.707587] RIP: 0033:0x7f3462123037
[  904.708120] Code: 55 9c 48 89 75 a0 89 7d a8 44 89 55 ac e8 a1 7a f8 ff 44 8b 55 ac 8b 55 9c 41 89 c0 48 8b 75 a0 8b 7d a8 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 45 ac e8 f6 7a f8 ff 8b 45 ac
[  904.710744] RSP: 002b:00007ffdd1a79370 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[  904.711821] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007f3462123037
[  904.712829] RDX: 0000000000080100 RSI: 00007ffdd1a79420 RDI: 00000000ffffff9c
[  904.713839] RBP: 00007ffdd1a793e0 R08: 0000000000000000 R09: 0000000000000000
[  904.714848] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffdd1a794c8
[  904.715855] R13: 00007ffdd1a794b8 R14: 00007ffdd1a79690 R15: 00007ffdd1a79690
[  904.716877]  </TASK>

Roberto
Roberto Sassu Oct. 11, 2024, 1:30 p.m. UTC | #8
On Wed, 2024-10-09 at 18:43 +0200, Roberto Sassu wrote:
> On Wed, 2024-10-09 at 18:25 +0200, Roberto Sassu wrote:
> > On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote:
> > > On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@paul-moore.com> wrote:
> > > > On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
> > > > <roberto.sassu@huaweicloud.com> wrote:
> > > > > 
> > > > > From: Roberto Sassu <roberto.sassu@huawei.com>
> > > > > 
> > > > > Move out the mutex in the ima_iint_cache structure to a new structure
> > > > > called ima_iint_cache_lock, so that a lock can be taken regardless of
> > > > > whether or not inode integrity metadata are stored in the inode.
> > > > > 
> > > > > Introduce ima_inode_security() to simplify accessing the new structure in
> > > > > the inode security blob.
> > > > > 
> > > > > Move the mutex initialization and annotation in the new function
> > > > > ima_inode_alloc_security() and introduce ima_iint_lock() and
> > > > > ima_iint_unlock() to respectively lock and unlock the mutex.
> > > > > 
> > > > > Finally, expand the critical region in process_measurement() guarded by
> > > > > iint->mutex up to where the inode was locked, use only one iint lock in
> > > > > __ima_inode_hash(), since the mutex is now in the inode security blob, and
> > > > > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().
> > > > > 
> > > > > Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> > > > > ---
> > > > >  security/integrity/ima/ima.h      | 26 ++++++++---
> > > > >  security/integrity/ima/ima_api.c  |  4 +-
> > > > >  security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
> > > > >  security/integrity/ima/ima_main.c | 39 +++++++---------
> > > > >  4 files changed, 104 insertions(+), 42 deletions(-)
> > > > 
> > > > I'm not an IMA expert, but it looks reasonable to me, although
> > > > shouldn't this carry a stable CC in the patch metadata?
> > > > 
> > > > Reviewed-by: Paul Moore <paul@paul-moore.com>
> > > 
> > > Sorry, one more thing ... did you verify this patchset resolves the
> > > syzbot problem?  I saw at least one reproducer.
> > 
> > Uhm, could not reproduce the deadlock with the reproducer. However,
> > without the patch I have a lockdep warning, and with I don't.
> > 
> > I asked syzbot to try the patches. Let's see.
> 
> I actually got a different lockdep warning:
> 
> [  904.603365] ======================================================
> [  904.604264] WARNING: possible circular locking dependency detected
> [  904.605141] 6.12.0-rc2+ #20 Not tainted
> [  904.605697] ------------------------------------------------------

I can reproduce by executing the syzbot reproducer and in another
terminal by logging in with SSH (not all the times).

If I understood what the lockdep warning means, this is the scenario.

A task accesses a seq_file which is in the IMA policy, so we enter the
critical region guarded by the iint lock. But before we get the chance
to measure the file, a second task calls remap_file_pages() on the same
seq_file.

remap_file_pages() takes the mmap lock and, if the event matches the
IMA policy too, the second task waits for the iint lock to be released.

Now, the first task starts to measure the seq_file and takes the
seq_file lock. I don't know if 3 processes must be involved, but I was
thinking that reading the seq_file from the first task can trigger a
page fault, which requires to take the mmap lock.

At this point, we reach a deadlock. The first task waits for the mmap
lock to be released, and the second waits for the iint lock to be
released, which both cannot happen.

Roberto

> [  904.606577] systemd/1 is trying to acquire lock:
> [  904.607227] ffff88810e5c2580 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x62/0x6b0
> [  904.608290] 
> [  904.608290] but task is already holding lock:
> [  904.609105] ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
> [  904.610429] 
> [  904.610429] which lock already depends on the new lock.
> [  904.610429] 
> [  904.611574] 
> [  904.611574] the existing dependency chain (in reverse order) is:
> [  904.612628] 
> [  904.612628] -> #2 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}:
> [  904.613681]        __mutex_lock+0xaf/0x760
> [  904.614266]        mutex_lock_nested+0x27/0x40
> [  904.614897]        ima_iint_lock+0x24/0x40
> [  904.615490]        process_measurement+0x176/0xef0
> [  904.616168]        ima_file_mmap+0x98/0x120
> [  904.616767]        security_mmap_file+0x408/0x560
> [  904.617444]        __do_sys_remap_file_pages+0x2fa/0x4c0
> [  904.618194]        __x64_sys_remap_file_pages+0x29/0x40
> [  904.618937]        x64_sys_call+0x6e8/0x4550
> [  904.619546]        do_syscall_64+0x71/0x180
> [  904.620155]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  904.620952] 
> [  904.620952] -> #1 (&mm->mmap_lock){++++}-{4:4}:
> [  904.621813]        __might_fault+0x6f/0xb0
> [  904.622400]        _copy_to_iter+0x12e/0xa80
> [  904.623009]        seq_read_iter+0x593/0x6b0
> [  904.623629]        proc_reg_read_iter+0x31/0xe0
> [  904.624276]        vfs_read+0x256/0x3d0
> [  904.624822]        ksys_read+0x6d/0x160
> [  904.625372]        __x64_sys_read+0x1d/0x30
> [  904.625964]        x64_sys_call+0x1068/0x4550
> [  904.626594]        do_syscall_64+0x71/0x180
> [  904.627188]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  904.627975] 
> [  904.627975] -> #0 (&p->lock){+.+.}-{4:4}:
> [  904.628787]        __lock_acquire+0x17f3/0x2320
> [  904.629432]        lock_acquire+0xf2/0x420
> [  904.630013]        __mutex_lock+0xaf/0x760
> [  904.630596]        mutex_lock_nested+0x27/0x40
> [  904.631225]        seq_read_iter+0x62/0x6b0
> [  904.631831]        kernfs_fop_read_iter+0x1ef/0x2c0
> [  904.632599]        __kernel_read+0x113/0x350
> [  904.633206]        integrity_kernel_read+0x23/0x40
> [  904.633902]        ima_calc_file_hash_tfm+0x14e/0x230
> [  904.634621]        ima_calc_file_hash+0x97/0x250
> [  904.635281]        ima_collect_measurement+0x4be/0x530
> [  904.636008]        process_measurement+0x7c0/0xef0
> [  904.636689]        ima_file_check+0x65/0x80
> [  904.637295]        security_file_post_open+0xb1/0x1b0
> [  904.638008]        path_openat+0x216/0x1280
> [  904.638605]        do_filp_open+0xab/0x140
> [  904.639185]        do_sys_openat2+0xba/0x120
> [  904.639805]        do_sys_open+0x4c/0x80
> [  904.640366]        __x64_sys_openat+0x23/0x30
> [  904.640992]        x64_sys_call+0x2575/0x4550
> [  904.641616]        do_syscall_64+0x71/0x180
> [  904.642207]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  904.643003] 
> [  904.643003] other info that might help us debug this:
> [  904.643003] 
> [  904.644149] Chain exists of:
> [  904.644149]   &p->lock --> &mm->mmap_lock --> &ima_iint_lock_mutex_key[depth]
> [  904.644149] 
> [  904.645763]  Possible unsafe locking scenario:
> [  904.645763] 
> [  904.646614]        CPU0                    CPU1
> [  904.647264]        ----                    ----
> [  904.647909]   lock(&ima_iint_lock_mutex_key[depth]);
> [  904.648617]                                lock(&mm->mmap_lock);
> [  904.649479]                                lock(&ima_iint_lock_mutex_key[depth]);
> [  904.650543]   lock(&p->lock);
> [  904.650974] 
> [  904.650974]  *** DEADLOCK ***
> [  904.650974] 
> [  904.651826] 1 lock held by systemd/1:
> [  904.652376]  #0: ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
> [  904.653759] 
> [  904.653759] stack backtrace:
> [  904.654391] CPU: 2 UID: 0 PID: 1 Comm: systemd Not tainted 6.12.0-rc2+ #20
> [  904.655360] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> [  904.656497] Call Trace:
> [  904.656856]  <TASK>
> [  904.657166]  dump_stack_lvl+0x134/0x1a0
> [  904.657728]  dump_stack+0x14/0x30
> [  904.658206]  print_circular_bug+0x38d/0x450
> [  904.658812]  check_noncircular+0xed/0x120
> [  904.659396]  ? srso_return_thunk+0x5/0x5f
> [  904.659972]  ? srso_return_thunk+0x5/0x5f
> [  904.660569]  __lock_acquire+0x17f3/0x2320
> [  904.661145]  lock_acquire+0xf2/0x420
> [  904.661664]  ? seq_read_iter+0x62/0x6b0
> [  904.662217]  ? srso_return_thunk+0x5/0x5f
> [  904.662886]  __mutex_lock+0xaf/0x760
> [  904.663408]  ? seq_read_iter+0x62/0x6b0
> [  904.663961]  ? seq_read_iter+0x62/0x6b0
> [  904.664525]  ? srso_return_thunk+0x5/0x5f
> [  904.665098]  ? mark_lock+0x4e/0x750
> [  904.665610]  ? mutex_lock_nested+0x27/0x40
> [  904.666194]  ? find_held_lock+0x3a/0x100
> [  904.666770]  mutex_lock_nested+0x27/0x40
> [  904.667337]  seq_read_iter+0x62/0x6b0
> [  904.667869]  kernfs_fop_read_iter+0x1ef/0x2c0
> [  904.668536]  __kernel_read+0x113/0x350
> [  904.669079]  integrity_kernel_read+0x23/0x40
> [  904.669698]  ima_calc_file_hash_tfm+0x14e/0x230
> [  904.670349]  ? __lock_acquire+0xc32/0x2320
> [  904.670937]  ? srso_return_thunk+0x5/0x5f
> [  904.671525]  ? __lock_acquire+0xfbb/0x2320
> [  904.672113]  ? srso_return_thunk+0x5/0x5f
> [  904.672693]  ? srso_return_thunk+0x5/0x5f
> [  904.673280]  ? lock_acquire+0xf2/0x420
> [  904.673818]  ? kernfs_iop_getattr+0x4a/0xb0
> [  904.674424]  ? srso_return_thunk+0x5/0x5f
> [  904.674997]  ? find_held_lock+0x3a/0x100
> [  904.675564]  ? srso_return_thunk+0x5/0x5f
> [  904.676156]  ? srso_return_thunk+0x5/0x5f
> [  904.676740]  ? srso_return_thunk+0x5/0x5f
> [  904.677322]  ? local_clock_noinstr+0x9/0xb0
> [  904.677923]  ? srso_return_thunk+0x5/0x5f
> [  904.678502]  ? srso_return_thunk+0x5/0x5f
> [  904.679075]  ? lock_release+0x4e2/0x570
> [  904.679639]  ima_calc_file_hash+0x97/0x250
> [  904.680227]  ima_collect_measurement+0x4be/0x530
> [  904.680901]  ? srso_return_thunk+0x5/0x5f
> [  904.681496]  ? srso_return_thunk+0x5/0x5f
> [  904.682070]  ? __kernfs_iattrs+0x4a/0x140
> [  904.682658]  ? srso_return_thunk+0x5/0x5f
> [  904.683242]  ? process_measurement+0x7c0/0xef0
> [  904.683876]  ? srso_return_thunk+0x5/0x5f
> [  904.684462]  process_measurement+0x7c0/0xef0
> [  904.685078]  ? srso_return_thunk+0x5/0x5f
> [  904.685654]  ? srso_return_thunk+0x5/0x5f
> [  904.686228]  ? _raw_spin_unlock_irqrestore+0x5d/0xd0
> [  904.686938]  ? srso_return_thunk+0x5/0x5f
> [  904.687523]  ? srso_return_thunk+0x5/0x5f
> [  904.688098]  ? srso_return_thunk+0x5/0x5f
> [  904.688672]  ? local_clock_noinstr+0x9/0xb0
> [  904.689273]  ? srso_return_thunk+0x5/0x5f
> [  904.689846]  ? srso_return_thunk+0x5/0x5f
> [  904.690430]  ? srso_return_thunk+0x5/0x5f
> [  904.691005]  ? srso_return_thunk+0x5/0x5f
> [  904.691583]  ? srso_return_thunk+0x5/0x5f
> [  904.692180]  ? local_clock_noinstr+0x9/0xb0
> [  904.692841]  ? srso_return_thunk+0x5/0x5f
> [  904.693419]  ? srso_return_thunk+0x5/0x5f
> [  904.693990]  ? lock_release+0x4e2/0x570
> [  904.694544]  ? srso_return_thunk+0x5/0x5f
> [  904.695115]  ? kernfs_put_active+0x5d/0xc0
> [  904.695708]  ? srso_return_thunk+0x5/0x5f
> [  904.696286]  ? kernfs_fop_open+0x376/0x6b0
> [  904.696872]  ima_file_check+0x65/0x80
> [  904.697409]  security_file_post_open+0xb1/0x1b0
> [  904.698058]  path_openat+0x216/0x1280
> [  904.698589]  do_filp_open+0xab/0x140
> [  904.699106]  ? srso_return_thunk+0x5/0x5f
> [  904.699693]  ? lock_release+0x554/0x570
> [  904.700264]  ? srso_return_thunk+0x5/0x5f
> [  904.700836]  ? do_raw_spin_unlock+0x76/0x140
> [  904.701450]  ? srso_return_thunk+0x5/0x5f
> [  904.702021]  ? _raw_spin_unlock+0x3f/0xa0
> [  904.702606]  ? srso_return_thunk+0x5/0x5f
> [  904.703178]  ? alloc_fd+0x1ca/0x3b0
> [  904.703685]  do_sys_openat2+0xba/0x120
> [  904.704223]  ? file_free+0x8d/0x110
> [  904.704729]  do_sys_open+0x4c/0x80
> [  904.705221]  __x64_sys_openat+0x23/0x30
> [  904.705784]  x64_sys_call+0x2575/0x4550
> [  904.706337]  do_syscall_64+0x71/0x180
> [  904.706863]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  904.707587] RIP: 0033:0x7f3462123037
> [  904.708120] Code: 55 9c 48 89 75 a0 89 7d a8 44 89 55 ac e8 a1 7a f8 ff 44 8b 55 ac 8b 55 9c 41 89 c0 48 8b 75 a0 8b 7d a8 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 45 ac e8 f6 7a f8 ff 8b 45 ac
> [  904.710744] RSP: 002b:00007ffdd1a79370 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
> [  904.711821] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007f3462123037
> [  904.712829] RDX: 0000000000080100 RSI: 00007ffdd1a79420 RDI: 00000000ffffff9c
> [  904.713839] RBP: 00007ffdd1a793e0 R08: 0000000000000000 R09: 0000000000000000
> [  904.714848] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffdd1a794c8
> [  904.715855] R13: 00007ffdd1a794b8 R14: 00007ffdd1a79690 R15: 00007ffdd1a79690
> [  904.716877]  </TASK>
> 
> Roberto
Roberto Sassu Oct. 16, 2024, 9:59 a.m. UTC | #9
On Fri, 2024-10-11 at 15:30 +0200, Roberto Sassu wrote:
> On Wed, 2024-10-09 at 18:43 +0200, Roberto Sassu wrote:
> > On Wed, 2024-10-09 at 18:25 +0200, Roberto Sassu wrote:
> > > On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote:
> > > > On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@paul-moore.com> wrote:
> > > > > On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
> > > > > <roberto.sassu@huaweicloud.com> wrote:
> > > > > > 
> > > > > > From: Roberto Sassu <roberto.sassu@huawei.com>
> > > > > > 
> > > > > > Move out the mutex in the ima_iint_cache structure to a new structure
> > > > > > called ima_iint_cache_lock, so that a lock can be taken regardless of
> > > > > > whether or not inode integrity metadata are stored in the inode.
> > > > > > 
> > > > > > Introduce ima_inode_security() to simplify accessing the new structure in
> > > > > > the inode security blob.
> > > > > > 
> > > > > > Move the mutex initialization and annotation in the new function
> > > > > > ima_inode_alloc_security() and introduce ima_iint_lock() and
> > > > > > ima_iint_unlock() to respectively lock and unlock the mutex.
> > > > > > 
> > > > > > Finally, expand the critical region in process_measurement() guarded by
> > > > > > iint->mutex up to where the inode was locked, use only one iint lock in
> > > > > > __ima_inode_hash(), since the mutex is now in the inode security blob, and
> > > > > > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().
> > > > > > 
> > > > > > Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> > > > > > ---
> > > > > >  security/integrity/ima/ima.h      | 26 ++++++++---
> > > > > >  security/integrity/ima/ima_api.c  |  4 +-
> > > > > >  security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
> > > > > >  security/integrity/ima/ima_main.c | 39 +++++++---------
> > > > > >  4 files changed, 104 insertions(+), 42 deletions(-)
> > > > > 
> > > > > I'm not an IMA expert, but it looks reasonable to me, although
> > > > > shouldn't this carry a stable CC in the patch metadata?
> > > > > 
> > > > > Reviewed-by: Paul Moore <paul@paul-moore.com>
> > > > 
> > > > Sorry, one more thing ... did you verify this patchset resolves the
> > > > syzbot problem?  I saw at least one reproducer.
> > > 
> > > Uhm, could not reproduce the deadlock with the reproducer. However,
> > > without the patch I have a lockdep warning, and with I don't.
> > > 
> > > I asked syzbot to try the patches. Let's see.
> > 
> > I actually got a different lockdep warning:
> > 
> > [  904.603365] ======================================================
> > [  904.604264] WARNING: possible circular locking dependency detected
> > [  904.605141] 6.12.0-rc2+ #20 Not tainted
> > [  904.605697] ------------------------------------------------------
> 
> I can reproduce by executing the syzbot reproducer and in another
> terminal by logging in with SSH (not all the times).
> 
> If I understood what the lockdep warning means, this is the scenario.
> 
> A task accesses a seq_file which is in the IMA policy, so we enter the
> critical region guarded by the iint lock. But before we get the chance
> to measure the file, a second task calls remap_file_pages() on the same
> seq_file.
> 
> remap_file_pages() takes the mmap lock and, if the event matches the
> IMA policy too, the second task waits for the iint lock to be released.
> 
> Now, the first task starts to measure the seq_file and takes the
> seq_file lock. I don't know if 3 processes must be involved, but I was
> thinking that reading the seq_file from the first task can trigger a
> page fault, which requires to take the mmap lock.
> 
> At this point, we reach a deadlock. The first task waits for the mmap
> lock to be released, and the second waits for the iint lock to be
> released, which both cannot happen.

+ mm, mm folks

(I leave the lockdep warning down for the new people included in the
thread).

I think I understood what the problem is. Any lock that is taken inside
security_file_mmap() would cause lock inversion since:

vm_mmap_pgoff():

	ret = security_mmap_file(file, prot, flag);
	if (!ret) {
		if (mmap_write_lock_killable(mm))



SYSCALL_DEFINE5(remap_file_pages, ...):

	if (mmap_write_lock_killable(mm))
		return -EINTR;

[...]

	file = get_file(vma->vm_file);
	ret = security_mmap_file(vma->vm_file, prot, flags);
	if (ret)
		goto out_fput;


Since I didn't see a good way to change the order of the second, I
changed the order of the first, i.e. I put security_file_mmap() under
the mmap lock.

(I don't know if this is a good idea.)

I cannot reproduce the lockdep warning anymore.

@mm folks, what is your opinion?

Thanks

Roberto

> Roberto
> 
> > [  904.606577] systemd/1 is trying to acquire lock:
> > [  904.607227] ffff88810e5c2580 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x62/0x6b0
> > [  904.608290] 
> > [  904.608290] but task is already holding lock:
> > [  904.609105] ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
> > [  904.610429] 
> > [  904.610429] which lock already depends on the new lock.
> > [  904.610429] 
> > [  904.611574] 
> > [  904.611574] the existing dependency chain (in reverse order) is:
> > [  904.612628] 
> > [  904.612628] -> #2 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}:
> > [  904.613681]        __mutex_lock+0xaf/0x760
> > [  904.614266]        mutex_lock_nested+0x27/0x40
> > [  904.614897]        ima_iint_lock+0x24/0x40
> > [  904.615490]        process_measurement+0x176/0xef0
> > [  904.616168]        ima_file_mmap+0x98/0x120
> > [  904.616767]        security_mmap_file+0x408/0x560
> > [  904.617444]        __do_sys_remap_file_pages+0x2fa/0x4c0
> > [  904.618194]        __x64_sys_remap_file_pages+0x29/0x40
> > [  904.618937]        x64_sys_call+0x6e8/0x4550
> > [  904.619546]        do_syscall_64+0x71/0x180
> > [  904.620155]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [  904.620952] 
> > [  904.620952] -> #1 (&mm->mmap_lock){++++}-{4:4}:
> > [  904.621813]        __might_fault+0x6f/0xb0
> > [  904.622400]        _copy_to_iter+0x12e/0xa80
> > [  904.623009]        seq_read_iter+0x593/0x6b0
> > [  904.623629]        proc_reg_read_iter+0x31/0xe0
> > [  904.624276]        vfs_read+0x256/0x3d0
> > [  904.624822]        ksys_read+0x6d/0x160
> > [  904.625372]        __x64_sys_read+0x1d/0x30
> > [  904.625964]        x64_sys_call+0x1068/0x4550
> > [  904.626594]        do_syscall_64+0x71/0x180
> > [  904.627188]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [  904.627975] 
> > [  904.627975] -> #0 (&p->lock){+.+.}-{4:4}:
> > [  904.628787]        __lock_acquire+0x17f3/0x2320
> > [  904.629432]        lock_acquire+0xf2/0x420
> > [  904.630013]        __mutex_lock+0xaf/0x760
> > [  904.630596]        mutex_lock_nested+0x27/0x40
> > [  904.631225]        seq_read_iter+0x62/0x6b0
> > [  904.631831]        kernfs_fop_read_iter+0x1ef/0x2c0
> > [  904.632599]        __kernel_read+0x113/0x350
> > [  904.633206]        integrity_kernel_read+0x23/0x40
> > [  904.633902]        ima_calc_file_hash_tfm+0x14e/0x230
> > [  904.634621]        ima_calc_file_hash+0x97/0x250
> > [  904.635281]        ima_collect_measurement+0x4be/0x530
> > [  904.636008]        process_measurement+0x7c0/0xef0
> > [  904.636689]        ima_file_check+0x65/0x80
> > [  904.637295]        security_file_post_open+0xb1/0x1b0
> > [  904.638008]        path_openat+0x216/0x1280
> > [  904.638605]        do_filp_open+0xab/0x140
> > [  904.639185]        do_sys_openat2+0xba/0x120
> > [  904.639805]        do_sys_open+0x4c/0x80
> > [  904.640366]        __x64_sys_openat+0x23/0x30
> > [  904.640992]        x64_sys_call+0x2575/0x4550
> > [  904.641616]        do_syscall_64+0x71/0x180
> > [  904.642207]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [  904.643003] 
> > [  904.643003] other info that might help us debug this:
> > [  904.643003] 
> > [  904.644149] Chain exists of:
> > [  904.644149]   &p->lock --> &mm->mmap_lock --> &ima_iint_lock_mutex_key[depth]
> > [  904.644149] 
> > [  904.645763]  Possible unsafe locking scenario:
> > [  904.645763] 
> > [  904.646614]        CPU0                    CPU1
> > [  904.647264]        ----                    ----
> > [  904.647909]   lock(&ima_iint_lock_mutex_key[depth]);
> > [  904.648617]                                lock(&mm->mmap_lock);
> > [  904.649479]                                lock(&ima_iint_lock_mutex_key[depth]);
> > [  904.650543]   lock(&p->lock);
> > [  904.650974] 
> > [  904.650974]  *** DEADLOCK ***
> > [  904.650974] 
> > [  904.651826] 1 lock held by systemd/1:
> > [  904.652376]  #0: ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
> > [  904.653759] 
> > [  904.653759] stack backtrace:
> > [  904.654391] CPU: 2 UID: 0 PID: 1 Comm: systemd Not tainted 6.12.0-rc2+ #20
> > [  904.655360] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> > [  904.656497] Call Trace:
> > [  904.656856]  <TASK>
> > [  904.657166]  dump_stack_lvl+0x134/0x1a0
> > [  904.657728]  dump_stack+0x14/0x30
> > [  904.658206]  print_circular_bug+0x38d/0x450
> > [  904.658812]  check_noncircular+0xed/0x120
> > [  904.659396]  ? srso_return_thunk+0x5/0x5f
> > [  904.659972]  ? srso_return_thunk+0x5/0x5f
> > [  904.660569]  __lock_acquire+0x17f3/0x2320
> > [  904.661145]  lock_acquire+0xf2/0x420
> > [  904.661664]  ? seq_read_iter+0x62/0x6b0
> > [  904.662217]  ? srso_return_thunk+0x5/0x5f
> > [  904.662886]  __mutex_lock+0xaf/0x760
> > [  904.663408]  ? seq_read_iter+0x62/0x6b0
> > [  904.663961]  ? seq_read_iter+0x62/0x6b0
> > [  904.664525]  ? srso_return_thunk+0x5/0x5f
> > [  904.665098]  ? mark_lock+0x4e/0x750
> > [  904.665610]  ? mutex_lock_nested+0x27/0x40
> > [  904.666194]  ? find_held_lock+0x3a/0x100
> > [  904.666770]  mutex_lock_nested+0x27/0x40
> > [  904.667337]  seq_read_iter+0x62/0x6b0
> > [  904.667869]  kernfs_fop_read_iter+0x1ef/0x2c0
> > [  904.668536]  __kernel_read+0x113/0x350
> > [  904.669079]  integrity_kernel_read+0x23/0x40
> > [  904.669698]  ima_calc_file_hash_tfm+0x14e/0x230
> > [  904.670349]  ? __lock_acquire+0xc32/0x2320
> > [  904.670937]  ? srso_return_thunk+0x5/0x5f
> > [  904.671525]  ? __lock_acquire+0xfbb/0x2320
> > [  904.672113]  ? srso_return_thunk+0x5/0x5f
> > [  904.672693]  ? srso_return_thunk+0x5/0x5f
> > [  904.673280]  ? lock_acquire+0xf2/0x420
> > [  904.673818]  ? kernfs_iop_getattr+0x4a/0xb0
> > [  904.674424]  ? srso_return_thunk+0x5/0x5f
> > [  904.674997]  ? find_held_lock+0x3a/0x100
> > [  904.675564]  ? srso_return_thunk+0x5/0x5f
> > [  904.676156]  ? srso_return_thunk+0x5/0x5f
> > [  904.676740]  ? srso_return_thunk+0x5/0x5f
> > [  904.677322]  ? local_clock_noinstr+0x9/0xb0
> > [  904.677923]  ? srso_return_thunk+0x5/0x5f
> > [  904.678502]  ? srso_return_thunk+0x5/0x5f
> > [  904.679075]  ? lock_release+0x4e2/0x570
> > [  904.679639]  ima_calc_file_hash+0x97/0x250
> > [  904.680227]  ima_collect_measurement+0x4be/0x530
> > [  904.680901]  ? srso_return_thunk+0x5/0x5f
> > [  904.681496]  ? srso_return_thunk+0x5/0x5f
> > [  904.682070]  ? __kernfs_iattrs+0x4a/0x140
> > [  904.682658]  ? srso_return_thunk+0x5/0x5f
> > [  904.683242]  ? process_measurement+0x7c0/0xef0
> > [  904.683876]  ? srso_return_thunk+0x5/0x5f
> > [  904.684462]  process_measurement+0x7c0/0xef0
> > [  904.685078]  ? srso_return_thunk+0x5/0x5f
> > [  904.685654]  ? srso_return_thunk+0x5/0x5f
> > [  904.686228]  ? _raw_spin_unlock_irqrestore+0x5d/0xd0
> > [  904.686938]  ? srso_return_thunk+0x5/0x5f
> > [  904.687523]  ? srso_return_thunk+0x5/0x5f
> > [  904.688098]  ? srso_return_thunk+0x5/0x5f
> > [  904.688672]  ? local_clock_noinstr+0x9/0xb0
> > [  904.689273]  ? srso_return_thunk+0x5/0x5f
> > [  904.689846]  ? srso_return_thunk+0x5/0x5f
> > [  904.690430]  ? srso_return_thunk+0x5/0x5f
> > [  904.691005]  ? srso_return_thunk+0x5/0x5f
> > [  904.691583]  ? srso_return_thunk+0x5/0x5f
> > [  904.692180]  ? local_clock_noinstr+0x9/0xb0
> > [  904.692841]  ? srso_return_thunk+0x5/0x5f
> > [  904.693419]  ? srso_return_thunk+0x5/0x5f
> > [  904.693990]  ? lock_release+0x4e2/0x570
> > [  904.694544]  ? srso_return_thunk+0x5/0x5f
> > [  904.695115]  ? kernfs_put_active+0x5d/0xc0
> > [  904.695708]  ? srso_return_thunk+0x5/0x5f
> > [  904.696286]  ? kernfs_fop_open+0x376/0x6b0
> > [  904.696872]  ima_file_check+0x65/0x80
> > [  904.697409]  security_file_post_open+0xb1/0x1b0
> > [  904.698058]  path_openat+0x216/0x1280
> > [  904.698589]  do_filp_open+0xab/0x140
> > [  904.699106]  ? srso_return_thunk+0x5/0x5f
> > [  904.699693]  ? lock_release+0x554/0x570
> > [  904.700264]  ? srso_return_thunk+0x5/0x5f
> > [  904.700836]  ? do_raw_spin_unlock+0x76/0x140
> > [  904.701450]  ? srso_return_thunk+0x5/0x5f
> > [  904.702021]  ? _raw_spin_unlock+0x3f/0xa0
> > [  904.702606]  ? srso_return_thunk+0x5/0x5f
> > [  904.703178]  ? alloc_fd+0x1ca/0x3b0
> > [  904.703685]  do_sys_openat2+0xba/0x120
> > [  904.704223]  ? file_free+0x8d/0x110
> > [  904.704729]  do_sys_open+0x4c/0x80
> > [  904.705221]  __x64_sys_openat+0x23/0x30
> > [  904.705784]  x64_sys_call+0x2575/0x4550
> > [  904.706337]  do_syscall_64+0x71/0x180
> > [  904.706863]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [  904.707587] RIP: 0033:0x7f3462123037
> > [  904.708120] Code: 55 9c 48 89 75 a0 89 7d a8 44 89 55 ac e8 a1 7a f8 ff 44 8b 55 ac 8b 55 9c 41 89 c0 48 8b 75 a0 8b 7d a8 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 45 ac e8 f6 7a f8 ff 8b 45 ac
> > [  904.710744] RSP: 002b:00007ffdd1a79370 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
> > [  904.711821] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007f3462123037
> > [  904.712829] RDX: 0000000000080100 RSI: 00007ffdd1a79420 RDI: 00000000ffffff9c
> > [  904.713839] RBP: 00007ffdd1a793e0 R08: 0000000000000000 R09: 0000000000000000
> > [  904.714848] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffdd1a794c8
> > [  904.715855] R13: 00007ffdd1a794b8 R14: 00007ffdd1a79690 R15: 00007ffdd1a79690
> > [  904.716877]  </TASK>
> > 
> > Roberto
>
Roberto Sassu Oct. 18, 2024, 9:24 a.m. UTC | #10
On Wed, 2024-10-16 at 11:59 +0200, Roberto Sassu wrote:
> On Fri, 2024-10-11 at 15:30 +0200, Roberto Sassu wrote:
> > On Wed, 2024-10-09 at 18:43 +0200, Roberto Sassu wrote:
> > > On Wed, 2024-10-09 at 18:25 +0200, Roberto Sassu wrote:
> > > > On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote:
> > > > > On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@paul-moore.com> wrote:
> > > > > > On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu
> > > > > > <roberto.sassu@huaweicloud.com> wrote:
> > > > > > > 
> > > > > > > From: Roberto Sassu <roberto.sassu@huawei.com>
> > > > > > > 
> > > > > > > Move out the mutex in the ima_iint_cache structure to a new structure
> > > > > > > called ima_iint_cache_lock, so that a lock can be taken regardless of
> > > > > > > whether or not inode integrity metadata are stored in the inode.
> > > > > > > 
> > > > > > > Introduce ima_inode_security() to simplify accessing the new structure in
> > > > > > > the inode security blob.
> > > > > > > 
> > > > > > > Move the mutex initialization and annotation in the new function
> > > > > > > ima_inode_alloc_security() and introduce ima_iint_lock() and
> > > > > > > ima_iint_unlock() to respectively lock and unlock the mutex.
> > > > > > > 
> > > > > > > Finally, expand the critical region in process_measurement() guarded by
> > > > > > > iint->mutex up to where the inode was locked, use only one iint lock in
> > > > > > > __ima_inode_hash(), since the mutex is now in the inode security blob, and
> > > > > > > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer().
> > > > > > > 
> > > > > > > Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> > > > > > > ---
> > > > > > >  security/integrity/ima/ima.h      | 26 ++++++++---
> > > > > > >  security/integrity/ima/ima_api.c  |  4 +-
> > > > > > >  security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++-----
> > > > > > >  security/integrity/ima/ima_main.c | 39 +++++++---------
> > > > > > >  4 files changed, 104 insertions(+), 42 deletions(-)
> > > > > > 
> > > > > > I'm not an IMA expert, but it looks reasonable to me, although
> > > > > > shouldn't this carry a stable CC in the patch metadata?
> > > > > > 
> > > > > > Reviewed-by: Paul Moore <paul@paul-moore.com>
> > > > > 
> > > > > Sorry, one more thing ... did you verify this patchset resolves the
> > > > > syzbot problem?  I saw at least one reproducer.
> > > > 
> > > > Uhm, could not reproduce the deadlock with the reproducer. However,
> > > > without the patch I have a lockdep warning, and with I don't.
> > > > 
> > > > I asked syzbot to try the patches. Let's see.
> > > 
> > > I actually got a different lockdep warning:
> > > 
> > > [  904.603365] ======================================================
> > > [  904.604264] WARNING: possible circular locking dependency detected
> > > [  904.605141] 6.12.0-rc2+ #20 Not tainted
> > > [  904.605697] ------------------------------------------------------
> > 
> > I can reproduce by executing the syzbot reproducer and in another
> > terminal by logging in with SSH (not all the times).
> > 
> > If I understood what the lockdep warning means, this is the scenario.
> > 
> > A task accesses a seq_file which is in the IMA policy, so we enter the
> > critical region guarded by the iint lock. But before we get the chance
> > to measure the file, a second task calls remap_file_pages() on the same
> > seq_file.
> > 
> > remap_file_pages() takes the mmap lock and, if the event matches the
> > IMA policy too, the second task waits for the iint lock to be released.
> > 
> > Now, the first task starts to measure the seq_file and takes the
> > seq_file lock. I don't know if 3 processes must be involved, but I was
> > thinking that reading the seq_file from the first task can trigger a
> > page fault, which requires to take the mmap lock.
> > 
> > At this point, we reach a deadlock. The first task waits for the mmap
> > lock to be released, and the second waits for the iint lock to be
> > released, which both cannot happen.
> 
> + mm, mm folks
> 
> (I leave the lockdep warning down for the new people included in the
> thread).
> 
> I think I understood what the problem is. Any lock that is taken inside
> security_file_mmap() would cause lock inversion since:

+ Kirill
 
Ok, to be clear, we are talking about a regression introduced by commit
ea7e2d5e49c05 ("mm: call the security_mmap_file() LSM hook in
remap_file_pages()").

Originally, Mimi asked this patch to be included:

https://lore.kernel.org/lkml/1336963631-3541-1-git-send-email-zohar@us.ibm.com/

This was due to the commit 196f518 ("IMA: explicit IMA i_flag to remove
global lock on inode_delete"), which added an inode lock to protect the
new flag S_IMA, used to track integrity metadata only for the set of
inodes evaluated by IMA.

The problem was that, for mmapped files, the lock is taken in the
opposite order than the convention (i_mutex and then mmap_sem), causing
a lock inversion and a lockdep warning.

Linus proposed to split security_file_mmap() into security_mmap_file()
and security_mmap_addr(), so that the former can be taken out of the
mmap_sem lock and remove the lock inversion. The final commit was made
by Al Viro, 8b3ec6814c83 ("take security_mmap_file() outside of -
>mmap_sem").

Commit ea7e2d5e49c05 put again a security_mmap_file() call inside the
mmap_sem (now mmap_lock) lock, causing the recent syzbot reports.

Paul asked if the inode lock can be removed from IMA, and actually
partially can be done:

https://lore.kernel.org/linux-integrity/20241008165732.2603647-1-roberto.sassu@huaweicloud.com/

The patch is good in its own, since it merges two critical sections in
one. However, the inode lock cannot be removed completely:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/security/integrity/ima/ima_main.c?h=v6.12-rc3#n386

This one is required to set the security.ima xattr in
ima_appraise_measurement() -> ima_fix_xattr(), when IMA-Appraise is in
fix mode (ima_appraise=fix in the kernel command line).

I would say that my original suggestion of putting security_mmap_file()
back to the mmap_lock lock probably is not the optimal solution. Maybe
we could get around removing the remaining inode lock in IMA, but we
would also limit future LSMs to not use the inode lock if they need.

Probably it is hard, @Kirill would there be any way to safely move
security_mmap_file() out of the mmap_lock lock?

Thanks

Roberto


PS: I'm aware that Shu Han proposed a solution to the lock inversion:

https://lore.kernel.org/linux-security-module/20240928180044.50-1-ebpqwerty472123@gmail.com/

but I think anyway it boils down to either moving security_mmap_file()
to the mmap_lock lock again for all calls, or viceversa, moving
security_mmap_file() out of the remap_file_pages() system call.

> vm_mmap_pgoff():
> 
> 	ret = security_mmap_file(file, prot, flag);
> 	if (!ret) {
> 		if (mmap_write_lock_killable(mm))
> 
> 
> 
> SYSCALL_DEFINE5(remap_file_pages, ...):
> 
> 	if (mmap_write_lock_killable(mm))
> 		return -EINTR;
> 
> [...]
> 
> 	file = get_file(vma->vm_file);
> 	ret = security_mmap_file(vma->vm_file, prot, flags);
> 	if (ret)
> 		goto out_fput;
> 
> 
> Since I didn't see a good way to change the order of the second, I
> changed the order of the first, i.e. I put security_file_mmap() under
> the mmap lock.
> 
> (I don't know if this is a good idea.)
> 
> I cannot reproduce the lockdep warning anymore.
> 
> @mm folks, what is your opinion?
> 
> Thanks
> 
> Roberto
> 
> > Roberto
> > 
> > > [  904.606577] systemd/1 is trying to acquire lock:
> > > [  904.607227] ffff88810e5c2580 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x62/0x6b0
> > > [  904.608290] 
> > > [  904.608290] but task is already holding lock:
> > > [  904.609105] ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
> > > [  904.610429] 
> > > [  904.610429] which lock already depends on the new lock.
> > > [  904.610429] 
> > > [  904.611574] 
> > > [  904.611574] the existing dependency chain (in reverse order) is:
> > > [  904.612628] 
> > > [  904.612628] -> #2 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}:
> > > [  904.613681]        __mutex_lock+0xaf/0x760
> > > [  904.614266]        mutex_lock_nested+0x27/0x40
> > > [  904.614897]        ima_iint_lock+0x24/0x40
> > > [  904.615490]        process_measurement+0x176/0xef0
> > > [  904.616168]        ima_file_mmap+0x98/0x120
> > > [  904.616767]        security_mmap_file+0x408/0x560
> > > [  904.617444]        __do_sys_remap_file_pages+0x2fa/0x4c0
> > > [  904.618194]        __x64_sys_remap_file_pages+0x29/0x40
> > > [  904.618937]        x64_sys_call+0x6e8/0x4550
> > > [  904.619546]        do_syscall_64+0x71/0x180
> > > [  904.620155]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [  904.620952] 
> > > [  904.620952] -> #1 (&mm->mmap_lock){++++}-{4:4}:
> > > [  904.621813]        __might_fault+0x6f/0xb0
> > > [  904.622400]        _copy_to_iter+0x12e/0xa80
> > > [  904.623009]        seq_read_iter+0x593/0x6b0
> > > [  904.623629]        proc_reg_read_iter+0x31/0xe0
> > > [  904.624276]        vfs_read+0x256/0x3d0
> > > [  904.624822]        ksys_read+0x6d/0x160
> > > [  904.625372]        __x64_sys_read+0x1d/0x30
> > > [  904.625964]        x64_sys_call+0x1068/0x4550
> > > [  904.626594]        do_syscall_64+0x71/0x180
> > > [  904.627188]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [  904.627975] 
> > > [  904.627975] -> #0 (&p->lock){+.+.}-{4:4}:
> > > [  904.628787]        __lock_acquire+0x17f3/0x2320
> > > [  904.629432]        lock_acquire+0xf2/0x420
> > > [  904.630013]        __mutex_lock+0xaf/0x760
> > > [  904.630596]        mutex_lock_nested+0x27/0x40
> > > [  904.631225]        seq_read_iter+0x62/0x6b0
> > > [  904.631831]        kernfs_fop_read_iter+0x1ef/0x2c0
> > > [  904.632599]        __kernel_read+0x113/0x350
> > > [  904.633206]        integrity_kernel_read+0x23/0x40
> > > [  904.633902]        ima_calc_file_hash_tfm+0x14e/0x230
> > > [  904.634621]        ima_calc_file_hash+0x97/0x250
> > > [  904.635281]        ima_collect_measurement+0x4be/0x530
> > > [  904.636008]        process_measurement+0x7c0/0xef0
> > > [  904.636689]        ima_file_check+0x65/0x80
> > > [  904.637295]        security_file_post_open+0xb1/0x1b0
> > > [  904.638008]        path_openat+0x216/0x1280
> > > [  904.638605]        do_filp_open+0xab/0x140
> > > [  904.639185]        do_sys_openat2+0xba/0x120
> > > [  904.639805]        do_sys_open+0x4c/0x80
> > > [  904.640366]        __x64_sys_openat+0x23/0x30
> > > [  904.640992]        x64_sys_call+0x2575/0x4550
> > > [  904.641616]        do_syscall_64+0x71/0x180
> > > [  904.642207]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [  904.643003] 
> > > [  904.643003] other info that might help us debug this:
> > > [  904.643003] 
> > > [  904.644149] Chain exists of:
> > > [  904.644149]   &p->lock --> &mm->mmap_lock --> &ima_iint_lock_mutex_key[depth]
> > > [  904.644149] 
> > > [  904.645763]  Possible unsafe locking scenario:
> > > [  904.645763] 
> > > [  904.646614]        CPU0                    CPU1
> > > [  904.647264]        ----                    ----
> > > [  904.647909]   lock(&ima_iint_lock_mutex_key[depth]);
> > > [  904.648617]                                lock(&mm->mmap_lock);
> > > [  904.649479]                                lock(&ima_iint_lock_mutex_key[depth]);
> > > [  904.650543]   lock(&p->lock);
> > > [  904.650974] 
> > > [  904.650974]  *** DEADLOCK ***
> > > [  904.650974] 
> > > [  904.651826] 1 lock held by systemd/1:
> > > [  904.652376]  #0: ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40
> > > [  904.653759] 
> > > [  904.653759] stack backtrace:
> > > [  904.654391] CPU: 2 UID: 0 PID: 1 Comm: systemd Not tainted 6.12.0-rc2+ #20
> > > [  904.655360] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> > > [  904.656497] Call Trace:
> > > [  904.656856]  <TASK>
> > > [  904.657166]  dump_stack_lvl+0x134/0x1a0
> > > [  904.657728]  dump_stack+0x14/0x30
> > > [  904.658206]  print_circular_bug+0x38d/0x450
> > > [  904.658812]  check_noncircular+0xed/0x120
> > > [  904.659396]  ? srso_return_thunk+0x5/0x5f
> > > [  904.659972]  ? srso_return_thunk+0x5/0x5f
> > > [  904.660569]  __lock_acquire+0x17f3/0x2320
> > > [  904.661145]  lock_acquire+0xf2/0x420
> > > [  904.661664]  ? seq_read_iter+0x62/0x6b0
> > > [  904.662217]  ? srso_return_thunk+0x5/0x5f
> > > [  904.662886]  __mutex_lock+0xaf/0x760
> > > [  904.663408]  ? seq_read_iter+0x62/0x6b0
> > > [  904.663961]  ? seq_read_iter+0x62/0x6b0
> > > [  904.664525]  ? srso_return_thunk+0x5/0x5f
> > > [  904.665098]  ? mark_lock+0x4e/0x750
> > > [  904.665610]  ? mutex_lock_nested+0x27/0x40
> > > [  904.666194]  ? find_held_lock+0x3a/0x100
> > > [  904.666770]  mutex_lock_nested+0x27/0x40
> > > [  904.667337]  seq_read_iter+0x62/0x6b0
> > > [  904.667869]  kernfs_fop_read_iter+0x1ef/0x2c0
> > > [  904.668536]  __kernel_read+0x113/0x350
> > > [  904.669079]  integrity_kernel_read+0x23/0x40
> > > [  904.669698]  ima_calc_file_hash_tfm+0x14e/0x230
> > > [  904.670349]  ? __lock_acquire+0xc32/0x2320
> > > [  904.670937]  ? srso_return_thunk+0x5/0x5f
> > > [  904.671525]  ? __lock_acquire+0xfbb/0x2320
> > > [  904.672113]  ? srso_return_thunk+0x5/0x5f
> > > [  904.672693]  ? srso_return_thunk+0x5/0x5f
> > > [  904.673280]  ? lock_acquire+0xf2/0x420
> > > [  904.673818]  ? kernfs_iop_getattr+0x4a/0xb0
> > > [  904.674424]  ? srso_return_thunk+0x5/0x5f
> > > [  904.674997]  ? find_held_lock+0x3a/0x100
> > > [  904.675564]  ? srso_return_thunk+0x5/0x5f
> > > [  904.676156]  ? srso_return_thunk+0x5/0x5f
> > > [  904.676740]  ? srso_return_thunk+0x5/0x5f
> > > [  904.677322]  ? local_clock_noinstr+0x9/0xb0
> > > [  904.677923]  ? srso_return_thunk+0x5/0x5f
> > > [  904.678502]  ? srso_return_thunk+0x5/0x5f
> > > [  904.679075]  ? lock_release+0x4e2/0x570
> > > [  904.679639]  ima_calc_file_hash+0x97/0x250
> > > [  904.680227]  ima_collect_measurement+0x4be/0x530
> > > [  904.680901]  ? srso_return_thunk+0x5/0x5f
> > > [  904.681496]  ? srso_return_thunk+0x5/0x5f
> > > [  904.682070]  ? __kernfs_iattrs+0x4a/0x140
> > > [  904.682658]  ? srso_return_thunk+0x5/0x5f
> > > [  904.683242]  ? process_measurement+0x7c0/0xef0
> > > [  904.683876]  ? srso_return_thunk+0x5/0x5f
> > > [  904.684462]  process_measurement+0x7c0/0xef0
> > > [  904.685078]  ? srso_return_thunk+0x5/0x5f
> > > [  904.685654]  ? srso_return_thunk+0x5/0x5f
> > > [  904.686228]  ? _raw_spin_unlock_irqrestore+0x5d/0xd0
> > > [  904.686938]  ? srso_return_thunk+0x5/0x5f
> > > [  904.687523]  ? srso_return_thunk+0x5/0x5f
> > > [  904.688098]  ? srso_return_thunk+0x5/0x5f
> > > [  904.688672]  ? local_clock_noinstr+0x9/0xb0
> > > [  904.689273]  ? srso_return_thunk+0x5/0x5f
> > > [  904.689846]  ? srso_return_thunk+0x5/0x5f
> > > [  904.690430]  ? srso_return_thunk+0x5/0x5f
> > > [  904.691005]  ? srso_return_thunk+0x5/0x5f
> > > [  904.691583]  ? srso_return_thunk+0x5/0x5f
> > > [  904.692180]  ? local_clock_noinstr+0x9/0xb0
> > > [  904.692841]  ? srso_return_thunk+0x5/0x5f
> > > [  904.693419]  ? srso_return_thunk+0x5/0x5f
> > > [  904.693990]  ? lock_release+0x4e2/0x570
> > > [  904.694544]  ? srso_return_thunk+0x5/0x5f
> > > [  904.695115]  ? kernfs_put_active+0x5d/0xc0
> > > [  904.695708]  ? srso_return_thunk+0x5/0x5f
> > > [  904.696286]  ? kernfs_fop_open+0x376/0x6b0
> > > [  904.696872]  ima_file_check+0x65/0x80
> > > [  904.697409]  security_file_post_open+0xb1/0x1b0
> > > [  904.698058]  path_openat+0x216/0x1280
> > > [  904.698589]  do_filp_open+0xab/0x140
> > > [  904.699106]  ? srso_return_thunk+0x5/0x5f
> > > [  904.699693]  ? lock_release+0x554/0x570
> > > [  904.700264]  ? srso_return_thunk+0x5/0x5f
> > > [  904.700836]  ? do_raw_spin_unlock+0x76/0x140
> > > [  904.701450]  ? srso_return_thunk+0x5/0x5f
> > > [  904.702021]  ? _raw_spin_unlock+0x3f/0xa0
> > > [  904.702606]  ? srso_return_thunk+0x5/0x5f
> > > [  904.703178]  ? alloc_fd+0x1ca/0x3b0
> > > [  904.703685]  do_sys_openat2+0xba/0x120
> > > [  904.704223]  ? file_free+0x8d/0x110
> > > [  904.704729]  do_sys_open+0x4c/0x80
> > > [  904.705221]  __x64_sys_openat+0x23/0x30
> > > [  904.705784]  x64_sys_call+0x2575/0x4550
> > > [  904.706337]  do_syscall_64+0x71/0x180
> > > [  904.706863]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [  904.707587] RIP: 0033:0x7f3462123037
> > > [  904.708120] Code: 55 9c 48 89 75 a0 89 7d a8 44 89 55 ac e8 a1 7a f8 ff 44 8b 55 ac 8b 55 9c 41 89 c0 48 8b 75 a0 8b 7d a8 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 45 ac e8 f6 7a f8 ff 8b 45 ac
> > > [  904.710744] RSP: 002b:00007ffdd1a79370 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
> > > [  904.711821] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007f3462123037
> > > [  904.712829] RDX: 0000000000080100 RSI: 00007ffdd1a79420 RDI: 00000000ffffff9c
> > > [  904.713839] RBP: 00007ffdd1a793e0 R08: 0000000000000000 R09: 0000000000000000
> > > [  904.714848] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffdd1a794c8
> > > [  904.715855] R13: 00007ffdd1a794b8 R14: 00007ffdd1a79690 R15: 00007ffdd1a79690
> > > [  904.716877]  </TASK>
> > > 
> > > Roberto
> > 
>
Kirill A. Shutemov Oct. 18, 2024, 10:49 a.m. UTC | #11
On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote:
> Probably it is hard, @Kirill would there be any way to safely move
> security_mmap_file() out of the mmap_lock lock?

What about something like this (untested):

diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..03473e77d356 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
 		return ret;
 
+	if (mmap_read_lock_killable(mm))
+		return -EINTR;
+
+	vma = vma_lookup(mm, start);
+
+	if (!vma || !(vma->vm_flags & VM_SHARED)) {
+		mmap_read_unlock(mm);
+		return -EINVAL;
+	}
+
+	file = get_file(vma->vm_file);
+
+	mmap_read_unlock(mm);
+
+	ret = security_mmap_file(vma->vm_file, prot, flags);
+	if (ret) {
+		fput(file);
+		return ret;
+	}
+
 	if (mmap_write_lock_killable(mm))
 		return -EINTR;
 
@@ -1654,6 +1674,9 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	if (!vma || !(vma->vm_flags & VM_SHARED))
 		goto out;
 
+	if (vma->vm_file != file)
+		goto out;
+
 	if (start + size > vma->vm_end) {
 		VMA_ITERATOR(vmi, mm, vma->vm_end);
 		struct vm_area_struct *next, *prev = vma;
@@ -1688,16 +1711,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	if (vma->vm_flags & VM_LOCKED)
 		flags |= MAP_LOCKED;
 
-	file = get_file(vma->vm_file);
-	ret = security_mmap_file(vma->vm_file, prot, flags);
-	if (ret)
-		goto out_fput;
 	ret = do_mmap(vma->vm_file, start, size,
 			prot, flags, 0, pgoff, &populate, NULL);
-out_fput:
-	fput(file);
 out:
 	mmap_write_unlock(mm);
+	fput(file);
 	if (populate)
 		mm_populate(ret, populate);
 	if (!IS_ERR_VALUE(ret))
Kirill A. Shutemov Oct. 18, 2024, 10:52 a.m. UTC | #12
On Fri, Oct 18, 2024 at 01:49:21PM +0300, Kirill A. Shutemov wrote:
> On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote:
> > Probably it is hard, @Kirill would there be any way to safely move
> > security_mmap_file() out of the mmap_lock lock?
> 
> What about something like this (untested):
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index dd4b35a25aeb..03473e77d356 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
>  		return ret;
>  
> +	if (mmap_read_lock_killable(mm))
> +		return -EINTR;
> +
> +	vma = vma_lookup(mm, start);
> +
> +	if (!vma || !(vma->vm_flags & VM_SHARED)) {
> +		mmap_read_unlock(mm);
> +		return -EINVAL;
> +	}
> +
> +	file = get_file(vma->vm_file);
> +
> +	mmap_read_unlock(mm);
> +
> +	ret = security_mmap_file(vma->vm_file, prot, flags);
> +	if (ret) {
> +		fput(file);
> +		return ret;
> +	}
> +

Emm. We need to restore 'ret' to -EINVAL here:

+ 
+  	ret = -EINVAL;
+ 

>  	if (mmap_write_lock_killable(mm))
>  		return -EINTR;
>  
> @@ -1654,6 +1674,9 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	if (!vma || !(vma->vm_flags & VM_SHARED))
>  		goto out;
>  
> +	if (vma->vm_file != file)
> +		goto out;
> +
>  	if (start + size > vma->vm_end) {
>  		VMA_ITERATOR(vmi, mm, vma->vm_end);
>  		struct vm_area_struct *next, *prev = vma;
> @@ -1688,16 +1711,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	if (vma->vm_flags & VM_LOCKED)
>  		flags |= MAP_LOCKED;
>  
> -	file = get_file(vma->vm_file);
> -	ret = security_mmap_file(vma->vm_file, prot, flags);
> -	if (ret)
> -		goto out_fput;
>  	ret = do_mmap(vma->vm_file, start, size,
>  			prot, flags, 0, pgoff, &populate, NULL);
> -out_fput:
> -	fput(file);
>  out:
>  	mmap_write_unlock(mm);
> +	fput(file);
>  	if (populate)
>  		mm_populate(ret, populate);
>  	if (!IS_ERR_VALUE(ret))
> -- 
>   Kiryl Shutsemau / Kirill A. Shutemov
Kirill A. Shutemov Oct. 18, 2024, 10:55 a.m. UTC | #13
On Fri, Oct 18, 2024 at 01:53:03PM +0300, Kirill A. Shutemov wrote:
> > +	mmap_read_unlock(mm);
> > +
> > +	ret = security_mmap_file(vma->vm_file, prot, flags);
> > +	if (ret) {
> > +		fput(file);
> > +		return ret;
> > +	}
> > +
> 
> Emm. We need to restore 'ret' to -EINVAL here:
> 
> + 
> +  	ret = -EINVAL;
> + 
> 
> >  	if (mmap_write_lock_killable(mm))
> >  		return -EINTR;
> >  

And fput() here on error.

Updated patch:

diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..7c1b73a79937 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1646,14 +1646,41 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
 		return ret;
 
-	if (mmap_write_lock_killable(mm))
+	if (mmap_read_lock_killable(mm))
 		return -EINTR;
 
 	vma = vma_lookup(mm, start);
 
+	if (!vma || !(vma->vm_flags & VM_SHARED)) {
+		mmap_read_unlock(mm);
+		return -EINVAL;
+	}
+
+	file = get_file(vma->vm_file);
+
+	mmap_read_unlock(mm);
+
+	ret = security_mmap_file(vma->vm_file, prot, flags);
+	if (ret) {
+		fput(file);
+		return ret;
+	}
+
+	ret = -EINVAL;
+
+	if (mmap_write_lock_killable(mm)) {
+		fput(file);
+		return -EINTR;
+	}
+
+	vma = vma_lookup(mm, start);
+
 	if (!vma || !(vma->vm_flags & VM_SHARED))
 		goto out;
 
+	if (vma->vm_file != file)
+		goto out;
+
 	if (start + size > vma->vm_end) {
 		VMA_ITERATOR(vmi, mm, vma->vm_end);
 		struct vm_area_struct *next, *prev = vma;
@@ -1688,16 +1715,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	if (vma->vm_flags & VM_LOCKED)
 		flags |= MAP_LOCKED;
 
-	file = get_file(vma->vm_file);
-	ret = security_mmap_file(vma->vm_file, prot, flags);
-	if (ret)
-		goto out_fput;
 	ret = do_mmap(vma->vm_file, start, size,
 			prot, flags, 0, pgoff, &populate, NULL);
-out_fput:
-	fput(file);
 out:
 	mmap_write_unlock(mm);
+	fput(file);
 	if (populate)
 		mm_populate(ret, populate);
 	If (!IS_ERR_VALUE(ret))
Lorenzo Stoakes Oct. 18, 2024, 11 a.m. UTC | #14
+ Liam, Jann

On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote:
> On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote:
> > Probably it is hard, @Kirill would there be any way to safely move
> > security_mmap_file() out of the mmap_lock lock?
>
> What about something like this (untested):
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index dd4b35a25aeb..03473e77d356 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
>  		return ret;
>
> +	if (mmap_read_lock_killable(mm))
> +		return -EINTR;
> +
> +	vma = vma_lookup(mm, start);
> +
> +	if (!vma || !(vma->vm_flags & VM_SHARED)) {
> +		mmap_read_unlock(mm);
> +		return -EINVAL;
> +	}
> +
> +	file = get_file(vma->vm_file);
> +
> +	mmap_read_unlock(mm);
> +
> +	ret = security_mmap_file(vma->vm_file, prot, flags);

Accessing VMA fields without any kind of lock is... very much not advised.

I'm guessing you meant to say:

	ret = security_mmap_file(file, prot, flags);

Here? :)

I see the original code did this, but obviously was under an mmap lock.

I guess given you check that the file is the same below this.... should be
fine? Assuming nothing can come in and invalidate the security_mmap_file()
check in the mean time somehow?

Jann any thoughts?


> +	if (ret) {
> +		fput(file);
> +		return ret;
> +	}
> +
>  	if (mmap_write_lock_killable(mm))
>  		return -EINTR;
>
> @@ -1654,6 +1674,9 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	if (!vma || !(vma->vm_flags & VM_SHARED))
>  		goto out;
>
> +	if (vma->vm_file != file)
> +		goto out;
> +
>  	if (start + size > vma->vm_end) {
>  		VMA_ITERATOR(vmi, mm, vma->vm_end);
>  		struct vm_area_struct *next, *prev = vma;
> @@ -1688,16 +1711,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	if (vma->vm_flags & VM_LOCKED)
>  		flags |= MAP_LOCKED;
>
> -	file = get_file(vma->vm_file);
> -	ret = security_mmap_file(vma->vm_file, prot, flags);
> -	if (ret)
> -		goto out_fput;
>  	ret = do_mmap(vma->vm_file, start, size,
>  			prot, flags, 0, pgoff, &populate, NULL);
> -out_fput:
> -	fput(file);
>  out:
>  	mmap_write_unlock(mm);
> +	fput(file);
>  	if (populate)
>  		mm_populate(ret, populate);
>  	if (!IS_ERR_VALUE(ret))
> --
>   Kiryl Shutsemau / Kirill A. Shutemov
Kirill A. Shutemov Oct. 18, 2024, 11:05 a.m. UTC | #15
On Fri, Oct 18, 2024 at 12:00:22PM +0100, Lorenzo Stoakes wrote:
> + Liam, Jann
> 
> On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote:
> > > Probably it is hard, @Kirill would there be any way to safely move
> > > security_mmap_file() out of the mmap_lock lock?
> >
> > What about something like this (untested):
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index dd4b35a25aeb..03473e77d356 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> >  	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
> >  		return ret;
> >
> > +	if (mmap_read_lock_killable(mm))
> > +		return -EINTR;
> > +
> > +	vma = vma_lookup(mm, start);
> > +
> > +	if (!vma || !(vma->vm_flags & VM_SHARED)) {
> > +		mmap_read_unlock(mm);
> > +		return -EINVAL;
> > +	}
> > +
> > +	file = get_file(vma->vm_file);
> > +
> > +	mmap_read_unlock(mm);
> > +
> > +	ret = security_mmap_file(vma->vm_file, prot, flags);
> 
> Accessing VMA fields without any kind of lock is... very much not advised.
> 
> I'm guessing you meant to say:
> 
> 	ret = security_mmap_file(file, prot, flags);
> 
> Here? :)

Sure. My bad.

Patch with all fixups:

diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..541787d526b6 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1646,14 +1646,41 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
 		return ret;
 
-	if (mmap_write_lock_killable(mm))
+	if (mmap_read_lock_killable(mm))
 		return -EINTR;
 
 	vma = vma_lookup(mm, start);
 
+	if (!vma || !(vma->vm_flags & VM_SHARED)) {
+		mmap_read_unlock(mm);
+		return -EINVAL;
+	}
+
+	file = get_file(vma->vm_file);
+
+	mmap_read_unlock(mm);
+
+	ret = security_mmap_file(file, prot, flags);
+	if (ret) {
+		fput(file);
+		return ret;
+	}
+
+	ret = -EINVAL;
+
+	if (mmap_write_lock_killable(mm)) {
+		fput(file);
+		return -EINTR;
+	}
+
+	vma = vma_lookup(mm, start);
+
 	if (!vma || !(vma->vm_flags & VM_SHARED))
 		goto out;
 
+	if (vma->vm_file != file)
+		goto out;
+
 	if (start + size > vma->vm_end) {
 		VMA_ITERATOR(vmi, mm, vma->vm_end);
 		struct vm_area_struct *next, *prev = vma;
@@ -1688,16 +1715,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	if (vma->vm_flags & VM_LOCKED)
 		flags |= MAP_LOCKED;
 
-	file = get_file(vma->vm_file);
-	ret = security_mmap_file(vma->vm_file, prot, flags);
-	if (ret)
-		goto out_fput;
 	ret = do_mmap(vma->vm_file, start, size,
 			prot, flags, 0, pgoff, &populate, NULL);
-out_fput:
-	fput(file);
 out:
 	mmap_write_unlock(mm);
+	fput(file);
 	if (populate)
 		mm_populate(ret, populate);
 	if (!IS_ERR_VALUE(ret))
Roberto Sassu Oct. 18, 2024, 11:22 a.m. UTC | #16
On Fri, 2024-10-18 at 14:05 +0300, Kirill A. Shutemov wrote:
> On Fri, Oct 18, 2024 at 12:00:22PM +0100, Lorenzo Stoakes wrote:
> > + Liam, Jann
> > 
> > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote:
> > > > Probably it is hard, @Kirill would there be any way to safely move
> > > > security_mmap_file() out of the mmap_lock lock?
> > > 
> > > What about something like this (untested):
> > > 
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index dd4b35a25aeb..03473e77d356 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> > >  	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
> > >  		return ret;
> > > 
> > > +	if (mmap_read_lock_killable(mm))
> > > +		return -EINTR;
> > > +
> > > +	vma = vma_lookup(mm, start);
> > > +
> > > +	if (!vma || !(vma->vm_flags & VM_SHARED)) {
> > > +		mmap_read_unlock(mm);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	file = get_file(vma->vm_file);
> > > +
> > > +	mmap_read_unlock(mm);
> > > +
> > > +	ret = security_mmap_file(vma->vm_file, prot, flags);
> > 
> > Accessing VMA fields without any kind of lock is... very much not advised.
> > 
> > I'm guessing you meant to say:
> > 
> > 	ret = security_mmap_file(file, prot, flags);
> > 
> > Here? :)
> 
> Sure. My bad.
> 
> Patch with all fixups:

Thanks a lot! Let's wait a bit until the others have a chance to
comment. Meanwhile, I will test it.

Do you want me to do the final patch, or will you be proposing it?

Roberto

> diff --git a/mm/mmap.c b/mm/mmap.c
> index dd4b35a25aeb..541787d526b6 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1646,14 +1646,41 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
>  		return ret;
>  
> -	if (mmap_write_lock_killable(mm))
> +	if (mmap_read_lock_killable(mm))
>  		return -EINTR;
>  
>  	vma = vma_lookup(mm, start);
>  
> +	if (!vma || !(vma->vm_flags & VM_SHARED)) {
> +		mmap_read_unlock(mm);
> +		return -EINVAL;
> +	}
> +
> +	file = get_file(vma->vm_file);
> +
> +	mmap_read_unlock(mm);
> +
> +	ret = security_mmap_file(file, prot, flags);
> +	if (ret) {
> +		fput(file);
> +		return ret;
> +	}
> +
> +	ret = -EINVAL;
> +
> +	if (mmap_write_lock_killable(mm)) {
> +		fput(file);
> +		return -EINTR;
> +	}
> +
> +	vma = vma_lookup(mm, start);
> +
>  	if (!vma || !(vma->vm_flags & VM_SHARED))
>  		goto out;
>  
> +	if (vma->vm_file != file)
> +		goto out;
> +
>  	if (start + size > vma->vm_end) {
>  		VMA_ITERATOR(vmi, mm, vma->vm_end);
>  		struct vm_area_struct *next, *prev = vma;
> @@ -1688,16 +1715,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	if (vma->vm_flags & VM_LOCKED)
>  		flags |= MAP_LOCKED;
>  
> -	file = get_file(vma->vm_file);
> -	ret = security_mmap_file(vma->vm_file, prot, flags);
> -	if (ret)
> -		goto out_fput;
>  	ret = do_mmap(vma->vm_file, start, size,
>  			prot, flags, 0, pgoff, &populate, NULL);
> -out_fput:
> -	fput(file);
>  out:
>  	mmap_write_unlock(mm);
> +	fput(file);
>  	if (populate)
>  		mm_populate(ret, populate);
>  	if (!IS_ERR_VALUE(ret))
Kirill A. Shutemov Oct. 18, 2024, 11:54 a.m. UTC | #17
On Fri, Oct 18, 2024 at 01:22:35PM +0200, Roberto Sassu wrote:
> On Fri, 2024-10-18 at 14:05 +0300, Kirill A. Shutemov wrote:
> > On Fri, Oct 18, 2024 at 12:00:22PM +0100, Lorenzo Stoakes wrote:
> > > + Liam, Jann
> > > 
> > > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote:
> > > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote:
> > > > > Probably it is hard, @Kirill would there be any way to safely move
> > > > > security_mmap_file() out of the mmap_lock lock?
> > > > 
> > > > What about something like this (untested):
> > > > 
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index dd4b35a25aeb..03473e77d356 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> > > >  	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
> > > >  		return ret;
> > > > 
> > > > +	if (mmap_read_lock_killable(mm))
> > > > +		return -EINTR;
> > > > +
> > > > +	vma = vma_lookup(mm, start);
> > > > +
> > > > +	if (!vma || !(vma->vm_flags & VM_SHARED)) {
> > > > +		mmap_read_unlock(mm);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	file = get_file(vma->vm_file);
> > > > +
> > > > +	mmap_read_unlock(mm);
> > > > +
> > > > +	ret = security_mmap_file(vma->vm_file, prot, flags);
> > > 
> > > Accessing VMA fields without any kind of lock is... very much not advised.
> > > 
> > > I'm guessing you meant to say:
> > > 
> > > 	ret = security_mmap_file(file, prot, flags);
> > > 
> > > Here? :)
> > 
> > Sure. My bad.
> > 
> > Patch with all fixups:
> 
> Thanks a lot! Let's wait a bit until the others have a chance to
> comment. Meanwhile, I will test it.
> 
> Do you want me to do the final patch, or will you be proposing it?

You can post it if it works:

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Roberto Sassu Oct. 18, 2024, 11:56 a.m. UTC | #18
On Fri, 2024-10-18 at 14:54 +0300, Kirill A. Shutemov wrote:
> On Fri, Oct 18, 2024 at 01:22:35PM +0200, Roberto Sassu wrote:
> > On Fri, 2024-10-18 at 14:05 +0300, Kirill A. Shutemov wrote:
> > > On Fri, Oct 18, 2024 at 12:00:22PM +0100, Lorenzo Stoakes wrote:
> > > > + Liam, Jann
> > > > 
> > > > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote:
> > > > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote:
> > > > > > Probably it is hard, @Kirill would there be any way to safely move
> > > > > > security_mmap_file() out of the mmap_lock lock?
> > > > > 
> > > > > What about something like this (untested):
> > > > > 
> > > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > > index dd4b35a25aeb..03473e77d356 100644
> > > > > --- a/mm/mmap.c
> > > > > +++ b/mm/mmap.c
> > > > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> > > > >  	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
> > > > >  		return ret;
> > > > > 
> > > > > +	if (mmap_read_lock_killable(mm))
> > > > > +		return -EINTR;
> > > > > +
> > > > > +	vma = vma_lookup(mm, start);
> > > > > +
> > > > > +	if (!vma || !(vma->vm_flags & VM_SHARED)) {
> > > > > +		mmap_read_unlock(mm);
> > > > > +		return -EINVAL;
> > > > > +	}
> > > > > +
> > > > > +	file = get_file(vma->vm_file);
> > > > > +
> > > > > +	mmap_read_unlock(mm);
> > > > > +
> > > > > +	ret = security_mmap_file(vma->vm_file, prot, flags);
> > > > 
> > > > Accessing VMA fields without any kind of lock is... very much not advised.
> > > > 
> > > > I'm guessing you meant to say:
> > > > 
> > > > 	ret = security_mmap_file(file, prot, flags);
> > > > 
> > > > Here? :)
> > > 
> > > Sure. My bad.
> > > 
> > > Patch with all fixups:
> > 
> > Thanks a lot! Let's wait a bit until the others have a chance to
> > comment. Meanwhile, I will test it.
> > 
> > Do you want me to do the final patch, or will you be proposing it?
> 
> You can post it if it works:
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Thanks! Ok.

Roberto
Roberto Sassu Oct. 18, 2024, 1:29 p.m. UTC | #19
On Fri, 2024-10-18 at 14:05 +0300, Kirill A. Shutemov wrote:
> On Fri, Oct 18, 2024 at 12:00:22PM +0100, Lorenzo Stoakes wrote:
> > + Liam, Jann
> > 
> > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote:
> > > > Probably it is hard, @Kirill would there be any way to safely move
> > > > security_mmap_file() out of the mmap_lock lock?
> > > 
> > > What about something like this (untested):
> > > 
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index dd4b35a25aeb..03473e77d356 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> > >  	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
> > >  		return ret;
> > > 
> > > +	if (mmap_read_lock_killable(mm))
> > > +		return -EINTR;
> > > +
> > > +	vma = vma_lookup(mm, start);
> > > +
> > > +	if (!vma || !(vma->vm_flags & VM_SHARED)) {
> > > +		mmap_read_unlock(mm);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	file = get_file(vma->vm_file);
> > > +
> > > +	mmap_read_unlock(mm);
> > > +
> > > +	ret = security_mmap_file(vma->vm_file, prot, flags);
> > 
> > Accessing VMA fields without any kind of lock is... very much not advised.
> > 
> > I'm guessing you meant to say:
> > 
> > 	ret = security_mmap_file(file, prot, flags);
> > 
> > Here? :)
> 
> Sure. My bad.
> 
> Patch with all fixups:
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index dd4b35a25aeb..541787d526b6 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1646,14 +1646,41 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
>  		return ret;
>  
> -	if (mmap_write_lock_killable(mm))
> +	if (mmap_read_lock_killable(mm))
>  		return -EINTR;
>  
>  	vma = vma_lookup(mm, start);
>  
> +	if (!vma || !(vma->vm_flags & VM_SHARED)) {
> +		mmap_read_unlock(mm);
> +		return -EINVAL;
> +	}
> +
> +	file = get_file(vma->vm_file);
> +
> +	mmap_read_unlock(mm);
> +
> +	ret = security_mmap_file(file, prot, flags);

Uhm, I have to calculate prot and flags before. I can check if what I
used here changed in the next lock, and refuse.

Roberto

> +	if (ret) {
> +		fput(file);
> +		return ret;
> +	}
> +
> +	ret = -EINVAL;
> +
> +	if (mmap_write_lock_killable(mm)) {
> +		fput(file);
> +		return -EINTR;
> +	}
> +
> +	vma = vma_lookup(mm, start);
> +
>  	if (!vma || !(vma->vm_flags & VM_SHARED))
>  		goto out;
>  
> +	if (vma->vm_file != file)
> +		goto out;
> +
>  	if (start + size > vma->vm_end) {
>  		VMA_ITERATOR(vmi, mm, vma->vm_end);
>  		struct vm_area_struct *next, *prev = vma;
> @@ -1688,16 +1715,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
>  	if (vma->vm_flags & VM_LOCKED)
>  		flags |= MAP_LOCKED;
>  
> -	file = get_file(vma->vm_file);
> -	ret = security_mmap_file(vma->vm_file, prot, flags);
> -	if (ret)
> -		goto out_fput;
>  	ret = do_mmap(vma->vm_file, start, size,
>  			prot, flags, 0, pgoff, &populate, NULL);
> -out_fput:
> -	fput(file);
>  out:
>  	mmap_write_unlock(mm);
> +	fput(file);
>  	if (populate)
>  		mm_populate(ret, populate);
>  	if (!IS_ERR_VALUE(ret))
Jann Horn Oct. 18, 2024, 2:36 p.m. UTC | #20
On Fri, Oct 18, 2024 at 1:00 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote:
> > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote:
> > > Probably it is hard, @Kirill would there be any way to safely move
> > > security_mmap_file() out of the mmap_lock lock?
> >
> > What about something like this (untested):
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index dd4b35a25aeb..03473e77d356 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> >       if (pgoff + (size >> PAGE_SHIFT) < pgoff)
> >               return ret;
> >
> > +     if (mmap_read_lock_killable(mm))
> > +             return -EINTR;
> > +
> > +     vma = vma_lookup(mm, start);
> > +
> > +     if (!vma || !(vma->vm_flags & VM_SHARED)) {
> > +             mmap_read_unlock(mm);
> > +             return -EINVAL;
> > +     }
> > +
> > +     file = get_file(vma->vm_file);
> > +
> > +     mmap_read_unlock(mm);
> > +
> > +     ret = security_mmap_file(vma->vm_file, prot, flags);
>
> Accessing VMA fields without any kind of lock is... very much not advised.
>
> I'm guessing you meant to say:
>
>         ret = security_mmap_file(file, prot, flags);
>
> Here? :)
>
> I see the original code did this, but obviously was under an mmap lock.
>
> I guess given you check that the file is the same below this.... should be
> fine? Assuming nothing can come in and invalidate the security_mmap_file()
> check in the mean time somehow?
>
> Jann any thoughts?

The overall approach seems reasonable to me - it aligns this path with
the other security_mmap_file() checks, which also don't happen under
the lock.
Lorenzo Stoakes Oct. 18, 2024, 2:48 p.m. UTC | #21
On Fri, Oct 18, 2024 at 04:36:16PM +0200, Jann Horn wrote:
> On Fri, Oct 18, 2024 at 1:00 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote:
> > > > Probably it is hard, @Kirill would there be any way to safely move
> > > > security_mmap_file() out of the mmap_lock lock?
> > >
> > > What about something like this (untested):
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index dd4b35a25aeb..03473e77d356 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
> > >       if (pgoff + (size >> PAGE_SHIFT) < pgoff)
> > >               return ret;
> > >
> > > +     if (mmap_read_lock_killable(mm))
> > > +             return -EINTR;
> > > +
> > > +     vma = vma_lookup(mm, start);
> > > +
> > > +     if (!vma || !(vma->vm_flags & VM_SHARED)) {
> > > +             mmap_read_unlock(mm);
> > > +             return -EINVAL;
> > > +     }
> > > +
> > > +     file = get_file(vma->vm_file);
> > > +
> > > +     mmap_read_unlock(mm);
> > > +
> > > +     ret = security_mmap_file(vma->vm_file, prot, flags);
> >
> > Accessing VMA fields without any kind of lock is... very much not advised.
> >
> > I'm guessing you meant to say:
> >
> >         ret = security_mmap_file(file, prot, flags);
> >
> > Here? :)
> >
> > I see the original code did this, but obviously was under an mmap lock.
> >
> > I guess given you check that the file is the same below this.... should be
> > fine? Assuming nothing can come in and invalidate the security_mmap_file()
> > check in the mean time somehow?
> >
> > Jann any thoughts?
>
> The overall approach seems reasonable to me - it aligns this path with
> the other security_mmap_file() checks, which also don't happen under
> the lock.

Yeah equally I find this pattern fine as we check that the file is the same
after we reacquire the lock (a common pattern in mm), so if there's no
objections on security side I don't really see any issue myself - Roberto
if you want to go ahead and post the patch (separately perhaps?) we can
toss some tags your way :)
diff mbox series

Patch

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 3c323ca213d4..6474a15b584a 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -182,7 +182,6 @@  struct ima_kexec_hdr {
 
 /* IMA integrity metadata associated with an inode */
 struct ima_iint_cache {
-	struct mutex mutex;	/* protects: version, flags, digest */
 	struct integrity_inode_attributes real_inode;
 	unsigned long flags;
 	unsigned long measured_pcrs;
@@ -195,35 +194,48 @@  struct ima_iint_cache {
 	struct ima_digest_data *ima_hash;
 };
 
+struct ima_iint_cache_lock {
+	struct mutex mutex;	/* protects: version, flags, digest */
+	struct ima_iint_cache *iint;
+};
+
 extern struct lsm_blob_sizes ima_blob_sizes;
 
+static inline struct ima_iint_cache_lock *ima_inode_security(void *i_security)
+{
+	return i_security + ima_blob_sizes.lbs_inode;
+}
+
 static inline struct ima_iint_cache *
 ima_inode_get_iint(const struct inode *inode)
 {
-	struct ima_iint_cache **iint_sec;
+	struct ima_iint_cache_lock *iint_lock;
 
 	if (unlikely(!inode->i_security))
 		return NULL;
 
-	iint_sec = inode->i_security + ima_blob_sizes.lbs_inode;
-	return *iint_sec;
+	iint_lock = ima_inode_security(inode->i_security);
+	return iint_lock->iint;
 }
 
 static inline void ima_inode_set_iint(const struct inode *inode,
 				      struct ima_iint_cache *iint)
 {
-	struct ima_iint_cache **iint_sec;
+	struct ima_iint_cache_lock *iint_lock;
 
 	if (unlikely(!inode->i_security))
 		return;
 
-	iint_sec = inode->i_security + ima_blob_sizes.lbs_inode;
-	*iint_sec = iint;
+	iint_lock = ima_inode_security(inode->i_security);
+	iint_lock->iint = iint;
 }
 
 struct ima_iint_cache *ima_iint_find(struct inode *inode);
 struct ima_iint_cache *ima_inode_get(struct inode *inode);
+int ima_inode_alloc_security(struct inode *inode);
 void ima_inode_free_rcu(void *inode_security);
+void ima_iint_lock(struct inode *inode);
+void ima_iint_unlock(struct inode *inode);
 void __init ima_iintcache_init(void);
 
 extern const int read_idmap[];
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index 984e861f6e33..37c2a228f0e1 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -234,7 +234,7 @@  static bool ima_get_verity_digest(struct ima_iint_cache *iint,
  * Calculate the file hash, if it doesn't already exist,
  * storing the measurement and i_version in the iint.
  *
- * Must be called with iint->mutex held.
+ * Must be called with iint mutex held.
  *
  * Return 0 on success, error code otherwise
  */
@@ -343,7 +343,7 @@  int ima_collect_measurement(struct ima_iint_cache *iint, struct file *file,
  *	- the inode was previously flushed as well as the iint info,
  *	  containing the hashing info.
  *
- * Must be called with iint->mutex held.
+ * Must be called with iint mutex held.
  */
 void ima_store_measurement(struct ima_iint_cache *iint, struct file *file,
 			   const unsigned char *filename,
diff --git a/security/integrity/ima/ima_iint.c b/security/integrity/ima/ima_iint.c
index 00b249101f98..c176fd0faae7 100644
--- a/security/integrity/ima/ima_iint.c
+++ b/security/integrity/ima/ima_iint.c
@@ -40,18 +40,18 @@  struct ima_iint_cache *ima_iint_find(struct inode *inode)
  * mutex to avoid lockdep false positives related to IMA + overlayfs.
  * See ovl_lockdep_annotate_inode_mutex_key() for more details.
  */
-static inline void ima_iint_lockdep_annotate(struct ima_iint_cache *iint,
-					     struct inode *inode)
+static inline void ima_iint_lock_lockdep_annotate(struct mutex *mutex,
+						  struct inode *inode)
 {
 #ifdef CONFIG_LOCKDEP
-	static struct lock_class_key ima_iint_mutex_key[IMA_MAX_NESTING];
+	static struct lock_class_key ima_iint_lock_mutex_key[IMA_MAX_NESTING];
 
 	int depth = inode->i_sb->s_stack_depth;
 
 	if (WARN_ON_ONCE(depth < 0 || depth >= IMA_MAX_NESTING))
 		depth = 0;
 
-	lockdep_set_class(&iint->mutex, &ima_iint_mutex_key[depth]);
+	lockdep_set_class(mutex, &ima_iint_lock_mutex_key[depth]);
 #endif
 }
 
@@ -68,14 +68,11 @@  static void ima_iint_init_always(struct ima_iint_cache *iint,
 	iint->ima_read_status = INTEGRITY_UNKNOWN;
 	iint->ima_creds_status = INTEGRITY_UNKNOWN;
 	iint->measured_pcrs = 0;
-	mutex_init(&iint->mutex);
-	ima_iint_lockdep_annotate(iint, inode);
 }
 
 static void ima_iint_free(struct ima_iint_cache *iint)
 {
 	kfree(iint->ima_hash);
-	mutex_destroy(&iint->mutex);
 	kmem_cache_free(ima_iint_cache, iint);
 }
 
@@ -108,6 +105,26 @@  struct ima_iint_cache *ima_inode_get(struct inode *inode)
 	return iint;
 }
 
+/**
+ * ima_inode_alloc_security - Called to init an inode
+ * @inode: Pointer to the inode
+ *
+ * Initialize and annotate the mutex in the ima_iint_cache_lock structure.
+ *
+ * Return: Zero.
+ */
+int ima_inode_alloc_security(struct inode *inode)
+{
+	struct ima_iint_cache_lock *iint_lock;
+
+	iint_lock = ima_inode_security(inode->i_security);
+
+	mutex_init(&iint_lock->mutex);
+	ima_iint_lock_lockdep_annotate(&iint_lock->mutex, inode);
+
+	return 0;
+}
+
 /**
  * ima_inode_free_rcu - Called to free an inode via a RCU callback
  * @inode_security: The inode->i_security pointer
@@ -116,11 +133,49 @@  struct ima_iint_cache *ima_inode_get(struct inode *inode)
  */
 void ima_inode_free_rcu(void *inode_security)
 {
-	struct ima_iint_cache **iint_p = inode_security + ima_blob_sizes.lbs_inode;
+	struct ima_iint_cache_lock *iint_lock;
+
+	iint_lock = ima_inode_security(inode_security);
+
+	mutex_destroy(&iint_lock->mutex);
+
+	/* iint_lock->iint should be NULL if !IS_IMA(inode) */
+	if (iint_lock->iint)
+		ima_iint_free(iint_lock->iint);
+}
+
+/**
+ * ima_iint_lock - Lock integrity metadata
+ * @inode: Pointer to the inode
+ *
+ * Lock integrity metadata.
+ */
+void ima_iint_lock(struct inode *inode)
+{
+	struct ima_iint_cache_lock *iint_lock;
+
+	iint_lock = ima_inode_security(inode->i_security);
+
+	/* Only inodes with i_security are processed by IMA. */
+	if (iint_lock)
+		mutex_lock(&iint_lock->mutex);
+}
+
+/**
+ * ima_iint_unlock - Unlock integrity metadata
+ * @inode: Pointer to the inode
+ *
+ * Unlock integrity metadata.
+ */
+void ima_iint_unlock(struct inode *inode)
+{
+	struct ima_iint_cache_lock *iint_lock;
+
+	iint_lock = ima_inode_security(inode->i_security);
 
-	/* *iint_p should be NULL if !IS_IMA(inode) */
-	if (*iint_p)
-		ima_iint_free(*iint_p);
+	/* Only inodes with i_security are processed by IMA. */
+	if (iint_lock)
+		mutex_unlock(&iint_lock->mutex);
 }
 
 static void ima_iint_init_once(void *foo)
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 06132cf47016..7852212c43ce 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -163,7 +163,7 @@  static void ima_check_last_writer(struct ima_iint_cache *iint,
 	if (!(mode & FMODE_WRITE))
 		return;
 
-	mutex_lock(&iint->mutex);
+	ima_iint_lock(inode);
 	if (atomic_read(&inode->i_writecount) == 1) {
 		struct kstat stat;
 
@@ -181,7 +181,7 @@  static void ima_check_last_writer(struct ima_iint_cache *iint,
 				ima_update_xattr(iint, file);
 		}
 	}
-	mutex_unlock(&iint->mutex);
+	ima_iint_unlock(inode);
 }
 
 /**
@@ -247,7 +247,7 @@  static int process_measurement(struct file *file, const struct cred *cred,
 	if (action & IMA_FILE_APPRAISE)
 		func = FILE_CHECK;
 
-	inode_lock(inode);
+	ima_iint_lock(inode);
 
 	if (action) {
 		iint = ima_inode_get(inode);
@@ -259,15 +259,11 @@  static int process_measurement(struct file *file, const struct cred *cred,
 		ima_rdwr_violation_check(file, iint, action & IMA_MEASURE,
 					 &pathbuf, &pathname, filename);
 
-	inode_unlock(inode);
-
 	if (rc)
 		goto out;
 	if (!action)
 		goto out;
 
-	mutex_lock(&iint->mutex);
-
 	if (test_and_clear_bit(IMA_CHANGE_ATTR, &iint->atomic_flags))
 		/* reset appraisal flags if ima_inode_post_setattr was called */
 		iint->flags &= ~(IMA_APPRAISE | IMA_APPRAISED |
@@ -412,10 +408,10 @@  static int process_measurement(struct file *file, const struct cred *cred,
 	if ((mask & MAY_WRITE) && test_bit(IMA_DIGSIG, &iint->atomic_flags) &&
 	     !(iint->flags & IMA_NEW_FILE))
 		rc = -EACCES;
-	mutex_unlock(&iint->mutex);
 	kfree(xattr_value);
 	ima_free_modsig(modsig);
 out:
+	ima_iint_unlock(inode);
 	if (pathbuf)
 		__putname(pathbuf);
 	if (must_appraise) {
@@ -580,18 +576,13 @@  static int __ima_inode_hash(struct inode *inode, struct file *file, char *buf,
 	struct ima_iint_cache *iint = NULL, tmp_iint;
 	int rc, hash_algo;
 
-	if (ima_policy_flag) {
+	ima_iint_lock(inode);
+
+	if (ima_policy_flag)
 		iint = ima_iint_find(inode);
-		if (iint)
-			mutex_lock(&iint->mutex);
-	}
 
 	if ((!iint || !(iint->flags & IMA_COLLECTED)) && file) {
-		if (iint)
-			mutex_unlock(&iint->mutex);
-
 		memset(&tmp_iint, 0, sizeof(tmp_iint));
-		mutex_init(&tmp_iint.mutex);
 
 		rc = ima_collect_measurement(&tmp_iint, file, NULL, 0,
 					     ima_hash_algo, NULL);
@@ -600,22 +591,24 @@  static int __ima_inode_hash(struct inode *inode, struct file *file, char *buf,
 			if (rc != -ENOMEM)
 				kfree(tmp_iint.ima_hash);
 
+			ima_iint_unlock(inode);
 			return -EOPNOTSUPP;
 		}
 
 		iint = &tmp_iint;
-		mutex_lock(&iint->mutex);
 	}
 
-	if (!iint)
+	if (!iint) {
+		ima_iint_unlock(inode);
 		return -EOPNOTSUPP;
+	}
 
 	/*
 	 * ima_file_hash can be called when ima_collect_measurement has still
 	 * not been called, we might not always have a hash.
 	 */
 	if (!iint->ima_hash || !(iint->flags & IMA_COLLECTED)) {
-		mutex_unlock(&iint->mutex);
+		ima_iint_unlock(inode);
 		return -EOPNOTSUPP;
 	}
 
@@ -626,11 +619,12 @@  static int __ima_inode_hash(struct inode *inode, struct file *file, char *buf,
 		memcpy(buf, iint->ima_hash->digest, copied_size);
 	}
 	hash_algo = iint->ima_hash->algo;
-	mutex_unlock(&iint->mutex);
 
 	if (iint == &tmp_iint)
 		kfree(iint->ima_hash);
 
+	ima_iint_unlock(inode);
+
 	return hash_algo;
 }
 
@@ -1118,7 +1112,7 @@  EXPORT_SYMBOL_GPL(ima_measure_critical_data);
  * @kmod_name: kernel module name
  *
  * Avoid a verification loop where verifying the signature of the modprobe
- * binary requires executing modprobe itself. Since the modprobe iint->mutex
+ * binary requires executing modprobe itself. Since the modprobe iint mutex
  * is already held when the signature verification is performed, a deadlock
  * occurs as soon as modprobe is executed within the critical region, since
  * the same lock cannot be taken again.
@@ -1193,6 +1187,7 @@  static struct security_hook_list ima_hooks[] __ro_after_init = {
 #ifdef CONFIG_INTEGRITY_ASYMMETRIC_KEYS
 	LSM_HOOK_INIT(kernel_module_request, ima_kernel_module_request),
 #endif
+	LSM_HOOK_INIT(inode_alloc_security, ima_inode_alloc_security),
 	LSM_HOOK_INIT(inode_free_security_rcu, ima_inode_free_rcu),
 };
 
@@ -1210,7 +1205,7 @@  static int __init init_ima_lsm(void)
 }
 
 struct lsm_blob_sizes ima_blob_sizes __ro_after_init = {
-	.lbs_inode = sizeof(struct ima_iint_cache *),
+	.lbs_inode = sizeof(struct ima_iint_cache_lock),
 };
 
 DEFINE_LSM(ima) = {