mbox series

[bpf-next,0/5] Fix kernel panic caused by bpf lsm return value

Message ID 20240316122359.1073787-1-xukuohai@huaweicloud.com (mailing list archive)
Headers show
Series Fix kernel panic caused by bpf lsm return value | expand

Message

Xu Kuohai March 16, 2024, 12:23 p.m. UTC
From: Xu Kuohai <xukuohai@huawei.com>

A bpf prog returning positive number attached to file_alloc_security hook
will make kernel panic.

Here is a panic log:

[  441.235774] BUG: kernel NULL pointer dereference, address: 00000000000009
[  441.236748] #PF: supervisor write access in kernel mode
[  441.237429] #PF: error_code(0x0002) - not-present page
[  441.238119] PGD 800000000b02f067 P4D 800000000b02f067 PUD b031067 PMD 0
[  441.238990] Oops: 0002 [#1] PREEMPT SMP PTI
[  441.239546] CPU: 0 PID: 347 Comm: loader Not tainted 6.8.0-rc6-gafe0cbf23373 #22
[  441.240496] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b4
[  441.241933] RIP: 0010:alloc_file+0x4b/0x190
[  441.242485] Code: 8b 04 25 c0 3c 1f 00 48 8b b0 30 0c 00 00 e8 9c fe ff ff 48 3d 00 f0 ff fb
[  441.244820] RSP: 0018:ffffc90000c67c40 EFLAGS: 00010203
[  441.245484] RAX: ffff888006a891a0 RBX: ffffffff8223bd00 RCX: 0000000035b08000
[  441.246391] RDX: ffff88800b95f7b0 RSI: 00000000001fc110 RDI: f089cd0b8088ffff
[  441.247294] RBP: ffffc90000c67c58 R08: 0000000000000001 R09: 0000000000000001
[  441.248209] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
[  441.249108] R13: ffffc90000c67c78 R14: ffffffff8223bd00 R15: fffffffffffffff4
[  441.250007] FS:  00000000005f3300(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[  441.251053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  441.251788] CR2: 00000000000001a9 CR3: 000000000bdc4003 CR4: 0000000000170ef0
[  441.252688] Call Trace:
[  441.253011]  <TASK>
[  441.253296]  ? __die+0x24/0x70
[  441.253702]  ? page_fault_oops+0x15b/0x480
[  441.254236]  ? fixup_exception+0x26/0x330
[  441.254750]  ? exc_page_fault+0x6d/0x1c0
[  441.255257]  ? asm_exc_page_fault+0x26/0x30
[  441.255792]  ? alloc_file+0x4b/0x190
[  441.256257]  alloc_file_pseudo+0x9f/0xf0
[  441.256760]  __anon_inode_getfile+0x87/0x190
[  441.257311]  ? lock_release+0x14e/0x3f0
[  441.257808]  bpf_link_prime+0xe8/0x1d0
[  441.258315]  bpf_tracing_prog_attach+0x311/0x570
[  441.258916]  ? __pfx_bpf_lsm_file_alloc_security+0x10/0x10
[  441.259605]  __sys_bpf+0x1bb7/0x2dc0
[  441.260070]  __x64_sys_bpf+0x20/0x30
[  441.260533]  do_syscall_64+0x72/0x140
[  441.261004]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  441.261643] RIP: 0033:0x4b0349
[  441.262045] Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 88
[  441.264355] RSP: 002b:00007fff74daee38 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[  441.265293] RAX: ffffffffffffffda RBX: 00007fff74daef30 RCX: 00000000004b0349
[  441.266187] RDX: 0000000000000040 RSI: 00007fff74daee50 RDI: 000000000000001c
[  441.267114] RBP: 000000000000001b R08: 00000000005ef820 R09: 0000000000000000
[  441.268018] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[  441.268907] R13: 0000000000000004 R14: 00000000005ef018 R15: 00000000004004e8

The reason is that the positive number returned by bpf prog is not a
valid errno, and could not be filtered out with IS_ERR which is used by
the file system to check errors. As a result, the file system uses this
positive number as file pointer, causing panic.

To fix this issue, there are two schemes:

1. Modify the calling sites of file_alloc_security to take positive
   return values as zero.

2. Make the bpf verifier to ensure no unpredicted value is returned by
   lsm bpf prog.

Considering that hook file_alloc_security never returned positive number
before bpf lsm was introduced, and other bpf lsm hooks may have the same
problem, scheme 2 is more reasonable.

So this patch set adds lsm return value check in verifier to fix it.

Xu Kuohai (5):
  bpf, lsm: Annotate lsm hook return integer with new macro LSM_RET_INT
  bpf, lsm: Add return value range description for lsm hook
  bpf, lsm: Add function to read lsm hook return value range
  bpf, lsm: Check bpf lsm hook return values in verifier
  bpf: Fix compare error in function retval_range_within

 include/linux/bpf.h           |   1 +
 include/linux/bpf_lsm.h       |   8 +
 include/linux/lsm_hook_defs.h | 433 +++++++++++++++++-----------------
 include/linux/lsm_hooks.h     |   6 -
 kernel/bpf/bpf_lsm.c          |  66 +++++-
 kernel/bpf/btf.c              |   5 +-
 kernel/bpf/verifier.c         |  59 ++++-
 security/security.c           |   1 +
 8 files changed, 348 insertions(+), 231 deletions(-)

Comments

Stanislav Fomichev March 18, 2024, 4:52 p.m. UTC | #1
On 03/16, Xu Kuohai wrote:
> From: Xu Kuohai <xukuohai@huawei.com>
> 
> A bpf prog returning positive number attached to file_alloc_security hook
> will make kernel panic.

I'll leave it up to KP. I remember there was a similar patch series in
the past, but I have no state on why it was not accepted..
Paul Moore March 18, 2024, 4:58 p.m. UTC | #2
On Mon, Mar 18, 2024 at 12:52 PM Stanislav Fomichev <sdf@google.com> wrote:
> On 03/16, Xu Kuohai wrote:
> > From: Xu Kuohai <xukuohai@huawei.com>
> >
> > A bpf prog returning positive number attached to file_alloc_security hook
> > will make kernel panic.
>
> I'll leave it up to KP. I remember there was a similar patch series in
> the past, but I have no state on why it was not accepted..

I believe this is the patchset you are referring to:

https://lore.kernel.org/linux-security-module/20240207124918.3498756-1-kpsingh@kernel.org

It wasn't that the patchset was accepted or rejected, it is still in
the review queue as there are higher priority items being kicked
around in the LSM space at the moment.  It also wasn't a pure bug-fix
or feature patchset/patch, which muddied things a bit.
Roberto Sassu March 18, 2024, 5:02 p.m. UTC | #3
On Mon, 2024-03-18 at 09:52 -0700, Stanislav Fomichev wrote:
> On 03/16, Xu Kuohai wrote:
> > From: Xu Kuohai <xukuohai@huawei.com>
> > 
> > A bpf prog returning positive number attached to file_alloc_security hook
> > will make kernel panic.
> 
> I'll leave it up to KP. I remember there was a similar patch series in
> the past, but I have no state on why it was not accepted..

Yes, this one:

v1: https://lore.kernel.org/bpf/20221115175652.3836811-1-roberto.sassu@huaweicloud.com/
v2: https://lore.kernel.org/bpf/20221207172434.435893-1-roberto.sassu@huaweicloud.com/

The selftests were failing, and I wasn't able to come up with a
solution on the verifier side. I see patch 5 goes in that direction,
and I remember there was related work in this area.

(just saw Paul's answer, maybe the new KP's patch set also solves this)

Roberto
Xu Kuohai March 19, 2024, 7:37 a.m. UTC | #4
On 3/19/2024 12:58 AM, Paul Moore wrote:
> On Mon, Mar 18, 2024 at 12:52 PM Stanislav Fomichev <sdf@google.com> wrote:
>> On 03/16, Xu Kuohai wrote:
>>> From: Xu Kuohai <xukuohai@huawei.com>
>>>
>>> A bpf prog returning positive number attached to file_alloc_security hook
>>> will make kernel panic.
>>
>> I'll leave it up to KP. I remember there was a similar patch series in
>> the past, but I have no state on why it was not accepted..
> 
> I believe this is the patchset you are referring to:
> 
> https://lore.kernel.org/linux-security-module/20240207124918.3498756-1-kpsingh@kernel.org
> 

Thank you for the reply. IIUC, the above patchset is intended to reduce
the indirect call overhead of bpf lsm. I have tested it, and the panic
issue still exists with this patchset applied.

> It wasn't that the patchset was accepted or rejected, it is still in
> the review queue as there are higher priority items being kicked
> around in the LSM space at the moment.  It also wasn't a pure bug-fix
> or feature patchset/patch, which muddied things a bit.
>
Xu Kuohai March 19, 2024, 7:54 a.m. UTC | #5
On 3/19/2024 1:02 AM, Roberto Sassu wrote:
> On Mon, 2024-03-18 at 09:52 -0700, Stanislav Fomichev wrote:
>> On 03/16, Xu Kuohai wrote:
>>> From: Xu Kuohai <xukuohai@huawei.com>
>>>
>>> A bpf prog returning positive number attached to file_alloc_security hook
>>> will make kernel panic.
>>
>> I'll leave it up to KP. I remember there was a similar patch series in
>> the past, but I have no state on why it was not accepted..
> 
> Yes, this one:
> 
> v1: https://lore.kernel.org/bpf/20221115175652.3836811-1-roberto.sassu@huaweicloud.com/
> v2: https://lore.kernel.org/bpf/20221207172434.435893-1-roberto.sassu@huaweicloud.com/
> 

Hmm, these two series do address the same problem. Now I'm pretty
sure I read your series last year, at least some of the patches,
but I didn't think of it when I ran into this panic issue last
week. Maybe it's because I does not fully understand it.

> The selftests were failing, and I wasn't able to come up with a
> solution on the verifier side. I see patch 5 goes in that direction,
> and I remember there was related work in this area.
>
> (just saw Paul's answer, maybe the new KP's patch set also solves this)
> 
> Roberto
> 
>