[7/9] riscv: bpf: Avoid breaking W^X

Message ID	20210330022521.2a904a8c@xhacker (mailing list archive)
State	Superseded
Delegated to:	BPF
Headers	show Return-Path: <bpf-owner@kernel.org> Date: Tue, 30 Mar 2021 02:25:21 +0800 From: Jisheng Zhang <jszhang3@mail.ustc.edu.cn> To: Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>, Andrey Ryabinin <ryabinin.a.a@gmail.com>, Alexander Potapenko <glider@google.com>, Andrey Konovalov <andreyknvl@gmail.com>, Dmitry Vyukov <dvyukov@google.com>, " =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= " <bjorn@kernel.org>, Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>, John Fastabend <john.fastabend@gmail.com>, KP Singh <kpsingh@kernel.org>, Luke Nelson <luke.r.nels@gmail.com>, Xi Wang <xi.wang@gmail.com> Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com, netdev@vger.kernel.org, bpf@vger.kernel.org Subject: [PATCH 7/9] riscv: bpf: Avoid breaking W^X Message-ID: <20210330022521.2a904a8c@xhacker> In-Reply-To: <20210330022144.150edc6e@xhacker> References: <20210330022144.150edc6e@xhacker> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Precedence: bulk
Series	riscv: improve self-protection \| expand [0/9] riscv: improve self-protection [1/9] riscv: add __init section marker to some functions [2/9] riscv: Mark some global variables __ro_after_init [3/9] riscv: Constify sys_call_table [4/9] riscv: Constify sbi_ipi_ops [5/9] riscv: kprobes: Implement alloc_insn_page() [6/9] riscv: bpf: Move bpf_jit_alloc_exec() and bpf_jit_free_exec() to core [7/9] riscv: bpf: Avoid breaking W^X [8/9] riscv: module: Create module allocations without exec permissions [9/9] riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU

Context	Check	Description
netdev/tree_selection	success	Not a local patch

Jisheng Zhang March 29, 2021, 6:25 p.m. UTC

From: Jisheng Zhang <jszhang@kernel.org>

We allocate Non-executable pages, then call bpf_jit_binary_lock_ro()
to enable executable permission after mapping them read-only. This is
to prepare for STRICT_MODULE_RWX in following patch.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
 arch/riscv/net/bpf_jit_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Andreas Schwab June 11, 2021, 2:10 p.m. UTC | #1

On Mär 30 2021, Jisheng Zhang wrote:

> From: Jisheng Zhang <jszhang@kernel.org>
>
> We allocate Non-executable pages, then call bpf_jit_binary_lock_ro()
> to enable executable permission after mapping them read-only. This is
> to prepare for STRICT_MODULE_RWX in following patch.

That breaks booting with
<https://github.com/openSUSE/kernel-source/blob/master/config/riscv64/default>.

Andreas.

Jisheng Zhang June 11, 2021, 4:23 p.m. UTC | #2

Hi Andreas,

On Fri, 11 Jun 2021 16:10:03 +0200
Andreas Schwab <schwab@linux-m68k.org> wrote:

> On Mär 30 2021, Jisheng Zhang wrote:
> 
> > From: Jisheng Zhang <jszhang@kernel.org>
> >
> > We allocate Non-executable pages, then call bpf_jit_binary_lock_ro()
> > to enable executable permission after mapping them read-only. This is
> > to prepare for STRICT_MODULE_RWX in following patch.  
> 
> That breaks booting with
> <https://github.com/openSUSE/kernel-source/blob/master/config/riscv64/default>.
> 

Thanks for the bug report.
I reproduced an kernel panic with the defconfig on qemu, but I'm not sure whether
this is the issue you saw, I will check.

    0.161959] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[    0.167028] pinctrl core: initialized pinctrl subsystem
[    0.190727] Unable to handle kernel paging request at virtual address ffffffff81651bd8
[    0.191361] Oops [#1]
[    0.191509] Modules linked in:
[    0.191814] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #3
[    0.192179] Hardware name: riscv-virtio,qemu (DT)
[    0.192492] epc : __memset+0xc4/0xfc
[    0.192712]  ra : skb_flow_dissector_init+0x22/0x86
[    0.192915] epc : ffffffff803e2700 ra : ffffffff8058f90c sp : ffffffe001a4fda0
[    0.193221]  gp : ffffffff8156d120 tp : ffffffe001a5b700 t0 : ffffffff81651b10
[    0.193631]  t1 : 0000000000000100 t2 : 00000000000003a8 s0 : ffffffe001a4fdd0
[    0.194034]  s1 : ffffffff80c9e250 a0 : ffffffff81651bd8 a1 : 0000000000000000
[    0.194502]  a2 : 000000000000003c a3 : ffffffff81651c10 a4 : 0000000000000064
[    0.195053]  a5 : ffffffff803e2700 a6 : 0000000000000040 a7 : 0000000000000002
[    0.195436]  s2 : ffffffff81651bd8 s3 : 0000000000000009 s4 : ffffffff8156e0c8
[    0.195723]  s5 : ffffffff8156e050 s6 : ffffffff80a105e0 s7 : ffffffff80a00738
[    0.195992]  s8 : ffffffff80f07be0 s9 : 0000000000000008 s10: ffffffff808000ac
[    0.196257]  s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000
[    0.196511]  t5 : 00000000000003a9 t6 : 00000000000003ff
[    0.196714] status: 0000000000000120 badaddr: ffffffff81651bd8 cause: 000000000000000f
[    0.197103] [<ffffffff803e2700>] __memset+0xc4/0xfc
[    0.197408] [<ffffffff80831f58>] init_default_flow_dissectors+0x22/0x60
[    0.197693] [<ffffffff800020fc>] do_one_initcall+0x3e/0x168
[    0.197907] [<ffffffff80801438>] kernel_init_freeable+0x25a/0x2c6
[    0.198157] [<ffffffff8070a8a8>] kernel_init+0x12/0x110
[    0.198351] [<ffffffff8000333a>] ret_from_exception+0x0/0xc
[    0.198973] Unable to handle kernel paging request at virtual address ffffffff8164d860
[    0.199242] Oops [#2]
[    0.199336] Modules linked in:
[    0.199514] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G      D           5.13.0-rc5-default+ #3
[    0.199785] Hardware name: riscv-virtio,qemu (DT)
[    0.199940] epc : _raw_spin_lock_irqsave+0x14/0x4e
[    0.200113]  ra : _extract_crng+0x58/0xac
[    0.200264] epc : ffffffff807117ae ra : ffffffff80490774 sp : ffffffe001a4fa70
[    0.200489]  gp : ffffffff8156d120 tp : ffffffe001a5b700 t0 : ffffffff8157c0d7
[    0.200715]  t1 : ffffffff8157c0c8 t2 : 0000000000000000 s0 : ffffffe001a4fa80
[    0.200938]  s1 : ffffffff8164d818 a0 : 0000000000000022 a1 : ffffffe001a4fac8
[    0.201166]  a2 : 0000000000000010 a3 : 0000000000000001 a4 : ffffffff8164d860
[    0.201389]  a5 : 0000000000000000 a6 : c0000000ffffdfff a7 : ffffffffffffffff
[    0.201612]  s2 : ffffffff8156e1c0 s3 : ffffffe001a4fac8 s4 : ffffffff8164d860
[    0.201836]  s5 : ffffffff8156e0c8 s6 : ffffffff80a105e0 s7 : ffffffff80a00738
[    0.202060]  s8 : ffffffff80f07be0 s9 : 0000000000000008 s10: ffffffff808000ac
[    0.202295]  s11: 0000000000000000 t3 : 000000000000005b t4 : ffffffffffffffff
[    0.202519]  t5 : 00000000000003a9 t6 : ffffffe001a4f9b8
[    0.202691] status: 0000000000000100 badaddr: ffffffff8164d860 cause: 000000000000000f
[    0.202940] [<ffffffff807117ae>] _raw_spin_lock_irqsave+0x14/0x4e
[    0.203326] Unable to handle kernel paging request at virtual address ffffffff8164d860
[    0.203574] Oops [#3]
[    0.203664] Modules linked in:
[    0.203784] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G      D           5.13.0-rc5-default+ #3
[    0.204046] Hardware name: riscv-virtio,qemu (DT)
[    0.204201] epc : _raw_spin_lock_irqsave+0x14/0x4e
[    0.204371]  ra : _extract_crng+0x58/0xac
[    0.204519] epc : ffffffff807117ae ra : ffffffff80490774 sp : ffffffe001a4f740
[    0.204819]  gp : ffffffff8156d120 tp : ffffffe001a5b700 t0 : ffffffff8157c0d7
[    0.205089]  t1 : ffffffff8157c0c8 t2 : 0000000000000000 s0 : ffffffe001a4f750
[    0.205330]  s1 : ffffffff8164d818 a0 : 0000000000000102 a1 : ffffffe001a4f798
[    0.205553]  a2 : 0000000000000010 a3 : 0000000000000001 a4 : ffffffff8164d860
[    0.205768]  a5 : 0000000000000000 a6 : c0000000ffffdfff a7 : ffffffff81408a40
[    0.205981]  s2 : ffffffff8156e1c0 s3 : ffffffe001a4f798 s4 : ffffffff8164d860
[    0.206197]  s5 : ffffffff8156e0c8 s6 : ffffffff80a105e0 s7 : ffffffff80a00738
[    0.206411]  s8 : ffffffff80f07be0 s9 : 0000000000000008 s10: ffffffff808000ac
[    0.206633]  s11: 0000000000000000 t3 : 000000000000005b t4 : ffffffffffffffff
[    0.206849]  t5 : 00000000000003a9 t6 : ffffffe001a4f688

Andreas Schwab June 11, 2021, 4:41 p.m. UTC | #3

On Jun 12 2021, Jisheng Zhang wrote:

> I reproduced an kernel panic with the defconfig on qemu, but I'm not sure whether
> this is the issue you saw, I will check.
>
>     0.161959] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
> [    0.167028] pinctrl core: initialized pinctrl subsystem
> [    0.190727] Unable to handle kernel paging request at virtual address ffffffff81651bd8
> [    0.191361] Oops [#1]
> [    0.191509] Modules linked in:
> [    0.191814] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #3
> [    0.192179] Hardware name: riscv-virtio,qemu (DT)
> [    0.192492] epc : __memset+0xc4/0xfc
> [    0.192712]  ra : skb_flow_dissector_init+0x22/0x86

Yes, that's the same.

Andreas.

Jisheng Zhang June 13, 2021, 5:05 p.m. UTC | #4

Hi,

On Fri, 11 Jun 2021 18:41:16 +0200
Andreas Schwab <schwab@linux-m68k.org> wrote:

> On Jun 12 2021, Jisheng Zhang wrote:
> 
> > I reproduced an kernel panic with the defconfig on qemu, but I'm not sure whether
> > this is the issue you saw, I will check.
> >
> >     0.161959] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
> > [    0.167028] pinctrl core: initialized pinctrl subsystem
> > [    0.190727] Unable to handle kernel paging request at virtual address ffffffff81651bd8
> > [    0.191361] Oops [#1]
> > [    0.191509] Modules linked in:
> > [    0.191814] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #3
> > [    0.192179] Hardware name: riscv-virtio,qemu (DT)
> > [    0.192492] epc : __memset+0xc4/0xfc
> > [    0.192712]  ra : skb_flow_dissector_init+0x22/0x86  
> 
> Yes, that's the same.
> 
> Andreas.
> 

I think I found the root cause: commit 2bfc6cd81bd ("move kernel mapping
outside of linear mapping") moves BPF JIT region after the kernel:

#define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)

The &_end is unlikely aligned with PMD SIZE, so the front bpf jit region
sits with kernel .data section in one PMD. But kenrel is mapped in PMD SIZE,
so when bpf_jit_binary_lock_ro() is called to make the first bpf jit prog
ROX, we will make part of kernel .data section RO too, so when we write, for example
memset the .data section, MMU will trigger store page fault.

To fix the issue, we need to make the bpf jit region PMD size aligned by either
patch BPF_JIT_REGION_START to align on PMD size rather than PAGE SIZE, or
something as below patch to move the BPF region before modules region:

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9469f464e71a..997b894edbc2 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -31,8 +31,8 @@
 #define BPF_JIT_REGION_SIZE	(SZ_128M)
 #ifdef CONFIG_64BIT
 /* KASLR should leave at least 128MB for BPF after the kernel */
-#define BPF_JIT_REGION_START	PFN_ALIGN((unsigned long)&_end)
-#define BPF_JIT_REGION_END	(BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
+#define BPF_JIT_REGION_START	(BPF_JIT_REGION_END - BPF_JIT_REGION_SIZE)
+#define BPF_JIT_REGION_END	(MODULES_VADDR)
 #else
 #define BPF_JIT_REGION_START	(PAGE_OFFSET - BPF_JIT_REGION_SIZE)
 #define BPF_JIT_REGION_END	(VMALLOC_END)
@@ -40,8 +40,8 @@
 
 /* Modules always live before the kernel */
 #ifdef CONFIG_64BIT
-#define MODULES_VADDR	(PFN_ALIGN((unsigned long)&_end) - SZ_2G)
 #define MODULES_END	(PFN_ALIGN((unsigned long)&_start))
+#define MODULES_VADDR	(MODULES_END - SZ_128M)
 #endif
 
 
can you please try it? Per my test, the issue is fixed.

Thanks

Andreas Schwab June 13, 2021, 7:50 p.m. UTC | #5

On Jun 14 2021, Jisheng Zhang wrote:

> I think I found the root cause: commit 2bfc6cd81bd ("move kernel mapping
> outside of linear mapping") moves BPF JIT region after the kernel:
>
> #define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)
>
> The &_end is unlikely aligned with PMD SIZE, so the front bpf jit region
> sits with kernel .data section in one PMD. But kenrel is mapped in PMD SIZE,
> so when bpf_jit_binary_lock_ro() is called to make the first bpf jit prog
> ROX, we will make part of kernel .data section RO too, so when we write, for example
> memset the .data section, MMU will trigger store page fault.
>
> To fix the issue, we need to make the bpf jit region PMD size aligned by either
> patch BPF_JIT_REGION_START to align on PMD size rather than PAGE SIZE, or
> something as below patch to move the BPF region before modules region:
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 9469f464e71a..997b894edbc2 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -31,8 +31,8 @@
>  #define BPF_JIT_REGION_SIZE	(SZ_128M)
>  #ifdef CONFIG_64BIT
>  /* KASLR should leave at least 128MB for BPF after the kernel */
> -#define BPF_JIT_REGION_START	PFN_ALIGN((unsigned long)&_end)
> -#define BPF_JIT_REGION_END	(BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
> +#define BPF_JIT_REGION_START	(BPF_JIT_REGION_END - BPF_JIT_REGION_SIZE)
> +#define BPF_JIT_REGION_END	(MODULES_VADDR)
>  #else
>  #define BPF_JIT_REGION_START	(PAGE_OFFSET - BPF_JIT_REGION_SIZE)
>  #define BPF_JIT_REGION_END	(VMALLOC_END)
> @@ -40,8 +40,8 @@
>  
>  /* Modules always live before the kernel */
>  #ifdef CONFIG_64BIT
> -#define MODULES_VADDR	(PFN_ALIGN((unsigned long)&_end) - SZ_2G)
>  #define MODULES_END	(PFN_ALIGN((unsigned long)&_start))
> +#define MODULES_VADDR	(MODULES_END - SZ_128M)
>  #endif
>  
>  
> can you please try it? Per my test, the issue is fixed.

I can confirm that this fixes the issue.

Andreas.

[7/9] riscv: bpf: Avoid breaking W^X

Checks

Commit Message

Comments

Patch