From: Björn Töpel
To: linux-riscv@lists.infradead.org, Guo Ren
Cc: bpf@vger.kernel.org, Hou Tao, yonghong.song@linux.dev, Alexei Starovoitov, Puranjay Mohan
Subject: RISC-V uprobe bug (Was: Re: WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342)
Date: Sat, 26 Aug 2023 15:44:48 +0200
Message-ID: <87v8d19aun.fsf@all.your.base.are.belong.to.us>
In-Reply-To: <87jztjmmy4.fsf@all.your.base.are.belong.to.us>

Björn Töpel writes:

> I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
> selftests on bpf-next 9e3b47abeb8f.
>
> I'm able to reproduce the hang by multiple runs of:
> | ./test_progs -a link_api -a linked_list
> I'm currently investigating that.

+Guo for uprobe

This was an interesting bug. The hang is an ebreak (RISC-V breakpoint)
that puts the kernel into an infinite loop.

To reproduce, simply run the BPF selftests:

  ./test_progs -v -a link_api -a linked_list

First the link_api test runs, which exercises the uprobe functionality.
The link_api test completes, but test_progs still has the uprobe
active/enabled. Next, the linked_list test triggers a WARN_ON (which is
also implemented via ebreak).

Now handle_break() is entered. Because the current task (test_progs)
still has an active uprobe, uprobe_breakpoint_handler() returns true
even though the ebreak was taken in kernel mode. handle_break() then
returns without advancing the PC, so we go straight back to the same
WARN ebreak, and we have a merry-go-round.

Lucky for the RISC-V folks, the BPF memory allocator had a WARN that
surfaced the bug! ;-)

The patch below fixes the issue, but there's probably a prettier
variant. I'll cook a cleaner/proper patch for this, unless the uprobes
folks have a better solution.
Björn

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index f798c853bede..1198cb879d2f 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -248,23 +248,29 @@ static inline unsigned long get_break_insn_length(unsigned long pc)
 
 void handle_break(struct pt_regs *regs)
 {
+	bool user = user_mode(regs);
+
 #ifdef CONFIG_KPROBES
-	if (kprobe_single_step_handler(regs))
-		return;
+	if (!user) {
+		if (kprobe_single_step_handler(regs))
+			return;
 
-	if (kprobe_breakpoint_handler(regs))
-		return;
+		if (kprobe_breakpoint_handler(regs))
+			return;
+	}
 #endif
 #ifdef CONFIG_UPROBES
-	if (uprobe_single_step_handler(regs))
-		return;
+	if (user) {
+		if (uprobe_single_step_handler(regs))
+			return;
 
-	if (uprobe_breakpoint_handler(regs))
-		return;
+		if (uprobe_breakpoint_handler(regs))
+			return;
+	}
 #endif
 	current->thread.bad_cause = regs->cause;
 
-	if (user_mode(regs))
+	if (user)
 		force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->epc);
 #ifdef CONFIG_KGDB
 	else if (notify_die(DIE_TRAP, "EBREAK", regs, 0, regs->cause, SIGTRAP)