diff mbox series

[4/6] arm64: pmu: Add hook to handle pmu-related undefined instructions

Message ID 20190516132148.10085-5-raphael.gault@arm.com (mailing list archive)
State New, archived
Headers show
Series arm64: Enable access to pmu registers by user-space | expand

Commit Message

Raphael Gault May 16, 2019, 1:21 p.m. UTC
In order to prevent the userspace processes which are trying to access
the registers from the pmu registers on a big.LITTLE environment we
introduce a hook to handle undefined instructions.

The goal here is to prevent the process to be interrupted by a signal
when the error is caused by the task being scheduled while accessing
a counter, causing the counter access to be invalid. As we are not able
to know efficiently the number of counters available physically on both
pmu in that context we consider that any faulting access to a counter
which is architecturally correct should not cause a SIGILL signal if
the permissions are set accordingly.

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
---
 arch/arm64/kernel/perf_event.c | 68 ++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

Comments

Peter Zijlstra May 17, 2019, 7:10 a.m. UTC | #1
On Thu, May 16, 2019 at 02:21:46PM +0100, Raphael Gault wrote:
> In order to prevent the userspace processes which are trying to access
> the registers from the pmu registers on a big.LITTLE environment we
> introduce a hook to handle undefined instructions.
> 
> The goal here is to prevent the process to be interrupted by a signal
> when the error is caused by the task being scheduled while accessing
> a counter, causing the counter access to be invalid. As we are not able
> to know efficiently the number of counters available physically on both
> pmu in that context we consider that any faulting access to a counter
> which is architecturally correct should not cause a SIGILL signal if
> the permissions are set accordingly.

The other approach is using rseq for this; with that you can guarantee
it will never issue the instruction on a wrong CPU.

That said; emulating the thing isn't horrible either.

> +	/*
> +	 * We put 0 in the target register if we
> +	 * are reading from pmu register. If we are
> +	 * writing, we do nothing.
> +	 */

Wait _what_ ?!? userspace can _WRITE_ to these registers?
Raphael Gault May 17, 2019, 7:35 a.m. UTC | #2
Hi,

On 5/17/19 8:10 AM, Peter Zijlstra wrote:
> On Thu, May 16, 2019 at 02:21:46PM +0100, Raphael Gault wrote:
>> In order to prevent the userspace processes which are trying to access
>> the registers from the pmu registers on a big.LITTLE environment we
>> introduce a hook to handle undefined instructions.
>>
>> The goal here is to prevent the process to be interrupted by a signal
>> when the error is caused by the task being scheduled while accessing
>> a counter, causing the counter access to be invalid. As we are not able
>> to know efficiently the number of counters available physically on both
>> pmu in that context we consider that any faulting access to a counter
>> which is architecturally correct should not cause a SIGILL signal if
>> the permissions are set accordingly.
>
> The other approach is using rseq for this; with that you can guarantee
> it will never issue the instruction on a wrong CPU.
>
> That said; emulating the thing isn't horrible either.
>
>> +/*
>> + * We put 0 in the target register if we
>> + * are reading from pmu register. If we are
>> + * writing, we do nothing.
>> + */
>
> Wait _what_ ?!? userspace can _WRITE_ to these registers?
>

The user can write to some pmu registers but those are not the ones that
interest us here. My comment was ill formed, indeed this hook can only
be triggered by reads in this case.
Sorry about that.

Thanks,

--
Raphael Gault
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Mark Rutland May 17, 2019, 8:04 a.m. UTC | #3
On Fri, May 17, 2019 at 09:10:18AM +0200, Peter Zijlstra wrote:
> On Thu, May 16, 2019 at 02:21:46PM +0100, Raphael Gault wrote:
> > In order to prevent the userspace processes which are trying to access
> > the registers from the pmu registers on a big.LITTLE environment we
> > introduce a hook to handle undefined instructions.
> > 
> > The goal here is to prevent the process to be interrupted by a signal
> > when the error is caused by the task being scheduled while accessing
> > a counter, causing the counter access to be invalid. As we are not able
> > to know efficiently the number of counters available physically on both
> > pmu in that context we consider that any faulting access to a counter
> > which is architecturally correct should not cause a SIGILL signal if
> > the permissions are set accordingly.
> 
> The other approach is using rseq for this; with that you can guarantee
> it will never issue the instruction on a wrong CPU.
> 
> That said; emulating the thing isn't horrible either.

Yup. Attempting to use rseq is on the todo list.

> > +	/*
> > +	 * We put 0 in the target register if we
> > +	 * are reading from pmu register. If we are
> > +	 * writing, we do nothing.
> > +	 */
> 
> Wait _what_ ?!? userspace can _WRITE_ to these registers?

Remember that this is in an undefined (trap) handler.

If userspace _attempts_ to write to the registers, the CPU will trap to the
kernel. The comment is perhaps misleading; when we "do nothing", the common
trap handling code will send a SIGILL to userspace.

It would probably be better to say something like:

	/*
	 * If userspace is tries to read a counter that doesn't exist on this
	 * CPU, we emulate it as reading as zero. This happens if userspace is
	 * preempted between reading the idx and actually reading the counter,
	 * and the seqlock and idx have already changed, so it's as-if the
	 * counter has been reprogrammed with a different event.
	 *
	 * We don't permit userspace to write to these registers, and will
	 * inject a SIGILL.
	 */

There is one caveat: userspace can write to PMSELR without trapping, so we will
have to context-switch with the task. That only affects indirect addressing of
PMU registers, and doesn't have a functional effect on the behaviour of the
PMU, so that's benign from the PoV of perf.

Thanks,
Mark.
Peter Zijlstra May 17, 2019, 8:26 a.m. UTC | #4
On Fri, May 17, 2019 at 09:04:20AM +0100, Mark Rutland wrote:

> Remember that this is in an undefined (trap) handler.
> 
> If userspace _attempts_ to write to the registers, the CPU will trap to the
> kernel. The comment is perhaps misleading; when we "do nothing", the common
> trap handling code will send a SIGILL to userspace.
> 
> It would probably be better to say something like:
> 
> 	/*
> 	 * If userspace is tries to read a counter that doesn't exist on this
> 	 * CPU, we emulate it as reading as zero. This happens if userspace is
> 	 * preempted between reading the idx and actually reading the counter,
> 	 * and the seqlock and idx have already changed, so it's as-if the
> 	 * counter has been reprogrammed with a different event.

Might be good to mention that userspace will/should discard the value it
reads, and therefore any value is good (including 0).

> 	 * We don't permit userspace to write to these registers, and will
> 	 * inject a SIGILL.
> 	 */
> 
> There is one caveat: userspace can write to PMSELR without trapping, so we will
> have to context-switch with the task. That only affects indirect addressing of
> PMU registers, and doesn't have a functional effect on the behaviour of the
> PMU, so that's benign from the PoV of perf.

Sad though; ideally you'd state that indirect addressing is
out-of-bounds and they get to keep the pieces. But I suspect you're
right that people will do it anyway and complain once it comes apart.
Peter Zijlstra May 17, 2019, 9:07 a.m. UTC | #5
On Fri, May 17, 2019 at 10:26:55AM +0200, Peter Zijlstra wrote:
> On Fri, May 17, 2019 at 09:04:20AM +0100, Mark Rutland wrote:
> 
> > Remember that this is in an undefined (trap) handler.
> > 
> > If userspace _attempts_ to write to the registers, the CPU will trap to the
> > kernel. The comment is perhaps misleading; when we "do nothing", the common
> > trap handling code will send a SIGILL to userspace.
> > 
> > It would probably be better to say something like:
> > 
> > 	/*
> > 	 * If userspace is tries to read a counter that doesn't exist on this
> > 	 * CPU, we emulate it as reading as zero. This happens if userspace is
> > 	 * preempted between reading the idx and actually reading the counter,
> > 	 * and the seqlock and idx have already changed, so it's as-if the
> > 	 * counter has been reprogrammed with a different event.
> 
> Might be good to mention that userspace will/should discard the value it
> reads, and therefore any value is good (including 0).
> 
> > 	 * We don't permit userspace to write to these registers, and will
> > 	 * inject a SIGILL.
> > 	 */
> > 
> > There is one caveat: userspace can write to PMSELR without trapping, so we will
> > have to context-switch with the task. That only affects indirect addressing of
> > PMU registers, and doesn't have a functional effect on the behaviour of the
> > PMU, so that's benign from the PoV of perf.
> 
> Sad though; ideally you'd state that indirect addressing is
> out-of-bounds and they get to keep the pieces. But I suspect you're
> right that people will do it anyway and complain once it comes apart.

I'm still not entirely convinced you need that context switching. If we
sched-out, the seqcount value will change, idem when we sched-in. So
under no circumstance (even if we stay on the same CPU), will the
seqcount match when we get back on.

So why preserve that register?
diff mbox series

Patch

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index e6316f99f66b..760c947b58dd 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -19,9 +19,11 @@ 
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <asm/cpu.h>
 #include <asm/irq_regs.h>
 #include <asm/perf_event.h>
 #include <asm/sysreg.h>
+#include <asm/traps.h>
 #include <asm/virt.h>
 
 #include <linux/acpi.h>
@@ -993,6 +995,72 @@  static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
 	return probe.present ? 0 : -ENODEV;
 }
 
+static bool is_evcntr(u32 sys_reg)
+{
+	u32 CRn, Op0, Op1, CRm;
+
+	CRn = sys_reg_CRn(sys_reg);
+	CRm = sys_reg_CRm(sys_reg);
+	Op0 = sys_reg_Op0(sys_reg);
+	Op1 = sys_reg_Op1(sys_reg);
+
+	return (CRn == 0xE &&
+		(CRm & 0xc) == 0x8 &&
+		Op1 == 0x3 &&
+		Op0 == 0x3);
+}
+
+static int emulate_pmu(struct pt_regs *regs, u32 insn)
+{
+	u32 sys_reg, rt;
+	u32 pmuserenr;
+
+	sys_reg = (u32)aarch64_insn_decode_immediate(AARCH64_INSN_IMM_16, insn) << 5;
+	rt = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RT, insn);
+	pmuserenr = read_sysreg(pmuserenr_el0);
+
+	if ((pmuserenr & (ARMV8_PMU_USERENR_ER|ARMV8_PMU_USERENR_CR)) !=
+	    (ARMV8_PMU_USERENR_ER|ARMV8_PMU_USERENR_CR))
+		return -EINVAL;
+
+	if (sys_reg != SYS_PMXEVCNTR_EL0 &&
+	    !is_evcntr(sys_reg))
+		return -EINVAL;
+
+	/*
+	 * We put 0 in the target register if we
+	 * are reading from pmu register. If we are
+	 * writing, we do nothing.
+	 */
+	if ((insn & 0xfff00000) == 0xd5300000)
+		pt_regs_write_reg(regs, rt, 0);
+	else if (sys_reg != SYS_PMSELR_EL0)
+		return -EINVAL;
+
+	arm64_skip_faulting_instruction(regs, 4);
+	return 0;
+}
+
+/*
+ * This hook will only be triggered by mrs
+ * instructions on PMU registers. This is mandatory
+ * in order to have a consistent behaviour even on
+ * big.LITTLE systems.
+ */
+static struct undef_hook pmu_hook = {
+	.instr_mask = 0xffff8800,
+	.instr_val  = 0xd53b8800,
+	.fn = emulate_pmu,
+};
+
+static int __init enable_pmu_emulation(void)
+{
+	register_undef_hook(&pmu_hook);
+	return 0;
+}
+
+core_initcall(enable_pmu_emulation);
+
 static int armv8_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	int ret = armv8pmu_probe_pmu(cpu_pmu);