mbox series

[0/6] x86/debug: fix guest dr6 value for single stepping and HW breakpoints

Message ID ca40fdab-fbe4-8679-9f7c-194d54a7ef34@gmail.com (mailing list archive)
Headers show
Series x86/debug: fix guest dr6 value for single stepping and HW breakpoints | expand

Message

Jinoh Kang Aug. 18, 2023, 3:44 p.m. UTC
Xen has a bug where hardware breakpoint exceptions (DR_TRAP<n>) are
erroneously recognized as single-stepping exceptions (DR_STEP).  This
interferes with userland debugging and allows (otherwise restricted)
usermode programs to detect Xen HVM (or PVH).  This patch series aim to
fix this.

The last patch is optional and finishes incomplete plumbing work
initiated by commit 21867648033d (x86/debug: Plumb pending_dbg through
the monitor and devicemodel interfaces, 2018-05-31).

The following Linux x86-64 program demonstrates the bug:

--- (snip) ---

#include <stddef.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <stdio.h>

#define ABORT_ON_ERR(x) if ((x) == -1) abort();

int main(void)
{
    unsigned long cur_rip, cur_eflags, cur_dr6;
    int wstatus, exit_code;
    pid_t pid;

    ABORT_ON_ERR(pid = fork());
    if (pid == 0) {
        ABORT_ON_ERR(ptrace(PTRACE_TRACEME, 0, NULL, NULL));
        ABORT_ON_ERR(raise(SIGSTOP));
        _exit(0);
    }

    /* Wait for first ptrace event */
    if (waitpid(pid, &wstatus, 0) != pid) abort();
    if (!WIFSTOPPED(wstatus)) abort();

    /* Obtain current RIP value and perform sanity check */
    cur_rip = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, regs.rip), &cur_rip);
    cur_dr6 = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, u_debugreg[6]), &cur_dr6);
    assert(cur_dr6 == 0xffff0ff0UL);

    /* Set up debug registers and set EFLAGS.TF */
    cur_eflags = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, regs.eflags), &cur_eflags);
    ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, regs.eflags), (void *)(cur_eflags | 0x100UL)));
    ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, u_debugreg[0]), (void *)cur_rip));
    ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, u_debugreg[7]), (void *)1L));

    /* Continue execution to trigger hardware breakpoint */
    ABORT_ON_ERR(ptrace(PTRACE_CONT, pid, NULL, (unsigned long)0));
    if (waitpid(pid, &wstatus, 0) != pid) abort();
    if (!(WIFSTOPPED(wstatus) && WSTOPSIG(wstatus) == SIGTRAP)) abort();

    /* Detect if Xen has tampered with DR6 */
    cur_dr6 = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, u_debugreg[6]), &cur_dr6);
    fprintf(stderr, "DR6 = 0x%08lx\n", cur_dr6);
    if (cur_dr6 == 0xffff0ff1UL)
    {
        fputs("Running on bare-metal, Xen PV, or non-Xen VMM\n", stdout);
        exit_code = EXIT_FAILURE;
    }
    else
    {
        fputs("Running on Xen HVM\n", stdout);
        exit_code = EXIT_SUCCESS;
    }

    /* Tear down debug registers and unset EFLAGS.TF */
    cur_eflags = ptrace(PTRACE_PEEKUSER, pid, (void *)offsetof(struct user, regs.eflags), &cur_eflags);
    ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, regs.eflags), (void *)(cur_eflags & ~0x100UL)));
    ABORT_ON_ERR(ptrace(PTRACE_POKEUSER, pid, (void *)offsetof(struct user, u_debugreg[7]), (void *)0L));

    /* Continue execution to let child process exit */
    ABORT_ON_ERR(ptrace(PTRACE_CONT, pid, NULL, (unsigned long)0));
    if (waitpid(pid, &wstatus, 0) != pid) abort();
    if (!(WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0)) abort();

    return exit_code;
}

--- (snip) ---

Jinoh Kang (6):
  x86/hvm: only populate info->cr2 for #PF in hvm_get_pending_event()
  x86emul: rename field 'cr2' of struct x86_event to 'extra'
  x86: don't assume #DB is always caused by singlestep if EFLAGS.TF is
    set
  x86/pv: set DR_STEP if single-stepping after ro page fault emulation
  x86/pv: factor out single-step debug trap injection
  x86/debug: actually plumb pending_dbg through the monitor and
    devicemodel interfaces

 xen/arch/x86/hvm/dm.c                  |  2 +-
 xen/arch/x86/hvm/emulate.c             |  3 +-
 xen/arch/x86/hvm/hvm.c                 | 11 ++++--
 xen/arch/x86/hvm/svm/nestedsvm.c       |  2 +-
 xen/arch/x86/hvm/svm/svm.c             | 48 ++++++++++++++++++++------
 xen/arch/x86/hvm/vmx/vmx.c             | 40 ++++++++++++++-------
 xen/arch/x86/include/asm/domain.h      | 14 +++++++-
 xen/arch/x86/include/asm/hvm/hvm.h     | 14 +++++++-
 xen/arch/x86/mm/shadow/multi.c         |  5 +--
 xen/arch/x86/pv/emulate.c              |  5 +--
 xen/arch/x86/pv/ro-page-fault.c        |  3 +-
 xen/arch/x86/pv/traps.c                | 18 +++++++---
 xen/arch/x86/x86_emulate/x86_emulate.h |  6 ++--
 13 files changed, 128 insertions(+), 43 deletions(-)

Comments

Andrew Cooper Aug. 18, 2023, 8:22 p.m. UTC | #1
On 18/08/2023 4:44 pm, Jinoh Kang wrote:
> Xen has a bug where hardware breakpoint exceptions (DR_TRAP<n>) are
> erroneously recognized as single-stepping exceptions (DR_STEP).

I expected this to come back and bite.

https://lore.kernel.org/xen-devel/1528120755-17455-1-git-send-email-andrew.cooper3@citrix.com/

Xen's %dr6 handling is very broken, and then Spectre/Meltdown happened
and some how my bugfixes are now 5 years old and still incomplete.  I've
got a form of this series rebased onto staging, which I'll need to dust off.


That said, I was not aware of this specific case going wrong.  (But I
can't say I'm surprised.)

Thankyou for the test case.  If I'm reading it right, the problem is
that when %dr0 genuinely triggers, we VMExit (to break #DB infinite
loops), and on re-injecting the #DB back to the guest, we blindly set
singlestep?

This is wrong for #DB faults, and you're using an instruction breakpoint
so that matches.

However, it is far more complicated for #DB traps, where hardware leaves
it up to the VMM to merge status bits, and it's different between PV,
VT-x and SVM.

Worse than that, there is an bug on Intel hardware where if a singlestep
is genuinely pending, then data breakpoint information isn't passed to
the VMM on VMExit.  I have yet to persuade Intel to fix this, despite
quite a lot of trying.

Looking at patch 3, I think I can see how it fixes your bug, but I don't
think the logic is correct in all cases.  In particular, look at my
series for the cases where  X86_DR6_DEFAULT is used to flip polarity. 
Notably, PV and SVM have different dr6 polarity to VT-x's pending_dbg field.

Also, on Intel you're supposed to leave pending bits in pending_dbg and
not inject #DB directly, in order for the pipeline to get the exception
priority in the right order.  This I didn't get around to fixing at the
time.


I suspect what we'll need to do is combine parts of your series and
parts of mine.  I never got around to fixing the introspection bugs in
mine (that's a far larger task to do nicely), and yours is more targetted.

~Andrew
Jinoh Kang Aug. 19, 2023, 3 p.m. UTC | #2
On 8/19/23 05:22, Andrew Cooper wrote:
> I suspect what we'll need to do is combine parts of your series and
> parts of mine.  I never got around to fixing the introspection bugs in
> mine (that's a far larger task to do nicely), and yours is more targetted.

Unless I've done something wrong, your debug-fixes-v1 when rebased to master
seems to fix my problem, so I guess your patchset is the superset.