Message ID | 1466442425-11885-1-git-send-email-peter.maydell@linaro.org (mailing list archive) |
---|---|
State | New, archived |
On 20/06/16 18:07, Peter Maydell wrote:
> Replace the cpu_abort() with something that explains the situation
> a bit better and exits QEMU without dumping core.

Excellent! Another use case I see here is with HelenOS/ppc whose
bootloader is fixed at address 0x8000000 (128Mb) and so if you don't
increase the memory above the default then you end up with this panic,
which as you rightly point out is often confusing.

ATB,

Mark.
On 20 June 2016 at 20:16, Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> wrote:
> Excellent! Another use case I see here is with HelenOS/ppc whose
> bootloader is fixed at address 0x8000000 (128Mb) and so if you don't
> increase the memory above the default then you end up with this panic,
> which as you rightly point out is often confusing.

For that one, if the real life machine always has more ram
and we don't mind breaking migration back-compat for it,
then we could set its default_ram_size to something other
than 128MB.

thanks
-- PMM
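
The arithmetic behind the HelenOS/ppc case can be sketched as follows. This is only an illustration and assumes, which the thread does not state, that guest RAM is mapped starting at guest physical address 0. The names `addr_in_ram` and `MiB` are hypothetical helpers made up for this sketch; they are not QEMU functions or macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical unit macro for this sketch: one mebibyte in bytes. */
#define MiB (1024ULL * 1024ULL)

/* Hypothetical helper, not a QEMU function: returns true if addr lies
 * inside the RAM region [base, base + size). */
static bool addr_in_ram(uint64_t base, uint64_t size, uint64_t addr)
{
    /* Precondition: the region must not wrap around the address space. */
    assert(size <= UINT64_MAX - base);

    /* Written as a subtraction so the comparison cannot overflow. */
    return addr >= base && addr - base < size;
}
```

Under that assumption, 0x8000000 is exactly 128 * MiB, so with 128 MiB of RAM it is the first byte past the end of RAM and `addr_in_ram(0, 128 * MiB, 0x8000000)` is false, while any larger size, for example `addr_in_ram(0, 256 * MiB, 0x8000000)`, covers it.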
Ping for review?

thanks
-- PMM
On 06/20/2016 10:07 AM, Peter Maydell wrote:
> Replace the cpu_abort() with something that explains the situation
> a bit better and exits QEMU without dumping core.

Reviewed-by: Richard Henderson <rth@twiddle.net>

r~
On 28/06/2016 17:42, Peter Maydell wrote:
> Ping for review?

The patch is trivial, the hard part was coming up with the message for
the user. :)

Go ahead!

Paolo
On 28 June 2016 at 18:49, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 28/06/2016 17:42, Peter Maydell wrote:
>> Ping for review?
>
> The patch is trivial, the hard part was coming up with the message for
> the user. :)

Sure, but review includes whether the message makes sense :-)

> Go ahead!

I'll push it to master in a bit.

thanks
-- PMM
diff --git a/cputlb.c b/cputlb.c
index 23c9b91..079e497 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -30,6 +30,8 @@
 #include "exec/ram_addr.h"
 #include "exec/exec-all.h"
 #include "tcg/tcg.h"
+#include "qemu/error-report.h"
+#include "exec/log.h"
 
 /* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
 /* #define DEBUG_TLB */
@@ -427,6 +429,39 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
                             prot, mmu_idx, size);
 }
 
+static void report_bad_exec(CPUState *cpu, target_ulong addr)
+{
+    /* Accidentally executing outside RAM or ROM is quite common for
+     * several user-error situations, so report it in a way that
+     * makes it clear that this isn't a QEMU bug and provide suggestions
+     * about what a user could do to fix things.
+     */
+    error_report("Trying to execute code outside RAM or ROM at 0x"
+                 TARGET_FMT_lx, addr);
+    error_printf("This usually means one of the following happened:\n\n"
+                 "(1) You told QEMU to execute a kernel for the wrong machine "
+                 "type, and it crashed on startup (eg trying to run a "
+                 "raspberry pi kernel on a versatilepb QEMU machine)\n"
+                 "(2) You didn't give QEMU a kernel or BIOS filename at all, "
+                 "and QEMU executed a ROM full of no-op instructions until "
+                 "it fell off the end\n"
+                 "(3) Your guest kernel has a bug and crashed by jumping "
+                 "off into nowhere\n\n"
+                 "This is almost always one of the first two, so check your "
+                 "command line and that you are using the right type of kernel "
+                 "for this machine.\n"
+                 "If you think option (3) is likely then you can try debugging "
+                 "your guest with the -d debug options; in particular "
+                 "-d guest_errors will cause the log to include a dump of the "
+                 "guest register state at this point.\n\n"
+                 "Execution cannot continue; stopping here.\n\n");
+
+    /* Report also to the logs, with more detail including register dump */
+    qemu_log_mask(LOG_GUEST_ERROR, "qemu: fatal: Trying to execute code "
+                  "outside RAM or ROM at 0x" TARGET_FMT_lx "\n", addr);
+    log_cpu_state_mask(LOG_GUEST_ERROR, cpu, CPU_DUMP_FPU | CPU_DUMP_CCOP);
+}
+
 /* NOTE: this function can trigger an exception */
 /* NOTE2: the returned address is not exactly the physical address: it
  * is actually a ram_addr_t (in system mode; the user mode emulation
@@ -455,8 +490,8 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
         if (cc->do_unassigned_access) {
             cc->do_unassigned_access(cpu, addr, false, true, 0, 4);
         } else {
-            cpu_abort(cpu, "Trying to execute code outside RAM or ROM at 0x"
-                      TARGET_FMT_lx "\n", addr);
+            report_bad_exec(cpu, addr);
+            exit(1);
         }
     }
     p = (void *)((uintptr_t)addr + env1->tlb_table[mmu_idx][page_index].addend);
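
The two-tier reporting in report_bad_exec() (a short message that is always shown, plus extra detail that is only logged when the guest-error log category is enabled) can be illustrated outside QEMU with a minimal standalone sketch. Everything here is an assumption made for illustration: `SKETCH_LOG_GUEST_ERROR`, `sketch_report_bad_exec` and the message text of the detailed record are hypothetical stand-ins, and plain `FILE *` streams replace QEMU's error and log machinery.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical log category bit, standing in for LOG_GUEST_ERROR. */
#define SKETCH_LOG_GUEST_ERROR (1u << 0)

/* Always writes the short user-facing message to err. Writes the extra
 * detail to log only when enabled_mask selects the guest-error category.
 * Returns true if the detailed record was written. */
static bool sketch_report_bad_exec(FILE *err, FILE *log,
                                   unsigned enabled_mask, uint64_t addr)
{
    assert(err != NULL && log != NULL);

    /* Tier 1: unconditional, aimed at the user. */
    fprintf(err, "Trying to execute code outside RAM or ROM at 0x%08" PRIx64 "\n",
            addr);

    /* Tier 2: opt-in detail, aimed at someone debugging the guest. */
    if (!(enabled_mask & SKETCH_LOG_GUEST_ERROR)) {
        return false;
    }
    fprintf(log, "detail: bad exec at 0x%08" PRIx64
                 " (a register dump would go here)\n", addr);
    return true;
}
```

Calling `sketch_report_bad_exec(stderr, stderr, 0, 0x08000000)` prints only the short message, while passing `SKETCH_LOG_GUEST_ERROR` as the mask also emits the detailed record, which mirrors how `-d guest_errors` opts in to the register dump in the patch.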
In get_page_addr_code(), if the guest program counter turns out not to
be in ROM or RAM, we can't handle executing from it, and we call
cpu_abort(). This results in the message
  qemu: fatal: Trying to execute code outside RAM or ROM at 0x08000000
followed by a guest register dump, and then QEMU dumps core.

This situation happens in one of two cases:
 (1) a guest kernel bug, where it jumped off into nowhere
 (2) a user command line mistake, where they tried to run an image for
     board A on a QEMU model of board B, or where they didn't provide
     an image at all, and QEMU executed through a ROM or RAM full of
     NOP instructions and then fell off the end

In either case, a core dump of QEMU itself is entirely useless, and
only confuses users into thinking that this is a bug in QEMU rather
than a bug in the guest or a problem with their command line. (This
is a variation on the general idea that we shouldn't assert() on
something the user can accidentally provoke.)

Replace the cpu_abort() with something that explains the situation
a bit better and exits QEMU without dumping core.

(See LP:1062220 for several examples of confused users.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I've been meaning to do this for a while now...hopefully the
expanded error message should reduce user confusion.

 cputlb.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)
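
The difference between the two ways of stopping that the commit message contrasts can be observed with a small POSIX sketch. It is not QEMU code: `run_in_child`, `die_with_abort` and `die_with_exit` are hypothetical names invented here. It relies on the standard behaviour that abort() terminates the process with SIGABRT (whose default action allows a core dump, subject to the system's core-size limit), whereas exit(1) is a normal termination with status 1.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs fn in a child process. Returns the exit status if the child exited
 * normally, minus the signal number if a signal killed it, or -1000 if
 * fork() or waitpid() failed. */
static int run_in_child(void (*fn)(void))
{
    assert(fn != NULL);

    fflush(NULL);               /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid < 0) {
        return -1000;
    }
    if (pid == 0) {
        fn();
        _exit(0);               /* only reached if fn returns */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1000;
    }
    if (WIFSIGNALED(status)) {
        return -WTERMSIG(status);
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return -1000;
}

/* Old behaviour in miniature: abnormal termination via SIGABRT. */
static void die_with_abort(void)
{
    abort();
}

/* New behaviour in miniature: explain the problem, then exit normally. */
static void die_with_exit(void)
{
    fprintf(stderr, "Execution cannot continue; stopping here.\n");
    exit(1);
}
```

With these helpers, `run_in_child(die_with_abort)` returns `-SIGABRT` and `run_in_child(die_with_exit)` returns 1, so a calling shell or script sees an ordinary failure status in the second case instead of a signal death.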