diff mbox series

[1/2] linux-user: Fix siginfo_t contents when jumping to non-readable pages

Message ID 20220804182359.830058-2-iii@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series linux-user: Fix siginfo_t contents when jumping to non-readable pages | expand

Commit Message

Ilya Leoshkevich Aug. 4, 2022, 6:23 p.m. UTC
When the first instruction of a translation block is located in a
non-readable page, qemu-user fills siginfo_t correctly. For the other
instructions the result is as if it were the first instruction, which
is not correct.

The reason is that the current logic expects translate_insn() hook to
stop at the page boundary. This way only the first instruction can
cause a SEGV. However, this is quite difficult to properly implement
when the problematic instruction crosses a page boundary, and indeed
the actual implementations do not do this. Note that this can also
break self-modifying code detection when only bytes on the second page
are modified, but this is outside of the scope of this patch.

Instead of chaning all the translators, do a much simpler thing: when
such a situation is detected, start from scratch and stop right before
the problematic instruction.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 accel/tcg/translate-all.c | 16 +++++++++++-----
 accel/tcg/translator.c    | 25 +++++++++++++++++++++++++
 include/hw/core/cpu.h     |  2 ++
 linux-user/signal.c       |  5 +++++
 4 files changed, 43 insertions(+), 5 deletions(-)

Comments

Peter Maydell Aug. 5, 2022, 8:50 a.m. UTC | #1
On Thu, 4 Aug 2022 at 19:50, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> When the first instruction of a translation block is located in a
> non-readable page, qemu-user fills siginfo_t correctly. For the other
> instructions the result is as if it were the first instruction, which
> is not correct.
>
> The reason is that the current logic expects translate_insn() hook to
> stop at the page boundary. This way only the first instruction can
> cause a SEGV. However, this is quite difficult to properly implement
> when the problematic instruction crosses a page boundary, and indeed
> the actual implementations do not do this. Note that this can also
> break self-modifying code detection when only bytes on the second page
> are modified, but this is outside of the scope of this patch.

Which guests do you observe this on ? I think we should indeed
fix this in the translators. More specifically, I think we should
get this correct already on Arm, and I would expect it to work
correctly on all the fixed-insn-width architectures, which can't
have page-crossing-insns to start with. x86 probably gets this wrong.

thanks
-- PMM
Ilya Leoshkevich Aug. 5, 2022, 10:28 a.m. UTC | #2
On Fri, 2022-08-05 at 09:50 +0100, Peter Maydell wrote:
> On Thu, 4 Aug 2022 at 19:50, Ilya Leoshkevich <iii@linux.ibm.com>
> wrote:
> > 
> > When the first instruction of a translation block is located in a
> > non-readable page, qemu-user fills siginfo_t correctly. For the
> > other
> > instructions the result is as if it were the first instruction,
> > which
> > is not correct.
> > 
> > The reason is that the current logic expects translate_insn() hook
> > to
> > stop at the page boundary. This way only the first instruction can
> > cause a SEGV. However, this is quite difficult to properly
> > implement
> > when the problematic instruction crosses a page boundary, and
> > indeed
> > the actual implementations do not do this. Note that this can also
> > break self-modifying code detection when only bytes on the second
> > page
> > are modified, but this is outside of the scope of this patch.
> 
> Which guests do you observe this on ? I think we should indeed
> fix this in the translators. More specifically, I think we should
> get this correct already on Arm, and I would expect it to work
> correctly on all the fixed-insn-width architectures, which can't
> have page-crossing-insns to start with. x86 probably gets this wrong.
> 
> thanks
> -- PMM

I first discovered this on s390x, and then realized x86_64 is broken as
well. Fixing this in translators means adding page boundary checks to
all code loads. Actually, on s390x it doesn't look as nasty as I
thought it would, since we quickly figure out the length and load
everything in one place:

$ grep ld.*code target/s390x/tcg/translate.c | wc -l
6

On x86_64 it's as bad as expected:

$ grep x86_ld.*code target/i386/tcg/translate.c | wc -l
96

Implementing this there would mean changing x86_ldub_code() and friends
to macros, and then making sure we undo everything that we did since
the start of the instruction. E.g. bt/bts/btr/btc mix parsing and
op emission. There might be something that touches DisasContext as
well. Therefore I thought that the generic approach from this patch
would be more reliable.
Peter Maydell Aug. 5, 2022, 10:55 a.m. UTC | #3
On Fri, 5 Aug 2022 at 11:28, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
> On Fri, 2022-08-05 at 09:50 +0100, Peter Maydell wrote:
> > Which guests do you observe this on ? I think we should indeed
> > fix this in the translators. More specifically, I think we should
> > get this correct already on Arm, and I would expect it to work
> > correctly on all the fixed-insn-width architectures, which can't
> > have page-crossing-insns to start with. x86 probably gets this wrong.

> I first discovered this on s390x, and then realized x86_64 is broken as
> well. Fixing this in translators means adding page boundary checks to
> all code loads. Actually, on s390x it doesn't look as nasty as I
> thought it would, since we quickly figure out the length and load
> everything in one place:
>
> $ grep ld.*code target/s390x/tcg/translate.c | wc -l
> 6

Yes, it depends a bit on the translator and the architecture
how painful it is to get this right. Note that it's OK to
be pessimistic about whether an insn might cross the page
boundary (you end up with a 1-insn TB for it, which is a bit
inefficient but not wrong behaviour). For Arm we check this kind
of thing in insn_crosses_page() (original fix in commit
541ebcd401ee in 2015, cleaned up to be a bit more efficient later).

I think also that this bug is not specific to linux-user.
In a case with a TB like:

 load/store insn that should fault
 other insn that spans page boundary into a non-executable page

ie where the translator failed to break the TB before the
page-boundary-spanning instruction, we will report the fault for
the execution on the non-executable page, when we should have
reported the fault for the load/store, and this happens in system
mode as well as linux-user.

> On x86_64 it's as bad as expected:
>
> $ grep x86_ld.*code target/i386/tcg/translate.c | wc -l
> 96
>
> Implementing this there would mean changing x86_ldub_code() and friends
> to macros, and then making sure we undo everything that we did since
> the start of the instruction. E.g. bt/bts/btr/btc mix parsing and
> op emission. There might be something that touches DisasContext as
> well. Therefore I thought that the generic approach from this patch
> would be more reliable.

No surprise that x86 is a mess :-)

-- PMM
diff mbox series

Patch

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ef62a199c7..b4766f4661 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2295,12 +2295,18 @@  void page_set_flags(target_ulong start, target_ulong end, int flags)
          len != 0;
          len -= TARGET_PAGE_SIZE, addr += TARGET_PAGE_SIZE) {
         PageDesc *p = page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+        bool invalidate;
 
-        /* If the write protection bit is set, then we invalidate
-           the code inside.  */
-        if (!(p->flags & PAGE_WRITE) &&
-            (flags & PAGE_WRITE) &&
-            p->first_tb) {
+        /*
+         * If the write protection bit is set, then we invalidate the code
+         * inside.  For qemu-user, we need to do this when PAGE_READ is cleared
+         * as well, in order to force a SEGV when trying to run this code.
+         */
+        invalidate = !(p->flags & PAGE_WRITE) && (flags & PAGE_WRITE);
+#ifdef CONFIG_USER_ONLY
+        invalidate |= (p->flags & PAGE_READ) && !(flags & PAGE_READ);
+#endif
+        if (invalidate && p->first_tb) {
             tb_invalidate_phys_page(addr, 0);
         }
         if (reset_target_data) {
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index fe7af9b943..e444c17515 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -57,6 +57,18 @@  void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
     uint32_t cflags = tb_cflags(tb);
     bool plugin_enabled;
 
+    /*
+     * In case translate_insn hook touched an unreadable page, redo the
+     * translation until the problematic instruction.  We cannot just throw
+     * away the trailing ops, because the hook could have changed DisasContext.
+     */
+    tcg_debug_assert(!cpu->translator_jmp);
+    if (sigsetjmp(cpu->translator_jmp_env, 1) != 0) {
+        cpu->translator_jmp = false;
+        tcg_remove_ops_after(NULL);
+        max_insns = db->num_insns - 1;
+    }
+
     /* Initialize DisasContext */
     db->tb = tb;
     db->pc_first = tb->pc;
@@ -122,8 +134,21 @@  void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
             db->is_jmp = DISAS_TOO_MANY;
             break;
         }
+
+        /*
+         * Propagate SEGVs from the first instruction to the guest and handle
+         * the rest. This way guest's siginfo_t gets accurate pc and si_addr.
+         */
+        cpu->translator_jmp = true;
     }
 
+    /*
+     * Clear translator_jmp on all ways out of this function, otherwise
+     * instructions that fetch code as part of their operation will be
+     * confused.
+     */
+    cpu->translator_jmp = false;
+
     /* Emit code to exit the TB, as indicated by db->is_jmp.  */
     ops->tb_stop(db, cpu);
     gen_tb_end(db->tb, db->num_insns);
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 500503da13..6c1829b7f5 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -349,6 +349,8 @@  struct CPUState {
     int64_t icount_extra;
     uint64_t random_seed;
     sigjmp_buf jmp_env;
+    bool translator_jmp;
+    sigjmp_buf translator_jmp_env;
 
     QemuMutex work_mutex;
     QSIMPLEQ_HEAD(, qemu_work_item) work_list;
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 8d29bfaa6b..f7e77c8d2e 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -833,6 +833,11 @@  static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
         abi_ptr guest_addr;
         bool is_write;
 
+        /* Translator wants to handle this. */
+        if (helper_retaddr == 1 && cpu->translator_jmp) {
+            siglongjmp(cpu->translator_jmp_env, 1);
+        }
+
         host_addr = (uintptr_t)info->si_addr;
 
         /*