[v2] x86/livepatch: Fix livepatch application when CET is active

Message ID 20230417135219.3776777-1-andrew.cooper3@citrix.com (mailing list archive)
State Superseded
Series [v2] x86/livepatch: Fix livepatch application when CET is active

Commit Message

Andrew Cooper April 17, 2023, 1:52 p.m. UTC
Right now, trying to apply a livepatch on any system with CET shstk (AMD Zen3
or later, Intel Tiger Lake or Sapphire Rapids and later) fails as follows:

  (XEN) livepatch: lp: Verifying enabled expectations for all functions
  (XEN) common/livepatch.c:1591: livepatch: lp: timeout is 30000000ns
  (XEN) common/livepatch.c:1703: livepatch: lp: CPU28 - IPIing the other 127 CPUs
  (XEN) livepatch: lp: Applying 1 functions
  (XEN) hi_func: Hi! (called 1 times)
  (XEN) Hook executing.
  (XEN) Assertion 'local_irq_is_enabled() || cpumask_subset(mask, cpumask_of(cpu))' failed at arch/x86/smp.c:265
  (XEN) *** DOUBLE FAULT ***
  <many double faults>

The assertion failure is from a global (system wide) TLB flush initiated by
modify_xen_mappings().  I'm not entirely sure when this broke, and I'm not
sure exactly what causes the #DF's, but it doesn't really matter either
because they highlight a latent bug that I'd overlooked in the CET-SS vs
patching work in the first place.

While we're careful to arrange for the patching CPU to avoid encountering
non-shstk memory with transient shstk perms, other CPUs can pick these
mappings up too if they need to re-walk for uarch reasons.
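
As a reminder, what makes these transient perms dangerous is that a read-only
mapping with the Dirty bit set *is* a shadow stack mapping, architecturally.
A minimal illustrative predicate (a hypothetical helper, not part of this
patch):

    /* Under CET-SS, a present leaf mapping which is read-only but Dirty
     * is architecturally shadow stack memory. */
    static inline bool is_shstk_mapping(intpte_t pte)
    {
        return (pte & _PAGE_PRESENT) &&
               !(pte & _PAGE_RW) &&
               (pte & _PAGE_DIRTY);
    }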

Another bug is that for livepatching, we only disable CET if shadow stacks are
in use.  Running on Intel CET systems when Xen is only using CET-IBT will
crash in arch_livepatch_quiesce() when trying to clear CR0.WP with CR4.CET
still active.
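
For illustration, the shape of the pre-patch sequence (cut down from the code
removed below), and where it faults on an IBT-only system:

    if ( cpu_has_xen_shstk )                 /* false when only IBT is in use */
        write_cr4(read_cr4() & ~X86_CR4_CET);

    /* #GP: CR0.WP can't be cleared while CR4.CET is still set. */
    write_cr0(read_cr0() & ~X86_CR0_WP);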

Also, we never went and cleared the dirty bits on .rodata.  This would
matter (for the same reason it matters on .text - it becomes a valid target
for WRSS), but we never actually patch .rodata anyway.
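
To illustrate why the Dirty bits matter at all, a sketch (illustrative only;
val and dst are placeholders) of the write that stale Dirty bits would allow
once shadow stacks are active:

    /* WRSS succeeds against any shadow stack (read-only + Dirty) mapping,
     * so stale Dirty bits would leave .text/.rodata writable this way. */
    asm volatile ( "wrssq %[val], %[dst]"
                   : [dst] "=m" (*dst)
                   : [val] "r" (val) );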

Therefore rework how we do patching for both alternatives and livepatches.

Introduce modify_xen_mappings_lite() with a purpose similar to
modify_xen_mappings(), but stripped down to the bare minimum as it's used in
weird contexts.  Leave all complexity to the caller to handle.

Instead of patching by clearing CR0.WP (and having to jump through some
fragile hoops to disable CET in order to do this), just transiently relax the
permissions on .text via l2_xenmap[].
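
The resulting pattern (as in the alternatives hunk below) is:

    /* Relax .text to RWX, patch, then tighten back to RX.  Each change is
     * paired with a local TLB flush; all other synchronization is the
     * caller's responsibility. */
    modify_xen_mappings_lite(XEN_VIRT_START + MB(2),
                             (unsigned long)&__2M_text_end,
                             PAGE_HYPERVISOR_RWX);
    flush_local(FLUSH_TLB_GLOBAL);

    /* ... apply the patches ... */

    modify_xen_mappings_lite(XEN_VIRT_START + MB(2),
                             (unsigned long)&__2M_text_end,
                             PAGE_HYPERVISOR_RX);
    flush_local(FLUSH_TLB_GLOBAL);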

Note that neither alternatives nor livepatching edit .rodata, so we don't need
to relax those permissions at this juncture.

The perms are relaxed globally, but this is safe enough.  Alternatives run
before we boot APs, and livepatching runs in a quiesced state where the other
CPUs are not doing anything interesting.

This approach is far more robust.

Fixes: 48cdc15a424f ("x86/alternatives: Clear CR4.CET when clearing CR0.WP")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>

v2:
 * Add a fixes tag
 * Put modify_xen_mappings_lite() in init_or_livepatch
 * Fix various comments

Pulling put_pte_flags() out of the loops in modify_xen_mappings_lite() halves
the size of the function.  The code generation of the typesafe pagetable
helpers is terrible, both because of flags needing a 32->64 expand, and
because of _PAGE_NX using cpu_has_nx behind the scenes.  We really should
improve how all of this works.
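
For illustration, the hoisted form as committed:

    /* Expand the flags to 64-bit PTE layout once, ahead of the loops. */
    fm = put_pte_flags(FLAGS_MASK);
    nf = put_pte_flags(_nf & FLAGS_MASK);
    ...
    l2e_write_atomic(pl2e, l2e_from_intpte((l2e.l2 & ~fm) | nf));

versus the naive form, which re-expands the flags (including the cpu_has_nx
test hiding behind _PAGE_NX) on every PTE update:

    l2e_write_atomic(pl2e,
                     l2e_from_intpte((l2e.l2 & ~put_pte_flags(FLAGS_MASK)) |
                                     put_pte_flags(_nf & FLAGS_MASK)));
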
---
 xen/arch/x86/alternative.c       | 45 +++++++++------------
 xen/arch/x86/livepatch.c         | 56 +++++++++++---------------
 xen/arch/x86/mm.c                | 68 ++++++++++++++++++++++++++++++++
 xen/common/virtual_region.c      | 22 ++++++++---
 xen/include/xen/mm.h             |  1 +
 xen/include/xen/virtual_region.h |  4 +-
 6 files changed, 129 insertions(+), 67 deletions(-)

Comments

Jan Beulich April 17, 2023, 1:59 p.m. UTC | #1
On 17.04.2023 15:52, Andrew Cooper wrote:
> Right now, trying to apply a livepatch on any system with CET shstk (AMD Zen3
> or later, Intel Tiger Lake or Sapphire Rapids and later) fails as follows:
> 
>   (XEN) livepatch: lp: Verifying enabled expectations for all functions
>   (XEN) common/livepatch.c:1591: livepatch: lp: timeout is 30000000ns
>   (XEN) common/livepatch.c:1703: livepatch: lp: CPU28 - IPIing the other 127 CPUs
>   (XEN) livepatch: lp: Applying 1 functions
>   (XEN) hi_func: Hi! (called 1 times)
>   (XEN) Hook executing.
>   (XEN) Assertion 'local_irq_is_enabled() || cpumask_subset(mask, cpumask_of(cpu))' failed at arch/x86/smp.c:265
>   (XEN) *** DOUBLE FAULT ***
>   <many double faults>
> 
> The assertion failure is from a global (system wide) TLB flush initiated by
> modify_xen_mappings().  I'm not entirely sure when this broke, and I'm not
> sure exactly what causes the #DF's, but it doesn't really matter either
> because they highlight a latent bug that I'd overlooked in the CET-SS vs
> patching work in the first place.
> 
> While we're careful to arrange for the patching CPU to avoid encountering
> non-shstk memory with transient shstk perms, other CPUs can pick these
> mappings up too if they need to re-walk for uarch reasons.
> 
> Another bug is that for livepatching, we only disable CET if shadow stacks are
> in use.  Running on Intel CET systems when Xen is only using CET-IBT will
> crash in arch_livepatch_quiesce() when trying to clear CR0.WP with CR4.CET
> still active.
> 
> Also, we never went and cleared the dirty bits on .rodata.  This would
> matter (for the same reason it matters on .text - it becomes a valid target
> for WRSS), but we never actually patch .rodata anyway.
> 
> Therefore rework how we do patching for both alternatives and livepatches.
> 
> Introduce modify_xen_mappings_lite() with a purpose similar to
> modify_xen_mappings(), but stripped down to the bare minimum as it's used in
> weird contexts.  Leave all complexity to the caller to handle.
> 
> Instead of patching by clearing CR0.WP (and having to jump through some
> fragile hoops to disable CET in order to do this), just transiently relax the
> permissions on .text via l2_xenmap[].
> 
> Note that neither alternatives nor livepatching edit .rodata, so we don't need
> to relax those permissions at this juncture.
> 
> The perms are relaxed globally, but this is safe enough.  Alternatives run
> before we boot APs, and livepatching runs in a quiesced state where the other
> CPUs are not doing anything interesting.
> 
> This approach is far more robust.
> 
> Fixes: 48cdc15a424f ("x86/alternatives: Clear CR4.CET when clearing CR0.WP")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

One further remark, though:

> @@ -5879,6 +5880,73 @@ int destroy_xen_mappings(unsigned long s, unsigned long e)
>      return modify_xen_mappings(s, e, _PAGE_NONE);
>  }
>  
> +/*
> + * Similar to modify_xen_mappings(), but used by the alternatives and
> + * livepatch in weird contexts.  All synchronization, TLB flushing, etc is the
> + * responsibility of the caller, and *MUST* not be introduced here.
> + *
> + * Must be limited to XEN_VIRT_{START,END}, i.e. over l2_xenmap[].
> + * Must be called with present flags, and over present mappings.
> + * Must be called on leaf page boundaries.

This last sentence, while wording-wise correct, could do with making more
explicit that it is the caller's responsibility to know whether large page
mappings are in use, due to ...

> + */
> +void init_or_livepatch modify_xen_mappings_lite(
> +    unsigned long s, unsigned long e, unsigned int _nf)
> +{
> +    unsigned long v = s, fm, nf;
> +
> +    /* Set of valid PTE bits which may be altered. */
> +#define FLAGS_MASK (_PAGE_NX|_PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_RW|_PAGE_PRESENT)
> +    fm = put_pte_flags(FLAGS_MASK);
> +    nf = put_pte_flags(_nf & FLAGS_MASK);
> +#undef FLAGS_MASK
> +
> +    ASSERT(nf & _PAGE_PRESENT);
> +    ASSERT(IS_ALIGNED(s, PAGE_SIZE) && s >= XEN_VIRT_START);
> +    ASSERT(IS_ALIGNED(e, PAGE_SIZE) && e <= XEN_VIRT_END);
> +
> +    while ( v < e )
> +    {
> +        l2_pgentry_t *pl2e = &l2_xenmap[l2_table_offset(v)];
> +        l2_pgentry_t l2e = l2e_read_atomic(pl2e);
> +        unsigned int l2f = l2e_get_flags(l2e);
> +
> +        ASSERT(l2f & _PAGE_PRESENT);
> +
> +        if ( l2e_get_flags(l2e) & _PAGE_PSE )
> +        {
> +            ASSERT(l1_table_offset(v) == 0);

... this.

Jan
Andrew Cooper April 17, 2023, 2:41 p.m. UTC | #2
On 17/04/2023 2:59 pm, Jan Beulich wrote:
> On 17.04.2023 15:52, Andrew Cooper wrote:
>> Right now, trying to apply a livepatch on any system with CET shstk (AMD Zen3
>> or later, Intel Tiger Lake or Sapphire Rapids and later) fails as follows:
>>
>>   (XEN) livepatch: lp: Verifying enabled expectations for all functions
>>   (XEN) common/livepatch.c:1591: livepatch: lp: timeout is 30000000ns
>>   (XEN) common/livepatch.c:1703: livepatch: lp: CPU28 - IPIing the other 127 CPUs
>>   (XEN) livepatch: lp: Applying 1 functions
>>   (XEN) hi_func: Hi! (called 1 times)
>>   (XEN) Hook executing.
>>   (XEN) Assertion 'local_irq_is_enabled() || cpumask_subset(mask, cpumask_of(cpu))' failed at arch/x86/smp.c:265
>>   (XEN) *** DOUBLE FAULT ***
>>   <many double faults>
>>
>> The assertion failure is from a global (system wide) TLB flush initiated by
>> modify_xen_mappings().  I'm not entirely sure when this broke, and I'm not
>> sure exactly what causes the #DF's, but it doesn't really matter either
>> because they highlight a latent bug that I'd overlooked in the CET-SS vs
>> patching work in the first place.
>>
>> While we're careful to arrange for the patching CPU to avoid encountering
>> non-shstk memory with transient shstk perms, other CPUs can pick these
>> mappings up too if they need to re-walk for uarch reasons.
>>
>> Another bug is that for livepatching, we only disable CET if shadow stacks are
>> in use.  Running on Intel CET systems when Xen is only using CET-IBT will
>> crash in arch_livepatch_quiesce() when trying to clear CR0.WP with CR4.CET
>> still active.
>>
>> Also, we never went and cleared the dirty bits on .rodata.  This would
>> matter (for the same reason it matters on .text - it becomes a valid target
>> for WRSS), but we never actually patch .rodata anyway.
>>
>> Therefore rework how we do patching for both alternatives and livepatches.
>>
>> Introduce modify_xen_mappings_lite() with a purpose similar to
>> modify_xen_mappings(), but stripped down to the bare minimum as it's used in
>> weird contexts.  Leave all complexity to the caller to handle.
>>
>> Instead of patching by clearing CR0.WP (and having to jump through some
>> fragile hoops to disable CET in order to do this), just transiently relax the
>> permissions on .text via l2_xenmap[].
>>
>> Note that neither alternatives nor livepatching edit .rodata, so we don't need
>> to relax those permissions at this juncture.
>>
>> The perms are relaxed globally, but this is safe enough.  Alternatives run
>> before we boot APs, and livepatching runs in a quiesced state where the other
>> CPUs are not doing anything interesting.
>>
>> This approach is far more robust.
>>
>> Fixes: 48cdc15a424f ("x86/alternatives: Clear CR4.CET when clearing CR0.WP")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

>
> One further remark, though:
>
>> @@ -5879,6 +5880,73 @@ int destroy_xen_mappings(unsigned long s, unsigned long e)
>>      return modify_xen_mappings(s, e, _PAGE_NONE);
>>  }
>>  
>> +/*
>> + * Similar to modify_xen_mappings(), but used by the alternatives and
>> + * livepatch in weird contexts.  All synchronization, TLB flushing, etc is the
>> + * responsibility of the caller, and *MUST* not be introduced here.
>> + *
>> + * Must be limited to XEN_VIRT_{START,END}, i.e. over l2_xenmap[].
>> + * Must be called with present flags, and over present mappings.
>> + * Must be called on leaf page boundaries.
> This last sentence, while wording-wise correct, could do with making more
> explicit that it is the caller's responsibility to know whether large page
> mappings are in use, due to ...

The meaning here is really "this doesn't shatter superpages", and this
was the most concise I could come up with.

Would ", i.e. won't shatter 2M pages." as a clarification work?

~Andrew
Jan Beulich April 17, 2023, 2:51 p.m. UTC | #3
On 17.04.2023 16:41, Andrew Cooper wrote:
> On 17/04/2023 2:59 pm, Jan Beulich wrote:
>> On 17.04.2023 15:52, Andrew Cooper wrote:
>>> @@ -5879,6 +5880,73 @@ int destroy_xen_mappings(unsigned long s, unsigned long e)
>>>      return modify_xen_mappings(s, e, _PAGE_NONE);
>>>  }
>>>  
>>> +/*
>>> + * Similar to modify_xen_mappings(), but used by the alternatives and
>>> + * livepatch in weird contexts.  All synchronization, TLB flushing, etc is the
>>> + * responsibility of the caller, and *MUST* not be introduced here.
>>> + *
>>> + * Must be limited to XEN_VIRT_{START,END}, i.e. over l2_xenmap[].
>>> + * Must be called with present flags, and over present mappings.
>>> + * Must be called on leaf page boundaries.
>> This last sentence, while wording-wise correct, could do with making more
>> explicit that it is the caller's responsibility to know whether large page
>> mappings are in use, due to ...
> 
> The meaning here is really "this doesn't shatter superpages", and this
> was the most concise I could come up with.
> 
> Would ", i.e. won't shatter 2M pages." as a clarification work?

Yes, that would definitely help. Nevertheless I was more after something
like "..., i.e. for 2M mappings on 2M boundaries." Which, thinking about
it, points out that while you have a respective check for the start
address, the full 2M page would be changed even if the end address wasn't
2M aligned (but fell in the middle of a 2M page).

Jan
Andrew Cooper April 17, 2023, 7:34 p.m. UTC | #4
On 17/04/2023 3:51 pm, Jan Beulich wrote:
> On 17.04.2023 16:41, Andrew Cooper wrote:
>> On 17/04/2023 2:59 pm, Jan Beulich wrote:
>>> On 17.04.2023 15:52, Andrew Cooper wrote:
>>>> @@ -5879,6 +5880,73 @@ int destroy_xen_mappings(unsigned long s, unsigned long e)
>>>>      return modify_xen_mappings(s, e, _PAGE_NONE);
>>>>  }
>>>>  
>>>> +/*
>>>> + * Similar to modify_xen_mappings(), but used by the alternatives and
>>>> + * livepatch in weird contexts.  All synchronization, TLB flushing, etc is the
>>>> + * responsibility of the caller, and *MUST* not be introduced here.
>>>> + *
>>>> + * Must be limited to XEN_VIRT_{START,END}, i.e. over l2_xenmap[].
>>>> + * Must be called with present flags, and over present mappings.
>>>> + * Must be called on leaf page boundaries.
>>> This last sentence, while wording-wise correct, could do with making more
>>> explicit that it is the caller's responsibility to know whether large page
>>> mappings are in use, due to ...
>> The meaning here is really "this doesn't shatter superpages", and this
>> was the most concise I could come up with.
>>
>> Would ", i.e. won't shatter 2M pages." as a clarification work?
> Yes, that would definitely help. Nevertheless I was more after something
> like "..., i.e. for 2M mappings on 2M boundaries." Which, thinking about
> it, points out that while you have a respective check for the start
> address, the full 2M page would be changed even if the end address wasn't
> 2M aligned (but fell in the middle of a 2M page).

There's no nice way to check for that, because a range that starts on a 4k
(non-2M) boundary can legitimately end on a 2M boundary at 4k granularity.

How about ", i.e. s and e must not be in the middle of a superpage." then?

~Andrew
Jan Beulich April 18, 2023, 5:58 a.m. UTC | #5
On 17.04.2023 21:34, Andrew Cooper wrote:
> On 17/04/2023 3:51 pm, Jan Beulich wrote:
>> On 17.04.2023 16:41, Andrew Cooper wrote:
>>> On 17/04/2023 2:59 pm, Jan Beulich wrote:
>>>> On 17.04.2023 15:52, Andrew Cooper wrote:
>>>>> @@ -5879,6 +5880,73 @@ int destroy_xen_mappings(unsigned long s, unsigned long e)
>>>>>      return modify_xen_mappings(s, e, _PAGE_NONE);
>>>>>  }
>>>>>  
>>>>> +/*
>>>>> + * Similar to modify_xen_mappings(), but used by the alternatives and
>>>>> + * livepatch in weird contexts.  All synchronization, TLB flushing, etc is the
>>>>> + * responsibility of the caller, and *MUST* not be introduced here.
>>>>> + *
>>>>> + * Must be limited to XEN_VIRT_{START,END}, i.e. over l2_xenmap[].
>>>>> + * Must be called with present flags, and over present mappings.
>>>>> + * Must be called on leaf page boundaries.
>>>> This last sentence, while wording-wise correct, could do with making more
>>>> explicit that it is the caller's responsibility to know whether large page
>>>> mappings are in use, due to ...
>>> The meaning here is really "this doesn't shatter superpages", and this
>>> was the most concise I could come up with.
>>>
>>> Would ", i.e. won't shatter 2M pages." as a clarification work?
>> Yes, that would definitely help. Nevertheless I was more after something
>> like "..., i.e. for 2M mappings on 2M boundaries." Which, thinking about
>> it, points out that while you have a respective check for the start
>> address, the full 2M page would be changed even if the end address wasn't
>> 2M aligned (but fell in the middle of a 2M page).
> 
> There's no nice way to check for that, because a range that starts on a 4k
> (non-2M) boundary can legitimately end on a 2M boundary at 4k granularity.

How about

        if ( l2e_get_flags(l2e) & _PAGE_PSE )
        {
            ASSERT(l1_table_offset(v) == 0);
            ASSERT(e - v >= (1UL << L2_PAGETABLE_SHIFT));

?

> How about ", i.e. s and e must not be in the middle of a superpage." then?

That sounds good, thanks.

Jan
Andrew Cooper April 18, 2023, 8:28 a.m. UTC | #6
On 18/04/2023 6:58 am, Jan Beulich wrote:
> On 17.04.2023 21:34, Andrew Cooper wrote:
>> On 17/04/2023 3:51 pm, Jan Beulich wrote:
>>> On 17.04.2023 16:41, Andrew Cooper wrote:
>>>> On 17/04/2023 2:59 pm, Jan Beulich wrote:
>>>>> On 17.04.2023 15:52, Andrew Cooper wrote:
>>>>>> @@ -5879,6 +5880,73 @@ int destroy_xen_mappings(unsigned long s, unsigned long e)
>>>>>>      return modify_xen_mappings(s, e, _PAGE_NONE);
>>>>>>  }
>>>>>>  
>>>>>> +/*
>>>>>> + * Similar to modify_xen_mappings(), but used by the alternatives and
>>>>>> + * livepatch in weird contexts.  All synchronization, TLB flushing, etc is the
>>>>>> + * responsibility of the caller, and *MUST* not be introduced here.
>>>>>> + *
>>>>>> + * Must be limited to XEN_VIRT_{START,END}, i.e. over l2_xenmap[].
>>>>>> + * Must be called with present flags, and over present mappings.
>>>>>> + * Must be called on leaf page boundaries.
>>>>> This last sentence, while wording-wise correct, could do with making more
>>>>> explicit that it is the caller's responsibility to know whether large page
>>>>> mappings are in use, due to ...
>>>> The meaning here is really "this doesn't shatter superpages", and this
>>>> was the most concise I could come up with.
>>>>
>>>> Would ", i.e. won't shatter 2M pages." as a clarification work?
>>> Yes, that would definitely help. Nevertheless I was more after something
>>> like "..., i.e. for 2M mappings on 2M boundaries." Which, thinking about
>>> it, points out that while you have a respective check for the start
>>> address, the full 2M page would be changed even if the end address wasn't
>>> 2M aligned (but fell in the middle of a 2M page).
>> There's no nice way to check for because a range that starts on a 4k
>> non-2M boundary can legitimately end on a 2M boundary at 4k granularity.
> How about
>
>         if ( l2e_get_flags(l2e) & _PAGE_PSE )
>         {
>             ASSERT(l1_table_offset(v) == 0);
>             ASSERT(e - v >= (1UL << L2_PAGETABLE_SHIFT));

Ah, not as bad as I feared.  I'll include this, and post a v3 for
completeness.

~Andrew

Patch

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 2383fa66294c..99482766b51f 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -382,24 +382,28 @@  static int __init cf_check nmi_apply_alternatives(
      */
     if ( !(alt_done & alt_todo) )
     {
-        unsigned long cr0, cr4;
-
-        cr0 = read_cr0();
-        cr4 = read_cr4();
-
-        if ( cr4 & X86_CR4_CET )
-            write_cr4(cr4 & ~X86_CR4_CET);
-
-        /* Disable WP to allow patching read-only pages. */
-        write_cr0(cr0 & ~X86_CR0_WP);
+        /*
+         * Relax perms on .text to be RWX, so we can modify them.
+         *
+         * This relaxes perms globally, but we run ahead of bringing APs
+         * online, so only have our own TLB to worry about.
+         */
+        modify_xen_mappings_lite(XEN_VIRT_START + MB(2),
+                                 (unsigned long)&__2M_text_end,
+                                 PAGE_HYPERVISOR_RWX);
+        flush_local(FLUSH_TLB_GLOBAL);
 
         _apply_alternatives(__alt_instructions, __alt_instructions_end,
                             alt_done);
 
-        write_cr0(cr0);
-
-        if ( cr4 & X86_CR4_CET )
-            write_cr4(cr4);
+        /*
+         * Reinstate perms on .text to be RX.  This also cleans out the dirty
+         * bits, which matters when CET Shstk is active.
+         */
+        modify_xen_mappings_lite(XEN_VIRT_START + MB(2),
+                                 (unsigned long)&__2M_text_end,
+                                 PAGE_HYPERVISOR_RX);
+        flush_local(FLUSH_TLB_GLOBAL);
 
         alt_done |= alt_todo;
     }
@@ -454,19 +458,6 @@  static void __init _alternative_instructions(bool force)
         panic("Timed out waiting for alternatives self-NMI to hit\n");
 
     set_nmi_callback(saved_nmi_callback);
-
-    /*
-     * When Xen is using shadow stacks, the alternatives clearing CR0.WP and
-     * writing into the mappings set dirty bits, turning the mappings into
-     * shadow stack mappings.
-     *
-     * While we can execute from them, this would also permit them to be the
-     * target of WRSS instructions, so reset the dirty after patching.
-     */
-    if ( cpu_has_xen_shstk )
-        modify_xen_mappings(XEN_VIRT_START + MB(2),
-                            (unsigned long)&__2M_text_end,
-                            PAGE_HYPERVISOR_RX);
 }
 
 void __init alternative_instructions(void)
diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
index f2d783fdc567..a54d991c5f0f 100644
--- a/xen/arch/x86/livepatch.c
+++ b/xen/arch/x86/livepatch.c
@@ -61,46 +61,32 @@  int arch_livepatch_safety_check(void)
 
 int noinline arch_livepatch_quiesce(void)
 {
-    /* If Shadow Stacks are in use, disable CR4.CET so we can modify CR0.WP. */
-    if ( cpu_has_xen_shstk )
-        write_cr4(read_cr4() & ~X86_CR4_CET);
-
-    /* Disable WP to allow changes to read-only pages. */
-    write_cr0(read_cr0() & ~X86_CR0_WP);
+    /*
+     * Relax perms on .text to be RWX, so we can modify them.
+     *
+     * This relaxes perms globally, but all other CPUs are waiting on us.
+     */
+    relax_virtual_region_perms();
+    flush_local(FLUSH_TLB_GLOBAL);
 
     return 0;
 }
 
 void noinline arch_livepatch_revive(void)
 {
-    /* Reinstate WP. */
-    write_cr0(read_cr0() | X86_CR0_WP);
-
-    /* Clobber dirty bits and reinstate CET, if applicable. */
-    if ( IS_ENABLED(CONFIG_XEN_SHSTK) && cpu_has_xen_shstk )
-    {
-        unsigned long tmp;
-
-        reset_virtual_region_perms();
-
-        write_cr4(read_cr4() | X86_CR4_CET);
-
-        /*
-         * Fix up the return address on the shadow stack, which currently
-         * points at arch_livepatch_quiesce()'s caller.
-         *
-         * Note: this is somewhat fragile, and depends on both
-         * arch_livepatch_{quiesce,revive}() being called from the same
-         * function, which is currently the case.
-         *
-         * Any error will result in Xen dying with #CP, and its too late to
-         * recover in any way.
-         */
-        asm volatile ("rdsspq %[ssp];"
-                      "wrssq %[addr], (%[ssp]);"
-                      : [ssp] "=&r" (tmp)
-                      : [addr] "r" (__builtin_return_address(0)));
-    }
+    /*
+     * Reinstate perms on .text to be RX.  This also cleans out the dirty
+     * bits, which matters when CET Shstk is active.
+     *
+     * The other CPUs waiting for us could in principle have re-walked while
+     * we were patching and cached the reduced perms in their TLB.  Therefore,
+     * we need to do a global TLB flush.
+     *
+     * However, we can't use Xen's normal global TLB flush infrastructure, so
+     * delay the TLB flush to arch_livepatch_post_action(), which is called on
+     * all CPUs (including us) on the way out of patching.
+     */
+    tighten_virtual_region_perms();
 }
 
 int arch_livepatch_verify_func(const struct livepatch_func *func)
@@ -197,6 +183,8 @@  void noinline arch_livepatch_revert(const struct livepatch_func *func)
  */
 void noinline arch_livepatch_post_action(void)
 {
+    /* See arch_livepatch_revive() */
+    flush_local(FLUSH_TLB_GLOBAL);
 }
 
 static nmi_callback_t *saved_nmi_callback;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 36a07ef77eae..46df495352e9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -91,6 +91,7 @@ 
 #include <xen/ioreq.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
+#include <xen/livepatch.h>
 #include <xen/mm.h>
 #include <xen/param.h>
 #include <xen/domain.h>
@@ -5879,6 +5880,73 @@  int destroy_xen_mappings(unsigned long s, unsigned long e)
     return modify_xen_mappings(s, e, _PAGE_NONE);
 }
 
+/*
+ * Similar to modify_xen_mappings(), but used by the alternatives and
+ * livepatch in weird contexts.  All synchronization, TLB flushing, etc is the
+ * responsibility of the caller, and *MUST* not be introduced here.
+ *
+ * Must be limited to XEN_VIRT_{START,END}, i.e. over l2_xenmap[].
+ * Must be called with present flags, and over present mappings.
+ * Must be called on leaf page boundaries.
+ */
+void init_or_livepatch modify_xen_mappings_lite(
+    unsigned long s, unsigned long e, unsigned int _nf)
+{
+    unsigned long v = s, fm, nf;
+
+    /* Set of valid PTE bits which may be altered. */
+#define FLAGS_MASK (_PAGE_NX|_PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_RW|_PAGE_PRESENT)
+    fm = put_pte_flags(FLAGS_MASK);
+    nf = put_pte_flags(_nf & FLAGS_MASK);
+#undef FLAGS_MASK
+
+    ASSERT(nf & _PAGE_PRESENT);
+    ASSERT(IS_ALIGNED(s, PAGE_SIZE) && s >= XEN_VIRT_START);
+    ASSERT(IS_ALIGNED(e, PAGE_SIZE) && e <= XEN_VIRT_END);
+
+    while ( v < e )
+    {
+        l2_pgentry_t *pl2e = &l2_xenmap[l2_table_offset(v)];
+        l2_pgentry_t l2e = l2e_read_atomic(pl2e);
+        unsigned int l2f = l2e_get_flags(l2e);
+
+        ASSERT(l2f & _PAGE_PRESENT);
+
+        if ( l2e_get_flags(l2e) & _PAGE_PSE )
+        {
+            ASSERT(l1_table_offset(v) == 0);
+
+            l2e_write_atomic(pl2e, l2e_from_intpte((l2e.l2 & ~fm) | nf));
+
+            v += 1UL << L2_PAGETABLE_SHIFT;
+            continue;
+        }
+
+        /* else descend to l1 */
+        {
+            l1_pgentry_t *pl1t = map_l1t_from_l2e(l2e);
+
+            while ( v < e )
+            {
+                l1_pgentry_t *pl1e = &pl1t[l1_table_offset(v)];
+                l1_pgentry_t l1e = l1e_read_atomic(pl1e);
+                unsigned int l1f = l1e_get_flags(l1e);
+
+                ASSERT(l1f & _PAGE_PRESENT);
+
+                l1e_write_atomic(pl1e, l1e_from_intpte((l1e.l1 & ~fm) | nf));
+
+                v += 1UL << L1_PAGETABLE_SHIFT;
+
+                if ( l2_table_offset(v) == 0 )
+                    break;
+            }
+
+            unmap_domain_page(pl1t);
+        }
+    }
+}
+
 void __set_fixmap(
     enum fixed_addresses idx, unsigned long mfn, unsigned long flags)
 {
diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
index 5ecdba9c08ed..ddac5c9147e5 100644
--- a/xen/common/virtual_region.c
+++ b/xen/common/virtual_region.c
@@ -92,16 +92,28 @@  void unregister_virtual_region(struct virtual_region *r)
     remove_virtual_region(r);
 }
 
-#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_XEN_SHSTK)
-void reset_virtual_region_perms(void)
+#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_X86)
+void relax_virtual_region_perms(void)
 {
     const struct virtual_region *region;
 
     rcu_read_lock(&rcu_virtual_region_lock);
     list_for_each_entry_rcu( region, &virtual_region_list, list )
-        modify_xen_mappings((unsigned long)region->start,
-                            ROUNDUP((unsigned long)region->end, PAGE_SIZE),
-                            PAGE_HYPERVISOR_RX);
+        modify_xen_mappings_lite((unsigned long)region->start,
+                                 ROUNDUP((unsigned long)region->end, PAGE_SIZE),
+                                 PAGE_HYPERVISOR_RWX);
+    rcu_read_unlock(&rcu_virtual_region_lock);
+}
+
+void tighten_virtual_region_perms(void)
+{
+    const struct virtual_region *region;
+
+    rcu_read_lock(&rcu_virtual_region_lock);
+    list_for_each_entry_rcu( region, &virtual_region_list, list )
+        modify_xen_mappings_lite((unsigned long)region->start,
+                                 ROUNDUP((unsigned long)region->end, PAGE_SIZE),
+                                 PAGE_HYPERVISOR_RX);
     rcu_read_unlock(&rcu_virtual_region_lock);
 }
 #endif
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 9d14aed74baa..b0dc3ba9c98d 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -100,6 +100,7 @@  int map_pages_to_xen(
     unsigned int flags);
 /* Alter the permissions of a range of Xen virtual address space. */
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags);
+void modify_xen_mappings_lite(unsigned long s, unsigned long e, unsigned int flags);
 int destroy_xen_mappings(unsigned long s, unsigned long e);
 /* Retrieve the MFN mapped by VA in Xen virtual address space. */
 mfn_t xen_map_to_mfn(unsigned long va);
diff --git a/xen/include/xen/virtual_region.h b/xen/include/xen/virtual_region.h
index ba408eb87a1a..d05362071135 100644
--- a/xen/include/xen/virtual_region.h
+++ b/xen/include/xen/virtual_region.h
@@ -33,7 +33,9 @@  void setup_virtual_regions(const struct exception_table_entry *start,
 void unregister_init_virtual_region(void);
 void register_virtual_region(struct virtual_region *r);
 void unregister_virtual_region(struct virtual_region *r);
-void reset_virtual_region_perms(void);
+
+void relax_virtual_region_perms(void);
+void tighten_virtual_region_perms(void);
 
 #endif /* __XEN_VIRTUAL_REGION_H__ */