x86/mm: Do not split_large_page() for set_kernel_text_rw()

Message ID 20190823052335.572133-1-songliubraving@fb.com

Commit Message

Song Liu Aug. 23, 2019, 5:23 a.m. UTC
As the 4k pages check was removed from CPA [1], set_kernel_text_rw() leads to
split_large_page() for all kernel text pages. This means a single kprobe
will put all kernel text in 4k pages:

  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
  0xffffffff81000000-0xffffffff82400000     20M  ro    PSE      x  pmd

  root@ ~# echo ONE_KPROBE >> /sys/kernel/debug/tracing/kprobe_events
  root@ ~# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable

  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
  0xffffffff81000000-0xffffffff82400000     20M  ro             x  pte

To fix this issue, introduce CPA_FLIP_TEXT_RW to bypass the "Text RO"
check in static_protections().

Two helper functions, set_text_rw() and set_text_ro(), are added to flip
the _PAGE_RW bit for kernel text.
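A simplified userspace model of the intended effect. It assumes only what the description above says: the "Text RO" rule in static_protections() is what strips _PAGE_RW from the request, and CPA_FLIP_TEXT_RW skips that rule. The model_* helpers are hypothetical stand-ins, not kernel code; the flag value 0x10 is taken from the patch and bit 1 is _PAGE_RW on x86.

```c
#include <assert.h>

/* _PAGE_RW is bit 1 in an x86 PTE. */
#define MODEL_PAGE_RW    (1UL << 1)
/* Same value the patch gives CPA_FLIP_TEXT_RW in pageattr.c. */
#define CPA_FLIP_TEXT_RW 0x10U

/* Hypothetical stand-in for protect_kernel_text_ro(): in this model the
 * "Text RO" rule always forbids RW on the kernel text range. */
static unsigned long model_text_ro_forbidden(void)
{
	return MODEL_PAGE_RW;
}

/* Simplified static_protections(): drop forbidden bits from the requested
 * protections, skipping the "Text RO" rule when CPA_FLIP_TEXT_RW is set. */
static unsigned long model_static_protections(unsigned long prot,
					      unsigned int cpa_flags)
{
	unsigned long forbidden = 0;

	if (!(cpa_flags & CPA_FLIP_TEXT_RW))
		forbidden |= model_text_ro_forbidden();

	return prot & ~forbidden;
}

/* A large page has to be split when what it may legally carry differs
 * from what was requested (the spirit of the do_split decision). */
static int model_should_split(unsigned long req_prot, unsigned int cpa_flags)
{
	return model_static_protections(req_prot, cpa_flags) != req_prot;
}
```

With flags of 0 (the set_memory_rw() path) an RW request is stripped and the model reports a split; with CPA_FLIP_TEXT_RW (the new set_text_rw() path) the request survives unchanged and no split is needed.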

[1] commit 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")

Fixes: 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")
Cc: stable@vger.kernel.org  # v4.20+
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 arch/x86/mm/init_64.c     |  4 ++--
 arch/x86/mm/mm_internal.h |  4 ++++
 arch/x86/mm/pageattr.c    | 34 +++++++++++++++++++++++++---------
 3 files changed, 31 insertions(+), 11 deletions(-)

Comments

Peter Zijlstra Aug. 23, 2019, 9:36 a.m. UTC | #1
On Thu, Aug 22, 2019 at 10:23:35PM -0700, Song Liu wrote:
> As 4k pages check was removed from cpa [1], set_kernel_text_rw() leads to
> split_large_page() for all kernel text pages. This means a single kprobe
> will put all kernel text in 4k pages:
> 
>   root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>   0xffffffff81000000-0xffffffff82400000     20M  ro    PSE      x  pmd
> 
>   root@ ~# echo ONE_KPROBE >> /sys/kernel/debug/tracing/kprobe_events
>   root@ ~# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
> 
>   root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>   0xffffffff81000000-0xffffffff82400000     20M  ro             x  pte
> 
> To fix this issue, introduce CPA_FLIP_TEXT_RW to bypass "Text RO" check
> in static_protections().
> 
> Two helper functions set_text_rw() and set_text_ro() are added to flip
> _PAGE_RW bit for kernel text.
> 
> [1] commit 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")

ARGH; so this is because ftrace flips the whole kernel range to RW and
back for giggles? I'm thinking _that_ is a bug, it's a clear W^X
violation.
Song Liu Aug. 26, 2019, 4:40 a.m. UTC | #2
Cc: Steven Rostedt and Suresh Siddha

Hi Peter, 

> On Aug 23, 2019, at 2:36 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Aug 22, 2019 at 10:23:35PM -0700, Song Liu wrote:
>> As 4k pages check was removed from cpa [1], set_kernel_text_rw() leads to
>> split_large_page() for all kernel text pages. This means a single kprobe
>> will put all kernel text in 4k pages:
>> 
>>  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>>  0xffffffff81000000-0xffffffff82400000     20M  ro    PSE      x  pmd
>> 
>>  root@ ~# echo ONE_KPROBE >> /sys/kernel/debug/tracing/kprobe_events
>>  root@ ~# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
>> 
>>  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>>  0xffffffff81000000-0xffffffff82400000     20M  ro             x  pte
>> 
>> To fix this issue, introduce CPA_FLIP_TEXT_RW to bypass "Text RO" check
>> in static_protections().
>> 
>> Two helper functions set_text_rw() and set_text_ro() are added to flip
>> _PAGE_RW bit for kernel text.
>> 
>> [1] commit 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")
> 
> ARGH; so this is because ftrace flips the whole kernel range to RW and
> back for giggles? I'm thinking _that_ is a bug, it's a clear W^X
> violation.

Thanks for your comments. Yes, it is related to ftrace, as we have
CONFIG_KPROBES_ON_FTRACE. However, after digging around, I am not sure
what the expected behavior is.

The kernel text region has two mappings. For x86_64 with four-level
page tables, they are:

	1. kernel identity mapping, from 0xffff888000100000; 
	2. kernel text mapping, from 0xffffffff81000000, 
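Both mappings resolve to the same physical pages. A sketch of the address arithmetic, assuming the default x86_64 four-level layout without KASLR: direct-map base 0xffff888000000000, text-map base 0xffffffff80000000 and a phys_base of 0. These constants are assumptions for illustration; real code should use the kernel's own helpers such as __pa_symbol() and __va().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (four-level paging, no KASLR, phys_base == 0). */
#define MODEL_PAGE_OFFSET      0xffff888000000000ULL /* identity (direct) map base */
#define MODEL_START_KERNEL_MAP 0xffffffff80000000ULL /* kernel text (high) map base */
#define MODEL_PHYS_BASE        0x0ULL

/* High text mapping address -> physical address. */
static uint64_t model_text_to_phys(uint64_t text_va)
{
	assert(text_va >= MODEL_START_KERNEL_MAP);
	return text_va - MODEL_START_KERNEL_MAP + MODEL_PHYS_BASE;
}

/* Physical address -> the identity-map alias of the same byte. */
static uint64_t model_phys_to_identity(uint64_t phys)
{
	return MODEL_PAGE_OFFSET + phys;
}
```

Under these assumptions, text at 0xffffffff81000000 is physical 0x1000000, whose identity-map alias is 0xffff888001000000.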

Per the comment in arch/x86/mm/init_64.c:set_kernel_text_rw():

        /*
         * Make the kernel identity mapping for text RW. Kernel text
         * mapping will always be RO. Refer to the comment in
         * static_protections() in pageattr.c
         */
	set_memory_rw(start, (end - start) >> PAGE_SHIFT);

kprobe (with CONFIG_KPROBES_ON_FTRACE) should work on the kernel identity
mapping.

However, my experiment shows that kprobe actually operates on the
kernel text mapping (0xffffffff81000000-). It is the same w/ and w/o
CONFIG_KPROBES_ON_FTRACE. Therefore, I am not sure whether the comment
is outdated (it is 10 years old) or kprobe is doing something wrong.


More information about the issue we are looking at. 

We found that with a 5.2 kernel (no CONFIG_PAGE_TABLE_ISOLATION, w/
CONFIG_KPROBES_ON_FTRACE), a single kprobe will split _all_ PMDs in 
kernel text mapping into pte-mapped pages. This increases iTLB 
miss rate from about 300 per million instructions to about 700 per
million instructions (for the application I test with). 
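A rough illustration of why the split matters for the iTLB: each translation covers one page, so the 20M text range from the dump in the commit message needs 10 translations when PMD-mapped but 5120 when PTE-mapped. The helper below only does that division; it does not model any particular TLB, and the 300 vs. 700 misses per million instructions above are measurements, not derived from it.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (4ULL << 10)
#define SZ_2M (2ULL << 20)

/* Number of page translations needed to map `bytes` with pages of
 * `page_size`, rounding up for a partial last page. */
static uint64_t translations_needed(uint64_t bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return (bytes + page_size - 1) / page_size;
}
```

For example, translations_needed(20ULL << 20, SZ_2M) is 10 and translations_needed(20ULL << 20, SZ_4K) is 5120.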

Bisecting shows that this behavior starts with commit 585948f4f695
("x86/mm/cpa: Avoid the 4k pages check completely"). That's why I
proposed this PATCH to fix/work around this issue. However, per
Peter's comment and my study of the code, this doesn't seem to be the
real problem, or at least not the only one here.

I also confirmed that the PMD split doesn't happen w/o
CONFIG_KPROBES_ON_FTRACE.


In summary, I have the following questions:

1. Which mapping should kprobe work on? Kernel identity mapping or 
   kernel text mapping?
2. ftrace causes PMD-mapped kernel text to be split. How should we fix
   this?

Thanks,
Song
Peter Zijlstra Aug. 26, 2019, 9:23 a.m. UTC | #3
On Mon, Aug 26, 2019 at 04:40:23AM +0000, Song Liu wrote:
> Cc: Steven Rostedt and Suresh Siddha
> 
> Hi Peter, 
> 
> > On Aug 23, 2019, at 2:36 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Thu, Aug 22, 2019 at 10:23:35PM -0700, Song Liu wrote:
> >> As 4k pages check was removed from cpa [1], set_kernel_text_rw() leads to
> >> split_large_page() for all kernel text pages. This means a single kprobe
> >> will put all kernel text in 4k pages:
> >> 
> >>  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
> >>  0xffffffff81000000-0xffffffff82400000     20M  ro    PSE      x  pmd
> >> 
> >>  root@ ~# echo ONE_KPROBE >> /sys/kernel/debug/tracing/kprobe_events
> >>  root@ ~# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
> >> 
> >>  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
> >>  0xffffffff81000000-0xffffffff82400000     20M  ro             x  pte
> >> 
> >> To fix this issue, introduce CPA_FLIP_TEXT_RW to bypass "Text RO" check
> >> in static_protections().
> >> 
> >> Two helper functions set_text_rw() and set_text_ro() are added to flip
> >> _PAGE_RW bit for kernel text.
> >> 
> >> [1] commit 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")
> > 
> > ARGH; so this is because ftrace flips the whole kernel range to RW and
> > back for giggles? I'm thinking _that_ is a bug, it's a clear W^X
> > violation.
> 
> Thanks for your comments. Yes, it is related to ftrace, as we have
> CONFIG_KPROBES_ON_FTRACE. However, after digging around, I am not sure
> what is the expected behavior.

It changed recently; that is, we got a lot more strict wrt W^X mappings.
IIRC ftrace is the only known violator of W^X at this time.

> Kernel text region has two mappings to it. For x86_64 and four-level 
> page table, there are: 
> 
> 	1. kernel identity mapping, from 0xffff888000100000; 
> 	2. kernel text mapping, from 0xffffffff81000000, 

Right; AFAICT this is so that kernel text fits in s32 immediates.
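An illustration of the s32 point, assuming the usual x86_64 "kernel" code model, where code and statics live in the top 2 GiB of the address space: a 64-bit address can be encoded as a sign-extended 32-bit immediate or displacement only if it lies in [0, 0x7fffffff] or [0xffffffff80000000, 0xffffffffffffffff]. The high text address qualifies; a direct-map address such as 0xffff888001000000 does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` equals the sign extension of a 32-bit value, i.e. it can
 * be encoded as a sign-extended s32 immediate or displacement. */
static bool fits_sign_extended_s32(uint64_t addr)
{
	return addr <= 0x7fffffffULL || addr >= 0xffffffff80000000ULL;
}
```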

> Per comments in arch/x86/mm/init_64.c:set_kernel_text_rw():
> 
>         /*
>          * Make the kernel identity mapping for text RW. Kernel text
>          * mapping will always be RO. Refer to the comment in
>          * static_protections() in pageattr.c
>          */
> 	set_memory_rw(start, (end - start) >> PAGE_SHIFT);

So only the high mapping is ever executable; the identity map should not
be. Both should be RO.
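The intended end state can be written down as a W^X check. This is a hypothetical encoding of the two sentences above (high map: RO and executable; identity alias: RO and NX), not a kernel data structure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one mapping's permissions. */
struct mapping_perms {
	const char *name;
	bool writable;
	bool executable;
};

/* W^X: a mapping may be writable or executable, never both. */
static bool respects_wx(const struct mapping_perms *m)
{
	return !(m->writable && m->executable);
}

/* Expected permissions for kernel text, per the description above. */
static const struct mapping_perms expected_text_mappings[] = {
	{ "kernel text (high) mapping", false, true  },
	{ "identity-map alias of text", false, false },
};
```

A high text mapping flipped to RW while still executable, as in the ftrace case being discussed, would fail respects_wx().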

> kprobe (with CONFIG_KPROBES_ON_FTRACE) should work on kernel identity
> mapping. 

Please provide more information; kprobes shouldn't be touching either
mapping. That is, afaict kprobes uses text_poke() which uses a temporary
mapping (in 'userspace' even) to alias the high text mapping.

I'm also not sure how it would then result in any 4k text maps. Yes the
alias is 4k, but it should not affect the actual high text map in any
way.
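The aliasing idea has a direct userspace analogue, shown only to illustrate the concept: keep one long-lived read-only mapping and make changes through a short-lived writable mapping of the same backing page. In this sketch, assuming a POSIX system, a tmpfile() stands in for the physical page and the second mmap() stands in for the temporary alias; it is not how text_poke() is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy len bytes into a page that the caller only ever sees read-only, by
 * writing through a temporary writable alias of the same page.
 * Returns the read-only view (one page; release with munmap()), or NULL. */
static const char *poke_via_alias(const void *src, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *backing = tmpfile();   /* stands in for the physical page */
	const char *ro = NULL;
	char *rw;

	if (backing == NULL || page <= 0 || len > (size_t)page)
		goto out;
	if (ftruncate(fileno(backing), page) != 0)
		goto out;

	/* Long-lived read-only view: plays the role of the RO text mapping. */
	ro = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fileno(backing), 0);
	if (ro == MAP_FAILED) {
		ro = NULL;
		goto out;
	}

	/* Short-lived writable alias of the same page. */
	rw = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
		  fileno(backing), 0);
	if (rw == MAP_FAILED) {
		munmap((void *)ro, (size_t)page);
		ro = NULL;
		goto out;
	}
	memcpy(rw, src, len);
	munmap(rw, (size_t)page);    /* drop the writable alias again */
out:
	if (backing != NULL)
		fclose(backing);         /* existing mappings survive the close */
	return ro;
}
```

The caller reads the new bytes through the returned read-only view and releases it with munmap() using the system page size; the read-only mapping itself never changes protection.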

kprobes also allocates executable slots, but it does that in the module
range (afaict), so that, again, should not affect the high text mapping.

> We found with 5.2 kernel (no CONFIG_PAGE_TABLE_ISOLATION, w/ 
> CONFIG_KPROBES_ON_FTRACE), a single kprobe will split _all_ PMDs in 
> kernel text mapping into pte-mapped pages. This increases iTLB 
> miss rate from about 300 per million instructions to about 700 per
> million instructions (for the application I test with). 
> 
> Per bisect, we found this behavior happens after commit 585948f4f695 
> ("x86/mm/cpa: Avoid the 4k pages check completely"). That's why I 
> proposed this PATCH to fix/workaround this issue. However, per
> Peter's comment and my study of the code, this doesn't seem the 
> real problem or the only here. 
> 
> I also tested that the PMD split issue doesn't happen w/o 
> CONFIG_KPROBES_ON_FTRACE. 

Right, because then ftrace doesn't flip the whole kernel map writable;
which it _really_ should stop doing anyway.

But I'm still wondering what causes that first 4k split...
Steven Rostedt Aug. 26, 2019, 11:33 a.m. UTC | #4
On Fri, 23 Aug 2019 11:36:37 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Aug 22, 2019 at 10:23:35PM -0700, Song Liu wrote:
> > As 4k pages check was removed from cpa [1], set_kernel_text_rw() leads to
> > split_large_page() for all kernel text pages. This means a single kprobe
> > will put all kernel text in 4k pages:
> > 
> >   root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
> >   0xffffffff81000000-0xffffffff82400000     20M  ro    PSE      x  pmd
> > 
> >   root@ ~# echo ONE_KPROBE >> /sys/kernel/debug/tracing/kprobe_events
> >   root@ ~# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
> > 
> >   root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
> >   0xffffffff81000000-0xffffffff82400000     20M  ro             x  pte
> > 
> > To fix this issue, introduce CPA_FLIP_TEXT_RW to bypass "Text RO" check
> > in static_protections().
> > 
> > Two helper functions set_text_rw() and set_text_ro() are added to flip
> > _PAGE_RW bit for kernel text.
> > 
> > [1] commit 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")  
> 
> ARGH; so this is because ftrace flips the whole kernel range to RW and
> back for giggles? I'm thinking _that_ is a bug, it's a clear W^X
> violation.

Since ftrace did this way before text_poke existed and way before
anybody cared (back in 2007), it's not really a bug.

Anyway, I believe Nadav has some patches that convert ftrace to use
the shadow page modification trick somewhere.

Or we also need the text_poke batch processing (did that get upstream?).

Mapping in 40,000 pages one at a time is noticeable from a human
standpoint.
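The cost of doing this one page at a time is essentially fixed per-round overhead. A back-of-the-envelope model, where the batch size and per-round cost are hypothetical parameters rather than values taken from text_poke or ftrace: 40,000 sites patched one per round pay the fixed cost 40,000 times, while a single batch pays it once.

```c
#include <assert.h>
#include <stdint.h>

/* Total fixed overhead when `sites` patch sites are applied in rounds of at
 * most `batch` sites, if every round pays `per_round` units of fixed cost
 * (for example mapping a temporary alias, flushing, syncing CPUs). */
static uint64_t fixed_overhead(uint64_t sites, uint64_t batch,
			       uint64_t per_round)
{
	assert(batch != 0);
	uint64_t rounds = (sites + batch - 1) / batch;   /* round up */
	return rounds * per_round;
}
```

For example, fixed_overhead(40000, 1, 1) is 40000 while fixed_overhead(40000, 40000, 1) is 1.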

-- Steve
Peter Zijlstra Aug. 26, 2019, 12:44 p.m. UTC | #5
On Mon, Aug 26, 2019 at 07:33:08AM -0400, Steven Rostedt wrote:
> Anyway, I believe Nadav has some patches that converts ftrace to use
> the shadow page modification trick somewhere.
> 
> Or we also need the text_poke batch processing (did that get upstream?).

It did. And I just did that patch; I'll send it out in a bit.

It seems to work, but this is the very first time I've looked at this
code.
Song Liu Aug. 26, 2019, 3:08 p.m. UTC | #6
> On Aug 26, 2019, at 2:23 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> So only the high mapping is ever executable; the identity map should not
> be. Both should be RO.
> 
>> kprobe (with CONFIG_KPROBES_ON_FTRACE) should work on kernel identity
>> mapping. 
> 
> Please provide more information; kprobes shouldn't be touching either
> mapping. That is, afaict kprobes uses text_poke() which uses a temporary
> mapping (in 'userspace' even) to alias the high text mapping.

kprobe without CONFIG_KPROBES_ON_FTRACE uses text_poke(). But kprobe with
CONFIG_KPROBES_ON_FTRACE uses another path. The split happens with
set_kernel_text_rw() -> ... -> __change_page_attr() -> split_large_page().
The split is introduced by commit 585948f4f695: do_split in
__change_page_attr() becomes true after that commit. This patch
tries to fix/work around this part.
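One way to see why the split happens and then sticks, as a simplified model: assume, as protect_kernel_text_ro() in pageattr.c did at the time, that the "Text RO" rule forbids _PAGE_RW only while kernel text is still mapped by a large page. A 2M text mapping then cannot take the RW request, so it is split; at 4K the rule no longer applies and RW succeeds, but the 4K mapping remains. The enum and helpers are hypothetical.

```c
#include <assert.h>

/* _PAGE_RW is bit 1 in an x86 PTE. */
#define MODEL_PAGE_RW (1UL << 1)

enum model_level { MODEL_LEVEL_4K, MODEL_LEVEL_2M };

/* Stand-in for the "Text RO" rule: RW is forbidden on kernel text only
 * while that text is mapped at 2M granularity. */
static unsigned long model_forbidden(enum model_level level)
{
	return level == MODEL_LEVEL_2M ? MODEL_PAGE_RW : 0;
}

/* Apply an RW request to one kernel text mapping and return the level it
 * ends up at. */
static enum model_level model_apply_rw(enum model_level level)
{
	const unsigned long req = MODEL_PAGE_RW;

	/* The large page cannot carry RW, so fall back to 4K PTEs (do_split). */
	if ((req & ~model_forbidden(level)) != req)
		level = MODEL_LEVEL_4K;

	/* At 4K the rule no longer forbids RW, so the request now succeeds. */
	assert((req & ~model_forbidden(level)) == req);
	return level;
}
```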

> 
> I'm also not sure how it would then result in any 4k text maps. Yes the
> alias is 4k, but it should not affect the actual high text map in any
> way.

I am confused by the alias logic. set_kernel_text_rw() makes the high map
RW and splits the PMD in the high map.

> 
> kprobes also allocates executable slots, but it does that in the module
> range (afaict), so that, again, should not affect the high text mapping.
> 
>> We found with 5.2 kernel (no CONFIG_PAGE_TABLE_ISOLATION, w/ 
>> CONFIG_KPROBES_ON_FTRACE), a single kprobe will split _all_ PMDs in 
>> kernel text mapping into pte-mapped pages. This increases iTLB 
>> miss rate from about 300 per million instructions to about 700 per
>> million instructions (for the application I test with). 
>> 
>> Per bisect, we found this behavior happens after commit 585948f4f695 
>> ("x86/mm/cpa: Avoid the 4k pages check completely"). That's why I 
>> proposed this PATCH to fix/workaround this issue. However, per
>> Peter's comment and my study of the code, this doesn't seem the 
>> real problem or the only here. 
>> 
>> I also tested that the PMD split issue doesn't happen w/o 
>> CONFIG_KPROBES_ON_FTRACE. 
> 
> Right, because then ftrace doesn't flip the whole kernel map writable;
> which it _really_ should stop doing anyway.
> 
> But I'm still wondering what causes that first 4k split...

Please see above. 

Thanks,
Song
Nadav Amit Aug. 26, 2019, 3:41 p.m. UTC | #7
> On Aug 26, 2019, at 4:33 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Fri, 23 Aug 2019 11:36:37 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Thu, Aug 22, 2019 at 10:23:35PM -0700, Song Liu wrote:
>>> As 4k pages check was removed from cpa [1], set_kernel_text_rw() leads to
>>> split_large_page() for all kernel text pages. This means a single kprobe
>>> will put all kernel text in 4k pages:
>>> 
>>>  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>>>  0xffffffff81000000-0xffffffff82400000     20M  ro    PSE      x  pmd
>>> 
>>>  root@ ~# echo ONE_KPROBE >> /sys/kernel/debug/tracing/kprobe_events
>>>  root@ ~# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
>>> 
>>>  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>>>  0xffffffff81000000-0xffffffff82400000     20M  ro             x  pte
>>> 
>>> To fix this issue, introduce CPA_FLIP_TEXT_RW to bypass "Text RO" check
>>> in static_protections().
>>> 
>>> Two helper functions set_text_rw() and set_text_ro() are added to flip
>>> _PAGE_RW bit for kernel text.
>>> 
>>> [1] commit 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")  
>> 
>> ARGH; so this is because ftrace flips the whole kernel range to RW and
>> back for giggles? I'm thinking _that_ is a bug, it's a clear W^X
>> violation.
> 
> Since ftrace did this way before text_poke existed and way before
> anybody cared (back in 2007), it's not really a bug.
> 
> Anyway, I believe Nadav has some patches that converts ftrace to use
> the shadow page modification trick somewhere.

For the record - here is my previous patch:
https://lkml.org/lkml/2018/12/5/211
Steven Rostedt Aug. 26, 2019, 3:56 p.m. UTC | #8
On Mon, 26 Aug 2019 15:41:24 +0000
Nadav Amit <namit@vmware.com> wrote:

> > Anyway, I believe Nadav has some patches that converts ftrace to use
> > the shadow page modification trick somewhere.  
> 
> For the record - here is my previous patch:
> https://lkml.org/lkml/2018/12/5/211

FYI, when referencing older patches, please use lkml.kernel.org or
lore.kernel.org, lkml.org is slow and obsolete.

i.e. http://lkml.kernel.org/r/20181205013408.47725-9-namit@vmware.com

-- Steve
Peter Zijlstra Aug. 26, 2019, 3:56 p.m. UTC | #9
On Mon, Aug 26, 2019 at 03:41:24PM +0000, Nadav Amit wrote:

> For the record - here is my previous patch:
> https://lkml.org/lkml/2018/12/5/211

Thanks!
Nadav Amit Aug. 26, 2019, 4:09 p.m. UTC | #10
> On Aug 26, 2019, at 8:56 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Mon, 26 Aug 2019 15:41:24 +0000
> Nadav Amit <namit@vmware.com> wrote:
> 
>>> Anyway, I believe Nadav has some patches that converts ftrace to use
>>> the shadow page modification trick somewhere.  
>> 
>> For the record - here is my previous patch:
>> https://lkml.org/lkml/2018/12/5/211
> 
> FYI, when referencing older patches, please use lkml.kernel.org or
> lore.kernel.org, lkml.org is slow and obsolete.
> 
> ie. http://lkml.kernel.org/r/20181205013408.47725-9-namit@vmware.com

Will do so next time.
Song Liu Aug. 26, 2019, 8:50 p.m. UTC | #11
> On Aug 26, 2019, at 8:08 AM, Song Liu <songliubraving@fb.com> wrote:
> 
> 
> 
>> On Aug 26, 2019, at 2:23 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> So only the high mapping is ever executable; the identity map should not
>> be. Both should be RO.
>> 
>>> kprobe (with CONFIG_KPROBES_ON_FTRACE) should work on kernel identity
>>> mapping. 
>> 
>> Please provide more information; kprobes shouldn't be touching either
>> mapping. That is, afaict kprobes uses text_poke() which uses a temporary
>> mapping (in 'userspace' even) to alias the high text mapping.
> 
> kprobe without CONFIG_KPROBES_ON_FTRACE uses text_poke(). But kprobe with
> CONFIG_KPROBES_ON_FTRACE uses another path. The split happens with
> set_kernel_text_rw() -> ... -> __change_page_attr() -> split_large_page().
> The split is introduced by commit 585948f4f695. do_split in 
> __change_page_attr() becomes true after commit 585948f4f695. This patch 
> tries to fix/workaround this part. 
> 
>> 
>> I'm also not sure how it would then result in any 4k text maps. Yes the
>> alias is 4k, but it should not affect the actual high text map in any
>> way.
> 
> I am confused by the alias logic. set_kernel_text_rw() makes the high map
> rw, and split the PMD in the high map. 
> 
>> 
>> kprobes also allocates executable slots, but it does that in the module
>> range (afaict), so that, again, should not affect the high text mapping.
>> 
>>> We found with 5.2 kernel (no CONFIG_PAGE_TABLE_ISOLATION, w/ 
>>> CONFIG_KPROBES_ON_FTRACE), a single kprobe will split _all_ PMDs in 
>>> kernel text mapping into pte-mapped pages. This increases iTLB 
>>> miss rate from about 300 per million instructions to about 700 per
>>> million instructions (for the application I test with). 
>>> 
>>> Per bisect, we found this behavior happens after commit 585948f4f695 
>>> ("x86/mm/cpa: Avoid the 4k pages check completely"). That's why I 
>>> proposed this PATCH to fix/workaround this issue. However, per
>>> Peter's comment and my study of the code, this doesn't seem the 
>>> real problem or the only here. 
>>> 
>>> I also tested that the PMD split issue doesn't happen w/o 
>>> CONFIG_KPROBES_ON_FTRACE. 
>> 
>> Right, because then ftrace doesn't flip the whole kernel map writable;
>> which it _really_ should stop doing anyway.
>> 
>> But I'm still wondering what causes that first 4k split...
> 
> Please see above. 

Another data point: we can repro the issue on Linus's master with just
ftrace:

# start with PMD mapped
root@virt-test:~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
0xffffffff81000000-0xffffffff81c00000          12M     ro         PSE         x  pmd

# enable single ftrace
root@virt-test:~# echo consume_skb > /sys/kernel/debug/tracing/set_ftrace_filter
root@virt-test:~# echo function > /sys/kernel/debug/tracing/current_tracer

# now the text is PTE mapped
root@virt-test:~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
0xffffffff81000000-0xffffffff81c00000          12M     ro                     x  pte
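To check for the split programmatically rather than by eye, one can take the last column of the matching page_tables line, which in the dumps above is the mapping level ("pmd" or "pte"). A minimal sketch, assuming that column layout; the debugfs format may differ between kernel versions, and reading the file needs root and a kernel with the page-table dump enabled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy the last whitespace-separated token of `line` (e.g. "pmd" or "pte")
 * into `out` of size `out_len`. Returns 0 on success, -1 if there is no
 * token or it does not fit. */
static int page_table_level(const char *line, char *out, size_t out_len)
{
	size_t end = strlen(line);
	size_t start;

	/* Skip trailing whitespace, including a newline left by fgets(). */
	while (end > 0 && isspace((unsigned char)line[end - 1]))
		end--;
	start = end;
	while (start > 0 && !isspace((unsigned char)line[start - 1]))
		start--;

	if (start == end || end - start >= out_len)
		return -1;
	memcpy(out, line + start, end - start);
	out[end - start] = '\0';
	return 0;
}
```

Applied to the two lines above, it yields "pmd" before enabling the tracer and "pte" afterwards.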

Song

Patch

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a6b5c653727b..5745fdcc429e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1276,7 +1276,7 @@  void set_kernel_text_rw(void)
 	 * mapping will always be RO. Refer to the comment in
 	 * static_protections() in pageattr.c
 	 */
-	set_memory_rw(start, (end - start) >> PAGE_SHIFT);
+	set_text_rw(start, (end - start) >> PAGE_SHIFT);
 }
 
 void set_kernel_text_ro(void)
@@ -1293,7 +1293,7 @@  void set_kernel_text_ro(void)
 	/*
 	 * Set the kernel identity mapping for text RO.
 	 */
-	set_memory_ro(start, (end - start) >> PAGE_SHIFT);
+	set_text_ro(start, (end - start) >> PAGE_SHIFT);
 }
 
 void mark_rodata_ro(void)
diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h
index eeae142062ed..65b84b471770 100644
--- a/arch/x86/mm/mm_internal.h
+++ b/arch/x86/mm/mm_internal.h
@@ -24,4 +24,8 @@  void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache);
 
 extern unsigned long tlb_single_page_flush_ceiling;
 
+int set_text_rw(unsigned long addr, int numpages);
+
+int set_text_ro(unsigned long addr, int numpages);
+
 #endif	/* __X86_MM_INTERNAL_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 6a9a77a403c9..44a885df776d 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -66,6 +66,7 @@  static DEFINE_SPINLOCK(cpa_lock);
 #define CPA_ARRAY 2
 #define CPA_PAGES_ARRAY 4
 #define CPA_NO_CHECK_ALIAS 8 /* Do not search for aliases */
+#define CPA_FLIP_TEXT_RW 0x10 /* allow flip _PAGE_RW for kernel text */
 
 #ifdef CONFIG_PROC_FS
 static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -516,7 +517,7 @@  static inline void check_conflict(int warnlvl, pgprot_t prot, pgprotval_t val,
  */
 static inline pgprot_t static_protections(pgprot_t prot, unsigned long start,
 					  unsigned long pfn, unsigned long npg,
-					  int warnlvl)
+					  int warnlvl, unsigned int cpa_flags)
 {
 	pgprotval_t forbidden, res;
 	unsigned long end;
@@ -535,9 +536,11 @@  static inline pgprot_t static_protections(pgprot_t prot, unsigned long start,
 	check_conflict(warnlvl, prot, res, start, end, pfn, "Text NX");
 	forbidden = res;
 
-	res = protect_kernel_text_ro(start, end);
-	check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO");
-	forbidden |= res;
+	if (!(cpa_flags & CPA_FLIP_TEXT_RW)) {
+		res = protect_kernel_text_ro(start, end);
+		check_conflict(warnlvl, prot, res, start, end, pfn, "Text RO");
+		forbidden |= res;
+	}
 
 	/* Check the PFN directly */
 	res = protect_pci_bios(pfn, pfn + npg - 1);
@@ -819,7 +822,7 @@  static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	 * extra conditional required here.
 	 */
 	chk_prot = static_protections(old_prot, lpaddr, old_pfn, numpages,
-				      CPA_CONFLICT);
+				      CPA_CONFLICT, cpa->flags);
 
 	if (WARN_ON_ONCE(pgprot_val(chk_prot) != pgprot_val(old_prot))) {
 		/*
@@ -855,7 +858,7 @@  static int __should_split_large_page(pte_t *kpte, unsigned long address,
 	 * protection requirement in the large page.
 	 */
 	new_prot = static_protections(req_prot, lpaddr, old_pfn, numpages,
-				      CPA_DETECT);
+				      CPA_DETECT, cpa->flags);
 
 	/*
 	 * If there is a conflict, split the large page.
@@ -906,7 +909,7 @@  static void split_set_pte(struct cpa_data *cpa, pte_t *pte, unsigned long pfn,
 	if (!cpa->force_static_prot)
 		goto set;
 
-	prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT);
+	prot = static_protections(ref_prot, address, pfn, npg, CPA_PROTECT, 0);
 
 	if (pgprot_val(prot) == pgprot_val(ref_prot))
 		goto set;
@@ -1504,7 +1507,7 @@  static int __change_page_attr(struct cpa_data *cpa, int primary)
 
 		cpa_inc_4k_install();
 		new_prot = static_protections(new_prot, address, pfn, 1,
-					      CPA_PROTECT);
+					      CPA_PROTECT, 0);
 
 		new_prot = pgprot_clear_protnone_bits(new_prot);
 
@@ -1707,7 +1710,7 @@  static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 	cpa.curpage = 0;
 	cpa.force_split = force_split;
 
-	if (in_flag & (CPA_ARRAY | CPA_PAGES_ARRAY))
+	if (in_flag & (CPA_ARRAY | CPA_PAGES_ARRAY | CPA_FLIP_TEXT_RW))
 		cpa.flags |= in_flag;
 
 	/* No alias checking for _NX bit modifications */
@@ -1983,11 +1986,24 @@  int set_memory_ro(unsigned long addr, int numpages)
 	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
 }
 
+int set_text_ro(unsigned long addr, int numpages)
+{
+	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
+					__pgprot(_PAGE_RW), 0, CPA_FLIP_TEXT_RW,
+					NULL);
+}
+
 int set_memory_rw(unsigned long addr, int numpages)
 {
 	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_RW), 0);
 }
 
+int set_text_rw(unsigned long addr, int numpages)
+{
+	return change_page_attr_set_clr(&addr, numpages, __pgprot(_PAGE_RW),
+					__pgprot(0), 0, CPA_FLIP_TEXT_RW, NULL);
+}
+
 int set_memory_np(unsigned long addr, int numpages)
 {
 	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);