diff mbox

Bug report about KASLR and ZONE_MOVABLE

Message ID 20180712235240.GH2070@MiWiFi-R3L-srv (mailing list archive)
State New, archived
Headers show

Commit Message

Baoquan He July 12, 2018, 11:52 p.m. UTC
Hi Michal,

On 07/12/18 at 02:32pm, Michal Hocko wrote:
> On Thu 12-07-18 14:01:15, Chao Fan wrote:
> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
> > >Hi Baoquan,
> > >
> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
> > >> Please try this v3 patch:
> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
> > >> From: Baoquan He <bhe@redhat.com>
> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
> > >> 
> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
> > >> PFN movable zone begins in each node, kernel text position is not
> > >> considered. KASLR may put kernel after which movable zone begins.
> > >> 
> > >> Fix it by finding movable zone after kernel text on that node.
> > >> 
> > >> Signed-off-by: Baoquan He <bhe@redhat.com>
> > >
> > >
> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or
> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the
> > >last node, or more.
> > 
> > I think it may not fail.
> > There is a 'restart' to do another pass.
> > 
> > >
> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying
> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix
> > >this in the KASLR side.
> > >
> > 
> > I think it's better to fix here, but not KASLR side.
> > Cause much more code will be change if doing it in KASLR side.
> > Since we didn't parse 'kernelcore' in compressed code, and you can see
> > the distribution of ZONE_MOVABLE need so much code, so we do not need
> > to do so much job in KASLR side. But here, several lines will be OK.
> 
> I am not able to find the beginning of the email thread right now. Could
> you summarize what is the actual problem please?

The bug is found on x86 now. 

When added "kernelcore=" or "movablecore=" into kernel command line,
kernel memory is spread evenly among nodes. However, this is right when
KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
If KASLR enabled, it could be put any place from 16M to 64T randomly.
 
Consider a scenario, we have 10 nodes, and each node has 20G memory, and
we specify "kernelcore=50%", means each node will take 10G for
kernelcore, 10G for movable area. But this doesn't take kernel position
into consideration. E.g if kernel is put at 15G of 2nd node, namely
node1. Then we think on node1 there's 10G for kernelcore, 10G for
movable, in fact there's only 5G available for movable, just after
kernel.

I made a v4 patch which possibly can fix it.


From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
From: Baoquan He <bhe@redhat.com>
Date: Fri, 13 Jul 2018 07:49:29 +0800
Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text

In find_zone_movable_pfns_for_nodes(), when try to find the starting
PFN movable zone begins at in each node, kernel text position is not
considered. KASLR may put kernel after which movable zone begins.

Fix it by finding movable zone after kernel text on that node.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Chao Fan July 13, 2018, 1:44 a.m. UTC | #1
On Fri, Jul 13, 2018 at 07:52:40AM +0800, Baoquan He wrote:
>Hi Michal,
>
>On 07/12/18 at 02:32pm, Michal Hocko wrote:
>> On Thu 12-07-18 14:01:15, Chao Fan wrote:
>> > On Thu, Jul 12, 2018 at 01:49:49PM +0800, Dou Liyang wrote:
>> > >Hi Baoquan,
>> > >
>> > >At 07/11/2018 08:40 PM, Baoquan He wrote:
>> > >> Please try this v3 patch:
>> > >> >>From 9850d3de9c02e570dc7572069a9749a8add4c4c7 Mon Sep 17 00:00:00 2001
>> > >> From: Baoquan He <bhe@redhat.com>
>> > >> Date: Wed, 11 Jul 2018 20:31:51 +0800
>> > >> Subject: [PATCH v3] mm, page_alloc: find movable zone after kernel text
>> > >> 
>> > >> In find_zone_movable_pfns_for_nodes(), when try to find the starting
>> > >> PFN movable zone begins in each node, kernel text position is not
>> > >> considered. KASLR may put kernel after which movable zone begins.
>> > >> 
>> > >> Fix it by finding movable zone after kernel text on that node.
>> > >> 
>> > >> Signed-off-by: Baoquan He <bhe@redhat.com>
>> > >
>> > >
>> > >You fix this in the _zone_init side_. This may make the 'kernelcore=' or
>> > >'movablecore=' failed if the KASLR puts the kernel back the tail of the
>> > >last node, or more.
>> > 
>> > I think it may not fail.
>> > There is a 'restart' to do another pass.
>> > 
>> > >
>> > >Due to we have fix the mirror memory in KASLR side, and Chao is trying
>> > >to fix the 'movable_node' in KASLR side. Have you had a chance to fix
>> > >this in the KASLR side.
>> > >
>> > 
>> > I think it's better to fix here, but not KASLR side.
>> > Cause much more code will be change if doing it in KASLR side.
>> > Since we didn't parse 'kernelcore' in compressed code, and you can see
>> > the distribution of ZONE_MOVABLE need so much code, so we do not need
>> > to do so much job in KASLR side. But here, several lines will be OK.
>> 
>> I am not able to find the beginning of the email thread right now. Could
>> you summarize what is the actual problem please?
>
>The bug is found on x86 now. 
>
>When added "kernelcore=" or "movablecore=" into kernel command line,
>kernel memory is spread evenly among nodes. However, this is right when
>KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
>If KASLR enabled, it could be put any place from 16M to 64T randomly.
> 
>Consider a scenario, we have 10 nodes, and each node has 20G memory, and
>we specify "kernelcore=50%", means each node will take 10G for
>kernelcore, 10G for movable area. But this doesn't take kernel position
>into consideration. E.g if kernel is put at 15G of 2nd node, namely
>node1. Then we think on node1 there's 10G for kernelcore, 10G for
>movable, in fact there's only 5G available for movable, just after
>kernel.
>
>I made a v4 patch which possibly can fix it.
>
>
>From dbcac3631863aed556dc2c4ff1839772dfd02d18 Mon Sep 17 00:00:00 2001
>From: Baoquan He <bhe@redhat.com>
>Date: Fri, 13 Jul 2018 07:49:29 +0800
>Subject: [PATCH v4] mm, page_alloc: find movable zone after kernel text
>
>In find_zone_movable_pfns_for_nodes(), when try to find the starting
>PFN movable zone begins at in each node, kernel text position is not
>considered. KASLR may put kernel after which movable zone begins.
>
>Fix it by finding movable zone after kernel text on that node.
>
>Signed-off-by: Baoquan He <bhe@redhat.com>

You can post it as alone PATCH, then I will test it next week.

Thanks,
Chao Fan

>---
> mm/page_alloc.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 1521100f1e63..5bc1a47dafda 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -6547,7 +6547,7 @@ static unsigned long __init early_calculate_totalpages(void)
> static void __init find_zone_movable_pfns_for_nodes(void)
> {
> 	int i, nid;
>-	unsigned long usable_startpfn;
>+	unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
> 	unsigned long kernelcore_node, kernelcore_remaining;
> 	/* save the state before borrow the nodemask */
> 	nodemask_t saved_node_state = node_states[N_MEMORY];
>@@ -6649,8 +6649,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
> 	if (!required_kernelcore || required_kernelcore >= totalpages)
> 		goto out;
> 
>+	kernel_endpfn = PFN_UP(__pa_symbol(_end));
> 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
>-	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
>+	arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
> 
> restart:
> 	/* Spread kernelcore memory as evenly as possible throughout nodes */
>@@ -6659,6 +6660,16 @@ static void __init find_zone_movable_pfns_for_nodes(void)
> 		unsigned long start_pfn, end_pfn;
> 
> 		/*
>+		 * KASLR may put kernel near tail of node memory,
>+		 * start after kernel on that node to find PFN
>+		 * at which zone begins.
>+		 */
>+		if (pfn_to_nid(kernel_endpfn) == nid)
>+		        usable_startpfn = max(arch_startpfn, kernel_endpfn);
>+		else
>+		        usable_startpfn = arch_startpfn;
>+
>+		/*
> 		 * Recalculate kernelcore_node if the division per node
> 		 * now exceeds what is necessary to satisfy the requested
> 		 * amount of memory for the kernel
>-- 
>2.13.6
>
>
>
Michal Hocko July 16, 2018, 11:38 a.m. UTC | #2
On Fri 13-07-18 07:52:40, Baoquan He wrote:
> Hi Michal,
> 
> On 07/12/18 at 02:32pm, Michal Hocko wrote:
[...]
> > I am not able to find the beginning of the email thread right now. Could
> > you summarize what is the actual problem please?
> 
> The bug is found on x86 now. 
> 
> When added "kernelcore=" or "movablecore=" into kernel command line,
> kernel memory is spread evenly among nodes. However, this is right when
> KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> If KASLR enabled, it could be put any place from 16M to 64T randomly.
>  
> Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> we specify "kernelcore=50%", means each node will take 10G for
> kernelcore, 10G for movable area. But this doesn't take kernel position
> into consideration. E.g if kernel is put at 15G of 2nd node, namely
> node1. Then we think on node1 there's 10G for kernelcore, 10G for
> movable, in fact there's only 5G available for movable, just after
> kernel.

OK, I guess I see that part. But who is going to use movablecore along
with KASLR enabled? I mean do we really have to support those two
obscure command line parameters for KASLR?

In fact I would be much more concerned about memory hotplug and
pre-defined movable nodes. Does the current KASLR code work in that
case?
Baoquan He July 16, 2018, 1:02 p.m. UTC | #3
On 07/16/18 at 01:38pm, Michal Hocko wrote:
> On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > Hi Michal,
> > 
> > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> [...]
> > > I am not able to find the beginning of the email thread right now. Could
> > > you summarize what is the actual problem please?
> > 
> > The bug is found on x86 now. 
> > 
> > When added "kernelcore=" or "movablecore=" into kernel command line,
> > kernel memory is spread evenly among nodes. However, this is right when
> > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> >  
> > Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> > we specify "kernelcore=50%", means each node will take 10G for
> > kernelcore, 10G for movable area. But this doesn't take kernel position
> > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > movable, in fact there's only 5G available for movable, just after
> > kernel.
> 
> OK, I guess I see that part. But who is going to use movablecore along
> with KASLR enabled? I mean do we really have to support those two
> obscure command line parameters for KASLR?

Not very sure whether we have to support both of those to work with
KASLR. Maybe it's time to make clear of it now.

For 'kernelcore=mirror', we have solved the conflict to make it work well
with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
patches to fix it. As for 'kernelcore=' and 'movablecore=', 

1) solve the conflict between them with KASLR in
   find_zone_movable_pfns_for_nodes();
2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
4) add note in doc to notice people to not add them at the same time;

2) and 3) may need be fixed in arch/x86 code. As long as come to an
agreement, any one is fine to me.
> 
> In fact I would be much more concerned about memory hotplug and
> pre-defined movable nodes. Does the current KASLR code work in that
> case?

As said above, kernelcore=mirror works well with KASLR now. Making
'movable_node' work with KASLR is in progress.

Thanks
Baoquan
Michal Hocko July 16, 2018, 3:24 p.m. UTC | #4
On Mon 16-07-18 21:02:02, Baoquan He wrote:
> On 07/16/18 at 01:38pm, Michal Hocko wrote:
> > On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > > Hi Michal,
> > > 
> > > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> > [...]
> > > > I am not able to find the beginning of the email thread right now. Could
> > > > you summarize what is the actual problem please?
> > > 
> > > The bug is found on x86 now. 
> > > 
> > > When added "kernelcore=" or "movablecore=" into kernel command line,
> > > kernel memory is spread evenly among nodes. However, this is right when
> > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> > >  
> > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> > > we specify "kernelcore=50%", means each node will take 10G for
> > > kernelcore, 10G for movable area. But this doesn't take kernel position
> > > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > > movable, in fact there's only 5G available for movable, just after
> > > kernel.
> > 
> > OK, I guess I see that part. But who is going to use movablecore along
> > with KASLR enabled? I mean do we really have to support those two
> > obscure command line parameters for KASLR?
> 
> Not very sure whether we have to support both of those to work with
> KASLR. Maybe it's time to make clear of it now.

Yes, I would really like to deprecate this. It is an ugly piece of code
and it's far from easily maintainable as well.

> For 'kernelcore=mirror', we have solved the conflict to make it work well
> with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
> patches to fix it. As for 'kernelcore=' and 'movablecore=', 
> 
> 1) solve the conflict between them with KASLR in
>    find_zone_movable_pfns_for_nodes();
> 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
> 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
> 4) add note in doc to notice people to not add them at the same time;

I would simply warn that those kernel parameters are not supported
anymore. If somebody shows up with a valid usecase we can reconsider.

> 2) and 3) may need be fixed in arch/x86 code. As long as come to an
> agreement, any one is fine to me.
> > 
> > In fact I would be much more concerned about memory hotplug and
> > pre-defined movable nodes. Does the current KASLR code work in that
> > case?
> 
> As said above, kernelcore=mirror works well with KASLR now. Making
> 'movable_node' work with KASLR is in progress.

OK, thanks for the info.
Baoquan He July 17, 2018, 1:51 a.m. UTC | #5
On 07/16/18 at 05:24pm, Michal Hocko wrote:
> On Mon 16-07-18 21:02:02, Baoquan He wrote:
> > On 07/16/18 at 01:38pm, Michal Hocko wrote:
> > > On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > > > Hi Michal,
> > > > 
> > > > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> > > [...]
> > > > > I am not able to find the beginning of the email thread right now. Could
> > > > > you summarize what is the actual problem please?
> > > > 
> > > > The bug is found on x86 now. 
> > > > 
> > > > When added "kernelcore=" or "movablecore=" into kernel command line,
> > > > kernel memory is spread evenly among nodes. However, this is right when
> > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > > > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> > > >  
> > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> > > > we specify "kernelcore=50%", means each node will take 10G for
> > > > kernelcore, 10G for movable area. But this doesn't take kernel position
> > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > > > movable, in fact there's only 5G available for movable, just after
> > > > kernel.
> > > 
> > > OK, I guess I see that part. But who is going to use movablecore along
> > > with KASLR enabled? I mean do we really have to support those two
> > > obscure command line parameters for KASLR?
> > 
> > Not very sure whether we have to support both of those to work with
> > KASLR. Maybe it's time to make clear of it now.
> 
> Yes, I would really like to deprecate this. It is an ugly piece of code
> and it's far from easily maintainable as well.
> 
> > For 'kernelcore=mirror', we have solved the conflict to make it work well
> > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
> > patches to fix it. As for 'kernelcore=' and 'movablecore=', 
> > 
> > 1) solve the conflict between them with KASLR in
> >    find_zone_movable_pfns_for_nodes();
> > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
> > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
> > 4) add note in doc to notice people to not add them at the same time;
> 
> I would simply warn that those kernel parameters are not supported
> anymore. If somebody shows up with a valid usecase we can reconsider.

OK, got it. The use case I can think of is that people want to check 
hotplug on system w/o hotplug ACPI info.

I am fine with warning people they are not supported. Should I post a
patch to address this, or you will do it? Both is fine to me.

> 
> > 2) and 3) may need be fixed in arch/x86 code. As long as come to an
> > agreement, any one is fine to me.
> > > 
> > > In fact I would be much more concerned about memory hotplug and
> > > pre-defined movable nodes. Does the current KASLR code work in that
> > > case?
> > 
> > As said above, kernelcore=mirror works well with KASLR now. Making
> > 'movable_node' work with KASLR is in progress.
> 
> OK, thanks for the info.

You are welcome.

Thanks
Baoquan
Michal Hocko July 17, 2018, 8:22 a.m. UTC | #6
On Tue 17-07-18 09:51:20, Baoquan He wrote:
> On 07/16/18 at 05:24pm, Michal Hocko wrote:
> > On Mon 16-07-18 21:02:02, Baoquan He wrote:
> > > On 07/16/18 at 01:38pm, Michal Hocko wrote:
> > > > On Fri 13-07-18 07:52:40, Baoquan He wrote:
> > > > > Hi Michal,
> > > > > 
> > > > > On 07/12/18 at 02:32pm, Michal Hocko wrote:
> > > > [...]
> > > > > > I am not able to find the beginning of the email thread right now. Could
> > > > > > you summarize what is the actual problem please?
> > > > > 
> > > > > The bug is found on x86 now. 
> > > > > 
> > > > > When added "kernelcore=" or "movablecore=" into kernel command line,
> > > > > kernel memory is spread evenly among nodes. However, this is right when
> > > > > KASLR is not enabled, then kernel will be at 16M of place in x86 arch.
> > > > > If KASLR enabled, it could be put any place from 16M to 64T randomly.
> > > > >  
> > > > > Consider a scenario, we have 10 nodes, and each node has 20G memory, and
> > > > > we specify "kernelcore=50%", means each node will take 10G for
> > > > > kernelcore, 10G for movable area. But this doesn't take kernel position
> > > > > into consideration. E.g if kernel is put at 15G of 2nd node, namely
> > > > > node1. Then we think on node1 there's 10G for kernelcore, 10G for
> > > > > movable, in fact there's only 5G available for movable, just after
> > > > > kernel.
> > > > 
> > > > OK, I guess I see that part. But who is going to use movablecore along
> > > > with KASLR enabled? I mean do we really have to support those two
> > > > obscure command line parameters for KASLR?
> > > 
> > > Not very sure whether we have to support both of those to work with
> > > KASLR. Maybe it's time to make clear of it now.
> > 
> > Yes, I would really like to deprecate this. It is an ugly piece of code
> > and it's far from easily maintainable as well.
> > 
> > > For 'kernelcore=mirror', we have solved the conflict to make it work well
> > > with KASLR. For 'movable_node' conflict with KASLR, Chao is posting
> > > patches to fix it. As for 'kernelcore=' and 'movablecore=', 
> > > 
> > > 1) solve the conflict between them with KASLR in
> > >    find_zone_movable_pfns_for_nodes();
> > > 2) disable KASLR when 'kernelcore=' | 'movablecore=' is set;
> > > 3) disable 'kernelcore=' | 'movablecore=' when KASLR is enabled;
> > > 4) add note in doc to notice people to not add them at the same time;
> > 
> > I would simply warn that those kernel parameters are not supported
> > anymore. If somebody shows up with a valid usecase we can reconsider.
> 
> OK, got it. The use case I can think of is that people want to check 
> hotplug on system w/o hotplug ACPI info.

Let's see whether there is really somebody like that and complain.
 
> I am fine with warning people they are not supported. Should I post a
> patch to address this, or you will do it? Both is fine to me.

I will happily ack it if you create a patch.
diff mbox

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..5bc1a47dafda 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6547,7 +6547,7 @@  static unsigned long __init early_calculate_totalpages(void)
 static void __init find_zone_movable_pfns_for_nodes(void)
 {
 	int i, nid;
-	unsigned long usable_startpfn;
+	unsigned long usable_startpfn, kernel_endpfn, arch_startpfn;
 	unsigned long kernelcore_node, kernelcore_remaining;
 	/* save the state before borrow the nodemask */
 	nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -6649,8 +6649,9 @@  static void __init find_zone_movable_pfns_for_nodes(void)
 	if (!required_kernelcore || required_kernelcore >= totalpages)
 		goto out;
 
+	kernel_endpfn = PFN_UP(__pa_symbol(_end));
 	/* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */
-	usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
+	arch_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
 
 restart:
 	/* Spread kernelcore memory as evenly as possible throughout nodes */
@@ -6659,6 +6660,16 @@  static void __init find_zone_movable_pfns_for_nodes(void)
 		unsigned long start_pfn, end_pfn;
 
 		/*
+		 * KASLR may put kernel near tail of node memory,
+		 * start after kernel on that node to find PFN
+		 * at which zone begins.
+		 */
+		if (pfn_to_nid(kernel_endpfn) == nid)
+		        usable_startpfn = max(arch_startpfn, kernel_endpfn);
+		else
+		        usable_startpfn = arch_startpfn;
+
+		/*
 		 * Recalculate kernelcore_node if the division per node
 		 * now exceeds what is necessary to satisfy the requested
 		 * amount of memory for the kernel