[v2] Call xen_cleanhighmap() with 4MB aligned for page tables mapping

Message ID 85bd42d5-b0d2-40f5-81a9-14cb51ec4503@default (mailing list archive)
State New, archived

Commit Message

Zhenzhong Duan Sept. 27, 2017, 9:41 a.m. UTC
When booting a PVM guest with large memory (e.g. 240GB), the initial mapping
provided by Xen overlaps with the kernel module virtual address space. When the
mapping in this space is cleared by xen_cleanhighmap(), in certain cases a 2MB
mapping can be left behind. This is because Xen creates the initial mapping
4MB aligned, but xen_cleanhighmap() finishes at a 2MB boundary.

When a module is loaded just on top of that 2MB space, the following warning
is triggered:

WARNING: at mm/vmalloc.c:106 vmap_pte_range+0x14e/0x190()
Call Trace:
 [<ffffffff81117083>] warn_alloc_failed+0xf3/0x160
 [<ffffffff81146022>] __vmalloc_area_node+0x182/0x1c0
 [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
 [<ffffffff81145df7>] __vmalloc_node_range+0xa7/0x110
 [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
 [<ffffffff8103ca54>] module_alloc+0x64/0x70
 [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
 [<ffffffff810ac91e>] module_alloc_update_bounds+0x1e/0x80
 [<ffffffff810ac9a7>] move_module+0x27/0x150
 [<ffffffff810aefa0>] layout_and_allocate+0x120/0x1b0
 [<ffffffff810af0a8>] load_module+0x78/0x640
 [<ffffffff811ff90b>] ? security_file_permission+0x8b/0x90
 [<ffffffff810af6d2>] sys_init_module+0x62/0x1e0
 [<ffffffff815154c2>] system_call_fastpath+0x16/0x1b

Then the 2MB mapping is cleared, and finally an oops occurs when a page in
that space is accessed.

BUG: unable to handle kernel paging request at ffff880022600000
IP: [<ffffffff81260877>] clear_page_c_e+0x7/0x10
PGD 1788067 PUD 178c067 PMD 22434067 PTE 0
Oops: 0002 [#1] SMP
Call Trace:
 [<ffffffff81116ef7>] ? prep_new_page+0x127/0x1c0
 [<ffffffff81117d42>] get_page_from_freelist+0x1e2/0x550
 [<ffffffff81133010>] ? ii_iovec_copy_to_user+0x90/0x140
 [<ffffffff81119c9d>] __alloc_pages_nodemask+0x12d/0x230
 [<ffffffff81155516>] alloc_pages_vma+0xc6/0x1a0
 [<ffffffff81006ffd>] ? pte_mfn_to_pfn+0x7d/0x100
 [<ffffffff81134cfb>] do_anonymous_page+0x16b/0x350
 [<ffffffff81139c34>] handle_pte_fault+0x1e4/0x200
 [<ffffffff8100712e>] ? xen_pmd_val+0xe/0x10
 [<ffffffff810052c9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81139dab>] handle_mm_fault+0x15b/0x270
 [<ffffffff81510c10>] do_page_fault+0x140/0x470
 [<ffffffff8150d7d5>] page_fault+0x25/0x30

Call xen_cleanhighmap() with a 4MB-aligned end address for the page tables
mapping to fix it. The unnecessary call of xen_cleanhighmap() in DEBUG mode is
also removed.

-v2: add comment about XEN alignment from Juergen.

References: https://lists.xen.org/archives/html/xen-devel/2012-07/msg01562.html
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
---
 arch/x86/xen/mmu_pv.c |   13 ++++---------
 1 files changed, 4 insertions(+), 9 deletions(-)

Comments

Juergen Gross Sept. 27, 2017, 9:43 a.m. UTC | #1
On 27/09/17 11:41, Zhenzhong Duan wrote:
> When bootup a PVM guest with large memory(Ex.240GB), XEN provided initial
> mapping overlaps with kernel module virtual space. When mapping in this space
> is cleared by xen_cleanhighmap(), in certain case there could be an 2MB mapping
> left. This is due to XEN initialize 4MB aligned mapping but xen_cleanhighmap()
> finish at 2MB boundary.
> 
> When module loading is just on top of the 2MB space, got below warning:
> 
> WARNING: at mm/vmalloc.c:106 vmap_pte_range+0x14e/0x190()
> Call Trace:
>  [<ffffffff81117083>] warn_alloc_failed+0xf3/0x160
>  [<ffffffff81146022>] __vmalloc_area_node+0x182/0x1c0
>  [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
>  [<ffffffff81145df7>] __vmalloc_node_range+0xa7/0x110
>  [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
>  [<ffffffff8103ca54>] module_alloc+0x64/0x70
>  [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
>  [<ffffffff810ac91e>] module_alloc_update_bounds+0x1e/0x80
>  [<ffffffff810ac9a7>] move_module+0x27/0x150
>  [<ffffffff810aefa0>] layout_and_allocate+0x120/0x1b0
>  [<ffffffff810af0a8>] load_module+0x78/0x640
>  [<ffffffff811ff90b>] ? security_file_permission+0x8b/0x90
>  [<ffffffff810af6d2>] sys_init_module+0x62/0x1e0
>  [<ffffffff815154c2>] system_call_fastpath+0x16/0x1b
> 
> Then the mapping of 2MB is cleared, finally oops when the page in that space is
> accessed.
> 
> BUG: unable to handle kernel paging request at ffff880022600000
> IP: [<ffffffff81260877>] clear_page_c_e+0x7/0x10
> PGD 1788067 PUD 178c067 PMD 22434067 PTE 0
> Oops: 0002 [#1] SMP
> Call Trace:
>  [<ffffffff81116ef7>] ? prep_new_page+0x127/0x1c0
>  [<ffffffff81117d42>] get_page_from_freelist+0x1e2/0x550
>  [<ffffffff81133010>] ? ii_iovec_copy_to_user+0x90/0x140
>  [<ffffffff81119c9d>] __alloc_pages_nodemask+0x12d/0x230
>  [<ffffffff81155516>] alloc_pages_vma+0xc6/0x1a0
>  [<ffffffff81006ffd>] ? pte_mfn_to_pfn+0x7d/0x100
>  [<ffffffff81134cfb>] do_anonymous_page+0x16b/0x350
>  [<ffffffff81139c34>] handle_pte_fault+0x1e4/0x200
>  [<ffffffff8100712e>] ? xen_pmd_val+0xe/0x10
>  [<ffffffff810052c9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
>  [<ffffffff81139dab>] handle_mm_fault+0x15b/0x270
>  [<ffffffff81510c10>] do_page_fault+0x140/0x470
>  [<ffffffff8150d7d5>] page_fault+0x25/0x30
> 
> Call xen_cleanhighmap() with 4MB aligned for page tables mapping to fix it.
> The unnecessory call of xen_cleanhighmap() in DEBUG mode is also removed.
> 
> -v2: add comment about XEN alignment from Juergen.
> 
> References: https://lists.xen.org/archives/html/xen-devel/2012-07/msg01562.html
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen
Boris Ostrovsky Sept. 27, 2017, 1:38 p.m. UTC | #2
On 09/27/2017 05:43 AM, Juergen Gross wrote:
> On 27/09/17 11:41, Zhenzhong Duan wrote:
>> When bootup a PVM guest with large memory(Ex.240GB), XEN provided initial
>> mapping overlaps with kernel module virtual space. When mapping in this space
>> is cleared by xen_cleanhighmap(), in certain case there could be an 2MB mapping
>> left. This is due to XEN initialize 4MB aligned mapping but xen_cleanhighmap()
>> finish at 2MB boundary.

Does this mapping need to be 4MB-aligned?

(I also think this should go to stable trees)

-boris

>>
>> When module loading is just on top of the 2MB space, got below warning:
>>
>> WARNING: at mm/vmalloc.c:106 vmap_pte_range+0x14e/0x190()
>> Call Trace:
>>  [<ffffffff81117083>] warn_alloc_failed+0xf3/0x160
>>  [<ffffffff81146022>] __vmalloc_area_node+0x182/0x1c0
>>  [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
>>  [<ffffffff81145df7>] __vmalloc_node_range+0xa7/0x110
>>  [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
>>  [<ffffffff8103ca54>] module_alloc+0x64/0x70
>>  [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
>>  [<ffffffff810ac91e>] module_alloc_update_bounds+0x1e/0x80
>>  [<ffffffff810ac9a7>] move_module+0x27/0x150
>>  [<ffffffff810aefa0>] layout_and_allocate+0x120/0x1b0
>>  [<ffffffff810af0a8>] load_module+0x78/0x640
>>  [<ffffffff811ff90b>] ? security_file_permission+0x8b/0x90
>>  [<ffffffff810af6d2>] sys_init_module+0x62/0x1e0
>>  [<ffffffff815154c2>] system_call_fastpath+0x16/0x1b
>>
>> Then the mapping of 2MB is cleared, finally oops when the page in that space is
>> accessed.
>>
>> BUG: unable to handle kernel paging request at ffff880022600000
>> IP: [<ffffffff81260877>] clear_page_c_e+0x7/0x10
>> PGD 1788067 PUD 178c067 PMD 22434067 PTE 0
>> Oops: 0002 [#1] SMP
>> Call Trace:
>>  [<ffffffff81116ef7>] ? prep_new_page+0x127/0x1c0
>>  [<ffffffff81117d42>] get_page_from_freelist+0x1e2/0x550
>>  [<ffffffff81133010>] ? ii_iovec_copy_to_user+0x90/0x140
>>  [<ffffffff81119c9d>] __alloc_pages_nodemask+0x12d/0x230
>>  [<ffffffff81155516>] alloc_pages_vma+0xc6/0x1a0
>>  [<ffffffff81006ffd>] ? pte_mfn_to_pfn+0x7d/0x100
>>  [<ffffffff81134cfb>] do_anonymous_page+0x16b/0x350
>>  [<ffffffff81139c34>] handle_pte_fault+0x1e4/0x200
>>  [<ffffffff8100712e>] ? xen_pmd_val+0xe/0x10
>>  [<ffffffff810052c9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
>>  [<ffffffff81139dab>] handle_mm_fault+0x15b/0x270
>>  [<ffffffff81510c10>] do_page_fault+0x140/0x470
>>  [<ffffffff8150d7d5>] page_fault+0x25/0x30
>>
>> Call xen_cleanhighmap() with 4MB aligned for page tables mapping to fix it.
>> The unnecessory call of xen_cleanhighmap() in DEBUG mode is also removed.
>>
>> -v2: add comment about XEN alignment from Juergen.
>>
>> References: https://lists.xen.org/archives/html/xen-devel/2012-07/msg01562.html
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
>
>
> Juergen
Juergen Gross Sept. 27, 2017, 2:33 p.m. UTC | #3
On 27/09/17 15:38, Boris Ostrovsky wrote:
> On 09/27/2017 05:43 AM, Juergen Gross wrote:
>> On 27/09/17 11:41, Zhenzhong Duan wrote:
>>> When bootup a PVM guest with large memory(Ex.240GB), XEN provided initial
>>> mapping overlaps with kernel module virtual space. When mapping in this space
>>> is cleared by xen_cleanhighmap(), in certain case there could be an 2MB mapping
>>> left. This is due to XEN initialize 4MB aligned mapping but xen_cleanhighmap()
>>> finish at 2MB boundary.
> 
> Does this mapping need to be 4MB-aligned?

I guess you are questioning the alignment of addr to be 4MB?
In this case you are right: the end of the mapping is 4MB aligned, as
correctly stated in the comment added.

> (I also think this should go to stable trees)

Indeed.


Juergen
Boris Ostrovsky Sept. 27, 2017, 2:48 p.m. UTC | #4
On 09/27/2017 10:33 AM, Juergen Gross wrote:
> On 27/09/17 15:38, Boris Ostrovsky wrote:
>> On 09/27/2017 05:43 AM, Juergen Gross wrote:
>>> On 27/09/17 11:41, Zhenzhong Duan wrote:
>>>> When bootup a PVM guest with large memory(Ex.240GB), XEN provided initial
>>>> mapping overlaps with kernel module virtual space. When mapping in this space
>>>> is cleared by xen_cleanhighmap(), in certain case there could be an 2MB mapping
>>>> left. This is due to XEN initialize 4MB aligned mapping but xen_cleanhighmap()
>>>> finish at 2MB boundary.
>> Does this mapping need to be 4MB-aligned?
> I guess you are questioning the alignment of addr to be 4MB?
> In this case you are right: the end of the mapping is 4MB aligned, as
> correctly stated in the comment added.

Yes, and my question is why does it need to be aligned on 4MB. Doesn't
2MB alignment suffice?

-boris
Juergen Gross Sept. 27, 2017, 2:56 p.m. UTC | #5
On 27/09/17 16:48, Boris Ostrovsky wrote:
> On 09/27/2017 10:33 AM, Juergen Gross wrote:
>> On 27/09/17 15:38, Boris Ostrovsky wrote:
>>> On 09/27/2017 05:43 AM, Juergen Gross wrote:
>>>> On 27/09/17 11:41, Zhenzhong Duan wrote:
>>>>> When bootup a PVM guest with large memory(Ex.240GB), XEN provided initial
>>>>> mapping overlaps with kernel module virtual space. When mapping in this space
>>>>> is cleared by xen_cleanhighmap(), in certain case there could be an 2MB mapping
>>>>> left. This is due to XEN initialize 4MB aligned mapping but xen_cleanhighmap()
>>>>> finish at 2MB boundary.
>>> Does this mapping need to be 4MB-aligned?
>> I guess you are questioning the alignment of addr to be 4MB?
>> In this case you are right: the end of the mapping is 4MB aligned, as
>> correctly stated in the comment added.
> 
> Yes, and my question is why does it need to be aligned on 4MB. Doesn't
> 2MB alignment suffice?

I believe this has historical reasons. :-)

For this patch the answer doesn't matter, as Xen does it this way and
the kernel has to cope with the situation.

This interface is specified in include/xen/interface/xen.h in the
comment section just before struct start_info:

/*
 * Start-of-day memory layout
 *
 *  1. The domain is started within contiguous virtual-memory region.
 *  2. The contiguous region begins and ends on an aligned 4MB boundary.
...


Juergen
Boris Ostrovsky Sept. 27, 2017, 3:02 p.m. UTC | #6
On 09/27/2017 10:56 AM, Juergen Gross wrote:
> On 27/09/17 16:48, Boris Ostrovsky wrote:
>> On 09/27/2017 10:33 AM, Juergen Gross wrote:
>>> On 27/09/17 15:38, Boris Ostrovsky wrote:
>>>> On 09/27/2017 05:43 AM, Juergen Gross wrote:
>>>>> On 27/09/17 11:41, Zhenzhong Duan wrote:
>>>>>> When bootup a PVM guest with large memory(Ex.240GB), XEN provided initial
>>>>>> mapping overlaps with kernel module virtual space. When mapping in this space
>>>>>> is cleared by xen_cleanhighmap(), in certain case there could be an 2MB mapping
>>>>>> left. This is due to XEN initialize 4MB aligned mapping but xen_cleanhighmap()
>>>>>> finish at 2MB boundary.
>>>> Does this mapping need to be 4MB-aligned?
>>> I guess you are questioning the alignment of addr to be 4MB?
>>> In this case you are right: the end of the mapping is 4MB aligned, as
>>> correctly stated in the comment added.
>> Yes, and my question is why does it need to be aligned on 4MB. Doesn't
>> 2MB alignment suffice?
> I believe this has historical reasons. :-)
>
> For this patch the answer doesn't matter, as Xen does it this way and
> the kernel has to cope with the situation.
>
> This interface is specified in include/xen/interface/xen.h in the
> comment section just before struct start_info:
>
> /*
>  * Start-of-day memory layout
>  *
>  *  1. The domain is started within contiguous virtual-memory region.
>  *  2. The contiguous region begins and ends on an aligned 4MB boundary.

Ah, this is what I was really looking for --- that 4MB alignment is part
of the ABI.

-boris
Andrew Cooper Sept. 27, 2017, 3:03 p.m. UTC | #7
On 27/09/17 15:56, Juergen Gross wrote:
> On 27/09/17 16:48, Boris Ostrovsky wrote:
>> On 09/27/2017 10:33 AM, Juergen Gross wrote:
>>> On 27/09/17 15:38, Boris Ostrovsky wrote:
>>>> On 09/27/2017 05:43 AM, Juergen Gross wrote:
>>>>> On 27/09/17 11:41, Zhenzhong Duan wrote:
>>>>>> When bootup a PVM guest with large memory(Ex.240GB), XEN provided initial
>>>>>> mapping overlaps with kernel module virtual space. When mapping in this space
>>>>>> is cleared by xen_cleanhighmap(), in certain case there could be an 2MB mapping
>>>>>> left. This is due to XEN initialize 4MB aligned mapping but xen_cleanhighmap()
>>>>>> finish at 2MB boundary.
>>>> Does this mapping need to be 4MB-aligned?
>>> I guess you are questioning the alignment of addr to be 4MB?
>>> In this case you are right: the end of the mapping is 4MB aligned, as
>>> correctly stated in the comment added.
>> Yes, and my question is why does it need to be aligned on 4MB. Doesn't
>> 2MB alignment suffice?
> I believe this has historical reasons. :-)

Back in the day, superpages had 4M alignment.

~Andrew
Boris Ostrovsky Sept. 27, 2017, 8:01 p.m. UTC | #8
On 09/27/2017 05:43 AM, Juergen Gross wrote:
> On 27/09/17 11:41, Zhenzhong Duan wrote:
>> When bootup a PVM guest with large memory(Ex.240GB), XEN provided initial
>> mapping overlaps with kernel module virtual space. When mapping in this space
>> is cleared by xen_cleanhighmap(), in certain case there could be an 2MB mapping
>> left. This is due to XEN initialize 4MB aligned mapping but xen_cleanhighmap()
>> finish at 2MB boundary.
>>
>> When module loading is just on top of the 2MB space, got below warning:
>>
>> WARNING: at mm/vmalloc.c:106 vmap_pte_range+0x14e/0x190()
>> Call Trace:
>>  [<ffffffff81117083>] warn_alloc_failed+0xf3/0x160
>>  [<ffffffff81146022>] __vmalloc_area_node+0x182/0x1c0
>>  [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
>>  [<ffffffff81145df7>] __vmalloc_node_range+0xa7/0x110
>>  [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
>>  [<ffffffff8103ca54>] module_alloc+0x64/0x70
>>  [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
>>  [<ffffffff810ac91e>] module_alloc_update_bounds+0x1e/0x80
>>  [<ffffffff810ac9a7>] move_module+0x27/0x150
>>  [<ffffffff810aefa0>] layout_and_allocate+0x120/0x1b0
>>  [<ffffffff810af0a8>] load_module+0x78/0x640
>>  [<ffffffff811ff90b>] ? security_file_permission+0x8b/0x90
>>  [<ffffffff810af6d2>] sys_init_module+0x62/0x1e0
>>  [<ffffffff815154c2>] system_call_fastpath+0x16/0x1b
>>
>> Then the mapping of 2MB is cleared, finally oops when the page in that space is
>> accessed.
>>
>> BUG: unable to handle kernel paging request at ffff880022600000
>> IP: [<ffffffff81260877>] clear_page_c_e+0x7/0x10
>> PGD 1788067 PUD 178c067 PMD 22434067 PTE 0
>> Oops: 0002 [#1] SMP
>> Call Trace:
>>  [<ffffffff81116ef7>] ? prep_new_page+0x127/0x1c0
>>  [<ffffffff81117d42>] get_page_from_freelist+0x1e2/0x550
>>  [<ffffffff81133010>] ? ii_iovec_copy_to_user+0x90/0x140
>>  [<ffffffff81119c9d>] __alloc_pages_nodemask+0x12d/0x230
>>  [<ffffffff81155516>] alloc_pages_vma+0xc6/0x1a0
>>  [<ffffffff81006ffd>] ? pte_mfn_to_pfn+0x7d/0x100
>>  [<ffffffff81134cfb>] do_anonymous_page+0x16b/0x350
>>  [<ffffffff81139c34>] handle_pte_fault+0x1e4/0x200
>>  [<ffffffff8100712e>] ? xen_pmd_val+0xe/0x10
>>  [<ffffffff810052c9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
>>  [<ffffffff81139dab>] handle_mm_fault+0x15b/0x270
>>  [<ffffffff81510c10>] do_page_fault+0x140/0x470
>>  [<ffffffff8150d7d5>] page_fault+0x25/0x30
>>
>> Call xen_cleanhighmap() with 4MB aligned for page tables mapping to fix it.
>> The unnecessory call of xen_cleanhighmap() in DEBUG mode is also removed.
>>
>> -v2: add comment about XEN alignment from Juergen.
>>
>> References: https://lists.xen.org/archives/html/xen-devel/2012-07/msg01562.html
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>


Applied to for-linus-14b

-boris

Patch

diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 7330cb3..aa0f7e2 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1238,21 +1238,16 @@  static void __init xen_pagetable_cleanhighmap(void)
 	 * from _brk_limit way up to the max_pfn_mapped (which is the end of
 	 * the ramdisk). We continue on, erasing PMD entries that point to page
 	 * tables - do note that they are accessible at this stage via __va.
-	 * For good measure we also round up to the PMD - which means that if
+	 * As Xen is aligning the memory end to a 4MB boundary, for good
+	 * measure we also round up to PMD_SIZE * 2 - which means that if
 	 * anybody is using __ka address to the initial boot-stack - and try
 	 * to use it - they are going to crash. The xen_start_info has been
 	 * taken care of already in xen_setup_kernel_pagetable. */
 	addr = xen_start_info->pt_base;
-	size = roundup(xen_start_info->nr_pt_frames * PAGE_SIZE, PMD_SIZE);
+	size = xen_start_info->nr_pt_frames * PAGE_SIZE;
 
-	xen_cleanhighmap(addr, addr + size);
+	xen_cleanhighmap(addr, roundup(addr + size, PMD_SIZE * 2));
 	xen_start_info->pt_base = (unsigned long)__va(__pa(xen_start_info->pt_base));
-#ifdef DEBUG
-	/* This is superfluous and is not necessary, but you know what
-	 * lets do it. The MODULES_VADDR -> MODULES_END should be clear of
-	 * anything at this stage. */
-	xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1);
-#endif
 }
 #endif
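
The change in the end-address computation can be illustrated with a small
userspace sketch. This is not kernel code: the DEMO_* constants and demo_*
functions are hypothetical stand-ins for PAGE_SIZE, PMD_SIZE and roundup(),
and the pt_base value and frame count are made-up example inputs. It only
reproduces the arithmetic of the two lines touched by the patch, assuming
4KB pages and 2MB PMD entries.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel constants (assumed: x86-64, 4KB pages). */
#define DEMO_PAGE_SIZE 4096ULL
#define DEMO_PMD_SIZE  (2ULL << 20)	/* 2MB covered by one PMD entry */
#define DEMO_ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* End address before the patch: only the size is rounded up, to 2MB. */
uint64_t demo_old_end(uint64_t addr, uint64_t nr_pt_frames)
{
	uint64_t size = DEMO_ROUNDUP(nr_pt_frames * DEMO_PAGE_SIZE, DEMO_PMD_SIZE);

	return addr + size;
}

/* End address after the patch: the end itself is rounded up to 4MB. */
uint64_t demo_new_end(uint64_t addr, uint64_t nr_pt_frames)
{
	uint64_t size = nr_pt_frames * DEMO_PAGE_SIZE;

	return DEMO_ROUNDUP(addr + size, DEMO_PMD_SIZE * 2);
}

/* Hypothetical inputs: a 4MB-aligned pt_base and 100 page-table frames. */
void demo_check(void)
{
	uint64_t addr = 0xffffffff83000000ULL;

	/* Old computation stops at a 2MB boundary that is not 4MB aligned. */
	assert(demo_old_end(addr, 100) == 0xffffffff83200000ULL);
	/* New computation reaches the next 4MB boundary. */
	assert(demo_new_end(addr, 100) == 0xffffffff83400000ULL);
	assert(demo_new_end(addr, 100) % (DEMO_PMD_SIZE * 2) == 0);
}
```

With these example inputs the old end address falls 2MB short of the 4MB
boundary that the Xen start-of-day layout guarantees, which appears to be the
kind of gap the commit message describes. Other inputs may give different
relative results, so the sketch should be read as an illustration of the
rounding only.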