Message ID: 58FDFD90.8050300@huawei.com (mailing list archive)
State: New, archived
On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:
> On 2017/4/24 18:44, Mark Rutland wrote:
> > So the issue is that we have the callchain below for a kernel image
> > address:
> >
> > read_kcore()
> > ->is_vmalloc_or_module_addr() // returns true
> > ->vread()
> > -->aligned_vread()
> > --->vmalloc_to_page()
> >
> > In is_vmalloc{,or_module}_addr() we just check the addr against
> > VMALLOC_START and VMALLOC_END, so they will return true for a kernel
> > image address.
> >
> > Then, we call vmalloc_to_page(). While this only handles mappings made
> > at page granularity, the kernel image mapping may have used sections. So
> > this tries a bogus walk to the pte level.
> >
> > Should we special-case kernel image handling, e.g. with new
> > is_kernel_image_addr() / kernel_image_to_page() helpers?
>
> Yes, it seems to be the best way to implement it without a performance
> regression. The following patch is the implementation. Any thoughts?
>
> Signed-off-by: zhong jiang <zhongjiang@huawei.com>
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b84615b..851ac35 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -475,6 +475,15 @@ static inline bool is_vmalloc_addr(const void *x)
>  	return false;
>  #endif
>  }
> +
> +static inline bool is_kernel_image_addr(const void *x)
> +{
> +	unsigned long addr = (unsigned long)x;
> +
> +	return addr >= (unsigned long)_stext && addr < (unsigned long)_end;
> +}
> +
>  #ifdef CONFIG_MMU
>  extern int is_vmalloc_or_module_addr(const void *x);
>  #else
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 3ca82d4..9a9ef65 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -229,6 +229,42 @@ int is_vmalloc_or_module_addr(const void *x)
>  	return is_vmalloc_addr(x);
>  }
>
> +static struct page *kernel_image_to_page(const void *kernel_addr, pgd_t *pgd)
> +{
> +	unsigned long addr = (unsigned long)kernel_addr;
> +	struct page *page = NULL;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +
> +	if (pgd_none(*pgd))
> +		goto out;
> +
> +	pud = pud_offset(pgd, addr);
> +	if (pud_none(*pud))
> +		goto out;
> +
> +	if (pud_sect(*pud))
> +		return pud_page(*pud);

The *_sect() helpers are arch-specific, so we cannot use them in generic
code. This would need to be architecture-specific.

Secondly, this will return the head page of the section regardless of
which page in the section the address corresponds to.

> +
> +	pmd = pmd_offset(pud, addr);
> +	if (pmd_none(*pmd))
> +		goto out;
> +
> +	if (pmd_sect(*pmd))
> +		return pmd_page(*pmd);

Likewise on both counts.

> +
> +	pte = pte_offset_kernel(pmd, addr);
> +	if (pte_none(*pte))
> +		goto out;
> +
> +	page = pte_page(*pte);
> +
> +out:
> +	return page;
> +}

Given we know what the address should map to, I don't think we need to
walk the page tables here. I think this can be:

static struct page *kernel_image_to_page(const void *addr)
{
	return virt_to_page(lm_alias(addr));
}

> +
>  /*
>   * Walk a vmap address to the struct page it maps.
>   */
> @@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>  	 */
>  	VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>
> +	if (is_kernel_image_addr(vmalloc_addr))
> +		return kernel_image_to_page(vmalloc_addr, pgd);

It's not clear to me that this is the right place for this to live.

It might be best to code the kernel image logic directly in kcore (and
kmem), assuming everyone's OK with that approach.

Thanks,
Mark.
On 04/24/2017 08:51 AM, Mark Rutland wrote:
> On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:
>> On 2017/4/24 18:44, Mark Rutland wrote:
>>> So the issue is that we have the callchain below for a kernel image
>>> address:

[...]

>> +	if (is_kernel_image_addr(vmalloc_addr))
>> +		return kernel_image_to_page(vmalloc_addr, pgd);
>
> It's not clear to me that this is the right place for this to live.
>
> It might be best to code the kernel image logic directly in kcore (and
> kmem), assuming everyone's OK with that approach.

That will fix kcore and kmem, but this will show up in other places too.
We've gone through and made sure that virt_addr_valid returns true if
and only if virt_to_page returns a valid address. I don't know if we can
make as strong a claim about is_vmalloc_addr and vmalloc_to_page in all
cases, but is_vmalloc_addr should not return true for the kernel image.
That would at least let kcore fall back to kern_addr_valid, which should
correctly handle the kernel image.

The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
seems like the best approach, although I haven't tried a prototype at
all.

Thanks,
Laura
On 24 April 2017 at 18:52, Laura Abbott <labbott@redhat.com> wrote:
> On 04/24/2017 08:51 AM, Mark Rutland wrote:

[...]

> That will fix kcore and kmem but this will show up in other places too.
> We've gone through and made sure that virt_addr_valid returns
> true if and only if virt_to_page returns a valid address. I don't know
> if we can make as strong a claim about is_vmalloc_addr and
> vmalloc_to_page in all cases but is_vmalloc_addr should not return true
> for the kernel image. That would at least let kcore fall back to
> kern_addr_valid which should correctly handle the kernel image.
> The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
> seems like the best approach although I haven't tried a prototype
> at all.

Moving the kernel into the vmalloc region was kind of the point, for
KASLR. Undoing that means either disabling KASLR, or splitting the
vmalloc region into a KASLR region only for the kernel, and a vmalloc
region like we had when vmlinux lived in the linear region.

In general, I think we should be able to deal with different kinds of
mappings with different granularity in the vmalloc region. If
necessary, we could introduce a VM_xxx flag for the kernel to
distinguish such regions from ordinary VM_MAP regions.
On 2017/4/25 1:56, Ard Biesheuvel wrote:
> On 24 April 2017 at 18:52, Laura Abbott <labbott@redhat.com> wrote:

[...]

> Moving the kernel into the vmalloc region was kind of the point, for
> KASLR. Undoing that means either disabling KASLR, or splitting the
> vmalloc region into a KASLR region only for the kernel, and a vmalloc
> region like we had when vmlinux lived in the linear region.
>
> In general, I think we should be able to deal with different kinds of
> mappings with different granularity in the vmalloc region. If
> necessary, we could introduce a VM_xxx flag for the kernel to
> distinguish such regions from ordinary VM_MAP regions.

Hi, Ard

My initial thought was the same as yours. vmalloc_to_page() is generic
code shared by all architectures, and only arm64 changes the mapping
granularity. If we adjust that function, the walk would have to be
implemented per architecture, or guarded with CONFIG_ARM64 code to
exclude the other architectures, because the pud_* helpers are
arch-specific. Mark proposed that adding a kernel_image_to_page()
helper is simpler, and I prefer Mark's approach. What do you think, or
did I miss anything? It depends on you and the other maintainers.

Thanks,
zhongjiang
On 2017/4/24 23:51, Mark Rutland wrote:
> On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:
>> The following patch is the implementation. Any thoughts?

[...]

> The *_sect() helpers are arch-specific, so we cannot use them in generic
> code. This would need to be architecture-specific.
>
> Secondly, this will return the head page of the section regardless of
> which page in the section the address corresponds to.

Yes, the code was written in too much of a hurry, and it is messy. We
will send a formal patch later, if you still accept this approach.

Thanks,
zhongjiang

[...]
On Mon, Apr 24, 2017 at 10:52:08AM -0700, Laura Abbott wrote:
> On 04/24/2017 08:51 AM, Mark Rutland wrote:
>> It's not clear to me that this is the right place for this to live.
>>
>> It might be best to code the kernel image logic directly in kcore (and
>> kmem), assuming everyone's OK with that approach.
>
> That will fix kcore and kmem but this will show up in other places too.

True.

> We've gone through and made sure that virt_addr_valid returns
> true if and only if virt_to_page returns a valid address. I don't know
> if we can make as strong a claim about is_vmalloc_addr and
> vmalloc_to_page in all cases but is_vmalloc_addr should not return true
> for the kernel image. That would at least let kcore fall back to
> kern_addr_valid which should correctly handle the kernel image.

That would largely be my preference.

My fear is that other users of is_vmalloc_addr() are doing the right
thing for the kernel image today (e.g. not doing virt_to_phys()),
because they see it as a vmalloc addr. So we might have to audit all of
those.

> The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
> seems like the best approach although I haven't tried a prototype
> at all.

Given that (AFAICT) we're the only architecture that puts the kernel in
the vmalloc area, I agree that this is likely to be the simplest correct
approach. The interaction with KASLR is somewhat unfortunate.

Thanks,
Mark.
On Mon, Apr 24, 2017 at 06:56:43PM +0100, Ard Biesheuvel wrote:
> On 24 April 2017 at 18:52, Laura Abbott <labbott@redhat.com> wrote:

[...]

> > The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
> > seems like the best approach although I haven't tried a prototype
> > at all.
>
> Moving the kernel into the vmalloc region was kind of the point, for
> KASLR. Undoing that means either disabling KASLR, or splitting the
> vmalloc region into a KASLR region only for the kernel, and a vmalloc
> region like we had when vmlinux lived in the linear region.

AFAICT, x86 doesn't place the kernel in the vmalloc region for its KASLR
implementation. Is that just to avoid the issues that we're seeing, or
are there other constraints on x86?

> In general, I think we should be able to deal with different kinds of
> mappings with different granularity in the vmalloc region. If
> necessary, we could introduce a VM_xxx flag for the kernel to
> distinguish such regions from ordinary VM_MAP regions.

I don't think that vmalloc_to_page() should have to deal with anything
other than the usual page-granular memory mappings in the vmalloc area,
as it currently does. We'd have to write a page table walker per arch
for that to work, and the only thing it would benefit is arm64's kernel
image mapping.

Adding VM_xxx (e.g. VM_KERNEL) sounds sane to me regardless of anything
else.

That still leaves the question as to what is_vmalloc_addr(addr) should
imply about addr, though.

Thanks,
Mark.
On 25 April 2017 at 16:02, Mark Rutland <mark.rutland@arm.com> wrote:
> On Mon, Apr 24, 2017 at 06:56:43PM +0100, Ard Biesheuvel wrote:

[...]

> AFAICT, x86 doesn't place the kernel in the vmalloc region for its KASLR
> implementation. Is that just to avoid the issues that we're seeing, or
> are there other constraints on x86?

Given how a lot of architectures put kernel modules in the vmalloc
space, I don't think there is generally an issue with putting the kernel
there as well, other than what we are currently experiencing. As
discussed, ioremap regions are also likely to be mapped using block
mappings these days, and they also reside between VMALLOC_START and
VMALLOC_END.

So in my opinion, since

a) we already put text and data in the vmalloc region, and
b) we already use block mappings in the vmalloc region,

putting the kernel there is only a minor variation of what we already
have to deal with, and so I don't think there is anything wrong with it.
We simply have to tweak some tests here and there, but nothing
fundamental afaict.

> I don't think that vmalloc_to_page() should have to deal with anything
> other than the usual page-granular memory mappings in the vmalloc area,
> as it currently does. We'd have to write a page table walker per arch
> for that to work, and the only thing it would benefit is arm64's kernel
> image mapping.
>
> Adding VM_xxx (e.g. VM_KERNEL) sounds sane to me regardless of anything
> else.
>
> That still leaves the question as to what is_vmalloc_addr(addr) should
> imply about addr, though.
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -475,6 +475,15 @@ static inline bool is_vmalloc_addr(const void *x)
 	return false;
 #endif
 }
+
+static inline bool is_kernel_image_addr(const void *x)
+{
+	unsigned long addr = (unsigned long)x;
+
+	return addr >= (unsigned long)_stext && addr < (unsigned long)_end;
+}
+
 #ifdef CONFIG_MMU
 extern int is_vmalloc_or_module_addr(const void *x);
 #else
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3ca82d4..9a9ef65 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -229,6 +229,42 @@ int is_vmalloc_or_module_addr(const void *x)
 	return is_vmalloc_addr(x);
 }
 
+static struct page *kernel_image_to_page(const void *kernel_addr, pgd_t *pgd)
+{
+	unsigned long addr = (unsigned long)kernel_addr;
+	struct page *page = NULL;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	if (pgd_none(*pgd))
+		goto out;
+
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud))
+		goto out;
+
+	if (pud_sect(*pud))
+		return pud_page(*pud);
+
+	pmd = pmd_offset(pud, addr);
+	if (pmd_none(*pmd))
+		goto out;
+
+	if (pmd_sect(*pmd))
+		return pmd_page(*pmd);
+
+	pte = pte_offset_kernel(pmd, addr);
+	if (pte_none(*pte))
+		goto out;
+
+	page = pte_page(*pte);
+
+out:
+	return page;
+}
+
 /*
  * Walk a vmap address to the struct page it maps.
  */
@@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 	 */
 	VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
 
+	if (is_kernel_image_addr(vmalloc_addr))
+		return kernel_image_to_page(vmalloc_addr, pgd);
+
 	if (!pgd_none(*pgd)) {
 		pud_t *pud = pud_offset(pgd, addr);
 		if (!pud_none(*pud)) {