Message ID | 20161209195925.21420-1-tamas.lengyel@zentific.com (mailing list archive)
---|---
State | New, archived
>>> On 09.12.16 at 20:59, <tamas.lengyel@zentific.com> wrote:
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1461,7 +1461,8 @@ mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
>   * we indeed found a conflicting mem_access setting.
>   */
>  static struct page_info*
> -p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag)
> +p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
> +                                  const struct vcpu *v)
>  {
>      long rc;
>      paddr_t ipa;
> @@ -1470,7 +1471,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag)
>      xenmem_access_t xma;
>      p2m_type_t t;
>      struct page_info *page = NULL;
> -    struct p2m_domain *p2m = &current->domain->arch.p2m;
> +    struct p2m_domain *p2m = &v->domain->arch.p2m;
>
>      rc = gva_to_ipa(gva, &ipa, flag);
>      if ( rc < 0 )
> @@ -1482,7 +1483,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag)
>       * We do this first as this is faster in the default case when no
>       * permission is set on the page.
>       */
> -    rc = __p2m_get_mem_access(current->domain, gfn, &xma);
> +    rc = __p2m_get_mem_access(v->domain, gfn, &xma);
>      if ( rc < 0 )
>          goto err;
>
> @@ -1546,7 +1547,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag)
>
>      page = mfn_to_page(mfn_x(mfn));
>
> -    if ( unlikely(!get_page(page, current->domain)) )
> +    if ( unlikely(!get_page(page, v->domain)) )
>          page = NULL;
>
>  err:

Looking at these changes, wouldn't it be more reasonable to hand the
function a struct domain *?

Jan
On Dec 12, 2016 00:42, "Jan Beulich" <JBeulich@suse.com> wrote:
> >>> On 09.12.16 at 20:59, <tamas.lengyel@zentific.com> wrote:
> > [...]
>
> Looking at these changes, wouldn't it be more reasonable to hand the
> function a struct domain *?

In its current state it might be but I believe with altp2m we will need
the vcpu struct here anyway.

Tamas
On 12/12/16 07:47, Tamas K Lengyel wrote:
> On Dec 12, 2016 00:42, "Jan Beulich" <JBeulich@suse.com
> <mailto:JBeulich@suse.com>> wrote:
>
>     >>> On 09.12.16 at 20:59, <tamas.lengyel@zentific.com
>     <mailto:tamas.lengyel@zentific.com>> wrote:
>     > [...]
>
>     Looking at these changes, wouldn't it be more reasonable to hand
>     the function a struct domain *?
>
> In its current state it might be but I believe with altp2m we will need
> the vcpu struct here anyway.

Not only for altp2m. I keep mentioning a bigger issue, but it seems to
have been ignored every time...
The translation from VA to IPA (guest physical address) is done using
hardware. If the underlying memory of the stage-1 page table is
protected, the translation will fail. Given that this function is used
in hypercalls to retrieve the page associated with a buffer, it means
that it will not be possible to make a hypercall when the page table
used to find the buffer IPA has not been touched. I believe fixing this
problem would require having the vCPU in hand.

Regards,
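[Editor's note: the failure mode described above can be sketched with a toy two-stage model. Everything below is an illustrative assumption — none of these names or structures are Xen's — but it shows why a hardware-style walk such as gva_to_ipa aborts when the page holding the stage-1 table is itself stage-2 protected, even though the buffer's own mapping would be fine.]

```c
#include <assert.h>
#include <stdbool.h>

#define S2_PAGES 4

static bool s2_readable[S2_PAGES];  /* stage-2 read permission per IPA page */
static unsigned s1_table[S2_PAGES]; /* toy stage-1 PT: VA page -> IPA page */
#define S1_TABLE_IPA 0              /* IPA page holding the stage-1 table */

/* Hardware-style walk (analogous to gva_to_ipa): it must read the
 * stage-1 table through stage-2, so it aborts if the table's own IPA
 * page is protected -- before any IPA for the buffer is produced. */
int walk_gva_to_ipa(unsigned va_page, unsigned *ipa_page)
{
    if ( !s2_readable[S1_TABLE_IPA] )
        return -1;                  /* stage-2 abort on the PT access */
    *ipa_page = s1_table[va_page];
    return 0;
}
```

With the stage-1 table's page readable the walk resolves the mapping; protecting that one page makes every translation through it fail, which is exactly the hypercall-buffer problem described above.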
On Mon, Dec 12, 2016 at 4:46 AM, Julien Grall <julien.grall@arm.com> wrote:
> On 12/12/16 07:47, Tamas K Lengyel wrote:
>> [...]
>>
>> In its current state it might be but I believe with altp2m we will need
>> the vcpu struct here anyway.
>
> Not only the altp2m. I keep mentioning a bigger issue, but it seems been
> ignored every time...

I wouldn't say ignored. I think this is the first detailed description
I have seen of the problem you mentioned in passing previously, and I
still don't have a way to reproduce it.

> The translation VA to IPA (guest physical address) is done using hardware.
> If the underlying memory of the stage-1 page table is protected, so the
> translation will fail. Given that this function is used in hypercall to
> retrieve the page associated to a buffer, it means that it will not be
> possible to do hypercall when the page table used to find the buffer IPA has
> not been touched.

This function specifically works around the case where the page of the
guest pagetable is not accessible due to mem_access, when the hardware
based lookup doesn't work. This function checks what the fault was,
checks the page type and the mem_access rights to determine whether the
fault was legit, or if it was due to mem_access. If it was mem_access it
gets the page without involving the hardware. I'm not following what you
describe afterwards regarding the buffer and what you mean by "the
buffer IPA has not been touched". Care to elaborate?

Thanks,
Tamas
Hi Tamas,

On 12/12/16 18:42, Tamas K Lengyel wrote:
> On Mon, Dec 12, 2016 at 4:46 AM, Julien Grall <julien.grall@arm.com> wrote:
>> The translation VA to IPA (guest physical address) is done using hardware.
>> [...]
>
> This function specifically works around the case where the page of the
> guest pagetable is not accessible due to mem_access, when the hardware
> based lookup doesn't work. [...] I'm not following what you describe
> afterwards regarding the buffer and what you mean by "the buffer IPA
> has not been touched". Care to elaborate?

I am afraid to say that the function does not do what you think and is
still using the hardware to do the translation. For instance the
function gva_to_ipa is using the hardware to translate a VA to an IPA.

This function is called when it is not possible to directly translate a
VA to a PA. This may fail for various reasons:
    * The underlying memory of the buffer was restricted in stage-2
    * The underlying memory of the stage-1 page tables was restricted
      in stage-2

Whilst the function is solving the former, the latter will not work due
to the call to gva_to_ipa. This will fail because the stage-1 PT are not
accessible.

One way to trigger it (note: I haven't tested it myself) is to have the
code and data in separate pages. The translation will be using distinct
stage-1 page table entries. If nobody touches the page table entry
beforehand (for instance by accessing the buffer), it will still be
inaccessible when Xen is calling p2m_mem_access_check_and_get_page.

Regards,
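[Editor's note: the "touched beforehand" trigger can be modeled with another illustrative sketch. Whether the enabling mechanism is a cached translation or the monitor relaxing the stage-2 entry on the first ordinary access is an assumption here — the names are invented — but the observable effect is the one described: a mapping nobody touched beforehand cannot be resolved while the stage-1 table's page is protected.]

```c
#include <assert.h>
#include <stdbool.h>

static bool s2_pt_readable;   /* stage-2 permission on the stage-1 PT page */
static bool touched;          /* guest already accessed this mapping once */

/* Modeled effect of the guest touching the buffer first (e.g. the
 * earlier fault being handled, making the translation available). */
void guest_touches_buffer(void)
{
    touched = true;
}

/* AT-style lookup: a previously-touched mapping resolves; otherwise a
 * fresh walk must read the protected stage-1 table and faults. */
int at_lookup(void)
{
    if ( touched )
        return 0;
    return s2_pt_readable ? 0 : -1;
}
```

Linux building hypercall buffers on the stack and heap means the buffer (and the PT entries covering it) have invariably been touched already, which is why the problem stays latent in practice.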
On Mon, Dec 12, 2016 at 12:11 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Tamas,
>
> On 12/12/16 18:42, Tamas K Lengyel wrote:
>> [...]
>
> I am afraid to say that the function does not do what you think and is still
> using the hardware to do the translation. For instance the function
> gva_to_ipa is using the hardware to translate a VA to IPA.
>
> This function is called when it is not possible to directly translate a VA
> to a PA. This may fail for various reason:
>     * The underlying memory of the buffer was restricted in stage-2
>     * The underlying memory of stage-1 page tables was restricted in
>       stage-2
>
> Whilst the function is solving the former, the latter will not work due to
> the call to gva_to_ipa. This will fail because the stage-1 PT are not
> accessible.

I see. So IMHO this is not a problem with mem_access in general, but a
problem with a specific application of mem_access on ARM (ie.
restricting read access to guest pagetables). It's a pity that ARM
doesn't report the IPA automatically during a stage-2 violation.

A way to work around this would require mem_access restrictions to be
completely removed, which cannot be done unless all other vCPUs of the
domain are paused to avoid a race condition. With altp2m I could also
envision creating a temporary p2m for the vcpu at hand with the
restriction removed, so that it doesn't affect other vcpus. However,
without a use-case specifically requiring this to be implemented I
would not deem it critical. For now a comment in the header describing
this limitation would suffice from my perspective.

Thanks,
Tamas
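[Editor's note: the pause-and-relax workaround proposed above could look roughly like the toy model below. The pause/altp2m step is only noted in a comment, and every name is an illustrative assumption — this is a sketch of the idea, not a proposed Xen patch.]

```c
#include <assert.h>
#include <stdbool.h>

static bool s2_pt_readable;   /* mem_access restriction on the PT page */

int hw_walk(void)
{
    return s2_pt_readable ? 0 : -1;   /* hardware walk needs the PT page */
}

/* Temporarily lift the restriction, walk, then restore it. In Xen this
 * would first require pausing the domain's other vCPUs (or, with
 * altp2m, switching this vCPU to a temporary relaxed view) to avoid
 * the race condition mentioned above. */
int walk_with_temporary_relax(void)
{
    bool saved = s2_pt_readable;
    int rc;

    s2_pt_readable = true;    /* relax the stage-2 entry */
    rc = hw_walk();
    s2_pt_readable = saved;   /* restore the original mem_access setting */
    return rc;
}
```

The save/restore pattern keeps the monitor's policy intact after the lookup; the cost is the synchronization needed so no other vCPU runs while the page is open.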
Hi Tamas,

On 12/12/2016 19:41, Tamas K Lengyel wrote:
> On Mon, Dec 12, 2016 at 12:11 PM, Julien Grall <julien.grall@arm.com> wrote:
>> [...]
>>
>> Whilst the function is solving the former, the latter will not work due to
>> the call to gva_to_ipa. This will fail because the stage-1 PT are not
>> accessible.
>
> I see. So IMHO this is not a problem with mem_access in general, but a
> problem with a specific application of mem_access on ARM (ie.
> restricting read access to guest pagetables). It's a pitty that ARM
> doesn't report the IPA automatically during a stage-2 violation.

I don't understand what you are asking for here. If you are not able to
access the stage-1 page table how would you be able to find the IPA?

It works on x86 because, IIRC, you do a software page table walk.
Although, I don't think you have particular write/read access checking
on x86.

> A way to work around this would require mem_access restrictions to be
> complete removed, which cannot be done unless all other vCPUs of the
> domain are paused to avoid a race-condition. With altp2m I could also
> envision creating a temporary p2m for the vcpu at hand with the
> restriction removed, so that it doesn't affect other vcpus. However,
> without a use-case specifically requiring this to be implemented I
> would not deem it critical.

I suggested a use-case in the previous e-mail... You are affected today
because Linux is creating hypercall buffers on the stack and heap, so
the memory would always have been accessed before. I could foresee a
guest using const hypercall buffers.

> For now a comment in the header describing
> this limitation would suffice from my perspective.

So you are going to defer everything until someone actually hits it? It
might be time for you to focus a bit more on other use cases...

Regards,
On Mon, Dec 12, 2016 at 2:28 PM, Julien Grall <julien.grall@arm.com> wrote:
> Hi Tamas,
>
> On 12/12/2016 19:41, Tamas K Lengyel wrote:
>> [...]
>>
>> I see. So IMHO this is not a problem with mem_access in general, but a
>> problem with a specific application of mem_access on ARM (ie.
>> restricting read access to guest pagetables). It's a pitty that ARM
>> doesn't report the IPA automatically during a stage-2 violation.
>
> I don't understand what you are asking for here. If you are not able to
> access stage-1 page table how would you be able to find the IPA?

I'm not asking for anything, I'm expressing how it's a pity that ARM
CPUs are limited in this regard compared to x86.

> It works on x86 because, IIRC, you do a software page table walking.
> Although, I don't think you have particular write/read access checking on
> x86.

I don't recall there being any software page walking being involved on
x86. Why would that be needed? On x86 we get the guest physical address
recorded by the CPU automatically. AFAIK in case the pagetable was
inaccessible for the translation of a VA, we would get an event with the
pagetable's PA and the type of event that generated it (ie. reading
during translation).

>> A way to work around this would require mem_access restrictions to be
>> complete removed, which cannot be done unless all other vCPUs of the
>> domain are paused to avoid a race-condition. With altp2m I could also
>> envision creating a temporary p2m for the vcpu at hand with the
>> restriction removed, so that it doesn't affect other vcpus. However,
>> without a use-case specifically requiring this to be implemented I
>> would not deem it critical.
>
> I suggested a use-case on the previous e-mail... You are affected today
> because Linux is creating hypercall buffer on the stack and heap. So the
> memory would always be accessed before. I could foresee guest using const
> hypercall buffer.
>
>> For now a comment in the header describing
>> this limitation would suffice from my perspective.
>
> So you are going to defer everything until someone actually hit it? It might
> be time for you to focus a bit more on other use case...

Yes, as long as this is not a critical issue that breaks mem_access and
can be worked around I'll postpone spending time on it. If someone finds
the time in the meanwhile to submit a patch fixing it I would be happy
to review and test it.

Tamas
Hello Tamas,

On 12/12/16 23:47, Tamas K Lengyel wrote:
> On Mon, Dec 12, 2016 at 2:28 PM, Julien Grall <julien.grall@arm.com> wrote:
>> [...]
>>
>> I don't understand what you are asking for here. If you are not able to
>> access stage-1 page table how would you be able to find the IPA?
>
> I'm not asking for anything, I'm expressing how it's a pity that ARM
> CPUs are limited in this regard compared to x86.

Give a look at the ARM ARM before complaining. The IPA will be provided
(see HPFAR) on a stage-2 data/prefetch abort fault.

>> It works on x86 because, IIRC, you do a software page table walking.
>> Although, I don't think you have particular write/read access checking on
>> x86.
>
> I don't recall there being any software page walking being involved on
> x86. Why would that be needed? On x86 we get the guest physical
> address recorded by the CPU automatically. AFAIK in case the pagetable
> was unaccessible for the translation of a VA, we would get an event
> with the pagetable's PA and the type of event that generated it (ie.
> reading during translation).

You are talking about a different thing. The function
p2m_mem_access_check_and_get_page is only used by the copy_*_guest
helpers, which copy hypercall buffers.

If you give a look at the x86 code — for simplicity let's focus on HVM —
the function __hvm_copy will call paging_gva_to_gfn, which is doing the
table walk in software (see arch/x86/mm/hap/guest_walk.c). No hardware
instruction like on ARM...

Although, it looks like there is a hardware instruction to do address
translation (see nvmx_hap_walk_L1_p2m), but only for nested
virtualization. And even in this case, it will return the IPA (e.g.
guest PA) only if the stage-1 page tables are accessible.

>> A way to work around this would require mem_access restrictions to be
>> complete removed, which cannot be done unless all other vCPUs of the
>> domain are paused to avoid a race-condition. [...] However, without a
>> use-case specifically requiring this to be implemented I would not
>> deem it critical.
>
> Yes, as long as this is not a critical issue that breaks mem_access
> and can be worked around I'll postpone spending time on it. If someone
> finds the time in the meanwhile to submit a patch fixing it I would be
> happy to review and test it.

I will be happy to keep the memaccess code in p2m.c until I see a strong
reason to move it to a separate file.

Yours sincerely,
On Tue, Dec 13, 2016 at 5:50 AM, Julien Grall <julien.grall@arm.com> wrote:
> Hello Tamas,
>
> On 12/12/16 23:47, Tamas K Lengyel wrote:
>> [...]
>>
>> Yes, as long as this is not a critical issue that breaks mem_access
>> and can be worked around I'll postpone spending time on it. If someone
>> finds the time in the meanwhile to submit a patch fixing it I would be
>> happy to review and test it.
>
> I will be happy to keep the memaccess code in p2m.c until I see a strong
> reason to move it in a separate file.

Does that mean you want to take over maintainership of mem_access on
ARM? Otherwise I don't think that is an acceptable reason to keep it in
p2m.

Tamas
On 12/12/16 23:47, Tamas K Lengyel wrote:
>> It works on x86 because, IIRC, you do a software page table walking.
>> Although, I don't think you have particular write/read access checking on
>> x86.
>
> I don't recall there being any software page walking being involved on
> x86. Why would that be needed? On x86 we get the guest physical
> address recorded by the CPU automatically. AFAIK in case the pagetable
> was unaccessible for the translation of a VA, we would get an event
> with the pagetable's PA and the type of event that generated it (ie.
> reading during translation).

x86 provides no mechanism to have hardware translate an address in a
separate context. Therefore, all translations which are necessary need
to be done in hardware.

Newer versions of VT-x/SVM may provide additional information on a
vmexit, which includes a guest physical address relevant to the reason
for the vmexit.

Xen will attempt to proactively use this information to avoid a software
pagewalk, but can always fall back to the software method if needs be.

I presume ARM has always relied on hardware support for translation, and
has no software translation support? I presume therefore that
translations only work when in current context?

~Andrew
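[Editor's note: the fallback pattern Andrew outlines — use the exit-supplied guest physical address when the hardware reports one, otherwise walk the guest page tables in software — can be sketched as below. All names are illustrative assumptions, and the "software walker" is a stand-in for a real multi-level walk such as the one in arch/x86/mm/hap/guest_walk.c.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the software guest-page-table walker; the real thing
 * walks multiple levels, here it just consults a toy map. */
static uint64_t toy_map[4] = { 0x1000, 0x2000, 0x3000, 0x4000 };

int software_guest_walk(unsigned va_page, uint64_t *gpa)
{
    if ( va_page >= 4 )
        return -1;
    *gpa = toy_map[va_page];
    return 0;
}

/* Prefer the GPA newer VT-x/SVM hardware reports on vmexit; fall back
 * to the always-available software walk when it is absent. */
int resolve_gpa(bool exit_gpa_valid, uint64_t exit_gpa,
                unsigned va_page, uint64_t *gpa)
{
    if ( exit_gpa_valid )
    {
        *gpa = exit_gpa;
        return 0;
    }
    return software_guest_walk(va_page, gpa);
}
```

The key property is that the software path never depends on the hardware hint, so translation still works when the exit information is unavailable — the ability ARM's hardware-only lookup lacks in the scenario discussed in this thread.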
On Tue, Dec 13, 2016 at 5:50 AM, Julien Grall <julien.grall@arm.com> wrote:
> Hello Tamas,
>
> On 12/12/16 23:47, Tamas K Lengyel wrote:
>> [...]
>>
>> I'm not asking for anything, I'm expressing how it's a pity that ARM
>> CPUs are limited in this regard compared to x86.
>
> Give a look at the ARM ARM before complaining. The IPA will be provided
> (see HPFAR) on a stage-2 data/prefetch abort fault.

Yes, we are already using HPFAR_EL2 when the guest faults with
mem_access. Here however Xen faults when it tries to do gvirt_to_maddr.
For some reason I was under the impression this register is only valid
when the violation originates from guest context - hence my comment. If
that's not the case, we could avoid using gva_to_ipa and determine
whether mem_access was the reason for the gvirt_to_maddr failure by
checking that hpfar_is_valid returns true, AFAICT.

Tamas
Hi Andrew, On 13/12/16 18:28, Andrew Cooper wrote: > On 12/12/16 23:47, Tamas K Lengyel wrote: >> >>> It works on x86 because, IIRC, you do a software page table walking. >>> Although, I don't think you have particular write/read access checking on >>> x86. >> I don't recall there being any software page walking being involved on >> x86. Why would that be needed? On x86 we get the guest physical >> address recorded by the CPU automatically. AFAIK in case the pagetable >> was unaccessible for the translation of a VA, we would get an event >> with the pagetable's PA and the type of event that generated it (ie. >> reading during translation). > > x86 provides no mechanism to have hardware translate an address in a > separate context. Therefore, all translations which are necessary need > to be done in hardware. I guess you meant "need to be done in software"? > > Newer versions of VT-x/SVM may provide additional information on a > vmexit, which include a guest physical address relevant to the the > reason for the vmexit. > > Xen will attempt to proactively use this information to avoid a software > pagewalk, but can always fall back to the software method if needs be. That is a good idea, but may bring some issues with memaccess as we currently have on ARM. > > > I presume ARM has always relied on hardware support for translation, and > has no software translation support? I presume therefore that > translations only work when in current context? That's correct. ARM has provided hardware support for most of the translation from the start. But it relies on the guest page table to be accessible (e.g. there is no memory restriction in stage-2). Note that ARM does not provide any hardware instruction to translate an IPA (guest physical address) to a PA. So we have a walker there. We also have a walker for debugging guest page tables (though only when the guest is using LPAE). I guess it could be re-used in the case where it is not possible to do it in hardware.
Although it would need to be rewritten to make it safe. Cheers,
On 13/12/16 18:41, Julien Grall wrote: > Hi Andrew, > > On 13/12/16 18:28, Andrew Cooper wrote: >> On 12/12/16 23:47, Tamas K Lengyel wrote: >>> >>>> It works on x86 because, IIRC, you do a software page table walking. >>>> Although, I don't think you have particular write/read access >>>> checking on >>>> x86. >>> I don't recall there being any software page walking being involved on >>> x86. Why would that be needed? On x86 we get the guest physical >>> address recorded by the CPU automatically. AFAIK in case the pagetable >>> was unaccessible for the translation of a VA, we would get an event >>> with the pagetable's PA and the type of event that generated it (ie. >>> reading during translation). >> >> x86 provides no mechanism to have hardware translate an address in a >> separate context. Therefore, all translations which are necessary need >> to be done in hardware. > > I guess you meant "need to be done in sofware"? I did. Sorry for the confusion. > >> >> Newer versions of VT-x/SVM may provide additional information on a >> vmexit, which include a guest physical address relevant to the the >> reason for the vmexit. >> >> Xen will attempt to proactively use this information to avoid a software >> pagewalk, but can always fall back to the software method if needs be. > > That is a good idea, but may bring some issue with memacces as we > currently have on ARM. Why would a software-only approach have problem on ARM? Slow certainly, but it should function correctly. > >> >> >> I presume ARM has always relied on hardware support for translation, and >> has no software translation support? I presume therefore that >> translations only work when in current context? > > That's correct. ARM provided hardware support for most of translation > from the start. But it relies on the guest page table to be accessible > (e.g there is no memory restriction in stage-2). Ok, so ARM provides an instruction to translate an arbitrary virtual address to guest physical. 
How does this work in the context of Xen, or can hardware follow the currently-active virtualisation state to find the guest pagetables? Or does it rely on information in the TLB? > Note that ARM does not provide any hardware instruction to translate > an IPA (guest physical address) to a PA. So we have a walker there. > > We also a walk for debugging guest page table (though only when it is > using LPAE). I guess it could be re-used in the case where it is not > possible to do it in hardware. Although, it would be rewritten to make > it safe. This sounds like the kind of thing which would be generally useful, although I'd like to understand the problem better before making suggestions. ~Andrew
Hi Andrew, On 13/12/16 19:39, Andrew Cooper wrote: > On 13/12/16 18:41, Julien Grall wrote: >> On 13/12/16 18:28, Andrew Cooper wrote: >>> On 12/12/16 23:47, Tamas K Lengyel wrote: >>> Newer versions of VT-x/SVM may provide additional information on a >>> vmexit, which include a guest physical address relevant to the the >>> reason for the vmexit. >>> >>> Xen will attempt to proactively use this information to avoid a software >>> pagewalk, but can always fall back to the software method if needs be. >> >> That is a good idea, but may bring some issue with memacces as we >> currently have on ARM. > > Why would a software-only approach have problem on ARM? Slow certainly, > but it should function correctly. Sorry, I meant that using the hardware instruction to translate an address on ARM has some drawbacks when using memaccess. However, ARM supports 2 kinds of page tables (short-descriptor and LPAE), different page sizes (4KB, 16KB, 64KB), split page tables, endianness... This would require some work to make all the combinations work. Furthermore, on a 32-bit host not all the RAM is mapped in Xen. So guest pages are mapped on demand, requiring TLB invalidations. So I would only use the software approach when it is strictly necessary. > >> >>> >>> >>> I presume ARM has always relied on hardware support for translation, and >>> has no software translation support? I presume therefore that >>> translations only work when in current context? >> >> That's correct. ARM provided hardware support for most of translation >> from the start. But it relies on the guest page table to be accessible >> (e.g there is no memory restriction in stage-2). > > Ok, so ARM provides an instruction to translate an arbitrary virtual > address to guest physical. How does this work in the context of Xen, or > can hardware follow the currently-active virtualisation state to find > the guest pagetables? Or does it rely on information in the TLB?
Xen and the guest vCPU use separate sets of registers. When running in Xen context, the guest vCPU state is still present and will be used by the hardware to translate a VA to a guest physical address. > >> Note that ARM does not provide any hardware instruction to translate >> an IPA (guest physical address) to a PA. So we have a walker there. >> >> We also a walk for debugging guest page table (though only when it is >> using LPAE). I guess it could be re-used in the case where it is not >> possible to do it in hardware. Although, it would be rewritten to make >> it safe. > > This sounds like the kind of thing which would be generally useful, > although I'd like to understand the problem better before making > suggestions. memaccess will restrict the permissions of certain memory regions in the stage-2 page table. For the purpose of the example, let's say read access has been restricted. One of these memory regions may contain the stage-1 page table. Do you agree that the guest will not be able to read the stage-1 page table due to the restriction? Similarly, when the hardware does the page table walk, all the accesses will be on behalf of the guest. So a read of the stage-1 page table would fail and the hardware will not be able to do the translation. I hope this clarifies the problem. Cheers,
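The scenario described above can be modelled with a toy two-stage translation. This is purely illustrative; none of these names exist in Xen:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the problem: a hardware stage-1 walk performs *reads*
 * of the guest page table through stage 2, so revoking read
 * permission in stage 2 on the page holding the stage-1 tables makes
 * the whole VA->PA translation fail, even for the hypervisor. */

enum s2_perm { S2_NONE = 0, S2_R = 1, S2_W = 2 };

/* Stage-2 permissions for two guest-physical pages: page 0 holds the
 * stage-1 page table, page 1 is ordinary guest data. */
static unsigned s2_perm[2] = { S2_R, S2_R | S2_W };

static bool s2_read_ok(unsigned gfn) { return s2_perm[gfn] & S2_R; }

/* A stage-1 walk must read the table living in guest page 0. */
bool translate_va(uint64_t va, uint64_t *pa)
{
    if ( !s2_read_ok(0) )   /* walker's read of the stage-1 table faults */
        return false;
    *pa = va | 0x1000;      /* placeholder successful translation */
    return true;
}

/* What a mem_access listener restricting reads effectively does. */
void restrict_read_on_pt(void) { s2_perm[0] &= ~(unsigned)S2_R; }
```

In other words, the translation works until read permission is pulled from the page-table page, at which point hardware-assisted translation is no longer usable and only a software walker (which can consult the stage-2 mappings directly) could succeed.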
On Wed, Dec 14, 2016 at 8:05 PM, Julien Grall <julien.grall@arm.com> wrote: >>> Note that ARM does not provide any hardware instruction to translate >>> an IPA (guest physical address) to a PA. So we have a walker there. >>> >>> We also a walk for debugging guest page table (though only when it is >>> using LPAE). I guess it could be re-used in the case where it is not >>> possible to do it in hardware. Although, it would be rewritten to make >>> it safe. >> >> >> This sounds like the kind of thing which would be generally useful, >> although I'd like to understand the problem better before making >> suggestions. > > > memaccess will restrict permission of certain memory regions in stage-2 page > table. For the purpose of the example, lets say read-access as been > restricted. > > One of these memory regions may contain the stage-1 page table. Do you agree > that the guest will not able to read stage-1 page table due the restriction? But only if the memory is read-protected, right? If a guest PT (stage-1) has read permission but not write permission, the hardware walker should still work, shouldn't it? -George
Hi George, On 15/12/16 04:54, George Dunlap wrote: > On Wed, Dec 14, 2016 at 8:05 PM, Julien Grall <julien.grall@arm.com> wrote: >>>> Note that ARM does not provide any hardware instruction to translate >>>> an IPA (guest physical address) to a PA. So we have a walker there. >>>> >>>> We also a walk for debugging guest page table (though only when it is >>>> using LPAE). I guess it could be re-used in the case where it is not >>>> possible to do it in hardware. Although, it would be rewritten to make >>>> it safe. >>> >>> >>> This sounds like the kind of thing which would be generally useful, >>> although I'd like to understand the problem better before making >>> suggestions. >> >> >> memaccess will restrict permission of certain memory regions in stage-2 page >> table. For the purpose of the example, lets say read-access as been >> restricted. >> >> One of these memory regions may contain the stage-1 page table. Do you agree >> that the guest will not able to read stage-1 page table due the restriction? > > But only if the memory is read-protected, right? If a guest PT > (stage-1) has read permission but not write permission, the hardware > walker should still work, shouldn't it? Good question. ARMv8.1 adds support for hardware management of the access flag and dirty state (see chapter B4 in DDI0557A.b), so if the hardware has to update the page table entry it would need write permission. I have looked at the pseudo-code for the address translation in both ARMv8.0 and ARMv8.1. ARMv8.0 does not need to update the table entry in hardware: looking at the AArch64.CheckS2Permission pseudo-code (see J1-5290 in ARM DDI 0486A.k_iss10775), the hardware walk will check whether stage 2 has read permission during a stage-1 page table walk (the s2fs1walk variable is set to true).
For ARMv8.1, after spending a couple of hours reading the pseudo-code (see E.1.2.5 in DDI0557A.b), my understanding is that the hardware walker may need to update the entry (see AArch64.CheckAndUpdateDescriptor). This depends on what triggered the walker (e.g. an address translation instruction, an instruction fetch, a memory access...). In the case of the address translation instruction (AccType_AT), no hardware update will be done. So to answer your question, the address translation instruction only needs read access, and the issue mentioned would only happen when the memory is read-protected. Reading the pseudo-code reminded me of a potential upcoming error with memaccess. I mentioned it a couple of months ago (see [1] and the KVM counterpart [2]) on the ML. On a stage-2 permission fault during a stage-1 walk, the syndrome field WnR (Write not Read) will report whether the abort was caused by a write instruction or a read instruction, not whether the hardware walker was reading or writing the page table entry. This is pretty obvious from the new pseudo-code for AArch64.CheckPermission in the spec (see J1-5290 in ARM DDI 0486A.k_iss10775). I think that a guest having its stage-1 page table protected in stage-2 could be stuck forever faulting and retrying a write instruction, because memaccess would receive the wrong fault (i.e. a write fault instead of a read). Cheers, [1] https://lists.xenproject.org/archives/html/xen-devel/2016-10/msg00228.html [2] https://lists.cs.columbia.edu/pipermail/kvmarm/2016-September/021925.html
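A handler-side sketch of the WnR pitfall described above. The helper name is hypothetical, and the sketch deliberately ignores ARMv8.1 hardware AF/DB updates, where the walker itself can write:

```c
#include <stdbool.h>

/* Sketch of the pitfall: on a stage-2 permission fault taken during a
 * stage-1 walk (s1ptw set in the syndrome), the WnR bit reflects the
 * *guest instruction* (e.g. a store), while the walker's own access
 * to the page-table page was a read.  A handler that trusts WnR
 * blindly would report a write violation to the mem_access listener
 * and the guest could retry the store forever.  (ARMv8.0 semantics;
 * with ARMv8.1 hardware AF/DB management the walker may also write.) */
static bool violation_was_write(bool wnr, bool s1ptw)
{
    if ( s1ptw )
        return false;  /* the walker only read the stage-1 tables */
    return wnr;
}
```

With a predicate shaped like this, a store that faults because its stage-1 walk hit a read-protected page-table page gets reported as the read violation it actually is, so the listener can lift the right restriction.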
On Fri, Dec 9, 2016 at 12:59 PM, Tamas K Lengyel <tamas.lengyel@zentific.com> wrote: > The only caller of this function is get_page_from_gva which already takes > a vcpu pointer as input. Pass this along to make the function in-line with > its intended use-case. > > Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> > --- > Cc: Stefano Stabellini <sstabellini@kernel.org> > Cc: Julien Grall <julien.grall@arm.com> Patch ping. > --- > xen/arch/arm/p2m.c | 11 ++++++----- > 1 file changed, 6 insertions(+), 5 deletions(-) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index cc5634b..837be1d 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -1461,7 +1461,8 @@ mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn) > * we indeed found a conflicting mem_access setting. > */ > static struct page_info* > -p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) > +p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag, > + const struct vcpu *v) > { > long rc; > paddr_t ipa; > @@ -1470,7 +1471,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) > xenmem_access_t xma; > p2m_type_t t; > struct page_info *page = NULL; > - struct p2m_domain *p2m = ¤t->domain->arch.p2m; > + struct p2m_domain *p2m = &v->domain->arch.p2m; > > rc = gva_to_ipa(gva, &ipa, flag); > if ( rc < 0 ) > @@ -1482,7 +1483,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) > * We do this first as this is faster in the default case when no > * permission is set on the page. 
> */ > - rc = __p2m_get_mem_access(current->domain, gfn, &xma); > + rc = __p2m_get_mem_access(v->domain, gfn, &xma); > if ( rc < 0 ) > goto err; > > @@ -1546,7 +1547,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) > > page = mfn_to_page(mfn_x(mfn)); > > - if ( unlikely(!get_page(page, current->domain)) ) > + if ( unlikely(!get_page(page, v->domain)) ) > page = NULL; > > err: > @@ -1587,7 +1588,7 @@ struct page_info *get_page_from_gva(struct vcpu *v, vaddr_t va, > > err: > if ( !page && p2m->mem_access_enabled ) > - page = p2m_mem_access_check_and_get_page(va, flags); > + page = p2m_mem_access_check_and_get_page(va, flags, v); > > p2m_read_unlock(p2m); > > -- > 2.10.2 >
On Fri, 9 Dec 2016, Tamas K Lengyel wrote: > The only caller of this function is get_page_from_gva which already takes > a vcpu pointer as input. Pass this along to make the function in-line with > its intended use-case. > > Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> > --- > Cc: Stefano Stabellini <sstabellini@kernel.org> > Cc: Julien Grall <julien.grall@arm.com> > --- > xen/arch/arm/p2m.c | 11 ++++++----- > 1 file changed, 6 insertions(+), 5 deletions(-) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index cc5634b..837be1d 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -1461,7 +1461,8 @@ mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn) > * we indeed found a conflicting mem_access setting. > */ > static struct page_info* > -p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) > +p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag, > + const struct vcpu *v) > { > long rc; > paddr_t ipa; > @@ -1470,7 +1471,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) > xenmem_access_t xma; > p2m_type_t t; > struct page_info *page = NULL; > - struct p2m_domain *p2m = ¤t->domain->arch.p2m; > + struct p2m_domain *p2m = &v->domain->arch.p2m; > > rc = gva_to_ipa(gva, &ipa, flag); > if ( rc < 0 ) > @@ -1482,7 +1483,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) > * We do this first as this is faster in the default case when no > * permission is set on the page. 
> */ > - rc = __p2m_get_mem_access(current->domain, gfn, &xma); > + rc = __p2m_get_mem_access(v->domain, gfn, &xma); > if ( rc < 0 ) > goto err; > > @@ -1546,7 +1547,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) > > page = mfn_to_page(mfn_x(mfn)); > > - if ( unlikely(!get_page(page, current->domain)) ) > + if ( unlikely(!get_page(page, v->domain)) ) > page = NULL; > > err: > @@ -1587,7 +1588,7 @@ struct page_info *get_page_from_gva(struct vcpu *v, vaddr_t va, > > err: > if ( !page && p2m->mem_access_enabled ) > - page = p2m_mem_access_check_and_get_page(va, flags); > + page = p2m_mem_access_check_and_get_page(va, flags, v); > > p2m_read_unlock(p2m); > > -- > 2.10.2 >
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index cc5634b..837be1d 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -1461,7 +1461,8 @@ mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn) * we indeed found a conflicting mem_access setting. */ static struct page_info* -p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) +p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag, + const struct vcpu *v) { long rc; paddr_t ipa; @@ -1470,7 +1471,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) xenmem_access_t xma; p2m_type_t t; struct page_info *page = NULL; - struct p2m_domain *p2m = ¤t->domain->arch.p2m; + struct p2m_domain *p2m = &v->domain->arch.p2m; rc = gva_to_ipa(gva, &ipa, flag); if ( rc < 0 ) @@ -1482,7 +1483,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) * We do this first as this is faster in the default case when no * permission is set on the page. */ - rc = __p2m_get_mem_access(current->domain, gfn, &xma); + rc = __p2m_get_mem_access(v->domain, gfn, &xma); if ( rc < 0 ) goto err; @@ -1546,7 +1547,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag) page = mfn_to_page(mfn_x(mfn)); - if ( unlikely(!get_page(page, current->domain)) ) + if ( unlikely(!get_page(page, v->domain)) ) page = NULL; err: @@ -1587,7 +1588,7 @@ struct page_info *get_page_from_gva(struct vcpu *v, vaddr_t va, err: if ( !page && p2m->mem_access_enabled ) - page = p2m_mem_access_check_and_get_page(va, flags); + page = p2m_mem_access_check_and_get_page(va, flags, v); p2m_read_unlock(p2m);
The only caller of this function is get_page_from_gva which already takes a vcpu pointer as input. Pass this along to make the function in-line with its intended use-case. Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> --- Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Julien Grall <julien.grall@arm.com> --- xen/arch/arm/p2m.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)