diff mbox

[RFC,4/4] arm/mem_access: Add software guest-page-table walk

Message ID 20170430194838.29932-5-proskurin@sec.in.tum.de (mailing list archive)
State New, archived
Headers show

Commit Message

Sergej Proskurin April 30, 2017, 7:48 p.m. UTC
The function p2m_mem_access_check_and_get_page in mem_access.c
translates a gva to an ipa by means of the hardware functionality
implemented in the function gva_to_ipa. If mem_access is active,
hardware-based gva to ipa translation might fail, as gva_to_ipa uses the
guest's translation tables, access to which might be restricted by the
active VTTBR. To address this issue, we perform the gva to ipa
translation in software.

Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
---
Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
Cc: Tamas K Lengyel <tamas@tklengyel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/mem_access.c | 140 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 139 insertions(+), 1 deletion(-)

Comments

Razvan Cojocaru May 1, 2017, 2:38 p.m. UTC | #1
On 04/30/2017 10:48 PM, Sergej Proskurin wrote:
> The function p2m_mem_access_check_and_get_page in mem_access.c
> translates a gva to an ipa by means of the hardware functionality
> implemented in the function gva_to_ipa. If mem_access is active,
> hardware-based gva to ipa translation might fail, as gva_to_ipa uses the
> guest's translation tables, access to which might be restricted by the
> active VTTBR. To address this issue, we perform the gva to ipa
> translation in software.
> 
> Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
> ---
> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Cc: Tamas K Lengyel <tamas@tklengyel.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> ---
>  xen/arch/arm/mem_access.c | 140 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 139 insertions(+), 1 deletion(-)

My ARM knowledge is scant to say the least, and I have no way to test
this code, so I'll leave it to Tamas who has done some ARM work in the
past. In any case - to state the obvious - the main acks here I believe
are Julien and Stefano.


Thanks,
Razvan
Sergej Proskurin May 1, 2017, 3:39 p.m. UTC | #2
Hi Razvan,

Sure thing. Thanks anyway :)

Cheers,

~Sergej


On 05/01/2017 04:38 PM, Razvan Cojocaru wrote:
> On 04/30/2017 10:48 PM, Sergej Proskurin wrote:
>> The function p2m_mem_access_check_and_get_page in mem_access.c
>> translates a gva to an ipa by means of the hardware functionality
>> implemented in the function gva_to_ipa. If mem_access is active,
>> hardware-based gva to ipa translation might fail, as gva_to_ipa uses the
>> guest's translation tables, access to which might be restricted by the
>> active VTTBR. To address this issue, we perform the gva to ipa
>> translation in software.
>>
>> Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
>> ---
>> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> Cc: Tamas K Lengyel <tamas@tklengyel.com>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> Cc: Julien Grall <julien.grall@arm.com>
>> ---
>>  xen/arch/arm/mem_access.c | 140 +++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 139 insertions(+), 1 deletion(-)
> My ARM knowledge is scant to say the least, and I have no way to test
> this code, so I'll leave it to Tamas who has done some ARM work in the
> past. In any case - to state the obvious - the main acks here I believe
> are Julien and Stefano.
>
>
> Thanks,
> Razvan
Julien Grall May 2, 2017, 3:17 p.m. UTC | #3
Hi Sergej,

On 30/04/17 20:48, Sergej Proskurin wrote:
> The function p2m_mem_access_check_and_get_page in mem_access.c
> translates a gva to an ipa by means of the hardware functionality
> implemented in the function gva_to_ipa. If mem_access is active,
> hardware-based gva to ipa translation might fail, as gva_to_ipa uses the
> guest's translation tables, access to which might be restricted by the
> active VTTBR. To address this issue, we perform the gva to ipa
> translation in software.
>
> Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
> ---
> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Cc: Tamas K Lengyel <tamas@tklengyel.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> ---
>  xen/arch/arm/mem_access.c | 140 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 139 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
> index 04b1506b00..352eb6073f 100644
> --- a/xen/arch/arm/mem_access.c
> +++ b/xen/arch/arm/mem_access.c
> @@ -20,6 +20,7 @@
>  #include <xen/monitor.h>
>  #include <xen/sched.h>
>  #include <xen/vm_event.h>
> +#include <xen/domain_page.h>
>  #include <public/vm_event.h>
>  #include <asm/event.h>
>
> @@ -90,6 +91,129 @@ static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
>      return 0;
>  }
>
> +static int
> +p2m_gva_to_ipa(struct p2m_domain *p2m, vaddr_t gva,
> +               paddr_t *ipa, unsigned int flags)

I don't think this function belongs in mem_access.c. It would be better 
placed in p2m.c. Also, the name is a bit confusing; a user would not 
know the difference between p2m_gva_to_ipa and gva_to_ipa.

> +{
> +    int level=0, t0_sz, t1_sz;

Coding style: level = 0.

Also please use unsigned int for level.

t0_sz and t1_sz should stay signed.
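
For illustration, the declarations would then read:

    unsigned int level = 0;   /* lookup level, never negative */
    int t0_sz, t1_sz;         /* TnSZ field values, kept signed */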

> +    unsigned long t0_max, t1_min;
> +    lpae_t pte, *table;
> +    mfn_t root_mfn;
> +    uint64_t ttbr;
> +    uint32_t sctlr = READ_SYSREG(SCTLR_EL1);
> +    register_t ttbcr = READ_SYSREG(TCR_EL1);

So you are assuming that the current vCPU is running, right? If so, this 
should be clarified in the commit message and in a comment above the 
function.

Also s/ttbcr/tcr/

> +    struct domain *d = p2m->domain;
> +
> +    const unsigned int offsets[4] = {
> +#ifdef CONFIG_ARM_64
> +        zeroeth_table_offset(gva),
> +#endif
> +        first_table_offset(gva),
> +        second_table_offset(gva),
> +        third_table_offset(gva)
> +    };

Offsets are based on Xen's granularity (currently 4KB). However, guests 
can use 4KB, 16KB, or 64KB granules. Please handle all of them correctly.
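
For reference, a minimal sketch of how the stage-1 granule size could be
derived from TCR_EL1 instead of being assumed (the TCR_TG0_* names below
are illustrative, not existing Xen macros):

    /* TCR_EL1.TG0, bits [15:14]: 0b00 = 4KB, 0b01 = 64KB, 0b10 = 16KB. */
    #define TCR_TG0_SHIFT  14
    #define TCR_TG0_MASK   0x3UL

    static unsigned int granule_shift(register_t tcr)
    {
        switch ( (tcr >> TCR_TG0_SHIFT) & TCR_TG0_MASK )
        {
        case 1:  return 16;   /* 64KB granule */
        case 2:  return 14;   /* 16KB granule */
        default: return 12;   /* 4KB granule */
        }
    }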

> +
> +    const paddr_t masks[4] = {
> +#ifdef CONFIG_ARM_64
> +        ZEROETH_SIZE - 1,
> +#endif
> +        FIRST_SIZE - 1,
> +        SECOND_SIZE - 1,
> +        THIRD_SIZE - 1
> +    };
> +
> +    /* If the MMU is disabled, there is no need to translate the gva. */
> +    if ( !(sctlr & SCTLR_M) )
> +    {
> +        *ipa = gva;
> +
> +        return 0;
> +    }
> +
> +    if ( is_32bit_domain(d) )
> +    {
> +        /*
> +         * XXX: We do not support 32-bit domain translation table walks for
> +         * domains using the short-descriptor translation table format, yet.
> +         */

Debian ARM 32-bit uses the short-descriptor translation table format. So 
are you suggesting that mem_access will not work correctly with Debian 
guests?

> +        if ( !(ttbcr & TTBCR_EAE) )

See my comment on patch #2 about the naming convention.

> +            return -EFAULT;
> +
> +#ifdef CONFIG_ARM_64
> +        level = 1;

This sounds wrong to me. The first lookup level is determined by the 
value of T0SZ and T1SZ (see B3-1352 in ARM DDI 0406C.c).

For the AArch64 version see D4.2.5 in ARM DDI 0487A.k_iss10775.
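
To illustrate, a sketch for the 4KB-granule case only: each level
resolves LPAE_SHIFT (9) bits on top of the PAGE_SHIFT (12) bits of page
offset, so the first level follows directly from the input-address size
(64 - T0SZ on AArch64, 32 - T0SZ for AArch32 LPAE):

    /* E.g. T0SZ = 16 (48-bit IA) gives level 0; T0SZ = 25 gives level 1. */
    static unsigned int first_level(unsigned int ia_bits)
    {
        return 4 - DIV_ROUND_UP(ia_bits - PAGE_SHIFT, LPAE_SHIFT);
    }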

> +#endif
> +    }
> +
> +#ifdef CONFIG_ARM_64
> +    if ( is_64bit_domain(d) )
> +    {
> +        /* Get the max GVA that can be translated by TTBR0. */
> +        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
> +        t0_max = (1UL << (64 - t0_sz)) - 1;
> +
> +        /* Get the min GVA that can be translated by TTBR1. */
> +        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
> +        t1_min = ~0UL - (1UL << (64 - t1_sz)) + 1;
> +    }
> +    else
> +#endif
> +    {
> +        /* Get the max GVA that can be translated by TTBR0. */
> +        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TTBCR_SZ_MASK;

See my comment on patch #2 for the naming convention.

> +        t0_max = (1U << (32 - t0_sz)) - 1;
> +
> +        /* Get the min GVA that can be translated by TTBR1. */
> +        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TTBCR_SZ_MASK;
> +        t1_min = ~0U - (1U << (32 - t1_sz)) + 1;

1U << (32 - t1_sz) will not fit in an unsigned long if t1_sz = 0.

Also, this code looks wrong to me. Looking at B3.6.4 in DDI 0406C.c, you 
don't handle t0_max and t1_min properly when T0SZ or T1SZ is 0.

I think it would be worth your while to have a look at the pseudo-code 
in the ARM ARM for TranslationTableWalk (see B3.19.3 in ARM DDI 0406C.c 
and J1-5295 in ARM DDI 0487A.k_iss10775), which gives you all the 
details for doing the translation table walk properly.
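
One possible shape of the fix (a sketch, not the submitted code): doing
the arithmetic in 64 bits avoids the undefined shift when TnSZ is 0, and
T0SZ = 0 then naturally gives TTBR0 the whole 32-bit space, which the
existing "TTBR0 first" check preserves:

    uint64_t t0_max, t1_min;

    /* No undefined behaviour for t0_sz == 0 or t1_sz == 0. */
    t0_max = ((uint64_t)1 << (32 - t0_sz)) - 1;
    t1_min = ((uint64_t)1 << 32) - ((uint64_t)1 << (32 - t1_sz));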

> +    }
> +
> +    if ( t0_max >= gva )
> +        /* Use TTBR0 for GVA to IPA translation. */
> +        ttbr = READ_SYSREG64(TTBR0_EL1);
> +    else if ( t1_min <= gva )
> +        /* Use TTBR1 for GVA to IPA translation. */
> +        ttbr = READ_SYSREG64(TTBR1_EL1);
> +    else
> +        /* GVA out of bounds of TTBR(0|1). */
> +        return -EFAULT;
> +
> +    /* Bits [63..48] might be used by an ASID. */
> +    root_mfn = p2m_lookup(d, _gfn(paddr_to_pfn(ttbr & ((1ULL<<48)-1))), NULL);

Please don't hardcode the mask.
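
A sketch of what a named mask could look like (TTBR_BADDR_MASK is a
hypothetical name, not an existing Xen macro):

    /* TTBRx_EL1: bits [47:0] hold the base address, [63:48] the ASID. */
    #define TTBR_BADDR_MASK  ((1ULL << 48) - 1)

    root_mfn = p2m_lookup(d, _gfn(paddr_to_pfn(ttbr & TTBR_BADDR_MASK)), NULL);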

> +
> +    /* Check, whether TTBR holds a valid address. */
> +    if ( mfn_eq(root_mfn, INVALID_MFN) )
> +        return -EFAULT;
> +
> +    table = map_domain_page(root_mfn);

Nothing prevents the guest from giving back the page table whilst you 
are browsing it. You may want to have a look at the patch "ARM: 
introduce vgic_access_guest_memory()" [1] to generalize that function 
and avoid re-inventing the wheel.
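
A minimal sketch of the idea, taking a reference on the table page
before mapping it (assuming Xen's get_page()/put_page(); the referenced
patch generalizes this properly):

    struct page_info *pg = mfn_to_page(mfn_x(root_mfn));

    /* Fails if the guest has already given the page back. */
    if ( !get_page(pg, d) )
        return -EFAULT;

    table = map_domain_page(root_mfn);
    /* ... read the entry for this level ... */
    unmap_domain_page(table);
    put_page(pg);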

> +
> +    for ( ; ; level++ )
> +    {
> +        pte = table[offsets[level]];

You likely want to use ACCESS_ONCE here to prevent the compiler from 
reading the pte twice from memory.
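
For example (a sketch: going through the union's raw 64-bit member, here
assumed to be lpae_t's "bits" field, also sidesteps compiler complaints
about ACCESS_ONCE on a non-scalar type):

    pte.bits = ACCESS_ONCE(table[offsets[level]].bits);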

> +
> +        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
> +            break;
> +
> +        unmap_domain_page(table);
> +
> +        root_mfn = p2m_lookup(d, _gfn(pte.walk.base), NULL);

No check on root_mfn?

> +        table = map_domain_page(root_mfn);
> +    }
> +
> +    unmap_domain_page(table);
> +
> +    if ( !pte.walk.valid )
> +        return -EFAULT;
> +
> +    /* Make sure the entry holds the requested access attributes. */
> +    if ( ((flags & GV2M_WRITE) == GV2M_WRITE) && pte.pt.ro )
> +        return -EFAULT;

IMHO, this function should return the access attributes of the page and 
let the caller decide what to do. This would make the function more 
generic.
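
A sketch of that more generic shape (the name and prototype are
illustrative only): the walk hands back the leaf entry and the caller
applies the policy:

    static int p2m_walk_gpt(struct p2m_domain *p2m, vaddr_t gva,
                            paddr_t *ipa, lpae_t *leaf);

    /* Caller side, equivalent to the GV2M_WRITE check above: */
    if ( p2m_walk_gpt(p2m, gva, &ipa, &pte) < 0 )
        return -EFAULT;

    if ( (flags & GV2M_WRITE) && pte.pt.ro )
        return -EFAULT;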

> +
> +    *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level]);
> +
> +    return 0;
> +}
> +
> +
>  /*
>   * If mem_access is in use it might have been the reason why get_page_from_gva
>   * failed to fetch the page, as it uses the MMU for the permission checking.
> @@ -109,9 +233,23 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
>      struct page_info *page = NULL;
>      struct p2m_domain *p2m = &v->domain->arch.p2m;
>
> +    ASSERT(p2m->mem_access_enabled);
> +

Why has this ASSERT been added?

>      rc = gva_to_ipa(gva, &ipa, flag);
> +
> +    /*
> +     * In case mem_access is active, hardware-based gva_to_ipa translation
> +     * might fail. Since gva_to_ipa uses the guest's translation tables, access
> +     * to which might be restricted by the active VTTBR, we perform a gva to
> +     * ipa translation in software.
> +     */
>      if ( rc < 0 )
> -        goto err;
> +        if ( p2m_gva_to_ipa(p2m, gva, &ipa, flag) < 0 )
> +            /*
> +             * The software gva to ipa translation can still fail, if the the
> +             * gva is not mapped or does not hold the requested access rights.
> +             */
> +            goto err;

Rather than falling back, why don't we do a software page table walk every time?

>
>      gfn = _gfn(paddr_to_pfn(ipa));
>
>

Cheers,

[1] https://patchwork.kernel.org/patch/9676291/
Sergej Proskurin May 8, 2017, 9:22 a.m. UTC | #4
Hi Julien,


On 05/02/2017 05:17 PM, Julien Grall wrote:
> Hi Sergej,
>
> On 30/04/17 20:48, Sergej Proskurin wrote:
>> The function p2m_mem_access_check_and_get_page in mem_access.c
>> translates a gva to an ipa by means of the hardware functionality
>> implemented in the function gva_to_ipa. If mem_access is active,
>> hardware-based gva to ipa translation might fail, as gva_to_ipa uses the
>> guest's translation tables, access to which might be restricted by the
>> active VTTBR. To address this issue, we perform the gva to ipa
>> translation in software.
>>
>> Signed-off-by: Sergej Proskurin <proskurin@sec.in.tum.de>
>> ---
>> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> Cc: Tamas K Lengyel <tamas@tklengyel.com>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> Cc: Julien Grall <julien.grall@arm.com>
>> ---
>>  xen/arch/arm/mem_access.c | 140
>> +++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 139 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
>> index 04b1506b00..352eb6073f 100644
>> --- a/xen/arch/arm/mem_access.c
>> +++ b/xen/arch/arm/mem_access.c
>> @@ -20,6 +20,7 @@
>>  #include <xen/monitor.h>
>>  #include <xen/sched.h>
>>  #include <xen/vm_event.h>
>> +#include <xen/domain_page.h>
>>  #include <public/vm_event.h>
>>  #include <asm/event.h>
>>
>> @@ -90,6 +91,129 @@ static int __p2m_get_mem_access(struct domain *d,
>> gfn_t gfn,
>>      return 0;
>>  }
>>
>> +static int
>> +p2m_gva_to_ipa(struct p2m_domain *p2m, vaddr_t gva,
>> +               paddr_t *ipa, unsigned int flags)
>
> I don't think this function belongs in mem_access.c. It would be
> better placed in p2m.c. Also, the name is a bit confusing; a user
> would not know the difference between p2m_gva_to_ipa and gva_to_ipa.

Fair enough, thank you.

>
>> +{
>> +    int level=0, t0_sz, t1_sz;
>
> Coding style: level = 0.
>
> Also please use unsigned int for level.
>
> t0_sz and t1_sz should stay signed.
>
>> +    unsigned long t0_max, t1_min;
>> +    lpae_t pte, *table;
>> +    mfn_t root_mfn;
>> +    uint64_t ttbr;
>> +    uint32_t sctlr = READ_SYSREG(SCTLR_EL1);
>> +    register_t ttbcr = READ_SYSREG(TCR_EL1);
>
> So you are assuming that the current vCPU is running, right? If so,
> this should be clarified in the commit message and in a comment above
> the function.

I will state that in a comment, thank you.

>
> Also s/ttbcr/tcr/
>
>> +    struct domain *d = p2m->domain;
>> +
>> +    const unsigned int offsets[4] = {
>> +#ifdef CONFIG_ARM_64
>> +        zeroeth_table_offset(gva),
>> +#endif
>> +        first_table_offset(gva),
>> +        second_table_offset(gva),
>> +        third_table_offset(gva)
>> +    };
>
> Offsets are based on Xen's granularity (currently 4KB). However,
> guests can use 4KB, 16KB, or 64KB granules. Please handle all of them
> correctly.

True, ARM is quite flexible. Yet, I have not seen an OS implementation
supported by Xen that makes use of page sizes other than 4KB and their
supersets (2/4MB, 1GB) (please let me know if you know of one), which
is why I doubt that we need this. Why do you think we need that kind of
flexibility in software?

If you should nevertheless insist on that functionality, I would include
it in the next patch and try not to blow up the code too much.

>
>> +
>> +    const paddr_t masks[4] = {
>> +#ifdef CONFIG_ARM_64
>> +        ZEROETH_SIZE - 1,
>> +#endif
>> +        FIRST_SIZE - 1,
>> +        SECOND_SIZE - 1,
>> +        THIRD_SIZE - 1
>> +    };
>> +
>> +    /* If the MMU is disabled, there is no need to translate the
>> gva. */
>> +    if ( !(sctlr & SCTLR_M) )
>> +    {
>> +        *ipa = gva;
>> +
>> +        return 0;
>> +    }
>> +
>> +    if ( is_32bit_domain(d) )
>> +    {
>> +        /*
>> +         * XXX: We do not support 32-bit domain translation table
>> walks for
>> +         * domains using the short-descriptor translation table
>> format, yet.
>> +         */
>
> Debian ARM 32-bit uses the short-descriptor translation table format.
> So are you suggesting that mem_access will not work correctly with
> Debian guests?
>

Yes, as stated in the comment, the current implementation does not
support the short-descriptor translation table format. As this is an RFC
patch, I wanted to see your opinion on that functionality in general
before implementing all corner cases of the ARM architecture.

As mentioned in my previous reply to patch 2/4, I would prefer to
separate the long-descriptor from the short-descriptor translation in
the future implementation.

>> +        if ( !(ttbcr & TTBCR_EAE) )
>
> See my comment on patch #2 about the naming convention.
>

Done, thanks.

>> +            return -EFAULT;
>> +
>> +#ifdef CONFIG_ARM_64
>> +        level = 1;
>
> This sounds wrong to me. The first lookup level is determined by the
> value of T0SZ and T1SZ (see B3-1352 in ARM DDI 0406C.c).
>
> For the AArch64 version see D4.2.5 in ARM DDI 0487A.k_iss10775.
>

Thank you for the hint.

>> +#endif
>> +    }
>> +
>> +#ifdef CONFIG_ARM_64
>> +    if ( is_64bit_domain(d) )
>> +    {
>> +        /* Get the max GVA that can be translated by TTBR0. */
>> +        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
>> +        t0_max = (1UL << (64 - t0_sz)) - 1;
>> +
>> +        /* Get the min GVA that can be translated by TTBR1. */
>> +        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
>> +        t1_min = ~0UL - (1UL << (64 - t1_sz)) + 1;
>> +    }
>> +    else
>> +#endif
>> +    {
>> +        /* Get the max GVA that can be translated by TTBR0. */
>> +        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TTBCR_SZ_MASK;
>
> See my comment on patch #2 for the naming convention.
>
>> +        t0_max = (1U << (32 - t0_sz)) - 1;
>> +
>> +        /* Get the min GVA that can be translated by TTBR1. */
>> +        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TTBCR_SZ_MASK;
>> +        t1_min = ~0U - (1U << (32 - t1_sz)) + 1;
>
> 1U << (32 - t1_sz) will not fit in an unsigned long if t1_sz = 0.

An unsigned long long should fix that (however not in the 64-bit case),
thanks.

>
> Also, this code looks wrong to me. Looking at B3.6.4 in DDI 0406C.c,
> you don't handle t0_max and t1_min properly when T0SZ or T1SZ is 0.
>
> I think it would be worth your while to have a look at the pseudo-code
> in the ARM ARM for TranslationTableWalk (see B3.19.3 in ARM DDI
> 0406C.c and J1-5295 in ARM DDI 0487A.k_iss10775), which gives you all
> the details for doing the translation table walk properly.

I will have a deeper look and adopt parts of the given pseudocode in my
implementation. You are right: there might be an off-by-one issue in the
computation above.

>
>> +    }
>> +
>> +    if ( t0_max >= gva )
>> +        /* Use TTBR0 for GVA to IPA translation. */
>> +        ttbr = READ_SYSREG64(TTBR0_EL1);
>> +    else if ( t1_min <= gva )
>> +        /* Use TTBR1 for GVA to IPA translation. */
>> +        ttbr = READ_SYSREG64(TTBR1_EL1);
>> +    else
>> +        /* GVA out of bounds of TTBR(0|1). */
>> +        return -EFAULT;
>> +
>> +    /* Bits [63..48] might be used by an ASID. */
>> +    root_mfn = p2m_lookup(d, _gfn(paddr_to_pfn(ttbr &
>> ((1ULL<<48)-1))), NULL);
>
> Please don't hardcode the mask.
>
>> +
>> +    /* Check, whether TTBR holds a valid address. */
>> +    if ( mfn_eq(root_mfn, INVALID_MFN) )
>> +        return -EFAULT;
>> +
>> +    table = map_domain_page(root_mfn);
>
> Nothing prevents the guest from giving back the page table whilst you
> are browsing it. You may want to have a look at the patch "ARM:
> introduce vgic_access_guest_memory()" [1] to generalize that function
> and avoid re-inventing the wheel.
>

I will have a look at this commit, thanks.

>> +
>> +    for ( ; ; level++ )
>> +    {
>> +        pte = table[offsets[level]];
>
> You likely want to use ACCESS_ONCE here to prevent the compiler from
> reading the pte twice from memory.

I tried that; however, the compiler did not seem to like it. I will try
again and let you know if I get it to work.

>
>> +
>> +        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
>> +            break;
>> +
>> +        unmap_domain_page(table);
>> +
>> +        root_mfn = p2m_lookup(d, _gfn(pte.walk.base), NULL);
>
> No check on root_mfn?

Done, thanks.

>
>> +        table = map_domain_page(root_mfn);
>> +    }
>> +
>> +    unmap_domain_page(table);
>> +
>> +    if ( !pte.walk.valid )
>> +        return -EFAULT;
>> +
>> +    /* Make sure the entry holds the requested access attributes. */
>> +    if ( ((flags & GV2M_WRITE) == GV2M_WRITE) && pte.pt.ro )
>> +        return -EFAULT;
>
> IMHO, this function should return the access attributes of the page
> and let the caller decide what to do. This would make the function
> more generic.
>

Makes sense, this would indeed make the function more generic. Since
there are already multiple gpt-walk stubs throughout the codebase, we
could then make do with a single one.

>> +
>> +    *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level]);
>> +
>> +    return 0;
>> +}
>> +
>> +
>>  /*
>>   * If mem_access is in use it might have been the reason why
>> get_page_from_gva
>>   * failed to fetch the page, as it uses the MMU for the permission
>> checking.
>> @@ -109,9 +233,23 @@ p2m_mem_access_check_and_get_page(vaddr_t gva,
>> unsigned long flag,
>>      struct page_info *page = NULL;
>>      struct p2m_domain *p2m = &v->domain->arch.p2m;
>>
>> +    ASSERT(p2m->mem_access_enabled);
>> +
>
> Why has this ASSERT been added?

The function p2m_mem_access_check_and_get_page is called only if
get_page_from_gva fails and mem_access is active. I can add a comment
and move this part into an individual commit.

>
>>      rc = gva_to_ipa(gva, &ipa, flag);
>> +
>> +    /*
>> +     * In case mem_access is active, hardware-based gva_to_ipa
>> translation
>> +     * might fail. Since gva_to_ipa uses the guest's translation
>> tables, access
>> +     * to which might be restricted by the active VTTBR, we perform
>> a gva to
>> +     * ipa translation in software.
>> +     */
>>      if ( rc < 0 )
>> -        goto err;
>> +        if ( p2m_gva_to_ipa(p2m, gva, &ipa, flag) < 0 )
>> +            /*
>> +             * The software gva to ipa translation can still fail,
>> if the the
>> +             * gva is not mapped or does not hold the requested
>> access rights.
>> +             */
>> +            goto err;
>
> Rather than falling back, why don't we do a software page table walk
> every time?

A software page table walk would (presumably) take more time to
translate the gva. Do we want that? Also, I am not sure what you meant
by "falling back" at this point. Thank you.

>
>>
>>      gfn = _gfn(paddr_to_pfn(ipa));
>>
>>
>
> Cheers,
>
> [1] https://patchwork.kernel.org/patch/9676291/
>

Cheers,
~Sergej
Julien Grall May 8, 2017, 11:44 a.m. UTC | #5
Hi,

On 08/05/17 10:22, Sergej Proskurin wrote:
> On 05/02/2017 05:17 PM, Julien Grall wrote:
>> On 30/04/17 20:48, Sergej Proskurin wrote:
>> Also s/ttbcr/tcr/
>>
>>> +    struct domain *d = p2m->domain;
>>> +
>>> +    const unsigned int offsets[4] = {
>>> +#ifdef CONFIG_ARM_64
>>> +        zeroeth_table_offset(gva),
>>> +#endif
>>> +        first_table_offset(gva),
>>> +        second_table_offset(gva),
>>> +        third_table_offset(gva)
>>> +    };
>>
>> Offsets are based on Xen's granularity (currently 4KB). However,
>> guests can use 4KB, 16KB, or 64KB granules. Please handle all of them
>> correctly.
>
> True, ARM is quite flexible. Yet, I have not seen an OS implementation
> supported by Xen that makes use of page sizes other than 4KB and their
> supersets (2/4MB, 1GB) (please let me know if you know of one), which
> is why I doubt that we need this. Why do you think we need that kind
> of flexibility in software?
>
> If you should nevertheless insist on that functionality, I would include
> it in the next patch and try not to blow up the code too much.

Linux is able to support 4KB, 16KB, and 64KB page granularity on AArch64. 
CentOS and RedHat ship with 64KB page granularity only.

>
>>
>>> +
>>> +    const paddr_t masks[4] = {
>>> +#ifdef CONFIG_ARM_64
>>> +        ZEROETH_SIZE - 1,
>>> +#endif
>>> +        FIRST_SIZE - 1,
>>> +        SECOND_SIZE - 1,
>>> +        THIRD_SIZE - 1
>>> +    };
>>> +
>>> +    /* If the MMU is disabled, there is no need to translate the
>>> gva. */
>>> +    if ( !(sctlr & SCTLR_M) )
>>> +    {
>>> +        *ipa = gva;
>>> +
>>> +        return 0;
>>> +    }
>>> +
>>> +    if ( is_32bit_domain(d) )
>>> +    {
>>> +        /*
>>> +         * XXX: We do not support 32-bit domain translation table
>>> walks for
>>> +         * domains using the short-descriptor translation table
>>> format, yet.
>>> +         */
>>
>> Debian ARM 32-bit uses the short-descriptor translation table format.
>> So are you suggesting that mem_access will not work correctly with
>> Debian guests?
>>
>
> Yes, as stated in the comment, the current implementation does not
> support the short-descriptor translation table format. As this is an RFC
> patch, I wanted to see your opinion on that functionality in general
> before implementing all corner cases of the ARM architecture.
>
> As mentioned in my previous reply to patch 2/4, I would prefer to
> separate the long-descriptor from the short-descriptor translation in
> the future implementation.

I agree here. See my answer on patch #2 about how I would like to see 
the implementation.

[...]

>>> +
>>> +    *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level]);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +
>>>  /*
>>>   * If mem_access is in use it might have been the reason why
>>> get_page_from_gva
>>>   * failed to fetch the page, as it uses the MMU for the permission
>>> checking.
>>> @@ -109,9 +233,23 @@ p2m_mem_access_check_and_get_page(vaddr_t gva,
>>> unsigned long flag,
>>>      struct page_info *page = NULL;
>>>      struct p2m_domain *p2m = &v->domain->arch.p2m;
>>>
>>> +    ASSERT(p2m->mem_access_enabled);
>>> +
>>
>> Why has this ASSERT been added?
>
> The function p2m_mem_access_check_and_get_page is called only if
> get_page_from_gva fails and mem_access is active. I can add a comment
> and move this part into an individual commit.

Whilst I agree it is dumb to call this code without mem_access enabled, 
this code is able to cope with it. So why do you need this ASSERT?

>
>>
>>>      rc = gva_to_ipa(gva, &ipa, flag);
>>> +
>>> +    /*
>>> +     * In case mem_access is active, hardware-based gva_to_ipa
>>> translation
>>> +     * might fail. Since gva_to_ipa uses the guest's translation
>>> tables, access
>>> +     * to which might be restricted by the active VTTBR, we perform
>>> a gva to
>>> +     * ipa translation in software.
>>> +     */
>>>      if ( rc < 0 )
>>> -        goto err;
>>> +        if ( p2m_gva_to_ipa(p2m, gva, &ipa, flag) < 0 )
>>> +            /*
>>> +             * The software gva to ipa translation can still fail,
>>> if the the
>>> +             * gva is not mapped or does not hold the requested
>>> access rights.
>>> +             */
>>> +            goto err;
>>
>> Rather than falling back, why don't we do a software page table walk
>> every time?
>
> A software page table walk would (presumably) take more time to
> translate the gva. Do we want that? Also, I am not sure what you meant
> by "falling back" at this point. Thank you.

What you currently do is try gva_to_ipa and, if it does not work, you 
call p2m_gva_to_ipa. This sounds a bit pointless to me and a waste of 
time if the underlying memory of the stage-1 page tables is protected.

Before saying this is taking much more time, I would like to see actual 
numbers.

Cheers,
Tamas K Lengyel May 8, 2017, 7:42 p.m. UTC | #6
On Mon, May 8, 2017 at 5:44 AM, Julien Grall <julien.grall@arm.com> wrote:
> Hi,
>
> [...]
>
> What you currently do is try gva_to_ipa and, if it does not work, you
> call p2m_gva_to_ipa. This sounds a bit pointless to me and a waste of
> time if the underlying memory of the stage-1 page tables is protected.

But we don't know that the stage-1 page table is protected until the
hardware-based lookup fails. I can turn your argument around and say
that doing the software-based lookup is pointless and a waste of time
when the stage-1 table is not protected. Which is, by the way, what I
would expect to see in most cases.

> Before saying this is taking much more time, I would like to see actual
> numbers.

Whether the performance is measurably different is going to be very
use-case specific. If the TLBs are already loaded with the translation
then the hardware lookup will be a lot faster. Setting up a test case
for this is just a PITA and I don't really see why you are against
using the hardware lookup to start with.

Tamas
Julien Grall May 8, 2017, 8:53 p.m. UTC | #7
On 05/08/2017 08:42 PM, Tamas K Lengyel wrote:
> On Mon, May 8, 2017 at 5:44 AM, Julien Grall <julien.grall@arm.com> wrote:
> Whether the performance is measurably different is going to be very
> use-case specific. If the TLBs are already loaded with the translation
> then the hardware lookup will be a lot faster. Setting up a test case
> for this is just a PITA and I don't really see why you are against
> using the hardware lookup to start with.

Well, if you read my e-mail you would have noticed that I am not against 
it. I just find it a bit pointless to do both and was asking whether you 
did any benchmarking before taking this decision.

But it sounds like you are not willing to even consider it. Anyway I am 
not going to argue on that. It is not something I really care for now.

Cheers,
Tamas K Lengyel May 8, 2017, 9:22 p.m. UTC | #8
On Mon, May 8, 2017 at 2:53 PM, Julien Grall <julien.grall@arm.com> wrote:
>
>
> On 05/08/2017 08:42 PM, Tamas K Lengyel wrote:
>>
>> On Mon, May 8, 2017 at 5:44 AM, Julien Grall <julien.grall@arm.com> wrote:
>> Whether the performance is measurably different is going to be very
>> use-case specific. If the TLBs are already loaded with the translation
>> then the hardware lookup will be a lot faster. Setting up a test case
>> for this is just a PITA and I don't really see why you are against
>> using the hardware lookup to start with.
>
>
> Well, if you read my e-mail you would have noticed that I am not
> against it. I just find it a bit pointless to do both and was asking
> whether you did any benchmarking before taking this decision.
>
> But it sounds like you are not willing to even consider it. Anyway I am not
> going to argue on that. It is not something I really care for now.

It did sound to me like you were pushing for it. I would prefer doing
hardware translation first and only falling back to software when
needed unless we have some good incentive to do otherwise.

Tamas
Sergej Proskurin May 9, 2017, 7:17 a.m. UTC | #9
Hi,

>> What you currently do is try gva_to_ipa and, if it does not work, you
>> call p2m_gva_to_ipa. This sounds a bit pointless to me and a waste of
>> time if the underlying memory of the stage-1 page tables is protected.
>
> But we don't know that the stage-1 page table is protected until the
> hardware-based lookup fails. I can turn your argument around and say
> that doing the software-based lookup is pointless and a waste of time
> when the stage-1 table is not protected. Which is, by the way, what I
> would expect to see in most cases.

I agree with Tamas: I also believe that in most cases the stage-1
translation table won't be protected. So, in my opinion, falling back to
software (which will presumably be a rare operation) is the better
approach.

Cheers,
~Sergej
Julien Grall May 9, 2017, 8:09 a.m. UTC | #10
On 05/09/2017 08:17 AM, Sergej Proskurin wrote:
> Hi,
>
>>> What you currently do is try gva_to_ipa and, if it does not work,
>>> you call p2m_gva_to_ipa. This sounds a bit pointless to me and a
>>> waste of time if the underlying memory of the stage-1 page tables is
>>> protected.
>>
>> But we don't know that the stage-1 page table is protected until the
>> hardware-based lookup fails. I can turn your argument around and say
>> that doing the software-based lookup is pointless and a waste of time
>> when the stage-1 table is not protected. Which is, by the way, what I
>> would expect to see in most cases.
>
> I agree with Tamas: I also believe that in most cases the stage-1
> translation table won't be protected. So, in my opinion, falling back
> to software (which will presumably be a rare operation) is the better
> approach.

Well, you both consider that it is better to do the fallback by assuming 
the TLBs will always cache the intermediate translations (e.g VA -> IPA) 
and not only the full S1 -> S2 translation (e.g VA -> PA).

Cheers,
Tamas K Lengyel May 9, 2017, 4:04 p.m. UTC | #11
On Tue, May 9, 2017 at 2:09 AM, Julien Grall <julien.grall@arm.com> wrote:
>
>
> On 05/09/2017 08:17 AM, Sergej Proskurin wrote:
>>
>> Hi,
>>
>> [...]
>>
>> I agree with Tamas: I also believe that in most cases the stage-1
>> translation table won't be protected. So, in my opinion, falling back
>> to software (which will presumably be a rare operation) is the better
>> approach.
>
>
> Well, you both consider that it is better to do the fallback by assuming the
> TLBs will always cache the intermediate translations (e.g VA -> IPA) and not
> only the full S1 -> S2 translation (e.g VA -> PA).

No, I was just pointing out that if the TLB has it cached it is
guaranteed to be faster. Whether that is the case is probably use-case
dependent. But even if the TLB is not loaded, I would assume the
hardware lookup is at least as fast as the software-based one, but I
would bet it is faster. Without numbers this is just theoretical, but I
would be very surprised if the hardware lookup would ever be slower
than the software-based one... And since this is a corner case that
most applications will never encounter, forcing them all to use a
slower approach doesn't sound right.

Tamas
Julien Grall May 9, 2017, 4:22 p.m. UTC | #12
On 09/05/17 17:04, Tamas K Lengyel wrote:
> On Tue, May 9, 2017 at 2:09 AM, Julien Grall <julien.grall@arm.com> wrote:
>> [...]
>
> No, I was just pointing out that if the TLB has it cached it is
> guaranteed to be faster. Whether that is the case is probably use-case
> dependent. But even if the TLB is not loaded, I would assume the
> hardware lookup is at least as fast as the software-based one, but I
> would bet it is faster. Without numbers this is just theoretical, but I
> would be very surprised if the hardware lookup would ever be slower
> than the software-based one... And since this is a corner case that
> most applications will never encounter, forcing them all to use a
> slower approach doesn't sound right.

What you miss in my point is that TLBs can be designed to never cache 
intermediate translations. It is not even about whether the TLBs are 
loaded or not.

I am struggling to understand why this would be a corner case, as it 
would be valid to protect the page tables in order to inspect what the 
guest is doing...

I am not saying translation would be slower in hardware than in 
software. I am just saying that maybe it would not be a huge performance 
hit to always walk in software, rather than falling back and taking a 
worse hit from time to time (or often?).

Also, if you use altp2m (when it gets implemented), you would likely 
need to either do some context switching to use the right p2m or do a 
software lookup.

Cheers,
Tamas K Lengyel May 9, 2017, 4:45 p.m. UTC | #13
On Tue, May 9, 2017 at 10:22 AM, Julien Grall <julien.grall@arm.com> wrote:
>
> [...]
>
> What you miss in my point is that TLBs can be designed to never cache
> intermediate translations. It is not even about whether the TLBs are
> loaded or not.
>
> I am struggling to understand why this would be a corner case, as it
> would be valid to protect the page tables in order to inspect what the
> guest is doing...

It is valid, yes, but nothing requires the user to block read access
to that specific page. Most applications only monitor a handful of
pages at a time. Applications may not even use read-access
restrictions at all (doing only write or execute tracing). If we do
software-based lookups all the time we penalize all of those
applications when it is really only a very small subset of cases where
this will be needed.

>
> I am not saying translation would be slower in hardware than in
> software. I am just saying that maybe it would not be a huge
> performance hit to always walk in software, rather than falling back
> and taking a worse hit from time to time (or often?).

It probably wouldn't be a big hit, but why take any hit if not necessary?

Tamas
Julien Grall May 9, 2017, 6:55 p.m. UTC | #14
On 09/05/2017 17:45, Tamas K Lengyel wrote:
> On Tue, May 9, 2017 at 10:22 AM, Julien Grall <julien.grall@arm.com> wrote:
>> [...]
>
> It probably wouldn't be a big hit, but why take any hit if not necessary?

You still miss my point... Anyway, it is nothing something I really care 
for now. So I am not going to argue.

Cheers,
Julien Grall May 9, 2017, 6:56 p.m. UTC | #15
On 09/05/2017 19:55, Julien Grall wrote:
>
>> [...]
>>
>> It probably wouldn't be a big hit, but why take any hit if not necessary?
>
> You still miss my point... Anyway, it is nothing something I really care

* it is not something

> for now. So I am not going to argue.
>
> Cheers,
>
diff mbox

Patch

diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
index 04b1506b00..352eb6073f 100644
--- a/xen/arch/arm/mem_access.c
+++ b/xen/arch/arm/mem_access.c
@@ -20,6 +20,7 @@ 
 #include <xen/monitor.h>
 #include <xen/sched.h>
 #include <xen/vm_event.h>
+#include <xen/domain_page.h>
 #include <public/vm_event.h>
 #include <asm/event.h>
 
@@ -90,6 +91,129 @@  static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
     return 0;
 }
 
+static int
+p2m_gva_to_ipa(struct p2m_domain *p2m, vaddr_t gva,
+               paddr_t *ipa, unsigned int flags)
+{
+    int level=0, t0_sz, t1_sz;
+    unsigned long t0_max, t1_min;
+    lpae_t pte, *table;
+    mfn_t root_mfn;
+    uint64_t ttbr;
+    uint32_t sctlr = READ_SYSREG(SCTLR_EL1);
+    register_t ttbcr = READ_SYSREG(TCR_EL1);
+    struct domain *d = p2m->domain;
+
+    const unsigned int offsets[4] = {
+#ifdef CONFIG_ARM_64
+        zeroeth_table_offset(gva),
+#endif
+        first_table_offset(gva),
+        second_table_offset(gva),
+        third_table_offset(gva)
+    };
+
+    const paddr_t masks[4] = {
+#ifdef CONFIG_ARM_64
+        ZEROETH_SIZE - 1,
+#endif
+        FIRST_SIZE - 1,
+        SECOND_SIZE - 1,
+        THIRD_SIZE - 1
+    };
+
+    /* If the MMU is disabled, there is no need to translate the gva. */
+    if ( !(sctlr & SCTLR_M) )
+    {
+        *ipa = gva;
+
+        return 0;
+    }
+
+    if ( is_32bit_domain(d) )
+    {
+        /*
+         * XXX: We do not support 32-bit domain translation table walks for
+         * domains using the short-descriptor translation table format, yet.
+         */
+        if ( !(ttbcr & TTBCR_EAE) )
+            return -EFAULT;
+
+#ifdef CONFIG_ARM_64
+        level = 1;
+#endif
+    }
+
+#ifdef CONFIG_ARM_64
+    if ( is_64bit_domain(d) )
+    {
+        /* Get the max GVA that can be translated by TTBR0. */
+        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
+        t0_max = (1UL << (64 - t0_sz)) - 1;
+
+        /* Get the min GVA that can be translated by TTBR1. */
+        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
+        t1_min = ~0UL - (1UL << (64 - t1_sz)) + 1;
+    }
+    else
+#endif
+    {
+        /* Get the max GVA that can be translated by TTBR0. */
+        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TTBCR_SZ_MASK;
+        t0_max = (1U << (32 - t0_sz)) - 1;
+
+        /* Get the min GVA that can be translated by TTBR1. */
+        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TTBCR_SZ_MASK;
+        t1_min = ~0U - (1U << (32 - t1_sz)) + 1;
+    }
+
+    if ( t0_max >= gva )
+        /* Use TTBR0 for GVA to IPA translation. */
+        ttbr = READ_SYSREG64(TTBR0_EL1);
+    else if ( t1_min <= gva )
+        /* Use TTBR1 for GVA to IPA translation. */
+        ttbr = READ_SYSREG64(TTBR1_EL1);
+    else
+        /* GVA out of bounds of TTBR(0|1). */
+        return -EFAULT;
+
+    /* Bits [63..48] might be used by an ASID. */
+    root_mfn = p2m_lookup(d, _gfn(paddr_to_pfn(ttbr & ((1ULL<<48)-1))), NULL);
+
+    /* Check, whether TTBR holds a valid address. */
+    if ( mfn_eq(root_mfn, INVALID_MFN) )
+        return -EFAULT;
+
+    table = map_domain_page(root_mfn);
+
+    for ( ; ; level++ )
+    {
+        pte = table[offsets[level]];
+
+        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
+            break;
+
+        unmap_domain_page(table);
+
+        root_mfn = p2m_lookup(d, _gfn(pte.walk.base), NULL);
+        table = map_domain_page(root_mfn);
+    }
+
+    unmap_domain_page(table);
+
+    if ( !pte.walk.valid )
+        return -EFAULT;
+
+    /* Make sure the entry holds the requested access attributes. */
+    if ( ((flags & GV2M_WRITE) == GV2M_WRITE) && pte.pt.ro )
+        return -EFAULT;
+
+    *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level]);
+
+    return 0;
+}
+
+
 /*
  * If mem_access is in use it might have been the reason why get_page_from_gva
  * failed to fetch the page, as it uses the MMU for the permission checking.
@@ -109,9 +233,23 @@  p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
     struct page_info *page = NULL;
     struct p2m_domain *p2m = &v->domain->arch.p2m;
 
+    ASSERT(p2m->mem_access_enabled);
+
     rc = gva_to_ipa(gva, &ipa, flag);
+
+    /*
+     * In case mem_access is active, hardware-based gva_to_ipa translation
+     * might fail. Since gva_to_ipa uses the guest's translation tables, access
+     * to which might be restricted by the active VTTBR, we perform a gva to
+     * ipa translation in software.
+     */
     if ( rc < 0 )
-        goto err;
+        if ( p2m_gva_to_ipa(p2m, gva, &ipa, flag) < 0 )
+            /*
+             * The software gva to ipa translation can still fail, if the the
+             * gva is not mapped or does not hold the requested access rights.
+             */
+            goto err;
 
     gfn = _gfn(paddr_to_pfn(ipa));