[3/3] AMD/IOMMU: replace a few literal numbers

Message ID 056a856a-147e-612b-d476-50be80406581@suse.com
State New, archived
Series AMD IOMMU: misc small adjustments

Commit Message

Jan Beulich Feb. 5, 2020, 9:43 a.m. UTC
Introduce IOMMU_PDE_NEXT_LEVEL_{MIN,MAX} to replace literal 1, 6, and 7
instances. While doing so replace two uses of memset() by initializers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
TBD: We should really honor the hats field of union
     amd_iommu_ext_features, but the specification (or at least the
     parts I did look at in the course of putting together this patch)
     is unclear about the maximum valid value in case EFRSup is clear.

Comments

Andrew Cooper Feb. 10, 2020, 2:28 p.m. UTC | #1
On 05/02/2020 09:43, Jan Beulich wrote:
> Introduce IOMMU_PDE_NEXT_LEVEL_{MIN,MAX} to replace literal 1, 6, and 7
> instances. While doing so replace two uses of memset() by initializers.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

This does not look to be an improvement.  IOMMU_PDE_NEXT_LEVEL_MIN is
definitely bogus, and in all cases, a literal 1 is better, because that
is how we describe pagetable levels.

Something to replace literal 6/7 probably is ok, but doesn't want to be
done like this.

The majority of the problems here are caused by iommu_pde_from_dfn()'s
silly ABI.  The pt_mfn[] array is problematic (because it is used as a
1-based array, not 0-based) and useless because both callers only want
the 4k-equivalent mfn.  Fixing the ABI gets rid of quite a lot of wasted
stack space, every use of '1', and every upper bound other than the
BUG_ON() and amd_iommu_get_paging_mode().

> ---
> TBD: We should really honor the hats field of union
>      amd_iommu_ext_features, but the specification (or at least the
>      parts I did look at in the course of putting together this patch)
>      is unclear about the maximum valid value in case EFRSup is clear.

It is available from PCI config space (Misc0 register, cap+0x10) even on
first gen IOMMUs, and the IVRS table in Type 10.

I'm honestly not sure why the information was duplicated into EFR, other
than perhaps for providing the information in a more useful format.

~Andrew
Jan Beulich Feb. 17, 2020, 1:09 p.m. UTC | #2
On 10.02.2020 15:28, Andrew Cooper wrote:
> On 05/02/2020 09:43, Jan Beulich wrote:
>> Introduce IOMMU_PDE_NEXT_LEVEL_{MIN,MAX} to replace literal 1, 6, and 7
>> instances. While doing so replace two uses of memset() by initializers.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> This does not look to be an improvement.  IOMMU_PDE_NEXT_LEVEL_MIN is
> definitely bogus, and in all cases, a literal 1 is better, because that
> is how we describe pagetable levels.

I disagree. The device table entry's mode field is bounded by 1
(min) and 6 (max) for the legitimate values to put there.
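
For reference, a sketch of the mode encodings as I read the spec (the
constant names here are made up for illustration, not Xen's):

/* DTE "Mode" field (bits 11:9 of the first quadword): */
#define DTE_MODE_DISABLED   0   /* translation disabled */
#define DTE_MODE_MIN        1   /* shallowest valid page table: 1 level */
#define DTE_MODE_MAX        6   /* deepest valid page table: 6 levels */
/* 7 is reserved */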

> Something to replace literal 6/7 probably is ok, but doesn't want to be
> done like this.
> 
> The majority of the problems here are caused by iommu_pde_from_dfn()'s
> silly ABI.  The pt_mfn[] array is problematic (because it is used as a
> 1-based array, not 0-based) and useless because both callers only want
> the 4k-equivalent mfn.  Fixing the ABI gets rid of quite a lot of wasted
> stack space, every use of '1', and every upper bound other than the
> BUG_ON() and amd_iommu_get_paging_mode().

I didn't mean to alter that function's behavior, at the very least
not until being certain there wasn't a reason it was coded with this
array approach. IOW the alternative to going with this patch
(subject to corrections of course) is for me to drop it altogether,
keeping the hard-coded numbers in place. Just let me know.

>> ---
>> TBD: We should really honor the hats field of union
>>      amd_iommu_ext_features, but the specification (or at least the
>>      parts I did look at in the course of putting together this patch)
>>      is unclear about the maximum valid value in case EFRSup is clear.
> 
> It is available from PCI config space (Misc0 register, cap+0x10) even on
> first gen IOMMUs,

I don't think any of the address size fields there matches what
HATS is about (limiting the values valid to put in a DTE's
mode field). In fact I'm having some difficulty bringing the
two into (sensible) sync.

> and the IVRS table in Type 10.

Which may in turn be absent, i.e. the question of what to use as
a default merely gets shifted.

Jan
Andrew Cooper Feb. 17, 2020, 7:06 p.m. UTC | #3
On 17/02/2020 13:09, Jan Beulich wrote:
> On 10.02.2020 15:28, Andrew Cooper wrote:
>> On 05/02/2020 09:43, Jan Beulich wrote:
>>> Introduce IOMMU_PDE_NEXT_LEVEL_{MIN,MAX} to replace literal 1, 6, and 7
>>> instances. While doing so replace two uses of memset() by initializers.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> This does not look to be an improvement.  IOMMU_PDE_NEXT_LEVEL_MIN is
>> definitely bogus, and in all cases, a literal 1 is better, because that
>> is how we describe pagetable levels.
> I disagree.

A pagetable walking function which does:

while ( level > 1 )
{
    ...
    level--;
}

is far clearer and easier to follow than hiding 1 behind a constant
which isn't obviously 1.  Something like LEVEL_4K would at least be
something that makes sense in context, but a literal 1 is less verbose.
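
E.g. (sketch only; LEVEL_4K is a made-up name, not an existing constant):

#define LEVEL_4K 1   /* the leaf (4k) pagetable level */

while ( level > LEVEL_4K )
{
    ...
    level--;
}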

>  The device table entry's mode field is bounded by 1
> (min) and 6 (max) for the legitimate values to put there.

If by 1, you mean 0, then yes.  Coping properly with a mode of 0 looks
to be easier than putting in an arbitrary restriction.

OTOH, if you intended to restrict to just values we expect to find in a Xen
setup, then the answers are 3 and 4 only.  (The "correctness" of this
function depends on only running on Xen-written tables.  It doesn't
actually read the next-level field out of the PTE, and assumes that it
is a standard pagetable hierarchy.  Things will go wrong if it
encounters a superpage, or a next-level-7 entry.)
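
For context, a summary of the next_level encodings per the spec
(illustrative comment, not code from the patch):

/*
 * amd_iommu_pte.next_level:
 *   0    - leaf entry: mfn names the translated page itself
 *   1..6 - mfn names a next-lower-level page table
 *   7    - leaf entry with a variable page size
 * The walker assumes only 1..6, i.e. a standard hierarchy.
 */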

>
>> Something to replace literal 6/7 probably is ok, but doesn't want to be
>> done like this.
>>
>> The majority of the problems here are caused by iommu_pde_from_dfn()'s
>> silly ABI.  The pt_mfn[] array is problematic (because it is used as a
>> 1-based array, not 0-based) and useless because both callers only want
>> the 4k-equivalent mfn.  Fixing the ABI gets rid of quite a lot of wasted
>> stack space, every use of '1', and every upper bound other than the
>> BUG_ON() and amd_iommu_get_paging_mode().
> I didn't mean to alter that function's behavior, at the very least
> not until being certain there wasn't a reason it was coded with this
> array approach. IOW the alternative to going with this patch
> (subject to corrections of course) is for me to drop it altogether,
> keeping the hard-coded numbers in place. Just let me know.

If you don't want to change the API, then I'll put it on my todo list.

As previously expressed, this patch on its own is not an improvement IMO.

>>> ---
>>> TBD: We should really honor the hats field of union
>>>      amd_iommu_ext_features, but the specification (or at least the
>>>      parts I did look at in the course of putting together this patch)
>>>      is unclear about the maximum valid value in case EFRSup is clear.
>> It is available from PCI config space (Misc0 register, cap+0x10) even on
>> first gen IOMMUs,
> I don't think any of the address size fields there matches what
> HATS is about (limiting the values valid to put in a DTE's
> mode field). In fact I'm having some difficulty bringing the
> two into (sensible) sync.

It will confirm whether 4 levels are available or not, but TBH, we know
that anyway by virtue of being 64-bit.

Higher levels really don't matter because we don't support using them.
Were we to support using them (and I do have one use case in mind), it
would be entirely reasonable to restrict usage to systems which had EFR.

>
>> and the IVRS table in Type 10.
> Which may in turn be absent, i.e. the question of what to use as
> a default merely gets shifted.

One of Type 10 or 11 is mandatory for each IOMMU in the system.  One way
or another, the information is present.

~Andrew
Jan Beulich Feb. 18, 2020, 7:52 a.m. UTC | #4
On 17.02.2020 20:06, Andrew Cooper wrote:
> On 17/02/2020 13:09, Jan Beulich wrote:
>> On 10.02.2020 15:28, Andrew Cooper wrote:
>>> On 05/02/2020 09:43, Jan Beulich wrote:
>>>> Introduce IOMMU_PDE_NEXT_LEVEL_{MIN,MAX} to replace literal 1, 6, and 7
>>>> instances. While doing so replace two uses of memset() by initializers.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> This does not look to be an improvement.  IOMMU_PDE_NEXT_LEVEL_MIN is
>>> definitely bogus, and in all cases, a literal 1 is better, because that
>>> is how we describe pagetable levels.
>> I disagree.
> 
> A pagetable walking function which does:
> 
> while ( level > 1 )
> {
>     ...
>     level--;
> }
> 
> is far clearer and easier to follow than hiding 1 behind a constant
> which isn't obviously 1.  Something like LEVEL_4K would at least be
> something that makes sense in context, but a literal 1 is less verbose.
> 
>>  The device table entry's mode field is bounded by 1
>> (min) and 6 (max) for the legitimate values to put there.
> 
> If by 1, you mean 0, then yes.

I don't, no. A value of zero means "translation disabled".

>  Coping properly with a mode of 0 looks
> to be easier than putting in an arbitrary restriction.

Coping with this mode is entirely orthogonal imo.

>>> Something to replace literal 6/7 probably is ok, but doesn't want to be
>>> done like this.
>>>
>>> The majority of the problems here are caused by iommu_pde_from_dfn()'s
>>> silly ABI.  The pt_mfn[] array is problematic (because it is used as a
>>> 1-based array, not 0-based) and useless because both callers only want
>>> the 4k-equivalent mfn.  Fixing the ABI gets rid of quite a lot of wasted
>>> stack space, every use of '1', and every upper bound other than the
>>> BUG_ON() and amd_iommu_get_paging_mode().
>> I didn't mean to alter that function's behavior, at the very least
>> not until being certain there wasn't a reason it was coded with this
>> array approach. IOW the alternative to going with this patch
>> (subject to corrections of course) is for me to drop it altogether,
>> keeping the hard-coded numbers in place. Just let me know.
> 
> If you don't want to change the API, then I'll put it on my todo list.
> 
> As previously expressed, this patch on its own is not an improvement IMO.

We disagree here, quite obviously, but well, we'll have to live
with the literal numbers then. I'll drop the patch.

>>> and the IVRS table in Type 10.
>> Which may in turn be absent, i.e. the question of what to use as
>> a default merely gets shifted.
> 
> One of Type 10 or 11 is mandatory for each IOMMU in the system.  One way
> or another, the information is present.

Even for type 10 the description for the field says "If IVinfo[EFRSup] = 0,
this field is Reserved."

Jan

Patch

--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -187,7 +187,8 @@  static int iommu_pde_from_dfn(struct dom
     table = hd->arch.root_table;
     level = hd->arch.paging_mode;
 
-    BUG_ON( table == NULL || level < 1 || level > 6 );
+    BUG_ON(!table || level < IOMMU_PDE_NEXT_LEVEL_MIN ||
+           level > IOMMU_PDE_NEXT_LEVEL_MAX);
 
     /*
      * A frame number past what the current page tables can represent can't
@@ -198,7 +199,7 @@  static int iommu_pde_from_dfn(struct dom
 
     next_table_mfn = mfn_x(page_to_mfn(table));
 
-    while ( level > 1 )
+    while ( level > IOMMU_PDE_NEXT_LEVEL_MIN )
     {
         unsigned int next_level = level - 1;
         pt_mfn[level] = next_table_mfn;
@@ -274,7 +275,7 @@  static int iommu_pde_from_dfn(struct dom
         level--;
     }
 
-    /* mfn of level 1 page table */
+    /* mfn of IOMMU_PDE_NEXT_LEVEL_MIN page table */
     pt_mfn[level] = next_table_mfn;
     return 0;
 }
@@ -284,9 +285,7 @@  int amd_iommu_map_page(struct domain *d,
 {
     struct domain_iommu *hd = dom_iommu(d);
     int rc;
-    unsigned long pt_mfn[7];
-
-    memset(pt_mfn, 0, sizeof(pt_mfn));
+    unsigned long pt_mfn[IOMMU_PDE_NEXT_LEVEL_MAX + 1] = {};
 
     spin_lock(&hd->arch.mapping_lock);
 
@@ -300,7 +299,8 @@  int amd_iommu_map_page(struct domain *d,
         return rc;
     }
 
-    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn, true) || (pt_mfn[1] == 0) )
+    if ( iommu_pde_from_dfn(d, dfn_x(dfn), pt_mfn, true) ||
+         !pt_mfn[IOMMU_PDE_NEXT_LEVEL_MIN] )
     {
         spin_unlock(&hd->arch.mapping_lock);
         AMD_IOMMU_DEBUG("Invalid IO pagetable entry dfn = %"PRI_dfn"\n",
@@ -310,9 +310,11 @@  int amd_iommu_map_page(struct domain *d,
     }
 
     /* Install 4k mapping */
-    *flush_flags |= set_iommu_pte_present(pt_mfn[1], dfn_x(dfn), mfn_x(mfn),
-                                          1, (flags & IOMMUF_writable),
-                                          (flags & IOMMUF_readable));
+    *flush_flags |= set_iommu_pte_present(pt_mfn[IOMMU_PDE_NEXT_LEVEL_MIN],
+                                          dfn_x(dfn), mfn_x(mfn),
+                                          IOMMU_PDE_NEXT_LEVEL_MIN,
+                                          flags & IOMMUF_writable,
+                                          flags & IOMMUF_readable);
 
     spin_unlock(&hd->arch.mapping_lock);
 
@@ -322,11 +324,9 @@  int amd_iommu_map_page(struct domain *d,
 int amd_iommu_unmap_page(struct domain *d, dfn_t dfn,
                          unsigned int *flush_flags)
 {
-    unsigned long pt_mfn[7];
+    unsigned long pt_mfn[IOMMU_PDE_NEXT_LEVEL_MAX + 1] = {};
     struct domain_iommu *hd = dom_iommu(d);
 
-    memset(pt_mfn, 0, sizeof(pt_mfn));
-
     spin_lock(&hd->arch.mapping_lock);
 
     if ( !hd->arch.root_table )
@@ -344,10 +344,12 @@  int amd_iommu_unmap_page(struct domain *
         return -EFAULT;
     }
 
-    if ( pt_mfn[1] )
+    if ( pt_mfn[IOMMU_PDE_NEXT_LEVEL_MIN] )
     {
         /* Mark PTE as 'page not present'. */
-        *flush_flags |= clear_iommu_pte_present(pt_mfn[1], dfn_x(dfn));
+        *flush_flags |=
+            clear_iommu_pte_present(pt_mfn[IOMMU_PDE_NEXT_LEVEL_MIN],
+                                    dfn_x(dfn));
     }
 
     spin_unlock(&hd->arch.mapping_lock);
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -233,14 +233,14 @@  static int __must_check allocate_domain_
 
 int amd_iommu_get_paging_mode(unsigned long entries)
 {
-    int level = 1;
+    int level = IOMMU_PDE_NEXT_LEVEL_MIN;
 
     BUG_ON( !entries );
 
     while ( entries > PTE_PER_TABLE_SIZE )
     {
         entries = PTE_PER_TABLE_ALIGN(entries) >> PTE_PER_TABLE_SHIFT;
-        if ( ++level > 6 )
+        if ( ++level > IOMMU_PDE_NEXT_LEVEL_MAX )
             return -ENOMEM;
     }
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -465,6 +465,9 @@  union amd_iommu_x2apic_control {
 #define IOMMU_PAGE_TABLE_U32_PER_ENTRY	(IOMMU_PAGE_TABLE_ENTRY_SIZE / 4)
 #define IOMMU_PAGE_TABLE_ALIGNMENT	4096
 
+#define IOMMU_PDE_NEXT_LEVEL_MIN	1
+#define IOMMU_PDE_NEXT_LEVEL_MAX	6
+
 struct amd_iommu_pte {
     uint64_t pr:1;
     uint64_t ignored0:4;