diff mbox series

[1/9] AMD/IOMMU: redo awaiting of command completion

Message ID 1bb49cce-537e-9fba-80bb-82bf502728d0@suse.com (mailing list archive)
State New, archived
Headers show
Series IOMMU: XSA-373 follow-on | expand

Commit Message

Jan Beulich June 9, 2021, 9:26 a.m. UTC
The present abuse of the completion interrupt does not only stand in the
way of, down the road, using it for its actual purpose, but also
requires holding the IOMMU lock while waiting for command completion,
limiting parallelism and keeping interrupts off for non-negligible
periods of time. Have the IOMMU do an ordinary memory write instead of
signaling an otherwise disabled interrupt (by just updating a status
register bit).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

Comments

Andrew Cooper June 9, 2021, 10:36 a.m. UTC | #1
On 09/06/2021 10:26, Jan Beulich wrote:
> The present abuse of the completion interrupt does not only stand in the
> way of, down the road, using it for its actual purpose, but also
> requires holding the IOMMU lock while waiting for command completion,
> limiting parallelism and keeping interrupts off for non-negligible
> periods of time. Have the IOMMU do an ordinary memory write instead of
> signaling an otherwise disabled interrupt (by just updating a status
> register bit).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Paul Durrant <paul@xen.org>

While I agree with the direction of the patch, some of the details could
do with improvement.

>
> --- a/xen/drivers/passthrough/amd/iommu_cmd.c
> +++ b/xen/drivers/passthrough/amd/iommu_cmd.c
> @@ -20,6 +20,9 @@
>  #include "iommu.h"
>  #include "../ats.h"
>  
> +#define CMD_COMPLETION_INIT 0
> +#define CMD_COMPLETION_DONE 1
> +
>  static void send_iommu_command(struct amd_iommu *iommu,
>                                 const uint32_t cmd[4])
>  {
> @@ -49,28 +52,31 @@ static void send_iommu_command(struct am
>  static void flush_command_buffer(struct amd_iommu *iommu,
>                                   unsigned int timeout_base)
>  {
> +    static DEFINE_PER_CPU(uint64_t, poll_slot);
> +    uint64_t *this_poll_slot = &this_cpu(poll_slot);
> +    paddr_t addr = virt_to_maddr(this_poll_slot);
>      uint32_t cmd[4];
>      s_time_t start, timeout;
>      static unsigned int __read_mostly threshold = 1;
>  
> -    /* RW1C 'ComWaitInt' in status register */
> -    writel(IOMMU_STATUS_COMP_WAIT_INT,
> -           iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
> -
> -    /* send an empty COMPLETION_WAIT command to flush command buffer */
> -    cmd[3] = cmd[2] = 0;
> -    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, 0,
> +    ACCESS_ONCE(*this_poll_slot) = CMD_COMPLETION_INIT;
> +
> +    /* send a COMPLETION_WAIT command to flush command buffer */
> +    cmd[0] = addr;
> +    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, cmd[0],
> +                         IOMMU_COMP_WAIT_S_FLAG_MASK,
> +                         IOMMU_COMP_WAIT_S_FLAG_SHIFT, &cmd[0]);

set_field_in_reg_u32() is a disaster of a function - both in terms of
semantics, and code gen - and needs to be purged from the code.

It is a shame we don't have a real struct for objects in the command
buffer, but in lieu of that, this is just

    cmd[0] = addr | IOMMU_COMP_WAIT_S_FLAG_MASK;

which is the direction that previous cleanup has gone.

There are no current users of IOMMU_COMP_WAIT_S_FLAG_SHIFT, and ...

> +    cmd[1] = addr >> 32;
> +    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, cmd[1],
>                           IOMMU_CMD_OPCODE_MASK,
>                           IOMMU_CMD_OPCODE_SHIFT, &cmd[1]);
> -    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
> -                         IOMMU_COMP_WAIT_I_FLAG_MASK,
> -                         IOMMU_COMP_WAIT_I_FLAG_SHIFT, &cmd[0]);

... this drops the final use of IOMMU_COMP_WAIT_I_FLAG_SHIFT, so both
should be dropped.

As for IOMMU_CMD_OPCODE_SHIFT, that can't be dropped yet, but it would
still be better to use

    cmd[1] = (addr >> 32) | MASK_INSR(IOMMU_CMD_COMPLETION_WAIT,
IOMMU_CMD_COMPLETION_WAIT);

in the short term.

~Andrew

P.S. an observation of cmd[1] means that AMD IOMMUs don't actually work
for a physaddr width of >52, despite some support along these lines
elsewhere in the spec.
Jan Beulich June 9, 2021, 12:08 p.m. UTC | #2
On 09.06.2021 12:36, Andrew Cooper wrote:
> On 09/06/2021 10:26, Jan Beulich wrote:
>> @@ -49,28 +52,31 @@ static void send_iommu_command(struct am
>>  static void flush_command_buffer(struct amd_iommu *iommu,
>>                                   unsigned int timeout_base)
>>  {
>> +    static DEFINE_PER_CPU(uint64_t, poll_slot);
>> +    uint64_t *this_poll_slot = &this_cpu(poll_slot);
>> +    paddr_t addr = virt_to_maddr(this_poll_slot);
>>      uint32_t cmd[4];
>>      s_time_t start, timeout;
>>      static unsigned int __read_mostly threshold = 1;
>>  
>> -    /* RW1C 'ComWaitInt' in status register */
>> -    writel(IOMMU_STATUS_COMP_WAIT_INT,
>> -           iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
>> -
>> -    /* send an empty COMPLETION_WAIT command to flush command buffer */
>> -    cmd[3] = cmd[2] = 0;
>> -    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, 0,
>> +    ACCESS_ONCE(*this_poll_slot) = CMD_COMPLETION_INIT;
>> +
>> +    /* send a COMPLETION_WAIT command to flush command buffer */
>> +    cmd[0] = addr;
>> +    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, cmd[0],
>> +                         IOMMU_COMP_WAIT_S_FLAG_MASK,
>> +                         IOMMU_COMP_WAIT_S_FLAG_SHIFT, &cmd[0]);
> 
> set_field_in_reg_u32() is a disaster of a function - both in terms of
> semantics, and code gen - and needs to be purged from the code.

Long ago I had an item on my todo list to get this cleaned up. But
it never really having made it up high enough, I dropped it at
some point, in the hope that we'd manage to get this sorted while
re-writing code step by step.

> It is a shame we don't have a real struct for objects in the command
> buffer, but in lieu of that, this is just
> 
>     cmd[0] = addr | IOMMU_COMP_WAIT_S_FLAG_MASK;
> 
> which is the direction that previous cleanup has gone.

I don't think I can spot a single instance of such. Some work was
done to introduce (mainly bitfield) structs, but this surely goes
too far for the change at hand. I can spot two instances using
MASK_INSR(), so I can see two consistent ways of doing what you
ask for:

    cmd[0] = addr | MASK_INSR(IOMMU_CONTROL_ENABLED,
                              IOMMU_COMP_WAIT_S_FLAG_MASK);

keeping the name as *_MASK (and I'd be open to replace
IOMMU_CONTROL_ENABLED by true) or

    cmd[0] = addr | IOMMU_COMP_WAIT_S_FLAG;

i.e. dropping _MASK (but requiring adjustments elsewhere in the
code). Please let me know which one you'd prefer.

> There are no current users of IOMMU_COMP_WAIT_S_FLAG_SHIFT, and ...
> 
>> +    cmd[1] = addr >> 32;
>> +    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, cmd[1],
>>                           IOMMU_CMD_OPCODE_MASK,
>>                           IOMMU_CMD_OPCODE_SHIFT, &cmd[1]);
>> -    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
>> -                         IOMMU_COMP_WAIT_I_FLAG_MASK,
>> -                         IOMMU_COMP_WAIT_I_FLAG_SHIFT, &cmd[0]);
> 
> ... this drops the final use of IOMMU_COMP_WAIT_I_FLAG_SHIFT, so both
> should be dropped.

Well, I can surely do so, but like this entire request of yours this
feels like scope creep - there was no intention here to do any
unrelated cleanup. And if I remove _S_ and _I_, then surely _F_
wants dropping as well, while IOMMU_COMP_WAIT_ADDR_*_SHIFT have a
use each in iommu_guest.c and hence need to stay for now.

> As for IOMMU_CMD_OPCODE_SHIFT, that can't be dropped yet, but it would
> still be better to use
> 
>     cmd[1] = (addr >> 32) | MASK_INSR(IOMMU_CMD_COMPLETION_WAIT,
> IOMMU_CMD_COMPLETION_WAIT);
> 
> in the short term.

Can do (using IOMMU_CMD_OPCODE_MASK).

Jan
Andrew Cooper June 9, 2021, 12:33 p.m. UTC | #3
On 09/06/2021 13:08, Jan Beulich wrote:
> On 09.06.2021 12:36, Andrew Cooper wrote:
>> On 09/06/2021 10:26, Jan Beulich wrote:
>>> @@ -49,28 +52,31 @@ static void send_iommu_command(struct am
>>>  static void flush_command_buffer(struct amd_iommu *iommu,
>>>                                   unsigned int timeout_base)
>>>  {
>>> +    static DEFINE_PER_CPU(uint64_t, poll_slot);
>>> +    uint64_t *this_poll_slot = &this_cpu(poll_slot);
>>> +    paddr_t addr = virt_to_maddr(this_poll_slot);
>>>      uint32_t cmd[4];
>>>      s_time_t start, timeout;
>>>      static unsigned int __read_mostly threshold = 1;
>>>  
>>> -    /* RW1C 'ComWaitInt' in status register */
>>> -    writel(IOMMU_STATUS_COMP_WAIT_INT,
>>> -           iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
>>> -
>>> -    /* send an empty COMPLETION_WAIT command to flush command buffer */
>>> -    cmd[3] = cmd[2] = 0;
>>> -    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, 0,
>>> +    ACCESS_ONCE(*this_poll_slot) = CMD_COMPLETION_INIT;
>>> +
>>> +    /* send a COMPLETION_WAIT command to flush command buffer */
>>> +    cmd[0] = addr;
>>> +    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, cmd[0],
>>> +                         IOMMU_COMP_WAIT_S_FLAG_MASK,
>>> +                         IOMMU_COMP_WAIT_S_FLAG_SHIFT, &cmd[0]);
>> set_field_in_reg_u32() is a disaster of a function - both in terms of
>> semantics, and code gen - and needs to be purged from the code.
> Long ago I had an item on my todo list to get this cleaned up. But
> it never really having made it up high enough, I dropped it at
> some point, in the hope that we'd manage to get this sorted while
> re-writing code step by step.
>
>> It is a shame we don't have a real struct for objects in the command
>> buffer, but in lieu of that, this is just
>>
>>     cmd[0] = addr | IOMMU_COMP_WAIT_S_FLAG_MASK;
>>
>> which is the direction that previous cleanup has gone.
> I don't think I can spot a single instance of such.

It's actually the other way around, for the emulation logic (which isn't
used in practice).

drivers/passthrough/amd/iommu_guest.c:348
    i = cmd->data[0] & IOMMU_COMP_WAIT_I_FLAG_MASK;

>  Some work was
> done to introduce (mainly bitfield) structs, but this surely goes
> too far for the change at hand. I can spot two instances using
> MASK_INSR(), so I can see two consistent ways of doing what you
> ask for:
>
>     cmd[0] = addr | MASK_INSR(IOMMU_CONTROL_ENABLED,
>                               IOMMU_COMP_WAIT_S_FLAG_MASK);
>
> keeping the name as *_MASK (and I'd be open to replace
> IOMMU_CONTROL_ENABLED by true) or
>
>     cmd[0] = addr | IOMMU_COMP_WAIT_S_FLAG;
>
> i.e. dropping _MASK (but requiring adjustments elsewhere in the
> code). Please let me know which one you'd prefer.

TBH, I'd suggest just using

    cmd[0] = addr | IOMMU_COMP_WAIT_S_FLAG_MASK;

for now.  The constant is correct - its just the name which is wonky. 
This in particular will reduce the code churn for ...

>> There are no current users of IOMMU_COMP_WAIT_S_FLAG_SHIFT, and ...
>>
>>> +    cmd[1] = addr >> 32;
>>> +    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, cmd[1],
>>>                           IOMMU_CMD_OPCODE_MASK,
>>>                           IOMMU_CMD_OPCODE_SHIFT, &cmd[1]);
>>> -    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
>>> -                         IOMMU_COMP_WAIT_I_FLAG_MASK,
>>> -                         IOMMU_COMP_WAIT_I_FLAG_SHIFT, &cmd[0]);
>> ... this drops the final use of IOMMU_COMP_WAIT_I_FLAG_SHIFT, so both
>> should be dropped.
> Well, I can surely do so, but like this entire request of yours this
> feels like scope creep - there was no intention here to do any
> unrelated cleanup. And if I remove _S_ and _I_, then surely _F_
> wants dropping as well, while IOMMU_COMP_WAIT_ADDR_*_SHIFT have a
> use each in iommu_guest.c and hence need to stay for now.

... this, which I'm perfectly happy leaving to a subsequent change. 
(I'll even do it, if you're too busy right now).

What I am mainly concerned with is not using this opportunity to remove
uses of set_field_in_reg_u32().

>
>> As for IOMMU_CMD_OPCODE_SHIFT, that can't be dropped yet, but it would
>> still be better to use
>>
>>     cmd[1] = (addr >> 32) | MASK_INSR(IOMMU_CMD_COMPLETION_WAIT,
>> IOMMU_CMD_COMPLETION_WAIT);
>>
>> in the short term.
> Can do (using IOMMU_CMD_OPCODE_MASK).

Oops yes.  That was a copy&paste mistake.

~Andrew
Jan Beulich June 9, 2021, 12:51 p.m. UTC | #4
On 09.06.2021 14:33, Andrew Cooper wrote:
> On 09/06/2021 13:08, Jan Beulich wrote:
>> On 09.06.2021 12:36, Andrew Cooper wrote:
>>> On 09/06/2021 10:26, Jan Beulich wrote:
>>>> @@ -49,28 +52,31 @@ static void send_iommu_command(struct am
>>>>  static void flush_command_buffer(struct amd_iommu *iommu,
>>>>                                   unsigned int timeout_base)
>>>>  {
>>>> +    static DEFINE_PER_CPU(uint64_t, poll_slot);
>>>> +    uint64_t *this_poll_slot = &this_cpu(poll_slot);
>>>> +    paddr_t addr = virt_to_maddr(this_poll_slot);
>>>>      uint32_t cmd[4];
>>>>      s_time_t start, timeout;
>>>>      static unsigned int __read_mostly threshold = 1;
>>>>  
>>>> -    /* RW1C 'ComWaitInt' in status register */
>>>> -    writel(IOMMU_STATUS_COMP_WAIT_INT,
>>>> -           iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
>>>> -
>>>> -    /* send an empty COMPLETION_WAIT command to flush command buffer */
>>>> -    cmd[3] = cmd[2] = 0;
>>>> -    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, 0,
>>>> +    ACCESS_ONCE(*this_poll_slot) = CMD_COMPLETION_INIT;
>>>> +
>>>> +    /* send a COMPLETION_WAIT command to flush command buffer */
>>>> +    cmd[0] = addr;
>>>> +    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, cmd[0],
>>>> +                         IOMMU_COMP_WAIT_S_FLAG_MASK,
>>>> +                         IOMMU_COMP_WAIT_S_FLAG_SHIFT, &cmd[0]);
>>> set_field_in_reg_u32() is a disaster of a function - both in terms of
>>> semantics, and code gen - and needs to be purged from the code.
>> Long ago I had an item on my todo list to get this cleaned up. But
>> it never really having made it up high enough, I dropped it at
>> some point, in the hope that we'd manage to get this sorted while
>> re-writing code step by step.
>>
>>> It is a shame we don't have a real struct for objects in the command
>>> buffer, but in lieu of that, this is just
>>>
>>>     cmd[0] = addr | IOMMU_COMP_WAIT_S_FLAG_MASK;
>>>
>>> which is the direction that previous cleanup has gone.
>> I don't think I can spot a single instance of such.
> 
> It's actually the other way around, for the emulation logic (which isn't
> used in practice).
> 
> drivers/passthrough/amd/iommu_guest.c:348
>     i = cmd->data[0] & IOMMU_COMP_WAIT_I_FLAG_MASK;
> 
>>  Some work was
>> done to introduce (mainly bitfield) structs, but this surely goes
>> too far for the change at hand. I can spot two instances using
>> MASK_INSR(), so I can see two consistent ways of doing what you
>> ask for:
>>
>>     cmd[0] = addr | MASK_INSR(IOMMU_CONTROL_ENABLED,
>>                               IOMMU_COMP_WAIT_S_FLAG_MASK);
>>
>> keeping the name as *_MASK (and I'd be open to replace
>> IOMMU_CONTROL_ENABLED by true) or
>>
>>     cmd[0] = addr | IOMMU_COMP_WAIT_S_FLAG;
>>
>> i.e. dropping _MASK (but requiring adjustments elsewhere in the
>> code). Please let me know which one you'd prefer.
> 
> TBH, I'd suggest just using
> 
>     cmd[0] = addr | IOMMU_COMP_WAIT_S_FLAG_MASK;
> 
> for now.  The constant is correct - its just the name which is wonky. 

But my previous reply was to make clear that I don't agree with
ORing in a *_MASK into a (to become) live value. *_MASK should be
used exclusively for masking, not as actual field values. Any
code violating this should imo be looked at with suspicion, as to
possibly having used the wrong value altogether.

> This in particular will reduce the code churn for ...
> 
>>> There are no current users of IOMMU_COMP_WAIT_S_FLAG_SHIFT, and ...
>>>
>>>> +    cmd[1] = addr >> 32;
>>>> +    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, cmd[1],
>>>>                           IOMMU_CMD_OPCODE_MASK,
>>>>                           IOMMU_CMD_OPCODE_SHIFT, &cmd[1]);
>>>> -    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
>>>> -                         IOMMU_COMP_WAIT_I_FLAG_MASK,
>>>> -                         IOMMU_COMP_WAIT_I_FLAG_SHIFT, &cmd[0]);
>>> ... this drops the final use of IOMMU_COMP_WAIT_I_FLAG_SHIFT, so both
>>> should be dropped.
>> Well, I can surely do so, but like this entire request of yours this
>> feels like scope creep - there was no intention here to do any
>> unrelated cleanup. And if I remove _S_ and _I_, then surely _F_
>> wants dropping as well, while IOMMU_COMP_WAIT_ADDR_*_SHIFT have a
>> use each in iommu_guest.c and hence need to stay for now.
> 
> ... this, which I'm perfectly happy leaving to a subsequent change. 
> (I'll even do it, if you're too busy right now).
> 
> What I am mainly concerned with is not using this opportunity to remove
> uses of set_field_in_reg_u32().

Well, when I put the patch together I was thinking of two "proper"
options - keeping the use of set_field_in_reg_u32(), or replacing it
by (bitfield) struct uses. The latter would be a far larger change,
and should imo be one on its own (i.e. no functional change) anyway.
Hence I went the former route. Since you vehemently ask for it, I'll
go the middle route you suggest, but this only sets us up for re-
writing this piece of code another time once we introduce proper
structs.

Jan
Jan Beulich June 10, 2021, 12:24 p.m. UTC | #5
On 09.06.2021 12:36, Andrew Cooper wrote:
> On 09/06/2021 10:26, Jan Beulich wrote:
>> The present abuse of the completion interrupt does not only stand in the
>> way of, down the road, using it for its actual purpose, but also
>> requires holding the IOMMU lock while waiting for command completion,
>> limiting parallelism and keeping interrupts off for non-negligible
>> periods of time. Have the IOMMU do an ordinary memory write instead of
>> signaling an otherwise disabled interrupt (by just updating a status
>> register bit).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Reviewed-by: Paul Durrant <paul@xen.org>
> 
> While I agree with the direction of the patch, some of the details could
> do with improvement.
> 
>>
>> --- a/xen/drivers/passthrough/amd/iommu_cmd.c
>> +++ b/xen/drivers/passthrough/amd/iommu_cmd.c
>> @@ -20,6 +20,9 @@
>>  #include "iommu.h"
>>  #include "../ats.h"
>>  
>> +#define CMD_COMPLETION_INIT 0
>> +#define CMD_COMPLETION_DONE 1
>> +
>>  static void send_iommu_command(struct amd_iommu *iommu,
>>                                 const uint32_t cmd[4])
>>  {
>> @@ -49,28 +52,31 @@ static void send_iommu_command(struct am
>>  static void flush_command_buffer(struct amd_iommu *iommu,
>>                                   unsigned int timeout_base)
>>  {
>> +    static DEFINE_PER_CPU(uint64_t, poll_slot);
>> +    uint64_t *this_poll_slot = &this_cpu(poll_slot);
>> +    paddr_t addr = virt_to_maddr(this_poll_slot);
>>      uint32_t cmd[4];
>>      s_time_t start, timeout;
>>      static unsigned int __read_mostly threshold = 1;
>>  
>> -    /* RW1C 'ComWaitInt' in status register */
>> -    writel(IOMMU_STATUS_COMP_WAIT_INT,
>> -           iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
>> -
>> -    /* send an empty COMPLETION_WAIT command to flush command buffer */
>> -    cmd[3] = cmd[2] = 0;
>> -    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, 0,
>> +    ACCESS_ONCE(*this_poll_slot) = CMD_COMPLETION_INIT;
>> +
>> +    /* send a COMPLETION_WAIT command to flush command buffer */
>> +    cmd[0] = addr;
>> +    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, cmd[0],
>> +                         IOMMU_COMP_WAIT_S_FLAG_MASK,
>> +                         IOMMU_COMP_WAIT_S_FLAG_SHIFT, &cmd[0]);
> 
> set_field_in_reg_u32() is a disaster of a function - both in terms of
> semantics, and code gen - and needs to be purged from the code.
> 
> It is a shame we don't have a real struct for objects in the command
> buffer, but in lieu of that, this is just
> 
>     cmd[0] = addr | IOMMU_COMP_WAIT_S_FLAG_MASK;
> 
> which is the direction that previous cleanup has gone.
> 
> There are no current users of IOMMU_COMP_WAIT_S_FLAG_SHIFT, and ...
> 
>> +    cmd[1] = addr >> 32;
>> +    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, cmd[1],
>>                           IOMMU_CMD_OPCODE_MASK,
>>                           IOMMU_CMD_OPCODE_SHIFT, &cmd[1]);
>> -    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
>> -                         IOMMU_COMP_WAIT_I_FLAG_MASK,
>> -                         IOMMU_COMP_WAIT_I_FLAG_SHIFT, &cmd[0]);
> 
> ... this drops the final use of IOMMU_COMP_WAIT_I_FLAG_SHIFT, so both
> should be dropped.
> 
> As for IOMMU_CMD_OPCODE_SHIFT, that can't be dropped yet, but it would
> still be better to use
> 
>     cmd[1] = (addr >> 32) | MASK_INSR(IOMMU_CMD_COMPLETION_WAIT,
> IOMMU_CMD_COMPLETION_WAIT);
> 
> in the short term.

Okay, this conversion has indeed saved a single

	and	$0x0FFFFFFF, %eax

But we're down by two set_field_in_reg_u32() now; only some 30 left.

Jan
diff mbox series

Patch

--- a/xen/drivers/passthrough/amd/iommu_cmd.c
+++ b/xen/drivers/passthrough/amd/iommu_cmd.c
@@ -20,6 +20,9 @@ 
 #include "iommu.h"
 #include "../ats.h"
 
+#define CMD_COMPLETION_INIT 0
+#define CMD_COMPLETION_DONE 1
+
 static void send_iommu_command(struct amd_iommu *iommu,
                                const uint32_t cmd[4])
 {
@@ -49,28 +52,31 @@  static void send_iommu_command(struct am
 static void flush_command_buffer(struct amd_iommu *iommu,
                                  unsigned int timeout_base)
 {
+    static DEFINE_PER_CPU(uint64_t, poll_slot);
+    uint64_t *this_poll_slot = &this_cpu(poll_slot);
+    paddr_t addr = virt_to_maddr(this_poll_slot);
     uint32_t cmd[4];
     s_time_t start, timeout;
     static unsigned int __read_mostly threshold = 1;
 
-    /* RW1C 'ComWaitInt' in status register */
-    writel(IOMMU_STATUS_COMP_WAIT_INT,
-           iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
-
-    /* send an empty COMPLETION_WAIT command to flush command buffer */
-    cmd[3] = cmd[2] = 0;
-    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, 0,
+    ACCESS_ONCE(*this_poll_slot) = CMD_COMPLETION_INIT;
+
+    /* send a COMPLETION_WAIT command to flush command buffer */
+    cmd[0] = addr;
+    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, cmd[0],
+                         IOMMU_COMP_WAIT_S_FLAG_MASK,
+                         IOMMU_COMP_WAIT_S_FLAG_SHIFT, &cmd[0]);
+    cmd[1] = addr >> 32;
+    set_field_in_reg_u32(IOMMU_CMD_COMPLETION_WAIT, cmd[1],
                          IOMMU_CMD_OPCODE_MASK,
                          IOMMU_CMD_OPCODE_SHIFT, &cmd[1]);
-    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
-                         IOMMU_COMP_WAIT_I_FLAG_MASK,
-                         IOMMU_COMP_WAIT_I_FLAG_SHIFT, &cmd[0]);
+    cmd[2] = CMD_COMPLETION_DONE;
+    cmd[3] = 0;
     send_iommu_command(iommu, cmd);
 
     start = NOW();
     timeout = start + (timeout_base ?: 100) * MILLISECS(threshold);
-    while ( !(readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET) &
-              IOMMU_STATUS_COMP_WAIT_INT) )
+    while ( ACCESS_ONCE(*this_poll_slot) != CMD_COMPLETION_DONE )
     {
         if ( timeout && NOW() > timeout )
         {