[PART2,v4,07/11] iommu/amd: Introduce amd_iommu_update_ga()

Message ID 1468416032-7692-8-git-send-email-suravee.suthikulpanit@amd.com (mailing list archive)
State New, archived

Commit Message

Suthikulpanit, Suravee July 13, 2016, 1:20 p.m. UTC
From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>

Introduce a new IOMMU API, amd_iommu_update_ga(), which allows
KVM (SVM) to update existing posted-interrupt IOMMU IRTEs when
loading/unloading a vcpu.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 drivers/iommu/amd_iommu.c       | 65 +++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/amd_iommu_types.h |  1 +
 include/linux/amd-iommu.h       |  9 ++++++
 3 files changed, 75 insertions(+)

Comments

Suthikulpanit, Suravee July 14, 2016, 9:13 a.m. UTC | #1
Hi Radim,

On 7/13/16 21:14, Radim Krčmář wrote:
> [I pasted v3 reviews prefixed with a pipe where I think they still apply.]
>
> 2016-07-13 08:20-0500, Suravee Suthikulpanit:
>> From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>>
>> Introduces a new IOMMU API, amd_iommu_update_ga(), which allows
>> KVM (SVM) to update existing posted interrupt IOMMU IRTE when
>> load/unload vcpu.
>>
>> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>> ---
>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> @@ -4461,4 +4461,69 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
>> +int amd_iommu_update_ga(u32 vcpu_id, u32 cpu, u32 vm_id,
>> +			u64 base, bool is_run)
>
> |2016-07-13 15:49+0700, Suravee Suthikulpanit:
> |> On 07/12/2016 01:59 AM, Radim Krčmář wrote:
> |>> Not just in this function does the interface between svm and iommu split
> |>> ga_tag into its two components (vcpu_id and ga_tag), but it seems that
> |>> the combined value could always be used instead ...
> |>> Is there an advantage to passing two values?
> |>
> |> Here, the amd_iommu_update_ga() takes the two separate value for input
> |> parameters. Mainly the ga_tag (which is really the vm_id) and vcpu_id. This
> |> allow IOMMU driver to decide how to encode the GATAG to be programmed into
> |> the IRTE. Currently, the actual GATAG is a 16-bit value, <vm_id><vcpu_id>.
> |> This keeps the interface independent from how we encode the GATAG.
>
> I was thinking about making the IOMMU unaware how SVM or other Linux
> hypervisors use the ga_tag, i.e. passing the final u32 ga_tag.
> For example 32 bit hypervisor doesn't need to use lookup, because any
> pointer can be used as the ga_tag directly.

Ahh....... (w/ a big light bulb)

I get your point now. Let's just have SVM (or another hypervisor) define
what the tag should be and pass the value on to the IOMMU. The IOMMU can
simply use it without knowing what it is. Sorry, I'm slow :)

> And there are other viable algorithms for assigning the ga_tag --
> why isn't the vm_id 24 bits?

Good point! Actually, I am limited to a 30-bit hash value. So, the
VM_ID can be 22 bits (next to the 8-bit vcpu_id); I'll make that change.

>
>> +	unsigned long flags;
>> +	struct amd_iommu *iommu;
>> +
>> +	if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
>> +		return 0;
>> +
>> +	for_each_iommu(iommu) {
>> +		struct amd_ir_data *ir_data;
>> +
>> +		spin_lock_irqsave(&iommu->gatag_ir_hash_lock, flags);
>> +
>> +		/* Note:
>> +		 * We need to update all interrupt remapping table entries
>> +		 * for targeting the specified vcpu. Here, we use gatag
>> +		 * as a hash key and iterate through all entries in the bucket.
>> +		 */
>> +		hash_for_each_possible(iommu->gatag_ir_hash, ir_data, hnode,
>> +				       AMD_IOMMU_GATAG(vm_id, vcpu_id)) {
>> +			struct irte_ga *irte = (struct irte_ga *) ir_data->entry;
>
> |>> (The ga_tag check is missing here too.)
> |>
> |> Here, the intention is to update all interrupt remapping entries in the
> |> bucket w/ the same GATAG (i.e. vm_id + vcpu_id), where GATAG =
> |> AMD_IOMMU_GATAG(vm_id, vcpu_id).
>
> Which is why you need to check that
>    AMD_IOMMU_GATAG(vm_id, vcpu_id) == entry->fields_vapic.ga_tag
>
> The hashing function can map two different vm_id + vcpu_id to the same
> bucket and hash_for_each_possible() would return both of them, but only
> one belongs to the VCPU that we want to update.
>
> (And shouldn't there be only one match?)

Actually, with your suggestion above, the hash key would be
((vm_id & 0x3FFFFF) << 8) | (vcpu_id & 0xFF). So, it should be unique for
each vcpu of each vm, or am I still missing something?
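
As an illustration of the key layout described above, here is a minimal
userspace sketch, assuming a 22-bit VM ID in bits 29:8 and an 8-bit vcpu ID
in bits 7:0. The GA_TAG_* macros and ga_tag_*() helpers are hypothetical
names used only for this example; they are not part of the posted patch.

```c
#include <stdint.h>

/* Hypothetical layout: 22-bit VM ID in bits 29:8, 8-bit vcpu ID in bits 7:0. */
#define GA_TAG_VCPU_BITS 8
#define GA_TAG_VCPU_MASK 0xFFu
#define GA_TAG_VM_MASK   0x3FFFFFu

/* Pack vm_id and vcpu_id into one 30-bit tag; out-of-range bits are dropped. */
static inline uint32_t ga_tag_encode(uint32_t vm_id, uint32_t vcpu_id)
{
	/* Mask first, then shift: '<<' binds tighter than '&' in C. */
	return ((vm_id & GA_TAG_VM_MASK) << GA_TAG_VCPU_BITS) |
	       (vcpu_id & GA_TAG_VCPU_MASK);
}

/* Recover the VM ID from a tag built by ga_tag_encode(). */
static inline uint32_t ga_tag_to_vm_id(uint32_t ga_tag)
{
	return (ga_tag >> GA_TAG_VCPU_BITS) & GA_TAG_VM_MASK;
}

/* Recover the vcpu ID from a tag built by ga_tag_encode(). */
static inline uint32_t ga_tag_to_vcpu_id(uint32_t ga_tag)
{
	return ga_tag & GA_TAG_VCPU_MASK;
}
```

With this layout two tags collide only if both the masked vm_id and the
masked vcpu_id match, which is why VM ID reuse is the case to watch.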

Also, since we will not be passing the vm_id and vcpu_id as separate
values, and will just pass the (u32)ga_tag, we would not be able to do the
check you suggested here.

>
>> +
>> +			if (!irte->lo.fields_vapic.guest_mode)
>> +				continue;
>> +
>> +			update_irte_ga((struct irte_ga *)ir_data->ref,
>> +					ir_data->irq_2_irte.devid,
>> +					base, cpu, is_run);
>
> |>> (The lookup leading up to here is avoidable -- svm, the caller, has the
> |>>   ability to map ga_tag into irte/ir_data directly with a pointer.

I'm not sure about this optimization to avoid the lookup.

The struct amd_ir_data is part of the IOMMU driver, and SVM knows
nothing about it. I don't think it would be able to find the pointer
to amd_ir_data/irte.

Also, with the current design, each ga_tag can map to several IRTEs,
since there could be multiple interrupts targeting a particular
cpu. Here, we would want to update all of the IRTEs with the same ga_tag.

> |>>   I'm not sure if the lookup is slow enough to pardon optimization, but
> |>>   it might make the code simpler as well.)
> |>
> |> I might have mislead you up to this point. Not sure if the assumption here
> |> still hold with my explanation above. Sorry for confusion.
>
> SVM configures IOMMU with ga_tag, so IOMMU could return the pointer to
> ir_data/irte that was just configured.

Also, IIUC, you want to use the pointer to ir_data/irte as the ga_tag
value. The issue is that ga_tag is a 32-bit value, so this would not
work with a 64-bit address.

> SVM would couple it with a VCPU
> (and hence a ga_tag) and when amd_iommu_update_ga() was needed, SVM
> would pass the ir_data/irte pointer directly, instead of looking it up
> though a ga_tag.

Please let me know if I am still missing any points.

Thanks,
Suravee
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Suthikulpanit, Suravee July 14, 2016, 9:33 a.m. UTC | #2
On 7/14/16 16:13, Suravee Suthikulpanit wrote:
>>>    unsigned long flags;
>>> +    struct amd_iommu *iommu;
>>> +
>>> +    if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
>>> +        return 0;
>>> +
>>> +    for_each_iommu(iommu) {
>>> +        struct amd_ir_data *ir_data;
>>> +
>>> +        spin_lock_irqsave(&iommu->gatag_ir_hash_lock, flags);
>>> +
>>> +        /* Note:
>>> +         * We need to update all interrupt remapping table entries
>>> +         * for targeting the specified vcpu. Here, we use gatag
>>> +         * as a hash key and iterate through all entries in the bucket.
>>> +         */
>>> +        hash_for_each_possible(iommu->gatag_ir_hash, ir_data, hnode,
>>> +                       AMD_IOMMU_GATAG(vm_id, vcpu_id)) {
>>> +            struct irte_ga *irte = (struct irte_ga *) ir_data->entry;
>>
>> |>> (The ga_tag check is missing here too.)
>> |>
>> |> Here, the intention is to update all interrupt remapping entries in
>> the
>> |> bucket w/ the same GATAG (i.e. vm_id + vcpu_id), where GATAG =
>> |> AMD_IOMMU_GATAG(vm_id, vcpu_id).
>>
>> Which is why you need to check that
>>    AMD_IOMMU_GATAG(vm_id, vcpu_id) == entry->fields_vapic.ga_tag
>>
>> The hashing function can map two different vm_id + vcpu_id to the same
>> bucket and hash_for_each_possible() would return both of them, but only
>> one belongs to the VCPU that we want to update.
>>
>> (And shouldn't there be only one match?)
>
> Actually, with your suggestion above, the hash key would be ((vm_id &
> 0x3FFFFF) << 8) | (vcpu_id & 0xFF). So, it should be unique for each vcpu
> of each vm, or am I still missing something?

Ok, one scenario would be when SVM runs out of VM_IDs and has to start
re-using them. Since we want SVM to generate the ga_tag and just pass it
to the IOMMU driver to program the IRTE, we can probably assume that SVM
makes sure the ga_tag does not conflict across vm_id/vcpu_id pairs.

Thanks,
Suravee

Radim Krčmář July 14, 2016, 1:40 p.m. UTC | #3
2016-07-14 16:13+0700, Suravee Suthikulpanit:
> On 7/13/16 21:14, Radim Krčmář wrote:
>> 2016-07-13 08:20-0500, Suravee Suthikulpanit:
>> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
>> > @@ -4461,4 +4461,69 @@ int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
>> > +int amd_iommu_update_ga(u32 vcpu_id, u32 cpu, u32 vm_id,
>> > +			u64 base, bool is_run)
>> 
>> |2016-07-13 15:49+0700, Suravee Suthikulpanit:
>> |> On 07/12/2016 01:59 AM, Radim Krčmář wrote:
>> |>> Not just in this function does the interface between svm and iommu split
>> |>> ga_tag into its two components (vcpu_id and ga_tag), but it seems that
>> |>> the combined value could always be used instead ...
>> |>> Is there an advantage to passing two values?
>> |>
>> |> Here, the amd_iommu_update_ga() takes the two separate value for input
>> |> parameters. Mainly the ga_tag (which is really the vm_id) and vcpu_id. This
>> |> allow IOMMU driver to decide how to encode the GATAG to be programmed into
>> |> the IRTE. Currently, the actual GATAG is a 16-bit value, <vm_id><vcpu_id>.
>> |> This keeps the interface independent from how we encode the GATAG.
>> 
>> I was thinking about making the IOMMU unaware how SVM or other Linux
>> hypervisors use the ga_tag, i.e. passing the final u32 ga_tag.
>> For example 32 bit hypervisor doesn't need to use lookup, because any
>> pointer can be used as the ga_tag directly.
> 
> Ahh....... (w/ a big light bulb)
> I get your point now. Let's just have SVM (or other hypervisor) define what
> the tag should be and just pass-on the value to IOMMU. IOMMU can just simply
> use this w/o knowing what it is.  Sorry, I'm slow :)

That is what I meant, but misunderstanding is a product of both
participants.  I didn't write it clearly on the first try.

>> > +		hash_for_each_possible(iommu->gatag_ir_hash, ir_data, hnode,
>> > +				       AMD_IOMMU_GATAG(vm_id, vcpu_id)) {
>> > +			struct irte_ga *irte = (struct irte_ga *) ir_data->entry;
>> 
>> |>> (The ga_tag check is missing here too.)
>> |>
>> |> Here, the intention is to update all interrupt remapping entries in the
>> |> bucket w/ the same GATAG (i.e. vm_id + vcpu_id), where GATAG =
>> |> AMD_IOMMU_GATAG(vm_id, vcpu_id).
>> 
>> Which is why you need to check that
>>    AMD_IOMMU_GATAG(vm_id, vcpu_id) == entry->fields_vapic.ga_tag
>> 
>> The hashing function can map two different vm_id + vcpu_id to the same
>> bucket and hash_for_each_possible() would return both of them, but only
>> one belongs to the VCPU that we want to update.
>> 
>> (And shouldn't there be only one match?)
> 
> Actually, with your suggestion above, the hash key would be ((vm_id &
> 0x3FFFFF) << 8) | (vcpu_id & 0xFF). So, it should be unique for each vcpu of
> each vm, or am I still missing something?

[Reply in the followup mail.]

> Also, since we will not be passing the vmid and vcpuid as separate value,
> and just passing the (u32)ga_tag, we would not be able to do the check you
> suggested here.

There will be the u32 ga_tag argument, so you would still do
  ga_tag == entry->fields_vapic.ga_tag

Because even if the ga_tag is unique for every vcpu, the hash table will
mix various vcpus into one bucket and you need to filter them.
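
To make the bucket-sharing point concrete, here is a small userspace sketch.
It does not use the kernel's hashtable API; struct ir_entry, struct tag_table,
bucket_of() and the four-bucket size are all assumptions made for
illustration. The point it shows is that walking a bucket yields every entry
whose key hashes there, so the update loop still has to compare the stored
tag against the one requested.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 4u	/* deliberately tiny so that collisions are easy to see */

/* Hypothetical stand-in for an IRTE bookkeeping entry. */
struct ir_entry {
	uint32_t ga_tag;
	int is_run;
	struct ir_entry *next;
};

struct tag_table {
	struct ir_entry *bucket[NBUCKETS];
};

/* Toy hash: different tags can land in the same bucket. */
static unsigned int bucket_of(uint32_t ga_tag)
{
	return ga_tag % NBUCKETS;
}

static void tag_table_add(struct tag_table *t, struct ir_entry *e)
{
	unsigned int b = bucket_of(e->ga_tag);

	e->next = t->bucket[b];
	t->bucket[b] = e;
}

/* Update every entry with this ga_tag; returns how many were updated. */
static int tag_table_update(struct tag_table *t, uint32_t ga_tag, int is_run)
{
	struct ir_entry *e;
	int updated = 0;

	for (e = t->bucket[bucket_of(ga_tag)]; e; e = e->next) {
		/* The bucket also holds other vcpus' entries: filter by tag. */
		if (e->ga_tag != ga_tag)
			continue;
		e->is_run = is_run;
		updated++;
	}
	return updated;
}
```

Tags 0x101 and 0x105 share a bucket in this toy table, yet only the entries
carrying 0x101 are touched when that tag is updated. Several entries may
carry the same tag, matching the case of multiple interrupts per vcpu.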

>> > +			update_irte_ga((struct irte_ga *)ir_data->ref,
>> > +					ir_data->irq_2_irte.devid,
>> > +					base, cpu, is_run);
>> 
>> |>> (The lookup leading up to here is avoidable -- svm, the caller, has the
>> |>>   ability to map ga_tag into irte/ir_data directly with a pointer.
> 
> I'm not sure about this optimization to avoid look up.
> 
> The struct amd_ir_data is part of the IOMMU driver, and the SVM knows
> nothing about it. I don't think it would be able to find out the pointer to
> amd_ir_data/irte.

Yeah, SVM would store it in a "void *" pointer, because it doesn't need
to know anything else, but you still need to retrieve it from IOMMU,
which could be done through vcpu_info argument to
amd_ir_set_vcpu_affinity().

(I am not sure if it doesn't breach isolation of IOMMU, so we might not
 want to do it in any case ...)

> Also, with the current design, each ga_tag can be mapped to different irte
> since there could be multiple interrupts targeting a particular cpu. Here,
> we would want to update all of the IRTEs with the same ga_tag.

True, that design is good.  SVM would need a list of pointers for each
vcpu to cope with it ...

>> |>>   I'm not sure if the lookup is slow enough to pardon optimization, but
>> |>>   it might make the code simpler as well.)
>> |>
>> |> I might have mislead you up to this point. Not sure if the assumption here
>> |> still hold with my explanation above. Sorry for confusion.
>> 
>> SVM configures IOMMU with ga_tag, so IOMMU could return the pointer to
>> ir_data/irte that was just configured.
> 
> Also, IIUC, you want to use the pointer to ir_data/irte as the ga_tag value.
> The issue would be ga_tag is a 32-bit value, and this would not work with
> 64-bit address.

I mean something slightly different.  Instead of passing ga_tag into
amd_iommu_update_ga(), just pass void * of whatever IOMMU provided back
when SVM configured the interrupt.  ga_tag will never come into play.

(The vcpu lookup from ga_tag is necessary, when processing the queue of
 undelivered interrupts.  ir_data lookup can be avoided.)
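
The handle-based alternative could look roughly like the userspace sketch
below. Everything here is hypothetical: struct ir_handle_data stands in for
whatever the IOMMU driver would hand back, struct vcpu_irq_ref is a
caller-side list of such handles per vcpu, and neither function exists in
the patch. The caller never inspects the handle; it only stores it and
passes it back.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-private state behind the opaque handle. */
struct ir_handle_data {
	uint64_t ga_root_ptr;
	int destination;
	int is_run;
};

/* Hypothetical caller-side list: one node per interrupt posted to a vcpu. */
struct vcpu_irq_ref {
	void *handle;			/* opaque to the caller */
	struct vcpu_irq_ref *next;
};

/* Driver side: update one entry directly, with no ga_tag lookup. */
static void ir_update_by_handle(void *handle, uint64_t base, int cpu, int is_run)
{
	struct ir_handle_data *d = handle;

	d->ga_root_ptr = base >> 12;
	if (cpu >= 0)			/* a negative cpu keeps the old destination */
		d->destination = cpu;
	d->is_run = is_run;
}

/* Caller side: walk the vcpu's handles; returns the number updated. */
static int vcpu_update_all(struct vcpu_irq_ref *head, uint64_t base,
			   int cpu, int is_run)
{
	int n = 0;

	for (; head; head = head->next) {
		ir_update_by_handle(head->handle, base, cpu, is_run);
		n++;
	}
	return n;
}
```

The trade-off is the one raised above: the hash lookup disappears, but the
caller has to maintain a list of handles per vcpu.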
Radim Krčmář July 14, 2016, 1:45 p.m. UTC | #4
2016-07-14 16:33+0700, Suravee Suthikulpanit:
> On 7/14/16 16:13, Suravee Suthikulpanit wrote:
>> > >    unsigned long flags;
>> > > +    struct amd_iommu *iommu;
>> > > +
>> > > +    if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
>> > > +        return 0;
>> > > +
>> > > +    for_each_iommu(iommu) {
>> > > +        struct amd_ir_data *ir_data;
>> > > +
>> > > +        spin_lock_irqsave(&iommu->gatag_ir_hash_lock, flags);
>> > > +
>> > > +        /* Note:
>> > > +         * We need to update all interrupt remapping table entries
>> > > +         * for targeting the specified vcpu. Here, we use gatag
>> > > +         * as a hash key and iterate through all entries in the bucket.
>> > > +         */
>> > > +        hash_for_each_possible(iommu->gatag_ir_hash, ir_data, hnode,
>> > > +                       AMD_IOMMU_GATAG(vm_id, vcpu_id)) {
>> > > +            struct irte_ga *irte = (struct irte_ga *) ir_data->entry;
>> > 
>> > |>> (The ga_tag check is missing here too.)
>> > |>
>> > |> Here, the intention is to update all interrupt remapping entries in
>> > the
>> > |> bucket w/ the same GATAG (i.e. vm_id + vcpu_id), where GATAG =
>> > |> AMD_IOMMU_GATAG(vm_id, vcpu_id).
>> > 
>> > Which is why you need to check that
>> >    AMD_IOMMU_GATAG(vm_id, vcpu_id) == entry->fields_vapic.ga_tag
>> > 
>> > The hashing function can map two different vm_id + vcpu_id to the same
>> > bucket and hash_for_each_possible() would return both of them, but only
>> > one belongs to the VCPU that we want to update.
>> > 
>> > (And shouldn't there be only one match?)
>> 
>> Actually, with your suggestion above, the hash key would be ((vm_id &
>> 0x3FFFFF) << 8) | (vcpu_id & 0xFF). So, it should be unique for each vcpu
>> of each vm, or am I still missing something?
> 
> Ok, one scenario would be when SVM run out of the VM_ID and having to start
> re-using them. Since we want SVM to generate ga_tag and just pass into IOMMU
> driver for it to program the IRTE, we probably can make an assumption that
> SVM would make sure that ga_tag would not conflict for each vm_id/vcpu_id.

I agree, it could enable doorbell to an unscheduled VCPU and therefore
lose the notification.

The per-vcpu list of IRTEs would solve it as well, but making sure that
no two VMs have the same id might be easier and 2^22 active VMs should
be more than enough. :)
Patch

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index fe9b005..4a337dc 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4461,4 +4461,69 @@  int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 
 	return 0;
 }
+
+static int
+update_irte_ga(struct irte_ga *irte, unsigned int devid,
+	       u64 base, int cpu, bool is_run)
+{
+	struct irq_remap_table *irt = get_irq_table(devid, false);
+	unsigned long flags;
+
+	if (!irt)
+		return -ENODEV;
+
+	spin_lock_irqsave(&irt->lock, flags);
+
+	if (irte->lo.fields_vapic.guest_mode) {
+		irte->hi.fields.ga_root_ptr = (base >> 12);
+		if (cpu >= 0)
+			irte->lo.fields_vapic.destination = cpu;
+		irte->lo.fields_vapic.is_run = is_run;
+		barrier();
+	}
+
+	spin_unlock_irqrestore(&irt->lock, flags);
+
+	return 0;
+}
+
+int amd_iommu_update_ga(u32 vcpu_id, u32 cpu, u32 vm_id,
+			u64 base, bool is_run)
+{
+	unsigned long flags;
+	struct amd_iommu *iommu;
+
+	if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
+		return 0;
+
+	for_each_iommu(iommu) {
+		struct amd_ir_data *ir_data;
+
+		spin_lock_irqsave(&iommu->gatag_ir_hash_lock, flags);
+
+		/* Note:
+		 * We need to update all interrupt remapping table entries
+		 * for targeting the specified vcpu. Here, we use gatag
+		 * as a hash key and iterate through all entries in the bucket.
+		 */
+		hash_for_each_possible(iommu->gatag_ir_hash, ir_data, hnode,
+				       AMD_IOMMU_GATAG(vm_id, vcpu_id)) {
+			struct irte_ga *irte = (struct irte_ga *) ir_data->entry;
+
+			if (!irte->lo.fields_vapic.guest_mode)
+				continue;
+
+			update_irte_ga((struct irte_ga *)ir_data->ref,
+					ir_data->irq_2_irte.devid,
+					base, cpu, is_run);
+			iommu_flush_irt(iommu, ir_data->irq_2_irte.devid);
+			iommu_completion_wait(iommu);
+		}
+
+		spin_unlock_irqrestore(&iommu->gatag_ir_hash_lock, flags);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(amd_iommu_update_ga);
 #endif
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 2ed353b..52160c8 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -844,6 +844,7 @@  struct amd_ir_data {
 	union {
 		struct msi_msg msi_entry;
 	};
+	void *ref;	/* Pointer to the actual irte */
 };
 
 #ifdef CONFIG_IRQ_REMAP
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 940fdd8..a6fc022 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -179,6 +179,9 @@  static inline int amd_iommu_detect(void) { return -ENODEV; }
 /* IOMMU AVIC Function */
 extern int amd_iommu_register_ga_log_notifier(int (*notifier)(int, int));
 
+extern int
+amd_iommu_update_ga(u32 vcpu_id, u32 cpu, u32 vm_id, u64 base, bool is_run);
+
 #else /* defined(CONFIG_AMD_IOMMU) && defined(CONFIG_IRQ_REMAP) */
 
 static inline int
@@ -187,6 +190,12 @@  amd_iommu_register_ga_log_notifier(int (*notifier)(int, int))
 	return 0;
 }
 
+static inline int
+amd_iommu_update_ga(u32 vcpu_id, u32 cpu, u32 vm_id, u64 base, bool is_run)
+{
+	return 0;
+}
+
 #endif /* defined(CONFIG_AMD_IOMMU) && defined(CONFIG_IRQ_REMAP) */
 
 #endif /* _ASM_X86_AMD_IOMMU_H */