[v11,10/10] genirq/msi: use the MSI doorbell's IOVA when requested

Message ID 1468933367-23159-11-git-send-email-eric.auger@redhat.com (mailing list archive)
State New, archived

Commit Message

Eric Auger July 19, 2016, 1:02 p.m. UTC
On MSI message composition we now use the MSI doorbell's IOVA in
place of the doorbell's PA when the device is upstream to an
IOMMU that requires MSI addresses to be mapped. The doorbell's
allocation and mapping happened at an earlier stage (pci_enable_msi).

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v8 -> v9:
- Braces on both sides of the 'else' in msi_compose

v7 -> v8:
- use iommu_msi_msg_pa_to_va
- add WARN_ON

v6 -> v7:
- allocation/mapping is done at an earlier stage. We now just perform
  the IOVA lookup, so it is now safe to call this from code that cannot
  sleep. iommu_msi_set_doorbell_iova is moved into the dma-reserved-iommu
  API: I think it cleans things up with respect to the various #ifdef CONFIGs.

v5:
- use macros to increase the readability
- add comments
- fix a typo that caused a compilation error if CONFIG_IOMMU_API
  is not set
---
 kernel/irq/msi.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Thomas Gleixner July 20, 2016, 9:09 a.m. UTC | #1
On Tue, 19 Jul 2016, Eric Auger wrote:

First of all - valid for all patches:

Subject: sys/subsys: Sentence starts with an uppercase letter

Now for this particular one:

genirq/msi: use the MSI doorbell's IOVA when requested

> On MSI message composition we now use the MSI doorbell's IOVA in
> place of the doorbell's PA when the device is upstream to an
> IOMMU that requires MSI addresses to be mapped. The doorbell's
> allocation and mapping happened at an earlier stage (pci_enable_msi).

This changelog is completely useless. At least I cannot figure out what that
patch actually does. And the implementation is not self-explanatory either.
 
> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
>  {
>  	int ret = 0;
>  
> -	if (erase)
> +	if (erase) {
>  		memset(msg, 0, sizeof(*msg));
> -	else
> +	} else {
> +		struct device *dev;
> +
>  		ret = irq_chip_compose_msi_msg(irq_data, msg);
> +		if (ret)
> +			return ret;
> +
> +		dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
> +		WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));

What the heck is this call doing? And why is there only a WARN_ON and not a
proper error return code handling?

Thanks,

	tglx
Eric Auger July 25, 2016, 4:31 p.m. UTC | #2
Hi Thomas,

On 20/07/2016 11:09, Thomas Gleixner wrote:
> On Tue, 19 Jul 2016, Eric Auger wrote:
> 
> First of all - valid for all patches:
> 
> Subject: sys/subsys: Sentence starts with an uppercase letter
OK understood.
> 
> Now for this particular one:
> 
> genirq/msi: use the MSI doorbell's IOVA when requested
> 
>> On MSI message composition we now use the MSI doorbell's IOVA in
>> place of the doorbell's PA when the device is upstream to an
>> IOMMU that requires MSI addresses to be mapped. The doorbell's
>> allocation and mapping happened at an earlier stage (pci_enable_msi).
> 
> This changelog is completely useless. At least I cannot figure out what that
> patch actually does. And the implementation is not self-explanatory either.

>  
>> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
>>  {
>>  	int ret = 0;
>>  
>> -	if (erase)
>> +	if (erase) {
>>  		memset(msg, 0, sizeof(*msg));
>> -	else
>> +	} else {
>> +		struct device *dev;
>> +
>>  		ret = irq_chip_compose_msi_msg(irq_data, msg);
>> +		if (ret)
>> +			return ret;
>> +
>> +		dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
>> +		WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
> 
> What the heck is this call doing? And why is there only a WARN_ON and not a
> proper error return code handling?

iommu_msi_msg_pa_to_va is part of the new iommu-msi API introduced in PART I of
this series. This helper function detects whether the physical address found in
the MSI message has a corresponding allocated IOVA. This happens if the MSI
doorbell is accessed through an IOMMU and this IOMMU does not bypass the MSI
addresses (ARM case). Allocation of this IOVA was performed in the previous patch.

So, if this is the case, the physical address is swapped with the IOVA
address. That way the PCIe device will send the MSI with this IOVA and
the address will be translated by the IOMMU into the target MSI doorbell PA.
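
To make the swap concrete, here is a minimal userspace sketch of the lookup
described above. It is not the kernel implementation: struct msi_msg_model,
struct doorbell_map, struct dev_model and model_msg_pa_to_iova() are
hypothetical names invented for illustration, and the exact-match lookup is
an assumption (the real helper from PART I may well match differently).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the address/data fields of an MSI message. */
struct msi_msg_model {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Hypothetical record of one doorbell mapping created at enable time. */
struct doorbell_map {
	uint64_t pa;	/* doorbell physical address */
	uint64_t iova;	/* IOVA that the IOMMU translates to pa */
};

/* Hypothetical per-device state; not the data structure used by the series. */
struct dev_model {
	int msi_needs_mapping;	/* IOMMU does not bypass MSI writes */
	const struct doorbell_map *maps;
	size_t nr_maps;
};

static uint64_t msg_addr(const struct msi_msg_model *msg)
{
	return ((uint64_t)msg->address_hi << 32) | msg->address_lo;
}

/*
 * Lookup only, no allocation: replace the doorbell PA in @msg with the
 * IOVA recorded for it. Returns 0 if the address was swapped or if no
 * translation is needed, -ENOENT if a mapping is required but missing.
 */
static int model_msg_pa_to_iova(const struct dev_model *dev,
				struct msi_msg_model *msg)
{
	uint64_t pa;
	size_t i;

	assert(dev && msg);

	if (!dev->msi_needs_mapping)
		return 0;

	pa = msg_addr(msg);
	for (i = 0; i < dev->nr_maps; i++) {
		if (dev->maps[i].pa == pa) {
			msg->address_lo = (uint32_t)dev->maps[i].iova;
			msg->address_hi = (uint32_t)(dev->maps[i].iova >> 32);
			return 0;
		}
	}

	return -ENOENT;
}
```

With a single mapping recorded for the doorbell, a message composed with the
doorbell PA comes back carrying the IOVA, and the data field is untouched. A
message whose address has no recorded mapping is left as is and the function
reports -ENOENT.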

Hope this clarifies

Thanks

Eric 
> 
> Thanks,
> 
> 	tglx
>
Thomas Gleixner July 26, 2016, 9:04 a.m. UTC | #3
Eric,

On Mon, 25 Jul 2016, Auger Eric wrote:
> On 20/07/2016 11:09, Thomas Gleixner wrote:
> > On Tue, 19 Jul 2016, Eric Auger wrote:
> >> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
> >>  {
> >>  	int ret = 0;
> >>  
> >> -	if (erase)
> >> +	if (erase) {
> >>  		memset(msg, 0, sizeof(*msg));
> >> -	else
> >> +	} else {
> >> +		struct device *dev;
> >> +
> >>  		ret = irq_chip_compose_msi_msg(irq_data, msg);
> >> +		if (ret)
> >> +			return ret;
> >> +
> >> +		dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
> >> +		WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
> > 
> > What the heck is this call doing? And why is there only a WARN_ON and not a
> > proper error return code handling?
> 
> iommu_msi_msg_pa_to_va is part of the new iommu-msi API introduced in PART I
> of this series. This helper function detects whether the physical address
> found in the MSI message has a corresponding allocated IOVA. This happens if the
> MSI doorbell is accessed through an IOMMU and this IOMMU does not bypass the MSI
> addresses (ARM case). Allocation of this IOVA was performed in the previous
> patch.
>
> So, if this is the case, the physical address is swapped with the IOVA
> address. That way the PCIe device will send the MSI with this IOVA and
> the address will be translated by the IOMMU into the target MSI doorbell PA.
> 
> Hope this clarifies

No, it does not. You are explaining at great length what that function is
doing, but you are not explaining WHY you don't do proper return code
handling and just do a WARN_ON() and happily proceed. If that function fails
then the interrupt will not be functional, so WHY on earth are you continuing?

Thanks,

	tglx
Eric Auger July 26, 2016, 10:02 a.m. UTC | #4
Hi Thomas,

On 26/07/2016 11:04, Thomas Gleixner wrote:
> Eric,
> 
> On Mon, 25 Jul 2016, Auger Eric wrote:
>> On 20/07/2016 11:09, Thomas Gleixner wrote:
>>> On Tue, 19 Jul 2016, Eric Auger wrote:
>>>> @@ -63,10 +63,18 @@ static int msi_compose(struct irq_data *irq_data,
>>>>  {
>>>>  	int ret = 0;
>>>>  
>>>> -	if (erase)
>>>> +	if (erase) {
>>>>  		memset(msg, 0, sizeof(*msg));
>>>> -	else
>>>> +	} else {
>>>> +		struct device *dev;
>>>> +
>>>>  		ret = irq_chip_compose_msi_msg(irq_data, msg);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +
>>>> +		dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
>>>> +		WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
>>>
>>> What the heck is this call doing? And why is there only a WARN_ON and not a
>>> proper error return code handling?
>>
>> iommu_msi_msg_pa_to_va is part of the new iommu-msi API introduced in PART I
>> of this series. This helper function detects whether the physical address
>> found in the MSI message has a corresponding allocated IOVA. This happens if the
>> MSI doorbell is accessed through an IOMMU and this IOMMU does not bypass the MSI
>> addresses (ARM case). Allocation of this IOVA was performed in the previous
>> patch.
>>
>> So, if this is the case, the physical address is swapped with the IOVA
>> address. That way the PCIe device will send the MSI with this IOVA and
>> the address will be translated by the IOMMU into the target MSI doorbell PA.
>>
>> Hope this clarifies
> 
> No, it does not. You are explaining at great length what that function is
> doing, but you are not explaining WHY you don't do proper return code
> handling and just do a WARN_ON() and happily proceed. If that function fails
> then the interrupt will not be functional, so WHY on earth are you continuing?
Oh sorry, I focused on the function's goal. Originally I could not return an
error since there is a BUG_ON(ret) afterwards, and userspace can typically,
and deliberately, omit to pass the IPA range that maps the MSIs. But now that
we have these two phases, where we first map the MSIs in pci_enable_msi_range
and then use the IOVA at compose time, I need to analyze again whether
userspace can induce a BUG_ON.

Thanks

Eric
> 
> Thanks,
> 
> 	tglx

Patch

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 69b5b19..e375544 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -63,10 +63,18 @@  static int msi_compose(struct irq_data *irq_data,
 {
 	int ret = 0;
 
-	if (erase)
+	if (erase) {
 		memset(msg, 0, sizeof(*msg));
-	else
+	} else {
+		struct device *dev;
+
 		ret = irq_chip_compose_msi_msg(irq_data, msg);
+		if (ret)
+			return ret;
+
+		dev = msi_desc_to_dev(irq_data_get_msi_desc(irq_data));
+		WARN_ON(iommu_msi_msg_pa_to_va(dev, msg));
+	}
 
 	return ret;
 }
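
For comparison, the alternative raised in the review (returning the lookup
failure instead of wrapping it in WARN_ON) could look roughly like the
userspace model below. This is only a sketch under stated assumptions:
struct msg_model, step_fn, compose_model() and the three example hooks are
hypothetical stand-ins for msi_compose(), irq_chip_compose_msi_msg() and
iommu_msi_msg_pa_to_va(), whose real signatures are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Userspace stand-in for an MSI message. */
struct msg_model {
	unsigned int address_lo;
	unsigned int address_hi;
	unsigned int data;
};

/* Hypothetical hook type for the "chip compose" and "PA to IOVA" steps. */
typedef int (*step_fn)(struct msg_model *msg);

/*
 * Model of the compose path with error propagation: a failed IOVA
 * lookup is returned to the caller rather than only being warned about.
 */
static int compose_model(struct msg_model *msg, int erase,
			 step_fn chip_compose, step_fn pa_to_iova)
{
	int ret;

	assert(msg && chip_compose && pa_to_iova);

	if (erase) {
		memset(msg, 0, sizeof(*msg));
		return 0;
	}

	ret = chip_compose(msg);
	if (ret)
		return ret;

	/* Propagate the lookup result instead of WARN_ON() and continue. */
	return pa_to_iova(msg);
}

/* Example hooks, purely illustrative. */
static int chip_ok(struct msg_model *msg)
{
	msg->address_lo = 0x1000;	/* pretend doorbell PA */
	msg->address_hi = 0;
	msg->data = 7;
	return 0;
}

static int lookup_ok(struct msg_model *msg)
{
	msg->address_lo = 0x8000;	/* pretend IOVA for that doorbell */
	return 0;
}

static int lookup_fail(struct msg_model *msg)
{
	(void)msg;
	return -ENOENT;			/* no IOVA recorded for this PA */
}
```

Calling compose_model(&m, 0, chip_ok, lookup_fail) returns -ENOENT, so the
caller learns that the interrupt would not be functional. As noted in reply
#4, the existing caller follows the compose step with BUG_ON(ret), so
propagating the error would also require revisiting that caller.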