
[4/6] drm/amdgpu: add checks if DMA-buf P2P is supported

Message ID 20200311135158.3310-5-christian.koenig@amd.com (mailing list archive)
State New, archived
Series [1/6] lib/scatterlist: add sg_set_dma_addr() function

Commit Message

Christian König March 11, 2020, 1:51 p.m. UTC
Check if we can do peer2peer on the PCIe bus.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jason Gunthorpe March 11, 2020, 2:04 p.m. UTC | #1
On Wed, Mar 11, 2020 at 02:51:56PM +0100, Christian König wrote:
> Check if we can do peer2peer on the PCIe bus.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index aef12ee2f1e3..bbf67800c8a6 100644
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -38,6 +38,7 @@
>  #include <drm/amdgpu_drm.h>
>  #include <linux/dma-buf.h>
>  #include <linux/dma-fence-array.h>
> +#include <linux/pci-p2pdma.h>
>  
>  /**
>   * amdgpu_gem_prime_vmap - &dma_buf_ops.vmap implementation
> @@ -179,6 +180,9 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
>  	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>  	int r;
>  
> +	if (pci_p2pdma_distance_many(adev->pdev, &attach->dev, 1, true) < 0)
> +		attach->peer2peer = false;
> +

Are there other related patches than this series?

p2p dma mapping needs to be done in common code, in p2pdma.c - ie this
open coding is missing the bus_offset stuff, at least. 

I really do not want to see drivers open code this stuff.

We already have a p2pdma API for handling the struct page case, so I
suggest adding some new p2pdma API to handle this for non-struct page
cases.

ie some thing like:

int 'p2pdma map bar'(
   struct pci_device *source,
   unsigned int source_bar_number, 
   struct pci_device *dest, 
   physaddr&len *array_of_offsets & length pairs into source bar,
   struct scatterlist *output_sgl)

Jason
Christian König March 11, 2020, 2:33 p.m. UTC | #2
Am 11.03.20 um 15:04 schrieb Jason Gunthorpe:
> On Wed, Mar 11, 2020 at 02:51:56PM +0100, Christian König wrote:
>> Check if we can do peer2peer on the PCIe bus.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> index aef12ee2f1e3..bbf67800c8a6 100644
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>> @@ -38,6 +38,7 @@
>>   #include <drm/amdgpu_drm.h>
>>   #include <linux/dma-buf.h>
>>   #include <linux/dma-fence-array.h>
>> +#include <linux/pci-p2pdma.h>
>>   
>>   /**
>>    * amdgpu_gem_prime_vmap - &dma_buf_ops.vmap implementation
>> @@ -179,6 +180,9 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
>>   	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>   	int r;
>>   
>> +	if (pci_p2pdma_distance_many(adev->pdev, &attach->dev, 1, true) < 0)
>> +		attach->peer2peer = false;
>> +
> Are there other related patches than this series?
>
> p2p dma mapping needs to be done in common code, in p2pdma.c - ie this
> open coding is missing the bus_offset stuff, at least.

Yeah, I'm aware of this. But I couldn't find a better way for now.

> I really do not want to see drivers open code this stuff.
>
> We already have a p2pdma API for handling the struct page case, so I
> suggest adding some new p2pdma API to handle this for non-struct page
> cases.
>
> ie some thing like:
>
> int 'p2pdma map bar'(
>     struct pci_device *source,
>     unsigned int source_bar_number,
>     struct pci_device *dest,
>     physaddr&len *array_of_offsets & length pairs into source bar,
>     struct scatterlist *output_sgl)

Well that's exactly what I have to avoid since I don't have the array of 
offsets around and want to avoid constructing it.

Similar problem for dma_map_resource(). My example does this on demand, 
but essentially we also have use cases where this is done only once.

Ideally we would have some function to create an sgl based on some 
arbitrary collection of offsets and length inside a BAR.

Regards,
Christian.

>
> Jason
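Christian's idea of a helper that builds an sgl from an arbitrary collection of (offset, length) pairs inside a BAR can be sketched outside the kernel. Everything below is illustrative: `struct sg_entry`, `build_bar_sgl()`, and the coalescing rule are assumptions standing in for real scatterlist code, not an existing kernel API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a scatterlist entry: one (bus address,
 * length) pair describing a contiguous chunk inside the exported BAR. */
struct sg_entry {
	uint64_t dma_addr;
	uint32_t length;
};

/* Build "sgl" entries from arbitrary (offset, length) pairs inside a
 * BAR, merging pairs that happen to be bus-contiguous.  bar_base stands
 * in for the peer bus address of the BAR; returns the number of entries
 * written to out[]. */
static size_t build_bar_sgl(uint64_t bar_base,
			    const uint64_t *offsets, const uint32_t *lengths,
			    size_t count, struct sg_entry *out)
{
	size_t n = 0;

	for (size_t i = 0; i < count; i++) {
		uint64_t addr = bar_base + offsets[i];

		/* Extend the previous entry when this chunk follows it
		 * directly, the same way sg coalescing would. */
		if (n && out[n - 1].dma_addr + out[n - 1].length == addr) {
			out[n - 1].length += lengths[i];
			continue;
		}
		out[n].dma_addr = addr;
		out[n].length = lengths[i];
		n++;
	}
	return n;
}
```

A real helper would additionally apply the bus_offset translation that the p2pdma core tracks, which is exactly the part the open-coded driver check misses.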
Jason Gunthorpe March 11, 2020, 2:38 p.m. UTC | #3
On Wed, Mar 11, 2020 at 03:33:01PM +0100, Christian König wrote:
> Am 11.03.20 um 15:04 schrieb Jason Gunthorpe:
> > On Wed, Mar 11, 2020 at 02:51:56PM +0100, Christian König wrote:
> > > Check if we can do peer2peer on the PCIe bus.
> > > 
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 ++++
> > >   1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > > index aef12ee2f1e3..bbf67800c8a6 100644
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > > @@ -38,6 +38,7 @@
> > >   #include <drm/amdgpu_drm.h>
> > >   #include <linux/dma-buf.h>
> > >   #include <linux/dma-fence-array.h>
> > > +#include <linux/pci-p2pdma.h>
> > >   /**
> > >    * amdgpu_gem_prime_vmap - &dma_buf_ops.vmap implementation
> > > @@ -179,6 +180,9 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
> > >   	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> > >   	int r;
> > > +	if (pci_p2pdma_distance_many(adev->pdev, &attach->dev, 1, true) < 0)
> > > +		attach->peer2peer = false;
> > > +
> > Are there other related patches than this series?
> > 
> > p2p dma mapping needs to be done in common code, in p2pdma.c - ie this
> > open coding is missing the bus_offset stuff, at least.
> 
> Yeah, I'm aware of this. But I couldn't find a better way for now.

Well, it isn't optional :)
 
> > I really do not want to see drivers open code this stuff.
> > 
> > We already have a p2pdma API for handling the struct page case, so I
> > suggest adding some new p2pdma API to handle this for non-struct page
> > cases.
> > 
> > ie some thing like:
> > 
> > int 'p2pdma map bar'(
> >     struct pci_device *source,
> >     unsigned int source_bar_number,
> >     struct pci_device *dest,
> >     physaddr&len *array_of_offsets & length pairs into source bar,
> >     struct scatterlist *output_sgl)
> 
> Well that's exactly what I have to avoid since I don't have the array of
> offsets around and want to avoid constructing it.

Maybe it doesn't need an array of offsets - just a single offset and
callers can iterate the API?

> Similar problem for dma_map_resource(). My example does this on demand, but
> essentially we also have use cases where this is done only once.

I'm not sure if this is portable. Does any IOMMU HW need to know P2P
is happening to setup successfully? We currently support such a narrow
scope of HW for P2P..

> Ideally we would have some function to create an sgl based on some arbitrary
> collection of offsets and length inside a BAR.

Isn't that what I just proposed above ?

Jason
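Jason's "single offset, caller iterates" alternative might look like the following userspace sketch. `map_bar_chunk()` and `map_all()` are invented names, the bounds check stands in for whatever validation a real p2pdma helper would perform, and no kernel API with this shape exists today.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-chunk mapper: translate one (offset, length) pair
 * inside a BAR into a peer bus address, or -1 if it falls outside the
 * BAR.  A kernel version would fill a struct scatterlist instead. */
static int64_t map_bar_chunk(uint64_t bar_bus_base, uint64_t bar_size,
			     uint64_t offset, uint32_t len)
{
	if (offset + len > bar_size)
		return -1;	/* chunk outside the BAR */
	return (int64_t)(bar_bus_base + offset);
}

/* Caller-side iteration: each chunk is mapped on demand, so no array of
 * offsets ever has to be materialised up front. */
static int map_all(uint64_t base, uint64_t size,
		   const uint64_t *offsets, const uint32_t *lens,
		   size_t n, int64_t *out)
{
	for (size_t i = 0; i < n; i++) {
		out[i] = map_bar_chunk(base, size, offsets[i], lens[i]);
		if (out[i] < 0)
			return -1;
	}
	return 0;
}
```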
Christian König March 11, 2020, 2:43 p.m. UTC | #4
Am 11.03.20 um 15:38 schrieb Jason Gunthorpe:
> On Wed, Mar 11, 2020 at 03:33:01PM +0100, Christian König wrote:
>> Am 11.03.20 um 15:04 schrieb Jason Gunthorpe:
>>> On Wed, Mar 11, 2020 at 02:51:56PM +0100, Christian König wrote:
>>>> Check if we can do peer2peer on the PCIe bus.
>>>>
>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 ++++
>>>>    1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>>>> index aef12ee2f1e3..bbf67800c8a6 100644
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
>>>> @@ -38,6 +38,7 @@
>>>>    #include <drm/amdgpu_drm.h>
>>>>    #include <linux/dma-buf.h>
>>>>    #include <linux/dma-fence-array.h>
>>>> +#include <linux/pci-p2pdma.h>
>>>>    /**
>>>>     * amdgpu_gem_prime_vmap - &dma_buf_ops.vmap implementation
>>>> @@ -179,6 +180,9 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
>>>>    	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>>>    	int r;
>>>> +	if (pci_p2pdma_distance_many(adev->pdev, &attach->dev, 1, true) < 0)
>>>> +		attach->peer2peer = false;
>>>> +
>>> Are there other related patches than this series?
>>>
>>> p2p dma mapping needs to be done in common code, in p2pdma.c - ie this
>>> open coding is missing the bus_offset stuff, at least.
>> Yeah, I'm aware of this. But I couldn't find a better way for now.
> Well, it isn't optional :)
>   
>>> I really do not want to see drivers open code this stuff.
>>>
>>> We already have a p2pdma API for handling the struct page case, so I
>>> suggest adding some new p2pdma API to handle this for non-struct page
>>> cases.
>>>
>>> ie some thing like:
>>>
>>> int 'p2pdma map bar'(
>>>      struct pci_device *source,
>>>      unsigned int source_bar_number,
>>>      struct pci_device *dest,
>>>      physaddr&len *array_of_offsets & length pairs into source bar,
>>>      struct scatterlist *output_sgl)
>> Well that's exactly what I have to avoid since I don't have the array of
>> offsets around and want to avoid constructing it.
> Maybe it doesn't need an array of offsets - just a single offset and
> callers can iterate the API?

Yes, that would of course work as well.

But I was assuming that p2pdma_map_bar() needs some state between those 
calls.

>
>> Similar problem for dma_map_resource(). My example does this on demand, but
>> essentially we also have use cases where this is done only once.
> I'm not sure if this is portable. Does any IOMMU HW need to know P2P
> is happening to setup successfully? We currently support such a narrow
> scope of HW for P2P..

On the AMD hardware I'm testing, calling dma_map_resource() already 
seems to work with the IOMMU enabled. (Well, at least it seemed so six 
months ago when I last tested this.)

>> Ideally we would have some function to create an sgl based on some arbitrary
>> collection of offsets and length inside a BAR.
> Isn't that what I just proposed above ?

Yes, I just didn't think that this would be easily possible. I will 
double check the p2pdma code again.

Thanks,
Christian.

>
> Jason
Jason Gunthorpe March 11, 2020, 2:48 p.m. UTC | #5
On Wed, Mar 11, 2020 at 03:43:03PM +0100, Christian König wrote:
> > > > int 'p2pdma map bar'(
> > > >      struct pci_device *source,
> > > >      unsigned int source_bar_number,
> > > >      struct pci_device *dest,
> > > >      physaddr&len *array_of_offsets & length pairs into source bar,
> > > >      struct scatterlist *output_sgl)
> > > Well that's exactly what I have to avoid since I don't have the array of
> > > offsets around and want to avoid constructing it.
> > Maybe it doesn't need an array of offsets - just a single offset and
> > callers can iterate the API?
> 
> Yes, that would of course work as well.
> 
> But I was assuming that p2pdma_map_bar() needs some state between those
> calls.

It might be able to run faster if some state is held. We've had APIs
before where the caller can provide a cache for expensive state. Maybe
that would be an appropriate pattern here?

IIRC the distance calculation is the expensive bit, that would be easy
enough to cache.

Jason
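The caller-provided cache Jason describes can be sketched in plain C. `struct p2p_cache`, `expensive_distance()`, and the XOR "distance" are all fabrications for illustration; the real expensive call being modelled is the topology walk inside pci_p2pdma_distance_many().

```c
#include <stdint.h>

/* Caller-owned cache: the first lookup pays the full cost, later
 * lookups for the same device pair reuse the stored result. */
struct p2p_cache {
	int valid;
	int distance;
};

static int distance_calls;	/* instrumentation for the sketch only */

/* Stand-in for walking the PCI topology between two devices; a negative
 * result means P2P is not possible, mirroring the real API's contract. */
static int expensive_distance(int src_id, int dst_id)
{
	distance_calls++;
	return (src_id == dst_id) ? -1 : (src_id ^ dst_id);
}

static int cached_distance(struct p2p_cache *c, int src_id, int dst_id)
{
	if (!c->valid) {
		c->distance = expensive_distance(src_id, dst_id);
		c->valid = 1;
	}
	return c->distance;
}
```

With a cache like this held in, say, the dma-buf attachment, the distance computation happens once at attach time rather than on every mapping.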

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index aef12ee2f1e3..bbf67800c8a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -38,6 +38,7 @@ 
 #include <drm/amdgpu_drm.h>
 #include <linux/dma-buf.h>
 #include <linux/dma-fence-array.h>
+#include <linux/pci-p2pdma.h>
 
 /**
  * amdgpu_gem_prime_vmap - &dma_buf_ops.vmap implementation
@@ -179,6 +180,9 @@  static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
 	int r;
 
+	if (pci_p2pdma_distance_many(adev->pdev, &attach->dev, 1, true) < 0)
+		attach->peer2peer = false;
+
 	if (attach->dev->driver == adev->dev->driver)
 		return 0;