[v4,2/6] drm/nouveau: map pages using DMA API on platform devices

Message ID: 1404807961-30530-3-git-send-email-acourbot@nvidia.com
State: New, archived

Commit Message

Alexandre Courbot July 8, 2014, 8:25 a.m. UTC
page_to_phys() is not the correct way to obtain the DMA address of a
buffer on a non-PCI system. Use the DMA API functions for this, which
are portable and will allow us to use other DMA API functions for
buffer synchronization.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/drm/nouveau/core/engine/device/base.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Daniel Vetter July 10, 2014, 12:58 p.m. UTC | #1
On Tue, Jul 08, 2014 at 05:25:57PM +0900, Alexandre Courbot wrote:
> page_to_phys() is not the correct way to obtain the DMA address of a
> buffer on a non-PCI system. Use the DMA API functions for this, which
> are portable and will allow us to use other DMA API functions for
> buffer synchronization.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/drm/nouveau/core/engine/device/base.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/core/engine/device/base.c b/drivers/gpu/drm/nouveau/core/engine/device/base.c
> index 18c8c7245b73..e4e9e64988fe 100644
> --- a/drivers/gpu/drm/nouveau/core/engine/device/base.c
> +++ b/drivers/gpu/drm/nouveau/core/engine/device/base.c
> @@ -489,7 +489,10 @@ nv_device_map_page(struct nouveau_device *device, struct page *page)
>  		if (pci_dma_mapping_error(device->pdev, ret))
>  			ret = 0;
>  	} else {
> -		ret = page_to_phys(page);
> +		ret = dma_map_page(&device->platformdev->dev, page, 0,
> +				   PAGE_SIZE, DMA_BIDIRECTIONAL);
> +		if (dma_mapping_error(&device->platformdev->dev, ret))
> +			ret = 0;
>  	}
>  
>  	return ret;
> @@ -501,6 +504,9 @@ nv_device_unmap_page(struct nouveau_device *device, dma_addr_t addr)
>  	if (nv_device_is_pci(device))
>  		pci_unmap_page(device->pdev, addr, PAGE_SIZE,
>  			       PCI_DMA_BIDIRECTIONAL);

pci_map/unmap alias to dma_map/unmap when called on the underlying struct
device embedded in struct pci_dev (like for platform drivers). Dunno
whether it's worth tracking a pointer to the struct device directly and
always calling dma_map/unmap.

Just a drive-by comment since I'm interested in how you solve this - i915
has similar fun with buffer sharing on coherent and non-coherent
platforms, although we don't have the extra fun of pci vs. non-pci
platforms.
-Daniel

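For illustration, here is a minimal sketch of the refactor Daniel suggests: cache the underlying struct device once and call the generic dma_* API unconditionally. The nv_device_base() helper is hypothetical; only nv_device_is_pci() and the pdev/platformdev fields come from the actual patch context.

static struct device *
nv_device_base(struct nouveau_device *device)
{
	/* Both PCI and platform devices embed a struct device that the
	 * generic DMA API can operate on. */
	return nv_device_is_pci(device) ? &device->pdev->dev :
					  &device->platformdev->dev;
}

dma_addr_t
nv_device_map_page(struct nouveau_device *device, struct page *page)
{
	struct device *dev = nv_device_base(device);
	dma_addr_t ret;

	ret = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
	if (dma_mapping_error(dev, ret))
		ret = 0;

	return ret;
}

void
nv_device_unmap_page(struct nouveau_device *device, dma_addr_t addr)
{
	dma_unmap_page(nv_device_base(device), addr, PAGE_SIZE,
		       DMA_BIDIRECTIONAL);
}

This relies on pci_map_page() being a thin wrapper around dma_map_page() on &pdev->dev, which is what Daniel describes above.
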
Alexandre Courbot July 11, 2014, 2:35 a.m. UTC | #2
On 07/10/2014 09:58 PM, Daniel Vetter wrote:
> On Tue, Jul 08, 2014 at 05:25:57PM +0900, Alexandre Courbot wrote:
>> [...]
>
> pci_map/unmap alias to dma_map/unmap when called on the underlying struct
> device embedded in struct pci_dev (like for platform drivers). Dunno
> whether it's worth tracking a pointer to the struct device directly and
> always calling dma_map/unmap.

Isn't it (theoretically) possible to have a platform that does not use 
the DMA API for its PCI implementation and thus requires the pci_* 
functions to be called? I could not find such a case in -next, which 
suggests that all PCI platforms have been converted to the DMA API 
already and that we could indeed refactor this to always use the DMA 
functions.

But at the same time the way we use APIs should not be directed by their 
implementation, but by their intent - and unless the PCI API has been 
deprecated in some way (something I am not aware of), the rule is still 
that you should use it on a PCI device.

>
> Just a drive-by comment since I'm interested in how you solve this - i915
> has similar fun with buffer sharing on coherent and non-coherent
> platforms, although we don't have the extra fun of pci vs. non-pci
> platforms.

Yeah, I am not familiar with i915 but it seems like we are in a similar 
boat here (except that ARM is more constrained as to its memory 
mappings). The strategy in this series is to map buffers used by 
user-space cached and synchronize them explicitly (since the ownership 
transition from user to GPU is always clearly performed by syscalls), 
and to use coherent mappings for buffers used by the kernel, which are 
accessed more randomly. This has solved all our coherency issues and 
resulted in the best performance so far.
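As a rough sketch of the sync points that strategy implies (the helper names and the sg_table plumbing here are illustrative assumptions, not the actual code of this series; requires <linux/dma-mapping.h>):

/* Ownership moves CPU -> GPU at syscall time: flush dirty cache lines
 * so the GPU sees what user-space wrote. */
static void
nv_bo_sync_for_device(struct device *dev, struct sg_table *sgt)
{
	dma_sync_sg_for_device(dev, sgt->sgl, sgt->nents,
			       DMA_BIDIRECTIONAL);
}

/* Ownership moves GPU -> CPU once rendering completes: invalidate
 * stale lines so the CPU reads what the GPU produced. */
static void
nv_bo_sync_for_cpu(struct device *dev, struct sg_table *sgt)
{
	dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->nents,
			    DMA_BIDIRECTIONAL);
}

Kernel-owned buffers would instead come from dma_alloc_coherent() and need no explicit sync.
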
Ben Skeggs July 11, 2014, 2:50 a.m. UTC | #3
On Fri, Jul 11, 2014 at 12:35 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> On 07/10/2014 09:58 PM, Daniel Vetter wrote:
> [...]
>> Just a drive-by comment since I'm interested in how you solve this - i915
>> has similar fun with buffer sharing on coherent and non-coherent
>> platforms, although we don't have the extra fun of pci vs. non-pci
>> platforms.
>
> Yeah, I am not familiar with i915 but it seems like we are in a similar
> boat here (except that ARM is more constrained as to its memory mappings).
> The strategy in this series is to map buffers used by user-space cached
> and synchronize them explicitly (since the ownership transition from user
> to GPU is always clearly performed by syscalls), and to use coherent
> mappings for buffers used by the kernel, which are accessed more randomly.
> This has solved all our coherency issues and resulted in the best
> performance so far.
I wonder if we might want to use unsnooped cached mappings of pages on
non-ARM platforms also, to avoid the overhead of the cache snooping?

Alexandre Courbot July 11, 2014, 2:57 a.m. UTC | #4
On 07/11/2014 11:50 AM, Ben Skeggs wrote:
> On Fri, Jul 11, 2014 at 12:35 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> [...]
> I wonder if we might want to use unsnooped cached mappings of pages on
> non-ARM platforms also, to avoid the overhead of the cache snooping?

You might want to indeed, now that coherency is guaranteed by the sync 
functions originally introduced by Lucas. The only issue I could see is 
that they always invalidate the full buffer whereas bus snooping only 
affects pages that are actually touched. Someone would need to try this 
on a desktop machine and see how it affects performance.

I'd be all for it though, since it would also allow us to get rid of 
this ungraceful nv_device_is_cpu_coherent() function and simplify 
nouveau_bo.c a bit.
Daniel Vetter July 11, 2014, 7:38 a.m. UTC | #5
On Fri, Jul 11, 2014 at 11:35:23AM +0900, Alexandre Courbot wrote:
> On 07/10/2014 09:58 PM, Daniel Vetter wrote:
> >[...]
> >pci_map/unmap alias to dma_map/unmap when called on the underlying struct
> >device embedded in struct pci_dev (like for platform drivers). Dunno
> >whether it's worth tracking a pointer to the struct device directly and
> >always calling dma_map/unmap.
> 
> Isn't it (theoretically) possible to have a platform that does not use the
> DMA API for its PCI implementation and thus requires the pci_* functions to
> be called? I could not find such a case in -next, which suggests that all
> PCI platforms have been converted to the DMA API already and that we could
> indeed refactor this to always use the DMA functions.
> 
> But at the same time the way we use APIs should not be directed by their
> implementation, but by their intent - and unless the PCI API has been
> deprecated in some way (something I am not aware of), the rule is still that
> you should use it on a PCI device.

Hm, somehow I thought it was recommended to just use the dma api
directly. It's what we're doing in i915 at least, but now I'm not so sure
any more. My guess is that this is just history really, from when the dma
api was pci-only.
-Daniel
Lucas Stach July 11, 2014, 9:53 a.m. UTC | #6
On Friday, July 11, 2014 at 11:57 +0900, Alexandre Courbot wrote:
[...]
> >> [...]
> > I wonder if we might want to use unsnooped cached mappings of pages on
> > non-ARM platforms also, to avoid the overhead of the cache snooping?
> 
> You might want to indeed, now that coherency is guaranteed by the sync 
> functions originally introduced by Lucas. The only issue I could see is 
> that they always invalidate the full buffer whereas bus snooping only 
> affects pages that are actually touched. Someone would need to try this 
> on a desktop machine and see how it affects performance.
> 
> I'd be all for it though, since it would also allow us to get rid of 
> this ungraceful nv_device_is_cpu_coherent() function and simplify 
> nouveau_bo.c a bit.

This will need some testing to get hard numbers, but I suspect that
invalidating the whole buffer isn't too bad, as the prefetch machinery
works very well with the access patterns we see in graphics drivers.

Flushing out the whole buffer should be even less problematic, as it
will only flush out dirty lines that would need to be flushed for GPU
read snooping anyway.

In the long run we might want a separate cpu prepare/finish ioctl where
we can indicate the area of interest. This might help to avoid some of
the invalidation overhead, especially for userspace-suballocated buffers.

Regards,
Lucas
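To make that concrete, here is a hypothetical shape for such a range-based interface; neither the struct nor the ioctl exists in the nouveau UAPI, it only illustrates Lucas's suggestion:

/* Hypothetical UAPI sketch -- not an actual nouveau ioctl. */
struct drm_nouveau_gem_cpu_prep_range {
	__u32 handle;	/* GEM handle of the buffer object */
	__u32 flags;	/* e.g. read-only vs. read-write access */
	__u64 offset;	/* start of the region the CPU will touch */
	__u64 length;	/* size of that region in bytes */
};

The kernel would then only need to sync the pages covering [offset, offset + length) instead of the whole buffer, which is where user-space suballocators would save the most.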

Patch

diff --git a/drivers/gpu/drm/nouveau/core/engine/device/base.c b/drivers/gpu/drm/nouveau/core/engine/device/base.c
index 18c8c7245b73..e4e9e64988fe 100644
--- a/drivers/gpu/drm/nouveau/core/engine/device/base.c
+++ b/drivers/gpu/drm/nouveau/core/engine/device/base.c
@@ -489,7 +489,10 @@ nv_device_map_page(struct nouveau_device *device, struct page *page)
 		if (pci_dma_mapping_error(device->pdev, ret))
 			ret = 0;
 	} else {
-		ret = page_to_phys(page);
+		ret = dma_map_page(&device->platformdev->dev, page, 0,
+				   PAGE_SIZE, DMA_BIDIRECTIONAL);
+		if (dma_mapping_error(&device->platformdev->dev, ret))
+			ret = 0;
 	}
 
 	return ret;
@@ -501,6 +504,9 @@ nv_device_unmap_page(struct nouveau_device *device, dma_addr_t addr)
 	if (nv_device_is_pci(device))
 		pci_unmap_page(device->pdev, addr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
+	else
+		dma_unmap_page(&device->platformdev->dev, addr,
+			       PAGE_SIZE, DMA_BIDIRECTIONAL);
 }
 
 int