diff mbox series

[1/3] drm/amdgpu: Refuse to create a KMS FB for non-P2P exported dma-bufs

Message ID 20240222172821.16901-1-michel@daenzer.net (mailing list archive)
State New, archived
Headers show
Series [1/3] drm/amdgpu: Refuse to create a KMS FB for non-P2P exported dma-bufs | expand

Commit Message

Michel Dänzer Feb. 22, 2024, 5:28 p.m. UTC
From: Michel Dänzer <mdaenzer@redhat.com>

Pinning the BO storage to VRAM for scanout would make it inaccessible
to non-P2P dma-buf importers.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10635
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 38 ++++++++++++++++++---
 1 file changed, 33 insertions(+), 5 deletions(-)

Comments

Christian König Feb. 23, 2024, 7:06 a.m. UTC | #1
Am 22.02.24 um 18:28 schrieb Michel Dänzer:
> From: Michel Dänzer <mdaenzer@redhat.com>
>
> Pinning the BO storage to VRAM for scanout would make it inaccessible
> to non-P2P dma-buf importers.

Thinking more about it I don't think we can do this.

Using the BO in a ping/pong fashion for scanout and DMA-buf is actually 
valid, you just can't do both at the same time.

And if I'm not completely mistaken we actually have use cases for this 
at the moment, only as fallback but it would still break existing 
userspace and that is a no-go.

So rejecting things during CS and atomic commit is the best thing we can do.

Regards,
Christian.

>
> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10635
> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 38 ++++++++++++++++++---
>   1 file changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index b8fbe97efe1d3..514a5b2159815 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -1255,13 +1255,41 @@ amdgpu_display_user_framebuffer_create(struct drm_device *dev,
>   		return ERR_PTR(-ENOENT);
>   	}
>   
> -	/* Handle is imported dma-buf, so cannot be migrated to VRAM for scanout */
>   	bo = gem_to_amdgpu_bo(obj);
>   	domains = amdgpu_display_supported_domains(drm_to_adev(dev), bo->flags);
> -	if (obj->import_attach && !(domains & AMDGPU_GEM_DOMAIN_GTT)) {
> -		drm_dbg_kms(dev, "Cannot create framebuffer from imported dma_buf\n");
> -		drm_gem_object_put(obj);
> -		return ERR_PTR(-EINVAL);
> +	if (!(domains & AMDGPU_GEM_DOMAIN_GTT)) {
> +		bool can_pin = true;
> +
> +		mutex_lock(&file_priv->prime.lock);
> +
> +		/* Handle is imported dma-buf, so cannot be migrated to VRAM for scanout */
> +		if (obj->import_attach) {
> +			drm_dbg_kms(dev, "Cannot create framebuffer from imported dma_buf\n");
> +			can_pin = false;
> +		} else if (obj->dma_buf) {
> +			struct dma_buf *dmabuf = obj->dma_buf;
> +			struct dma_buf_attachment *attachment;
> +
> +			dma_resv_lock(dmabuf->resv, NULL);
> +
> +			list_for_each_entry(attachment, &dmabuf->attachments, node) {
> +				if (IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY) && attachment->peer2peer)
> +					continue;
> +
> +				drm_dbg_kms(dev, "Cannot create framebuffer from non-P2P exported dma_buf\n");
> +				can_pin = false;
> +				break;
> +			}
> +
> +			dma_resv_unlock(dmabuf->resv);
> +		}
> +
> +		mutex_unlock(&file_priv->prime.lock);
> +
> +		if (!can_pin) {
> +			drm_gem_object_put(obj);
> +			return ERR_PTR(-EINVAL);
> +		}
>   	}
>   
>   	amdgpu_fb = kzalloc(sizeof(*amdgpu_fb), GFP_KERNEL);
Michel Dänzer Feb. 23, 2024, 8:11 a.m. UTC | #2
On 2024-02-23 08:06, Christian König wrote:
> Am 22.02.24 um 18:28 schrieb Michel Dänzer:
>> From: Michel Dänzer <mdaenzer@redhat.com>
>>
>> Pinning the BO storage to VRAM for scanout would make it inaccessible
>> to non-P2P dma-buf importers.
> 
> Thinking more about it I don't think we can do this.
> 
> Using the BO in a ping/pong fashion for scanout and DMA-buf is actually valid, you just can't do both at the same time.
> 
> And if I'm not completely mistaken we actually have use cases for this at the moment,

Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?

(As discussed on the GitLab issue, AFAICT P2P could be made to work even without CONFIG_DMABUF_MOVE_NOTIFY, by pinning to VRAM instead of GTT for dma-buf sharing)


> only as fallback but it would still break existing userspace and that is a no-go.

I'm obviously aware of this general rule. There are exceptions though, and this might be one.


> So rejecting things during CS and atomic commit is the best thing we can do.

It's problematic for a Wayland compositor:

The CS ioctl failing is awkward. With GL, I'm pretty sure it means the compositor would have to destroy the context and create a new one. Not sure about Vulkan, but I suspect it's painful there as well.

Similarly for the KMS atomic commit ioctl. The compositor can't know why exactly it failed, all it gets is an error code.

And there's no other way for the compositor to detect when both things can actually work concurrently.

Together, this means the compositor always has to assume the worst case, and do workarounds such as using the scanout GPU to copy from the scanout buffer to another buffer for access from another GPU. Even when direct access from the other GPU would actually work fine.
Michel Dänzer Feb. 23, 2024, 8:41 a.m. UTC | #3
On 2024-02-23 09:11, Michel Dänzer wrote:
> On 2024-02-23 08:06, Christian König wrote:
>>
>> So rejecting things during CS and atomic commit is the best thing we can do.
> 
> It's problematic for a Wayland compositor:
> 
> The CS ioctl failing is awkward. With GL, I'm pretty sure it means the compositor would have to destroy the context and create a new one. Not sure about Vulkan, but I suspect it's painful there as well.
> 
> Similarly for the KMS atomic commit ioctl. The compositor can't know why exactly it failed, all it gets is an error code.
> 
> And there's no other way for the compositor to detect when both things can actually work concurrently.
> 
> Together, this means the compositor always has to assume the worst case, and do workarounds such as using the scanout GPU to copy from the scanout buffer to another buffer for access from another GPU. Even when direct access from the other GPU would actually work fine.

It's worse for Xwayland:

It can't know if/when the Wayland compositor uses a buffer for scanout. It would have to assume the worst case whenever a buffer is owned by the compositor.

Except Xwayland can't know which GPU a buffer is originally from. So it can't know when the workaround is actually needed, or which GPU it has to use for the workaround.
Christian König Feb. 23, 2024, 9:34 a.m. UTC | #4
Am 23.02.24 um 09:11 schrieb Michel Dänzer:
> On 2024-02-23 08:06, Christian König wrote:
>> Am 22.02.24 um 18:28 schrieb Michel Dänzer:
>>> From: Michel Dänzer <mdaenzer@redhat.com>
>>>
>>> Pinning the BO storage to VRAM for scanout would make it inaccessible
>>> to non-P2P dma-buf importers.
>> Thinking more about it I don't think we can do this.
>>
>> Using the BO in a ping/pong fashion for scanout and DMA-buf is actually valid, you just can't do both at the same time.
>>
>> And if I'm not completely mistaken we actually have use cases for this at the moment,
> Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?

Nope, we are basically talking about unit tests and examples for inter 
device operations.

Those render into a shared buffer and then display it to check if the 
content was rendered/transferred correctly.

I'm not sure if we still do those test cases, the last time I looked 
into it was before P2P was even supported, but I also can't rule it out.

> (As discussed on the GitLab issue, AFAICT P2P could be made to work even without CONFIG_DMABUF_MOVE_NOTIFY, by pinning to VRAM instead of GTT for dma-buf sharing)

Longer story but that is something intentionally not done.

>> only as fallback but it would still break existing userspace and that is a no-go.
> I'm obviously aware of this general rule. There are exceptions though, and this might be one.
>
>
>> So rejecting things during CS and atomic commit is the best thing we can do.
> It's problematic for a Wayland compositor:
>
> The CS ioctl failing is awkward. With GL, I'm pretty sure it means the compositor would have to destroy the context and create a new one. Not sure about Vulkan, but I suspect it's painful there as well.
>
> Similarly for the KMS atomic commit ioctl. The compositor can't know why exactly it failed, all it gets is an error code.

Yeah, but that is not because the kernel is doing anything wrong.

Sharing, rendering and then doing an atomic commit is a perfectly valid 
use case.

You just can't do scanout and sharing at the same time.

> And there's no other way for the compositor to detect when both things can actually work concurrently.

That I totally agree with. And IIRC we already have at least a query for 
the buffer placement. E.g. you can already check if the BO is in GTT or 
VRAM and shared.

What's missing is exposing if the device can scanout from GTT or not.

It's just that blocking a valid use case because a special combination 
doesn't work is not going to fly I think.

> Together, this means the compositor always has to assume the worst case, and do workarounds such as using the scanout GPU to copy from the scanout buffer to another buffer for access from another GPU. Even when direct access from the other GPU would actually work fine.

Yeah, I think we can avoid that. I'm just not sure how to do it in a 
driver agnostic way.

Regards,
Christian.
Michel Dänzer Feb. 23, 2024, 10:04 a.m. UTC | #5
On 2024-02-23 10:34, Christian König wrote:
> Am 23.02.24 um 09:11 schrieb Michel Dänzer:
>> On 2024-02-23 08:06, Christian König wrote:
>>> Am 22.02.24 um 18:28 schrieb Michel Dänzer:
>>>> From: Michel Dänzer <mdaenzer@redhat.com>
>>>>
>>>> Pinning the BO storage to VRAM for scanout would make it inaccessible
>>>> to non-P2P dma-buf importers.
>>> Thinking more about it I don't think we can do this.
>>>
>>> Using the BO in a ping/pong fashion for scanout and DMA-buf is actually valid, you just can't do both at the same time.
>>>
>>> And if I'm not completely mistaken we actually have use cases for this at the moment,
>> Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?
> 
> Nope, we are basically talking about unit tests and examples for inter device operations.

Sounds like the "no user-space regressions" rule might not apply then.


> Those render into a shared buffer and then display it to check if the content was rendered/transferred correctly.

That can be fixed by dropping the dma-buf attachments from other GPUs before creating the KMS FB.

Conversely, tests / examples which do scanout first can be fixed by destroying KMS FBs before sharing the BO with another GPU.


> I'm not sure if we still do those test cases, the last time I looked into it was before P2P was even supported, but I also can't rule it out.

Sounds too vague to block this series.


>>> So rejecting things during CS and atomic commit is the best thing we can do.
>> It's problematic for a Wayland compositor:
>>
>> The CS ioctl failing is awkward. With GL, I'm pretty sure it means the compositor would have to destroy the context and create a new one. Not sure about Vulkan, but I suspect it's painful there as well.
>>
>> Similarly for the KMS atomic commit ioctl. The compositor can't know why exactly it failed, all it gets is an error code.
> 
> Yeah, but that is not because the kernel is doing anything wrong.
> 
> Sharing, rendering and then doing an atomic commit is a perfectly valid use case.
> 
> You just can't do scanout and sharing at the same time.

Per my later follow-up, Xwayland can't really avoid it.


>> And there's no other way for the compositor to detect when both things can actually work concurrently.
> 
> That I totally agree with. And IIRC we already have at least a query for the buffer placement. E.g. you can already check if the BO is in GTT or VRAM and shared.
> 
> What's missing is exposing if the device can scanout from GTT or not.

Requiring Wayland compositors to have driver-specific knowledge like that baked in isn't great either.
Michel Dänzer Feb. 23, 2024, 4:43 p.m. UTC | #6
On 2024-02-23 11:04, Michel Dänzer wrote:
> On 2024-02-23 10:34, Christian König wrote:
>> Am 23.02.24 um 09:11 schrieb Michel Dänzer:
>>> On 2024-02-23 08:06, Christian König wrote:
>>>> Am 22.02.24 um 18:28 schrieb Michel Dänzer:
>>>>> From: Michel Dänzer <mdaenzer@redhat.com>
>>>>>
>>>>> Pinning the BO storage to VRAM for scanout would make it inaccessible
>>>>> to non-P2P dma-buf importers.
>>>> Thinking more about it I don't think we can do this.
>>>>
>>>> Using the BO in a ping/pong fashion for scanout and DMA-buf is actually valid, you just can't do both at the same time.
>>>>
>>>> And if I'm not completely mistaken we actually have use cases for this at the moment,
>>> Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?
>>
>> Nope, we are basically talking about unit tests and examples for inter device operations.
> 
> Sounds like the "no user-space regressions" rule might not apply then.

To clarify what I mean by that:

"We can't fix this issue, because it would break some unit tests and examples" is similar to saying "We can't fix this KMS bug, because there are IGT tests expecting the buggy behaviour". In practice, the latter do get fixed, along with the IGT tests.
Christian König Feb. 26, 2024, 3:58 p.m. UTC | #7
Am 23.02.24 um 17:43 schrieb Michel Dänzer:
> On 2024-02-23 11:04, Michel Dänzer wrote:
>> On 2024-02-23 10:34, Christian König wrote:
>>> Am 23.02.24 um 09:11 schrieb Michel Dänzer:
>>>> On 2024-02-23 08:06, Christian König wrote:
>>>>> Am 22.02.24 um 18:28 schrieb Michel Dänzer:
>>>>>> From: Michel Dänzer <mdaenzer@redhat.com>
>>>>>>
>>>>>> Pinning the BO storage to VRAM for scanout would make it inaccessible
>>>>>> to non-P2P dma-buf importers.
>>>>> Thinking more about it I don't think we can do this.
>>>>>
>>>>> Using the BO in a ping/pong fashion for scanout and DMA-buf is actually valid, you just can't do both at the same time.
>>>>>
>>>>> And if I'm not completely mistaken we actually have use cases for this at the moment,
>>>> Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?
>>> Nope, we are basically talking about unit tests and examples for inter device operations.
>> Sounds like the "no user-space regressions" rule might not apply then.
> To clarify what I mean by that:
>
> "We can't fix this issue, because it would break some unit tests and examples" is similar to saying "We can't fix this KMS bug, because there are IGT tests expecting the buggy behaviour". In practice, the latter do get fixed, along with the IGT tests.

The problem here is that this is not a bug, but intentional behavior. 
Exporting BOs and using them in scanout in a ping/pong fashion is 
perfectly valid.

We have use cases which depend on this behavior and I'm not going to 
break those to fix your use case.

So as far as I can see this approach here is a no-go.

Regards,
Christian.
Michel Dänzer Feb. 26, 2024, 4:27 p.m. UTC | #8
On 2024-02-26 16:58, Christian König wrote:
> Am 23.02.24 um 17:43 schrieb Michel Dänzer:
>> On 2024-02-23 11:04, Michel Dänzer wrote:
>>> On 2024-02-23 10:34, Christian König wrote:
>>>> Am 23.02.24 um 09:11 schrieb Michel Dänzer:
>>>>> On 2024-02-23 08:06, Christian König wrote:
>>>>>> Am 22.02.24 um 18:28 schrieb Michel Dänzer:
>>>>>>> From: Michel Dänzer <mdaenzer@redhat.com>
>>>>>>>
>>>>>>> Pinning the BO storage to VRAM for scanout would make it inaccessible
>>>>>>> to non-P2P dma-buf importers.
>>>>>> Thinking more about it I don't think we can do this.
>>>>>>
>>>>>> Using the BO in a ping/pong fashion for scanout and DMA-buf is actually valid, you just can't do both at the same time.
>>>>>>
>>>>>> And if I'm not completely mistaken we actually have use cases for this at the moment,
>>>>> Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?
>>>> Nope, we are basically talking about unit tests and examples for inter device operations.
>>> Sounds like the "no user-space regressions" rule might not apply then.
>> To clarify what I mean by that:
>>
>> "We can't fix this issue, because it would break some unit tests and examples" is similar to saying "We can't fix this KMS bug, because there are IGT tests expecting the buggy behaviour". In practice, the latter do get fixed, along with the IGT tests.
> 
> The problem here is that this is not a bug, but intentional behavior. Exporting BOs and using them in scanout in a ping/pong fashion is perfectly valid.

The problem with the status quo is that it requires amdgpu-specific knowledge and worst-case pessimization in user space.


> We have use cases which depend on this behavior and I'm not going to break those to fix your use case.

It's not "my" use case, it's a Wayland compositor trying to make use of BO sharing and scanout without always pessimizing for the worst case.

That's surely more of a real-world use case than unit tests and examples.
Christian König Feb. 26, 2024, 4:34 p.m. UTC | #9
Am 26.02.24 um 17:27 schrieb Michel Dänzer:
> On 2024-02-26 16:58, Christian König wrote:
>> Am 23.02.24 um 17:43 schrieb Michel Dänzer:
>>> On 2024-02-23 11:04, Michel Dänzer wrote:
>>>> On 2024-02-23 10:34, Christian König wrote:
>>>>> Am 23.02.24 um 09:11 schrieb Michel Dänzer:
>>>>>> On 2024-02-23 08:06, Christian König wrote:
>>>>>>> Am 22.02.24 um 18:28 schrieb Michel Dänzer:
>>>>>>>> From: Michel Dänzer<mdaenzer@redhat.com>
>>>>>>>>
>>>>>>>> Pinning the BO storage to VRAM for scanout would make it inaccessible
>>>>>>>> to non-P2P dma-buf importers.
>>>>>>> Thinking more about it I don't think we can do this.
>>>>>>>
>>>>>>> Using the BO in a ping/pong fashion for scanout and DMA-buf is actually valid, you just can't do both at the same time.
>>>>>>>
>>>>>>> And if I'm not completely mistaken we actually have use cases for this at the moment,
>>>>>> Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?
>>>>> Nope, we are basically talking about unit tests and examples for inter device operations.
>>>> Sounds like the "no user-space regressions" rule might not apply then.
>>> To clarify what I mean by that:
>>>
>>> "We can't fix this issue, because it would break some unit tests and examples" is similar to saying "We can't fix this KMS bug, because there are IGT tests expecting the buggy behaviour". In practice, the latter do get fixed, along with the IGT tests.
>> The problem here is that this is not a bug, but intentional behavior. Exporting BOs and using them in scanout in a ping/pong fashion is perfectly valid.
> The problem with the status quo is that it requires amdgpu-specific knowledge and worst-case pessimization in user space.

Yeah, I'm perfectly aware of that. But that approach here really doesn't 
seems to be valid.

>> We have use cases which depend on this behavior and I'm not going to break those to fix your use case.
> It's not "my" use case, it's a Wayland compositor trying to make use of BO sharing and scanout without always pessimizing for the worst case.
>
> That's surely more of a real-world use case than unit tests and examples.

I've looked a bit deeper into this and we have told customers for the 
last ~10 years or so that this is how it is supposed to work and that 
they can use this approach.

So this is much more than unit tests and examples, we are changing 
existing behavior which is clearly not a bug but intentional and have a 
very high chance to break valid use cases.

My question is why has it worked so far? I mean we are not doing this 
since yesterday and the problem only shows up now?

While I see the use case something is still missing in that picture.

Christian.
Michel Dänzer Feb. 26, 2024, 4:46 p.m. UTC | #10
On 2024-02-26 17:34, Christian König wrote:
> Am 26.02.24 um 17:27 schrieb Michel Dänzer:
>> On 2024-02-26 16:58, Christian König wrote:
>>> Am 23.02.24 um 17:43 schrieb Michel Dänzer:
>>>> On 2024-02-23 11:04, Michel Dänzer wrote:
>>>>> On 2024-02-23 10:34, Christian König wrote:
>>>>>> Am 23.02.24 um 09:11 schrieb Michel Dänzer:
>>>>>>> On 2024-02-23 08:06, Christian König wrote:
>>>>>>>> Am 22.02.24 um 18:28 schrieb Michel Dänzer:
>>>>>>>>> From: Michel Dänzer <mdaenzer@redhat.com>
>>>>>>>>>
>>>>>>>>> Pinning the BO storage to VRAM for scanout would make it inaccessible
>>>>>>>>> to non-P2P dma-buf importers.
>>>>>>>> Thinking more about it I don't think we can do this.
>>>>>>>>
>>>>>>>> Using the BO in a ping/pong fashion for scanout and DMA-buf is actually valid, you just can't do both at the same time.
>>>>>>>>
>>>>>>>> And if I'm not completely mistaken we actually have use cases for this at the moment,
>>>>>>> Those use cases don't have P2P & CONFIG_DMABUF_MOVE_NOTIFY?
>>>>>> Nope, we are basically talking about unit tests and examples for inter device operations.
>>>>> Sounds like the "no user-space regressions" rule might not apply then.
>>>> To clarify what I mean by that:
>>>>
>>>> "We can't fix this issue, because it would break some unit tests and examples" is similar to saying "We can't fix this KMS bug, because there are IGT tests expecting the buggy behaviour". In practice, the latter do get fixed, along with the IGT tests.
>>> The problem here is that this is not a bug, but intentional behavior. Exporting BOs and using them in scanout in a ping/pong fashion is perfectly valid.
>> The problem with the status quo is that it requires amdgpu-specific knowledge and worst-case pessimization in user space.
> 
> Yeah, I'm perfectly aware of that. But that approach here really doesn't seems to be valid.

It's the only known sane approach at this point. There's no other proposal which allows using both BO sharing and scanout without pessimizing or hitting seemingly random CS ioctl failures.


>>> We have use cases which depend on this behavior and I'm not going to break those to fix your use case.
>> It's not "my" use case, it's a Wayland compositor trying to make use of BO sharing and scanout without always pessimizing for the worst case.
>>
>> That's surely more of a real-world use case than unit tests and examples.
> 
> I've looked a bit deeper into this and we have told customers for the last ~10 years or so that this is how it is supposed to work and that they can use this approach.
> 
> So this is much more than unit tests and examples, we are changing existing behavior which is clearly not a bug but intentional and have a very high chance to break valid use cases.

"Very high chance" sounds like you still don't know of any actual real-world use case relying on it though?


> My question is why has it worked so far? I mean we are not doing this since yesterday and the problem only shows up now?

Yes, Wayland compositors are only starting to try and make use of this now.
Michel Dänzer Feb. 26, 2024, 4:50 p.m. UTC | #11
On 2024-02-26 17:46, Michel Dänzer wrote:
> On 2024-02-26 17:34, Christian König wrote:
> 
>> My question is why has it worked so far? I mean we are not doing this since yesterday and the problem only shows up now?
> 
> Yes, Wayland compositors are only starting to try and make use of this now.

To expand on this, mutter will want to do something like this as well sooner or later. I suspect it's the same for others like kwin, sway etc.
Christian König Feb. 26, 2024, 4:53 p.m. UTC | #12
Am 26.02.24 um 17:50 schrieb Michel Dänzer:
> On 2024-02-26 17:46, Michel Dänzer wrote:
>> On 2024-02-26 17:34, Christian König wrote:
>>
>>> My question is why has it worked so far? I mean we are not doing this since yesterday and the problem only shows up now?
>> Yes, Wayland compositors are only starting to try and make use of this now.
> To expand on this, mutter will want to do something like this as well sooner or later. I suspect it's the same for others like kwin, sway etc.

Yeah, but we have done similar things with X decades before. E.g. 
basically the client sends a BO to the server for displaying it.

Why we suddenly have to juggle with the fact that it is DMA-buf shared 
with another device? This has worked for at least a decade before.

Regards,
Christian.
Michel Dänzer Feb. 26, 2024, 6:01 p.m. UTC | #13
On 2024-02-26 17:53, Christian König wrote:
> Am 26.02.24 um 17:50 schrieb Michel Dänzer:
>> On 2024-02-26 17:46, Michel Dänzer wrote:
>>> On 2024-02-26 17:34, Christian König wrote:
>>>
>>>> My question is why has it worked so far? I mean we are not doing this since yesterday and the problem only shows up now?
>>> Yes, Wayland compositors are only starting to try and make use of this now.
>> To expand on this, mutter will want to do something like this as well sooner or later. I suspect it's the same for others like kwin, sway etc.
> 
> Yeah, but we have done similar things with X decades before. E.g. basically the client sends a BO to the server for displaying it.

The scenario in https://gitlab.freedesktop.org/mesa/mesa/-/issues/10635 is direct scanout of a client buffer on a secondary dGPU, while the compositor uses the APU as the primary compositing GPU.

AFAIR Xorg has never supported direct scanout of client buffers in this scenario. Secondary GPU scanout is always done from a separate local buffer allocated by Xorg / the driver.

This is Wayland compositors pushing the envelope.


> Why we suddenly have to juggle with the fact that it is DMA-buf shared with another device?

The problematic case is if the Wayland compositor has to produce a composited frame (e.g. due to a notification or other window popping up over the fullscreen window), but the client hasn't attached a new buffer to the fullscreen surface, so the compositor has to use the contents of the same client buffer which is still being scanned out by the dGPU for compositing with the APU.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index b8fbe97efe1d3..514a5b2159815 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -1255,13 +1255,41 @@  amdgpu_display_user_framebuffer_create(struct drm_device *dev,
 		return ERR_PTR(-ENOENT);
 	}
 
-	/* Handle is imported dma-buf, so cannot be migrated to VRAM for scanout */
 	bo = gem_to_amdgpu_bo(obj);
 	domains = amdgpu_display_supported_domains(drm_to_adev(dev), bo->flags);
-	if (obj->import_attach && !(domains & AMDGPU_GEM_DOMAIN_GTT)) {
-		drm_dbg_kms(dev, "Cannot create framebuffer from imported dma_buf\n");
-		drm_gem_object_put(obj);
-		return ERR_PTR(-EINVAL);
+	if (!(domains & AMDGPU_GEM_DOMAIN_GTT)) {
+		bool can_pin = true;
+
+		mutex_lock(&file_priv->prime.lock);
+
+		/* Handle is imported dma-buf, so cannot be migrated to VRAM for scanout */
+		if (obj->import_attach) {
+			drm_dbg_kms(dev, "Cannot create framebuffer from imported dma_buf\n");
+			can_pin = false;
+		} else if (obj->dma_buf) {
+			struct dma_buf *dmabuf = obj->dma_buf;
+			struct dma_buf_attachment *attachment;
+
+			dma_resv_lock(dmabuf->resv, NULL);
+
+			list_for_each_entry(attachment, &dmabuf->attachments, node) {
+				if (IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY) && attachment->peer2peer)
+					continue;
+
+				drm_dbg_kms(dev, "Cannot create framebuffer from non-P2P exported dma_buf\n");
+				can_pin = false;
+				break;
+			}
+
+			dma_resv_unlock(dmabuf->resv);
+		}
+
+		mutex_unlock(&file_priv->prime.lock);
+
+		if (!can_pin) {
+			drm_gem_object_put(obj);
+			return ERR_PTR(-EINVAL);
+		}
 	}
 
 	amdgpu_fb = kzalloc(sizeof(*amdgpu_fb), GFP_KERNEL);