diff mbox

[2/3] drm/scheduler: Don't call wait_event_killable for signaled process.

Message ID c1d31aae-6687-0974-019d-a9be02b9327e@amd.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andrey Grodzovsky May 22, 2018, 3:49 p.m. UTC
On 05/18/2018 04:46 AM, Michel Dänzer wrote:
> On 2018-05-17 09:05 PM, Andrey Grodzovsky wrote:
>> On 05/17/2018 10:48 AM, Michel Dänzer wrote:
>>> On 2018-05-17 01:18 PM, Andrey Grodzovsky wrote:
>>>> Hi Michele and others, I am trying to implement the approach bellow to
>>>> resolve AMDGPU's hang when commands are stuck in pipe during process
>>>> exit.
>>>>
>>>> I noticed that once I implemented the file_operation.flush callback
>>>> then during run of X, i see the flush callback gets called not only for
>>>> Xorg process but for other
>>>>
>>>> processes such as 'xkbcomp' and even 'sh', it seems like Xorg passes his
>>>> FDs to children, Christian mentioned he remembered a discussion to
>>>> always set FD_CLOEXEC flag when opening the hardware device file, so
>>>>
>>>> we suspect a bug in Xorg with regard to this behavior.
>>> Try the libdrm patch below.
>>>
>>> Note that the X server passes DRM file descriptors to DRI3 clients.
>> Tried it, didn't help. I still see other processes calling .flush for
>> /dev/dri/card0
> Try the attached xserver patch on top. With these patches, I no longer
> see any DRM file descriptors being opened without O_CLOEXEC running Xorg
> -pogo in strace.

Thanks for the patch, unfortunately this is my first time  building xorg
form source and I hit some blocks with dependencies. I wonder if you
could quickly apply to amdgpu the attached small patch and run xinit from
command line. In case the FD is not passed any more you will only see
Xorg print in dmeg afterwards, otherwise 'sh' and 'xkbcomp' will also 
get printed.

Andrey

>
> Anyway, the kernel can't rely on userspace using O_CLOEXEC. If the flush
> callback being called from multiple processes is an issue, maybe the
> flush callback isn't appropriate after all.
>
>

Comments

Michel Dänzer May 22, 2018, 4:09 p.m. UTC | #1
On 2018-05-22 05:49 PM, Andrey Grodzovsky wrote:
> On 05/18/2018 04:46 AM, Michel Dänzer wrote:
>> On 2018-05-17 09:05 PM, Andrey Grodzovsky wrote:
>>> On 05/17/2018 10:48 AM, Michel Dänzer wrote:
>>>> On 2018-05-17 01:18 PM, Andrey Grodzovsky wrote:
>>>>> Hi Michele and others, I am trying to implement the approach bellow to
>>>>> resolve AMDGPU's hang when commands are stuck in pipe during process
>>>>> exit.
>>>>>
>>>>> I noticed that once I implemented the file_operation.flush callback
>>>>> then during run of X, i see the flush callback gets called not only
>>>>> for
>>>>> Xorg process but for other
>>>>>
>>>>> processes such as 'xkbcomp' and even 'sh', it seems like Xorg
>>>>> passes his
>>>>> FDs to children, Christian mentioned he remembered a discussion to
>>>>> always set FD_CLOEXEC flag when opening the hardware device file, so
>>>>>
>>>>> we suspect a bug in Xorg with regard to this behavior.
>>>> Try the libdrm patch below.
>>>>
>>>> Note that the X server passes DRM file descriptors to DRI3 clients.
>>> Tried it, didn't help. I still see other processes calling .flush for
>>> /dev/dri/card0
>> Try the attached xserver patch on top. With these patches, I no longer
>> see any DRM file descriptors being opened without O_CLOEXEC running Xorg
>> -pogo in strace.
> 
> Thanks for the patch, unfortunately this is my first time  building xorg
> form source and I hit some blocks with dependencies. I wonder if you
> could quickly apply to amdgpu the attached small patch and run xinit from
> command line. In case the FD is not passed any more you will only see
> Xorg print in dmeg afterwards, otherwise 'sh' and 'xkbcomp' will also
> get printed.

As I said above, with these patches I don't see any DRM file descriptors
being opened without O_CLOEXEC, so they aren't getting passed to sh or
xkbcomp.
Andrey Grodzovsky May 22, 2018, 4:30 p.m. UTC | #2
On 05/22/2018 12:09 PM, Michel Dänzer wrote:
> On 2018-05-22 05:49 PM, Andrey Grodzovsky wrote:
>> On 05/18/2018 04:46 AM, Michel Dänzer wrote:
>>> On 2018-05-17 09:05 PM, Andrey Grodzovsky wrote:
>>>> On 05/17/2018 10:48 AM, Michel Dänzer wrote:
>>>>> On 2018-05-17 01:18 PM, Andrey Grodzovsky wrote:
>>>>>> Hi Michele and others, I am trying to implement the approach bellow to
>>>>>> resolve AMDGPU's hang when commands are stuck in pipe during process
>>>>>> exit.
>>>>>>
>>>>>> I noticed that once I implemented the file_operation.flush callback
>>>>>> then during run of X, i see the flush callback gets called not only
>>>>>> for
>>>>>> Xorg process but for other
>>>>>>
>>>>>> processes such as 'xkbcomp' and even 'sh', it seems like Xorg
>>>>>> passes his
>>>>>> FDs to children, Christian mentioned he remembered a discussion to
>>>>>> always set FD_CLOEXEC flag when opening the hardware device file, so
>>>>>>
>>>>>> we suspect a bug in Xorg with regard to this behavior.
>>>>> Try the libdrm patch below.
>>>>>
>>>>> Note that the X server passes DRM file descriptors to DRI3 clients.
>>>> Tried it, didn't help. I still see other processes calling .flush for
>>>> /dev/dri/card0
>>> Try the attached xserver patch on top. With these patches, I no longer
>>> see any DRM file descriptors being opened without O_CLOEXEC running Xorg
>>> -pogo in strace.
>> Thanks for the patch, unfortunately this is my first time  building xorg
>> form source and I hit some blocks with dependencies. I wonder if you
>> could quickly apply to amdgpu the attached small patch and run xinit from
>> command line. In case the FD is not passed any more you will only see
>> Xorg print in dmeg afterwards, otherwise 'sh' and 'xkbcomp' will also
>> get printed.
> As I said above, with these patches I don't see any DRM file descriptors
> being opened without O_CLOEXEC, so they aren't getting passed to sh or
> xkbcomp.

OK then, please put me on CC when you send this patch for review.

Andrey

>
>
Michel Dänzer May 22, 2018, 4:33 p.m. UTC | #3
On 2018-05-22 06:30 PM, Andrey Grodzovsky wrote:
> On 05/22/2018 12:09 PM, Michel Dänzer wrote:
>> On 2018-05-22 05:49 PM, Andrey Grodzovsky wrote:
>>> On 05/18/2018 04:46 AM, Michel Dänzer wrote:
>>>> On 2018-05-17 09:05 PM, Andrey Grodzovsky wrote:
>>>>> On 05/17/2018 10:48 AM, Michel Dänzer wrote:
>>>>>> On 2018-05-17 01:18 PM, Andrey Grodzovsky wrote:
>>>>>>> Hi Michele and others, I am trying to implement the approach
>>>>>>> bellow to
>>>>>>> resolve AMDGPU's hang when commands are stuck in pipe during process
>>>>>>> exit.
>>>>>>>
>>>>>>> I noticed that once I implemented the file_operation.flush callback
>>>>>>> then during run of X, i see the flush callback gets called not only
>>>>>>> for
>>>>>>> Xorg process but for other
>>>>>>>
>>>>>>> processes such as 'xkbcomp' and even 'sh', it seems like Xorg
>>>>>>> passes his
>>>>>>> FDs to children, Christian mentioned he remembered a discussion to
>>>>>>> always set FD_CLOEXEC flag when opening the hardware device file, so
>>>>>>>
>>>>>>> we suspect a bug in Xorg with regard to this behavior.
>>>>>> Try the libdrm patch below.
>>>>>>
>>>>>> Note that the X server passes DRM file descriptors to DRI3 clients.
>>>>> Tried it, didn't help. I still see other processes calling .flush for
>>>>> /dev/dri/card0
>>>> Try the attached xserver patch on top. With these patches, I no longer
>>>> see any DRM file descriptors being opened without O_CLOEXEC running
>>>> Xorg
>>>> -pogo in strace.
>>> Thanks for the patch, unfortunately this is my first time  building xorg
>>> form source and I hit some blocks with dependencies. I wonder if you
>>> could quickly apply to amdgpu the attached small patch and run xinit
>>> from
>>> command line. In case the FD is not passed any more you will only see
>>> Xorg print in dmeg afterwards, otherwise 'sh' and 'xkbcomp' will also
>>> get printed.
>> As I said above, with these patches I don't see any DRM file descriptors
>> being opened without O_CLOEXEC, so they aren't getting passed to sh or
>> xkbcomp.
> 
> OK then, please put me on CC when you send this patch for review.

Both patches have already landed on the respective Git master branches.
Andrey Grodzovsky May 22, 2018, 4:37 p.m. UTC | #4
On 05/22/2018 12:33 PM, Michel Dänzer wrote:
> On 2018-05-22 06:30 PM, Andrey Grodzovsky wrote:
>> On 05/22/2018 12:09 PM, Michel Dänzer wrote:
>>> On 2018-05-22 05:49 PM, Andrey Grodzovsky wrote:
>>>> On 05/18/2018 04:46 AM, Michel Dänzer wrote:
>>>>> On 2018-05-17 09:05 PM, Andrey Grodzovsky wrote:
>>>>>> On 05/17/2018 10:48 AM, Michel Dänzer wrote:
>>>>>>> On 2018-05-17 01:18 PM, Andrey Grodzovsky wrote:
>>>>>>>> Hi Michele and others, I am trying to implement the approach
>>>>>>>> bellow to
>>>>>>>> resolve AMDGPU's hang when commands are stuck in pipe during process
>>>>>>>> exit.
>>>>>>>>
>>>>>>>> I noticed that once I implemented the file_operation.flush callback
>>>>>>>> then during run of X, i see the flush callback gets called not only
>>>>>>>> for
>>>>>>>> Xorg process but for other
>>>>>>>>
>>>>>>>> processes such as 'xkbcomp' and even 'sh', it seems like Xorg
>>>>>>>> passes his
>>>>>>>> FDs to children, Christian mentioned he remembered a discussion to
>>>>>>>> always set FD_CLOEXEC flag when opening the hardware device file, so
>>>>>>>>
>>>>>>>> we suspect a bug in Xorg with regard to this behavior.
>>>>>>> Try the libdrm patch below.
>>>>>>>
>>>>>>> Note that the X server passes DRM file descriptors to DRI3 clients.
>>>>>> Tried it, didn't help. I still see other processes calling .flush for
>>>>>> /dev/dri/card0
>>>>> Try the attached xserver patch on top. With these patches, I no longer
>>>>> see any DRM file descriptors being opened without O_CLOEXEC running
>>>>> Xorg
>>>>> -pogo in strace.
>>>> Thanks for the patch, unfortunately this is my first time  building xorg
>>>> form source and I hit some blocks with dependencies. I wonder if you
>>>> could quickly apply to amdgpu the attached small patch and run xinit
>>>> from
>>>> command line. In case the FD is not passed any more you will only see
>>>> Xorg print in dmeg afterwards, otherwise 'sh' and 'xkbcomp' will also
>>>> get printed.
>>> As I said above, with these patches I don't see any DRM file descriptors
>>> being opened without O_CLOEXEC, so they aren't getting passed to sh or
>>> xkbcomp.
>> OK then, please put me on CC when you send this patch for review.
> Both patches have already landed on the respective Git master branches.

Good to know.

Andrey

>
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index b0bf2f2..1f63712 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -855,9 +855,18 @@  static const struct dev_pm_ops amdgpu_pm_ops = {
 	.runtime_idle = amdgpu_pmops_runtime_idle,
 };
 
+static int amdgpu_flush(struct file *f, fl_owner_t id) {
+
+	DRM_ERROR("%s\n", current->comm);
+
+       return 0;
+}
+
+
 static const struct file_operations amdgpu_driver_kms_fops = {
 	.owner = THIS_MODULE,
 	.open = drm_open,
+	.flush = amdgpu_flush,
 	.release = drm_release,
 	.unlocked_ioctl = amdgpu_drm_ioctl,
 	.mmap = amdgpu_mmap,