Message ID | c1d31aae-6687-0974-019d-a9be02b9327e@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 2018-05-22 05:49 PM, Andrey Grodzovsky wrote: > On 05/18/2018 04:46 AM, Michel Dänzer wrote: >> On 2018-05-17 09:05 PM, Andrey Grodzovsky wrote: >>> On 05/17/2018 10:48 AM, Michel Dänzer wrote: >>>> On 2018-05-17 01:18 PM, Andrey Grodzovsky wrote: >>>>> Hi Michele and others, I am trying to implement the approach bellow to >>>>> resolve AMDGPU's hang when commands are stuck in pipe during process >>>>> exit. >>>>> >>>>> I noticed that once I implemented the file_operation.flush callback >>>>> then during run of X, i see the flush callback gets called not only >>>>> for >>>>> Xorg process but for other >>>>> >>>>> processes such as 'xkbcomp' and even 'sh', it seems like Xorg >>>>> passes his >>>>> FDs to children, Christian mentioned he remembered a discussion to >>>>> always set FD_CLOEXEC flag when opening the hardware device file, so >>>>> >>>>> we suspect a bug in Xorg with regard to this behavior. >>>> Try the libdrm patch below. >>>> >>>> Note that the X server passes DRM file descriptors to DRI3 clients. >>> Tried it, didn't help. I still see other processes calling .flush for >>> /dev/dri/card0 >> Try the attached xserver patch on top. With these patches, I no longer >> see any DRM file descriptors being opened without O_CLOEXEC running Xorg >> -pogo in strace. > > Thanks for the patch, unfortunately this is my first time building xorg > form source and I hit some blocks with dependencies. I wonder if you > could quickly apply to amdgpu the attached small patch and run xinit from > command line. In case the FD is not passed any more you will only see > Xorg print in dmeg afterwards, otherwise 'sh' and 'xkbcomp' will also > get printed. As I said above, with these patches I don't see any DRM file descriptors being opened without O_CLOEXEC, so they aren't getting passed to sh or xkbcomp.
On 05/22/2018 12:09 PM, Michel Dänzer wrote: > On 2018-05-22 05:49 PM, Andrey Grodzovsky wrote: >> On 05/18/2018 04:46 AM, Michel Dänzer wrote: >>> On 2018-05-17 09:05 PM, Andrey Grodzovsky wrote: >>>> On 05/17/2018 10:48 AM, Michel Dänzer wrote: >>>>> On 2018-05-17 01:18 PM, Andrey Grodzovsky wrote: >>>>>> Hi Michele and others, I am trying to implement the approach bellow to >>>>>> resolve AMDGPU's hang when commands are stuck in pipe during process >>>>>> exit. >>>>>> >>>>>> I noticed that once I implemented the file_operation.flush callback >>>>>> then during run of X, i see the flush callback gets called not only >>>>>> for >>>>>> Xorg process but for other >>>>>> >>>>>> processes such as 'xkbcomp' and even 'sh', it seems like Xorg >>>>>> passes his >>>>>> FDs to children, Christian mentioned he remembered a discussion to >>>>>> always set FD_CLOEXEC flag when opening the hardware device file, so >>>>>> >>>>>> we suspect a bug in Xorg with regard to this behavior. >>>>> Try the libdrm patch below. >>>>> >>>>> Note that the X server passes DRM file descriptors to DRI3 clients. >>>> Tried it, didn't help. I still see other processes calling .flush for >>>> /dev/dri/card0 >>> Try the attached xserver patch on top. With these patches, I no longer >>> see any DRM file descriptors being opened without O_CLOEXEC running Xorg >>> -pogo in strace. >> Thanks for the patch, unfortunately this is my first time building xorg >> form source and I hit some blocks with dependencies. I wonder if you >> could quickly apply to amdgpu the attached small patch and run xinit from >> command line. In case the FD is not passed any more you will only see >> Xorg print in dmeg afterwards, otherwise 'sh' and 'xkbcomp' will also >> get printed. > As I said above, with these patches I don't see any DRM file descriptors > being opened without O_CLOEXEC, so they aren't getting passed to sh or > xkbcomp. OK then, please put me on CC when you send this patch for review. Andrey > >
On 2018-05-22 06:30 PM, Andrey Grodzovsky wrote: > On 05/22/2018 12:09 PM, Michel Dänzer wrote: >> On 2018-05-22 05:49 PM, Andrey Grodzovsky wrote: >>> On 05/18/2018 04:46 AM, Michel Dänzer wrote: >>>> On 2018-05-17 09:05 PM, Andrey Grodzovsky wrote: >>>>> On 05/17/2018 10:48 AM, Michel Dänzer wrote: >>>>>> On 2018-05-17 01:18 PM, Andrey Grodzovsky wrote: >>>>>>> Hi Michele and others, I am trying to implement the approach >>>>>>> bellow to >>>>>>> resolve AMDGPU's hang when commands are stuck in pipe during process >>>>>>> exit. >>>>>>> >>>>>>> I noticed that once I implemented the file_operation.flush callback >>>>>>> then during run of X, i see the flush callback gets called not only >>>>>>> for >>>>>>> Xorg process but for other >>>>>>> >>>>>>> processes such as 'xkbcomp' and even 'sh', it seems like Xorg >>>>>>> passes his >>>>>>> FDs to children, Christian mentioned he remembered a discussion to >>>>>>> always set FD_CLOEXEC flag when opening the hardware device file, so >>>>>>> >>>>>>> we suspect a bug in Xorg with regard to this behavior. >>>>>> Try the libdrm patch below. >>>>>> >>>>>> Note that the X server passes DRM file descriptors to DRI3 clients. >>>>> Tried it, didn't help. I still see other processes calling .flush for >>>>> /dev/dri/card0 >>>> Try the attached xserver patch on top. With these patches, I no longer >>>> see any DRM file descriptors being opened without O_CLOEXEC running >>>> Xorg >>>> -pogo in strace. >>> Thanks for the patch, unfortunately this is my first time building xorg >>> form source and I hit some blocks with dependencies. I wonder if you >>> could quickly apply to amdgpu the attached small patch and run xinit >>> from >>> command line. In case the FD is not passed any more you will only see >>> Xorg print in dmeg afterwards, otherwise 'sh' and 'xkbcomp' will also >>> get printed. >> As I said above, with these patches I don't see any DRM file descriptors >> being opened without O_CLOEXEC, so they aren't getting passed to sh or >> xkbcomp. > > OK then, please put me on CC when you send this patch for review. Both patches have already landed on the respective Git master branches.
On 05/22/2018 12:33 PM, Michel Dänzer wrote: > On 2018-05-22 06:30 PM, Andrey Grodzovsky wrote: >> On 05/22/2018 12:09 PM, Michel Dänzer wrote: >>> On 2018-05-22 05:49 PM, Andrey Grodzovsky wrote: >>>> On 05/18/2018 04:46 AM, Michel Dänzer wrote: >>>>> On 2018-05-17 09:05 PM, Andrey Grodzovsky wrote: >>>>>> On 05/17/2018 10:48 AM, Michel Dänzer wrote: >>>>>>> On 2018-05-17 01:18 PM, Andrey Grodzovsky wrote: >>>>>>>> Hi Michele and others, I am trying to implement the approach >>>>>>>> bellow to >>>>>>>> resolve AMDGPU's hang when commands are stuck in pipe during process >>>>>>>> exit. >>>>>>>> >>>>>>>> I noticed that once I implemented the file_operation.flush callback >>>>>>>> then during run of X, i see the flush callback gets called not only >>>>>>>> for >>>>>>>> Xorg process but for other >>>>>>>> >>>>>>>> processes such as 'xkbcomp' and even 'sh', it seems like Xorg >>>>>>>> passes his >>>>>>>> FDs to children, Christian mentioned he remembered a discussion to >>>>>>>> always set FD_CLOEXEC flag when opening the hardware device file, so >>>>>>>> >>>>>>>> we suspect a bug in Xorg with regard to this behavior. >>>>>>> Try the libdrm patch below. >>>>>>> >>>>>>> Note that the X server passes DRM file descriptors to DRI3 clients. >>>>>> Tried it, didn't help. I still see other processes calling .flush for >>>>>> /dev/dri/card0 >>>>> Try the attached xserver patch on top. With these patches, I no longer >>>>> see any DRM file descriptors being opened without O_CLOEXEC running >>>>> Xorg >>>>> -pogo in strace. >>>> Thanks for the patch, unfortunately this is my first time building xorg >>>> form source and I hit some blocks with dependencies. I wonder if you >>>> could quickly apply to amdgpu the attached small patch and run xinit >>>> from >>>> command line. In case the FD is not passed any more you will only see >>>> Xorg print in dmeg afterwards, otherwise 'sh' and 'xkbcomp' will also >>>> get printed. >>> As I said above, with these patches I don't see any DRM file descriptors >>> being opened without O_CLOEXEC, so they aren't getting passed to sh or >>> xkbcomp. >> OK then, please put me on CC when you send this patch for review. > Both patches have already landed on the respective Git master branches. Good to know. Andrey > >
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index b0bf2f2..1f63712 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -855,9 +855,18 @@ static const struct dev_pm_ops amdgpu_pm_ops = { .runtime_idle = amdgpu_pmops_runtime_idle, }; +static int amdgpu_flush(struct file *f, fl_owner_t id) { + + DRM_ERROR("%s\n", current->comm); + + return 0; +} + + static const struct file_operations amdgpu_driver_kms_fops = { .owner = THIS_MODULE, .open = drm_open, + .flush = amdgpu_flush, .release = drm_release, .unlocked_ioctl = amdgpu_drm_ioctl, .mmap = amdgpu_mmap,