[2/3] drm/scheduler: Don't call wait_event_killable for signaled process.
diff mbox

Message ID 12c806f9-f283-5bed-d137-7719ba73205a@daenzer.net
State New
Headers show

Commit Message

Michel Dänzer May 18, 2018, 8:46 a.m. UTC
On 2018-05-17 09:05 PM, Andrey Grodzovsky wrote:
> On 05/17/2018 10:48 AM, Michel Dänzer wrote:
>> On 2018-05-17 01:18 PM, Andrey Grodzovsky wrote:
>>> Hi Michele and others, I am trying to implement the approach bellow to
>>> resolve AMDGPU's hang when commands are stuck in pipe during process
>>> exit.
>>>
>>> I noticed that once I implemented the file_operation.flush callback
>>> then during run of X, i see the flush callback gets called not only for
>>> Xorg process but for other
>>>
>>> processes such as 'xkbcomp' and even 'sh', it seems like Xorg passes his
>>> FDs to children, Christian mentioned he remembered a discussion to
>>> always set FD_CLOEXEC flag when opening the hardware device file, so
>>>
>>> we suspect a bug in Xorg with regard to this behavior.
>> Try the libdrm patch below.
>>
>> Note that the X server passes DRM file descriptors to DRI3 clients.
> 
> Tried it, didn't help. I still see other processes calling .flush for
> /dev/dri/card0

Try the attached xserver patch on top. With these patches, I no longer
see any DRM file descriptors being opened without O_CLOEXEC running Xorg
-pogo in strace.


Anyway, the kernel can't rely on userspace using O_CLOEXEC. If the flush
callback being called from multiple processes is an issue, maybe the
flush callback isn't appropriate after all.

Comments

Christian König May 18, 2018, 9:42 a.m. UTC | #1
> Anyway, the kernel can't rely on userspace using O_CLOEXEC. If the flush
> callback being called from multiple processes is an issue, maybe the
> flush callback isn't appropriate after all.

Userspace could also grab a reference just by opening /proc/$pid/fd/*.

The idea is just that when any process which used the fd is killed by a 
signal we drop the remaining jobs from being submitted to the hardware.

Christian.
Michel Dänzer May 18, 2018, 2:44 p.m. UTC | #2
On 2018-05-18 11:42 AM, Christian König wrote:
> 
>> Anyway, the kernel can't rely on userspace using O_CLOEXEC. If the flush
>> callback being called from multiple processes is an issue, maybe the
>> flush callback isn't appropriate after all.
> 
> Userspace could also grab a reference just by opening /proc/$pid/fd/*.
> 
> The idea is just that when any process which used the fd is killed by a
> signal we drop the remaining jobs from being submitted to the hardware.

This must only affect jobs submitted by the killed process, not those
submitted by other processes.
Christian König May 18, 2018, 2:50 p.m. UTC | #3
Am 18.05.2018 um 16:44 schrieb Michel Dänzer:
> On 2018-05-18 11:42 AM, Christian König wrote:
>>> Anyway, the kernel can't rely on userspace using O_CLOEXEC. If the flush
>>> callback being called from multiple processes is an issue, maybe the
>>> flush callback isn't appropriate after all.
>> Userspace could also grab a reference just by opening /proc/$pid/fd/*.
>>
>> The idea is just that when any process which used the fd is killed by a
>> signal we drop the remaining jobs from being submitted to the hardware.
> This must only affect jobs submitted by the killed process, not those
> submitted by other processes.

Yeah, that's exactly the plan here.

For additional security we could safe the pid of the job submitter, but 
since this should basically not happen in normal operation I would 
rather like to avoid that.

Christian.
Andrey Grodzovsky May 18, 2018, 3:02 p.m. UTC | #4
On 05/18/2018 10:50 AM, Christian König wrote:
> Am 18.05.2018 um 16:44 schrieb Michel Dänzer:
>> On 2018-05-18 11:42 AM, Christian König wrote:
>>>> Anyway, the kernel can't rely on userspace using O_CLOEXEC. If the 
>>>> flush
>>>> callback being called from multiple processes is an issue, maybe the
>>>> flush callback isn't appropriate after all.
>>> Userspace could also grab a reference just by opening /proc/$pid/fd/*.
>>>
>>> The idea is just that when any process which used the fd is killed by a
>>> signal we drop the remaining jobs from being submitted to the hardware.
>> This must only affect jobs submitted by the killed process, not those
>> submitted by other processes.
>
> Yeah, that's exactly the plan here.

I don't see how it's gong to happen -
.flush is being called for any terminating process regardless if he 
submitted jobs
or just accidentally (or not)  has the device file FD in his private 
file table. So here
we going to have a problem with that requirement. If a process is being 
killed and .flush is
executed I don't have any way to know which  amdgpu_ctx to chose to 
terminate it's pending jobs.
The only info i have from .flush caller is the process id.
As it's now in amdgpu_ctx_mgr_entity_fini and 
amdgpu_ctx_mgr_entity_cleanup we are going to iterate
all the contextes from the context manager list and terminate them all, 
which sounds wrong to me indeed.
I can save the pid of the context creator on the context structure so i 
can match during .flush call, but in case some one
creates the context but passes the context id to another process for 
actual job submission this approach won't work either.

Am I messing something here ?

Andrey

>
> For additional security we could safe the pid of the job submitter, 
> but since this should basically not happen in normal operation I would 
> rather like to avoid that.
>
> Christian.
Christian König May 22, 2018, 12:58 p.m. UTC | #5
Am 18.05.2018 um 17:02 schrieb Andrey Grodzovsky:
>
>
> On 05/18/2018 10:50 AM, Christian König wrote:
>> Am 18.05.2018 um 16:44 schrieb Michel Dänzer:
>>> On 2018-05-18 11:42 AM, Christian König wrote:
>>>>> Anyway, the kernel can't rely on userspace using O_CLOEXEC. If the 
>>>>> flush
>>>>> callback being called from multiple processes is an issue, maybe the
>>>>> flush callback isn't appropriate after all.
>>>> Userspace could also grab a reference just by opening /proc/$pid/fd/*.
>>>>
>>>> The idea is just that when any process which used the fd is killed 
>>>> by a
>>>> signal we drop the remaining jobs from being submitted to the 
>>>> hardware.
>>> This must only affect jobs submitted by the killed process, not those
>>> submitted by other processes.
>>
>> Yeah, that's exactly the plan here.
>
> I don't see how it's gong to happen -
> .flush is being called for any terminating process regardless if he 
> submitted jobs
> or just accidentally (or not)  has the device file FD in his private 
> file table. So here
> we going to have a problem with that requirement. If a process is 
> being killed and .flush is
> executed I don't have any way to know which  amdgpu_ctx to chose to 
> terminate it's pending jobs.
> The only info i have from .flush caller is the process id.
> As it's now in amdgpu_ctx_mgr_entity_fini and 
> amdgpu_ctx_mgr_entity_cleanup we are going to iterate
> all the contextes from the context manager list and terminate them 
> all, which sounds wrong to me indeed.
> I can save the pid of the context creator on the context structure so 
> i can match during .flush call, but in case some one
> creates the context but passes the context id to another process for 
> actual job submission this approach won't work either.
>
> Am I messing something here ?

Your analyses is correct, it's just that I think that this case should 
not happen.

What can happen is that the fd is passed accidentally to child processes 
and those child processes are then killed, but passing the fd to child 
processes is a bug in the first place.

When somebody on purpose opens the fd and kills the process then it 
breaks and he can keep the pieces. I mean to open the fd you need to be 
privileged anyway.

What we could do to completely fix the issue:
1. Note for each submitted job which process (pid) it submitted.
2. During flush wait or kill only jobs of the current process.

But I think that this is overkill.

Christian.

>
> Andrey
>
>>
>> For additional security we could safe the pid of the job submitter, 
>> but since this should basically not happen in normal operation I 
>> would rather like to avoid that.
>>
>> Christian.
>

Patch
diff mbox

diff --git a/hw/xfree86/drivers/modesetting/driver.c b/hw/xfree86/drivers/modesetting/driver.c
index 5d8906d63..306541f33 100644
--- a/hw/xfree86/drivers/modesetting/driver.c
+++ b/hw/xfree86/drivers/modesetting/driver.c
@@ -200,12 +200,12 @@  open_hw(const char *dev)
     int fd;
 
     if (dev)
-        fd = open(dev, O_RDWR, 0);
+        fd = open(dev, O_RDWR | O_CLOEXEC, 0);
     else {
         dev = getenv("KMSDEVICE");
-        if ((NULL == dev) || ((fd = open(dev, O_RDWR, 0)) == -1)) {
+        if ((NULL == dev) || ((fd = open(dev, O_RDWR | O_CLOEXEC, 0)) == -1)) {
             dev = "/dev/dri/card0";
-            fd = open(dev, O_RDWR, 0);
+            fd = open(dev, O_RDWR | O_CLOEXEC, 0);
         }
     }
     if (fd == -1)
diff --git a/hw/xfree86/os-support/linux/lnx_platform.c b/hw/xfree86/os-support/linux/lnx_platform.c
index 11af52c46..70374ace8 100644
--- a/hw/xfree86/os-support/linux/lnx_platform.c
+++ b/hw/xfree86/os-support/linux/lnx_platform.c
@@ -43,7 +43,7 @@  get_drm_info(struct OdevAttributes *attribs, char *path, int delayed_index)
     }
 
     if (fd == -1)
-        fd = open(path, O_RDWR, O_CLOEXEC);
+        fd = open(path, O_RDWR | O_CLOEXEC, 0);
 
     if (fd == -1)
         return FALSE;