Message ID | 20180519065243.27600-6-yuq825@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Qiang Yu <yuq825@gmail.com> writes:

> This reverts commit 45c3d213a400c952ab7119f394c5293bb6877e6b.
>
> lima driver need preclose to wait all task in the context
> created within closing file to finish before free all the
> buffer object. Otherwise pending tesk may fail and get
> noisy MMU fault message.
>
> Move this wait to each buffer object free function can
> achieve the same result but some buffer object is shared
> with other file context, but we only want to wait the
> closing file context's tasks. So the implementation is
> not that straight forword compared to the preclose one.

You should just separate your MMU structures from drm_file, and have
drm_file and the jobs using it keep a reference on them. This is what
I've done in V3D as well.
On Tue, May 22, 2018 at 3:37 AM, Eric Anholt <eric@anholt.net> wrote:
> You should just separate your MMU structures from drm_file, and have
> drm_file and the jobs using it keep a reference on them. This is what
> I've done in V3D as well.

It's not the VM/MMU struct that causes this problem, it's each buffer
object that gets freed before the task is done (postclose runs after the
buffers are freed). If you mean I should keep a reference on all buffers
used by a task, that's not as simple as just waiting for the tasks to
finish before freeing the buffers.

Regards,
Qiang
On Tue, May 22, 2018 at 09:04:17AM +0800, Qiang Yu wrote:
> It's not the VM/MMU struct that causes this problem, it's each buffer
> object that gets freed before task is done (postclose is after buffer free).
> If you mean I should keep reference of all buffers for tasks, that's not
> as simple as just waiting task done before free buffers.

Why can't you do that waiting in the postclose hook? If it's the lack of
reference-counting in your driver for gem bo, then I'd say you need to
roll out some reference counting. Relying on the implicit reference
provided by the core is kinda not so great (which was the reason I've
thrown out the preclose hook). There's also per-bo open/close hooks.

-Daniel
On Wed, May 23, 2018 at 5:04 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> Why can't you do that waiting in the postclose hook? If it's the lack of
> reference-counting in your driver for gem bo, then I'd say you need to
> roll out some reference counting. Relying on the implicit reference
> provided by the core is kinda not so great (which was the reason I've
> thrown out the preclose hook). There's also per-bo open/close hooks.

It's possible to avoid preclose, but the implementation is not as simple
and straightforward as the preclose one, I think. There are two methods I
can think of:

1. Do the wait in the free-buffer callback that unmaps the buffer from
   this process's lima VM (wait on the buffer's reservation object). This
   is fine and simple, but a buffer can be shared between two processes,
   so the best way would be to wait only on the fences from this process.
   That means we'd have to record fences per context for a "perfect wait".

2. Keep a reference on the involved buffers for each task, dropping it
   when the task is done, and also keep a reference on the buffer mapping
   in this process's lima VM (this is more complicated to implement).

But with a preclose hook, we just wait for all of this process's tasks to
finish, then unmap/free the buffers; that is simple and straightforward.
I'd like to hear if there's a better way using only postclose.

Regards,
Qiang
On Wed, May 23, 2018 at 2:59 PM, Qiang Yu <yuq825@gmail.com> wrote:
> It's possible to not use preclose, but the implementation is not as simple
> and straight forward as the preclose I think. There're two method I can
> think of:
> 1. do wait when free buffers callback unmap buffer from this process's
> lima VM (wait buffer reservation object) [...]
> 2. keep a reference of involved buffers for a task, unreference it when
> task done, also keep a reference of the buffer mapping in this process's
> lima VM (this is more complicated to implement)
>
> But if there's a preclose, just wait all this process's task done, then
> unmap/free buffers, it's simple and straight forward.

Refcount your buffers. Borrowing references from other places tends to
result in a maintenance headache with no end. So solution 2.

-Daniel
On Thu, May 24, 2018 at 4:31 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> Refcount your buffers. Borrowing references from other places tends to
> result in a maintenance headache with no end. So solution 2.

In the current lima implementation, refcounting the buffers involved in a
task is done in user space, so the kernel's task object doesn't keep
references. The user space driver is responsible for not unmapping or
freeing a buffer before the task is complete. This works simply and fine
except for the case where the user presses Ctrl+C to terminate the
application, which forces the drm fd to close. I really don't think adding
buffer refcounting for tasks in the kernel just for this case is
worthwhile, because it has no benefit for the normal case but adds some
extra load.

Regards,
Qiang
On Thu, May 24, 2018 at 09:18:04AM +0800, Qiang Yu wrote:
> In current lima implementation, refcount involved buffer for task is done
> in user space. So kernel's task object don't keep that. User space
> driver is responsible not unmap/free buffer before task is complete. This
> works simple and fine except the case that user press Ctrl+C to terminate
> the application which will force to close drm fd. I really don't think adding
> buffer refcount for tasks in kernel just for this case is valuable because
> it has no benefits for normal case but some extra load.

If kernel correctness relies on userspace refcounting you have a gigantic
security problem. You need to fix that. The kernel _must_ assume that
userspace is evil, trying to pull it over the table.

Yes, you need refcounting.

-Daniel
On Thu, May 24, 2018 at 3:51 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> If kernel correctness relies on refcounting you have a giantic security
> problem. You need to fix that. Kernel _must_ assume that userspace is
> evil, trying to pull it over the table.
>
> Yes, you need refcounting.

In my implementation it is OK if an evil user frees/unmaps a buffer while
a task is not done: that will generate an MMU fault, and the kernel
driver will recover. The same goes for the Ctrl+C case; if we don't deal
with it, we just get some noisy MMU fault warnings and a HW reset
recovery.

Regards,
Qiang
Qiang Yu <yuq825@gmail.com> writes:

> It is OK if evil user free/unmap the buffer when task is not done
> in my implementation. It will generate a MMU fault in that case and kernel
> driver will do recovery.
>
> So does the Ctrl+C case, if don't deal with it, just get some noisy MMU
> fault warning and a HW reset recovery.

How about an app rendering to shared buffers, which glFlush()es and
exits cleanly but doesn't close the DRI screen? What would cause that
app's rendering to get completed successfully instead of faulting to
death?

You really do need to refcount the buffers used in a rendering job so
they don't get freed early.
> How about an app rendering to shared buffers, which glFlush()es and
> exits cleanly but doesn't close the DRI screen? What would cause that
> app's rendering to get completed succesfully instead of faulting to
> death?

Do you mean the same case as Ctrl+C, where an app exits without waiting
for all tasks to finish in userspace?

> You really do need to refcount the buffers used in a rendering job so
> they don't get freed early.

Do you mean refcounting the buffers in the kernel-space job? That is OK
but not necessary: I can wait for task completion in gem_close_object,
which drm_release also calls for each buffer (I still think waiting once
in preclose is better, but it's gone).
Qiang Yu <yuq825@gmail.com> writes:

> Do you mean the same case as Ctrl+C when an app exit without waiting
> all task finished in userspace?

Basically the same, but I'm saying that the app is doing everything
right and terminating successfully, rather than being interrupted (which
you might otherwise use to justify its rendering failing).

> Do you mean refcount the buffers in kernel space job? This is OK but
> not necessary, I can wait task complete in gem_close_object which
> will be called by drm_release for each buffer too (I still think better
> waiting in preclose at once but it's gone).

Just wait for all tasks to complete when any object is freed? That's
going to be bad for performance. Or are you saying that you already
have the connection between the task and its objects (and, if so, why
aren't you just doing refcounting correctly through that path?)
Eric Anholt <eric@anholt.net> writes:

> Just wait for all tasks to complete when any object is freed? That's
> going to be bad for performance. Or are you saying that you already
> have the connection between the task and its objects (and, if so, why
> aren't you just doing refcounting correctly through that path?)

How about wait on close of the DRM device?
On Fri, Jun 1, 2018 at 1:51 AM, Eric Anholt <eric@anholt.net> wrote:
> Basically the same, but I'm saying that the app is doing everything
> right and terminating successfully, rather than being interrupted (which
> you might otherwise use to justify its rendering failing)

I won't try to justify Ctrl+C. In fact I think it's also a good case that
should not get an MMU fault and GPU reset, because it happens even when
the user space driver is correct. I only think an incorrect user driver
deserves an MMU fault, like bug/evil code that frees/unmaps a bo before
the task is done. I think this is also the difference between a user
freeing a bo and drm close freeing a bo in my case.

> Just wait for all tasks to complete when any object is freed? That's
> going to be bad for performance.

In my case this doesn't affect performance. In my implementation the user
space driver records a task's buffers and frees them when the task is
done. So in normal usage, when a buffer is freed, there should be no task
from this process still using it. A wait should only happen in the Ctrl+C
and not-closing-the-screen cases.

> Or are you saying that you already
> have the connection between the task and its objects (and, if so, why
> aren't you just doing refcounting correctly through that path?)

That connection is through the buffer's reservation object: I can wait on
the fences in the reservation object for task completion when the buffer
is freed.
On Fri, Jun 1, 2018 at 2:04 AM, Keith Packard <keithp@keithp.com> wrote:
> How about wait on close of the DRM device?

Yeah, that's what this patch is for: bring preclose back and do the task
wait in it before freeing the buffers. I still think this is the best way
in my case (and maybe for other drivers).
diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index e394799979a6..0a43107396b9 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -361,8 +361,9 @@ void drm_lastclose(struct drm_device * dev)
  *
  * This function must be used by drivers as their &file_operations.release
  * method. It frees any resources associated with the open file, and calls the
- * &drm_driver.postclose driver callback. If this is the last open file for the
- * DRM device also proceeds to call the &drm_driver.lastclose driver callback.
+ * &drm_driver.preclose and &drm_driver.postclose driver callbacks. If this is
+ * the last open file for the DRM device also proceeds to call the
+ * &drm_driver.lastclose driver callback.
  *
  * RETURNS:
  *
@@ -382,8 +383,7 @@ int drm_release(struct inode *inode, struct file *filp)
 	list_del(&file_priv->lhead);
 	mutex_unlock(&dev->filelist_mutex);
 
-	if (drm_core_check_feature(dev, DRIVER_LEGACY) &&
-	    dev->driver->preclose)
+	if (dev->driver->preclose)
 		dev->driver->preclose(dev, file_priv);
 
 	/* ========================================================
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index d23dcdd1bd95..8d6080f97ed4 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -107,6 +107,23 @@ struct drm_driver {
 	 */
 	int (*open) (struct drm_device *, struct drm_file *);
 
+	/**
+	 * @preclose:
+	 *
+	 * One of the driver callbacks when a new &struct drm_file is closed.
+	 * Useful for tearing down driver-private data structures allocated in
+	 * @open like buffer allocators, execution contexts or similar things.
+	 *
+	 * Since the display/modeset side of DRM can only be owned by exactly
+	 * one &struct drm_file (see &drm_file.is_master and &drm_device.master)
+	 * there should never be a need to tear down any modeset related
+	 * resources in this callback. Doing so would be a driver design bug.
+	 *
+	 * FIXME: It is not really clear why there's both @preclose and
+	 * @postclose. Without a really good reason, use @postclose only.
+	 */
+	void (*preclose) (struct drm_device *, struct drm_file *file_priv);
+
 	/**
 	 * @postclose:
 	 *
@@ -118,6 +135,9 @@ struct drm_driver {
 	 * one &struct drm_file (see &drm_file.is_master and &drm_device.master)
 	 * there should never be a need to tear down any modeset related
 	 * resources in this callback. Doing so would be a driver design bug.
+	 *
+	 * FIXME: It is not really clear why there's both @preclose and
+	 * @postclose. Without a really good reason, use @postclose only.
 	 */
 	void (*postclose) (struct drm_device *, struct drm_file *);
 
@@ -134,7 +154,7 @@ struct drm_driver {
 	 * state changes, e.g. in conjunction with the :ref:`vga_switcheroo`
 	 * infrastructure.
 	 *
-	 * This is called after @postclose hook has been called.
+	 * This is called after @preclose and @postclose have been called.
 	 *
 	 * NOTE:
 	 *
@@ -601,7 +621,6 @@ struct drm_driver {
 	/* List of devices hanging off this driver with stealth attach. */
 	struct list_head legacy_dev_list;
 	int (*firstopen) (struct drm_device *);
-	void (*preclose) (struct drm_device *, struct drm_file *file_priv);
 	int (*dma_ioctl) (struct drm_device *dev, void *data, struct drm_file *file_priv);
 	int (*dma_quiescent) (struct drm_device *);
 	int (*context_dtor) (struct drm_device *dev, int context);
This reverts commit 45c3d213a400c952ab7119f394c5293bb6877e6b.

The lima driver needs preclose to wait for all tasks in the context
created within the closing file to finish before freeing all the buffer
objects. Otherwise pending tasks may fail and produce noisy MMU fault
messages.

Moving this wait into each buffer object's free function can achieve the
same result, but some buffer objects are shared with other file contexts,
and we only want to wait for the closing file context's tasks. So that
implementation is not as straightforward as the preclose one.

Signed-off-by: Qiang Yu <yuq825@gmail.com>
---
 drivers/gpu/drm/drm_file.c |  8 ++++----
 include/drm/drm_drv.h      | 23 +++++++++++++++++++--
 2 files changed, 25 insertions(+), 6 deletions(-)