Message ID | 53CFB5C2.7060207@canonical.com (mailing list archive) |
---|---|
State | New, archived |
On 23-07-14 15:16, Maarten Lankhorst wrote:
> On 23-07-14 14:36, Christian König wrote:
>> On 23.07.2014 12:52, Daniel Vetter wrote:
>>> On Wed, Jul 23, 2014 at 12:13 PM, Christian König
>>> <christian.koenig@amd.com> wrote:
>>>>> And the dma-buf would still have fences belonging to both drivers, and it
>>>>> would still call from outside the driver.
>>>> Calling from outside the driver is fine as long as the driver can do
>>>> everything necessary to complete its work and isn't forced into any ugly
>>>> hacks and things that are not 100% reliable.
>>>>
>>>> So I don't see much of an alternative to integrating recovery code for
>>>> non-firing interrupts and some kind of lockup handling into the fence
>>>> code as well.
>>> That approach doesn't really work all that well since every driver has
>>> its own reset semantics. And we're trying to move away from global
>>> reset to fine-grained reset. So stop-the-world reset is out of
>>> fashion, at least for i915. As you said, reset is normal in GPUs and
>>> we're trying to make reset less invasive. I really don't see a point
>>> in imposing a reset scheme upon all drivers, and I think you have about
>>> as much motivation to convert radeon to the scheme used by i915 as
>>> I'll have for converting to the one used by radeon. If it would fit at
>>> all.
>> Oh my! No, I didn't want to suggest any global reset infrastructure.
>>
>> My idea was more that the fence framework provides a fence->process_signaling
>> callback that is periodically called after enable_signaling to trigger manual
>> signal processing in the driver.
>>
>> This would be suitable both as a fallback in case of non-working interrupts
>> and as a chance for any driver to do the necessary lockup handling.
> I managed to do it without needing it to be part of the interface. I'm not
> sure whether radeon_fence_driver_recheck needs exclusive_lock, but if so it's
> a small change.
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 7fbfd41479f1..51b646b9c8bb 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -345,6 +345,9 @@ struct radeon_fence_driver {
>  	uint64_t		sync_seq[RADEON_NUM_RINGS];
>  	atomic64_t		last_seq;
>  	bool			initialized;
> +	struct delayed_work	work;
> +	struct radeon_device	*rdev;
> +	unsigned		ring;
>  };
>
>  struct radeon_fence_cb {
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index da83f36dd708..955c825946ad 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -231,6 +231,9 @@ static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
>  		}
>  	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>
> +	if (!wake && last_seq < last_emitted)
> +		schedule_delayed_work(&rdev->fence_drv[ring].work, jiffies_to_msecs(10));
> +

When trying this: if (seq < last_emitted) is probably a better check.

~Maarten
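The pattern Maarten describes amounts to a self-rearming delayed work item that re-runs the driver's fence processing for as long as emitted fences remain unsignalled, so no new callback has to be added to the fence interface. A minimal, driver-agnostic sketch of that idea follows; the fence_recheck struct and function names are illustrative, not taken from the patch.

#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

/* Illustrative per-ring state: the last sequence number seen as signalled
 * and the last one handed to the hardware. */
struct fence_recheck {
	struct delayed_work work;
	u64 last_signaled;
	u64 last_emitted;
};

static void fence_recheck_fn(struct work_struct *work)
{
	struct fence_recheck *fr =
		container_of(work, struct fence_recheck, work.work);

	/* Driver-specific polling of completed fences would go here,
	 * updating fr->last_signaled. */

	/* Re-arm only while fences are still outstanding. */
	if (fr->last_signaled < fr->last_emitted)
		schedule_delayed_work(&fr->work, msecs_to_jiffies(10));
}

Setup would pair INIT_DELAYED_WORK(&fr->work, fence_recheck_fn) at init time with cancel_delayed_work_sync(&fr->work) at teardown, which is what the patch below does per ring.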
Hi Maarten,

try to implement this as a replacement for specifying RADEON_FENCE_JIFFIES_TIMEOUT on the wait_event_* calls, and reset the timeout every time radeon_fence_process is called, not only when one of the sequences increments.

I don't have the time right now to look deeper into it or help with the patch, but the general approach sounds valid to me.

Regards,
Christian.

On 23.07.2014 16:05, Maarten Lankhorst wrote:
> [earlier discussion and quoted patch snipped; see the message above]
>
> When trying this: if (seq < last_emitted) is probably a better check.
>
> ~Maarten
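A rough sketch of what Christian seems to be suggesting, assuming one watchdog-style delayed work per ring (FENCE_RECHECK_INTERVAL_MS and fence_watchdog_kick are made-up names): rather than bounding every wait_event_* with the fixed RADEON_FENCE_JIFFIES_TIMEOUT, the timeout is pushed forward on every radeon_fence_process() call, so the work only fires after fence processing has been idle for a full interval.

#include <linux/workqueue.h>
#include <linux/jiffies.h>

#define FENCE_RECHECK_INTERVAL_MS 100	/* placeholder interval */

/* Call this from the fence-processing path. */
static void fence_watchdog_kick(struct delayed_work *dwork)
{
	/* mod_delayed_work() (re)queues the work with a fresh delay whether
	 * or not it was already pending, which gives the "reset the timeout
	 * every time" behaviour. */
	mod_delayed_work(system_wq, dwork,
			 msecs_to_jiffies(FENCE_RECHECK_INTERVAL_MS));
}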
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 7fbfd41479f1..51b646b9c8bb 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -345,6 +345,9 @@ struct radeon_fence_driver {
 	uint64_t		sync_seq[RADEON_NUM_RINGS];
 	atomic64_t		last_seq;
 	bool			initialized;
+	struct delayed_work	work;
+	struct radeon_device	*rdev;
+	unsigned		ring;
 };
 
 struct radeon_fence_cb {
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index da83f36dd708..955c825946ad 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -231,6 +231,9 @@ static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
 		}
 	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
 
+	if (!wake && last_seq < last_emitted)
+		schedule_delayed_work(&rdev->fence_drv[ring].work, jiffies_to_msecs(10));
+
 	return wake;
 }
 
@@ -815,6 +818,14 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
 	return 0;
 }
 
+static void radeon_fence_driver_recheck(struct work_struct *work)
+{
+	struct radeon_fence_driver *drv = container_of(work, struct radeon_fence_driver, work.work);
+
+	DRM_ERROR("omg, working!\n");
+	radeon_fence_process(drv->rdev, drv->ring);
+}
+
 /**
  * radeon_fence_driver_init_ring - init the fence driver
  * for the requested ring.
@@ -836,6 +847,10 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 		rdev->fence_drv[ring].sync_seq[i] = 0;
 	atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
 	rdev->fence_drv[ring].initialized = false;
+
+	rdev->fence_drv[ring].ring = ring;
+	rdev->fence_drv[ring].rdev = rdev;
+	INIT_DELAYED_WORK(&rdev->fence_drv[ring].work, radeon_fence_driver_recheck);
 }
 
 /**
@@ -880,6 +895,7 @@ void radeon_fence_driver_fini(struct radeon_device *rdev)
 	for (ring = 0; ring < RADEON_NUM_RINGS; ring++) {
 		if (!rdev->fence_drv[ring].initialized)
 			continue;
+		cancel_delayed_work_sync(&rdev->fence_drv[ring].work);
 		r = radeon_fence_wait_empty(rdev, ring);
 		if (r) {
 			/* no need to trigger GPU reset as we are unloading */
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 16807afab362..85391ddd3ce9 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -331,7 +331,7 @@ void radeon_irq_kms_sw_irq_get(struct radeon_device *rdev, int ring)
 {
 	unsigned long irqflags;
 
-	if (!rdev->ddev->irq_enabled)
+//	if (!rdev->ddev->irq_enabled)
 		return;
 
 	if (atomic_inc_return(&rdev->irq.ring_int[ring]) == 1) {
@@ -355,7 +355,7 @@ void radeon_irq_kms_sw_irq_put(struct radeon_device *rdev, int ring)
 {
 	unsigned long irqflags;
 
-	if (!rdev->ddev->irq_enabled)
+//	if (!rdev->ddev->irq_enabled)
 		return;
 
 	if (atomic_dec_and_test(&rdev->irq.ring_int[ring])) {
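One detail worth double-checking when trying the patch: schedule_delayed_work() takes its delay in jiffies, so the conversion in __radeon_fence_process() presumably wants to go the other way round, along the lines of:

	/* roughly 10 ms, expressed in jiffies */
	schedule_delayed_work(&rdev->fence_drv[ring].work, msecs_to_jiffies(10));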