Message ID | 1378132262-19453-1-git-send-email-maarten.lankhorst@canonical.com (mailing list archive)
---|---
State | New, archived
On Tue, Sep 3, 2013 at 12:31 AM, Maarten Lankhorst
<maarten.lankhorst@canonical.com> wrote:
> This increases the chance slightly that recovery from lockup can happen
> successfully.
I'd *really* love to see proof of this. When channels die, all
outstanding fences are marked as signalled. This should do absolutely
nothing...

> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> ---
>  drivers/gpu/drm/nouveau/nv84_fence.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nv84_fence.c b/drivers/gpu/drm/nouveau/nv84_fence.c
> index 2cf0ade..daf4b18 100644
> --- a/drivers/gpu/drm/nouveau/nv84_fence.c
> +++ b/drivers/gpu/drm/nouveau/nv84_fence.c
> @@ -122,8 +122,11 @@ nv84_fence_context_del(struct nouveau_channel *chan)
>  	struct drm_device *dev = chan->drm->dev;
>  	struct nv84_fence_priv *priv = chan->drm->fence;
>  	struct nv84_fence_chan *fctx = chan->fence;
> +	struct nouveau_fifo_chan *fifo = (void *)chan->object;
>  	int i;
>
> +	nouveau_bo_wr32(priv->bo, fifo->chid * 16/4, fctx->base.sequence);
> +
>  	for (i = 0; i < dev->mode_config.num_crtc; i++) {
>  		struct nouveau_bo *bo = nv50_display_crtc_sema(dev, i);
>  		nouveau_bo_vma_del(bo, &fctx->dispc_vma[i]);
> @@ -168,7 +171,7 @@ nv84_fence_context_new(struct nouveau_channel *chan)
>  		ret = nouveau_bo_vma_add(bo, client->vm, &fctx->dispc_vma[i]);
>  	}
>
> -	nouveau_bo_wr32(priv->bo, fifo->chid * 16/4, 0x00000000);
> +	fctx->base.sequence = nouveau_bo_rd32(priv->bo, fifo->chid * 16/4);
>
>  	if (ret)
>  		nv84_fence_context_del(chan);
> --
> 1.8.3.4
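Ben's objection concerns host-side waiters. A minimal sketch of the idea that channel death signals all outstanding fences may help here; the structures and helper below are entirely hypothetical (this is not nouveau's actual implementation), and the point is that signalling CPU-side fences does not touch the word in memory that another channel's queued semaphore wait compares against:

	/* Hypothetical model of "all outstanding fences are marked as
	 * signalled" on channel death; not nouveau's actual code. */
	#include <stdbool.h>
	#include <stddef.h>

	struct fence {
		struct fence *next;
		bool signalled;
	};

	struct channel {
		struct fence *pending; /* emitted but not yet signalled */
	};

	/* On channel death, CPU-side waiters are satisfied at once ... */
	static void channel_died(struct channel *chan)
	{
		for (struct fence *f = chan->pending; f; f = f->next)
			f->signalled = true;
	}
	/* ... but a semaphore wait already queued in *another* channel's
	 * command stream compares against a word in memory, which this
	 * never updates -- the gap the race below exploits. */

	int main(void)
	{
		struct fence f = { .next = NULL, .signalled = false };
		struct channel chan = { .pending = &f };

		channel_died(&chan); /* CPU waiters on f now see it done */
		return f.signalled ? 0 : 1;
	}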
On 04-09-13 05:21, Ben Skeggs wrote:
> On Tue, Sep 3, 2013 at 12:31 AM, Maarten Lankhorst
> <maarten.lankhorst@canonical.com> wrote:
>> This increases the chance slightly that recovery from lockup can happen
>> successfully.
> I'd *really* love to see proof of this. When channels die, all
> outstanding fences are marked as signalled. This should do absolutely
> nothing...
nv84+ relies heavily on fences, though, and a race like this is possible:

- channel 0 uses a bo from channel 1 and queues a wait for it somewhere
  in the command stream.
- channel 1 dies cleanly, but userspace creates a new channel in its
  place; the fence counter is reset to 0.
- channel 0 reaches the NV84_SUBCHAN_SEMAPHORE_TRIGGER.ACQUIRE_GEQUAL
  op and waits forever for the fence in channel 1 to signal.

Channel 0 could be the global drm channel used for buffer moves, which
would result in a hang. This may seem unlikely, but I believe parallel
piglit runs could trigger it.

If not, it could be triggered by creating an operation that takes a few
seconds in channel 0, queuing a command that uses a bo from channel 1
while channel 1 is still busy, and then deleting/recreating channel 1.

~Maarten
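A small userspace model makes the race concrete. All names below are hypothetical, and acquire_gequal() is a simplification of what the hardware's NV84_SUBCHAN_SEMAPHORE_TRIGGER.ACQUIRE_GEQUAL does, namely stall the waiting channel until the fence word reaches the wait value:

	/* Hypothetical host-side model of the race; not the nouveau API. */
	#include <stdint.h>
	#include <stdio.h>

	#define NUM_CHANNELS 128

	/* Shared fence buffer: one 32-bit sequence word per channel id. */
	static uint32_t fence_word[NUM_CHANNELS];

	/* Simplified ACQUIRE_GEQUAL: the waiter may proceed only once the
	 * fence word for 'chid' has reached 'seq'. */
	static int acquire_gequal(int chid, uint32_t seq)
	{
		return fence_word[chid] >= seq;
	}

	int main(void)
	{
		/* Channel 1 has emitted fences up to sequence 41; channel 0
		 * queues a wait for sequence 42 before touching chan1's bo. */
		fence_word[1] = 41;
		uint32_t chan0_wait = 42;

		/* Channel 1 dies cleanly.  The old code zeroes the fence
		 * word when a new channel is created in the same slot ... */
		fence_word[1] = 0;

		/* ... so channel 0's ACQUIRE_GEQUAL now stalls until the
		 * *new* channel counts all the way back up to 42. */
		printf("chan0 %s\n", acquire_gequal(1, chan0_wait) ?
		       "proceeds" : "stalls (a hang if chan0 is the drm channel)");
		return 0;
	}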
On Wed, Sep 4, 2013 at 10:37 PM, Maarten Lankhorst
<maarten.lankhorst@canonical.com> wrote:
> On 04-09-13 05:21, Ben Skeggs wrote:
>> On Tue, Sep 3, 2013 at 12:31 AM, Maarten Lankhorst
>> <maarten.lankhorst@canonical.com> wrote:
>>> This increases the chance slightly that recovery from lockup can happen
>>> successfully.
>> I'd *really* love to see proof of this. When channels die, all
>> outstanding fences are marked as signalled. This should do absolutely
>> nothing...
> nv84+ relies heavily on fences, though, and a race like this is possible:
> - channel 0 uses a bo from channel 1 and queues a wait for it somewhere
>   in the command stream.
> - channel 1 dies cleanly, but userspace creates a new channel in its
>   place; the fence counter is reset to 0.
> - channel 0 reaches the NV84_SUBCHAN_SEMAPHORE_TRIGGER.ACQUIRE_GEQUAL
>   op and waits forever for the fence in channel 1 to signal.
OK, this isn't exactly the issue you implied in the commit message.
But yes, this could possibly be an issue for sure. I don't think this
is the right way to fix it, however. I'll have a bit of a think on the
problem and see what I can come up with.

Thanks,
Ben.

> Channel 0 could be the global drm channel used for buffer moves, which
> would result in a hang. This may seem unlikely, but I believe parallel
> piglit runs could trigger it.
>
> If not, it could be triggered by creating an operation that takes a few
> seconds in channel 0, queuing a command that uses a bo from channel 1
> while channel 1 is still busy, and then deleting/recreating channel 1.
>
> ~Maarten
diff --git a/drivers/gpu/drm/nouveau/nv84_fence.c b/drivers/gpu/drm/nouveau/nv84_fence.c
index 2cf0ade..daf4b18 100644
--- a/drivers/gpu/drm/nouveau/nv84_fence.c
+++ b/drivers/gpu/drm/nouveau/nv84_fence.c
@@ -122,8 +122,11 @@ nv84_fence_context_del(struct nouveau_channel *chan)
 	struct drm_device *dev = chan->drm->dev;
 	struct nv84_fence_priv *priv = chan->drm->fence;
 	struct nv84_fence_chan *fctx = chan->fence;
+	struct nouveau_fifo_chan *fifo = (void *)chan->object;
 	int i;
 
+	nouveau_bo_wr32(priv->bo, fifo->chid * 16/4, fctx->base.sequence);
+
 	for (i = 0; i < dev->mode_config.num_crtc; i++) {
 		struct nouveau_bo *bo = nv50_display_crtc_sema(dev, i);
 		nouveau_bo_vma_del(bo, &fctx->dispc_vma[i]);
@@ -168,7 +171,7 @@ nv84_fence_context_new(struct nouveau_channel *chan)
 		ret = nouveau_bo_vma_add(bo, client->vm, &fctx->dispc_vma[i]);
 	}
 
-	nouveau_bo_wr32(priv->bo, fifo->chid * 16/4, 0x00000000);
+	fctx->base.sequence = nouveau_bo_rd32(priv->bo, fifo->chid * 16/4);
 
 	if (ret)
 		nv84_fence_context_del(chan);
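In terms of a simple model (hypothetical names; the real code is the diff above), the patch makes the per-channel slot in priv->bo carry the sequence across teardown and recreation: context_del writes back the last emitted sequence instead of losing it, and context_new reads it back instead of writing zero.

	/* Hypothetical sketch of the patched lifecycle; not the nouveau
	 * API.  Models how the per-channel fence word now survives
	 * channel teardown. */
	#include <stdint.h>

	#define NUM_CHANNELS 128
	static uint32_t fence_word[NUM_CHANNELS]; /* models priv->bo slots */

	struct fence_ctx { uint32_t sequence; };  /* models fctx->base */

	static void context_del(int chid, struct fence_ctx *fctx)
	{
		/* the new wr32: park the last emitted sequence in the slot */
		fence_word[chid] = fctx->sequence;
	}

	static void context_new(int chid, struct fence_ctx *fctx)
	{
		/* the changed line: resume from the saved value instead of
		 * writing zero, so an ACQUIRE_GEQUAL queued against the old
		 * channel can still be satisfied by the new one's fences */
		fctx->sequence = fence_word[chid];
	}

	int main(void)
	{
		struct fence_ctx old_ctx = { .sequence = 42 }, new_ctx = { 0 };

		context_del(1, &old_ctx);  /* channel 1 torn down at seq 42 */
		context_new(1, &new_ctx);  /* recreated channel resumes there */
		return new_ctx.sequence == 42 ? 0 : 1;
	}

The design choice, then, is to keep each channel slot's counter monotonic across channel lifetimes so that stale GEQUAL waits remain satisfiable; whether that is the right long-term fix is what Ben questions above.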
This increases the chance slightly that recovery from lockup can happen
successfully.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/nouveau/nv84_fence.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)