Message ID | 20220824192913.2425634-1-jsnitsel@redhat.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | dmaengine: idxd: Set workqueue state to disabled before trying to re-enable | expand |
On 8/24/2022 12:29 PM, Jerry Snitselaar wrote: > For a software reset idxd_device_reinit() is called, which will walk > the device workqueues to see which ones were enabled, and try to > re-enable them. It keys off wq->state being iDXD_WQ_ENABLED, but the > first thing idxd_enable_wq() will do is see that the state of the > workqueue is enabled, and return 0 instead of attempting to issue > a command to enable the workqueue. > > So once a workqueue is found that needs to be re-enabled, > set the state to disabled prior to calling idxd_enable_wq(). > This would accurately reflect the state if the enable fails > as well. > > Cc: Fenghua Yu <fenghua.yu@intel.com> > Cc: Dave Jiang <dave.jiang@intel.com> > Cc: Vinod Koul <vkoul@kernel.org> > Cc: dmaengine@vger.kernel.org > Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel data accelerators") > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > --- > drivers/dma/idxd/irq.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c > index 743ead5ebc57..723eeb5328d6 100644 > --- a/drivers/dma/idxd/irq.c > +++ b/drivers/dma/idxd/irq.c > @@ -52,6 +52,7 @@ static void idxd_device_reinit(struct work_struct *work) > struct idxd_wq *wq = idxd->wqs[i]; > > if (wq->state == IDXD_WQ_ENABLED) { > + wq->state = IDXD_WQ_DISABLED; Might be better off to insert this line in idxd_wq_disable_cleanup(). I think that should put it in sane state. > rc = idxd_wq_enable(wq); > if (rc < 0) { > dev_warn(dev, "Unable to re-enable wq %s\n",
On Wed, Aug 24, 2022 at 01:29:03PM -0700, Dave Jiang wrote: > > On 8/24/2022 12:29 PM, Jerry Snitselaar wrote: > > For a software reset idxd_device_reinit() is called, which will walk > > the device workqueues to see which ones were enabled, and try to > > re-enable them. It keys off wq->state being iDXD_WQ_ENABLED, but the > > first thing idxd_enable_wq() will do is see that the state of the > > workqueue is enabled, and return 0 instead of attempting to issue > > a command to enable the workqueue. > > > > So once a workqueue is found that needs to be re-enabled, > > set the state to disabled prior to calling idxd_enable_wq(). > > This would accurately reflect the state if the enable fails > > as well. > > > > Cc: Fenghua Yu <fenghua.yu@intel.com> > > Cc: Dave Jiang <dave.jiang@intel.com> > > Cc: Vinod Koul <vkoul@kernel.org> > > Cc: dmaengine@vger.kernel.org > > Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel data accelerators") > > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > > --- > > drivers/dma/idxd/irq.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c > > index 743ead5ebc57..723eeb5328d6 100644 > > --- a/drivers/dma/idxd/irq.c > > +++ b/drivers/dma/idxd/irq.c > > @@ -52,6 +52,7 @@ static void idxd_device_reinit(struct work_struct *work) > > struct idxd_wq *wq = idxd->wqs[i]; > > if (wq->state == IDXD_WQ_ENABLED) { > > + wq->state = IDXD_WQ_DISABLED; > Might be better off to insert this line in idxd_wq_disable_cleanup(). I > think that should put it in sane state. I don't think that is called in the code path that I was lookng at. I've been looking at this bit of process_misc_interrupts(): halt: gensts.bits = ioread32(idxd->reg_base + IDXD_GENSTATS_OFFSET); if (gensts.state == IDXD_DEVICE_STATE_HALT) { idxd->state = IDXD_DEV_HALTED; if (gensts.reset_type == IDXD_DEVICE_RESET_SOFTWARE) { /* * If we need a software reset, we will throw the work * on a system workqueue in order to allow interrupts * for the device command completions. */ INIT_WORK(&idxd->work, idxd_device_reinit); queue_work(idxd->wq, &idxd->work); } else { idxd->state = IDXD_DEV_HALTED; idxd_wqs_quiesce(idxd); idxd_wqs_unmap_portal(idxd); spin_lock(&idxd->dev_lock); idxd_device_clear_state(idxd); dev_err(&idxd->pdev->dev, "idxd halted, need %s.\n", gensts.reset_type == IDXD_DEVICE_RESET_FLR ? "FLR" : "system reset"); spin_unlock(&idxd->dev_lock); return -ENXIO; } } return 0; } So it sees that the device is halted, and sticks idxd_device_reinint() on that workqueue. The idxd_device_reinit() has this loop to re-enable the idxd wqs: for (i = 0; i < idxd->max_wqs; i++) { struct idxd_wq *wq = idxd->wqs[i]; if (wq->state == IDXD_WQ_ENABLED) { wq->state = IDXD_WQ_DISABLED; rc = idxd_wq_enable(wq); if (rc < 0) { dev_warn(dev, "Unable to re-enable wq %s\n", dev_name(wq_confdev(wq))); } } } Once you go into idxd_wq_enable() though you get this check at the beginning: if (wq->state == IDXD_WQ_ENABLED) { dev_dbg(dev, "WQ %d already enabled\n", wq->id); return 0; } So IIUC it sees the device is halted, goes to reset it, figures out a wq should be re-enabled, calls idxd_wq_enable() which hits the check, returns 0 and the wq is never really re-enabled, though it will still have wq state set to IDXD_WQ_ENABLED. Or am I missing something? Regards, Jerry > > rc = idxd_wq_enable(wq); > > if (rc < 0) { > > dev_warn(dev, "Unable to re-enable wq %s\n",
On 8/24/2022 2:16 PM, Jerry Snitselaar wrote: > On Wed, Aug 24, 2022 at 01:29:03PM -0700, Dave Jiang wrote: >> On 8/24/2022 12:29 PM, Jerry Snitselaar wrote: >>> For a software reset idxd_device_reinit() is called, which will walk >>> the device workqueues to see which ones were enabled, and try to >>> re-enable them. It keys off wq->state being iDXD_WQ_ENABLED, but the >>> first thing idxd_enable_wq() will do is see that the state of the >>> workqueue is enabled, and return 0 instead of attempting to issue >>> a command to enable the workqueue. >>> >>> So once a workqueue is found that needs to be re-enabled, >>> set the state to disabled prior to calling idxd_enable_wq(). >>> This would accurately reflect the state if the enable fails >>> as well. >>> >>> Cc: Fenghua Yu <fenghua.yu@intel.com> >>> Cc: Dave Jiang <dave.jiang@intel.com> >>> Cc: Vinod Koul <vkoul@kernel.org> >>> Cc: dmaengine@vger.kernel.org >>> Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel data accelerators") >>> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> >>> --- >>> drivers/dma/idxd/irq.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c >>> index 743ead5ebc57..723eeb5328d6 100644 >>> --- a/drivers/dma/idxd/irq.c >>> +++ b/drivers/dma/idxd/irq.c >>> @@ -52,6 +52,7 @@ static void idxd_device_reinit(struct work_struct *work) >>> struct idxd_wq *wq = idxd->wqs[i]; >>> if (wq->state == IDXD_WQ_ENABLED) { >>> + wq->state = IDXD_WQ_DISABLED; >> Might be better off to insert this line in idxd_wq_disable_cleanup(). I >> think that should put it in sane state. > I don't think that is called in the code path that I was lookng at. I've been > looking at this bit of process_misc_interrupts(): > > halt: > gensts.bits = ioread32(idxd->reg_base + IDXD_GENSTATS_OFFSET); > if (gensts.state == IDXD_DEVICE_STATE_HALT) { > idxd->state = IDXD_DEV_HALTED; > if (gensts.reset_type == IDXD_DEVICE_RESET_SOFTWARE) { > /* > * If we need a software reset, we will throw the work > * on a system workqueue in order to allow interrupts > * for the device command completions. > */ > INIT_WORK(&idxd->work, idxd_device_reinit); > queue_work(idxd->wq, &idxd->work); > } else { > idxd->state = IDXD_DEV_HALTED; > idxd_wqs_quiesce(idxd); > idxd_wqs_unmap_portal(idxd); > spin_lock(&idxd->dev_lock); > idxd_device_clear_state(idxd); > dev_err(&idxd->pdev->dev, > "idxd halted, need %s.\n", > gensts.reset_type == IDXD_DEVICE_RESET_FLR ? > "FLR" : "system reset"); > spin_unlock(&idxd->dev_lock); > return -ENXIO; > } > } > > return 0; > } > > So it sees that the device is halted, and sticks idxd_device_reinint() on that > workqueue. The idxd_device_reinit() has this loop to re-enable the idxd wqs: idxd_device_reinit() should called idxd_device_reset() first. And that should at some point call idxd_wq_disable_cleanup() and clean up the states. https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/irq.c#L42 https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L725 https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L711 https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L376 So if we stick the wq state reset in there, it should show up as "disabled" by the time we try to enable the WQs again. Does that look reasonable? > > for (i = 0; i < idxd->max_wqs; i++) { > struct idxd_wq *wq = idxd->wqs[i]; > > if (wq->state == IDXD_WQ_ENABLED) { > wq->state = IDXD_WQ_DISABLED; > rc = idxd_wq_enable(wq); > if (rc < 0) { > dev_warn(dev, "Unable to re-enable wq %s\n", > dev_name(wq_confdev(wq))); > } > } > } > > Once you go into idxd_wq_enable() though you get this check at the beginning: > > if (wq->state == IDXD_WQ_ENABLED) { > dev_dbg(dev, "WQ %d already enabled\n", wq->id); > return 0; > } > > So IIUC it sees the device is halted, goes to reset it, figures out a wq > should be re-enabled, calls idxd_wq_enable() which hits the check, returns > 0 and the wq is never really re-enabled, though it will still have wq state > set to IDXD_WQ_ENABLED. > > Or am I missing something? > > Regards, > Jerry > >>> rc = idxd_wq_enable(wq); >>> if (rc < 0) { >>> dev_warn(dev, "Unable to re-enable wq %s\n",
On Wed, 2022-08-24 at 14:59 -0700, Dave Jiang wrote: > > On 8/24/2022 2:16 PM, Jerry Snitselaar wrote: > > On Wed, Aug 24, 2022 at 01:29:03PM -0700, Dave Jiang wrote: > > > On 8/24/2022 12:29 PM, Jerry Snitselaar wrote: > > > > For a software reset idxd_device_reinit() is called, which will > > > > walk > > > > the device workqueues to see which ones were enabled, and try > > > > to > > > > re-enable them. It keys off wq->state being iDXD_WQ_ENABLED, > > > > but the > > > > first thing idxd_enable_wq() will do is see that the state of > > > > the > > > > workqueue is enabled, and return 0 instead of attempting to > > > > issue > > > > a command to enable the workqueue. > > > > > > > > So once a workqueue is found that needs to be re-enabled, > > > > set the state to disabled prior to calling idxd_enable_wq(). > > > > This would accurately reflect the state if the enable fails > > > > as well. > > > > > > > > Cc: Fenghua Yu <fenghua.yu@intel.com> > > > > Cc: Dave Jiang <dave.jiang@intel.com> > > > > Cc: Vinod Koul <vkoul@kernel.org> > > > > Cc: dmaengine@vger.kernel.org > > > > Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel > > > > data accelerators") > > > > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > > > > --- > > > > drivers/dma/idxd/irq.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c > > > > index 743ead5ebc57..723eeb5328d6 100644 > > > > --- a/drivers/dma/idxd/irq.c > > > > +++ b/drivers/dma/idxd/irq.c > > > > @@ -52,6 +52,7 @@ static void idxd_device_reinit(struct > > > > work_struct *work) > > > > struct idxd_wq *wq = idxd->wqs[i]; > > > > if (wq->state == IDXD_WQ_ENABLED) { > > > > + wq->state = IDXD_WQ_DISABLED; > > > Might be better off to insert this line in > > > idxd_wq_disable_cleanup(). I > > > think that should put it in sane state. > > I don't think that is called in the code path that I was lookng at. > > I've been > > looking at this bit of process_misc_interrupts(): > > > > halt: > > gensts.bits = ioread32(idxd->reg_base + > > IDXD_GENSTATS_OFFSET); > > if (gensts.state == IDXD_DEVICE_STATE_HALT) { > > idxd->state = IDXD_DEV_HALTED; > > if (gensts.reset_type == > > IDXD_DEVICE_RESET_SOFTWARE) { > > /* > > * If we need a software reset, we will > > throw the work > > * on a system workqueue in order to allow > > interrupts > > * for the device command completions. > > */ > > INIT_WORK(&idxd->work, idxd_device_reinit); > > queue_work(idxd->wq, &idxd->work); > > } else { > > idxd->state = IDXD_DEV_HALTED; > > idxd_wqs_quiesce(idxd); > > idxd_wqs_unmap_portal(idxd); > > spin_lock(&idxd->dev_lock); > > idxd_device_clear_state(idxd); > > dev_err(&idxd->pdev->dev, > > "idxd halted, need %s.\n", > > gensts.reset_type == > > IDXD_DEVICE_RESET_FLR ? > > "FLR" : "system reset"); > > spin_unlock(&idxd->dev_lock); > > return -ENXIO; > > } > > } > > > > return 0; > > } > > > > So it sees that the device is halted, and sticks > > idxd_device_reinint() on that > > workqueue. The idxd_device_reinit() has this loop to re-enable the > > idxd wqs: > > idxd_device_reinit() should called idxd_device_reset() first. And > that > should at some point call idxd_wq_disable_cleanup() and clean up the > states. > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/irq.c#L42 > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L725 > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L711 > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L376 > > So if we stick the wq state reset in there, it should show up as > "disabled" by the time we try to enable the WQs again. Does that look > reasonable? > Ah, yeah I see that now. So, if it does set the state to disabled in idxd_wq_disable_cleanup(), does it have another means to track which wqs need to be re-enabled for that loop that happens after the idxd_device_reset() call? > > > > > for (i = 0; i < idxd->max_wqs; i++) { > > struct idxd_wq *wq = idxd->wqs[i]; > > > > if (wq->state == IDXD_WQ_ENABLED) { > > wq->state = IDXD_WQ_DISABLED; > > rc = idxd_wq_enable(wq); > > if (rc < 0) { > > dev_warn(dev, "Unable to re-enable > > wq %s\n", > > dev_name(wq_confdev(wq))); > > } > > } > > } > > > > Once you go into idxd_wq_enable() though you get this check at the > > beginning: > > > > if (wq->state == IDXD_WQ_ENABLED) { > > dev_dbg(dev, "WQ %d already enabled\n", wq->id); > > return 0; > > } > > > > So IIUC it sees the device is halted, goes to reset it, figures out > > a wq > > should be re-enabled, calls idxd_wq_enable() which hits the check, > > returns > > 0 and the wq is never really re-enabled, though it will still have > > wq state > > set to IDXD_WQ_ENABLED. > > > > Or am I missing something? > > > > Regards, > > Jerry > > > > > > rc = idxd_wq_enable(wq); > > > > if (rc < 0) { > > > > dev_warn(dev, "Unable to re- > > > > enable wq %s\n", >
On 8/24/2022 3:07 PM, Jerry Snitselaar wrote: > On Wed, 2022-08-24 at 14:59 -0700, Dave Jiang wrote: >> On 8/24/2022 2:16 PM, Jerry Snitselaar wrote: >>> On Wed, Aug 24, 2022 at 01:29:03PM -0700, Dave Jiang wrote: >>>> On 8/24/2022 12:29 PM, Jerry Snitselaar wrote: >>>>> For a software reset idxd_device_reinit() is called, which will >>>>> walk >>>>> the device workqueues to see which ones were enabled, and try >>>>> to >>>>> re-enable them. It keys off wq->state being iDXD_WQ_ENABLED, >>>>> but the >>>>> first thing idxd_enable_wq() will do is see that the state of >>>>> the >>>>> workqueue is enabled, and return 0 instead of attempting to >>>>> issue >>>>> a command to enable the workqueue. >>>>> >>>>> So once a workqueue is found that needs to be re-enabled, >>>>> set the state to disabled prior to calling idxd_enable_wq(). >>>>> This would accurately reflect the state if the enable fails >>>>> as well. >>>>> >>>>> Cc: Fenghua Yu <fenghua.yu@intel.com> >>>>> Cc: Dave Jiang <dave.jiang@intel.com> >>>>> Cc: Vinod Koul <vkoul@kernel.org> >>>>> Cc: dmaengine@vger.kernel.org >>>>> Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel >>>>> data accelerators") >>>>> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> >>>>> --- >>>>> drivers/dma/idxd/irq.c | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c >>>>> index 743ead5ebc57..723eeb5328d6 100644 >>>>> --- a/drivers/dma/idxd/irq.c >>>>> +++ b/drivers/dma/idxd/irq.c >>>>> @@ -52,6 +52,7 @@ static void idxd_device_reinit(struct >>>>> work_struct *work) >>>>> struct idxd_wq *wq = idxd->wqs[i]; >>>>> if (wq->state == IDXD_WQ_ENABLED) { >>>>> + wq->state = IDXD_WQ_DISABLED; >>>> Might be better off to insert this line in >>>> idxd_wq_disable_cleanup(). I >>>> think that should put it in sane state. >>> I don't think that is called in the code path that I was lookng at. >>> I've been >>> looking at this bit of process_misc_interrupts(): >>> >>> halt: >>> gensts.bits = ioread32(idxd->reg_base + >>> IDXD_GENSTATS_OFFSET); >>> if (gensts.state == IDXD_DEVICE_STATE_HALT) { >>> idxd->state = IDXD_DEV_HALTED; >>> if (gensts.reset_type == >>> IDXD_DEVICE_RESET_SOFTWARE) { >>> /* >>> * If we need a software reset, we will >>> throw the work >>> * on a system workqueue in order to allow >>> interrupts >>> * for the device command completions. >>> */ >>> INIT_WORK(&idxd->work, idxd_device_reinit); >>> queue_work(idxd->wq, &idxd->work); >>> } else { >>> idxd->state = IDXD_DEV_HALTED; >>> idxd_wqs_quiesce(idxd); >>> idxd_wqs_unmap_portal(idxd); >>> spin_lock(&idxd->dev_lock); >>> idxd_device_clear_state(idxd); >>> dev_err(&idxd->pdev->dev, >>> "idxd halted, need %s.\n", >>> gensts.reset_type == >>> IDXD_DEVICE_RESET_FLR ? >>> "FLR" : "system reset"); >>> spin_unlock(&idxd->dev_lock); >>> return -ENXIO; >>> } >>> } >>> >>> return 0; >>> } >>> >>> So it sees that the device is halted, and sticks >>> idxd_device_reinint() on that >>> workqueue. The idxd_device_reinit() has this loop to re-enable the >>> idxd wqs: >> idxd_device_reinit() should called idxd_device_reset() first. And >> that >> should at some point call idxd_wq_disable_cleanup() and clean up the >> states. >> >> https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/irq.c#L42 >> >> https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L725 >> >> https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L711 >> >> https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L376 >> >> So if we stick the wq state reset in there, it should show up as >> "disabled" by the time we try to enable the WQs again. Does that look >> reasonable? >> > Ah, yeah I see that now. So, if it does set the state to disabled in > idxd_wq_disable_cleanup(), does it have another means to track which > wqs need to be re-enabled for that loop that happens after the > idxd_device_reset() call? Oh I see what you mean... So we can either do what you did or create a mask and mark the WQ that are "enabled" before reset. Maybe that's cleaner rather than relying on the side effect of the WQ state isn't cleared? Thoughts? > >>> for (i = 0; i < idxd->max_wqs; i++) { >>> struct idxd_wq *wq = idxd->wqs[i]; >>> >>> if (wq->state == IDXD_WQ_ENABLED) { >>> wq->state = IDXD_WQ_DISABLED; >>> rc = idxd_wq_enable(wq); >>> if (rc < 0) { >>> dev_warn(dev, "Unable to re-enable >>> wq %s\n", >>> dev_name(wq_confdev(wq))); >>> } >>> } >>> } >>> >>> Once you go into idxd_wq_enable() though you get this check at the >>> beginning: >>> >>> if (wq->state == IDXD_WQ_ENABLED) { >>> dev_dbg(dev, "WQ %d already enabled\n", wq->id); >>> return 0; >>> } >>> >>> So IIUC it sees the device is halted, goes to reset it, figures out >>> a wq >>> should be re-enabled, calls idxd_wq_enable() which hits the check, >>> returns >>> 0 and the wq is never really re-enabled, though it will still have >>> wq state >>> set to IDXD_WQ_ENABLED. >>> >>> Or am I missing something? >>> >>> Regards, >>> Jerry >>> >>>>> rc = idxd_wq_enable(wq); >>>>> if (rc < 0) { >>>>> dev_warn(dev, "Unable to re- >>>>> enable wq %s\n",
On Wed, 2022-08-24 at 15:19 -0700, Dave Jiang wrote: > > On 8/24/2022 3:07 PM, Jerry Snitselaar wrote: > > On Wed, 2022-08-24 at 14:59 -0700, Dave Jiang wrote: > > > On 8/24/2022 2:16 PM, Jerry Snitselaar wrote: > > > > On Wed, Aug 24, 2022 at 01:29:03PM -0700, Dave Jiang wrote: > > > > > On 8/24/2022 12:29 PM, Jerry Snitselaar wrote: > > > > > > For a software reset idxd_device_reinit() is called, which > > > > > > will > > > > > > walk > > > > > > the device workqueues to see which ones were enabled, and > > > > > > try > > > > > > to > > > > > > re-enable them. It keys off wq->state being > > > > > > iDXD_WQ_ENABLED, > > > > > > but the > > > > > > first thing idxd_enable_wq() will do is see that the state > > > > > > of > > > > > > the > > > > > > workqueue is enabled, and return 0 instead of attempting to > > > > > > issue > > > > > > a command to enable the workqueue. > > > > > > > > > > > > So once a workqueue is found that needs to be re-enabled, > > > > > > set the state to disabled prior to calling > > > > > > idxd_enable_wq(). > > > > > > This would accurately reflect the state if the enable fails > > > > > > as well. > > > > > > > > > > > > Cc: Fenghua Yu <fenghua.yu@intel.com> > > > > > > Cc: Dave Jiang <dave.jiang@intel.com> > > > > > > Cc: Vinod Koul <vkoul@kernel.org> > > > > > > Cc: dmaengine@vger.kernel.org > > > > > > Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for > > > > > > Intel > > > > > > data accelerators") > > > > > > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > > > > > > --- > > > > > > drivers/dma/idxd/irq.c | 1 + > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > diff --git a/drivers/dma/idxd/irq.c > > > > > > b/drivers/dma/idxd/irq.c > > > > > > index 743ead5ebc57..723eeb5328d6 100644 > > > > > > --- a/drivers/dma/idxd/irq.c > > > > > > +++ b/drivers/dma/idxd/irq.c > > > > > > @@ -52,6 +52,7 @@ static void idxd_device_reinit(struct > > > > > > work_struct *work) > > > > > > struct idxd_wq *wq = idxd->wqs[i]; > > > > > > if (wq->state == IDXD_WQ_ENABLED) { > > > > > > + wq->state = IDXD_WQ_DISABLED; > > > > > Might be better off to insert this line in > > > > > idxd_wq_disable_cleanup(). I > > > > > think that should put it in sane state. > > > > I don't think that is called in the code path that I was lookng > > > > at. > > > > I've been > > > > looking at this bit of process_misc_interrupts(): > > > > > > > > halt: > > > > gensts.bits = ioread32(idxd->reg_base + > > > > IDXD_GENSTATS_OFFSET); > > > > if (gensts.state == IDXD_DEVICE_STATE_HALT) { > > > > idxd->state = IDXD_DEV_HALTED; > > > > if (gensts.reset_type == > > > > IDXD_DEVICE_RESET_SOFTWARE) { > > > > /* > > > > * If we need a software reset, we > > > > will > > > > throw the work > > > > * on a system workqueue in order to > > > > allow > > > > interrupts > > > > * for the device command completions. > > > > */ > > > > INIT_WORK(&idxd->work, > > > > idxd_device_reinit); > > > > queue_work(idxd->wq, &idxd->work); > > > > } else { > > > > idxd->state = IDXD_DEV_HALTED; > > > > idxd_wqs_quiesce(idxd); > > > > idxd_wqs_unmap_portal(idxd); > > > > spin_lock(&idxd->dev_lock); > > > > idxd_device_clear_state(idxd); > > > > dev_err(&idxd->pdev->dev, > > > > "idxd halted, need %s.\n", > > > > gensts.reset_type == > > > > IDXD_DEVICE_RESET_FLR ? > > > > "FLR" : "system reset"); > > > > spin_unlock(&idxd->dev_lock); > > > > return -ENXIO; > > > > } > > > > } > > > > > > > > return 0; > > > > } > > > > > > > > So it sees that the device is halted, and sticks > > > > idxd_device_reinint() on that > > > > workqueue. The idxd_device_reinit() has this loop to re-enable > > > > the > > > > idxd wqs: > > > idxd_device_reinit() should called idxd_device_reset() first. And > > > that > > > should at some point call idxd_wq_disable_cleanup() and clean up > > > the > > > states. > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/irq.c#L42 > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L725 > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L711 > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L376 > > > > > > So if we stick the wq state reset in there, it should show up as > > > "disabled" by the time we try to enable the WQs again. Does that > > > look > > > reasonable? > > > > > Ah, yeah I see that now. So, if it does set the state to disabled > > in > > idxd_wq_disable_cleanup(), does it have another means to track > > which > > wqs need to be re-enabled for that loop that happens after the > > idxd_device_reset() call? > > Oh I see what you mean... So we can either do what you did or create > a > mask and mark the WQ that are "enabled" before reset. Maybe that's > cleaner rather than relying on the side effect of the WQ state isn't > cleared? Thoughts? > That would be better I think. > > > > > > > for (i = 0; i < idxd->max_wqs; i++) { > > > > struct idxd_wq *wq = idxd->wqs[i]; > > > > > > > > if (wq->state == IDXD_WQ_ENABLED) { > > > > wq->state = IDXD_WQ_DISABLED; > > > > rc = idxd_wq_enable(wq); > > > > if (rc < 0) { > > > > dev_warn(dev, "Unable to re- > > > > enable > > > > wq %s\n", > > > > > > > > dev_name(wq_confdev(wq))); > > > > } > > > > } > > > > } > > > > > > > > Once you go into idxd_wq_enable() though you get this check at > > > > the > > > > beginning: > > > > > > > > if (wq->state == IDXD_WQ_ENABLED) { > > > > dev_dbg(dev, "WQ %d already enabled\n", wq- > > > > >id); > > > > return 0; > > > > } > > > > > > > > So IIUC it sees the device is halted, goes to reset it, figures > > > > out > > > > a wq > > > > should be re-enabled, calls idxd_wq_enable() which hits the > > > > check, > > > > returns > > > > 0 and the wq is never really re-enabled, though it will still > > > > have > > > > wq state > > > > set to IDXD_WQ_ENABLED. > > > > > > > > Or am I missing something? > > > > > > > > Regards, > > > > Jerry > > > > > > > > > > rc = idxd_wq_enable(wq); > > > > > > if (rc < 0) { > > > > > > dev_warn(dev, "Unable to > > > > > > re- > > > > > > enable wq %s\n", >
On Wed, Aug 24, 2022 at 03:19:51PM -0700, Dave Jiang wrote: > > On 8/24/2022 3:07 PM, Jerry Snitselaar wrote: > > On Wed, 2022-08-24 at 14:59 -0700, Dave Jiang wrote: > > > On 8/24/2022 2:16 PM, Jerry Snitselaar wrote: > > > > On Wed, Aug 24, 2022 at 01:29:03PM -0700, Dave Jiang wrote: > > > > > On 8/24/2022 12:29 PM, Jerry Snitselaar wrote: > > > > > > For a software reset idxd_device_reinit() is called, which will > > > > > > walk > > > > > > the device workqueues to see which ones were enabled, and try > > > > > > to > > > > > > re-enable them. It keys off wq->state being iDXD_WQ_ENABLED, > > > > > > but the > > > > > > first thing idxd_enable_wq() will do is see that the state of > > > > > > the > > > > > > workqueue is enabled, and return 0 instead of attempting to > > > > > > issue > > > > > > a command to enable the workqueue. > > > > > > > > > > > > So once a workqueue is found that needs to be re-enabled, > > > > > > set the state to disabled prior to calling idxd_enable_wq(). > > > > > > This would accurately reflect the state if the enable fails > > > > > > as well. > > > > > > > > > > > > Cc: Fenghua Yu <fenghua.yu@intel.com> > > > > > > Cc: Dave Jiang <dave.jiang@intel.com> > > > > > > Cc: Vinod Koul <vkoul@kernel.org> > > > > > > Cc: dmaengine@vger.kernel.org > > > > > > Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel > > > > > > data accelerators") > > > > > > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > > > > > > --- > > > > > > drivers/dma/idxd/irq.c | 1 + > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c > > > > > > index 743ead5ebc57..723eeb5328d6 100644 > > > > > > --- a/drivers/dma/idxd/irq.c > > > > > > +++ b/drivers/dma/idxd/irq.c > > > > > > @@ -52,6 +52,7 @@ static void idxd_device_reinit(struct > > > > > > work_struct *work) > > > > > > struct idxd_wq *wq = idxd->wqs[i]; > > > > > > if (wq->state == IDXD_WQ_ENABLED) { > > > > > > + wq->state = IDXD_WQ_DISABLED; > > > > > Might be better off to insert this line in > > > > > idxd_wq_disable_cleanup(). I > > > > > think that should put it in sane state. > > > > I don't think that is called in the code path that I was lookng at. > > > > I've been > > > > looking at this bit of process_misc_interrupts(): > > > > > > > > halt: > > > > gensts.bits = ioread32(idxd->reg_base + > > > > IDXD_GENSTATS_OFFSET); > > > > if (gensts.state == IDXD_DEVICE_STATE_HALT) { > > > > idxd->state = IDXD_DEV_HALTED; > > > > if (gensts.reset_type == > > > > IDXD_DEVICE_RESET_SOFTWARE) { > > > > /* > > > > * If we need a software reset, we will > > > > throw the work > > > > * on a system workqueue in order to allow > > > > interrupts > > > > * for the device command completions. > > > > */ > > > > INIT_WORK(&idxd->work, idxd_device_reinit); > > > > queue_work(idxd->wq, &idxd->work); > > > > } else { > > > > idxd->state = IDXD_DEV_HALTED; > > > > idxd_wqs_quiesce(idxd); > > > > idxd_wqs_unmap_portal(idxd); > > > > spin_lock(&idxd->dev_lock); > > > > idxd_device_clear_state(idxd); > > > > dev_err(&idxd->pdev->dev, > > > > "idxd halted, need %s.\n", > > > > gensts.reset_type == > > > > IDXD_DEVICE_RESET_FLR ? > > > > "FLR" : "system reset"); > > > > spin_unlock(&idxd->dev_lock); > > > > return -ENXIO; > > > > } > > > > } > > > > > > > > return 0; > > > > } > > > > > > > > So it sees that the device is halted, and sticks > > > > idxd_device_reinint() on that > > > > workqueue. The idxd_device_reinit() has this loop to re-enable the > > > > idxd wqs: > > > idxd_device_reinit() should called idxd_device_reset() first. And > > > that > > > should at some point call idxd_wq_disable_cleanup() and clean up the > > > states. > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/irq.c#L42 > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L725 > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L711 > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L376 > > > > > > So if we stick the wq state reset in there, it should show up as > > > "disabled" by the time we try to enable the WQs again. Does that look > > > reasonable? > > > > > Ah, yeah I see that now. So, if it does set the state to disabled in > > idxd_wq_disable_cleanup(), does it have another means to track which > > wqs need to be re-enabled for that loop that happens after the > > idxd_device_reset() call? > > Oh I see what you mean... So we can either do what you did or create a mask > and mark the WQ that are "enabled" before reset. Maybe that's cleaner rather > than relying on the side effect of the WQ state isn't cleared? Thoughts? > Circling back to this. Since max_wqs could theoretically go up to 2^8, I guess this would need to be done with the bitmap_* functions? Regards, Jerry > > > > > > > for (i = 0; i < idxd->max_wqs; i++) { > > > > struct idxd_wq *wq = idxd->wqs[i]; > > > > > > > > if (wq->state == IDXD_WQ_ENABLED) { > > > > wq->state = IDXD_WQ_DISABLED; > > > > rc = idxd_wq_enable(wq); > > > > if (rc < 0) { > > > > dev_warn(dev, "Unable to re-enable > > > > wq %s\n", > > > > dev_name(wq_confdev(wq))); > > > > } > > > > } > > > > } > > > > > > > > Once you go into idxd_wq_enable() though you get this check at the > > > > beginning: > > > > > > > > if (wq->state == IDXD_WQ_ENABLED) { > > > > dev_dbg(dev, "WQ %d already enabled\n", wq->id); > > > > return 0; > > > > } > > > > > > > > So IIUC it sees the device is halted, goes to reset it, figures out > > > > a wq > > > > should be re-enabled, calls idxd_wq_enable() which hits the check, > > > > returns > > > > 0 and the wq is never really re-enabled, though it will still have > > > > wq state > > > > set to IDXD_WQ_ENABLED. > > > > > > > > Or am I missing something? > > > > > > > > Regards, > > > > Jerry > > > > > > > > > > rc = idxd_wq_enable(wq); > > > > > > if (rc < 0) { > > > > > > dev_warn(dev, "Unable to re- > > > > > > enable wq %s\n",
On 9/17/2022 10:05 AM, Jerry Snitselaar wrote: > On Wed, Aug 24, 2022 at 03:19:51PM -0700, Dave Jiang wrote: >> On 8/24/2022 3:07 PM, Jerry Snitselaar wrote: >>> On Wed, 2022-08-24 at 14:59 -0700, Dave Jiang wrote: >>>> On 8/24/2022 2:16 PM, Jerry Snitselaar wrote: >>>>> On Wed, Aug 24, 2022 at 01:29:03PM -0700, Dave Jiang wrote: >>>>>> On 8/24/2022 12:29 PM, Jerry Snitselaar wrote: >>>>>>> For a software reset idxd_device_reinit() is called, which will >>>>>>> walk >>>>>>> the device workqueues to see which ones were enabled, and try >>>>>>> to >>>>>>> re-enable them. It keys off wq->state being iDXD_WQ_ENABLED, >>>>>>> but the >>>>>>> first thing idxd_enable_wq() will do is see that the state of >>>>>>> the >>>>>>> workqueue is enabled, and return 0 instead of attempting to >>>>>>> issue >>>>>>> a command to enable the workqueue. >>>>>>> >>>>>>> So once a workqueue is found that needs to be re-enabled, >>>>>>> set the state to disabled prior to calling idxd_enable_wq(). >>>>>>> This would accurately reflect the state if the enable fails >>>>>>> as well. >>>>>>> >>>>>>> Cc: Fenghua Yu <fenghua.yu@intel.com> >>>>>>> Cc: Dave Jiang <dave.jiang@intel.com> >>>>>>> Cc: Vinod Koul <vkoul@kernel.org> >>>>>>> Cc: dmaengine@vger.kernel.org >>>>>>> Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel >>>>>>> data accelerators") >>>>>>> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> >>>>>>> --- >>>>>>> drivers/dma/idxd/irq.c | 1 + >>>>>>> 1 file changed, 1 insertion(+) >>>>>>> >>>>>>> diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c >>>>>>> index 743ead5ebc57..723eeb5328d6 100644 >>>>>>> --- a/drivers/dma/idxd/irq.c >>>>>>> +++ b/drivers/dma/idxd/irq.c >>>>>>> @@ -52,6 +52,7 @@ static void idxd_device_reinit(struct >>>>>>> work_struct *work) >>>>>>> struct idxd_wq *wq = idxd->wqs[i]; >>>>>>> if (wq->state == IDXD_WQ_ENABLED) { >>>>>>> + wq->state = IDXD_WQ_DISABLED; >>>>>> Might be better off to insert this line in >>>>>> idxd_wq_disable_cleanup(). I >>>>>> think that should put it in sane state. >>>>> I don't think that is called in the code path that I was lookng at. >>>>> I've been >>>>> looking at this bit of process_misc_interrupts(): >>>>> >>>>> halt: >>>>> gensts.bits = ioread32(idxd->reg_base + >>>>> IDXD_GENSTATS_OFFSET); >>>>> if (gensts.state == IDXD_DEVICE_STATE_HALT) { >>>>> idxd->state = IDXD_DEV_HALTED; >>>>> if (gensts.reset_type == >>>>> IDXD_DEVICE_RESET_SOFTWARE) { >>>>> /* >>>>> * If we need a software reset, we will >>>>> throw the work >>>>> * on a system workqueue in order to allow >>>>> interrupts >>>>> * for the device command completions. >>>>> */ >>>>> INIT_WORK(&idxd->work, idxd_device_reinit); >>>>> queue_work(idxd->wq, &idxd->work); >>>>> } else { >>>>> idxd->state = IDXD_DEV_HALTED; >>>>> idxd_wqs_quiesce(idxd); >>>>> idxd_wqs_unmap_portal(idxd); >>>>> spin_lock(&idxd->dev_lock); >>>>> idxd_device_clear_state(idxd); >>>>> dev_err(&idxd->pdev->dev, >>>>> "idxd halted, need %s.\n", >>>>> gensts.reset_type == >>>>> IDXD_DEVICE_RESET_FLR ? >>>>> "FLR" : "system reset"); >>>>> spin_unlock(&idxd->dev_lock); >>>>> return -ENXIO; >>>>> } >>>>> } >>>>> >>>>> return 0; >>>>> } >>>>> >>>>> So it sees that the device is halted, and sticks >>>>> idxd_device_reinint() on that >>>>> workqueue. The idxd_device_reinit() has this loop to re-enable the >>>>> idxd wqs: >>>> idxd_device_reinit() should called idxd_device_reset() first. And >>>> that >>>> should at some point call idxd_wq_disable_cleanup() and clean up the >>>> states. >>>> >>>> https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/irq.c#L42 >>>> >>>> https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L725 >>>> >>>> https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L711 >>>> >>>> https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L376 >>>> >>>> So if we stick the wq state reset in there, it should show up as >>>> "disabled" by the time we try to enable the WQs again. Does that look >>>> reasonable? >>>> >>> Ah, yeah I see that now. So, if it does set the state to disabled in >>> idxd_wq_disable_cleanup(), does it have another means to track which >>> wqs need to be re-enabled for that loop that happens after the >>> idxd_device_reset() call? >> Oh I see what you mean... So we can either do what you did or create a mask >> and mark the WQ that are "enabled" before reset. Maybe that's cleaner rather >> than relying on the side effect of the WQ state isn't cleared? Thoughts? >> > Circling back to this. Since max_wqs could theoretically go up to 2^8, I guess > this would need to be done with the bitmap_* functions? Hi Jerry, I wouldn't say never but I doubt any time soon for 2^8. DSA 1.0 has 8 WQs, and 2.0 (spec just went public) has 16. But yes we can use bitmap to be future proof. Are you currently working on a fix for this? Just don't want to duplicate effort if you already have something going. Thank you! > > Regards, > Jerry > >>>>> for (i = 0; i < idxd->max_wqs; i++) { >>>>> struct idxd_wq *wq = idxd->wqs[i]; >>>>> >>>>> if (wq->state == IDXD_WQ_ENABLED) { >>>>> wq->state = IDXD_WQ_DISABLED; >>>>> rc = idxd_wq_enable(wq); >>>>> if (rc < 0) { >>>>> dev_warn(dev, "Unable to re-enable >>>>> wq %s\n", >>>>> dev_name(wq_confdev(wq))); >>>>> } >>>>> } >>>>> } >>>>> >>>>> Once you go into idxd_wq_enable() though you get this check at the >>>>> beginning: >>>>> >>>>> if (wq->state == IDXD_WQ_ENABLED) { >>>>> dev_dbg(dev, "WQ %d already enabled\n", wq->id); >>>>> return 0; >>>>> } >>>>> >>>>> So IIUC it sees the device is halted, goes to reset it, figures out >>>>> a wq >>>>> should be re-enabled, calls idxd_wq_enable() which hits the check, >>>>> returns >>>>> 0 and the wq is never really re-enabled, though it will still have >>>>> wq state >>>>> set to IDXD_WQ_ENABLED. >>>>> >>>>> Or am I missing something? >>>>> >>>>> Regards, >>>>> Jerry >>>>> >>>>>>> rc = idxd_wq_enable(wq); >>>>>>> if (rc < 0) { >>>>>>> dev_warn(dev, "Unable to re- >>>>>>> enable wq %s\n",
On Mon, Sep 19, 2022 at 08:28:04AM -0700, Dave Jiang wrote: > > On 9/17/2022 10:05 AM, Jerry Snitselaar wrote: > > On Wed, Aug 24, 2022 at 03:19:51PM -0700, Dave Jiang wrote: > > > On 8/24/2022 3:07 PM, Jerry Snitselaar wrote: > > > > On Wed, 2022-08-24 at 14:59 -0700, Dave Jiang wrote: > > > > > On 8/24/2022 2:16 PM, Jerry Snitselaar wrote: > > > > > > On Wed, Aug 24, 2022 at 01:29:03PM -0700, Dave Jiang wrote: > > > > > > > On 8/24/2022 12:29 PM, Jerry Snitselaar wrote: > > > > > > > > For a software reset idxd_device_reinit() is called, which will > > > > > > > > walk > > > > > > > > the device workqueues to see which ones were enabled, and try > > > > > > > > to > > > > > > > > re-enable them. It keys off wq->state being iDXD_WQ_ENABLED, > > > > > > > > but the > > > > > > > > first thing idxd_enable_wq() will do is see that the state of > > > > > > > > the > > > > > > > > workqueue is enabled, and return 0 instead of attempting to > > > > > > > > issue > > > > > > > > a command to enable the workqueue. > > > > > > > > > > > > > > > > So once a workqueue is found that needs to be re-enabled, > > > > > > > > set the state to disabled prior to calling idxd_enable_wq(). > > > > > > > > This would accurately reflect the state if the enable fails > > > > > > > > as well. > > > > > > > > > > > > > > > > Cc: Fenghua Yu <fenghua.yu@intel.com> > > > > > > > > Cc: Dave Jiang <dave.jiang@intel.com> > > > > > > > > Cc: Vinod Koul <vkoul@kernel.org> > > > > > > > > Cc: dmaengine@vger.kernel.org > > > > > > > > Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel > > > > > > > > data accelerators") > > > > > > > > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > > > > > > > > --- > > > > > > > > drivers/dma/idxd/irq.c | 1 + > > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > > > diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c > > > > > > > > index 743ead5ebc57..723eeb5328d6 100644 > > > > > > > > --- a/drivers/dma/idxd/irq.c > > > > > > > > +++ b/drivers/dma/idxd/irq.c > > > > > > > > @@ -52,6 +52,7 @@ static void idxd_device_reinit(struct > > > > > > > > work_struct *work) > > > > > > > > struct idxd_wq *wq = idxd->wqs[i]; > > > > > > > > if (wq->state == IDXD_WQ_ENABLED) { > > > > > > > > + wq->state = IDXD_WQ_DISABLED; > > > > > > > Might be better off to insert this line in > > > > > > > idxd_wq_disable_cleanup(). I > > > > > > > think that should put it in sane state. > > > > > > I don't think that is called in the code path that I was lookng at. > > > > > > I've been > > > > > > looking at this bit of process_misc_interrupts(): > > > > > > > > > > > > halt: > > > > > > gensts.bits = ioread32(idxd->reg_base + > > > > > > IDXD_GENSTATS_OFFSET); > > > > > > if (gensts.state == IDXD_DEVICE_STATE_HALT) { > > > > > > idxd->state = IDXD_DEV_HALTED; > > > > > > if (gensts.reset_type == > > > > > > IDXD_DEVICE_RESET_SOFTWARE) { > > > > > > /* > > > > > > * If we need a software reset, we will > > > > > > throw the work > > > > > > * on a system workqueue in order to allow > > > > > > interrupts > > > > > > * for the device command completions. > > > > > > */ > > > > > > INIT_WORK(&idxd->work, idxd_device_reinit); > > > > > > queue_work(idxd->wq, &idxd->work); > > > > > > } else { > > > > > > idxd->state = IDXD_DEV_HALTED; > > > > > > idxd_wqs_quiesce(idxd); > > > > > > idxd_wqs_unmap_portal(idxd); > > > > > > spin_lock(&idxd->dev_lock); > > > > > > idxd_device_clear_state(idxd); > > > > > > dev_err(&idxd->pdev->dev, > > > > > > "idxd halted, need %s.\n", > > > > > > gensts.reset_type == > > > > > > IDXD_DEVICE_RESET_FLR ? > > > > > > "FLR" : "system reset"); > > > > > > spin_unlock(&idxd->dev_lock); > > > > > > return -ENXIO; > > > > > > } > > > > > > } > > > > > > > > > > > > return 0; > > > > > > } > > > > > > > > > > > > So it sees that the device is halted, and sticks > > > > > > idxd_device_reinint() on that > > > > > > workqueue. The idxd_device_reinit() has this loop to re-enable the > > > > > > idxd wqs: > > > > > idxd_device_reinit() should called idxd_device_reset() first. And > > > > > that > > > > > should at some point call idxd_wq_disable_cleanup() and clean up the > > > > > states. > > > > > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/irq.c#L42 > > > > > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L725 > > > > > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L711 > > > > > > > > > > https://elixir.bootlin.com/linux/v6.0-rc2/source/drivers/dma/idxd/device.c#L376 > > > > > > > > > > So if we stick the wq state reset in there, it should show up as > > > > > "disabled" by the time we try to enable the WQs again. Does that look > > > > > reasonable? > > > > > > > > > Ah, yeah I see that now. So, if it does set the state to disabled in > > > > idxd_wq_disable_cleanup(), does it have another means to track which > > > > wqs need to be re-enabled for that loop that happens after the > > > > idxd_device_reset() call? > > > Oh I see what you mean... So we can either do what you did or create a mask > > > and mark the WQ that are "enabled" before reset. Maybe that's cleaner rather > > > than relying on the side effect of the WQ state isn't cleared? Thoughts? > > > > > Circling back to this. Since max_wqs could theoretically go up to 2^8, I guess > > this would need to be done with the bitmap_* functions? > > Hi Jerry, > > I wouldn't say never but I doubt any time soon for 2^8. DSA 1.0 has 8 WQs, > and 2.0 (spec just went public) has 16. But yes we can use bitmap to be > future proof. Are you currently working on a fix for this? Just don't want > to duplicate effort if you already have something going. Thank you! > Hi Dave, Yes, I will try to send it this afternoon. Regards, Jerry > > > > > Regards, > > Jerry > > > > > > > > for (i = 0; i < idxd->max_wqs; i++) { > > > > > > struct idxd_wq *wq = idxd->wqs[i]; > > > > > > > > > > > > if (wq->state == IDXD_WQ_ENABLED) { > > > > > > wq->state = IDXD_WQ_DISABLED; > > > > > > rc = idxd_wq_enable(wq); > > > > > > if (rc < 0) { > > > > > > dev_warn(dev, "Unable to re-enable > > > > > > wq %s\n", > > > > > > dev_name(wq_confdev(wq))); > > > > > > } > > > > > > } > > > > > > } > > > > > > > > > > > > Once you go into idxd_wq_enable() though you get this check at the > > > > > > beginning: > > > > > > > > > > > > if (wq->state == IDXD_WQ_ENABLED) { > > > > > > dev_dbg(dev, "WQ %d already enabled\n", wq->id); > > > > > > return 0; > > > > > > } > > > > > > > > > > > > So IIUC it sees the device is halted, goes to reset it, figures out > > > > > > a wq > > > > > > should be re-enabled, calls idxd_wq_enable() which hits the check, > > > > > > returns > > > > > > 0 and the wq is never really re-enabled, though it will still have > > > > > > wq state > > > > > > set to IDXD_WQ_ENABLED. > > > > > > > > > > > > Or am I missing something? > > > > > > > > > > > > Regards, > > > > > > Jerry > > > > > > > > > > > > > > rc = idxd_wq_enable(wq); > > > > > > > > if (rc < 0) { > > > > > > > > dev_warn(dev, "Unable to re- > > > > > > > > enable wq %s\n",
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index 743ead5ebc57..723eeb5328d6 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -52,6 +52,7 @@ static void idxd_device_reinit(struct work_struct *work) struct idxd_wq *wq = idxd->wqs[i]; if (wq->state == IDXD_WQ_ENABLED) { + wq->state = IDXD_WQ_DISABLED; rc = idxd_wq_enable(wq); if (rc < 0) { dev_warn(dev, "Unable to re-enable wq %s\n",
For a software reset idxd_device_reinit() is called, which will walk the device workqueues to see which ones were enabled, and try to re-enable them. It keys off wq->state being iDXD_WQ_ENABLED, but the first thing idxd_enable_wq() will do is see that the state of the workqueue is enabled, and return 0 instead of attempting to issue a command to enable the workqueue. So once a workqueue is found that needs to be re-enabled, set the state to disabled prior to calling idxd_enable_wq(). This would accurately reflect the state if the enable fails as well. Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vinod Koul <vkoul@kernel.org> Cc: dmaengine@vger.kernel.org Fixes: bfe1d56091c1 ("dmaengine: idxd: Init and probe for Intel data accelerators") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> --- drivers/dma/idxd/irq.c | 1 + 1 file changed, 1 insertion(+)