| Message ID | 5f5a1e4e90f3625cea57ffa79fc0e5bcb7efe09d.1548963371.git.swise@opengridcomputing.com |
|---|---|
| State | Changes Requested |
| Series | [rdma-next] RDMA/addr: create addr_wq with WQ_MEM_RECLAIM flag |
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org <linux-rdma-owner@vger.kernel.org> On Behalf Of Steve Wise
> Sent: Thursday, January 31, 2019 1:31 PM
> To: dledford@redhat.com; Jason Gunthorpe <jgg@mellanox.com>
> Cc: linux-rdma@vger.kernel.org
> Subject: [PATCH rdma-next] RDMA/addr: create addr_wq with WQ_MEM_RECLAIM flag
>
> While running NVMe/oF wire unplug tests, we hit this warning in
> kernel/workqueue.c:check_flush_dependency():
>
>   WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
>             (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
>             "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
>             worker->current_pwq->wq->name, worker->current_func,
>             target_wq->name, target_func);
>
> Which I think means we're flushing a workqueue that doesn't have
> WQ_MEM_RECLAIM set from workqueue context that does have it set.
>
> Looking at rdma_addr_cancel(), which is doing the flushing: it flushes
> addr_wq, which doesn't have WQ_MEM_RECLAIM set. Yet rdma_addr_cancel()
> is being called by the nvme host connection timeout/reconnect workqueue
> thread, which does have WQ_MEM_RECLAIM set.
>
> So set WQ_MEM_RECLAIM on the addr_req workqueue.

Please add the Fixes tag below [1].
I removed this flag based on commit [2] from Sagi and discussion [3],
both of which I now think are incorrect.

The memory reclaim path can be triggered by processes or by the kernel,
and I think it is fine to allocate memory in a work item, whether that
work was triggered as part of the reclaim path or otherwise; memory
allocation in a workqueue has nothing to do with memory reclaim. I wish
Tejun or Christoph would confirm this is right, or correct our
understanding.

Based on these past discussions, it appears to me that the
WQ_MEM_RECLAIM limitation and its intended use are not clear.

Commit [2] also requires a fix. [3] is the past discussion thread,
offered as a starting point, in which the flag was dropped, incorrectly
in my view. If [3] is right, there are several workqueues which need a
fix, and the proposed patch doesn't make sense either.

[1] Fixes: 39baf10310e6 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM")
[2] cb93e597779e ("cm: Don't allocate ib_cm workqueue with WQ_MEM_RECLAIM")
[3] https://www.spinics.net/lists/linux-rdma/msg53314.html

Reviewed-by: Parav Pandit <parav@mellanox.com>

This was reported internally a while back too; it has been on my TODO
list for a while now, so thanks for the fix. Let's get clarity on
WQ_MEM_RECLAIM once and for all. I would be happy to submit a
documentation update patch once it's clear.

> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
>  drivers/infiniband/core/addr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index 0dce94e3c495..1d88ee3ac8c7 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -868,7 +868,7 @@ static int netevent_callback(struct notifier_block *self, unsigned long event,
>
>  int addr_init(void)
>  {
> -	addr_wq = alloc_ordered_workqueue("ib_addr", 0);
> +	addr_wq = alloc_ordered_workqueue("ib_addr", WQ_MEM_RECLAIM);
>  	if (!addr_wq)
>  		return -ENOMEM;
>
> --
> 1.8.3.1
On Thu, Jan 31, 2019 at 01:56:15PM -0700, Parav Pandit wrote:
> > From: linux-rdma-owner@vger.kernel.org <linux-rdma-owner@vger.kernel.org> On Behalf Of Steve Wise
> > Sent: Thursday, January 31, 2019 1:31 PM
> > To: dledford@redhat.com; Jason Gunthorpe <jgg@mellanox.com>
> > Cc: linux-rdma@vger.kernel.org
> > Subject: [PATCH rdma-next] RDMA/addr: create addr_wq with WQ_MEM_RECLAIM flag
> >
> > While running NVMe/oF wire unplug tests, we hit this warning in
> > kernel/workqueue.c:check_flush_dependency():
> >
> >   WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> >             (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
> >             "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
> >             worker->current_pwq->wq->name, worker->current_func,
> >             target_wq->name, target_func);
> >
> > Which I think means we're flushing a workqueue that doesn't have
> > WQ_MEM_RECLAIM set from workqueue context that does have it set.
> >
> > Looking at rdma_addr_cancel(), which is doing the flushing: it flushes
> > addr_wq, which doesn't have WQ_MEM_RECLAIM set. Yet rdma_addr_cancel()
> > is being called by the nvme host connection timeout/reconnect workqueue
> > thread, which does have WQ_MEM_RECLAIM set.
> >
> > So set WQ_MEM_RECLAIM on the addr_req workqueue.
>
> Please add the Fixes tag below [1].
> I removed this flag based on commit [2] from Sagi and discussion [3],
> both of which I now think are incorrect.
>
> The memory reclaim path can be triggered by processes or by the kernel,
> and I think it is fine to allocate memory in a work item, whether that
> work was triggered as part of the reclaim path or otherwise; memory
> allocation in a workqueue has nothing to do with memory reclaim. I wish
> Tejun or Christoph would confirm this is right, or correct our
> understanding.

Considering that the only thing WQ_MEM_RECLAIM does is pre-allocate an
execution context to guarantee forward progress when the system is
unable to succeed in memory allocations, it seems a bit goofy to say
that it is then OK to block on memory allocations inside that precious
execution context.

I would also like to understand what the rules for this flag should be
before applying this patch.

> Based on these past discussions, it appears to me that the
> WQ_MEM_RECLAIM limitation and its intended use are not clear.

I've always thought it was designed to guarantee forward progress when
memory cannot be allocated.

Jason
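To make the flush dependency above concrete: a minimal, hypothetical
module along the following lines should trip the same WARN_ONCE(), with
no RDMA or NVMe involvement at all. Every name here (demo_reclaim,
demo_plain, trigger_work) is made up for illustration; this is a sketch
of the dependency rule, not code from any tree.

/*
 * Illustrative only: a work item on a WQ_MEM_RECLAIM workqueue flushes
 * a workqueue created without the flag, which is exactly the pattern
 * check_flush_dependency() warns about.
 */
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_reclaim;	/* has a rescuer thread */
static struct workqueue_struct *demo_plain;	/* no forward-progress guarantee */

static void trigger_fn(struct work_struct *work)
{
	/*
	 * We run on demo_reclaim's guaranteed execution context, but
	 * completing now depends on demo_plain finding a worker, which
	 * can stall indefinitely under memory pressure.
	 */
	flush_workqueue(demo_plain);
}
static DECLARE_WORK(trigger_work, trigger_fn);

static int __init flushdep_demo_init(void)
{
	demo_reclaim = alloc_workqueue("demo_reclaim", WQ_MEM_RECLAIM, 0);
	demo_plain = alloc_ordered_workqueue("demo_plain", 0);
	if (!demo_reclaim || !demo_plain) {
		if (demo_reclaim)
			destroy_workqueue(demo_reclaim);
		if (demo_plain)
			destroy_workqueue(demo_plain);
		return -ENOMEM;
	}
	queue_work(demo_reclaim, &trigger_work);	/* fires the WARN_ONCE */
	return 0;
}

static void __exit flushdep_demo_exit(void)
{
	destroy_workqueue(demo_reclaim);	/* drains trigger_work first */
	destroy_workqueue(demo_plain);
}

module_init(flushdep_demo_init);
module_exit(flushdep_demo_exit);
MODULE_LICENSE("GPL");

The patch under review corresponds to also giving demo_plain the
WQ_MEM_RECLAIM flag; Jason's later suggestion corresponds to not issuing
the flush from reclaim context in the first place.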
+tj@kernel.org
+hch@infradead.org
+sagi@

Hey Tejun, Christoph, and Sagi,

We seem to be (still) unclear on when and why workqueues should be
created with WQ_MEM_RECLAIM. Care to comment on this patch, as well as
on the earlier discussion in [3] that led to removing WQ_MEM_RECLAIM
from all the RDMA workqueues?

More below:

On 1/31/2019 2:56 PM, Parav Pandit wrote:
>> -----Original Message-----
>> From: linux-rdma-owner@vger.kernel.org <linux-rdma-owner@vger.kernel.org> On Behalf Of Steve Wise
>> Sent: Thursday, January 31, 2019 1:31 PM
>> To: dledford@redhat.com; Jason Gunthorpe <jgg@mellanox.com>
>> Cc: linux-rdma@vger.kernel.org
>> Subject: [PATCH rdma-next] RDMA/addr: create addr_wq with WQ_MEM_RECLAIM flag
>>
>> While running NVMe/oF wire unplug tests, we hit this warning in
>> kernel/workqueue.c:check_flush_dependency():
>>
>>   WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
>>             (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
>>             "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
>>             worker->current_pwq->wq->name, worker->current_func,
>>             target_wq->name, target_func);
>>
>> Which I think means we're flushing a workqueue that doesn't have
>> WQ_MEM_RECLAIM set from workqueue context that does have it set.
>>
>> Looking at rdma_addr_cancel(), which is doing the flushing: it flushes
>> addr_wq, which doesn't have WQ_MEM_RECLAIM set. Yet rdma_addr_cancel()
>> is being called by the nvme host connection timeout/reconnect workqueue
>> thread, which does have WQ_MEM_RECLAIM set.
>>
>> So set WQ_MEM_RECLAIM on the addr_req workqueue.
>>
> Please add the Fixes tag below [1].
> I removed this flag based on commit [2] from Sagi and discussion [3],
> both of which I now think are incorrect.
>
> The memory reclaim path can be triggered by processes or by the kernel,
> and I think it is fine to allocate memory in a work item, whether that
> work was triggered as part of the reclaim path or otherwise; memory
> allocation in a workqueue has nothing to do with memory reclaim. I wish
> Tejun or Christoph would confirm this is right, or correct our
> understanding.
>
> Based on these past discussions, it appears to me that the
> WQ_MEM_RECLAIM limitation and its intended use are not clear.
>
> Commit [2] also requires a fix. [3] is the past discussion thread,
> offered as a starting point, in which the flag was dropped, incorrectly
> in my view. If [3] is right, there are several workqueues which need a
> fix, and the proposed patch doesn't make sense either.
>
> [1] Fixes: 39baf10310e6 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM")
> [2] cb93e597779e ("cm: Don't allocate ib_cm workqueue with WQ_MEM_RECLAIM")
> [3] https://www.spinics.net/lists/linux-rdma/msg53314.html
>
> Reviewed-by: Parav Pandit <parav@mellanox.com>
>
> This was reported internally a while back too; it has been on my TODO
> list for a while now, so thanks for the fix. Let's get clarity on
> WQ_MEM_RECLAIM once and for all. I would be happy to submit a
> documentation update patch once it's clear.

Funny, I reported [3], which led to removing WQ_MEM_RECLAIM from all the
rdma/core workqueues [1] and [2]. If that was the correct thing to do,
then perhaps the nvme host workqueue, nvme_wq from nvme/host/core.c,
needs WQ_MEM_RECLAIM removed as well:

...
int __init nvme_core_init(void)
{
	int result = -ENOMEM;

	nvme_wq = alloc_workqueue("nvme-wq",
			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
	if (!nvme_wq)
		goto out;
...
On Thu, Jan 31, 2019 at 09:00:42PM +0000, Jason Gunthorpe wrote:
> On Thu, Jan 31, 2019 at 01:56:15PM -0700, Parav Pandit wrote:
> > > From: linux-rdma-owner@vger.kernel.org <linux-rdma-owner@vger.kernel.org> On Behalf Of Steve Wise
> > > Sent: Thursday, January 31, 2019 1:31 PM
> > > To: dledford@redhat.com; Jason Gunthorpe <jgg@mellanox.com>
> > > Cc: linux-rdma@vger.kernel.org
> > > Subject: [PATCH rdma-next] RDMA/addr: create addr_wq with WQ_MEM_RECLAIM flag
> > >
> > > While running NVMe/oF wire unplug tests, we hit this warning in
> > > kernel/workqueue.c:check_flush_dependency():
> > >
> > >   WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> > >             (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
> > >             "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
> > >             worker->current_pwq->wq->name, worker->current_func,
> > >             target_wq->name, target_func);
> > >
> > > Which I think means we're flushing a workqueue that doesn't have
> > > WQ_MEM_RECLAIM set from workqueue context that does have it set.
> > >
> > > Looking at rdma_addr_cancel(), which is doing the flushing: it flushes
> > > addr_wq, which doesn't have WQ_MEM_RECLAIM set. Yet rdma_addr_cancel()
> > > is being called by the nvme host connection timeout/reconnect workqueue
> > > thread, which does have WQ_MEM_RECLAIM set.
> > >
> > > So set WQ_MEM_RECLAIM on the addr_req workqueue.
> >
> > Please add the Fixes tag below [1].
> > I removed this flag based on commit [2] from Sagi and discussion [3],
> > both of which I now think are incorrect.
> >
> > The memory reclaim path can be triggered by processes or by the kernel,
> > and I think it is fine to allocate memory in a work item, whether that
> > work was triggered as part of the reclaim path or otherwise; memory
> > allocation in a workqueue has nothing to do with memory reclaim. I wish
> > Tejun or Christoph would confirm this is right, or correct our
> > understanding.
>
> Considering that the only thing WQ_MEM_RECLAIM does is pre-allocate an
> execution context to guarantee forward progress when the system is
> unable to succeed in memory allocations, it seems a bit goofy to say
> that it is then OK to block on memory allocations inside that precious
> execution context.
>
> I would also like to understand what the rules for this flag should be
> before applying this patch.
>
> > Based on these past discussions, it appears to me that the
> > WQ_MEM_RECLAIM limitation and its intended use are not clear.
>
> I've always thought it was designed to guarantee forward progress when
> memory cannot be allocated.

I understood it much the same way, except for the last part; my version
is: "it was designed to guarantee forward progress, and to be
prioritized, when the memory shrinker is invoked."

Thanks

> Jason
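On the allocation question raised above: when a work item really must
run under WQ_MEM_RECLAIM and also allocate memory, one established
kernel pattern is to scope its allocations so they cannot recurse into
I/O-driven reclaim, using memalloc_noio_save()/memalloc_noio_restore().
The sketch below is purely illustrative of that pattern; nobody in this
thread proposed it, and reclaim_path_work_fn is a made-up name.

#include <linux/sched/mm.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* Hypothetical work handler for a WQ_MEM_RECLAIM workqueue. */
static void reclaim_path_work_fn(struct work_struct *work)
{
	unsigned int noio_flags;
	void *buf;

	/*
	 * All allocations in this scope behave as GFP_NOIO: they may
	 * still fail, but they cannot wait on I/O-based reclaim while
	 * this worker is itself part of the reclaim path.
	 */
	noio_flags = memalloc_noio_save();
	buf = kzalloc(4096, GFP_KERNEL);
	memalloc_noio_restore(noio_flags);

	if (!buf)
		return;		/* caller must tolerate failure */
	/* ... do work with buf ... */
	kfree(buf);
}

This sidesteps the recursion hazard but not the forward-progress
question: the handler still has to cope with allocation failure rather
than rely on reclaim making memory available.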
On Thu, Jan 31, 2019 at 11:30:42AM -0800, Steve Wise wrote:
> While running NVMe/oF wire unplug tests, we hit this warning in
> kernel/workqueue.c:check_flush_dependency():
>
>   WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
>             (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
>             "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
>             worker->current_pwq->wq->name, worker->current_func,
>             target_wq->name, target_func);
>
> Which I think means we're flushing a workqueue that doesn't have
> WQ_MEM_RECLAIM set from workqueue context that does have it set.
>
> Looking at rdma_addr_cancel(), which is doing the flushing: it flushes
> addr_wq, which doesn't have WQ_MEM_RECLAIM set. Yet rdma_addr_cancel()
> is being called by the nvme host connection timeout/reconnect workqueue
> thread, which does have WQ_MEM_RECLAIM set.

Since we haven't learned anything more, I think you should look to
remove either the WQ_MEM_RECLAIM or the rdma_addr_cancel() from the
nvme side.

Jason
On 2/5/2019 4:39 PM, Jason Gunthorpe wrote:
> On Thu, Jan 31, 2019 at 11:30:42AM -0800, Steve Wise wrote:
>> While running NVMe/oF wire unplug tests, we hit this warning in
>> kernel/workqueue.c:check_flush_dependency():
>>
>>   WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
>>             (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
>>             "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
>>             worker->current_pwq->wq->name, worker->current_func,
>>             target_wq->name, target_func);
>>
>> Which I think means we're flushing a workqueue that doesn't have
>> WQ_MEM_RECLAIM set from workqueue context that does have it set.
>>
>> Looking at rdma_addr_cancel(), which is doing the flushing: it flushes
>> addr_wq, which doesn't have WQ_MEM_RECLAIM set. Yet rdma_addr_cancel()
>> is being called by the nvme host connection timeout/reconnect workqueue
>> thread, which does have WQ_MEM_RECLAIM set.
> Since we haven't learned anything more, I think you should look to
> remove either the WQ_MEM_RECLAIM or the rdma_addr_cancel() from the
> nvme side.

The nvme code is just calling rdma_destroy_id(), which in turn calls
rdma_addr_cancel(), so I'll have to remove WQ_MEM_RECLAIM from the nvme
workqueue.

I'll post that patch then.
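For reference, the nvme-side change described here would presumably look
something like the following against the nvme_core_init() code quoted
earlier (hunk offsets elided; this sketch was never actually posted in
the thread):

--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
-	nvme_wq = alloc_workqueue("nvme-wq",
-			WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
+	nvme_wq = alloc_workqueue("nvme-wq", WQ_UNBOUND | WQ_SYSFS, 0);

As the follow-up below notes, though, dropping the flag would seem to
undo whatever guarantee c669ccdc50c2 ("nvme: queue ns scanning and async
request from nvme_wq") was relying on.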
On 2/6/2019 3:52 PM, Steve Wise wrote:
> On 2/5/2019 4:39 PM, Jason Gunthorpe wrote:
>> On Thu, Jan 31, 2019 at 11:30:42AM -0800, Steve Wise wrote:
>>> While running NVMe/oF wire unplug tests, we hit this warning in
>>> kernel/workqueue.c:check_flush_dependency():
>>>
>>>   WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
>>>             (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
>>>             "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
>>>             worker->current_pwq->wq->name, worker->current_func,
>>>             target_wq->name, target_func);
>>>
>>> Which I think means we're flushing a workqueue that doesn't have
>>> WQ_MEM_RECLAIM set from workqueue context that does have it set.
>>>
>>> Looking at rdma_addr_cancel(), which is doing the flushing: it flushes
>>> addr_wq, which doesn't have WQ_MEM_RECLAIM set. Yet rdma_addr_cancel()
>>> is being called by the nvme host connection timeout/reconnect workqueue
>>> thread, which does have WQ_MEM_RECLAIM set.
>> Since we haven't learned anything more, I think you should look to
>> remove either the WQ_MEM_RECLAIM or the rdma_addr_cancel() from the
>> nvme side.
> The nvme code is just calling rdma_destroy_id(), which in turn calls
> rdma_addr_cancel(), so I'll have to remove WQ_MEM_RECLAIM from the nvme
> workqueue.
>
> I'll post that patch then.

What a mess. If I remove WQ_MEM_RECLAIM from nvme_wq, then I think I'll
regress this change:

c669ccdc50c2 ("nvme: queue ns scanning and async request from nvme_wq")
On Thursday, February 02/07/19, 2019 at 03:49:47 +0530, Steve Wise wrote:
> On 2/6/2019 3:52 PM, Steve Wise wrote:
> > On 2/5/2019 4:39 PM, Jason Gunthorpe wrote:
> >> On Thu, Jan 31, 2019 at 11:30:42AM -0800, Steve Wise wrote:
> >>> While running NVMe/oF wire unplug tests, we hit this warning in
> >>> kernel/workqueue.c:check_flush_dependency():
> >>>
> >>>   WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
> >>>             (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
> >>>             "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
> >>>             worker->current_pwq->wq->name, worker->current_func,
> >>>             target_wq->name, target_func);
> >>>
> >>> Which I think means we're flushing a workqueue that doesn't have
> >>> WQ_MEM_RECLAIM set from workqueue context that does have it set.
> >>>
> >>> Looking at rdma_addr_cancel(), which is doing the flushing: it flushes
> >>> addr_wq, which doesn't have WQ_MEM_RECLAIM set. Yet rdma_addr_cancel()
> >>> is being called by the nvme host connection timeout/reconnect workqueue
> >>> thread, which does have WQ_MEM_RECLAIM set.
> >> Since we haven't learned anything more, I think you should look to
> >> remove either the WQ_MEM_RECLAIM or the rdma_addr_cancel() from the
> >> nvme side.
> > The nvme code is just calling rdma_destroy_id(), which in turn calls
> > rdma_addr_cancel(), so I'll have to remove WQ_MEM_RECLAIM from the nvme
> > workqueue.
> >
> > I'll post that patch then.
>
> What a mess. If I remove WQ_MEM_RECLAIM from nvme_wq, then I think I'll
> regress this change:
>
> c669ccdc50c2 ("nvme: queue ns scanning and async request from nvme_wq")

Hi All,

I see this issue with some basic tests, and it looks like the discussion
ended here. Please suggest where this ideally needs to be fixed, so that
I can try fixing it.

Thanks,
Bharat
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 0dce94e3c495..1d88ee3ac8c7 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -868,7 +868,7 @@ static int netevent_callback(struct notifier_block *self, unsigned long event,

 int addr_init(void)
 {
-	addr_wq = alloc_ordered_workqueue("ib_addr", 0);
+	addr_wq = alloc_ordered_workqueue("ib_addr", WQ_MEM_RECLAIM);
 	if (!addr_wq)
 		return -ENOMEM;
While running NVMe/oF wire unplug tests, we hit this warning in
kernel/workqueue.c:check_flush_dependency():

  WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
            (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
            "workqueue: WQ_MEM_RECLAIM %s:%pf is flushing !WQ_MEM_RECLAIM %s:%pf",
            worker->current_pwq->wq->name, worker->current_func,
            target_wq->name, target_func);

Which I think means we're flushing a workqueue that doesn't have
WQ_MEM_RECLAIM set from workqueue context that does have it set.

Looking at rdma_addr_cancel(), which is doing the flushing: it flushes
addr_wq, which doesn't have WQ_MEM_RECLAIM set. Yet rdma_addr_cancel()
is being called by the nvme host connection timeout/reconnect workqueue
thread, which does have WQ_MEM_RECLAIM set.

So set WQ_MEM_RECLAIM on the addr_req workqueue.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/core/addr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)