Message ID | 20230523155408.48594-1-mlombard@redhat.com |
---|---|
State | Rejected |
Series | Revert "IB/core: Fix use workqueue without WQ_MEM_RECLAIM" |
On Tue, May 23, 2023 at 05:54:08PM +0200, Maurizio Lombardi wrote:
> when nvme_rdma_reconnect_ctrl_work() fails, it flushes
> the ib_addr:process_one_req() work but the latter is enqueued
> on addr_wq which has been marked as "!WQ_MEM_RECLAIM".
>
> workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
> [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]

And why does nvme-wq need WQ_MEM_RECLAIM flag? I wonder if it is really
needed.

Thanks
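For readers following along, here is a minimal sketch of the pattern the splat above describes, with assumed module and function names (not the real nvme_rdma or ib_core code): a work item running on a WQ_MEM_RECLAIM workqueue flushes a work item queued on a workqueue created without that flag, which is exactly what check_flush_dependency() in kernel/workqueue.c warns about.

/*
 * Minimal sketch, assumed names only -- not the real nvme_rdma/ib_core
 * code. A work item running on a WQ_MEM_RECLAIM queue flushes a work
 * item that lives on a queue created without the flag, so
 * check_flush_dependency() prints the warning quoted above.
 */
#include <linux/delay.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *reclaim_wq;	/* stands in for nvme-wq */
static struct workqueue_struct *plain_wq;	/* stands in for ib_addr's addr_wq */

static void plain_fn(struct work_struct *w)
{
	/* Stay busy for a while so the flush below really has to wait. */
	msleep(100);
}
static DECLARE_WORK(plain_work, plain_fn);

static void reclaim_fn(struct work_struct *w)
{
	/* Flushing across the reclaim boundary triggers the warning. */
	flush_work(&plain_work);
}
static DECLARE_WORK(reclaim_work, reclaim_fn);

static int __init wq_flush_demo_init(void)
{
	reclaim_wq = alloc_workqueue("demo_reclaim", WQ_MEM_RECLAIM, 0);
	plain_wq = alloc_ordered_workqueue("demo_plain", 0);
	if (!reclaim_wq || !plain_wq)
		goto err;

	queue_work(plain_wq, &plain_work);
	queue_work(reclaim_wq, &reclaim_work);	/* splat fires in reclaim_fn() */
	return 0;

err:
	if (reclaim_wq)
		destroy_workqueue(reclaim_wq);
	if (plain_wq)
		destroy_workqueue(plain_wq);
	return -ENOMEM;
}

static void __exit wq_flush_demo_exit(void)
{
	destroy_workqueue(reclaim_wq);
	destroy_workqueue(plain_wq);
}

module_init(wq_flush_demo_init);
module_exit(wq_flush_demo_exit);
MODULE_LICENSE("GPL");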
On Tue, 23 May 2023 at 20:28, Leon Romanovsky <leon@kernel.org> wrote:
>
> > workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
> > [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
>
> And why does nvme-wq need WQ_MEM_RECLAIM flag? I wonder if it is really
> needed.

Adding Sagi Grimberg to cc, he probably knows and can explain it better than me.

Maurizio
>>>> workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
>>>> [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
>>>
>>> And why does nvme-wq need WQ_MEM_RECLAIM flag? I wonder if it is really
>>> needed.
>>
>> Adding Sagi Grimberg to cc, he probably knows and can explain it better than me.
>
> We already allocate so much memory on these paths it is pretty
> nonsense to claim they are a reclaim context. One allocation on the WQ
> is not going to be the problem.
>
> Probably this nvme stuff should not be re-using a reclaim marked WQ
> for memory allocating work like this, it is kind of nonsensical.

A controller reset runs on this workqueue, which should succeed to allow
for pages to be flushed to the nvme disk. So I'd say it's kind of
essential that this sequence has a rescuer thread.
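The rescuer guarantee Sagi refers to comes from WQ_MEM_RECLAIM: such a workqueue keeps a dedicated rescuer thread, so its work items can still make forward progress when the system is too short on memory to create new kworkers. A rough sketch of allocating such a queue, loosely modelled on how nvme core sets up nvme-wq (the name and exact flags below are illustrative, not copied from the driver):

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *reset_wq;	/* illustrative name */

static int __init reset_wq_init(void)
{
	/*
	 * WQ_MEM_RECLAIM reserves a rescuer thread, so a controller
	 * reset queued here can run even while memory reclaim is
	 * waiting for dirty pages to be written back to that same
	 * device.
	 */
	reset_wq = alloc_workqueue("demo-reset-wq",
				   WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
	if (!reset_wq)
		return -ENOMEM;
	return 0;
}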
On Tue, Jun 06, 2023 at 02:01:04AM +0300, Sagi Grimberg wrote:
> > > > > workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
> > > > > [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
> > > >
> > > > And why does nvme-wq need WQ_MEM_RECLAIM flag? I wonder if it is really
> > > > needed.
> > >
> > > Adding Sagi Grimberg to cc, he probably knows and can explain it better than me.
> >
> > We already allocate so much memory on these paths it is pretty
> > nonsense to claim they are a reclaim context. One allocation on the WQ
> > is not going to be the problem.
> >
> > Probably this nvme stuff should not be re-using a reclaim marked WQ
> > for memory allocating work like this, it is kind of nonsensical.
>
> A controller reset runs on this workqueue, which should succeed to allow
> for pages to be flushed to the nvme disk. So I'd say it's kind of
> essential that this sequence has a rescuer thread.

So don't run the CM stuff on the same WQ, go to another one without
the reclaim mark?

Jason
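One possible shape of Jason's suggestion, purely hypothetical (the workqueue name and the split below are not from nvme_rdma): keep reset/teardown work on the WQ_MEM_RECLAIM nvme-wq, but run the reconnect path, which calls into the RDMA CM and ends up flushing ib_core's addr_wq via rdma_addr_cancel(), on a separate queue without the reclaim mark.

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/workqueue.h>

/* Hypothetical second queue for connect/reconnect work only. */
static struct workqueue_struct *rdma_connect_wq;

static int __init rdma_connect_wq_init(void)
{
	/*
	 * No WQ_MEM_RECLAIM here: work running on this queue may flush
	 * ib_addr's addr_wq (through rdma_addr_cancel(), as in the
	 * call trace) without tripping check_flush_dependency().
	 */
	rdma_connect_wq = alloc_workqueue("demo-rdma-connect-wq",
					  WQ_UNBOUND, 0);
	if (!rdma_connect_wq)
		return -ENOMEM;
	return 0;
}

/*
 * The reconnect work would then be queued with something like
 *   queue_delayed_work(rdma_connect_wq, &ctrl->reconnect_work, delay);
 * shown here only to illustrate the split, not as the driver's code.
 */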
>>>>>> workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
>>>>>> [nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]
>>>>>
>>>>> And why does nvme-wq need WQ_MEM_RECLAIM flag? I wonder if it is really
>>>>> needed.
>>>>
>>>> Adding Sagi Grimberg to cc, he probably knows and can explain it better than me.
>>>
>>> We already allocate so much memory on these paths it is pretty
>>> nonsense to claim they are a reclaim context. One allocation on the WQ
>>> is not going to be the problem.
>>>
>>> Probably this nvme stuff should not be re-using a reclaim marked WQ
>>> for memory allocating work like this, it is kind of nonsensical.
>>
>> A controller reset runs on this workqueue, which should succeed to allow
>> for pages to be flushed to the nvme disk. So I'd say it's kind of
>> essential that this sequence has a rescuer thread.
>
> So don't run the CM stuff on the same WQ, go to another one without
> the reclaim mark?

That is not trivial. Teardown works won't need a rescuer, but they will
need to fence async work elements that are part of the bringup that do
need a rescuer (for example a scan work). So it requires more thought on
how it would be possible to untangle any dependency between work elements
that make use of a rescuer and others that don't, and vice-versa.
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f253295795f0..5c36d01ebf0b 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -872,7 +872,7 @@ static struct notifier_block nb = {
 int addr_init(void)
 {
-	addr_wq = alloc_ordered_workqueue("ib_addr", 0);
+	addr_wq = alloc_ordered_workqueue("ib_addr", WQ_MEM_RECLAIM);
 	if (!addr_wq)
 		return -ENOMEM;
when nvme_rdma_reconnect_ctrl_work() fails, it flushes
the ib_addr:process_one_req() work but the latter is enqueued
on addr_wq which has been marked as "!WQ_MEM_RECLAIM".

workqueue: WQ_MEM_RECLAIM nvme-wq:nvme_rdma_reconnect_ctrl_work
[nvme_rdma] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core]

Call Trace:
 __flush_work.isra.0+0xbf/0x230
 __cancel_work_timer+0x103/0x190
 ? rdma_resolve_ip+0x257/0x2f0 [ib_core]
 ? __dev_printk+0x2d/0x69
 rdma_addr_cancel+0x8e/0xc0 [ib_core]
 _destroy_id+0x1b/0x270 [rdma_cm]
 nvme_rdma_alloc_queue.cold+0x4b/0x5c [nvme_rdma]
 nvme_rdma_configure_admin_queue+0x1e/0x2f0 [nvme_rdma]
 nvme_rdma_setup_ctrl+0x1e/0x220 [nvme_rdma]
 nvme_rdma_reconnect_ctrl_work+0x22/0x30 [nvme_rdma]

This has been introduced by commit 39baf10310e6
("IB/core: Fix use workqueue without WQ_MEM_RECLAIM")

Since nvme_rdma_reconnect_work() needs to flush process_one_req(),
we have to restore the WQ_MEM_RECLAIM flag on the addr_wq workqueue.

This reverts commit 39baf10310e6669564a485b55267fae70a4e44ae.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/infiniband/core/addr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)