| Message ID | 20220907113800.22182-5-phaddad@nvidia.com (mailing list archive) |
|---|---|
| State | Changes Requested |
| Series | Provide more error details when a QP moves to |
On Wed, Sep 07, 2022 at 02:38:00PM +0300, Patrisious Haddad wrote:
> From: Israel Rukshin <israelr@nvidia.com>
>
> Add debug prints for fatal QP events that are helpful for finding the
> root cause of the errors. The ib_get_qp_err_syndrome is called at
> a work queue since the QP event callback is running on an
> interrupt context that can't sleep.

What an awkward interface. What prevents us from allowing
ib_get_qp_err_syndrome to be called from arbitrary calling contexts,
or even better just delivering the error directly as part of the
event?
On Wed, Sep 07, 2022 at 02:02:00PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 07, 2022 at 02:38:00PM +0300, Patrisious Haddad wrote:
> > From: Israel Rukshin <israelr@nvidia.com>
> >
> > Add debug prints for fatal QP events that are helpful for finding the
> > root cause of the errors. The ib_get_qp_err_syndrome is called at
> > a work queue since the QP event callback is running on an
> > interrupt context that can't sleep.
>
> What an awkward interface. What prevents us from allowing
> ib_get_qp_err_syndrome to be called from arbitrary calling contexts,
> or even better just delivering the error directly as part of the
> event?

We need to call our FW through the command interface, and unfortunately
that is not possible in atomic context.

Thanks
> From: Israel Rukshin <israelr@nvidia.com>
>
> Add debug prints for fatal QP events that are helpful for finding the
> root cause of the errors. The ib_get_qp_err_syndrome is called at
> a work queue since the QP event callback is running on an
> interrupt context that can't sleep.
>
> Signed-off-by: Israel Rukshin <israelr@nvidia.com>
> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>

What makes nvme-rdma special here? Why do you get this in
nvme-rdma and not srp/iser/nfs-rdma/rds/smc/ipoib etc?

This entire code needs to move to the rdma core instead
of being leaked to ulps.
On Wed, Sep 07, 2022 at 03:34:21PM +0300, Sagi Grimberg wrote:
> > From: Israel Rukshin <israelr@nvidia.com>
> >
> > Add debug prints for fatal QP events that are helpful for finding the
> > root cause of the errors. The ib_get_qp_err_syndrome is called at
> > a work queue since the QP event callback is running on an
> > interrupt context that can't sleep.
> >
> > Signed-off-by: Israel Rukshin <israelr@nvidia.com>
> > Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
>
> What makes nvme-rdma special here? Why do you get this in
> nvme-rdma and not srp/iser/nfs-rdma/rds/smc/ipoib etc?
>
> This entire code needs to move to the rdma core instead
> of being leaked to ulps.

We can move it, but you will lose the connection between the queue number,
the caller and the error itself. As I answered Christoph, we will need to
execute the query QP command in a workqueue outside of the event handler.
So you will get a print about the queue entering the error state, and only
later will you see the parsed error print somewhere in dmesg.

Thanks
>>> From: Israel Rukshin <israelr@nvidia.com>
>>>
>>> Add debug prints for fatal QP events that are helpful for finding the
>>> root cause of the errors. The ib_get_qp_err_syndrome is called at
>>> a work queue since the QP event callback is running on an
>>> interrupt context that can't sleep.
>>>
>>> Signed-off-by: Israel Rukshin <israelr@nvidia.com>
>>> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
>>
>> What makes nvme-rdma special here? Why do you get this in
>> nvme-rdma and not srp/iser/nfs-rdma/rds/smc/ipoib etc?
>>
>> This entire code needs to move to the rdma core instead
>> of being leaked to ulps.
>
> We can move it, but you will lose the connection between the queue number,
> the caller and the error itself.

That still doesn't explain why nvme-rdma is special.

In any event, the ulp can log the qpn so the context can be interrogated
if that is important.
On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
>>>
>>> This entire code needs to move to the rdma core instead
>>> of being leaked to ulps.
>>
>> We can move it, but you will lose the connection between the queue number,
>> the caller and the error itself.
>
> That still doesn't explain why nvme-rdma is special.
>
> In any event, the ulp can log the qpn so the context can be interrogated
> if that is important.

I also don't see why the QP event handler can't be called
from user context to start with. I see absolutely no reason to
add boilerplate code to drivers for reporting slightly more verbose
errors on one specific piece of hardware. I'd say clean up the mess
that is the QP event handler first, and then once error reporting
becomes trivial we can just do it.
On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
> > > > From: Israel Rukshin <israelr@nvidia.com>
> > > >
> > > > Add debug prints for fatal QP events that are helpful for finding the
> > > > root cause of the errors. The ib_get_qp_err_syndrome is called at
> > > > a work queue since the QP event callback is running on an
> > > > interrupt context that can't sleep.
> > > >
> > > > Signed-off-by: Israel Rukshin <israelr@nvidia.com>
> > > > Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
> > >
> > > What makes nvme-rdma special here? Why do you get this in
> > > nvme-rdma and not srp/iser/nfs-rdma/rds/smc/ipoib etc?
> > >
> > > This entire code needs to move to the rdma core instead
> > > of being leaked to ulps.
> >
> > We can move it, but you will lose the connection between the queue number,
> > the caller and the error itself.
>
> That still doesn't explain why nvme-rdma is special.

It was important for us to get a proper review from at least one ULP;
nvme-rdma is not special at all.

> In any event, the ulp can log the qpn so the context can be interrogated
> if that is important.

ok
On Wed, Sep 07, 2022 at 05:18:18PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
> >>>
> >>> This entire code needs to move to the rdma core instead
> >>> of being leaked to ulps.
> >>
> >> We can move it, but you will lose the connection between the queue number,
> >> the caller and the error itself.
> >
> > That still doesn't explain why nvme-rdma is special.
> >
> > In any event, the ulp can log the qpn so the context can be interrogated
> > if that is important.
>
> I also don't see why the QP event handler can't be called
> from user context to start with. I see absolutely no reason to
> add boilerplate code to drivers for reporting slightly more verbose
> errors on one specific piece of hardware. I'd say clean up the mess
> that is the QP event handler first, and then once error reporting
> becomes trivial we can just do it.

I don't know, Chuck documented it in 2018:
eb93c82ed8c7 ("RDMA/core: Document QP @event_handler function")

1164 struct ib_qp_init_attr {
1165         /* Consumer's event_handler callback must not block */
1166         void                  (*event_handler)(struct ib_event *, void *);

Thanks
On 07/09/2022 18:18, Christoph Hellwig wrote:
> On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
>>>>
>>>> This entire code needs to move to the rdma core instead
>>>> of being leaked to ulps.
>>>
>>> We can move it, but you will lose the connection between the queue number,
>>> the caller and the error itself.
>>
>> That still doesn't explain why nvme-rdma is special.
>>
>> In any event, the ulp can log the qpn so the context can be interrogated
>> if that is important.
>
> I also don't see why the QP event handler can't be called
> from user context to start with. I see absolutely no reason to
> add boilerplate code to drivers for reporting slightly more verbose
> errors on one specific piece of hardware. I'd say clean up the mess
> that is the QP event handler first, and then once error reporting
> becomes trivial we can just do it.

I would like to emphasize that this is not just about a slightly more
verbose error; mainly it is about errors that wouldn't have been reported
at all without this feature. As I previously mentioned, in error cases
where the remote side doesn't generate a CQE, the remote side wouldn't
even know why the QP was moved to the error state.
On 9/8/2022 1:39 AM, Leon Romanovsky wrote:
> On Wed, Sep 07, 2022 at 05:18:18PM +0200, Christoph Hellwig wrote:
>> On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
>>>>>
>>>>> This entire code needs to move to the rdma core instead
>>>>> of being leaked to ulps.
>>>>
>>>> We can move it, but you will lose the connection between the queue number,
>>>> the caller and the error itself.
>>>
>>> That still doesn't explain why nvme-rdma is special.
>>>
>>> In any event, the ulp can log the qpn so the context can be interrogated
>>> if that is important.
>>
>> I also don't see why the QP event handler can't be called
>> from user context to start with. I see absolutely no reason to
>> add boilerplate code to drivers for reporting slightly more verbose
>> errors on one specific piece of hardware. I'd say clean up the mess
>> that is the QP event handler first, and then once error reporting
>> becomes trivial we can just do it.
>
> I don't know, Chuck documented it in 2018:
> eb93c82ed8c7 ("RDMA/core: Document QP @event_handler function")
>
> 1164 struct ib_qp_init_attr {
> 1165         /* Consumer's event_handler callback must not block */
> 1166         void                  (*event_handler)(struct ib_event *, void *);

Looks like the driver calls it in an atomic context, e.g.:
  mlx5_ib_qp_event() -> ibqp->event_handler(&event, ibqp->qp_context);

Could the driver also report it as an IB async event, as WQ_CATAS_ERROR is
defined as an async event (IB spec C11-39), and QP_FATAL is also an event
of enum ib_event_type? Is it a problem that one event is reported twice?

If that is acceptable, then the ULP could register this event handler with
ib_register_event_handler(), which is non-atomic.
On 11/1/2022 5:12 PM, Mark Zhang wrote:
> On 9/8/2022 1:39 AM, Leon Romanovsky wrote:
>> On Wed, Sep 07, 2022 at 05:18:18PM +0200, Christoph Hellwig wrote:
>>> On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
>>>>>>
>>>>>> This entire code needs to move to the rdma core instead
>>>>>> of being leaked to ulps.
>>>>>
>>>>> We can move it, but you will lose the connection between the queue
>>>>> number, the caller and the error itself.
>>>>
>>>> That still doesn't explain why nvme-rdma is special.
>>>>
>>>> In any event, the ulp can log the qpn so the context can be
>>>> interrogated if that is important.
>>>
>>> I also don't see why the QP event handler can't be called
>>> from user context to start with. I see absolutely no reason to
>>> add boilerplate code to drivers for reporting slightly more verbose
>>> errors on one specific piece of hardware. I'd say clean up the mess
>>> that is the QP event handler first, and then once error reporting
>>> becomes trivial we can just do it.
>>
>> I don't know, Chuck documented it in 2018:
>> eb93c82ed8c7 ("RDMA/core: Document QP @event_handler function")
>>
>> 1164 struct ib_qp_init_attr {
>> 1165         /* Consumer's event_handler callback must not block */
>> 1166         void                  (*event_handler)(struct ib_event *, void *);
>
> Looks like the driver calls it in an atomic context, e.g.:
>   mlx5_ib_qp_event() -> ibqp->event_handler(&event, ibqp->qp_context);
>
> Could the driver also report it as an IB async event, as WQ_CATAS_ERROR is
> defined as an async event (IB spec C11-39), and QP_FATAL is also an event
> of enum ib_event_type? Is it a problem that one event is reported twice?
>
> If that is acceptable, then the ULP could register this event handler with
> ib_register_event_handler(), which is non-atomic.

Or move the qp event handler to non-atomic context, as Christoph suggested?
That means fixing the mlx4/mlx5 drivers to call it from a work queue.
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 3100643be299..7e56c0dbe8ea 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -99,6 +99,7 @@ struct nvme_rdma_queue {
 	bool pi_support;
 	int cq_size;
 	struct mutex queue_lock;
+	struct work_struct qp_err_work;
 };
 
 struct nvme_rdma_ctrl {
@@ -237,11 +238,31 @@ static struct nvme_rdma_qe *nvme_rdma_alloc_ring(struct ib_device *ibdev,
 	return NULL;
 }
 
+static void nvme_rdma_qp_error_work(struct work_struct *work)
+{
+	struct nvme_rdma_queue *queue = container_of(work,
+			struct nvme_rdma_queue, qp_err_work);
+	int ret;
+	char err[IB_ERR_SYNDROME_LENGTH];
+
+	ret = ib_get_qp_err_syndrome(queue->qp, err);
+	if (ret)
+		return;
+
+	pr_err("Queue %d got QP error syndrome %s\n",
+	       nvme_rdma_queue_idx(queue), err);
+}
+
 static void nvme_rdma_qp_event(struct ib_event *event, void *context)
 {
+	struct nvme_rdma_queue *queue = context;
+
 	pr_debug("QP event %s (%d)\n",
 		 ib_event_msg(event->event), event->event);
+	if (event->event == IB_EVENT_QP_FATAL ||
+	    event->event == IB_EVENT_QP_ACCESS_ERR)
+		queue_work(nvme_wq, &queue->qp_err_work);
 }
 
 static int nvme_rdma_wait_for_cm(struct nvme_rdma_queue *queue)
@@ -261,7 +282,9 @@ static int nvme_rdma_create_qp(struct nvme_rdma_queue *queue, const int factor)
 	struct ib_qp_init_attr init_attr;
 	int ret;
 
+	INIT_WORK(&queue->qp_err_work, nvme_rdma_qp_error_work);
 	memset(&init_attr, 0, sizeof(init_attr));
+	init_attr.qp_context = queue;
 	init_attr.event_handler = nvme_rdma_qp_event;
 	/* +1 for drain */
 	init_attr.cap.max_send_wr = factor * queue->queue_size + 1;
@@ -434,6 +457,7 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 
 	ib_mr_pool_destroy(queue->qp, &queue->qp->sig_mrs);
 	ib_mr_pool_destroy(queue->qp, &queue->qp->rdma_mrs);
+	flush_work(&queue->qp_err_work);
 	/*
 	 * The cm_id object might have been destroyed during RDMA connection
 	 * establishment error flow to avoid getting other cma events, thus