diff mbox

possible core cq bug

Message ID e7782921-b55a-a628-02d4-874d88049ec5@grimberg.me (mailing list archive)
State Not Applicable
Headers show

Commit Message

Sagi Grimberg Dec. 3, 2017, 4:53 p.m. UTC
>>> If an application creates its cq for DIRECT poll mode using ib_create_cq()
>>> instead of ib_alloc_cq(),  and then uses ib_drain_qp() to drain its qp,
>>> ib_drain_sq/rq() will always hang forever because cq->wc is NULL.  IE
>>> ib_create_cq() doesn't allocate cq->wc, and ib_alloc_cq() does.  Yet the
>>> __ib_process_cq() requires cq->wc to actually complete any completions
>> and
>>> calling the cqe_done function.
>>>
>>> Is this a bug in the CQ core code or the application?
>>
>> Take a look in __ib_drain_rq/__ib_drain_sq for
>> cq->poll_ctx == IB_POLL_DIRECT. The drain routine polls
>> the completion queue from time to time...
> 
> Yes, but it ends up calling __ib_process_cq() which doesn't actually poll the CQ because cq->wc is NULL.

Do you mean that the CQ allocation wasn't done with ib_alloc_cq? That
indeed would be a bug. We can WARN on it as well so the application
will know to allocate its CQ with ib_alloc_cq.

Does something like this makes sense?
--
--
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Steve Wise Dec. 3, 2017, 5:17 p.m. UTC | #1
> > Yes, but it ends up calling __ib_process_cq() which doesn't actually poll
> the CQ because cq->wc is NULL.
> 
> Do you mean that the CQ allocation wasn't done with ib_alloc_cq? That
> indeed would be a bug. We can WARN on it as well so the application
> will know to allocate its CQ with ib_alloc_cq.

Yes, the application used ib_create_cq().

> 
> Does something like this makes sense?
>

That would at least log the issue, but the thread in ib_drain_qp() will be stuck forever continually blocking for 1/10sec and then polling.  Perhaps the drain logic should detect this, and then return?

Is there a reason we don't get rid of ib_create_cq()?   Or just make it call ib_alloc_cq()...

> --
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index f2ae75fa3128..90eac56b5f1a 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -69,7 +69,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget)
>    */
>   int ib_process_cq_direct(struct ib_cq *cq, int budget)
>   {
> -       WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT);
> +       WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT || !cq->wc);
> 
>          return __ib_process_cq(cq, budget);
>   }
> --

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg Dec. 4, 2017, 6:41 p.m. UTC | #2
> That would at least log the issue, but the thread in ib_drain_qp() will be stuck forever continually blocking for 1/10sec and then polling.  Perhaps the drain logic should detect this, and then return?

I don't think returning is a better choice here..

> Is there a reason we don't get rid of ib_create_cq()?   Or just make it call ib_alloc_cq()...

We could do that, it also makes sense to me. Thoughts from others?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise Dec. 7, 2017, 4 p.m. UTC | #3
> 
> > That would at least log the issue, but the thread in ib_drain_qp() will be
> stuck forever continually blocking for 1/10sec and then polling.  Perhaps the
> drain logic should detect this, and then return?
> 
> I don't think returning is a better choice here..
> 

Why not?

> > Is there a reason we don't get rid of ib_create_cq()?   Or just make it call
> ib_alloc_cq()...
> 
> We could do that, it also makes sense to me. Thoughts from others?
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index f2ae75fa3128..90eac56b5f1a 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -69,7 +69,7 @@  static int __ib_process_cq(struct ib_cq *cq, int budget)
   */
  int ib_process_cq_direct(struct ib_cq *cq, int budget)
  {
-       WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT);
+       WARN_ON_ONCE(cq->poll_ctx != IB_POLL_DIRECT || !cq->wc);

         return __ib_process_cq(cq, budget);
  }