diff mbox

[rdma-rc,3/9] Revert "IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one"

Message ID 1472371118-8260-4-git-send-email-leon@kernel.org (mailing list archive)
State Accepted
Headers show

Commit Message

Leon Romanovsky Aug. 28, 2016, 7:58 a.m. UTC
From: Leon Romanovsky <leonro@mellanox.com>

By Mellanox HW design and SW implementation poll_cq never
fails and returns errors, so all these prints are to catch ULP bugs.

In case of such bug, the reverted patch patch will cause to reentry
(EAGAIN) and kprints storm again and again.

It is undesired and misleading behaviour.

This reverts commit 5412352fcd8f ("IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one")

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx4/cq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Yuval Shaia Aug. 28, 2016, 8:09 a.m. UTC | #1
On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> By Mellanox HW design and SW implementation poll_cq never
> fails and returns errors, so all these prints are to catch ULP bugs.
> 
> In case of such bug, the reverted patch patch will cause to reentry

s/patch patch/atch

> (EAGAIN) and kprints storm again and again.
> 
> It is undesired and misleading behaviour.
> 
> This reverts commit 5412352fcd8f ("IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one")
> 
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/hw/mlx4/cq.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
> index 006db64..15b6289 100644
> --- a/drivers/infiniband/hw/mlx4/cq.c
> +++ b/drivers/infiniband/hw/mlx4/cq.c
> @@ -690,7 +690,7 @@ repoll:
>  	if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) == MLX4_OPCODE_NOP &&
>  		     is_send)) {
>  		pr_warn("Completion for NOP opcode detected!\n");
> -		return -EAGAIN;
> +		return -EINVAL;
>  	}
> 
>  	/* Resize CQ in progress */
> @@ -721,7 +721,7 @@ repoll:
>  		if (unlikely(!mqp)) {
>  			pr_warn("CQ %06x with entry for unknown QPN %06x\n",
>  			       cq->mcq.cqn, be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK);
> -			return -EAGAIN;
> +			return -EINVAL;
>  		}
> 
>  		*cur_qp = to_mibqp(mqp);
> @@ -739,7 +739,7 @@ repoll:
>  		if (unlikely(!msrq)) {
>  			pr_warn("CQ %06x with entry for unknown SRQN %06x\n",
>  				cq->mcq.cqn, srq_num);
> -			return -EAGAIN;
> +			return -EINVAL;
>  		}
>  	}
> 
> --
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Aug. 28, 2016, 8:32 a.m. UTC | #2
On Sun, Aug 28, 2016 at 11:09:55AM +0300, Yuval Shaia wrote:
> On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > By Mellanox HW design and SW implementation poll_cq never
> > fails and returns errors, so all these prints are to catch ULP bugs.
> >
> > In case of such bug, the reverted patch patch will cause to reentry
>
> s/patch patch/atch

Doug,
Do you want me to respin it?

Thanks
Jason Gunthorpe Aug. 28, 2016, 5:17 p.m. UTC | #3
On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> By Mellanox HW design and SW implementation poll_cq never
> fails and returns errors, so all these prints are to catch ULP bugs.

Eh? How can a ULP cause poll_cq to get errors?

Are you sure these are not driver bugs?

Why can't you just print and discard the broken CQ entry?

What should use ULP do when it get EINVAL? You say poll again is
not correct, so you suggest a full QP tear down?

The entire point was to make the ULP interface sane. IMHO if the
driver has bugged itself then it should do something sane internally
and not just punt random error codes to the user.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Aug. 28, 2016, 6:26 p.m. UTC | #4
On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote:
> On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > By Mellanox HW design and SW implementation poll_cq never
> > fails and returns errors, so all these prints are to catch ULP bugs.
>
> Eh? How can a ULP cause poll_cq to get errors?
>
> Are you sure these are not driver bugs?
>
> Why can't you just print and discard the broken CQ entry?
>
> What should use ULP do when it get EINVAL? You say poll again is
> not correct, so you suggest a full QP tear down?

See patches 4 and 6, they completely removed these EINVALs.
Leon Romanovsky Aug. 28, 2016, 6:27 p.m. UTC | #5
On Sun, Aug 28, 2016 at 9:26 PM, Leon Romanovsky <leon@kernel.org> wrote:
> On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote:
>> On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote:
>> > From: Leon Romanovsky <leonro@mellanox.com>
>> >
>> > By Mellanox HW design and SW implementation poll_cq never
>> > fails and returns errors, so all these prints are to catch ULP bugs.
>>
>> Eh? How can a ULP cause poll_cq to get errors?
>>
>> Are you sure these are not driver bugs?
>>
>> Why can't you just print and discard the broken CQ entry?
>>
>> What should use ULP do when it get EINVAL? You say poll again is
>> not correct, so you suggest a full QP tear down?
>
> See patches 4 and 6, they completely removed these EINVALs.

Sorry, 4 and 9
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe Aug. 28, 2016, 6:28 p.m. UTC | #6
On Sun, Aug 28, 2016 at 09:26:13PM +0300, Leon Romanovsky wrote:
> On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote:
> > On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@mellanox.com>
> > >
> > > By Mellanox HW design and SW implementation poll_cq never
> > > fails and returns errors, so all these prints are to catch ULP bugs.
> >
> > Eh? How can a ULP cause poll_cq to get errors?
> >
> > Are you sure these are not driver bugs?
> >
> > Why can't you just print and discard the broken CQ entry?
> >
> > What should use ULP do when it get EINVAL? You say poll again is
> > not correct, so you suggest a full QP tear down?
> 
> See patches 4 and 6, they completely removed these EINVALs.

So the commit message is still wrong.

Why do we need this revert? Just squash it and mark it fixup the
original.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Aug. 28, 2016, 6:35 p.m. UTC | #7
On Sun, Aug 28, 2016 at 12:28:13PM -0600, Jason Gunthorpe wrote:
> On Sun, Aug 28, 2016 at 09:26:13PM +0300, Leon Romanovsky wrote:
> > On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote:
> > > On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@mellanox.com>
> > > >
> > > > By Mellanox HW design and SW implementation poll_cq never
> > > > fails and returns errors, so all these prints are to catch ULP bugs.
> > >
> > > Eh? How can a ULP cause poll_cq to get errors?
> > >
> > > Are you sure these are not driver bugs?
> > >
> > > Why can't you just print and discard the broken CQ entry?
> > >
> > > What should use ULP do when it get EINVAL? You say poll again is
> > > not correct, so you suggest a full QP tear down?
> >
> > See patches 4 and 6, they completely removed these EINVALs.
>
> So the commit message is still wrong.
>
> Why do we need this revert? Just squash it and mark it fixup the
> original.

We need this revert, because the original commit is wrong and as was presented
by Sasha Levin in his talk about automatic creation of stable trees [1], he needs
this information to ensure that this commit won't be in stable tree by mistake.

[1]
https://lcccna2016.sched.org/event/7JW4/automating-the-creation-of-stable-trees-sasha-levin-verizon-labs

>
> Jason
Jason Gunthorpe Aug. 28, 2016, 6:39 p.m. UTC | #8
On Sun, Aug 28, 2016 at 09:35:00PM +0300, Leon Romanovsky wrote:

> We need this revert, because the original commit is wrong and as was
> presented

Sure, but it was also wrong to return EINVAL at all, the only correct
outcome I can see is to apply your whole series. Reverting doesn't
make the big picture any better, just differently wrong.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford Sept. 2, 2016, 6:03 p.m. UTC | #9
On 8/28/2016 2:35 PM, Leon Romanovsky wrote:
> On Sun, Aug 28, 2016 at 12:28:13PM -0600, Jason Gunthorpe wrote:
>> On Sun, Aug 28, 2016 at 09:26:13PM +0300, Leon Romanovsky wrote:
>>> On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote:
>>>> On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote:
>>>>> From: Leon Romanovsky <leonro@mellanox.com>
>>>>>
>>>>> By Mellanox HW design and SW implementation poll_cq never
>>>>> fails and returns errors, so all these prints are to catch ULP bugs.
>>>>
>>>> Eh? How can a ULP cause poll_cq to get errors?
>>>>
>>>> Are you sure these are not driver bugs?
>>>>
>>>> Why can't you just print and discard the broken CQ entry?
>>>>
>>>> What should use ULP do when it get EINVAL? You say poll again is
>>>> not correct, so you suggest a full QP tear down?
>>>
>>> See patches 4 and 6, they completely removed these EINVALs.
>>
>> So the commit message is still wrong.
>>
>> Why do we need this revert? Just squash it and mark it fixup the
>> original.
> 
> We need this revert, because the original commit is wrong and as was presented
> by Sasha Levin in his talk about automatic creation of stable trees [1], he needs
> this information to ensure that this commit won't be in stable tree by mistake.
> 
> [1]
> https://lcccna2016.sched.org/event/7JW4/automating-the-creation-of-stable-trees-sasha-levin-verizon-labs

I didn't see the presentation at this link, but I'm not sure I agree
with your statement here.  There should be no difference, effectively,
between a commit to revert a bad commit, and a commit to fix a bad
commit that uses a Fixes: tag referencing the original commit.
Leon Romanovsky Sept. 4, 2016, 6:14 a.m. UTC | #10
On Fri, Sep 02, 2016 at 02:03:13PM -0400, Doug Ledford wrote:
> On 8/28/2016 2:35 PM, Leon Romanovsky wrote:
> > On Sun, Aug 28, 2016 at 12:28:13PM -0600, Jason Gunthorpe wrote:
> >> On Sun, Aug 28, 2016 at 09:26:13PM +0300, Leon Romanovsky wrote:
> >>> On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote:
> >>>> On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote:
> >>>>> From: Leon Romanovsky <leonro@mellanox.com>
> >>>>>
> >>>>> By Mellanox HW design and SW implementation poll_cq never
> >>>>> fails and returns errors, so all these prints are to catch ULP bugs.
> >>>>
> >>>> Eh? How can a ULP cause poll_cq to get errors?
> >>>>
> >>>> Are you sure these are not driver bugs?
> >>>>
> >>>> Why can't you just print and discard the broken CQ entry?
> >>>>
> >>>> What should use ULP do when it get EINVAL? You say poll again is
> >>>> not correct, so you suggest a full QP tear down?
> >>>
> >>> See patches 4 and 6, they completely removed these EINVALs.
> >>
> >> So the commit message is still wrong.
> >>
> >> Why do we need this revert? Just squash it and mark it fixup the
> >> original.
> >
> > We need this revert, because the original commit is wrong and as was presented
> > by Sasha Levin in his talk about automatic creation of stable trees [1], he needs
> > this information to ensure that this commit won't be in stable tree by mistake.
> >
> > [1]
> > https://lcccna2016.sched.org/event/7JW4/automating-the-creation-of-stable-trees-sasha-levin-verizon-labs
>
> I didn't see the presentation at this link, but I'm not sure I agree
> with your statement here.  There should be no difference, effectively,
> between a commit to revert a bad commit, and a commit to fix a bad
> commit that uses a Fixes: tag referencing the original commit.

The final code will look the same, but it will be much easier for the
tools to pickup them.

>
>
> --
> Doug Ledford <dledford@redhat.com>
>     GPG Key ID: 0E572FDD
>
diff mbox

Patch

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 006db64..15b6289 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -690,7 +690,7 @@  repoll:
 	if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) == MLX4_OPCODE_NOP &&
 		     is_send)) {
 		pr_warn("Completion for NOP opcode detected!\n");
-		return -EAGAIN;
+		return -EINVAL;
 	}

 	/* Resize CQ in progress */
@@ -721,7 +721,7 @@  repoll:
 		if (unlikely(!mqp)) {
 			pr_warn("CQ %06x with entry for unknown QPN %06x\n",
 			       cq->mcq.cqn, be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK);
-			return -EAGAIN;
+			return -EINVAL;
 		}

 		*cur_qp = to_mibqp(mqp);
@@ -739,7 +739,7 @@  repoll:
 		if (unlikely(!msrq)) {
 			pr_warn("CQ %06x with entry for unknown SRQN %06x\n",
 				cq->mcq.cqn, srq_num);
-			return -EAGAIN;
+			return -EINVAL;
 		}
 	}