Message ID | 1472371118-8260-4-git-send-email-leon@kernel.org (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote: > From: Leon Romanovsky <leonro@mellanox.com> > > By Mellanox HW design and SW implementation poll_cq never > fails and returns errors, so all these prints are to catch ULP bugs. > > In case of such bug, the reverted patch patch will cause to reentry s/patch patch/atch > (EAGAIN) and kprints storm again and again. > > It is undesired and misleading behaviour. > > This reverts commit 5412352fcd8f ("IB/mlx4: Return EAGAIN for any error in mlx4_ib_poll_one") > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com> > --- > drivers/infiniband/hw/mlx4/cq.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c > index 006db64..15b6289 100644 > --- a/drivers/infiniband/hw/mlx4/cq.c > +++ b/drivers/infiniband/hw/mlx4/cq.c > @@ -690,7 +690,7 @@ repoll: > if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) == MLX4_OPCODE_NOP && > is_send)) { > pr_warn("Completion for NOP opcode detected!\n"); > - return -EAGAIN; > + return -EINVAL; > } > > /* Resize CQ in progress */ > @@ -721,7 +721,7 @@ repoll: > if (unlikely(!mqp)) { > pr_warn("CQ %06x with entry for unknown QPN %06x\n", > cq->mcq.cqn, be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK); > - return -EAGAIN; > + return -EINVAL; > } > > *cur_qp = to_mibqp(mqp); > @@ -739,7 +739,7 @@ repoll: > if (unlikely(!msrq)) { > pr_warn("CQ %06x with entry for unknown SRQN %06x\n", > cq->mcq.cqn, srq_num); > - return -EAGAIN; > + return -EINVAL; > } > } > > -- > 2.7.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Aug 28, 2016 at 11:09:55AM +0300, Yuval Shaia wrote: > On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote: > > From: Leon Romanovsky <leonro@mellanox.com> > > > > By Mellanox HW design and SW implementation poll_cq never > > fails and returns errors, so all these prints are to catch ULP bugs. > > > > In case of such bug, the reverted patch patch will cause to reentry > > s/patch patch/atch Doug, Do you want me to respin it? Thanks
On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote: > From: Leon Romanovsky <leonro@mellanox.com> > > By Mellanox HW design and SW implementation poll_cq never > fails and returns errors, so all these prints are to catch ULP bugs. Eh? How can a ULP cause poll_cq to get errors? Are you sure these are not driver bugs? Why can't you just print and discard the broken CQ entry? What should use ULP do when it get EINVAL? You say poll again is not correct, so you suggest a full QP tear down? The entire point was to make the ULP interface sane. IMHO if the driver has bugged itself then it should do something sane internally and not just punt random error codes to the user. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote: > On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote: > > From: Leon Romanovsky <leonro@mellanox.com> > > > > By Mellanox HW design and SW implementation poll_cq never > > fails and returns errors, so all these prints are to catch ULP bugs. > > Eh? How can a ULP cause poll_cq to get errors? > > Are you sure these are not driver bugs? > > Why can't you just print and discard the broken CQ entry? > > What should use ULP do when it get EINVAL? You say poll again is > not correct, so you suggest a full QP tear down? See patches 4 and 6, they completely removed these EINVALs.
On Sun, Aug 28, 2016 at 9:26 PM, Leon Romanovsky <leon@kernel.org> wrote: > On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote: >> On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote: >> > From: Leon Romanovsky <leonro@mellanox.com> >> > >> > By Mellanox HW design and SW implementation poll_cq never >> > fails and returns errors, so all these prints are to catch ULP bugs. >> >> Eh? How can a ULP cause poll_cq to get errors? >> >> Are you sure these are not driver bugs? >> >> Why can't you just print and discard the broken CQ entry? >> >> What should use ULP do when it get EINVAL? You say poll again is >> not correct, so you suggest a full QP tear down? > > See patches 4 and 6, they completely removed these EINVALs. Sorry, 4 and 9 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Aug 28, 2016 at 09:26:13PM +0300, Leon Romanovsky wrote: > On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote: > > On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote: > > > From: Leon Romanovsky <leonro@mellanox.com> > > > > > > By Mellanox HW design and SW implementation poll_cq never > > > fails and returns errors, so all these prints are to catch ULP bugs. > > > > Eh? How can a ULP cause poll_cq to get errors? > > > > Are you sure these are not driver bugs? > > > > Why can't you just print and discard the broken CQ entry? > > > > What should use ULP do when it get EINVAL? You say poll again is > > not correct, so you suggest a full QP tear down? > > See patches 4 and 6, they completely removed these EINVALs. So the commit message is still wrong. Why do we need this revert? Just squash it and mark it fixup the original. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Aug 28, 2016 at 12:28:13PM -0600, Jason Gunthorpe wrote: > On Sun, Aug 28, 2016 at 09:26:13PM +0300, Leon Romanovsky wrote: > > On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote: > > > On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote: > > > > From: Leon Romanovsky <leonro@mellanox.com> > > > > > > > > By Mellanox HW design and SW implementation poll_cq never > > > > fails and returns errors, so all these prints are to catch ULP bugs. > > > > > > Eh? How can a ULP cause poll_cq to get errors? > > > > > > Are you sure these are not driver bugs? > > > > > > Why can't you just print and discard the broken CQ entry? > > > > > > What should use ULP do when it get EINVAL? You say poll again is > > > not correct, so you suggest a full QP tear down? > > > > See patches 4 and 6, they completely removed these EINVALs. > > So the commit message is still wrong. > > Why do we need this revert? Just squash it and mark it fixup the > original. We need this revert, because the original commit is wrong and as was presented by Sasha Levin in his talk about automatic creation of stable trees [1], he needs this information to ensure that this commit won't be in stable tree by mistake. [1] https://lcccna2016.sched.org/event/7JW4/automating-the-creation-of-stable-trees-sasha-levin-verizon-labs > > Jason
On Sun, Aug 28, 2016 at 09:35:00PM +0300, Leon Romanovsky wrote: > We need this revert, because the original commit is wrong and as was > presented Sure, but it was also wrong to return EINVAL at all, the only correct outcome I can see is to apply your whole series. Reverting doesn't make the big picture any better, just differently wrong. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 8/28/2016 2:35 PM, Leon Romanovsky wrote: > On Sun, Aug 28, 2016 at 12:28:13PM -0600, Jason Gunthorpe wrote: >> On Sun, Aug 28, 2016 at 09:26:13PM +0300, Leon Romanovsky wrote: >>> On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote: >>>> On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote: >>>>> From: Leon Romanovsky <leonro@mellanox.com> >>>>> >>>>> By Mellanox HW design and SW implementation poll_cq never >>>>> fails and returns errors, so all these prints are to catch ULP bugs. >>>> >>>> Eh? How can a ULP cause poll_cq to get errors? >>>> >>>> Are you sure these are not driver bugs? >>>> >>>> Why can't you just print and discard the broken CQ entry? >>>> >>>> What should use ULP do when it get EINVAL? You say poll again is >>>> not correct, so you suggest a full QP tear down? >>> >>> See patches 4 and 6, they completely removed these EINVALs. >> >> So the commit message is still wrong. >> >> Why do we need this revert? Just squash it and mark it fixup the >> original. > > We need this revert, because the original commit is wrong and as was presented > by Sasha Levin in his talk about automatic creation of stable trees [1], he needs > this information to ensure that this commit won't be in stable tree by mistake. > > [1] > https://lcccna2016.sched.org/event/7JW4/automating-the-creation-of-stable-trees-sasha-levin-verizon-labs I didn't see the presentation at this link, but I'm not sure I agree with your statement here. There should be no difference, effectively, between a commit to revert a bad commit, and a commit to fix a bad commit that uses a Fixes: tag referencing the original commit.
On Fri, Sep 02, 2016 at 02:03:13PM -0400, Doug Ledford wrote: > On 8/28/2016 2:35 PM, Leon Romanovsky wrote: > > On Sun, Aug 28, 2016 at 12:28:13PM -0600, Jason Gunthorpe wrote: > >> On Sun, Aug 28, 2016 at 09:26:13PM +0300, Leon Romanovsky wrote: > >>> On Sun, Aug 28, 2016 at 11:17:58AM -0600, Jason Gunthorpe wrote: > >>>> On Sun, Aug 28, 2016 at 10:58:32AM +0300, Leon Romanovsky wrote: > >>>>> From: Leon Romanovsky <leonro@mellanox.com> > >>>>> > >>>>> By Mellanox HW design and SW implementation poll_cq never > >>>>> fails and returns errors, so all these prints are to catch ULP bugs. > >>>> > >>>> Eh? How can a ULP cause poll_cq to get errors? > >>>> > >>>> Are you sure these are not driver bugs? > >>>> > >>>> Why can't you just print and discard the broken CQ entry? > >>>> > >>>> What should use ULP do when it get EINVAL? You say poll again is > >>>> not correct, so you suggest a full QP tear down? > >>> > >>> See patches 4 and 6, they completely removed these EINVALs. > >> > >> So the commit message is still wrong. > >> > >> Why do we need this revert? Just squash it and mark it fixup the > >> original. > > > > We need this revert, because the original commit is wrong and as was presented > > by Sasha Levin in his talk about automatic creation of stable trees [1], he needs > > this information to ensure that this commit won't be in stable tree by mistake. > > > > [1] > > https://lcccna2016.sched.org/event/7JW4/automating-the-creation-of-stable-trees-sasha-levin-verizon-labs > > I didn't see the presentation at this link, but I'm not sure I agree > with your statement here. There should be no difference, effectively, > between a commit to revert a bad commit, and a commit to fix a bad > commit that uses a Fixes: tag referencing the original commit. The final code will look the same, but it will be much easier for the tools to pickup them. > > > -- > Doug Ledford <dledford@redhat.com> > GPG Key ID: 0E572FDD >
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 006db64..15b6289 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -690,7 +690,7 @@ repoll: if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) == MLX4_OPCODE_NOP && is_send)) { pr_warn("Completion for NOP opcode detected!\n"); - return -EAGAIN; + return -EINVAL; } /* Resize CQ in progress */ @@ -721,7 +721,7 @@ repoll: if (unlikely(!mqp)) { pr_warn("CQ %06x with entry for unknown QPN %06x\n", cq->mcq.cqn, be32_to_cpu(cqe->vlan_my_qpn) & MLX4_CQE_QPN_MASK); - return -EAGAIN; + return -EINVAL; } *cur_qp = to_mibqp(mqp); @@ -739,7 +739,7 @@ repoll: if (unlikely(!msrq)) { pr_warn("CQ %06x with entry for unknown SRQN %06x\n", cq->mcq.cqn, srq_num); - return -EAGAIN; + return -EINVAL; } }