diff mbox series

[2/2] rxe: correctly calculate iCRC for unaligned payloads

Message ID 20191203020319.15036-2-larrystevenwise@gmail.com (mailing list archive)
State Mainlined
Commit 2030abddec6884aaf5892f5724c48fc340e6826f
Headers show
Series [1/2] Update mailmap info for Steve Wise | expand

Commit Message

Steve Wise Dec. 3, 2019, 2:03 a.m. UTC
If RoCE PDUs being sent or received contain pad bytes, then the iCRC
is miscalculated resulting PDUs being emitted by RXE with an incorrect
iCRC, as well as ingress PDUs being dropped due to erroneously detecting
a bad iCRC in the PDU.  The fix is to include the pad bytes, if any,
in iCRC computations.

CC: bvanassche@acm.org,3100102071@zju.edu.cn,leon@kernel.org
Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_recv.c | 2 +-
 drivers/infiniband/sw/rxe/rxe_req.c  | 6 ++++++
 drivers/infiniband/sw/rxe/rxe_resp.c | 7 +++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

Comments

Bart Van Assche Dec. 3, 2019, 4:25 p.m. UTC | #1
On 12/2/19 6:03 PM, Steve Wise wrote:
> If RoCE PDUs being sent or received contain pad bytes, then the iCRC
> is miscalculated resulting PDUs being emitted by RXE with an incorrect
> iCRC, as well as ingress PDUs being dropped due to erroneously detecting
> a bad iCRC in the PDU.  The fix is to include the pad bytes, if any,
> in iCRC computations.

Should this description mention that this patch breaks compatibility 
with SoftRoCE drivers that do not include this fix? Do we need a kernel 
module parameter that allows to select either the old or the new behavior?

> CC: bvanassche@acm.org,3100102071@zju.edu.cn,leon@kernel.org

Should this Cc-line perhaps be converted into three Cc-lines?

Otherwise this patch looks fine to me.

Thanks,

Bart.
Steve Wise Dec. 3, 2019, 6:32 p.m. UTC | #2
On Tue, Dec 3, 2019 at 10:25 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 12/2/19 6:03 PM, Steve Wise wrote:
> > If RoCE PDUs being sent or received contain pad bytes, then the iCRC
> > is miscalculated resulting PDUs being emitted by RXE with an incorrect
> > iCRC, as well as ingress PDUs being dropped due to erroneously detecting
> > a bad iCRC in the PDU.  The fix is to include the pad bytes, if any,
> > in iCRC computations.
>
> Should this description mention that this patch breaks compatibility
> with SoftRoCE drivers that do not include this fix? Do we need a kernel
> module parameter that allows to select either the old or the new behavior?
>

Good point.  I defer to others on how they want to handle that.

> > CC: bvanassche@acm.org,3100102071@zju.edu.cn,leon@kernel.org
>
> Should this Cc-line perhaps be converted into three Cc-lines?

Yea I screwed this up.   I really didn't want this in the commit log
vs just CCing on the email submission, but I was having issues with
git send-email.  Pilot error. ;)


Stevo
Doug Ledford Dec. 4, 2019, 12:46 a.m. UTC | #3
On Tue, 2019-12-03 at 08:25 -0800, Bart Van Assche wrote:
> On 12/2/19 6:03 PM, Steve Wise wrote:
> > If RoCE PDUs being sent or received contain pad bytes, then the iCRC
> > is miscalculated resulting PDUs being emitted by RXE with an
> > incorrect
> > iCRC, as well as ingress PDUs being dropped due to erroneously
> > detecting
> > a bad iCRC in the PDU.  The fix is to include the pad bytes, if any,
> > in iCRC computations.
> 
> Should this description mention that this patch breaks compatibility 
> with SoftRoCE drivers that do not include this fix? Do we need a
> kernel 
> module parameter that allows to select either the old or the new
> behavior?

No.  The original soft-RoCE driver was supposed to be compatible with
hardware devices.  Because of this bug, it obviously wasn't.  This is a
bug fix, and we do not need to do anything to be compatible with the
broken behavior.  Instead, it just needs noting that the soft-RoCE
implementation in prior kernels has a known wire format bug that impacts
communications with both fixed versions of the driver and real hardware
devices.

> > CC: bvanassche@acm.org,3100102071@zju.edu.cn,leon@kernel.org
> 
> Should this Cc-line perhaps be converted into three Cc-lines?
> 
> Otherwise this patch looks fine to me.
> 
> Thanks,
> 
> Bart.
>
Doug Ledford Dec. 9, 2019, 7:07 p.m. UTC | #4
On Tue, 2019-12-03 at 19:46 -0500, Doug Ledford wrote:
> On Tue, 2019-12-03 at 08:25 -0800, Bart Van Assche wrote:
> > On 12/2/19 6:03 PM, Steve Wise wrote:
> > > If RoCE PDUs being sent or received contain pad bytes, then the
> > > iCRC
> > > is miscalculated resulting PDUs being emitted by RXE with an
> > > incorrect
> > > iCRC, as well as ingress PDUs being dropped due to erroneously
> > > detecting
> > > a bad iCRC in the PDU.  The fix is to include the pad bytes, if
> > > any,
> > > in iCRC computations.
> > 
> > Should this description mention that this patch breaks
> > compatibility 
> > with SoftRoCE drivers that do not include this fix? Do we need a
> > kernel 
> > module parameter that allows to select either the old or the new
> > behavior?
> 
> No.  The original soft-RoCE driver was supposed to be compatible with
> hardware devices.  Because of this bug, it obviously wasn't.  This is
> a
> bug fix, and we do not need to do anything to be compatible with the
> broken behavior.  Instead, it just needs noting that the soft-RoCE
> implementation in prior kernels has a known wire format bug that
> impacts
> communications with both fixed versions of the driver and real
> hardware
> devices.

I've taken these two patches into for-rc (with fixups to the commit
message on the second, as well as adding a Fixes: tag on the second).

I stand by what I said about not needing a compatibility flag or module
option for the user to set.  However, that isn't to say that we can't
autodetect old soft-RoCE peers.  If we get a packet that fails CRC and
has pad bytes, then re-run the CRC without the pad bytes and see if it
matches.  If it does, we could A) mark the current QP as being to an old
soft-RoCE device (causing us to send without including the pad bytes in
the CRC) and B) allocate a struct old_soft_roce_peer and save the guid
into that struct and then put that struct on a list that we then search
any time we are creating a new queue pair and if the new queue pair goes
to a guid in the list, then we immediately flag that qp as being to an
old soft roce device and get the right behavior.  It would slow down qp
creation somewhat due to the list search, but probably not enough to
worry about.  No one will be doing a 1,000 node cluster MPI job over
soft-RoCE, so we should never notice the list length causing search
problems.  A patch to do something like that would be welcome.
Leon Romanovsky Dec. 10, 2019, 6:54 a.m. UTC | #5
On Mon, Dec 09, 2019 at 02:07:06PM -0500, Doug Ledford wrote:
> On Tue, 2019-12-03 at 19:46 -0500, Doug Ledford wrote:
> > On Tue, 2019-12-03 at 08:25 -0800, Bart Van Assche wrote:
> > > On 12/2/19 6:03 PM, Steve Wise wrote:
> > > > If RoCE PDUs being sent or received contain pad bytes, then the
> > > > iCRC
> > > > is miscalculated resulting PDUs being emitted by RXE with an
> > > > incorrect
> > > > iCRC, as well as ingress PDUs being dropped due to erroneously
> > > > detecting
> > > > a bad iCRC in the PDU.  The fix is to include the pad bytes, if
> > > > any,
> > > > in iCRC computations.
> > >
> > > Should this description mention that this patch breaks
> > > compatibility
> > > with SoftRoCE drivers that do not include this fix? Do we need a
> > > kernel
> > > module parameter that allows to select either the old or the new
> > > behavior?
> >
> > No.  The original soft-RoCE driver was supposed to be compatible with
> > hardware devices.  Because of this bug, it obviously wasn't.  This is
> > a
> > bug fix, and we do not need to do anything to be compatible with the
> > broken behavior.  Instead, it just needs noting that the soft-RoCE
> > implementation in prior kernels has a known wire format bug that
> > impacts
> > communications with both fixed versions of the driver and real
> > hardware
> > devices.
>
> I've taken these two patches into for-rc (with fixups to the commit
> message on the second, as well as adding a Fixes: tag on the second).
>
> I stand by what I said about not needing a compatibility flag or module
> option for the user to set.  However, that isn't to say that we can't
> autodetect old soft-RoCE peers.  If we get a packet that fails CRC and
> has pad bytes, then re-run the CRC without the pad bytes and see if it
> matches.  If it does, we could A) mark the current QP as being to an old
> soft-RoCE device (causing us to send without including the pad bytes in
> the CRC) and B) allocate a struct old_soft_roce_peer and save the guid
> into that struct and then put that struct on a list that we then search
> any time we are creating a new queue pair and if the new queue pair goes
> to a guid in the list, then we immediately flag that qp as being to an
> old soft roce device and get the right behavior.  It would slow down qp
> creation somewhat due to the list search, but probably not enough to
> worry about.  No one will be doing a 1,000 node cluster MPI job over
> soft-RoCE, so we should never notice the list length causing search
> problems.  A patch to do something like that would be welcome.

Do you find this implementation needed? I see RXE as a development
platform and in my view it is unlikely that someone will run RXE in
production with mixture of different kernel versions, which requires
such compatibility fallback.

Thanks

>
> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
Doug Ledford Dec. 11, 2019, 4:24 a.m. UTC | #6
On Tue, 2019-12-10 at 08:54 +0200, Leon Romanovsky wrote:
> On Mon, Dec 09, 2019 at 02:07:06PM -0500, Doug Ledford wrote:
> > 
> > I've taken these two patches into for-rc (with fixups to the commit
> > message on the second, as well as adding a Fixes: tag on the
> > second).
> > 
> > I stand by what I said about not needing a compatibility flag or
> > module
> > option for the user to set.  However, that isn't to say that we
> > can't
> > autodetect old soft-RoCE peers.  If we get a packet that fails CRC
> > and
> > has pad bytes, then re-run the CRC without the pad bytes and see if
> > it
> > matches.  If it does, we could A) mark the current QP as being to an
> > old
> > soft-RoCE device (causing us to send without including the pad bytes
> > in
> > the CRC) and B) allocate a struct old_soft_roce_peer and save the
> > guid
> > into that struct and then put that struct on a list that we then
> > search
> > any time we are creating a new queue pair and if the new queue pair
> > goes
> > to a guid in the list, then we immediately flag that qp as being to
> > an
> > old soft roce device and get the right behavior.  It would slow down
> > qp
> > creation somewhat due to the list search, but probably not enough to
> > worry about.  No one will be doing a 1,000 node cluster MPI job over
> > soft-RoCE, so we should never notice the list length causing search
> > problems.  A patch to do something like that would be welcome.
> 
> Do you find this implementation needed? I see RXE as a development
> platform and in my view it is unlikely that someone will run RXE in
> production with mixture of different kernel versions, which requires
> such compatibility fallback.

It's not a requirement, that's why I took the patches as they were.  It
would just be a "nice to have".
Leon Romanovsky Dec. 11, 2019, 5:59 a.m. UTC | #7
On Tue, Dec 10, 2019 at 11:24:21PM -0500, Doug Ledford wrote:
> On Tue, 2019-12-10 at 08:54 +0200, Leon Romanovsky wrote:
> > On Mon, Dec 09, 2019 at 02:07:06PM -0500, Doug Ledford wrote:
> > >
> > > I've taken these two patches into for-rc (with fixups to the commit
> > > message on the second, as well as adding a Fixes: tag on the
> > > second).
> > >
> > > I stand by what I said about not needing a compatibility flag or
> > > module
> > > option for the user to set.  However, that isn't to say that we
> > > can't
> > > autodetect old soft-RoCE peers.  If we get a packet that fails CRC
> > > and
> > > has pad bytes, then re-run the CRC without the pad bytes and see if
> > > it
> > > matches.  If it does, we could A) mark the current QP as being to an
> > > old
> > > soft-RoCE device (causing us to send without including the pad bytes
> > > in
> > > the CRC) and B) allocate a struct old_soft_roce_peer and save the
> > > guid
> > > into that struct and then put that struct on a list that we then
> > > search
> > > any time we are creating a new queue pair and if the new queue pair
> > > goes
> > > to a guid in the list, then we immediately flag that qp as being to
> > > an
> > > old soft roce device and get the right behavior.  It would slow down
> > > qp
> > > creation somewhat due to the list search, but probably not enough to
> > > worry about.  No one will be doing a 1,000 node cluster MPI job over
> > > soft-RoCE, so we should never notice the list length causing search
> > > problems.  A patch to do something like that would be welcome.
> >
> > Do you find this implementation needed? I see RXE as a development
> > platform and in my view it is unlikely that someone will run RXE in
> > production with mixture of different kernel versions, which requires
> > such compatibility fallback.
>
> It's not a requirement, that's why I took the patches as they were.  It
> would just be a "nice to have".

I see, thanks

>
> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
>     Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
Tom Talpey Dec. 11, 2019, 2:42 p.m. UTC | #8
On 12/10/2019 11:24 PM, Doug Ledford wrote:
> On Tue, 2019-12-10 at 08:54 +0200, Leon Romanovsky wrote:
>> On Mon, Dec 09, 2019 at 02:07:06PM -0500, Doug Ledford wrote:
>>>
>>> I've taken these two patches into for-rc (with fixups to the commit
>>> message on the second, as well as adding a Fixes: tag on the
>>> second).
>>>
>>> I stand by what I said about not needing a compatibility flag or
>>> module
>>> option for the user to set.  However, that isn't to say that we
>>> can't
>>> autodetect old soft-RoCE peers.  If we get a packet that fails CRC
>>> and
>>> has pad bytes, then re-run the CRC without the pad bytes and see if
>>> it
>>> matches.  If it does, we could A) mark the current QP as being to an
>>> old
>>> soft-RoCE device (causing us to send without including the pad bytes
>>> in
>>> the CRC) and B) allocate a struct old_soft_roce_peer and save the
>>> guid
>>> into that struct and then put that struct on a list that we then
>>> search
>>> any time we are creating a new queue pair and if the new queue pair
>>> goes
>>> to a guid in the list, then we immediately flag that qp as being to
>>> an
>>> old soft roce device and get the right behavior.  It would slow down
>>> qp
>>> creation somewhat due to the list search, but probably not enough to
>>> worry about.  No one will be doing a 1,000 node cluster MPI job over
>>> soft-RoCE, so we should never notice the list length causing search
>>> problems.  A patch to do something like that would be welcome.
>>
>> Do you find this implementation needed? I see RXE as a development
>> platform and in my view it is unlikely that someone will run RXE in
>> production with mixture of different kernel versions, which requires
>> such compatibility fallback.
> 
> It's not a requirement, that's why I took the patches as they were.  It
> would just be a "nice to have".

The counterargument to this is that it only extends the protocol bug
into the future, and for one single RoCE implementation. No hardware
implementation will do this, as it violates the protocol. And, it
potentially opens a silent data corruption, by accepting packets which
don't actually pass the checksum.

Personally, I'd say it "nice to avoid", i.e. don't apply such a patch.

MHO.
Doug Ledford Dec. 12, 2019, 10:06 p.m. UTC | #9
On Wed, 2019-12-11 at 09:42 -0500, Tom Talpey wrote:
> On 12/10/2019 11:24 PM, Doug Ledford wrote:
> > On Tue, 2019-12-10 at 08:54 +0200, Leon Romanovsky wrote:
> > > On Mon, Dec 09, 2019 at 02:07:06PM -0500, Doug Ledford wrote:
> > > > 
> > > Do you find this implementation needed? I see RXE as a development
> > > platform and in my view it is unlikely that someone will run RXE
> > > in
> > > production with mixture of different kernel versions, which
> > > requires
> > > such compatibility fallback.
> > 
> > It's not a requirement, that's why I took the patches as they
> > were.  It
> > would just be a "nice to have".
> 
> The counterargument to this is that it only extends the protocol bug
> into the future, and for one single RoCE implementation.

It just allows buggy implementations to talk to newer soft-RoCE
(although not to hardware).

>  No hardware
> implementation will do this, as it violates the protocol.

Right.

>  And, it
> potentially opens a silent data corruption, by accepting packets which
> don't actually pass the checksum.

This, I highly doubt.  For packets without padding, it would be the
same.  For packets with padding, it would only allow packets where the
data bytes had a correct CRC, so it's not like it just allows anything
to come through.  And it would only allow it if the flag was set, it's
not like we would allow two different CRCs on every packet with padding,
it's either on or off, and the check still covers all data bytes.  I
find it highly unlikely that this would introduce any sort of data
consistency problems for the specific case of old soft-RoCE talking to
new soft-RoCE.

> Personally, I'd say it "nice to avoid", i.e. don't apply such a patch.

No one has submitted a patch, so we seem to be good regardless ;-)
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index f9a492ed900b..831ad578a7b2 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -389,7 +389,7 @@  void rxe_rcv(struct sk_buff *skb)
 
 	calc_icrc = rxe_icrc_hdr(pkt, skb);
 	calc_icrc = rxe_crc32(rxe, calc_icrc, (u8 *)payload_addr(pkt),
-			      payload_size(pkt));
+			      payload_size(pkt) + bth_pad(pkt));
 	calc_icrc = (__force u32)cpu_to_be32(~calc_icrc);
 	if (unlikely(calc_icrc != pack_icrc)) {
 		if (skb->protocol == htons(ETH_P_IPV6))
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index c5d9b558fa90..e5031172c019 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -500,6 +500,12 @@  static int fill_packet(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
 			if (err)
 				return err;
 		}
+		if (bth_pad(pkt)) {
+			u8 *pad = payload_addr(pkt) + paylen;
+
+			memset(pad, 0, bth_pad(pkt));
+			crc = rxe_crc32(rxe, crc, pad, bth_pad(pkt));
+		}
 	}
 	p = payload_addr(pkt) + paylen + bth_pad(pkt);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 1cbfbd98eb22..c4a8195bf670 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -732,6 +732,13 @@  static enum resp_states read_reply(struct rxe_qp *qp,
 	if (err)
 		pr_err("Failed copying memory\n");
 
+	if (bth_pad(&ack_pkt)) {
+		struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+		u8 *pad = payload_addr(&ack_pkt) + payload;
+
+		memset(pad, 0, bth_pad(&ack_pkt));
+		icrc = rxe_crc32(rxe, icrc, pad, bth_pad(&ack_pkt));
+	}
 	p = payload_addr(&ack_pkt) + payload + bth_pad(&ack_pkt);
 	*p = ~icrc;