[for-next,6/6] RDMA/rxe: Fix redundant skb_put_zero

Message ID 20210618045742.204195-7-rpearsonhpe@gmail.com (mailing list archive)
State Accepted
Delegated to: Jason Gunthorpe
Series Fix extra/redundant copies

Commit Message

Bob Pearson June 18, 2021, 4:57 a.m. UTC
rxe_init_packet() in rxe_net.c calls skb_put_zero() to reserve space
for the payload and zero it out. All these bytes are then re-written
with RoCE headers and payload. Remove this useless extra copy.

Fixes: ecb238f6a7f3 ("IB/cxgb4: use skb_put_zero()/__skb_put_zero")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Zhu Yanjun June 18, 2021, 8:02 a.m. UTC | #1
On Fri, Jun 18, 2021 at 1:00 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> rxe_init_packet() in rxe_net.c calls skb_put_zero() to reserve space
> for the payload and zero it out. All these bytes are then re-written
> with RoCE headers and payload. Remove this useless extra copy.

The paylen seems to be a variable, that is, the length of pkt->hdr is not fixed.

Can you confirm that all the pkt->hdr are re-written with RoCE headers
and payload?

Zhu Yanjun

Bob Pearson June 18, 2021, 3:32 p.m. UTC | #2
On 6/18/21 3:02 AM, Zhu Yanjun wrote:
> On Fri, Jun 18, 2021 at 1:00 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>>
>> rxe_init_packet() in rxe_net.c calls skb_put_zero() to reserve space
>> for the payload and zero it out. All these bytes are then re-written
>> with RoCE headers and payload. Remove this useless extra copy.
> 
> The paylen seems to be a variable, that is, the length of pkt->hdr is not fixed.
> 
> Can you confirm that all the pkt->hdr are re-written with RoCE headers
> and payload?

Yes. rxe_init_packet() is called twice, once from rxe_req.c for request packets and once from rxe_resp.c for response packets.
In rxe_req.c in init_req_packet() paylen is set to

    paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE

which is the correct size of the packet from the UDP header to the frame FCS, i.e. the UDP payload. rxe_opcode[] is a table that includes the length of all the RoCE headers for a given opcode, which does vary. Payload is the RoCE payload and pad is the number of pad bytes required to extend the payload to a multiple of 4 bytes. RXE_ICRC_SIZE is the 4 bytes for the RoCE invariant CRC. It requires some checking, but all the headers are fully written, the payload is fully copied from the client, and the pad and ICRC bytes are also written. In rxe_resp.c paylen is set to the same value.
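
As a rough sketch of the arithmetic above (a user-space illustration, not the rxe code): hdr_len stands in for rxe_opcode[opcode].length, and pad_bytes(), roce_paylen() and ICRC_SIZE are hypothetical names chosen for this example only.

```c
#include <assert.h>

/* The RoCE invariant CRC occupies 4 bytes, as stated above. */
#define ICRC_SIZE 4u

/* Number of pad bytes needed to extend payload to a multiple of 4 bytes. */
unsigned int pad_bytes(unsigned int payload)
{
	unsigned int pad = (0u - payload) & 0x3u;

	/* Self-check: the padded payload is always 4-byte aligned. */
	assert(((payload + pad) & 0x3u) == 0);
	return pad;
}

/*
 * Total bytes reserved in the skb for one packet:
 * RoCE headers + payload + pad + ICRC.
 * hdr_len is an assumption standing in for rxe_opcode[opcode].length.
 */
unsigned int roce_paylen(unsigned int hdr_len, unsigned int payload)
{
	return hdr_len + payload + pad_bytes(payload) + ICRC_SIZE;
}
```

For instance, a 5-byte payload behind a 12-byte header needs 3 pad bytes, so roce_paylen(12, 5) is 12 + 5 + 3 + 4 = 24. Every one of those 24 bytes belongs to exactly one of the four terms, which is why each byte ends up written by the header, payload, pad or ICRC code.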
 
There are two potential issues here: 1) is the intended packet sent to the destination, and 2) is there a possibility that information can leak from the kernel to the outside? The above addresses 1). 2) relies on the assumption that the NIC is not examining data outside of the proper data area in the skb and doing something with it. If it were, you would have a worse problem, since the NIC has DMA access to all of kernel memory and can send any packet it likes.

Bob Pearson

Zhu Yanjun June 20, 2021, 2:07 p.m. UTC | #3
On Fri, Jun 18, 2021 at 11:32 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> Yes. rxe_init_packet() is called twice, once from rxe_req.c for request packets and once from rxe_resp.c for response packets.
> In rxe_req.c in init_req_packet() paylen is set to
>
>     paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE
>
> which is the correct size of the packet from the UDP header to the frame FCS, i.e. the UDP payload. rxe_opcode[] is a table that includes the length of all the RoCE headers for a given opcode, which does vary. Payload is the RoCE payload and pad is the number of pad bytes required to extend the payload to a multiple of 4 bytes. RXE_ICRC_SIZE is the 4 bytes for the RoCE invariant CRC. It requires some checking, but all the headers are fully written, the payload is fully copied from the client, and the pad and ICRC bytes are also written. In rxe_resp.c paylen is set to the same value.

The assignment is too complicated,
so I prefer to keep skb_put_zero().

Zhu Yanjun
Bob Pearson June 20, 2021, 8:21 p.m. UTC | #4
On 6/20/21 9:07 AM, Zhu Yanjun wrote:
> 
> The assignment is too complicated,
> so I prefer to keep skb_put_zero().

My goal here is to improve the performance of rxe. This one line adds an extra memory copy on every sent message. Without the skb_put_zero it passes all the tests and works correctly. What are you worried about exactly?
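
To make the cost concrete, here is a toy user-space model of the two helpers. It is only a sketch: struct toy_buf, toy_put() and toy_put_zero() are hypothetical stand-ins, not the kernel's sk_buff API. The one property it borrows from the kernel is that skb_put_zero() amounts to skb_put() followed by a memset() of the reserved bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an skb: a fixed data area and a count of bytes in use. */
struct toy_buf {
	unsigned char data[256];
	size_t len;
};

/*
 * Models skb_put(): reserve len bytes at the tail and return a pointer to
 * them. The reserved bytes keep whatever contents they already had.
 */
void *toy_put(struct toy_buf *b, size_t len)
{
	void *tail = b->data + b->len;

	assert(len <= sizeof(b->data) - b->len);
	b->len += len;
	return tail;
}

/*
 * Models skb_put_zero(): the same reservation plus one extra write pass
 * over all len bytes.
 */
void *toy_put_zero(struct toy_buf *b, size_t len)
{
	void *p = toy_put(b, len);

	memset(p, 0, len);
	return p;
}
```

If the caller goes on to write every one of the reserved bytes, the memset() in toy_put_zero() is pure overhead; if any byte is left unwritten, toy_put() exposes whatever was in the buffer before. That trade-off is the whole disagreement in this thread.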

Bob
Zhu Yanjun June 21, 2021, 2:58 a.m. UTC | #5
On Mon, Jun 21, 2021 at 4:21 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> My goal here is to improve the performance of rxe. This one line adds an extra memory copy on every sent message. Without the skb_put_zero it passes all the tests and works correctly. What are you worried about exactly?

Please show us the performance data.

Thanks
Zhu Yanjun

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 178a66a45312..6605ee777667 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -470,7 +470,7 @@  struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
 
 	pkt->rxe	= rxe;
 	pkt->port_num	= port_num;
-	pkt->hdr	= skb_put_zero(skb, paylen);
+	pkt->hdr	= skb_put(skb, paylen);
 	pkt->mask	|= RXE_GRH_MASK;
 
 out: