diff mbox

[net-next] xen-netfront: avoid packet loss when ethernet header crosses page boundary

Message ID 1471880577-21380-1-git-send-email-vkuznets@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Vitaly Kuznetsov Aug. 22, 2016, 3:42 p.m. UTC
Small packet loss is reported on complex multi host network configurations
including tunnels, NAT, ... My investigation led me to the following check
in netback which drops packets:

        if (unlikely(txreq.size < ETH_HLEN)) {
                netdev_err(queue->vif->dev,
                           "Bad packet size: %d\n", txreq.size);
                xenvif_tx_err(queue, &txreq, extra_count, idx);
                break;
        }

But this check itself is legitimate. SKBs consist of a linear part (which
has to have the ethernet header) and (optionally) a number of frags.
Netfront transmits the head of the linear part up to the page boundary
as the first request and all the rest becomes frags so when we're
reconstructing the SKB in netback we can't distinguish between original
frags and the 'tail' of the linear part. The first SKB needs to be at
least ETH_HLEN size. So in case we have an SKB with its linear part
starting too close to the page boundary the packet is lost.

I see two ways to fix the issue:
- Change the 'wire' protocol between netfront and netback to start keeping
  the original SKB structure. We'll have to add a flag indicating the fact
  that the particular request is a part of the original linear part and not
  a frag. We'll need to know the length of the linear part to pre-allocate
  memory.
- Avoid transmitting SKBs with linear parts starting too close to the page
  boundary. That seems preferable short-term and shouldn't bring
  significant performance degradation as such packets are rare. That's what
  this patch is trying to achieve with skb_copy().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/net/xen-netfront.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

David Vrabel Aug. 22, 2016, 4:55 p.m. UTC | #1
On 22/08/16 16:42, Vitaly Kuznetsov wrote:
> 
> I see two ways to fix the issue:
> - Change the 'wire' protocol between netfront and netback to start keeping
>   the original SKB structure. We'll have to add a flag indicating the fact
>   that the particular request is a part of the original linear part and not
>   a frag. We'll need to know the length of the linear part to pre-allocate
>   memory.

I don't think there needs to be a protocol change.  I think the check in
netback is bogus -- it's the total packet length that must be >
HLEN_ETH.  The upper layers will pull any headers from the frags as
needed (or if necessary, netback could pull a minimum amount).

There's no need to preserve the skb layout (e.g., look how the to-guest
direction we do not do this).

David
Vitaly Kuznetsov Aug. 23, 2016, 6:51 p.m. UTC | #2
David Vrabel <david.vrabel@citrix.com> writes:

> On 22/08/16 16:42, Vitaly Kuznetsov wrote:
>> 
>> I see two ways to fix the issue:
>> - Change the 'wire' protocol between netfront and netback to start keeping
>>   the original SKB structure. We'll have to add a flag indicating the fact
>>   that the particular request is a part of the original linear part and not
>>   a frag. We'll need to know the length of the linear part to pre-allocate
>>   memory.
>
> I don't think there needs to be a protocol change.  I think the check in
> netback is bogus -- it's the total packet length that must be >
> HLEN_ETH.  The upper layers will pull any headers from the frags as
> needed

I'm afraid this is not always true, just removing the check leads us to
the following:

[  495.442186] kernel BUG at ./include/linux/skbuff.h:1927! 
[  495.468789] invalid opcode: 0000 [#1] SMP 
[  495.490094] Modules linked in: tun loop bridge stp llc intel_rapl sb_edac edac_core x86_pkg_temp_thermal ipmi_ssif igb coretemp iTCO_wdt crct10dif_pclmul crc32_pclmul ptp ipmi_si iTCO_vendor_support ghash_clmulni_intel hpwdt ipmi_msghandler ioatdma hpilo pps_core lpc_ich acpi_power_meter wmi fjes tpm_tis dca shpchp tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd xenfs grace xen_privcmd sunrpc xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper ttm drm crc32c_intel serio_raw xen_scsiback target_core_mod xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn 
[  495.749431] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc3+ #2 
[  495.782648] Hardware name: HP ProLiant DL380e Gen8, BIOS P73 08/20/2012 
[  495.817578] task: ffffffff81c0d500 task.stack: ffffffff81c00000 
[  495.847805] RIP: e030:[<ffffffff816f68a0>]  [<ffffffff816f68a0>] eth_type_trans+0xf0/0x130 
[  495.888942] RSP: e02b:ffff880429203d70  EFLAGS: 00010297 
[  495.916005] RAX: 0000000000000014 RBX: ffff88041f7bf200 RCX: 0000000000000000 
[  495.952133] RDX: ffff88041ed76c40 RSI: ffff88041ad6b000 RDI: ffff88041f7bf200 
[  495.988919] RBP: ffff880429203d80 R08: 0000000000000000 R09: ffff88041ed76cf0 
[  496.025782] R10: 0000160000000000 R11: ffffc900041aa2f8 R12: 000000000000000a 
[  496.061351] R13: ffffc900041b0200 R14: 000000000000000b R15: ffffc900041aa2a0 
[  496.098178] FS:  00007fa2b9442880(0000) GS:ffff880429200000(0000) knlGS:0000000000000000 
[  496.139767] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  496.169105] CR2: 00005558e4d43ea0 CR3: 000000042024e000 CR4: 0000000000042660 
[  496.206816] Stack: 
[  496.216904]  000000000000000b 51859c5d87cdd22f ffff880429203e68 ffffffffc002dd59 
[  496.254093]  ffffffff8155eed0 51859c5d87cdd22f ffff88041a450000 0000000a22d66f70 
[  496.292351]  ffff88041a450000 ffffc900041ad9e0 ffffc900041aa3c0 ffff88041f7bf200 
[  496.330823] Call Trace: 
[  496.343397]  <IRQ>  
[  496.352992]  [<ffffffffc002dd59>] xenvif_tx_action+0x569/0x8b0 [xen_netback] 
[  496.389933]  [<ffffffff8155eed0>] ? scsi_put_command+0x80/0xd0 
[  496.418810]  [<ffffffff816ccc07>] ? __napi_schedule+0x47/0x50 
[  496.449097]  [<ffffffffc00311f0>] ? xenvif_tx_interrupt+0x50/0x60 [xen_netback] 
[  496.485804]  [<ffffffff81101bed>] ? __handle_irq_event_percpu+0x8d/0x190 
...
Vitaly Kuznetsov Aug. 29, 2016, 10:28 a.m. UTC | #3
Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> David Vrabel <david.vrabel@citrix.com> writes:
>
>> On 22/08/16 16:42, Vitaly Kuznetsov wrote:
>>> 
>>> I see two ways to fix the issue:
>>> - Change the 'wire' protocol between netfront and netback to start keeping
>>>   the original SKB structure. We'll have to add a flag indicating the fact
>>>   that the particular request is a part of the original linear part and not
>>>   a frag. We'll need to know the length of the linear part to pre-allocate
>>>   memory.
>>
>> I don't think there needs to be a protocol change.  I think the check in
>> netback is bogus -- it's the total packet length that must be >
>> HLEN_ETH.  The upper layers will pull any headers from the frags as
>> needed
>
> I'm afraid this is not always true, just removing the check leads us to
> the following:
>
> [  495.442186] kernel BUG at ./include/linux/skbuff.h:1927! 
> [  495.468789] invalid opcode: 0000 [#1] SMP 

What I wanted to say here is that this test makes me think the
description of the patch I suggested is correct: an SKB can't have its
linear part shorter than ETH_HLEN as the header is being pointed directly,
upper network layers don't assemble it from frags, the check in netback
is valid.

So, how can we proceed here?
Vitaly Kuznetsov Sept. 9, 2016, 1:39 p.m. UTC | #4
Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>
>> David Vrabel <david.vrabel@citrix.com> writes:
>>
>>> On 22/08/16 16:42, Vitaly Kuznetsov wrote:
>>>> 
>>>> I see two ways to fix the issue:
>>>> - Change the 'wire' protocol between netfront and netback to start keeping
>>>>   the original SKB structure. We'll have to add a flag indicating the fact
>>>>   that the particular request is a part of the original linear part and not
>>>>   a frag. We'll need to know the length of the linear part to pre-allocate
>>>>   memory.
>>>
>>> I don't think there needs to be a protocol change.  I think the check in
>>> netback is bogus -- it's the total packet length that must be >
>>> HLEN_ETH.  The upper layers will pull any headers from the frags as
>>> needed
>>
>> I'm afraid this is not always true, just removing the check leads us to
>> the following:
>>
>> [  495.442186] kernel BUG at ./include/linux/skbuff.h:1927! 
>> [  495.468789] invalid opcode: 0000 [#1] SMP 
>
> What I wanted to say here is that this test makes me think the
> description of the patch I suggested is correct: an SKB can't have its
> linear part shorter than ETH_HLEN as the header is being pointed directly,
> upper network layers don't assemble it from frags, the check in netback
> is valid.
>
> So, how can we proceed here?

Sorry for the second ping but I'd really like to see this moving
forward...
David Vrabel Sept. 9, 2016, 1:41 p.m. UTC | #5
On 22/08/16 16:42, Vitaly Kuznetsov wrote:
> Small packet loss is reported on complex multi host network configurations
> including tunnels, NAT, ... My investigation led me to the following check
> in netback which drops packets:
> 
>         if (unlikely(txreq.size < ETH_HLEN)) {
>                 netdev_err(queue->vif->dev,
>                            "Bad packet size: %d\n", txreq.size);
>                 xenvif_tx_err(queue, &txreq, extra_count, idx);
>                 break;
>         }
> 
> But this check itself is legitimate. SKBs consist of a linear part (which
> has to have the ethernet header) and (optionally) a number of frags.
> Netfront transmits the head of the linear part up to the page boundary
> as the first request and all the rest becomes frags so when we're
> reconstructing the SKB in netback we can't distinguish between original
> frags and the 'tail' of the linear part. The first SKB needs to be at
> least ETH_HLEN size. So in case we have an SKB with its linear part
> starting too close to the page boundary the packet is lost.
> 
> I see two ways to fix the issue:
> - Change the 'wire' protocol between netfront and netback to start keeping
>   the original SKB structure. We'll have to add a flag indicating the fact
>   that the particular request is a part of the original linear part and not
>   a frag. We'll need to know the length of the linear part to pre-allocate
>   memory.
> - Avoid transmitting SKBs with linear parts starting too close to the page
>   boundary. That seems preferable short-term and shouldn't bring
>   significant performance degradation as such packets are rare. That's what
>   this patch is trying to achieve with skb_copy().
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

We should probably fix the backend to handle this (by grant copying a
minimum amount in the linear area, but since netfront needs to work with
older netback.

Acked-by: David Vrabel <david.vrabel@citrix.com>

David
Vitaly Kuznetsov Sept. 12, 2016, 11:52 a.m. UTC | #6
David Vrabel <david.vrabel@citrix.com> writes:

> On 22/08/16 16:42, Vitaly Kuznetsov wrote:
>> Small packet loss is reported on complex multi host network configurations
>> including tunnels, NAT, ... My investigation led me to the following check
>> in netback which drops packets:
>> 
>>         if (unlikely(txreq.size < ETH_HLEN)) {
>>                 netdev_err(queue->vif->dev,
>>                            "Bad packet size: %d\n", txreq.size);
>>                 xenvif_tx_err(queue, &txreq, extra_count, idx);
>>                 break;
>>         }
>> 
>> But this check itself is legitimate. SKBs consist of a linear part (which
>> has to have the ethernet header) and (optionally) a number of frags.
>> Netfront transmits the head of the linear part up to the page boundary
>> as the first request and all the rest becomes frags so when we're
>> reconstructing the SKB in netback we can't distinguish between original
>> frags and the 'tail' of the linear part. The first SKB needs to be at
>> least ETH_HLEN size. So in case we have an SKB with its linear part
>> starting too close to the page boundary the packet is lost.
>> 
>> I see two ways to fix the issue:
>> - Change the 'wire' protocol between netfront and netback to start keeping
>>   the original SKB structure. We'll have to add a flag indicating the fact
>>   that the particular request is a part of the original linear part and not
>>   a frag. We'll need to know the length of the linear part to pre-allocate
>>   memory.
>> - Avoid transmitting SKBs with linear parts starting too close to the page
>>   boundary. That seems preferable short-term and shouldn't bring
>>   significant performance degradation as such packets are rare. That's what
>>   this patch is trying to achieve with skb_copy().
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> We should probably fix the backend to handle this (by grant copying a
> minimum amount in the linear area, but since netfront needs to work with
> older netback.
>
> Acked-by: David Vrabel <david.vrabel@citrix.com>

David,

could you please pick this up for net-next or are there any concerns
remaining?

Thanks,
diff mbox

Patch

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 96ccd4e..28c4a66 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -565,6 +565,7 @@  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netfront_queue *queue = NULL;
 	unsigned int num_queues = dev->real_num_tx_queues;
 	u16 queue_index;
+	struct sk_buff *nskb;
 
 	/* Drop the packet if no queues are set up */
 	if (num_queues < 1)
@@ -595,6 +596,19 @@  static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	offset = offset_in_page(skb->data);
 	len = skb_headlen(skb);
 
+	/* The first req should be at least ETH_HLEN size or the packet will be
+	 * dropped by netback.
+	 */
+	if (unlikely(PAGE_SIZE - offset < ETH_HLEN)) {
+		nskb = skb_copy(skb, GFP_ATOMIC);
+		if (!nskb)
+			goto drop;
+		dev_kfree_skb_any(skb);
+		skb = nskb;
+		page = virt_to_page(skb->data);
+		offset = offset_in_page(skb->data);
+	}
+
 	spin_lock_irqsave(&queue->tx_lock, flags);
 
 	if (unlikely(!netif_carrier_ok(dev) ||