Message ID | 1471880577-21380-1-git-send-email-vkuznets@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 22/08/16 16:42, Vitaly Kuznetsov wrote: > > I see two ways to fix the issue: > - Change the 'wire' protocol between netfront and netback to start keeping > the original SKB structure. We'll have to add a flag indicating the fact > that the particular request is a part of the original linear part and not > a frag. We'll need to know the length of the linear part to pre-allocate > memory. I don't think there needs to be a protocol change. I think the check in netback is bogus -- it's the total packet length that must be > HLEN_ETH. The upper layers will pull any headers from the frags as needed (or if necessary, netback could pull a minimum amount). There's no need to preserve the skb layout (e.g., look how the to-guest direction we do not do this). David
David Vrabel <david.vrabel@citrix.com> writes: > On 22/08/16 16:42, Vitaly Kuznetsov wrote: >> >> I see two ways to fix the issue: >> - Change the 'wire' protocol between netfront and netback to start keeping >> the original SKB structure. We'll have to add a flag indicating the fact >> that the particular request is a part of the original linear part and not >> a frag. We'll need to know the length of the linear part to pre-allocate >> memory. > > I don't think there needs to be a protocol change. I think the check in > netback is bogus -- it's the total packet length that must be > > HLEN_ETH. The upper layers will pull any headers from the frags as > needed I'm afraid this is not always true, just removing the check leads us to the following: [ 495.442186] kernel BUG at ./include/linux/skbuff.h:1927! [ 495.468789] invalid opcode: 0000 [#1] SMP [ 495.490094] Modules linked in: tun loop bridge stp llc intel_rapl sb_edac edac_core x86_pkg_temp_thermal ipmi_ssif igb coretemp iTCO_wdt crct10dif_pclmul crc32_pclmul ptp ipmi_si iTCO_vendor_support ghash_clmulni_intel hpwdt ipmi_msghandler ioatdma hpilo pps_core lpc_ich acpi_power_meter wmi fjes tpm_tis dca shpchp tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd xenfs grace xen_privcmd sunrpc xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper ttm drm crc32c_intel serio_raw xen_scsiback target_core_mod xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn [ 495.749431] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc3+ #2 [ 495.782648] Hardware name: HP ProLiant DL380e Gen8, BIOS P73 08/20/2012 [ 495.817578] task: ffffffff81c0d500 task.stack: ffffffff81c00000 [ 495.847805] RIP: e030:[<ffffffff816f68a0>] [<ffffffff816f68a0>] eth_type_trans+0xf0/0x130 [ 495.888942] RSP: e02b:ffff880429203d70 EFLAGS: 00010297 [ 495.916005] RAX: 0000000000000014 RBX: ffff88041f7bf200 RCX: 0000000000000000 [ 495.952133] RDX: ffff88041ed76c40 RSI: ffff88041ad6b000 RDI: ffff88041f7bf200 [ 495.988919] RBP: ffff880429203d80 R08: 0000000000000000 R09: ffff88041ed76cf0 [ 496.025782] R10: 0000160000000000 R11: ffffc900041aa2f8 R12: 000000000000000a [ 496.061351] R13: ffffc900041b0200 R14: 000000000000000b R15: ffffc900041aa2a0 [ 496.098178] FS: 00007fa2b9442880(0000) GS:ffff880429200000(0000) knlGS:0000000000000000 [ 496.139767] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 496.169105] CR2: 00005558e4d43ea0 CR3: 000000042024e000 CR4: 0000000000042660 [ 496.206816] Stack: [ 496.216904] 000000000000000b 51859c5d87cdd22f ffff880429203e68 ffffffffc002dd59 [ 496.254093] ffffffff8155eed0 51859c5d87cdd22f ffff88041a450000 0000000a22d66f70 [ 496.292351] ffff88041a450000 ffffc900041ad9e0 ffffc900041aa3c0 ffff88041f7bf200 [ 496.330823] Call Trace: [ 496.343397] <IRQ> [ 496.352992] [<ffffffffc002dd59>] xenvif_tx_action+0x569/0x8b0 [xen_netback] [ 496.389933] [<ffffffff8155eed0>] ? scsi_put_command+0x80/0xd0 [ 496.418810] [<ffffffff816ccc07>] ? __napi_schedule+0x47/0x50 [ 496.449097] [<ffffffffc00311f0>] ? xenvif_tx_interrupt+0x50/0x60 [xen_netback] [ 496.485804] [<ffffffff81101bed>] ? __handle_irq_event_percpu+0x8d/0x190 ...
Vitaly Kuznetsov <vkuznets@redhat.com> writes: > David Vrabel <david.vrabel@citrix.com> writes: > >> On 22/08/16 16:42, Vitaly Kuznetsov wrote: >>> >>> I see two ways to fix the issue: >>> - Change the 'wire' protocol between netfront and netback to start keeping >>> the original SKB structure. We'll have to add a flag indicating the fact >>> that the particular request is a part of the original linear part and not >>> a frag. We'll need to know the length of the linear part to pre-allocate >>> memory. >> >> I don't think there needs to be a protocol change. I think the check in >> netback is bogus -- it's the total packet length that must be > >> HLEN_ETH. The upper layers will pull any headers from the frags as >> needed > > I'm afraid this is not always true, just removing the check leads us to > the following: > > [ 495.442186] kernel BUG at ./include/linux/skbuff.h:1927! > [ 495.468789] invalid opcode: 0000 [#1] SMP What I wanted to say here is that this test makes me think the description of the patch I suggested is correct: an SKB can't have its linear part shorter than ETH_HLEN as the header is being pointed directly, upper network layers don't assemble it from frags, the check in netback is valid. So, how can we proceed here?
Vitaly Kuznetsov <vkuznets@redhat.com> writes: > Vitaly Kuznetsov <vkuznets@redhat.com> writes: > >> David Vrabel <david.vrabel@citrix.com> writes: >> >>> On 22/08/16 16:42, Vitaly Kuznetsov wrote: >>>> >>>> I see two ways to fix the issue: >>>> - Change the 'wire' protocol between netfront and netback to start keeping >>>> the original SKB structure. We'll have to add a flag indicating the fact >>>> that the particular request is a part of the original linear part and not >>>> a frag. We'll need to know the length of the linear part to pre-allocate >>>> memory. >>> >>> I don't think there needs to be a protocol change. I think the check in >>> netback is bogus -- it's the total packet length that must be > >>> HLEN_ETH. The upper layers will pull any headers from the frags as >>> needed >> >> I'm afraid this is not always true, just removing the check leads us to >> the following: >> >> [ 495.442186] kernel BUG at ./include/linux/skbuff.h:1927! >> [ 495.468789] invalid opcode: 0000 [#1] SMP > > What I wanted to say here is that this test makes me think the > description of the patch I suggested is correct: an SKB can't have its > linear part shorter than ETH_HLEN as the header is being pointed directly, > upper network layers don't assemble it from frags, the check in netback > is valid. > > So, how can we proceed here? Sorry for the second ping but I'd really like to see this moving forward...
On 22/08/16 16:42, Vitaly Kuznetsov wrote: > Small packet loss is reported on complex multi host network configurations > including tunnels, NAT, ... My investigation led me to the following check > in netback which drops packets: > > if (unlikely(txreq.size < ETH_HLEN)) { > netdev_err(queue->vif->dev, > "Bad packet size: %d\n", txreq.size); > xenvif_tx_err(queue, &txreq, extra_count, idx); > break; > } > > But this check itself is legitimate. SKBs consist of a linear part (which > has to have the ethernet header) and (optionally) a number of frags. > Netfront transmits the head of the linear part up to the page boundary > as the first request and all the rest becomes frags so when we're > reconstructing the SKB in netback we can't distinguish between original > frags and the 'tail' of the linear part. The first SKB needs to be at > least ETH_HLEN size. So in case we have an SKB with its linear part > starting too close to the page boundary the packet is lost. > > I see two ways to fix the issue: > - Change the 'wire' protocol between netfront and netback to start keeping > the original SKB structure. We'll have to add a flag indicating the fact > that the particular request is a part of the original linear part and not > a frag. We'll need to know the length of the linear part to pre-allocate > memory. > - Avoid transmitting SKBs with linear parts starting too close to the page > boundary. That seems preferable short-term and shouldn't bring > significant performance degradation as such packets are rare. That's what > this patch is trying to achieve with skb_copy(). > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> We should probably fix the backend to handle this (by grant copying a minimum amount in the linear area, but since netfront needs to work with older netback. Acked-by: David Vrabel <david.vrabel@citrix.com> David
David Vrabel <david.vrabel@citrix.com> writes: > On 22/08/16 16:42, Vitaly Kuznetsov wrote: >> Small packet loss is reported on complex multi host network configurations >> including tunnels, NAT, ... My investigation led me to the following check >> in netback which drops packets: >> >> if (unlikely(txreq.size < ETH_HLEN)) { >> netdev_err(queue->vif->dev, >> "Bad packet size: %d\n", txreq.size); >> xenvif_tx_err(queue, &txreq, extra_count, idx); >> break; >> } >> >> But this check itself is legitimate. SKBs consist of a linear part (which >> has to have the ethernet header) and (optionally) a number of frags. >> Netfront transmits the head of the linear part up to the page boundary >> as the first request and all the rest becomes frags so when we're >> reconstructing the SKB in netback we can't distinguish between original >> frags and the 'tail' of the linear part. The first SKB needs to be at >> least ETH_HLEN size. So in case we have an SKB with its linear part >> starting too close to the page boundary the packet is lost. >> >> I see two ways to fix the issue: >> - Change the 'wire' protocol between netfront and netback to start keeping >> the original SKB structure. We'll have to add a flag indicating the fact >> that the particular request is a part of the original linear part and not >> a frag. We'll need to know the length of the linear part to pre-allocate >> memory. >> - Avoid transmitting SKBs with linear parts starting too close to the page >> boundary. That seems preferable short-term and shouldn't bring >> significant performance degradation as such packets are rare. That's what >> this patch is trying to achieve with skb_copy(). >> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> > > We should probably fix the backend to handle this (by grant copying a > minimum amount in the linear area, but since netfront needs to work with > older netback. > > Acked-by: David Vrabel <david.vrabel@citrix.com> David, could you please pick this up for net-next or are there any concerns remaining? Thanks,
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 96ccd4e..28c4a66 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -565,6 +565,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) struct netfront_queue *queue = NULL; unsigned int num_queues = dev->real_num_tx_queues; u16 queue_index; + struct sk_buff *nskb; /* Drop the packet if no queues are set up */ if (num_queues < 1) @@ -595,6 +596,19 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) offset = offset_in_page(skb->data); len = skb_headlen(skb); + /* The first req should be at least ETH_HLEN size or the packet will be + * dropped by netback. + */ + if (unlikely(PAGE_SIZE - offset < ETH_HLEN)) { + nskb = skb_copy(skb, GFP_ATOMIC); + if (!nskb) + goto drop; + dev_kfree_skb_any(skb); + skb = nskb; + page = virt_to_page(skb->data); + offset = offset_in_page(skb->data); + } + spin_lock_irqsave(&queue->tx_lock, flags); if (unlikely(!netif_carrier_ok(dev) ||
Small packet loss is reported on complex multi host network configurations including tunnels, NAT, ... My investigation led me to the following check in netback which drops packets: if (unlikely(txreq.size < ETH_HLEN)) { netdev_err(queue->vif->dev, "Bad packet size: %d\n", txreq.size); xenvif_tx_err(queue, &txreq, extra_count, idx); break; } But this check itself is legitimate. SKBs consist of a linear part (which has to have the ethernet header) and (optionally) a number of frags. Netfront transmits the head of the linear part up to the page boundary as the first request and all the rest becomes frags so when we're reconstructing the SKB in netback we can't distinguish between original frags and the 'tail' of the linear part. The first SKB needs to be at least ETH_HLEN size. So in case we have an SKB with its linear part starting too close to the page boundary the packet is lost. I see two ways to fix the issue: - Change the 'wire' protocol between netfront and netback to start keeping the original SKB structure. We'll have to add a flag indicating the fact that the particular request is a part of the original linear part and not a frag. We'll need to know the length of the linear part to pre-allocate memory. - Avoid transmitting SKBs with linear parts starting too close to the page boundary. That seems preferable short-term and shouldn't bring significant performance degradation as such packets are rare. That's what this patch is trying to achieve with skb_copy(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> --- drivers/net/xen-netfront.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)