[RFC,v2,net-next] net: Preserve skb delivery time during forward

The skb->skb_mstamp_ns is used as the EDT (Earliest Departure Time)
in TCP.  skb->skb_mstamp_ns shares a union with skb->tstamp.

When the skb travels through veth and is forwarded as shown below,
skb->tstamp is reset to 0 at multiple points.

                                                                        (c: skb->tstamp = 0)
                                                                         vv
tcp-sender => veth@netns => veth@hostns(b: rx: skb->tstamp = real_clock) => fq@eth0
                         ^^
                        (a: skb->tstamp = 0)

(a) veth@netns TX to veth@hostns:
    skb->tstamp (mono clock) is an EDT, and it is in the future.
    It is reset to 0 so that net_timestamp_check is not skipped
    on the RX side in (b).
(b) RX (netif_rx) in veth@hostns:
    net_timestamp_check puts the current time (real clock) in skb->tstamp.
(c) veth@hostns forward to fq@eth0:
    skb->tstamp is reset to 0 again because fq uses the
    mono clock.
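The three resets above can be replayed in a small userspace model to show why the sender's EDT never reaches fq@eth0. This is a sketch, not kernel code: struct sim_skb and the sim_* functions are hypothetical stand-ins, and net_timestamp_check is modeled only as "stamp with the real clock when tstamp is 0", assuming RX timestamping is enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model, not the kernel's struct sk_buff.  As in the kernel,
 * tstamp and skb_mstamp_ns share storage through an anonymous union. */
struct sim_skb {
	union {
		int64_t tstamp;         /* ktime_t in the kernel */
		uint64_t skb_mstamp_ns; /* EDT written by TCP (mono clock) */
	};
};

/* Hypothetical real-clock reading used by the RX stamp in (b). */
static const int64_t sim_real_clock_ns = 1639598000000000000LL;

/* (a) veth TX: clear tstamp so that the RX side stamps it. */
static void sim_veth_xmit(struct sim_skb *skb)
{
	skb->tstamp = 0;
}

/* (b) netif_rx: models net_timestamp_check(), which only fills in
 * tstamp when it is 0 (RX timestamping assumed enabled). */
static void sim_netif_rx(struct sim_skb *skb)
{
	if (skb->tstamp == 0)
		skb->tstamp = sim_real_clock_ns;
}

/* (c) forward to fq@eth0: clear again since fq expects a mono-clock EDT. */
static void sim_forward(struct sim_skb *skb)
{
	skb->tstamp = 0;
}

/* Returns the tstamp that fq@eth0 sees for a given sender EDT. */
static int64_t sim_tstamp_at_fq(uint64_t edt_mono_ns)
{
	struct sim_skb skb = { .skb_mstamp_ns = edt_mono_ns };

	sim_veth_xmit(&skb);
	sim_netif_rx(&skb);
	sim_forward(&skb);
	return skb.tstamp;
}
```

Whatever EDT the sender chooses, sim_tstamp_at_fq() returns 0, so in this model fq treats the packet as having no delivery time.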

This leads to an unstable TCP throughput issue described by Daniel in [0].

We also have a use case where a bpf prog runs at ingress@veth@hostns
to set the EDT in skb->tstamp to limit the bandwidth usage
of a particular netns.  This EDT currently also gets
reset in step (c) as described above.

Unlike RFC v1, which tried to migrate the rx tstamp to the mono clock
first, this patch preserves the EDT in skb->skb_mstamp_ns during forward.

The idea is to temporarily store skb->skb_mstamp_ns during forward.
skb_shinfo(skb)->hwtstamps is used as the temporary store, and
it is unioned with the newly added "u64 tx_delivery_tstamp".
hwtstamps should only be used when a packet is received from or
sent out of a hw device.
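A minimal sketch of the storage reuse, assuming a reduced userspace model of struct skb_shared_info (the sim_* names and the named member ts are hypothetical; the real struct has many more fields). It shows that the stash costs no extra space, and that writing one union member overwrites the other, which is why readers of hwtstamps need a flag telling them which member is live.

```c
#include <assert.h>
#include <stdint.h>

/* Models struct skb_shared_hwtstamps: one ktime_t (int64_t here). */
struct sim_hwtstamps {
	int64_t hwtstamp;
};

/* The storage shared by hwtstamps and the new tx_delivery_tstamp. */
union sim_tstamp_slot {
	struct sim_hwtstamps hwtstamps; /* valid for hw RX/TX timestamps */
	uint64_t tx_delivery_tstamp;    /* valid while an EDT is stashed */
};

/* Reduced model of struct skb_shared_info; other fields omitted. */
struct sim_shinfo {
	uint8_t tx_flags;
	union sim_tstamp_slot ts;
};

/* Reusing the hwtstamps storage means the stash adds no bytes. */
_Static_assert(sizeof(union sim_tstamp_slot) == sizeof(struct sim_hwtstamps),
	       "tx_delivery_tstamp must fit in hwtstamps");
```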

During forward, skb->tstamp will be temporarily stored in
skb_shinfo(skb)->tx_delivery_tstamp, and a new bit
(SKBTX_DELIVERY_TSTAMP) in skb_shinfo(skb)->tx_flags
will also be set to indicate that tx_delivery_tstamp is in use.
hwtstamps is accessed through the skb_hwtstamps() getter,
so the getter can test unlikely(tx_flags & SKBTX_DELIVERY_TSTAMP)
and reset tx_delivery_tstamp to 0 before hwtstamps is used.
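The save-on-forward step and the guarded getter can be sketched in the same userspace model. All sim_* names and the bit value are hypothetical, and the relevant skb_shinfo() fields are flattened into struct sim_skb for brevity. Clearing the SKBTX_DELIVERY_TSTAMP bit inside the getter is an assumption of this sketch; the text above only says tx_delivery_tstamp is reset to 0.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_SKBTX_DELIVERY_TSTAMP (1u << 0) /* hypothetical bit value */

struct sim_hwtstamps { int64_t hwtstamp; };

struct sim_skb {
	int64_t tstamp;
	uint8_t tx_flags;                   /* skb_shinfo(skb)->tx_flags */
	union {                             /* skb_shinfo(skb) storage */
		struct sim_hwtstamps hwtstamps;
		uint64_t tx_delivery_tstamp;
	};
};

/* Forward path: stash the EDT, flag the stash, then clear tstamp as before. */
static void sim_forward_save_tstamp(struct sim_skb *skb)
{
	skb->tx_delivery_tstamp = (uint64_t)skb->tstamp;
	skb->tx_flags |= SIM_SKBTX_DELIVERY_TSTAMP;
	skb->tstamp = 0;
}

/* Getter for code that will write a hw timestamp: drop any stashed EDT
 * first so the two users of the shared storage never mix. */
static struct sim_hwtstamps *sim_skb_hwtstamps(struct sim_skb *skb)
{
	if (skb->tx_flags & SIM_SKBTX_DELIVERY_TSTAMP) {
		skb->tx_delivery_tstamp = 0;
		skb->tx_flags &= (uint8_t)~SIM_SKBTX_DELIVERY_TSTAMP;
	}
	return &skb->hwtstamps;
}
```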

After moving the skb->tstamp to skb_shinfo(skb)->tx_delivery_tstamp,
the skb->tstamp will still be reset to 0 during forward.  Thus,
on the RX side (__netif_receive_skb_core), all existing code paths
will still get the received time in real clock and will work as-is.
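The effect on the RX side can be checked in the same hypothetical model: once the EDT has been moved out of skb->tstamp, an unchanged net_timestamp_check-style stamp (modeled as "fill in only when tstamp is 0", with RX timestamping assumed enabled) still records the real-clock receive time, while the stashed EDT survives untouched. The sim_* names and the bit value are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_SKBTX_DELIVERY_TSTAMP (1u << 0) /* hypothetical bit value */

struct sim_hwtstamps { int64_t hwtstamp; };

struct sim_skb {
	int64_t tstamp;
	uint8_t tx_flags;                   /* skb_shinfo(skb)->tx_flags */
	union {                             /* skb_shinfo(skb) storage */
		struct sim_hwtstamps hwtstamps;
		uint64_t tx_delivery_tstamp;
	};
};

/* Forward path: stash the EDT and clear tstamp. */
static void sim_forward_save_tstamp(struct sim_skb *skb)
{
	skb->tx_delivery_tstamp = (uint64_t)skb->tstamp;
	skb->tx_flags |= SIM_SKBTX_DELIVERY_TSTAMP;
	skb->tstamp = 0;
}

/* Unchanged RX behavior: stamp with the real clock only when tstamp is 0. */
static void sim_net_timestamp_check(struct sim_skb *skb, int64_t real_now_ns)
{
	if (skb->tstamp == 0)
		skb->tstamp = real_now_ns;
}
```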

When this skb is finally transmitted in __dev_queue_xmit(),
the SKBTX_DELIVERY_TSTAMP bit in skb_shinfo(skb)->tx_flags is checked,
and skb->tstamp is restored from skb_shinfo(skb)->tx_delivery_tstamp
if needed.  This bit test is done immediately after the existing
bit test 'skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP'.
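The restore step and its placement next to the SKBTX_SCHED_TSTAMP test can be sketched as follows. This is a userspace model: sim_dev_queue_xmit() is a hypothetical stand-in for the start of __dev_queue_xmit(), the bit values are made up, and a counter replaces the real scheduler-timestamp reporting. Clearing the flag and the stash after restoring is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, for this model only. */
#define SIM_SKBTX_SCHED_TSTAMP    (1u << 0)
#define SIM_SKBTX_DELIVERY_TSTAMP (1u << 1)

struct sim_hwtstamps { int64_t hwtstamp; };

struct sim_skb {
	int64_t tstamp;
	uint8_t tx_flags;                   /* skb_shinfo(skb)->tx_flags */
	union {                             /* skb_shinfo(skb) storage */
		struct sim_hwtstamps hwtstamps;
		uint64_t tx_delivery_tstamp;
	};
};

static int sim_sched_tstamp_reports; /* stands in for the SCHED report */

/* Forward path: stash the EDT and clear tstamp. */
static void sim_forward_save_tstamp(struct sim_skb *skb)
{
	skb->tx_delivery_tstamp = (uint64_t)skb->tstamp;
	skb->tx_flags |= SIM_SKBTX_DELIVERY_TSTAMP;
	skb->tstamp = 0;
}

/* The tx_flags checks at the top of a __dev_queue_xmit()-like function. */
static void sim_dev_queue_xmit(struct sim_skb *skb)
{
	if (skb->tx_flags & SIM_SKBTX_SCHED_TSTAMP)
		sim_sched_tstamp_reports++;
	/* New check, placed right after the existing one. */
	if (skb->tx_flags & SIM_SKBTX_DELIVERY_TSTAMP) {
		skb->tstamp = (int64_t)skb->tx_delivery_tstamp;
		skb->tx_delivery_tstamp = 0;
		skb->tx_flags &= (uint8_t)~SIM_SKBTX_DELIVERY_TSTAMP;
	}
	/* ...the qdisc (e.g. fq) would now see the restored EDT... */
}
```

Used end to end, an EDT stashed at (a) survives the real-clock stamp at (b) and the reset at (c), and is back in tstamp when the qdisc runs.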

Another bit, SKBTX_DELIVERY_TSTAMP_ALLOW_FWD, is added
to skb_shinfo(skb)->tx_flags.  It specifies that
skb->tstamp holds a delivery time and can be
temporarily stored during forward.  This bit is currently set
in tcp_output.c when the EDT is stored in skb->skb_mstamp_ns.
This prevents a packet received from a NIC, with a real-clock
time in skb->tstamp, from being forwarded without the reset.
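A sketch of how the ALLOW_FWD bit gates the stash, in the same hypothetical userspace model (sim_* names and bit values are illustrative; skb_shinfo() fields are flattened into struct sim_skb). A TCP-originated skb keeps its EDT across forward, while an skb carrying a NIC real-clock RX stamp is cleared exactly as before.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, for this model only. */
#define SIM_SKBTX_DELIVERY_TSTAMP           (1u << 0)
#define SIM_SKBTX_DELIVERY_TSTAMP_ALLOW_FWD (1u << 1)

struct sim_skb {
	union {
		int64_t tstamp;
		uint64_t skb_mstamp_ns;
	};
	uint8_t tx_flags;            /* skb_shinfo(skb)->tx_flags */
	uint64_t tx_delivery_tstamp; /* skb_shinfo(skb)->tx_delivery_tstamp */
};

/* tcp_output.c side: record the EDT and mark it as forwardable. */
static void sim_tcp_set_edt(struct sim_skb *skb, uint64_t edt_ns)
{
	skb->skb_mstamp_ns = edt_ns;
	skb->tx_flags |= SIM_SKBTX_DELIVERY_TSTAMP_ALLOW_FWD;
}

/* Forward side: stash only a delivery time the sender marked; any other
 * tstamp (e.g. a NIC RX real-clock stamp) is simply cleared. */
static void sim_forward_tstamp(struct sim_skb *skb)
{
	if (skb->tx_flags & SIM_SKBTX_DELIVERY_TSTAMP_ALLOW_FWD) {
		skb->tx_delivery_tstamp = skb->skb_mstamp_ns;
		skb->tx_flags |= SIM_SKBTX_DELIVERY_TSTAMP;
	}
	skb->tstamp = 0;
}
```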

The change in af_packet.c avoids calling skb_hwtstamps(),
which would reset skb_shinfo(skb)->tx_delivery_tstamp.
af_packet.c only reads hwtstamps and never stores a time in it,
so a new read-only getter, skb_hwtstamps_ktime(), is added.
Otherwise, running tcpdump would trigger this code path
and unnecessarily reset the EDT stored in tx_delivery_tstamp.
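A sketch of a read-only accessor in the spirit of skb_hwtstamps_ktime(), in the same hypothetical userspace model. It takes a const pointer, so a packet-capture reader cannot disturb a stashed EDT. Returning 0 while the stash is in use is an assumption of this sketch; the text above only says the getter is read-only.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_SKBTX_DELIVERY_TSTAMP (1u << 0) /* hypothetical bit value */

struct sim_hwtstamps { int64_t hwtstamp; };

struct sim_skb {
	int64_t tstamp;
	uint8_t tx_flags;                   /* skb_shinfo(skb)->tx_flags */
	union {                             /* skb_shinfo(skb) storage */
		struct sim_hwtstamps hwtstamps;
		uint64_t tx_delivery_tstamp;
	};
};

/* Read-only: never writes to the skb.  While an EDT is stashed, the
 * shared storage holds no hw timestamp, so report 0 instead. */
static int64_t sim_skb_hwtstamps_ktime(const struct sim_skb *skb)
{
	if (skb->tx_flags & SIM_SKBTX_DELIVERY_TSTAMP)
		return 0;
	return skb->hwtstamps.hwtstamp;
}
```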

[Note: not all skb->tstamp=0 resets have been changed in this RFC yet]

[0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/skbuff.h  | 52 ++++++++++++++++++++++++++++++++++++++++-
 net/bridge/br_forward.c |  2 +-
 net/core/dev.c          |  1 +
 net/core/filter.c       |  6 ++---
 net/core/skbuff.c       |  2 +-
 net/ipv4/ip_forward.c   |  2 +-
 net/ipv4/tcp_output.c   | 21 +++++++++++------
 net/ipv6/ip6_output.c   |  2 +-
 net/packet/af_packet.c  |  8 +++----
 9 files changed, 77 insertions(+), 19 deletions(-)

Message ID	20211215201158.271976-1-kafai@fb.com
State	RFC
Delegated to:	Netdev Maintainers
From: Martin KaFai Lau <kafai@fb.com>
To: <netdev@vger.kernel.org>
CC: Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, David Miller <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, <kernel-team@fb.com>, Willem de Bruijn <willemb@google.com>
Subject: [RFC PATCH v2 net-next] net: Preserve skb delivery time during forward
Date: Wed, 15 Dec 2021 12:11:58 -0800
Series	[RFC,v2,net-next] net: Preserve skb delivery time during forward

Context	Check	Description
netdev/tree_selection	success	Clearly marked for net-next
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/subject_prefix	success	Link
netdev/cover_letter	success	Single patches do not need cover letters
netdev/patch_count	success	Link
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 6011 this patch: 6011
netdev/cc_maintainers	warning	14 maintainers not CCed: bpf@vger.kernel.org nikolay@nvidia.com alobakin@pm.me roopa@nvidia.com yoshfuji@linux-ipv6.org songliubraving@fb.com dsahern@kernel.org bridge@lists.linux-foundation.org kpsingh@kernel.org john.fastabend@gmail.com pablo@netfilter.org yhs@fb.com jonathan.lemon@gmail.com andrii@kernel.org
netdev/build_clang	success	Errors and warnings before: 1022 this patch: 1022
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 6162 this patch: 6162
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 221 lines checked
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0
