From patchwork Wed Sep 12 03:16:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596703 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0A6B913B8 for ; Wed, 12 Sep 2018 03:19:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EDCB829BEF for ; Wed, 12 Sep 2018 03:19:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E1E2429BF8; Wed, 12 Sep 2018 03:19:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 91B6B29BEF for ; Wed, 12 Sep 2018 03:19:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727685AbeILITz (ORCPT ); Wed, 12 Sep 2018 04:19:55 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:55762 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726027AbeILITz (ORCPT ); Wed, 12 Sep 2018 04:19:55 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7379087A50; Wed, 12 Sep 2018 03:17:30 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id 811662027EA4; Wed, 12 Sep 2018 03:17:27 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 01/11] net: sock: introduce SOCK_XDP Date: Wed, 12 Sep 2018 11:16:59 +0800 Message-Id: <20180912031709.14112-2-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Wed, 12 Sep 2018 03:17:30 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Wed, 12 Sep 2018 03:17:30 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch introduces a new sock flag - SOCK_XDP. This will be used for notifying the upper layer that XDP program is attached on the lower socket, and requires for extra headroom. TUN will be the first user. Signed-off-by: Jason Wang --- drivers/net/tun.c | 19 +++++++++++++++++++ include/net/sock.h | 1 + 2 files changed, 20 insertions(+) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index ebd07ad82431..2c548bd20393 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -869,6 +869,9 @@ static int tun_attach(struct tun_struct *tun, struct file *file, tun_napi_init(tun, tfile, napi); } + if (rtnl_dereference(tun->xdp_prog)) + sock_set_flag(&tfile->sk, SOCK_XDP); + tun_set_real_num_queues(tun); /* device is allowed to go away first, so no need to hold extra @@ -1241,13 +1244,29 @@ static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog, struct netlink_ext_ack *extack) { struct tun_struct *tun = netdev_priv(dev); + struct tun_file *tfile; struct bpf_prog *old_prog; + int i; old_prog = rtnl_dereference(tun->xdp_prog); rcu_assign_pointer(tun->xdp_prog, prog); if (old_prog) bpf_prog_put(old_prog); + for (i = 0; i < tun->numqueues; i++) { + tfile = rtnl_dereference(tun->tfiles[i]); + if (prog) + sock_set_flag(&tfile->sk, SOCK_XDP); + else + sock_reset_flag(&tfile->sk, SOCK_XDP); + } + list_for_each_entry(tfile, &tun->disabled, next) { + if (prog) + sock_set_flag(&tfile->sk, SOCK_XDP); + else + sock_reset_flag(&tfile->sk, SOCK_XDP); + } + return 0; } diff --git a/include/net/sock.h b/include/net/sock.h index 433f45fc2d68..38cae35f6e16 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -800,6 +800,7 @@ enum sock_flags { SOCK_SELECT_ERR_QUEUE, /* Wake select on error queue */ SOCK_RCU_FREE, /* wait rcu grace period in sk_destruct() */ SOCK_TXTIME, + SOCK_XDP, /* XDP is attached */ }; #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE)) From patchwork Wed Sep 12 03:17:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596701 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8F200920 for ; Wed, 12 Sep 2018 03:19:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7D84529BEF for ; Wed, 12 Sep 2018 03:19:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 71D2429BF8; Wed, 12 Sep 2018 03:19:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2F04029BEF for ; Wed, 12 Sep 2018 03:19:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727942AbeILIT7 (ORCPT ); Wed, 12 Sep 2018 04:19:59 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:40102 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726027AbeILIT6 (ORCPT ); Wed, 12 Sep 2018 04:19:58 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2F1D440216F7; Wed, 12 Sep 2018 03:17:34 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id 404902027EA4; Wed, 12 Sep 2018 03:17:30 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 02/11] tuntap: switch to use XDP_PACKET_HEADROOM Date: Wed, 12 Sep 2018 11:17:00 +0800 Message-Id: <20180912031709.14112-3-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Wed, 12 Sep 2018 03:17:34 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Wed, 12 Sep 2018 03:17:34 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- drivers/net/tun.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 2c548bd20393..d3677a544b56 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -113,7 +113,6 @@ do { \ } while (0) #endif -#define TUN_HEADROOM 256 #define TUN_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD) /* TUN device flags */ @@ -1654,7 +1653,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, rcu_read_lock(); xdp_prog = rcu_dereference(tun->xdp_prog); if (xdp_prog) - pad += TUN_HEADROOM; + pad += XDP_PACKET_HEADROOM; buflen += SKB_DATA_ALIGN(len + pad); rcu_read_unlock(); From patchwork Wed Sep 12 03:17:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596699 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9067B13B8 for ; Wed, 12 Sep 2018 03:19:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7F32229BEF for ; Wed, 12 Sep 2018 03:19:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7381C29BF8; Wed, 12 Sep 2018 03:19:06 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 222B129BEF for ; Wed, 12 Sep 2018 03:19:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728027AbeILIUC (ORCPT ); Wed, 12 Sep 2018 04:20:02 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:40106 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726027AbeILIUB (ORCPT ); Wed, 12 Sep 2018 04:20:01 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D3D2740216F7; Wed, 12 Sep 2018 03:17:37 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id CB9002027EA4; Wed, 12 Sep 2018 03:17:34 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 03/11] tuntap: enable bh early during processing XDP Date: Wed, 12 Sep 2018 11:17:01 +0800 Message-Id: <20180912031709.14112-4-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Wed, 12 Sep 2018 03:17:37 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Wed, 12 Sep 2018 03:17:37 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch move the bh enabling a little bit earlier, this will be used for factoring out the core XDP logic of tuntap. Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- drivers/net/tun.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index d3677a544b56..372caf7d67d9 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1726,22 +1726,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, goto err_xdp; } } + rcu_read_unlock(); + local_bh_enable(); skb = build_skb(buf, buflen); - if (!skb) { - rcu_read_unlock(); - local_bh_enable(); + if (!skb) return ERR_PTR(-ENOMEM); - } skb_reserve(skb, pad - delta); skb_put(skb, len); get_page(alloc_frag->page); alloc_frag->offset += buflen; - rcu_read_unlock(); - local_bh_enable(); - return skb; err_redirect: From patchwork Wed Sep 12 03:17:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596683 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D6A0B14DB for ; Wed, 12 Sep 2018 03:17:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BEB7D28E5E for ; Wed, 12 Sep 2018 03:17:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B30E629B5C; Wed, 12 Sep 2018 03:17:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 55A4428E5E for ; Wed, 12 Sep 2018 03:17:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728118AbeILIUG (ORCPT ); Wed, 12 Sep 2018 04:20:06 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:44548 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726027AbeILIUF (ORCPT ); Wed, 12 Sep 2018 04:20:05 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3BD4340241C0; Wed, 12 Sep 2018 03:17:41 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id 89E1B2027EA4; Wed, 12 Sep 2018 03:17:38 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 04/11] tuntap: simplify error handling in tun_build_skb() Date: Wed, 12 Sep 2018 11:17:02 +0800 Message-Id: <20180912031709.14112-5-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Wed, 12 Sep 2018 03:17:41 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Wed, 12 Sep 2018 03:17:41 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP There's no need to duplicate page get logic in each action. So this patch tries to get page and calculate the offset before processing XDP actions (except for XDP_DROP), and undo them when meet errors (we don't care the performance on errors). This will be used for factoring out XDP logic. Signed-off-by: Jason Wang --- drivers/net/tun.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 372caf7d67d9..257cf7342d54 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1701,17 +1701,13 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, xdp_do_flush_map(); if (err) goto err_redirect; - rcu_read_unlock(); - local_bh_enable(); - return NULL; + goto out; case XDP_TX: get_page(alloc_frag->page); alloc_frag->offset += buflen; if (tun_xdp_tx(tun->dev, &xdp) < 0) goto err_redirect; - rcu_read_unlock(); - local_bh_enable(); - return NULL; + goto out; case XDP_PASS: delta = orig_data - xdp.data; len = xdp.data_end - xdp.data; @@ -1742,7 +1738,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, err_redirect: put_page(alloc_frag->page); -err_xdp: +out: rcu_read_unlock(); local_bh_enable(); this_cpu_inc(tun->pcpu_stats->rx_dropped); From patchwork Wed Sep 12 03:17:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596697 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8B98613B8 for ; Wed, 12 Sep 2018 03:18:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7AD3429BEF for ; Wed, 12 Sep 2018 03:18:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6F59F29BF8; Wed, 12 Sep 2018 03:18:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1816A29BEF for ; Wed, 12 Sep 2018 03:18:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728139AbeILIUL (ORCPT ); Wed, 12 Sep 2018 04:20:11 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:55766 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728125AbeILIUK (ORCPT ); Wed, 12 Sep 2018 04:20:10 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 87F6B87A72; Wed, 12 Sep 2018 03:17:46 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id DF7102027EA4; Wed, 12 Sep 2018 03:17:41 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 05/11] tuntap: tweak on the path of skb XDP case in tun_build_skb() Date: Wed, 12 Sep 2018 11:17:03 +0800 Message-Id: <20180912031709.14112-6-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Wed, 12 Sep 2018 03:17:46 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Wed, 12 Sep 2018 03:17:46 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If we're sure not to go native XDP, there's no need for several things like bh and rcu stuffs. So this patch introduces a helper to build skb and hold page refcnt. When we found we will go through skb path, build skb directly. Signed-off-by: Jason Wang --- drivers/net/tun.c | 39 ++++++++++++++++++++++++--------------- 1 file changed, 24 insertions(+), 15 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 257cf7342d54..946c6148ed75 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1635,6 +1635,23 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile, return true; } +static struct sk_buff *__tun_build_skb(struct page_frag *alloc_frag, char *buf, + int buflen, int len, int pad, int delta) +{ + struct sk_buff *skb = build_skb(buf, buflen); + + if (!skb) + return ERR_PTR(-ENOMEM); + + skb_reserve(skb, pad - delta); + skb_put(skb, len); + + get_page(alloc_frag->page); + alloc_frag->offset += buflen; + + return skb; +} + static struct sk_buff *tun_build_skb(struct tun_struct *tun, struct tun_file *tfile, struct iov_iter *from, @@ -1642,7 +1659,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, int len, int *skb_xdp) { struct page_frag *alloc_frag = ¤t->task_frag; - struct sk_buff *skb; struct bpf_prog *xdp_prog; int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); unsigned int delta = 0; @@ -1672,10 +1688,12 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, * of xdp_prog above, this should be rare and for simplicity * we do XDP on skb in case the headroom is not enough. */ - if (hdr->gso_type || !xdp_prog) + if (hdr->gso_type || !xdp_prog) { *skb_xdp = 1; - else - *skb_xdp = 0; + return __tun_build_skb(alloc_frag, buf, buflen, len, pad, delta); + } + + *skb_xdp = 0; local_bh_disable(); rcu_read_lock(); @@ -1719,22 +1737,13 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, trace_xdp_exception(tun->dev, xdp_prog, act); /* fall through */ case XDP_DROP: - goto err_xdp; + goto out; } } rcu_read_unlock(); local_bh_enable(); - skb = build_skb(buf, buflen); - if (!skb) - return ERR_PTR(-ENOMEM); - - skb_reserve(skb, pad - delta); - skb_put(skb, len); - get_page(alloc_frag->page); - alloc_frag->offset += buflen; - - return skb; + return __tun_build_skb(alloc_frag, buf, buflen, len, pad, delta); err_redirect: put_page(alloc_frag->page); From patchwork Wed Sep 12 03:17:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596695 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6C834920 for ; Wed, 12 Sep 2018 03:18:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5B8E229BEF for ; Wed, 12 Sep 2018 03:18:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4FA8829BF6; Wed, 12 Sep 2018 03:18:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DB6BC29BEF for ; Wed, 12 Sep 2018 03:18:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728175AbeILIUQ (ORCPT ); Wed, 12 Sep 2018 04:20:16 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:40112 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728125AbeILIUP (ORCPT ); Wed, 12 Sep 2018 04:20:15 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1E5BA40216F7; Wed, 12 Sep 2018 03:17:50 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2D0E22027EA4; Wed, 12 Sep 2018 03:17:46 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 06/11] tuntap: split out XDP logic Date: Wed, 12 Sep 2018 11:17:04 +0800 Message-Id: <20180912031709.14112-7-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Wed, 12 Sep 2018 03:17:50 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Wed, 12 Sep 2018 03:17:50 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch split out XDP logic into a single function. This make it to be reused by XDP batching path in the following patch. Signed-off-by: Jason Wang --- drivers/net/tun.c | 88 +++++++++++++++++++++++++++-------------------- 1 file changed, 51 insertions(+), 37 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 946c6148ed75..14fe94098180 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1636,14 +1636,14 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile, } static struct sk_buff *__tun_build_skb(struct page_frag *alloc_frag, char *buf, - int buflen, int len, int pad, int delta) + int buflen, int len, int pad) { struct sk_buff *skb = build_skb(buf, buflen); if (!skb) return ERR_PTR(-ENOMEM); - skb_reserve(skb, pad - delta); + skb_reserve(skb, pad); skb_put(skb, len); get_page(alloc_frag->page); @@ -1652,6 +1652,39 @@ static struct sk_buff *__tun_build_skb(struct page_frag *alloc_frag, char *buf, return skb; } +static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog, + struct xdp_buff *xdp, u32 act) +{ + int err; + + switch (act) { + case XDP_REDIRECT: + err = xdp_do_redirect(tun->dev, xdp, xdp_prog); + xdp_do_flush_map(); + if (err) + return err; + break; + case XDP_TX: + err = tun_xdp_tx(tun->dev, xdp); + if (err < 0) + return err; + break; + case XDP_PASS: + break; + default: + bpf_warn_invalid_xdp_action(act); + /* fall through */ + case XDP_ABORTED: + trace_xdp_exception(tun->dev, xdp_prog, act); + /* fall through */ + case XDP_DROP: + this_cpu_inc(tun->pcpu_stats->rx_dropped); + break; + } + + return act; +} + static struct sk_buff *tun_build_skb(struct tun_struct *tun, struct tun_file *tfile, struct iov_iter *from, @@ -1661,10 +1694,10 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, struct page_frag *alloc_frag = ¤t->task_frag; struct bpf_prog *xdp_prog; int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - unsigned int delta = 0; char *buf; size_t copied; - int err, pad = TUN_RX_PAD; + int pad = TUN_RX_PAD; + int err = 0; rcu_read_lock(); xdp_prog = rcu_dereference(tun->xdp_prog); @@ -1690,7 +1723,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, */ if (hdr->gso_type || !xdp_prog) { *skb_xdp = 1; - return __tun_build_skb(alloc_frag, buf, buflen, len, pad, delta); + return __tun_build_skb(alloc_frag, buf, buflen, len, pad); } *skb_xdp = 0; @@ -1698,9 +1731,8 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, local_bh_disable(); rcu_read_lock(); xdp_prog = rcu_dereference(tun->xdp_prog); - if (xdp_prog && !*skb_xdp) { + if (xdp_prog) { struct xdp_buff xdp; - void *orig_data; u32 act; xdp.data_hard_start = buf; @@ -1708,49 +1740,31 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, xdp_set_data_meta_invalid(&xdp); xdp.data_end = xdp.data + len; xdp.rxq = &tfile->xdp_rxq; - orig_data = xdp.data; - act = bpf_prog_run_xdp(xdp_prog, &xdp); - switch (act) { - case XDP_REDIRECT: - get_page(alloc_frag->page); - alloc_frag->offset += buflen; - err = xdp_do_redirect(tun->dev, &xdp, xdp_prog); - xdp_do_flush_map(); - if (err) - goto err_redirect; - goto out; - case XDP_TX: + act = bpf_prog_run_xdp(xdp_prog, &xdp); + if (act == XDP_REDIRECT || act == XDP_TX) { get_page(alloc_frag->page); alloc_frag->offset += buflen; - if (tun_xdp_tx(tun->dev, &xdp) < 0) - goto err_redirect; - goto out; - case XDP_PASS: - delta = orig_data - xdp.data; - len = xdp.data_end - xdp.data; - break; - default: - bpf_warn_invalid_xdp_action(act); - /* fall through */ - case XDP_ABORTED: - trace_xdp_exception(tun->dev, xdp_prog, act); - /* fall through */ - case XDP_DROP: - goto out; } + err = tun_xdp_act(tun, xdp_prog, &xdp, act); + if (err < 0) + goto err_xdp; + if (err != XDP_PASS) + goto out; + + pad = xdp.data - xdp.data_hard_start; + len = xdp.data_end - xdp.data; } rcu_read_unlock(); local_bh_enable(); - return __tun_build_skb(alloc_frag, buf, buflen, len, pad, delta); + return __tun_build_skb(alloc_frag, buf, buflen, len, pad); -err_redirect: +err_xdp: put_page(alloc_frag->page); out: rcu_read_unlock(); local_bh_enable(); - this_cpu_inc(tun->pcpu_stats->rx_dropped); return NULL; } From patchwork Wed Sep 12 03:17:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596693 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BF995920 for ; Wed, 12 Sep 2018 03:18:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AE0B629BEF for ; Wed, 12 Sep 2018 03:18:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A272529BF6; Wed, 12 Sep 2018 03:18:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5E77D29BEF for ; Wed, 12 Sep 2018 03:18:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728222AbeILIUS (ORCPT ); Wed, 12 Sep 2018 04:20:18 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:44554 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728125AbeILIUR (ORCPT ); Wed, 12 Sep 2018 04:20:17 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B507640241C0; Wed, 12 Sep 2018 03:17:53 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id E227F2027EA4; Wed, 12 Sep 2018 03:17:50 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 07/11] tuntap: move XDP flushing out of tun_do_xdp() Date: Wed, 12 Sep 2018 11:17:05 +0800 Message-Id: <20180912031709.14112-8-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Wed, 12 Sep 2018 03:17:53 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Wed, 12 Sep 2018 03:17:53 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This will allow adding batch flushing on top. Signed-off-by: Jason Wang --- drivers/net/tun.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 14fe94098180..3ae539374f6b 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1660,7 +1660,6 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog, switch (act) { case XDP_REDIRECT: err = xdp_do_redirect(tun->dev, xdp, xdp_prog); - xdp_do_flush_map(); if (err) return err; break; @@ -1749,6 +1748,8 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, err = tun_xdp_act(tun, xdp_prog, &xdp, act); if (err < 0) goto err_xdp; + if (err == XDP_REDIRECT) + xdp_do_flush_map(); if (err != XDP_PASS) goto out; From patchwork Wed Sep 12 03:17:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596691 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2CD8813B8 for ; Wed, 12 Sep 2018 03:18:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1BD5929BEF for ; Wed, 12 Sep 2018 03:18:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1026129BF6; Wed, 12 Sep 2018 03:18:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8457429BEF for ; Wed, 12 Sep 2018 03:18:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728246AbeILIUX (ORCPT ); Wed, 12 Sep 2018 04:20:23 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:40116 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728242AbeILIUW (ORCPT ); Wed, 12 Sep 2018 04:20:22 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DF5D440216F7; Wed, 12 Sep 2018 03:17:57 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id 973072027EA4; Wed, 12 Sep 2018 03:17:54 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 08/11] tun: switch to new type of msg_control Date: Wed, 12 Sep 2018 11:17:06 +0800 Message-Id: <20180912031709.14112-9-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Wed, 12 Sep 2018 03:17:57 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Wed, 12 Sep 2018 03:17:57 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch introduces to a new tun/tap specific msg_control: #define TUN_MSG_UBUF 1 #define TUN_MSG_PTR 2 struct tun_msg_ctl { int type; void *ptr; }; This allows us to pass different kinds of msg_control through sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF) which will be used by the existed vhost_net zerocopy code. The second is XDP buff, which allows vhost_net to pass XDP buff to TUN. This could be used to implement accepting an array of XDP buffs from vhost_net in the following patches. Signed-off-by: Jason Wang --- drivers/net/tap.c | 18 ++++++++++++------ drivers/net/tun.c | 6 +++++- drivers/vhost/net.c | 7 +++++-- include/linux/if_tun.h | 14 ++++++++++++++ 4 files changed, 36 insertions(+), 9 deletions(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index f0f7cd977667..7996ed7cbf18 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -619,7 +619,7 @@ static inline struct sk_buff *tap_alloc_skb(struct sock *sk, size_t prepad, #define TAP_RESERVE HH_DATA_OFF(ETH_HLEN) /* Get packet from user space buffer */ -static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m, +static ssize_t tap_get_user(struct tap_queue *q, void *msg_control, struct iov_iter *from, int noblock) { int good_linear = SKB_MAX_HEAD(TAP_RESERVE); @@ -663,7 +663,7 @@ static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m, if (unlikely(len < ETH_HLEN)) goto err; - if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) { + if (msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) { struct iov_iter i; copylen = vnet_hdr.hdr_len ? @@ -724,11 +724,11 @@ static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m, tap = rcu_dereference(q->tap); /* copy skb_ubuf_info for callback when skb has no error */ if (zerocopy) { - skb_shinfo(skb)->destructor_arg = m->msg_control; + skb_shinfo(skb)->destructor_arg = msg_control; skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY; skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG; - } else if (m && m->msg_control) { - struct ubuf_info *uarg = m->msg_control; + } else if (msg_control) { + struct ubuf_info *uarg = msg_control; uarg->callback(uarg, false); } @@ -1150,7 +1150,13 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) { struct tap_queue *q = container_of(sock, struct tap_queue, sock); - return tap_get_user(q, m, &m->msg_iter, m->msg_flags & MSG_DONTWAIT); + struct tun_msg_ctl *ctl = m->msg_control; + + if (ctl && ctl->type != TUN_MSG_UBUF) + return -EINVAL; + + return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter, + m->msg_flags & MSG_DONTWAIT); } static int tap_recvmsg(struct socket *sock, struct msghdr *m, diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 3ae539374f6b..89779b58c7ca 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2431,11 +2431,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) int ret; struct tun_file *tfile = container_of(sock, struct tun_file, socket); struct tun_struct *tun = tun_get(tfile); + struct tun_msg_ctl *ctl = m->msg_control; if (!tun) return -EBADFD; - ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter, + if (ctl && ctl->type != TUN_MSG_UBUF) + return -EINVAL; + + ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter, m->msg_flags & MSG_DONTWAIT, m->msg_flags & MSG_MORE); tun_put(tun); diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 4e656f89cb22..fb01ce6d981c 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -620,6 +620,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) .msg_controllen = 0, .msg_flags = MSG_DONTWAIT, }; + struct tun_msg_ctl ctl; size_t len, total_len = 0; int err; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); @@ -664,8 +665,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) ubuf->ctx = nvq->ubufs; ubuf->desc = nvq->upend_idx; refcount_set(&ubuf->refcnt, 1); - msg.msg_control = ubuf; - msg.msg_controllen = sizeof(ubuf); + msg.msg_control = &ctl; + ctl.type = TUN_MSG_UBUF; + ctl.ptr = ubuf; + msg.msg_controllen = sizeof(ctl); ubufs = nvq->ubufs; atomic_inc(&ubufs->refcount); nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV; diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h index 3d2996dc7d85..12e3eebf0ce6 100644 --- a/include/linux/if_tun.h +++ b/include/linux/if_tun.h @@ -16,9 +16,23 @@ #define __IF_TUN_H #include +#include #define TUN_XDP_FLAG 0x1UL +#define TUN_MSG_UBUF 1 +#define TUN_MSG_PTR 2 +struct tun_msg_ctl { + unsigned short type; + unsigned short num; + void *ptr; +}; + +struct tun_xdp_hdr { + int buflen; + struct virtio_net_hdr gso; +}; + #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE) struct socket *tun_get_socket(struct file *); struct ptr_ring *tun_get_tx_ring(struct file *file); From patchwork Wed Sep 12 03:17:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596689 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EB10013B8 for ; Wed, 12 Sep 2018 03:18:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DA5B929BEF for ; Wed, 12 Sep 2018 03:18:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CC3D329BF6; Wed, 12 Sep 2018 03:18:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 514B029BEF for ; Wed, 12 Sep 2018 03:18:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728286AbeILIU0 (ORCPT ); Wed, 12 Sep 2018 04:20:26 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:44560 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728242AbeILIUZ (ORCPT ); Wed, 12 Sep 2018 04:20:25 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 257DF40241C4; Wed, 12 Sep 2018 03:18:01 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id 82A132027EB7; Wed, 12 Sep 2018 03:17:58 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 09/11] tuntap: accept an array of XDP buffs through sendmsg() Date: Wed, 12 Sep 2018 11:17:07 +0800 Message-Id: <20180912031709.14112-10-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Wed, 12 Sep 2018 03:18:01 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Wed, 12 Sep 2018 03:18:01 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch implement TUN_MSG_PTR msg_control type. This type allows the caller to pass an array of XDP buffs to tuntap through ptr field of the tun_msg_control. If an XDP program is attached, tuntap can run XDP program directly. If not, tuntap will build skb and do a fast receiving since part of the work has been done by vhost_net. This will avoid lots of indirect calls thus improves the icache utilization and allows to do XDP batched flushing when doing XDP redirection. Signed-off-by: Jason Wang --- drivers/net/tun.c | 117 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 114 insertions(+), 3 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 89779b58c7ca..2a2cd35853b7 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2426,22 +2426,133 @@ static void tun_sock_write_space(struct sock *sk) kill_fasync(&tfile->fasync, SIGIO, POLL_OUT); } +static int tun_xdp_one(struct tun_struct *tun, + struct tun_file *tfile, + struct xdp_buff *xdp, int *flush) +{ + struct tun_xdp_hdr *hdr = xdp->data_hard_start; + struct virtio_net_hdr *gso = &hdr->gso; + struct tun_pcpu_stats *stats; + struct bpf_prog *xdp_prog; + struct sk_buff *skb = NULL; + u32 rxhash = 0, act; + int buflen = hdr->buflen; + int err = 0; + bool skb_xdp = false; + + xdp_prog = rcu_dereference(tun->xdp_prog); + if (xdp_prog) { + if (gso->gso_type) { + skb_xdp = true; + goto build; + } + xdp_set_data_meta_invalid(xdp); + xdp->rxq = &tfile->xdp_rxq; + + act = bpf_prog_run_xdp(xdp_prog, xdp); + err = tun_xdp_act(tun, xdp_prog, xdp, act); + if (err < 0) { + put_page(virt_to_head_page(xdp->data)); + return err; + } + + switch (err) { + case XDP_REDIRECT: + *flush = true; + /* fall through */ + case XDP_TX: + return 0; + case XDP_PASS: + break; + default: + put_page(virt_to_head_page(xdp->data)); + return 0; + } + } + +build: + skb = build_skb(xdp->data_hard_start, buflen); + if (!skb) { + err = -ENOMEM; + goto out; + } + + skb_reserve(skb, xdp->data - xdp->data_hard_start); + skb_put(skb, xdp->data_end - xdp->data); + + if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) { + this_cpu_inc(tun->pcpu_stats->rx_frame_errors); + kfree_skb(skb); + err = -EINVAL; + goto out; + } + + skb->protocol = eth_type_trans(skb, tun->dev); + skb_reset_network_header(skb); + skb_probe_transport_header(skb, 0); + + if (skb_xdp) { + err = do_xdp_generic(xdp_prog, skb); + if (err != XDP_PASS) + goto out; + } + + if (!rcu_dereference(tun->steering_prog)) + rxhash = __skb_get_hash_symmetric(skb); + + netif_receive_skb(skb); + + stats = get_cpu_ptr(tun->pcpu_stats); + u64_stats_update_begin(&stats->syncp); + stats->rx_packets++; + stats->rx_bytes += skb->len; + u64_stats_update_end(&stats->syncp); + put_cpu_ptr(stats); + + if (rxhash) + tun_flow_update(tun, rxhash, tfile); + +out: + return err; +} + static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) { - int ret; + int ret, i; struct tun_file *tfile = container_of(sock, struct tun_file, socket); struct tun_struct *tun = tun_get(tfile); struct tun_msg_ctl *ctl = m->msg_control; + struct xdp_buff *xdp; if (!tun) return -EBADFD; - if (ctl && ctl->type != TUN_MSG_UBUF) - return -EINVAL; + if (ctl && (ctl->type == TUN_MSG_PTR)) { + int n = ctl->num; + int flush = 0; + + local_bh_disable(); + rcu_read_lock(); + + for (i = 0; i < n; i++) { + xdp = &((struct xdp_buff *)ctl->ptr)[i]; + tun_xdp_one(tun, tfile, xdp, &flush); + } + + if (flush) + xdp_do_flush_map(); + + rcu_read_unlock(); + local_bh_enable(); + + ret = total_len; + goto out; + } ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter, m->msg_flags & MSG_DONTWAIT, m->msg_flags & MSG_MORE); +out: tun_put(tun); return ret; } From patchwork Wed Sep 12 03:17:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596687 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6F284920 for ; Wed, 12 Sep 2018 03:18:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5EF9A29BEF for ; Wed, 12 Sep 2018 03:18:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 52AC329BF6; Wed, 12 Sep 2018 03:18:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E9A0A29BEF for ; Wed, 12 Sep 2018 03:18:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728315AbeILIUb (ORCPT ); Wed, 12 Sep 2018 04:20:31 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:52134 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728242AbeILIUb (ORCPT ); Wed, 12 Sep 2018 04:20:31 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6D2B5804B4DB; Wed, 12 Sep 2018 03:18:06 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id DB04D2027EB7; Wed, 12 Sep 2018 03:18:01 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 10/11] tap: accept an array of XDP buffs through sendmsg() Date: Wed, 12 Sep 2018 11:17:08 +0800 Message-Id: <20180912031709.14112-11-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Wed, 12 Sep 2018 03:18:06 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Wed, 12 Sep 2018 03:18:06 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch implement TUN_MSG_PTR msg_control type. This type allows the caller to pass an array of XDP buffs to tuntap through ptr field of the tun_msg_control. Tap will build skb through those XDP buffers. This will avoid lots of indirect calls thus improves the icache utilization and allows to do XDP batched flushing when doing XDP redirection. Signed-off-by: Jason Wang --- drivers/net/tap.c | 74 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 72 insertions(+), 2 deletions(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 7996ed7cbf18..a4ab4a791fe7 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -1146,14 +1146,84 @@ static const struct file_operations tap_fops = { #endif }; +static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp) +{ + struct tun_xdp_hdr *hdr = xdp->data_hard_start; + struct virtio_net_hdr *gso = &hdr->gso; + int buflen = hdr->buflen; + int vnet_hdr_len = 0; + struct tap_dev *tap; + struct sk_buff *skb; + int err, depth; + + if (q->flags & IFF_VNET_HDR) + vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz); + + skb = build_skb(xdp->data_hard_start, buflen); + if (!skb) { + err = -ENOMEM; + goto err; + } + + skb_reserve(skb, xdp->data - xdp->data_hard_start); + skb_put(skb, xdp->data_end - xdp->data); + + skb_set_network_header(skb, ETH_HLEN); + skb_reset_mac_header(skb); + skb->protocol = eth_hdr(skb)->h_proto; + + if (vnet_hdr_len) { + err = virtio_net_hdr_to_skb(skb, gso, tap_is_little_endian(q)); + if (err) + goto err_kfree; + } + + skb_probe_transport_header(skb, ETH_HLEN); + + /* Move network header to the right position for VLAN tagged packets */ + if ((skb->protocol == htons(ETH_P_8021Q) || + skb->protocol == htons(ETH_P_8021AD)) && + __vlan_get_protocol(skb, skb->protocol, &depth) != 0) + skb_set_network_header(skb, depth); + + rcu_read_lock(); + tap = rcu_dereference(q->tap); + if (tap) { + skb->dev = tap->dev; + dev_queue_xmit(skb); + } else { + kfree_skb(skb); + } + rcu_read_unlock(); + + return 0; + +err_kfree: + kfree_skb(skb); +err: + rcu_read_lock(); + tap = rcu_dereference(q->tap); + if (tap && tap->count_tx_dropped) + tap->count_tx_dropped(tap); + rcu_read_unlock(); + return err; +} + static int tap_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) { struct tap_queue *q = container_of(sock, struct tap_queue, sock); struct tun_msg_ctl *ctl = m->msg_control; + struct xdp_buff *xdp; + int i; - if (ctl && ctl->type != TUN_MSG_UBUF) - return -EINVAL; + if (ctl && (ctl->type == TUN_MSG_PTR)) { + for (i = 0; i < ctl->num; i++) { + xdp = &((struct xdp_buff *)ctl->ptr)[i]; + tap_get_user_xdp(q, xdp); + } + return 0; + } return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter, m->msg_flags & MSG_DONTWAIT); From patchwork Wed Sep 12 03:17:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Wang X-Patchwork-Id: 10596685 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7DE09920 for ; Wed, 12 Sep 2018 03:18:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 69B5629BEF for ; Wed, 12 Sep 2018 03:18:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5AD3229BF3; Wed, 12 Sep 2018 03:18:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7FBF929BEB for ; Wed, 12 Sep 2018 03:18:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728354AbeILIUh (ORCPT ); Wed, 12 Sep 2018 04:20:37 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:56164 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728242AbeILIUg (ORCPT ); Wed, 12 Sep 2018 04:20:36 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 18AE1402333A; Wed, 12 Sep 2018 03:18:10 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-130.pek2.redhat.com [10.72.12.130]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1DF322027EA4; Wed, 12 Sep 2018 03:18:06 +0000 (UTC) From: Jason Wang To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, mst@redhat.com, jasowang@redhat.com Subject: [PATCH net-next V2 11/11] vhost_net: batch submitting XDP buffers to underlayer sockets Date: Wed, 12 Sep 2018 11:17:09 +0800 Message-Id: <20180912031709.14112-12-jasowang@redhat.com> In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com> References: <20180912031709.14112-1-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Wed, 12 Sep 2018 03:18:10 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Wed, 12 Sep 2018 03:18:10 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch implements XDP batching for vhost_net. The idea is first to try to do userspace copy and build XDP buff directly in vhost. Instead of submitting the packet immediately, vhost_net will batch them in an array and submit every 64 (VHOST_NET_BATCH) packets to the under layer sockets through msg_control of sendmsg(). When XDP is enabled on the TUN/TAP, TUN/TAP can process XDP inside a loop without caring GUP thus it can do batch map flushing. When XDP is not enabled or not supported, the underlayer socket need to build skb and pass it to network core. The batched packet submission allows us to do batching like netif_receive_skb_list() in the future. This saves lots of indirect calls for better cache utilization. For the case that we can't so batching e.g when sndbuf is limited or packet size is too large, we will go for usual one packet per sendmsg() way. Doing testpmd on various setups gives us: Test /+pps% XDP_DROP on TAP /+44.8% XDP_REDIRECT on TAP /+29% macvtap (skb) /+26% Netperf tests shows obvious improvements for small packet transmission: size/session/+thu%/+normalize% 64/ 1/ +2%/ 0% 64/ 2/ +3%/ +1% 64/ 4/ +7%/ +5% 64/ 8/ +8%/ +6% 256/ 1/ +3%/ 0% 256/ 2/ +10%/ +7% 256/ 4/ +26%/ +22% 256/ 8/ +27%/ +23% 512/ 1/ +3%/ +2% 512/ 2/ +19%/ +14% 512/ 4/ +43%/ +40% 512/ 8/ +45%/ +41% 1024/ 1/ +4%/ 0% 1024/ 2/ +27%/ +21% 1024/ 4/ +38%/ +73% 1024/ 8/ +15%/ +24% 2048/ 1/ +10%/ +7% 2048/ 2/ +16%/ +12% 2048/ 4/ 0%/ +2% 2048/ 8/ 0%/ +2% 4096/ 1/ +36%/ +60% 4096/ 2/ -11%/ -26% 4096/ 4/ 0%/ +14% 4096/ 8/ 0%/ +4% 16384/ 1/ -1%/ +5% 16384/ 2/ 0%/ +2% 16384/ 4/ 0%/ -3% 16384/ 8/ 0%/ +4% 65535/ 1/ 0%/ +10% 65535/ 2/ 0%/ +8% 65535/ 4/ 0%/ +1% 65535/ 8/ 0%/ +3% Signed-off-by: Jason Wang --- drivers/vhost/net.c | 174 ++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 161 insertions(+), 13 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index fb01ce6d981c..dd4e0a301635 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -116,6 +116,8 @@ struct vhost_net_virtqueue { * For RX, number of batched heads */ int done_idx; + /* Number of XDP frames batched */ + int batched_xdp; /* an array of userspace buffers info */ struct ubuf_info *ubuf_info; /* Reference counting for outstanding ubufs. @@ -123,6 +125,8 @@ struct vhost_net_virtqueue { struct vhost_net_ubuf_ref *ubufs; struct ptr_ring *rx_ring; struct vhost_net_buf rxq; + /* Batched XDP buffs */ + struct xdp_buff *xdp; }; struct vhost_net { @@ -338,6 +342,11 @@ static bool vhost_sock_zcopy(struct socket *sock) sock_flag(sock->sk, SOCK_ZEROCOPY); } +static bool vhost_sock_xdp(struct socket *sock) +{ + return sock_flag(sock->sk, SOCK_XDP); +} + /* In case of DMA done not in order in lower device driver for some reason. * upend_idx is used to track end of used idx, done_idx is used to track head * of used idx. Once lower device DMA done contiguously, we will signal KVM @@ -444,10 +453,37 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq) nvq->done_idx = 0; } +static void vhost_tx_batch(struct vhost_net *net, + struct vhost_net_virtqueue *nvq, + struct socket *sock, + struct msghdr *msghdr) +{ + struct tun_msg_ctl ctl = { + .type = TUN_MSG_PTR, + .num = nvq->batched_xdp, + .ptr = nvq->xdp, + }; + int err; + + if (nvq->batched_xdp == 0) + goto signal_used; + + msghdr->msg_control = &ctl; + err = sock->ops->sendmsg(sock, msghdr, 0); + if (unlikely(err < 0)) { + vq_err(&nvq->vq, "Fail to batch sending packets\n"); + return; + } + +signal_used: + vhost_net_signal_used(nvq); + nvq->batched_xdp = 0; +} + static int vhost_net_tx_get_vq_desc(struct vhost_net *net, struct vhost_net_virtqueue *nvq, unsigned int *out_num, unsigned int *in_num, - bool *busyloop_intr) + struct msghdr *msghdr, bool *busyloop_intr) { struct vhost_virtqueue *vq = &nvq->vq; unsigned long uninitialized_var(endtime); @@ -455,8 +491,9 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net, out_num, in_num, NULL, NULL); if (r == vq->num && vq->busyloop_timeout) { + /* Flush batched packets first */ if (!vhost_sock_zcopy(vq->private_data)) - vhost_net_signal_used(nvq); + vhost_tx_batch(net, nvq, vq->private_data, msghdr); preempt_disable(); endtime = busy_clock() + vq->busyloop_timeout; while (vhost_can_busy_poll(endtime)) { @@ -512,7 +549,7 @@ static int get_tx_bufs(struct vhost_net *net, struct vhost_virtqueue *vq = &nvq->vq; int ret; - ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, busyloop_intr); + ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, msg, busyloop_intr); if (ret < 0 || ret == vq->num) return ret; @@ -540,6 +577,80 @@ static bool tx_can_batch(struct vhost_virtqueue *vq, size_t total_len) !vhost_vq_avail_empty(vq->dev, vq); } +#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD) + +static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq, + struct iov_iter *from) +{ + struct vhost_virtqueue *vq = &nvq->vq; + struct socket *sock = vq->private_data; + struct page_frag *alloc_frag = ¤t->task_frag; + struct virtio_net_hdr *gso; + struct xdp_buff *xdp = &nvq->xdp[nvq->batched_xdp]; + struct tun_xdp_hdr *hdr; + size_t len = iov_iter_count(from); + int headroom = vhost_sock_xdp(sock) ? XDP_PACKET_HEADROOM : 0; + int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + int pad = SKB_DATA_ALIGN(VHOST_NET_RX_PAD + headroom + nvq->sock_hlen); + int sock_hlen = nvq->sock_hlen; + void *buf; + int copied; + + if (unlikely(len < nvq->sock_hlen)) + return -EFAULT; + + if (SKB_DATA_ALIGN(len + pad) + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE) + return -ENOSPC; + + buflen += SKB_DATA_ALIGN(len + pad); + alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES); + if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL))) + return -ENOMEM; + + buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset; + copied = copy_page_from_iter(alloc_frag->page, + alloc_frag->offset + + offsetof(struct tun_xdp_hdr, gso), + sock_hlen, from); + if (copied != sock_hlen) + return -EFAULT; + + hdr = buf; + gso = &hdr->gso; + + if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) && + vhost16_to_cpu(vq, gso->csum_start) + + vhost16_to_cpu(vq, gso->csum_offset) + 2 > + vhost16_to_cpu(vq, gso->hdr_len)) { + gso->hdr_len = cpu_to_vhost16(vq, + vhost16_to_cpu(vq, gso->csum_start) + + vhost16_to_cpu(vq, gso->csum_offset) + 2); + + if (vhost16_to_cpu(vq, gso->hdr_len) > len) + return -EINVAL; + } + + len -= sock_hlen; + copied = copy_page_from_iter(alloc_frag->page, + alloc_frag->offset + pad, + len, from); + if (copied != len) + return -EFAULT; + + xdp->data_hard_start = buf; + xdp->data = buf + pad; + xdp->data_end = xdp->data + len; + hdr->buflen = buflen; + + get_page(alloc_frag->page); + alloc_frag->offset += buflen; + + ++nvq->batched_xdp; + + return 0; +} + static void handle_tx_copy(struct vhost_net *net, struct socket *sock) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; @@ -556,10 +667,14 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) size_t len, total_len = 0; int err; int sent_pkts = 0; + bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX); for (;;) { bool busyloop_intr = false; + if (nvq->done_idx == VHOST_NET_BATCH) + vhost_tx_batch(net, nvq, sock, &msg); + head = get_tx_bufs(net, nvq, &msg, &out, &in, &len, &busyloop_intr); /* On error, stop handling until the next kick. */ @@ -577,14 +692,34 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) break; } - vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head); - vq->heads[nvq->done_idx].len = 0; - total_len += len; - if (tx_can_batch(vq, total_len)) - msg.msg_flags |= MSG_MORE; - else - msg.msg_flags &= ~MSG_MORE; + + /* For simplicity, TX batching is only enabled if + * sndbuf is unlimited. + */ + if (sock_can_batch) { + err = vhost_net_build_xdp(nvq, &msg.msg_iter); + if (!err) { + goto done; + } else if (unlikely(err != -ENOSPC)) { + vhost_tx_batch(net, nvq, sock, &msg); + vhost_discard_vq_desc(vq, 1); + vhost_net_enable_vq(net, vq); + break; + } + + /* We can't build XDP buff, go for single + * packet path but let's flush batched + * packets. + */ + vhost_tx_batch(net, nvq, sock, &msg); + msg.msg_control = NULL; + } else { + if (tx_can_batch(vq, total_len)) + msg.msg_flags |= MSG_MORE; + else + msg.msg_flags &= ~MSG_MORE; + } /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock->ops->sendmsg(sock, &msg, len); @@ -596,15 +731,17 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) if (err != len) pr_debug("Truncated TX packet: len %d != %zd\n", err, len); - if (++nvq->done_idx >= VHOST_NET_BATCH) - vhost_net_signal_used(nvq); +done: + vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head); + vq->heads[nvq->done_idx].len = 0; + ++nvq->done_idx; if (vhost_exceeds_weight(++sent_pkts, total_len)) { vhost_poll_queue(&vq->poll); break; } } - vhost_net_signal_used(nvq); + vhost_tx_batch(net, nvq, sock, &msg); } static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) @@ -1081,6 +1218,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) struct vhost_dev *dev; struct vhost_virtqueue **vqs; void **queue; + struct xdp_buff *xdp; int i; n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_RETRY_MAYFAIL); @@ -1101,6 +1239,14 @@ static int vhost_net_open(struct inode *inode, struct file *f) } n->vqs[VHOST_NET_VQ_RX].rxq.queue = queue; + xdp = kmalloc_array(VHOST_NET_BATCH, sizeof(*xdp), GFP_KERNEL); + if (!xdp) { + kfree(vqs); + kvfree(n); + kfree(queue); + } + n->vqs[VHOST_NET_VQ_TX].xdp = xdp; + dev = &n->dev; vqs[VHOST_NET_VQ_TX] = &n->vqs[VHOST_NET_VQ_TX].vq; vqs[VHOST_NET_VQ_RX] = &n->vqs[VHOST_NET_VQ_RX].vq; @@ -1111,6 +1257,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) n->vqs[i].ubuf_info = NULL; n->vqs[i].upend_idx = 0; n->vqs[i].done_idx = 0; + n->vqs[i].batched_xdp = 0; n->vqs[i].vhost_hlen = 0; n->vqs[i].sock_hlen = 0; n->vqs[i].rx_ring = NULL; @@ -1194,6 +1341,7 @@ static int vhost_net_release(struct inode *inode, struct file *f) * since jobs can re-queue themselves. */ vhost_net_flush(n); kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue); + kfree(n->vqs[VHOST_NET_VQ_TX].xdp); kfree(n->dev.vqs); kvfree(n); return 0;