From patchwork Tue Mar 12 21:44:15 2024
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13590613
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer,
    David Ahern, Mina Almasry
Subject: [RFC PATCH v4 01/16] net: generalise pp provider params passing
Date: Tue, 12 Mar 2024 14:44:15 -0700
Message-ID: <20240312214430.2923019-2-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

RFC only, not for upstream

Add a way to pass custom page pool parameters, but the final version
should converge with devmem.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/netdev_rx_queue.h | 3 +++
 net/core/dev.c                | 2 +-
 net/core/page_pool.c          | 3 +++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index 5dc35628633a..41f8c4e049bb 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -26,6 +26,9 @@ struct netdev_rx_queue {
          */
         struct napi_struct *napi;
         struct netdev_dmabuf_binding *binding;
+
+        const struct memory_provider_ops *pp_ops;
+        void *pp_private;
 } ____cacheline_aligned_in_smp;

 /*
diff --git a/net/core/dev.c b/net/core/dev.c
index 255a38cf59b1..2096ff57685a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2189,7 +2189,7 @@ int netdev_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,

         rxq = __netif_get_rx_queue(dev, rxq_idx);

-        if (rxq->binding)
+        if (rxq->binding || rxq->pp_ops)
                 return -EEXIST;

         err = xa_alloc(&binding->bound_rxq_list, &xa_idx, rxq, xa_limit_32b,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 53039d2f8514..5d5b78878473 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -262,6 +262,9 @@ static int page_pool_init(struct page_pool *pool,
         if (binding) {
                 pool->mp_ops = &dmabuf_devmem_ops;
                 pool->mp_priv = binding;
+        } else if (pool->p.queue && pool->p.queue->pp_ops) {
+                pool->mp_ops = pool->p.queue->pp_ops;
+                pool->mp_priv = pool->p.queue->pp_private;
         }

         if (pool->mp_ops) {
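
For illustration, a minimal sketch of how a custom provider could plug
into these hooks. The my_* callbacks and helper are hypothetical, not
part of this patch; only the pp_ops/pp_private plumbing and the ops
fields shown later in this series are real:

    /* illustrative only: a provider binds itself to a queue, and
     * page_pool_init() later picks it up through pool->p.queue */
    static const struct memory_provider_ops my_provider_ops = {
            .alloc_pages  = my_alloc_pages,   /* hand buffers to the pool */
            .release_page = my_release_page,  /* take freed buffers back */
            .destroy      = my_destroy,       /* tear down provider state */
    };

    static void my_bind_queue(struct net_device *dev, u32 rxq_idx, void *state)
    {
            struct netdev_rx_queue *rxq = __netif_get_rx_queue(dev, rxq_idx);

            rxq->pp_ops = &my_provider_ops;
            rxq->pp_private = state;
    }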

From patchwork Tue Mar 12 21:44:16 2024
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13590614
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer,
    David Ahern, Mina Almasry
Subject: [RFC PATCH v4 02/16] io_uring: delayed cqe commit
Date: Tue, 12 Mar 2024 14:44:16 -0700
Message-ID: <20240312214430.2923019-3-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

RFC only, not for upstream

A stub patch that allows delaying and batching the final step of cqe
posting for aux cqes. A different version will be sent separately to
upstream.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/linux/io_uring_types.h | 1 +
 io_uring/io_uring.c            | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index d8111d64812b..500772189fee 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -205,6 +205,7 @@ struct io_submit_state {
         bool plug_started;
         bool need_plug;
+        bool flush_cqes;
         unsigned short submit_nr;
         unsigned int cqes_count;
         struct blk_plug plug;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cf2f514b7cc0..e44c2ef271b9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -176,7 +176,7 @@ static struct ctl_table kernel_io_uring_disabled_table[] = {
 static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
 {
         if (!wq_list_empty(&ctx->submit_state.compl_reqs) ||
-            ctx->submit_state.cqes_count)
+            ctx->submit_state.cqes_count || ctx->submit_state.flush_cqes)
                 __io_submit_flush_completions(ctx);
 }

@@ -1598,6 +1598,7 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
                 io_free_batch_list(ctx, state->compl_reqs.first);
                 INIT_WQ_LIST(&state->compl_reqs);
         }
+        ctx->submit_state.flush_cqes = false;
 }

 static unsigned io_cqring_events(struct io_ring_ctx *ctx)
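
As a usage sketch (the caller below is hypothetical, not in this
patch): a feature that stages aux cqes for later posting raises the
flag, which forces the next io_submit_flush_completions() to run
__io_submit_flush_completions() even when compl_reqs and cqes_count are
empty; the flush then clears the flag.

    /* hypothetical: mark that a final cqe commit is still pending */
    static void my_defer_cqe_commit(struct io_ring_ctx *ctx)
    {
            ctx->submit_state.flush_cqes = true;
    }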

From patchwork Tue Mar 12 21:44:17 2024
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13590615
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer,
    David Ahern, Mina Almasry
Subject: [RFC PATCH v4 03/16] net: page_pool: add ->scrub mem provider callback
Date: Tue, 12 Mar 2024 14:44:17 -0700
Message-ID: <20240312214430.2923019-4-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

The page pool now waits for all ppiovs to return before destroying
itself, and for that to happen the memory provider might need to push
out some buffers, flush caches and so on.

todo: we'll try to get by without it before the final release

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/page_pool/types.h | 1 +
 net/core/page_pool.c          | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 096cd2455b2c..347837b83d36 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -134,6 +134,7 @@ struct memory_provider_ops {
         void (*destroy)(struct page_pool *pool);
         netmem_ref (*alloc_pages)(struct page_pool *pool, gfp_t gfp);
         bool (*release_page)(struct page_pool *pool, netmem_ref netmem);
+        void (*scrub)(struct page_pool *pool);
 };

 extern const struct memory_provider_ops dmabuf_devmem_ops;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5d5b78878473..fc92e551ed13 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -984,6 +984,9 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool)

 static void page_pool_scrub(struct page_pool *pool)
 {
+        if (pool->mp_ops && pool->mp_ops->scrub)
+                pool->mp_ops->scrub(pool);
+
         page_pool_empty_alloc_cache_once(pool);
         pool->destroy_cnt++;
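
A sketch of what a provider-side ->scrub could look like; the my_*
names and helper are hypothetical, only the ops layout comes from this
series:

    /* called from page_pool_scrub() on destruction: push back anything
     * the provider still caches so the pool sees all of its buffers
     * before it starts waiting for inflight pages */
    static void my_scrub(struct page_pool *pool)
    {
            struct my_provider_state *state = pool->mp_priv;

            my_provider_flush_caches(state);
    }

    static const struct memory_provider_ops my_ops = {
            .destroy      = my_destroy,
            .alloc_pages  = my_alloc_pages,
            .release_page = my_release_page,
            .scrub        = my_scrub,
    };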

From patchwork Tue Mar 12 21:44:18 2024
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13590616
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer,
    David Ahern, Mina Almasry
Subject: [RFC PATCH v4 04/16] io_uring: separate header for exported net bits
Date: Tue, 12 Mar 2024 14:44:18 -0700
Message-ID: <20240312214430.2923019-5-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

We're exporting some io_uring bits to networking, e.g. for implementing
a net callback for io_uring cmds, but we don't want to expose more than
needed. Add a separate header for networking.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/linux/io_uring.h     |  6 ------
 include/linux/io_uring/net.h | 18 ++++++++++++++++++
 io_uring/uring_cmd.c         |  1 +
 net/socket.c                 |  2 +-
 4 files changed, 20 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/io_uring/net.h

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 68ed6697fece..e123d5e17b52 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -11,7 +11,6 @@ void __io_uring_cancel(bool cancel_all);
 void __io_uring_free(struct task_struct *tsk);
 void io_uring_unreg_ringfd(void);
 const char *io_uring_get_opcode(u8 opcode);
-int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
 bool io_is_uring_fops(struct file *file);

 static inline void io_uring_files_cancel(void)
@@ -45,11 +44,6 @@ static inline const char *io_uring_get_opcode(u8 opcode)
 {
         return "";
 }
-static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd,
-                                    unsigned int issue_flags)
-{
-        return -EOPNOTSUPP;
-}
 static inline bool io_is_uring_fops(struct file *file)
 {
         return false;
diff --git a/include/linux/io_uring/net.h b/include/linux/io_uring/net.h
new file mode 100644
index 000000000000..b58f39fed4d5
--- /dev/null
+++ b/include/linux/io_uring/net.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_IO_URING_NET_H
+#define _LINUX_IO_URING_NET_H
+
+struct io_uring_cmd;
+
+#if defined(CONFIG_IO_URING)
+int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
+
+#else
+static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd,
+                                    unsigned int issue_flags)
+{
+        return -EOPNOTSUPP;
+}
+#endif
+
+#endif
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 42f63adfa54a..0b504b33806c 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -3,6 +3,7 @@
 #include
 #include
 #include
+#include <linux/io_uring/net.h>
 #include
 #include
 #include
diff --git a/net/socket.c b/net/socket.c
index ed3df2f749bf..c69cd0e652b8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -88,7 +88,7 @@
 #include
 #include
 #include
-#include <linux/io_uring.h>
+#include <linux/io_uring/net.h>
 #include
 #include
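
The intended consumer pattern, sketched below (net/socket.c is the real
in-tree user; this wrapper is hypothetical): networking code includes
only the narrow header, and the CONFIG_IO_URING=n stub returns
-EOPNOTSUPP with no ifdefs needed at the call site.

    #include <linux/io_uring/net.h>        /* only io_uring_cmd_sock() */

    static int my_handle_uring_cmd(struct io_uring_cmd *cmd,
                                   unsigned int issue_flags)
    {
            return io_uring_cmd_sock(cmd, issue_flags);
    }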

From patchwork Tue Mar 12 21:44:19 2024
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13590617
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer,
    David Ahern, Mina Almasry
Subject: [RFC PATCH v4 05/16] io_uring: introduce interface queue
Date: Tue, 12 Mar 2024 14:44:19 -0700
Message-ID: <20240312214430.2923019-6-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: David Wei

This patch introduces a new object in io_uring called an interface
queue (ifq) which contains:

* A pool region allocated by userspace and registered w/ io_uring where
  Rx data is transferred to via DMA.
* A net device and one specific Rx queue in it that will be configured
  for ZC Rx.
* A new shared ringbuf w/ userspace called a refill ring. When
  userspace is done with bufs with Rx packet payloads, it writes
  entries into this ring to tell the kernel that bufs can be re-used by
  the NIC again. Each entry in the refill ring is a struct
  io_uring_rbuf_rqe.

On the completion side, the main CQ ring is used to notify userspace of
recv()'d packets. Big CQEs (32 bytes) are required to support this, as
the upper 16 bytes are used by ZC Rx to store a feature specific struct
io_uring_rbuf_cqe.

Add two new struct types:

1. io_uring_rbuf_rqe - entry in refill ring
2. io_uring_rbuf_cqe - entry in upper 16 bytes of a big CQE

For now, each io_uring instance has a single ifq, and each ifq has a
single pool region associated with one Rx queue.

Add a new opcode and functions to setup and tear down an ifq. Size and
offsets of the shared refill ring are returned to userspace for it to
mmap in the registration struct io_uring_zc_rx_ifq_reg, similar to the
main SQ/CQ rings.

Signed-off-by: David Wei
---
 include/linux/io_uring_types.h |   4 ++
 include/uapi/linux/io_uring.h  |  40 ++++++++++++
 io_uring/Makefile              |   3 +-
 io_uring/io_uring.c            |   7 +++
 io_uring/register.c            |   7 +++
 io_uring/zc_rx.c               | 109 +++++++++++++++++++++++++++++++++
 io_uring/zc_rx.h               |  35 +++++++++++
 7 files changed, 204 insertions(+), 1 deletion(-)
 create mode 100644 io_uring/zc_rx.c
 create mode 100644 io_uring/zc_rx.h

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 500772189fee..27e750a02ea5 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -39,6 +39,8 @@ enum io_uring_cmd_flags {
         IO_URING_F_COMPAT = (1 << 12),
 };

+struct io_zc_rx_ifq;
+
 struct io_wq_work_node {
         struct io_wq_work_node *next;
 };
@@ -385,6 +387,8 @@ struct io_ring_ctx {
                 struct io_rsrc_data *file_data;
                 struct io_rsrc_data *buf_data;

+                struct io_zc_rx_ifq *ifq;
+
                 /* protected by ->uring_lock */
                 struct list_head rsrc_ref_list;
                 struct io_alloc_cache rsrc_node_cache;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7bd10201a02b..7b643fe420c5 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -575,6 +575,9 @@ enum {
         IORING_REGISTER_NAPI = 27,
         IORING_UNREGISTER_NAPI = 28,

+        /* register a network interface queue for zerocopy */
+        IORING_REGISTER_ZC_RX_IFQ = 29,
+
         /* this goes last */
         IORING_REGISTER_LAST,

@@ -782,6 +785,43 @@ enum {
         SOCKET_URING_OP_SETSOCKOPT,
 };

+struct io_uring_rbuf_rqe {
+        __u32 off;
+        __u32 len;
+        __u16 region;
+        __u8 __pad[6];
+};
+
+struct io_uring_rbuf_cqe {
+        __u32 off;
+        __u32 len;
+        __u16 region;
+        __u8 __pad[6];
+};
+
+struct io_rbuf_rqring_offsets {
+        __u32 head;
+        __u32 tail;
+        __u32 rqes;
+        __u8 __pad[4];
+};
+
+/*
+ * Argument for IORING_REGISTER_ZC_RX_IFQ
+ */
+struct io_uring_zc_rx_ifq_reg {
+        __u32 if_idx;
+        /* hw rx descriptor ring id */
+        __u32 if_rxq_id;
+        __u32 region_id;
+        __u32 rq_entries;
+        __u32 flags;
+        __u16 cpu;
+
+        __u32 mmap_sz;
+        struct io_rbuf_rqring_offsets rq_off;
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 2e1d4e03799c..bb47231c611b 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -8,7 +8,8 @@ obj-$(CONFIG_IO_URING) += io_uring.o xattr.o nop.o fs.o splice.o \
                                 statx.o net.o msg_ring.o timeout.o \
                                 sqpoll.o fdinfo.o tctx.o poll.o \
                                 cancel.o kbuf.o rsrc.o rw.o opdef.o \
-                                notif.o waitid.o register.o truncate.o
+                                notif.o waitid.o register.o truncate.o \
+                                zc_rx.o
 obj-$(CONFIG_IO_WQ) += io-wq.o
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e44c2ef271b9..5614c47cecd9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -95,6 +95,7 @@
 #include "waitid.h"
 #include "futex.h"
 #include "napi.h"
+#include "zc_rx.h"

 #include "timeout.h"
 #include "poll.h"
@@ -2861,6 +2862,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
                 return;

         mutex_lock(&ctx->uring_lock);
+        io_unregister_zc_rx_ifqs(ctx);
         if (ctx->buf_data)
                 __io_sqe_buffers_unregister(ctx);
         if (ctx->file_data)
@@ -3032,6 +3034,11 @@ static __cold void io_ring_exit_work(struct work_struct *work)
                         io_cqring_overflow_kill(ctx);
                         mutex_unlock(&ctx->uring_lock);
                 }
+                if (ctx->ifq) {
+                        mutex_lock(&ctx->uring_lock);
+                        io_shutdown_zc_rx_ifqs(ctx);
+                        mutex_unlock(&ctx->uring_lock);
+                }

                 if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
                         io_move_task_work_from_local(ctx);
diff --git a/io_uring/register.c b/io_uring/register.c
index 99c37775f974..760f0b6a051c 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -27,6 +27,7 @@
 #include "cancel.h"
 #include "kbuf.h"
 #include "napi.h"
+#include "zc_rx.h"

 #define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \
                                  IORING_REGISTER_LAST + IORING_OP_LAST)
@@ -563,6 +564,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
                         break;
                 ret = io_unregister_napi(ctx, arg);
                 break;
+        case IORING_REGISTER_ZC_RX_IFQ:
+                ret = -EINVAL;
+                if (!arg || nr_args != 1)
+                        break;
+                ret = io_register_zc_rx_ifq(ctx, arg);
+                break;
         default:
                 ret = -EINVAL;
                 break;
diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
new file mode 100644
index 000000000000..e6c33f94c086
--- /dev/null
+++ b/io_uring/zc_rx.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+#if defined(CONFIG_PAGE_POOL)
+#include
+#include
+#include
+#include
+
+#include
+
+#include "io_uring.h"
+#include "kbuf.h"
+#include "zc_rx.h"
+
+static int io_allocate_rbuf_ring(struct io_zc_rx_ifq *ifq,
+                                 struct io_uring_zc_rx_ifq_reg *reg)
+{
+        gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
+        size_t off, rq_size;
+        void *ptr;
+
+        off = sizeof(struct io_uring);
+        rq_size = reg->rq_entries * sizeof(struct io_uring_rbuf_rqe);
+        ptr = (void *) __get_free_pages(gfp, get_order(off + rq_size));
+        if (!ptr)
+                return -ENOMEM;
+        ifq->rq_ring = (struct io_uring *)ptr;
+        ifq->rqes = (struct io_uring_rbuf_rqe *)((char *)ptr + off);
+        return 0;
+}
+
+static void io_free_rbuf_ring(struct io_zc_rx_ifq *ifq)
+{
+        if (ifq->rq_ring)
+                folio_put(virt_to_folio(ifq->rq_ring));
+}
+
+static struct io_zc_rx_ifq *io_zc_rx_ifq_alloc(struct io_ring_ctx *ctx)
+{
+        struct io_zc_rx_ifq *ifq;
+
+        ifq = kzalloc(sizeof(*ifq), GFP_KERNEL);
+        if (!ifq)
+                return NULL;
+
+        ifq->if_rxq_id = -1;
+        ifq->ctx = ctx;
+        return ifq;
+}
+
+static void io_zc_rx_ifq_free(struct io_zc_rx_ifq *ifq)
+{
+        io_free_rbuf_ring(ifq);
+        kfree(ifq);
+}
+
+int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
+                          struct io_uring_zc_rx_ifq_reg __user *arg)
+{
+        struct io_uring_zc_rx_ifq_reg reg;
+        struct io_zc_rx_ifq *ifq;
+        int ret;
+
+        if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN &&
+              ctx->flags & IORING_SETUP_CQE32))
+                return -EINVAL;
+        if (copy_from_user(&reg, arg, sizeof(reg)))
+                return -EFAULT;
+        if (ctx->ifq)
+                return -EBUSY;
+        if (reg.if_rxq_id == -1)
+                return -EINVAL;
+
+        ifq = io_zc_rx_ifq_alloc(ctx);
+        if (!ifq)
+                return -ENOMEM;
+
+        ret = io_allocate_rbuf_ring(ifq, &reg);
+        if (ret)
+                goto err;
+
+        ifq->rq_entries = reg.rq_entries;
+        ifq->if_rxq_id = reg.if_rxq_id;
+        ctx->ifq = ifq;
+
+        return 0;
+err:
+        io_zc_rx_ifq_free(ifq);
+        return ret;
+}
+
+void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx)
+{
+        struct io_zc_rx_ifq *ifq = ctx->ifq;
+
+        lockdep_assert_held(&ctx->uring_lock);
+
+        if (!ifq)
+                return;
+
+        ctx->ifq = NULL;
+        io_zc_rx_ifq_free(ifq);
+}
+
+void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx)
+{
+        lockdep_assert_held(&ctx->uring_lock);
+}
+
+#endif
diff --git a/io_uring/zc_rx.h b/io_uring/zc_rx.h
new file mode 100644
index 000000000000..35b019b275e0
--- /dev/null
+++ b/io_uring/zc_rx.h
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef IOU_ZC_RX_H
+#define IOU_ZC_RX_H
+
+struct io_zc_rx_ifq {
+        struct io_ring_ctx *ctx;
+        struct net_device *dev;
+        struct io_uring *rq_ring;
+        struct io_uring_rbuf_rqe *rqes;
+        u32 rq_entries;
+
+        /* hw rx descriptor ring id */
+        u32 if_rxq_id;
+};
+
+#if defined(CONFIG_PAGE_POOL)
+int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
+                          struct io_uring_zc_rx_ifq_reg __user *arg);
+void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx);
+void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx);
+#else
+static inline int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
+                                        struct io_uring_zc_rx_ifq_reg __user *arg)
+{
+        return -EOPNOTSUPP;
+}
+static inline void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx)
+{
+}
+static inline void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx)
+{
+}
+#endif
+
+#endif
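
A minimal userspace sketch of the registration flow, assuming the
patched uapi header and a ring created with IORING_SETUP_DEFER_TASKRUN
| IORING_SETUP_CQE32 (both are enforced in io_register_zc_rx_ifq());
the interface/queue values below are illustrative:

    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/io_uring.h>

    static int register_ifq(int ring_fd, unsigned int ifindex, unsigned int rxq)
    {
            struct io_uring_zc_rx_ifq_reg reg;

            memset(&reg, 0, sizeof(reg));
            reg.if_idx = ifindex;      /* e.g. from if_nametoindex() */
            reg.if_rxq_id = rxq;       /* hw rx queue to drive ZC Rx */
            reg.rq_entries = 4096;     /* refill ring size */

            /* on success the kernel fills in reg.mmap_sz and reg.rq_off */
            return syscall(__NR_io_uring_register, ring_fd,
                           IORING_REGISTER_ZC_RX_IFQ, &reg, 1);
    }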

From patchwork Tue Mar 12 21:44:20 2024
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13590618
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer,
    David Ahern, Mina Almasry
Subject: [RFC PATCH v4 06/16] io_uring: add mmap support for shared ifq ringbuffers
Date: Tue, 12 Mar 2024 14:44:20 -0700
Message-ID: <20240312214430.2923019-7-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: David Wei

This patch adds mmap support for the ifq refill ring. Just like the
io_uring SQ/CQ rings, userspace issues a single mmap call using the
io_uring fd w/ magic offset IORING_OFF_RQ_RING. An opaque ptr is
returned to userspace, which is then expected to use the offsets
returned in the registration struct to get access to the head/tail and
rings.

Signed-off-by: David Wei
---
 include/uapi/linux/io_uring.h |  2 ++
 io_uring/io_uring.c           |  5 +++++
 io_uring/zc_rx.c              | 15 ++++++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7b643fe420c5..a085ed60478f 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -438,6 +438,8 @@ enum {
 #define IORING_OFF_PBUF_RING 0x80000000ULL
 #define IORING_OFF_PBUF_SHIFT 16
 #define IORING_OFF_MMAP_MASK 0xf8000000ULL
+#define IORING_OFF_RQ_RING 0x20000000ULL
+#define IORING_OFF_RQ_SHIFT 16

 /*
  * Filled with the offset for mmap(2)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5614c47cecd9..280f2a2fd1fe 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3434,6 +3434,11 @@ static void *io_uring_validate_mmap_request(struct file *file,
                         return ERR_PTR(-EINVAL);
                 break;
                 }
+        case IORING_OFF_RQ_RING:
+                if (!ctx->ifq)
+                        return ERR_PTR(-EINVAL);
+                ptr = ctx->ifq->rq_ring;
+                break;
         default:
                 return ERR_PTR(-EINVAL);
         }
diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
index e6c33f94c086..6987bb991418 100644
--- a/io_uring/zc_rx.c
+++ b/io_uring/zc_rx.c
@@ -58,6 +58,7 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
 {
         struct io_uring_zc_rx_ifq_reg reg;
         struct io_zc_rx_ifq *ifq;
+        size_t ring_sz, rqes_sz;
         int ret;

         if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN &&
@@ -80,8 +81,20 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,

         ifq->rq_entries = reg.rq_entries;
         ifq->if_rxq_id = reg.if_rxq_id;
-        ctx->ifq = ifq;

+        ring_sz = sizeof(struct io_uring);
+        rqes_sz = sizeof(struct io_uring_rbuf_rqe) * ifq->rq_entries;
+        reg.mmap_sz = ring_sz + rqes_sz;
+        reg.rq_off.rqes = ring_sz;
+        reg.rq_off.head = offsetof(struct io_uring, head);
+        reg.rq_off.tail = offsetof(struct io_uring, tail);
+
+        if (copy_to_user(arg, &reg, sizeof(reg))) {
+                ret = -EFAULT;
+                goto err;
+        }
+
+        ctx->ifq = ifq;
         return 0;
 err:
         io_zc_rx_ifq_free(ifq);
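
Continuing the userspace sketch: mapping the refill ring with the
offsets returned above. The rq_ring struct here is this example's own
bookkeeping, not uapi:

    #include <sys/mman.h>

    struct rq_ring {
            unsigned int *head;               /* kernel-consumed index */
            unsigned int *tail;               /* userspace-produced index */
            struct io_uring_rbuf_rqe *rqes;   /* refill entries */
            unsigned int entries;
    };

    static int map_rq_ring(int ring_fd, struct io_uring_zc_rx_ifq_reg *reg,
                           struct rq_ring *rq)
    {
            char *ptr = mmap(NULL, reg->mmap_sz, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_POPULATE, ring_fd,
                             IORING_OFF_RQ_RING);

            if (ptr == MAP_FAILED)
                    return -1;
            rq->head = (unsigned int *)(ptr + reg->rq_off.head);
            rq->tail = (unsigned int *)(ptr + reg->rq_off.tail);
            rq->rqes = (struct io_uring_rbuf_rqe *)(ptr + reg->rq_off.rqes);
            rq->entries = reg->rq_entries;
            return 0;
    }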

From patchwork Tue Mar 12 21:44:21 2024
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13590619
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer,
    David Ahern, Mina Almasry
Subject: [RFC PATCH v4 07/16] netdev: add XDP_SETUP_ZC_RX command
Date: Tue, 12 Mar 2024 14:44:21 -0700
Message-ID: <20240312214430.2923019-8-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: David Wei

RFC only, not for upstream. This will be replaced with a separate ndo
callback or some other mechanism in the next patchset revisions.

This patch adds a new XDP_SETUP_ZC_RX command that will be used in a
later patch to enable or disable ZC RX for a specific RX queue. We are
open to suggestions on a better way of doing this.
Google's TCP devmem proposal sets up struct netdev_rx_queue which
persists across device reset, then expects userspace to use an
out-of-band method (e.g. ethtool) to reset the device, thus re-filling
a hardware Rx queue.

Signed-off-by: David Wei
---
 include/linux/netdevice.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ac7102118d68..699cce69a5a6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1009,6 +1009,7 @@ enum bpf_netdev_command {
         BPF_OFFLOAD_MAP_ALLOC,
         BPF_OFFLOAD_MAP_FREE,
         XDP_SETUP_XSK_POOL,
+        XDP_SETUP_ZC_RX,
 };

 struct bpf_prog_offload_ops;
@@ -1047,6 +1048,11 @@ struct netdev_bpf {
                 struct {
                         struct xsk_buff_pool *pool;
                         u16 queue_id;
                 } xsk;
+                /* XDP_SETUP_ZC_RX */
+                struct {
+                        struct io_zc_rx_ifq *ifq;
+                        u16 queue_id;
+                } zc_rx;
         };
 };
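
A sketch of the driver side, mirroring how XDP_SETUP_XSK_POOL is
usually dispatched; the my_* names are hypothetical. A non-NULL ifq
enables ZC Rx on the queue, NULL disables it (cf.
io_open_zc_rxq()/io_close_zc_rxq() in the next patch):

    static int my_ndo_bpf(struct net_device *dev, struct netdev_bpf *bpf)
    {
            switch (bpf->command) {
            case XDP_SETUP_ZC_RX:
                    /* restart the rx queue with (or without) the ifq's
                     * memory provider backing its page pool */
                    return my_reconfigure_rxq(dev, bpf->zc_rx.queue_id,
                                              bpf->zc_rx.ifq);
            default:
                    return -EINVAL;
            }
    }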
BoeO31snbw/44oXtBy+sD65qfkd/ND0lQzgWBfDbqD255LQahE0xumGXvgRWR96zeaRd RVtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710279882; x=1710884682; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rj0cYgShXYH5VhmSnGepet0UkxqjvpM473hZF/0XZ3c=; b=df04c5eva2fqFVzxQ5U5rmtEOANFDwD1YMY8df5Sgdyqm25A+zuMJXND09TAv5pMI7 wSSUTypsnY2KmccJaoCVPyiKGuTfwe1zbRsnQEJYxpPGYaMk0EOsTECP/TsoHLwHhyE/ RKEr6c4+cenDVsQuCMGQMHJSgGpJWzNAMESeu6XGMrgjxNOC7+3k8a2BjiEGx3GG+1fG DmkMoXtWqtS3lKRa6bbdv4M1sSwlgvxp3nVyjK5IYLgkHmpfLMYx5ZRh2Ihvjl8ZvP1E beuswrExLkeHfLan5yAHwwpGb3eq7jiw9HAZ4f/9YaN+BkpQqWsHhHFSEKHVo09sPPd+ 6bsQ== X-Gm-Message-State: AOJu0Yyc5/DpmXEbu67Rytinu4DdPdNeXRwOu5CPSvYwH1NnDm/UwCvT W5Q5aYvJW14+YxqxEZrZTdiqUTwwdKisvzhhxDbu58cPsd+amnnH5T272K5Nf7AtlWCMg8ThoaW R X-Google-Smtp-Source: AGHT+IHXWW1fnHlG+Rl1vWCNUw0xrcw6qZRHYrzDZb+F06rysW+Ft1xEAXTu1TEH4AuY3fi9w6AfTg== X-Received: by 2002:a17:902:f68b:b0:1db:e74b:5bbf with SMTP id l11-20020a170902f68b00b001dbe74b5bbfmr5003902plg.0.1710279881660; Tue, 12 Mar 2024 14:44:41 -0700 (PDT) Received: from localhost (fwdproxy-prn-017.fbsv.net. [2a03:2880:ff:11::face:b00c]) by smtp.gmail.com with ESMTPSA id u12-20020a170902e80c00b001ddb73e719dsm2049257plg.27.2024.03.12.14.44.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Mar 2024 14:44:41 -0700 (PDT) From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [RFC PATCH v4 08/16] io_uring: setup ZC for an Rx queue when registering an ifq Date: Tue, 12 Mar 2024 14:44:22 -0700 Message-ID: <20240312214430.2923019-9-dw@davidwei.uk> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk> References: <20240312214430.2923019-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: David Wei RFC only, not for upstream Just as with the previous patch, it will be migrated from ndo_bpf This patch sets up ZC for an Rx queue in a net device when an ifq is registered with io_uring. The Rx queue is specified in the registration struct. For now since there is only one ifq, its destruction is implicit during io_uring cleanup. 
Signed-off-by: David Wei --- io_uring/zc_rx.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c index 6987bb991418..521eeea04f9d 100644 --- a/io_uring/zc_rx.c +++ b/io_uring/zc_rx.c @@ -4,6 +4,7 @@ #include #include #include +#include #include @@ -11,6 +12,34 @@ #include "kbuf.h" #include "zc_rx.h" +typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); + +static int __io_queue_mgmt(struct net_device *dev, struct io_zc_rx_ifq *ifq, + u16 queue_id) +{ + struct netdev_bpf cmd; + bpf_op_t ndo_bpf; + + ndo_bpf = dev->netdev_ops->ndo_bpf; + if (!ndo_bpf) + return -EINVAL; + + cmd.command = XDP_SETUP_ZC_RX; + cmd.zc_rx.ifq = ifq; + cmd.zc_rx.queue_id = queue_id; + return ndo_bpf(dev, &cmd); +} + +static int io_open_zc_rxq(struct io_zc_rx_ifq *ifq) +{ + return __io_queue_mgmt(ifq->dev, ifq, ifq->if_rxq_id); +} + +static int io_close_zc_rxq(struct io_zc_rx_ifq *ifq) +{ + return __io_queue_mgmt(ifq->dev, NULL, ifq->if_rxq_id); +} + static int io_allocate_rbuf_ring(struct io_zc_rx_ifq *ifq, struct io_uring_zc_rx_ifq_reg *reg) { @@ -49,6 +78,10 @@ static struct io_zc_rx_ifq *io_zc_rx_ifq_alloc(struct io_ring_ctx *ctx) static void io_zc_rx_ifq_free(struct io_zc_rx_ifq *ifq) { + if (ifq->if_rxq_id != -1) + io_close_zc_rxq(ifq); + if (ifq->dev) + dev_put(ifq->dev); io_free_rbuf_ring(ifq); kfree(ifq); } @@ -79,9 +112,18 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx, if (ret) goto err; + ret = -ENODEV; + ifq->dev = dev_get_by_index(current->nsproxy->net_ns, reg.if_idx); + if (!ifq->dev) + goto err; + ifq->rq_entries = reg.rq_entries; ifq->if_rxq_id = reg.if_rxq_id; + ret = io_open_zc_rxq(ifq); + if (ret) + goto err; + ring_sz = sizeof(struct io_uring); rqes_sz = sizeof(struct io_uring_rbuf_rqe) * ifq->rq_entries; reg.mmap_sz = ring_sz + rqes_sz; @@ -90,6 +132,7 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx, reg.rq_off.tail = offsetof(struct io_uring, tail); if (copy_to_user(arg, &reg, sizeof(reg))) { + io_close_zc_rxq(ifq); ret = -EFAULT; goto err; }
From patchwork Tue Mar 12 21:44:23 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 09/16] io_uring/zcrx: implement socket registration
Date: Tue, 12 Mar 2024 14:44:23 -0700
Message-ID: <20240312214430.2923019-10-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

We want userspace to explicitly list all sockets it'll be using with a particular zc ifq, so we can properly configure them, e.g.
binding the sockets to the corresponding interface and setting steering rules. We'll also need it to better control ifq lifetime and for termination / unregistration purposes. TODO: remove zc_rx_idx from struct socket, which will fix zc_rx_idx token init races and re-registration bug. Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- include/linux/net.h | 2 + include/uapi/linux/io_uring.h | 7 +++ io_uring/net.c | 20 ++++++++ io_uring/register.c | 6 +++ io_uring/zc_rx.c | 91 +++++++++++++++++++++++++++++++++-- io_uring/zc_rx.h | 17 +++++++ net/socket.c | 1 + 7 files changed, 141 insertions(+), 3 deletions(-) diff --git a/include/linux/net.h b/include/linux/net.h index c9b4a63791a4..867061a91d30 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -126,6 +126,8 @@ struct socket { const struct proto_ops *ops; /* Might change with IPV6_ADDRFORM or MPTCP. */ struct socket_wq wq; + + unsigned zc_rx_idx; }; /* diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index a085ed60478f..26e945e6258d 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -579,6 +579,7 @@ enum { /* register a network interface queue for zerocopy */ IORING_REGISTER_ZC_RX_IFQ = 29, + IORING_REGISTER_ZC_RX_SOCK = 30, /* this goes last */ IORING_REGISTER_LAST, @@ -824,6 +825,12 @@ struct io_uring_zc_rx_ifq_reg { struct io_rbuf_rqring_offsets rq_off; }; +struct io_uring_zc_rx_sock_reg { + __u32 sockfd; + __u32 zc_rx_ifq_idx; + __u32 __resv[2]; +}; + #ifdef __cplusplus } #endif diff --git a/io_uring/net.c b/io_uring/net.c index 54dff492e064..1fa7c1fa6b5d 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -16,6 +16,7 @@ #include "net.h" #include "notif.h" #include "rsrc.h" +#include "zc_rx.h" #if defined(CONFIG_NET) struct io_shutdown { @@ -1033,6 +1034,25 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) return ret; } +static __maybe_unused +struct io_zc_rx_ifq *io_zc_verify_sock(struct io_kiocb *req, + struct socket *sock) +{ + unsigned token = READ_ONCE(sock->zc_rx_idx); + unsigned ifq_idx = token >> IO_ZC_IFQ_IDX_OFFSET; + unsigned sock_idx = token & IO_ZC_IFQ_IDX_MASK; + struct io_zc_rx_ifq *ifq; + + if (ifq_idx) + return NULL; + ifq = req->ctx->ifq; + if (!ifq || sock_idx >= ifq->nr_sockets) + return NULL; + if (ifq->sockets[sock_idx] != req->file) + return NULL; + return ifq; +} + void io_send_zc_cleanup(struct io_kiocb *req) { struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg); diff --git a/io_uring/register.c b/io_uring/register.c index 760f0b6a051c..7f40403a1716 100644 --- a/io_uring/register.c +++ b/io_uring/register.c @@ -570,6 +570,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, break; ret = io_register_zc_rx_ifq(ctx, arg); break; + case IORING_REGISTER_ZC_RX_SOCK: + ret = -EINVAL; + if (!arg || nr_args != 1) + break; + ret = io_register_zc_rx_sock(ctx, arg); + break; default: ret = -EINVAL; break; diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c index 521eeea04f9d..77459c0fc14b 100644 --- a/io_uring/zc_rx.c +++ b/io_uring/zc_rx.c @@ -5,12 +5,15 @@ #include #include #include +#include +#include #include #include "io_uring.h" #include "kbuf.h" #include "zc_rx.h" +#include "rsrc.h" typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); @@ -76,10 +79,31 @@ static struct io_zc_rx_ifq *io_zc_rx_ifq_alloc(struct io_ring_ctx *ctx) return ifq; } -static void io_zc_rx_ifq_free(struct io_zc_rx_ifq *ifq) +static void io_shutdown_ifq(struct io_zc_rx_ifq *ifq) { - if (ifq->if_rxq_id != 
-1) + int i; + + if (!ifq) + return; + + for (i = 0; i < ifq->nr_sockets; i++) { + if (ifq->sockets[i]) { + fput(ifq->sockets[i]); + ifq->sockets[i] = NULL; + } + } + ifq->nr_sockets = 0; + + if (ifq->if_rxq_id != -1) { io_close_zc_rxq(ifq); + ifq->if_rxq_id = -1; + } +} + +static void io_zc_rx_ifq_free(struct io_zc_rx_ifq *ifq) +{ + io_shutdown_ifq(ifq); + if (ifq->dev) dev_put(ifq->dev); io_free_rbuf_ring(ifq); @@ -132,7 +156,6 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx, reg.rq_off.tail = offsetof(struct io_uring, tail); if (copy_to_user(arg, &reg, sizeof(reg))) { - io_close_zc_rxq(ifq); ret = -EFAULT; goto err; } @@ -153,6 +176,8 @@ void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx) if (!ifq) return; + WARN_ON_ONCE(ifq->nr_sockets); + ctx->ifq = NULL; io_zc_rx_ifq_free(ifq); } @@ -160,6 +185,66 @@ void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx) void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx) { lockdep_assert_held(&ctx->uring_lock); + + io_shutdown_ifq(ctx->ifq); +} + +int io_register_zc_rx_sock(struct io_ring_ctx *ctx, + struct io_uring_zc_rx_sock_reg __user *arg) +{ + struct io_uring_zc_rx_sock_reg sr; + struct io_zc_rx_ifq *ifq; + struct socket *sock; + struct file *file; + int ret = -EEXIST; + int idx; + + if (copy_from_user(&sr, arg, sizeof(sr))) + return -EFAULT; + if (sr.__resv[0] || sr.__resv[1]) + return -EINVAL; + if (sr.zc_rx_ifq_idx != 0 || !ctx->ifq) + return -EINVAL; + + ifq = ctx->ifq; + if (ifq->nr_sockets >= ARRAY_SIZE(ifq->sockets)) + return -EINVAL; + + BUILD_BUG_ON(ARRAY_SIZE(ifq->sockets) > IO_ZC_IFQ_IDX_MASK); + + file = fget(sr.sockfd); + if (!file) + return -EBADF; + + if (!!unix_get_socket(file)) { + fput(file); + return -EBADF; + } + + sock = sock_from_file(file); + if (unlikely(!sock || !sock->sk)) { + fput(file); + return -ENOTSOCK; + } + + idx = ifq->nr_sockets; + lock_sock(sock->sk); + if (!sock->zc_rx_idx) { + unsigned token; + + token = idx + (sr.zc_rx_ifq_idx << IO_ZC_IFQ_IDX_OFFSET); + WRITE_ONCE(sock->zc_rx_idx, token); + ret = 0; + } + release_sock(sock->sk); + + if (ret) { + fput(file); + return ret; + } + ifq->sockets[idx] = file; + ifq->nr_sockets++; + return 0; } #endif diff --git a/io_uring/zc_rx.h b/io_uring/zc_rx.h index 35b019b275e0..d7b8397d525f 100644 --- a/io_uring/zc_rx.h +++ b/io_uring/zc_rx.h @@ -2,6 +2,13 @@ #ifndef IOU_ZC_RX_H #define IOU_ZC_RX_H +#include +#include + +#define IO_ZC_MAX_IFQ_SOCKETS 16 +#define IO_ZC_IFQ_IDX_OFFSET 16 +#define IO_ZC_IFQ_IDX_MASK ((1U << IO_ZC_IFQ_IDX_OFFSET) - 1) + struct io_zc_rx_ifq { struct io_ring_ctx *ctx; struct net_device *dev; @@ -11,6 +18,9 @@ struct io_zc_rx_ifq { /* hw rx descriptor ring id */ u32 if_rxq_id; + + unsigned nr_sockets; + struct file *sockets[IO_ZC_MAX_IFQ_SOCKETS]; }; #if defined(CONFIG_PAGE_POOL) @@ -18,6 +28,8 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx, struct io_uring_zc_rx_ifq_reg __user *arg); void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx); void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx); +int io_register_zc_rx_sock(struct io_ring_ctx *ctx, + struct io_uring_zc_rx_sock_reg __user *arg); #else static inline int io_register_zc_rx_ifq(struct io_ring_ctx *ctx, struct io_uring_zc_rx_ifq_reg __user *arg) @@ -30,6 +42,11 @@ static inline void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx) { } +static inline int io_register_zc_rx_sock(struct io_ring_ctx *ctx, + struct io_uring_zc_rx_sock_reg __user *arg) +{ + return -EOPNOTSUPP; +} #endif #endif diff --git 
a/net/socket.c b/net/socket.c index c69cd0e652b8..18181a4e0295 100644 --- a/net/socket.c +++ b/net/socket.c @@ -637,6 +637,7 @@ struct socket *sock_alloc(void) sock = SOCKET_I(inode); + sock->zc_rx_idx = 0; inode->i_ino = get_next_ino(); inode->i_mode = S_IFSOCK | S_IRWXUGO; inode->i_uid = current_fsuid();
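To illustrate the resulting userspace flow (a sketch: only the struct and register opcode come from this patch, the rest is assumed):

/* Illustrative: attach an already-connected TCP socket to ifq 0,
 * which must have been registered beforehand. */
struct io_uring_zc_rx_sock_reg sock_reg = {
	.sockfd        = sockfd,
	.zc_rx_ifq_idx = 0,	/* only a single ifq exists for now */
};
if (io_uring_register(ring_fd, IORING_REGISTER_ZC_RX_SOCK, &sock_reg, 1))
	err(1, "socket registration failed");

The kernel then packs (ifq index, socket index) into sock->zc_rx_idx so that later requests can cheaply verify that a socket really belongs to the ifq.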
From patchwork Tue Mar 12 21:44:24 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 10/16] io_uring: add zero copy buf representation and pool
Date: Tue, 12 Mar 2024 14:44:24 -0700
Message-ID: <20240312214430.2923019-11-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: David Wei

This patch adds two objects: * Zero copy buffer representation, holding a page and a net_iov. The page is needed because net_iov is designed for opaque device memory, whereas we are backed by real pages. * Zero copy pool, spiritually similar to page pool, that holds ZC bufs and hands them out to net devices. This will be used as an implementation of a page pool memory provider. Pool regions are registered with io_uring using the registered buffer API, with a 1:1 mapping between a region and an iovec passed to io_uring_register_buffers. This does the heavy lifting of pinning the pages and chunking them into bvecs in a struct io_mapped_ubuf for us. For now, as there is only one pool region per ifq, there is no separate API for adding/removing regions yet; the region is mapped implicitly during ifq registration.
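Concretely, userspace would therefore provide the pool's backing memory roughly as sketched below (the liburing call is real; POOL_SIZE and the choice of buffer index 0 as the region id are assumptions of this example):

/* Illustrative: pin a page-aligned region as registered buffer 0 and
 * reference it from the ifq registration via region_id. */
void *mem;
if (posix_memalign(&mem, 4096, POOL_SIZE))	/* POOL_SIZE: assumed */
	err(1, "alloc");
struct iovec iov = { .iov_base = mem, .iov_len = POOL_SIZE };
if (io_uring_register_buffers(&ring, &iov, 1))	/* 1:1 region <-> iovec */
	err(1, "register buffers");
reg.region_id = 0;	/* index of the registered buffer above */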
Signed-off-by: David Wei --- include/linux/io_uring/net.h | 7 +++ io_uring/zc_rx.c | 110 +++++++++++++++++++++++++++++++++++ io_uring/zc_rx.h | 15 +++++ 3 files changed, 132 insertions(+) diff --git a/include/linux/io_uring/net.h b/include/linux/io_uring/net.h index b58f39fed4d5..05d5a6a97264 100644 --- a/include/linux/io_uring/net.h +++ b/include/linux/io_uring/net.h @@ -2,8 +2,15 @@ #ifndef _LINUX_IO_URING_NET_H #define _LINUX_IO_URING_NET_H +#include + struct io_uring_cmd; +struct io_zc_rx_buf { + struct net_iov niov; + struct page *page; +}; + #if defined(CONFIG_IO_URING) int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags); diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c index 77459c0fc14b..326ae3fcc643 100644 --- a/io_uring/zc_rx.c +++ b/io_uring/zc_rx.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -66,6 +67,109 @@ static void io_free_rbuf_ring(struct io_zc_rx_ifq *ifq) folio_put(virt_to_folio(ifq->rq_ring)); } +static int io_zc_rx_init_buf(struct page *page, struct io_zc_rx_buf *buf) +{ + memset(&buf->niov, 0, sizeof(buf->niov)); + atomic_long_set(&buf->niov.pp_ref_count, 0); + + buf->page = page; + get_page(page); + return 0; +} + +static void io_zc_rx_free_buf(struct io_zc_rx_buf *buf) +{ + struct page *page = buf->page; + + put_page(page); +} + +static int io_zc_rx_init_pool(struct io_zc_rx_pool *pool, + struct io_mapped_ubuf *imu) +{ + struct io_zc_rx_buf *buf; + struct page *page; + int i, ret; + + for (i = 0; i < imu->nr_bvecs; i++) { + page = imu->bvec[i].bv_page; + buf = &pool->bufs[i]; + ret = io_zc_rx_init_buf(page, buf); + if (ret) + goto err; + + pool->freelist[i] = i; + } + + pool->free_count = imu->nr_bvecs; + return 0; +err: + while (i--) { + buf = &pool->bufs[i]; + io_zc_rx_free_buf(buf); + } + return ret; +} + +static int io_zc_rx_create_pool(struct io_ring_ctx *ctx, + struct io_zc_rx_ifq *ifq, + u16 id) +{ + struct io_mapped_ubuf *imu; + struct io_zc_rx_pool *pool; + int nr_pages; + int ret; + + if (ifq->pool) + return -EFAULT; + + if (unlikely(id >= ctx->nr_user_bufs)) + return -EFAULT; + id = array_index_nospec(id, ctx->nr_user_bufs); + imu = ctx->user_bufs[id]; + if (imu->ubuf & ~PAGE_MASK || imu->ubuf_end & ~PAGE_MASK) + return -EFAULT; + + ret = -ENOMEM; + nr_pages = imu->nr_bvecs; + pool = kvmalloc(struct_size(pool, freelist, nr_pages), GFP_KERNEL); + if (!pool) + goto err; + + pool->bufs = kvmalloc_array(nr_pages, sizeof(*pool->bufs), GFP_KERNEL); + if (!pool->bufs) + goto err_buf; + + ret = io_zc_rx_init_pool(pool, imu); + if (ret) + goto err_map; + + pool->ifq = ifq; + pool->pool_id = id; + pool->nr_bufs = nr_pages; + spin_lock_init(&pool->freelist_lock); + ifq->pool = pool; + return 0; +err_map: + kvfree(pool->bufs); +err_buf: + kvfree(pool); +err: + return ret; +} + +static void io_zc_rx_free_pool(struct io_zc_rx_pool *pool) +{ + struct io_zc_rx_buf *buf; + + for (int i = 0; i < pool->nr_bufs; i++) { + buf = &pool->bufs[i]; + io_zc_rx_free_buf(buf); + } + kvfree(pool->bufs); + kvfree(pool); +} + static struct io_zc_rx_ifq *io_zc_rx_ifq_alloc(struct io_ring_ctx *ctx) { struct io_zc_rx_ifq *ifq; @@ -104,6 +208,8 @@ static void io_zc_rx_ifq_free(struct io_zc_rx_ifq *ifq) { io_shutdown_ifq(ifq); + if (ifq->pool) + io_zc_rx_free_pool(ifq->pool); if (ifq->dev) dev_put(ifq->dev); io_free_rbuf_ring(ifq); @@ -141,6 +247,10 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx, if (!ifq->dev) goto err; + ret = io_zc_rx_create_pool(ctx, ifq, reg.region_id); + if (ret) + goto err; + ifq->rq_entries = reg.rq_entries; 
ifq->if_rxq_id = reg.if_rxq_id; diff --git a/io_uring/zc_rx.h b/io_uring/zc_rx.h index d7b8397d525f..466b2b8f9813 100644 --- a/io_uring/zc_rx.h +++ b/io_uring/zc_rx.h @@ -3,15 +3,30 @@ #define IOU_ZC_RX_H #include +#include #include #define IO_ZC_MAX_IFQ_SOCKETS 16 #define IO_ZC_IFQ_IDX_OFFSET 16 #define IO_ZC_IFQ_IDX_MASK ((1U << IO_ZC_IFQ_IDX_OFFSET) - 1) +struct io_zc_rx_pool { + struct io_zc_rx_ifq *ifq; + struct io_zc_rx_buf *bufs; + u32 nr_bufs; + u16 pool_id; + + /* freelist */ + spinlock_t freelist_lock; + u32 free_count; + u32 freelist[]; +}; + struct io_zc_rx_ifq { struct io_ring_ctx *ctx; struct net_device *dev; + struct io_zc_rx_pool *pool; + struct io_uring *rq_ring; struct io_uring_rbuf_rqe *rqes; u32 rq_entries;
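For orientation, a buffer's position in the pool maps linearly onto byte offsets in the registered region, i.e. buffer i covers [i * PAGE_SIZE, (i + 1) * PAGE_SIZE). The helpers below are not in the patch; they just restate the arithmetic the refill and completion rings use in later patches:

/* Assumed helpers restating the offset <-> buffer-id mapping. */
static inline u32 zc_off_to_pgid(u64 off)
{
	return off / PAGE_SIZE;		/* rqe->off -> index into pool->bufs */
}

static inline u64 zc_pgid_to_off(u32 pgid, u32 off_in_buf)
{
	return (u64)pgid * PAGE_SIZE + off_in_buf;	/* as in rcqe->off */
}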
From patchwork Tue Mar 12 21:44:25 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 11/16] io_uring: implement pp memory provider for zc rx
Date: Tue, 12 Mar 2024 14:44:25 -0700
Message-ID: <20240312214430.2923019-12-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

Implement a new pp memory provider for io_uring zerocopy receive. All buffers are backed by struct io_zc_rx_buf, which is a thin extension of struct net_iov. Initially, all of them are unallocated and placed on a spinlock-protected ->freelist. They are then allocated via the ->alloc_pages callback, which sets the refcount to 1. Later, buffers are either dropped by the net stack and recycled back into the page pool / released by ->release_page, or, more likely, transferred to userspace by posting a corresponding CQE and elevating the refcount by IO_ZC_RX_UREF. When the user is done with a buffer, it should be put into the refill ring. The next time io_pp_zc_alloc_pages() runs, it checks the ring, puts the user refs and ultimately grabs buffers from there. That is done in the attached napi context and so doesn't need any additional synchronisation; it is the second hottest path after getting a buffer from the pp lockless cache.
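The userspace half of that refill contract might look like the sketch below; the ring layout (an rqes array plus head/tail indices, power-of-two sized) follows the registration patch, while struct zc_ring and the helper itself are assumptions:

/* Illustrative: recycle a consumed buffer by publishing a refill rqe.
 * 'rq' wraps the mmap()ed refill ring; userspace is the sole producer. */
static void zc_recycle(struct zc_ring *rq, __u64 off)
{
	unsigned mask = rq->entries - 1;	/* rq_entries is a power of two */
	unsigned tail = *rq->tail;
	struct io_uring_rbuf_rqe *rqe = &rq->rqes[tail & mask];

	rqe->off = off;		/* same offset the completion CQE carried */
	/* publish the entry before the new tail becomes visible */
	__atomic_store_n(rq->tail, tail + 1, __ATOMIC_RELEASE);
}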
Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- include/linux/io_uring/net.h | 5 + include/net/page_pool/types.h | 1 + io_uring/zc_rx.c | 202 ++++++++++++++++++++++++++++++++++ io_uring/zc_rx.h | 5 + net/core/page_pool.c | 2 +- 5 files changed, 214 insertions(+), 1 deletion(-) diff --git a/include/linux/io_uring/net.h b/include/linux/io_uring/net.h index 05d5a6a97264..a225d7090b6b 100644 --- a/include/linux/io_uring/net.h +++ b/include/linux/io_uring/net.h @@ -12,6 +12,11 @@ struct io_zc_rx_buf { }; #if defined(CONFIG_IO_URING) + +#if defined(CONFIG_PAGE_POOL) +extern const struct memory_provider_ops io_uring_pp_zc_ops; +#endif + int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags); #else diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index 347837b83d36..9e91f2cdbe61 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -227,6 +227,7 @@ netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool, struct page_pool *page_pool_create(const struct page_pool_params *params); struct page_pool *page_pool_create_percpu(const struct page_pool_params *params, int cpuid); +void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem); struct xdp_mem_info; diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c index 326ae3fcc643..b2507df121fb 100644 --- a/io_uring/zc_rx.c +++ b/io_uring/zc_rx.c @@ -8,6 +8,7 @@ #include #include #include +#include #include @@ -357,4 +358,205 @@ int io_register_zc_rx_sock(struct io_ring_ctx *ctx, return 0; } +static inline struct io_zc_rx_buf *io_niov_to_buf(struct net_iov *niov) +{ + return container_of(niov, struct io_zc_rx_buf, niov); +} + +static inline unsigned io_buf_pgid(struct io_zc_rx_pool *pool, + struct io_zc_rx_buf *buf) +{ + return buf - pool->bufs; +} + +static __maybe_unused void io_zc_rx_get_buf_uref(struct io_zc_rx_buf *buf) +{ + atomic_long_add(IO_ZC_RX_UREF, &buf->niov.pp_ref_count); +} + +static bool io_zc_rx_buf_put(struct io_zc_rx_buf *buf, int nr) +{ + return atomic_long_sub_and_test(nr, &buf->niov.pp_ref_count); +} + +static bool io_zc_rx_put_buf_uref(struct io_zc_rx_buf *buf) +{ + if (atomic_long_read(&buf->niov.pp_ref_count) < IO_ZC_RX_UREF) + return false; + + return io_zc_rx_buf_put(buf, IO_ZC_RX_UREF); +} + +static inline netmem_ref io_zc_buf_to_netmem(struct io_zc_rx_buf *buf) +{ + return net_iov_to_netmem(&buf->niov); +} + +static inline void io_zc_add_pp_cache(struct page_pool *pp, + struct io_zc_rx_buf *buf) +{ + netmem_ref netmem = io_zc_buf_to_netmem(buf); + + page_pool_set_pp_info(pp, netmem); + pp->alloc.cache[pp->alloc.count++] = netmem; +} + +static inline u32 io_zc_rx_rqring_entries(struct io_zc_rx_ifq *ifq) +{ + u32 entries; + + entries = smp_load_acquire(&ifq->rq_ring->tail) - ifq->cached_rq_head; + return min(entries, ifq->rq_entries); +} + +static void io_zc_rx_ring_refill(struct page_pool *pp, + struct io_zc_rx_ifq *ifq) +{ + unsigned int entries = io_zc_rx_rqring_entries(ifq); + unsigned int mask = ifq->rq_entries - 1; + struct io_zc_rx_pool *pool = ifq->pool; + + if (unlikely(!entries)) + return; + + while (entries--) { + unsigned int rq_idx = ifq->cached_rq_head++ & mask; + struct io_uring_rbuf_rqe *rqe = &ifq->rqes[rq_idx]; + u32 pgid = rqe->off / PAGE_SIZE; + struct io_zc_rx_buf *buf = &pool->bufs[pgid]; + + if (!io_zc_rx_put_buf_uref(buf)) + continue; + io_zc_add_pp_cache(pp, buf); + if (pp->alloc.count >= PP_ALLOC_CACHE_REFILL) + break; + } + smp_store_release(&ifq->rq_ring->head, ifq->cached_rq_head); +} + +static void 
io_zc_rx_refill_slow(struct page_pool *pp, struct io_zc_rx_ifq *ifq) +{ + struct io_zc_rx_pool *pool = ifq->pool; + + spin_lock_bh(&pool->freelist_lock); + while (pool->free_count && pp->alloc.count < PP_ALLOC_CACHE_REFILL) { + struct io_zc_rx_buf *buf; + u32 pgid; + + pgid = pool->freelist[--pool->free_count]; + buf = &pool->bufs[pgid]; + + io_zc_add_pp_cache(pp, buf); + pp->pages_state_hold_cnt++; + trace_page_pool_state_hold(pp, io_zc_buf_to_netmem(buf), + pp->pages_state_hold_cnt); + } + spin_unlock_bh(&pool->freelist_lock); +} + +static void io_zc_rx_recycle_buf(struct io_zc_rx_pool *pool, + struct io_zc_rx_buf *buf) +{ + spin_lock_bh(&pool->freelist_lock); + pool->freelist[pool->free_count++] = io_buf_pgid(pool, buf); + spin_unlock_bh(&pool->freelist_lock); +} + +static netmem_ref io_pp_zc_alloc_pages(struct page_pool *pp, gfp_t gfp) +{ + struct io_zc_rx_ifq *ifq = pp->mp_priv; + + /* pp should already be ensuring that */ + if (unlikely(pp->alloc.count)) + goto out_return; + + io_zc_rx_ring_refill(pp, ifq); + if (likely(pp->alloc.count)) + goto out_return; + + io_zc_rx_refill_slow(pp, ifq); + if (!pp->alloc.count) + return 0; +out_return: + return pp->alloc.cache[--pp->alloc.count]; +} + +static bool io_pp_zc_release_page(struct page_pool *pp, netmem_ref netmem) +{ + struct io_zc_rx_ifq *ifq = pp->mp_priv; + struct io_zc_rx_buf *buf; + struct net_iov *niov; + + if (WARN_ON_ONCE(!netmem_is_net_iov(netmem))) + return false; + + niov = netmem_to_net_iov(netmem); + buf = io_niov_to_buf(niov); + + if (io_zc_rx_buf_put(buf, 1)) + io_zc_rx_recycle_buf(ifq->pool, buf); + return false; +} + +static void io_pp_zc_scrub(struct page_pool *pp) +{ + struct io_zc_rx_ifq *ifq = pp->mp_priv; + struct io_zc_rx_pool *pool = ifq->pool; + int i; + + for (i = 0; i < pool->nr_bufs; i++) { + struct io_zc_rx_buf *buf = &pool->bufs[i]; + int count; + + if (!io_zc_rx_put_buf_uref(buf)) + continue; + io_zc_rx_recycle_buf(pool, buf); + + count = atomic_inc_return_relaxed(&pp->pages_state_release_cnt); + trace_page_pool_state_release(pp, io_zc_buf_to_netmem(buf), count); + } +} + +static int io_pp_zc_init(struct page_pool *pp) +{ + struct io_zc_rx_ifq *ifq = pp->mp_priv; + + if (!ifq) + return -EINVAL; + if (pp->p.order != 0) + return -EINVAL; + if (!pp->p.napi) + return -EINVAL; + if (pp->p.flags & PP_FLAG_DMA_MAP) + return -EOPNOTSUPP; + if (pp->p.flags & PP_FLAG_DMA_SYNC_DEV) + return -EOPNOTSUPP; + + percpu_ref_get(&ifq->ctx->refs); + ifq->pp = pp; + return 0; +} + +static void io_pp_zc_destroy(struct page_pool *pp) +{ + struct io_zc_rx_ifq *ifq = pp->mp_priv; + struct io_zc_rx_pool *pool = ifq->pool; + + ifq->pp = NULL; + + if (WARN_ON_ONCE(pool->free_count != pool->nr_bufs)) + return; + percpu_ref_put(&ifq->ctx->refs); +} + +const struct memory_provider_ops io_uring_pp_zc_ops = { + .alloc_pages = io_pp_zc_alloc_pages, + .release_page = io_pp_zc_release_page, + .init = io_pp_zc_init, + .destroy = io_pp_zc_destroy, + .scrub = io_pp_zc_scrub, +}; +EXPORT_SYMBOL(io_uring_pp_zc_ops); + + #endif diff --git a/io_uring/zc_rx.h b/io_uring/zc_rx.h index 466b2b8f9813..c02bf8cabc6c 100644 --- a/io_uring/zc_rx.h +++ b/io_uring/zc_rx.h @@ -10,6 +10,9 @@ #define IO_ZC_IFQ_IDX_OFFSET 16 #define IO_ZC_IFQ_IDX_MASK ((1U << IO_ZC_IFQ_IDX_OFFSET) - 1) +#define IO_ZC_RX_UREF 0x10000 +#define IO_ZC_RX_KREF_MASK (IO_ZC_RX_UREF - 1) + struct io_zc_rx_pool { struct io_zc_rx_ifq *ifq; struct io_zc_rx_buf *bufs; @@ -26,10 +29,12 @@ struct io_zc_rx_ifq { struct io_ring_ctx *ctx; struct net_device *dev; struct io_zc_rx_pool *pool; + 
struct page_pool *pp; struct io_uring *rq_ring; struct io_uring_rbuf_rqe *rqes; u32 rq_entries; + u32 cached_rq_head; /* hw rx descriptor ring id */ u32 if_rxq_id; diff --git a/net/core/page_pool.c b/net/core/page_pool.c index fc92e551ed13..f83ddbb4ebd8 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -460,7 +460,7 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem) return false; } -static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem) +void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem) { netmem_set_pp(netmem, pool); netmem_or_pp_magic(netmem, PP_SIGNATURE);
From patchwork Tue Mar 12 21:44:26 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 12/16] io_uring/zcrx: implement PP_FLAG_DMA_* handling
Date: Tue, 12 Mar 2024 14:44:26 -0700
Message-ID: <20240312214430.2923019-13-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

This patch implements support for PP_FLAG_DMA_MAP and PP_FLAG_DMA_SYNC_DEV: DMA-map the buffers when creating a page pool, if requested, and unmap them on teardown. Most of the syncing is done by the page pool, except when we're grabbing buffers from the refill ring, in which case we need to do it by hand.
Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- io_uring/zc_rx.c | 90 +++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 86 insertions(+), 4 deletions(-) diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c index b2507df121fb..4bd27eda4bc9 100644 --- a/io_uring/zc_rx.c +++ b/io_uring/zc_rx.c @@ -9,6 +9,7 @@ #include #include #include +#include #include @@ -72,6 +73,7 @@ static int io_zc_rx_init_buf(struct page *page, struct io_zc_rx_buf *buf) { memset(&buf->niov, 0, sizeof(buf->niov)); atomic_long_set(&buf->niov.pp_ref_count, 0); + page_pool_set_dma_addr_netmem(net_iov_to_netmem(&buf->niov), 0); buf->page = page; get_page(page); @@ -392,12 +394,25 @@ static inline netmem_ref io_zc_buf_to_netmem(struct io_zc_rx_buf *buf) return net_iov_to_netmem(&buf->niov); } +static inline void io_zc_sync_for_device(struct page_pool *pp, + netmem_ref netmem) +{ + if (pp->p.flags & PP_FLAG_DMA_SYNC_DEV) { + dma_addr_t dma_addr = page_pool_get_dma_addr_netmem(netmem); + + dma_sync_single_range_for_device(pp->p.dev, dma_addr, + pp->p.offset, pp->p.max_len, + pp->p.dma_dir); + } +} + static inline void io_zc_add_pp_cache(struct page_pool *pp, struct io_zc_rx_buf *buf) { netmem_ref netmem = io_zc_buf_to_netmem(buf); page_pool_set_pp_info(pp, netmem); + io_zc_sync_for_device(pp, netmem); pp->alloc.cache[pp->alloc.count++] = netmem; } @@ -517,9 +532,71 @@ static void io_pp_zc_scrub(struct page_pool *pp) } } +#define IO_PP_DMA_ATTRS (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING) + +static void io_pp_unmap_buf(struct io_zc_rx_buf *buf, struct page_pool *pp) +{ + netmem_ref netmem = net_iov_to_netmem(&buf->niov); + dma_addr_t dma = page_pool_get_dma_addr_netmem(netmem); + + dma_unmap_page_attrs(pp->p.dev, dma, PAGE_SIZE << pp->p.order, + pp->p.dma_dir, IO_PP_DMA_ATTRS); + page_pool_set_dma_addr_netmem(netmem, 0); +} + +static int io_pp_map_buf(struct io_zc_rx_buf *buf, struct page_pool *pp) +{ + netmem_ref netmem = net_iov_to_netmem(&buf->niov); + dma_addr_t dma_addr; + int ret; + + dma_addr = dma_map_page_attrs(pp->p.dev, buf->page, 0, + PAGE_SIZE << pp->p.order, pp->p.dma_dir, + IO_PP_DMA_ATTRS); + ret = dma_mapping_error(pp->p.dev, dma_addr); + if (ret) + return ret; + + if (WARN_ON_ONCE(page_pool_set_dma_addr_netmem(netmem, dma_addr))) { + dma_unmap_page_attrs(pp->p.dev, dma_addr, + PAGE_SIZE << pp->p.order, pp->p.dma_dir, + IO_PP_DMA_ATTRS); + return -EFAULT; + } + + io_zc_sync_for_device(pp, netmem); + return 0; +} + +static int io_pp_map_pool(struct io_zc_rx_pool *pool, struct page_pool *pp) +{ + int i, ret = 0; + + for (i = 0; i < pool->nr_bufs; i++) { + ret = io_pp_map_buf(&pool->bufs[i], pp); + if (ret) + break; + } + + if (ret) { + while (i--) + io_pp_unmap_buf(&pool->bufs[i], pp); + } + return ret; +} + +static void io_pp_unmap_pool(struct io_zc_rx_pool *pool, struct page_pool *pp) +{ + int i; + + for (i = 0; i < pool->nr_bufs; i++) + io_pp_unmap_buf(&pool->bufs[i], pp); +} + static int io_pp_zc_init(struct page_pool *pp) { struct io_zc_rx_ifq *ifq = pp->mp_priv; + int ret; if (!ifq) return -EINVAL; @@ -527,10 +604,12 @@ static int io_pp_zc_init(struct page_pool *pp) return -EINVAL; if (!pp->p.napi) return -EINVAL; - if (pp->p.flags & PP_FLAG_DMA_MAP) - return -EOPNOTSUPP; - if (pp->p.flags & PP_FLAG_DMA_SYNC_DEV) - return -EOPNOTSUPP; + + if (pp->p.flags & PP_FLAG_DMA_MAP) { + ret = io_pp_map_pool(ifq->pool, pp); + if (ret) + return ret; + } percpu_ref_get(&ifq->ctx->refs); ifq->pp = pp; @@ -542,6 +621,9 @@ static void io_pp_zc_destroy(struct page_pool *pp) struct io_zc_rx_ifq 
*ifq = pp->mp_priv; struct io_zc_rx_pool *pool = ifq->pool; + if (pp->p.flags & PP_FLAG_DMA_MAP) + io_pp_unmap_pool(ifq->pool, pp); + ifq->pp = NULL; if (WARN_ON_ONCE(pool->free_count != pool->nr_bufs))
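For context, this means a driver converting a queue can now create its page pool with the usual DMA flags. A rough, hypothetical sketch (field values are examples, and the .queue hookup is an assumption based on the provider plumbing earlier in the series):

/* Hypothetical driver-side pool creation for a ZC-converted queue. */
struct page_pool_params pp_params = {
	.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
	.order     = 0,			/* the provider only accepts order-0 */
	.pool_size = ring_size,
	.dev       = &pdev->dev,
	.napi      = &rxr->napi,	/* the provider requires a napi ctx */
	.dma_dir   = DMA_FROM_DEVICE,
	.max_len   = PAGE_SIZE,
	.queue     = rxq,		/* rx queue carrying pp_ops/pp_private */
};
struct page_pool *pp = page_pool_create(&pp_params);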
From patchwork Tue Mar 12 21:44:27 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 13/16] io_uring: add io_recvzc request
Date: Tue, 12 Mar 2024 14:44:27 -0700
Message-ID: <20240312214430.2923019-14-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>

Add an io_uring opcode OP_RECV_ZC for doing ZC reads from a socket that is set up for ZC Rx. The request reads skbs from a socket. Completions are posted into the main CQ for each page frag read. Big CQEs (CQE32) are required, as the OP_RECV_ZC-specific metadata (ZC region, offset, len) is stored in the extended 16 bytes as a struct io_uring_rbuf_cqe. For now there is no limit on how much work each OP_RECV_ZC request does: it will attempt to drain the socket of all available data. Multishot requests are also supported. The first time an io_recvzc request completes, EAGAIN is returned, which arms an async poll. Then, in subsequent runs in task work, IOU_ISSUE_SKIP_COMPLETE is returned to continue async polling.
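A hedged sketch of the userspace completion side follows; the big-CQE layout matches this patch and the ring must be created with IORING_SETUP_CQE32, while region_base, consume() and zc_recycle() (from the patch 11 sketch) are assumptions:

/* Illustrative: drain OP_RECV_ZC multishot completions. */
struct io_uring_cqe *cqe;
while (!io_uring_wait_cqe(&ring, &cqe)) {
	if (cqe->res < 0)
		break;	/* request terminated */
	/* OP_RECV_ZC metadata lives in the extended 16 bytes */
	struct io_uring_rbuf_cqe *rcqe = (struct io_uring_rbuf_cqe *)(cqe + 1);
	char *data = region_base + rcqe->off;	/* region 0 for now */

	consume(data, rcqe->len);	/* app-defined processing */
	zc_recycle(&rq, rcqe->off);	/* hand the buffer back */
	io_uring_cqe_seen(&ring, cqe);
}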
Signed-off-by: David Wei --- include/uapi/linux/io_uring.h | 1 + io_uring/io_uring.h | 10 ++ io_uring/net.c | 94 +++++++++++++++++- io_uring/opdef.c | 16 +++ io_uring/zc_rx.c | 177 +++++++++++++++++++++++++++++++++- io_uring/zc_rx.h | 11 +++ 6 files changed, 302 insertions(+), 7 deletions(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 26e945e6258d..ad2ec60b0390 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -256,6 +256,7 @@ enum io_uring_op { IORING_OP_FUTEX_WAITV, IORING_OP_FIXED_FD_INSTALL, IORING_OP_FTRUNCATE, + IORING_OP_RECV_ZC, /* this goes last, obviously */ IORING_OP_LAST, diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 6426ee382276..cd1b3da96f62 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -180,6 +180,16 @@ static inline bool io_get_cqe(struct io_ring_ctx *ctx, struct io_uring_cqe **ret return io_get_cqe_overflow(ctx, ret, false); } +static inline bool io_defer_get_uncommited_cqe(struct io_ring_ctx *ctx, + struct io_uring_cqe **cqe_ret) +{ + io_lockdep_assert_cq_locked(ctx); + + ctx->cq_extra++; + ctx->submit_state.flush_cqes = true; + return io_get_cqe(ctx, cqe_ret); +} + static __always_inline bool io_fill_cqe_req(struct io_ring_ctx *ctx, struct io_kiocb *req) { diff --git a/io_uring/net.c b/io_uring/net.c index 1fa7c1fa6b5d..56172335387e 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -79,6 +79,12 @@ struct io_sr_msg { */ #define MULTISHOT_MAX_RETRY 32 +struct io_recvzc { + struct file *file; + unsigned msg_flags; + u16 flags; +}; + static inline bool io_check_multishot(struct io_kiocb *req, unsigned int issue_flags) { @@ -695,7 +701,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret, unsigned int cflags; cflags = io_put_kbuf(req, issue_flags); - if (msg->msg_inq && msg->msg_inq != -1) + if (msg && msg->msg_inq && msg->msg_inq != -1) cflags |= IORING_CQE_F_SOCK_NONEMPTY; if (!(req->flags & REQ_F_APOLL_MULTISHOT)) { @@ -723,7 +729,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret, goto enobufs; /* Known not-empty or unknown state, retry */ - if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq == -1) { + if (cflags & IORING_CQE_F_SOCK_NONEMPTY || (msg && msg->msg_inq == -1)) { if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY) return false; /* mshot retries exceeded, force a requeue */ @@ -1034,9 +1040,8 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) return ret; } -static __maybe_unused -struct io_zc_rx_ifq *io_zc_verify_sock(struct io_kiocb *req, - struct socket *sock) +static struct io_zc_rx_ifq *io_zc_verify_sock(struct io_kiocb *req, + struct socket *sock) { unsigned token = READ_ONCE(sock->zc_rx_idx); unsigned ifq_idx = token >> IO_ZC_IFQ_IDX_OFFSET; @@ -1053,6 +1058,85 @@ struct io_zc_rx_ifq *io_zc_verify_sock(struct io_kiocb *req, return ifq; } +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) +{ + struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc); + + /* non-iopoll defer_taskrun only */ + if (!req->ctx->task_complete) + return -EINVAL; + if (unlikely(sqe->file_index || sqe->addr2)) + return -EINVAL; + if (READ_ONCE(sqe->len) || READ_ONCE(sqe->addr3)) + return -EINVAL; + + zc->flags = READ_ONCE(sqe->ioprio); + zc->msg_flags = READ_ONCE(sqe->msg_flags); + + if (zc->msg_flags) + return -EINVAL; + if (zc->flags & ~RECVMSG_FLAGS) + return -EINVAL; + if (zc->flags & IORING_RECV_MULTISHOT) + req->flags |= REQ_F_APOLL_MULTISHOT; +#ifdef CONFIG_COMPAT + if (req->ctx->compat) + 
zc->msg_flags |= MSG_CMSG_COMPAT; +#endif + return 0; +} + +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc); + struct io_zc_rx_ifq *ifq; + struct socket *sock; + int ret; + + /* + * We're posting CQEs deeper in the stack, and to avoid taking CQ locks + * we serialise by having only the master thread modifying the CQ with + * DEFER_TASkRUN checked earlier and forbidding executing it from io-wq. + * That's similar to io_check_multishot() for multishot CQEs. + */ + if (issue_flags & IO_URING_F_IOWQ) + return -EAGAIN; + if (WARN_ON_ONCE(!(issue_flags & IO_URING_F_NONBLOCK))) + return -EAGAIN; + if (!(req->flags & REQ_F_POLLED) && + (zc->flags & IORING_RECVSEND_POLL_FIRST)) + return -EAGAIN; + + sock = sock_from_file(req->file); + if (unlikely(!sock)) + return -ENOTSOCK; + ifq = io_zc_verify_sock(req, sock); + if (!ifq) + return -EINVAL; + + ret = io_zc_rx_recv(req, ifq, sock, zc->msg_flags | MSG_DONTWAIT); + if (unlikely(ret <= 0)) { + if (ret == -EAGAIN) { + if (issue_flags & IO_URING_F_MULTISHOT) + return IOU_ISSUE_SKIP_COMPLETE; + return -EAGAIN; + } + if (ret == -ERESTARTSYS) + ret = -EINTR; + + req_set_fail(req); + io_req_set_res(req, ret, 0); + + if (issue_flags & IO_URING_F_MULTISHOT) + return IOU_STOP_MULTISHOT; + return IOU_OK; + } + + if (issue_flags & IO_URING_F_MULTISHOT) + return IOU_ISSUE_SKIP_COMPLETE; + return -EAGAIN; +} + void io_send_zc_cleanup(struct io_kiocb *req) { struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg); diff --git a/io_uring/opdef.c b/io_uring/opdef.c index 9c080aadc5a6..78ec5197917e 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -36,6 +36,7 @@ #include "waitid.h" #include "futex.h" #include "truncate.h" +#include "zc_rx.h" static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags) { @@ -481,6 +482,18 @@ const struct io_issue_def io_issue_defs[] = { .prep = io_ftruncate_prep, .issue = io_ftruncate, }, + [IORING_OP_RECV_ZC] = { + .needs_file = 1, + .unbound_nonreg_file = 1, + .pollin = 1, + .ioprio = 1, +#if defined(CONFIG_NET) + .prep = io_recvzc_prep, + .issue = io_recvzc, +#else + .prep = io_eopnotsupp_prep, +#endif + }, }; const struct io_cold_def io_cold_defs[] = { @@ -722,6 +735,9 @@ const struct io_cold_def io_cold_defs[] = { [IORING_OP_FTRUNCATE] = { .name = "FTRUNCATE", }, + [IORING_OP_RECV_ZC] = { + .name = "RECV_ZC", + }, }; const char *io_uring_get_opcode(u8 opcode) diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c index 4bd27eda4bc9..bb9251111735 100644 --- a/io_uring/zc_rx.c +++ b/io_uring/zc_rx.c @@ -6,10 +6,12 @@ #include #include #include + +#include #include #include + #include -#include #include @@ -18,6 +20,12 @@ #include "zc_rx.h" #include "rsrc.h" +struct io_zc_rx_args { + struct io_kiocb *req; + struct io_zc_rx_ifq *ifq; + struct socket *sock; +}; + typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); static int __io_queue_mgmt(struct net_device *dev, struct io_zc_rx_ifq *ifq, @@ -371,7 +379,7 @@ static inline unsigned io_buf_pgid(struct io_zc_rx_pool *pool, return buf - pool->bufs; } -static __maybe_unused void io_zc_rx_get_buf_uref(struct io_zc_rx_buf *buf) +static void io_zc_rx_get_buf_uref(struct io_zc_rx_buf *buf) { atomic_long_add(IO_ZC_RX_UREF, &buf->niov.pp_ref_count); } @@ -640,5 +648,170 @@ const struct memory_provider_ops io_uring_pp_zc_ops = { }; EXPORT_SYMBOL(io_uring_pp_zc_ops); +static bool zc_rx_queue_cqe(struct io_kiocb *req, struct io_zc_rx_buf *buf, + struct io_zc_rx_ifq *ifq, int off, int 
len) +{ + struct io_uring_rbuf_cqe *rcqe; + struct io_uring_cqe *cqe; + + if (!io_defer_get_uncommited_cqe(req->ctx, &cqe)) + return false; + + cqe->user_data = req->cqe.user_data; + cqe->res = 0; + cqe->flags = IORING_CQE_F_MORE; + + rcqe = (struct io_uring_rbuf_cqe *)(cqe + 1); + rcqe->region = 0; + rcqe->off = io_buf_pgid(ifq->pool, buf) * PAGE_SIZE + off; + rcqe->len = len; + memset(rcqe->__pad, 0, sizeof(rcqe->__pad)); + return true; +} + +static int zc_rx_recv_frag(struct io_kiocb *req, struct io_zc_rx_ifq *ifq, + const skb_frag_t *frag, int off, int len) +{ + off += skb_frag_off(frag); + + if (likely(skb_frag_is_net_iov(frag))) { + struct io_zc_rx_buf *buf; + struct net_iov *niov; + + niov = netmem_to_net_iov(frag->netmem); + if (niov->pp->mp_ops != &io_uring_pp_zc_ops || + niov->pp->mp_priv != ifq) + return -EFAULT; + + buf = io_niov_to_buf(niov); + if (!zc_rx_queue_cqe(req, buf, ifq, off, len)) + return -ENOSPC; + io_zc_rx_get_buf_uref(buf); + } else { + return -EOPNOTSUPP; + } + + return len; +} + +static int +zc_rx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb, + unsigned int offset, size_t len) +{ + struct io_zc_rx_args *args = desc->arg.data; + struct io_zc_rx_ifq *ifq = args->ifq; + struct io_kiocb *req = args->req; + struct sk_buff *frag_iter; + unsigned start, start_off; + int i, copy, end, off; + int ret = 0; + + start = skb_headlen(skb); + start_off = offset; + + if (offset < start) + return -EOPNOTSUPP; + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + const skb_frag_t *frag; + + if (WARN_ON(start > offset + len)) + return -EFAULT; + + frag = &skb_shinfo(skb)->frags[i]; + end = start + skb_frag_size(frag); + + if (offset < end) { + copy = end - offset; + if (copy > len) + copy = len; + + off = offset - start; + ret = zc_rx_recv_frag(req, ifq, frag, off, copy); + if (ret < 0) + goto out; + + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + start = end; + } + + skb_walk_frags(skb, frag_iter) { + if (WARN_ON(start > offset + len)) + return -EFAULT; + + end = start + frag_iter->len; + if (offset < end) { + copy = end - offset; + if (copy > len) + copy = len; + + off = offset - start; + ret = zc_rx_recv_skb(desc, frag_iter, off, copy); + if (ret < 0) + goto out; + + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + start = end; + } + +out: + if (offset == start_off) + return ret; + return offset - start_off; +} + +static int io_zc_rx_tcp_recvmsg(struct io_kiocb *req, struct io_zc_rx_ifq *ifq, + struct sock *sk, int flags) +{ + struct io_zc_rx_args args = { + .req = req, + .ifq = ifq, + .sock = sk->sk_socket, + }; + read_descriptor_t rd_desc = { + .count = 1, + .arg.data = &args, + }; + int ret; + + lock_sock(sk); + ret = tcp_read_sock(sk, &rd_desc, zc_rx_recv_skb); + if (ret <= 0) { + if (ret < 0 || sock_flag(sk, SOCK_DONE)) + goto out; + if (sk->sk_err) + ret = sock_error(sk); + else if (sk->sk_shutdown & RCV_SHUTDOWN) + goto out; + else if (sk->sk_state == TCP_CLOSE) + ret = -ENOTCONN; + else + ret = -EAGAIN; + } +out: + release_sock(sk); + return ret; +} + +int io_zc_rx_recv(struct io_kiocb *req, struct io_zc_rx_ifq *ifq, + struct socket *sock, unsigned int flags) +{ + struct sock *sk = sock->sk; + const struct proto *prot = READ_ONCE(sk->sk_prot); + + if (prot->recvmsg != tcp_recvmsg) + return -EPROTONOSUPPORT; + + sock_rps_record_flow(sk); + return io_zc_rx_tcp_recvmsg(req, ifq, sk, flags); +} #endif diff --git a/io_uring/zc_rx.h b/io_uring/zc_rx.h index c02bf8cabc6c..c14ea3cf544a 100644 --- 
a/io_uring/zc_rx.h +++ b/io_uring/zc_rx.h @@ -50,6 +50,8 @@ void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx); void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx); int io_register_zc_rx_sock(struct io_ring_ctx *ctx, struct io_uring_zc_rx_sock_reg __user *arg); +int io_zc_rx_recv(struct io_kiocb *req, struct io_zc_rx_ifq *ifq, + struct socket *sock, unsigned int flags); #else static inline int io_register_zc_rx_ifq(struct io_ring_ctx *ctx, struct io_uring_zc_rx_ifq_reg __user *arg) @@ -67,6 +69,15 @@ static inline int io_register_zc_rx_sock(struct io_ring_ctx *ctx, { return -EOPNOTSUPP; } + +static inline int io_zc_rx_recv(struct io_kiocb *req, struct io_zc_rx_ifq *ifq, + struct socket *sock, unsigned int flags) +{ + return -EOPNOTSUPP; +} #endif +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags); +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); + #endif
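For context, a rough sketch of how a userspace consumer might drive IORING_OP_RECV_ZC, based on behaviour visible in this patch: multishot completions carry IORING_CQE_F_MORE, and an io_uring_rbuf_cqe descriptor is written just past the base CQE, which suggests a ring set up with IORING_SETUP_CQE32. The struct field order, the mmap'd `area` pointer, and the `process()` callback are assumptions for illustration, not part of the series:

#include <liburing.h>

/* Field layout assumed from how zc_rx_queue_cqe() fills it in. */
struct io_uring_rbuf_cqe {
	__u32 off;	/* buffer page id * PAGE_SIZE + intra-page offset */
	__u32 len;
	__u16 region;
	__u8  __pad[6];
};

extern void process(const void *data, unsigned int len);	/* app-defined */

/* Drain RECV_ZC completions; `area` is the mmap'd zc rx buffer region. */
static void drain_recvzc(struct io_uring *ring, unsigned char *area)
{
	struct io_uring_cqe *cqe;

	while (io_uring_wait_cqe(ring, &cqe) == 0) {
		/* the payload descriptor lives in the big-CQE extension
		 * space, right after the 16-byte base CQE */
		struct io_uring_rbuf_cqe *rcqe = (void *)(cqe + 1);

		process(area + rcqe->off, rcqe->len);
		/* buffers are returned to the kernel via the refill ring,
		 * which is out of scope for this sketch */
		io_uring_cqe_seen(ring, cqe);
		if (!(cqe->flags & IORING_CQE_F_MORE))
			break;	/* multishot recv terminated */
	}
}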
From patchwork Tue Mar 12 21:44:28 2024
From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry Subject: [RFC PATCH v4 14/16] net: execute custom callback from napi Date: Tue, 12 Mar 2024 14:44:28 -0700 Message-ID: <20240312214430.2923019-15-dw@davidwei.uk> In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk> References: <20240312214430.2923019-1-dw@davidwei.uk>
From: Pavel Begunkov
Sometimes we want to access a napi-protected resource from task context, as in the case of io_uring zc rx falling back to copy and accessing the buffer ring. Add a helper that allows executing a custom function from napi context by first stopping it, similarly to napi_busy_loop(). Experimental; needs much polishing and sharing bits with napi_busy_loop().
Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- include/net/busy_poll.h | 7 +++++++ net/core/dev.c | 46 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 53 insertions(+) diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h index 9b09acac538e..9f4a40898118 100644 --- a/include/net/busy_poll.h +++ b/include/net/busy_poll.h @@ -47,6 +47,8 @@ bool sk_busy_loop_end(void *p, unsigned long start_time); void napi_busy_loop(unsigned int napi_id, bool (*loop_end)(void *, unsigned long), void *loop_end_arg, bool prefer_busy_poll, u16 budget); +void napi_execute(struct napi_struct *napi, + void (*cb)(void *), void *cb_arg); void napi_busy_loop_rcu(unsigned int napi_id, bool (*loop_end)(void *, unsigned long), @@ -63,6 +65,11 @@ static inline bool sk_can_busy_loop(struct sock *sk) return false; } +static inline void napi_execute(struct napi_struct *napi, + void (*cb)(void *), void *cb_arg) +{ +} + #endif /* CONFIG_NET_RX_BUSY_POLL */ static inline unsigned long busy_loop_current_time(void) diff --git a/net/core/dev.c b/net/core/dev.c index 2096ff57685a..4de173667233 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6663,6 +6663,52 @@ void napi_busy_loop(unsigned int napi_id, } EXPORT_SYMBOL(napi_busy_loop); +void napi_execute(struct napi_struct *napi, + void (*cb)(void *), void *cb_arg) +{ + bool done = false; + unsigned long val; + void *have_poll_lock = NULL; + + rcu_read_lock(); + + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + preempt_disable(); + for (;;) { + local_bh_disable(); + val = READ_ONCE(napi->state); + + /* If multiple threads are competing for this napi, + * we avoid dirtying napi->state as much as we can. + */ + if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED | + NAPIF_STATE_IN_BUSY_POLL)) + goto restart; + + if (cmpxchg(&napi->state, val, + val | NAPIF_STATE_IN_BUSY_POLL | + NAPIF_STATE_SCHED) != val) + goto restart; + + have_poll_lock = netpoll_poll_lock(napi); + cb(cb_arg); + done = true; + gro_normal_list(napi); + local_bh_enable(); + break; +restart: + local_bh_enable(); + if (unlikely(need_resched())) + break; + cpu_relax(); + } + if (done) + busy_poll_stop(napi, have_poll_lock, false, 1); + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + preempt_enable(); + rcu_read_unlock(); +} + #endif /* CONFIG_NET_RX_BUSY_POLL */ static void napi_hash_add(struct napi_struct *napi)
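To make the intended use concrete, here is a minimal sketch of the napi_execute() pattern: an argument struct plus a callback, run with the napi instance stopped and bottom halves disabled, much like a busy-poll section. The struct and function names are illustrative only; the next patch in the series uses exactly this shape to refill zc rx buffers from task context:

#include <net/busy_poll.h>
#include <net/page_pool/helpers.h>	/* netmem_ref and the alloc helper
					 * come from the page pool work this
					 * series builds on */

struct refill_ctx {
	struct page_pool *pp;
	netmem_ref netmem;
};

static void refill_cb(void *data)
{
	struct refill_ctx *ctx = data;

	/* Runs with napi stopped and BHs off, so the page pool's
	 * napi-protected caches are safe to touch here. */
	ctx->netmem = page_pool_alloc_netmem(ctx->pp,
					     GFP_ATOMIC | __GFP_NOWARN);
}

static netmem_ref alloc_from_task(struct napi_struct *napi,
				  struct page_pool *pp)
{
	struct refill_ctx ctx = { .pp = pp };

	napi_execute(napi, refill_cb, &ctx);
	return ctx.netmem;
}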
From patchwork Tue Mar 12 21:44:29 2024
From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry Subject: [RFC PATCH v4 15/16] io_uring/zcrx: add copy fallback Date: Tue, 12 Mar 2024 14:44:29 -0700 Message-ID: <20240312214430.2923019-16-dw@davidwei.uk> In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk> References: <20240312214430.2923019-1-dw@davidwei.uk>
Currently, if the user fails to keep up with the network and doesn't refill the buffer ring fast enough, the NIC/driver will start dropping packets. That might be too punishing. Add a fallback path that allows drivers to allocate normal pages when there is starvation; zc_rx_recv_skb() will then detect them and copy the data into user-supplied buffers once those become available. That should help with adoption, and should also help the user strike the right balance: allocating just enough zerocopy buffers while staying resilient to sudden surges in traffic. Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- io_uring/zc_rx.c | 111 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 105 insertions(+), 6 deletions(-) diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c index bb9251111735..d5f49590e682 100644 --- a/io_uring/zc_rx.c +++ b/io_uring/zc_rx.c @@ -8,6 +8,7 @@ #include #include +#include #include #include @@ -26,6 +27,11 @@ struct io_zc_rx_args { struct socket *sock; }; +struct io_zc_refill_data { + struct io_zc_rx_ifq *ifq; + struct io_zc_rx_buf *buf; +}; + typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); static int __io_queue_mgmt(struct net_device *dev, struct io_zc_rx_ifq *ifq, @@ -648,6 +654,34 @@ const struct memory_provider_ops io_uring_pp_zc_ops = { }; EXPORT_SYMBOL(io_uring_pp_zc_ops); +static void io_napi_refill(void *data) +{ + struct io_zc_refill_data *rd = data; + struct io_zc_rx_ifq *ifq = rd->ifq; + netmem_ref netmem; + + if (WARN_ON_ONCE(!ifq->pp)) + return; + + netmem = page_pool_alloc_netmem(ifq->pp, GFP_ATOMIC | __GFP_NOWARN); + if (!netmem) + return; + if (WARN_ON_ONCE(!netmem_is_net_iov(netmem))) + return; + + rd->buf = io_niov_to_buf(netmem_to_net_iov(netmem)); +} + +static struct io_zc_rx_buf *io_zc_get_buf_task_safe(struct io_zc_rx_ifq *ifq) +{ + struct io_zc_refill_data rd = { + .ifq = ifq, + }; + + napi_execute(ifq->pp->p.napi, io_napi_refill, &rd); + return rd.buf; +} + static bool zc_rx_queue_cqe(struct io_kiocb *req, struct io_zc_rx_buf *buf, struct io_zc_rx_ifq *ifq, int off, int len) { @@ -669,6 +703,42 @@ static bool zc_rx_queue_cqe(struct io_kiocb *req, struct io_zc_rx_buf *buf, return true; } +static ssize_t zc_rx_copy_chunk(struct io_kiocb *req, struct io_zc_rx_ifq *ifq, + void *data, unsigned int offset, size_t len) +{ + size_t copy_size, copied = 0; + struct io_zc_rx_buf *buf; + int ret = 0, off = 0; + u8 *vaddr; + + do { + buf = io_zc_get_buf_task_safe(ifq); + if (!buf) { + ret = -ENOMEM; + break; + } + + vaddr = kmap_local_page(buf->page); + copy_size = min_t(size_t, PAGE_SIZE, len); + memcpy(vaddr, data + offset, copy_size); + kunmap_local(vaddr); + + if (!zc_rx_queue_cqe(req, buf, ifq, off, copy_size)) { + napi_pp_put_page(net_iov_to_netmem(&buf->niov), false); + return -ENOSPC; + } + + io_zc_rx_get_buf_uref(buf); + napi_pp_put_page(net_iov_to_netmem(&buf->niov), false); + + offset += copy_size; + len -= copy_size; + copied += copy_size; + } while (offset < len); + + return copied ?
copied : ret; +} + static int zc_rx_recv_frag(struct io_kiocb *req, struct io_zc_rx_ifq *ifq, const skb_frag_t *frag, int off, int len) { @@ -688,7 +758,22 @@ static int zc_rx_recv_frag(struct io_kiocb *req, struct io_zc_rx_ifq *ifq, return -ENOSPC; io_zc_rx_get_buf_uref(buf); } else { - return -EOPNOTSUPP; + struct page *page = skb_frag_page(frag); + u32 p_off, p_len, t, copied = 0; + u8 *vaddr; + int ret = 0; + + skb_frag_foreach_page(frag, off, len, + page, p_off, p_len, t) { + vaddr = kmap_local_page(page); + ret = zc_rx_copy_chunk(req, ifq, vaddr, p_off, p_len); + kunmap_local(vaddr); + + if (ret < 0) + return copied ? copied : ret; + copied += ret; + } + len = copied; } return len; @@ -702,15 +787,29 @@ zc_rx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb, struct io_zc_rx_ifq *ifq = args->ifq; struct io_kiocb *req = args->req; struct sk_buff *frag_iter; - unsigned start, start_off; + unsigned start, start_off = offset; int i, copy, end, off; int ret = 0; - start = skb_headlen(skb); - start_off = offset; + if (unlikely(offset < skb_headlen(skb))) { + ssize_t copied; + size_t to_copy; - if (offset < start) - return -EOPNOTSUPP; + to_copy = min_t(size_t, skb_headlen(skb) - offset, len); + copied = zc_rx_copy_chunk(req, ifq, skb->data, offset, to_copy); + if (copied < 0) { + ret = copied; + goto out; + } + offset += copied; + len -= copied; + if (!len) + goto out; + if (offset != skb_headlen(skb)) + goto out; + } + + start = skb_headlen(skb); for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { const skb_frag_t *frag;
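One property worth noting: the copy fallback is invisible to userspace, since zc_rx_copy_chunk() posts the same rbuf CQE as the zerocopy path. Either way, the consumer locates the data via rcqe->off, which zc_rx_queue_cqe() encodes as buffer page id times PAGE_SIZE plus the intra-page offset. A hedged decoding sketch; the 4K page size and the refill-ring bookkeeping are assumptions:

#include <stdint.h>

#define ZC_PAGE_SHIFT	12			/* assumes 4K pages */
#define ZC_PAGE_SIZE	(1u << ZC_PAGE_SHIFT)

/* Which pool buffer a completion refers to; this is what userspace
 * would need when handing the buffer back through the refill ring. */
static inline uint32_t rbuf_pgid(uint32_t off)
{
	return off >> ZC_PAGE_SHIFT;
}

/* Byte offset of the payload within that buffer page. */
static inline uint32_t rbuf_page_off(uint32_t off)
{
	return off & (ZC_PAGE_SIZE - 1);
}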
From patchwork Tue Mar 12 21:44:30 2024
From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry Subject: [RFC PATCH v4 16/16] veth: add support for io_uring zc rx Date: Tue, 12 Mar 2024 14:44:30 -0700 Message-ID: <20240312214430.2923019-17-dw@davidwei.uk> In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk> References: <20240312214430.2923019-1-dw@davidwei.uk>
From: Pavel Begunkov
Not for upstream, testing only. Add io_uring zerocopy rx support to veth. It's not truly zerocopy: the data is copied in napi, but that happens early in the stack, which makes it useful for testing for now.
Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- drivers/net/veth.c | 214 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 208 insertions(+), 6 deletions(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 500b9dfccd08..b56e06113453 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -26,6 +26,8 @@ #include #include #include +#include +#include #include #define DRV_NAME "veth" @@ -67,6 +69,7 @@ struct veth_rq { struct ptr_ring xdp_ring; struct xdp_rxq_info xdp_rxq; struct page_pool *page_pool; + struct netdev_rx_queue rq; }; struct veth_priv { @@ -75,6 +78,7 @@ struct veth_priv { struct bpf_prog *_xdp_prog; struct veth_rq *rq; unsigned int requested_headroom; + bool zc_installed; }; struct veth_xdp_tx_bq { @@ -335,9 +339,12 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev, const struct net_device *rcv, const struct sk_buff *skb) { + struct veth_priv *rcv_priv = netdev_priv(rcv); + return !(dev->features & NETIF_F_ALL_TSO) || (skb->destructor == sock_wfree && - rcv->features & (NETIF_F_GRO_FRAGLIST | NETIF_F_GRO_UDP_FWD)); + rcv->features & (NETIF_F_GRO_FRAGLIST | NETIF_F_GRO_UDP_FWD)) || + rcv_priv->zc_installed; } static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) @@ -726,6 +733,9 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, struct sk_buff *skb = *pskb; u32 frame_sz; + if (WARN_ON_ONCE(1)) + return -EFAULT; + if (skb_shared(skb) || skb_head_is_locked(skb) || skb_shinfo(skb)->nr_frags || skb_headroom(skb) < XDP_PACKET_HEADROOM) { @@ -758,6 +768,90 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq, return -ENOMEM; } +static noinline struct sk_buff *veth_iou_rcv_skb(struct veth_rq *rq, + struct sk_buff *skb) +{ + struct sk_buff *nskb; + u32 size, len, off, max_head_size; + struct page *page; + int ret, i, head_off; + void *vaddr; + + /* Testing only, randomly send normal pages to test copy fallback */ + if (ktime_get_ns() % 16 == 0) + return skb; + + skb_prepare_for_gro(skb); + max_head_size = skb_headlen(skb); + + rcu_read_lock(); + nskb = napi_alloc_skb(&rq->xdp_napi, max_head_size); + if (!nskb) + goto drop; + + skb_copy_header(nskb, skb); + skb_mark_for_recycle(nskb); + + size = max_head_size; + if (skb_copy_bits(skb, 0, nskb->data, size)) { + consume_skb(nskb); + goto drop; + } + skb_put(nskb, size); + head_off = skb_headroom(nskb) - skb_headroom(skb); + skb_headers_offset_update(nskb, head_off); + + /* Allocate paged area of new skb */ + off = size; + len = skb->len - off; + + for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) { + struct io_zc_rx_buf *buf; + netmem_ref netmem; + + netmem = page_pool_alloc_netmem(rq->page_pool, GFP_ATOMIC | __GFP_NOWARN); + if (!netmem) { + consume_skb(nskb); + goto drop; + } + if (WARN_ON_ONCE(!netmem_is_net_iov(netmem))) { + consume_skb(nskb); + goto drop; + } + + buf = container_of(netmem_to_net_iov(netmem), + struct io_zc_rx_buf, niov); + page = buf->page; + + if (WARN_ON_ONCE(buf->niov.pp != rq->page_pool)) + goto drop; + + size = min_t(u32, len, PAGE_SIZE); + skb_add_rx_frag_netmem(nskb, i, netmem, 0, size, PAGE_SIZE); + + vaddr = kmap_atomic(page); + ret = skb_copy_bits(skb, off, vaddr, size); + kunmap_atomic(vaddr); + + if (ret) { + consume_skb(nskb); + goto drop; + } + len -= size; + off += size; + } + rcu_read_unlock(); + + consume_skb(skb); + skb = nskb; + return skb; +drop: + rcu_read_unlock(); + kfree_skb(skb); + return NULL; +} + + static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb, struct 
veth_xdp_tx_bq *bq, @@ -901,8 +995,13 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget, /* ndo_start_xmit */ struct sk_buff *skb = ptr; - stats->xdp_bytes += skb->len; - skb = veth_xdp_rcv_skb(rq, skb, bq, stats); + if (rq->page_pool->mp_ops == &io_uring_pp_zc_ops) { + skb = veth_iou_rcv_skb(rq, skb); + } else { + stats->xdp_bytes += skb->len; + skb = veth_xdp_rcv_skb(rq, skb, bq, stats); + } + if (skb) { if (skb_shared(skb) || skb_unclone(skb, GFP_ATOMIC)) netif_receive_skb(skb); @@ -961,15 +1060,22 @@ static int veth_poll(struct napi_struct *napi, int budget) return done; } -static int veth_create_page_pool(struct veth_rq *rq) +static int veth_create_page_pool(struct veth_rq *rq, struct io_zc_rx_ifq *ifq) { struct page_pool_params pp_params = { .order = 0, .pool_size = VETH_RING_SIZE, .nid = NUMA_NO_NODE, .dev = &rq->dev->dev, + .napi = &rq->xdp_napi, }; + if (ifq) { + rq->rq.pp_private = ifq; + rq->rq.pp_ops = &io_uring_pp_zc_ops; + pp_params.queue = &rq->rq; + } + rq->page_pool = page_pool_create(&pp_params); if (IS_ERR(rq->page_pool)) { int err = PTR_ERR(rq->page_pool); @@ -987,7 +1093,7 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end) int err, i; for (i = start; i < end; i++) { - err = veth_create_page_pool(&priv->rq[i]); + err = veth_create_page_pool(&priv->rq[i], NULL); if (err) goto err_page_pool; } @@ -1043,9 +1149,17 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end) for (i = start; i < end; i++) { struct veth_rq *rq = &priv->rq[i]; + void *ptr; + int nr = 0; rq->rx_notify_masked = false; - ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free); + + while ((ptr = ptr_ring_consume(&rq->xdp_ring))) { + veth_ptr_free(ptr); + nr++; + } + + ptr_ring_cleanup(&rq->xdp_ring, NULL); } for (i = start; i < end; i++) { @@ -1281,6 +1395,9 @@ static int veth_set_channels(struct net_device *dev, struct net_device *peer; int err; + if (priv->zc_installed) + return -EINVAL; + /* sanity check. 
Upper bounds are already enforced by the caller */ if (!ch->rx_count || !ch->tx_count) return -EINVAL; @@ -1358,6 +1475,8 @@ static int veth_open(struct net_device *dev) struct net_device *peer = rtnl_dereference(priv->peer); int err; + priv->zc_installed = false; + if (!peer) return -ENOTCONN; @@ -1536,6 +1655,84 @@ static void veth_set_rx_headroom(struct net_device *dev, int new_hr) rcu_read_unlock(); } +static int __veth_iou_set(struct net_device *dev, + struct netdev_bpf *xdp) +{ + bool napi_already_on = veth_gro_requested(dev) && (dev->flags & IFF_UP); + unsigned qid = xdp->zc_rx.queue_id; + struct veth_priv *priv = netdev_priv(dev); + struct net_device *peer; + struct veth_rq *rq; + int ret; + + if (priv->_xdp_prog) + return -EINVAL; + if (qid >= dev->real_num_rx_queues) + return -EINVAL; + if (!(dev->flags & IFF_UP)) + return -EOPNOTSUPP; + if (dev->real_num_rx_queues != 1) + return -EINVAL; + rq = &priv->rq[qid]; + + if (!xdp->zc_rx.ifq) { + if (!priv->zc_installed) + return -EINVAL; + + veth_napi_del(dev); + priv->zc_installed = false; + if (!veth_gro_requested(dev) && netif_running(dev)) { + dev->features &= ~NETIF_F_GRO; + netdev_features_change(dev); + } + return 0; + } + + if (priv->zc_installed) + return -EINVAL; + + peer = rtnl_dereference(priv->peer); + peer->hw_features &= ~NETIF_F_GSO_SOFTWARE; + + ret = veth_create_page_pool(rq, xdp->zc_rx.ifq); + if (ret) + return ret; + + ret = ptr_ring_init(&rq->xdp_ring, VETH_RING_SIZE, GFP_KERNEL); + if (ret) { + page_pool_destroy(rq->page_pool); + rq->page_pool = NULL; + return ret; + } + + priv->zc_installed = true; + + if (!veth_gro_requested(dev)) { + /* user-space did not require GRO, but adding XDP + * is supposed to get GRO working + */ + dev->features |= NETIF_F_GRO; + netdev_features_change(dev); + } + if (!napi_already_on) { + netif_napi_add(dev, &rq->xdp_napi, veth_poll); + napi_enable(&rq->xdp_napi); + rcu_assign_pointer(rq->napi, &rq->xdp_napi); + } + return 0; +} + +static int veth_iou_set(struct net_device *dev, + struct netdev_bpf *xdp) +{ + int ret; + + rtnl_lock(); + ret = __veth_iou_set(dev, xdp); + rtnl_unlock(); + return ret; +} + static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, struct netlink_ext_ack *extack) { @@ -1545,6 +1742,9 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog, unsigned int max_mtu; int err; + if (priv->zc_installed) + return -EINVAL; + old_prog = priv->_xdp_prog; priv->_xdp_prog = prog; peer = rtnl_dereference(priv->peer); @@ -1623,6 +1823,8 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp) switch (xdp->command) { case XDP_SETUP_PROG: return veth_xdp_set(dev, xdp->prog, xdp->extack); + case XDP_SETUP_ZC_RX: + return veth_iou_set(dev, xdp); default: return -EINVAL; }
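For reference, the driver-facing side of the above: core code hands the zc rx ifq to veth through the existing ndo_bpf hook via the new XDP_SETUP_ZC_RX command. A rough sketch of what the io_uring registration path (not shown in this patch) would do; error handling and locking are elided, and the function name is made up:

#include <linux/netdevice.h>

static int install_zc_rx(struct net_device *dev, struct io_zc_rx_ifq *ifq,
			 u32 queue_id)
{
	struct netdev_bpf cmd = {
		.command = XDP_SETUP_ZC_RX,
		.zc_rx = {
			.ifq = ifq,		/* NULL uninstalls, per
						 * __veth_iou_set() above */
			.queue_id = queue_id,
		},
	};

	if (!dev->netdev_ops->ndo_bpf)
		return -EOPNOTSUPP;
	return dev->netdev_ops->ndo_bpf(dev, &cmd);
}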