From patchwork Tue Mar 12 21:44:15 2024
From: David Wei <dw@davidwei.uk>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 01/16] net: generalise pp provider params passing
Date: Tue, 12 Mar 2024 14:44:15 -0700
Message-ID: <20240312214430.2923019-2-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

RFC only, not for upstream

Add a way to pass custom page pool parameters, but the final version
should converge with devmem.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/netdev_rx_queue.h | 3 +++
 net/core/dev.c                | 2 +-
 net/core/page_pool.c          | 3 +++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index 5dc35628633a..41f8c4e049bb 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -26,6 +26,9 @@ struct netdev_rx_queue {
 	 */
 	struct napi_struct *napi;
 	struct netdev_dmabuf_binding *binding;
+
+	const struct memory_provider_ops *pp_ops;
+	void *pp_private;
 } ____cacheline_aligned_in_smp;
 
 /*
diff --git a/net/core/dev.c b/net/core/dev.c
index 255a38cf59b1..2096ff57685a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2189,7 +2189,7 @@ int netdev_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 
 	rxq = __netif_get_rx_queue(dev, rxq_idx);
 
-	if (rxq->binding)
+	if (rxq->binding || rxq->pp_ops)
 		return -EEXIST;
 
 	err = xa_alloc(&binding->bound_rxq_list, &xa_idx, rxq, xa_limit_32b,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 53039d2f8514..5d5b78878473 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -262,6 +262,9 @@ static int page_pool_init(struct page_pool *pool,
 	if (binding) {
 		pool->mp_ops = &dmabuf_devmem_ops;
 		pool->mp_priv = binding;
+	} else if (pool->p.queue && pool->p.queue->pp_ops) {
+		pool->mp_ops = pool->p.queue->pp_ops;
+		pool->mp_priv = pool->p.queue->pp_private;
 	}
 
 	if (pool->mp_ops) {
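[To illustrate the intent of the two new queue fields, here is a hypothetical
sketch of how a memory provider could claim an Rx queue through them; the
helper io_zc_attach_queue is made up and not part of the patch:

static int io_zc_attach_queue(struct net_device *dev, u32 rxq_idx,
			      const struct memory_provider_ops *ops,
			      void *priv)
{
	struct netdev_rx_queue *rxq = __netif_get_rx_queue(dev, rxq_idx);

	/* refuse to stack providers on an already-claimed queue */
	if (rxq->binding || rxq->pp_ops)
		return -EEXIST;

	rxq->pp_ops = ops;
	rxq->pp_private = priv;
	/* once the driver restarts the queue, page_pool_init() sees
	 * pool->p.queue->pp_ops and installs the provider */
	return 0;
}
]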
From patchwork Tue Mar 12 21:44:16 2024
From: David Wei <dw@davidwei.uk>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 02/16] io_uring: delayed cqe commit
Date: Tue, 12 Mar 2024 14:44:16 -0700
Message-ID: <20240312214430.2923019-3-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

RFC only, not for upstream

A stub patch that allows delaying and batching the final step of CQE
posting for aux CQEs. A different version will be sent separately to
upstream.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/linux/io_uring_types.h | 1 +
 io_uring/io_uring.c            | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index d8111d64812b..500772189fee 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -205,6 +205,7 @@ struct io_submit_state {
 
 	bool plug_started;
 	bool need_plug;
+	bool flush_cqes;
 	unsigned short submit_nr;
 	unsigned int cqes_count;
 	struct blk_plug plug;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cf2f514b7cc0..e44c2ef271b9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -176,7 +176,7 @@ static struct ctl_table kernel_io_uring_disabled_table[] = {
 static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
 {
 	if (!wq_list_empty(&ctx->submit_state.compl_reqs) ||
-	    ctx->submit_state.cqes_count)
+	    ctx->submit_state.cqes_count || ctx->submit_state.flush_cqes)
 		__io_submit_flush_completions(ctx);
 }
 
@@ -1598,6 +1598,7 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
 		io_free_batch_list(ctx, state->compl_reqs.first);
 		INIT_WQ_LIST(&state->compl_reqs);
 	}
+	ctx->submit_state.flush_cqes = false;
 }
 
 static unsigned io_cqring_events(struct io_ring_ctx *ctx)
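[A minimal sketch of the usage pattern this stub enables, assuming a feature
that fills 32-byte aux CQEs in place without committing the CQ tail; the
function below is hypothetical, not from the patch:

static void io_zc_post_aux_cqe(struct io_ring_ctx *ctx)
{
	/* ... fill a CQE slot directly in the CQ ring ... */

	/* ask io_submit_flush_completions() to commit the CQ tail even
	 * when no batched completed requests are pending */
	ctx->submit_state.flush_cqes = true;
}
]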
From patchwork Tue Mar 12 21:44:17 2024
From: David Wei <dw@davidwei.uk>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 03/16] net: page_pool: add ->scrub mem provider callback
Date: Tue, 12 Mar 2024 14:44:17 -0700
Message-ID: <20240312214430.2923019-4-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

The page pool now waits for all ppiovs to return before destroying
itself, and for that to happen the memory provider might need to push
out some buffers, flush caches and so on.

todo: we'll try to get by without it before the final release

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/page_pool/types.h | 1 +
 net/core/page_pool.c          | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 096cd2455b2c..347837b83d36 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -134,6 +134,7 @@ struct memory_provider_ops {
 	void (*destroy)(struct page_pool *pool);
 	netmem_ref (*alloc_pages)(struct page_pool *pool, gfp_t gfp);
 	bool (*release_page)(struct page_pool *pool, netmem_ref netmem);
+	void (*scrub)(struct page_pool *pool);
 };
 
 extern const struct memory_provider_ops dmabuf_devmem_ops;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5d5b78878473..fc92e551ed13 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -984,6 +984,9 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool)
 
 static void page_pool_scrub(struct page_pool *pool)
 {
+	if (pool->mp_ops && pool->mp_ops->scrub)
+		pool->mp_ops->scrub(pool);
+
 	page_pool_empty_alloc_cache_once(pool);
 	pool->destroy_cnt++;
From patchwork Tue Mar 12 21:44:18 2024
From: David Wei <dw@davidwei.uk>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 04/16] io_uring: separate header for exported net bits
Date: Tue, 12 Mar 2024 14:44:18 -0700
Message-ID: <20240312214430.2923019-5-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>

From: Pavel Begunkov

We're exporting some io_uring bits to networking, e.g. for implementing
a net callback for io_uring cmds, but we don't want to expose more than
needed. Add a separate header for networking.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/linux/io_uring.h     |  6 ------
 include/linux/io_uring/net.h | 18 ++++++++++++++++++
 io_uring/uring_cmd.c         |  1 +
 net/socket.c                 |  2 +-
 4 files changed, 20 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/io_uring/net.h

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 68ed6697fece..e123d5e17b52 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -11,7 +11,6 @@ void __io_uring_cancel(bool cancel_all);
 void __io_uring_free(struct task_struct *tsk);
 void io_uring_unreg_ringfd(void);
 const char *io_uring_get_opcode(u8 opcode);
-int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
 bool io_is_uring_fops(struct file *file);
 
 static inline void io_uring_files_cancel(void)
@@ -45,11 +44,6 @@ static inline const char *io_uring_get_opcode(u8 opcode)
 {
 	return "";
 }
-static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd,
-				    unsigned int issue_flags)
-{
-	return -EOPNOTSUPP;
-}
 static inline bool io_is_uring_fops(struct file *file)
 {
 	return false;
diff --git a/include/linux/io_uring/net.h b/include/linux/io_uring/net.h
new file mode 100644
index 000000000000..b58f39fed4d5
--- /dev/null
+++ b/include/linux/io_uring/net.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_IO_URING_NET_H
+#define _LINUX_IO_URING_NET_H
+
+struct io_uring_cmd;
+
+#if defined(CONFIG_IO_URING)
+int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
+
+#else
+static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd,
+				    unsigned int issue_flags)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
+#endif
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 42f63adfa54a..0b504b33806c 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -3,6 +3,7 @@
 #include
 #include
 #include
+#include <linux/io_uring/net.h>
 #include
 #include
 #include
diff --git a/net/socket.c b/net/socket.c
index ed3df2f749bf..c69cd0e652b8 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -88,7 +88,7 @@
 #include
 #include
 #include
-#include <linux/io_uring.h>
+#include <linux/io_uring/net.h>
 #include
 #include
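[For context, the consumer side stays unchanged apart from the include
swap above; a sketch of the call-site pattern this split serves, assuming
the existing uring_cmd hook in net/socket.c's socket_file_ops (unrelated
fields omitted):

#include <linux/io_uring/net.h>

static const struct file_operations socket_file_ops = {
	/* ... */
	.uring_cmd	= io_uring_cmd_sock,	/* resolves to the
						 * -EOPNOTSUPP stub when
						 * CONFIG_IO_URING is off */
};
]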
From patchwork Tue Mar 12 21:44:19 2024
From: David Wei <dw@davidwei.uk>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 05/16] io_uring: introduce interface queue
Date: Tue, 12 Mar 2024 14:44:19 -0700
Message-ID: <20240312214430.2923019-6-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>

From: David Wei

This patch introduces a new object in io_uring called an interface queue
(ifq) which contains:

* A pool region allocated by userspace and registered w/ io_uring where
  Rx data is transferred to via DMA.
* A net device and one specific Rx queue in it that will be configured
  for ZC Rx.
* A new shared ringbuf w/ userspace called a refill ring. When userspace
  is done with bufs with Rx packet payloads, it writes entries into this
  ring to tell the kernel that bufs can be re-used by the NIC again.

Each entry in the refill ring is a struct io_uring_rbuf_rqe.

On the completion side, the main CQ ring is used to notify userspace of
recv()'d packets. Big CQEs (32 bytes) are required to support this, as
the upper 16 bytes are used by ZC Rx to store a feature specific struct
io_uring_rbuf_cqe.

Add two new struct types:

1. io_uring_rbuf_rqe - entry in refill ring
2. io_uring_rbuf_cqe - entry in upper 16 bytes of a big CQE

For now, each io_uring instance has a single ifq, and each ifq has a
single pool region associated with one Rx queue.

Add a new opcode and functions to setup and tear down an ifq. Size and
offsets of the shared refill ring are returned to userspace for it to
mmap in the registration struct io_uring_zc_rx_ifq_reg, similar to the
main SQ/CQ rings.
Signed-off-by: David Wei
---
 include/linux/io_uring_types.h |   4 ++
 include/uapi/linux/io_uring.h  |  40 ++++++++++++
 io_uring/Makefile              |   3 +-
 io_uring/io_uring.c            |   7 +++
 io_uring/register.c            |   7 +++
 io_uring/zc_rx.c               | 109 +++++++++++++++++++++++++++++++++
 io_uring/zc_rx.h               |  35 +++++++++++
 7 files changed, 204 insertions(+), 1 deletion(-)
 create mode 100644 io_uring/zc_rx.c
 create mode 100644 io_uring/zc_rx.h

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 500772189fee..27e750a02ea5 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -39,6 +39,8 @@ enum io_uring_cmd_flags {
 	IO_URING_F_COMPAT		= (1 << 12),
 };
 
+struct io_zc_rx_ifq;
+
 struct io_wq_work_node {
 	struct io_wq_work_node *next;
 };
@@ -385,6 +387,8 @@ struct io_ring_ctx {
 		struct io_rsrc_data		*file_data;
 		struct io_rsrc_data		*buf_data;
 
+		struct io_zc_rx_ifq		*ifq;
+
 		/* protected by ->uring_lock */
 		struct list_head		rsrc_ref_list;
 		struct io_alloc_cache		rsrc_node_cache;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7bd10201a02b..7b643fe420c5 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -575,6 +575,9 @@ enum {
 	IORING_REGISTER_NAPI			= 27,
 	IORING_UNREGISTER_NAPI			= 28,
 
+	/* register a network interface queue for zerocopy */
+	IORING_REGISTER_ZC_RX_IFQ		= 29,
+
 	/* this goes last */
 	IORING_REGISTER_LAST,
 
@@ -782,6 +785,43 @@ enum {
 	SOCKET_URING_OP_SETSOCKOPT,
 };
 
+struct io_uring_rbuf_rqe {
+	__u32	off;
+	__u32	len;
+	__u16	region;
+	__u8	__pad[6];
+};
+
+struct io_uring_rbuf_cqe {
+	__u32	off;
+	__u32	len;
+	__u16	region;
+	__u8	__pad[6];
+};
+
+struct io_rbuf_rqring_offsets {
+	__u32	head;
+	__u32	tail;
+	__u32	rqes;
+	__u8	__pad[4];
+};
+
+/*
+ * Argument for IORING_REGISTER_ZC_RX_IFQ
+ */
+struct io_uring_zc_rx_ifq_reg {
+	__u32	if_idx;
+	/* hw rx descriptor ring id */
+	__u32	if_rxq_id;
+	__u32	region_id;
+	__u32	rq_entries;
+	__u32	flags;
+	__u16	cpu;
+
+	__u32	mmap_sz;
+	struct io_rbuf_rqring_offsets rq_off;
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 2e1d4e03799c..bb47231c611b 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -8,7 +8,8 @@ obj-$(CONFIG_IO_URING)		+= io_uring.o xattr.o nop.o fs.o splice.o \
 					statx.o net.o msg_ring.o timeout.o \
 					sqpoll.o fdinfo.o tctx.o poll.o \
 					cancel.o kbuf.o rsrc.o rw.o opdef.o \
-					notif.o waitid.o register.o truncate.o
+					notif.o waitid.o register.o truncate.o \
+					zc_rx.o
 obj-$(CONFIG_IO_WQ)		+= io-wq.o
 obj-$(CONFIG_FUTEX)		+= futex.o
 obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e44c2ef271b9..5614c47cecd9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -95,6 +95,7 @@
 #include "waitid.h"
 #include "futex.h"
 #include "napi.h"
+#include "zc_rx.h"
 
 #include "timeout.h"
 #include "poll.h"
@@ -2861,6 +2862,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 		return;
 
 	mutex_lock(&ctx->uring_lock);
+	io_unregister_zc_rx_ifqs(ctx);
 	if (ctx->buf_data)
 		__io_sqe_buffers_unregister(ctx);
 	if (ctx->file_data)
@@ -3032,6 +3034,11 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 			io_cqring_overflow_kill(ctx);
 			mutex_unlock(&ctx->uring_lock);
 		}
+		if (ctx->ifq) {
+			mutex_lock(&ctx->uring_lock);
+			io_shutdown_zc_rx_ifqs(ctx);
+			mutex_unlock(&ctx->uring_lock);
+		}
 
 		if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
 			io_move_task_work_from_local(ctx);
diff --git a/io_uring/register.c b/io_uring/register.c
index 99c37775f974..760f0b6a051c 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -27,6 +27,7 @@
 #include "cancel.h"
 #include "kbuf.h"
 #include "napi.h"
+#include "zc_rx.h"
 
 #define IORING_MAX_RESTRICTIONS	(IORING_RESTRICTION_LAST + \
 				 IORING_REGISTER_LAST + IORING_OP_LAST)
@@ -563,6 +564,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_unregister_napi(ctx, arg);
 		break;
+	case IORING_REGISTER_ZC_RX_IFQ:
+		ret = -EINVAL;
+		if (!arg || nr_args != 1)
+			break;
+		ret = io_register_zc_rx_ifq(ctx, arg);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
new file mode 100644
index 000000000000..e6c33f94c086
--- /dev/null
+++ b/io_uring/zc_rx.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0
+#if defined(CONFIG_PAGE_POOL)
+#include
+#include
+#include
+#include
+
+#include
+
+#include "io_uring.h"
+#include "kbuf.h"
+#include "zc_rx.h"
+
+static int io_allocate_rbuf_ring(struct io_zc_rx_ifq *ifq,
+				 struct io_uring_zc_rx_ifq_reg *reg)
+{
+	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
+	size_t off, rq_size;
+	void *ptr;
+
+	off = sizeof(struct io_uring);
+	rq_size = reg->rq_entries * sizeof(struct io_uring_rbuf_rqe);
+	ptr = (void *) __get_free_pages(gfp, get_order(off + rq_size));
+	if (!ptr)
+		return -ENOMEM;
+	ifq->rq_ring = (struct io_uring *)ptr;
+	ifq->rqes = (struct io_uring_rbuf_rqe *)((char *)ptr + off);
+	return 0;
+}
+
+static void io_free_rbuf_ring(struct io_zc_rx_ifq *ifq)
+{
+	if (ifq->rq_ring)
+		folio_put(virt_to_folio(ifq->rq_ring));
+}
+
+static struct io_zc_rx_ifq *io_zc_rx_ifq_alloc(struct io_ring_ctx *ctx)
+{
+	struct io_zc_rx_ifq *ifq;
+
+	ifq = kzalloc(sizeof(*ifq), GFP_KERNEL);
+	if (!ifq)
+		return NULL;
+
+	ifq->if_rxq_id = -1;
+	ifq->ctx = ctx;
+	return ifq;
+}
+
+static void io_zc_rx_ifq_free(struct io_zc_rx_ifq *ifq)
+{
+	io_free_rbuf_ring(ifq);
+	kfree(ifq);
+}
+
+int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
+			  struct io_uring_zc_rx_ifq_reg __user *arg)
+{
+	struct io_uring_zc_rx_ifq_reg reg;
+	struct io_zc_rx_ifq *ifq;
+	int ret;
+
+	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN &&
+	      ctx->flags & IORING_SETUP_CQE32))
+		return -EINVAL;
+	if (copy_from_user(&reg, arg, sizeof(reg)))
+		return -EFAULT;
+	if (ctx->ifq)
+		return -EBUSY;
+	if (reg.if_rxq_id == -1)
+		return -EINVAL;
+
+	ifq = io_zc_rx_ifq_alloc(ctx);
+	if (!ifq)
+		return -ENOMEM;
+
+	ret = io_allocate_rbuf_ring(ifq, &reg);
+	if (ret)
+		goto err;
+
+	ifq->rq_entries = reg.rq_entries;
+	ifq->if_rxq_id = reg.if_rxq_id;
+	ctx->ifq = ifq;
+
+	return 0;
+err:
+	io_zc_rx_ifq_free(ifq);
+	return ret;
+}
+
+void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx)
+{
+	struct io_zc_rx_ifq *ifq = ctx->ifq;
+
+	lockdep_assert_held(&ctx->uring_lock);
+
+	if (!ifq)
+		return;
+
+	ctx->ifq = NULL;
+	io_zc_rx_ifq_free(ifq);
+}
+
+void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx)
+{
+	lockdep_assert_held(&ctx->uring_lock);
+}
+
+#endif
diff --git a/io_uring/zc_rx.h b/io_uring/zc_rx.h
new file mode 100644
index 000000000000..35b019b275e0
--- /dev/null
+++ b/io_uring/zc_rx.h
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef IOU_ZC_RX_H
+#define IOU_ZC_RX_H
+
+struct io_zc_rx_ifq {
+	struct io_ring_ctx		*ctx;
+	struct net_device		*dev;
+	struct io_uring			*rq_ring;
+	struct io_uring_rbuf_rqe	*rqes;
+	u32				rq_entries;
+
+	/* hw rx descriptor ring id */
+	u32				if_rxq_id;
+};
+
+#if defined(CONFIG_PAGE_POOL)
+int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
+			  struct io_uring_zc_rx_ifq_reg __user *arg);
+void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx);
+void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx);
+#else
+static inline int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
+					struct io_uring_zc_rx_ifq_reg __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+static inline void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx)
+{
+}
+static inline void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx)
+{
+}
+#endif
+
+#endif
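[A hypothetical userspace sketch of what registering an ifq could look
like with this opcode; the interface name, queue id and entry count are
assumptions, and the raw syscall is used since no liburing wrapper exists
at this point:

#include <net/if.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <err.h>
#include <linux/io_uring.h>

	/* ring_fd must come from a ring created with
	 * IORING_SETUP_DEFER_TASKRUN | IORING_SETUP_CQE32 */
	struct io_uring_zc_rx_ifq_reg reg = {
		.if_idx		= if_nametoindex("eth0"),
		.if_rxq_id	= 0,	/* hw rx queue to take over */
		.region_id	= 0,
		.rq_entries	= 256,
	};

	if (syscall(__NR_io_uring_register, ring_fd,
		    IORING_REGISTER_ZC_RX_IFQ, &reg, 1))
		err(1, "IORING_REGISTER_ZC_RX_IFQ");
	/* on success the kernel has filled in reg.mmap_sz and reg.rq_off */
]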
From patchwork Tue Mar 12 21:44:20 2024
From: David Wei <dw@davidwei.uk>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 06/16] io_uring: add mmap support for shared ifq ringbuffers
Date: Tue, 12 Mar 2024 14:44:20 -0700
Message-ID: <20240312214430.2923019-7-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>

From: David Wei

This patch adds mmap support for the ifq refill ring. Just like the
io_uring SQ/CQ rings, userspace issues a single mmap call using the
io_uring fd w/ magic offset IORING_OFF_RQ_RING. An opaque ptr is
returned to userspace, which is then expected to use the offsets
returned in the registration struct to get access to the head/tail and
rings.

Signed-off-by: David Wei
---
 include/uapi/linux/io_uring.h |  2 ++
 io_uring/io_uring.c           |  5 +++++
 io_uring/zc_rx.c              | 15 ++++++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7b643fe420c5..a085ed60478f 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -438,6 +438,8 @@ enum {
 #define IORING_OFF_PBUF_RING		0x80000000ULL
 #define IORING_OFF_PBUF_SHIFT		16
 #define IORING_OFF_MMAP_MASK		0xf8000000ULL
+#define IORING_OFF_RQ_RING		0x20000000ULL
+#define IORING_OFF_RQ_SHIFT		16
 
 /*
  * Filled with the offset for mmap(2)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5614c47cecd9..280f2a2fd1fe 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3434,6 +3434,11 @@ static void *io_uring_validate_mmap_request(struct file *file,
 			return ERR_PTR(-EINVAL);
 		break;
 		}
+	case IORING_OFF_RQ_RING:
+		if (!ctx->ifq)
+			return ERR_PTR(-EINVAL);
+		ptr = ctx->ifq->rq_ring;
+		break;
 	default:
 		return ERR_PTR(-EINVAL);
 	}
diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
index e6c33f94c086..6987bb991418 100644
--- a/io_uring/zc_rx.c
+++ b/io_uring/zc_rx.c
@@ -58,6 +58,7 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
 {
 	struct io_uring_zc_rx_ifq_reg reg;
 	struct io_zc_rx_ifq *ifq;
+	size_t ring_sz, rqes_sz;
 	int ret;
 
 	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN &&
@@ -80,8 +81,20 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
 
 	ifq->rq_entries = reg.rq_entries;
 	ifq->if_rxq_id = reg.if_rxq_id;
-	ctx->ifq = ifq;
 
+	ring_sz = sizeof(struct io_uring);
+	rqes_sz = sizeof(struct io_uring_rbuf_rqe) * ifq->rq_entries;
+	reg.mmap_sz = ring_sz + rqes_sz;
+	reg.rq_off.rqes = ring_sz;
+	reg.rq_off.head = offsetof(struct io_uring, head);
+	reg.rq_off.tail = offsetof(struct io_uring, tail);
+
+	if (copy_to_user(arg, &reg, sizeof(reg))) {
+		ret = -EFAULT;
+		goto err;
+	}
+
+	ctx->ifq = ifq;
 	return 0;
 err:
 	io_zc_rx_ifq_free(ifq);
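[Continuing the hypothetical userspace sketch from the previous patch,
mapping the refill ring with the returned offsets might look like this:

#include <sys/mman.h>

	void *rq_ptr = mmap(NULL, reg.mmap_sz, PROT_READ | PROT_WRITE,
			    MAP_SHARED | MAP_POPULATE, ring_fd,
			    IORING_OFF_RQ_RING);
	if (rq_ptr == MAP_FAILED)
		err(1, "mmap refill ring");

	/* resolve head/tail and the rqe array via reg.rq_off */
	unsigned *rq_head = (unsigned *)((char *)rq_ptr + reg.rq_off.head);
	unsigned *rq_tail = (unsigned *)((char *)rq_ptr + reg.rq_off.tail);
	struct io_uring_rbuf_rqe *rqes =
		(struct io_uring_rbuf_rqe *)((char *)rq_ptr + reg.rq_off.rqes);
]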
From patchwork Tue Mar 12 21:44:21 2024
From: David Wei <dw@davidwei.uk>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [RFC PATCH v4 07/16] netdev: add XDP_SETUP_ZC_RX command
Date: Tue, 12 Mar 2024 14:44:21 -0700
Message-ID: <20240312214430.2923019-8-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>

From: David Wei

RFC only, not for upstream. This will be replaced with a separate ndo
callback or some other mechanism in next patchset revisions.

This patch adds a new XDP_SETUP_ZC_RX command that will be used in a
later patch to enable or disable ZC RX for a specific RX queue.

We are open to suggestions on a better way of doing this. Google's TCP
devmem proposal sets up struct netdev_rx_queue which persists across
device reset, then expects userspace to use an out-of-band method (e.g.
ethtool) to reset the device, thus re-filling a hardware Rx queue.

Signed-off-by: David Wei
---
 include/linux/netdevice.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ac7102118d68..699cce69a5a6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1009,6 +1009,7 @@ enum bpf_netdev_command {
 	BPF_OFFLOAD_MAP_ALLOC,
 	BPF_OFFLOAD_MAP_FREE,
 	XDP_SETUP_XSK_POOL,
+	XDP_SETUP_ZC_RX,
 };
 
 struct bpf_prog_offload_ops;
@@ -1047,6 +1048,11 @@ struct netdev_bpf {
 			struct xsk_buff_pool *pool;
 			u16 queue_id;
 		} xsk;
+		/* XDP_SETUP_ZC_RX */
+		struct {
+			struct io_zc_rx_ifq *ifq;
+			u16 queue_id;
+		} zc_rx;
 	};
 };
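[A sketch of how a driver might service the new command in its ndo_bpf
hook; the mydrv_* names are placeholders, not from this series:

static int mydrv_bpf(struct net_device *dev, struct netdev_bpf *bpf)
{
	switch (bpf->command) {
	case XDP_SETUP_ZC_RX:
		/* the next patch tears a queue down by passing a NULL ifq */
		if (!bpf->zc_rx.ifq)
			return mydrv_zc_rx_disable(dev, bpf->zc_rx.queue_id);
		return mydrv_zc_rx_enable(dev, bpf->zc_rx.ifq,
					  bpf->zc_rx.queue_id);
	default:
		return -EINVAL;
	}
}
]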
bh=rj0cYgShXYH5VhmSnGepet0UkxqjvpM473hZF/0XZ3c=; b=JkleoP2ciwhG1a0lGMoEm507wXF8Rl4UE4M6rxhXOsvkkiMXsDIzDMZXMbq5xXteJ4 r7xqc8Wg6WtHDXh3HUEK0pPzqVXK1CxkyVNM6dm8aI6MFHyF21yrdx52HPuoehDCYeZB ml5/Ls6v2tI1yb4/TPBmpzXXa3F/KGwrXfJFVgtNKp8njXrWz+m6XfDpqQwlhFx3ooev Mv2ip/9QeQuZy4pOgaBk/66zok6u9+pYm//2jbqgeyphMT2XTIuxxoexCRcEYPtgWEYv BoeO31snbw/44oXtBy+sD65qfkd/ND0lQzgWBfDbqD255LQahE0xumGXvgRWR96zeaRd RVtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710279882; x=1710884682; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rj0cYgShXYH5VhmSnGepet0UkxqjvpM473hZF/0XZ3c=; b=vd7AeZIGTIC9KqXR/6iSCE+GiCl8umRaZsO01rJOUyesduZUTnalxbN+cYlNacKzaE SAku6UYmp8HM/jfDxZBVQWT2USlUTVqxI+i2yXdbRoATs042na3/ywmh6z3+jH6vl9MO G15BU6P/uCFwgNaNImWLlnyN0k7GyHyiEQD0p9lEcLFUVnCTRXvE6mUI8+s/eiAyfuQA GWNIRXNjORSNY0P9ld3ol9J5JCQeeERV/S1w/ZPcWRfO9jrWCOlAh5g4LEgZh1uL1/v9 oT8Rv2UPpKoSG+oeWfTQIPRvcHx+quOtLPnXe7N2iJ39ZMl7VLZIJmv8Cg9DPFRYtBFI gAYQ== X-Forwarded-Encrypted: i=1; AJvYcCWXiEf+JvH0XUqjkiut4ZovkdcSPJZxtw0Bn9efRmEUQGEJbMHwHx0q7P4gMeY2WUKEx+DItdmrkPDt2NkquUQ4zKhCSKUq X-Gm-Message-State: AOJu0YyXXRfvmaNqWbl02VD6/DJp58jvbz7mcs50DZzZSipzUAnYCw5S vsZW2gKnaaheFpZZs0sr+1Imf4s36cpGfvx4enD9hdPPS98G3W0rXI9Y/O38gjRVL1XRsmi33qZ V X-Google-Smtp-Source: AGHT+IHXWW1fnHlG+Rl1vWCNUw0xrcw6qZRHYrzDZb+F06rysW+Ft1xEAXTu1TEH4AuY3fi9w6AfTg== X-Received: by 2002:a17:902:f68b:b0:1db:e74b:5bbf with SMTP id l11-20020a170902f68b00b001dbe74b5bbfmr5003902plg.0.1710279881660; Tue, 12 Mar 2024 14:44:41 -0700 (PDT) Received: from localhost (fwdproxy-prn-017.fbsv.net. [2a03:2880:ff:11::face:b00c]) by smtp.gmail.com with ESMTPSA id u12-20020a170902e80c00b001ddb73e719dsm2049257plg.27.2024.03.12.14.44.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Mar 2024 14:44:41 -0700 (PDT) From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [RFC PATCH v4 08/16] io_uring: setup ZC for an Rx queue when registering an ifq Date: Tue, 12 Mar 2024 14:44:22 -0700 Message-ID: <20240312214430.2923019-9-dw@davidwei.uk> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk> References: <20240312214430.2923019-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-State: RFC From: David Wei RFC only, not for upstream Just as with the previous patch, it will be migrated from ndo_bpf This patch sets up ZC for an Rx queue in a net device when an ifq is registered with io_uring. The Rx queue is specified in the registration struct. For now since there is only one ifq, its destruction is implicit during io_uring cleanup. 
Signed-off-by: David Wei
---
 io_uring/zc_rx.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
index 6987bb991418..521eeea04f9d 100644
--- a/io_uring/zc_rx.c
+++ b/io_uring/zc_rx.c
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include

 #include

@@ -11,6 +12,34 @@
 #include "kbuf.h"
 #include "zc_rx.h"

+typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
+
+static int __io_queue_mgmt(struct net_device *dev, struct io_zc_rx_ifq *ifq,
+			   u16 queue_id)
+{
+	struct netdev_bpf cmd;
+	bpf_op_t ndo_bpf;
+
+	ndo_bpf = dev->netdev_ops->ndo_bpf;
+	if (!ndo_bpf)
+		return -EINVAL;
+
+	cmd.command = XDP_SETUP_ZC_RX;
+	cmd.zc_rx.ifq = ifq;
+	cmd.zc_rx.queue_id = queue_id;
+	return ndo_bpf(dev, &cmd);
+}
+
+static int io_open_zc_rxq(struct io_zc_rx_ifq *ifq)
+{
+	return __io_queue_mgmt(ifq->dev, ifq, ifq->if_rxq_id);
+}
+
+static int io_close_zc_rxq(struct io_zc_rx_ifq *ifq)
+{
+	return __io_queue_mgmt(ifq->dev, NULL, ifq->if_rxq_id);
+}
+
 static int io_allocate_rbuf_ring(struct io_zc_rx_ifq *ifq,
 				 struct io_uring_zc_rx_ifq_reg *reg)
 {
@@ -49,6 +78,10 @@ static struct io_zc_rx_ifq *io_zc_rx_ifq_alloc(struct io_ring_ctx *ctx)

 static void io_zc_rx_ifq_free(struct io_zc_rx_ifq *ifq)
 {
+	if (ifq->if_rxq_id != -1)
+		io_close_zc_rxq(ifq);
+	if (ifq->dev)
+		dev_put(ifq->dev);
 	io_free_rbuf_ring(ifq);
 	kfree(ifq);
 }
@@ -79,9 +112,18 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
 	if (ret)
 		goto err;

+	ret = -ENODEV;
+	ifq->dev = dev_get_by_index(current->nsproxy->net_ns, reg.if_idx);
+	if (!ifq->dev)
+		goto err;
+
 	ifq->rq_entries = reg.rq_entries;
 	ifq->if_rxq_id = reg.if_rxq_id;

+	ret = io_open_zc_rxq(ifq);
+	if (ret)
+		goto err;
+
 	ring_sz = sizeof(struct io_uring);
 	rqes_sz = sizeof(struct io_uring_rbuf_rqe) * ifq->rq_entries;
 	reg.mmap_sz = ring_sz + rqes_sz;
@@ -90,6 +132,7 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
 	reg.rq_off.tail = offsetof(struct io_uring, tail);

 	if (copy_to_user(arg, &reg, sizeof(reg))) {
+		io_close_zc_rxq(ifq);
 		ret = -EFAULT;
 		goto err;
 	}
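For illustration, a userspace registration could look like the sketch
below. The struct fields come from this series' uapi; the raw syscall
wrapper is standard, and the interface name, queue id, and ring size
are arbitrary placeholders:

	struct io_uring_zc_rx_ifq_reg reg = {
		.if_idx     = if_nametoindex("eth0"),	/* target netdev */
		.if_rxq_id  = 0,		/* hw Rx queue to attach to */
		.rq_entries = 4096,		/* refill (rq) ring size */
		.region_id  = 0,		/* registered-buffer region, see patch 10 */
	};

	int ret = syscall(__NR_io_uring_register, ring_fd,
			  IORING_REGISTER_ZC_RX_IFQ, &reg, 1);
	/* on success the kernel has filled reg.mmap_sz and
	 * reg.rq_off.{head,tail}, describing how to mmap the refill ring */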
From patchwork Tue Mar 12 21:44:23 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Subject: [RFC PATCH v4 09/16] io_uring/zcrx: implement socket registration
Date: Tue, 12 Mar 2024 14:44:23 -0700
Message-ID: <20240312214430.2923019-10-dw@davidwei.uk>

From: Pavel Begunkov

We want userspace to explicitly list all sockets it'll be using with a
particular zc ifq, so we can properly configure them, e.g. binding the
sockets to the corresponding interface and setting steering rules.
We'll also need it to better control ifq lifetime and for
termination / unregistration purposes.

TODO: remove zc_rx_idx from struct socket, which will fix zc_rx_idx
token init races and a re-registration bug.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/linux/net.h           |  2 +
 include/uapi/linux/io_uring.h |  7 +++
 io_uring/net.c                | 20 ++++++++
 io_uring/register.c           |  6 +++
 io_uring/zc_rx.c              | 91 +++++++++++++++++++++++++++++++++--
 io_uring/zc_rx.h              | 17 +++++++
 net/socket.c                  |  1 +
 7 files changed, 141 insertions(+), 3 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index c9b4a63791a4..867061a91d30 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -126,6 +126,8 @@ struct socket {
 	const struct proto_ops	*ops; /* Might change with IPV6_ADDRFORM or MPTCP. */
 	struct socket_wq	wq;
+
+	unsigned		zc_rx_idx;
 };

 /*
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index a085ed60478f..26e945e6258d 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -579,6 +579,7 @@ enum {
 	/* register a network interface queue for zerocopy */
 	IORING_REGISTER_ZC_RX_IFQ	= 29,
+	IORING_REGISTER_ZC_RX_SOCK	= 30,

 	/* this goes last */
 	IORING_REGISTER_LAST,
@@ -824,6 +825,12 @@ struct io_uring_zc_rx_ifq_reg {
 	struct io_rbuf_rqring_offsets rq_off;
 };

+struct io_uring_zc_rx_sock_reg {
+	__u32	sockfd;
+	__u32	zc_rx_ifq_idx;
+	__u32	__resv[2];
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/io_uring/net.c b/io_uring/net.c
index 54dff492e064..1fa7c1fa6b5d 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -16,6 +16,7 @@
 #include "net.h"
 #include "notif.h"
 #include "rsrc.h"
+#include "zc_rx.h"

 #if defined(CONFIG_NET)
 struct io_shutdown {
@@ -1033,6 +1034,25 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	return ret;
 }

+static __maybe_unused
+struct io_zc_rx_ifq *io_zc_verify_sock(struct io_kiocb *req,
+				       struct socket *sock)
+{
+	unsigned token = READ_ONCE(sock->zc_rx_idx);
+	unsigned ifq_idx = token >> IO_ZC_IFQ_IDX_OFFSET;
+	unsigned sock_idx = token & IO_ZC_IFQ_IDX_MASK;
+	struct io_zc_rx_ifq *ifq;
+
+	if (ifq_idx)
+		return NULL;
+	ifq = req->ctx->ifq;
+	if (!ifq || sock_idx >= ifq->nr_sockets)
+		return NULL;
+	if (ifq->sockets[sock_idx] != req->file)
+		return NULL;
+	return ifq;
+}
+
 void io_send_zc_cleanup(struct io_kiocb *req)
 {
 	struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg);
diff --git a/io_uring/register.c b/io_uring/register.c
index 760f0b6a051c..7f40403a1716 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -570,6 +570,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_zc_rx_ifq(ctx, arg);
 		break;
+	case IORING_REGISTER_ZC_RX_SOCK:
+		ret = -EINVAL;
+		if (!arg || nr_args != 1)
+			break;
+		ret = io_register_zc_rx_sock(ctx, arg);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
index 521eeea04f9d..77459c0fc14b 100644
--- a/io_uring/zc_rx.c
+++ b/io_uring/zc_rx.c
@@ -5,12 +5,15 @@
 #include
 #include
 #include
+#include
+#include

 #include

 #include "io_uring.h"
 #include "kbuf.h"
 #include "zc_rx.h"
+#include "rsrc.h"

 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);

@@ -76,10 +79,31 @@ static struct io_zc_rx_ifq *io_zc_rx_ifq_alloc(struct io_ring_ctx *ctx)
 	return ifq;
 }

-static void io_zc_rx_ifq_free(struct io_zc_rx_ifq *ifq)
+static void io_shutdown_ifq(struct io_zc_rx_ifq *ifq)
 {
-	if (ifq->if_rxq_id != -1)
+	int i;
+
+	if (!ifq)
+		return;
+
+	for (i = 0; i < ifq->nr_sockets; i++) {
+		if (ifq->sockets[i]) {
+			fput(ifq->sockets[i]);
+			ifq->sockets[i] = NULL;
+		}
+	}
+	ifq->nr_sockets = 0;
+
+	if (ifq->if_rxq_id != -1) {
 		io_close_zc_rxq(ifq);
+		ifq->if_rxq_id = -1;
+	}
+}
+
+static void io_zc_rx_ifq_free(struct io_zc_rx_ifq *ifq)
+{
+	io_shutdown_ifq(ifq);
+
 	if (ifq->dev)
 		dev_put(ifq->dev);
 	io_free_rbuf_ring(ifq);
@@ -132,7 +156,6 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
 	reg.rq_off.tail = offsetof(struct io_uring, tail);

 	if (copy_to_user(arg, &reg, sizeof(reg))) {
-		io_close_zc_rxq(ifq);
 		ret = -EFAULT;
 		goto err;
 	}
@@ -153,6 +176,8 @@ void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx)
 	if (!ifq)
 		return;

+	WARN_ON_ONCE(ifq->nr_sockets);
+
 	ctx->ifq = NULL;
 	io_zc_rx_ifq_free(ifq);
 }
@@ -160,6 +185,66 @@ void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx)
 void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx)
 {
 	lockdep_assert_held(&ctx->uring_lock);
+
+	io_shutdown_ifq(ctx->ifq);
+}
+
+int io_register_zc_rx_sock(struct io_ring_ctx *ctx,
+			   struct io_uring_zc_rx_sock_reg __user *arg)
+{
+	struct io_uring_zc_rx_sock_reg sr;
+	struct io_zc_rx_ifq *ifq;
+	struct socket *sock;
+	struct file *file;
+	int ret = -EEXIST;
+	int idx;
+
+	if (copy_from_user(&sr, arg, sizeof(sr)))
+		return -EFAULT;
+	if (sr.__resv[0] || sr.__resv[1])
+		return -EINVAL;
+	if (sr.zc_rx_ifq_idx != 0 || !ctx->ifq)
+		return -EINVAL;
+
+	ifq = ctx->ifq;
+	if (ifq->nr_sockets >= ARRAY_SIZE(ifq->sockets))
+		return -EINVAL;
+
+	BUILD_BUG_ON(ARRAY_SIZE(ifq->sockets) > IO_ZC_IFQ_IDX_MASK);
+
+	file = fget(sr.sockfd);
+	if (!file)
+		return -EBADF;
+
+	if (!!unix_get_socket(file)) {
+		fput(file);
+		return -EBADF;
+	}
+
+	sock = sock_from_file(file);
+	if (unlikely(!sock || !sock->sk)) {
+		fput(file);
+		return -ENOTSOCK;
+	}
+
+	idx = ifq->nr_sockets;
+	lock_sock(sock->sk);
+	if (!sock->zc_rx_idx) {
+		unsigned token;
+
+		token = idx + (sr.zc_rx_ifq_idx << IO_ZC_IFQ_IDX_OFFSET);
+		WRITE_ONCE(sock->zc_rx_idx, token);
+		ret = 0;
+	}
+	release_sock(sock->sk);
+
+	if (ret) {
+		fput(file);
+		return ret;
+	}
+	ifq->sockets[idx] = file;
+	ifq->nr_sockets++;
+	return 0;
 }
 #endif
diff --git a/io_uring/zc_rx.h b/io_uring/zc_rx.h
index 35b019b275e0..d7b8397d525f 100644
--- a/io_uring/zc_rx.h
+++ b/io_uring/zc_rx.h
@@ -2,6 +2,13 @@
 #ifndef IOU_ZC_RX_H
 #define IOU_ZC_RX_H

+#include
+#include
+
+#define IO_ZC_MAX_IFQ_SOCKETS	16
+#define IO_ZC_IFQ_IDX_OFFSET	16
+#define IO_ZC_IFQ_IDX_MASK	((1U << IO_ZC_IFQ_IDX_OFFSET) - 1)
+
 struct io_zc_rx_ifq {
 	struct io_ring_ctx	*ctx;
 	struct net_device	*dev;
@@ -11,6 +18,9 @@ struct io_zc_rx_ifq {

 	/* hw rx descriptor ring id */
 	u32			if_rxq_id;
+
+	unsigned		nr_sockets;
+	struct file		*sockets[IO_ZC_MAX_IFQ_SOCKETS];
 };

 #if defined(CONFIG_PAGE_POOL)
@@ -18,6 +28,8 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
 			  struct io_uring_zc_rx_ifq_reg __user *arg);
 void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx);
 void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx);
+int io_register_zc_rx_sock(struct io_ring_ctx *ctx,
+			   struct io_uring_zc_rx_sock_reg __user *arg);
 #else
 static inline int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
 					struct io_uring_zc_rx_ifq_reg __user *arg)
@@ -30,6 +42,11 @@ static inline void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx)
 static inline void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx)
 {
 }
+static inline int io_register_zc_rx_sock(struct io_ring_ctx *ctx,
+					 struct io_uring_zc_rx_sock_reg __user *arg)
+{
+	return -EOPNOTSUPP;
+}
 #endif

 #endif
diff --git a/net/socket.c b/net/socket.c
index c69cd0e652b8..18181a4e0295 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -637,6 +637,7 @@ struct socket *sock_alloc(void)
 	sock = SOCKET_I(inode);

+	sock->zc_rx_idx = 0;
 	inode->i_ino = get_next_ino();
 	inode->i_mode = S_IFSOCK | S_IRWXUGO;
 	inode->i_uid = current_fsuid();
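A minimal userspace sketch of the new registration call, using the
struct and opcode from the uapi above; the wrapper and error handling
are illustrative only:

	struct io_uring_zc_rx_sock_reg sock_reg = {
		.sockfd        = connfd,	/* a connected TCP socket */
		.zc_rx_ifq_idx = 0,		/* only one ifq exists for now */
	};

	int ret = syscall(__NR_io_uring_register, ring_fd,
			  IORING_REGISTER_ZC_RX_SOCK, &sock_reg, 1);
	if (ret)
		/* e.g. -EEXIST if the socket is already bound to an ifq */
		handle_error(ret);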
From patchwork Tue Mar 12 21:44:24 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Subject: [RFC PATCH v4 10/16] io_uring: add zero copy buf representation and pool
Date: Tue, 12 Mar 2024 14:44:24 -0700
Message-ID: <20240312214430.2923019-11-dw@davidwei.uk>

This patch adds two objects:

* Zero copy buffer representation, holding a page and a net_iov. The
  page is needed as net_iov is designed for opaque device memory,
  whereas we are backed by real pages.
* Zero copy pool, spiritually similar to page pool, that holds ZC bufs
  and hands them out to net devices. This will be used as an
  implementation of page pool memory provider.

Pool regions are registered w/ io_uring using the registered buffer
API, with a 1:1 mapping between a region and an iovec passed to
io_uring_register_buffers. This does the heavy lifting of pinning pages
and chunking them into bvecs in a struct io_mapped_ubuf for us.

For now, as there is only one pool region per ifq, there is no separate
API for adding/removing regions yet; a region is mapped implicitly
during ifq registration.
Signed-off-by: David Wei
---
 include/linux/io_uring/net.h |   7 +++
 io_uring/zc_rx.c             | 110 +++++++++++++++++++++++++++++++++++
 io_uring/zc_rx.h             |  15 +++++
 3 files changed, 132 insertions(+)

diff --git a/include/linux/io_uring/net.h b/include/linux/io_uring/net.h
index b58f39fed4d5..05d5a6a97264 100644
--- a/include/linux/io_uring/net.h
+++ b/include/linux/io_uring/net.h
@@ -2,8 +2,15 @@
 #ifndef _LINUX_IO_URING_NET_H
 #define _LINUX_IO_URING_NET_H

+#include
+
 struct io_uring_cmd;

+struct io_zc_rx_buf {
+	struct net_iov	niov;
+	struct page	*page;
+};
+
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);

diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
index 77459c0fc14b..326ae3fcc643 100644
--- a/io_uring/zc_rx.c
+++ b/io_uring/zc_rx.c
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include

 #include
 #include

@@ -66,6 +67,109 @@ static void io_free_rbuf_ring(struct io_zc_rx_ifq *ifq)
 	folio_put(virt_to_folio(ifq->rq_ring));
 }

+static int io_zc_rx_init_buf(struct page *page, struct io_zc_rx_buf *buf)
+{
+	memset(&buf->niov, 0, sizeof(buf->niov));
+	atomic_long_set(&buf->niov.pp_ref_count, 0);
+
+	buf->page = page;
+	get_page(page);
+	return 0;
+}
+
+static void io_zc_rx_free_buf(struct io_zc_rx_buf *buf)
+{
+	struct page *page = buf->page;
+
+	put_page(page);
+}
+
+static int io_zc_rx_init_pool(struct io_zc_rx_pool *pool,
+			      struct io_mapped_ubuf *imu)
+{
+	struct io_zc_rx_buf *buf;
+	struct page *page;
+	int i, ret;
+
+	for (i = 0; i < imu->nr_bvecs; i++) {
+		page = imu->bvec[i].bv_page;
+		buf = &pool->bufs[i];
+		ret = io_zc_rx_init_buf(page, buf);
+		if (ret)
+			goto err;
+
+		pool->freelist[i] = i;
+	}
+
+	pool->free_count = imu->nr_bvecs;
+	return 0;
+err:
+	while (i--) {
+		buf = &pool->bufs[i];
+		io_zc_rx_free_buf(buf);
+	}
+	return ret;
+}
+
+static int io_zc_rx_create_pool(struct io_ring_ctx *ctx,
+				struct io_zc_rx_ifq *ifq,
+				u16 id)
+{
+	struct io_mapped_ubuf *imu;
+	struct io_zc_rx_pool *pool;
+	int nr_pages;
+	int ret;
+
+	if (ifq->pool)
+		return -EFAULT;
+
+	if (unlikely(id >= ctx->nr_user_bufs))
+		return -EFAULT;
+	id = array_index_nospec(id, ctx->nr_user_bufs);
+	imu = ctx->user_bufs[id];
+	if (imu->ubuf & ~PAGE_MASK || imu->ubuf_end & ~PAGE_MASK)
+		return -EFAULT;
+
+	ret = -ENOMEM;
+	nr_pages = imu->nr_bvecs;
+	pool = kvmalloc(struct_size(pool, freelist, nr_pages), GFP_KERNEL);
+	if (!pool)
+		goto err;
+
+	pool->bufs = kvmalloc_array(nr_pages, sizeof(*pool->bufs), GFP_KERNEL);
+	if (!pool->bufs)
+		goto err_buf;
+
+	ret = io_zc_rx_init_pool(pool, imu);
+	if (ret)
+		goto err_map;
+
+	pool->ifq = ifq;
+	pool->pool_id = id;
+	pool->nr_bufs = nr_pages;
+	spin_lock_init(&pool->freelist_lock);
+	ifq->pool = pool;
+	return 0;
+err_map:
+	kvfree(pool->bufs);
+err_buf:
+	kvfree(pool);
+err:
+	return ret;
+}
+
+static void io_zc_rx_free_pool(struct io_zc_rx_pool *pool)
+{
+	struct io_zc_rx_buf *buf;
+
+	for (int i = 0; i < pool->nr_bufs; i++) {
+		buf = &pool->bufs[i];
+		io_zc_rx_free_buf(buf);
+	}
+	kvfree(pool->bufs);
+	kvfree(pool);
+}
+
 static struct io_zc_rx_ifq *io_zc_rx_ifq_alloc(struct io_ring_ctx *ctx)
 {
 	struct io_zc_rx_ifq *ifq;
@@ -104,6 +208,8 @@ static void io_zc_rx_ifq_free(struct io_zc_rx_ifq *ifq)
 {
 	io_shutdown_ifq(ifq);

+	if (ifq->pool)
+		io_zc_rx_free_pool(ifq->pool);
 	if (ifq->dev)
 		dev_put(ifq->dev);
 	io_free_rbuf_ring(ifq);
@@ -141,6 +247,10 @@ int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
 	if (!ifq->dev)
 		goto err;

+	ret = io_zc_rx_create_pool(ctx, ifq, reg.region_id);
+	if (ret)
+		goto err;
+
 	ifq->rq_entries = reg.rq_entries;
 	ifq->if_rxq_id = reg.if_rxq_id;
diff --git a/io_uring/zc_rx.h b/io_uring/zc_rx.h
index d7b8397d525f..466b2b8f9813 100644
--- a/io_uring/zc_rx.h
+++ b/io_uring/zc_rx.h
@@ -3,15 +3,30 @@
 #define IOU_ZC_RX_H

 #include
+#include
 #include

 #define IO_ZC_MAX_IFQ_SOCKETS	16
 #define IO_ZC_IFQ_IDX_OFFSET	16
 #define IO_ZC_IFQ_IDX_MASK	((1U << IO_ZC_IFQ_IDX_OFFSET) - 1)

+struct io_zc_rx_pool {
+	struct io_zc_rx_ifq	*ifq;
+	struct io_zc_rx_buf	*bufs;
+	u32			nr_bufs;
+	u16			pool_id;
+
+	/* freelist */
+	spinlock_t		freelist_lock;
+	u32			free_count;
+	u32			freelist[];
+};
+
 struct io_zc_rx_ifq {
 	struct io_ring_ctx	*ctx;
 	struct net_device	*dev;
+	struct io_zc_rx_pool	*pool;
+
 	struct io_uring		*rq_ring;
 	struct io_uring_rbuf_rqe *rqes;
 	u32			rq_entries;
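For completeness, here is what registering a pool region might look
like from userspace with liburing; the 64 MiB size is arbitrary, and
the region must be page-aligned since io_zc_rx_create_pool() rejects
unaligned buffers:

	/* one iovec == one pool region, per the 1:1 mapping above */
	size_t region_sz = 64 * 1024 * 1024;
	void *region = NULL;

	if (posix_memalign(&region, 4096, region_sz))
		exit(1);

	struct iovec iov = { .iov_base = region, .iov_len = region_sz };
	int ret = io_uring_register_buffers(&ring, &iov, 1);
	/* this region's index in the registered-buffer table (0 here) is
	 * what io_uring_zc_rx_ifq_reg.region_id refers to */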
From patchwork Tue Mar 12 21:44:25 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Subject: [RFC PATCH v4 11/16] io_uring: implement pp memory provider for zc rx
Date: Tue, 12 Mar 2024 14:44:25 -0700
Message-ID: <20240312214430.2923019-12-dw@davidwei.uk>

From: Pavel Begunkov

Implement a new pp memory provider for io_uring zerocopy receive.

All buffers are backed by struct io_zc_rx_buf, which is a thin
extension of struct net_iov. Initially, all of them are unallocated and
placed in a spinlock-protected ->freelist. Then, they will be allocated
via the ->alloc_pages callback, which sets the refcount to 1.

Later, buffers are either dropped by the net stack and recycled back
into the page pool / released by ->release_page, or, more likely,
transferred to userspace by posting a corresponding CQE and elevating
the refcount by IO_ZC_RX_UREF.

When the user is done with a buffer, it should be put into the refill
ring. Next time io_pp_zc_alloc_pages() runs, it'll check the ring, put
user refs, and ultimately grab buffers from there. That's done in the
attached napi context and so doesn't need any additional
synchronisation. It is the second hottest path after getting a buffer
from the pp lockless cache.
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/linux/io_uring/net.h  |   5 +
 include/net/page_pool/types.h |   1 +
 io_uring/zc_rx.c              | 202 ++++++++++++++++++++++++++++++++++
 io_uring/zc_rx.h              |   5 +
 net/core/page_pool.c          |   2 +-
 5 files changed, 214 insertions(+), 1 deletion(-)

diff --git a/include/linux/io_uring/net.h b/include/linux/io_uring/net.h
index 05d5a6a97264..a225d7090b6b 100644
--- a/include/linux/io_uring/net.h
+++ b/include/linux/io_uring/net.h
@@ -12,6 +12,11 @@ struct io_zc_rx_buf {
 };

 #if defined(CONFIG_IO_URING)
+
+#if defined(CONFIG_PAGE_POOL)
+extern const struct memory_provider_ops io_uring_pp_zc_ops;
+#endif
+
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);

 #else
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 347837b83d36..9e91f2cdbe61 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -227,6 +227,7 @@ netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 struct page_pool *page_pool_create_percpu(const struct page_pool_params *params,
 					  int cpuid);
+void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem);

 struct xdp_mem_info;

diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
index 326ae3fcc643..b2507df121fb 100644
--- a/io_uring/zc_rx.c
+++ b/io_uring/zc_rx.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include

 #include

@@ -357,4 +358,205 @@ int io_register_zc_rx_sock(struct io_ring_ctx *ctx,
 	return 0;
 }

+static inline struct io_zc_rx_buf *io_niov_to_buf(struct net_iov *niov)
+{
+	return container_of(niov, struct io_zc_rx_buf, niov);
+}
+
+static inline unsigned io_buf_pgid(struct io_zc_rx_pool *pool,
+				   struct io_zc_rx_buf *buf)
+{
+	return buf - pool->bufs;
+}
+
+static __maybe_unused void io_zc_rx_get_buf_uref(struct io_zc_rx_buf *buf)
+{
+	atomic_long_add(IO_ZC_RX_UREF, &buf->niov.pp_ref_count);
+}
+
+static bool io_zc_rx_buf_put(struct io_zc_rx_buf *buf, int nr)
+{
+	return atomic_long_sub_and_test(nr, &buf->niov.pp_ref_count);
+}
+
+static bool io_zc_rx_put_buf_uref(struct io_zc_rx_buf *buf)
+{
+	if (atomic_long_read(&buf->niov.pp_ref_count) < IO_ZC_RX_UREF)
+		return false;
+
+	return io_zc_rx_buf_put(buf, IO_ZC_RX_UREF);
+}
+
+static inline netmem_ref io_zc_buf_to_netmem(struct io_zc_rx_buf *buf)
+{
+	return net_iov_to_netmem(&buf->niov);
+}
+
+static inline void io_zc_add_pp_cache(struct page_pool *pp,
+				      struct io_zc_rx_buf *buf)
+{
+	netmem_ref netmem = io_zc_buf_to_netmem(buf);
+
+	page_pool_set_pp_info(pp, netmem);
+	pp->alloc.cache[pp->alloc.count++] = netmem;
+}
+
+static inline u32 io_zc_rx_rqring_entries(struct io_zc_rx_ifq *ifq)
+{
+	u32 entries;
+
+	entries = smp_load_acquire(&ifq->rq_ring->tail) - ifq->cached_rq_head;
+	return min(entries, ifq->rq_entries);
+}
+
+static void io_zc_rx_ring_refill(struct page_pool *pp,
+				 struct io_zc_rx_ifq *ifq)
+{
+	unsigned int entries = io_zc_rx_rqring_entries(ifq);
+	unsigned int mask = ifq->rq_entries - 1;
+	struct io_zc_rx_pool *pool = ifq->pool;
+
+	if (unlikely(!entries))
+		return;
+
+	while (entries--) {
+		unsigned int rq_idx = ifq->cached_rq_head++ & mask;
+		struct io_uring_rbuf_rqe *rqe = &ifq->rqes[rq_idx];
+		u32 pgid = rqe->off / PAGE_SIZE;
+		struct io_zc_rx_buf *buf = &pool->bufs[pgid];
+
+		if (!io_zc_rx_put_buf_uref(buf))
+			continue;
+		io_zc_add_pp_cache(pp, buf);
+		if (pp->alloc.count >= PP_ALLOC_CACHE_REFILL)
+			break;
+	}
+	smp_store_release(&ifq->rq_ring->head, ifq->cached_rq_head);
+}
+
+static void io_zc_rx_refill_slow(struct page_pool *pp,
+				 struct io_zc_rx_ifq *ifq)
+{
+	struct io_zc_rx_pool *pool = ifq->pool;
+
+	spin_lock_bh(&pool->freelist_lock);
+	while (pool->free_count && pp->alloc.count < PP_ALLOC_CACHE_REFILL) {
+		struct io_zc_rx_buf *buf;
+		u32 pgid;
+
+		pgid = pool->freelist[--pool->free_count];
+		buf = &pool->bufs[pgid];
+
+		io_zc_add_pp_cache(pp, buf);
+		pp->pages_state_hold_cnt++;
+		trace_page_pool_state_hold(pp, io_zc_buf_to_netmem(buf),
+					   pp->pages_state_hold_cnt);
+	}
+	spin_unlock_bh(&pool->freelist_lock);
+}
+
+static void io_zc_rx_recycle_buf(struct io_zc_rx_pool *pool,
+				 struct io_zc_rx_buf *buf)
+{
+	spin_lock_bh(&pool->freelist_lock);
+	pool->freelist[pool->free_count++] = io_buf_pgid(pool, buf);
+	spin_unlock_bh(&pool->freelist_lock);
+}
+
+static netmem_ref io_pp_zc_alloc_pages(struct page_pool *pp, gfp_t gfp)
+{
+	struct io_zc_rx_ifq *ifq = pp->mp_priv;
+
+	/* pp should already be ensuring that */
+	if (unlikely(pp->alloc.count))
+		goto out_return;
+
+	io_zc_rx_ring_refill(pp, ifq);
+	if (likely(pp->alloc.count))
+		goto out_return;
+
+	io_zc_rx_refill_slow(pp, ifq);
+	if (!pp->alloc.count)
+		return 0;
+out_return:
+	return pp->alloc.cache[--pp->alloc.count];
+}
+
+static bool io_pp_zc_release_page(struct page_pool *pp, netmem_ref netmem)
+{
+	struct io_zc_rx_ifq *ifq = pp->mp_priv;
+	struct io_zc_rx_buf *buf;
+	struct net_iov *niov;
+
+	if (WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
+		return false;
+
+	niov = netmem_to_net_iov(netmem);
+	buf = io_niov_to_buf(niov);
+
+	if (io_zc_rx_buf_put(buf, 1))
+		io_zc_rx_recycle_buf(ifq->pool, buf);
+	return false;
+}
+
+static void io_pp_zc_scrub(struct page_pool *pp)
+{
+	struct io_zc_rx_ifq *ifq = pp->mp_priv;
+	struct io_zc_rx_pool *pool = ifq->pool;
+	int i;
+
+	for (i = 0; i < pool->nr_bufs; i++) {
+		struct io_zc_rx_buf *buf = &pool->bufs[i];
+		int count;
+
+		if (!io_zc_rx_put_buf_uref(buf))
+			continue;
+		io_zc_rx_recycle_buf(pool, buf);
+
+		count = atomic_inc_return_relaxed(&pp->pages_state_release_cnt);
+		trace_page_pool_state_release(pp, io_zc_buf_to_netmem(buf), count);
+	}
+}
+
+static int io_pp_zc_init(struct page_pool *pp)
+{
+	struct io_zc_rx_ifq *ifq = pp->mp_priv;
+
+	if (!ifq)
+		return -EINVAL;
+	if (pp->p.order != 0)
+		return -EINVAL;
+	if (!pp->p.napi)
+		return -EINVAL;
+	if (pp->p.flags & PP_FLAG_DMA_MAP)
+		return -EOPNOTSUPP;
+	if (pp->p.flags & PP_FLAG_DMA_SYNC_DEV)
+		return -EOPNOTSUPP;
+
+	percpu_ref_get(&ifq->ctx->refs);
+	ifq->pp = pp;
+	return 0;
+}
+
+static void io_pp_zc_destroy(struct page_pool *pp)
+{
+	struct io_zc_rx_ifq *ifq = pp->mp_priv;
+	struct io_zc_rx_pool *pool = ifq->pool;
+
+	ifq->pp = NULL;
+
+	if (WARN_ON_ONCE(pool->free_count != pool->nr_bufs))
+		return;
+	percpu_ref_put(&ifq->ctx->refs);
+}
+
+const struct memory_provider_ops io_uring_pp_zc_ops = {
+	.alloc_pages	= io_pp_zc_alloc_pages,
+	.release_page	= io_pp_zc_release_page,
+	.init		= io_pp_zc_init,
+	.destroy	= io_pp_zc_destroy,
+	.scrub		= io_pp_zc_scrub,
+};
+EXPORT_SYMBOL(io_uring_pp_zc_ops);
+
+
 #endif
diff --git a/io_uring/zc_rx.h b/io_uring/zc_rx.h
index 466b2b8f9813..c02bf8cabc6c 100644
--- a/io_uring/zc_rx.h
+++ b/io_uring/zc_rx.h
@@ -10,6 +10,9 @@
 #define IO_ZC_IFQ_IDX_OFFSET	16
 #define IO_ZC_IFQ_IDX_MASK	((1U << IO_ZC_IFQ_IDX_OFFSET) - 1)

+#define IO_ZC_RX_UREF		0x10000
+#define IO_ZC_RX_KREF_MASK	(IO_ZC_RX_UREF - 1)
+
 struct io_zc_rx_pool {
 	struct io_zc_rx_ifq	*ifq;
 	struct io_zc_rx_buf	*bufs;
@@ -26,10 +29,12 @@ struct io_zc_rx_ifq {
 	struct io_ring_ctx	*ctx;
 	struct net_device	*dev;
 	struct io_zc_rx_pool	*pool;
+	struct page_pool	*pp;

 	struct io_uring		*rq_ring;
 	struct io_uring_rbuf_rqe *rqes;
 	u32			rq_entries;
+	u32			cached_rq_head;

 	/* hw rx descriptor ring id */
 	u32			if_rxq_id;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index fc92e551ed13..f83ddbb4ebd8 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -460,7 +460,7 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
 	return false;
 }

-static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
+void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 {
 	netmem_set_pp(netmem, pool);
 	netmem_or_pp_magic(netmem, PP_SIGNATURE);
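To make the refill contract concrete, here is a userspace sketch of
returning one buffer; the mmap layout (ring header at rq_off.head/tail,
followed by the rqe array) comes from ifq registration, while the
variable names and the release-store choice are illustrative:

	/* only userspace writes the tail, only the kernel writes the head */
	unsigned mask = rq_entries - 1;
	unsigned tail = *rq_tail;	/* we are the sole tail writer */

	struct io_uring_rbuf_rqe *rqe = &rqes[tail & mask];
	rqe->off = done_off;		/* offset userspace is finished with */
	__atomic_store_n(rq_tail, tail + 1, __ATOMIC_RELEASE);

The kernel side pairs this with the smp_load_acquire() of the tail in
io_zc_rx_rqring_entries() and derives the buffer as
pool->bufs[rqe->off / PAGE_SIZE].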
From patchwork Tue Mar 12 21:44:26 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Subject: [RFC PATCH v4 12/16] io_uring/zcrx: implement PP_FLAG_DMA_* handling
Date: Tue, 12 Mar 2024 14:44:26 -0700
Message-ID: <20240312214430.2923019-13-dw@davidwei.uk>

From: Pavel Begunkov

The patch implements support for PP_FLAG_DMA_MAP and
PP_FLAG_DMA_SYNC_DEV: DMA-map buffers when creating a page pool if
needed, and unmap them on teardown. Most of the syncing is done by the
page pool, except when we're grabbing buffers from the refill ring, in
which case we need to do it by hand.
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 io_uring/zc_rx.c | 90 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 86 insertions(+), 4 deletions(-)

diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
index b2507df121fb..4bd27eda4bc9 100644
--- a/io_uring/zc_rx.c
+++ b/io_uring/zc_rx.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include

 #include

@@ -72,6 +73,7 @@ static int io_zc_rx_init_buf(struct page *page, struct io_zc_rx_buf *buf)
 {
 	memset(&buf->niov, 0, sizeof(buf->niov));
 	atomic_long_set(&buf->niov.pp_ref_count, 0);
+	page_pool_set_dma_addr_netmem(net_iov_to_netmem(&buf->niov), 0);

 	buf->page = page;
 	get_page(page);
@@ -392,12 +394,25 @@ static inline netmem_ref io_zc_buf_to_netmem(struct io_zc_rx_buf *buf)
 	return net_iov_to_netmem(&buf->niov);
 }

+static inline void io_zc_sync_for_device(struct page_pool *pp,
+					 netmem_ref netmem)
+{
+	if (pp->p.flags & PP_FLAG_DMA_SYNC_DEV) {
+		dma_addr_t dma_addr = page_pool_get_dma_addr_netmem(netmem);
+
+		dma_sync_single_range_for_device(pp->p.dev, dma_addr,
+						 pp->p.offset, pp->p.max_len,
+						 pp->p.dma_dir);
+	}
+}
+
 static inline void io_zc_add_pp_cache(struct page_pool *pp,
 				      struct io_zc_rx_buf *buf)
 {
 	netmem_ref netmem = io_zc_buf_to_netmem(buf);

 	page_pool_set_pp_info(pp, netmem);
+	io_zc_sync_for_device(pp, netmem);
 	pp->alloc.cache[pp->alloc.count++] = netmem;
 }

@@ -517,9 +532,71 @@ static void io_pp_zc_scrub(struct page_pool *pp)
 	}
 }

+#define IO_PP_DMA_ATTRS (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
+
+static void io_pp_unmap_buf(struct io_zc_rx_buf *buf, struct page_pool *pp)
+{
+	netmem_ref netmem = net_iov_to_netmem(&buf->niov);
+	dma_addr_t dma = page_pool_get_dma_addr_netmem(netmem);
+
+	dma_unmap_page_attrs(pp->p.dev, dma, PAGE_SIZE << pp->p.order,
+			     pp->p.dma_dir, IO_PP_DMA_ATTRS);
+	page_pool_set_dma_addr_netmem(netmem, 0);
+}
+
+static int io_pp_map_buf(struct io_zc_rx_buf *buf, struct page_pool *pp)
+{
+	netmem_ref netmem = net_iov_to_netmem(&buf->niov);
+	dma_addr_t dma_addr;
+	int ret;
+
+	dma_addr = dma_map_page_attrs(pp->p.dev, buf->page, 0,
+				      PAGE_SIZE << pp->p.order, pp->p.dma_dir,
+				      IO_PP_DMA_ATTRS);
+	ret = dma_mapping_error(pp->p.dev, dma_addr);
+	if (ret)
+		return ret;
+
+	if (WARN_ON_ONCE(page_pool_set_dma_addr_netmem(netmem, dma_addr))) {
+		dma_unmap_page_attrs(pp->p.dev, dma_addr,
+				     PAGE_SIZE << pp->p.order, pp->p.dma_dir,
+				     IO_PP_DMA_ATTRS);
+		return -EFAULT;
+	}
+
+	io_zc_sync_for_device(pp, netmem);
+	return 0;
+}
+
+static int io_pp_map_pool(struct io_zc_rx_pool *pool, struct page_pool *pp)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < pool->nr_bufs; i++) {
+		ret = io_pp_map_buf(&pool->bufs[i], pp);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		while (i--)
+			io_pp_unmap_buf(&pool->bufs[i], pp);
+	}
+	return ret;
+}
+
+static void io_pp_unmap_pool(struct io_zc_rx_pool *pool, struct page_pool *pp)
+{
+	int i;
+
+	for (i = 0; i < pool->nr_bufs; i++)
+		io_pp_unmap_buf(&pool->bufs[i], pp);
+}
+
 static int io_pp_zc_init(struct page_pool *pp)
 {
 	struct io_zc_rx_ifq *ifq = pp->mp_priv;
+	int ret;

 	if (!ifq)
 		return -EINVAL;
@@ -527,10 +604,12 @@ static int io_pp_zc_init(struct page_pool *pp)
 		return -EINVAL;
 	if (!pp->p.napi)
 		return -EINVAL;
-	if (pp->p.flags & PP_FLAG_DMA_MAP)
-		return -EOPNOTSUPP;
-	if (pp->p.flags & PP_FLAG_DMA_SYNC_DEV)
-		return -EOPNOTSUPP;
+
+	if (pp->p.flags & PP_FLAG_DMA_MAP) {
+		ret = io_pp_map_pool(ifq->pool, pp);
+		if (ret)
+			return ret;
+	}

 	percpu_ref_get(&ifq->ctx->refs);
 	ifq->pp = pp;
@@ -542,6 +621,9 @@ static void io_pp_zc_destroy(struct page_pool *pp)
 	struct io_zc_rx_ifq *ifq = pp->mp_priv;
 	struct io_zc_rx_pool *pool = ifq->pool;

+	if (pp->p.flags & PP_FLAG_DMA_MAP)
+		io_pp_unmap_pool(ifq->pool, pp);
+
 	ifq->pp = NULL;

 	if (WARN_ON_ONCE(pool->free_count != pool->nr_bufs))
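With this patch, a driver could create its Rx page pool with DMA
handled by the provider. A sketch of the parameters, based only on the
checks in io_pp_zc_init(); the device/napi hookup and sizes are
assumptions:

	struct page_pool_params pp_params = {
		.order		= 0,	/* the provider requires order-0 */
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.pool_size	= ring_size,
		.dev		= &pdev->dev,
		.napi		= &rxq->napi,	/* a napi is required too */
		.dma_dir	= DMA_FROM_DEVICE,
		.max_len	= PAGE_SIZE,
		.offset		= 0,
	};
	struct page_pool *pool = page_pool_create(&pp_params);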
From patchwork Tue Mar 12 21:44:27 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Subject: [RFC PATCH v4 13/16] io_uring: add io_recvzc request
Date: Tue, 12 Mar 2024 14:44:27 -0700
Message-ID: <20240312214430.2923019-14-dw@davidwei.uk>

Add an io_uring opcode OP_RECV_ZC for doing ZC reads from a socket that
is set up for ZC Rx. The request reads skbs from a socket, and
completions are posted into the main CQ for each page frag read.

Big CQEs (CQE32) are required, as the OP_RECV_ZC specific metadata (ZC
region, offset, len) is stored in the extended 16 bytes as a struct
io_uring_rbuf_cqe.

For now there is no limit as to how much work each OP_RECV_ZC request
does. It will attempt to drain a socket of all available data.

Multishot requests are also supported. The first time an io_recvzc
request completes, EAGAIN is returned, which arms an async poll. Then,
in subsequent runs in task work, IOU_ISSUE_SKIP_COMPLETE is returned to
continue async polling.
Signed-off-by: David Wei
---
 include/uapi/linux/io_uring.h |   1 +
 io_uring/io_uring.h           |  10 ++
 io_uring/net.c                |  94 +++++++++++++++++-
 io_uring/opdef.c              |  16 +++
 io_uring/zc_rx.c              | 177 +++++++++++++++++++++++++++++++++-
 io_uring/zc_rx.h              |  11 +++
 6 files changed, 302 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 26e945e6258d..ad2ec60b0390 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -256,6 +256,7 @@ enum io_uring_op {
 	IORING_OP_FUTEX_WAITV,
 	IORING_OP_FIXED_FD_INSTALL,
 	IORING_OP_FTRUNCATE,
+	IORING_OP_RECV_ZC,

 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 6426ee382276..cd1b3da96f62 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -180,6 +180,16 @@ static inline bool io_get_cqe(struct io_ring_ctx *ctx, struct io_uring_cqe **ret)
 	return io_get_cqe_overflow(ctx, ret, false);
 }

+static inline bool io_defer_get_uncommited_cqe(struct io_ring_ctx *ctx,
+					       struct io_uring_cqe **cqe_ret)
+{
+	io_lockdep_assert_cq_locked(ctx);
+
+	ctx->cq_extra++;
+	ctx->submit_state.flush_cqes = true;
+	return io_get_cqe(ctx, cqe_ret);
+}
+
 static __always_inline bool io_fill_cqe_req(struct io_ring_ctx *ctx,
 					    struct io_kiocb *req)
 {
diff --git a/io_uring/net.c b/io_uring/net.c
index 1fa7c1fa6b5d..56172335387e 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -79,6 +79,12 @@ struct io_sr_msg {
  */
 #define MULTISHOT_MAX_RETRY	32

+struct io_recvzc {
+	struct file	*file;
+	unsigned	msg_flags;
+	u16		flags;
+};
+
 static inline bool io_check_multishot(struct io_kiocb *req,
 				      unsigned int issue_flags)
 {
@@ -695,7 +701,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 	unsigned int cflags;

 	cflags = io_put_kbuf(req, issue_flags);
-	if (msg->msg_inq && msg->msg_inq != -1)
+	if (msg && msg->msg_inq && msg->msg_inq != -1)
 		cflags |= IORING_CQE_F_SOCK_NONEMPTY;

 	if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
@@ -723,7 +729,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 			goto enobufs;

 		/* Known not-empty or unknown state, retry */
-		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq == -1) {
+		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || (msg && msg->msg_inq == -1)) {
 			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY)
 				return false;
 			/* mshot retries exceeded, force a requeue */
@@ -1034,9 +1040,8 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	return ret;
 }

-static __maybe_unused
-struct io_zc_rx_ifq *io_zc_verify_sock(struct io_kiocb *req,
-				       struct socket *sock)
+static struct io_zc_rx_ifq *io_zc_verify_sock(struct io_kiocb *req,
+					      struct socket *sock)
 {
 	unsigned token = READ_ONCE(sock->zc_rx_idx);
 	unsigned ifq_idx = token >> IO_ZC_IFQ_IDX_OFFSET;
@@ -1053,6 +1058,85 @@ struct io_zc_rx_ifq *io_zc_verify_sock(struct io_kiocb *req,
 	return ifq;
 }

+int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
+
+	/* non-iopoll defer_taskrun only */
+	if (!req->ctx->task_complete)
+		return -EINVAL;
+	if (unlikely(sqe->file_index || sqe->addr2))
+		return -EINVAL;
+	if (READ_ONCE(sqe->len) || READ_ONCE(sqe->addr3))
+		return -EINVAL;
+
+	zc->flags = READ_ONCE(sqe->ioprio);
+	zc->msg_flags = READ_ONCE(sqe->msg_flags);
+
+	if (zc->msg_flags)
+		return -EINVAL;
+	if (zc->flags & ~RECVMSG_FLAGS)
+		return -EINVAL;
+	if (zc->flags & IORING_RECV_MULTISHOT)
+		req->flags |= REQ_F_APOLL_MULTISHOT;
+#ifdef CONFIG_COMPAT
+	if (req->ctx->compat)
+		zc->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+	return 0;
+}
+
+int io_recvzc(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
+	struct io_zc_rx_ifq *ifq;
+	struct socket *sock;
+	int ret;
+
+	/*
+	 * We're posting CQEs deeper in the stack, and to avoid taking CQ locks
+	 * we serialise by having only the master thread modifying the CQ with
+	 * DEFER_TASKRUN checked earlier and forbidding executing it from io-wq.
+	 * That's similar to io_check_multishot() for multishot CQEs.
+	 */
+	if (issue_flags & IO_URING_F_IOWQ)
+		return -EAGAIN;
+	if (WARN_ON_ONCE(!(issue_flags & IO_URING_F_NONBLOCK)))
+		return -EAGAIN;
+	if (!(req->flags & REQ_F_POLLED) &&
+	    (zc->flags & IORING_RECVSEND_POLL_FIRST))
+		return -EAGAIN;
+
+	sock = sock_from_file(req->file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+	ifq = io_zc_verify_sock(req, sock);
+	if (!ifq)
+		return -EINVAL;
+
+	ret = io_zc_rx_recv(req, ifq, sock, zc->msg_flags | MSG_DONTWAIT);
+	if (unlikely(ret <= 0)) {
+		if (ret == -EAGAIN) {
+			if (issue_flags & IO_URING_F_MULTISHOT)
+				return IOU_ISSUE_SKIP_COMPLETE;
+			return -EAGAIN;
+		}
+		if (ret == -ERESTARTSYS)
+			ret = -EINTR;
+
+		req_set_fail(req);
+		io_req_set_res(req, ret, 0);
+
+		if (issue_flags & IO_URING_F_MULTISHOT)
+			return IOU_STOP_MULTISHOT;
+		return IOU_OK;
+	}
+
+	if (issue_flags & IO_URING_F_MULTISHOT)
+		return IOU_ISSUE_SKIP_COMPLETE;
+	return -EAGAIN;
+}
+
 void io_send_zc_cleanup(struct io_kiocb *req)
 {
 	struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg);
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9c080aadc5a6..78ec5197917e 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -36,6 +36,7 @@
 #include "waitid.h"
 #include "futex.h"
 #include "truncate.h"
+#include "zc_rx.h"

 static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags)
 {
@@ -481,6 +482,18 @@ const struct io_issue_def io_issue_defs[] = {
 		.prep			= io_ftruncate_prep,
 		.issue			= io_ftruncate,
 	},
+	[IORING_OP_RECV_ZC] = {
+		.needs_file		= 1,
+		.unbound_nonreg_file	= 1,
+		.pollin			= 1,
+		.ioprio			= 1,
+#if defined(CONFIG_NET)
+		.prep			= io_recvzc_prep,
+		.issue			= io_recvzc,
+#else
+		.prep			= io_eopnotsupp_prep,
+#endif
+	},
 };

 const struct io_cold_def io_cold_defs[] = {
@@ -722,6 +735,9 @@ const struct io_cold_def io_cold_defs[] = {
 	[IORING_OP_FTRUNCATE] = {
 		.name = "FTRUNCATE",
 	},
+	[IORING_OP_RECV_ZC] = {
+		.name = "RECV_ZC",
+	},
 };

 const char *io_uring_get_opcode(u8 opcode)
diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
index 4bd27eda4bc9..bb9251111735 100644
--- a/io_uring/zc_rx.c
+++ b/io_uring/zc_rx.c
@@ -6,10 +6,12 @@
 #include
 #include
 #include
+
+#include
 #include
 #include
+
 #include
-#include

 #include

@@ -18,6 +20,12 @@
 #include "zc_rx.h"
 #include "rsrc.h"

+struct io_zc_rx_args {
+	struct io_kiocb		*req;
+	struct io_zc_rx_ifq	*ifq;
+	struct socket		*sock;
+};
+
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);

 static int __io_queue_mgmt(struct net_device *dev, struct io_zc_rx_ifq *ifq,
@@ -371,7 +379,7 @@ static inline unsigned io_buf_pgid(struct io_zc_rx_pool *pool,
 	return buf - pool->bufs;
 }

-static __maybe_unused void io_zc_rx_get_buf_uref(struct io_zc_rx_buf *buf)
+static void io_zc_rx_get_buf_uref(struct io_zc_rx_buf *buf)
 {
 	atomic_long_add(IO_ZC_RX_UREF, &buf->niov.pp_ref_count);
 }
+{
+	struct io_uring_rbuf_cqe *rcqe;
+	struct io_uring_cqe *cqe;
+
+	if (!io_defer_get_uncommited_cqe(req->ctx, &cqe))
+		return false;
+
+	cqe->user_data = req->cqe.user_data;
+	cqe->res = 0;
+	cqe->flags = IORING_CQE_F_MORE;
+
+	rcqe = (struct io_uring_rbuf_cqe *)(cqe + 1);
+	rcqe->region = 0;
+	rcqe->off = io_buf_pgid(ifq->pool, buf) * PAGE_SIZE + off;
+	rcqe->len = len;
+	memset(rcqe->__pad, 0, sizeof(rcqe->__pad));
+	return true;
+}
+
+static int zc_rx_recv_frag(struct io_kiocb *req, struct io_zc_rx_ifq *ifq,
+			   const skb_frag_t *frag, int off, int len)
+{
+	off += skb_frag_off(frag);
+
+	if (likely(skb_frag_is_net_iov(frag))) {
+		struct io_zc_rx_buf *buf;
+		struct net_iov *niov;
+
+		niov = netmem_to_net_iov(frag->netmem);
+		if (niov->pp->mp_ops != &io_uring_pp_zc_ops ||
+		    niov->pp->mp_priv != ifq)
+			return -EFAULT;
+
+		buf = io_niov_to_buf(niov);
+		if (!zc_rx_queue_cqe(req, buf, ifq, off, len))
+			return -ENOSPC;
+		io_zc_rx_get_buf_uref(buf);
+	} else {
+		return -EOPNOTSUPP;
+	}
+
+	return len;
+}
+
+static int
+zc_rx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb,
+	       unsigned int offset, size_t len)
+{
+	struct io_zc_rx_args *args = desc->arg.data;
+	struct io_zc_rx_ifq *ifq = args->ifq;
+	struct io_kiocb *req = args->req;
+	struct sk_buff *frag_iter;
+	unsigned start, start_off;
+	int i, copy, end, off;
+	int ret = 0;
+
+	start = skb_headlen(skb);
+	start_off = offset;
+
+	if (offset < start)
+		return -EOPNOTSUPP;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		const skb_frag_t *frag;
+
+		if (WARN_ON(start > offset + len))
+			return -EFAULT;
+
+		frag = &skb_shinfo(skb)->frags[i];
+		end = start + skb_frag_size(frag);
+
+		if (offset < end) {
+			copy = end - offset;
+			if (copy > len)
+				copy = len;
+
+			off = offset - start;
+			ret = zc_rx_recv_frag(req, ifq, frag, off, copy);
+			if (ret < 0)
+				goto out;
+
+			offset += ret;
+			len -= ret;
+			if (len == 0 || ret != copy)
+				goto out;
+		}
+		start = end;
+	}
+
+	skb_walk_frags(skb, frag_iter) {
+		if (WARN_ON(start > offset + len))
+			return -EFAULT;
+
+		end = start + frag_iter->len;
+		if (offset < end) {
+			copy = end - offset;
+			if (copy > len)
+				copy = len;
+
+			off = offset - start;
+			ret = zc_rx_recv_skb(desc, frag_iter, off, copy);
+			if (ret < 0)
+				goto out;
+
+			offset += ret;
+			len -= ret;
+			if (len == 0 || ret != copy)
+				goto out;
+		}
+		start = end;
+	}
+
+out:
+	if (offset == start_off)
+		return ret;
+	return offset - start_off;
+}
+
+static int io_zc_rx_tcp_recvmsg(struct io_kiocb *req, struct io_zc_rx_ifq *ifq,
+				struct sock *sk, int flags)
+{
+	struct io_zc_rx_args args = {
+		.req = req,
+		.ifq = ifq,
+		.sock = sk->sk_socket,
+	};
+	read_descriptor_t rd_desc = {
+		.count = 1,
+		.arg.data = &args,
+	};
+	int ret;
+
+	lock_sock(sk);
+	ret = tcp_read_sock(sk, &rd_desc, zc_rx_recv_skb);
+	if (ret <= 0) {
+		if (ret < 0 || sock_flag(sk, SOCK_DONE))
+			goto out;
+		if (sk->sk_err)
+			ret = sock_error(sk);
+		else if (sk->sk_shutdown & RCV_SHUTDOWN)
+			goto out;
+		else if (sk->sk_state == TCP_CLOSE)
+			ret = -ENOTCONN;
+		else
+			ret = -EAGAIN;
+	}
+out:
+	release_sock(sk);
+	return ret;
+}
+
+int io_zc_rx_recv(struct io_kiocb *req, struct io_zc_rx_ifq *ifq,
+		  struct socket *sock, unsigned int flags)
+{
+	struct sock *sk = sock->sk;
+	const struct proto *prot = READ_ONCE(sk->sk_prot);
+
+	if (prot->recvmsg != tcp_recvmsg)
+		return -EPROTONOSUPPORT;
+
+	sock_rps_record_flow(sk);
+	return io_zc_rx_tcp_recvmsg(req, ifq, sk, flags);
+}
 #endif
diff --git a/io_uring/zc_rx.h b/io_uring/zc_rx.h
index c02bf8cabc6c..c14ea3cf544a 100644
--- a/io_uring/zc_rx.h
+++ b/io_uring/zc_rx.h
@@ -50,6 +50,8 @@ void io_unregister_zc_rx_ifqs(struct io_ring_ctx *ctx);
 void io_shutdown_zc_rx_ifqs(struct io_ring_ctx *ctx);
 int io_register_zc_rx_sock(struct io_ring_ctx *ctx,
 			   struct io_uring_zc_rx_sock_reg __user *arg);
+int io_zc_rx_recv(struct io_kiocb *req, struct io_zc_rx_ifq *ifq,
+		  struct socket *sock, unsigned int flags);
 #else
 static inline int io_register_zc_rx_ifq(struct io_ring_ctx *ctx,
 					struct io_uring_zc_rx_ifq_reg __user *arg)
@@ -67,6 +69,15 @@ static inline int io_register_zc_rx_sock(struct io_ring_ctx *ctx,
 {
 	return -EOPNOTSUPP;
 }
+
+static inline int io_zc_rx_recv(struct io_kiocb *req, struct io_zc_rx_ifq *ifq,
+				struct socket *sock, unsigned int flags)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
+int io_recvzc(struct io_kiocb *req, unsigned int issue_flags);
+int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+
 #endif
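[For orientation, a minimal sketch of how userspace might drive this opcode.
IORING_OP_RECV_ZC, IORING_CQE_F_MORE and the struct io_uring_rbuf_cqe layout
come from this series; there is no liburing helper for the RFC opcode, so the
sqe is filled by hand. The big-CQE parsing mirrors how zc_rx_queue_cqe()
writes past the base cqe, and consume_buffer() is a hypothetical app hook.]

/* Hedged sketch: multishot RECV_ZC against a connected TCP socket.
 * Assumes a ring set up with the flags this series requires (deferred
 * task running, 32-byte CQEs) and an already registered zc rx ifq.
 */
#include <string.h>
#include <liburing.h>

static int queue_recv_zc(struct io_uring *ring, int sockfd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	if (!sqe)
		return -EBUSY;
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_RECV_ZC;	/* from this patchset's headers */
	sqe->fd = sockfd;
	sqe->user_data = 0xfeed;
	return io_uring_submit(ring);
}

static void drain_zc_cqes(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	while (!io_uring_wait_cqe(ring, &cqe)) {
		/* zc_rx_queue_cqe() stores buffer info past the base cqe */
		struct io_uring_rbuf_cqe *rcqe = (void *)(cqe + 1);

		if (cqe->flags & IORING_CQE_F_MORE)
			consume_buffer(rcqe->region, rcqe->off, rcqe->len);
		io_uring_cqe_seen(ring, cqe);
	}
}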
From patchwork Tue Mar 12 21:44:28 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry
Subject: [RFC PATCH v4 14/16] net: execute custom callback from napi
Date: Tue, 12 Mar 2024 14:44:28 -0700
Message-ID: <20240312214430.2923019-15-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>
X-Patchwork-State: RFC

From: Pavel Begunkov

Sometimes we want to access a napi-protected resource from task
context, as when io_uring zero-copy rx falls back to copying and needs
the buffer ring. Add a helper function that allows executing a custom
callback from napi context by first stopping napi, similarly to
napi_busy_loop().

Experimental; it needs more polishing and should share bits with
napi_busy_loop().
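[To make the intended calling pattern concrete, a minimal sketch of a
task-context caller. Only the napi_execute() signature and
page_pool_alloc_netmem() are taken from this series; the refill callback
and its context struct are hypothetical.]

/* Hedged sketch: running a callback under napi protection from task
 * context. Anything prefixed my_ is made up for illustration.
 */
struct my_refill_ctx {
	struct page_pool *pp;
	netmem_ref netmem;	/* out: filled in by the callback */
};

static void my_refill_cb(void *data)
{
	struct my_refill_ctx *ctx = data;

	/* Safe: napi is stopped, so the pool's alloc cache is ours. */
	ctx->netmem = page_pool_alloc_netmem(ctx->pp,
					     GFP_ATOMIC | __GFP_NOWARN);
}

static netmem_ref my_refill_from_task(struct napi_struct *napi,
				      struct page_pool *pp)
{
	struct my_refill_ctx ctx = { .pp = pp };

	napi_execute(napi, my_refill_cb, &ctx);
	return ctx.netmem;
}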
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/busy_poll.h |  7 +++++++
 net/core/dev.c          | 46 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 9b09acac538e..9f4a40898118 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -47,6 +47,8 @@ bool sk_busy_loop_end(void *p, unsigned long start_time);
 void napi_busy_loop(unsigned int napi_id,
 		    bool (*loop_end)(void *, unsigned long),
 		    void *loop_end_arg, bool prefer_busy_poll, u16 budget);
+void napi_execute(struct napi_struct *napi,
+		  void (*cb)(void *), void *cb_arg);
 
 void napi_busy_loop_rcu(unsigned int napi_id,
 			bool (*loop_end)(void *, unsigned long),
@@ -63,6 +65,11 @@ static inline bool sk_can_busy_loop(struct sock *sk)
 	return false;
 }
 
+static inline void napi_execute(struct napi_struct *napi,
+				void (*cb)(void *), void *cb_arg)
+{
+}
+
 #endif /* CONFIG_NET_RX_BUSY_POLL */
 
 static inline unsigned long busy_loop_current_time(void)
diff --git a/net/core/dev.c b/net/core/dev.c
index 2096ff57685a..4de173667233 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6663,6 +6663,52 @@ void napi_busy_loop(unsigned int napi_id,
 }
 EXPORT_SYMBOL(napi_busy_loop);
 
+void napi_execute(struct napi_struct *napi,
+		  void (*cb)(void *), void *cb_arg)
+{
+	bool done = false;
+	unsigned long val;
+	void *have_poll_lock = NULL;
+
+	rcu_read_lock();
+
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
+	for (;;) {
+		local_bh_disable();
+		val = READ_ONCE(napi->state);
+
+		/* If multiple threads are competing for this napi,
+		 * we avoid dirtying napi->state as much as we can.
+		 */
+		if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED |
+			   NAPIF_STATE_IN_BUSY_POLL))
+			goto restart;
+
+		if (cmpxchg(&napi->state, val,
+			    val | NAPIF_STATE_IN_BUSY_POLL |
+				  NAPIF_STATE_SCHED) != val)
+			goto restart;
+
+		have_poll_lock = netpoll_poll_lock(napi);
+		cb(cb_arg);
+		done = true;
+		gro_normal_list(napi);
+		local_bh_enable();
+		break;
+restart:
+		local_bh_enable();
+		if (unlikely(need_resched()))
+			break;
+		cpu_relax();
+	}
+	if (done)
+		busy_poll_stop(napi, have_poll_lock, false, 1);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
+	rcu_read_unlock();
+}
+
 #endif /* CONFIG_NET_RX_BUSY_POLL */
 
 static void napi_hash_add(struct napi_struct *napi)
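[The ownership rule the loop above relies on, restated as a hedged,
standalone illustration; simplified from the patch, not kernel code.]

/* Whoever atomically sets NAPIF_STATE_SCHED owns the napi instance, so
 * the softirq poller stays away until busy_poll_stop() releases it.
 */
static bool napi_try_own(struct napi_struct *napi)
{
	unsigned long val = READ_ONCE(napi->state);

	/* Someone else owns it, or it is being disabled: caller retries. */
	if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED |
		   NAPIF_STATE_IN_BUSY_POLL))
		return false;

	/* One atomic winner; losers see a changed state and restart. */
	return cmpxchg(&napi->state, val,
		       val | NAPIF_STATE_SCHED |
			     NAPIF_STATE_IN_BUSY_POLL) == val;
}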
From patchwork Tue Mar 12 21:44:29 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry
Subject: [RFC PATCH v4 15/16] io_uring/zcrx: add copy fallback
Date: Tue, 12 Mar 2024 14:44:29 -0700
Message-ID: <20240312214430.2923019-16-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>
X-Patchwork-State: RFC

Currently, if the user fails to keep up with the network and doesn't
refill the buffer ring fast enough, the NIC/driver will start dropping
packets. That might be too punishing. Add a fallback path that allows
drivers to allocate normal pages when zerocopy buffers are starved;
zc_rx_recv_skb() will then detect them and copy the data into the
user-provided buffers once those become available. That should help
with adoption, and it also helps the user strike the right balance:
allocating just the right amount of zerocopy buffers while staying
resilient to sudden surges in traffic.
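[Condensed, the per-frag decision this patch introduces looks roughly
like the sketch below; the two helpers are hypothetical stand-ins for
the real logic in zc_rx_recv_frag() in the diff that follows.]

/* Hedged restatement: zerocopy for provider-owned net_iov frags,
 * copy fallback for ordinary page frags.
 */
static int recv_frag_sketch(struct io_kiocb *req, struct io_zc_rx_ifq *ifq,
			    const skb_frag_t *frag, int off, int len)
{
	if (skb_frag_is_net_iov(frag))
		/* Buffer already belongs to the user: post a CQE pointing
		 * into the zc region and take a user reference.
		 */
		return zero_copy_frag(req, ifq, frag, off, len);

	/* Driver fell back to normal pages: pull zc buffers via
	 * napi_execute() and memcpy the payload into them.
	 */
	return copy_frag_to_zc_bufs(req, ifq, frag, off, len);
}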
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 io_uring/zc_rx.c | 111 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 105 insertions(+), 6 deletions(-)

diff --git a/io_uring/zc_rx.c b/io_uring/zc_rx.c
index bb9251111735..d5f49590e682 100644
--- a/io_uring/zc_rx.c
+++ b/io_uring/zc_rx.c
@@ -8,6 +8,7 @@
 #include
 #include
+#include
 #include
 #include
 
@@ -26,6 +27,11 @@ struct io_zc_rx_args {
 	struct socket *sock;
 };
 
+struct io_zc_refill_data {
+	struct io_zc_rx_ifq *ifq;
+	struct io_zc_rx_buf *buf;
+};
+
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 
 static int __io_queue_mgmt(struct net_device *dev, struct io_zc_rx_ifq *ifq,
@@ -648,6 +654,34 @@ const struct memory_provider_ops io_uring_pp_zc_ops = {
 };
 EXPORT_SYMBOL(io_uring_pp_zc_ops);
 
+static void io_napi_refill(void *data)
+{
+	struct io_zc_refill_data *rd = data;
+	struct io_zc_rx_ifq *ifq = rd->ifq;
+	netmem_ref netmem;
+
+	if (WARN_ON_ONCE(!ifq->pp))
+		return;
+
+	netmem = page_pool_alloc_netmem(ifq->pp, GFP_ATOMIC | __GFP_NOWARN);
+	if (!netmem)
+		return;
+	if (WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
+		return;
+
+	rd->buf = io_niov_to_buf(netmem_to_net_iov(netmem));
+}
+
+static struct io_zc_rx_buf *io_zc_get_buf_task_safe(struct io_zc_rx_ifq *ifq)
+{
+	struct io_zc_refill_data rd = {
+		.ifq = ifq,
+	};
+
+	napi_execute(ifq->pp->p.napi, io_napi_refill, &rd);
+	return rd.buf;
+}
+
 static bool zc_rx_queue_cqe(struct io_kiocb *req, struct io_zc_rx_buf *buf,
 			    struct io_zc_rx_ifq *ifq, int off, int len)
 {
@@ -669,6 +703,42 @@ static bool zc_rx_queue_cqe(struct io_kiocb *req, struct io_zc_rx_buf *buf,
 	return true;
 }
 
+static ssize_t zc_rx_copy_chunk(struct io_kiocb *req, struct io_zc_rx_ifq *ifq,
+				void *data, unsigned int offset, size_t len)
+{
+	size_t copy_size, copied = 0;
+	struct io_zc_rx_buf *buf;
+	int ret = 0, off = 0;
+	u8 *vaddr;
+
+	do {
+		buf = io_zc_get_buf_task_safe(ifq);
+		if (!buf) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		vaddr = kmap_local_page(buf->page);
+		copy_size = min_t(size_t, PAGE_SIZE, len);
+		memcpy(vaddr, data + offset, copy_size);
+		kunmap_local(vaddr);
+
+		if (!zc_rx_queue_cqe(req, buf, ifq, off, copy_size)) {
+			napi_pp_put_page(net_iov_to_netmem(&buf->niov), false);
+			return -ENOSPC;
+		}
+
+		io_zc_rx_get_buf_uref(buf);
+		napi_pp_put_page(net_iov_to_netmem(&buf->niov), false);
+
+		offset += copy_size;
+		len -= copy_size;
+		copied += copy_size;
+	} while (len);
+
+	return copied ? copied : ret;
+}
+
 static int zc_rx_recv_frag(struct io_kiocb *req, struct io_zc_rx_ifq *ifq,
			    const skb_frag_t *frag, int off, int len)
 {
@@ -688,7 +758,22 @@ static int zc_rx_recv_frag(struct io_kiocb *req, struct io_zc_rx_ifq *ifq,
 			return -ENOSPC;
 		io_zc_rx_get_buf_uref(buf);
 	} else {
-		return -EOPNOTSUPP;
+		struct page *page = skb_frag_page(frag);
+		u32 p_off, p_len, t, copied = 0;
+		u8 *vaddr;
+		int ret = 0;
+
+		skb_frag_foreach_page(frag, off, len,
+				      page, p_off, p_len, t) {
+			vaddr = kmap_local_page(page);
+			ret = zc_rx_copy_chunk(req, ifq, vaddr, p_off, p_len);
+			kunmap_local(vaddr);
+
+			if (ret < 0)
+				return copied ? copied : ret;
+			copied += ret;
+		}
+		len = copied;
 	}
 
 	return len;
@@ -702,15 +787,29 @@ zc_rx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb,
 	struct io_zc_rx_ifq *ifq = args->ifq;
 	struct io_kiocb *req = args->req;
 	struct sk_buff *frag_iter;
-	unsigned start, start_off;
+	unsigned start, start_off = offset;
 	int i, copy, end, off;
 	int ret = 0;
 
-	start = skb_headlen(skb);
-	start_off = offset;
+	if (unlikely(offset < skb_headlen(skb))) {
+		ssize_t copied;
+		size_t to_copy;
 
-	if (offset < start)
-		return -EOPNOTSUPP;
+		to_copy = min_t(size_t, skb_headlen(skb) - offset, len);
+		copied = zc_rx_copy_chunk(req, ifq, skb->data, offset, to_copy);
+		if (copied < 0) {
+			ret = copied;
+			goto out;
+		}
+		offset += copied;
+		len -= copied;
+		if (!len)
+			goto out;
+		if (offset != skb_headlen(skb))
+			goto out;
+	}
+
+	start = skb_headlen(skb);
 
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 		const skb_frag_t *frag;
From patchwork Tue Mar 12 21:44:30 2024
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry
Subject: [RFC PATCH v4 16/16] veth: add support for io_uring zc rx
Date: Tue, 12 Mar 2024 14:44:30 -0700
Message-ID: <20240312214430.2923019-17-dw@davidwei.uk>
In-Reply-To: <20240312214430.2923019-1-dw@davidwei.uk>
References: <20240312214430.2923019-1-dw@davidwei.uk>
X-Patchwork-State: RFC

From: Pavel Begunkov

Not for upstream, testing only.

Add io_uring zerocopy support for veth. It's not truly zerocopy: the
data is copied in napi, but that happens early in the stack, so it is
useful for testing for now.
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 drivers/net/veth.c | 214 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 208 insertions(+), 6 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 500b9dfccd08..b56e06113453 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -26,6 +26,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 
 #define DRV_NAME "veth"
@@ -67,6 +69,7 @@ struct veth_rq {
 	struct ptr_ring xdp_ring;
 	struct xdp_rxq_info xdp_rxq;
 	struct page_pool *page_pool;
+	struct netdev_rx_queue rq;
 };
 
 struct veth_priv {
@@ -75,6 +78,7 @@ struct veth_priv {
 	struct bpf_prog *_xdp_prog;
 	struct veth_rq *rq;
 	unsigned int requested_headroom;
+	bool zc_installed;
 };
 
 struct veth_xdp_tx_bq {
@@ -335,9 +339,12 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev,
					 const struct net_device *rcv,
					 const struct sk_buff *skb)
 {
+	struct veth_priv *rcv_priv = netdev_priv(rcv);
+
 	return !(dev->features & NETIF_F_ALL_TSO) ||
 		(skb->destructor == sock_wfree &&
-		 rcv->features & (NETIF_F_GRO_FRAGLIST | NETIF_F_GRO_UDP_FWD));
+		 rcv->features & (NETIF_F_GRO_FRAGLIST | NETIF_F_GRO_UDP_FWD)) ||
+		rcv_priv->zc_installed;
 }
 
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -726,6 +733,9 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 	struct sk_buff *skb = *pskb;
 	u32 frame_sz;
 
+	if (WARN_ON_ONCE(1))
+		return -EFAULT;
+
 	if (skb_shared(skb) || skb_head_is_locked(skb) ||
	    skb_shinfo(skb)->nr_frags ||
	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
@@ -758,6 +768,90 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 	return -ENOMEM;
 }
 
+static noinline struct sk_buff *veth_iou_rcv_skb(struct veth_rq *rq,
+						 struct sk_buff *skb)
+{
+	struct sk_buff *nskb;
+	u32 size, len, off, max_head_size;
+	struct page *page;
+	int ret, i, head_off;
+	void *vaddr;
+
+	/* Testing only, randomly send normal pages to test copy fallback */
+	if (ktime_get_ns() % 16 == 0)
+		return skb;
+
+	skb_prepare_for_gro(skb);
+	max_head_size = skb_headlen(skb);
+
+	rcu_read_lock();
+	nskb = napi_alloc_skb(&rq->xdp_napi, max_head_size);
+	if (!nskb)
+		goto drop;
+
+	skb_copy_header(nskb, skb);
+	skb_mark_for_recycle(nskb);
+
+	size = max_head_size;
+	if (skb_copy_bits(skb, 0, nskb->data, size)) {
+		consume_skb(nskb);
+		goto drop;
+	}
+	skb_put(nskb, size);
+	head_off = skb_headroom(nskb) - skb_headroom(skb);
+	skb_headers_offset_update(nskb, head_off);
+
+	/* Allocate paged area of new skb */
+	off = size;
+	len = skb->len - off;
+
+	for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
+		struct io_zc_rx_buf *buf;
+		netmem_ref netmem;
+
+		netmem = page_pool_alloc_netmem(rq->page_pool,
+						GFP_ATOMIC | __GFP_NOWARN);
+		if (!netmem) {
+			consume_skb(nskb);
+			goto drop;
+		}
+		if (WARN_ON_ONCE(!netmem_is_net_iov(netmem))) {
+			consume_skb(nskb);
+			goto drop;
+		}
+
+		buf = container_of(netmem_to_net_iov(netmem),
+				   struct io_zc_rx_buf, niov);
+		page = buf->page;
+
+		if (WARN_ON_ONCE(buf->niov.pp != rq->page_pool))
+			goto drop;
+
+		size = min_t(u32, len, PAGE_SIZE);
+		skb_add_rx_frag_netmem(nskb, i, netmem, 0, size, PAGE_SIZE);
+
+		vaddr = kmap_atomic(page);
+		ret = skb_copy_bits(skb, off, vaddr, size);
+		kunmap_atomic(vaddr);
+
+		if (ret) {
+			consume_skb(nskb);
+			goto drop;
+		}
+		len -= size;
+		off += size;
+	}
+	rcu_read_unlock();
+
+	consume_skb(skb);
+	skb = nskb;
+	return skb;
+drop:
+	rcu_read_unlock();
+	kfree_skb(skb);
+	return NULL;
+}
+
 static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
					struct sk_buff *skb,
					struct veth_xdp_tx_bq *bq,
@@ -901,8 +995,13 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
 			/* ndo_start_xmit */
 			struct sk_buff *skb = ptr;
 
-			stats->xdp_bytes += skb->len;
-			skb = veth_xdp_rcv_skb(rq, skb, bq, stats);
+			if (rq->page_pool->mp_ops == &io_uring_pp_zc_ops) {
+				skb = veth_iou_rcv_skb(rq, skb);
+			} else {
+				stats->xdp_bytes += skb->len;
+				skb = veth_xdp_rcv_skb(rq, skb, bq, stats);
+			}
+
 			if (skb) {
 				if (skb_shared(skb) || skb_unclone(skb, GFP_ATOMIC))
 					netif_receive_skb(skb);
@@ -961,15 +1060,22 @@ static int veth_poll(struct napi_struct *napi, int budget)
 	return done;
 }
 
-static int veth_create_page_pool(struct veth_rq *rq)
+static int veth_create_page_pool(struct veth_rq *rq, struct io_zc_rx_ifq *ifq)
 {
 	struct page_pool_params pp_params = {
 		.order = 0,
 		.pool_size = VETH_RING_SIZE,
 		.nid = NUMA_NO_NODE,
 		.dev = &rq->dev->dev,
+		.napi = &rq->xdp_napi,
 	};
 
+	if (ifq) {
+		rq->rq.pp_private = ifq;
+		rq->rq.pp_ops = &io_uring_pp_zc_ops;
+		pp_params.queue = &rq->rq;
+	}
+
 	rq->page_pool = page_pool_create(&pp_params);
 	if (IS_ERR(rq->page_pool)) {
 		int err = PTR_ERR(rq->page_pool);
@@ -987,7 +1093,7 @@ static int __veth_napi_enable_range(struct net_device *dev, int start, int end)
 	int err, i;
 
 	for (i = start; i < end; i++) {
-		err = veth_create_page_pool(&priv->rq[i]);
+		err = veth_create_page_pool(&priv->rq[i], NULL);
 		if (err)
 			goto err_page_pool;
 	}
@@ -1043,9 +1149,17 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
 
 	for (i = start; i < end; i++) {
 		struct veth_rq *rq = &priv->rq[i];
+		void *ptr;
+		int nr = 0;
 
 		rq->rx_notify_masked = false;
-		ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free);
+
+		while ((ptr = ptr_ring_consume(&rq->xdp_ring))) {
+			veth_ptr_free(ptr);
+			nr++;
+		}
+
+		ptr_ring_cleanup(&rq->xdp_ring, NULL);
 	}
 
 	for (i = start; i < end; i++) {
@@ -1281,6 +1395,9 @@ static int veth_set_channels(struct net_device *dev,
 	struct net_device *peer;
 	int err;
 
+	if (priv->zc_installed)
+		return -EINVAL;
+
 	/* sanity check. Upper bounds are already enforced by the caller */
 	if (!ch->rx_count || !ch->tx_count)
 		return -EINVAL;
@@ -1358,6 +1475,8 @@ static int veth_open(struct net_device *dev)
 	struct net_device *peer = rtnl_dereference(priv->peer);
 	int err;
 
+	priv->zc_installed = false;
+
 	if (!peer)
 		return -ENOTCONN;
@@ -1536,6 +1655,84 @@ static void veth_set_rx_headroom(struct net_device *dev, int new_hr)
 	rcu_read_unlock();
 }
 
+static int __veth_iou_set(struct net_device *dev,
+			  struct netdev_bpf *xdp)
+{
+	bool napi_already_on = veth_gro_requested(dev) && (dev->flags & IFF_UP);
+	unsigned qid = xdp->zc_rx.queue_id;
+	struct veth_priv *priv = netdev_priv(dev);
+	struct net_device *peer;
+	struct veth_rq *rq;
+	int ret;
+
+	if (priv->_xdp_prog)
+		return -EINVAL;
+	if (qid >= dev->real_num_rx_queues)
+		return -EINVAL;
+	if (!(dev->flags & IFF_UP))
+		return -EOPNOTSUPP;
+	if (dev->real_num_rx_queues != 1)
+		return -EINVAL;
+	rq = &priv->rq[qid];
+
+	if (!xdp->zc_rx.ifq) {
+		if (!priv->zc_installed)
+			return -EINVAL;
+
+		veth_napi_del(dev);
+		priv->zc_installed = false;
+		if (!veth_gro_requested(dev) && netif_running(dev)) {
+			dev->features &= ~NETIF_F_GRO;
+			netdev_features_change(dev);
+		}
+		return 0;
+	}
+
+	if (priv->zc_installed)
+		return -EINVAL;
+
+	peer = rtnl_dereference(priv->peer);
+	peer->hw_features &= ~NETIF_F_GSO_SOFTWARE;
+
+	ret = veth_create_page_pool(rq, xdp->zc_rx.ifq);
+	if (ret)
+		return ret;
+
+	ret = ptr_ring_init(&rq->xdp_ring, VETH_RING_SIZE, GFP_KERNEL);
+	if (ret) {
+		page_pool_destroy(rq->page_pool);
+		rq->page_pool = NULL;
+		return ret;
+	}
+
+	priv->zc_installed = true;
+
+	if (!veth_gro_requested(dev)) {
+		/* user-space did not require GRO, but adding XDP
+		 * is supposed to get GRO working
+		 */
+		dev->features |= NETIF_F_GRO;
+		netdev_features_change(dev);
+	}
+	if (!napi_already_on) {
+		netif_napi_add(dev, &rq->xdp_napi, veth_poll);
+		napi_enable(&rq->xdp_napi);
+		rcu_assign_pointer(rq->napi, &rq->xdp_napi);
+	}
+	return 0;
+}
+
+static int veth_iou_set(struct net_device *dev,
+			struct netdev_bpf *xdp)
+{
+	int ret;
+
+	rtnl_lock();
+	ret = __veth_iou_set(dev, xdp);
+	rtnl_unlock();
+	return ret;
+}
+
 static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
			struct netlink_ext_ack *extack)
 {
@@ -1545,6 +1742,9 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	unsigned int max_mtu;
 	int err;
 
+	if (priv->zc_installed)
+		return -EINVAL;
+
 	old_prog = priv->_xdp_prog;
 	priv->_xdp_prog = prog;
 	peer = rtnl_dereference(priv->peer);
@@ -1623,6 +1823,8 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
 		return veth_xdp_set(dev, xdp->prog, xdp->extack);
+	case XDP_SETUP_ZC_RX:
+		return veth_iou_set(dev, xdp);
 	default:
 		return -EINVAL;
 	}
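[Generalising veth_create_page_pool() above: installing the io_uring
memory provider on a real NIC's rx queue page pool might look roughly
like the sketch below. The page_pool_params fields, the rx queue's
pp_ops/pp_private hooks and io_uring_pp_zc_ops are from this series;
the mydrv_* names and ring layout are hypothetical.]

/* Hedged sketch, not a working driver. */
static int mydrv_create_zc_page_pool(struct mydrv_rx_ring *ring,
				     struct netdev_rx_queue *rxq,
				     struct io_zc_rx_ifq *ifq)
{
	struct page_pool_params pp_params = {
		.order		= 0,
		.pool_size	= MYDRV_RING_SIZE,	/* hypothetical */
		.nid		= NUMA_NO_NODE,
		.dev		= ring->dev,		/* hypothetical field */
		.napi		= &ring->napi,
	};

	if (ifq) {
		/* Hand the provider and its context to page_pool_init()
		 * via the rx queue, as the veth hunks above do.
		 */
		rxq->pp_private = ifq;
		rxq->pp_ops = &io_uring_pp_zc_ops;
		pp_params.queue = rxq;
	}

	ring->page_pool = page_pool_create(&pp_params);
	return PTR_ERR_OR_ZERO(ring->page_pool);
}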