From patchwork Wed Jan 8 22:06:22 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931619

From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
    Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 01/22] net: make page_pool_ref_netmem work with net iovs
Date: Wed, 8 Jan 2025 14:06:22 -0800
Message-ID: <20250108220644.3528845-2-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

page_pool_ref_netmem() should work with either netmem representation, but
currently it casts to a page with netmem_to_page(), which will fail with
net iovs. Use netmem_get_pp_ref_count_ref() instead.
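As a rough sketch of why a generic accessor works for both representations
(simplified from net/netmem.h as of this series; the real kernel code
differs in type annotations): netmem_ref is a tagged pointer whose low bit
distinguishes a struct net_iov from a struct page, and both types place
pp_magic/pp/dma_addr/pp_ref_count at matching offsets (enforced by
NET_IOV_ASSERT_OFFSET), so clearing the tag bit is enough to reach
pp_ref_count without knowing which representation is held:

#define NET_IOV 0x01UL

static inline bool netmem_is_net_iov(netmem_ref netmem)
{
	return (unsigned long)netmem & NET_IOV;
}

static inline atomic_long_t *netmem_get_pp_ref_count_ref(netmem_ref netmem)
{
	/* Valid for pages too, thanks to the shared field layout. */
	struct net_iov *niov;

	niov = (struct net_iov *)((unsigned long)netmem & ~NET_IOV);
	return &niov->pp_ref_count;
}

This is exactly why the old netmem_to_page() cast was unsafe: applied to a
net_iov-backed netmem_ref it would dereference the wrong structure.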
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/page_pool/helpers.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 543f54fa3020..582a3d00cbe2 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -307,7 +307,7 @@ static inline long page_pool_unref_page(struct page *page, long nr)
 
 static inline void page_pool_ref_netmem(netmem_ref netmem)
 {
-	atomic_long_inc(&netmem_to_page(netmem)->pp_ref_count);
+	atomic_long_inc(netmem_get_pp_ref_count_ref(netmem));
 }
 
 static inline void page_pool_ref_page(struct page *page)

From patchwork Wed Jan 8 22:06:23 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931620

From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
    Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 02/22] net: page_pool: don't cast mp param to devmem
Date: Wed, 8 Jan 2025 14:06:23 -0800
Message-ID: <20250108220644.3528845-3-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

page_pool_check_memory_provider() is a generic path and shouldn't assume
anything about the actual type of the memory provider argument. It's fine
while devmem is the only provider, but cast away the devmem-specific
binding type to avoid confusion.
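To illustrate the point, a hypothetical fragment (not part of the patch):
the generic path only ever needs pointer identity, so an opaque cookie is
sufficient and no provider-specific type should leak in.

/* Sketch: compare the queue's provider cookie against a pool's, without
 * knowing (or caring) what structure it actually points to. */
static bool pool_uses_queue_binding(const struct page_pool *pool,
				    const struct netdev_rx_queue *rxq)
{
	void *binding = rxq->mp_params.mp_priv;	/* opaque here */

	return pool->mp_priv == binding;
}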
Reviewed-by: Jakub Kicinski
Reviewed-by: Mina Almasry
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 net/core/page_pool_user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 48335766c1bf..8d31c71bea1a 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -353,7 +353,7 @@ void page_pool_unlist(struct page_pool *pool)
 int page_pool_check_memory_provider(struct net_device *dev,
 				    struct netdev_rx_queue *rxq)
 {
-	struct net_devmem_dmabuf_binding *binding = rxq->mp_params.mp_priv;
+	void *binding = rxq->mp_params.mp_priv;
 	struct page_pool *pool;
 	struct hlist_node *n;

From patchwork Wed Jan 8 22:06:24 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931621

From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
    Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 03/22] net: prefix devmem specific helpers
Date: Wed, 8 Jan 2025 14:06:24 -0800
Message-ID: <20250108220644.3528845-4-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

Add prefixes to all helpers that are specific to devmem TCP, i.e.
net_iov_binding[_id].
Reviewed-by: Jakub Kicinski
Reviewed-by: Mina Almasry
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 net/core/devmem.c |  2 +-
 net/core/devmem.h | 14 +++++++-------
 net/ipv4/tcp.c    |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 0b6ed7525b22..5e1a05082ab8 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -93,7 +93,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
 
 void net_devmem_free_dmabuf(struct net_iov *niov)
 {
-	struct net_devmem_dmabuf_binding *binding = net_iov_binding(niov);
+	struct net_devmem_dmabuf_binding *binding = net_devmem_iov_binding(niov);
 	unsigned long dma_addr = net_devmem_get_dma_addr(niov);
 
 	if (WARN_ON(!gen_pool_has_addr(binding->chunk_pool, dma_addr,
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 76099ef9c482..99782ddeca40 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -86,11 +86,16 @@ static inline unsigned int net_iov_idx(const struct net_iov *niov)
 }
 
 static inline struct net_devmem_dmabuf_binding *
-net_iov_binding(const struct net_iov *niov)
+net_devmem_iov_binding(const struct net_iov *niov)
 {
 	return net_iov_owner(niov)->binding;
 }
 
+static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
+{
+	return net_devmem_iov_binding(niov)->id;
+}
+
 static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
 {
 	struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
@@ -99,11 +104,6 @@ static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
 	       ((unsigned long)net_iov_idx(niov) << PAGE_SHIFT);
 }
 
-static inline u32 net_iov_binding_id(const struct net_iov *niov)
-{
-	return net_iov_owner(niov)->binding->id;
-}
-
 static inline void
 net_devmem_dmabuf_binding_get(struct net_devmem_dmabuf_binding *binding)
 {
@@ -171,7 +171,7 @@ static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
 	return 0;
 }
 
-static inline u32 net_iov_binding_id(const struct net_iov *niov)
+static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
 {
 	return 0;
 }
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d704bda6c41..b872de9a8271 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2494,7 +2494,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 
 			/* Will perform the exchange later */
 			dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx];
-			dmabuf_cmsg.dmabuf_id = net_iov_binding_id(niov);
+			dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov);
 
 			offset += copy;
 			remaining_len -= copy;

From patchwork Wed Jan 8 22:06:25 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931622

From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
    Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 04/22] net: generalise net_iov chunk owners
Date: Wed, 8 Jan 2025 14:06:25 -0800
Message-ID: <20250108220644.3528845-5-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

Currently, net_iov stores a pointer to struct dmabuf_genpool_chunk_owner,
which serves as a useful abstraction to share data and provide a context.
However, it's too devmem-specific, and we want to reuse it for other
memory providers; for that we need to decouple net_iov from devmem. Make
net_iov point to a new base structure called net_iov_area, which
dmabuf_genpool_chunk_owner extends.
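The resulting layering follows the usual kernel embed-and-container_of
pattern: a provider wraps the generic struct net_iov_area in its own owner
struct and walks back to it from a net_iov. A sketch with a hypothetical
provider (my_owner and its fields are illustrative, not from the patch):

struct my_owner {
	struct net_iov_area area;	/* generic part, pointed to by net_iov */
	void *my_private_state;		/* provider-specific context */
};

static inline struct my_owner *my_iov_to_owner(const struct net_iov *niov)
{
	struct net_iov_area *owner = net_iov_owner(niov);

	return container_of(owner, struct my_owner, area);
}

This is exactly the shape the patch gives devmem's
dmabuf_genpool_chunk_owner via net_devmem_iov_to_chunk_owner(), and what
the io_uring provider later in the series relies on.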
Reviewed-by: Mina Almasry
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/netmem.h | 21 ++++++++++++++++++++-
 net/core/devmem.c    | 25 +++++++++++++------------
 net/core/devmem.h    | 25 +++++++++----------------
 3 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 1b58faa4f20f..c61d5b21e7b4 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -24,11 +24,20 @@ struct net_iov {
 	unsigned long __unused_padding;
 	unsigned long pp_magic;
 	struct page_pool *pp;
-	struct dmabuf_genpool_chunk_owner *owner;
+	struct net_iov_area *owner;
 	unsigned long dma_addr;
 	atomic_long_t pp_ref_count;
 };
 
+struct net_iov_area {
+	/* Array of net_iovs for this area. */
+	struct net_iov *niovs;
+	size_t num_niovs;
+
+	/* Offset into the dma-buf where this chunk starts. */
+	unsigned long base_virtual;
+};
+
 /* These fields in struct page are used by the page_pool and net stack:
  *
  *        struct {
@@ -54,6 +63,16 @@ NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
 NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
 #undef NET_IOV_ASSERT_OFFSET
 
+static inline struct net_iov_area *net_iov_owner(const struct net_iov *niov)
+{
+	return niov->owner;
+}
+
+static inline unsigned int net_iov_idx(const struct net_iov *niov)
+{
+	return niov - net_iov_owner(niov)->niovs;
+}
+
 /* netmem */
 
 /**
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 5e1a05082ab8..c250db6993d3 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -32,14 +32,15 @@ static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
 {
 	struct dmabuf_genpool_chunk_owner *owner = chunk->owner;
 
-	kvfree(owner->niovs);
+	kvfree(owner->area.niovs);
 	kfree(owner);
 }
 
 static dma_addr_t net_devmem_get_dma_addr(const struct net_iov *niov)
 {
-	struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
+	struct dmabuf_genpool_chunk_owner *owner;
 
+	owner = net_devmem_iov_to_chunk_owner(niov);
 	return owner->base_dma_addr +
 	       ((dma_addr_t)net_iov_idx(niov) << PAGE_SHIFT);
 }
@@ -82,7 +83,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
 
 	offset = dma_addr - owner->base_dma_addr;
 	index = offset / PAGE_SIZE;
-	niov = &owner->niovs[index];
+	niov = &owner->area.niovs[index];
 
 	niov->pp_magic = 0;
 	niov->pp = NULL;
@@ -250,9 +251,9 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 			goto err_free_chunks;
 		}
 
-		owner->base_virtual = virtual;
+		owner->area.base_virtual = virtual;
 		owner->base_dma_addr = dma_addr;
-		owner->num_niovs = len / PAGE_SIZE;
+		owner->area.num_niovs = len / PAGE_SIZE;
 		owner->binding = binding;
 
 		err = gen_pool_add_owner(binding->chunk_pool, dma_addr,
@@ -264,17 +265,17 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 			goto err_free_chunks;
 		}
 
-		owner->niovs = kvmalloc_array(owner->num_niovs,
-					      sizeof(*owner->niovs),
-					      GFP_KERNEL);
-		if (!owner->niovs) {
+		owner->area.niovs = kvmalloc_array(owner->area.num_niovs,
+						   sizeof(*owner->area.niovs),
+						   GFP_KERNEL);
+		if (!owner->area.niovs) {
 			err = -ENOMEM;
 			goto err_free_chunks;
 		}
 
-		for (i = 0; i < owner->num_niovs; i++) {
-			niov = &owner->niovs[i];
-			niov->owner = owner;
+		for (i = 0; i < owner->area.num_niovs; i++) {
+			niov = &owner->area.niovs[i];
+			niov->owner = &owner->area;
 			page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
 						      net_devmem_get_dma_addr(niov));
 		}
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 99782ddeca40..a2b9913e9a17 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -10,6 +10,8 @@
 #ifndef _NET_DEVMEM_H
 #define _NET_DEVMEM_H
 
+#include <net/netmem.h>
+
 struct netlink_ext_ack;
 
 struct net_devmem_dmabuf_binding {
@@ -51,17 +53,11 @@ struct net_devmem_dmabuf_binding {
  * allocations from this chunk.
  */
 struct dmabuf_genpool_chunk_owner {
-	/* Offset into the dma-buf where this chunk starts. */
-	unsigned long base_virtual;
+	struct net_iov_area area;
+	struct net_devmem_dmabuf_binding *binding;
 
 	/* dma_addr of the start of the chunk. */
 	dma_addr_t base_dma_addr;
-
-	/* Array of net_iovs for this chunk. */
-	struct net_iov *niovs;
-	size_t num_niovs;
-
-	struct net_devmem_dmabuf_binding *binding;
 };
 
 void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
@@ -75,20 +71,17 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 void dev_dmabuf_uninstall(struct net_device *dev);
 
 static inline struct dmabuf_genpool_chunk_owner *
-net_iov_owner(const struct net_iov *niov)
+net_devmem_iov_to_chunk_owner(const struct net_iov *niov)
 {
-	return niov->owner;
-}
+	struct net_iov_area *owner = net_iov_owner(niov);
 
-static inline unsigned int net_iov_idx(const struct net_iov *niov)
-{
-	return niov - net_iov_owner(niov)->niovs;
+	return container_of(owner, struct dmabuf_genpool_chunk_owner, area);
 }
 
 static inline struct net_devmem_dmabuf_binding *
 net_devmem_iov_binding(const struct net_iov *niov)
 {
-	return net_iov_owner(niov)->binding;
+	return net_devmem_iov_to_chunk_owner(niov)->binding;
 }
 
 static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
@@ -98,7 +91,7 @@ static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
 
 static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
 {
-	struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
+	struct net_iov_area *owner = net_iov_owner(niov);
 
 	return owner->base_virtual +
 	       ((unsigned long)net_iov_idx(niov) << PAGE_SHIFT);

From patchwork Wed Jan 8 22:06:26 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931623

From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
    Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 05/22] net: page pool: export page_pool_set_dma_addr_netmem()
Date: Wed, 8 Jan 2025 14:06:26 -0800
Message-ID: <20250108220644.3528845-6-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

Export page_pool_set_dma_addr_netmem() in page_pool/helpers.h. Memory
provider implementations that live outside of net/ need it to set the DMA
addresses on net_iovs during alloc/free.
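For context, a hedged sketch of how an out-of-net/ provider might use the
newly exported helper on its alloc path (my_map_niov, dev and the mapping
strategy are placeholders, not a real driver API):

static int my_map_niov(struct device *dev, struct net_iov *niov,
		       struct page *page)
{
	dma_addr_t addr = dma_map_page_attrs(dev, page, 0, PAGE_SIZE,
					     DMA_FROM_DEVICE, 0);

	if (dma_mapping_error(dev, addr))
		return -ENOMEM;

	/* Returns true when the address can't be stored, e.g. when the
	 * page-shift "compression" on 32-bit/64-bit-DMA configs fails,
	 * in which case the mapping must be dropped. */
	if (page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), addr)) {
		dma_unmap_page_attrs(dev, addr, PAGE_SIZE, DMA_FROM_DEVICE, 0);
		return -EINVAL;
	}

	return 0;
}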
Signed-off-by: David Wei
---
 include/net/page_pool/helpers.h | 10 ++++++++++
 net/core/page_pool.c            | 16 ++++++++++++++++
 net/core/page_pool_priv.h       | 17 -----------------
 3 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 582a3d00cbe2..4ecd45646c77 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -492,4 +492,14 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
 		page_pool_update_nid(pool, new_nid);
 }
 
+#if defined(CONFIG_PAGE_POOL)
+bool page_pool_set_dma_addr_netmem(netmem_ref netmem, dma_addr_t addr);
+#else
+static inline bool page_pool_set_dma_addr_netmem(netmem_ref netmem,
+						 dma_addr_t addr)
+{
+	return false;
+}
+#endif
+
 #endif /* _NET_PAGE_POOL_HELPERS_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9733206d6406..fc3a04823087 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -649,6 +649,22 @@ void page_pool_clear_pp_info(netmem_ref netmem)
 	netmem_set_pp(netmem, NULL);
 }
 
+bool page_pool_set_dma_addr_netmem(netmem_ref netmem, dma_addr_t addr)
+{
+	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) {
+		netmem_set_dma_addr(netmem, addr >> PAGE_SHIFT);
+
+		/* We assume page alignment to shave off bottom bits,
+		 * if this "compression" doesn't work we need to drop.
+		 */
+		return addr != (dma_addr_t)netmem_get_dma_addr(netmem)
+				       << PAGE_SHIFT;
+	}
+
+	netmem_set_dma_addr(netmem, addr);
+	return false;
+}
+
 static __always_inline void
 __page_pool_release_page_dma(struct page_pool *pool, netmem_ref netmem)
 {
diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h
index 57439787b9c2..fda9c59ee283 100644
--- a/net/core/page_pool_priv.h
+++ b/net/core/page_pool_priv.h
@@ -13,23 +13,6 @@ int page_pool_list(struct page_pool *pool);
 void page_pool_detached(struct page_pool *pool);
 void page_pool_unlist(struct page_pool *pool);
 
-static inline bool
-page_pool_set_dma_addr_netmem(netmem_ref netmem, dma_addr_t addr)
-{
-	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) {
-		netmem_set_dma_addr(netmem, addr >> PAGE_SHIFT);
-
-		/* We assume page alignment to shave off bottom bits,
-		 * if this "compression" doesn't work we need to drop.
-		 */
-		return addr != (dma_addr_t)netmem_get_dma_addr(netmem)
-				       << PAGE_SHIFT;
-	}
-
-	netmem_set_dma_addr(netmem, addr);
-	return false;
-}
-
 static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
 {
 	return page_pool_set_dma_addr_netmem(page_to_netmem(page), addr);

From patchwork Wed Jan 8 22:06:27 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931624

From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
    Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 06/22] net: page_pool: create hooks for custom memory providers
Date: Wed, 8 Jan 2025 14:06:27 -0800
Message-ID: <20250108220644.3528845-7-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Jakub Kicinski

A spin-off from the original page pool memory providers patch by Jakub,
which allows extending page pools with custom allocators. One such
provider is devmem TCP, and the other is io_uring zerocopy, added in the
following patches.
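A minimal sketch of what a custom provider plugs into these hooks (the
my_mp_* functions are placeholders; a real provider fills them in):

static int my_mp_init(struct page_pool *pool)
{
	return 0;		/* grab refs on provider state */
}

static void my_mp_destroy(struct page_pool *pool)
{
				/* drop what init took */
}

static netmem_ref my_mp_alloc_netmems(struct page_pool *pool, gfp_t gfp)
{
	return 0;		/* hand out net_iovs wrapped as netmem_refs */
}

static bool my_mp_release_netmem(struct page_pool *pool, netmem_ref netmem)
{
	return false;		/* recycled internally; don't put_page() it */
}

static const struct memory_provider_ops my_mp_ops = {
	.init		= my_mp_init,
	.destroy	= my_mp_destroy,
	.alloc_netmems	= my_mp_alloc_netmems,
	.release_netmem	= my_mp_release_netmem,
};

Note that the ops table must be statically defined: page_pool_init() below
rejects any mp_ops pointer that is not in kernel rodata
(is_kernel_rodata()), so a const file-scope table like the above, or
dmabuf_devmem_ops in the patch, is required.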
Suggested-by: Jakub Kicinski
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/page_pool/memory_provider.h | 20 ++++++++++++++++++++
 include/net/page_pool/types.h           |  4 ++++
 net/core/devmem.c                       | 15 ++++++++++++++-
 net/core/page_pool.c                    | 23 +++++++++++++++--------
 4 files changed, 53 insertions(+), 9 deletions(-)
 create mode 100644 include/net/page_pool/memory_provider.h

diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
new file mode 100644
index 000000000000..79412a8714fa
--- /dev/null
+++ b/include/net/page_pool/memory_provider.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * page_pool/memory_provider.h
+ * Author: Pavel Begunkov
+ * Author: David Wei
+ */
+#ifndef _NET_PAGE_POOL_MEMORY_PROVIDER_H
+#define _NET_PAGE_POOL_MEMORY_PROVIDER_H
+
+#include <net/netmem.h>
+#include <net/page_pool/types.h>
+
+struct memory_provider_ops {
+	netmem_ref (*alloc_netmems)(struct page_pool *pool, gfp_t gfp);
+	bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem);
+	int (*init)(struct page_pool *pool);
+	void (*destroy)(struct page_pool *pool);
+};
+
+#endif
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index ed4cd114180a..88f65c3e2ad9 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -152,8 +152,11 @@ struct page_pool_stats {
  */
 #define PAGE_POOL_FRAG_GROUP_ALIGN	(4 * sizeof(long))
 
+struct memory_provider_ops;
+
 struct pp_memory_provider_params {
 	void *mp_priv;
+	const struct memory_provider_ops *mp_ops;
 };
 
 struct page_pool {
@@ -216,6 +219,7 @@ struct page_pool {
 	struct ptr_ring ring;
 
 	void *mp_priv;
+	const struct memory_provider_ops *mp_ops;
 
 #ifdef CONFIG_PAGE_POOL_STATS
 	/* recycle stats are per-cpu to avoid locking */
diff --git a/net/core/devmem.c b/net/core/devmem.c
index c250db6993d3..48833c1dcbd4 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -15,6 +15,7 @@
 #include <net/netdev_queues.h>
 #include <net/netdev_rx_queue.h>
 #include <net/page_pool/helpers.h>
+#include <net/page_pool/memory_provider.h>
 #include <net/sock.h>
 #include <trace/events/page_pool.h>
 
 #include "devmem.h"
@@ -26,6 +27,8 @@
 /* Protected by rtnl_lock() */
 static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
 
+static const struct memory_provider_ops dmabuf_devmem_ops;
+
 static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
 					       struct gen_pool_chunk *chunk,
 					       void *not_used)
@@ -117,6 +120,7 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
 		WARN_ON(rxq->mp_params.mp_priv != binding);
 
 		rxq->mp_params.mp_priv = NULL;
+		rxq->mp_params.mp_ops = NULL;
 
 		rxq_idx = get_netdev_rx_queue_index(rxq);
 
@@ -142,7 +146,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 	}
 
 	rxq = __netif_get_rx_queue(dev, rxq_idx);
-	if (rxq->mp_params.mp_priv) {
+	if (rxq->mp_params.mp_ops) {
 		NL_SET_ERR_MSG(extack, "designated queue already memory provider bound");
 		return -EEXIST;
 	}
@@ -160,6 +164,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 		return err;
 
 	rxq->mp_params.mp_priv = binding;
+	rxq->mp_params.mp_ops = &dmabuf_devmem_ops;
 
 	err = netdev_rx_queue_restart(dev, rxq_idx);
 	if (err)
@@ -169,6 +174,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 
 err_xa_erase:
 	rxq->mp_params.mp_priv = NULL;
+	rxq->mp_params.mp_ops = NULL;
 	xa_erase(&binding->bound_rxqs, xa_idx);
 
 	return err;
@@ -388,3 +394,10 @@ bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem)
 	/* We don't want the page pool put_page()ing our net_iovs. */
 	return false;
 }
+
+static const struct memory_provider_ops dmabuf_devmem_ops = {
+	.init		= mp_dmabuf_devmem_init,
+	.destroy	= mp_dmabuf_devmem_destroy,
+	.alloc_netmems	= mp_dmabuf_devmem_alloc_netmems,
+	.release_netmem	= mp_dmabuf_devmem_release_page,
+};
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index fc3a04823087..0c5da8c056ec 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -13,6 +13,7 @@
 
 #include <net/netdev_rx_queue.h>
 #include <net/page_pool/helpers.h>
+#include <net/page_pool/memory_provider.h>
 #include <net/xdp.h>
 
 #include <linux/dma-direction.h>
@@ -285,13 +286,19 @@ static int page_pool_init(struct page_pool *pool,
 		rxq = __netif_get_rx_queue(pool->slow.netdev,
 					   pool->slow.queue_idx);
 		pool->mp_priv = rxq->mp_params.mp_priv;
+		pool->mp_ops = rxq->mp_params.mp_ops;
 	}
 
-	if (pool->mp_priv) {
+	if (pool->mp_ops) {
 		if (!pool->dma_map || !pool->dma_sync)
 			return -EOPNOTSUPP;
 
-		err = mp_dmabuf_devmem_init(pool);
+		if (WARN_ON(!is_kernel_rodata((unsigned long)pool->mp_ops))) {
+			err = -EFAULT;
+			goto free_ptr_ring;
+		}
+
+		err = pool->mp_ops->init(pool);
 		if (err) {
 			pr_warn("%s() mem-provider init failed %d\n",
 				__func__, err);
@@ -588,8 +595,8 @@ netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp)
 		return netmem;
 
 	/* Slow-path: cache empty, do real allocation */
-	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv)
-		netmem = mp_dmabuf_devmem_alloc_netmems(pool, gfp);
+	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
+		netmem = pool->mp_ops->alloc_netmems(pool, gfp);
 	else
 		netmem = __page_pool_alloc_pages_slow(pool, gfp);
 	return netmem;
@@ -696,8 +703,8 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
 	bool put;
 
 	put = true;
-	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv)
-		put = mp_dmabuf_devmem_release_page(pool, netmem);
+	if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
+		put = pool->mp_ops->release_netmem(pool, netmem);
 	else
 		__page_pool_release_page_dma(pool, netmem);
 
@@ -1065,8 +1072,8 @@ static void __page_pool_destroy(struct page_pool *pool)
 	page_pool_unlist(pool);
 	page_pool_uninit(pool);
 
-	if (pool->mp_priv) {
-		mp_dmabuf_devmem_destroy(pool);
+	if (pool->mp_ops) {
+		pool->mp_ops->destroy(pool);
 		static_branch_dec(&page_pool_mem_providers);
 	}

From patchwork Wed Jan 8 22:06:28 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931625

From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
    "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
    Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 07/22] netdev: add io_uring memory provider info
Date: Wed, 8 Jan 2025 14:06:28 -0800
Message-ID: <20250108220644.3528845-8-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

Add a nested attribute for io_uring memory provider info. For now it is
empty and its presence indicates that a particular page pool or queue has
an io_uring memory provider attached.

$ ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get
[{'id': 80, 'ifindex': 2, 'inflight': 64, 'inflight-mem': 262144,
  'napi-id': 525},
 {'id': 79, 'ifindex': 2, 'inflight': 320, 'inflight-mem': 1310720,
  'io_uring': {}, 'napi-id': 525},
...

$ ./cli.py --spec netlink/specs/netdev.yaml --dump queue-get
[{'id': 0, 'ifindex': 1, 'type': 'rx'},
 {'id': 0, 'ifindex': 1, 'type': 'tx'},
 {'id': 0, 'ifindex': 2, 'napi-id': 513, 'type': 'rx'},
 {'id': 1, 'ifindex': 2, 'napi-id': 514, 'type': 'rx'},
...
 {'id': 12, 'ifindex': 2, 'io_uring': {}, 'napi-id': 525, 'type': 'rx'},
...
+        type: nest
+        nested-attributes: io-uring-provider-info
   -
     name: qstats
@@ -572,6 +585,7 @@ operations:
             - inflight-mem
             - detach-time
             - dmabuf
+            - io-uring
       dump:
         reply: *pp-reply
       config-cond: page-pool
@@ -637,6 +651,7 @@ operations:
             - napi-id
             - ifindex
             - dmabuf
+            - io-uring
       dump:
         request:
           attributes:
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index e4be227d3ad6..684090732068 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -86,6 +86,12 @@ enum {
 	NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
 };
 
+enum {
+
+	__NETDEV_A_IO_URING_PROVIDER_INFO_MAX,
+	NETDEV_A_IO_URING_PROVIDER_INFO_MAX = (__NETDEV_A_IO_URING_PROVIDER_INFO_MAX - 1)
+};
+
 enum {
 	NETDEV_A_PAGE_POOL_ID = 1,
 	NETDEV_A_PAGE_POOL_IFINDEX,
@@ -94,6 +100,7 @@ enum {
 	NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
 	NETDEV_A_PAGE_POOL_DETACH_TIME,
 	NETDEV_A_PAGE_POOL_DMABUF,
+	NETDEV_A_PAGE_POOL_IO_URING,
 
 	__NETDEV_A_PAGE_POOL_MAX,
 	NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
@@ -136,6 +143,7 @@ enum {
 	NETDEV_A_QUEUE_TYPE,
 	NETDEV_A_QUEUE_NAPI_ID,
 	NETDEV_A_QUEUE_DMABUF,
+	NETDEV_A_QUEUE_IO_URING,
 
 	__NETDEV_A_QUEUE_MAX,
 	NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index e4be227d3ad6..684090732068 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -86,6 +86,12 @@ enum {
 	NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
 };
 
+enum {
+
+	__NETDEV_A_IO_URING_PROVIDER_INFO_MAX,
+	NETDEV_A_IO_URING_PROVIDER_INFO_MAX = (__NETDEV_A_IO_URING_PROVIDER_INFO_MAX - 1)
+};
+
 enum {
 	NETDEV_A_PAGE_POOL_ID = 1,
 	NETDEV_A_PAGE_POOL_IFINDEX,
@@ -94,6 +100,7 @@ enum {
 	NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
 	NETDEV_A_PAGE_POOL_DETACH_TIME,
 	NETDEV_A_PAGE_POOL_DMABUF,
+	NETDEV_A_PAGE_POOL_IO_URING,
 
 	__NETDEV_A_PAGE_POOL_MAX,
 	NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
@@ -136,6 +143,7 @@ enum {
 	NETDEV_A_QUEUE_TYPE,
 	NETDEV_A_QUEUE_NAPI_ID,
 	NETDEV_A_QUEUE_DMABUF,
+	NETDEV_A_QUEUE_IO_URING,
 
 	__NETDEV_A_QUEUE_MAX,
 	NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
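For context on the wire format: an attribute set with no attributes still
shows up as an empty nest, which is what makes the bare presence check
above ('io_uring': {}) work. A hedged sketch of how a kernel-side fill
function could emit that empty nest; the io_uring provider's actual fill
callback arrives later in the series, and the function name here is
invented:

    static int io_pp_nl_fill(void *mp_priv, struct sk_buff *rsp,
    			 struct netdev_rx_queue *rxq)
    {
    	unsigned int attr = rxq ? NETDEV_A_QUEUE_IO_URING :
    				  NETDEV_A_PAGE_POOL_IO_URING;
    	struct nlattr *nest;

    	/* an empty nest is enough: its presence is the signal */
    	nest = nla_nest_start(rsp, attr);
    	if (!nest)
    		return -EMSGSIZE;
    	nla_nest_end(rsp, nest);
    	return 0;
    }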
From patchwork Wed Jan 8 22:06:29 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931626
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 08/22] net: page_pool: add callback for mp info printing
Date: Wed, 8 Jan 2025 14:06:29 -0800
Message-ID: <20250108220644.3528845-9-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

Add a mandatory callback that prints information about the memory
provider to netlink.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/page_pool/memory_provider.h |  5 +++++
 net/core/devmem.c                       | 10 ++++++++++
 net/core/netdev-genl.c                  | 11 ++++++-----
 net/core/page_pool_user.c               |  5 ++---
 4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index 79412a8714fa..5f9d4834235d 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -10,11 +10,16 @@
 #include <net/netmem.h>
 #include <net/page_pool/types.h>
 
+struct netdev_rx_queue;
+struct sk_buff;
+
 struct memory_provider_ops {
 	netmem_ref (*alloc_netmems)(struct page_pool *pool, gfp_t gfp);
 	bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem);
 	int (*init)(struct page_pool *pool);
 	void (*destroy)(struct page_pool *pool);
+	int (*nl_fill)(void *mp_priv, struct sk_buff *rsp,
+		       struct netdev_rx_queue *rxq);
 };
 
 #endif
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 48833c1dcbd4..c0bde0869f72 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -395,9 +395,19 @@ bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem)
 	return false;
 }
 
+static int mp_dmabuf_devmem_nl_fill(void *mp_priv, struct sk_buff *rsp,
+				    struct netdev_rx_queue *rxq)
+{
+	const struct net_devmem_dmabuf_binding *binding = mp_priv;
+	int type = rxq ? NETDEV_A_QUEUE_DMABUF : NETDEV_A_PAGE_POOL_DMABUF;
+
+	return nla_put_u32(rsp, type, binding->id);
+}
+
 static const struct memory_provider_ops dmabuf_devmem_ops = {
 	.init			= mp_dmabuf_devmem_init,
 	.destroy		= mp_dmabuf_devmem_destroy,
 	.alloc_netmems		= mp_dmabuf_devmem_alloc_netmems,
 	.release_netmem		= mp_dmabuf_devmem_release_page,
+	.nl_fill		= mp_dmabuf_devmem_nl_fill,
 };
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 2d3ae0cd3ad2..4bc05fb27890 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -10,6 +10,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <net/page_pool/memory_provider.h>
 
 #include "dev.h"
 #include "devmem.h"
@@ -368,7 +369,6 @@ static int
 netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
 			 u32 q_idx, u32 q_type, const struct genl_info *info)
 {
-	struct net_devmem_dmabuf_binding *binding;
 	struct netdev_rx_queue *rxq;
 	struct netdev_queue *txq;
 	void *hdr;
@@ -385,15 +385,16 @@ netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
 	switch (q_type) {
 	case NETDEV_QUEUE_TYPE_RX:
 		rxq = __netif_get_rx_queue(netdev, q_idx);
+		struct pp_memory_provider_params *params;
+
 		if (rxq->napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID,
 					     rxq->napi->napi_id))
 			goto nla_put_failure;
 
-		binding = rxq->mp_params.mp_priv;
-		if (binding &&
-		    nla_put_u32(rsp, NETDEV_A_QUEUE_DMABUF, binding->id))
+		params = &rxq->mp_params;
+		if (params->mp_ops &&
+		    params->mp_ops->nl_fill(params->mp_priv, rsp, rxq))
 			goto nla_put_failure;
-
 		break;
 	case NETDEV_QUEUE_TYPE_TX:
 		txq = netdev_get_tx_queue(netdev, q_idx);
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 8d31c71bea1a..bd017537fa80 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -7,9 +7,9 @@
 #include <...>
 #include <...>
 #include <...>
+#include <net/page_pool/memory_provider.h>
 #include <...>
 
-#include "devmem.h"
 #include "page_pool_priv.h"
 #include "netdev-genl-gen.h"
@@ -214,7 +214,6 @@ static int
 page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
 		  const struct genl_info *info)
 {
-	struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
 	size_t inflight, refsz;
 	void *hdr;
 
@@ -244,7 +243,7 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
 			  pool->user.detach_time))
 		goto err_cancel;
 
-	if (binding && nla_put_u32(rsp, NETDEV_A_PAGE_POOL_DMABUF, binding->id))
+	if (pool->mp_ops && pool->mp_ops->nl_fill(pool->mp_priv, rsp, NULL))
 		goto err_cancel;
 
 	genlmsg_end(rsp, hdr);
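For a sense of what the new contract asks of a provider, here is a
minimal, hypothetical ops table; every my_mp_* name is invented for
illustration and the callback bodies are stubs, not code from this
series:

    static int my_mp_nl_fill(void *mp_priv, struct sk_buff *rsp,
    			 struct netdev_rx_queue *rxq)
    {
    	/* rxq is non-NULL when filling a queue object, NULL when
    	 * filling a page pool object; a provider with nothing to
    	 * report can simply return 0 */
    	return 0;
    }

    static const struct memory_provider_ops my_mp_ops = {
    	.init		= my_mp_init,
    	.destroy	= my_mp_destroy,
    	.alloc_netmems	= my_mp_alloc_netmems,
    	.release_netmem	= my_mp_release_netmem,
    	/* ->nl_fill() is mandatory after this patch */
    	.nl_fill	= my_mp_nl_fill,
    };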
From patchwork Wed Jan 8 22:06:30 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931627
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 09/22] net: page_pool: add a mp hook to unregister_netdevice*
Date: Wed, 8 Jan 2025 14:06:30 -0800
Message-ID: <20250108220644.3528845-10-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

Devmem TCP needs a hook in unregister_netdevice_many_notify() to keep
the set of queues it is bound to, i.e. ->bound_rxqs, up to date. Instead
of devmem sticking directly out of the generic path, add a mp function.

Reviewed-by: Jakub Kicinski
Reviewed-by: Mina Almasry
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/page_pool/memory_provider.h |  1 +
 net/core/dev.c                          | 16 ++++++++++-
 net/core/devmem.c                       | 36 +++++++++++--------------
 net/core/devmem.h                       |  5 ----
 4 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index 5f9d4834235d..ef7c28ddf39d 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -20,6 +20,7 @@ struct memory_provider_ops {
 	void (*destroy)(struct page_pool *pool);
 	int (*nl_fill)(void *mp_priv, struct sk_buff *rsp,
 		       struct netdev_rx_queue *rxq);
+	void (*uninstall)(void *mp_priv, struct netdev_rx_queue *rxq);
 };
 
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index c7f3dea3e0eb..1d99974e8fba 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -157,6 +157,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <net/page_pool/memory_provider.h>
 #include <...>
 #include <...>
 
@@ -11464,6 +11465,19 @@ void unregister_netdevice_queue(struct net_device *dev, struct list_head *head)
 }
 EXPORT_SYMBOL(unregister_netdevice_queue);
 
+static void dev_memory_provider_uninstall(struct net_device *dev)
+{
+	unsigned int i;
+
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		struct netdev_rx_queue *rxq = &dev->_rx[i];
+		struct pp_memory_provider_params *p = &rxq->mp_params;
+
+		if (p->mp_ops && p->mp_ops->uninstall)
+			p->mp_ops->uninstall(rxq->mp_params.mp_priv, rxq);
+	}
+}
+
 void unregister_netdevice_many_notify(struct list_head *head,
 				      u32 portid, const struct nlmsghdr *nlh)
 {
@@ -11516,7 +11530,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
 		dev_tcx_uninstall(dev);
 		dev_xdp_uninstall(dev);
 		bpf_dev_bound_netdev_unregister(dev);
-		dev_dmabuf_uninstall(dev);
+		dev_memory_provider_uninstall(dev);
 
 		netdev_offload_xstats_disable_all(dev);
diff --git a/net/core/devmem.c b/net/core/devmem.c
index c0bde0869f72..6f46286d45a9 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -309,26 +309,6 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 	return ERR_PTR(err);
 }
 
-void dev_dmabuf_uninstall(struct net_device *dev)
-{
-	struct net_devmem_dmabuf_binding *binding;
-	struct netdev_rx_queue *rxq;
-	unsigned long xa_idx;
-	unsigned int i;
-
-	for (i = 0; i < dev->real_num_rx_queues; i++) {
-		binding = dev->_rx[i].mp_params.mp_priv;
-		if (!binding)
-			continue;
-
-		xa_for_each(&binding->bound_rxqs, xa_idx, rxq)
-			if (rxq == &dev->_rx[i]) {
-				xa_erase(&binding->bound_rxqs, xa_idx);
-				break;
-			}
-	}
-}
-
 /*** "Dmabuf devmem memory provider" ***/
 
 int mp_dmabuf_devmem_init(struct page_pool *pool)
@@ -404,10 +384,26 @@ static int mp_dmabuf_devmem_nl_fill(void *mp_priv, struct sk_buff *rsp,
 	return nla_put_u32(rsp, type, binding->id);
 }
 
+static void mp_dmabuf_devmem_uninstall(void *mp_priv,
+				       struct netdev_rx_queue *rxq)
+{
+	struct net_devmem_dmabuf_binding *binding = mp_priv;
+	struct netdev_rx_queue *bound_rxq;
+	unsigned long xa_idx;
+
+	xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
+		if (bound_rxq == rxq) {
+			xa_erase(&binding->bound_rxqs, xa_idx);
+			break;
+		}
+	}
+}
+
 static const struct memory_provider_ops dmabuf_devmem_ops = {
 	.init			= mp_dmabuf_devmem_init,
 	.destroy		= mp_dmabuf_devmem_destroy,
 	.alloc_netmems		= mp_dmabuf_devmem_alloc_netmems,
 	.release_netmem		= mp_dmabuf_devmem_release_page,
 	.nl_fill		= mp_dmabuf_devmem_nl_fill,
+	.uninstall		= mp_dmabuf_devmem_uninstall,
 };
diff --git a/net/core/devmem.h b/net/core/devmem.h
index a2b9913e9a17..8e999fe2ae67 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -68,7 +68,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
 int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 				    struct net_devmem_dmabuf_binding *binding,
 				    struct netlink_ext_ack *extack);
-void dev_dmabuf_uninstall(struct net_device *dev);
 
 static inline struct dmabuf_genpool_chunk_owner *
 net_devmem_iov_to_chunk_owner(const struct net_iov *niov)
@@ -145,10 +144,6 @@ net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 	return -EOPNOTSUPP;
 }
 
-static inline void dev_dmabuf_uninstall(struct net_device *dev)
-{
-}
-
 static inline struct net_iov *
 net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
 {
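The hook shifts queue-tracking bookkeeping from the core into the
provider. As a hedged sketch, a provider that remembers the single queue
it is installed on would only need something like this (names invented,
not from this series):

    struct my_mp_state {
    	struct netdev_rx_queue *rxq;	/* queue we are installed on */
    };

    static void my_mp_uninstall(void *mp_priv, struct netdev_rx_queue *rxq)
    {
    	struct my_mp_state *state = mp_priv;

    	/* called from unregister_netdevice_many_notify(); drop our
    	 * reference to the dying queue */
    	if (state->rxq == rxq)
    		state->rxq = NULL;
    }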
From patchwork Wed Jan 8 22:06:31 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931628
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 10/22] net: prepare for non devmem TCP memory providers
Date: Wed, 8 Jan 2025 14:06:31 -0800
Message-ID: <20250108220644.3528845-11-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

There are a good number of places in the generic paths that assume the
only page pool memory provider is devmem TCP.
As we want to reuse the net_iov and provider infrastructure, we need to
patch it up and explicitly check the provider type when we branch into
devmem TCP code.

Reviewed-by: Mina Almasry
Reviewed-by: Jakub Kicinski
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 net/core/devmem.c | 5 +++++
 net/core/devmem.h | 7 +++++++
 net/ipv4/tcp.c    | 5 +++++
 3 files changed, 17 insertions(+)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 6f46286d45a9..8fcb0c7b63be 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -29,6 +29,11 @@ static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
 
 static const struct memory_provider_ops dmabuf_devmem_ops;
 
+bool net_is_devmem_iov(struct net_iov *niov)
+{
+	return niov->pp->mp_ops == &dmabuf_devmem_ops;
+}
+
 static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
 					       struct gen_pool_chunk *chunk,
 					       void *not_used)
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 8e999fe2ae67..7fc158d52729 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -115,6 +115,8 @@ struct net_iov *
 net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding);
 void net_devmem_free_dmabuf(struct net_iov *ppiov);
 
+bool net_is_devmem_iov(struct net_iov *niov);
+
 #else
 struct net_devmem_dmabuf_binding;
 
@@ -163,6 +165,11 @@ static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
 {
 	return 0;
 }
+
+static inline bool net_is_devmem_iov(struct net_iov *niov)
+{
+	return false;
+}
 #endif
 
 #endif /* _NET_DEVMEM_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b872de9a8271..7f43d31c9400 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2476,6 +2476,11 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 			}
 
 			niov = skb_frag_net_iov(frag);
+			if (!net_is_devmem_iov(niov)) {
+				err = -ENODEV;
+				goto out;
+			}
+
 			end = start + skb_frag_size(frag);
 			copy = end - offset;
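The pattern this establishes is that a provider is identified by
comparing the page pool's mp_ops pointer. A future provider would get an
analogous helper; a hedged sketch, where io_uring_pp_zc_ops stands for
the ops table a later patch in this series adds and is assumed here:

    static inline bool net_is_io_uring_iov(struct net_iov *niov)
    {
    	return niov->pp->mp_ops == &io_uring_pp_zc_ops;
    }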
From patchwork Wed Jan 8 22:06:32 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931629
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 11/22] net: page_pool: add memory provider helpers
Date: Wed, 8 Jan 2025 14:06:32 -0800
Message-ID: <20250108220644.3528845-12-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

Add helpers for memory providers to interact with page pools.
net_mp_niov_{set,clear}_page_pool() serve to [dis]associate a net_iov
with a page pool. If used, the memory provider is responsible for
matching "set" calls with "clear" once a net_iov is no longer going to
be used by a page pool, when changing page pools, etc.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/page_pool/memory_provider.h | 18 ++++++++++++++++++
 net/core/page_pool.c                    | 23 +++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index ef7c28ddf39d..c58ac54adb2f 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -23,4 +23,22 @@ struct memory_provider_ops {
 	void (*uninstall)(void *mp_priv, struct netdev_rx_queue *rxq);
 };
 
+void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov);
+void net_mp_niov_clear_page_pool(struct net_iov *niov);
+
+/*
+ * net_mp_netmem_place_in_cache() - give a netmem to a page pool
+ * @pool:      the page pool to place the netmem into
+ * @netmem:    netmem to give
+ *
+ * Push an accounted netmem into the page pool's allocation cache. The caller
+ * must ensure that there is space in the cache. It should only be called off
+ * the mp_ops->alloc_netmems() path.
+ */
+static inline void net_mp_netmem_place_in_cache(struct page_pool *pool,
+						netmem_ref netmem)
+{
+	pool->alloc.cache[pool->alloc.count++] = netmem;
+}
+
 #endif
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 0c5da8c056ec..29591177eb31 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1212,3 +1212,26 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 	}
 }
 EXPORT_SYMBOL(page_pool_update_nid);
+
+/* Associate a niov with a page pool. Should follow with a matching
+ * net_mp_niov_clear_page_pool()
+ */
+void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov)
+{
+	netmem_ref netmem = net_iov_to_netmem(niov);
+
+	page_pool_set_pp_info(pool, netmem);
+
+	pool->pages_state_hold_cnt++;
+	trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt);
+}
+
+/* Disassociate a niov from a page pool. Should only be used in the
+ * ->release_netmem() path.
+ */
+void net_mp_niov_clear_page_pool(struct net_iov *niov)
+{
+	netmem_ref netmem = net_iov_to_netmem(niov);
+
+	page_pool_clear_pp_info(netmem);
+}
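A hedged sketch of how a provider is expected to use these helpers from
its mp_ops callbacks, mirroring the refill-then-hand-out shape the
comment above prescribes; the my_mp_* names and the freelist are
invented for illustration:

    static netmem_ref my_mp_alloc_netmems(struct page_pool *pool, gfp_t gfp)
    {
    	/* batch-refill the pool's allocation cache from our freelist */
    	while (pool->alloc.count < PP_ALLOC_CACHE_REFILL) {
    		struct net_iov *niov = my_mp_freelist_get();

    		if (!niov)
    			break;
    		net_mp_niov_set_page_pool(pool, niov);
    		net_mp_netmem_place_in_cache(pool, net_iov_to_netmem(niov));
    	}
    	if (!pool->alloc.count)
    		return 0;
    	/* hand out one of the entries we just cached */
    	return pool->alloc.cache[--pool->alloc.count];
    }

    static bool my_mp_release_netmem(struct page_pool *pool, netmem_ref netmem)
    {
    	struct net_iov *niov = netmem_to_net_iov(netmem);

    	net_mp_niov_clear_page_pool(niov);
    	my_mp_freelist_put(niov);
    	return false;	/* don't let the pool fall through to its own path */
    }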
From patchwork Wed Jan 8 22:06:33 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931630
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 12/22] io_uring/zcrx: add interface queue and refill queue
Date: Wed, 8 Jan 2025 14:06:33 -0800
Message-ID: <20250108220644.3528845-13-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

Add a new object called an interface queue (ifq) that represents a net
rx queue that has been configured for zero copy. Each ifq is registered
using a new registration opcode IORING_REGISTER_ZCRX_IFQ.

The refill queue is allocated by the kernel and mapped by userspace
using a new offset IORING_OFF_RQ_RING, in a similar fashion to the main
SQ/CQ. It is used by userspace to return buffers that it is done with,
which will then be reused by the netdev.

The main CQ ring is used to notify userspace of received data by using
the upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each
entry contains the offset + len to the data.

For now, each io_uring instance only has a single ifq.
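A hedged userspace sketch of the registration flow just described,
using the raw register syscall (liburing wrappers for this come later
and are not part of this patch; error handling elided):

    #include <stdint.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/io_uring.h>

    /* the ring must have been created with IORING_SETUP_DEFER_TASKRUN and
     * IORING_SETUP_CQE32, and the caller needs CAP_NET_ADMIN */
    static int register_zcrx(int ring_fd, __u32 ifindex, __u32 rx_queue,
    			 struct io_uring_region_desc *rd,
    			 struct io_uring_zcrx_area_reg *area)
    {
    	struct io_uring_zcrx_ifq_reg reg;

    	memset(&reg, 0, sizeof(reg));
    	reg.if_idx = ifindex;
    	reg.if_rxq = rx_queue;
    	reg.rq_entries = 4096;	/* rounded up to a power of two by the kernel */
    	reg.area_ptr = (__u64)(uintptr_t)area;
    	reg.region_ptr = (__u64)(uintptr_t)rd;

    	/* nr_args must be 1 for IORING_REGISTER_ZCRX_IFQ */
    	return syscall(__NR_io_uring_register, ring_fd,
    		       IORING_REGISTER_ZCRX_IFQ, &reg, 1);
    }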
Reviewed-by: Jens Axboe
Signed-off-by: David Wei
---
 Kconfig                        |   2 +
 include/linux/io_uring_types.h |   6 ++
 include/uapi/linux/io_uring.h  |  43 +++++++++-
 io_uring/KConfig               |  10 +++
 io_uring/Makefile              |   1 +
 io_uring/io_uring.c            |   7 ++
 io_uring/memmap.h              |   1 +
 io_uring/register.c            |   7 ++
 io_uring/zcrx.c                | 149 +++++++++++++++++++++++++++++++++
 io_uring/zcrx.h                |  35 ++++++++
 10 files changed, 260 insertions(+), 1 deletion(-)
 create mode 100644 io_uring/KConfig
 create mode 100644 io_uring/zcrx.c
 create mode 100644 io_uring/zcrx.h

diff --git a/Kconfig b/Kconfig
index 745bc773f567..529ea7694ba9 100644
--- a/Kconfig
+++ b/Kconfig
@@ -30,3 +30,5 @@ source "lib/Kconfig"
 source "lib/Kconfig.debug"
 
 source "Documentation/Kconfig"
+
+source "io_uring/KConfig"
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 493a8f7fa8e4..f0c6e18d176a 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -40,6 +40,8 @@ enum io_uring_cmd_flags {
 	IO_URING_F_TASK_DEAD		= (1 << 13),
 };
 
+struct io_zcrx_ifq;
+
 struct io_wq_work_node {
 	struct io_wq_work_node *next;
 };
@@ -384,6 +386,8 @@ struct io_ring_ctx {
 	struct wait_queue_head		poll_wq;
 	struct io_restriction		restrictions;
 
+	struct io_zcrx_ifq		*ifq;
+
 	u32				pers_next;
 	struct xarray			personalities;
@@ -436,6 +440,8 @@ struct io_ring_ctx {
 	struct io_mapped_region		ring_region;
 	/* used for optimised request parameter and wait argument passing */
 	struct io_mapped_region		param_region;
+	/* just one zcrx per ring for now, will move to io_zcrx_ifq eventually */
+	struct io_mapped_region		zcrx_region;
 };
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 38f0d6b10eaf..3af8b7a19824 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -638,7 +638,8 @@ enum io_uring_register_op {
 	/* send MSG_RING without having a ring */
 	IORING_REGISTER_SEND_MSG_RING		= 31,
 
-	/* 32 reserved for zc rx */
+	/* register a netdev hw rx queue for zerocopy */
+	IORING_REGISTER_ZCRX_IFQ		= 32,
 
 	/* resize CQ ring */
 	IORING_REGISTER_RESIZE_RINGS		= 33,
@@ -955,6 +956,46 @@ enum io_uring_socket_op {
 	SOCKET_URING_OP_SETSOCKOPT,
 };
 
+/* Zero copy receive refill queue entry */
+struct io_uring_zcrx_rqe {
+	__u64	off;
+	__u32	len;
+	__u32	__pad;
+};
+
+struct io_uring_zcrx_cqe {
+	__u64	off;
+	__u64	__pad;
+};
+
+/* The bit from which area id is encoded into offsets */
+#define IORING_ZCRX_AREA_SHIFT	48
+#define IORING_ZCRX_AREA_MASK	(~(((__u64)1 << IORING_ZCRX_AREA_SHIFT) - 1))
+
+struct io_uring_zcrx_offsets {
+	__u32	head;
+	__u32	tail;
+	__u32	rqes;
+	__u32	__resv2;
+	__u64	__resv[2];
+};
+
+/*
+ * Argument for IORING_REGISTER_ZCRX_IFQ
+ */
+struct io_uring_zcrx_ifq_reg {
+	__u32	if_idx;
+	__u32	if_rxq;
+	__u32	rq_entries;
+	__u32	flags;
+
+	__u64	area_ptr; /* pointer to struct io_uring_zcrx_area_reg */
+	__u64	region_ptr; /* struct io_uring_region_desc * */
+
+	struct io_uring_zcrx_offsets offsets;
+	__u64	__resv[4];
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/io_uring/KConfig b/io_uring/KConfig
new file mode 100644
index 000000000000..9e2a4beba1ef
--- /dev/null
+++ b/io_uring/KConfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# io_uring configuration
+#
+
+config IO_URING_ZCRX
+	def_bool y
+	depends on PAGE_POOL
+	depends on INET
+	depends on NET_RX_BUSY_POLL
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 53167bef37d7..a95b0b8229c9 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_IO_URING)	+= io_uring.o opdef.o kbuf.o rsrc.o notif.o \
 					epoll.o statx.o timeout.o fdinfo.o \
 					cancel.o waitid.o register.o \
 					truncate.o memmap.o
+obj-$(CONFIG_IO_URING_ZCRX)	+= zcrx.o
 obj-$(CONFIG_IO_WQ)		+= io-wq.o
 obj-$(CONFIG_FUTEX)		+= futex.o
 obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5535a72b0ce1..0c02a2e97f01 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -97,6 +97,7 @@
 #include "uring_cmd.h"
 #include "msg_ring.h"
 #include "memmap.h"
+#include "zcrx.h"
 
 #include "timeout.h"
 #include "poll.h"
@@ -2686,6 +2687,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	mutex_lock(&ctx->uring_lock);
 	io_sqe_buffers_unregister(ctx);
 	io_sqe_files_unregister(ctx);
+	io_unregister_zcrx_ifqs(ctx);
 	io_cqring_overflow_kill(ctx);
 	io_eventfd_unregister(ctx);
 	io_alloc_cache_free(&ctx->apoll_cache, kfree);
@@ -2851,6 +2853,11 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 			io_cqring_overflow_kill(ctx);
 			mutex_unlock(&ctx->uring_lock);
 		}
+		if (ctx->ifq) {
+			mutex_lock(&ctx->uring_lock);
+			io_shutdown_zcrx_ifqs(ctx);
+			mutex_unlock(&ctx->uring_lock);
+		}
 
 		if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
 			io_move_task_work_from_local(ctx);
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index c898dcba2b4e..dad0aa5b1b45 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -2,6 +2,7 @@
 #define IO_URING_MEMMAP_H
 
 #define IORING_MAP_OFF_PARAM_REGION	0x20000000ULL
+#define IORING_MAP_OFF_ZCRX_REGION	0x30000000ULL
 
 struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
diff --git a/io_uring/register.c b/io_uring/register.c
index f1698c18c7cb..f9dfdca79a80 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -30,6 +30,7 @@
 #include "eventfd.h"
 #include "msg_ring.h"
 #include "memmap.h"
+#include "zcrx.h"
 
 #define IORING_MAX_RESTRICTIONS	(IORING_RESTRICTION_LAST + \
 				 IORING_REGISTER_LAST + IORING_OP_LAST)
@@ -798,6 +799,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_clone_buffers(ctx, arg);
 		break;
+	case IORING_REGISTER_ZCRX_IFQ:
+		ret = -EINVAL;
+		if (!arg || nr_args != 1)
+			break;
+		ret = io_register_zcrx_ifq(ctx, arg);
+		break;
 	case IORING_REGISTER_RESIZE_RINGS:
 		ret = -EINVAL;
 		if (!arg || nr_args != 1)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
new file mode 100644
index 000000000000..f3ace7e8264d
--- /dev/null
+++ b/io_uring/zcrx.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/mm.h>
+#include <linux/io_uring.h>
+
+#include <uapi/linux/io_uring.h>
+
+#include "io_uring.h"
+#include "kbuf.h"
+#include "memmap.h"
+#include "zcrx.h"
+
+#define IO_RQ_MAX_ENTRIES		32768
+
+static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
+				 struct io_uring_zcrx_ifq_reg *reg,
+				 struct io_uring_region_desc *rd)
+{
+	size_t off, size;
+	void *ptr;
+	int ret;
+
+	off = sizeof(struct io_uring);
+	size = off + sizeof(struct io_uring_zcrx_rqe) * reg->rq_entries;
+	if (size > rd->size)
+		return -EINVAL;
+
+	ret = io_create_region_mmap_safe(ifq->ctx, &ifq->ctx->zcrx_region, rd,
+					 IORING_MAP_OFF_ZCRX_REGION);
+	if (ret < 0)
+		return ret;
+
+	ptr = io_region_get_ptr(&ifq->ctx->zcrx_region);
+	ifq->rq_ring = (struct io_uring *)ptr;
+	ifq->rqes = (struct io_uring_zcrx_rqe *)(ptr + off);
+	return 0;
+}
+
+static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq)
+{
+	io_free_region(ifq->ctx, &ifq->ctx->zcrx_region);
+	ifq->rq_ring = NULL;
+	ifq->rqes = NULL;
+}
+
+static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx)
+{
+	struct io_zcrx_ifq *ifq;
+
+	ifq = kzalloc(sizeof(*ifq), GFP_KERNEL);
+	if (!ifq)
+		return NULL;
+
+	ifq->if_rxq = -1;
+	ifq->ctx = ctx;
+	return ifq;
+}
+
+static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
+{
+	io_free_rbuf_ring(ifq);
+	kfree(ifq);
+}
+
+int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
+			 struct io_uring_zcrx_ifq_reg __user *arg)
+{
+	struct io_uring_zcrx_ifq_reg reg;
+	struct io_uring_region_desc rd;
+	struct io_zcrx_ifq *ifq;
+	int ret;
+
+	/*
+	 * 1. Interface queue allocation.
+	 * 2. It can observe data destined for sockets of other tasks.
+	 */
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	/* mandatory io_uring features for zc rx */
+	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN &&
+	      ctx->flags & IORING_SETUP_CQE32))
+		return -EINVAL;
+	if (ctx->ifq)
+		return -EBUSY;
+	if (copy_from_user(&reg, arg, sizeof(reg)))
+		return -EFAULT;
+	if (copy_from_user(&rd, u64_to_user_ptr(reg.region_ptr), sizeof(rd)))
+		return -EFAULT;
+	if (memchr_inv(&reg.__resv, 0, sizeof(reg.__resv)))
+		return -EINVAL;
+	if (reg.if_rxq == -1 || !reg.rq_entries || reg.flags)
+		return -EINVAL;
+	if (reg.rq_entries > IO_RQ_MAX_ENTRIES) {
+		if (!(ctx->flags & IORING_SETUP_CLAMP))
+			return -EINVAL;
+		reg.rq_entries = IO_RQ_MAX_ENTRIES;
+	}
+	reg.rq_entries = roundup_pow_of_two(reg.rq_entries);
+
+	if (!reg.area_ptr)
+		return -EFAULT;
+
+	ifq = io_zcrx_ifq_alloc(ctx);
+	if (!ifq)
+		return -ENOMEM;
+
+	ret = io_allocate_rbuf_ring(ifq, &reg, &rd);
+	if (ret)
+		goto err;
+
+	ifq->rq_entries = reg.rq_entries;
+	ifq->if_rxq = reg.if_rxq;
+
+	reg.offsets.rqes = sizeof(struct io_uring);
+	reg.offsets.head = offsetof(struct io_uring, head);
+	reg.offsets.tail = offsetof(struct io_uring, tail);
+
+	if (copy_to_user(arg, &reg, sizeof(reg)) ||
+	    copy_to_user(u64_to_user_ptr(reg.region_ptr), &rd, sizeof(rd))) {
+		ret = -EFAULT;
+		goto err;
+	}
+
+	ctx->ifq = ifq;
+	return 0;
+err:
+	io_zcrx_ifq_free(ifq);
+	return ret;
+}
+
+void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
+{
+	struct io_zcrx_ifq *ifq = ctx->ifq;
+
+	lockdep_assert_held(&ctx->uring_lock);
+
+	if (!ifq)
+		return;
+
+	ctx->ifq = NULL;
+	io_zcrx_ifq_free(ifq);
+}
+
+void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx)
+{
+	lockdep_assert_held(&ctx->uring_lock);
+}
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
new file mode 100644
index 000000000000..58e4ab6c6083
--- /dev/null
+++ b/io_uring/zcrx.h
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef IOU_ZC_RX_H
+#define IOU_ZC_RX_H
+
+#include <linux/io_uring_types.h>
+
+struct io_zcrx_ifq {
+	struct io_ring_ctx		*ctx;
+	struct io_uring			*rq_ring;
+	struct io_uring_zcrx_rqe	*rqes;
+	u32				rq_entries;
+
+	u32				if_rxq;
+};
+
+#if defined(CONFIG_IO_URING_ZCRX)
+int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
+			 struct io_uring_zcrx_ifq_reg __user *arg);
+void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx);
+void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx);
+#else
+static inline int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
+				       struct io_uring_zcrx_ifq_reg __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+static inline void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
+{
+}
+static inline void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx)
+{
+}
+#endif
+
+#endif
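On success the kernel writes the ring offsets back into reg.offsets and
the final region description into rd. A hedged userspace sketch of
mapping the refill ring out of the zcrx region, assuming a
kernel-allocated region mmap'ed via rd.mmap_offset (a user-memory-backed
region would skip the mmap):

    #include <sys/mman.h>

    struct zcrx_rq {
    	__u32 *khead;
    	__u32 *ktail;
    	struct io_uring_zcrx_rqe *rqes;
    	__u32 mask;
    };

    static int map_refill_ring(int ring_fd,
    			   const struct io_uring_zcrx_ifq_reg *reg,
    			   const struct io_uring_region_desc *rd,
    			   struct zcrx_rq *rq)
    {
    	char *ptr = mmap(NULL, rd->size, PROT_READ | PROT_WRITE,
    			 MAP_SHARED | MAP_POPULATE, ring_fd,
    			 rd->mmap_offset);

    	if (ptr == MAP_FAILED)
    		return -1;
    	rq->khead = (__u32 *)(ptr + reg->offsets.head);
    	rq->ktail = (__u32 *)(ptr + reg->offsets.tail);
    	rq->rqes = (struct io_uring_zcrx_rqe *)(ptr + reg->offsets.rqes);
    	rq->mask = reg->rq_entries - 1;	/* rq_entries is a power of two */
    	return 0;
    }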
From patchwork Wed Jan 8 22:06:34 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931631
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 13/22] io_uring/zcrx: add io_zcrx_area
Date: Wed, 8 Jan 2025 14:06:34 -0800
Message-ID: <20250108220644.3528845-14-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

Add io_zcrx_area that represents a region of userspace memory that is
used for zero copy. During ifq registration, userspace passes in the
uaddr and len of userspace memory, which is then pinned by the kernel.
Each net_iov is mapped to one of these pages.

The freelist is a spinlock-protected list that keeps track of all the
net_iovs/pages that aren't used.

For now, there is only one area per ifq and area registration happens
implicitly as part of ifq registration. There is no API for
adding/removing areas yet. The struct for area registration is there
for future extensibility once we support multiple areas and TCP devmem.
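A hedged sketch of how userspace would describe the single area,
following the validation rules io_zcrx_create_area() enforces below
(page-aligned addr/len, flags and reserved fields zero, rq_area_token
zero on input); a 4K page size is assumed here:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <linux/io_uring.h>

    static int init_area(struct io_uring_zcrx_area_reg *area, size_t len)
    {
    	void *buf;

    	/* len must be a multiple of the page size */
    	if (posix_memalign(&buf, 4096, len))
    		return -1;
    	memset(area, 0, sizeof(*area));
    	area->addr = (__u64)(uintptr_t)buf;
    	area->len = len;
    	return 0;
    }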
Reviewed-by: Jens Axboe Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- include/uapi/linux/io_uring.h | 9 ++++ io_uring/rsrc.c | 2 +- io_uring/rsrc.h | 1 + io_uring/zcrx.c | 89 ++++++++++++++++++++++++++++++++++- io_uring/zcrx.h | 16 +++++++ 5 files changed, 114 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 3af8b7a19824..e251f28507ce 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -980,6 +980,15 @@ struct io_uring_zcrx_offsets { __u64 __resv[2]; }; +struct io_uring_zcrx_area_reg { + __u64 addr; + __u64 len; + __u64 rq_area_token; + __u32 flags; + __u32 __resv1; + __u64 __resv2[2]; +}; + /* * Argument for IORING_REGISTER_ZCRX_IFQ */ diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index f2ff108485c8..d0f11b5aec0d 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -77,7 +77,7 @@ static int io_account_mem(struct io_ring_ctx *ctx, unsigned long nr_pages) return 0; } -static int io_buffer_validate(struct iovec *iov) +int io_buffer_validate(struct iovec *iov) { unsigned long tmp, acct_len = iov->iov_len + (PAGE_SIZE - 1); diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h index c8b093584461..0ae54ddeb1fd 100644 --- a/io_uring/rsrc.h +++ b/io_uring/rsrc.h @@ -66,6 +66,7 @@ int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg, unsigned size, unsigned type); int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg, unsigned int size, unsigned int type); +int io_buffer_validate(struct iovec *iov); bool io_check_coalesce_buffer(struct page **page_array, int nr_pages, struct io_imu_folio_data *data); diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index f3ace7e8264d..04883a3ae80c 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -10,6 +10,7 @@ #include "kbuf.h" #include "memmap.h" #include "zcrx.h" +#include "rsrc.h" #define IO_RQ_MAX_ENTRIES 32768 @@ -44,6 +45,79 @@ static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq) ifq->rqes = NULL; } +static void io_zcrx_free_area(struct io_zcrx_area *area) +{ + kvfree(area->freelist); + kvfree(area->nia.niovs); + if (area->pages) { + unpin_user_pages(area->pages, area->nia.num_niovs); + kvfree(area->pages); + } + kfree(area); +} + +static int io_zcrx_create_area(struct io_zcrx_ifq *ifq, + struct io_zcrx_area **res, + struct io_uring_zcrx_area_reg *area_reg) +{ + struct io_zcrx_area *area; + int i, ret, nr_pages; + struct iovec iov; + + if (area_reg->flags || area_reg->rq_area_token) + return -EINVAL; + if (area_reg->__resv1 || area_reg->__resv2[0] || area_reg->__resv2[1]) + return -EINVAL; + if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK) + return -EINVAL; + + iov.iov_base = u64_to_user_ptr(area_reg->addr); + iov.iov_len = area_reg->len; + ret = io_buffer_validate(&iov); + if (ret) + return ret; + + ret = -ENOMEM; + area = kzalloc(sizeof(*area), GFP_KERNEL); + if (!area) + goto err; + + area->pages = io_pin_pages((unsigned long)area_reg->addr, area_reg->len, + &nr_pages); + if (IS_ERR(area->pages)) { + ret = PTR_ERR(area->pages); + area->pages = NULL; + goto err; + } + area->nia.num_niovs = nr_pages; + + area->nia.niovs = kvmalloc_array(nr_pages, sizeof(area->nia.niovs[0]), + GFP_KERNEL | __GFP_ZERO); + if (!area->nia.niovs) + goto err; + + area->freelist = kvmalloc_array(nr_pages, sizeof(area->freelist[0]), + GFP_KERNEL | __GFP_ZERO); + if (!area->freelist) + goto err; + + for (i = 0; i < nr_pages; i++) + area->freelist[i] = i; + + area->free_count = nr_pages; + area->ifq = ifq; + /* we're only supporting one area 
per ifq for now */ + area->area_id = 0; + area_reg->rq_area_token = (u64)area->area_id << IORING_ZCRX_AREA_SHIFT; + spin_lock_init(&area->freelist_lock); + *res = area; + return 0; err: + if (area) + io_zcrx_free_area(area); + return ret; +} + static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx) { struct io_zcrx_ifq *ifq; @@ -59,6 +133,9 @@ static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx) static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq) { + if (ifq->area) + io_zcrx_free_area(ifq->area); + io_free_rbuf_ring(ifq); kfree(ifq); } @@ -66,6 +143,7 @@ static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq) int io_register_zcrx_ifq(struct io_ring_ctx *ctx, struct io_uring_zcrx_ifq_reg __user *arg) { + struct io_uring_zcrx_area_reg area; struct io_uring_zcrx_ifq_reg reg; struct io_uring_region_desc rd; struct io_zcrx_ifq *ifq; @@ -99,7 +177,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, } reg.rq_entries = roundup_pow_of_two(reg.rq_entries); - if (!reg.area_ptr) + if (copy_from_user(&area, u64_to_user_ptr(reg.area_ptr), sizeof(area))) return -EFAULT; ifq = io_zcrx_ifq_alloc(ctx); @@ -110,6 +188,10 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, if (ret) goto err; + ret = io_zcrx_create_area(ifq, &ifq->area, &area); + if (ret) + goto err; + ifq->rq_entries = reg.rq_entries; ifq->if_rxq = reg.if_rxq; @@ -122,7 +204,10 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, ret = -EFAULT; goto err; } - + if (copy_to_user(u64_to_user_ptr(reg.area_ptr), &area, sizeof(area))) { + ret = -EFAULT; + goto err; + } ctx->ifq = ifq; return 0; err: diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index 58e4ab6c6083..53fd94b65b38 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -3,9 +3,25 @@ #define IOU_ZC_RX_H #include + +struct io_zcrx_area { + struct net_iov_area nia; + struct io_zcrx_ifq *ifq; + + u16 area_id; + struct page **pages; + + /* freelist */ + spinlock_t freelist_lock ____cacheline_aligned_in_smp; + u32 free_count; + u32 *freelist; +}; struct io_zcrx_ifq { struct io_ring_ctx *ctx; + struct io_zcrx_area *area; + struct io_uring *rq_ring; struct io_uring_zcrx_rqe *rqes; u32 rq_entries;
From patchwork Wed Jan 8 22:06:35 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931632
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry , Stanislav Fomichev , Joe Damato , Pedro Tammela Subject: [PATCH net-next v10 14/22] io_uring/zcrx: grab a net device Date: Wed, 8 Jan 2025 14:06:35 -0800 Message-ID: <20250108220644.3528845-15-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk> References: <20250108220644.3528845-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Pavel Begunkov Zerocopy receive needs a net device to bind to its rx queue and dma map buffers. As a preparation to following patches, resolve a net device from the if_idx parameter with no functional changes otherwise. Reviewed-by: Jens Axboe Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- io_uring/zcrx.c | 10 ++++++++++ io_uring/zcrx.h | 3 +++ 2 files changed, 13 insertions(+) diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 04883a3ae80c..e6cca6747148 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -3,6 +3,8 @@ #include #include #include +#include +#include #include @@ -136,6 +138,8 @@ static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq) if (ifq->area) io_zcrx_free_area(ifq->area); + if (ifq->dev) + netdev_put(ifq->dev, &ifq->netdev_tracker); io_free_rbuf_ring(ifq); kfree(ifq); } @@ -195,6 +199,12 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, ifq->rq_entries = reg.rq_entries; ifq->if_rxq = reg.if_rxq; + ret = -ENODEV; + ifq->dev = netdev_get_by_index(current->nsproxy->net_ns, reg.if_idx, + &ifq->netdev_tracker, GFP_KERNEL); + if (!ifq->dev) + goto err; + reg.offsets.rqes = sizeof(struct io_uring); reg.offsets.head = offsetof(struct io_uring, head); reg.offsets.tail = offsetof(struct io_uring, tail); diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index 53fd94b65b38..46988a1dbd54 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -4,6 +4,7 @@ #include #include +#include struct io_zcrx_area { struct net_iov_area nia; @@ -27,6 +28,8 @@ struct io_zcrx_ifq { u32 rq_entries; u32 if_rxq; + struct net_device *dev; + netdevice_tracker netdev_tracker; }; #if defined(CONFIG_IO_URING_ZCRX) From patchwork Wed Jan 8 22:06:36 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13931633 X-Patchwork-Delegate: kuba@kernel.org Received: from mail-pj1-f48.google.com (mail-pj1-f48.google.com [209.85.216.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F166B205513 for ; Wed, 8 Jan 2025 22:07:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.48 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736374044; cv=none; b=Av3guS+EbUccAmoXbA9TTq7AENSEg1erld524W9aAVvKJEBd2Qx1vbmjpIziAvalbotzPQmofQrRft+O6ihcx2GWPA8uJ1Re0/y92UCwoASCIWVAwLyoceBzuuwShml/mT6/o/VZ3q49qWPGxkkeNLb4+6bqelc3P6eXbATaCig= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736374044; c=relaxed/simple; bh=fRUByEwg0FvgkNn2bCixfrj0Uwn+Ak02MnYWq8R24a0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aXAyvhezuBJH8OnSr1TIOgngMf+2d87HzDpt/wMQDTjG+btE0BFSRMy25km/73t3t9PBjekdEosRFylfNaGMrsqCaSxgAbqx3VbdUbEpJtdEeRu285z9IG1x9LonMw+PtfoSLcu7a2JX15RoIOND5s6x3ldv/S/d3UECDN6qXxE= ARC-Authentication-Results: i=1; 
From patchwork Wed Jan 8 22:06:36 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931633
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry , Stanislav Fomichev , Joe Damato , Pedro Tammela Subject: [PATCH net-next v10 15/22] io_uring/zcrx: implement zerocopy receive pp memory provider Date: Wed, 8 Jan 2025 14:06:36 -0800 Message-ID: <20250108220644.3528845-16-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk> References: <20250108220644.3528845-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Pavel Begunkov Implement a page pool memory provider for io_uring to receieve in a zero copy fashion. For that, the provider allocates user pages wrapped around into struct net_iovs, that are stored in a previously registered struct net_iov_area. Unlike the traditional receive, that frees pages and returns them back to the page pool right after data was copied to the user, e.g. inside recv(2), we extend the lifetime until the user space confirms that it's done processing the data. That's done by taking a net_iov reference. When the user is done with the buffer, it must return it back to the kernel by posting an entry into the refill ring, which is usually polled off the io_uring memory provider callback in the page pool's netmem allocation path. There is also a separate set of per net_iov "user" references accounting whether a buffer is currently given to the user (including possible fragmentation). Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- io_uring/zcrx.c | 263 ++++++++++++++++++++++++++++++++++++++++++++++++ io_uring/zcrx.h | 3 + 2 files changed, 266 insertions(+) diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index e6cca6747148..64e86d14acc7 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -2,10 +2,17 @@ #include #include #include +#include #include #include #include +#include +#include +#include + +#include + #include #include "io_uring.h" @@ -16,6 +23,33 @@ #define IO_RQ_MAX_ENTRIES 32768 +__maybe_unused +static const struct memory_provider_ops io_uring_pp_zc_ops; + +static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov) +{ + struct net_iov_area *owner = net_iov_owner(niov); + + return container_of(owner, struct io_zcrx_area, nia); +} + +static inline atomic_t *io_get_user_counter(struct net_iov *niov) +{ + struct io_zcrx_area *area = io_zcrx_iov_to_area(niov); + + return &area->user_refs[net_iov_idx(niov)]; +} + +static bool io_zcrx_put_niov_uref(struct net_iov *niov) +{ + atomic_t *uref = io_get_user_counter(niov); + + if (unlikely(!atomic_read(uref))) + return false; + atomic_dec(uref); + return true; +} + static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq, struct io_uring_zcrx_ifq_reg *reg, struct io_uring_region_desc *rd) @@ -51,6 +85,7 @@ static void io_zcrx_free_area(struct io_zcrx_area *area) { kvfree(area->freelist); kvfree(area->nia.niovs); + kvfree(area->user_refs); if (area->pages) { unpin_user_pages(area->pages, area->nia.num_niovs); kvfree(area->pages); @@ -106,6 +141,19 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq, for (i = 0; i < nr_pages; i++) area->freelist[i] = i; + area->user_refs = kvmalloc_array(nr_pages, sizeof(area->user_refs[0]), + GFP_KERNEL | __GFP_ZERO); + if (!area->user_refs) + goto err; + + for (i = 0; i < nr_pages; i++) { + struct net_iov *niov = &area->nia.niovs[i]; + + niov->owner = &area->nia; + area->freelist[i] = i; + atomic_set(&area->user_refs[i], 0); + } + 
area->free_count = nr_pages; area->ifq = ifq; /* we're only supporting one area per ifq for now */ @@ -130,6 +178,7 @@ static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx) ifq->if_rxq = -1; ifq->ctx = ctx; + spin_lock_init(&ifq->rq_lock); return ifq; } @@ -238,7 +287,221 @@ void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx) io_zcrx_ifq_free(ifq); } +static struct net_iov *__io_zcrx_get_free_niov(struct io_zcrx_area *area) +{ + unsigned niov_idx; + + lockdep_assert_held(&area->freelist_lock); + + niov_idx = area->freelist[--area->free_count]; + return &area->nia.niovs[niov_idx]; +} + +static void io_zcrx_return_niov_freelist(struct net_iov *niov) +{ + struct io_zcrx_area *area = io_zcrx_iov_to_area(niov); + + spin_lock_bh(&area->freelist_lock); + area->freelist[area->free_count++] = net_iov_idx(niov); + spin_unlock_bh(&area->freelist_lock); +} + +static void io_zcrx_return_niov(struct net_iov *niov) +{ + netmem_ref netmem = net_iov_to_netmem(niov); + + page_pool_put_unrefed_netmem(niov->pp, netmem, -1, false); +} + +static void io_zcrx_scrub(struct io_zcrx_ifq *ifq) +{ + struct io_zcrx_area *area = ifq->area; + int i; + + if (!area) + return; + + /* Reclaim back all buffers given to the user space. */ + for (i = 0; i < area->nia.num_niovs; i++) { + struct net_iov *niov = &area->nia.niovs[i]; + int nr; + + if (!atomic_read(io_get_user_counter(niov))) + continue; + nr = atomic_xchg(io_get_user_counter(niov), 0); + if (nr && !page_pool_unref_netmem(net_iov_to_netmem(niov), nr)) + io_zcrx_return_niov(niov); + } +} + void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx) { lockdep_assert_held(&ctx->uring_lock); + + if (ctx->ifq) + io_zcrx_scrub(ctx->ifq); +} + +static inline u32 io_zcrx_rqring_entries(struct io_zcrx_ifq *ifq) +{ + u32 entries; + + entries = smp_load_acquire(&ifq->rq_ring->tail) - ifq->cached_rq_head; + return min(entries, ifq->rq_entries); +} + +static struct io_uring_zcrx_rqe *io_zcrx_get_rqe(struct io_zcrx_ifq *ifq, + unsigned mask) +{ + unsigned int idx = ifq->cached_rq_head++ & mask; + + return &ifq->rqes[idx]; } + +static void io_zcrx_ring_refill(struct page_pool *pp, + struct io_zcrx_ifq *ifq) +{ + unsigned int mask = ifq->rq_entries - 1; + unsigned int entries; + netmem_ref netmem; + + spin_lock_bh(&ifq->rq_lock); + + entries = io_zcrx_rqring_entries(ifq); + entries = min_t(unsigned, entries, PP_ALLOC_CACHE_REFILL - pp->alloc.count); + if (unlikely(!entries)) { + spin_unlock_bh(&ifq->rq_lock); + return; + } + + do { + struct io_uring_zcrx_rqe *rqe = io_zcrx_get_rqe(ifq, mask); + struct io_zcrx_area *area; + struct net_iov *niov; + unsigned niov_idx, area_idx; + + area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT; + niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> PAGE_SHIFT; + + if (unlikely(rqe->__pad || area_idx)) + continue; + area = ifq->area; + + if (unlikely(niov_idx >= area->nia.num_niovs)) + continue; + niov_idx = array_index_nospec(niov_idx, area->nia.num_niovs); + + niov = &area->nia.niovs[niov_idx]; + if (!io_zcrx_put_niov_uref(niov)) + continue; + + netmem = net_iov_to_netmem(niov); + if (page_pool_unref_netmem(netmem, 1) != 0) + continue; + + if (unlikely(niov->pp != pp)) { + io_zcrx_return_niov(niov); + continue; + } + + net_mp_netmem_place_in_cache(pp, netmem); + } while (--entries); + + smp_store_release(&ifq->rq_ring->head, ifq->cached_rq_head); + spin_unlock_bh(&ifq->rq_lock); +} + +static void io_zcrx_refill_slow(struct page_pool *pp, struct io_zcrx_ifq *ifq) +{ + struct io_zcrx_area *area = ifq->area; + + 
spin_lock_bh(&area->freelist_lock); + while (area->free_count && pp->alloc.count < PP_ALLOC_CACHE_REFILL) { + struct net_iov *niov = __io_zcrx_get_free_niov(area); + netmem_ref netmem = net_iov_to_netmem(niov); + + net_mp_niov_set_page_pool(pp, niov); + net_mp_netmem_place_in_cache(pp, netmem); + } + spin_unlock_bh(&area->freelist_lock); +} + +static netmem_ref io_pp_zc_alloc_netmems(struct page_pool *pp, gfp_t gfp) +{ + struct io_zcrx_ifq *ifq = pp->mp_priv; + + /* pp should already be ensuring that */ + if (unlikely(pp->alloc.count)) + goto out_return; + + io_zcrx_ring_refill(pp, ifq); + if (likely(pp->alloc.count)) + goto out_return; + + io_zcrx_refill_slow(pp, ifq); + if (!pp->alloc.count) + return 0; +out_return: + return pp->alloc.cache[--pp->alloc.count]; +} + +static bool io_pp_zc_release_netmem(struct page_pool *pp, netmem_ref netmem) +{ + struct net_iov *niov; + + if (WARN_ON_ONCE(!netmem_is_net_iov(netmem))) + return false; + + niov = netmem_to_net_iov(netmem); + net_mp_niov_clear_page_pool(niov); + io_zcrx_return_niov_freelist(niov); + return false; +} + +static int io_pp_zc_init(struct page_pool *pp) +{ + struct io_zcrx_ifq *ifq = pp->mp_priv; + + if (WARN_ON_ONCE(!ifq)) + return -EINVAL; + if (WARN_ON_ONCE(ifq->dev != pp->slow.netdev)) + return -EINVAL; + if (pp->dma_map) + return -EOPNOTSUPP; + if (pp->p.order != 0) + return -EOPNOTSUPP; + + percpu_ref_get(&ifq->ctx->refs); + return 0; +} + +static void io_pp_zc_destroy(struct page_pool *pp) +{ + struct io_zcrx_ifq *ifq = pp->mp_priv; + struct io_zcrx_area *area = ifq->area; + + if (WARN_ON_ONCE(area->free_count != area->nia.num_niovs)) + return; + percpu_ref_put(&ifq->ctx->refs); +} + +static int io_pp_nl_fill(void *mp_priv, struct sk_buff *rsp, + struct netdev_rx_queue *rxq) +{ + struct nlattr *nest; + int type; + + type = rxq ? 
NETDEV_A_QUEUE_IO_URING : NETDEV_A_PAGE_POOL_IO_URING; + nest = nla_nest_start(rsp, type); + nla_nest_end(rsp, nest); + + return 0; +} + +static const struct memory_provider_ops io_uring_pp_zc_ops = { + .alloc_netmems = io_pp_zc_alloc_netmems, + .release_netmem = io_pp_zc_release_netmem, + .init = io_pp_zc_init, + .destroy = io_pp_zc_destroy, + .nl_fill = io_pp_nl_fill, +}; diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index 46988a1dbd54..f31c8006ca9c 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -9,6 +9,7 @@ struct io_zcrx_area { struct net_iov_area nia; struct io_zcrx_ifq *ifq; + atomic_t *user_refs; u16 area_id; struct page **pages; @@ -26,6 +27,8 @@ struct io_zcrx_ifq { struct io_uring *rq_ring; struct io_uring_zcrx_rqe *rqes; u32 rq_entries; + u32 cached_rq_head; + spinlock_t rq_lock; u32 if_rxq; struct net_device *dev;
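To make the buffer lifecycle concrete, here is a hedged userspace sketch (not from the patch) of returning one buffer through the refill ring; it assumes the ring was mmap'ed at registration time, that rqes/rq_tail were located via io_uring_zcrx_offsets, and that cqe_off came from a prior completion, mirroring what io_zcrx_ring_refill() above expects:

	#include <stdint.h>
	#include <linux/io_uring.h>

	/* Illustrative only: hand one buffer back to the kernel. The caller
	 * must still hold the "user" reference taken when the cqe was posted.
	 */
	static void zcrx_recycle_buffer(struct io_uring_zcrx_rqe *rqes,
	                                unsigned int *rq_tail,
	                                unsigned int rq_entries,
	                                uint64_t cqe_off)
	{
		unsigned int tail = *rq_tail;
		struct io_uring_zcrx_rqe *rqe = &rqes[tail & (rq_entries - 1)];

		rqe->off = cqe_off;	/* area id in the high bits, offset below */
		rqe->__pad = 0;
		/* release store so the kernel observes a fully written entry */
		__atomic_store_n(rq_tail, tail + 1, __ATOMIC_RELEASE);
	}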
From patchwork Wed Jan 8 22:06:37 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931634
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 16/22] io_uring/zcrx: dma-map area for the device
Date: Wed, 8 Jan 2025 14:06:37 -0800
Message-ID: <20250108220644.3528845-17-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

Set up DMA mappings for the area into which we intend to receive data later on. We know the device we want to attach to even before we get a page pool, so we can pre-map in advance. All net_iovs are synchronised for the device when allocated; see page_pool_mp_return_in_cache().
Reviewed-by: Jens Axboe Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- io_uring/zcrx.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++- io_uring/zcrx.h | 1 + 2 files changed, 91 insertions(+), 1 deletion(-) diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 64e86d14acc7..273bad4d86a2 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include #include +#include #include #include #include @@ -21,6 +22,82 @@ #include "zcrx.h" #include "rsrc.h" +#define IO_DMA_ATTR (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING) + +static struct device *io_zcrx_get_device(struct io_zcrx_ifq *ifq) +{ + return ifq->dev->dev.parent; +} + +static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq, + struct io_zcrx_area *area, int nr_mapped) +{ + int i; + + for (i = 0; i < nr_mapped; i++) { + struct net_iov *niov = &area->nia.niovs[i]; + dma_addr_t dma; + + dma = page_pool_get_dma_addr_netmem(net_iov_to_netmem(niov)); + dma_unmap_page_attrs(io_zcrx_get_device(ifq), dma, PAGE_SIZE, + DMA_FROM_DEVICE, IO_DMA_ATTR); + page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), 0); + } +} + +static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area) +{ + if (area->is_mapped) + __io_zcrx_unmap_area(ifq, area, area->nia.num_niovs); +} + +static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area) +{ + struct device *dev = io_zcrx_get_device(ifq); + int i; + + if (!dev) + return -EINVAL; + + for (i = 0; i < area->nia.num_niovs; i++) { + struct net_iov *niov = &area->nia.niovs[i]; + dma_addr_t dma; + + dma = dma_map_page_attrs(dev, area->pages[i], 0, PAGE_SIZE, + DMA_FROM_DEVICE, IO_DMA_ATTR); + if (dma_mapping_error(dev, dma)) + break; + if (page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), dma)) { + dma_unmap_page_attrs(dev, dma, PAGE_SIZE, + DMA_FROM_DEVICE, IO_DMA_ATTR); + break; + } + } + + if (i != area->nia.num_niovs) { + __io_zcrx_unmap_area(ifq, area, i); + return -EINVAL; + } + + area->is_mapped = true; + return 0; +} + +static void io_zcrx_sync_for_device(const struct page_pool *pool, + struct net_iov *niov) +{ +#if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC) + dma_addr_t dma_addr; + + if (!dma_dev_need_sync(pool->p.dev)) + return; + + dma_addr = page_pool_get_dma_addr_netmem(net_iov_to_netmem(niov)); + __dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset, + PAGE_SIZE, pool->p.dma_dir); +#endif +} + #define IO_RQ_MAX_ENTRIES 32768 __maybe_unused @@ -83,6 +160,8 @@ static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq) static void io_zcrx_free_area(struct io_zcrx_area *area) { + io_zcrx_unmap_area(area->ifq, area); + kvfree(area->freelist); kvfree(area->nia.niovs); kvfree(area->user_refs); @@ -254,6 +333,10 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, if (!ifq->dev) goto err; + ret = io_zcrx_map_area(ifq, ifq->area); + if (ret) + goto err; + reg.offsets.rqes = sizeof(struct io_uring); reg.offsets.head = offsetof(struct io_uring, head); reg.offsets.tail = offsetof(struct io_uring, tail); @@ -404,6 +487,7 @@ static void io_zcrx_ring_refill(struct page_pool *pp, continue; } + io_zcrx_sync_for_device(pp, niov); net_mp_netmem_place_in_cache(pp, netmem); } while (--entries); @@ -421,6 +505,7 @@ static void io_zcrx_refill_slow(struct page_pool *pp, struct io_zcrx_ifq *ifq) netmem_ref netmem = net_iov_to_netmem(niov); net_mp_niov_set_page_pool(pp, niov); + io_zcrx_sync_for_device(pp, niov); net_mp_netmem_place_in_cache(pp, netmem); } 
spin_unlock_bh(&area->freelist_lock); @@ -466,10 +551,14 @@ static int io_pp_zc_init(struct page_pool *pp) return -EINVAL; if (WARN_ON_ONCE(ifq->dev != pp->slow.netdev)) return -EINVAL; - if (pp->dma_map) + if (WARN_ON_ONCE(io_zcrx_get_device(ifq) != pp->p.dev)) + return -EINVAL; + if (WARN_ON_ONCE(!pp->dma_map)) return -EOPNOTSUPP; if (pp->p.order != 0) return -EOPNOTSUPP; + if (pp->p.dma_dir != DMA_FROM_DEVICE) + return -EOPNOTSUPP; percpu_ref_get(&ifq->ctx->refs); return 0; diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index f31c8006ca9c..beacf1ea6380 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -11,6 +11,7 @@ struct io_zcrx_area { struct io_zcrx_ifq *ifq; atomic_t *user_refs; + bool is_mapped; u16 area_id; struct page **pages;
From patchwork Wed Jan 8 22:06:38 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931635
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 17/22] io_uring/zcrx: add io_recvzc request
Date: Wed, 8 Jan 2025 14:06:38 -0800
Message-ID: <20250108220644.3528845-18-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

Add an io_uring opcode OP_RECV_ZC for doing zero copy reads out of a socket. The connection should land on the specific rx queue set up for zero copy, and the socket must be handled by the io_uring instance that the rx queue was registered with for zero copy. That's because net_iovs/buffers from our queue cannot be read by outside applications, and zero copy is not possible if traffic for the zero copy connection goes to another queue. This coordination is outside the scope of this patch series. Also, any traffic directed to the zero copy enabled queue is immediately visible to the application, which is why CAP_NET_ADMIN is required at the registration step.

Of course, no data is actually read out of the socket; it has already been copied by the netdev into userspace memory via DMA. OP_RECV_ZC reads skbs out of the socket and checks that its frags are indeed net_iovs that belong to io_uring. A cqe is queued for each one of these frags.
Recall that each cqe is a big cqe, with the top half being an io_uring_zcrx_cqe. The cqe res field contains the len or error. The lower IORING_ZCRX_AREA_SHIFT bits of the struct io_uring_zcrx_cqe::off field contain the offset relative to the start of the zero copy area. The upper part of the off field is trivially zero, and will be used to carry the area id. For now, there is no limit as to how much work each OP_RECV_ZC request does. It will attempt to drain a socket of all available data. This request always operates in multishot mode. Reviewed-by: Jens Axboe Signed-off-by: David Wei --- include/uapi/linux/io_uring.h | 2 + io_uring/io_uring.h | 10 ++ io_uring/net.c | 72 +++++++++++++ io_uring/opdef.c | 16 +++ io_uring/zcrx.c | 190 +++++++++++++++++++++++++++++++++- io_uring/zcrx.h | 13 +++ 6 files changed, 302 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index e251f28507ce..b919a541d44f 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -87,6 +87,7 @@ struct io_uring_sqe { union { __s32 splice_fd_in; __u32 file_index; + __u32 zcrx_ifq_idx; __u32 optlen; struct { __u16 addr_len; @@ -278,6 +279,7 @@ enum io_uring_op { IORING_OP_FTRUNCATE, IORING_OP_BIND, IORING_OP_LISTEN, + IORING_OP_RECV_ZC, /* this goes last, obviously */ IORING_OP_LAST, diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 032758b28d78..2b1ce5539bfe 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -184,6 +184,16 @@ static inline bool io_get_cqe(struct io_ring_ctx *ctx, struct io_uring_cqe **ret return io_get_cqe_overflow(ctx, ret, false); } +static inline bool io_defer_get_uncommited_cqe(struct io_ring_ctx *ctx, + struct io_uring_cqe **cqe_ret) +{ + io_lockdep_assert_cq_locked(ctx); + + ctx->cq_extra++; + ctx->submit_state.cq_flush = true; + return io_get_cqe(ctx, cqe_ret); +} + static __always_inline bool io_fill_cqe_req(struct io_ring_ctx *ctx, struct io_kiocb *req) { diff --git a/io_uring/net.c b/io_uring/net.c index 8457408194e7..5d8b9a016766 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -16,6 +16,7 @@ #include "net.h" #include "notif.h" #include "rsrc.h" +#include "zcrx.h" #if defined(CONFIG_NET) struct io_shutdown { @@ -88,6 +89,13 @@ struct io_sr_msg { */ #define MULTISHOT_MAX_RETRY 32 +struct io_recvzc { + struct file *file; + unsigned msg_flags; + u16 flags; + struct io_zcrx_ifq *ifq; +}; + int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_shutdown *shutdown = io_kiocb_to_cmd(req, struct io_shutdown); @@ -1209,6 +1217,70 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) return ret; } +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) +{ + struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc); + unsigned ifq_idx; + + if (unlikely(sqe->file_index || sqe->addr2 || sqe->addr || + sqe->len || sqe->addr3)) + return -EINVAL; + + ifq_idx = READ_ONCE(sqe->zcrx_ifq_idx); + if (ifq_idx != 0) + return -EINVAL; + zc->ifq = req->ctx->ifq; + if (!zc->ifq) + return -EINVAL; + + zc->flags = READ_ONCE(sqe->ioprio); + zc->msg_flags = READ_ONCE(sqe->msg_flags); + if (zc->msg_flags) + return -EINVAL; + if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST | IORING_RECV_MULTISHOT)) + return -EINVAL; + /* multishot required */ + if (!(zc->flags & IORING_RECV_MULTISHOT)) + return -EINVAL; + /* All data completions are posted as aux CQEs. 
*/ + req->flags |= REQ_F_APOLL_MULTISHOT; + + return 0; +} + +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc); + struct socket *sock; + int ret; + + if (!(req->flags & REQ_F_POLLED) && + (zc->flags & IORING_RECVSEND_POLL_FIRST)) + return -EAGAIN; + + sock = sock_from_file(req->file); + if (unlikely(!sock)) + return -ENOTSOCK; + + ret = io_zcrx_recv(req, zc->ifq, sock, zc->msg_flags | MSG_DONTWAIT, + issue_flags); + if (unlikely(ret <= 0) && ret != -EAGAIN) { + if (ret == -ERESTARTSYS) + ret = -EINTR; + + req_set_fail(req); + io_req_set_res(req, ret, 0); + + if (issue_flags & IO_URING_F_MULTISHOT) + return IOU_STOP_MULTISHOT; + return IOU_OK; + } + + if (issue_flags & IO_URING_F_MULTISHOT) + return IOU_ISSUE_SKIP_COMPLETE; + return -EAGAIN; +} + void io_send_zc_cleanup(struct io_kiocb *req) { struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg); diff --git a/io_uring/opdef.c b/io_uring/opdef.c index 3de75eca1c92..6ae00c0af9a8 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -36,6 +36,7 @@ #include "waitid.h" #include "futex.h" #include "truncate.h" +#include "zcrx.h" static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags) { @@ -513,6 +514,18 @@ const struct io_issue_def io_issue_defs[] = { .async_size = sizeof(struct io_async_msghdr), #else .prep = io_eopnotsupp_prep, +#endif + }, + [IORING_OP_RECV_ZC] = { + .needs_file = 1, + .unbound_nonreg_file = 1, + .pollin = 1, + .ioprio = 1, +#if defined(CONFIG_NET) + .prep = io_recvzc_prep, + .issue = io_recvzc, +#else + .prep = io_eopnotsupp_prep, #endif }, }; @@ -744,6 +757,9 @@ const struct io_cold_def io_cold_defs[] = { [IORING_OP_LISTEN] = { .name = "LISTEN", }, + [IORING_OP_RECV_ZC] = { + .name = "RECV_ZC", + }, }; const char *io_uring_get_opcode(u8 opcode) diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 273bad4d86a2..036de2981e64 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -13,6 +13,8 @@ #include #include +#include +#include #include @@ -100,7 +102,12 @@ static void io_zcrx_sync_for_device(const struct page_pool *pool, #define IO_RQ_MAX_ENTRIES 32768 -__maybe_unused +struct io_zcrx_args { + struct io_kiocb *req; + struct io_zcrx_ifq *ifq; + struct socket *sock; +}; + static const struct memory_provider_ops io_uring_pp_zc_ops; static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov) @@ -127,6 +134,11 @@ static bool io_zcrx_put_niov_uref(struct net_iov *niov) return true; } +static void io_zcrx_get_niov_uref(struct net_iov *niov) +{ + atomic_inc(io_get_user_counter(niov)); +} + static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq, struct io_uring_zcrx_ifq_reg *reg, struct io_uring_region_desc *rd) @@ -594,3 +606,179 @@ static const struct memory_provider_ops io_uring_pp_zc_ops = { .destroy = io_pp_zc_destroy, .nl_fill = io_pp_nl_fill, }; + +static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov, + struct io_zcrx_ifq *ifq, int off, int len) +{ + struct io_uring_zcrx_cqe *rcqe; + struct io_zcrx_area *area; + struct io_uring_cqe *cqe; + u64 offset; + + if (!io_defer_get_uncommited_cqe(req->ctx, &cqe)) + return false; + + cqe->user_data = req->cqe.user_data; + cqe->res = len; + cqe->flags = IORING_CQE_F_MORE; + + area = io_zcrx_iov_to_area(niov); + offset = off + (net_iov_idx(niov) << PAGE_SHIFT); + rcqe = (struct io_uring_zcrx_cqe *)(cqe + 1); + rcqe->off = offset + ((u64)area->area_id << IORING_ZCRX_AREA_SHIFT); + rcqe->__pad = 0; + return true; +} + +static int 
io_zcrx_recv_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + const skb_frag_t *frag, int off, int len) +{ + struct net_iov *niov; + + if (unlikely(!skb_frag_is_net_iov(frag))) + return -EOPNOTSUPP; + + niov = netmem_to_net_iov(frag->netmem); + if (niov->pp->mp_ops != &io_uring_pp_zc_ops || + niov->pp->mp_priv != ifq) + return -EFAULT; + + if (!io_zcrx_queue_cqe(req, niov, ifq, off + skb_frag_off(frag), len)) + return -ENOSPC; + + /* + * Prevent it from being recycled while user is accessing it. + * It has to be done before grabbing a user reference. + */ + page_pool_ref_netmem(net_iov_to_netmem(niov)); + io_zcrx_get_niov_uref(niov); + return len; +} + +static int +io_zcrx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb, + unsigned int offset, size_t len) +{ + struct io_zcrx_args *args = desc->arg.data; + struct io_zcrx_ifq *ifq = args->ifq; + struct io_kiocb *req = args->req; + struct sk_buff *frag_iter; + unsigned start, start_off; + int i, copy, end, off; + int ret = 0; + + start = skb_headlen(skb); + start_off = offset; + + if (offset < start) + return -EOPNOTSUPP; + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + const skb_frag_t *frag; + + if (WARN_ON(start > offset + len)) + return -EFAULT; + + frag = &skb_shinfo(skb)->frags[i]; + end = start + skb_frag_size(frag); + + if (offset < end) { + copy = end - offset; + if (copy > len) + copy = len; + + off = offset - start; + ret = io_zcrx_recv_frag(req, ifq, frag, off, copy); + if (ret < 0) + goto out; + + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + start = end; + } + + skb_walk_frags(skb, frag_iter) { + if (WARN_ON(start > offset + len)) + return -EFAULT; + + end = start + frag_iter->len; + if (offset < end) { + copy = end - offset; + if (copy > len) + copy = len; + + off = offset - start; + ret = io_zcrx_recv_skb(desc, frag_iter, off, copy); + if (ret < 0) + goto out; + + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + start = end; + } + +out: + if (offset == start_off) + return ret; + return offset - start_off; +} + +static int io_zcrx_tcp_recvmsg(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + struct sock *sk, int flags, + unsigned issue_flags) +{ + struct io_zcrx_args args = { + .req = req, + .ifq = ifq, + .sock = sk->sk_socket, + }; + read_descriptor_t rd_desc = { + .count = 1, + .arg.data = &args, + }; + int ret; + + lock_sock(sk); + ret = tcp_read_sock(sk, &rd_desc, io_zcrx_recv_skb); + if (ret <= 0) { + if (ret < 0 || sock_flag(sk, SOCK_DONE)) + goto out; + if (sk->sk_err) + ret = sock_error(sk); + else if (sk->sk_shutdown & RCV_SHUTDOWN) + goto out; + else if (sk->sk_state == TCP_CLOSE) + ret = -ENOTCONN; + else + ret = -EAGAIN; + } else if (sock_flag(sk, SOCK_DONE)) { + /* Make it to retry until it finally gets 0. 
*/ + if (issue_flags & IO_URING_F_MULTISHOT) + ret = IOU_REQUEUE; + else + ret = -EAGAIN; + } +out: + release_sock(sk); + return ret; +} + +int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + struct socket *sock, unsigned int flags, + unsigned issue_flags) +{ + struct sock *sk = sock->sk; + const struct proto *prot = READ_ONCE(sk->sk_prot); + + if (prot->recvmsg != tcp_recvmsg) + return -EPROTONOSUPPORT; + + sock_rps_record_flow(sk); + return io_zcrx_tcp_recvmsg(req, ifq, sk, flags, issue_flags); +} diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index beacf1ea6380..65e92756720f 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -3,6 +3,7 @@ #define IOU_ZC_RX_H #include +#include #include #include @@ -41,6 +42,9 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, struct io_uring_zcrx_ifq_reg __user *arg); void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx); void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx); +int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + struct socket *sock, unsigned int flags, + unsigned issue_flags); #else static inline int io_register_zcrx_ifq(struct io_ring_ctx *ctx, struct io_uring_zcrx_ifq_reg __user *arg) @@ -53,6 +57,15 @@ static inline void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx) static inline void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx) { } +static inline int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + struct socket *sock, unsigned int flags, + unsigned issue_flags) +{ + return -EOPNOTSUPP; +} #endif +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags); +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); + #endif
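On the consumer side, a hedged sketch (not from the patch) of handling one RECV_ZC completion; it assumes big CQEs (IORING_SETUP_CQE32) and that area_base is the start of the registered area, simply inverting the encoding io_zcrx_queue_cqe() uses above:

	#include <stdint.h>
	#include <linux/io_uring.h>

	/* Illustrative only: locate the received bytes for one completion. */
	static void zcrx_handle_cqe(const struct io_uring_cqe *cqe,
	                            const unsigned char *area_base)
	{
		const struct io_uring_zcrx_cqe *rcqe =
				(const struct io_uring_zcrx_cqe *)(cqe + 1);
		uint64_t off = rcqe->off & ~IORING_ZCRX_AREA_MASK;
		int len = cqe->res;	/* number of bytes, or a negative error */

		if (len <= 0)
			return;
		/* data lives at area_base + off for len bytes; once processed,
		 * post rcqe->off to the refill ring as sketched earlier */
		(void)(area_base + off);
	}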
From patchwork Wed Jan 8 22:06:39 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931637
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 18/22] io_uring/zcrx: set pp memory provider for an rx queue
Date: Wed, 8 Jan 2025 14:06:39 -0800
Message-ID: <20250108220644.3528845-19-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

Set the page pool memory provider for the rx queue configured for zero copy to io_uring. Then the rx queue is reset using netdev_rx_queue_restart(), and the netdev core + page pool will take care of filling the rx queue from the io_uring zero copy memory provider.

For now, there is only one ifq, so its destruction happens implicitly during io_uring cleanup.
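Tying the pieces together, a hedged userspace sketch (not from the patch) of pointing the registration at one interface/queue pair; the interface name and queue id are illustrative, and the queue would normally be isolated via hardware flow steering first:

	#include <net/if.h>
	#include <linux/io_uring.h>

	/* Illustrative only: select the rx queue that the kernel will
	 * restart with the io_uring memory provider installed.
	 */
	static void zcrx_target_queue(struct io_uring_zcrx_ifq_reg *reg)
	{
		reg->if_idx = if_nametoindex("eth0");	/* example interface */
		reg->if_rxq = 12;			/* example queue id */
	}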
Then the rx queue is reset using netdev_rx_queue_restart(), and the netdev
core + page pool will take care of filling the rx queue from the io_uring
zero copy memory provider.

For now, there is only one ifq, so its destruction happens implicitly during
io_uring cleanup.

Reviewed-by: Jens Axboe
Signed-off-by: David Wei
---
 io_uring/zcrx.c | 83 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 75 insertions(+), 8 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 036de2981e64..caaec528cc3c 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -11,11 +11,12 @@
 #include
 #include
 #include
-
-#include
+#include
 #include
 #include
 
+#include
+
 #include
 
 #include "io_uring.h"
@@ -139,6 +140,65 @@ static void io_zcrx_get_niov_uref(struct net_iov *niov)
 	atomic_inc(io_get_user_counter(niov));
 }
 
+static int io_open_zc_rxq(struct io_zcrx_ifq *ifq, unsigned ifq_idx)
+{
+	struct netdev_rx_queue *rxq;
+	struct net_device *dev = ifq->dev;
+	int ret;
+
+	ASSERT_RTNL();
+
+	if (ifq_idx >= dev->num_rx_queues)
+		return -EINVAL;
+	ifq_idx = array_index_nospec(ifq_idx, dev->num_rx_queues);
+
+	rxq = __netif_get_rx_queue(ifq->dev, ifq_idx);
+	if (rxq->mp_params.mp_priv)
+		return -EEXIST;
+
+	ifq->if_rxq = ifq_idx;
+	rxq->mp_params.mp_ops = &io_uring_pp_zc_ops;
+	rxq->mp_params.mp_priv = ifq;
+	ret = netdev_rx_queue_restart(ifq->dev, ifq->if_rxq);
+	if (ret)
+		goto fail;
+	return 0;
+fail:
+	rxq->mp_params.mp_ops = NULL;
+	rxq->mp_params.mp_priv = NULL;
+	ifq->if_rxq = -1;
+	return ret;
+}
+
+static void io_close_zc_rxq(struct io_zcrx_ifq *ifq)
+{
+	struct netdev_rx_queue *rxq;
+	int err;
+
+	if (ifq->if_rxq == -1)
+		return;
+
+	rtnl_lock();
+	if (WARN_ON_ONCE(ifq->if_rxq >= ifq->dev->num_rx_queues)) {
+		rtnl_unlock();
+		return;
+	}
+
+	rxq = __netif_get_rx_queue(ifq->dev, ifq->if_rxq);
+
+	WARN_ON_ONCE(rxq->mp_params.mp_priv != ifq);
+
+	rxq->mp_params.mp_ops = NULL;
+	rxq->mp_params.mp_priv = NULL;
+
+	err = netdev_rx_queue_restart(ifq->dev, ifq->if_rxq);
+	if (err)
+		pr_devel("io_uring: can't restart a queue on zcrx close\n");
+
+	rtnl_unlock();
+	ifq->if_rxq = -1;
+}
+
 static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
 				 struct io_uring_zcrx_ifq_reg *reg,
 				 struct io_uring_region_desc *rd)
@@ -275,6 +335,8 @@ static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx)
 
 static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
 {
+	io_close_zc_rxq(ifq);
+
 	if (ifq->area)
 		io_zcrx_free_area(ifq->area);
 
@@ -337,7 +399,6 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 		goto err;
 
 	ifq->rq_entries = reg.rq_entries;
-	ifq->if_rxq = reg.if_rxq;
 
 	ret = -ENODEV;
 	ifq->dev = netdev_get_by_index(current->nsproxy->net_ns, reg.if_idx,
@@ -349,16 +410,20 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 	if (ret)
 		goto err;
 
+	rtnl_lock();
+	ret = io_open_zc_rxq(ifq, reg.if_rxq);
+	rtnl_unlock();
+	if (ret)
+		goto err;
+
 	reg.offsets.rqes = sizeof(struct io_uring);
 	reg.offsets.head = offsetof(struct io_uring, head);
 	reg.offsets.tail = offsetof(struct io_uring, tail);
 
 	if (copy_to_user(arg, &reg, sizeof(reg)) ||
-	    copy_to_user(u64_to_user_ptr(reg.region_ptr), &rd, sizeof(rd))) {
-		ret = -EFAULT;
-		goto err;
-	}
-	if (copy_to_user(u64_to_user_ptr(reg.area_ptr), &area, sizeof(area))) {
+	    copy_to_user(u64_to_user_ptr(reg.region_ptr), &rd, sizeof(rd)) ||
+	    copy_to_user(u64_to_user_ptr(reg.area_ptr), &area, sizeof(area))) {
+		io_close_zc_rxq(ifq);
 		ret = -EFAULT;
 		goto err;
 	}
@@ -435,6 +500,8 @@ void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx)
 
 	if (ctx->ifq)
 		io_zcrx_scrub(ctx->ifq);
+
+	io_close_zc_rxq(ctx->ifq);
 }
 
 static inline u32 io_zcrx_rqring_entries(struct io_zcrx_ifq *ifq)
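
The heart of the patch is pairing memory provider attach/detach with
netdev_rx_queue_restart(): the provider is set on the queue, the queue is
restarted so its page pool is rebuilt on top of the provider, and on failure
the fields are rolled back. A condensed sketch of that pattern, using the
names from the hunks above (the helper name here is illustrative, not part
of the patch):

    /* Sketch only: attach the io_uring memory provider to one HW rx
     * queue and restart it so the page pool picks the provider up.
     * Caller holds rtnl_lock; mp_params are rolled back on failure.
     */
    static int zcrx_attach_rxq(struct net_device *dev,
                               struct netdev_rx_queue *rxq,
                               struct io_zcrx_ifq *ifq, unsigned int idx)
    {
            int ret;

            if (rxq->mp_params.mp_priv)
                    return -EEXIST;   /* queue already has a provider */

            rxq->mp_params.mp_ops = &io_uring_pp_zc_ops;
            rxq->mp_params.mp_priv = ifq;
            ret = netdev_rx_queue_restart(dev, idx);
            if (ret) {
                    rxq->mp_params.mp_ops = NULL;
                    rxq->mp_params.mp_priv = NULL;
            }
            return ret;
    }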
From patchwork Wed Jan 8 22:06:40 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931636
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 19/22] io_uring/zcrx: throttle receive requests
Date: Wed, 8 Jan 2025 14:06:40 -0800
Message-ID: <20250108220644.3528845-20-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

io_zc_rx_tcp_recvmsg() continues until it fails or there is nothing left to
receive. If the other side sends fast enough, we might get stuck in
io_zc_rx_tcp_recvmsg() producing more and more CQEs without ever letting the
user handle them, leading to unbounded latencies. Break out of it based on
an arbitrarily chosen limit; the upper layer will either return to userspace
or requeue the request.
Reviewed-by: Jens Axboe
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 io_uring/net.c  | 2 ++
 io_uring/zcrx.c | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/io_uring/net.c b/io_uring/net.c
index 5d8b9a016766..86eaba37e739 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1267,6 +1267,8 @@ int io_recvzc(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(ret <= 0) && ret != -EAGAIN) {
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
+		if (ret == IOU_REQUEUE)
+			return IOU_REQUEUE;
 
 		req_set_fail(req);
 		io_req_set_res(req, ret, 0);
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index caaec528cc3c..0c737ab9058d 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -103,10 +103,13 @@ static void io_zcrx_sync_for_device(const struct page_pool *pool,
 
 #define IO_RQ_MAX_ENTRIES		32768
 
+#define IO_SKBS_PER_CALL_LIMIT		20
+
 struct io_zcrx_args {
 	struct io_kiocb		*req;
 	struct io_zcrx_ifq	*ifq;
 	struct socket		*sock;
+	unsigned		nr_skbs;
 };
 
 static const struct memory_provider_ops io_uring_pp_zc_ops;
@@ -734,6 +737,9 @@ io_zcrx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb,
 	int i, copy, end, off;
 	int ret = 0;
 
+	if (unlikely(args->nr_skbs++ > IO_SKBS_PER_CALL_LIMIT))
+		return -EAGAIN;
+
 	start = skb_headlen(skb);
 	start_off = offset;
 
@@ -824,6 +830,9 @@ static int io_zcrx_tcp_recvmsg(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 			ret = -ENOTCONN;
 		else
 			ret = -EAGAIN;
+	} else if (unlikely(args.nr_skbs > IO_SKBS_PER_CALL_LIMIT) &&
+		   (issue_flags & IO_URING_F_MULTISHOT)) {
+		ret = IOU_REQUEUE;
 	} else if (sock_flag(sk, SOCK_DONE)) {
 		/* Make it to retry until it finally gets 0. */
 		if (issue_flags & IO_URING_F_MULTISHOT)
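
The mechanism is deliberately small: a per-call skb budget in the
->read_sock() callback, plus a translation of the resulting early -EAGAIN
into IOU_REQUEUE for multishot requests so posted CQEs reach userspace
before receiving resumes. Condensed from the hunks above (excerpts, not
standalone code):

    /* in io_zcrx_recv_skb(), called per skb from tcp_read_sock():
     * give up after a fixed budget so CQEs reach userspace instead
     * of accumulating for as long as the sender keeps up */
    if (unlikely(args->nr_skbs++ > IO_SKBS_PER_CALL_LIMIT))
            return -EAGAIN;

    /* in io_zcrx_tcp_recvmsg(): a multishot request that ran out of
     * budget is requeued rather than failed back to userspace */
    } else if (unlikely(args.nr_skbs > IO_SKBS_PER_CALL_LIMIT) &&
               (issue_flags & IO_URING_F_MULTISHOT)) {
            ret = IOU_REQUEUE;
    }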
From patchwork Wed Jan 8 22:06:41 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931638
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 20/22] io_uring/zcrx: add copy fallback
Date: Wed, 8 Jan 2025 14:06:41 -0800
Message-ID: <20250108220644.3528845-21-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

From: Pavel Begunkov

There are scenarios in which the zerocopy path can get a kernel buffer
instead of a net_iov and needs to copy it to the user, whether it is because
of mis-steering or simply getting an skb with the linear part. In this case,
grab a net_iov, copy into it and return it to the user as normal.

At the moment the user doesn't get any indication whether there was a copy
or not, which is left for follow-up work.

Reviewed-by: Jens Axboe
Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 io_uring/zcrx.c | 121 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 115 insertions(+), 6 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 0c737ab9058d..b5ce336fc78d 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -143,6 +144,13 @@ static void io_zcrx_get_niov_uref(struct net_iov *niov)
 	atomic_inc(io_get_user_counter(niov));
 }
 
+static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
+{
+	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
+
+	return area->pages[net_iov_idx(niov)];
+}
+
 static int io_open_zc_rxq(struct io_zcrx_ifq *ifq, unsigned ifq_idx)
 {
 	struct netdev_rx_queue *rxq;
@@ -165,6 +173,7 @@ static int io_open_zc_rxq(struct io_zcrx_ifq *ifq, unsigned ifq_idx)
 	ret = netdev_rx_queue_restart(ifq->dev, ifq->if_rxq);
 	if (ret)
 		goto fail;
+
 	return 0;
 fail:
 	rxq->mp_params.mp_ops = NULL;
@@ -473,6 +482,11 @@ static void io_zcrx_return_niov(struct net_iov *niov)
 {
 	netmem_ref netmem = net_iov_to_netmem(niov);
 
+	if (!niov->pp) {
+		/* copy fallback allocated niovs */
+		io_zcrx_return_niov_freelist(niov);
+		return;
+	}
 	page_pool_put_unrefed_netmem(niov->pp, netmem, -1, false);
 }
 
@@ -700,13 +714,93 @@ static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov,
 	return true;
 }
 
+static struct net_iov *io_zcrx_alloc_fallback(struct io_zcrx_area *area)
+{
+	struct net_iov *niov = NULL;
+
+	spin_lock_bh(&area->freelist_lock);
+	if (area->free_count)
+		niov = __io_zcrx_get_free_niov(area);
+	spin_unlock_bh(&area->freelist_lock);
+
+	if (niov)
+		page_pool_fragment_netmem(net_iov_to_netmem(niov), 1);
+	return niov;
+}
+
+static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
+				  void *src_base, struct page *src_page,
+				  unsigned int src_offset, size_t len)
+{
+	struct io_zcrx_area *area = ifq->area;
+	size_t copied = 0;
+	int ret = 0;
+
+	while (len) {
+		size_t copy_size = min_t(size_t, PAGE_SIZE, len);
+		const int dst_off = 0;
+		struct net_iov *niov;
+		struct page *dst_page;
+		void *dst_addr;
+
+		niov = io_zcrx_alloc_fallback(area);
+		if (!niov) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		dst_page = io_zcrx_iov_page(niov);
+		dst_addr = kmap_local_page(dst_page);
+		if (src_page)
+			src_base = kmap_local_page(src_page);
+
+		memcpy(dst_addr, src_base + src_offset, copy_size);
+
+		if (src_page)
+			kunmap_local(src_base);
+		kunmap_local(dst_addr);
+
+		if (!io_zcrx_queue_cqe(req, niov, ifq, dst_off, copy_size)) {
+			io_zcrx_return_niov(niov);
+			ret = -ENOSPC;
+			break;
+		}
+
+		io_zcrx_get_niov_uref(niov);
+		src_offset += copy_size;
+		len -= copy_size;
+		copied += copy_size;
+	}
+
+	return copied ? copied : ret;
+}
+
+static int io_zcrx_copy_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
+			     const skb_frag_t *frag, int off, int len)
+{
+	struct page *page = skb_frag_page(frag);
+	u32 p_off, p_len, t, copied = 0;
+	int ret = 0;
+
+	off += skb_frag_off(frag);
+
+	skb_frag_foreach_page(frag, off, len,
+			      page, p_off, p_len, t) {
+		ret = io_zcrx_copy_chunk(req, ifq, NULL, page, p_off, p_len);
+		if (ret < 0)
+			return copied ? copied : ret;
+		copied += ret;
+	}
+	return copied;
+}
+
 static int io_zcrx_recv_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 			     const skb_frag_t *frag, int off, int len)
 {
 	struct net_iov *niov;
 
 	if (unlikely(!skb_frag_is_net_iov(frag)))
-		return -EOPNOTSUPP;
+		return io_zcrx_copy_frag(req, ifq, frag, off, len);
 
 	niov = netmem_to_net_iov(frag->netmem);
 	if (niov->pp->mp_ops != &io_uring_pp_zc_ops ||
@@ -733,18 +827,33 @@ io_zcrx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb,
 	struct io_zcrx_ifq *ifq = args->ifq;
 	struct io_kiocb *req = args->req;
 	struct sk_buff *frag_iter;
-	unsigned start, start_off;
+	unsigned start, start_off = offset;
 	int i, copy, end, off;
 	int ret = 0;
 
 	if (unlikely(args->nr_skbs++ > IO_SKBS_PER_CALL_LIMIT))
 		return -EAGAIN;
 
-	start = skb_headlen(skb);
-	start_off = offset;
+	if (unlikely(offset < skb_headlen(skb))) {
+		ssize_t copied;
+		size_t to_copy;
 
-	if (offset < start)
-		return -EOPNOTSUPP;
+		to_copy = min_t(size_t, skb_headlen(skb) - offset, len);
+		copied = io_zcrx_copy_chunk(req, ifq, skb->data, NULL,
+					    offset, to_copy);
+		if (copied < 0) {
+			ret = copied;
+			goto out;
+		}
+		offset += copied;
+		len -= copied;
+		if (!len)
+			goto out;
+		if (offset != skb_headlen(skb))
+			goto out;
+	}
+
+	start = skb_headlen(skb);
 
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 		const skb_frag_t *frag;
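
The fallback reuses the normal completion path, so userspace consumes copied
buffers exactly like zero copy ones. Condensed from io_zcrx_copy_chunk()
above (sketch only, error handling dropped):

    /* per chunk: fall back to a plain copy into a freelist niov */
    niov = io_zcrx_alloc_fallback(area);          /* may fail: -ENOMEM */
    dst_addr = kmap_local_page(io_zcrx_iov_page(niov));
    memcpy(dst_addr, src_base + src_offset, copy_size);
    kunmap_local(dst_addr);

    /* then post the usual zcrx CQE and take a user reference */
    io_zcrx_queue_cqe(req, niov, ifq, 0, copy_size);
    io_zcrx_get_niov_uref(niov);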
From patchwork Wed Jan 8 22:06:42 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931639
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 21/22] net: add documentation for io_uring zcrx
Date: Wed, 8 Jan 2025 14:06:42 -0800
Message-ID: <20250108220644.3528845-22-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

Add documentation for io_uring zero copy Rx that explains the requirements
and the user API.

Signed-off-by: David Wei
---
 Documentation/networking/index.rst    |   1 +
 Documentation/networking/iou-zcrx.rst | 201 ++++++++++++++++++++++++++
 2 files changed, 202 insertions(+)
 create mode 100644 Documentation/networking/iou-zcrx.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 058193ed2eeb..c64133d309bf 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -63,6 +63,7 @@ Contents:
    gtp
    ila
    ioam6-sysctl
+   iou-zcrx
    ip_dynaddr
    ipsec
    ip-sysctl
diff --git a/Documentation/networking/iou-zcrx.rst b/Documentation/networking/iou-zcrx.rst
new file mode 100644
index 000000000000..7f6b7c072b59
--- /dev/null
+++ b/Documentation/networking/iou-zcrx.rst
@@ -0,0 +1,201 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+io_uring zero copy Rx
+=====================
+
+Introduction
+============
+
+io_uring zero copy Rx (ZC Rx) is a feature that removes the kernel-to-user
+copy on the network receive path, allowing packet data to be received
+directly into userspace memory. This feature differs from
+TCP_ZEROCOPY_RECEIVE in that there are no strict alignment requirements and
+no need to mmap()/munmap(). Compared to kernel bypass solutions such as
+DPDK, the packet headers are processed by the kernel TCP stack as normal.
+
+NIC HW Requirements
+===================
+
+Several NIC HW features are required for io_uring ZC Rx to work. For now the
+kernel API does not configure the NIC and it must be done by the user.
+
+Header/data split
+-----------------
+
+Required to split packets at the L4 boundary into a header and a payload.
+Headers are received into kernel memory as normal and processed by the TCP
+stack as normal. Payloads are received into userspace memory directly.
+
+Flow steering
+-------------
+
+Specific HW Rx queues are configured for this feature, but modern NICs
+typically distribute flows across all HW Rx queues. Flow steering is
+required to ensure that only desired flows are directed towards HW queues
+that are configured for io_uring ZC Rx.
+
+RSS
+---
+
+In addition to flow steering above, RSS is required to steer all other
+non-zero copy flows away from queues that are configured for io_uring ZC Rx.
+
+Usage
+=====
+
+Setup NIC
+---------
+
+Must be done out of band for now.
+
+Ensure there are at least two queues::
+
+  ethtool -L eth0 combined 2
+
+Enable header/data split::
+
+  ethtool -G eth0 tcp-data-split on
+
+Carve out half of the HW Rx queues for zero copy using RSS::
+
+  ethtool -X eth0 equal 1
+
+Set up flow steering, bearing in mind that queues are 0-indexed::
+
+  ethtool -N eth0 flow-type tcp6 ... action 1
+
+Setup io_uring
+--------------
+
+This section describes the low level io_uring kernel API. Please refer to
+the liburing documentation for how to use the higher level API.
+
+Create an io_uring instance with the following required setup flags::
+
+  IORING_SETUP_SINGLE_ISSUER
+  IORING_SETUP_DEFER_TASKRUN
+  IORING_SETUP_CQE32
+
+Create memory area
+------------------
+
+Allocate a userspace memory area for receiving zero copy data::
+
+  void *area_ptr = mmap(NULL, area_size,
+                        PROT_READ | PROT_WRITE,
+                        MAP_ANONYMOUS | MAP_PRIVATE,
+                        0, 0);
+
+Create refill ring
+------------------
+
+Allocate memory for a shared ringbuf used for returning consumed buffers::
+
+  void *ring_ptr = mmap(NULL, ring_size,
+                        PROT_READ | PROT_WRITE,
+                        MAP_ANONYMOUS | MAP_PRIVATE,
+                        0, 0);
+
+This refill ring consists of some space for the header, followed by an
+array of ``struct io_uring_zcrx_rqe``::
+
+  size_t rq_entries = 4096;
+  size_t ring_size = rq_entries * sizeof(struct io_uring_zcrx_rqe) + PAGE_SIZE;
+  /* align to page size */
+  ring_size = (ring_size + (PAGE_SIZE - 1)) & ~(PAGE_SIZE - 1);
+
+Register ZC Rx
+--------------
+
+Fill in registration structs::
+
+  struct io_uring_zcrx_area_reg area_reg = {
+    .addr = (__u64)(unsigned long)area_ptr,
+    .len = area_size,
+    .flags = 0,
+  };
+
+  struct io_uring_region_desc region_reg = {
+    .user_addr = (__u64)(unsigned long)ring_ptr,
+    .size = ring_size,
+    .flags = IORING_MEM_REGION_TYPE_USER,
+  };
+
+  struct io_uring_zcrx_ifq_reg reg = {
+    .if_idx = if_nametoindex("eth0"),
+    /* this is the HW queue with the desired flow steered into it */
+    .if_rxq = 1,
+    .rq_entries = rq_entries,
+    .area_ptr = (__u64)(unsigned long)&area_reg,
+    .region_ptr = (__u64)(unsigned long)&region_reg,
+  };
+
+Register with the kernel::
+
+  io_uring_register_ifq(ring, &reg);
+
+Map refill ring
+---------------
+
+The kernel fills in fields for the refill ring in the registration ``struct
+io_uring_zcrx_ifq_reg``. Map it into userspace::
+
+  struct io_uring_zcrx_rq refill_ring;
+
+  refill_ring.khead = (unsigned *)((char *)ring_ptr + reg.offsets.head);
+  refill_ring.ktail = (unsigned *)((char *)ring_ptr + reg.offsets.tail);
+  refill_ring.rqes =
+    (struct io_uring_zcrx_rqe *)((char *)ring_ptr + reg.offsets.rqes);
+  refill_ring.rq_tail = 0;
+  refill_ring.ring_ptr = ring_ptr;
+
+Receiving data
+--------------
+
+Prepare a zero copy recv request::
+
+  struct io_uring_sqe *sqe;
+
+  sqe = io_uring_get_sqe(ring);
+  io_uring_prep_rw(IORING_OP_RECV_ZC, sqe, fd, NULL, 0, 0);
+  sqe->ioprio |= IORING_RECV_MULTISHOT;
+
+Now, submit and wait::
+
+  io_uring_submit_and_wait(ring, 1);
+
+Finally, process completions::
+
+  struct io_uring_cqe *cqe;
+  unsigned int count = 0;
+  unsigned int head;
+
+  io_uring_for_each_cqe(ring, head, cqe) {
+    struct io_uring_zcrx_cqe *rcqe = (struct io_uring_zcrx_cqe *)(cqe + 1);
+
+    unsigned char *data = area_ptr + (rcqe->off & IORING_ZCRX_AREA_MASK);
+    /* do something with the data */
+
+    count++;
+  }
+  io_uring_cq_advance(ring, count);
+
+Recycling buffers
+-----------------
+
+Return buffers back to the kernel to be used again::
+
+  struct io_uring_zcrx_rqe *rqe;
+  unsigned mask = refill_ring.ring_entries - 1;
+
+  rqe = &refill_ring.rqes[refill_ring.rq_tail & mask];
+
+  area_offset = rcqe->off & IORING_ZCRX_AREA_MASK;
+  rqe->off = area_offset | area_reg.rq_area_token;
+  rqe->len = cqe->res;
+  IO_URING_WRITE_ONCE(*refill_ring.ktail, ++refill_ring.rq_tail);
+
+Testing
+=======
+
+See ``tools/testing/selftests/drivers/net/hw/iou-zcrx.c``
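
Taken together, the snippets above reduce the steady state to a short loop.
The following is a condensed sketch only, not part of the documented API:
error handling, the registration structs, and re-arming the request when
IORING_CQE_F_MORE is cleared are omitted, and fd is assumed to be a
connected TCP socket whose flow is steered to the zero copy queue:

    /* Sketch: one multishot RECV_ZC request feeding a consume/recycle
     * loop. 'ring', 'reg', 'area_ptr', 'area_reg' and 'refill_ring'
     * are set up exactly as in the sections above. */
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    struct io_uring_cqe *cqe;
    unsigned int head, count;
    unsigned int mask = refill_ring.ring_entries - 1;

    io_uring_prep_rw(IORING_OP_RECV_ZC, sqe, fd, NULL, 0, 0);
    sqe->ioprio |= IORING_RECV_MULTISHOT;

    for (;;) {
            io_uring_submit_and_wait(ring, 1);

            count = 0;
            io_uring_for_each_cqe(ring, head, cqe) {
                    struct io_uring_zcrx_cqe *rcqe =
                            (struct io_uring_zcrx_cqe *)(cqe + 1);
                    struct io_uring_zcrx_rqe *rqe;
                    char *data = (char *)area_ptr +
                                 (rcqe->off & IORING_ZCRX_AREA_MASK);

                    /* cqe->res == 0 means EOF; otherwise consume
                     * cqe->res bytes at 'data', then recycle the
                     * buffer through the refill ring */
                    rqe = &refill_ring.rqes[refill_ring.rq_tail & mask];
                    rqe->off = (rcqe->off & IORING_ZCRX_AREA_MASK) |
                               area_reg.rq_area_token;
                    rqe->len = cqe->res;
                    IO_URING_WRITE_ONCE(*refill_ring.ktail,
                                        ++refill_ring.rq_tail);
                    count++;
            }
            io_uring_cq_advance(ring, count);
    }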
From patchwork Wed Jan 8 22:06:43 2025
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13931640
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni,
 "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern,
 Mina Almasry, Stanislav Fomichev, Joe Damato, Pedro Tammela
Subject: [PATCH net-next v10 22/22] io_uring/zcrx: add selftest
Date: Wed, 8 Jan 2025 14:06:43 -0800
Message-ID: <20250108220644.3528845-23-dw@davidwei.uk>
In-Reply-To: <20250108220644.3528845-1-dw@davidwei.uk>
References: <20250108220644.3528845-1-dw@davidwei.uk>

Add a selftest for io_uring zero copy Rx. This test cannot run locally: it
requires a remote host to be configured in net.config. The remote host must
have hardware support for zero copy Rx as listed in the documentation page.
The test restores the NIC config when it finishes and is idempotent.

liburing is required to compile the test and must be installed on the
remote host running the test.
Signed-off-by: David Wei --- .../selftests/drivers/net/hw/.gitignore | 2 + .../testing/selftests/drivers/net/hw/Makefile | 6 + .../selftests/drivers/net/hw/iou-zcrx.c | 432 ++++++++++++++++++ .../selftests/drivers/net/hw/iou-zcrx.py | 64 +++ 4 files changed, 504 insertions(+) create mode 100644 tools/testing/selftests/drivers/net/hw/iou-zcrx.c create mode 100755 tools/testing/selftests/drivers/net/hw/iou-zcrx.py diff --git a/tools/testing/selftests/drivers/net/hw/.gitignore b/tools/testing/selftests/drivers/net/hw/.gitignore index e9fe6ede681a..6942bf575497 100644 --- a/tools/testing/selftests/drivers/net/hw/.gitignore +++ b/tools/testing/selftests/drivers/net/hw/.gitignore @@ -1 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0-only +iou-zcrx ncdevmem diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile index 21ba64ce1e34..5431af8e8210 100644 --- a/tools/testing/selftests/drivers/net/hw/Makefile +++ b/tools/testing/selftests/drivers/net/hw/Makefile @@ -1,5 +1,7 @@ # SPDX-License-Identifier: GPL-2.0+ OR MIT +TEST_GEN_FILES = iou-zcrx + TEST_PROGS = \ csum.py \ devlink_port_split.py \ @@ -10,6 +12,7 @@ TEST_PROGS = \ ethtool_rmon.sh \ hw_stats_l3.sh \ hw_stats_l3_gre.sh \ + iou-zcrx.py \ loopback.sh \ nic_link_layer.py \ nic_performance.py \ @@ -38,3 +41,6 @@ include ../../../lib.mk # YNL build YNL_GENS := ethtool netdev include ../../../net/ynl.mk + +$(OUTPUT)/iou-zcrx: CFLAGS += -I/usr/include/ +$(OUTPUT)/iou-zcrx: LDLIBS += -luring diff --git a/tools/testing/selftests/drivers/net/hw/iou-zcrx.c b/tools/testing/selftests/drivers/net/hw/iou-zcrx.c new file mode 100644 index 000000000000..0809db134bba --- /dev/null +++ b/tools/testing/selftests/drivers/net/hw/iou-zcrx.c @@ -0,0 +1,432 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define PAGE_SIZE (4096) +#define AREA_SIZE (8192 * PAGE_SIZE) +#define SEND_SIZE (512 * 4096) +#define min(a, b) \ + ({ \ + typeof(a) _a = (a); \ + typeof(b) _b = (b); \ + _a < _b ? _a : _b; \ + }) +#define min_t(t, a, b) \ + ({ \ + t _ta = (a); \ + t _tb = (b); \ + min(_ta, _tb); \ + }) + +#define ALIGN_UP(v, align) (((v) + (align) - 1) & ~((align) - 1)) + +static int cfg_family = PF_UNSPEC; +static int cfg_server; +static int cfg_client; +static int cfg_port = 8000; +static int cfg_payload_len; +static const char *cfg_ifname; +static int cfg_queue_id = -1; + +static socklen_t cfg_alen; +static struct sockaddr_storage cfg_addr; + +static char payload[SEND_SIZE] __attribute__((aligned(PAGE_SIZE))); +static void *area_ptr; +static void *ring_ptr; +static size_t ring_size; +static struct io_uring_zcrx_rq rq_ring; +static unsigned long area_token; +static int connfd; +static bool stop; +static size_t received; + +static unsigned long gettimeofday_ms(void) +{ + struct timeval tv; + + gettimeofday(&tv, NULL); + return (tv.tv_sec * 1000) + (tv.tv_usec / 1000); +} + +static inline size_t get_refill_ring_size(unsigned int rq_entries) +{ + size_t size; + + ring_size = rq_entries * sizeof(struct io_uring_zcrx_rqe); + /* add space for the header (head/tail/etc.) 
*/ + ring_size += PAGE_SIZE; + return ALIGN_UP(ring_size, 4096); +} + +static void setup_zcrx(struct io_uring *ring) +{ + unsigned int ifindex; + unsigned int rq_entries = 4096; + int ret; + + ifindex = if_nametoindex(cfg_ifname); + if (!ifindex) + error(1, 0, "bad interface name: %s", cfg_ifname); + + area_ptr = mmap(NULL, + AREA_SIZE, + PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE, + 0, + 0); + if (area_ptr == MAP_FAILED) + error(1, 0, "mmap(): zero copy area"); + + ring_size = get_refill_ring_size(rq_entries); + ring_ptr = mmap(NULL, + ring_size, + PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE, + 0, + 0); + + struct io_uring_region_desc region_reg = { + .size = ring_size, + .user_addr = (__u64)(unsigned long)ring_ptr, + .flags = IORING_MEM_REGION_TYPE_USER, + }; + + struct io_uring_zcrx_area_reg area_reg = { + .addr = (__u64)(unsigned long)area_ptr, + .len = AREA_SIZE, + .flags = 0, + }; + + struct io_uring_zcrx_ifq_reg reg = { + .if_idx = ifindex, + .if_rxq = cfg_queue_id, + .rq_entries = rq_entries, + .area_ptr = (__u64)(unsigned long)&area_reg, + .region_ptr = (__u64)(unsigned long)&region_reg, + }; + + ret = io_uring_register_ifq(ring, &reg); + if (ret) + error(1, 0, "io_uring_register_ifq(): %d", ret); + + rq_ring.khead = (unsigned int *)((char *)ring_ptr + reg.offsets.head); + rq_ring.ktail = (unsigned int *)((char *)ring_ptr + reg.offsets.tail); + rq_ring.rqes = (struct io_uring_zcrx_rqe *)((char *)ring_ptr + reg.offsets.rqes); + rq_ring.rq_tail = 0; + rq_ring.ring_entries = reg.rq_entries; + + area_token = area_reg.rq_area_token; +} + +static void add_accept(struct io_uring *ring, int sockfd) +{ + struct io_uring_sqe *sqe; + + sqe = io_uring_get_sqe(ring); + + io_uring_prep_accept(sqe, sockfd, NULL, NULL, 0); + sqe->user_data = 1; +} + +static void add_recvzc(struct io_uring *ring, int sockfd) +{ + struct io_uring_sqe *sqe; + + sqe = io_uring_get_sqe(ring); + + io_uring_prep_rw(IORING_OP_RECV_ZC, sqe, sockfd, NULL, 0, 0); + sqe->ioprio |= IORING_RECV_MULTISHOT; + sqe->user_data = 2; +} + +static void process_accept(struct io_uring *ring, struct io_uring_cqe *cqe) +{ + if (cqe->res < 0) + error(1, 0, "accept()"); + if (connfd) + error(1, 0, "Unexpected second connection"); + + connfd = cqe->res; + add_recvzc(ring, connfd); +} + +static void process_recvzc(struct io_uring *ring, struct io_uring_cqe *cqe) +{ + unsigned rq_mask = rq_ring.ring_entries - 1; + struct io_uring_zcrx_cqe *rcqe; + struct io_uring_zcrx_rqe *rqe; + struct io_uring_sqe *sqe; + uint64_t mask; + char *data; + ssize_t n; + int i; + + if (cqe->res == 0 && cqe->flags == 0) { + stop = true; + return; + } + + if (cqe->res < 0) + error(1, 0, "recvzc(): %d", cqe->res); + + if (!(cqe->flags & IORING_CQE_F_MORE)) + add_recvzc(ring, connfd); + + rcqe = (struct io_uring_zcrx_cqe *)(cqe + 1); + + n = cqe->res; + mask = (1ULL << IORING_ZCRX_AREA_SHIFT) - 1; + data = (char *)area_ptr + (rcqe->off & mask); + + for (i = 0; i < n; i++) { + if (*(data + i) != payload[(received + i)]) + error(1, 0, "payload mismatch"); + } + received += n; + + rqe = &rq_ring.rqes[(rq_ring.rq_tail & rq_mask)]; + rqe->off = (rcqe->off & IORING_ZCRX_AREA_MASK) | area_token; + rqe->len = cqe->res; + io_uring_smp_store_release(rq_ring.ktail, ++rq_ring.rq_tail); +} + +static void server_loop(struct io_uring *ring) +{ + struct io_uring_cqe *cqe; + unsigned int count = 0; + unsigned int head; + int i, ret; + + io_uring_submit_and_wait(ring, 1); + + io_uring_for_each_cqe(ring, head, cqe) { + if (cqe->user_data == 1) + process_accept(ring, cqe);
+ else if (cqe->user_data == 2) + process_recvzc(ring, cqe); + else + error(1, 0, "unknown cqe"); + count++; + } + io_uring_cq_advance(ring, count); +} + +static void run_server(void) +{ + unsigned int flags = 0; + struct io_uring ring; + int fd, enable, ret; + uint64_t tstop; + + fd = socket(cfg_family, SOCK_STREAM, 0); + if (fd == -1) + error(1, 0, "socket()"); + + enable = 1; + ret = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(int)); + if (ret < 0) + error(1, 0, "setsockopt(SO_REUSEADDR)"); + + ret = bind(fd, (const struct sockaddr *)&cfg_addr, sizeof(cfg_addr)); + if (ret < 0) + error(1, 0, "bind()"); + + if (listen(fd, 1024) < 0) + error(1, 0, "listen()"); + + flags |= IORING_SETUP_COOP_TASKRUN; + flags |= IORING_SETUP_SINGLE_ISSUER; + flags |= IORING_SETUP_DEFER_TASKRUN; + flags |= IORING_SETUP_SUBMIT_ALL; + flags |= IORING_SETUP_CQE32; + + io_uring_queue_init(512, &ring, flags); + + setup_zcrx(&ring); + + add_accept(&ring, fd); + + tstop = gettimeofday_ms() + 5000; + while (!stop && gettimeofday_ms() < tstop) + server_loop(&ring); + + if (!stop) + error(1, 0, "test failed\n"); +} + +static void run_client(void) +{ + ssize_t to_send = SEND_SIZE; + ssize_t sent = 0; + ssize_t chunk, res; + int fd; + + fd = socket(cfg_family, SOCK_STREAM, 0); + if (fd == -1) + error(1, 0, "socket()"); + + if (connect(fd, (void *)&cfg_addr, cfg_alen)) + error(1, 0, "connect()"); + + while (to_send) { + void *src = &payload[sent]; + + chunk = min_t(ssize_t, cfg_payload_len, to_send); + res = send(fd, src, chunk, 0); + if (res < 0) + error(1, 0, "send(): %d", sent); + sent += res; + to_send -= res; + } + + close(fd); +} + +static void usage(const char *filepath) +{ + error(1, 0, "Usage: %s (-4|-6) (-s|-c) -h -p " + "-l -i -q", filepath); +} + +static void parse_opts(int argc, char **argv) +{ + const int max_payload_len = sizeof(payload) - + sizeof(struct ipv6hdr) - + sizeof(struct tcphdr) - + 40 /* max tcp options */; + struct sockaddr_in6 *addr6 = (void *) &cfg_addr; + struct sockaddr_in *addr4 = (void *) &cfg_addr; + char *addr = NULL; + int c; + + if (argc <= 1) + usage(argv[0]); + cfg_payload_len = max_payload_len; + + while ((c = getopt(argc, argv, "46sch:p:l:i:q:")) != -1) { + switch (c) { + case '4': + if (cfg_family != PF_UNSPEC) + error(1, 0, "Pass one of -4 or -6"); + cfg_family = PF_INET; + cfg_alen = sizeof(struct sockaddr_in); + break; + case '6': + if (cfg_family != PF_UNSPEC) + error(1, 0, "Pass one of -4 or -6"); + cfg_family = PF_INET6; + cfg_alen = sizeof(struct sockaddr_in6); + break; + case 's': + if (cfg_client) + error(1, 0, "Pass one of -s or -c"); + cfg_server = 1; + break; + case 'c': + if (cfg_server) + error(1, 0, "Pass one of -s or -c"); + cfg_client = 1; + break; + case 'h': + addr = optarg; + break; + case 'p': + cfg_port = strtoul(optarg, NULL, 0); + break; + case 'l': + cfg_payload_len = strtoul(optarg, NULL, 0); + break; + case 'i': + cfg_ifname = optarg; + break; + case 'q': + cfg_queue_id = strtoul(optarg, NULL, 0); + break; + } + } + + if (cfg_server && addr) + error(1, 0, "Receiver cannot have -h specified"); + + switch (cfg_family) { + case PF_INET: + memset(addr4, 0, sizeof(*addr4)); + addr4->sin_family = AF_INET; + addr4->sin_port = htons(cfg_port); + addr4->sin_addr.s_addr = htonl(INADDR_ANY); + + if (addr && + inet_pton(AF_INET, addr, &(addr4->sin_addr)) != 1) + error(1, 0, "ipv4 parse error: %s", addr); + break; + case PF_INET6: + memset(addr6, 0, sizeof(*addr6)); + addr6->sin6_family = AF_INET6; + addr6->sin6_port = htons(cfg_port); + 
addr6->sin6_addr = in6addr_any; + + if (addr && + inet_pton(AF_INET6, addr, &(addr6->sin6_addr)) != 1) + error(1, 0, "ipv6 parse error: %s", addr); + break; + default: + error(1, 0, "illegal domain"); + } + + if (cfg_payload_len > max_payload_len) + error(1, 0, "-l: payload exceeds max (%d)", max_payload_len); +} + +int main(int argc, char **argv) +{ + const char *cfg_test = argv[argc - 1]; + int i; + + parse_opts(argc, argv); + + for (i = 0; i < SEND_SIZE; i++) + payload[i] = 'a' + (i % 26); + + if (cfg_server) + run_server(); + else if (cfg_client) + run_client(); + + return 0; +} diff --git a/tools/testing/selftests/drivers/net/hw/iou-zcrx.py b/tools/testing/selftests/drivers/net/hw/iou-zcrx.py new file mode 100755 index 000000000000..3998d0ad504f --- /dev/null +++ b/tools/testing/selftests/drivers/net/hw/iou-zcrx.py @@ -0,0 +1,64 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +from os import path +from lib.py import ksft_run, ksft_exit, KsftSkipEx +from lib.py import NetDrvEpEnv +from lib.py import bkg, cmd, wait_port_listen + + +def _get_rx_ring_entries(cfg): + eth_cmd = "ethtool -g {} | awk '/RX:/ {{count++}} count == 2 {{print $2; exit}}'" + res = cmd(eth_cmd.format(cfg.ifname), host=cfg.remote) + return int(res.stdout) + + +def _get_combined_channels(cfg): + eth_cmd = "ethtool -l {} | awk '/Combined:/ {{count++}} count == 2 {{print $2; exit}}'" + res = cmd(eth_cmd.format(cfg.ifname), host=cfg.remote) + return int(res.stdout) + + +def _set_flow_rule(cfg, chan): + eth_cmd = "ethtool -N {} flow-type tcp6 dst-port 9999 action {} | awk '{{print $NF}}'" + res = cmd(eth_cmd.format(cfg.ifname, chan), host=cfg.remote) + return int(res.stdout) + + +def test_zcrx(cfg) -> None: + cfg.require_v6() + cfg.require_cmd("awk", remote=True) + + combined_chans = _get_combined_channels(cfg) + if combined_chans < 2: + raise KsftSkipEx('at least 2 combined channels required') + rx_ring = _get_rx_ring_entries(cfg) + + rx_cmd = f"{cfg.bin_remote} -6 -s -p 9999 -i {cfg.ifname} -q {combined_chans - 1}" + tx_cmd = f"{cfg.bin_local} -6 -c -h {cfg.remote_v6} -p 9999 -l 12840" + + try: + cmd(f"ethtool -G {cfg.ifname} rx 64", host=cfg.remote) + cmd(f"ethtool -X {cfg.ifname} equal {combined_chans - 1}", host=cfg.remote) + flow_rule_id = _set_flow_rule(cfg, combined_chans - 1) + + with bkg(rx_cmd, host=cfg.remote, exit_wait=True): + wait_port_listen(9999, proto="tcp", host=cfg.remote) + cmd(tx_cmd) + finally: + cmd(f"ethtool -N {cfg.ifname} delete {flow_rule_id}", host=cfg.remote) + cmd(f"ethtool -X {cfg.ifname} default", host=cfg.remote) + cmd(f"ethtool -G {cfg.ifname} rx {rx_ring}", host=cfg.remote) + + +def main() -> None: + with NetDrvEpEnv(__file__) as cfg: + cfg.bin_local = path.abspath(path.dirname(__file__) + "/../../../drivers/net/hw/iou-zcrx") + cfg.bin_remote = cfg.remote.deploy(cfg.bin_local) + + ksft_run(globs=globals(), case_pfx={"test_"}, args=(cfg, )) + ksft_exit() + + +if __name__ == "__main__": + main()
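
For a quick manual run outside the kselftest harness, the binary can be
driven the same way iou-zcrx.py drives it: one receiver on the host whose
NIC is configured for zero copy and one sender on the peer. Interface,
queue and address values below are placeholders mirroring the script's
defaults:

    # receiver, on the zero copy host (queue 1 of eth0 carved out as in
    # the documentation patch)
    ./iou-zcrx -6 -s -p 9999 -i eth0 -q 1

    # sender, pointed at the receiver's IPv6 address
    ./iou-zcrx -6 -c -h <remote_v6_addr> -p 9999 -l 12840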