From patchwork Wed Jan 15 18:59:24 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Ousterhout X-Patchwork-Id: 13940827 Received: from smtp1.cs.Stanford.EDU (smtp1.cs.stanford.edu [171.64.64.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7C80C1D5AB8 for ; Wed, 15 Jan 2025 19:00:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=171.64.64.25 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736967619; cv=none; b=urRhfpjYU3pYHlfwliObRyzf+nUkky/lEv0JleItlawKvbcu5u5dL0EDt/Z/1Ab77f60F7aPNsKfo9dsc0ddSi4+sOIwwlkCnLPDdSGZUb3A9PM2JSuptwWuxFxO45fKWNwV5zwaNpmtRsJz/IaKX3Gu2mTHTwOWnQ8JYbi03p8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736967619; c=relaxed/simple; bh=EpjgUbWZasQo0ueRLGg6KfUxrjJcjihbrnyhcJc8CiM=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=KjvSPQTEbzaycUUo7fkR1BHc5YYDF3d9WkPMNneWsc/MlR/ZhCWc3mXZNMoMIVE98IV1eg+Wtvx26WKxY61bKgnokX1hmAr8V65SMbY9B90unhUUn/ikWsLkCrmmZMXQxkFDsoGydaVnzkngTDw7yECCIF++Bek3idczoHujDsw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cs.stanford.edu; spf=pass smtp.mailfrom=cs.stanford.edu; dkim=pass (2048-bit key) header.d=cs.stanford.edu header.i=@cs.stanford.edu header.b=aKd9X3Yb; arc=none smtp.client-ip=171.64.64.25 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cs.stanford.edu Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cs.stanford.edu Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cs.stanford.edu header.i=@cs.stanford.edu header.b="aKd9X3Yb" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=cs.stanford.edu; s=cs2308; h=Content-Transfer-Encoding:MIME-Version: Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-Type:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=tG1lcSBoGRdPkDm89qBLAXp4dc2n/d09TUMM39tRJXM=; t=1736967617; x=1737831617; b=aKd9X3YbiGIt/HebXAKj+wQUJis74l/uualPrbaXARmnhWjLLPFgXk7iZQ525AjrkqZkalETXkc EY2ZiGAJa7r2Pi4RyNSmq4uVZOZVhgUhh/lHMEWAdvuEykBLcQkOao+vJzwM1fm/oLMMZvGzcP+Ab zEIFoEVe3smZyya8gSHPfzeYaJhVgGecGYrhYwF329fRlqCnd82/GH0Hkw84mW/GMPgRwTaNm4Kgh j7m9MYV2tP3XHFAMVeB7dZu+tXpxlejIORRtl6xaRugdqw5sn6GtFDt6UCedA9Mse2v6Q9fhUeNBT Up1UkCm9jg4NMRhtn1u1u30/CDh1lD0s69jQ==; Received: from ouster448.stanford.edu ([172.24.72.71]:52661 helo=localhost.localdomain) by smtp1.cs.Stanford.EDU with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tY8cb-0002BF-OY; Wed, 15 Jan 2025 11:00:10 -0800 From: John Ousterhout To: netdev@vger.kernel.org Cc: pabeni@redhat.com, edumazet@google.com, horms@kernel.org, kuba@kernel.org, John Ousterhout Subject: [PATCH net-next v6 00/12] Begin upstreaming Homa transport protocol Date: Wed, 15 Jan 2025 10:59:24 -0800 Message-ID: <20250115185937.1324-1-ouster@cs.stanford.edu> X-Mailer: git-send-email 2.45.1 Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Spam-Score: -101.0 X-Scan-Signature: 1d850ab87a3e5a2ab763966491000bbf X-Patchwork-Delegate: kuba@kernel.org This patch series begins the process of upstreaming the Homa transport protocol. Homa is an alternative to TCP for use in datacenter environments. It provides 10-100x reductions in tail latency for short messages relative to TCP. Its benefits are greatest for mixed workloads containing both short and long messages running under high network loads. Homa is not API-compatible with TCP: it is connectionless and message- oriented (but still reliable and flow-controlled). Homa's new API not only contributes to its performance gains, but it also eliminates the massive amount of connection state required by TCP for highly connected datacenter workloads. For more details on Homa, please consult the Homa Wiki: https://homa-transport.atlassian.net/wiki/spaces/HOMA/overview The Wiki has pointers to two papers on Homa (one of which describes this implementation) as well as man pages describing the application API and other information. There is also a GitHub repo for Homa: https://github.com/PlatformLab/HomaModule The GitHub repo contains a superset of this patch set, including: * Additional source code that will eventually be upstreamed * Extensive unit tests (which will also be upstreamed eventually) * Application-level library functions (which need to go in glibc?) * Man pages (which need to be upstreamed as well) * Benchmarking and instrumentation code For this patch series, Homa has been stripped down to the bare minimum functionality capable of actually executing remote procedure calls. (about 8000 lines of source code, compared to 15000 in the complete Homa). The remaining code will be upstreamed in smaller batches once this patch series has been accepted. Note: the code in this patch series is functional but its performance is not very interesting (about the same as TCP). The patch series is arranged to introduce the major functional components of Homa. Until the last patch has been applied, the code is inert (it will not be compiled). Note: this implementation of Homa supports both IPv4 and IPv6. v6 changes: - Make hrtimer variable in homa_timer_main static instead of stack-allocated (avoids complaints when in debug mode). - Remove unnecessary cast in homa_dst_refresh. - Replace erroneous uses of GFP_KERNEL with GFP_ATOMIC. - Check for "all ports in use" in homa_sock_init. - Refactor API for homa_rpc_reap to incorporate "reap all" feature, eliminate need for callers to specify exact amount of work to do when in "reap a few" mode. - Fix bug in homa_rpc_reap (wasn't resetting rx_frees for each iteration of outer loop). v5 changes: - Change type of start in struct homa_rcvbuf_args from void* to __u64; also add more __user annotations. - Refactor homa_interest: replace awkward ready_rpc field with two fields: rpc and rpc_ready. Added new functions homa_interest_get_rpc and homa_interest_set_rpc to encapsulate/clarify access to interest->rpc_ready. - Eliminate use of LIST_POISON1 etc. in homa_interests (use list_del_init instead of list_del). - Remove homa_next_skb function, which is obsolete, unused, and incorrect - Eliminate ipv4_to_ipv6 function (use ipv6_addr_set_v4mapped instead) - Eliminate is_mapped_ipv4 function (use ipv6_addr_v4mapped instead) - Use __u64 instead of uint64_t in homa.h - Remove 'extern "C"' from homa.h - Various fixes from patchwork checks (checkpatch.pl, etc.) - A few improvements to comments v4 changes: - Remove sport argument for homa_find_server_rpc (unneeded). Also remove client_port field from struct homa_ack - Refactor ICMP packet handling (v6 was incorrect) - Check for socket shutdown in homa_poll - Fix potential for memory garbling in homa_symbol_for_type - Remove unused ETHERNET_MAX_PAYLOAD declaration - Rename classes in homa_wire.h so they all have "homa_" prefixes - Various fixes from patchwork checks (checkpatch.pl, etc.) - A few improvements to comments v3 changes: - Fix formatting in Kconfig - Set ipv6_pinfo_offset in struct proto - Check return value of inet6_register_protosw - In homa_load cleanup, don't cleanup things that haven't been initialized - Add MODULE_ALIAS_NET_PF_PROTO_TYPE to auto-load module - Check return value from kzalloc call in homa_sock_init - Change SO_HOMA_SET_BUF to SO_HOMA_RCVBUF - Change struct homa_set_buf_args to struct homa_rcvbuf_args - Implement getsockopt for SO_HOMA_RCVBUF - Return ENOPROTOOPT instead of EINVAL where appropriate in setsockopt and getsockopt - Fix crash in homa_pool_check_waiting if pool has no region yet - Check for NULL msg->msg_name in homa_sendmsg - Change addr->in6.sin6_family to addr->sa.sa_family in homa_sendmsg for clarity - For some errors in homa_recvmsg, return directly rather than "goto done" - Return error from recvmsg if offsets of returned read buffers are bogus - Added comments to clarify lock-unlock pairs for RPCs - Renamed homa_try_bucket_lock to homa_try_rpc_lock - Fix issues found by test robot and checkpatch.pl - Ensure first argument to do_div is 64 bits - Remove C++ style comments - Removed some code that will only be relevant in future patches that fill in missing Homa functionality v2 changes: - Remove sockaddr_in_union declaration from public API in homa.h - Remove kernel wrapper functions (homa_send, etc.) from homa.h - Fix many sparse warnings (still more work to do here) and other issues uncovered by test robot - Fix checkpatch.pl issues - Remove residual code related to unit tests - Remove references to tt_record from comments - Make it safe to delete sockets during homa_socktab scans - Use uintptr_t for portability fo 32-bit platforms - Use do_div instead of "/" for portability - Remove homa->busy_usecs and homa->gro_busy_usecs (not needed in this stripped down version of Homa) - Eliminate usage of cpu_khz, use sched_clock instead of get_cycles - Add missing checks of kmalloc return values - Remove "inline" qualifier from functions in .c files - Document that pad fields must be zero - Use more precise type "uint32_t" rather than "int" - Remove unneeded #include of linux/version.h John Ousterhout (12): net: homa: define user-visible API for Homa net: homa: create homa_wire.h net: homa: create shared Homa header files net: homa: create homa_pool.h and homa_pool.c net: homa: create homa_rpc.h and homa_rpc.c net: homa: create homa_peer.h and homa_peer.c net: homa: create homa_sock.h and homa_sock.c net: homa: create homa_incoming.c net: homa: create homa_outgoing.c net: homa: create homa_timer.c net: homa: create homa_plumbing.c and homa_utils.c net: homa: create Makefile and Kconfig MAINTAINERS | 7 + include/uapi/linux/homa.h | 161 ++++++ net/Kconfig | 1 + net/Makefile | 1 + net/homa/Kconfig | 19 + net/homa/Makefile | 14 + net/homa/homa_impl.h | 711 ++++++++++++++++++++++++ net/homa/homa_incoming.c | 1076 +++++++++++++++++++++++++++++++++++++ net/homa/homa_outgoing.c | 855 +++++++++++++++++++++++++++++ net/homa/homa_peer.c | 366 +++++++++++++ net/homa/homa_peer.h | 233 ++++++++ net/homa/homa_plumbing.c | 1004 ++++++++++++++++++++++++++++++++++ net/homa/homa_pool.c | 453 ++++++++++++++++ net/homa/homa_pool.h | 154 ++++++ net/homa/homa_rpc.c | 494 +++++++++++++++++ net/homa/homa_rpc.h | 458 ++++++++++++++++ net/homa/homa_sock.c | 388 +++++++++++++ net/homa/homa_sock.h | 410 ++++++++++++++ net/homa/homa_stub.h | 81 +++ net/homa/homa_timer.c | 157 ++++++ net/homa/homa_utils.c | 166 ++++++ net/homa/homa_wire.h | 367 +++++++++++++ 22 files changed, 7576 insertions(+) create mode 100644 include/uapi/linux/homa.h create mode 100644 net/homa/Kconfig create mode 100644 net/homa/Makefile create mode 100644 net/homa/homa_impl.h create mode 100644 net/homa/homa_incoming.c create mode 100644 net/homa/homa_outgoing.c create mode 100644 net/homa/homa_peer.c create mode 100644 net/homa/homa_peer.h create mode 100644 net/homa/homa_plumbing.c create mode 100644 net/homa/homa_pool.c create mode 100644 net/homa/homa_pool.h create mode 100644 net/homa/homa_rpc.c create mode 100644 net/homa/homa_rpc.h create mode 100644 net/homa/homa_sock.c create mode 100644 net/homa/homa_sock.h create mode 100644 net/homa/homa_stub.h create mode 100644 net/homa/homa_timer.c create mode 100644 net/homa/homa_utils.c create mode 100644 net/homa/homa_wire.h --- 2.34.1