From patchwork Mon Oct 9 18:27:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Toke Høiland-Jørgensen
X-Patchwork-Id: 13414237
X-Patchwork-Delegate: dsahern@gmail.com
X-Patchwork-State: RFC
From: Toke Høiland-Jørgensen
To: David Ahern, Stephen Hemminger
Cc: netdev@vger.kernel.org, Toke Høiland-Jørgensen, Nicolas Dichtel, Christian Brauner, "Eric W. Biederman", David Laight
Subject: [RFC PATCH iproute2-next 1/5] ip: Mount netns in child process instead of from inside the new namespace
Date: Mon, 9 Oct 2023 20:27:49 +0200
Message-ID: <20231009182753.851551-2-toke@redhat.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20231009182753.851551-1-toke@redhat.com>
References: <20231009182753.851551-1-toke@redhat.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Refactor the netns creation code so that we offload the mounting of the
namespace file to a child process instead of bind mounting from inside
the newly created namespace. This is done in preparation for also
persisting a mount namespace; the mount namespace reference cannot be
bind-mounted from inside that namespace itself, so we need to mount that
from a child process anyway.

The child process approach (as well as some of the helper functions used
for it) is adapted from the code in the unshare(1) utility that is part
of util-linux.

No functional change is intended with this patch.
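
For readers unfamiliar with the pattern, the parent/child handshake described above can be sketched in isolation as follows. This is only an illustrative sketch, not code from the patch: the helper name run_child_after_parent() and its child_status parameter are hypothetical, and the bind mount and unshare() steps are reduced to comments so that the example runs without privileges.

```c
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not in the patch): fork a child that blocks on an
 * eventfd until the parent signals it, then exits with child_status.
 * Returns the child's exit status as seen by the parent, or -1 on error.
 */
static int run_child_after_parent(int child_status)
{
	uint64_t val = 1;
	int status;
	pid_t pid;
	int efd;

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(efd);
		return -1;
	}

	if (pid == 0) {
		/* Child: read() blocks until the eventfd counter is non-zero. */
		if (read(efd, &val, sizeof(val)) != sizeof(val))
			_exit(EXIT_FAILURE);
		/* The patch bind mounts /proc/<pid>/ns/net at this point. */
		_exit(child_status);
	}

	/* Parent: the work the child must wait for (unshare() in the patch)
	 * would go here, before the child is released.
	 */
	if (write(efd, &val, sizeof(val)) != sizeof(val)) {
		/* The child would block forever, so terminate and reap it. */
		kill(pid, SIGKILL);
		waitpid(pid, &status, 0);
		close(efd);
		return -1;
	}
	close(efd);

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Unlike sync_with_child() in the diff below, the sketch checks the return value of write(), since a failed write would otherwise leave the child blocked in read().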
Signed-off-by: Toke Høiland-Jørgensen
---
 ip/ipnetns.c | 184 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 130 insertions(+), 54 deletions(-)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 9d996832aef8..a05d84514326 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -1,4 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#include
 #define _ATFILE_SOURCE
 #include
 #include
@@ -7,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -25,6 +27,9 @@
 #include "namespace.h"
 #include "json_print.h"
 
+/* synchronize parent and child by pipe */
+#define PIPE_SYNC_BYTE	0x06
+
 static int usage(void)
 {
 	fprintf(stderr,
@@ -46,7 +51,6 @@ static int usage(void)
 
 static struct rtnl_handle rtnsh = { .fd = -1 };
 static int have_rtnl_getnsid = -1;
-static int saved_netns = -1;
 static struct link_filter filter;
 
 static int ipnetns_accept_msg(struct rtnl_ctrl_data *ctrl,
@@ -768,31 +772,131 @@ static int create_netns_dir(void)
 	return 0;
 }
 
-/* Obtain a FD for the current namespace, so we can reenter it later */
-static void netns_save(void)
+/**
+ * waitchild() - Wait for a process to exit successfully
+ * @pid: PID of the process to wait for
+ *
+ * Wait for a process to exit successfully and return its exit status.
+ */
+static int waitchild(int pid)
 {
-	if (saved_netns != -1)
-		return;
+	int rc, status;
+
+	do {
+		rc = waitpid(pid, &status, 0);
+		if (rc < 0) {
+			if (errno == EINTR)
+				continue;
+			return -errno;
+		}
+		if (WIFEXITED(status) &&
+		    WEXITSTATUS(status) != EXIT_SUCCESS)
+			return WEXITSTATUS(status);
+	} while (rc < 0);
 
-	saved_netns = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
-	if (saved_netns == -1) {
-		perror("Cannot open init namespace");
-		exit(1);
+	return 0;
+}
+
+/**
+ * sync_with_child() - Tell our child we're ready and wait for it to exit
+ * @pid: The pid of our child
+ * @fd: A file descriptor created with eventfd()
+ *
+ * This tells a child created with fork_and_wait() that we are ready for it to
+ * continue. Once we have done that, wait for our child to exit.
+ */
+static int sync_with_child(pid_t pid, int fd)
+{
+	uint64_t ch = PIPE_SYNC_BYTE;
+
+	write(fd, &ch, sizeof(ch));
+	close(fd);
+
+	return waitchild(pid);
+}
+
+/**
+ * fork_and_wait() - Fork and wait to be sync'd with
+ * @fd - A file descriptor created with eventfd() which should be passed to
+ *       sync_with_child()
+ *
+ * This creates an eventfd and forks. The parent process returns immediately,
+ * but the child waits for a %PIPE_SYNC_BYTE on the eventfd before returning.
+ * This allows the parent to perform some tasks before the child starts its
+ * work. The parent should call sync_with_child() once it is ready for the
+ * child to continue.
+ *
+ * Return: The pid from fork()
+ */
+static pid_t fork_and_wait(int *fd)
+{
+	uint64_t ch;
+	pid_t pid;
+
+	*fd = eventfd(0, 0);
+	if (*fd < 0) {
+		fprintf(stderr, "eventfd failed: %s\n", strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+
+	pid = fork();
+	if (pid < 0) {
+		fprintf(stderr, "fork failed: %s\n", strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+
+	if (!pid) {
+		/* wait for the our parent to tell us to continue */
+		if (read(*fd, (char *)&ch, sizeof(ch)) != sizeof(ch) ||
+		    ch != PIPE_SYNC_BYTE) {
+			fprintf(stderr, "failed to read eventfd\n");
+			exit(EXIT_FAILURE);
+		}
+		close(*fd);
 	}
+
+	return pid;
 }
 
-static void netns_restore(void)
+static int bind_ns_file(const char *parent, const char *nsfile,
+			const char *ns_name, pid_t target_pid)
 {
-	if (saved_netns == -1)
-		return;
+	char ns_path[PATH_MAX], proc_path[PATH_MAX];
+	int fd;
 
-	if (setns(saved_netns, CLONE_NEWNET)) {
-		perror("setns");
-		exit(1);
+	snprintf(ns_path, sizeof(ns_path), "%s/%s", parent, ns_name);
+	snprintf(proc_path, sizeof(proc_path), "/proc/%d/ns/%s", target_pid, nsfile);
+
+	/* Create the filesystem state */
+	fd = open(ns_path, O_RDONLY|O_CREAT|O_EXCL, 0);
+	if (fd < 0) {
+		fprintf(stderr, "Cannot create namespace file \"%s\": %s\n",
+			ns_path, strerror(errno));
+		return -1;
 	}
+	close(fd);
 
-	close(saved_netns);
-	saved_netns = -1;
+	if (mount(proc_path, ns_path, "none", MS_BIND, NULL) < 0) {
+		fprintf(stderr, "Bind %s -> %s failed: %s\n", proc_path,
+			ns_path, strerror(errno));
+		unlink(ns_path);
+		return -1;
+	}
+	return 0;
+}
+
+static pid_t bind_ns_files_from_child(const char *ns_name, pid_t target_pid,
+				      int *fd)
+{
+	pid_t child;
+
+	child = fork_and_wait(fd);
+	if (child)
+		return child;
+
+	if (bind_ns_file(NETNS_RUN_DIR, "net", ns_name, target_pid))
+		exit(EXIT_FAILURE);
+	exit(EXIT_SUCCESS);
 }
 
 static int netns_add(int argc, char **argv, bool create)
@@ -808,10 +912,9 @@ static int netns_add(int argc, char **argv, bool create)
 	 * userspace tweaks like remounting /sys, or bind mounting
 	 * a new /etc/resolv.conf can be shared between users.
 	 */
-	char netns_path[PATH_MAX], proc_path[PATH_MAX];
 	const char *name;
-	pid_t pid;
-	int fd;
+	pid_t pid, child;
+	int event_fd;
 	int lock;
 	int made_netns_run_dir_mount = 0;
 
@@ -820,6 +923,7 @@ static int netns_add(int argc, char **argv, bool create)
 			fprintf(stderr, "No netns name specified\n");
 			return -1;
 		}
+		pid = getpid();
 	} else {
 		if (argc < 2) {
 			fprintf(stderr, "No netns name and PID specified\n");
@@ -833,8 +937,6 @@ static int netns_add(int argc, char **argv, bool create)
 	}
 	name = argv[0];
 
-	snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, name);
-
 	if (create_netns_dir())
 		return -1;
 
@@ -894,46 +996,20 @@ static int netns_add(int argc, char **argv, bool create)
 		close(lock);
 	}
 
-	/* Create the filesystem state */
-	fd = open(netns_path, O_RDONLY|O_CREAT|O_EXCL, 0);
-	if (fd < 0) {
-		fprintf(stderr, "Cannot create namespace file \"%s\": %s\n",
-			netns_path, strerror(errno));
-		return -1;
-	}
-	close(fd);
+	child = bind_ns_files_from_child(name, pid, &event_fd);
+	if (child < 0)
+		exit(EXIT_FAILURE);
 
 	if (create) {
-		netns_save();
 		if (unshare(CLONE_NEWNET) < 0) {
 			fprintf(stderr, "Failed to create a new network namespace \"%s\": %s\n",
 				name, strerror(errno));
-			goto out_delete;
+			close(event_fd);
+			exit(EXIT_FAILURE);
 		}
-
-		strcpy(proc_path, "/proc/self/ns/net");
-	} else {
-		snprintf(proc_path, sizeof(proc_path), "/proc/%d/ns/net", pid);
-	}
-
-	/* Bind the netns last so I can watch for it */
-	if (mount(proc_path, netns_path, "none", MS_BIND, NULL) < 0) {
-		fprintf(stderr, "Bind %s -> %s failed: %s\n",
-			proc_path, netns_path, strerror(errno));
-		goto out_delete;
 	}
-	netns_restore();
 
-	return 0;
-out_delete:
-	if (create) {
-		netns_restore();
-		netns_delete(argc, argv);
-	} else if (unlink(netns_path) < 0) {
-		fprintf(stderr, "Cannot remove namespace file \"%s\": %s\n",
-			netns_path, strerror(errno));
-	}
-	return -1;
+	return sync_with_child(child, event_fd);
 }
 
 int set_netnsid_from_name(const char *name, int nsid)

From patchwork Mon Oct 9 18:27:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Toke Høiland-Jørgensen
X-Patchwork-Id: 13414236
X-Patchwork-Delegate: dsahern@gmail.com
X-Patchwork-State: RFC
From: Toke Høiland-Jørgensen
To: David Ahern, Stephen Hemminger
Cc: netdev@vger.kernel.org, Toke Høiland-Jørgensen, Nicolas Dichtel, Christian Brauner, "Eric W. Biederman", David Laight
Subject: [RFC PATCH iproute2-next 2/5] ip: Split out code creating namespace mount dir so it can be reused
Date: Mon, 9 Oct 2023 20:27:50 +0200
Message-ID: <20231009182753.851551-3-toke@redhat.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20231009182753.851551-1-toke@redhat.com>
References: <20231009182753.851551-1-toke@redhat.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Move the code creating the parent directory for namespace references
into its own function, so it can be reused for creating a separate
directory to contain mount namespace references.

No functional change is intended with this patch.
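
As a rough illustration of the reusable "create the directory if it is missing" idiom that this refactoring generalises, a standalone sketch could look like the following. The function name ensure_dir_sketch() is hypothetical and the code is a simplified stand-in, not the patch's ensure_dir(); the mode 0755 is assumed to be equivalent to the S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH flags used in the diff.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical sketch: create path as a directory, treating "already
 * exists" (EEXIST) as success and reporting every other mkdir() error.
 * Returns 0 on success and -1 on failure.
 */
static int ensure_dir_sketch(const char *path)
{
	if (mkdir(path, 0755) && errno != EEXIST) {
		fprintf(stderr, "mkdir %s failed: %s\n", path, strerror(errno));
		return -1;
	}
	return 0;
}
```

Note that mkdir() also reports EEXIST when the path exists but is not a directory, so a caller that needs a directory still has to rely on a later open() with O_DIRECTORY (as prepare_ns_mount_dir() does in the diff) to catch that case.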
Signed-off-by: Toke Høiland-Jørgensen
---
 ip/ipnetns.c | 127 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 69 insertions(+), 58 deletions(-)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index a05d84514326..529790482683 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -758,13 +758,13 @@ static int netns_delete(int argc, char **argv)
 	return on_netns_del(argv[0], NULL);
 }
 
-static int create_netns_dir(void)
+static int ensure_dir(const char *path)
 {
 	/* Create the base netns directory if it doesn't exist */
-	if (mkdir(NETNS_RUN_DIR, S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH)) {
+	if (mkdir(path, S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH)) {
 		if (errno != EEXIST) {
 			fprintf(stderr, "mkdir %s failed: %s\n",
-				NETNS_RUN_DIR, strerror(errno));
+				path, strerror(errno));
 			return -1;
 		}
 	}
@@ -899,53 +899,15 @@ static pid_t bind_ns_files_from_child(const char *ns_name, pid_t target_pid,
 	exit(EXIT_SUCCESS);
 }
 
-static int netns_add(int argc, char **argv, bool create)
+static int prepare_ns_mount_dir(const char *target_dir, int mount_flag)
 {
-	/* This function creates a new network namespace and
-	 * a new mount namespace and bind them into a well known
-	 * location in the filesystem based on the name provided.
-	 *
-	 * If create is true, a new namespace will be created,
-	 * otherwise an existing one will be attached to the file.
-	 *
-	 * The mount namespace is created so that any necessary
-	 * userspace tweaks like remounting /sys, or bind mounting
-	 * a new /etc/resolv.conf can be shared between users.
-	 */
-	const char *name;
-	pid_t pid, child;
-	int event_fd;
+	int made_dir_mount = 0;
 	int lock;
-	int made_netns_run_dir_mount = 0;
 
-	if (create) {
-		if (argc < 1) {
-			fprintf(stderr, "No netns name specified\n");
-			return -1;
-		}
-		pid = getpid();
-	} else {
-		if (argc < 2) {
-			fprintf(stderr, "No netns name and PID specified\n");
-			return -1;
-		}
-
-		if (get_s32(&pid, argv[1], 0) || !pid) {
-			fprintf(stderr, "Invalid PID: %s\n", argv[1]);
-			return -1;
-		}
-	}
-	name = argv[0];
-
-	if (create_netns_dir())
+	if (ensure_dir(target_dir))
 		return -1;
 
-	/* Make it possible for network namespace mounts to propagate between
-	 * mount namespaces. This makes it likely that a unmounting a network
-	 * namespace file in one namespace will unmount the network namespace
-	 * file in all namespaces allowing the network namespace to be freed
-	 * sooner.
-	 * These setup steps need to happen only once, as if multiple ip processes
+	/* These setup steps need to happen only once, as if multiple ip processes
 	 * try to attempt the same operation at the same time, the mountpoints will
 	 * be recursively created multiple times, eventually causing the system
 	 * to lock up. For example, this has been observed when multiple netns
@@ -955,23 +917,23 @@ static int netns_add(int argc, char **argv, bool create)
 	 * this cannot happen, but proceed nonetheless if it cannot happen for any
 	 * reason.
 	 */
-	lock = open(NETNS_RUN_DIR, O_RDONLY|O_DIRECTORY, 0);
+	lock = open(target_dir, O_RDONLY|O_DIRECTORY, 0);
 	if (lock < 0) {
-		fprintf(stderr, "Cannot open netns runtime directory \"%s\": %s\n",
-			NETNS_RUN_DIR, strerror(errno));
+		fprintf(stderr, "Cannot open ns runtime directory \"%s\": %s\n",
+			target_dir, strerror(errno));
 		return -1;
 	}
 	if (flock(lock, LOCK_EX) < 0) {
-		fprintf(stderr, "Warning: could not flock netns runtime directory \"%s\": %s\n",
-			NETNS_RUN_DIR, strerror(errno));
+		fprintf(stderr, "Warning: could not flock ns runtime directory \"%s\": %s\n",
+			target_dir, strerror(errno));
 		close(lock);
 		lock = -1;
 	}
-	while (mount("", NETNS_RUN_DIR, "none", MS_SHARED | MS_REC, NULL)) {
+	while (mount("", target_dir, "none", mount_flag | MS_REC, NULL)) {
 		/* Fail unless we need to make the mount point */
-		if (errno != EINVAL || made_netns_run_dir_mount) {
+		if (errno != EINVAL || made_dir_mount) {
 			fprintf(stderr, "mount --make-shared %s failed: %s\n",
-				NETNS_RUN_DIR, strerror(errno));
+				target_dir, strerror(errno));
 			if (lock != -1) {
 				flock(lock, LOCK_UN);
 				close(lock);
@@ -979,23 +941,72 @@ static int netns_add(int argc, char **argv, bool create)
 			return -1;
 		}
 
-		/* Upgrade NETNS_RUN_DIR to a mount point */
-		if (mount(NETNS_RUN_DIR, NETNS_RUN_DIR, "none", MS_BIND | MS_REC, NULL)) {
+		/* Upgrade target directory to a mount point */
+		if (mount(target_dir, target_dir, "none", MS_BIND | MS_REC, NULL)) {
 			fprintf(stderr, "mount --bind %s %s failed: %s\n",
-				NETNS_RUN_DIR, NETNS_RUN_DIR, strerror(errno));
+				target_dir, target_dir, strerror(errno));
 			if (lock != -1) {
 				flock(lock, LOCK_UN);
 				close(lock);
 			}
 			return -1;
 		}
-		made_netns_run_dir_mount = 1;
+		made_dir_mount = 1;
 	}
 	if (lock != -1) {
 		flock(lock, LOCK_UN);
 		close(lock);
 	}
 
+	return 0;
+}
+
+static int netns_add(int argc, char **argv, bool create)
+{
+	/* This function creates a new network namespace and
+	 * a new mount namespace and bind them into a well known
+	 * location in the filesystem based on the name provided.
+	 *
+	 * If create is true, a new namespace will be created,
+	 * otherwise an existing one will be attached to the file.
+	 *
+	 * The mount namespace is created so that any necessary
+	 * userspace tweaks like remounting /sys, or bind mounting
+	 * a new /etc/resolv.conf can be shared between users.
+	 */
+	const char *name;
+	pid_t pid, child;
+	int event_fd;
+
+	if (create) {
+		if (argc < 1) {
+			fprintf(stderr, "No netns name specified\n");
+			return -1;
+		}
+		pid = getpid();
+	} else {
+		if (argc < 2) {
+			fprintf(stderr, "No netns name and PID specified\n");
+			return -1;
+		}
+
+		if (get_s32(&pid, argv[1], 0) || !pid) {
+			fprintf(stderr, "Invalid PID: %s\n", argv[1]);
+			return -1;
+		}
+	}
+	name = argv[0];
+
+	/* Pass the MS_SHARED flag to the mount of the network namespace
+	 * directory to make it possible for network namespace mounts to
+	 * propagate between mount namespaces. This makes it likely that a
+	 * unmounting a network namespace file in one namespace will unmount the
+	 * network namespace file in all namespaces allowing the network
+	 * namespace to be freed sooner.
+	 */
+	if (prepare_ns_mount_dir(NETNS_RUN_DIR, MS_SHARED))
+		return -1;
+
 	child = bind_ns_files_from_child(name, pid, &event_fd);
 	if (child < 0)
 		exit(EXIT_FAILURE);
@@ -1079,7 +1090,7 @@ static int netns_monitor(int argc, char **argv)
 		return -1;
 	}
 
-	if (create_netns_dir())
+	if (ensure_dir(NETNS_RUN_DIR))
 		return -1;
 
 	if (inotify_add_watch(fd, NETNS_RUN_DIR, IN_CREATE | IN_DELETE) < 0) {

From patchwork Mon Oct 9 18:27:51 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Toke Høiland-Jørgensen
X-Patchwork-Id: 13414239
X-Patchwork-Delegate: dsahern@gmail.com
X-Patchwork-State: RFC
From: Toke Høiland-Jørgensen
To: David Ahern, Stephen Hemminger
Cc: netdev@vger.kernel.org, Toke Høiland-Jørgensen, Nicolas Dichtel, Christian Brauner, "Eric W. Biederman", David Laight
Subject: [RFC PATCH iproute2-next 3/5] lib/namespace: Factor out code for reuse
Date: Mon, 9 Oct 2023 20:27:51 +0200
Message-ID: <20231009182753.851551-4-toke@redhat.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20231009182753.851551-1-toke@redhat.com>
References: <20231009182753.851551-1-toke@redhat.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

Factor out the code that switches namespaces and the code that sets up a
new mount namespace into utility functions that can be reused when we
add mount namespace pinning.

No functional change is intended with this patch.
Signed-off-by: Toke Høiland-Jørgensen
---
 include/namespace.h |  1 +
 lib/namespace.c     | 73 ++++++++++++++++++++++++++++++++-------------
 2 files changed, 54 insertions(+), 20 deletions(-)

diff --git a/include/namespace.h b/include/namespace.h
index e47f9b5d49d1..b694a12e8397 100644
--- a/include/namespace.h
+++ b/include/namespace.h
@@ -49,6 +49,7 @@ static inline int setns(int fd, int nstype)
 }
 #endif /* HAVE_SETNS */
 
+int prepare_mountns(const char *name, bool do_unshare);
 int netns_switch(char *netns);
 int netns_get_fd(const char *netns);
 int netns_foreach(int (*func)(char *nsname, void *arg), void *arg);
diff --git a/lib/namespace.c b/lib/namespace.c
index 1202fa85f97d..5e310762f34b 100644
--- a/lib/namespace.c
+++ b/lib/namespace.c
@@ -11,6 +11,25 @@
 #include "utils.h"
 #include "namespace.h"
 
+static struct namespace_typename {
+	int type;		/* CLONE_NEW* */
+	const char *name;	/* */
+} namespace_names[] = {
+	{ .type = CLONE_NEWNET, .name = "network" },
+	{ .type = CLONE_NEWNS, .name = "mount" },
+	{ .name = NULL }
+};
+
+static const char *ns_typename(int nstype)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(namespace_names); i++)
+		if (namespace_names[i].type == nstype)
+			return namespace_names[i].name;
+	return NULL;
+}
+
 static void bind_etc(const char *name)
 {
 	char etc_netns_path[sizeof(NETNS_ETC_DIR) + NAME_MAX];
@@ -42,30 +61,12 @@ static void bind_etc(const char *name)
 	closedir(dir);
 }
 
-int netns_switch(char *name)
+int prepare_mountns(const char *name, bool do_unshare)
 {
-	char net_path[PATH_MAX];
-	int netns;
 	unsigned long mountflags = 0;
 	struct statvfs fsstat;
 
-	snprintf(net_path, sizeof(net_path), "%s/%s", NETNS_RUN_DIR, name);
-	netns = open(net_path, O_RDONLY | O_CLOEXEC);
-	if (netns < 0) {
-		fprintf(stderr, "Cannot open network namespace \"%s\": %s\n",
-			name, strerror(errno));
-		return -1;
-	}
-
-	if (setns(netns, CLONE_NEWNET) < 0) {
-		fprintf(stderr, "setting the network namespace \"%s\" failed: %s\n",
-			name, strerror(errno));
-		close(netns);
-		return -1;
-	}
-	close(netns);
-
-	if (unshare(CLONE_NEWNS) < 0) {
+	if (do_unshare && unshare(CLONE_NEWNS) < 0) {
 		fprintf(stderr, "unshare failed: %s\n", strerror(errno));
 		return -1;
 	}
@@ -97,6 +98,38 @@ int netns_switch(char *name)
 	return 0;
 }
 
+static int switch_ns(const char *parent_dir, const char *name, int nstype)
+{
+	char ns_path[PATH_MAX];
+	int ns_fd;
+
+	snprintf(ns_path, sizeof(ns_path), "%s/%s", parent_dir, name);
+	ns_fd = open(ns_path, O_RDONLY | O_CLOEXEC);
+	if (ns_fd < 0) {
+		fprintf(stderr, "Cannot open %s namespace \"%s\": %s\n",
+			ns_typename(nstype), name, strerror(errno));
+		return -1;
+	}
+
+	if (setns(ns_fd, nstype) < 0) {
+		fprintf(stderr, "setting the %s namespace \"%s\" failed: %s\n",
+			ns_typename(nstype), name, strerror(errno));
+		close(ns_fd);
+		return -1;
+	}
+	close(ns_fd);
+	return 0;
+}
+
+int netns_switch(char *name)
+{
+
+	if (switch_ns(NETNS_RUN_DIR, name, CLONE_NEWNET))
+		return -1;
+
+	return prepare_mountns(name, true);
+}
+
 int netns_get_fd(const char *name)
 {
 	char pathbuf[PATH_MAX];

From patchwork Mon Oct 9 18:27:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Toke Høiland-Jørgensen
X-Patchwork-Id: 13414238
X-Patchwork-Delegate: dsahern@gmail.com
X-Patchwork-State: RFC
From: Toke Høiland-Jørgensen
To: David Ahern, Stephen Hemminger
Cc: netdev@vger.kernel.org, Toke Høiland-Jørgensen, Nicolas Dichtel, Christian Brauner, "Eric W. Biederman", David Laight
Subject: [RFC PATCH iproute2-next 4/5] ip: Also create and persist mount namespace when creating netns
Date: Mon, 9 Oct 2023 20:27:52 +0200
Message-ID: <20231009182753.851551-5-toke@redhat.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20231009182753.851551-1-toke@redhat.com>
References: <20231009182753.851551-1-toke@redhat.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

When creating a new network namespace, persist not only the network
namespace reference itself, but also create and persist a new mount
namespace that is paired with the network namespace.
This means that multiple subsequent invocations of 'ip netns exec' will
reuse the same mount namespace instead of creating a new namespace on
every entry, as was the behaviour before this patch.

The persistent mount namespace has the benefit that any new mounts
created inside the namespace will persist. Most notably, this is useful
when using bpffs instances along with 'ip netns', as these were
previously transient to a single 'ip netns' invocation.

To preserve backwards compatibility, when changing namespaces we will
fall back to the old behaviour of creating a new mount namespace when
switching netns, if we can't find a persisted namespace to enter. This
can happen if the netns instance was created with a previous version of
iproute2 that doesn't persist the mount namespace.

One caveat of the mount namespace persistence is that we can't make the
containing directory mount shared, the way we do with the netns mounts.
This means that if 'ip netns del' is invoked *inside* a namespace
created with 'ip netns', the mount namespace reference will not be
deleted and will stick around in the original mount namespace where it
was created. This is unavoidable because it is not possible to create a
bind-mounted reference to a mount namespace inside that same mount
namespace (as that would create a circular reference).

In such a situation, we may end up with the network namespace reference
being removed but the mount namespace reference sticking around (the
same thing can happen if 'ip netns del' is executed with an older
version of iproute2). In this situation, a subsequent 'ip netns add'
with the same namespace name will end up reusing the old mount namespace
reference.
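
For illustration only, the enter-or-fall-back behaviour described above
can be sketched as a standalone program fragment. This is not the code
from the patch: EXAMPLE_MNTNS_DIR, example_mntns_path() and
example_enter_mntns() are hypothetical names, the directory is assumed
to match the MNTNS_RUN_DIR default, and the /sys remount and /etc bind
mounts that the real fallback path performs are left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for setns() and unshare() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumption: same value as the MNTNS_RUN_DIR default in the Makefile. */
#define EXAMPLE_MNTNS_DIR "/var/run/netns-mnt"

/* Hypothetical helper: build the path of a persisted mount namespace
 * reference. Returns 0 on success, -1 if the path does not fit in buf.
 */
static int example_mntns_path(char *buf, size_t len, const char *name)
{
	int n;

	assert(buf && name);
	n = snprintf(buf, len, "%s/%s", EXAMPLE_MNTNS_DIR, name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: enter the persisted mount namespace for 'name' if
 * a usable reference exists, otherwise create a fresh mount namespace.
 * Returns 0 if a persisted namespace was entered, 1 if a new one was
 * created, and -1 on error. Requires CAP_SYS_ADMIN.
 */
static int example_enter_mntns(const char *name)
{
	char path[PATH_MAX];
	int fd, ret;

	if (example_mntns_path(path, sizeof(path), name))
		return -1;

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd >= 0) {
		ret = setns(fd, CLONE_NEWNS);
		close(fd);
		if (ret == 0)
			return 0;
	}

	/* No usable reference, e.g. the netns was created by an older
	 * iproute2: fall back to a new mount namespace for this entry.
	 */
	if (unshare(CLONE_NEWNS) < 0) {
		fprintf(stderr, "unshare failed: %s\n", strerror(errno));
		return -1;
	}
	return 1;
}
```

A caller would enter the network namespace first and then call
example_enter_mntns() with the same name; a return value of 1 means any
mounts made afterwards are transient again, as before this patch.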
Signed-off-by: Toke Høiland-Jørgensen
---
 Makefile        |  2 ++
 ip/ipnetns.c    | 64 +++++++++++++++++++++++++++++++++++++++++++------
 lib/namespace.c |  8 ++++++-
 3 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/Makefile b/Makefile
index 5c559c8dc805..aeb1ddc53c6a 100644
--- a/Makefile
+++ b/Makefile
@@ -19,6 +19,7 @@ SBINDIR?=/sbin
 CONF_ETC_DIR?=/etc/iproute2
 CONF_USR_DIR?=$(LIBDIR)/iproute2
 NETNS_RUN_DIR?=/var/run/netns
+MNTNS_RUN_DIR?=/var/run/netns-mnt
 NETNS_ETC_DIR?=/etc/netns
 DATADIR?=$(PREFIX)/share
 HDRDIR?=$(PREFIX)/include/iproute2
@@ -41,6 +42,7 @@ endif
 DEFINES+=-DCONF_USR_DIR=\"$(CONF_USR_DIR)\" \
          -DCONF_ETC_DIR=\"$(CONF_ETC_DIR)\" \
          -DNETNS_RUN_DIR=\"$(NETNS_RUN_DIR)\" \
+         -DMNTNS_RUN_DIR=\"$(MNTNS_RUN_DIR)\" \
          -DNETNS_ETC_DIR=\"$(NETNS_ETC_DIR)\" \
          -DCONF_COLOR=$(CONF_COLOR)
 
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 529790482683..551819577755 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -733,13 +733,24 @@ static int netns_identify(int argc, char **argv)
 
 static int on_netns_del(char *nsname, void *arg)
 {
-	char netns_path[PATH_MAX];
+	char ns_path[PATH_MAX];
+	struct stat st;
+
+	snprintf(ns_path, sizeof(ns_path), "%s/%s", MNTNS_RUN_DIR, nsname);
+	if (!stat(ns_path, &st)) { /* may not exist if created by old iproute2 */
+		umount2(ns_path, MNT_DETACH);
+		if (unlink(ns_path) < 0) {
+			fprintf(stderr, "Cannot remove namespace file \"%s\": %s\n",
+				ns_path, strerror(errno));
+			return -1;
+		}
+	}
 
-	snprintf(netns_path, sizeof(netns_path), "%s/%s", NETNS_RUN_DIR, nsname);
-	umount2(netns_path, MNT_DETACH);
-	if (unlink(netns_path) < 0) {
+	snprintf(ns_path, sizeof(ns_path), "%s/%s", NETNS_RUN_DIR, nsname);
+	umount2(ns_path, MNT_DETACH);
+	if (unlink(ns_path) < 0) {
 		fprintf(stderr, "Cannot remove namespace file \"%s\": %s\n",
-			netns_path, strerror(errno));
+			ns_path, strerror(errno));
 		return -1;
 	}
 	return 0;
@@ -885,17 +896,46 @@ static int bind_ns_file(const char *parent, const char *nsfile,
 	return 0;
 }
 
+static ino_t get_mnt_ino(pid_t pid)
+{
+	char path[PATH_MAX];
+	struct stat st;
+
+	snprintf(path, sizeof(path), "/proc/%u/ns/mnt", (unsigned) pid);
+
+	if (stat(path, &st) != 0) {
+		fprintf(stderr, "stat of %s failed: %s\n",
+			path, strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+	return st.st_ino;
+}
+
 static pid_t bind_ns_files_from_child(const char *ns_name, pid_t target_pid,
 				      int *fd)
 {
+	ino_t mnt_ino;
 	pid_t child;
 
+	mnt_ino = get_mnt_ino(getpid());
+
 	child = fork_and_wait(fd);
 	if (child)
 		return child;
 
 	if (bind_ns_file(NETNS_RUN_DIR, "net", ns_name, target_pid))
 		exit(EXIT_FAILURE);
+
+	/* We can only bind the mount namespace reference if the target pid is
+	 * actually in a different mount namespace than ourselves. We ignore any
+	 * errors in creating the mount namespace reference because an old
+	 * namespace mount may be present if a network namespace with the same
+	 * name was previously removed by an older version of iproute2; in this
+	 * case that old reference will just be reused.
+	 */
+	if (mnt_ino != get_mnt_ino(target_pid))
+		bind_ns_file(MNTNS_RUN_DIR, "mnt", ns_name, target_pid);
+
 	exit(EXIT_SUCCESS);
 }
 
@@ -1003,8 +1043,13 @@ static int netns_add(int argc, char **argv, bool create)
 	 * unmounting a network namespace file in one namespace will unmount the
 	 * network namespace file in all namespaces allowing the network
 	 * namespace to be freed sooner.
+	 *
+	 * The mount namespace directory cannot be shared because it's not
+	 * possible to mount references to a mount namespace inside that
+	 * namespace itself.
 	 */
-	if (prepare_ns_mount_dir(NETNS_RUN_DIR, MS_SHARED))
+	if (prepare_ns_mount_dir(NETNS_RUN_DIR, MS_SHARED) ||
+	    prepare_ns_mount_dir(MNTNS_RUN_DIR, MS_SLAVE))
 		return -1;
 
 	child = bind_ns_files_from_child(name, pid, &event_fd);
@@ -1012,12 +1057,17 @@ static int netns_add(int argc, char **argv, bool create)
 		exit(EXIT_FAILURE);
 
 	if (create) {
-		if (unshare(CLONE_NEWNET) < 0) {
+		if (unshare(CLONE_NEWNET | CLONE_NEWNS) < 0) {
 			fprintf(stderr, "Failed to create a new network namespace \"%s\": %s\n",
 				name, strerror(errno));
 			close(event_fd);
 			exit(EXIT_FAILURE);
 		}
+
+		if (prepare_mountns(name, false)) {
+			close(event_fd);
+			exit(EXIT_FAILURE);
+		}
 	}
 
 	return sync_with_child(child, event_fd);
diff --git a/lib/namespace.c b/lib/namespace.c
index 5e310762f34b..5f2449fb0003 100644
--- a/lib/namespace.c
+++ b/lib/namespace.c
@@ -127,7 +127,13 @@ int netns_switch(char *name)
 	if (switch_ns(NETNS_RUN_DIR, name, CLONE_NEWNET))
 		return -1;
 
-	return prepare_mountns(name, true);
+	/* Try to enter an existing persisted mount namespace. If this fails,
+	 * preserve the old behaviour of creating a new namespace on entry.
+	 */
+	if (switch_ns(MNTNS_RUN_DIR, name, CLONE_NEWNS))
+		return prepare_mountns(name, true);
+
+	return 0;
 }
 
 int netns_get_fd(const char *name)

From patchwork Mon Oct 9 18:27:53 2023
X-Patchwork-Id: 13414240
From: Toke Høiland-Jørgensen
To: David Ahern, Stephen Hemminger
Cc: netdev@vger.kernel.org, Toke Høiland-Jørgensen, Nicolas Dichtel,
 Christian Brauner, "Eric W. Biederman", David Laight
Subject: [RFC PATCH iproute2-next 5/5] lib/namespace: Also mount a bpffs
 instance inside new mount namespaces
Date: Mon, 9 Oct 2023 20:27:53 +0200
Message-ID: <20231009182753.851551-6-toke@redhat.com>
In-Reply-To: <20231009182753.851551-1-toke@redhat.com>
References: <20231009182753.851551-1-toke@redhat.com>

When creating a new mount namespace, we remount /sys inside that
namespace, which means there is no bpffs available unless it is manually
remounted later. To make it easier to work with BPF in combination with
'ip netns', make sure we always mount a bpffs instance to /sys/fs/bpf
after creating a new namespace. Since bpffs may not always be available,
we only warn if the mounting fails, but carry on regardless.

Signed-off-by: Toke Høiland-Jørgensen
---
 lib/namespace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/namespace.c b/lib/namespace.c
index 5f2449fb0003..62456ab24e4f 100644
--- a/lib/namespace.c
+++ b/lib/namespace.c
@@ -93,6 +93,9 @@ int prepare_mountns(const char *name, bool do_unshare)
 		return -1;
 	}
 
+	if (mount("bpf", "/sys/fs/bpf", "bpf", mountflags, NULL) < 0)
+		fprintf(stderr, "could not mount /sys/fs/bpf inside namespace: %s. continuing anyway\n", strerror(errno));
+
 	/* Setup bind mounts for config files in /etc */
 	bind_etc(name);
 	return 0;