From patchwork Tue Jan 28 23:18:57 2020
X-Patchwork-Id: 11355351
From: Omar Sandoval
To: linux-fsdevel@vger.kernel.org, Al Viro
Cc: kernel-team@fb.com
Subject: [RFC PATCH man-pages] link.2: Document new AT_LINK_REPLACE flag
Date: Tue, 28 Jan 2020 15:18:57 -0800
Message-Id: <8480e876e2810afb0485a080ce1cef182f86967f.1580253342.git.osandov@fb.com>

From: Omar Sandoval

Signed-off-by: Omar Sandoval
---
 man2/link.2 | 191 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 191 insertions(+)

diff --git a/man2/link.2 b/man2/link.2
index 649ba00c7..0097e3071 100644
--- a/man2/link.2
+++ b/man2/link.2
@@ -174,6 +174,60 @@ like this:
 linkat(AT_FDCWD, "/proc/self/fd/", newdirfd,
        newname, AT_SYMLINK_FOLLOW);
 .EE
+.TP
+.BR AT_LINK_REPLACE " (since Linux 5.7)"
+If
+.I newpath
+exists, replace it atomically.
+There is no point at which another process attempting to access
+.I newpath
+will find it missing.
+If
+.I newpath
+exists but the operation fails,
+the original entry specified by
+.I newpath
+will remain in place.
+This does not guarantee data integrity;
+see EXAMPLE below for how to use this for crash-safe file replacement with
+.BR O_TMPFILE .
+.IP
+If
+.I newpath
+is replaced,
+any other hard links referring to the original file are unaffected.
+Open file descriptors for
+.I newpath
+are also unaffected.
+.IP
+.I newpath
+must not be a directory.
+.IP
+If the entry specified by
+.I newpath
+refers to the file specified by
+.IR oldpath ,
+.BR linkat ()
+does nothing and returns a success status.
+Note that this comparison does not follow mounts on
+.IR newpath .
+.IP
+Otherwise,
+.I newpath
+must not be a mount point in the local namespace.
+If it is a mount point in another namespace and the operation succeeds,
+all mounts are detached from
+.I newpath
+in all namespaces, as is the case for
+.BR rename (2),
+.BR rmdir (2),
+and
+.BR unlink (2).
+.IP
+If
+.I newpath
+refers to a symbolic link,
+the link will be replaced.
 .in
 .PP
 Before kernel 2.6.18, the
@@ -293,10 +347,34 @@ or
 .I newdirfd
 is not a valid file descriptor.
 .TP
+.B EBUSY
+.B AT_LINK_REPLACE
+was specified in
+.IR flags ,
+.I newpath
+does not refer to the file specified by
+.IR oldpath ,
+and
+.I newpath
+is in use by the system
+(for example, it is a mount point in the local namespace).
+.TP
 .B EINVAL
 An invalid flag value was specified in
 .IR flags .
 .TP
+.B EINVAL
+The filesystem does not support one of the flags in
+.IR flags .
+.TP
+.B EISDIR
+.B AT_LINK_REPLACE
+was specified in
+.I flags
+and
+.I newpath
+refers to an existing directory.
+.TP
 .B ENOENT
 .B AT_EMPTY_PATH
 was specified in
@@ -344,6 +422,31 @@ was specified in
 is an empty string, and
 .IR olddirfd
 refers to a directory.
+.TP
+.B EPERM
+.B AT_LINK_REPLACE
+was specified in
+.I flags
+and
+.I newpath
+refers to an immutable or append-only file
+or a file in an immutable or append-only directory.
+(See
+.BR ioctl_iflags (2).)
+.TP
+.BR EPERM " or " EACCES
+.B AT_LINK_REPLACE
+was specified in
+.IR flags ,
+the directory containing
+.I newpath
+has the sticky bit
+.RB ( S_ISVTX )
+set, and the process's effective UID is neither the UID of the file to
+be replaced nor that of the directory containing it, and
+the process is not privileged (Linux: does not have the
+.B CAP_FOWNER
+capability).
 .SH VERSIONS
 .BR linkat ()
 was added to Linux in kernel 2.6.16;
@@ -421,6 +524,94 @@ performs the link creation and dies before it can say so.
 Use
 .BR stat (2)
 to find out if the link got created.
+.SH EXAMPLE
+The following program demonstrates the use of
+.BR linkat ()
+with
+.B AT_LINK_REPLACE
+and
+.BR open (2)
+with
+.B O_TMPFILE
+for crash-safe file replacement.
+.SS Example output
+.in +4n
+.EX
+$ \fBecho bar > foo\fP
+$ \fB./replace foo\fP
+$ \fBcat foo\fP
+hello, world
+.EE
+.in
+.SS Program source (replace.c)
+.EX
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <libgen.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+int
+main(int argc, char *argv[])
+{
+	char *path, *dirc, *basec, *dir, *base;
+	int fd, dirfd;
+
+	if (argc != 2) {
+		fprintf(stderr, "usage: %s PATH\en", argv[0]);
+		exit(EXIT_FAILURE);
+	}
+
+	path = argv[1];
+
+	dirc = strdup(path);
+	basec = strdup(path);
+	if (!dirc || !basec) {
+		perror("strdup");
+		exit(EXIT_FAILURE);
+	}
+	dir = dirname(dirc);
+	base = basename(basec);
+
+	/* Open the parent directory. */
+	dirfd = open(dir, O_DIRECTORY | O_RDONLY);
+	if (dirfd == -1) {
+		perror("open");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Open a temporary file, write data to it, and persist it. */
+	fd = open(dir, O_TMPFILE | O_RDWR, 0644);
+	if (fd == -1) {
+		perror("open");
+		exit(EXIT_FAILURE);
+	}
+	if (write(fd, "hello, world\en", 13) == -1) {
+		perror("write");
+		exit(EXIT_FAILURE);
+	}
+	if (fsync(fd) == -1) {
+		perror("fsync");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Replace the original file and persist the directory. */
+	if (linkat(fd, "", dirfd, base, AT_EMPTY_PATH | AT_LINK_REPLACE) == -1) {
+		perror("linkat");
+		exit(EXIT_FAILURE);
+	}
+	if (fsync(dirfd) == -1) {
+		perror("fsync");
+		exit(EXIT_FAILURE);
+	}
+
+	exit(EXIT_SUCCESS);
+}
+.EE
 .SH SEE ALSO
 .BR ln (1),
 .BR open (2),

From patchwork Tue Jan 28 23:19:01 2020
X-Patchwork-Id: 11355357
From: Omar Sandoval
To: linux-fsdevel@vger.kernel.org, Al Viro
Cc: kernel-team@fb.com, Xi Wang
Subject: [RFC PATCH v4 2/4] fs: add AT_LINK_REPLACE flag for linkat() which replaces the target
Date: Tue, 28 Jan 2020 15:19:01 -0800
Message-Id: <1f5a197a2fdb0668f6dce8b9a4403481bc957a7c.1580251857.git.osandov@fb.com>

From: Omar Sandoval

One of the most common uses of temporary files is the classic atomic
replacement pattern, i.e.,

- write temporary file
- fsync temporary file
- rename temporary file over real file
- fsync parent directory

Now, we have O_TMPFILE, which gives us a much better way to create
temporary files, but it's not possible to use it for this pattern.

This patch introduces an AT_LINK_REPLACE flag which allows linkat() to
replace the target file. Now, the temporary file in the pattern above
can be a proper O_TMPFILE. Even without O_TMPFILE, this is a new
primitive which might be useful in other contexts.

The implementation on the VFS side mimics sys_renameat2().

Cc: Xi Wang
Signed-off-by: Omar Sandoval
---
 fs/ecryptfs/inode.c        |   2 +-
 fs/namei.c                 | 166 +++++++++++++++++++++++++++++--------
 fs/nfsd/vfs.c              |   2 +-
 fs/overlayfs/overlayfs.h   |   2 +-
 include/linux/fs.h         |   2 +-
 include/uapi/linux/fcntl.h |   1 +
 6 files changed, 135 insertions(+), 40 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index eeb351b220b2..2f36b7a61a2f 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -440,7 +440,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
 	dget(lower_new_dentry);
 	lower_dir_dentry = lock_parent(lower_new_dentry);
 	rc = vfs_link(lower_old_dentry, d_inode(lower_dir_dentry),
-		      lower_new_dentry, NULL);
+		      lower_new_dentry, NULL, 0);
 	if (rc || d_really_is_negative(lower_new_dentry))
 		goto out_lock;
 	rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
diff --git a/fs/namei.c b/fs/namei.c
index 9d690df17aed..78d364e99dca 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4122,6 +4122,7 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
  * @dir: new parent
  * @new_dentry: where to create the new link
  * @delegated_inode: returns inode needing a delegation break
+ * @flags: link flags
  *
  * The caller must hold dir->i_mutex
  *
@@ -4135,16 +4136,25 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
  * be appropriate for callers that expect the underlying filesystem not
  * to be NFS exported.
  */
-int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
+int vfs_link(struct dentry *old_dentry, struct inode *dir,
+	     struct dentry *new_dentry, struct inode **delegated_inode,
+	     int flags)
 {
 	struct inode *inode = old_dentry->d_inode;
+	struct inode *target = new_dentry->d_inode;
 	unsigned max_links = dir->i_sb->s_max_links;
 	int error;
 
 	if (!inode)
 		return -ENOENT;
 
-	error = may_create(dir, new_dentry);
+	if (target) {
+		if (inode == target)
+			return 0;
+		error = may_delete(dir, new_dentry, false);
+	} else {
+		error = may_create(dir, new_dentry);
+	}
 	if (error)
 		return error;
 
@@ -4172,26 +4182,55 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	if (error)
 		return error;
 
-	inode_lock(inode);
+	dget(new_dentry);
+	lock_two_nondirectories(inode, target);
+
+	if (is_local_mountpoint(new_dentry)) {
+		error = -EBUSY;
+		goto out;
+	}
+
 	/* Make sure we don't allow creating hardlink to an unlinked file */
-	if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
+	if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE)) {
 		error = -ENOENT;
-	else if (max_links && inode->i_nlink >= max_links)
+		goto out;
+	}
+	if (max_links && inode->i_nlink >= max_links) {
 		error = -EMLINK;
-	else {
-		error = try_break_deleg(inode, delegated_inode);
-		if (!error)
-			error = dir->i_op->link(old_dentry, dir, new_dentry, 0);
+		goto out;
+	}
+
+	error = try_break_deleg(inode, delegated_inode);
+	if (error)
+		goto out;
+	if (target) {
+		error = try_break_deleg(target, delegated_inode);
+		if (error)
+			goto out;
+	}
+
+	error = dir->i_op->link(old_dentry, dir, new_dentry, flags);
+	if (error)
+		goto out;
+
+	if (target) {
+		dont_mount(new_dentry);
+		detach_mounts(new_dentry);
 	}
 
-	if (!error && (inode->i_state & I_LINKABLE)) {
+	if (inode->i_state & I_LINKABLE) {
 		spin_lock(&inode->i_lock);
 		inode->i_state &= ~I_LINKABLE;
 		spin_unlock(&inode->i_lock);
 	}
-	inode_unlock(inode);
-	if (!error)
+out:
+	unlock_two_nondirectories(inode, target);
+	dput(new_dentry);
+	if (!error) {
+		if (target)
+			fsnotify_link_count(target);
 		fsnotify_link(dir, inode, new_dentry);
+	}
 	return error;
 }
 EXPORT_SYMBOL(vfs_link);
@@ -4210,11 +4249,16 @@ int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 {
 	struct dentry *new_dentry;
 	struct path old_path, new_path;
+	struct qstr new_last;
+	int new_type;
 	struct inode *delegated_inode = NULL;
-	int how = 0;
+	struct filename *to;
+	unsigned int how = 0, target_flags;
+	bool should_retry = false;
 	int error;
 
-	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0)
+	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH |
+		       AT_LINK_REPLACE)) != 0)
 		return -EINVAL;
 	/*
 	 * To use null names we require CAP_DAC_READ_SEARCH
@@ -4229,44 +4273,94 @@ int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 
 	if (flags & AT_SYMLINK_FOLLOW)
 		how |= LOOKUP_FOLLOW;
+
+	if (flags & AT_LINK_REPLACE)
+		target_flags = LOOKUP_RENAME_TARGET;
+	else
+		target_flags = LOOKUP_CREATE | LOOKUP_EXCL;
 retry:
 	error = user_path_at(olddfd, oldname, how, &old_path);
 	if (error)
 		return error;
 
-	new_dentry = user_path_create(newdfd, newname, &new_path,
-					(how & LOOKUP_REVAL));
-	error = PTR_ERR(new_dentry);
-	if (IS_ERR(new_dentry))
-		goto out;
+	to = filename_parentat(newdfd, getname(newname), how & LOOKUP_REVAL,
+			       &new_path, &new_last, &new_type);
+	if (IS_ERR(to)) {
+		error = PTR_ERR(to);
+		goto exit1;
+	}
+
+	if (old_path.mnt != new_path.mnt) {
+		error = -EXDEV;
+		goto exit2;
+	}
+
+	if (new_type != LAST_NORM) {
+		if (flags & AT_LINK_REPLACE)
+			error = -EISDIR;
+		else
+			error = -EEXIST;
+		goto exit2;
+	}
+
+	error = mnt_want_write(old_path.mnt);
+	if (error)
+		goto exit2;
+
+retry_deleg:
+	inode_lock_nested(new_path.dentry->d_inode, I_MUTEX_PARENT);
+
+	new_dentry = __lookup_hash(&new_last, new_path.dentry,
+				   (how & LOOKUP_REVAL) | target_flags);
+	if (IS_ERR(new_dentry)) {
+		error = PTR_ERR(new_dentry);
+		goto exit3;
+	}
+	if (!(flags & AT_LINK_REPLACE) && d_is_positive(new_dentry)) {
+		error = -EEXIST;
+		goto exit4;
+	}
+	if (new_last.name[new_last.len]) {
+		if (d_is_negative(new_dentry)) {
+			error = -ENOENT;
+			goto exit4;
+		}
+		if (!d_is_dir(old_path.dentry)) {
+			error = -ENOTDIR;
+			goto exit4;
+		}
+	}
 
-	error = -EXDEV;
-	if (old_path.mnt != new_path.mnt)
-		goto out_dput;
 	error = may_linkat(&old_path);
 	if (unlikely(error))
-		goto out_dput;
+		goto exit4;
 	error = security_path_link(old_path.dentry, &new_path, new_dentry);
 	if (error)
-		goto out_dput;
-	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry, &delegated_inode);
-out_dput:
-	done_path_create(&new_path, new_dentry);
+		goto exit4;
+	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry,
+			 &delegated_inode, flags & AT_LINK_REPLACE);
+exit4:
+	dput(new_dentry);
+exit3:
+	inode_unlock(new_path.dentry->d_inode);
 	if (delegated_inode) {
 		error = break_deleg_wait(&delegated_inode);
-		if (!error) {
-			path_put(&old_path);
-			goto retry;
-		}
+		if (!error)
+			goto retry_deleg;
 	}
-	if (retry_estale(error, how)) {
-		path_put(&old_path);
+	mnt_drop_write(old_path.mnt);
+exit2:
+	if (retry_estale(error, how))
+		should_retry = true;
+	path_put(&new_path);
+	putname(to);
+exit1:
+	path_put(&old_path);
+	if (should_retry) {
+		should_retry = false;
 		how |= LOOKUP_REVAL;
 		goto retry;
 	}
-out:
-	path_put(&old_path);
-
 	return error;
 }
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c0dc491537a6..3f9291e76b99 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1598,7 +1598,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 	err = nfserr_noent;
 	if (d_really_is_negative(dold))
 		goto out_dput;
-	host_err = vfs_link(dold, dirp, dnew, NULL);
+	host_err = vfs_link(dold, dirp, dnew, NULL, 0);
 	if (!host_err) {
 		err = nfserrno(commit_metadata(ffhp));
 		if (!err)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index f283b1d69a9e..b199fc03c891 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -120,7 +120,7 @@ static inline int ovl_do_unlink(struct inode *dir, struct dentry *dentry)
 static inline int ovl_do_link(struct dentry *old_dentry, struct inode *dir,
 			      struct dentry *new_dentry)
 {
-	int err = vfs_link(old_dentry, dir, new_dentry, NULL);
+	int err = vfs_link(old_dentry, dir, new_dentry, NULL, 0);
 
 	pr_debug("link(%pd2, %pd2) = %i\n", old_dentry, new_dentry, err);
 	return err;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3bdb71c97e8f..93eb90eb1fdb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1712,7 +1712,7 @@ extern int vfs_create(struct inode *, struct dentry *, umode_t, bool);
 extern int vfs_mkdir(struct inode *, struct dentry *, umode_t);
 extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
 extern int vfs_symlink(struct inode *, struct dentry *, const char *);
-extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
+extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **, int);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int);
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 1f97b33c840e..3704793cd5ab 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -99,6 +99,7 @@
 #define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */
 
 #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
+#define AT_LINK_REPLACE 0x10000 /* Replace link() target */
 
 #endif /* _UAPI_LINUX_FCNTL_H */

From patchwork Tue Jan 28 23:19:02 2020
X-Patchwork-Id: 11355359
From: Omar Sandoval
To: linux-fsdevel@vger.kernel.org, Al Viro
Cc: kernel-team@fb.com
Subject: [RFC PATCH v4 3/4] Btrfs: fix inode reference count leak in btrfs_link() error path
Date: Tue, 28 Jan 2020 15:19:02 -0800
Message-Id: <885829e37b0cdf75e26f4605e34110a7b23fe162.1580251857.git.osandov@fb.com>

From: Omar Sandoval

If btrfs_update_inode() or btrfs_orphan_del() fails in btrfs_link(),
then we don't drop the reference we got with ihold(). This results in
the "VFS: Busy inodes after unmount" crash. The reference is needed for
the new dentry, so get it right before we instantiate the dentry.
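The ordering change described above can be illustrated with a small userspace toy model. This is a hedged sketch, not btrfs code: `struct obj`, `struct slot` and `attach_obj()` are hypothetical names, the integer count stands in for the inode reference taken by ihold(), and storing the pointer in the slot stands in for d_instantiate() taking ownership of that reference.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an inode: only the reference count matters. */
struct obj {
	int refs;
};

/* Hypothetical stand-in for a dentry, which owns one reference once set. */
struct slot {
	struct obj *obj;
};

/*
 * Ordering modelled on the fix: every step that can fail runs first, and
 * the extra reference is taken only immediately before the slot takes
 * ownership of it, so an error return has no reference left to drop.
 */
int attach_obj(struct slot *slot, struct obj *obj, bool update_fails)
{
	if (update_fails)
		return -1;	/* stands in for btrfs_update_inode() failing */

	obj->refs++;		/* analogue of ihold() */
	slot->obj = obj;	/* analogue of d_instantiate() */
	return 0;
}
```

Because the reference is taken only after the last fallible step, the failure path leaves the count untouched, which is the property the fix restores.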

Fixes: 79787eaab461 ("btrfs: replace many BUG_ONs with proper error handling")
[Although d_instantiate() was moved further from ihold() before that, in
commit 08c422c27f85 ("Btrfs: call d_instantiate after all ops are setup")]
Signed-off-by: Omar Sandoval
---
 fs/btrfs/inode.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bc7709c4f6eb..8c9a114f48f6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6801,7 +6801,6 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	inc_nlink(inode);
 	inode_inc_iversion(inode);
 	inode->i_ctime = current_time(inode);
-	ihold(inode);
 	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);
 
 	err = btrfs_add_nondir(trans, BTRFS_I(dir), dentry, BTRFS_I(inode),
@@ -6825,6 +6824,7 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 			if (err)
 				goto fail;
 		}
+		ihold(inode);
 		d_instantiate(dentry, inode);
 		ret = btrfs_log_new_name(trans, BTRFS_I(inode), NULL, parent,
 					 true, NULL);
@@ -6837,10 +6837,8 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 fail:
 	if (trans)
 		btrfs_end_transaction(trans);
-	if (drop_inode) {
+	if (drop_inode)
 		inode_dec_link_count(inode);
-		iput(inode);
-	}
 	btrfs_btree_balance_dirty(fs_info);
 	return err;
 }

From patchwork Tue Jan 28 23:19:03 2020
X-Patchwork-Id: 11355365
From: Omar Sandoval
To: linux-fsdevel@vger.kernel.org, Al Viro
Cc: kernel-team@fb.com
Subject: [RFC PATCH v4 4/4] Btrfs: add support for linkat() AT_REPLACE
Date: Tue, 28 Jan 2020 15:19:03 -0800
Message-Id: <55e3795a385177f13cde7041fe7a5e1644994879.1580251857.git.osandov@fb.com>

From: Omar Sandoval

The implementation is fairly straightforward and looks a lot like
btrfs_rename(). The only tricky bit is that instead of playing games
with the dcache, we simply drop the dentry for it to be instantiated on
the next lookup. This can be improved in the future.
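As a loose userspace analogy for dropping the dentry instead of updating it in place (an illustration of the general idea only, not a model of dcache internals), here is a one-entry lookup cache that is simply invalidated when its target is replaced and refilled on the next lookup. `struct dir_state`, `struct name_cache`, `cached_lookup()` and `replace_target()` are all hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical on-disk state: which inode number a single name refers to. */
struct dir_state {
	unsigned long ino;
};

/* Hypothetical one-entry lookup cache, standing in for a dentry. */
struct name_cache {
	bool valid;
	unsigned long ino;
	unsigned int misses;
};

/* Return the inode number for the name, refilling the cache on a miss. */
unsigned long cached_lookup(struct name_cache *c, const struct dir_state *d)
{
	if (!c->valid) {
		c->misses++;
		c->ino = d->ino;
		c->valid = true;
	}
	return c->ino;
}

/*
 * Point the name at a new inode on "disk", then just invalidate the cached
 * entry (the d_drop() analogue) rather than rewriting it in place.
 */
void replace_target(struct name_cache *c, struct dir_state *d,
		    unsigned long new_ino)
{
	d->ino = new_ino;
	c->valid = false;
}
```

The trade-off mirrors the one the description mentions: invalidation is simple and cannot leave a stale entry behind, at the cost of one extra miss on the next lookup.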

Signed-off-by: Omar Sandoval
---
 fs/btrfs/inode.c | 63 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8c9a114f48f6..b489671d1b5d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6762,14 +6762,16 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 		      struct dentry *dentry, int flags)
 {
 	struct btrfs_trans_handle *trans = NULL;
+	unsigned int trans_num_items;
 	struct btrfs_root *root = BTRFS_I(dir)->root;
 	struct inode *inode = d_inode(old_dentry);
+	struct inode *new_inode = d_inode(dentry);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 index;
 	int err;
 	int drop_inode = 0;
 
-	if (flags)
+	if (flags & ~AT_LINK_REPLACE)
 		return -EINVAL;
 
 	/* do not allow sys_link's with other subvols of the same device */
@@ -6779,17 +6781,50 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	if (inode->i_nlink >= BTRFS_LINK_MAX)
 		return -EMLINK;
 
+	/* check for collisions, even if the name isn't there */
+	err = btrfs_check_dir_item_collision(root, dir->i_ino,
+					     dentry->d_name.name,
+					     dentry->d_name.len);
+	if (err) {
+		if (err == -EEXIST) {
+			if (WARN_ON(!new_inode))
+				return err;
+		} else {
+			return err;
+		}
+	}
+
+	/*
+	 * we're using link to replace one file with another. Start IO on it now
+	 * so we don't add too much work to the end of the transaction
+	 */
+	if (new_inode && S_ISREG(inode->i_mode) && new_inode->i_size)
+		filemap_flush(inode->i_mapping);
+
 	err = btrfs_set_inode_index(BTRFS_I(dir), &index);
 	if (err)
 		goto fail;
 
 	/*
+	 * For the source:
 	 * 2 items for inode and inode ref
 	 * 2 items for dir items
 	 * 1 item for parent inode
 	 * 1 item for orphan item deletion if O_TMPFILE
+	 *
+	 * For the target:
+	 * 1 for the possible orphan item
+	 * 1 for the dir item
+	 * 1 for the dir index
+	 * 1 for the inode ref
+	 * 1 for the inode
 	 */
-	trans = btrfs_start_transaction(root, inode->i_nlink ? 5 : 6);
+	trans_num_items = 5;
+	if (!inode->i_nlink)
+		trans_num_items++;
+	if (new_inode)
+		trans_num_items += 5;
+	trans = btrfs_start_transaction(root, trans_num_items);
 	if (IS_ERR(trans)) {
 		err = PTR_ERR(trans);
 		trans = NULL;
@@ -6801,6 +6836,22 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	inc_nlink(inode);
 	inode_inc_iversion(inode);
 	inode->i_ctime = current_time(inode);
+
+	if (new_inode) {
+		inode_inc_iversion(new_inode);
+		new_inode->i_ctime = current_time(new_inode);
+		err = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
+					 BTRFS_I(new_inode),
+					 dentry->d_name.name,
+					 dentry->d_name.len);
+		if (!err && new_inode->i_nlink == 0)
+			err = btrfs_orphan_add(trans, BTRFS_I(new_inode));
+		if (err) {
+			btrfs_abort_transaction(trans, err);
+			goto fail;
+		}
+	}
+
 	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);
 
 	err = btrfs_add_nondir(trans, BTRFS_I(dir), dentry, BTRFS_I(inode),
@@ -6824,8 +6875,12 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 			if (err)
 				goto fail;
 		}
-		ihold(inode);
-		d_instantiate(dentry, inode);
+		if (new_inode) {
+			d_drop(dentry);
+		} else {
+			ihold(inode);
+			d_instantiate(dentry, inode);
+		}
 		ret = btrfs_log_new_name(trans, BTRFS_I(inode), NULL, parent,
 					 true, NULL);
 		if (ret == BTRFS_NEED_TRANS_COMMIT) {
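The transaction reservation in the btrfs_link() hunk above reduces to simple arithmetic. The helper below is a hypothetical restatement of that computation for illustration only (the name link_trans_items() is ours, not a btrfs function); the item breakdown in its comments is taken from the comment block in the patch.

```c
#include <stdbool.h>

/*
 * Number of metadata items the patched btrfs_link() reserves:
 * 5 for linking the source, 1 more to delete the orphan item of an
 * O_TMPFILE source (i_nlink == 0), and 5 more to unlink an existing target.
 */
unsigned int link_trans_items(unsigned int src_nlink, bool replacing)
{
	unsigned int items = 5;	/* inode + inode ref, 2 dir items, parent inode */

	if (src_nlink == 0)
		items++;	/* orphan item deletion (O_TMPFILE source) */
	if (replacing)
		items += 5;	/* target: orphan item, dir item, dir index,
				 * inode ref, inode */
	return items;
}
```

For example, replacing an existing file with a fresh O_TMPFILE (no links yet) reserves 11 items, while a plain link of an already-linked file still reserves the original 5.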