[RFC] VFS: lock source directory for link to avoid rename race.

Message ID	166330881189.15759.13499931397891560275@noble.neil.brown.name (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 From: "NeilBrown" <neilb@suse.de> To: "Al Viro" <viro@zeniv.linux.org.uk> Cc: "Miklos Szeredi" <mszeredi@redhat.com>, "Xavier Roche" <xavier.roche@algolia.com>, linux-fsdevel@vger.kernel.org Subject: [PATCH RFC] VFS: lock source directory for link to avoid rename race. In-reply-to: <YyEcqxthoso9SGI2@ZenIV> References: <20220221082002.508392-1-mszeredi@redhat.com>, <166304411168.30452.12018495245762529070@noble.neil.brown.name>, <YyATCgxi9Ovi8mYv@ZenIV>, <166311315747.20483.5039023553379547679@noble.neil.brown.name>, <YyEcqxthoso9SGI2@ZenIV> Date: Fri, 16 Sep 2022 16:13:31 +1000 Message-id: <166330881189.15759.13499931397891560275@noble.neil.brown.name> Precedence: bulk
Series	[RFC] VFS: lock source directory for link to avoid rename race. \| expand [RFC] VFS: lock source directory for link to avoid rename race.

diff --git a/Documentation/filesystems/directory-locking.rst b/Documentation/filesystems/directory-locking.rst index 504ba940c36c..da6fa5eff81d 100644 --- a/Documentation/filesystems/directory-locking.rst +++ b/Documentation/filesystems/directory-locking.rst @@ -11,7 +11,7 @@ When taking the i_rwsem on multiple non-directory objects, we always acquire the locks in order by increasing address. We'll call that "inode pointer" order in the following. -For our purposes all operations fall in 5 classes: +For our purposes all operations fall in 7 classes: 1) read access. Locking rules: caller locks directory we are accessing. The lock is taken shared. @@ -31,9 +31,9 @@ Then call the method. All locks are exclusive. NB: we might get away with locking the source (and target in exchange case) shared. -5) link creation. Locking rules: +5) link creation - source and target in name directory. Locking rules: - * lock parent + * lock parent before looking up base names * check that source is not a directory * lock source * call the method. @@ -58,6 +58,22 @@ rules: All ->i_rwsem are taken exclusive. Again, we might get away with locking the source (and target in exchange case) shared. +7) cross-directory link. This requires the source directory to be locked +so the source cannot be the target of a rename and so be unlinked before +the link happens (creating a link to an unlinked file is illegal). + +Same rules as cross-directory rename can be used (with different errors). +Locking the filesystem is expensive and often unnecessary so we have a +fast path that avoids it. Locking rules: + + * lock target parent + * trylock source parent. If this fails we unlock target parent + and fall back to full rename locking, then unlock filesystem once + directory locks are held. + * lookup base names + +Lock on source may be shared. + The rules above obviously guarantee that all directories that are going to be read, modified or removed by method will be locked by caller. @@ -101,6 +117,9 @@ non-directory objects are not included in the set of contended locks. Thus link creation can't be a part of deadlock - it can't be blocked on source and it means that it doesn't hold any locks. +The fast-path in link create cannot deadlock as it never blocks while +holding a lock. + Any contended object is either held by cross-directory rename or has a child that is also contended. Indeed, suppose that it is held by operation other than cross-directory rename. Then the lock this operation diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 4bb2627026ec..3190bb18f1c2 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -92,7 +92,7 @@ ops i_rwsem(inode) ============= ============================================= lookup: shared create: exclusive -link: exclusive (both) +link: exclusive (both) possibly shared on source dir mknod: exclusive symlink: exclusive mkdir: exclusive @@ -117,7 +117,11 @@ fileattr_set: exclusive Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem exclusive on victim. + ->rename() has ->i_rwsem on target if it exists, and also on + source if it is a non-directory. cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem. + ->link() may have shared ->i_rwsem on source directory if it is + different from target directory. See Documentation/filesystems/directory-locking.rst for more detailed discussion of the locking scheme for directory operations. diff --git a/fs/namei.c b/fs/namei.c index 53b4bc094db2..877cac4e2e63 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -4518,6 +4518,39 @@ int vfs_link(struct dentry *old_dentry, struct user_namespace *mnt_userns, } EXPORT_SYMBOL(vfs_link); +static void lock_link(struct dentry *dest, struct dentry *source, int flags) +{ + inode_lock_nested(dest->d_inode, I_MUTEX_PARENT); + if (dest == source || (flags & AT_EMPTY_PATH)) + return; + if (inode_trylock_shared(source->d_inode)) + return; + + /* Need rename mutex */ + inode_unlock(dest->d_inode); + + mutex_lock(&dest->d_sb->s_vfs_rename_mutex); + + if (d_ancestor(dest, source)) { + inode_lock_nested(dest->d_inode, I_MUTEX_PARENT); + inode_lock_shared_nested(source->d_inode, I_MUTEX_CHILD); + } else if (d_ancestor(source, dest)) { + inode_lock_shared_nested(source->d_inode, I_MUTEX_PARENT); + inode_lock_nested(dest->d_inode, I_MUTEX_CHILD); + } else { + inode_lock_nested(dest->d_inode, I_MUTEX_PARENT); + inode_lock_shared_nested(source->d_inode, I_MUTEX_PARENT2); + } + mutex_unlock(&dest->d_sb->s_vfs_rename_mutex); +} + +static void unlock_link(struct dentry *dest, struct dentry *source, int flags) +{ + if (source != dest && !(flags & AT_EMPTY_PATH)) + inode_unlock_shared(source->d_inode); + inode_unlock(dest->d_inode); +} + /* * Hardlinks are often used in delicate situations. We avoid * security-related surprises by not following symlinks on the @@ -4531,11 +4564,14 @@ int do_linkat(int olddfd, struct filename *old, int newdfd, struct filename *new, int flags) { struct user_namespace *mnt_userns; - struct dentry *new_dentry; - struct path old_path, new_path; + struct dentry *old_dentry, *new_dentry; + struct path old_path, new_path, link_path; + struct qstr old_last, new_last; + int old_type, new_type; struct inode *delegated_inode = NULL; int how = 0; int error; + int err2; if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0) { error = -EINVAL; @@ -4554,44 +4590,83 @@ int do_linkat(int olddfd, struct filename *old, int newdfd, if (flags & AT_SYMLINK_FOLLOW) how |= LOOKUP_FOLLOW; retry: - error = filename_lookup(olddfd, old, how, &old_path, NULL); + err2 = 0; + error = filename_parentat(olddfd, old, how, &old_path, + &old_last, &old_type); if (error) goto out_putnames; + error = -EISDIR; + if (old_type != LAST_NORM && !(flags & AT_EMPTY_PATH)) + goto out_putnames; + error = filename_parentat(newdfd, new, (how & LOOKUP_REVAL), &new_path, + &new_last, &new_type); + if (error) + goto out_putoldpath; - new_dentry = filename_create(newdfd, new, &new_path, - (how & LOOKUP_REVAL)); - error = PTR_ERR(new_dentry); - if (IS_ERR(new_dentry)) - goto out_putpath; + err2 = mnt_want_write(new_path.mnt); error = -EXDEV; if (old_path.mnt != new_path.mnt) - goto out_dput; + goto out_putnewpath; + lock_link(new_path.dentry, old_path.dentry, flags); + + new_dentry = __lookup_hash(&new_last, new_path.dentry, how & LOOKUP_REVAL); + error = PTR_ERR(new_dentry); + if (IS_ERR(new_dentry)) + goto out_unlock; + error = -EEXIST; + if (d_is_positive(new_dentry)) + goto out_dput_new; + if (new_type != LAST_NORM) + goto out_dput_new; + + error = err2; + if (error) + goto out_dput_new; + + if (flags & AT_EMPTY_PATH) + old_dentry = dget(old_path.dentry); + else + old_dentry = __lookup_hash(&old_last, old_path.dentry, how); + error = PTR_ERR(old_dentry); + if (IS_ERR(old_dentry)) + goto out_dput_new; + error = -ENOENT; + if (d_is_negative(old_dentry)) + goto out_dput_old; + mnt_userns = mnt_user_ns(new_path.mnt); - error = may_linkat(mnt_userns, &old_path); + link_path.mnt = old_path.mnt; + link_path.dentry = old_dentry; + error = may_linkat(mnt_userns, &link_path); if (unlikely(error)) - goto out_dput; - error = security_path_link(old_path.dentry, &new_path, new_dentry); + goto out_dput_old; + error = security_path_link(old_dentry, &new_path, new_dentry); if (error) - goto out_dput; - error = vfs_link(old_path.dentry, mnt_userns, new_path.dentry->d_inode, + goto out_dput_old; + error = vfs_link(old_dentry, mnt_userns, new_path.dentry->d_inode, new_dentry, &delegated_inode); -out_dput: - done_path_create(&new_path, new_dentry); +out_dput_old: + dput(old_dentry); +out_dput_new: + dput(new_dentry); +out_unlock: + unlock_link(new_path.dentry, old_path.dentry, flags); +out_putnewpath: + if (!err2) + mnt_drop_write(new_path.mnt); + path_put(&new_path); +out_putoldpath: + path_put(&old_path); if (delegated_inode) { error = break_deleg_wait(&delegated_inode); - if (!error) { - path_put(&old_path); + if (!error) goto retry; - } } if (retry_estale(error, how)) { - path_put(&old_path); how |= LOOKUP_REVAL; goto retry; } -out_putpath: - path_put(&old_path); out_putnames: putname(old); putname(new);

[RFC] VFS: lock source directory for link to avoid rename race.

Commit Message

Comments

Patch