fs: consistently deref the files table with rcu_access_pointer()

Message ID 20250313123159.1315079-1-mjguzik@gmail.com (mailing list archive)
State New
Series fs: consistently deref the files table with rcu_access_pointer()

Commit Message

Mateusz Guzik March 13, 2025, 12:31 p.m. UTC
... except when the table is known to be only used by one thread.

A file pointer can get installed at any moment despite the ->file_lock
being held since the following:
8a81252b774b53e6 ("fs/file.c: don't acquire files->file_lock in fd_install()")

Accesses subject to such a race can in principle suffer load tearing.
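
To illustrate the load-tearing concern, here is a minimal userspace sketch, not kernel code. READ_ONCE_SKETCH, struct file_sketch and grab_slot() are hypothetical names invented for this example. The macro relies on the GCC/Clang __typeof__ extension and mimics the volatile load that the kernel's READ_ONCE() performs; the rcu_access_pointer()/rcu_dereference() family is built on such a load. The point is that the slot is read exactly once and every later decision uses that snapshot.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for the kernel's READ_ONCE(): the
 * volatile-qualified access tells the compiler not to split, repeat or
 * elide the load. Assumes a naturally aligned, pointer-sized object.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct file; only the refcount matters here. */
struct file_sketch {
	int refcount;
};

/*
 * Take one snapshot of a slot that another thread may populate
 * concurrently, then act only on that snapshot.
 */
static struct file_sketch *grab_slot(struct file_sketch **slot)
{
	struct file_sketch *f = READ_ONCE_SKETCH(*slot);

	if (f)
		f->refcount++;	/* stands in for get_file() */
	return f;
}
```

With a plain load the compiler is in principle free to read the slot piecewise or more than once, which is the hazard the commit message refers to.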

While here redo the comment in dup_fd() as it only covered a race against
files showing up, still assuming fd_install() takes the lock.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---

I confirmed the possibility of the problem with this:
https://lwn.net/Articles/793253/#Load%20Tearing

Granted, the article being 6 years old might mean some magic was added
by now to prevent this particular problem.

While technically this classifies as a bugfix, given that nothing blew
up and this is more of a "just in case" change, I don't think this
warrants any backports. Thus I'm not adding a Fixes: tag to prevent this
from being picked by autosel.

 fs/file.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

Comments

Mateusz Guzik March 13, 2025, 1:42 p.m. UTC | #1
On Thu, Mar 13, 2025 at 1:32 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> ... except when the table is known to be only used by one thread.
>
> A file pointer can get installed at any moment despite the ->file_lock
> being held since the following:
> 8a81252b774b53e6 ("fs/file.c: don't acquire files->file_lock in fd_install()")
>
> Accesses subject to such a race can in principle suffer load tearing.
>
> While here redo the comment in dup_fd() as it only covered a race against
> files showing up, still assuming fd_install() takes the lock.
>
> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
> ---
>
> I confirmed the possibility of the problem with this:
> https://lwn.net/Articles/793253/#Load%20Tearing
>
> Granted, the article being 6 years old might mean some magic was added
> by now to prevent this particular problem.
>
> While technically this classifies as a bugfix, given that nothing blew
> up and this is more of a "just in case" change, I don't think this
> warrants any backports. Thus I'm not adding a Fixes: tag to prevent this
> from being picked by autosel.
>
>  fs/file.c | 26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/fs/file.c b/fs/file.c
> index 6c159ede55f1..52010ecb27b8 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -423,17 +423,25 @@ struct files_struct *dup_fd(struct files_struct *oldf, struct fd_range *punch_ho
>         old_fds = old_fdt->fd;
>         new_fds = new_fdt->fd;
>
> +       /*
> +        * We may be racing against fd allocation from other threads using this
> +        * files_struct, despite holding ->file_lock.
> +        *
> +        * alloc_fd() might have already claimed a slot, while fd_install()
> +        * did not populate it yet. Note the latter operates locklessly, so
> +        * the file can show up as we are walking the array below.
> +        *
> +        * At the same time we know no files will disappear as all other
> +        * operations take the lock.
> +        *
> +        * Instead of trying to placate userspace racing with itself, we
> +        * ref the file if we see it and mark the fd slot as unused otherwise.
> +        */
>         for (i = open_files; i != 0; i--) {
> -               struct file *f = *old_fds++;
> +               struct file *f = rcu_access_pointer(*old_fds++);

sigh, that happens to work but is technically bogus -- I thought I did
rcu_dereference, but instead had rcu_access_pointer in my fingers from
the assert thing. Thanks to Mathieu for noticing.

That is to say the patch has to s/rcu_access_pointer/rcu_dereference.

However, willy suggested also adding the check. So perhaps this can
instead use the _check variant with lockdep_is_held(&files->file_lock)
as the argument.

I don't have an opinion on this bit -- the accesses are next to the
lock acquire, so perhaps this only serves as an uglifier.

That said, if you want the assert, I'll post a v2. Otherwise please
run the sed :->
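
For reference, the idea behind the _check variant can be sketched in userspace roughly as below. This is only an illustration under stated assumptions: struct files_sketch, deref_check_sketch() and lookup_locked() are hypothetical names, the boolean flag merely models what lockdep_is_held() would report, and the real rcu_dereference_check() only evaluates its condition in lockdep-enabled builds and also accepts being inside an RCU read-side critical section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for files_struct; none of these names are kernel API. */
struct files_sketch {
	bool file_lock_held;	/* models lockdep_is_held() on the file lock */
	void *fd[4];		/* models the fd array of the table */
};

/*
 * One volatile load of the slot combined with an assertion of the
 * caller's locking claim, loosely modelled on the shape of
 * rcu_dereference_check(p, cond).
 */
#define deref_check_sketch(p, cond) \
	(assert(cond), *(void *const volatile *)&(p))

/* Look up a slot while documenting and checking that the lock is held. */
static void *lookup_locked(struct files_sketch *files, unsigned int fd)
{
	return deref_check_sketch(files->fd[fd], files->file_lock_held);
}
```

The assertion documents the locking rule at the access site, which is the benefit being weighed against the extra clutter.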

>                 if (f) {
>                         get_file(f);
>                 } else {
> -                       /*
> -                        * The fd may be claimed in the fd bitmap but not yet
> -                        * instantiated in the files array if a sibling thread
> -                        * is partway through open().  So make sure that this
> -                        * fd is available to the new process.
> -                        */
>                         __clear_open_fd(open_files - i, new_fdt);
>                 }
>                 rcu_assign_pointer(*new_fds++, f);
> @@ -684,7 +692,7 @@ struct file *file_close_fd_locked(struct files_struct *files, unsigned fd)
>                 return NULL;
>
>         fd = array_index_nospec(fd, fdt->max_fds);
> -       file = fdt->fd[fd];
> +       file = rcu_access_pointer(fdt->fd[fd]);
>         if (file) {
>                 rcu_assign_pointer(fdt->fd[fd], NULL);
>                 __put_unused_fd(files, fd);
> @@ -1252,7 +1260,7 @@ __releases(&files->file_lock)
>          */
>         fdt = files_fdtable(files);
>         fd = array_index_nospec(fd, fdt->max_fds);
> -       tofree = fdt->fd[fd];
> +       tofree = rcu_access_pointer(fdt->fd[fd]);
>         if (!tofree && fd_is_open(fd, fdt))
>                 goto Ebusy;
>         get_file(file);
> --
> 2.43.0
>

Patch

diff --git a/fs/file.c b/fs/file.c
index 6c159ede55f1..52010ecb27b8 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -423,17 +423,25 @@  struct files_struct *dup_fd(struct files_struct *oldf, struct fd_range *punch_ho
 	old_fds = old_fdt->fd;
 	new_fds = new_fdt->fd;
 
+	/*
+	 * We may be racing against fd allocation from other threads using this
+	 * files_struct, despite holding ->file_lock.
+	 *
+	 * alloc_fd() might have already claimed a slot, while fd_install()
+	 * did not populate it yet. Note the latter operates locklessly, so
+	 * the file can show up as we are walking the array below.
+	 *
+	 * At the same time we know no files will disappear as all other
+	 * operations take the lock.
+	 *
+	 * Instead of trying to placate userspace racing with itself, we
+	 * ref the file if we see it and mark the fd slot as unused otherwise.
+	 */
 	for (i = open_files; i != 0; i--) {
-		struct file *f = *old_fds++;
+		struct file *f = rcu_access_pointer(*old_fds++);
 		if (f) {
 			get_file(f);
 		} else {
-			/*
-			 * The fd may be claimed in the fd bitmap but not yet
-			 * instantiated in the files array if a sibling thread
-			 * is partway through open().  So make sure that this
-			 * fd is available to the new process.
-			 */
 			__clear_open_fd(open_files - i, new_fdt);
 		}
 		rcu_assign_pointer(*new_fds++, f);
@@ -684,7 +692,7 @@  struct file *file_close_fd_locked(struct files_struct *files, unsigned fd)
 		return NULL;
 
 	fd = array_index_nospec(fd, fdt->max_fds);
-	file = fdt->fd[fd];
+	file = rcu_access_pointer(fdt->fd[fd]);
 	if (file) {
 		rcu_assign_pointer(fdt->fd[fd], NULL);
 		__put_unused_fd(files, fd);
@@ -1252,7 +1260,7 @@  __releases(&files->file_lock)
 	 */
 	fdt = files_fdtable(files);
 	fd = array_index_nospec(fd, fdt->max_fds);
-	tofree = fdt->fd[fd];
+	tofree = rcu_access_pointer(fdt->fd[fd]);
 	if (!tofree && fd_is_open(fd, fdt))
 		goto Ebusy;
 	get_file(file);