[v2] fsnotify: Avoid data race between fsnotify_recalc_mask() and fsnotify_object_watched()

Message ID: 20240715130410.30475-1-jack@suse.cz
State: New

Commit Message

Jan Kara July 15, 2024, 1:04 p.m. UTC
When __fsnotify_recalc_mask() recomputes the mask on the watched object,
the compiler can "optimize" the code to perform partial updates to the
mask (including zeroing it at the beginning). Thus places checking
the object mask without conn->lock such as fsnotify_object_watched()
could see invalid states of the mask. Make sure the mask update is
performed by one memory store using WRITE_ONCE().

Reported-by: syzbot+701037856c25b143f1ad@syzkaller.appspotmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/all/CACT4Y+Zk0ohwwwHSD63U2-PQ=UuamXczr1mKBD6xtj2dyYKBvA@mail.gmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/fsnotify.c             | 21 ++++++++++++---------
 fs/notify/inotify/inotify_user.c |  2 +-
 fs/notify/mark.c                 |  8 ++++++--
 3 files changed, 19 insertions(+), 12 deletions(-)

I plan to merge this patch through my tree.

Comments

Josef Bacik July 15, 2024, 2:22 p.m. UTC | #1
On Mon, Jul 15, 2024 at 03:04:10PM +0200, Jan Kara wrote:
> When __fsnotify_recalc_mask() recomputes the mask on the watched object,
> the compiler can "optimize" the code to perform partial updates to the
> mask (including zeroing it at the beginning). Thus places checking
> the object mask without conn->lock such as fsnotify_object_watched()
> could see invalid states of the mask. Make sure the mask update is
> performed by one memory store using WRITE_ONCE().
> 
> Reported-by: syzbot+701037856c25b143f1ad@syzkaller.appspotmail.com
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Link: https://lore.kernel.org/all/CACT4Y+Zk0ohwwwHSD63U2-PQ=UuamXczr1mKBD6xtj2dyYKBvA@mail.gmail.com
> Signed-off-by: Jan Kara <jack@suse.cz>

I'm still hazy on the rules here and what KCSAN expects, but if we're using
READ_ONCE/WRITE_ONCE on a thing, do we have to use them everywhere we access
that member?  Because there's a few accesses in include/linux/fsnotify_backend.h
that were missed if so.  Thanks,

Josef
Marco Elver July 15, 2024, 2:48 p.m. UTC | #2
On Mon, Jul 15, 2024 at 10:22AM -0400, Josef Bacik wrote:
> On Mon, Jul 15, 2024 at 03:04:10PM +0200, Jan Kara wrote:
> > When __fsnotify_recalc_mask() recomputes the mask on the watched object,
> > the compiler can "optimize" the code to perform partial updates to the
> > mask (including zeroing it at the beginning). Thus places checking
> > the object mask without conn->lock such as fsnotify_object_watched()
> > could see invalid states of the mask. Make sure the mask update is
> > performed by one memory store using WRITE_ONCE().
> > 
> > Reported-by: syzbot+701037856c25b143f1ad@syzkaller.appspotmail.com
> > Reported-by: Dmitry Vyukov <dvyukov@google.com>
> > Link: https://lore.kernel.org/all/CACT4Y+Zk0ohwwwHSD63U2-PQ=UuamXczr1mKBD6xtj2dyYKBvA@mail.gmail.com
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> I'm still hazy on the rules here and what KCSAN expects, but if we're using
> READ_ONCE/WRITE_ONCE on a thing, do we have to use them everywhere we access
> that member?  Because there's a few accesses in include/linux/fsnotify_backend.h
> that were missed if so.  Thanks,

Only if they can be concurrently read + written. Per Linux's memory
model, plain concurrent accesses, where at least one is a write, are
data races. Data races are hard to reason about, because the compiler is
free to shuffle unmarked accesses around as it pleases, along with other
transformations that break atomicity and any ordering assumptions. (Data
races can also be a symptom of missing locking or other missed
serialization, but that does not seem to be the case here.)

This is a reasonable overview:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt

I don't know the code well enough here, but the above commit message
also mentions a "conn->lock". E.g. if all accesses under the lock are
serialized, they won't need to be marked with ONCE.  But the lock won't
help if e.g. there are accesses within a lock critical section, but also
accesses outside the critical section, in which case both sides need to
be marked again (even the ones under the lock).

A good concurrent stress test + KCSAN is pretty good about quickly
telling you about data races.
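
A minimal userspace sketch of the marking rule described above, under stated
assumptions: the READ_ONCE()/WRITE_ONCE() macros here are simplified
stand-ins for the kernel ones (scalar types only, relying on the GNU
__typeof__ extension), and struct watched_obj with its helpers is
hypothetical, not fsnotify code. The writer is assumed to run with the
object's lock held, while the reader takes no lock, so both racing accesses
are marked.

```c
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical object whose mask is read without holding its lock. */
struct watched_obj {
	uint32_t mask;
};

/*
 * Writer side. The caller is assumed to hold the object's lock. The store
 * is still marked because lockless readers can run concurrently with it.
 */
static void obj_set_mask(struct watched_obj *obj, uint32_t new_mask)
{
	WRITE_ONCE(obj->mask, new_mask);
}

/* Lockless reader: a single marked load, so the value cannot be torn. */
static int obj_watched(struct watched_obj *obj, uint32_t event)
{
	return (READ_ONCE(obj->mask) & event) != 0;
}
```

If every access to the field happened under the lock, none of them would
need marking; the marks become necessary only because obj_watched() runs
outside the critical section.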
Jan Kara July 17, 2024, 1:16 p.m. UTC | #3
On Mon 15-07-24 10:22:03, Josef Bacik wrote:
> On Mon, Jul 15, 2024 at 03:04:10PM +0200, Jan Kara wrote:
> > When __fsnotify_recalc_mask() recomputes the mask on the watched object,
> > the compiler can "optimize" the code to perform partial updates to the
> > mask (including zeroing it at the beginning). Thus places checking
> > the object mask without conn->lock such as fsnotify_object_watched()
> > could see invalid states of the mask. Make sure the mask update is
> > performed by one memory store using WRITE_ONCE().
> > 
> > Reported-by: syzbot+701037856c25b143f1ad@syzkaller.appspotmail.com
> > Reported-by: Dmitry Vyukov <dvyukov@google.com>
> > Link: https://lore.kernel.org/all/CACT4Y+Zk0ohwwwHSD63U2-PQ=UuamXczr1mKBD6xtj2dyYKBvA@mail.gmail.com
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> I'm still hazy on the rules here and what KCSAN expects, but if we're using
> READ_ONCE/WRITE_ONCE on a thing, do we have to use them everywhere we access
> that member?  Because there's a few accesses in include/linux/fsnotify_backend.h
> that were missed if so.  Thanks,

Indeed there are two accesses there that should be using READ_ONCE() as
well. I missed those. Thanks for the review!

								Honza

Patch

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index ff69ae24c4e8..6675c4182dbf 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -172,8 +172,10 @@  static bool fsnotify_event_needs_parent(struct inode *inode, __u32 mnt_mask,
 	BUILD_BUG_ON(FS_EVENTS_POSS_ON_CHILD & ~FS_EVENTS_POSS_TO_PARENT);
 
 	/* Did either inode/sb/mount subscribe for events with parent/name? */
-	marks_mask |= fsnotify_parent_needed_mask(inode->i_fsnotify_mask);
-	marks_mask |= fsnotify_parent_needed_mask(inode->i_sb->s_fsnotify_mask);
+	marks_mask |= fsnotify_parent_needed_mask(
+				READ_ONCE(inode->i_fsnotify_mask));
+	marks_mask |= fsnotify_parent_needed_mask(
+				READ_ONCE(inode->i_sb->s_fsnotify_mask));
 	marks_mask |= fsnotify_parent_needed_mask(mnt_mask);
 
 	/* Did they subscribe for this event with parent/name info? */
@@ -184,8 +186,8 @@  static bool fsnotify_event_needs_parent(struct inode *inode, __u32 mnt_mask,
 static inline bool fsnotify_object_watched(struct inode *inode, __u32 mnt_mask,
 					   __u32 mask)
 {
-	__u32 marks_mask = inode->i_fsnotify_mask | mnt_mask |
-			   inode->i_sb->s_fsnotify_mask;
+	__u32 marks_mask = READ_ONCE(inode->i_fsnotify_mask) | mnt_mask |
+			   READ_ONCE(inode->i_sb->s_fsnotify_mask);
 
 	return mask & marks_mask & ALL_FSNOTIFY_EVENTS;
 }
@@ -202,7 +204,8 @@  int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 		      int data_type)
 {
 	const struct path *path = fsnotify_data_path(data, data_type);
-	__u32 mnt_mask = path ? real_mount(path->mnt)->mnt_fsnotify_mask : 0;
+	__u32 mnt_mask = path ?
+		READ_ONCE(real_mount(path->mnt)->mnt_fsnotify_mask) : 0;
 	struct inode *inode = d_inode(dentry);
 	struct dentry *parent;
 	bool parent_watched = dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED;
@@ -546,13 +549,13 @@  int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
 	    (!inode2 || !inode2->i_fsnotify_marks))
 		return 0;
 
-	marks_mask = sb->s_fsnotify_mask;
+	marks_mask = READ_ONCE(sb->s_fsnotify_mask);
 	if (mnt)
-		marks_mask |= mnt->mnt_fsnotify_mask;
+		marks_mask |= READ_ONCE(mnt->mnt_fsnotify_mask);
 	if (inode)
-		marks_mask |= inode->i_fsnotify_mask;
+		marks_mask |= READ_ONCE(inode->i_fsnotify_mask);
 	if (inode2)
-		marks_mask |= inode2->i_fsnotify_mask;
+		marks_mask |= READ_ONCE(inode2->i_fsnotify_mask);
 
 
 	/*
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 4ffc30606e0b..e163a4b79022 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -569,7 +569,7 @@  static int inotify_update_existing_watch(struct fsnotify_group *group,
 		/* more bits in old than in new? */
 		int dropped = (old_mask & ~new_mask);
 		/* more bits in this fsn_mark than the inode's mask? */
-		int do_inode = (new_mask & ~inode->i_fsnotify_mask);
+		int do_inode = (new_mask & ~READ_ONCE(inode->i_fsnotify_mask));
 
 		/* update the inode with this new fsn_mark */
 		if (dropped || do_inode)
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index c3eefa70633c..4aba49a58edd 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -128,7 +128,7 @@  __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn)
 	if (WARN_ON(!fsnotify_valid_obj_type(conn->type)))
 		return 0;
 
-	return *fsnotify_conn_mask_p(conn);
+	return READ_ONCE(*fsnotify_conn_mask_p(conn));
 }
 
 static void fsnotify_get_sb_watched_objects(struct super_block *sb)
@@ -245,7 +245,11 @@  static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 		    !(mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF))
 			want_iref = true;
 	}
-	*fsnotify_conn_mask_p(conn) = new_mask;
+	/*
+	 * We use WRITE_ONCE() to prevent silly compiler optimizations from
+	 * confusing readers not holding conn->lock with partial updates.
+	 */
+	WRITE_ONCE(*fsnotify_conn_mask_p(conn), new_mask);
 
 	return fsnotify_update_iref(conn, want_iref);
 }
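
The shape of the fix in __fsnotify_recalc_mask() can be sketched in
userspace as follows. This is an illustration under assumptions, not the
kernel code: struct mark and recalc_mask() are hypothetical names, and
WRITE_ONCE() is a simplified stand-in for the kernel macro. The new value is
accumulated in a local variable and published with one marked store, so a
concurrent lockless reader sees either the old mask or the new one, never an
intermediate state such as a zeroed mask.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel macro (assumption). */
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical mark carrying the events one watcher is interested in. */
struct mark {
	uint32_t mask;
};

/*
 * Recompute the object's mask from all attached marks. The caller is
 * assumed to hold the lock protecting the mark list.
 */
static void recalc_mask(uint32_t *obj_mask, const struct mark *marks, size_t n)
{
	uint32_t new_mask = 0;
	size_t i;

	/* Build the result privately; readers cannot observe new_mask. */
	for (i = 0; i < n; i++)
		new_mask |= marks[i].mask;

	/* Publish with a single marked store. */
	WRITE_ONCE(*obj_mask, new_mask);
}
```

With a plain assignment instead of WRITE_ONCE(), the compiler would be
allowed to use *obj_mask itself as the accumulator, which is the partial
update the commit message warns about.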