diff mbox series

[v5,4/5] fs: use backing_file container for internal files with "fake" f_path

Message ID 20230615112229.2143178-5-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series Handle notifications on overlayfs fake path files | expand

Commit Message

Amir Goldstein June 15, 2023, 11:22 a.m. UTC
Overlayfs uses open_with_fake_path() to allocate internal kernel files,
with a "fake" path - whose f_path is not on the same fs as f_inode.

Allocate a container struct backing_file for those internal files, that
is used to hold the "fake" ovl path along with the real path.

backing_file_real_path() can be used to access the stored real path.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/file_table.c     | 50 +++++++++++++++++++++++++++++++++++++++++++--
 fs/internal.h       |  5 +++--
 fs/open.c           | 46 ++++++++++++++++++++++++++++++-----------
 fs/overlayfs/file.c |  4 ++--
 include/linux/fs.h  | 23 ++++++++++++++++-----
 5 files changed, 105 insertions(+), 23 deletions(-)

Comments

Christoph Hellwig June 16, 2023, 7:15 a.m. UTC | #1
> -	if (!(f->f_mode & FMODE_NOACCOUNT))
> +	if (unlikely(f->f_mode & FMODE_BACKING))
> +		path_put(backing_file_real_path(f));
> +	else
>  		percpu_counter_dec(&nr_files);

This is still missing the earlier pointed out fix that we still need
the FMODE_NOACCOUNT check, isn't it?

> + * This is only for kernel internal use, and the allocate file must not be
> + * installed into file tables or such.

I'd use the same blurb I'd suggest for the previous patch here as well.

> +/**
> + * backing_file_open - open a backing file for kernel internal use
> + * @path:	path of the file to open
> + * @flags:	open flags
> + * @path:	path of the backing file
> + * @cred:	credentials for open
> + *
> + * Open a file whose f_inode != d_inode(f_path.dentry).
> + * the returned struct file is embedded inside a struct backing_file
> + * container which is used to store the @real_path of the inode.
> + * The @real_path can be accessed by backing_file_real_path() and the
> + * real dentry can be accessed with file_dentry().
> + * The kernel internal backing file is not accounted in nr_files.
> + * This is only for kernel internal use, and must not be installed into
> + * file tables or such.
> + */

I still find this comment not very descriptive.  Here is my counter
suggestion, which I'm also not totally happy with, and which might not
even be factually correct as I'm trying to understand the use case a bit
better by reading the code:

 * Open a backing file for a stackable file system (e.g. overlayfs).
 * For these backing files that reside on the underlying file system, we still
 * want to be able to return the path of the upper file in the stackable file
 * system.  This is done by embedding the returned file into a container
 * structure that also stores the path on the upper file system, which can be
 * retreived using backing_file_real_path().
 *
 * The return file is not accounted in nr_files and must not be installed
 * into the file descriptor table.

> +static inline const struct path *f_real_path(struct file *f)

Question from an earlier revision: why f_real_path and not file_real_path?

Also this really needs a kerneldoc comment explaining when it should
be used.
Amir Goldstein June 16, 2023, 7:44 a.m. UTC | #2
On Fri, Jun 16, 2023 at 10:15 AM Christoph Hellwig <hch@lst.de> wrote:
>
> > -     if (!(f->f_mode & FMODE_NOACCOUNT))
> > +     if (unlikely(f->f_mode & FMODE_BACKING))
> > +             path_put(backing_file_real_path(f));
> > +     else
> >               percpu_counter_dec(&nr_files);
>
> This is still missing the earlier pointed out fix that we still need
> the FMODE_NOACCOUNT check, isn't it?

Yes, I forgot and Christian has already fixed this on his branch.

>
> > + * This is only for kernel internal use, and the allocate file must not be
> > + * installed into file tables or such.
>
> I'd use the same blurb I'd suggest for the previous patch here as well.
>
> > +/**
> > + * backing_file_open - open a backing file for kernel internal use
> > + * @path:    path of the file to open
> > + * @flags:   open flags
> > + * @path:    path of the backing file
> > + * @cred:    credentials for open
> > + *
> > + * Open a file whose f_inode != d_inode(f_path.dentry).
> > + * the returned struct file is embedded inside a struct backing_file
> > + * container which is used to store the @real_path of the inode.
> > + * The @real_path can be accessed by backing_file_real_path() and the
> > + * real dentry can be accessed with file_dentry().
> > + * The kernel internal backing file is not accounted in nr_files.
> > + * This is only for kernel internal use, and must not be installed into
> > + * file tables or such.
> > + */
>
> I still find this comment not very descriptive.  Here is my counter
> suggestion, which I'm also not totally happy with, and which might not
> even be factually correct as I'm trying to understand the use case a bit
> better by reading the code:
>
>  * Open a backing file for a stackable file system (e.g. overlayfs).
>  * For these backing files that reside on the underlying file system, we still
>  * want to be able to return the path of the upper file in the stackable file
>  * system.  This is done by embedding the returned file into a container
>  * structure that also stores the path on the upper file system, which can be
>  * retreived using backing_file_real_path().

It is the other way around.
Those ovl files currently in master have the ovl path in f_path
and xfs inode in f_inode.
After this change, f_path is still ovl and f_inode is still xfs, but
backing_file_real_path() can be used to get the xfs path.

Using the terminology "upper file in the stackable file" above
is different from what "upper file" means in overlayfs terminology.
So maybe:

 * Open a backing file for a stackable filesystem (e.g. overlayfs).
 * @path may be on the stackable filesystem and backing inode on the
 * underlying filesystem.  In this case, we want to be able to return the
 * @real_path of the backing inode.  This is done by embedding the
 * returned file into a container structure that also stores the path of the
 * backing inode on the underlying filesystem, which can be retrieved
 * using backing_file_real_path().

>  *
>  * The return file is not accounted in nr_files and must not be installed
>  * into the file descriptor table.
>
> > +static inline const struct path *f_real_path(struct file *f)
>
> Question from an earlier revision: why f_real_path and not file_real_path?
>

I missed this question.
I don't really mind.
Considering the documentation below, perhaps it should be called
file_inode_path()

> Also this really needs a kerneldoc comment explaining when it should
> be used.

How about this:

/*
 * file_real_path - get the path corresponding to f_inode
 *
 * When opening a backing file for a stackable filesystem (e.g. overlayfs)
 * f_path may be on the stackable filesystem and f_inode on the
 * underlying filesystem.  When the path associated with f_inode is
 * needed, this helper should be used instead of accessing f_path directly.
*/

Thanks,
Amir.
diff mbox series

Patch

diff --git a/fs/file_table.c b/fs/file_table.c
index 4bc713865212..34a832504369 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -44,18 +44,40 @@  static struct kmem_cache *filp_cachep __read_mostly;
 
 static struct percpu_counter nr_files __cacheline_aligned_in_smp;
 
+/* Container for backing file with optional real path */
+struct backing_file {
+	struct file file;
+	struct path real_path;
+};
+
+static inline struct backing_file *backing_file(struct file *f)
+{
+	return container_of(f, struct backing_file, file);
+}
+
+struct path *backing_file_real_path(struct file *f)
+{
+	return &backing_file(f)->real_path;
+}
+EXPORT_SYMBOL_GPL(backing_file_real_path);
+
 static void file_free_rcu(struct rcu_head *head)
 {
 	struct file *f = container_of(head, struct file, f_rcuhead);
 
 	put_cred(f->f_cred);
-	kmem_cache_free(filp_cachep, f);
+	if (unlikely(f->f_mode & FMODE_BACKING))
+		kfree(backing_file(f));
+	else
+		kmem_cache_free(filp_cachep, f);
 }
 
 static inline void file_free(struct file *f)
 {
 	security_file_free(f);
-	if (!(f->f_mode & FMODE_NOACCOUNT))
+	if (unlikely(f->f_mode & FMODE_BACKING))
+		path_put(backing_file_real_path(f));
+	else
 		percpu_counter_dec(&nr_files);
 	call_rcu(&f->f_rcuhead, file_free_rcu);
 }
@@ -226,6 +248,30 @@  struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred)
 	return f;
 }
 
+/*
+ * Variant of alloc_empty_file() that allocates a backing_file container
+ * and doesn't check and modify nr_files.
+ *
+ * This is only for kernel internal use, and the allocate file must not be
+ * installed into file tables or such.
+ */
+struct file *alloc_empty_backing_file(int flags, const struct cred *cred)
+{
+	struct backing_file *ff;
+	int error;
+
+	ff = kzalloc(sizeof(struct backing_file), GFP_KERNEL);
+	if (unlikely(!ff))
+		return ERR_PTR(-ENOMEM);
+
+	error = init_file(&ff->file, flags, cred);
+	if (unlikely(error))
+		return ERR_PTR(error);
+
+	ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT;
+	return &ff->file;
+}
+
 /**
  * alloc_file - allocate and initialize a 'struct file'
  *
diff --git a/fs/internal.h b/fs/internal.h
index bd3b2810a36b..9c31078e0d16 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -97,8 +97,9 @@  extern void chroot_fs_refs(const struct path *, const struct path *);
 /*
  * file_table.c
  */
-extern struct file *alloc_empty_file(int, const struct cred *);
-extern struct file *alloc_empty_file_noaccount(int, const struct cred *);
+struct file *alloc_empty_file(int flags, const struct cred *cred);
+struct file *alloc_empty_file_noaccount(int flags, const struct cred *cred);
+struct file *alloc_empty_backing_file(int flags, const struct cred *cred);
 
 static inline void put_file_access(struct file *file)
 {
diff --git a/fs/open.c b/fs/open.c
index c5da2b3eb105..059d7024155a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1152,23 +1152,45 @@  struct file *kernel_file_open(const struct path *path, int flags,
 }
 EXPORT_SYMBOL_GPL(kernel_file_open);
 
-struct file *open_with_fake_path(const struct path *path, int flags,
-				struct inode *inode, const struct cred *cred)
+/**
+ * backing_file_open - open a backing file for kernel internal use
+ * @path:	path of the file to open
+ * @flags:	open flags
+ * @path:	path of the backing file
+ * @cred:	credentials for open
+ *
+ * Open a file whose f_inode != d_inode(f_path.dentry).
+ * the returned struct file is embedded inside a struct backing_file
+ * container which is used to store the @real_path of the inode.
+ * The @real_path can be accessed by backing_file_real_path() and the
+ * real dentry can be accessed with file_dentry().
+ * The kernel internal backing file is not accounted in nr_files.
+ * This is only for kernel internal use, and must not be installed into
+ * file tables or such.
+ */
+struct file *backing_file_open(const struct path *path, int flags,
+			       const struct path *real_path,
+			       const struct cred *cred)
 {
-	struct file *f = alloc_empty_file_noaccount(flags, cred);
-	if (!IS_ERR(f)) {
-		int error;
+	struct file *f;
+	int error;
 
-		f->f_path = *path;
-		error = do_dentry_open(f, inode, NULL);
-		if (error) {
-			fput(f);
-			f = ERR_PTR(error);
-		}
+	f = alloc_empty_backing_file(flags, cred);
+	if (IS_ERR(f))
+		return f;
+
+	f->f_path = *path;
+	path_get(real_path);
+	*backing_file_real_path(f) = *real_path;
+	error = do_dentry_open(f, d_inode(real_path->dentry), NULL);
+	if (error) {
+		fput(f);
+		f = ERR_PTR(error);
 	}
+
 	return f;
 }
-EXPORT_SYMBOL(open_with_fake_path);
+EXPORT_SYMBOL_GPL(backing_file_open);
 
 #define WILL_CREATE(flags)	(flags & (O_CREAT | __O_TMPFILE))
 #define O_PATH_FLAGS		(O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 39737c2aaa84..c610ae35b0b9 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -61,8 +61,8 @@  static struct file *ovl_open_realfile(const struct file *file,
 		if (!inode_owner_or_capable(real_idmap, realinode))
 			flags &= ~O_NOATIME;
 
-		realfile = open_with_fake_path(&file->f_path, flags, realinode,
-					       current_cred());
+		realfile = backing_file_open(&file->f_path, flags, realpath,
+					     current_cred());
 	}
 	revert_creds(old_cred);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1f8486e773af..ecf85a46fb2a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -171,6 +171,9 @@  typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File supports non-exclusive O_DIRECT writes from multiple threads */
 #define FMODE_DIO_PARALLEL_WRITE	((__force fmode_t)0x1000000)
 
+/* File is embedded in backing_file object */
+#define FMODE_BACKING		((__force fmode_t)0x2000000)
+
 /* File was opened by fanotify and shouldn't generate fanotify events */
 #define FMODE_NONOTIFY		((__force fmode_t)0x4000000)
 
@@ -2352,11 +2355,21 @@  static inline struct file *file_open_root_mnt(struct vfsmount *mnt,
 	return file_open_root(&(struct path){.mnt = mnt, .dentry = mnt->mnt_root},
 			      name, flags, mode);
 }
-extern struct file * dentry_open(const struct path *, int, const struct cred *);
-extern struct file *dentry_create(const struct path *path, int flags,
-				  umode_t mode, const struct cred *cred);
-extern struct file * open_with_fake_path(const struct path *, int,
-					 struct inode*, const struct cred *);
+struct file *dentry_open(const struct path *path, int flags,
+			 const struct cred *creds);
+struct file *dentry_create(const struct path *path, int flags, umode_t mode,
+			   const struct cred *cred);
+struct file *backing_file_open(const struct path *path, int flags,
+			       const struct path *real_path,
+			       const struct cred *cred);
+struct path *backing_file_real_path(struct file *f);
+static inline const struct path *f_real_path(struct file *f)
+{
+	if (unlikely(f->f_mode & FMODE_BACKING))
+		return backing_file_real_path(f);
+	return &f->f_path;
+}
+
 static inline struct file *file_clone_open(struct file *file)
 {
 	return dentry_open(&file->f_path, file->f_flags, file->f_cred);