mbox series

[GIT,PULL] fs/9p patches for 6.9 merge window

Message ID ZfRkyxUf8TIgsYjA@1149290c588b (mailing list archive)
State New
Headers show
Series [GIT,PULL] fs/9p patches for 6.9 merge window | expand

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git tags/9p-for-6.9

Message

Eric Van Hensbergen March 15, 2024, 3:10 p.m. UTC
The following changes since commit 6613476e225e090cc9aad49be7fa504e290dd33d:

  Linux 6.8-rc1 (2024-01-21 14:11:32 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git tags/9p-for-6.9

for you to fetch changes up to be57855f505003c5cafff40338d5d0f23b00ba4d:

  fs/9p: fix dups even in uncached mode (2024-01-26 16:46:56 +0000)

----------------------------------------------------------------
fs/9p changes for the 6.9 merge window

This pull request includes a number of patches
addressing improvements in the cache portions of the 9p
client.

The biggest improvements have to do with fixing handling
of inodes and eliminating duplicate structures and unnecessary
allocation/release of inode structures and many associated
unnecessary protocol traffic.  This also dramatically
reduced code complexity across the code and sets us up to add
proper temporal cache capabilities.

Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>

----------------------------------------------------------------
Eric Van Hensbergen (8):
      fs/9p: switch vfsmount to use v9fs_get_new_inode
      fs/9p: convert mkdir to use get_new_inode
      fs/9p: remove walk and inode allocation from symlink
      fs/9p: Eliminate redundant non-cache path in mknod
      fs/9p: Eliminate now unused v9fs_get_inode
      fs/9p: rework qid2ino logic
      fs/9p: simplify iget to remove unnecessary paths
      fs/9p: fix dups even in uncached mode

 fs/9p/v9fs.h           |  31 +++++-----------------------
 fs/9p/v9fs_vfs.h       |  11 ++++++----
 fs/9p/vfs_dir.c        |   4 ++--
 fs/9p/vfs_inode.c      | 150 +++++++++++++++++++--------------------------------------------------------------------------------------------------------------------
 fs/9p/vfs_inode_dotl.c | 194 ++++++++++++++++++++++++++++++++-----------------------------------------------------------------------------------------------------------------------------------------------
 fs/9p/vfs_super.c      |  45 +----------------------------------------
 6 files changed, 71 insertions(+), 364 deletions(-)

Comments

Linus Torvalds March 15, 2024, 5:17 p.m. UTC | #1
On Fri, 15 Mar 2024 at 08:10, Eric Van Hensbergen <ericvh@kernel.org> wrote:
>
> fs/9p changes for the 6.9 merge window

Entirely tangential, but your pgp key drives me insane, and it finally
drove me over the edge.

One of your uid's has your own name mis-spelled. This is not new.

Please tell me there's a reason for it.

          Linus
pr-tracker-bot@kernel.org March 15, 2024, 5:20 p.m. UTC | #2
The pull request you sent on Fri, 15 Mar 2024 15:10:03 +0000:

> git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git tags/9p-for-6.9

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c442a42363b2ce5c3eb2b0ff1e052ee956f0a29f

Thank you!
Oleg Nesterov April 8, 2024, 2:14 p.m. UTC | #3
Hello,

the commit 724a08450f74 ("fs/9p: simplify iget to remove unnecessary paths")
from this PR breaks my setup.

On the host:

	$ cat /etc/fstab
	/dev/mapper/volgrp-root /                       ext4    defaults        1 1
	/dev/mapper/volgrp-home /home                   ext4    defaults        1 2

	$ qemu-system-x86_64 -kernel ./arch/x86/boot/bzImage \
		-append 'rw rootfstype=9p rootflags=version=9p2000.L,trans=virtio' \
		-fsdev local,id=FSID_1,path=/home/oleg/TEST_GUEST/,security_model=none \
		-device virtio-9p-pci,fsdev=FSID_1,mount_tag=/dev/root \
		-fsdev local,id=FSID_2,path=/,security_model=none,readonly \
		-device virtio-9p-pci,fsdev=FSID_2,mount_tag=hostfs \
		-nographic -serial mon:stdio

In the guest:

	# mount -t 9p hostfs /host -o version=9p2000.L,trans=virtio,access=any

Now, before this patch:

	/# cd /host
	/host# strace -e stat perl -e '-d "home"'
	...
	stat("home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
	+++ exited with 0 +++
	/host# cd home
	/host/home#

After this patch:

	# cd /host
	/host# strace -e stat perl -e '-d "home"'
	...
	stat("home", 0x1397298)                 = -1 ELOOP (Too many levels of symbolic links)
	+++ exited with 0 +++
	/host# cd home
	-bash: cd: home: Too many levels of symbolic links
	/host# dmesg
	...
	[   54.255756] VFS: Lookup of 'home' in 9p 9p would have caused loop
	[   72.190399] VFS: Lookup of 'home' in 9p 9p would have caused loop
	[   72.191535] VFS: Lookup of 'home' in 9p 9p would have caused loop
	[   72.192488] VFS: Lookup of 'home' in 9p 9p would have caused loop

Oleg.


On 03/15, Eric Van Hensbergen wrote:
>
> The following changes since commit 6613476e225e090cc9aad49be7fa504e290dd33d:
>
>   Linux 6.8-rc1 (2024-01-21 14:11:32 -0800)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git tags/9p-for-6.9
>
> for you to fetch changes up to be57855f505003c5cafff40338d5d0f23b00ba4d:
>
>   fs/9p: fix dups even in uncached mode (2024-01-26 16:46:56 +0000)
>
> ----------------------------------------------------------------
> fs/9p changes for the 6.9 merge window
>
> This pull request includes a number of patches
> addressing improvements in the cache portions of the 9p
> client.
>
> The biggest improvements have to do with fixing handling
> of inodes and eliminating duplicate structures and unnecessary
> allocation/release of inode structures and many associated
> unnecessary protocol traffic.  This also dramatically
> reduced code complexity across the code and sets us up to add
> proper temporal cache capabilities.
>
> Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
>
> ----------------------------------------------------------------
> Eric Van Hensbergen (8):
>       fs/9p: switch vfsmount to use v9fs_get_new_inode
>       fs/9p: convert mkdir to use get_new_inode
>       fs/9p: remove walk and inode allocation from symlink
>       fs/9p: Eliminate redundant non-cache path in mknod
>       fs/9p: Eliminate now unused v9fs_get_inode
>       fs/9p: rework qid2ino logic
>       fs/9p: simplify iget to remove unnecessary paths
>       fs/9p: fix dups even in uncached mode
>
>  fs/9p/v9fs.h           |  31 +++++-----------------------
>  fs/9p/v9fs_vfs.h       |  11 ++++++----
>  fs/9p/vfs_dir.c        |   4 ++--
>  fs/9p/vfs_inode.c      | 150 +++++++++++++++++++--------------------------------------------------------------------------------------------------------------------
>  fs/9p/vfs_inode_dotl.c | 194 ++++++++++++++++++++++++++++++++-----------------------------------------------------------------------------------------------------------------------------------------------
>  fs/9p/vfs_super.c      |  45 +----------------------------------------
>  6 files changed, 71 insertions(+), 364 deletions(-)
Eric Van Hensbergen April 10, 2024, 5:20 p.m. UTC | #4
April 8, 2024 at 9:14 AM, "Oleg Nesterov" <oleg@redhat.com> wrote:
> 
> Hello,
> 
> the commit 724a08450f74 ("fs/9p: simplify iget to remove unnecessary paths")
> 
> from this PR breaks my setup.
> 

Thanks for the bisect and detailed reproduction instructions.  I am looking at this now and it seems to be related to another problem reported by Kent Overstreet where he was seeing symlink loop reports that were disrupting his testing environment.  Once I'm able to reproduce, I'll try and get a patch out later today, if not I may revert the commit to keep from disrupting folks' testing environments.

      -eric
Eric Van Hensbergen April 10, 2024, 6:57 p.m. UTC | #5
April 10, 2024 at 12:20 PM, "Eric Van Hensbergen" <eric.vanhensbergen@linux.dev> wrote:
> April 8, 2024 at 9:14 AM, "Oleg Nesterov" <oleg@redhat.com> wrote:
> > Hello,
> >  the commit 724a08450f74 ("fs/9p: simplify iget to remove unnecessary paths")
> >  from this PR breaks my setup.
> > 
> 
> Thanks for the bisect and detailed reproduction instructions. I am looking at this now and it seems to be related to another problem reported by Kent Overstreet where he was seeing symlink loop reports that were disrupting his testing environment. Once I'm able to reproduce, I'll try and get a patch out later today, if not I may revert the commit to keep from disrupting folks' testing environments.
> 

Okay, I think I understand this one, unfortunately it doesn't appear obviously related to Kent's report if what I believe is correct.  I think I've reproduced the problem, fundamentally, since you have two mount points you are exporting together. I believe we are getting an inode number collision which was being hidden by the "always create a new inode on lookup" inefficiency in v9fs_vfs_lookup.  You could probably verify that for me by stating the /home directory and the / directory on the server side of your setup.  When I created two partitions and mounted one inside the other the "home" and the "root" both had inode 2 and I got -ELOOP when trying to access.

The underlying cause in the patch series is that I was trying to maintain inodes versus constantly creating and deleting them -- and I simplified the inode lookup to be based purely on inode number (versus checking against times, type, i_generation, etc.) -- so collisions are much more likely to happen.  If qemu detects that this is a possibility it usually prints something: 
qemu-system-aarch64: warning: 9p: Multiple devices detected in same VirtFS export, which might lead to file ID collisions and severe misbehaviours on guest! You should either use a separate export for each device shared from host or use virtfs option 'multidevs=remap'!

I can confirm that multidevs=remap in qemu does appear to avoid the problem, but this doesn't help the fact that we have a broader regression on our hand that used to work.  It'd be useful to see if mutlidevs=remap also deals with your problem @Kent, but while I think its the same underlying problem, I don't think it may be the same solution.

I think I have to give up on relying on qid.path/inode-number as unique and maintain our own client view of those -- but this will cause problems with the way several parts of the code currently operate where we need to lookup inode by a unique identifier from the server (which is currently conveyed by qids).

Now that I understand the problem, I think I can work thorugh a partial revert which will go back to a more complex match in vfs_lookup (which examines several other fields from the server beyond qid.path) -- but maybe I can do it in such a way which will still avoid keeping duplicate inode structs around in memory.

This didn't show up in my regressions because I always export a single file system where the inodes are always unique.

Thanks for your patience while I work through this - now that i can reproduce the problem, I'm fairly certain I can get a patch together this week to test to see if it solves the regressions.

     -eric
Oleg Nesterov April 11, 2024, 9:29 a.m. UTC | #6
On 04/10, Eric Van Hensbergen wrote:
>
> April 10, 2024 at 12:20 PM, "Eric Van Hensbergen" <eric.vanhensbergen@linux.dev> wrote:
> > April 8, 2024 at 9:14 AM, "Oleg Nesterov" <oleg@redhat.com> wrote:
> > > Hello,
> > >  the commit 724a08450f74 ("fs/9p: simplify iget to remove unnecessary paths")
> > >  from this PR breaks my setup.
> >

> I think
> I've reproduced the problem, fundamentally, since you have two mount
> points you are exporting together. I believe we are getting an inode
> number collision which was being hidden by the "always create a new inode
> on lookup" inefficiency in v9fs_vfs_lookup.  You could probably verify
> that for me by stating the /home directory and the / directory on the
> server side of your setup.

Yes, yes,

	$ ls -ldi / /home
	2 dr-xr-xr-x. 18 root root 4096 Jan  4  2016 /
	2 drwxr-xr-x.  7 root root 4096 Dec 20  2022 /home

that is why I showed you the relevant parts of my /etc/fstab

> If qemu detects that this is a possibility it usually
> prints something: qemu-system-aarch64: warning: 9p: Multiple devices
> detected in same VirtFS export,

My qemu is quite old, it doesn't. But I tested this on another machine
and yes, the newer qemu does warn.

(annoyingly, I had to redirect stderr to the file to see this warning,
 it is cleared by the booting kernel otherwise).

> I can confirm that multidevs=remap in qemu does appear to avoid the
> problem,

Confirm!

I didn't know about this option (and again, my old qemu doesn't support
it), but everything seems to work just fine with the newer qemu and
multidevs=remap. Thanks!

So I think this regression is minor.

> now that i can
> reproduce the problem, I'm fairly certain I can get a patch together
> this week to test to see if it solves the regressions.

Thanks ;)

Oleg.