diff mbox series

[RFC] 9p: case-insensitive host filesystems

Message ID 1757498.AyhHxzoH2B@silver (mailing list archive)
State New, archived
Headers show
Series [RFC] 9p: case-insensitive host filesystems | expand

Commit Message

Christian Schoenebeck April 22, 2022, 6:02 p.m. UTC
Now that 9p support for macOS hosts just landed in QEMU 7.0 and with support 
for Windows hosts on the horizon [1], the question is how to deal with case-
insensitive host filesystems, which are very common on those two systems?

I made some tests, e.g. trying to setup a 9p root fs Linux installation on a 
macOS host as described in the QEMU HOWTO [2], which at a certain point causes 
the debootstrap script to fail when trying to unpack the 'libpam-runtime' 
package. That's because it would try to create this symlink:

  /usr/share/man/man7/PAM.7.gz -> /usr/share/man/man7/pam.7.gz

which fails with EEXIST on a case-insensitive APFS. Unfortunately you can't 
easily switch an existing APFS partition to case-sensitivity. It requires to 
reformat the entire partition, loosing all your data, etc.

So I did a quick test with QEMU as outlined below, trying to simply let 9p 
server "eat" EEXIST errors in such cases, but then I realized that most of the 
time it would not even come that far, as Linux client would first send a 
'Twalk' request to check whether target symlink entry already exists, and as 
it gets a positive response from 9p server (again, due to case-insensitivity) 
client would stop right there without even trying to send a 'Tsymlink' 
request.

So maybe it's better to handle case-insensitivity entirely on client side? 
I've read that some generic "case fold" code has landed in the Linux kernel 
recently that might do the trick?

Should 9p server give a hint to 9p client that it's a case-insensitive fs? And 
if yes, once per entire exported fs or rather for each directory (as there 
might be submounts on host)?

[1] https://lore.kernel.org/all/20220408171013.912436-1-bmeng.cn@gmail.com/
[2] https://wiki.qemu.org/Documentation/9p_root_fs

---
 hw/9pfs/9p-local.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

         }
         err = fchownat(dirfd, name, credp->fc_uid, credp->fc_gid,
@@ -983,6 +1018,25 @@ static int local_link(FsContext *ctx, V9fsPath *oldpath,
 
     ret = linkat(odirfd, oname, ndirfd, name, 0);
     if (ret < 0) {
+#if CONFIG_DARWIN
+        if (errno == EEXIST) {
+            printf("  -> linkat(odirfd=%d, oname='%s', ndirfd=%d, name='%s') 
= EEXIST\n", odirfd, oname, ndirfd, name);
+        }
+        if (errno == EEXIST &&
+            strcmp(oname, name) && !p9_stricmp(oname, name))
+        {
+            struct stat st1, st2;
+            const int cur_errno = errno;
+            if (!fstatat(odirfd, oname, &st1, AT_SYMLINK_NOFOLLOW) &&
+                !fstatat(ndirfd, name, &st2, AT_SYMLINK_NOFOLLOW) &&
+                st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino)
+            {
+                printf("  -> iCASE SAME\n");
+                ret = 0;
+            }
+            errno = cur_errno;
+        }
+#endif
         goto out_close;
     }

Comments

Dominique Martinet April 22, 2022, 7:57 p.m. UTC | #1
Christian Schoenebeck wrote on Fri, Apr 22, 2022 at 08:02:46PM +0200:
> So maybe it's better to handle case-insensitivity entirely on client side? 
> I've read that some generic "case fold" code has landed in the Linux kernel 
> recently that might do the trick?

I haven't tried, but settings S_CASEFOLD on every inodes i_flags might do
what you want client-side.
That's easy enough to test and could be a mount option

Even with that it's possible to do a direct open without readdir first
if one knows the path and I that would only be case-insensitive if the
backing server is case insensitive though, so just setting the option
and expecting it to work all the time might be a little bit
optimistic... I believe guess that should be an optimization at best.

Ideally the server should tell the client they are casefolded somehow,
but 9p doesn't have any capability/mount time negotiation besides msize
so that's difficult with the current protocol.
Christian Schoenebeck May 23, 2022, 5:59 p.m. UTC | #2
On Freitag, 22. April 2022 21:57:40 CEST Dominique Martinet wrote:
> Christian Schoenebeck wrote on Fri, Apr 22, 2022 at 08:02:46PM +0200:
> > So maybe it's better to handle case-insensitivity entirely on client side?
> > I've read that some generic "case fold" code has landed in the Linux
> > kernel
> > recently that might do the trick?
> 
> I haven't tried, but settings S_CASEFOLD on every inodes i_flags might do
> what you want client-side.
> That's easy enough to test and could be a mount option

I just made a quick test using:

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 08f48b70a741..5d8e77daed53 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -257,6 +257,7 @@ int v9fs_init_inode(struct v9fs_session_info *v9ses,
        inode->i_atime = inode->i_mtime = inode->i_ctime = 
current_time(inode);
        inode->i_mapping->a_ops = &v9fs_addr_operations;
        inode->i_private = NULL;
+       inode->i_flags |= S_CASEFOLD;
 
        switch (mode & S_IFMT) {
        case S_IFIFO:

Unfortunately that did not help much. I still get EEXIST error e.g. when 
trying 'ln -s foo FOO'.

I am not sure though whether there would be more code places to touch or 
whether that's even the expected behaviour with S_CASEFOLD for some reason.

> Even with that it's possible to do a direct open without readdir first
> if one knows the path and I that would only be case-insensitive if the
> backing server is case insensitive though, so just setting the option
> and expecting it to work all the time might be a little bit
> optimistic... I believe guess that should be an optimization at best.
> 
> Ideally the server should tell the client they are casefolded somehow,
> but 9p doesn't have any capability/mount time negotiation besides msize
> so that's difficult with the current protocol.
diff mbox series

Patch

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index d42ce6d8b8..d6cb45c758 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -39,6 +39,10 @@ 
 #endif
 #endif
 #include <sys/ioctl.h>
+#ifdef CONFIG_DARWIN
+#include <glib.h>
+#include <glib/gprintf.h>
+#endif
 
 #ifndef XFS_SUPER_MAGIC
 #define XFS_SUPER_MAGIC  0x58465342
@@ -57,6 +61,18 @@  typedef struct {
     int mountfd;
 } LocalData;
 
+#ifdef CONFIG_DARWIN
+
+/* Compare strings case-insensitive (assuming UTF-8 encoding). */
+static int p9_stricmp(const char *a, const char *b)
+{
+    g_autofree gchar *cia = g_utf8_casefold(a, -1);
+    g_autofree gchar *cib = g_utf8_casefold(b, -1);
+    return g_utf8_collate(cia, cib);
+}
+
+#endif
+
 int local_open_nofollow(FsContext *fs_ctx, const char *path, int flags,
                         mode_t mode)
 {
@@ -931,6 +947,25 @@  static int local_symlink(FsContext *fs_ctx, const char 
*oldpath,
                fs_ctx->export_flags & V9FS_SM_NONE) {
         err = symlinkat(oldpath, dirfd, name);
         if (err) {
+#if CONFIG_DARWIN
+            if (errno == EEXIST) {
+                printf("  -> symlinkat(oldpath='%s', dirfd=%d, name='%s') = 
EEXIST\n", oldpath, dirfd, name);
+            }
+            if (errno == EEXIST &&
+                strcmp(oldpath, name) && !p9_stricmp(oldpath, name))
+            {
+                struct stat st1, st2;
+                const int cur_errno = errno;
+                if (!fstatat(dirfd, oldpath, &st1, AT_SYMLINK_NOFOLLOW) &&
+                    !fstatat(dirfd, name, &st2, AT_SYMLINK_NOFOLLOW) &&
+                    st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino)
+                {
+                    printf("  -> iCASE SAME\n");
+                    err = 0;
+                }
+                errno = cur_errno;
+            }
+#endif
             goto out;