Message ID | 20250114224514.2399813-1-antoine@lesviallon.fr (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] ceph: fix memory leak in ceph_mds_auth_match() | expand |
On Tue, 2025-01-14 at 23:45 +0100, Antoine Viallon wrote: > We now free the temporary target path substring allocation on every > possible branch, instead of omitting the default branch. > In some cases, a memory leak occured, which could rapidly crash the > system (depending on how many file accesses were attempted). > > This was detected in production because it caused a continuous memory > growth, > eventually triggering kernel OOM and completely hard-locking the > kernel. > > Relevant kmemleak stacktrace: > > unreferenced object 0xffff888131e69900 (size 128): > comm "git", pid 66104, jiffies 4295435999 > hex dump (first 32 bytes): > 76 6f 6c 75 6d 65 73 2f 63 6f 6e 74 61 69 6e 65 > volumes/containe > 72 73 2f 67 69 74 65 61 2f 67 69 74 65 61 2f 67 > rs/gitea/gitea/g > backtrace (crc 2f3bb450): > [<ffffffffaa68fb49>] __kmalloc_noprof+0x359/0x510 > [<ffffffffc32bf1df>] ceph_mds_check_access+0x5bf/0x14e0 > [ceph] > [<ffffffffc3235722>] ceph_open+0x312/0xd80 [ceph] > [<ffffffffaa7dd786>] do_dentry_open+0x456/0x1120 > [<ffffffffaa7e3729>] vfs_open+0x79/0x360 > [<ffffffffaa832875>] path_openat+0x1de5/0x4390 > [<ffffffffaa834fcc>] do_filp_open+0x19c/0x3c0 > [<ffffffffaa7e44a1>] do_sys_openat2+0x141/0x180 > [<ffffffffaa7e4945>] __x64_sys_open+0xe5/0x1a0 > [<ffffffffac2cc2f7>] do_syscall_64+0xb7/0x210 > [<ffffffffac400130>] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > It can be triggered by mouting a subdirectory of a CephFS filesystem, > and then trying to access files on this subdirectory with an auth > token using a path-scoped capability: > > $ ceph auth get client.services > [client.services] > key = REDACTED > caps mds = "allow rw fsname=cephfs path=/volumes/" > caps mon = "allow r fsname=cephfs" > caps osd = "allow rw tag cephfs data=cephfs" > > $ cat /proc/self/mounts > > services@00000000-0000-0000-0000-000000000000.cephfs=/volumes/contain > ers /ceph/containers ceph > rw,noatime,name=services,secret=<hidden>,ms_mode=prefer- > crc,mount_timeout=300,acl,mon_addr=[REDACTED]:3300,recover_session=cl > ean 0 0 > > $ seq 1 1000000 | xargs -P32 --replace={} touch > /ceph/containers/file-{} && \ > seq 1 1000000 | xargs -P32 --replace={} cat > /ceph/containers/file-{} > > Signed-off-by: Antoine Viallon <antoine@lesviallon.fr> > --- > fs/ceph/mds_client.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > index 785fe489ef4b..c3b63243c2dd 100644 > --- a/fs/ceph/mds_client.c > +++ b/fs/ceph/mds_client.c > @@ -5690,18 +5690,21 @@ static int ceph_mds_auth_match(struct > ceph_mds_client *mdsc, > * > * All the other cases > --> mismatch > */ > + int rc = 1; > char *first = strstr(_tpath, auth- > >match.path); > if (first != _tpath) { > - if (free_tpath) > - kfree(_tpath); > - return 0; > + rc = 0; > } > > if (tlen > len && _tpath[len] != '/') { > - if (free_tpath) > - kfree(_tpath); > - return 0; > + rc = 0; > } > + > + if (free_tpath) > + kfree(_tpath); I have some worry here. Maybe, I am wrong. Initially, we receive tpath pointer as function argument: https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/mds_client.c#L5605 Then, we assign tpath to _tpath: https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/mds_client.c#L5651 We allocate memory by condition: if (spath && (m = strlen(spath)) != 1) { /* mount path + '/' + tpath + an extra space */ n = m + 1 + tlen + 1; _tpath = kmalloc(n, GFP_NOFS); if (!_tpath) return -ENOMEM; /* remove the leading '/' */ snprintf(_tpath, n, "%s/%s", spath + 1, tpath); free_tpath = true; tlen = strlen(_tpath); } https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/mds_client.c#L5660 What if condition is not true and we don't allocate memory? We still have _tpath keeping the pointer on tpath and kfree() will be called. It sounds for me that we can free tpath and caller of ceph_mds_auth_match() will have use-after-free issue. Am I right here? Do I miss something here? Thanks, Slava. > + > + if (!rc) > + return 0; > } > } >
On 15/01/2025 00:27, Viacheslav Dubeyko wrote: > I have some worry here. Maybe, I am wrong. Initially, we receive tpath > pointer as function argument: > > https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/mds_client.c#L5605 > > Then, we assign tpath to _tpath: > > https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/mds_client.c#L5651 > > We allocate memory by condition: > > if (spath && (m = strlen(spath)) != 1) { > /* mount path + '/' + tpath + an extra > space */ > n = m + 1 + tlen + 1; > _tpath = kmalloc(n, GFP_NOFS); > if (!_tpath) > return -ENOMEM; > /* remove the leading '/' */ > snprintf(_tpath, n, "%s/%s", spath + > 1, tpath); > free_tpath = true; > tlen = strlen(_tpath); > } > > https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/mds_client.c#L5660 > > What if condition is not true and we don't allocate memory? We still > have _tpath keeping the pointer on tpath and kfree() will be called. It > sounds for me that we can free tpath and caller of > ceph_mds_auth_match() will have use-after-free issue. Am I right here? > Do I miss something here? Hello Slava, actually, we check that free_tpath is set to true before trying to free _tpath, and the only time free_tpath is set to true is after a successful kmalloc assigned to _tpath. + if (free_tpath) + kfree(_tpath); > > Thanks, > Slava. Antoine
On Wed, 2025-01-15 at 08:32 +0100, Antoine Viallon wrote: > On 15/01/2025 00:27, Viacheslav Dubeyko wrote: > > I have some worry here. Maybe, I am wrong. Initially, we receive > > tpath > > pointer as function argument: > > > > https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/mds_client.c#L5605 > > > > Then, we assign tpath to _tpath: > > > > https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/mds_client.c#L5651 > > > > We allocate memory by condition: > > > > if (spath && (m = strlen(spath)) != 1) { > > /* mount path + '/' + tpath + an > > extra > > space */ > > n = m + 1 + tlen + 1; > > _tpath = kmalloc(n, GFP_NOFS); > > if (!_tpath) > > return -ENOMEM; > > /* remove the leading '/' */ > > snprintf(_tpath, n, "%s/%s", spath > > + > > 1, tpath); > > free_tpath = true; > > tlen = strlen(_tpath); > > } > > > > https://elixir.bootlin.com/linux/v6.13-rc3/source/fs/ceph/mds_client.c#L5660 > > > > What if condition is not true and we don't allocate memory? We > > still > > have _tpath keeping the pointer on tpath and kfree() will be > > called. It > > sounds for me that we can free tpath and caller of > > ceph_mds_auth_match() will have use-after-free issue. Am I right > > here? > > Do I miss something here? > > Hello Slava, > actually, we check that free_tpath is set to true before trying to > free > _tpath, and the only time free_tpath is set to true is after a > successful kmalloc assigned to _tpath. Yeah, I see now. Thanks. But likewise logic looks slightly confusing and it could be a real source of bugs. Thanks, Slava.
On Tue, 2025-01-14 at 23:45 +0100, Antoine Viallon wrote: > We now free the temporary target path substring allocation on every > possible branch, instead of omitting the default branch. > In some cases, a memory leak occured, which could rapidly crash the > system (depending on how many file accesses were attempted). > > This was detected in production because it caused a continuous memory > growth, > eventually triggering kernel OOM and completely hard-locking the > kernel. > > Relevant kmemleak stacktrace: > > unreferenced object 0xffff888131e69900 (size 128): > comm "git", pid 66104, jiffies 4295435999 > hex dump (first 32 bytes): > 76 6f 6c 75 6d 65 73 2f 63 6f 6e 74 61 69 6e 65 > volumes/containe > 72 73 2f 67 69 74 65 61 2f 67 69 74 65 61 2f 67 > rs/gitea/gitea/g > backtrace (crc 2f3bb450): > [<ffffffffaa68fb49>] __kmalloc_noprof+0x359/0x510 > [<ffffffffc32bf1df>] ceph_mds_check_access+0x5bf/0x14e0 > [ceph] > [<ffffffffc3235722>] ceph_open+0x312/0xd80 [ceph] > [<ffffffffaa7dd786>] do_dentry_open+0x456/0x1120 > [<ffffffffaa7e3729>] vfs_open+0x79/0x360 > [<ffffffffaa832875>] path_openat+0x1de5/0x4390 > [<ffffffffaa834fcc>] do_filp_open+0x19c/0x3c0 > [<ffffffffaa7e44a1>] do_sys_openat2+0x141/0x180 > [<ffffffffaa7e4945>] __x64_sys_open+0xe5/0x1a0 > [<ffffffffac2cc2f7>] do_syscall_64+0xb7/0x210 > [<ffffffffac400130>] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > It can be triggered by mouting a subdirectory of a CephFS filesystem, > and then trying to access files on this subdirectory with an auth > token using a path-scoped capability: > > $ ceph auth get client.services > [client.services] > key = REDACTED > caps mds = "allow rw fsname=cephfs path=/volumes/" > caps mon = "allow r fsname=cephfs" > caps osd = "allow rw tag cephfs data=cephfs" > > $ cat /proc/self/mounts > > services@00000000-0000-0000-0000-000000000000.cephfs=/volumes/contain > ers /ceph/containers ceph > rw,noatime,name=services,secret=<hidden>,ms_mode=prefer- > crc,mount_timeout=300,acl,mon_addr=[REDACTED]:3300,recover_session=cl > ean 0 0 > > $ seq 1 1000000 | xargs -P32 --replace={} touch > /ceph/containers/file-{} && \ > seq 1 1000000 | xargs -P32 --replace={} cat > /ceph/containers/file-{} > > Signed-off-by: Antoine Viallon <antoine@lesviallon.fr> > --- > fs/ceph/mds_client.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > index 785fe489ef4b..c3b63243c2dd 100644 > --- a/fs/ceph/mds_client.c > +++ b/fs/ceph/mds_client.c > @@ -5690,18 +5690,21 @@ static int ceph_mds_auth_match(struct > ceph_mds_client *mdsc, > * > * All the other cases > --> mismatch > */ > + int rc = 1; > char *first = strstr(_tpath, auth- > >match.path); > if (first != _tpath) { > - if (free_tpath) > - kfree(_tpath); > - return 0; > + rc = 0; > } > > if (tlen > len && _tpath[len] != '/') { > - if (free_tpath) > - kfree(_tpath); > - return 0; > + rc = 0; > } > + > + if (free_tpath) > + kfree(_tpath); > + > + if (!rc) > + return 0; > } > } > Looks good. Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Thanks, Slava.
On Wed, Jan 15, 2025 at 6:49 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote: > > On Tue, 2025-01-14 at 23:45 +0100, Antoine Viallon wrote: > > We now free the temporary target path substring allocation on every > > possible branch, instead of omitting the default branch. > > In some cases, a memory leak occured, which could rapidly crash the > > system (depending on how many file accesses were attempted). > > > > This was detected in production because it caused a continuous memory > > growth, > > eventually triggering kernel OOM and completely hard-locking the > > kernel. > > > > Relevant kmemleak stacktrace: > > > > unreferenced object 0xffff888131e69900 (size 128): > > comm "git", pid 66104, jiffies 4295435999 > > hex dump (first 32 bytes): > > 76 6f 6c 75 6d 65 73 2f 63 6f 6e 74 61 69 6e 65 > > volumes/containe > > 72 73 2f 67 69 74 65 61 2f 67 69 74 65 61 2f 67 > > rs/gitea/gitea/g > > backtrace (crc 2f3bb450): > > [<ffffffffaa68fb49>] __kmalloc_noprof+0x359/0x510 > > [<ffffffffc32bf1df>] ceph_mds_check_access+0x5bf/0x14e0 > > [ceph] > > [<ffffffffc3235722>] ceph_open+0x312/0xd80 [ceph] > > [<ffffffffaa7dd786>] do_dentry_open+0x456/0x1120 > > [<ffffffffaa7e3729>] vfs_open+0x79/0x360 > > [<ffffffffaa832875>] path_openat+0x1de5/0x4390 > > [<ffffffffaa834fcc>] do_filp_open+0x19c/0x3c0 > > [<ffffffffaa7e44a1>] do_sys_openat2+0x141/0x180 > > [<ffffffffaa7e4945>] __x64_sys_open+0xe5/0x1a0 > > [<ffffffffac2cc2f7>] do_syscall_64+0xb7/0x210 > > [<ffffffffac400130>] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > It can be triggered by mouting a subdirectory of a CephFS filesystem, > > and then trying to access files on this subdirectory with an auth > > token using a path-scoped capability: > > > > $ ceph auth get client.services > > [client.services] > > key = REDACTED > > caps mds = "allow rw fsname=cephfs path=/volumes/" > > caps mon = "allow r fsname=cephfs" > > caps osd = "allow rw tag cephfs data=cephfs" > > > > $ cat /proc/self/mounts > > > > services@00000000-0000-0000-0000-000000000000.cephfs=/volumes/contain > > ers /ceph/containers ceph > > rw,noatime,name=services,secret=<hidden>,ms_mode=prefer- > > crc,mount_timeout=300,acl,mon_addr=[REDACTED]:3300,recover_session=cl > > ean 0 0 > > > > $ seq 1 1000000 | xargs -P32 --replace={} touch > > /ceph/containers/file-{} && \ > > seq 1 1000000 | xargs -P32 --replace={} cat > > /ceph/containers/file-{} > > > > Signed-off-by: Antoine Viallon <antoine@lesviallon.fr> > > --- > > fs/ceph/mds_client.c | 15 +++++++++------ > > 1 file changed, 9 insertions(+), 6 deletions(-) > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > > index 785fe489ef4b..c3b63243c2dd 100644 > > --- a/fs/ceph/mds_client.c > > +++ b/fs/ceph/mds_client.c > > @@ -5690,18 +5690,21 @@ static int ceph_mds_auth_match(struct > > ceph_mds_client *mdsc, > > * > > * All the other cases > > --> mismatch > > */ > > + int rc = 1; > > char *first = strstr(_tpath, auth- > > >match.path); > > if (first != _tpath) { > > - if (free_tpath) > > - kfree(_tpath); > > - return 0; > > + rc = 0; > > } > > > > if (tlen > len && _tpath[len] != '/') { > > - if (free_tpath) > > - kfree(_tpath); > > - return 0; > > + rc = 0; > > } > > + > > + if (free_tpath) > > + kfree(_tpath); > > + > > + if (!rc) > > + return 0; > > } > > } > > > > Looks good. > > Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Hi Antoine, Slava, I have a slight nit that if (first != _tpath) { rc = 0; } if (tlen > len && _tpath[len] != '/') { rc = 0; } isn't the exact equivalent of the previous if (first != _tpath) { ... return 0; } if (tlen > len && _tpath[len] != '/') { ... return 0; } logic. I think if (first != _tpath || (tlen > len && _tpath[len] != '/')) { rc = 0; } would be better and a tiny bit more efficient. Also, renaming rc to path_matched and making it a bool for consistency with gid_matched in the first half of the function seems worth it. I went ahead and applied the patch with these changes plus indentation fixups: https://github.com/ceph/ceph-client/commit/3b7d93db450e9d8ead80d75e2a303248f1528c35 Please let me know if there are any objections. Thanks, Ilya
On Wed, 2025-01-15 at 20:49 +0100, Ilya Dryomov wrote: > On Wed, Jan 15, 2025 at 6:49 PM Viacheslav Dubeyko > <Slava.Dubeyko@ibm.com> wrote: > > > <skipped> > Hi Antoine, Slava, > > I have a slight nit that > > if (first != _tpath) { > rc = 0; > } > > if (tlen > len && _tpath[len] != '/') { > rc = 0; > } > > isn't the exact equivalent of the previous > > if (first != _tpath) { > ... > return 0; > } > > if (tlen > len && _tpath[len] != '/') { > ... > return 0; > } > > logic. I think > > if (first != _tpath || > (tlen > len && _tpath[len] != '/')) { > rc = 0; > } > > would be better and a tiny bit more efficient. Also, renaming rc to > path_matched and making it a bool for consistency with gid_matched in > the first half of the function seems worth it. > Yeah, that's right. I have missed this point. Looks better. Thanks, Slava.
Thank you very much Ilya, your patch is clearer indeed. I'm also wondering if the allocation itself could be avoided. In any case, it is good enough for now :) Antoine Viallon
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 785fe489ef4b..c3b63243c2dd 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -5690,18 +5690,21 @@ static int ceph_mds_auth_match(struct ceph_mds_client *mdsc, * * All the other cases --> mismatch */ + int rc = 1; char *first = strstr(_tpath, auth->match.path); if (first != _tpath) { - if (free_tpath) - kfree(_tpath); - return 0; + rc = 0; } if (tlen > len && _tpath[len] != '/') { - if (free_tpath) - kfree(_tpath); - return 0; + rc = 0; } + + if (free_tpath) + kfree(_tpath); + + if (!rc) + return 0; } }
We now free the temporary target path substring allocation on every possible branch, instead of omitting the default branch. In some cases, a memory leak occured, which could rapidly crash the system (depending on how many file accesses were attempted). This was detected in production because it caused a continuous memory growth, eventually triggering kernel OOM and completely hard-locking the kernel. Relevant kmemleak stacktrace: unreferenced object 0xffff888131e69900 (size 128): comm "git", pid 66104, jiffies 4295435999 hex dump (first 32 bytes): 76 6f 6c 75 6d 65 73 2f 63 6f 6e 74 61 69 6e 65 volumes/containe 72 73 2f 67 69 74 65 61 2f 67 69 74 65 61 2f 67 rs/gitea/gitea/g backtrace (crc 2f3bb450): [<ffffffffaa68fb49>] __kmalloc_noprof+0x359/0x510 [<ffffffffc32bf1df>] ceph_mds_check_access+0x5bf/0x14e0 [ceph] [<ffffffffc3235722>] ceph_open+0x312/0xd80 [ceph] [<ffffffffaa7dd786>] do_dentry_open+0x456/0x1120 [<ffffffffaa7e3729>] vfs_open+0x79/0x360 [<ffffffffaa832875>] path_openat+0x1de5/0x4390 [<ffffffffaa834fcc>] do_filp_open+0x19c/0x3c0 [<ffffffffaa7e44a1>] do_sys_openat2+0x141/0x180 [<ffffffffaa7e4945>] __x64_sys_open+0xe5/0x1a0 [<ffffffffac2cc2f7>] do_syscall_64+0xb7/0x210 [<ffffffffac400130>] entry_SYSCALL_64_after_hwframe+0x77/0x7f It can be triggered by mouting a subdirectory of a CephFS filesystem, and then trying to access files on this subdirectory with an auth token using a path-scoped capability: $ ceph auth get client.services [client.services] key = REDACTED caps mds = "allow rw fsname=cephfs path=/volumes/" caps mon = "allow r fsname=cephfs" caps osd = "allow rw tag cephfs data=cephfs" $ cat /proc/self/mounts services@00000000-0000-0000-0000-000000000000.cephfs=/volumes/containers /ceph/containers ceph rw,noatime,name=services,secret=<hidden>,ms_mode=prefer-crc,mount_timeout=300,acl,mon_addr=[REDACTED]:3300,recover_session=clean 0 0 $ seq 1 1000000 | xargs -P32 --replace={} touch /ceph/containers/file-{} && \ seq 1 1000000 | xargs -P32 --replace={} cat /ceph/containers/file-{} Signed-off-by: Antoine Viallon <antoine@lesviallon.fr> --- fs/ceph/mds_client.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)