[v2] fs/ceph/file: fix buffer overflow in __ceph_sync_read()

Message ID 20241127165405.2676516-1-max.kellermann@ionos.com (mailing list archive)
State New

Commit Message

Max Kellermann Nov. 27, 2024, 4:54 p.m. UTC
If the inode size gets truncated by another task, __ceph_sync_read()
may crash with a buffer overflow because it sets `left` to a huge
value:

  else if (off + ret > i_size)
          left = i_size - off;

Imagine `i_size` was truncated to zero: `off + ret > i_size` is then
always true, but `i_size - off` is negative; since `left` is unsigned,
the value wraps around to a huge number, and the `while (left > 0)`
loop keeps running until `pages[idx]` runs past the end of the `pages`
allocation and the kernel crashes.
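
To make the wrap-around concrete, here is a minimal stand-alone sketch
of the same arithmetic (the variable names mirror the kernel code; the
concrete values are hypothetical):

  /* stand-alone illustration of the unsigned underflow */
  #include <stdio.h>

  int main(void)
  {
      unsigned long long i_size = 0;  /* file truncated to zero */
      unsigned long long off = 8192;  /* read offset */
      long long ret = 1024;           /* bytes returned by the OSD */
      size_t left = 0;

      if (off + ret > i_size)         /* always true after truncation */
          left = i_size - off;        /* 0 - 8192 wraps around */

      /* prints 18446744073709543424 on a 64-bit system */
      printf("left = %zu\n", left);
      return 0;
  }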

We need to handle the case where `i_size` becomes smaller than `off`.
I suggest breaking out of the loop as soon as this happens, right after
the `i_size = i_size_read(inode)` update.

This can be reproduced easily by running a program like this on one
Ceph client:

  ioctl(fd, CEPH_IOC_SYNCIO);
  char buffer[16384];
  while (1) pread(fd, buffer, sizeof(buffer), 8192);
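
For completeness, a fuller version of the reproducer might look like
this; the file path is hypothetical, and the `CEPH_IOC_SYNCIO`
definition is assumed to match fs/ceph/ioctl.h (which is not installed
on most systems, so it is copied here):

  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <unistd.h>

  /* assumed to match the definitions in fs/ceph/ioctl.h */
  #define CEPH_IOCTL_MAGIC 0x97
  #define CEPH_IOC_SYNCIO _IO(CEPH_IOCTL_MAGIC, 5)

  int main(void)
  {
      int fd = open("/mnt/cephfs/foo", O_RDONLY);  /* hypothetical mount */
      if (fd < 0)
          return 1;

      /* switch this fd to the sync (uncached) read path */
      ioctl(fd, CEPH_IOC_SYNCIO);

      char buffer[16384];
      while (1)
          pread(fd, buffer, sizeof(buffer), 8192);
  }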

Then, on another server, truncate and rewrite the file until the first
server's kernel crashes (I never needed more than two attempts to
trigger the kernel crash):

  dd if=/dev/urandom of=foo bs=1k count=64

This is what the crash looks like (with KASAN and some debug logs from
`__ceph_sync_read` and `ceph_fill_file_size`):

 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: truncate_size 0 -> 0, encrypted 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: 8192~16384 got 16384 i_size 65536
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: result 16384 retry_op 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: truncate_size 0 -> 0, encrypted 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: truncate_size 0 -> 0, encrypted 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: size 65536 -> 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: truncate_seq 36656 -> 36657
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: truncate_size 0 -> 0, encrypted 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: truncate_size 0 -> 0, encrypted 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: on inode 0000000035059a6f 1000235edb7.fffffffffffffffe 2000~4000
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: orig 8192~16384 reading 8192~16384
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: truncate_size 0 -> 0, encrypted 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: 8192~16384 got 0 i_size 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: result 0 retry_op 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: on inode 0000000035059a6f 1000235edb7.fffffffffffffffe 2000~4000
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: orig 8192~16384 reading 8192~16384
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: 8192~16384 got 0 i_size 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: result 0 retry_op 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: on inode 0000000035059a6f 1000235edb7.fffffffffffffffe 2000~4000
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: orig 8192~16384 reading 8192~16384
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: truncate_size 0 -> 0, encrypted 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: 8192~16384 got 0 i_size 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: result 0 retry_op 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: on inode 0000000035059a6f 1000235edb7.fffffffffffffffe 2000~4000
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: orig 8192~16384 reading 8192~16384
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: 8192~16384 got 0 i_size 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: result 0 retry_op 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: on inode 0000000035059a6f 1000235edb7.fffffffffffffffe 2000~4000
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: orig 8192~16384 reading 8192~16384
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: 8192~16384 got 0 i_size 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: result 0 retry_op 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: on inode 0000000035059a6f 1000235edb7.fffffffffffffffe 2000~4000
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: orig 8192~16384 reading 8192~16384
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: 8192~16384 got 0 i_size 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: result 0 retry_op 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: on inode 0000000035059a6f 1000235edb7.fffffffffffffffe 2000~4000
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: orig 8192~16384 reading 8192~16384
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: 8192~16384 got 0 i_size 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: result 0 retry_op 0
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: on inode 0000000035059a6f 1000235edb7.fffffffffffffffe 2000~4000
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: orig 8192~16384 reading 8192~16384
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] __ceph_sync_read: 8192~16384 got 1024 i_size 0
 ==================================================================
 BUG: KASAN: slab-out-of-bounds in __ceph_sync_read+0x173f/0x1b10
 Read of size 8 at addr ffff8881d5dfbea0 by task pread/3276

 CPU: 3 UID: 2147488069 PID: 3276 Comm: pread Not tainted 6.11.10-cm4all1-hp+ #254
 Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 09/05/2019
 Call Trace:
  <TASK>
  dump_stack_lvl+0x62/0x90
  print_report+0xc4/0x5e0
  ? __virt_addr_valid+0x1e9/0x3a0
  ? __ceph_sync_read+0x173f/0x1b10
  kasan_report+0xb9/0xf0
  ? __ceph_sync_read+0x173f/0x1b10
  __ceph_sync_read+0x173f/0x1b10
  ? __pfx___ceph_sync_read+0x10/0x10
  ? lock_acquire+0x186/0x4d0
  ? ceph_read_iter+0xace/0x19f0
  ceph_read_iter+0xace/0x19f0
  ? lock_release+0x648/0xb50
  ? __pfx_ceph_read_iter+0x10/0x10
  ? __rseq_handle_notify_resume+0x8ed/0xd40
  ? __pfx___rseq_handle_notify_resume+0x10/0x10
  ? vfs_read+0x6e0/0xba0
  vfs_read+0x6e0/0xba0
  ? __pfx_vfs_read+0x10/0x10
  ? syscall_exit_to_user_mode+0x9a/0x190
  ? syscall_exit_to_user_mode+0x9a/0x190
  __x64_sys_pread64+0x19b/0x1f0
  ? __pfx___x64_sys_pread64+0x10/0x10
  ? __pfx___rseq_handle_notify_resume+0x10/0x10
  do_syscall_64+0x82/0x130
  ? lockdep_hardirqs_on_prepare+0x275/0x3e0
  ? syscall_exit_to_user_mode+0x9a/0x190
  ? do_syscall_64+0x8e/0x130
  ? do_syscall_64+0x8e/0x130
  ? lockdep_hardirqs_on_prepare+0x275/0x3e0
  ? syscall_exit_to_user_mode+0x9a/0x190
  ? do_syscall_64+0x8e/0x130
  ? do_syscall_64+0x8e/0x130
  ? syscall_exit_to_user_mode+0x9a/0x190
  ? do_syscall_64+0x8e/0x130
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f8449d18343
 Code: 48 8b 6c 24 48 e8 3d 00 f3 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 80 3d a1 42 0e 00 00 49 89 ca 74 14 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5d c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 10
 RSP: 002b:00007ffd7a2e8b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000011
 RAX: ffffffffffffffda RBX: 00007ffd7a2e8cc8 RCX: 00007f8449d18343
 RDX: 0000000000004000 RSI: 0000557f7917c2a0 RDI: 0000000000000003
 RBP: 00007ffd7a2e8bb0 R08: 0000557f7919d000 R09: 0000000000021001
 R10: 0000000000002000 R11: 0000000000000202 R12: 0000000000000000
 R13: 00007ffd7a2e8cf0 R14: 0000557f436c2dd8 R15: 00007f8449e43020
  </TASK>

 Allocated by task 3276:
  kasan_save_stack+0x1c/0x40
  kasan_save_track+0x10/0x30
  __kasan_kmalloc+0x8b/0x90
  __kmalloc_noprof+0x1bf/0x490
  ceph_alloc_page_vector+0x36/0x110
  __ceph_sync_read+0x769/0x1b10
  ceph_read_iter+0xace/0x19f0
  vfs_read+0x6e0/0xba0
  __x64_sys_pread64+0x19b/0x1f0
  do_syscall_64+0x82/0x130
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

 The buggy address belongs to the object at ffff8881d5dfbe80
  which belongs to the cache kmalloc-32 of size 32
 The buggy address is located 0 bytes to the right of
  allocated 32-byte region [ffff8881d5dfbe80, ffff8881d5dfbea0)

 The buggy address belongs to the physical page:
 page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d5dfb
 flags: 0x2fffc0000000000(node=0|zone=2|lastcpupid=0x3fff)
 page_type: 0xfdffffff(slab)
 raw: 02fffc0000000000 ffff888100042780 dead000000000122 0000000000000000
 raw: 0000000000000000 0000000080400040 00000001fdffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff8881d5dfbd80: fa fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8881d5dfbe00: fa fb fb fb fc fc fc fc fa fb fb fb fc fc fc fc
 >ffff8881d5dfbe80: 00 00 00 00 fc fc fc fc fa fb fb fb fc fc fc fc
                                ^
  ffff8881d5dfbf00: fc fc fc fc fc fc fc fc fa fb fb fb fc fc fc fc
  ffff8881d5dfbf80: fa fb fb fb fc fc fc fc fa fb fb fb fc fc fc fc
 ==================================================================
 Disabling lock debugging due to kernel taint
 Oops: general protection fault, probably for non-canonical address 0xe021fc6b8000019a: 0000 [#1] SMP KASAN PTI
 KASAN: maybe wild-memory-access in range [0x0110035c00000cd0-0x0110035c00000cd7]
 CPU: 3 UID: 2147488069 PID: 3276 Comm: pread Tainted: G    B              6.11.10-cm4all1-hp+ #254
 Tainted: [B]=BAD_PAGE
 Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 09/05/2019
 RIP: 0010:__ceph_sync_read+0xc33/0x1b10
 Code: 39 e7 4d 0f 47 fc 48 8d 0c c6 48 89 c8 48 c1 e8 03 42 80 3c 30 00 0f 85 0b 0b 00 00 48 8b 11 48 8d 7a 08 48 89 f8 48 c1 e8 03 <42> 80 3c 30 00 0f 85 0d 0b 00 00 48 8b 42 08 a8 01 0f 84 ee 04 00
 RSP: 0018:ffff8881ed6e78e0 EFLAGS: 00010207
 RAX: 0022006b8000019a RBX: 0000000000000000 RCX: ffff8881d5dfbea0
 RDX: 0110035c00000ccc RSI: 0000000000000008 RDI: 0110035c00000cd4
 RBP: ffff8881ed6e7a80 R08: 0000000000000001 R09: fffffbfff28b44ac
 R10: ffffffff945a2567 R11: 0000000000000001 R12: ffffffffffffa000
 R13: 0000000000000004 R14: dffffc0000000000 R15: 0000000000001000
 FS:  00007f8449c1f740(0000) GS:ffff88d2b5a00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fb72c6aecf0 CR3: 00000001ed7b6003 CR4: 00000000007706f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  ? die_addr+0x3c/0xa0
  ? exc_general_protection+0x113/0x200
  ? asm_exc_general_protection+0x22/0x30
  ? __ceph_sync_read+0xc33/0x1b10
  ? __pfx___ceph_sync_read+0x10/0x10
  ? lock_acquire+0x186/0x4d0
  ? ceph_read_iter+0xace/0x19f0
  ceph_read_iter+0xace/0x19f0
  ? lock_release+0x648/0xb50
  ? __pfx_ceph_read_iter+0x10/0x10
  ? __rseq_handle_notify_resume+0x8ed/0xd40
  ? __pfx___rseq_handle_notify_resume+0x10/0x10
  ? vfs_read+0x6e0/0xba0
  vfs_read+0x6e0/0xba0
  ? __pfx_vfs_read+0x10/0x10
  ? syscall_exit_to_user_mode+0x9a/0x190
  ? syscall_exit_to_user_mode+0x9a/0x190
  __x64_sys_pread64+0x19b/0x1f0
  ? __pfx___x64_sys_pread64+0x10/0x10
  ? __pfx___rseq_handle_notify_resume+0x10/0x10
  do_syscall_64+0x82/0x130
  ? lockdep_hardirqs_on_prepare+0x275/0x3e0
  ? syscall_exit_to_user_mode+0x9a/0x190
  ? do_syscall_64+0x8e/0x130
  ? do_syscall_64+0x8e/0x130
  ? lockdep_hardirqs_on_prepare+0x275/0x3e0
  ? syscall_exit_to_user_mode+0x9a/0x190
  ? do_syscall_64+0x8e/0x130
  ? do_syscall_64+0x8e/0x130
  ? syscall_exit_to_user_mode+0x9a/0x190
  ? do_syscall_64+0x8e/0x130
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f8449d18343
 Code: 48 8b 6c 24 48 e8 3d 00 f3 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66 90 80 3d a1 42 0e 00 00 49 89 ca 74 14 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5d c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 10
 RSP: 002b:00007ffd7a2e8b78 EFLAGS: 00000202 ORIG_RAX: 0000000000000011
 RAX: ffffffffffffffda RBX: 00007ffd7a2e8cc8 RCX: 00007f8449d18343
 RDX: 0000000000004000 RSI: 0000557f7917c2a0 RDI: 0000000000000003
 RBP: 00007ffd7a2e8bb0 R08: 0000557f7919d000 R09: 0000000000021001
 R10: 0000000000002000 R11: 0000000000000202 R12: 0000000000000000
 R13: 00007ffd7a2e8cf0 R14: 0000557f436c2dd8 R15: 00007f8449e43020
  </TASK>
 Modules linked in:
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:__ceph_sync_read+0xc33/0x1b10
 Code: 39 e7 4d 0f 47 fc 48 8d 0c c6 48 89 c8 48 c1 e8 03 42 80 3c 30 00 0f 85 0b 0b 00 00 48 8b 11 48 8d 7a 08 48 89 f8 48 c1 e8 03 <42> 80 3c 30 00 0f 85 0d 0b 00 00 48 8b 42 08 a8 01 0f 84 ee 04 00
 RSP: 0018:ffff8881ed6e78e0 EFLAGS: 00010207
 RAX: 0022006b8000019a RBX: 0000000000000000 RCX: ffff8881d5dfbea0
 RDX: 0110035c00000ccc RSI: 0000000000000008 RDI: 0110035c00000cd4
 RBP: ffff8881ed6e7a80 R08: 0000000000000001 R09: fffffbfff28b44ac
 R10: ffffffff945a2567 R11: 0000000000000001 R12: ffffffffffffa000
 R13: 0000000000000004 R14: dffffc0000000000 R15: 0000000000001000
 FS:  00007f8449c1f740(0000) GS:ffff88d2b5a00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fb72c6aecf0 CR3: 00000001ed7b6003 CR4: 00000000007706f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 workqueue: ceph_con_workfn hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: size 0 -> 65536
 ceph:  [8f7ec2f3-0dcb-468f-bd16-37e0a61bf195 4098067] ceph_fill_file_size: truncate_size 0 -> 0, encrypted 0

Fixes: 1065da21e5df ("ceph: stop copying to iter at EOF on sync reads")
Fixes: https://tracker.ceph.com/issues/67524
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
v2: public posting; added link to Ceph bug tracker (the vulnerability
had already been known for 3 months)
---
 fs/ceph/file.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Alex Markuze Nov. 27, 2024, 8:39 p.m. UTC | #1
There is a fix for this proposed by Luis.
It's being tested now.

On Wed, Nov 27, 2024 at 6:54 PM Max Kellermann <max.kellermann@ionos.com> wrote:
> If the inode size gets truncated by another task, __ceph_sync_read()
> may crash with a buffer overflow because it sets `left` to a huge
> value:
>
> [original patch description, crash report and diff quoted in full;
> snipped]
Max Kellermann Nov. 27, 2024, 8:43 p.m. UTC | #2
On Wed, Nov 27, 2024 at 9:40 PM Alex Markuze <amarkuze@redhat.com> wrote:
> There is a fix for this proposed by Luis.

On the private security mailing list, I wrote about it:
"This patch is incomplete because it only checks for i_size==0.
Truncation to zero is the most common case, but any situation where
offset is suddenly larger than the new size triggers this bug."

I think my patch is better.
Alex Markuze Nov. 27, 2024, 8:56 p.m. UTC | #3
You are correct; that is why I'm testing a patch that deals with all
cases where i_size < offset.
I will CC you on the other thread.

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4b8d59ebda00..19b084212fee 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1066,7 +1066,7 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
        if (ceph_inode_is_shutdown(inode))
                return -EIO;

-       if (!len)
+       if (!len || !i_size)
                return 0;
        /*
         * flush any page cache pages in this range.  this
@@ -1200,12 +1200,11 @@ ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
                }

                idx = 0;
-               if (ret <= 0)
-                       left = 0;
-               else if (off + ret > i_size)
-                       left = i_size - off;
+               if (off + ret > i_size)
+                       left = (i_size > off) ? i_size - off : 0;
                else
-                       left = ret;
+                       left = (ret > 0) ? ret : 0;
+
                while (left > 0) {
                        size_t plen, copied;



On Wed, Nov 27, 2024 at 10:43 PM Max Kellermann
<max.kellermann@ionos.com> wrote:
> [previous messages quoted in full; snipped]
Max Kellermann Nov. 27, 2024, 9:06 p.m. UTC | #4
On Wed, Nov 27, 2024 at 9:57 PM Alex Markuze <amarkuze@redhat.com> wrote:
> You are correct, that is why I'm testing a patch that deals with all
> cases where i_size < offset.

I don't like that patch because it looks complicated; it obscures the
problem and it runs a bunch of code (fscrypt, zero_page_vector) before
noticing the problem. My patch is simple and breaks the loop as soon
as the new size is known.

But I found a bug in my patch: I forgot to call
ceph_osdc_put_request(). And while looking at it, I found another
(old) leak bug. I'll post two new patches.

(I'm trying hard to suppress a rant about C, after fixing several
other Ceph leak bugs this week that caused server outages over here.)

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4b8d59ebda00..57d7cdda0f87 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1154,6 +1154,13 @@  ssize_t __ceph_sync_read(struct inode *inode, loff_t *ki_pos,
 		doutc(cl, "%llu~%llu got %zd i_size %llu%s\n", off, len,
 		      ret, i_size, (more ? " MORE" : ""));
 
+		if (off >= i_size)
+			/* meanwhile, the file has been truncated by
+			 * another task and the offset is no longer
+			 * valid; stop here
+			 */
+			break;
+
 		/* Fix it to go to end of extent map */
 		if (sparse && ret >= 0)
 			ret = ceph_sparse_ext_map_end(op);