diff mbox series

[v2,2/2] VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host()

Message ID 20240105164001.2129796-2-harshit.m.mogalapalli@oracle.com (mailing list archive)
State Mainlined
Commit 19b070fefd0d024af3daa7329cbc0d00de5302ec
Headers show
Series [v2,1/2] VMCI: Use struct_size() in kmalloc() | expand

Commit Message

Harshit Mogalapalli Jan. 5, 2024, 4:40 p.m. UTC
Syzkaller hit 'WARNING in dg_dispatch_as_host' bug.

memcpy: detected field-spanning write (size 56) of single field "&dg_info->msg"
at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)

WARNING: CPU: 0 PID: 1555 at drivers/misc/vmw_vmci/vmci_datagram.c:237
dg_dispatch_as_host+0x88e/0xa60 drivers/misc/vmw_vmci/vmci_datagram.c:237

Some code commentry, based on my understanding:

544 #define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payload_size)
/// This is 24 + payload_size

memcpy(&dg_info->msg, dg, dg_size);
	Destination = dg_info->msg ---> this is a 24 byte
					structure(struct vmci_datagram)
	Source = dg --> this is a 24 byte structure (struct vmci_datagram)
	Size = dg_size = 24 + payload_size

{payload_size = 56-24 =32} -- Syzkaller managed to set payload_size to 32.

 35 struct delayed_datagram_info {
 36         struct datagram_entry *entry;
 37         struct work_struct work;
 38         bool in_dg_host_queue;
 39         /* msg and msg_payload must be together. */
 40         struct vmci_datagram msg;
 41         u8 msg_payload[];
 42 };

So those extra bytes of payload are copied into msg_payload[], a run time
warning is seen while fuzzing with Syzkaller.

One possible way to fix the warning is to split the memcpy() into
two parts -- one -- direct assignment of msg and second taking care of payload.

Gustavo quoted:
"Under FORTIFY_SOURCE we should not copy data across multiple members
in a structure."

Reported-by: syzkaller <syzkaller@googlegroups.com>
Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
---
This patch is only tested with the C reproducer, not any testing
specific to driver is done.

v1->v2: ( Suggestions from Gustavo )
        1. Change the commit message false positive --> legitimate
           warning.
        2. Remove unneeded payload_size variable.
        3. Replace first memcpy() with direct assignment.
---
 drivers/misc/vmw_vmci/vmci_datagram.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Gustavo A. R. Silva Jan. 5, 2024, 5:11 p.m. UTC | #1
On 1/5/24 10:40, Harshit Mogalapalli wrote:
> Syzkaller hit 'WARNING in dg_dispatch_as_host' bug.
> 
> memcpy: detected field-spanning write (size 56) of single field "&dg_info->msg"
> at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)
> 
> WARNING: CPU: 0 PID: 1555 at drivers/misc/vmw_vmci/vmci_datagram.c:237
> dg_dispatch_as_host+0x88e/0xa60 drivers/misc/vmw_vmci/vmci_datagram.c:237
> 
> Some code commentry, based on my understanding:
> 
> 544 #define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payload_size)
> /// This is 24 + payload_size
> 
> memcpy(&dg_info->msg, dg, dg_size);
> 	Destination = dg_info->msg ---> this is a 24 byte
> 					structure(struct vmci_datagram)
> 	Source = dg --> this is a 24 byte structure (struct vmci_datagram)
> 	Size = dg_size = 24 + payload_size
> 
> {payload_size = 56-24 =32} -- Syzkaller managed to set payload_size to 32.
> 
>   35 struct delayed_datagram_info {
>   36         struct datagram_entry *entry;
>   37         struct work_struct work;
>   38         bool in_dg_host_queue;
>   39         /* msg and msg_payload must be together. */
>   40         struct vmci_datagram msg;
>   41         u8 msg_payload[];
>   42 };
> 
> So those extra bytes of payload are copied into msg_payload[], a run time
> warning is seen while fuzzing with Syzkaller.
> 
> One possible way to fix the warning is to split the memcpy() into
> two parts -- one -- direct assignment of msg and second taking care of payload.
> 
> Gustavo quoted:
> "Under FORTIFY_SOURCE we should not copy data across multiple members
> in a structure."
> 
> Reported-by: syzkaller <syzkaller@googlegroups.com>
> Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
> Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>

Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>

Thanks!
Dan Carpenter Jan. 8, 2024, 7:33 a.m. UTC | #2
On Fri, Jan 05, 2024 at 08:40:00AM -0800, Harshit Mogalapalli wrote:
> Syzkaller hit 'WARNING in dg_dispatch_as_host' bug.
> 
> memcpy: detected field-spanning write (size 56) of single field "&dg_info->msg"
> at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)
> 
> WARNING: CPU: 0 PID: 1555 at drivers/misc/vmw_vmci/vmci_datagram.c:237
> dg_dispatch_as_host+0x88e/0xa60 drivers/misc/vmw_vmci/vmci_datagram.c:237
> 
> Some code commentry, based on my understanding:
> 
> 544 #define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payload_size)
> /// This is 24 + payload_size
> 
> memcpy(&dg_info->msg, dg, dg_size);
> 	Destination = dg_info->msg ---> this is a 24 byte
> 					structure(struct vmci_datagram)
> 	Source = dg --> this is a 24 byte structure (struct vmci_datagram)
> 	Size = dg_size = 24 + payload_size
> 
> {payload_size = 56-24 =32} -- Syzkaller managed to set payload_size to 32.
> 
>  35 struct delayed_datagram_info {
>  36         struct datagram_entry *entry;
>  37         struct work_struct work;
>  38         bool in_dg_host_queue;
>  39         /* msg and msg_payload must be together. */
>  40         struct vmci_datagram msg;
>  41         u8 msg_payload[];
>  42 };
> 
> So those extra bytes of payload are copied into msg_payload[], a run time
> warning is seen while fuzzing with Syzkaller.
> 
> One possible way to fix the warning is to split the memcpy() into
> two parts -- one -- direct assignment of msg and second taking care of payload.
> 
> Gustavo quoted:
> "Under FORTIFY_SOURCE we should not copy data across multiple members
> in a structure."
> 
> Reported-by: syzkaller <syzkaller@googlegroups.com>
> Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
> Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
> ---
> This patch is only tested with the C reproducer, not any testing
> specific to driver is done.
> 
> v1->v2: ( Suggestions from Gustavo )
>         1. Change the commit message false positive --> legitimate
>            warning.

The commit message is fine.

Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>

But, I mean, it's not really "legitimate".  It meets the fortify source
heuristic, but it's still a false positive.  Fortify source is
*supposed* to find memory corruption bugs and this is not a memory
corruption bug.  It's just that these days we have to treat foritify
false positives as crashing bugs because people enable it and we have to
fix it.

Let's not pretend that fortify has helped us in this situation, it has
caused us a problem.  It has taken valid code and created a crashing
bug.  I'm not saying that the cost isn't worth it, but let's not pretend.

regards,
dan carpenter
Gustavo A. R. Silva Jan. 8, 2024, 5:03 p.m. UTC | #3
On 1/8/24 01:33, Dan Carpenter wrote:
> On Fri, Jan 05, 2024 at 08:40:00AM -0800, Harshit Mogalapalli wrote:
>> Syzkaller hit 'WARNING in dg_dispatch_as_host' bug.
>>
>> memcpy: detected field-spanning write (size 56) of single field "&dg_info->msg"
>> at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)
>>
>> WARNING: CPU: 0 PID: 1555 at drivers/misc/vmw_vmci/vmci_datagram.c:237
>> dg_dispatch_as_host+0x88e/0xa60 drivers/misc/vmw_vmci/vmci_datagram.c:237
>>
>> Some code commentry, based on my understanding:
>>
>> 544 #define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payload_size)
>> /// This is 24 + payload_size
>>
>> memcpy(&dg_info->msg, dg, dg_size);
>> 	Destination = dg_info->msg ---> this is a 24 byte
>> 					structure(struct vmci_datagram)
>> 	Source = dg --> this is a 24 byte structure (struct vmci_datagram)
>> 	Size = dg_size = 24 + payload_size
>>
>> {payload_size = 56-24 =32} -- Syzkaller managed to set payload_size to 32.
>>
>>   35 struct delayed_datagram_info {
>>   36         struct datagram_entry *entry;
>>   37         struct work_struct work;
>>   38         bool in_dg_host_queue;
>>   39         /* msg and msg_payload must be together. */
>>   40         struct vmci_datagram msg;
>>   41         u8 msg_payload[];
>>   42 };
>>
>> So those extra bytes of payload are copied into msg_payload[], a run time
>> warning is seen while fuzzing with Syzkaller.
>>
>> One possible way to fix the warning is to split the memcpy() into
>> two parts -- one -- direct assignment of msg and second taking care of payload.
>>
>> Gustavo quoted:
>> "Under FORTIFY_SOURCE we should not copy data across multiple members
>> in a structure."
>>
>> Reported-by: syzkaller <syzkaller@googlegroups.com>
>> Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
>> Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
>> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
>> ---
>> This patch is only tested with the C reproducer, not any testing
>> specific to driver is done.
>>
>> v1->v2: ( Suggestions from Gustavo )
>>          1. Change the commit message false positive --> legitimate
>>             warning.
> 
> The commit message is fine.
> 
> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
> 
> But, I mean, it's not really "legitimate".  It meets the fortify source
> heuristic, but it's still a false positive.  Fortify source is
> *supposed* to find memory corruption bugs and this is not a memory
> corruption bug.  It's just that these days we have to treat foritify
> false positives as crashing bugs because people enable it and we have to
> fix it.
> 
> Let's not pretend that fortify has helped us in this situation, it has
> caused us a problem.  It has taken valid code and created a crashing
> bug.  I'm not saying that the cost isn't worth it, but let's not pretend.
> 

It's a "legitimate warning" (which differs from a "legitimate memory
corruption bug") in the sense that the feature is doing what it's
supposed to do: reporting a write beyond the boundaries of a field/member
in a structure.

Is that simple.  I don't see the "pretense" here.

BTW, is this _warning_ really causing a crash?

Thanks
--
Gustavo
Harshit Mogalapalli Jan. 8, 2024, 5:31 p.m. UTC | #4
Hi Gustavo,

On 08/01/24 10:33 pm, Gustavo A. R. Silva wrote:
> 
> 
> On 1/8/24 01:33, Dan Carpenter wrote:
>> On Fri, Jan 05, 2024 at 08:40:00AM -0800, Harshit Mogalapalli wrote:
>>> Syzkaller hit 'WARNING in dg_dispatch_as_host' bug.
>>>
>>> memcpy: detected field-spanning write (size 56) of single field 
>>> "&dg_info->msg"
>>> at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)
>>>
>>> WARNING: CPU: 0 PID: 1555 at drivers/misc/vmw_vmci/vmci_datagram.c:237
>>> dg_dispatch_as_host+0x88e/0xa60 
>>> drivers/misc/vmw_vmci/vmci_datagram.c:237
>>>
>>> Some code commentry, based on my understanding:
>>>
>>> 544 #define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + 
>>> (size_t)(_dg)->payload_size)
>>> /// This is 24 + payload_size
>>>
>>> memcpy(&dg_info->msg, dg, dg_size);
>>>     Destination = dg_info->msg ---> this is a 24 byte
>>>                     structure(struct vmci_datagram)
>>>     Source = dg --> this is a 24 byte structure (struct vmci_datagram)
>>>     Size = dg_size = 24 + payload_size
>>>
>>> {payload_size = 56-24 =32} -- Syzkaller managed to set payload_size 
>>> to 32.
>>>
>>>   35 struct delayed_datagram_info {
>>>   36         struct datagram_entry *entry;
>>>   37         struct work_struct work;
>>>   38         bool in_dg_host_queue;
>>>   39         /* msg and msg_payload must be together. */
>>>   40         struct vmci_datagram msg;
>>>   41         u8 msg_payload[];
>>>   42 };
>>>
>>> So those extra bytes of payload are copied into msg_payload[], a run 
>>> time
>>> warning is seen while fuzzing with Syzkaller.
>>>
>>> One possible way to fix the warning is to split the memcpy() into
>>> two parts -- one -- direct assignment of msg and second taking care 
>>> of payload.
>>>
>>> Gustavo quoted:
>>> "Under FORTIFY_SOURCE we should not copy data across multiple members
>>> in a structure."
>>>
>>> Reported-by: syzkaller <syzkaller@googlegroups.com>
>>> Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
>>> Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
>>> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
>>> ---
>>> This patch is only tested with the C reproducer, not any testing
>>> specific to driver is done.
>>>
>>> v1->v2: ( Suggestions from Gustavo )
>>>          1. Change the commit message false positive --> legitimate
>>>             warning.
>>
>> The commit message is fine.
>>
>> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
>>
>> But, I mean, it's not really "legitimate".  It meets the fortify source
>> heuristic, but it's still a false positive.  Fortify source is
>> *supposed* to find memory corruption bugs and this is not a memory
>> corruption bug.  It's just that these days we have to treat foritify
>> false positives as crashing bugs because people enable it and we have to
>> fix it.
>>
>> Let's not pretend that fortify has helped us in this situation, it has
>> caused us a problem.  It has taken valid code and created a crashing
>> bug.  I'm not saying that the cost isn't worth it, but let's not pretend.
>>
> 
> It's a "legitimate warning" (which differs from a "legitimate memory
> corruption bug") in the sense that the feature is doing what it's
> supposed to do: reporting a write beyond the boundaries of a field/member
> in a structure.
> 
> Is that simple.  I don't see the "pretense" here.
> 
> BTW, is this _warning_ really causing a crash?
> 

I think we can say this can cause a crash in a way, WARN_ONCE() is still 
a WARNING and we can have systems with panic_on_warn set.

Eg: on how a crash would look like when panic_on_warn is set.

[  173.803078] ------------[ cut here ]------------
[  173.806961] memcpy: detected field-spanning write (size 56) of single 
field "&dg_info->msg" at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)
[  173.817612] WARNING: CPU: 4 PID: 9003 at 
drivers/misc/vmw_vmci/vmci_datagram.c:237 dg_dispatch_as_host+0x88e/0xa60
[  173.826031] Modules linked in:
[  173.828765] CPU: 4 PID: 9003 Comm: r Not tainted 6.7.0-rc7 #6
[  173.833502] Hardware name: Red Hat KVM, BIOS 1.16.1-1.el9 04/01/2014
[  173.838689] RIP: 0010:dg_dispatch_as_host+0x88e/0xa60
[  173.842953] Code: fe ff ff e8 a4 61 70 fa b9 18 00 00 00 48 89 de 48 
c7 c2 e0 c8 20 92 48 c7 c7 60 c9 20 92 c6 05 e3 62 22 12 01 e8 92 c2 38 
fa <0f> 0b e9 82 fe ff ff e8 76 61 70 fa e8 71 61 70 fa 48 8d 7d 0c 48
[  173.857632] RSP: 0018:ffff88810362fb10 EFLAGS: 00010246
[  173.862078] RAX: 0000000000000000 RBX: 0000000000000038 RCX: 
0000000000000000
[  173.867885] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000
[  173.873689] RBP: ffff888118f94680 R08: 0000000000000000 R09: 
0000000000000000
[  173.879503] R10: 0000000000000000 R11: 0000000000000000 R12: 
ffff888118f71130
[  173.885317] R13: ffff888118f71100 R14: 0000000000000000 R15: 
0000000000000000
[  173.891124] FS:  00007fced2868740(0000) GS:ffff8881f4200000(0000) 
knlGS:0000000000000000
[  173.897658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  173.902397] CR2: 0000000020000000 CR3: 00000001198d2000 CR4: 
00000000000006f0
[  173.908222] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[  173.914073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[  173.919901] Call Trace:
[  173.922117]  <TASK>
[  173.924075]  ? show_regs+0x9b/0xb0
[  173.927151]  ? __warn+0xeb/0x2c0
[  173.929981]  ? dg_dispatch_as_host+0x88e/0xa60
[  173.933769]  ? report_bug+0x313/0x410
[  173.937010]  ? dg_dispatch_as_host+0x88e/0xa60
[  173.940804]  ? handle_bug+0x9d/0x130
[  173.943922]  ? exc_invalid_op+0x36/0x80
[  173.947195]  ? asm_exc_invalid_op+0x1a/0x20
[  173.950768]  ? dg_dispatch_as_host+0x88e/0xa60
[  173.954524]  vmci_datagram_dispatch+0x1da/0x230
[  173.958368]  ? __pfx_vmci_datagram_dispatch+0x10/0x10
[  173.962583]  vmci_datagram_send+0x2d/0x50
[  173.966006]  vmci_transport_dgram_enqueue+0x2e2/0x410
[  173.970228]  ? __pfx_vmci_transport_dgram_allow+0x10/0x10
[  173.974689]  vsock_dgram_sendmsg+0x391/0x610
[  173.978336]  ? __pfx_vsock_dgram_sendmsg+0x10/0x10
[  173.982568]  __sys_sendto+0x4dc/0x540
[  173.985767]  ? __pfx___sys_sendto+0x10/0x10
[  173.989289]  ? __pfx_vsock_dgram_connect+0x10/0x10
[  173.993312]  ? selinux_netlbl_socket_connect+0x37/0x50
[  173.997588]  ? selinux_socket_connect+0x76/0xa0
[  174.001449]  ? __sys_connect_file+0x5d/0x1b0
[  174.005078]  ? __sys_connect+0x113/0x1b0
[  174.008420]  ? __pfx___sys_connect+0x10/0x10
[  174.012031]  ? count_memcg_events.constprop.0+0x48/0x60
[  174.016346]  ? handle_mm_fault+0x2c8/0x910
[  174.019797]  ? ktime_get_coarse_real_ts64+0xa0/0xf0
[  174.023862]  ? __audit_syscall_entry+0x393/0x4f0
[  174.027769]  __x64_sys_sendto+0xe9/0x1d0
[  174.031090]  ? syscall_trace_enter.constprop.0+0x138/0x1b0
[  174.035604]  do_syscall_64+0x45/0x100
[  174.038765]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  174.042953] RIP: 0033:0x7fced220eca3
[  174.046040] Code: 48 8b 0d 08 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 
66 0f 1f 44 00 00 83 3d 49 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 
05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24
[  174.060689] RSP: 002b:00007ffe5f77d2e8 EFLAGS: 00000246 ORIG_RAX: 
000000000000002c
[  174.066898] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 
00007fced220eca3
[  174.072766] RDX: 0000000000000020 RSI: 00007ffe5f77d400 RDI: 
0000000000000004
[  174.078573] RBP: 00007ffe5f77d330 R08: 00007ffe5f77d310 R09: 
000000000000000c
[  174.084376] R10: 0000000000000000 R11: 0000000000000246 R12: 
0000000000400730
[  174.090185] R13: 00007ffe5f77e520 R14: 0000000000000000 R15: 
0000000000000000
[  174.096014]  </TASK>
[  174.098021] Kernel panic - not syncing: kernel: panic_on_warn set ...
[  174.101976] CPU: 4 PID: 9003 Comm: r Not tainted 6.7.0-rc7 #6
[  174.107921] Hardware name: Red Hat KVM, BIOS 1.16.1-1.el9 04/01/2014
[  174.112923] Call Trace:
[  174.113973]  <TASK>
[  174.115922]  dump_stack_lvl+0x83/0xb0
[  174.117975]  panic+0x697/0x720
[  174.121972]  ? show_trace_log_lvl+0x3bb/0x520
[  174.125942]  ? __pfx_panic+0x10/0x10
[  174.128925]  ? dg_dispatch_as_host+0x88e/0xa60
[  174.131924]  check_panic_on_warn+0xb6/0xc0
[  174.133970]  __warn+0xf7/0x2c0
[  174.137970]  ? dg_dispatch_as_host+0x88e/0xa60
[  174.141970]  report_bug+0x313/0x410
[  174.144924]  ? dg_dispatch_as_host+0x88e/0xa60
[  174.148898]  handle_bug+0x9d/0x130
[  174.149968]  exc_invalid_op+0x36/0x80
[  174.153969]  asm_exc_invalid_op+0x1a/0x20
[  174.157970] RIP: 0010:dg_dispatch_as_host+0x88e/0xa60
[  174.161970] Code: fe ff ff e8 a4 61 70 fa b9 18 00 00 00 48 89 de 48 
c7 c2 e0 c8 20 92 48 c7 c7 60 c9 20 92 c6 05 e3 62 22 12 01 e8 92 c2 38 
fa <0f> 0b e9 82 fe ff ff e8 76 61 70 fa e8 71 61 70 fa 48 8d 7d 0c 48
[  174.176923] RSP: 0018:ffff88810362fb10 EFLAGS: 00010246
[  174.179922] RAX: 0000000000000000 RBX: 0000000000000038 RCX: 
0000000000000000
[  174.185967] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000
[  174.189966] RBP: ffff888118f94680 R08: 0000000000000000 R09: 
0000000000000000
[  174.197968] R10: 0000000000000000 R11: 0000000000000000 R12: 
ffff888118f71130
[  174.201969] R13: ffff888118f71100 R14: 0000000000000000 R15: 
0000000000000000
[  174.208924]  vmci_datagram_dispatch+0x1da/0x230
[  174.212884]  ? __pfx_vmci_datagram_dispatch+0x10/0x10
[  174.216923]  vmci_datagram_send+0x2d/0x50
[  174.219924]  vmci_transport_dgram_enqueue+0x2e2/0x410
[  174.221966]  ? __pfx_vmci_transport_dgram_allow+0x10/0x10
[  174.227939]  vsock_dgram_sendmsg+0x391/0x610
[  174.230883]  ? __pfx_vsock_dgram_sendmsg+0x10/0x10
[  174.236021]  __sys_sendto+0x4dc/0x540
[  174.237970]  ? __pfx___sys_sendto+0x10/0x10
[  174.241968]  ? __pfx_vsock_dgram_connect+0x10/0x10
[  174.245969]  ? selinux_netlbl_socket_connect+0x37/0x50
[  174.249969]  ? selinux_socket_connect+0x76/0xa0
[  174.253968]  ? __sys_connect_file+0x5d/0x1b0
[  174.257966]  ? __sys_connect+0x113/0x1b0
[  174.259923]  ? __pfx___sys_connect+0x10/0x10
[  174.264922]  ? count_memcg_events.constprop.0+0x48/0x60
[  174.267947]  ? handle_mm_fault+0x2c8/0x910
[  174.269966]  ? ktime_get_coarse_real_ts64+0xa0/0xf0
[  174.275924]  ? __audit_syscall_entry+0x393/0x4f0
[  174.277970]  __x64_sys_sendto+0xe9/0x1d0
[  174.281972]  ? syscall_trace_enter.constprop.0+0x138/0x1b0
[  174.285970]  do_syscall_64+0x45/0x100
[  174.289966]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  174.293967] RIP: 0033:0x7fced220eca3
[  174.297970] Code: 48 8b 0d 08 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 
66 0f 1f 44 00 00 83 3d 49 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 
05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24
[  174.311883] RSP: 002b:00007ffe5f77d2e8 EFLAGS: 00000246 ORIG_RAX: 
000000000000002c
[  174.317968] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 
00007fced220eca3
[  174.323930] RDX: 0000000000000020 RSI: 00007ffe5f77d400 RDI: 
0000000000000004
[  174.329968] RBP: 00007ffe5f77d330 R08: 00007ffe5f77d310 R09: 
000000000000000c
[  174.333967] R10: 0000000000000000 R11: 0000000000000246 R12: 
0000000000400730
[  174.340923] R13: 00007ffe5f77e520 R14: 0000000000000000 R15: 
0000000000000000
[  174.345969]  </TASK>
[  174.348922] Dumping ftrace buffer:
[  174.348922]    (ftrace buffer empty)
[  174.348922] Kernel Offset: disabled
[  174.348922] Rebooting in 86400 seconds..


Thanks,
Harshit


> Thanks
> -- 
> Gustavo
>
Gustavo A. R. Silva Jan. 8, 2024, 5:38 p.m. UTC | #5
On 1/8/24 11:31, Harshit Mogalapalli wrote:

>> BTW, is this _warning_ really causing a crash?
>>
> 
> I think we can say this can cause a crash in a way, WARN_ONCE() is still a WARNING and we can have systems with panic_on_warn set.
> 

Oh yes! thanks for the clarification. :)

--
Gustavo
Dan Carpenter Jan. 8, 2024, 6:36 p.m. UTC | #6
On Mon, Jan 08, 2024 at 11:03:51AM -0600, Gustavo A. R. Silva wrote:
> 
> 
> On 1/8/24 01:33, Dan Carpenter wrote:
> > On Fri, Jan 05, 2024 at 08:40:00AM -0800, Harshit Mogalapalli wrote:
> > > Syzkaller hit 'WARNING in dg_dispatch_as_host' bug.
> > > 
> > > memcpy: detected field-spanning write (size 56) of single field "&dg_info->msg"
> > > at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)
> > > 
> > > WARNING: CPU: 0 PID: 1555 at drivers/misc/vmw_vmci/vmci_datagram.c:237
> > > dg_dispatch_as_host+0x88e/0xa60 drivers/misc/vmw_vmci/vmci_datagram.c:237
> > > 
> > > Some code commentry, based on my understanding:
> > > 
> > > 544 #define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payload_size)
> > > /// This is 24 + payload_size
> > > 
> > > memcpy(&dg_info->msg, dg, dg_size);
> > > 	Destination = dg_info->msg ---> this is a 24 byte
> > > 					structure(struct vmci_datagram)
> > > 	Source = dg --> this is a 24 byte structure (struct vmci_datagram)
> > > 	Size = dg_size = 24 + payload_size
> > > 
> > > {payload_size = 56-24 =32} -- Syzkaller managed to set payload_size to 32.
> > > 
> > >   35 struct delayed_datagram_info {
> > >   36         struct datagram_entry *entry;
> > >   37         struct work_struct work;
> > >   38         bool in_dg_host_queue;
> > >   39         /* msg and msg_payload must be together. */
> > >   40         struct vmci_datagram msg;
> > >   41         u8 msg_payload[];
> > >   42 };
> > > 
> > > So those extra bytes of payload are copied into msg_payload[], a run time
> > > warning is seen while fuzzing with Syzkaller.
> > > 
> > > One possible way to fix the warning is to split the memcpy() into
> > > two parts -- one -- direct assignment of msg and second taking care of payload.
> > > 
> > > Gustavo quoted:
> > > "Under FORTIFY_SOURCE we should not copy data across multiple members
> > > in a structure."
> > > 
> > > Reported-by: syzkaller <syzkaller@googlegroups.com>
> > > Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
> > > Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> > > Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
> > > ---
> > > This patch is only tested with the C reproducer, not any testing
> > > specific to driver is done.
> > > 
> > > v1->v2: ( Suggestions from Gustavo )
> > >          1. Change the commit message false positive --> legitimate
> > >             warning.
> > 
> > The commit message is fine.
> > 
> > Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
> > 
> > But, I mean, it's not really "legitimate".  It meets the fortify source
> > heuristic, but it's still a false positive.  Fortify source is
> > *supposed* to find memory corruption bugs and this is not a memory
> > corruption bug.  It's just that these days we have to treat foritify
> > false positives as crashing bugs because people enable it and we have to
> > fix it.
> > 
> > Let's not pretend that fortify has helped us in this situation, it has
> > caused us a problem.  It has taken valid code and created a crashing
> > bug.  I'm not saying that the cost isn't worth it, but let's not pretend.
> > 
> 
> It's a "legitimate warning" (which differs from a "legitimate memory
> corruption bug") in the sense that the feature is doing what it's
> supposed to do: reporting a write beyond the boundaries of a field/member
> in a structure.
> 
> Is that simple.  I don't see the "pretense" here.
> 
> BTW, is this _warning_ really causing a crash?

I don't know how many people have Reboot on Warn enabled but I've heard
it's a shocking high number of people.

My problem with "legitimate" is that it's a biased word which imples
"good".  A more neutral way to describe it would be "acceptable" or
"matches the heuristic".

Generally, we get a lot of patches which are to make a tool happy and
sometimes like here it's probably an acceptable cost.  But I think
other times people lose sight of what it's all about and confuse good
and bad.  In some kind of very literal and narrow sense this warning is
bad.  It takes perfectly okay code and turns it into a crashing bug.  In
the larger sense and long term view then, sure, the heuristic is useful,
but right here, in this situation, it's bad.

regards,
dan carpenter
Gustavo A. R. Silva Jan. 8, 2024, 7:21 p.m. UTC | #7
On 1/8/24 12:36, Dan Carpenter wrote:
> On Mon, Jan 08, 2024 at 11:03:51AM -0600, Gustavo A. R. Silva wrote:
>>
>>
>> On 1/8/24 01:33, Dan Carpenter wrote:
>>> On Fri, Jan 05, 2024 at 08:40:00AM -0800, Harshit Mogalapalli wrote:
>>>> Syzkaller hit 'WARNING in dg_dispatch_as_host' bug.
>>>>
>>>> memcpy: detected field-spanning write (size 56) of single field "&dg_info->msg"
>>>> at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)
>>>>
>>>> WARNING: CPU: 0 PID: 1555 at drivers/misc/vmw_vmci/vmci_datagram.c:237
>>>> dg_dispatch_as_host+0x88e/0xa60 drivers/misc/vmw_vmci/vmci_datagram.c:237
>>>>
>>>> Some code commentry, based on my understanding:
>>>>
>>>> 544 #define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payload_size)
>>>> /// This is 24 + payload_size
>>>>
>>>> memcpy(&dg_info->msg, dg, dg_size);
>>>> 	Destination = dg_info->msg ---> this is a 24 byte
>>>> 					structure(struct vmci_datagram)
>>>> 	Source = dg --> this is a 24 byte structure (struct vmci_datagram)
>>>> 	Size = dg_size = 24 + payload_size
>>>>
>>>> {payload_size = 56-24 =32} -- Syzkaller managed to set payload_size to 32.
>>>>
>>>>    35 struct delayed_datagram_info {
>>>>    36         struct datagram_entry *entry;
>>>>    37         struct work_struct work;
>>>>    38         bool in_dg_host_queue;
>>>>    39         /* msg and msg_payload must be together. */
>>>>    40         struct vmci_datagram msg;
>>>>    41         u8 msg_payload[];
>>>>    42 };
>>>>
>>>> So those extra bytes of payload are copied into msg_payload[], a run time
>>>> warning is seen while fuzzing with Syzkaller.
>>>>
>>>> One possible way to fix the warning is to split the memcpy() into
>>>> two parts -- one -- direct assignment of msg and second taking care of payload.
>>>>
>>>> Gustavo quoted:
>>>> "Under FORTIFY_SOURCE we should not copy data across multiple members
>>>> in a structure."
>>>>
>>>> Reported-by: syzkaller <syzkaller@googlegroups.com>
>>>> Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
>>>> Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
>>>> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
>>>> ---
>>>> This patch is only tested with the C reproducer, not any testing
>>>> specific to driver is done.
>>>>
>>>> v1->v2: ( Suggestions from Gustavo )
>>>>           1. Change the commit message false positive --> legitimate
>>>>              warning.
>>>
>>> The commit message is fine.
>>>
>>> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
>>>
>>> But, I mean, it's not really "legitimate".  It meets the fortify source
>>> heuristic, but it's still a false positive.  Fortify source is
>>> *supposed* to find memory corruption bugs and this is not a memory
>>> corruption bug.  It's just that these days we have to treat foritify
>>> false positives as crashing bugs because people enable it and we have to
>>> fix it.
>>>
>>> Let's not pretend that fortify has helped us in this situation, it has
>>> caused us a problem.  It has taken valid code and created a crashing
>>> bug.  I'm not saying that the cost isn't worth it, but let's not pretend.
>>>
>>
>> It's a "legitimate warning" (which differs from a "legitimate memory
>> corruption bug") in the sense that the feature is doing what it's
>> supposed to do: reporting a write beyond the boundaries of a field/member
>> in a structure.
>>
>> Is that simple.  I don't see the "pretense" here.
>>
>> BTW, is this _warning_ really causing a crash?
> 
> I don't know how many people have Reboot on Warn enabled but I've heard
> it's a shocking high number of people.
> 
> My problem with "legitimate" is that it's a biased word which imples
> "good".  A more neutral way to describe it would be "acceptable" or
> "matches the heuristic".
> 
> Generally, we get a lot of patches which are to make a tool happy and
> sometimes like here it's probably an acceptable cost.  But I think
> other times people lose sight of what it's all about and confuse good
> and bad.  In some kind of very literal and narrow sense this warning is
> bad.  It takes perfectly okay code and turns it into a crashing bug.  In
> the larger sense and long term view then, sure, the heuristic is useful,
> but right here, in this situation, it's bad.


This is right on point:

"In some kind of very literal and narrow sense this warning is bad."

Let's say the vast majority of people is of this opinion. Thus, they
engage in never-ending discussions, and end up disregarding this sort
of warning, deciding to ignore it completely. Then, a lot more of these
warnings go unfixed. Then, a couple of actual memory corruption bugs
are introduced. Then, people don't notice them. Then, the hardening
feature ends up becoming useless.

Why insist on disregarding something that is clearly beneficial to
acknowledge and worth correcting right on the spot?

This is real work that must be done if we want the feature to help us
detect bugs and potential vulnerabilities.

Thanks
--
Gustavo
Kees Cook Jan. 8, 2024, 10:37 p.m. UTC | #8
On Fri, Jan 05, 2024 at 08:40:00AM -0800, Harshit Mogalapalli wrote:
> Syzkaller hit 'WARNING in dg_dispatch_as_host' bug.
> 
> memcpy: detected field-spanning write (size 56) of single field "&dg_info->msg"
> at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)
> 
> WARNING: CPU: 0 PID: 1555 at drivers/misc/vmw_vmci/vmci_datagram.c:237
> dg_dispatch_as_host+0x88e/0xa60 drivers/misc/vmw_vmci/vmci_datagram.c:237
> 
> Some code commentry, based on my understanding:
> 
> 544 #define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payload_size)
> /// This is 24 + payload_size
> 
> memcpy(&dg_info->msg, dg, dg_size);
> 	Destination = dg_info->msg ---> this is a 24 byte
> 					structure(struct vmci_datagram)
> 	Source = dg --> this is a 24 byte structure (struct vmci_datagram)
> 	Size = dg_size = 24 + payload_size
> 
> {payload_size = 56-24 =32} -- Syzkaller managed to set payload_size to 32.
> 
>  35 struct delayed_datagram_info {
>  36         struct datagram_entry *entry;
>  37         struct work_struct work;
>  38         bool in_dg_host_queue;
>  39         /* msg and msg_payload must be together. */
>  40         struct vmci_datagram msg;
>  41         u8 msg_payload[];
>  42 };
> 
> So those extra bytes of payload are copied into msg_payload[], a run time
> warning is seen while fuzzing with Syzkaller.
> 
> One possible way to fix the warning is to split the memcpy() into
> two parts -- one -- direct assignment of msg and second taking care of payload.
> 
> Gustavo quoted:
> "Under FORTIFY_SOURCE we should not copy data across multiple members
> in a structure."
> 
> Reported-by: syzkaller <syzkaller@googlegroups.com>
> Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
> Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>

Thanks for getting this fixed!

Yeah, it's a "false positive" in the sense that the code was expecting
to write into msg_payload. The warning is triggered because of the write
across the flex array boundary, which trips a bug in GCC and Clang,
which we're forced to work around.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832 (fixed in GCC 14+)
https://github.com/llvm/llvm-project/issues/72032 (not yet fixed in Clang)

Reviewed-by: Kees Cook <keescook@chromium.org>
Gustavo A. R. Silva Jan. 9, 2024, 2:05 a.m. UTC | #9
On 1/8/24 16:37, Kees Cook wrote:
> On Fri, Jan 05, 2024 at 08:40:00AM -0800, Harshit Mogalapalli wrote:
>> Syzkaller hit 'WARNING in dg_dispatch_as_host' bug.
>>
>> memcpy: detected field-spanning write (size 56) of single field "&dg_info->msg"
>> at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)
>>
>> WARNING: CPU: 0 PID: 1555 at drivers/misc/vmw_vmci/vmci_datagram.c:237
>> dg_dispatch_as_host+0x88e/0xa60 drivers/misc/vmw_vmci/vmci_datagram.c:237
>>
>> Some code commentry, based on my understanding:
>>
>> 544 #define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payload_size)
>> /// This is 24 + payload_size
>>
>> memcpy(&dg_info->msg, dg, dg_size);
>> 	Destination = dg_info->msg ---> this is a 24 byte
>> 					structure(struct vmci_datagram)
>> 	Source = dg --> this is a 24 byte structure (struct vmci_datagram)
>> 	Size = dg_size = 24 + payload_size
>>
>> {payload_size = 56-24 =32} -- Syzkaller managed to set payload_size to 32.
>>
>>   35 struct delayed_datagram_info {
>>   36         struct datagram_entry *entry;
>>   37         struct work_struct work;
>>   38         bool in_dg_host_queue;
>>   39         /* msg and msg_payload must be together. */
>>   40         struct vmci_datagram msg;
>>   41         u8 msg_payload[];
>>   42 };
>>
>> So those extra bytes of payload are copied into msg_payload[], a run time
>> warning is seen while fuzzing with Syzkaller.
>>
>> One possible way to fix the warning is to split the memcpy() into
>> two parts -- one -- direct assignment of msg and second taking care of payload.
>>
>> Gustavo quoted:
>> "Under FORTIFY_SOURCE we should not copy data across multiple members
>> in a structure."
>>
>> Reported-by: syzkaller <syzkaller@googlegroups.com>
>> Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
>> Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
>> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
> 
> Thanks for getting this fixed!
> 
> Yeah, it's a "false positive" in the sense that the code was expecting

It's a false positive _bug_, and a legitimate _warning_ coming from fortified
memcpy().

> to write into msg_payload. The warning is triggered because of the write
> across the flex array boundary, which trips a bug in GCC and Clang,
> which we're forced to work around.

The warning is triggered because of a write beyond the boundaries of
`dg_info->msg`. It's not directly related to the fact that there is a
flexible-array member following `dg_info->msg`.

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832 (fixed in GCC 14+)
> 	 (not yet fixed in Clang)

This issue is not related to the compiler bugs mentioned above.

> 
> Reviewed-by: Kees Cook <keescook@chromium.org>
> 

Thanks!
--
Gustavo
Dan Carpenter Jan. 9, 2024, 9:07 a.m. UTC | #10
On Mon, Jan 08, 2024 at 08:05:38PM -0600, Gustavo A. R. Silva wrote:
> > > Gustavo quoted:
> > > "Under FORTIFY_SOURCE we should not copy data across multiple members
> > > in a structure."
> > > 
> > > Reported-by: syzkaller <syzkaller@googlegroups.com>
> > > Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
> > > Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> > > Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
> > 
> > Thanks for getting this fixed!
> > 
> > Yeah, it's a "false positive" in the sense that the code was expecting
> 
> It's a false positive _bug_, and a legitimate _warning_ coming from fortified
> memcpy().

It really feels like you're trying to sell the cost of this as a good
thing...  We've already merged fortify so why are you still fighting
about this?  Now that it's merged, let's just all admit that false
positives are bad.

I feel like once we recognize that actually false positives are bad as
opposed to good then that's when we start looking for solutions.  In
Smatch, I have code that silences warnings about cross member writes
because it was a common source of false positives...

The Kconfig entry does not mention the risk of false positives at all
and it doesn't say anything about turning on fortify along with
CONFIG_PANIC_ON_OOPS is probably bad.  There are simple things we could
do to make this less risky.

regards,
dan carpenter
Gustavo A. R. Silva Jan. 9, 2024, 12:31 p.m. UTC | #11
On 1/9/24 03:07, Dan Carpenter wrote:
> On Mon, Jan 08, 2024 at 08:05:38PM -0600, Gustavo A. R. Silva wrote:
>>>> Gustavo quoted:
>>>> "Under FORTIFY_SOURCE we should not copy data across multiple members
>>>> in a structure."
>>>>
>>>> Reported-by: syzkaller <syzkaller@googlegroups.com>
>>>> Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
>>>> Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
>>>> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
>>>
>>> Thanks for getting this fixed!
>>>
>>> Yeah, it's a "false positive" in the sense that the code was expecting
>>
>> It's a false positive _bug_, and a legitimate _warning_ coming from fortified
>> memcpy().
> 
> It really feels like you're trying to sell the cost of this as a good
> thing...  We've already merged fortify so why are you still fighting

No, I'm just describing (here[1] and below), clear and concise, what fortify
is doing in this particular case, in response to your first intervention
in this thread[3].

"The warning is triggered because of a write beyond the boundaries of
`dg_info->msg`."[2]

You're arguing that fortify caused a problem. I'm describing the reason
why the feature triggered the warning. That's it, I guess.

Thanks
--
Gustavo

[1] https://lore.kernel.org/linux-hardening/9c742547-0021-464b-b7a8-7af46b0a4afa@embeddedor.com/
[2] https://lore.kernel.org/linux-hardening/7826922a-d642-424e-bede-bfc45be9254d@embeddedor.com/
[3] https://lore.kernel.org/linux-hardening/fc132bde-d42d-4aac-ba91-7a939a18091a@moroto.mountain/
Dan Carpenter Jan. 9, 2024, 1:22 p.m. UTC | #12
On Tue, Jan 09, 2024 at 06:31:41AM -0600, Gustavo A. R. Silva wrote:
> 
> You're arguing that fortify caused a problem.

Yes.

Before: Code working correctly
After: Kernel Panic

At first, I started to question if I was going mad, but then I looked
through the email thread and Harshit tested it and proved that the
kernel does actually panic depending on the .config.

I mean realistically we should backport this patch to old kernels,
right?  And if we had to assign a Fixes tag to this it would need to be
the commit which adds Fortify to the kernel.  Prior to that commit the
code was fine.

Again, I'm not saying that Fortify is bad overall.  Probably in DnD it
would be Chaotic Good where it's overall good but sometimes a pain.

regards,
dan carpenter
Gustavo A. R. Silva Jan. 9, 2024, 2:35 p.m. UTC | #13
On 1/9/24 07:22, Dan Carpenter wrote:
> On Tue, Jan 09, 2024 at 06:31:41AM -0600, Gustavo A. R. Silva wrote:
>>
>> You're arguing that fortify caused a problem.
> 
> Yes.
> 
> Before: Code working correctly
> After: Kernel Panic
> 
> At first, I started to question if I was going mad, but then I looked
> through the email thread and Harshit tested it and proved that the
> kernel does actually panic depending on the .config.
> 
> I mean realistically we should backport this patch to old kernels,
> right?  And if we had to assign a Fixes tag to this it would need to be
> the commit which adds Fortify to the kernel.  Prior to that commit the
> code was fine.
> 
> Again, I'm not saying that Fortify is bad overall.  Probably in DnD it
> would be Chaotic Good where it's overall good but sometimes a pain.
> 

You know what, I think the discrepancy here lies in the fact that, before,
fortify didn't complain about writes across multiple members in the same
struct, as long as we remained between the boundaries of the struct. This
was done via __builtin_object_size(p, 0).

In recent times, that has changed, and now fortify has been tightened to warn
about writes beyond the boundaries of a single object. This is now done via
__builtin_dynamic_object_size(p, 1)/__builtin_object_size(p, 1).

And, if this patch is to be backported, _technically_, the Fixes tag should
probably be assigned to that change from __bos(p, 0) to __bos(p, 1):

commit f68f2ff91512 ("fortify: Detect struct member overflows in memcpy() at compile-time")

In any case, fortify should now emit this sort of warning, and that adds
an extra layer of hardening to the kernel.

Thanks
--
Gustavo
Kees Cook Jan. 11, 2024, 12:03 a.m. UTC | #14
On Mon, Jan 08, 2024 at 08:05:38PM -0600, Gustavo A. R. Silva wrote:
> On 1/8/24 16:37, Kees Cook wrote:
> > On Fri, Jan 05, 2024 at 08:40:00AM -0800, Harshit Mogalapalli wrote:
> > > Syzkaller hit 'WARNING in dg_dispatch_as_host' bug.
> > > 
> > > memcpy: detected field-spanning write (size 56) of single field "&dg_info->msg"
> > > at drivers/misc/vmw_vmci/vmci_datagram.c:237 (size 24)
> > > 
> > > WARNING: CPU: 0 PID: 1555 at drivers/misc/vmw_vmci/vmci_datagram.c:237
> > > dg_dispatch_as_host+0x88e/0xa60 drivers/misc/vmw_vmci/vmci_datagram.c:237
> > > 
> > > Some code commentry, based on my understanding:
> > > 
> > > 544 #define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payload_size)
> > > /// This is 24 + payload_size
> > > 
> > > memcpy(&dg_info->msg, dg, dg_size);
> > > 	Destination = dg_info->msg ---> this is a 24 byte
> > > 					structure(struct vmci_datagram)
> > > 	Source = dg --> this is a 24 byte structure (struct vmci_datagram)
> > > 	Size = dg_size = 24 + payload_size
> > > 
> > > {payload_size = 56-24 =32} -- Syzkaller managed to set payload_size to 32.
> > > 
> > >   35 struct delayed_datagram_info {
> > >   36         struct datagram_entry *entry;
> > >   37         struct work_struct work;
> > >   38         bool in_dg_host_queue;
> > >   39         /* msg and msg_payload must be together. */
> > >   40         struct vmci_datagram msg;
> > >   41         u8 msg_payload[];
> > >   42 };
> > > 
> > > So those extra bytes of payload are copied into msg_payload[], a run time
> > > warning is seen while fuzzing with Syzkaller.
> > > 
> > > One possible way to fix the warning is to split the memcpy() into
> > > two parts -- one -- direct assignment of msg and second taking care of payload.
> > > 
> > > Gustavo quoted:
> > > "Under FORTIFY_SOURCE we should not copy data across multiple members
> > > in a structure."
> > > 
> > > Reported-by: syzkaller <syzkaller@googlegroups.com>
> > > Suggested-by: Vegard Nossum <vegard.nossum@oracle.com>
> > > Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> > > Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
> > 
> > Thanks for getting this fixed!
> > 
> > Yeah, it's a "false positive" in the sense that the code was expecting
> 
> It's a false positive _bug_, and a legitimate _warning_ coming from fortified
> memcpy().
> 
> > to write into msg_payload. The warning is triggered because of the write
> > across the flex array boundary, which trips a bug in GCC and Clang,
> > which we're forced to work around.
> 
> The warning is triggered because of a write beyond the boundaries of
> `dg_info->msg`. It's not directly related to the fact that there is a
> flexible-array member following `dg_info->msg`.
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832 (fixed in GCC 14+)
> > 	 (not yet fixed in Clang)
> 
> This issue is not related to the compiler bugs mentioned above.

Oops, yes, thanks for fixing my confusion. Right, this is a direct write
across members into the flex array, not a composite destination. Yay
all the corner cases. :P
Dan Carpenter Jan. 11, 2024, 7:15 a.m. UTC | #15
On Wed, Jan 10, 2024 at 04:03:28PM -0800, Kees Cook wrote:
> 
> Oops, yes, thanks for fixing my confusion. Right, this is a direct write
> across members into the flex array, not a composite destination. Yay
> all the corner cases. :P

Is there a document somewhere which explains what will trigger a runtime
warning?  For example, if you write across members but it's not into a
flex array does that cause an issue?  Or if you read across members?

For example, this line reads from bulletin->vlan and
bulletin->vlan_padding.  This causes a compiler warning in Clang and
GCC depending on the options but does it also trigger a runtime warning?
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c#L2655

(I wrote a patch for this a few months back but didn't send it because
of the merge window.  I forgot about it until now that we're in a merge
window again...  :P)

regards,
dan carpenter
Vasiliy Kovalev Jan. 11, 2024, 12:53 p.m. UTC | #16
Hello, I was also working on solving this problem
https://lore.kernel.org/lkml/20240110104042.31865-1-kovalev@altlinux.org/T/#t.

Please note that there are 2 such places in the code, and by analogy with your
version of the changes, including changes in the approach to calculating the
size of the allocated memory, additional changes on top of your changes will
be as follows:

diff --git a/drivers/misc/vmw_vmci/vmci_datagram.c b/drivers/misc/vmw_vmci/vmci_datagram.c
index ba379cd6d054bd..1a50fcea681bf8 100644
--- a/drivers/misc/vmw_vmci/vmci_datagram.c
+++ b/drivers/misc/vmw_vmci/vmci_datagram.c
@@ -369,8 +369,9 @@ int vmci_datagram_invoke_guest_handler(struct vmci_datagram *dg)
        if (dst_entry->run_delayed) {
                struct delayed_datagram_info *dg_info;
 
-               dg_info = kmalloc(sizeof(*dg_info) + (size_t)dg->payload_size,
+               dg_info = kmalloc(struct_size(dg_info, msg_payload, dg->payload_size),
                                  GFP_ATOMIC);
+
                if (!dg_info) {
                        vmci_resource_put(resource);
                        return VMCI_ERROR_NO_MEM;
@@ -378,7 +379,9 @@ int vmci_datagram_invoke_guest_handler(struct vmci_datagram *dg)
 
                dg_info->in_dg_host_queue = false;
                dg_info->entry = dst_entry;
-               memcpy(&dg_info->msg, dg, VMCI_DG_SIZE(dg));
+               dg_info->msg = *dg;
+               memcpy(&dg_info->msg_payload, dg + 1, dg->payload_size);
+
 
                INIT_WORK(&dg_info->work, dg_delayed_dispatch);
                schedule_work(&dg_info->work);
Kees Cook Jan. 11, 2024, 6:13 p.m. UTC | #17
On Thu, Jan 11, 2024 at 10:15:40AM +0300, Dan Carpenter wrote:
> On Wed, Jan 10, 2024 at 04:03:28PM -0800, Kees Cook wrote:
> > 
> > Oops, yes, thanks for fixing my confusion. Right, this is a direct write
> > across members into the flex array, not a composite destination. Yay
> > all the corner cases. :P
> 
> Is there a document somewhere which explains what will trigger a runtime
> warning?  For example, if you write across members but it's not into a
> flex array does that cause an issue?  Or if you read across members?

There isn't a good place to find this. There are some code comments near
the memcpy macros, but that's not really sufficient.

At present FORTIFY is only being pedantic about _writes_, as that's
generally a higher priority problem. The implemented restriction is that
the destination buffer must be a single addressable struct member. That
destination can be a composite member (i.e. an entire substruct), but
going beyond a single member in a single memcpy() is no longer allowed
because the compiler cannot reason about whether such a copy is
"intentional".

> For example, this line reads from bulletin->vlan and
> bulletin->vlan_padding.  This causes a compiler warning in Clang and
> GCC depending on the options but does it also trigger a runtime warning?
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c#L2655

Right, the -Wstringop-overread and related compiler flags are doing the
source buffer size checking.

Note that for the compile-time warnings, GCC has the capacity to be much
more strict than the FORTIFY checks because it can perform value _range_
tracking, where as FORTIFY compile-time checks are limited to having the
copy size being a constant expression. (i.e. GCC may produce compile
time warnings for cases that FORTIFY will only warn about at runtime if
the range is violated.)

> (I wrote a patch for this a few months back but didn't send it because
> of the merge window.  I forgot about it until now that we're in a merge
> window again...  :P)

			memcpy(&ivi->vlan, &bulletin->vlan, VLAN_HLEN);

#define VLAN_HLEN 4
ivi->vlan is u32
bulletin has:
	u16 vlan;
	u8 vlan_padding[6];

yeah, ew. Should it even be reading padding? i.e. should this be:

		ivi->vlan = bulletin->vlan << 16;

?

Or should bulletin be:

	union {
		struct {
			u16 vlan;
			u8 vlan_padding[6];
		};
		struct {
			u32 vlan_header;
			u8 vlan_header_padding[4];
		};
	};

with:

		ivi->vlan = bulletin->vlan_header;

?

I've been finding that almost all memcpy()s and memset()s into non-array
types are better just rewritten as a direct assignment. :P
Dan Carpenter Jan. 12, 2024, 5:35 a.m. UTC | #18
On Thu, Jan 11, 2024 at 10:13:44AM -0800, Kees Cook wrote:
> On Thu, Jan 11, 2024 at 10:15:40AM +0300, Dan Carpenter wrote:
> > On Wed, Jan 10, 2024 at 04:03:28PM -0800, Kees Cook wrote:
> > (I wrote a patch for this a few months back but didn't send it because
> > of the merge window.  I forgot about it until now that we're in a merge
> > window again...  :P)
> 
> 			memcpy(&ivi->vlan, &bulletin->vlan, VLAN_HLEN);
> 
> #define VLAN_HLEN 4
> ivi->vlan is u32
> bulletin has:
> 	u16 vlan;
> 	u8 vlan_padding[6];
> 
> yeah, ew. Should it even be reading padding? i.e. should this be:
> 
> 		ivi->vlan = bulletin->vlan << 16;
> 

That seems reasonable.  We don't care about endianness?

> ?
> 
> Or should bulletin be:
> 
> 	union {
> 		struct {
> 			u16 vlan;
> 			u8 vlan_padding[6];
> 		};
> 		struct {
> 			u32 vlan_header;
> 			u8 vlan_header_padding[4];
> 		};
> 	};
> 
> with:
> 
> 		ivi->vlan = bulletin->vlan_header;
> 
> ?

My patch used a struct group.

	struct_group(vlan_header,
		u16 vlan;
		u16 vlan_padding;
	);

We don't need 6 bytes of padding, only 2.  Using a shift seems like
possibly a better approach.  Have to think about that.


> 
> I've been finding that almost all memcpy()s and memset()s into non-array
> types are better just rewritten as a direct assignment. :P

That seems like a good rule of thumb, thanks.

regards,
dan carpenter
Harshit Mogalapalli Feb. 16, 2024, 7:35 a.m. UTC | #19
Hi Kovalev,

On 11/01/24 6:23 pm, kovalev@altlinux.org wrote:
> Hello, I was also working on solving this problem
> https://lore.kernel.org/lkml/20240110104042.31865-1-kovalev@altlinux.org/T/#t.
> 
> Please note that there are 2 such places in the code, and by analogy with your
> version of the changes, including changes in the approach to calculating the
> size of the allocated memory, additional changes on top of your changes will
> be as follows:
> 
> diff --git a/drivers/misc/vmw_vmci/vmci_datagram.c b/drivers/misc/vmw_vmci/vmci_datagram.c
> index ba379cd6d054bd..1a50fcea681bf8 100644
> --- a/drivers/misc/vmw_vmci/vmci_datagram.c
> +++ b/drivers/misc/vmw_vmci/vmci_datagram.c
> @@ -369,8 +369,9 @@ int vmci_datagram_invoke_guest_handler(struct vmci_datagram *dg)
>          if (dst_entry->run_delayed) {
>                  struct delayed_datagram_info *dg_info;
>   
> -               dg_info = kmalloc(sizeof(*dg_info) + (size_t)dg->payload_size,
> +               dg_info = kmalloc(struct_size(dg_info, msg_payload, dg->payload_size),
>                                    GFP_ATOMIC);
> +
>                  if (!dg_info) {
>                          vmci_resource_put(resource);
>                          return VMCI_ERROR_NO_MEM;
> @@ -378,7 +379,9 @@ int vmci_datagram_invoke_guest_handler(struct vmci_datagram *dg)
>   
>                  dg_info->in_dg_host_queue = false;
>                  dg_info->entry = dst_entry;
> -               memcpy(&dg_info->msg, dg, VMCI_DG_SIZE(dg));
> +               dg_info->msg = *dg;
> +               memcpy(&dg_info->msg_payload, dg + 1, dg->payload_size);
> +
>   
>                  INIT_WORK(&dg_info->work, dg_delayed_dispatch);
>                  schedule_work(&dg_info->work);
I think you need to send a separate patch/patches for this.

[linux-next]$ git describe
next-20240216
[linux-next]$ git log --oneline drivers/misc/vmw_vmci/vmci_datagram.c
19b070fefd0d VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host()
e03d4910e6e4 VMCI: Use struct_size() in kmalloc()

I see that the two patches I sent are applied by Kees and are in linux-next.

I am thinking if we can reproduce the above WARNING in 
vmci_datagram_invoke_guest_handler() by modifying the C reproducer 
generated by Syzkaller for dg_dispatch_as_host()

Thanks,
Harshit
diff mbox series

Patch

diff --git a/drivers/misc/vmw_vmci/vmci_datagram.c b/drivers/misc/vmw_vmci/vmci_datagram.c
index ac6cb0c8d99b..ba379cd6d054 100644
--- a/drivers/misc/vmw_vmci/vmci_datagram.c
+++ b/drivers/misc/vmw_vmci/vmci_datagram.c
@@ -234,7 +234,8 @@  static int dg_dispatch_as_host(u32 context_id, struct vmci_datagram *dg)
 
 			dg_info->in_dg_host_queue = true;
 			dg_info->entry = dst_entry;
-			memcpy(&dg_info->msg, dg, dg_size);
+			dg_info->msg = *dg;
+			memcpy(&dg_info->msg_payload, dg + 1, dg->payload_size);
 
 			INIT_WORK(&dg_info->work, dg_delayed_dispatch);
 			schedule_work(&dg_info->work);