diff mbox series

io/channel-socket: Fix -fsanitize=undefined problem with latest Clang

Message ID 20240529133106.1224866-1-thuth@redhat.com (mailing list archive)
State New
Headers show
Series io/channel-socket: Fix -fsanitize=undefined problem with latest Clang | expand

Commit Message

Thomas Huth May 29, 2024, 1:31 p.m. UTC
Casting function pointers from one type to another causes undefined
behavior errors when compiling with -fsanitize=undefined with Clang v18:

 $ QTEST_QEMU_BINARY=./qemu-system-mips64 tests/qtest/netdev-socket
 TAP version 13
 # random seed: R02S4424f4f460de783fdd3d72c5571d3adc
 1..10
 # Start of mips64 tests
 # Start of netdev tests
 # Start of stream tests
 # starting QEMU: exec ./qemu-system-mips64 -qtest unix:/tmp/qtest-1213196.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1213196.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -nodefaults -M none -netdev stream,id=st0,addr.type=fd,addr.str=3 -accel qtest
 ../io/task.c:78:13: runtime error: call to function qapi_free_SocketAddress through pointer to incorrect function type 'void (*)(void *)'
 /tmp/qemu-sanitize/qapi/qapi-types-sockets.c:170: note: qapi_free_SocketAddress defined here
 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../io/task.c:78:13

Add a wrapper function to avoid the problem.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 io/channel-socket.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Philippe Mathieu-Daudé May 29, 2024, 1:52 p.m. UTC | #1
On 29/5/24 15:31, Thomas Huth wrote:
> Casting function pointers from one type to another causes undefined
> behavior errors when compiling with -fsanitize=undefined with Clang v18:
> 
>   $ QTEST_QEMU_BINARY=./qemu-system-mips64 tests/qtest/netdev-socket
>   TAP version 13
>   # random seed: R02S4424f4f460de783fdd3d72c5571d3adc
>   1..10
>   # Start of mips64 tests
>   # Start of netdev tests
>   # Start of stream tests
>   # starting QEMU: exec ./qemu-system-mips64 -qtest unix:/tmp/qtest-1213196.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1213196.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -nodefaults -M none -netdev stream,id=st0,addr.type=fd,addr.str=3 -accel qtest
>   ../io/task.c:78:13: runtime error: call to function qapi_free_SocketAddress through pointer to incorrect function type 'void (*)(void *)'
>   /tmp/qemu-sanitize/qapi/qapi-types-sockets.c:170: note: qapi_free_SocketAddress defined here
>   SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../io/task.c:78:13
> 
> Add a wrapper function to avoid the problem.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>   io/channel-socket.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Peter Maydell May 29, 2024, 1:53 p.m. UTC | #2
On Wed, 29 May 2024 at 14:32, Thomas Huth <thuth@redhat.com> wrote:
>
> Casting function pointers from one type to another causes undefined
> behavior errors when compiling with -fsanitize=undefined with Clang v18:
>
>  $ QTEST_QEMU_BINARY=./qemu-system-mips64 tests/qtest/netdev-socket
>  TAP version 13
>  # random seed: R02S4424f4f460de783fdd3d72c5571d3adc
>  1..10
>  # Start of mips64 tests
>  # Start of netdev tests
>  # Start of stream tests
>  # starting QEMU: exec ./qemu-system-mips64 -qtest unix:/tmp/qtest-1213196.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1213196.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -nodefaults -M none -netdev stream,id=st0,addr.type=fd,addr.str=3 -accel qtest
>  ../io/task.c:78:13: runtime error: call to function qapi_free_SocketAddress through pointer to incorrect function type 'void (*)(void *)'
>  /tmp/qemu-sanitize/qapi/qapi-types-sockets.c:170: note: qapi_free_SocketAddress defined here
>  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../io/task.c:78:13

It's a pity the sanitizer error doesn't tell you the actual
function type as well as the incorrect one it got cast to
(especially since in this case the function and its declaration
are both in generated code in the build tree not conveniently
findable with 'git grep'.)

In this case the function being called is:
 void qapi_free_SocketAddress(SocketAddress *obj)
and it's cast to a GDestroyNotify, which is
 typedef void            (*GDestroyNotify)       (gpointer       data);
(and gpointer is void*)

and although you can pass a foo* to a function expecting void*,
you can't treat a pointer to a function taking foo* as if it was
a pointer to a function taking void*, just in case the compiler
needs to do some clever trickery with the pointer value.

So the wrapper function looks like it doesn't do anything,
but it's doing the permitted implicit-cast of the argument.

> Add a wrapper function to avoid the problem.
>
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  io/channel-socket.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/io/channel-socket.c b/io/channel-socket.c
> index 3a899b0608..aa2a1c8586 100644
> --- a/io/channel-socket.c
> +++ b/io/channel-socket.c
> @@ -193,6 +193,10 @@ static void qio_channel_socket_connect_worker(QIOTask *task,
>      qio_task_set_error(task, err);
>  }
>
> +static void qio_qapi_free_SocketAddress(gpointer sa)
> +{
> +    qapi_free_SocketAddress(sa);
> +}
>
>  void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
>                                        SocketAddress *addr,
> @@ -213,7 +217,7 @@ void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
>      qio_task_run_in_thread(task,
>                             qio_channel_socket_connect_worker,
>                             addrCopy,
> -                           (GDestroyNotify)qapi_free_SocketAddress,
> +                           qio_qapi_free_SocketAddress,
>                             context);
>  }

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM
Daniel P. Berrangé June 3, 2024, 12:48 p.m. UTC | #3
On Wed, May 29, 2024 at 02:53:37PM +0100, Peter Maydell wrote:
> On Wed, 29 May 2024 at 14:32, Thomas Huth <thuth@redhat.com> wrote:
> >
> > Casting function pointers from one type to another causes undefined
> > behavior errors when compiling with -fsanitize=undefined with Clang v18:
> >
> >  $ QTEST_QEMU_BINARY=./qemu-system-mips64 tests/qtest/netdev-socket
> >  TAP version 13
> >  # random seed: R02S4424f4f460de783fdd3d72c5571d3adc
> >  1..10
> >  # Start of mips64 tests
> >  # Start of netdev tests
> >  # Start of stream tests
> >  # starting QEMU: exec ./qemu-system-mips64 -qtest unix:/tmp/qtest-1213196.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1213196.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -nodefaults -M none -netdev stream,id=st0,addr.type=fd,addr.str=3 -accel qtest
> >  ../io/task.c:78:13: runtime error: call to function qapi_free_SocketAddress through pointer to incorrect function type 'void (*)(void *)'
> >  /tmp/qemu-sanitize/qapi/qapi-types-sockets.c:170: note: qapi_free_SocketAddress defined here
> >  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../io/task.c:78:13
> 
> It's a pity the sanitizer error doesn't tell you the actual
> function type as well as the incorrect one it got cast to
> (especially since in this case the function and its declaration
> are both in generated code in the build tree not conveniently
> findable with 'git grep'.)
> 
> In this case the function being called is:
>  void qapi_free_SocketAddress(SocketAddress *obj)
> and it's cast to a GDestroyNotify, which is
>  typedef void            (*GDestroyNotify)       (gpointer       data);
> (and gpointer is void*)
> 
> and although you can pass a foo* to a function expecting void*,
> you can't treat a pointer to a function taking foo* as if it was
> a pointer to a function taking void*, just in case the compiler
> needs to do some clever trickery with the pointer value.
>
> So the wrapper function looks like it doesn't do anything,
> but it's doing the permitted implicit-cast of the argument


I guess that's the letter of the law in C, but does that actually
matter in practice, historically ?

The use of "(GDestroyNotify)blah"  casts is standard practice
across any application using GLib, and even in QEMU this is
far from the only place that does such a cast:

$ git grep '(GDestroyNotify)'
chardev/char-socket.c:                                   (GDestroyNotify)object_unref);
hw/i386/kvm/xen_xenstore.c:    g_list_free_full(w->events, (GDestroyNotify)free_watch_event);
hw/i386/kvm/xenstore_impl.c:                                            (GDestroyNotify)xs_node_unref);
hw/i386/kvm/xenstore_impl.c:                                            (GDestroyNotify)xs_node_unref);
hw/i386/kvm/xenstore_impl.c:                                            (GDestroyNotify)xs_node_unref);
hw/mem/sparse-mem.c:                                      (GDestroyNotify)g_free);
hw/misc/xlnx-versal-cframe-reg.c:                                  NULL, (GDestroyNotify)g_free);
hw/virtio/virtio-iommu.c:                                   NULL, (GDestroyNotify)g_free,
hw/virtio/virtio-iommu.c:                                   (GDestroyNotify)g_free);
io/channel-socket.c:                           (GDestroyNotify)qapi_free_SocketAddress,
io/net-listener.c:            listener, (GDestroyNotify)object_unref, NULL);
io/net-listener.c:                listener, (GDestroyNotify)object_unref, context);
io/net-listener.c:                listener, (GDestroyNotify)object_unref, NULL);
migration/block-dirty-bitmap.c:        gdn = (GDestroyNotify) qapi_free_BitmapMigrationBitmapAlias;
subprojects/libvhost-user/libvhost-user-glib.c:                                       (GDestroyNotify) vug_source_destroy);
system/memory.c:                                       (GDestroyNotify) flatview_unref);
tests/qtest/libqos/qgraph.c:    g_test_queue_destroy((GDestroyNotify) qos_object_destroy, obj);
tests/unit/test-vmstate.c:                                        (GDestroyNotify)g_free,
tests/unit/test-vmstate.c:                                        (GDestroyNotify)g_free);
tests/unit/test-vmstate.c:                                              (GDestroyNotify)g_free,
tests/unit/test-vmstate.c:                                              (GDestroyNotify)g_free);
ui/dbus-clipboard.c:        (GDestroyNotify)qemu_clipboard_info_unref,
ui/dbus-listener.c:        (GDestroyNotify)qemu_free_displaysurface,
ui/dbus-listener.c:        (GDestroyNotify)pixman_image_unref,
ui/dbus-listener.c:        (GDestroyNotify)pixman_image_unref, pixman_image_ref(ddl->ds->image));
ui/dbus-listener.c:        (GDestroyNotify)cursor_unref,
ui/dbus.c:        (GDestroyNotify)g_array_unref, consoles);

I presume this means the rest of the code merely isn't exercised by the
CI job test case this is fixing, but if we're saying this needs fixing,
then we should be fixing all usage.

Then there are many other function casts beyond GDestroyNotify that
we use, which feel like they should have similar conceptual problems:

$ git grep -E '\(\w+Func\)\w+' '*.c' | grep -v -E '(void|int|double|float|HANDLE)'
chardev/char-fd.c:    FEWatchFunc func = (FEWatchFunc)callback;
chardev/char-fd.c:        g_source_set_callback(child, (GSourceFunc)child_func, source, NULL);
chardev/char-fd.c:        g_source_set_callback(child, (GSourceFunc)child_func, source, NULL);
chardev/char-fe.c:    g_source_set_callback(src, (GSourceFunc)func, user_data, NULL);
chardev/char-socket.c:    g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup,
chardev/spice.c:    GIOFunc func = (GIOFunc)callback;
hw/i386/kvm/xenstore_impl.c:    *items = g_list_insert_sorted(*items, g_strdup(key), (GCompareFunc)strcmp);
io/channel-buffer.c:    QIOChannelFunc func = (QIOChannelFunc)callback;
io/channel-null.c:    QIOChannelFunc func = (QIOChannelFunc)callback;
io/channel-watch.c:    QIOChannelFunc func = (QIOChannelFunc)callback;
io/channel-watch.c:    QIOChannelFunc func = (QIOChannelFunc)callback;
io/channel-watch.c:    QIOChannelFunc func = (QIOChannelFunc)callback;
io/channel-websock.c:    QIOChannelFunc func = (QIOChannelFunc)callback;
io/channel.c:    g_source_set_callback(source, (GSourceFunc)func, user_data, notify);
io/channel.c:                          (GSourceFunc)qio_channel_wait_complete,
io/net-listener.c:                              (GSourceFunc)qio_net_listener_wait_client_func,
migration/migration.c:    notify->notify = (NotifierWithReturnFunc)func;
migration/rdma.c:    QIOChannelFunc func = (QIOChannelFunc)callback;
net/colo-compare.c:                                  (GCompareDataFunc)seq_sorter,
net/colo-compare.c:                                (GCompareFunc)colo_old_packet_check_one))
net/colo-compare.c:                                (GCompareFunc)colo_old_packet_check_one))
net/colo-compare.c:                        (GCompareFunc)colo_old_packet_check_one_conn);
net/colo-compare.c:                 pkt, (GCompareFunc)HandlePacket);
net/net.c:    g_ptr_array_sort(results, (GCompareFunc)model_cmp);
qga/commands-win32.c:        getifentry2_ex = (GetIfEntry2Func)func;
qga/vss-win32.c:    func = (QGAVSSRequesterFunc)GetProcAddress(provider_lib, func_name);
subprojects/libvhost-user/libvhost-user-glib.c:    g_source_set_callback(gsrc, (GSourceFunc)vu_cb, data, NULL);
system/qdev-monitor.c:    g_ptr_array_sort(array, (GCompareFunc)qemu_pstrcmp0);
target/i386/cpu.c:    names = g_list_sort(names, (GCompareFunc)strcmp);
tests/unit/test-qtree.c:    tree = q_tree_new_full((GCompareDataFunc)my_compare, NULL,
tests/unit/test-util-filemonitor.c:                   (GFunc)qemu_file_monitor_test_record_free, NULL);
util/filemonitor-inotify.c:    g_idle_add((GSourceFunc)qemu_file_monitor_free_idle, mon);
util/qemu-option.c:    g_ptr_array_sort(array, (GCompareFunc)qemu_pstrcmp0);


So overall, I'm not in favour of this patch unless there's a compelling
functional reason why just this 1 case is special and all the others
can be safely ignored.

> 
> > Add a wrapper function to avoid the problem.
> >
> > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > ---
> >  io/channel-socket.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > index 3a899b0608..aa2a1c8586 100644
> > --- a/io/channel-socket.c
> > +++ b/io/channel-socket.c
> > @@ -193,6 +193,10 @@ static void qio_channel_socket_connect_worker(QIOTask *task,
> >      qio_task_set_error(task, err);
> >  }
> >
> > +static void qio_qapi_free_SocketAddress(gpointer sa)
> > +{
> > +    qapi_free_SocketAddress(sa);
> > +}
> >
> >  void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
> >                                        SocketAddress *addr,
> > @@ -213,7 +217,7 @@ void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
> >      qio_task_run_in_thread(task,
> >                             qio_channel_socket_connect_worker,
> >                             addrCopy,
> > -                           (GDestroyNotify)qapi_free_SocketAddress,
> > +                           qio_qapi_free_SocketAddress,
> >                             context);
> >  }
> 
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> 
> thanks
> -- PMM
> 

With regards,
Daniel
Thomas Huth June 3, 2024, 2:38 p.m. UTC | #4
On 03/06/2024 14.48, Daniel P. Berrangé wrote:
> On Wed, May 29, 2024 at 02:53:37PM +0100, Peter Maydell wrote:
>> On Wed, 29 May 2024 at 14:32, Thomas Huth <thuth@redhat.com> wrote:
>>>
>>> Casting function pointers from one type to another causes undefined
>>> behavior errors when compiling with -fsanitize=undefined with Clang v18:
>>>
>>>   $ QTEST_QEMU_BINARY=./qemu-system-mips64 tests/qtest/netdev-socket
>>>   TAP version 13
>>>   # random seed: R02S4424f4f460de783fdd3d72c5571d3adc
>>>   1..10
>>>   # Start of mips64 tests
>>>   # Start of netdev tests
>>>   # Start of stream tests
>>>   # starting QEMU: exec ./qemu-system-mips64 -qtest unix:/tmp/qtest-1213196.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1213196.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -nodefaults -M none -netdev stream,id=st0,addr.type=fd,addr.str=3 -accel qtest
>>>   ../io/task.c:78:13: runtime error: call to function qapi_free_SocketAddress through pointer to incorrect function type 'void (*)(void *)'
>>>   /tmp/qemu-sanitize/qapi/qapi-types-sockets.c:170: note: qapi_free_SocketAddress defined here
>>>   SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../io/task.c:78:13
>>
>> It's a pity the sanitizer error doesn't tell you the actual
>> function type as well as the incorrect one it got cast to
>> (especially since in this case the function and its declaration
>> are both in generated code in the build tree not conveniently
>> findable with 'git grep'.)
>>
>> In this case the function being called is:
>>   void qapi_free_SocketAddress(SocketAddress *obj)
>> and it's cast to a GDestroyNotify, which is
>>   typedef void            (*GDestroyNotify)       (gpointer       data);
>> (and gpointer is void*)
>>
>> and although you can pass a foo* to a function expecting void*,
>> you can't treat a pointer to a function taking foo* as if it was
>> a pointer to a function taking void*, just in case the compiler
>> needs to do some clever trickery with the pointer value.
>>
>> So the wrapper function looks like it doesn't do anything,
>> but it's doing the permitted implicit-cast of the argument
> 
> 
> I guess that's the letter of the law in C, but does that actually
> matter in practice, historically ?
> 
> The use of "(GDestroyNotify)blah"  casts is standard practice
> across any application using GLib, and even in QEMU this is
> far from the only place that does such a cast:
...
> I presume this means the rest of the code merely isn't exercised by the
> CI job test case this is fixing, but if we're saying this needs fixing,
> then we should be fixing all usage.

There are more spots discovered by the CI, I just didn't tackle them yet.

Another problem is e.g. the call_rcu macro from include/qemu/rcu.h...

> So overall, I'm not in favour of this patch unless there's a compelling
> functional reason why just this 1 case is special and all the others
> can be safely ignored.

We'd need to tackle them all. I thought it might be a good idea since we 
then could also switch to a more strict CFI mode (i.e. stop using 
-fsanitize-cfi-icall-generalize-pointers) ... but yes, looking at the amount 
of spots that need fixes, it feels like tilting at windmills.

Maybe best if we really just add -fno-sanitize=function to the CFLAGS when 
we're compiling with the sanitizer enabled.

  Thomas
Alex Bennée June 3, 2024, 2:47 p.m. UTC | #5
Thomas Huth <thuth@redhat.com> writes:

> Casting function pointers from one type to another causes undefined
> behavior errors when compiling with -fsanitize=undefined with Clang
> v18:

Queued to testing/next, thanks.
Daniel P. Berrangé June 3, 2024, 2:49 p.m. UTC | #6
On Mon, Jun 03, 2024 at 04:38:53PM +0200, Thomas Huth wrote:
> On 03/06/2024 14.48, Daniel P. Berrangé wrote:
> > On Wed, May 29, 2024 at 02:53:37PM +0100, Peter Maydell wrote:
> > > On Wed, 29 May 2024 at 14:32, Thomas Huth <thuth@redhat.com> wrote:
> > > > 
> > > > Casting function pointers from one type to another causes undefined
> > > > behavior errors when compiling with -fsanitize=undefined with Clang v18:
> > > > 
> > > >   $ QTEST_QEMU_BINARY=./qemu-system-mips64 tests/qtest/netdev-socket
> > > >   TAP version 13
> > > >   # random seed: R02S4424f4f460de783fdd3d72c5571d3adc
> > > >   1..10
> > > >   # Start of mips64 tests
> > > >   # Start of netdev tests
> > > >   # Start of stream tests
> > > >   # starting QEMU: exec ./qemu-system-mips64 -qtest unix:/tmp/qtest-1213196.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1213196.qmp,id=char0 -mon chardev=char0,mode=control -display none -audio none -nodefaults -M none -netdev stream,id=st0,addr.type=fd,addr.str=3 -accel qtest
> > > >   ../io/task.c:78:13: runtime error: call to function qapi_free_SocketAddress through pointer to incorrect function type 'void (*)(void *)'
> > > >   /tmp/qemu-sanitize/qapi/qapi-types-sockets.c:170: note: qapi_free_SocketAddress defined here
> > > >   SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../io/task.c:78:13
> > > 
> > > It's a pity the sanitizer error doesn't tell you the actual
> > > function type as well as the incorrect one it got cast to
> > > (especially since in this case the function and its declaration
> > > are both in generated code in the build tree not conveniently
> > > findable with 'git grep'.)
> > > 
> > > In this case the function being called is:
> > >   void qapi_free_SocketAddress(SocketAddress *obj)
> > > and it's cast to a GDestroyNotify, which is
> > >   typedef void            (*GDestroyNotify)       (gpointer       data);
> > > (and gpointer is void*)
> > > 
> > > and although you can pass a foo* to a function expecting void*,
> > > you can't treat a pointer to a function taking foo* as if it was
> > > a pointer to a function taking void*, just in case the compiler
> > > needs to do some clever trickery with the pointer value.
> > > 
> > > So the wrapper function looks like it doesn't do anything,
> > > but it's doing the permitted implicit-cast of the argument
> > 
> > 
> > I guess that's the letter of the law in C, but does that actually
> > matter in practice, historically ?
> > 
> > The use of "(GDestroyNotify)blah"  casts is standard practice
> > across any application using GLib, and even in QEMU this is
> > far from the only place that does such a cast:
> ...
> > I presume this means the rest of the code merely isn't exercised by the
> > CI job test case this is fixing, but if we're saying this needs fixing,
> > then we should be fixing all usage.
> 
> There are more spots discovered by the CI, I just didn't tackle them yet.
> 
> Another problem is e.g. the call_rcu macro from include/qemu/rcu.h...
> 
> > So overall, I'm not in favour of this patch unless there's a compelling
> > functional reason why just this 1 case is special and all the others
> > can be safely ignored.
> 
> We'd need to tackle them all. I thought it might be a good idea since we
> then could also switch to a more strict CFI mode (i.e. stop using
> -fsanitize-cfi-icall-generalize-pointers) ... but yes, looking at the amount
> of spots that need fixes, it feels like tilting at windmills.

We can't rely on the sanitizers to catch all cases where we're casting
functions, as we don't have good enough code coverage in tests to
identify all places that way.

Unless there's a warning flag we can use to get diagnosis of this
problem at compile time and then fix all reported issues, I won't have
any confidence in our ability to remove -fsanitize-cfi-icall-generalize-pointers
for CFI.

> Maybe best if we really just add -fno-sanitize=function to the CFLAGS when
> we're compiling with the sanitizer enabled.

That's my recommendation for the short term.

With regards,
Daniel
Daniel P. Berrangé June 3, 2024, 2:50 p.m. UTC | #7
On Mon, Jun 03, 2024 at 03:47:55PM +0100, Alex Bennée wrote:
> Thomas Huth <thuth@redhat.com> writes:
> 
> > Casting function pointers from one type to another causes undefined
> > behavior errors when compiling with -fsanitize=undefined with Clang
> > v18:
> 
> Queued to testing/next, thanks.

Please remove, as I don't think this is a viable approach to the
problem

With regards,
Daniel
Peter Maydell June 3, 2024, 2:58 p.m. UTC | #8
On Mon, 3 Jun 2024 at 15:49, Daniel P. Berrangé <berrange@redhat.com> wrote:
> We can't rely on the sanitizers to catch all cases where we're casting
> functions, as we don't have good enough code coverage in tests to
> identify all places that way.
>
> Unless there's a warning flag we can use to get diagnosis of this
> problem at compile time and then fix all reported issues, I won't have
> any confidence in our ability to remove -fsanitize-cfi-icall-generalize-pointers
> for CFI.

You might think that -Wcast-function-type would detect them at compile
time, but that has two loopholes:
 1. void(*) (void)  matches everything
 2. any parameter of pointer type matches any other pointer type

It seems to me rather inconsistent that the sanitizers do
not match up with the warning semantics here. I think I
would vote for raising that with the compiler folks --
either the sanitizer should be made looser so it matches
the -Wcast-function-type semantics, or else a new tighter
warning option that matches the sanitizer should be
provided. Ideally both, I think. But it's definitely silly
to have a runtime check that flags up things that the
compiler perfectly well could detect at compile time but
is choosing not to.

thanks
-- PMM
Peter Maydell June 3, 2024, 3:12 p.m. UTC | #9
On Mon, 3 Jun 2024 at 15:58, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Mon, 3 Jun 2024 at 15:49, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > We can't rely on the sanitizers to catch all cases where we're casting
> > functions, as we don't have good enough code coverage in tests to
> > identify all places that way.
> >
> > Unless there's a warning flag we can use to get diagnosis of this
> > problem at compile time and then fix all reported issues, I won't have
> > any confidence in our ability to remove -fsanitize-cfi-icall-generalize-pointers
> > for CFI.
>
> You might think that -Wcast-function-type would detect them at compile
> time, but that has two loopholes:
>  1. void(*) (void)  matches everything
>  2. any parameter of pointer type matches any other pointer type
>
> It seems to me rather inconsistent that the sanitizers do
> not match up with the warning semantics here. I think I
> would vote for raising that with the compiler folks --
> either the sanitizer should be made looser so it matches
> the -Wcast-function-type semantics, or else a new tighter
> warning option that matches the sanitizer should be
> provided. Ideally both, I think. But it's definitely silly
> to have a runtime check that flags up things that the
> compiler perfectly well could detect at compile time but
> is choosing not to.

Slightly further investigation suggests clang 16 and later
have -Wcast-function-type-strict for the "report all the
same casts that the sanitizer does". gcc doesn't I think have
that yet. I didn't spot any option in the sanitizer to
loosen the set of things it considers mismatched function
pointers.

-- PMM
Daniel P. Berrangé June 3, 2024, 3:52 p.m. UTC | #10
On Mon, Jun 03, 2024 at 04:12:34PM +0100, Peter Maydell wrote:
> On Mon, 3 Jun 2024 at 15:58, Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Mon, 3 Jun 2024 at 15:49, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > We can't rely on the sanitizers to catch all cases where we're casting
> > > functions, as we don't have good enough code coverage in tests to
> > > identify all places that way.
> > >
> > > Unless there's a warning flag we can use to get diagnosis of this
> > > problem at compile time and then fix all reported issues, I won't have
> > > any confidence in our ability to remove -fsanitize-cfi-icall-generalize-pointers
> > > for CFI.
> >
> > You might think that -Wcast-function-type would detect them at compile
> > time, but that has two loopholes:
> >  1. void(*) (void)  matches everything
> >  2. any parameter of pointer type matches any other pointer type
> >
> > It seems to me rather inconsistent that the sanitizers do
> > not match up with the warning semantics here. I think I
> > would vote for raising that with the compiler folks --
> > either the sanitizer should be made looser so it matches
> > the -Wcast-function-type semantics, or else a new tighter
> > warning option that matches the sanitizer should be
> > provided. Ideally both, I think. But it's definitely silly
> > to have a runtime check that flags up things that the
> > compiler perfectly well could detect at compile time but
> > is choosing not to.
> 
> Slightly further investigation suggests clang 16 and later
> have -Wcast-function-type-strict for the "report all the
> same casts that the sanitizer does". gcc doesn't I think have
> that yet. I didn't spot any option in the sanitizer to
> loosen the set of things it considers mismatched function
> pointers.

I just tried that with

CC=clang ./configure --target-list=x86_64-softmmu --extra-cflags="-Wcast-function-type-strict"  --disable-werror

and it rather shows the futility of the task, picking one reoprted
warning that is repeated over & over in differnt contexts:

In file included from qapi/qapi-types-block-core.c:15:
qapi/qapi-types-block-core.h:3620:1: warning: cast from 'void (*)(DummyBlockCoreForceArrays *)' (aka 'void (*)(struct DummyBlockCoreForceArrays *)') to 'void (*)(void)' converts to incompatible function type [-Wcast-function-type-strict]
 3620 | G_DEFINE_AUTOPTR_CLEANUP_FUNC(DummyBlockCoreForceArrays, qapi_free_DummyBlockCoreForceArrays)
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/glib-2.0/glib/gmacros.h:1372:3: note: expanded from macro 'G_DEFINE_AUTOPTR_CLEANUP_FUNC'
 1372 |   _GLIB_DEFINE_AUTOPTR_CLEANUP_FUNCS(TypeName, TypeName, func)
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/glib-2.0/glib/gmacros.h:1364:57: note: expanded from macro '_GLIB_DEFINE_AUTOPTR_CLEANUP_FUNCS'
 1364 |     { if (*_q) g_queue_free_full (*_q, (GDestroyNotify) (void(*)(void)) cleanup); }                             \
      |                                                         ^~~~~~~~~~~~~~~~~~~~~~~


IOW, the GLib headers themselves have bad casts in macros which we
rely on.  So we'll never be cast safe until GLib changes its own
code...if it ever does.


With regards,
Daniel
Thomas Huth June 3, 2024, 3:55 p.m. UTC | #11
On 03/06/2024 17.52, Daniel P. Berrangé wrote:
> On Mon, Jun 03, 2024 at 04:12:34PM +0100, Peter Maydell wrote:
>> On Mon, 3 Jun 2024 at 15:58, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>
>>> On Mon, 3 Jun 2024 at 15:49, Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>> We can't rely on the sanitizers to catch all cases where we're casting
>>>> functions, as we don't have good enough code coverage in tests to
>>>> identify all places that way.
>>>>
>>>> Unless there's a warning flag we can use to get diagnosis of this
>>>> problem at compile time and then fix all reported issues, I won't have
>>>> any confidence in our ability to remove -fsanitize-cfi-icall-generalize-pointers
>>>> for CFI.
>>>
>>> You might think that -Wcast-function-type would detect them at compile
>>> time, but that has two loopholes:
>>>   1. void(*) (void)  matches everything
>>>   2. any parameter of pointer type matches any other pointer type
>>>
>>> It seems to me rather inconsistent that the sanitizers do
>>> not match up with the warning semantics here. I think I
>>> would vote for raising that with the compiler folks --
>>> either the sanitizer should be made looser so it matches
>>> the -Wcast-function-type semantics, or else a new tighter
>>> warning option that matches the sanitizer should be
>>> provided. Ideally both, I think. But it's definitely silly
>>> to have a runtime check that flags up things that the
>>> compiler perfectly well could detect at compile time but
>>> is choosing not to.
>>
>> Slightly further investigation suggests clang 16 and later
>> have -Wcast-function-type-strict for the "report all the
>> same casts that the sanitizer does". gcc doesn't I think have
>> that yet. I didn't spot any option in the sanitizer to
>> loosen the set of things it considers mismatched function
>> pointers.
> 
> I just tried that with
> 
> CC=clang ./configure --target-list=x86_64-softmmu --extra-cflags="-Wcast-function-type-strict"  --disable-werror
> 
> and it rather shows the futility of the task, picking one reoprted
> warning that is repeated over & over in differnt contexts:
> 
> In file included from qapi/qapi-types-block-core.c:15:
> qapi/qapi-types-block-core.h:3620:1: warning: cast from 'void (*)(DummyBlockCoreForceArrays *)' (aka 'void (*)(struct DummyBlockCoreForceArrays *)') to 'void (*)(void)' converts to incompatible function type [-Wcast-function-type-strict]
>   3620 | G_DEFINE_AUTOPTR_CLEANUP_FUNC(DummyBlockCoreForceArrays, qapi_free_DummyBlockCoreForceArrays)
>        | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> /usr/include/glib-2.0/glib/gmacros.h:1372:3: note: expanded from macro 'G_DEFINE_AUTOPTR_CLEANUP_FUNC'
>   1372 |   _GLIB_DEFINE_AUTOPTR_CLEANUP_FUNCS(TypeName, TypeName, func)
>        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> /usr/include/glib-2.0/glib/gmacros.h:1364:57: note: expanded from macro '_GLIB_DEFINE_AUTOPTR_CLEANUP_FUNCS'
>   1364 |     { if (*_q) g_queue_free_full (*_q, (GDestroyNotify) (void(*)(void)) cleanup); }                             \
>        |                                                         ^~~~~~~~~~~~~~~~~~~~~~~
> 
> 
> IOW, the GLib headers themselves have bad casts in macros which we
> rely on.  So we'll never be cast safe until GLib changes its own
> code...if it ever does.

Ok, thanks for checking! So the ultimate answer to the problem (at least 
right now) is: Let's use -fno-sanitize=function in QEMU.

  Thomas
Alex Bennée June 3, 2024, 5:46 p.m. UTC | #12
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Jun 03, 2024 at 03:47:55PM +0100, Alex Bennée wrote:
>> Thomas Huth <thuth@redhat.com> writes:
>> 
>> > Casting function pointers from one type to another causes undefined
>> > behavior errors when compiling with -fsanitize=undefined with Clang
>> > v18:
>> 
>> Queued to testing/next, thanks.
>
> Please remove, as I don't think this is a viable approach to the
> problem

Ok - I'll drop it.

I still have:

  .gitlab-ci.d/buildtest.yml: Use -fno-sanitize=function in the clang-system job

to prevent the CI failures.

>
> With regards,
> Daniel
Daniel P. Berrangé June 3, 2024, 5:55 p.m. UTC | #13
On Mon, Jun 03, 2024 at 06:46:15PM +0100, Alex Bennée wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Mon, Jun 03, 2024 at 03:47:55PM +0100, Alex Bennée wrote:
> >> Thomas Huth <thuth@redhat.com> writes:
> >> 
> >> > Casting function pointers from one type to another causes undefined
> >> > behavior errors when compiling with -fsanitize=undefined with Clang
> >> > v18:
> >> 
> >> Queued to testing/next, thanks.
> >
> > Please remove, as I don't think this is a viable approach to the
> > problem
> 
> Ok - I'll drop it.
> 
> I still have:
> 
>   .gitlab-ci.d/buildtest.yml: Use -fno-sanitize=function in the clang-system job
> 
> to prevent the CI failures.

Yes, that's good.


With regards,
Daniel
diff mbox series

Patch

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 3a899b0608..aa2a1c8586 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -193,6 +193,10 @@  static void qio_channel_socket_connect_worker(QIOTask *task,
     qio_task_set_error(task, err);
 }
 
+static void qio_qapi_free_SocketAddress(gpointer sa)
+{
+    qapi_free_SocketAddress(sa);
+}
 
 void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
                                       SocketAddress *addr,
@@ -213,7 +217,7 @@  void qio_channel_socket_connect_async(QIOChannelSocket *ioc,
     qio_task_run_in_thread(task,
                            qio_channel_socket_connect_worker,
                            addrCopy,
-                           (GDestroyNotify)qapi_free_SocketAddress,
+                           qio_qapi_free_SocketAddress,
                            context);
 }