[for-2.12] hw/riscv: Fix crashes with "-nodefaults"

Message ID	1521794165-31678-1-git-send-email-thuth@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> From: Thomas Huth <thuth@redhat.com> To: Michael Clark <mjc@sifive.com>, qemu-devel@nongnu.org Date: Fri, 23 Mar 2018 09:36:05 +0100 Message-Id: <1521794165-31678-1-git-send-email-thuth@redhat.com> Subject: [Qemu-devel] [PATCH for-2.12] hw/riscv: Fix crashes with "-nodefaults" Precedence: list Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>, Palmer Dabbelt <palmer@sifive.com>, Sagar Karandikar <sagark@eecs.berkeley.edu> Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Thomas Huth March 23, 2018, 8:36 a.m. UTC

Most of the RISC-V boards currently crash when they are started with
"-nodefaults", e.g.:

$ gdb --args riscv32-softmmu/qemu-system-riscv32 -nodefaults -M sifive_e
[...]
(gdb) r
Program received signal SIGSEGV, Segmentation fault.
qemu_chr_fe_init ([...], s=s@entry=0x0, [...])
    at chardev/char-fe.c:210
210	    } else if (s->be) {
(gdb) bt
 0  0x00005555558695f8 in qemu_chr_fe_init ([...], s=s@entry=0x0, [...])
    at chardev/char-fe.c:210
 1  0x00005555556fb425 in sifive_uart_create ([...], chr=0x0, [...])
    at hw/riscv/sifive_uart.c:169
 2  0x00005555556f8cc4 in riscv_sifive_e_init (machine=[...])
    at hw/riscv/sifive_e.c:164
    [...]

With "-nodefaults" there are no entries in serial_hds[], so qemu_chr_fe_init()
finally tries to dereference a NULL pointer. Let's fix this problem by
creating a "null" chardev in this case instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 For other boards / targets, see also:
 https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg05073.html

 hw/riscv/riscv_htif.c  | 5 +++++
 hw/riscv/sifive_uart.c | 5 +++++
 2 files changed, 10 insertions(+)

Bastian Koppelmann March 23, 2018, 8:58 a.m. UTC | #1

On 03/23/2018 09:36 AM, Thomas Huth wrote:
> Most of the RISC-V boards currently crash when they are started with
> "-nodefaults", e.g.:
> 
> $ gdb --args riscv32-softmmu/qemu-system-riscv32 -nodefaults -M sifive_e
> [...]
> (gdb) r
> Program received signal SIGSEGV, Segmentation fault.
> qemu_chr_fe_init ([...], s=s@entry=0x0, [...])
>     at chardev/char-fe.c:210
> 210	    } else if (s->be) {
> (gdb) bt
>  0  0x00005555558695f8 in qemu_chr_fe_init ([...], s=s@entry=0x0, [...])
>     at chardev/char-fe.c:210
>  1  0x00005555556fb425 in sifive_uart_create ([...], chr=0x0, [...])
>     at hw/riscv/sifive_uart.c:169
>  2  0x00005555556f8cc4 in riscv_sifive_e_init (machine=[...])
>     at hw/riscv/sifive_e.c:164
>     [...]
> 
> With "-nodefaults" there are no entries in serial_hds[], so qemu_chr_fe_init()
> finally tries to dereference a NULL pointer. Let's fix this problem by
> creating a "null" chardev in this case instead.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  For other boards / targets, see also:
>  https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg05073.html
> 
>  hw/riscv/riscv_htif.c  | 5 +++++
>  hw/riscv/sifive_uart.c | 5 +++++
>  2 files changed, 10 insertions(+)
> 

Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>

Cheers,
Bastian

Peter Maydell March 23, 2018, 9:56 a.m. UTC | #2

On 23 March 2018 at 08:36, Thomas Huth <thuth@redhat.com> wrote:
> Most of the RISC-V boards currently crash when they are started with
> "-nodefaults", e.g.:
>
> $ gdb --args riscv32-softmmu/qemu-system-riscv32 -nodefaults -M sifive_e
> [...]
> (gdb) r
> Program received signal SIGSEGV, Segmentation fault.
> qemu_chr_fe_init ([...], s=s@entry=0x0, [...])
>     at chardev/char-fe.c:210
> 210         } else if (s->be) {
> (gdb) bt
>  0  0x00005555558695f8 in qemu_chr_fe_init ([...], s=s@entry=0x0, [...])
>     at chardev/char-fe.c:210
>  1  0x00005555556fb425 in sifive_uart_create ([...], chr=0x0, [...])
>     at hw/riscv/sifive_uart.c:169
>  2  0x00005555556f8cc4 in riscv_sifive_e_init (machine=[...])
>     at hw/riscv/sifive_e.c:164
>     [...]
>
> With "-nodefaults" there are no entries in serial_hds[], so qemu_chr_fe_init()
> finally tries to dereference a NULL pointer. Let's fix this problem by
> creating a "null" chardev in this case instead.

Isn't it possible to fix this another level further down
by having qemu_chr_fe_init() &c handle having a NULL chardev?
That would be nicer than requiring every UART model to handle
it, because inevitably people will forget about that corner case.

> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  For other boards / targets, see also:
>  https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg05073.html
>
>  hw/riscv/riscv_htif.c  | 5 +++++
>  hw/riscv/sifive_uart.c | 5 +++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/hw/riscv/riscv_htif.c b/hw/riscv/riscv_htif.c
> index 3e17f30..d3d31ff 100644
> --- a/hw/riscv/riscv_htif.c
> +++ b/hw/riscv/riscv_htif.c
> @@ -245,9 +245,14 @@ HTIFState *htif_mm_init(MemoryRegion *address_space, MemoryRegion *main_mem,
>      s->pending_read = 0;
>      s->allow_tohost = 0;
>      s->fromhost_inprogress = 0;
> +
> +    if (!chr) {
> +        chr = qemu_chardev_new(NULL, TYPE_CHARDEV_NULL, NULL, &error_abort);
> +    }
>      qemu_chr_fe_init(&s->chr, chr, &error_abort);
>      qemu_chr_fe_set_handlers(&s->chr, htif_can_recv, htif_recv, htif_event,
>          htif_be_change, s, NULL, true);
> +
>      if (base) {
>          memory_region_init_io(&s->mmio, NULL, &htif_mm_ops, s,
>                              TYPE_HTIF_UART, size);

As an aside: this source file looks like it's a device, but
it isn't a QOM device. That seems like it needs fixing, and it
should probably be in hw/char or hw/misc.

> diff --git a/hw/riscv/sifive_uart.c b/hw/riscv/sifive_uart.c
> index b0c3798..2bde8bb 100644
> --- a/hw/riscv/sifive_uart.c
> +++ b/hw/riscv/sifive_uart.c

If this is a UART why isn't it in hw/char ? It should also
be a proper QOM device, and moved to the right place in
the source tree.

> @@ -166,9 +166,14 @@ SiFiveUARTState *sifive_uart_create(MemoryRegion *address_space, hwaddr base,
>  {
>      SiFiveUARTState *s = g_malloc0(sizeof(SiFiveUARTState));
>      s->irq = irq;
> +
> +    if (!chr) {
> +        chr = qemu_chardev_new(NULL, TYPE_CHARDEV_NULL, NULL, &error_abort);
> +    }
>      qemu_chr_fe_init(&s->chr, chr, &error_abort);
>      qemu_chr_fe_set_handlers(&s->chr, uart_can_rx, uart_rx, uart_event,
>          uart_be_change, s, NULL, true);
> +
>      memory_region_init_io(&s->mmio, NULL, &uart_ops, s,
>                            TYPE_SIFIVE_UART, SIFIVE_UART_MAX);
>      memory_region_add_subregion(address_space, base, &s->mmio);
> --
> 1.8.3.1

thanks
-- PMM

Thomas Huth March 23, 2018, 12:31 p.m. UTC | #3

On 23.03.2018 10:56, Peter Maydell wrote:
> On 23 March 2018 at 08:36, Thomas Huth <thuth@redhat.com> wrote:
>> Most of the RISC-V boards currently crash when they are started with
>> "-nodefaults", e.g.:
>>
>> $ gdb --args riscv32-softmmu/qemu-system-riscv32 -nodefaults -M sifive_e
>> [...]
>> (gdb) r
>> Program received signal SIGSEGV, Segmentation fault.
>> qemu_chr_fe_init ([...], s=s@entry=0x0, [...])
>>     at chardev/char-fe.c:210
>> 210         } else if (s->be) {
>> (gdb) bt
>>  0  0x00005555558695f8 in qemu_chr_fe_init ([...], s=s@entry=0x0, [...])
>>     at chardev/char-fe.c:210
>>  1  0x00005555556fb425 in sifive_uart_create ([...], chr=0x0, [...])
>>     at hw/riscv/sifive_uart.c:169
>>  2  0x00005555556f8cc4 in riscv_sifive_e_init (machine=[...])
>>     at hw/riscv/sifive_e.c:164
>>     [...]
>>
>> With "-nodefaults" there are no entries in serial_hds[], so qemu_chr_fe_init()
>> finally tries to dereference a NULL pointer. Let's fix this problem by
>> creating a "null" chardev in this case instead.
> 
> Isn't it possible to fix this another level further down
> by having qemu_chr_fe_init() &c handle having a NULL chardev?

Not sure, ... I don't think that we should create a "null" device in
chardev/char-fe.c - that would sound like a layer violation to me.

So all we could do there is to set an error in "errp" or to simply
"ignore" the NULL pointer (so that b->chr simply gets set to NULL here).

In the first case, we'd run into abort()s instead, since lots of caller
call this function with errp = &error_abort. So that does also not sound
very user-friendly (unless we change the error_aborts into error_fatals
maybe).

In the second case, QEMU seems either bails out later in serial_mm_init
(when calling serial_realize_core()) with:

$ riscv32-softmmu/qemu-system-riscv32 -nodefaults -M virt
qemu-system-riscv32: Can't create serial device, empty char device

... or it seems to start in case of the other riscv machines. Not sure
whether we can trust all the remaining functions to deal correctly with
a NULL pointer in b->chr here though... I guess we could try and find
out later where there are other crashes in that case.

Opinions?

> That would be nicer than requiring every UART model to handle
> it, because inevitably people will forget about that corner case.

At least we need some more automatic qtest coverage here... (e.g.
https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg05033.html)

 Thomas

Peter Maydell March 23, 2018, 1:12 p.m. UTC | #4

On 23 March 2018 at 12:31, Thomas Huth <thuth@redhat.com> wrote:
> On 23.03.2018 10:56, Peter Maydell wrote:
>> On 23 March 2018 at 08:36, Thomas Huth <thuth@redhat.com> wrote:
>>> Most of the RISC-V boards currently crash when they are started with
>>> "-nodefaults", e.g.:
>>>
>>> $ gdb --args riscv32-softmmu/qemu-system-riscv32 -nodefaults -M sifive_e
>>> [...]
>>> (gdb) r
>>> Program received signal SIGSEGV, Segmentation fault.
>>> qemu_chr_fe_init ([...], s=s@entry=0x0, [...])
>>>     at chardev/char-fe.c:210
>>> 210         } else if (s->be) {
>>> (gdb) bt
>>>  0  0x00005555558695f8 in qemu_chr_fe_init ([...], s=s@entry=0x0, [...])
>>>     at chardev/char-fe.c:210
>>>  1  0x00005555556fb425 in sifive_uart_create ([...], chr=0x0, [...])
>>>     at hw/riscv/sifive_uart.c:169
>>>  2  0x00005555556f8cc4 in riscv_sifive_e_init (machine=[...])
>>>     at hw/riscv/sifive_e.c:164
>>>     [...]
>>>
>>> With "-nodefaults" there are no entries in serial_hds[], so qemu_chr_fe_init()
>>> finally tries to dereference a NULL pointer. Let's fix this problem by
>>> creating a "null" chardev in this case instead.
>>
>> Isn't it possible to fix this another level further down
>> by having qemu_chr_fe_init() &c handle having a NULL chardev?
>
> Not sure, ... I don't think that we should create a "null" device in
> chardev/char-fe.c - that would sound like a layer violation to me.

It already does treat a NULL Chardev* as being "do nothing"
most of the time: eg qemu_chr_fe_write(), qemu_chr_fe_read_all(),
qemu_chr_fe_ioctl(), qemu_chr_fe_set_handlers() &c all have
"if be->chr is NULL, do nothing" checks.

> So all we could do there is to set an error in "errp" or to simply
> "ignore" the NULL pointer (so that b->chr simply gets set to NULL here).

I think that the latter seems to be what the design intends:
a backend with a NULL chardev pointer is "backend with no
chardev attached at the moment". At least some of our UART devices
handle that as "just dump output into nothingness", which
seems logical to me.

Cc'ing Marc-André and Paolo as the chardev maintainers --
do you have a plan for what the intended design here is?

thanks
-- PMM

Paolo Bonzini March 23, 2018, 2:02 p.m. UTC | #5

On 23/03/2018 14:12, Peter Maydell wrote:
> I think that the latter seems to be what the design intends:
> a backend with a NULL chardev pointer is "backend with no
> chardev attached at the moment". At least some of our UART devices
> handle that as "just dump output into nothingness", which
> seems logical to me.
> 
> Cc'ing Marc-André and Paolo as the chardev maintainers --
> do you have a plan for what the intended design here is?

-nodefaults was added for x86 and other machines whose devices are
discoverable through PCI enumeration, ACPI or something like that, and
on those machine it suppresses the default devices.  For machines that
create their device tree automatically, it makes sense to do the same.

However, if the machine is emulating a real-world board with a fixed SoC
and fixed hardware in the SoC, it makes more sense to create a null backend.

Paolo

Peter Maydell March 23, 2018, 2:04 p.m. UTC | #6

On 23 March 2018 at 14:02, Paolo Bonzini <pbonzini@redhat.com> wrote:

> However, if the machine is emulating a real-world board with a fixed SoC
> and fixed hardware in the SoC, it makes more sense to create a null backend.

That's a lot of duplicate code to say "oh, this is NULL, create the
null backend" in lots of different boards :-(

thanks
-- PMM

Paolo Bonzini March 23, 2018, 2:13 p.m. UTC | #7

On 23/03/2018 15:04, Peter Maydell wrote:
>> However, if the machine is emulating a real-world board with a fixed SoC
>> and fixed hardware in the SoC, it makes more sense to create a null backend.
>
> That's a lot of duplicate code to say "oh, this is NULL, create the
> null backend" in lots of different boards :-(

Note it's a null "backend", not necessarily a null "character device".
Your proposal, namely ensuring that be->chr == NULL is handled properly
in qemu_chr_fe_init, would be just fine.

The main point was that the choice of whether to create the device is up
to the board, and there isn't a single answer valid for all boards.

Paolo

Peter Maydell March 23, 2018, 2:28 p.m. UTC | #8

On 23 March 2018 at 14:13, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 23/03/2018 15:04, Peter Maydell wrote:
>>> However, if the machine is emulating a real-world board with a fixed SoC
>>> and fixed hardware in the SoC, it makes more sense to create a null backend.
>>
>> That's a lot of duplicate code to say "oh, this is NULL, create the
>> null backend" in lots of different boards :-(
>
> Note it's a null "backend", not necessarily a null "character device".
> Your proposal, namely ensuring that be->chr == NULL is handled properly
> in qemu_chr_fe_init, would be just fine.

Hmm, the chardev layer code seems to have more ends than I expect.
The "frontend" is the UART model, right, and I thought the
"backend" was the TCP/UDP/serial port/stdio/etc end of things,
but those seem to be Chardevs? If those aren't the backend,
then what is?

What I'd like, anyway, is that every UART model can cope with being
setup with a NULL 'chardev' property, and ideally that that doesn't
require a lot of extra code per-UART, and doesn't require each UART
to create a TYPE_CHARDEV_NULL.

> The main point was that the choice of whether to create the device is up
> to the board, and there isn't a single answer valid for all boards.

I agree with that part, yes, assuming I have the terminology right now.

thanks
-- PMM

Paolo Bonzini March 23, 2018, 3:03 p.m. UTC | #9

On 23/03/2018 15:28, Peter Maydell wrote:
>> Note it's a null "backend", not necessarily a null "character device".
>> Your proposal, namely ensuring that be->chr == NULL is handled properly
>> in qemu_chr_fe_init, would be just fine.
> 
> Hmm, the chardev layer code seems to have more ends than I expect.
> The "frontend" is the UART model, right, and I thought the
> "backend" was the TCP/UDP/serial port/stdio/etc end of things,
> but those seem to be Chardevs? If those aren't the backend,
> then what is?

The naming of Chardev/CharBackend was modeled after
BlockDriverState/BlockBackend, and it's not great but at least it's
consistent. :/

The CharBackend struct essentially connects a Chardev and its user.  It
ought to be possible for the Chardev to be NULL.

Paolo

> What I'd like, anyway, is that every UART model can cope with being
> setup with a NULL 'chardev' property, and ideally that that doesn't
> require a lot of extra code per-UART, and doesn't require each UART
> to create a TYPE_CHARDEV_NULL.
>

[for-2.12] hw/riscv: Fix crashes with "-nodefaults"

Commit Message

Comments

Patch