mbox

[PULL,00/30] ppc-for-5.2 queue 20200904

Message ID 20200904034719.673626-1-david@gibson.dropbear.id.au (mailing list archive)
State New, archived
Headers show

Pull-request

git://github.com/dgibson/qemu.git tags/ppc-for-5.2-20200904

Message

David Gibson Sept. 4, 2020, 3:46 a.m. UTC
The following changes since commit 67a7bfe560a1bba59efab085cb3430f45176d382:

  Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2020-09-03' into staging (2020-09-03 16:58:25 +0100)

are available in the Git repository at:

  git://github.com/dgibson/qemu.git tags/ppc-for-5.2-20200904

for you to fetch changes up to b172606ecf29a140073f7787251a9d70ecb53b6e:

  spapr_numa: move NVLink2 associativity handling to spapr_numa.c (2020-09-04 13:40:09 +1000)

----------------------------------------------------------------
ppc patch queue 2020-09-04

Next pull request for qemu-5.2.  The biggest thing here is the
generalization of ARM's start-powered-off machine property to all
targets.  This can fix a number of odd little edge cases where KVM
could run vcpus before they were properly initialized.  This does
include changes to a number of files that aren't normally in my
purview.  There are suitable Acked-by lines and Peter requested this
come in via my tree, since the most pressing requirement for it is in
pseries machines with the POWER secure virtual machine facility.

In addition we have:
 * The start of Daniel Barboza's rework and clean up of pseries
   machine NUMA handling
 * Correction to behaviour of the nvdimm= generic machine property on
   pseries
 * An optimization to the allocation of XIVE interrupts on KVM
 * Some fixes for confused behaviour with kernel_irqchip when both
   XICS and XIVE are in play
 * Add HIOMAP comamnd to pnv flash
 * Properly advertise the fact that spapr_vscsi doesn't handle
   hotplugged disks
 * Some assorted minor enhancements

----------------------------------------------------------------
Cédric Le Goater (8):
      ppc/pnv: Fix TypeInfo of PnvLpcController abstract class
      spapr/xive: Add a 'hv-prio' property to represent the KVM escalation priority
      ppc/pnv: Add a HIOMAP erase command
      spapr/xive: Use the xics flag to check for XIVE-only IRQ backends
      spapr/xive: Modify kvm_cpu_is_enabled() interface
      spapr/xive: Use kvmppc_xive_source_reset() in post_load
      spapr/xive: Allocate IPIs independently from the other sources
      spapr/xive: Allocate vCPU IPIs from the vCPU contexts

Daniel Henrique Barboza (10):
      spapr_vscsi: do not allow device hotplug
      ppc/spapr_nvdimm: use g_autofree in spapr_nvdimm_validate_opts()
      spapr, spapr_nvdimm: fold NVDIMM validation in the same place
      ppc/spapr_nvdimm: do not enable support with 'nvdimm=off'
      ppc: introducing spapr_numa.c NUMA code helper
      ppc/spapr_nvdimm: turn spapr_dt_nvdimm() static
      spapr: introduce SpaprMachineState::numa_assoc_array
      spapr, spapr_numa: handle vcpu ibm,associativity
      spapr, spapr_numa: move lookup-arrays handling to spapr_numa.c
      spapr_numa: move NVLink2 associativity handling to spapr_numa.c

David Gibson (2):
      adb: Correct class size on TYPE_ADB_DEVICE
      spapr: Remove unnecessary DRC type-checker macros

Philippe Mathieu-Daudé (2):
      hw/ppc/ppc4xx_pci: Use ARRAY_SIZE() instead of magic value
      hw/ppc/ppc4xx_pci: Replace pointless warning by assert()

Thiago Jung Bauermann (8):
      target/arm: Move start-powered-off property to generic CPUState
      target/arm: Move setting of CPU halted state to generic code
      ppc/spapr: Use start-powered-off CPUState property
      ppc/e500: Use start-powered-off CPUState property
      mips/cps: Use start-powered-off CPUState property
      sparc/sun4m: Don't set cs->halted = 0 in main_cpu_reset()
      sparc/sun4m: Use start-powered-off CPUState property
      target/s390x: Use start-powered-off CPUState property

 exec.c                        |   1 +
 hw/core/cpu.c                 |   2 +-
 hw/input/adb.c                |   1 +
 hw/intc/spapr_xive.c          |  33 ++++-----
 hw/intc/spapr_xive_kvm.c      | 102 +++++++++++++++++++++-----
 hw/mips/cps.c                 |  15 +++-
 hw/ppc/e500.c                 |  13 +++-
 hw/ppc/meson.build            |   3 +-
 hw/ppc/pnv_bmc.c              |  29 +++++++-
 hw/ppc/pnv_lpc.c              |   3 +-
 hw/ppc/ppc4xx_pci.c           |   8 +-
 hw/ppc/spapr.c                | 109 ++++++---------------------
 hw/ppc/spapr_cpu_core.c       |  10 +--
 hw/ppc/spapr_irq.c            |   2 +-
 hw/ppc/spapr_numa.c           | 167 ++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_nvdimm.c         |  68 ++++++++++-------
 hw/ppc/spapr_pci.c            |   9 +--
 hw/ppc/spapr_pci_nvlink2.c    |  20 +----
 hw/scsi/spapr_vscsi.c         |   3 +
 hw/sparc/sun4m.c              |  26 ++-----
 include/hw/core/cpu.h         |   4 +
 include/hw/ipmi/ipmi.h        |   1 +
 include/hw/ppc/spapr.h        |  12 +++
 include/hw/ppc/spapr_drc.h    |  43 +----------
 include/hw/ppc/spapr_numa.h   |  35 +++++++++
 include/hw/ppc/spapr_nvdimm.h |   7 +-
 include/hw/ppc/spapr_xive.h   |   2 +
 target/arm/cpu.c              |   4 +-
 target/arm/cpu.h              |   3 -
 target/arm/kvm32.c            |   2 +-
 target/arm/kvm64.c            |   2 +-
 target/s390x/cpu.c            |   2 +-
 32 files changed, 468 insertions(+), 273 deletions(-)
 create mode 100644 hw/ppc/spapr_numa.c
 create mode 100644 include/hw/ppc/spapr_numa.h

Comments

Peter Maydell Sept. 6, 2020, 3:20 p.m. UTC | #1
On Fri, 4 Sep 2020 at 04:47, David Gibson <david@gibson.dropbear.id.au> wrote:
>
> The following changes since commit 67a7bfe560a1bba59efab085cb3430f45176d382:
>
>   Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2020-09-03' into staging (2020-09-03 16:58:25 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/dgibson/qemu.git tags/ppc-for-5.2-20200904
>
> for you to fetch changes up to b172606ecf29a140073f7787251a9d70ecb53b6e:
>
>   spapr_numa: move NVLink2 associativity handling to spapr_numa.c (2020-09-04 13:40:09 +1000)
>
> ----------------------------------------------------------------
> ppc patch queue 2020-09-04
>
> Next pull request for qemu-5.2.  The biggest thing here is the
> generalization of ARM's start-powered-off machine property to all
> targets.  This can fix a number of odd little edge cases where KVM
> could run vcpus before they were properly initialized.  This does
> include changes to a number of files that aren't normally in my
> purview.  There are suitable Acked-by lines and Peter requested this
> come in via my tree, since the most pressing requirement for it is in
> pseries machines with the POWER secure virtual machine facility.
>
> In addition we have:
>  * The start of Daniel Barboza's rework and clean up of pseries
>    machine NUMA handling
>  * Correction to behaviour of the nvdimm= generic machine property on
>    pseries
>  * An optimization to the allocation of XIVE interrupts on KVM
>  * Some fixes for confused behaviour with kernel_irqchip when both
>    XICS and XIVE are in play
>  * Add HIOMAP comamnd to pnv flash
>  * Properly advertise the fact that spapr_vscsi doesn't handle
>    hotplugged disks
>  * Some assorted minor enhancements

Hi -- this fails to build for Windows:

../../hw/ppc/spapr_numa.c: In function 'spapr_numa_fixup_cpu_dt':
../../hw/ppc/spapr_numa.c:77:5: error: unknown type name 'uint'
     uint vcpu_assoc_size = NUMA_ASSOC_SIZE + 1;
     ^

That should probably be using one of the standard C types.

The 'check-tcg' tests for the linux-user static build also
failed on an s390x test:

  CHECK   debian-s390x-cross
  BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
  RUN     tests for s390x
  TEST    threadcount on s390x
Unhandled trap: 0x10003
PSW=mask 0000000180000000 addr 00000000010004f0 cc 00
R00=0000000000000000 R01=0000000000000000 R02=0000000000000000
R03=0000000000000000
R04=0000000000000000 R05=0000000000000000 R06=0000000000000000
R07=0000000000000000
R08=0000000000000000 R09=0000000000000000 R10=0000000000000000
R11=0000000000000000
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000
R15=00000040008006c0

../Makefile.target:153: recipe for target 'run-threadcount' failed
make[2]: *** [run-threadcount] Error 1


thanks
-- PMM
David Gibson Sept. 7, 2020, 2:38 a.m. UTC | #2
On Sun, Sep 06, 2020 at 04:20:10PM +0100, Peter Maydell wrote:
> On Fri, 4 Sep 2020 at 04:47, David Gibson <david@gibson.dropbear.id.au> wrote:
> >
> > The following changes since commit 67a7bfe560a1bba59efab085cb3430f45176d382:
> >
> >   Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2020-09-03' into staging (2020-09-03 16:58:25 +0100)
> >
> > are available in the Git repository at:
> >
> >   git://github.com/dgibson/qemu.git tags/ppc-for-5.2-20200904
> >
> > for you to fetch changes up to b172606ecf29a140073f7787251a9d70ecb53b6e:
> >
> >   spapr_numa: move NVLink2 associativity handling to spapr_numa.c (2020-09-04 13:40:09 +1000)
> >
> > ----------------------------------------------------------------
> > ppc patch queue 2020-09-04
> >
> > Next pull request for qemu-5.2.  The biggest thing here is the
> > generalization of ARM's start-powered-off machine property to all
> > targets.  This can fix a number of odd little edge cases where KVM
> > could run vcpus before they were properly initialized.  This does
> > include changes to a number of files that aren't normally in my
> > purview.  There are suitable Acked-by lines and Peter requested this
> > come in via my tree, since the most pressing requirement for it is in
> > pseries machines with the POWER secure virtual machine facility.
> >
> > In addition we have:
> >  * The start of Daniel Barboza's rework and clean up of pseries
> >    machine NUMA handling
> >  * Correction to behaviour of the nvdimm= generic machine property on
> >    pseries
> >  * An optimization to the allocation of XIVE interrupts on KVM
> >  * Some fixes for confused behaviour with kernel_irqchip when both
> >    XICS and XIVE are in play
> >  * Add HIOMAP comamnd to pnv flash
> >  * Properly advertise the fact that spapr_vscsi doesn't handle
> >    hotplugged disks
> >  * Some assorted minor enhancements
> 
> Hi -- this fails to build for Windows:
> 
> ../../hw/ppc/spapr_numa.c: In function 'spapr_numa_fixup_cpu_dt':
> ../../hw/ppc/spapr_numa.c:77:5: error: unknown type name 'uint'
>      uint vcpu_assoc_size = NUMA_ASSOC_SIZE + 1;
>      ^

Huh, that's weird.  My testing run was less thorough than I'd usually
do, because so many tests were broken on the master branch, but I was
pretty sure I did do successful mingw builds.

> That should probably be using one of the standard C types.

Done.

> The 'check-tcg' tests for the linux-user static build also
> failed on an s390x test:
> 
>   CHECK   debian-s390x-cross
>   BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
>   RUN     tests for s390x
>   TEST    threadcount on s390x
> Unhandled trap: 0x10003
> PSW=mask 0000000180000000 addr 00000000010004f0 cc 00
> R00=0000000000000000 R01=0000000000000000 R02=0000000000000000
> R03=0000000000000000
> R04=0000000000000000 R05=0000000000000000 R06=0000000000000000
> R07=0000000000000000
> R08=0000000000000000 R09=0000000000000000 R10=0000000000000000
> R11=0000000000000000
> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000
> R15=00000040008006c0
> 
> ../Makefile.target:153: recipe for target 'run-threadcount' failed
> make[2]: *** [run-threadcount] Error 1

Bother.  I did see that failure on Travis, but assumed it was a false
positive because there were so many failures on master there.
Laurent Vivier Sept. 7, 2020, 1:29 p.m. UTC | #3
On 07/09/2020 04:38, David Gibson wrote:
> On Sun, Sep 06, 2020 at 04:20:10PM +0100, Peter Maydell wrote:
>> On Fri, 4 Sep 2020 at 04:47, David Gibson <david@gibson.dropbear.id.au> wrote:
>>>
>>> The following changes since commit 67a7bfe560a1bba59efab085cb3430f45176d382:
>>>
>>>   Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2020-09-03' into staging (2020-09-03 16:58:25 +0100)
>>>
>>> are available in the Git repository at:
>>>
>>>   git://github.com/dgibson/qemu.git tags/ppc-for-5.2-20200904
>>>
>>> for you to fetch changes up to b172606ecf29a140073f7787251a9d70ecb53b6e:
>>>
>>>   spapr_numa: move NVLink2 associativity handling to spapr_numa.c (2020-09-04 13:40:09 +1000)
>>>
>>> ----------------------------------------------------------------
>>> ppc patch queue 2020-09-04
>>>
>>> Next pull request for qemu-5.2.  The biggest thing here is the
>>> generalization of ARM's start-powered-off machine property to all
>>> targets.  This can fix a number of odd little edge cases where KVM
>>> could run vcpus before they were properly initialized.  This does
>>> include changes to a number of files that aren't normally in my
>>> purview.  There are suitable Acked-by lines and Peter requested this
>>> come in via my tree, since the most pressing requirement for it is in
>>> pseries machines with the POWER secure virtual machine facility.
>>>
>>> In addition we have:
>>>  * The start of Daniel Barboza's rework and clean up of pseries
>>>    machine NUMA handling
>>>  * Correction to behaviour of the nvdimm= generic machine property on
>>>    pseries
>>>  * An optimization to the allocation of XIVE interrupts on KVM
>>>  * Some fixes for confused behaviour with kernel_irqchip when both
>>>    XICS and XIVE are in play
>>>  * Add HIOMAP comamnd to pnv flash
>>>  * Properly advertise the fact that spapr_vscsi doesn't handle
>>>    hotplugged disks
>>>  * Some assorted minor enhancements
>>
>> Hi -- this fails to build for Windows:
>>
>> ../../hw/ppc/spapr_numa.c: In function 'spapr_numa_fixup_cpu_dt':
>> ../../hw/ppc/spapr_numa.c:77:5: error: unknown type name 'uint'
>>      uint vcpu_assoc_size = NUMA_ASSOC_SIZE + 1;
>>      ^
> 
> Huh, that's weird.  My testing run was less thorough than I'd usually
> do, because so many tests were broken on the master branch, but I was
> pretty sure I did do successful mingw builds.
> 
>> That should probably be using one of the standard C types.
> 
> Done.
> 
>> The 'check-tcg' tests for the linux-user static build also
>> failed on an s390x test:
>>
>>   CHECK   debian-s390x-cross
>>   BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
>>   RUN     tests for s390x
>>   TEST    threadcount on s390x
>> Unhandled trap: 0x10003

This is EXCP_HALTED (include/exec/cpu-all.h)

The message error comes from cpu_loop() in linux-user/s390x/cpu_loop.c.

The trap can only come from accel/tcg/cpu-exec.c

    679 int cpu_exec(CPUState *cpu)
    680 {
...
    688     if (cpu_handle_halt(cpu)) {
    689         return EXCP_HALTED;
    690     }

and

    428 static inline bool cpu_handle_halt(CPUState *cpu)
    429 {
    430     if (cpu->halted) {
...
    441         if (!cpu_has_work(cpu)) {
    442             return true;
    443         }

and

     58 static bool s390_cpu_has_work(CPUState *cs)
     59 {
     60     S390CPU *cpu = S390_CPU(cs);
     61
     62     /* STOPPED cpus can never wake up */
     63     if (s390_cpu_get_state(cpu) != S390_CPU_STATE_LOAD &&
     64         s390_cpu_get_state(cpu) != S390_CPU_STATE_OPERATING) {
     65         return false;
     66     }
     67
     68     if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
     69         return false;
     70     }
     71
     72     return s390_cpu_has_int(cpu);
     73 }

and in target/s390x/cpu.h:

    772 #ifndef CONFIG_USER_ONLY
    773 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
    774 #else
    775 static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
S390CPU *cpu)
    776 {
    777     return 0;
    778 }
    779 #endif /* CONFIG_USER_ONLY */
    780 static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
    781 {
    782     return cpu->env.cpu_state;
    783 }

As cpu_state is never set, perhaps in case of linux-user it should
always return S390_CPU_STATE_OPERATING?

Something like:

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 035427521cec..8a8628fcdcc6 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -771,16 +771,20 @@ int s390_assign_subch_ioeventfd(EventNotifier
*notifier, uint32_t sch_id,
                                 int vq, bool assign);
 #ifndef CONFIG_USER_ONLY
 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
+static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
+{
+    return cpu->env.cpu_state;
+}
 #else
 static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
S390CPU *cpu)
 {
     return 0;
 }
-#endif /* CONFIG_USER_ONLY */
 static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
 {
-    return cpu->env.cpu_state;
+    return S390_CPU_STATE_OPERATING;
 }
+#endif /* CONFIG_USER_ONLY */

Thanks,
Laurent

>> PSW=mask 0000000180000000 addr 00000000010004f0 cc 00
>> R00=0000000000000000 R01=0000000000000000 R02=0000000000000000
>> R03=0000000000000000
>> R04=0000000000000000 R05=0000000000000000 R06=0000000000000000
>> R07=0000000000000000
>> R08=0000000000000000 R09=0000000000000000 R10=0000000000000000
>> R11=0000000000000000
>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000
>> R15=00000040008006c0
>>
>> ../Makefile.target:153: recipe for target 'run-threadcount' failed
>> make[2]: *** [run-threadcount] Error 1
> 
> Bother.  I did see that failure on Travis, but assumed it was a false
> positive because there were so many failures on master there.
>
Philippe Mathieu-Daudé Sept. 7, 2020, 2:05 p.m. UTC | #4
Hi Thiago,

On 9/7/20 3:29 PM, Laurent Vivier wrote:
> On 07/09/2020 04:38, David Gibson wrote:
>> On Sun, Sep 06, 2020 at 04:20:10PM +0100, Peter Maydell wrote:
>>> On Fri, 4 Sep 2020 at 04:47, David Gibson <david@gibson.dropbear.id.au> wrote:
>>>>
>>>> The following changes since commit 67a7bfe560a1bba59efab085cb3430f45176d382:
>>>>
>>>>   Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2020-09-03' into staging (2020-09-03 16:58:25 +0100)
>>>>
>>>> are available in the Git repository at:
>>>>
>>>>   git://github.com/dgibson/qemu.git tags/ppc-for-5.2-20200904
>>>>
>>>> for you to fetch changes up to b172606ecf29a140073f7787251a9d70ecb53b6e:
>>>>
>>>>   spapr_numa: move NVLink2 associativity handling to spapr_numa.c (2020-09-04 13:40:09 +1000)
>>>>
>>>> ----------------------------------------------------------------
>>>> ppc patch queue 2020-09-04
>>>>
>>>> Next pull request for qemu-5.2.  The biggest thing here is the
>>>> generalization of ARM's start-powered-off machine property to all
>>>> targets.  This can fix a number of odd little edge cases where KVM
>>>> could run vcpus before they were properly initialized.  This does
>>>> include changes to a number of files that aren't normally in my
>>>> purview.  There are suitable Acked-by lines and Peter requested this
>>>> come in via my tree, since the most pressing requirement for it is in
>>>> pseries machines with the POWER secure virtual machine facility.
>>>>
>>>> In addition we have:
>>>>  * The start of Daniel Barboza's rework and clean up of pseries
>>>>    machine NUMA handling
>>>>  * Correction to behaviour of the nvdimm= generic machine property on
>>>>    pseries
>>>>  * An optimization to the allocation of XIVE interrupts on KVM
>>>>  * Some fixes for confused behaviour with kernel_irqchip when both
>>>>    XICS and XIVE are in play
>>>>  * Add HIOMAP comamnd to pnv flash
>>>>  * Properly advertise the fact that spapr_vscsi doesn't handle
>>>>    hotplugged disks
>>>>  * Some assorted minor enhancements
>>>
>>> Hi -- this fails to build for Windows:
>>>
>>> ../../hw/ppc/spapr_numa.c: In function 'spapr_numa_fixup_cpu_dt':
>>> ../../hw/ppc/spapr_numa.c:77:5: error: unknown type name 'uint'
>>>      uint vcpu_assoc_size = NUMA_ASSOC_SIZE + 1;
>>>      ^
>>
>> Huh, that's weird.  My testing run was less thorough than I'd usually
>> do, because so many tests were broken on the master branch, but I was
>> pretty sure I did do successful mingw builds.
>>
>>> That should probably be using one of the standard C types.
>>
>> Done.
>>
>>> The 'check-tcg' tests for the linux-user static build also
>>> failed on an s390x test:
>>>
>>>   CHECK   debian-s390x-cross
>>>   BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
>>>   RUN     tests for s390x
>>>   TEST    threadcount on s390x
>>> Unhandled trap: 0x10003
> 
> This is EXCP_HALTED (include/exec/cpu-all.h)
> 
> The message error comes from cpu_loop() in linux-user/s390x/cpu_loop.c.
> 
> The trap can only come from accel/tcg/cpu-exec.c
> 
>     679 int cpu_exec(CPUState *cpu)
>     680 {
> ...
>     688     if (cpu_handle_halt(cpu)) {
>     689         return EXCP_HALTED;
>     690     }
> 
> and
> 
>     428 static inline bool cpu_handle_halt(CPUState *cpu)
>     429 {
>     430     if (cpu->halted) {
> ...
>     441         if (!cpu_has_work(cpu)) {
>     442             return true;
>     443         }
> 
> and
> 
>      58 static bool s390_cpu_has_work(CPUState *cs)
>      59 {
>      60     S390CPU *cpu = S390_CPU(cs);
>      61
>      62     /* STOPPED cpus can never wake up */
>      63     if (s390_cpu_get_state(cpu) != S390_CPU_STATE_LOAD &&
>      64         s390_cpu_get_state(cpu) != S390_CPU_STATE_OPERATING) {
>      65         return false;
>      66     }
>      67
>      68     if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>      69         return false;
>      70     }
>      71
>      72     return s390_cpu_has_int(cpu);
>      73 }
> 
> and in target/s390x/cpu.h:
> 
>     772 #ifndef CONFIG_USER_ONLY
>     773 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
>     774 #else
>     775 static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
> S390CPU *cpu)
>     776 {
>     777     return 0;
>     778 }
>     779 #endif /* CONFIG_USER_ONLY */
>     780 static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
>     781 {
>     782     return cpu->env.cpu_state;
>     783 }
> 
> As cpu_state is never set, perhaps in case of linux-user it should
> always return S390_CPU_STATE_OPERATING?
> 
> Something like:
> 
> diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
> index 035427521cec..8a8628fcdcc6 100644
> --- a/target/s390x/cpu.h
> +++ b/target/s390x/cpu.h
> @@ -771,16 +771,20 @@ int s390_assign_subch_ioeventfd(EventNotifier
> *notifier, uint32_t sch_id,
>                                  int vq, bool assign);
>  #ifndef CONFIG_USER_ONLY
>  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
> +static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
> +{
> +    return cpu->env.cpu_state;
> +}
>  #else
>  static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
> S390CPU *cpu)
>  {
>      return 0;
>  }
> -#endif /* CONFIG_USER_ONLY */
>  static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
>  {
> -    return cpu->env.cpu_state;
> +    return S390_CPU_STATE_OPERATING;
>  }
> +#endif /* CONFIG_USER_ONLY */

Since this is the effect of your "target/s390x: Use start-powered-off
CPUState property" patch, can you have a look please?

> 
> Thanks,
> Laurent
> 
>>> PSW=mask 0000000180000000 addr 00000000010004f0 cc 00
>>> R00=0000000000000000 R01=0000000000000000 R02=0000000000000000
>>> R03=0000000000000000
>>> R04=0000000000000000 R05=0000000000000000 R06=0000000000000000
>>> R07=0000000000000000
>>> R08=0000000000000000 R09=0000000000000000 R10=0000000000000000
>>> R11=0000000000000000
>>> R12=0000000000000000 R13=0000000000000000 R14=0000000000000000
>>> R15=00000040008006c0
>>>
>>> ../Makefile.target:153: recipe for target 'run-threadcount' failed
>>> make[2]: *** [run-threadcount] Error 1
>>
>> Bother.  I did see that failure on Travis, but assumed it was a false
>> positive because there were so many failures on master there.
>>
> 
>
Laurent Vivier Sept. 7, 2020, 2:31 p.m. UTC | #5
On 07/09/2020 16:05, Philippe Mathieu-Daudé wrote:
> Hi Thiago,
> 
> On 9/7/20 3:29 PM, Laurent Vivier wrote:
>> On 07/09/2020 04:38, David Gibson wrote:
>>> On Sun, Sep 06, 2020 at 04:20:10PM +0100, Peter Maydell wrote:
>>>> On Fri, 4 Sep 2020 at 04:47, David Gibson <david@gibson.dropbear.id.au> wrote:
>>>>>
>>>>> The following changes since commit 67a7bfe560a1bba59efab085cb3430f45176d382:
>>>>>
>>>>>   Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2020-09-03' into staging (2020-09-03 16:58:25 +0100)
>>>>>
>>>>> are available in the Git repository at:
>>>>>
>>>>>   git://github.com/dgibson/qemu.git tags/ppc-for-5.2-20200904
>>>>>
>>>>> for you to fetch changes up to b172606ecf29a140073f7787251a9d70ecb53b6e:
>>>>>
>>>>>   spapr_numa: move NVLink2 associativity handling to spapr_numa.c (2020-09-04 13:40:09 +1000)
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> ppc patch queue 2020-09-04
>>>>>
>>>>> Next pull request for qemu-5.2.  The biggest thing here is the
>>>>> generalization of ARM's start-powered-off machine property to all
>>>>> targets.  This can fix a number of odd little edge cases where KVM
>>>>> could run vcpus before they were properly initialized.  This does
>>>>> include changes to a number of files that aren't normally in my
>>>>> purview.  There are suitable Acked-by lines and Peter requested this
>>>>> come in via my tree, since the most pressing requirement for it is in
>>>>> pseries machines with the POWER secure virtual machine facility.
>>>>>
>>>>> In addition we have:
>>>>>  * The start of Daniel Barboza's rework and clean up of pseries
>>>>>    machine NUMA handling
>>>>>  * Correction to behaviour of the nvdimm= generic machine property on
>>>>>    pseries
>>>>>  * An optimization to the allocation of XIVE interrupts on KVM
>>>>>  * Some fixes for confused behaviour with kernel_irqchip when both
>>>>>    XICS and XIVE are in play
>>>>>  * Add HIOMAP comamnd to pnv flash
>>>>>  * Properly advertise the fact that spapr_vscsi doesn't handle
>>>>>    hotplugged disks
>>>>>  * Some assorted minor enhancements
>>>>
>>>> Hi -- this fails to build for Windows:
>>>>
>>>> ../../hw/ppc/spapr_numa.c: In function 'spapr_numa_fixup_cpu_dt':
>>>> ../../hw/ppc/spapr_numa.c:77:5: error: unknown type name 'uint'
>>>>      uint vcpu_assoc_size = NUMA_ASSOC_SIZE + 1;
>>>>      ^
>>>
>>> Huh, that's weird.  My testing run was less thorough than I'd usually
>>> do, because so many tests were broken on the master branch, but I was
>>> pretty sure I did do successful mingw builds.
>>>
>>>> That should probably be using one of the standard C types.
>>>
>>> Done.
>>>
>>>> The 'check-tcg' tests for the linux-user static build also
>>>> failed on an s390x test:
>>>>
>>>>   CHECK   debian-s390x-cross
>>>>   BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
>>>>   RUN     tests for s390x
>>>>   TEST    threadcount on s390x
>>>> Unhandled trap: 0x10003
>>
>> This is EXCP_HALTED (include/exec/cpu-all.h)
>>
>> The message error comes from cpu_loop() in linux-user/s390x/cpu_loop.c.
>>
>> The trap can only come from accel/tcg/cpu-exec.c
>>
>>     679 int cpu_exec(CPUState *cpu)
>>     680 {
>> ...
>>     688     if (cpu_handle_halt(cpu)) {
>>     689         return EXCP_HALTED;
>>     690     }
>>
>> and
>>
>>     428 static inline bool cpu_handle_halt(CPUState *cpu)
>>     429 {
>>     430     if (cpu->halted) {
>> ...
>>     441         if (!cpu_has_work(cpu)) {
>>     442             return true;
>>     443         }
>>
>> and
>>
>>      58 static bool s390_cpu_has_work(CPUState *cs)
>>      59 {
>>      60     S390CPU *cpu = S390_CPU(cs);
>>      61
>>      62     /* STOPPED cpus can never wake up */
>>      63     if (s390_cpu_get_state(cpu) != S390_CPU_STATE_LOAD &&
>>      64         s390_cpu_get_state(cpu) != S390_CPU_STATE_OPERATING) {
>>      65         return false;
>>      66     }
>>      67
>>      68     if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>>      69         return false;
>>      70     }
>>      71
>>      72     return s390_cpu_has_int(cpu);
>>      73 }
>>
>> and in target/s390x/cpu.h:
>>
>>     772 #ifndef CONFIG_USER_ONLY
>>     773 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
>>     774 #else
>>     775 static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
>> S390CPU *cpu)
>>     776 {
>>     777     return 0;
>>     778 }
>>     779 #endif /* CONFIG_USER_ONLY */
>>     780 static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
>>     781 {
>>     782     return cpu->env.cpu_state;
>>     783 }
>>
>> As cpu_state is never set, perhaps in case of linux-user it should
>> always return S390_CPU_STATE_OPERATING?
>>
>> Something like:
>>
>> diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
>> index 035427521cec..8a8628fcdcc6 100644
>> --- a/target/s390x/cpu.h
>> +++ b/target/s390x/cpu.h
>> @@ -771,16 +771,20 @@ int s390_assign_subch_ioeventfd(EventNotifier
>> *notifier, uint32_t sch_id,
>>                                  int vq, bool assign);
>>  #ifndef CONFIG_USER_ONLY
>>  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
>> +static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
>> +{
>> +    return cpu->env.cpu_state;
>> +}
>>  #else
>>  static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
>> S390CPU *cpu)
>>  {
>>      return 0;
>>  }
>> -#endif /* CONFIG_USER_ONLY */
>>  static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
>>  {
>> -    return cpu->env.cpu_state;
>> +    return S390_CPU_STATE_OPERATING;
>>  }
>> +#endif /* CONFIG_USER_ONLY */
> 
> Since this is the effect of your "target/s390x: Use start-powered-off
> CPUState property" patch, can you have a look please?
> 

For information, the problem appears only with "--enable-debug-tcg" (I
have also "--static --enable-linux-user --disable-system --disable-tools").

CC: David Hildenbrand (s390 TCG CPU maintainer)

Thanks,
Laurent
Cornelia Huck Sept. 7, 2020, 2:51 p.m. UTC | #6
On Mon, 7 Sep 2020 16:31:24 +0200
Laurent Vivier <lvivier@redhat.com> wrote:

> On 07/09/2020 16:05, Philippe Mathieu-Daudé wrote:
> > Hi Thiago,
> > 
> > On 9/7/20 3:29 PM, Laurent Vivier wrote:  
> >> On 07/09/2020 04:38, David Gibson wrote:  
> >>> On Sun, Sep 06, 2020 at 04:20:10PM +0100, Peter Maydell wrote:  

> >>>> The 'check-tcg' tests for the linux-user static build also
> >>>> failed on an s390x test:
> >>>>
> >>>>   CHECK   debian-s390x-cross
> >>>>   BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
> >>>>   RUN     tests for s390x
> >>>>   TEST    threadcount on s390x
> >>>> Unhandled trap: 0x10003  
> >>
> >> This is EXCP_HALTED (include/exec/cpu-all.h)
> >>
> >> The message error comes from cpu_loop() in linux-user/s390x/cpu_loop.c.
> >>
> >> The trap can only come from accel/tcg/cpu-exec.c
> >>
> >>     679 int cpu_exec(CPUState *cpu)
> >>     680 {
> >> ...
> >>     688     if (cpu_handle_halt(cpu)) {
> >>     689         return EXCP_HALTED;
> >>     690     }
> >>
> >> and
> >>
> >>     428 static inline bool cpu_handle_halt(CPUState *cpu)
> >>     429 {
> >>     430     if (cpu->halted) {
> >> ...
> >>     441         if (!cpu_has_work(cpu)) {
> >>     442             return true;
> >>     443         }
> >>
> >> and
> >>
> >>      58 static bool s390_cpu_has_work(CPUState *cs)
> >>      59 {
> >>      60     S390CPU *cpu = S390_CPU(cs);
> >>      61
> >>      62     /* STOPPED cpus can never wake up */
> >>      63     if (s390_cpu_get_state(cpu) != S390_CPU_STATE_LOAD &&
> >>      64         s390_cpu_get_state(cpu) != S390_CPU_STATE_OPERATING) {
> >>      65         return false;
> >>      66     }
> >>      67
> >>      68     if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
> >>      69         return false;
> >>      70     }
> >>      71
> >>      72     return s390_cpu_has_int(cpu);
> >>      73 }
> >>
> >> and in target/s390x/cpu.h:
> >>
> >>     772 #ifndef CONFIG_USER_ONLY
> >>     773 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
> >>     774 #else
> >>     775 static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
> >> S390CPU *cpu)
> >>     776 {
> >>     777     return 0;
> >>     778 }
> >>     779 #endif /* CONFIG_USER_ONLY */
> >>     780 static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
> >>     781 {
> >>     782     return cpu->env.cpu_state;
> >>     783 }
> >>
> >> As cpu_state is never set, perhaps in case of linux-user it should
> >> always return S390_CPU_STATE_OPERATING?

Possibly, we should not have any state handling for linux-user.

> >>
> >> Something like:
> >>
> >> diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
> >> index 035427521cec..8a8628fcdcc6 100644
> >> --- a/target/s390x/cpu.h
> >> +++ b/target/s390x/cpu.h
> >> @@ -771,16 +771,20 @@ int s390_assign_subch_ioeventfd(EventNotifier
> >> *notifier, uint32_t sch_id,
> >>                                  int vq, bool assign);
> >>  #ifndef CONFIG_USER_ONLY
> >>  unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
> >> +static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
> >> +{
> >> +    return cpu->env.cpu_state;
> >> +}
> >>  #else
> >>  static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
> >> S390CPU *cpu)
> >>  {
> >>      return 0;
> >>  }
> >> -#endif /* CONFIG_USER_ONLY */
> >>  static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
> >>  {
> >> -    return cpu->env.cpu_state;
> >> +    return S390_CPU_STATE_OPERATING;
> >>  }
> >> +#endif /* CONFIG_USER_ONLY */  
> > 
> > Since this is the effect of your "target/s390x: Use start-powered-off
> > CPUState property" patch, can you have a look please?
> >   
> 
> For information, the problem appears only with "--enable-debug-tcg" (I
> have also "--static --enable-linux-user --disable-system --disable-tools").

Ah, so that's why this did not show up in my testing.

> 
> CC: David Hildenbrand (s390 TCG CPU maintainer)
> 
> Thanks,
> Laurent
Laurent Vivier Sept. 7, 2020, 4:29 p.m. UTC | #7
On 07/09/2020 16:51, Cornelia Huck wrote:
> On Mon, 7 Sep 2020 16:31:24 +0200
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
>> On 07/09/2020 16:05, Philippe Mathieu-Daudé wrote:
>>> Hi Thiago,
>>>
>>> On 9/7/20 3:29 PM, Laurent Vivier wrote:  
>>>> On 07/09/2020 04:38, David Gibson wrote:  
>>>>> On Sun, Sep 06, 2020 at 04:20:10PM +0100, Peter Maydell wrote:  
> 
>>>>>> The 'check-tcg' tests for the linux-user static build also
>>>>>> failed on an s390x test:
>>>>>>
>>>>>>   CHECK   debian-s390x-cross
>>>>>>   BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
>>>>>>   RUN     tests for s390x
>>>>>>   TEST    threadcount on s390x
>>>>>> Unhandled trap: 0x10003  
>>>>
>>>> This is EXCP_HALTED (include/exec/cpu-all.h)
>>>>
>>>> The message error comes from cpu_loop() in linux-user/s390x/cpu_loop.c.
>>>>
>>>> The trap can only come from accel/tcg/cpu-exec.c
>>>>
>>>>     679 int cpu_exec(CPUState *cpu)
>>>>     680 {
>>>> ...
>>>>     688     if (cpu_handle_halt(cpu)) {
>>>>     689         return EXCP_HALTED;
>>>>     690     }
>>>>
>>>> and
>>>>
>>>>     428 static inline bool cpu_handle_halt(CPUState *cpu)
>>>>     429 {
>>>>     430     if (cpu->halted) {
>>>> ...
>>>>     441         if (!cpu_has_work(cpu)) {
>>>>     442             return true;
>>>>     443         }
>>>>
>>>> and
>>>>
>>>>      58 static bool s390_cpu_has_work(CPUState *cs)
>>>>      59 {
>>>>      60     S390CPU *cpu = S390_CPU(cs);
>>>>      61
>>>>      62     /* STOPPED cpus can never wake up */
>>>>      63     if (s390_cpu_get_state(cpu) != S390_CPU_STATE_LOAD &&
>>>>      64         s390_cpu_get_state(cpu) != S390_CPU_STATE_OPERATING) {
>>>>      65         return false;
>>>>      66     }
>>>>      67
>>>>      68     if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>>>>      69         return false;
>>>>      70     }
>>>>      71
>>>>      72     return s390_cpu_has_int(cpu);
>>>>      73 }
>>>>
>>>> and in target/s390x/cpu.h:
>>>>
>>>>     772 #ifndef CONFIG_USER_ONLY
>>>>     773 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
>>>>     774 #else
>>>>     775 static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
>>>> S390CPU *cpu)
>>>>     776 {
>>>>     777     return 0;
>>>>     778 }
>>>>     779 #endif /* CONFIG_USER_ONLY */
>>>>     780 static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
>>>>     781 {
>>>>     782     return cpu->env.cpu_state;
>>>>     783 }
>>>>
>>>> As cpu_state is never set, perhaps in case of linux-user it should
>>>> always return S390_CPU_STATE_OPERATING?
> 
> Possibly, we should not have any state handling for linux-user.
> 

I did that, but now 390_cpu_has_work() is false because
CPU_INTERRUPT_HARD is not set in cs->interrupt_request.

I think we should not enter in cpu_loop() with halted set to 1.

Before the patch of this series, s390_cpu_reset() is called twice, and
on the second call, halted is already 0.

With start_powered_off set to true in initfn, on the first reset
"halted" is 0 and on the second it is 1 (because it has been copied from
start_powered_off) and so cpu_loop() starts with halted set to 1 and fails.

Thanks,
Laurent
Laurent Vivier Sept. 7, 2020, 5:26 p.m. UTC | #8
On 07/09/2020 18:29, Laurent Vivier wrote:
> On 07/09/2020 16:51, Cornelia Huck wrote:
>> On Mon, 7 Sep 2020 16:31:24 +0200
>> Laurent Vivier <lvivier@redhat.com> wrote:
>>
>>> On 07/09/2020 16:05, Philippe Mathieu-Daudé wrote:
>>>> Hi Thiago,
>>>>
>>>> On 9/7/20 3:29 PM, Laurent Vivier wrote:  
>>>>> On 07/09/2020 04:38, David Gibson wrote:  
>>>>>> On Sun, Sep 06, 2020 at 04:20:10PM +0100, Peter Maydell wrote:  
>>
>>>>>>> The 'check-tcg' tests for the linux-user static build also
>>>>>>> failed on an s390x test:
>>>>>>>
>>>>>>>   CHECK   debian-s390x-cross
>>>>>>>   BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
>>>>>>>   RUN     tests for s390x
>>>>>>>   TEST    threadcount on s390x
>>>>>>> Unhandled trap: 0x10003  
>>>>>
>>>>> This is EXCP_HALTED (include/exec/cpu-all.h)
>>>>>
>>>>> The message error comes from cpu_loop() in linux-user/s390x/cpu_loop.c.
>>>>>
>>>>> The trap can only come from accel/tcg/cpu-exec.c
>>>>>
>>>>>     679 int cpu_exec(CPUState *cpu)
>>>>>     680 {
>>>>> ...
>>>>>     688     if (cpu_handle_halt(cpu)) {
>>>>>     689         return EXCP_HALTED;
>>>>>     690     }
>>>>>
>>>>> and
>>>>>
>>>>>     428 static inline bool cpu_handle_halt(CPUState *cpu)
>>>>>     429 {
>>>>>     430     if (cpu->halted) {
>>>>> ...
>>>>>     441         if (!cpu_has_work(cpu)) {
>>>>>     442             return true;
>>>>>     443         }
>>>>>
>>>>> and
>>>>>
>>>>>      58 static bool s390_cpu_has_work(CPUState *cs)
>>>>>      59 {
>>>>>      60     S390CPU *cpu = S390_CPU(cs);
>>>>>      61
>>>>>      62     /* STOPPED cpus can never wake up */
>>>>>      63     if (s390_cpu_get_state(cpu) != S390_CPU_STATE_LOAD &&
>>>>>      64         s390_cpu_get_state(cpu) != S390_CPU_STATE_OPERATING) {
>>>>>      65         return false;
>>>>>      66     }
>>>>>      67
>>>>>      68     if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>>>>>      69         return false;
>>>>>      70     }
>>>>>      71
>>>>>      72     return s390_cpu_has_int(cpu);
>>>>>      73 }
>>>>>
>>>>> and in target/s390x/cpu.h:
>>>>>
>>>>>     772 #ifndef CONFIG_USER_ONLY
>>>>>     773 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
>>>>>     774 #else
>>>>>     775 static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
>>>>> S390CPU *cpu)
>>>>>     776 {
>>>>>     777     return 0;
>>>>>     778 }
>>>>>     779 #endif /* CONFIG_USER_ONLY */
>>>>>     780 static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
>>>>>     781 {
>>>>>     782     return cpu->env.cpu_state;
>>>>>     783 }
>>>>>
>>>>> As cpu_state is never set, perhaps in case of linux-user it should
>>>>> always return S390_CPU_STATE_OPERATING?
>>
>> Possibly, we should not have any state handling for linux-user.
>>
> 
> I did that, but now 390_cpu_has_work() is false because
> CPU_INTERRUPT_HARD is not set in cs->interrupt_request.
> 
> I think we should not enter in cpu_loop() with halted set to 1.
> 
> Before the patch of this series, s390_cpu_reset() is called twice, and
> on the second call, halted is already 0.
> 
> With start_powered_off set to true in initfn, on the first reset
> "halted" is 0 and on the second it is 1 (because it has been copied from
> start_powered_off) and so cpu_loop() starts with halted set to 1 and fails.

What is happening:

[without start_powered_off]

  1- halted is set to 1 in s390x_cpu_initfn()
  2- halted is set to 0 in s390x_cpu_reset() by parent_reset()
     (cpu_common_reset()
  3- cpu_loop() is always entered with halted set to 0

[with start_powered_off]

  1- halted is set to start_powered_off (1) in s390x_cpu_reset() by
     parent_reset() (cpu_common_reset()
  2- cpu_loop() is always entered with halted set to 1

So in the first case, cpu_loop() is always started with halted set to 0
and in the second case with halted set to 1.

And I think, with linux-user, it should never be started with halted set
to 1.

We can't add a "#ifdef CONFIG_USER_ONLY" in hw/core/cpu.c to set halted
to 0 because it is in the common files, but we can do:

diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
index 73d7d6007e8e..749cd548f0f3 100644
--- a/target/s390x/cpu.c
+++ b/target/s390x/cpu.c
@@ -291,9 +291,9 @@ static void s390_cpu_initfn(Object *obj)
     S390CPU *cpu = S390_CPU(obj);

     cpu_set_cpustate_pointers(cpu);
-    cs->start_powered_off = true;
     cs->exception_index = EXCP_HLT;
 #if !defined(CONFIG_USER_ONLY)
+    cs->start_powered_off = true;
     object_property_add(obj, "crash-information", "GuestPanicInformation",
                         s390_cpu_get_crash_info_qom, NULL, NULL, NULL);
     cpu->env.tod_timer =

Thanks,
Laurent
Philippe Mathieu-Daudé Sept. 7, 2020, 7:46 p.m. UTC | #9
On 9/7/20 7:26 PM, Laurent Vivier wrote:
> On 07/09/2020 18:29, Laurent Vivier wrote:
>> On 07/09/2020 16:51, Cornelia Huck wrote:
>>> On Mon, 7 Sep 2020 16:31:24 +0200
>>> Laurent Vivier <lvivier@redhat.com> wrote:
>>>
>>>> On 07/09/2020 16:05, Philippe Mathieu-Daudé wrote:
>>>>> Hi Thiago,
>>>>>
>>>>> On 9/7/20 3:29 PM, Laurent Vivier wrote:  
>>>>>> On 07/09/2020 04:38, David Gibson wrote:  
>>>>>>> On Sun, Sep 06, 2020 at 04:20:10PM +0100, Peter Maydell wrote:  
>>>
>>>>>>>> The 'check-tcg' tests for the linux-user static build also
>>>>>>>> failed on an s390x test:
>>>>>>>>
>>>>>>>>   CHECK   debian-s390x-cross
>>>>>>>>   BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
>>>>>>>>   RUN     tests for s390x
>>>>>>>>   TEST    threadcount on s390x
>>>>>>>> Unhandled trap: 0x10003  
>>>>>>
>>>>>> This is EXCP_HALTED (include/exec/cpu-all.h)
>>>>>>
>>>>>> The message error comes from cpu_loop() in linux-user/s390x/cpu_loop.c.
>>>>>>
>>>>>> The trap can only come from accel/tcg/cpu-exec.c
>>>>>>
>>>>>>     679 int cpu_exec(CPUState *cpu)
>>>>>>     680 {
>>>>>> ...
>>>>>>     688     if (cpu_handle_halt(cpu)) {
>>>>>>     689         return EXCP_HALTED;
>>>>>>     690     }
>>>>>>
>>>>>> and
>>>>>>
>>>>>>     428 static inline bool cpu_handle_halt(CPUState *cpu)
>>>>>>     429 {
>>>>>>     430     if (cpu->halted) {
>>>>>> ...
>>>>>>     441         if (!cpu_has_work(cpu)) {
>>>>>>     442             return true;
>>>>>>     443         }
>>>>>>
>>>>>> and
>>>>>>
>>>>>>      58 static bool s390_cpu_has_work(CPUState *cs)
>>>>>>      59 {
>>>>>>      60     S390CPU *cpu = S390_CPU(cs);
>>>>>>      61
>>>>>>      62     /* STOPPED cpus can never wake up */
>>>>>>      63     if (s390_cpu_get_state(cpu) != S390_CPU_STATE_LOAD &&
>>>>>>      64         s390_cpu_get_state(cpu) != S390_CPU_STATE_OPERATING) {
>>>>>>      65         return false;
>>>>>>      66     }
>>>>>>      67
>>>>>>      68     if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>>>>>>      69         return false;
>>>>>>      70     }
>>>>>>      71
>>>>>>      72     return s390_cpu_has_int(cpu);
>>>>>>      73 }
>>>>>>
>>>>>> and in target/s390x/cpu.h:
>>>>>>
>>>>>>     772 #ifndef CONFIG_USER_ONLY
>>>>>>     773 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
>>>>>>     774 #else
>>>>>>     775 static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
>>>>>> S390CPU *cpu)
>>>>>>     776 {
>>>>>>     777     return 0;
>>>>>>     778 }
>>>>>>     779 #endif /* CONFIG_USER_ONLY */
>>>>>>     780 static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
>>>>>>     781 {
>>>>>>     782     return cpu->env.cpu_state;
>>>>>>     783 }
>>>>>>
>>>>>> As cpu_state is never set, perhaps in case of linux-user it should
>>>>>> always return S390_CPU_STATE_OPERATING?
>>>
>>> Possibly, we should not have any state handling for linux-user.
>>>
>>
>> I did that, but now 390_cpu_has_work() is false because
>> CPU_INTERRUPT_HARD is not set in cs->interrupt_request.
>>
>> I think we should not enter in cpu_loop() with halted set to 1.
>>
>> Before the patch of this series, s390_cpu_reset() is called twice, and
>> on the second call, halted is already 0.
>>
>> With start_powered_off set to true in initfn, on the first reset
>> "halted" is 0 and on the second it is 1 (because it has been copied from
>> start_powered_off) and so cpu_loop() starts with halted set to 1 and fails.
> 
> What is happening:
> 
> [without start_powered_off]
> 
>   1- halted is set to 1 in s390x_cpu_initfn()
>   2- halted is set to 0 in s390x_cpu_reset() by parent_reset()
>      (cpu_common_reset()
>   3- cpu_loop() is always entered with halted set to 0
> 
> [with start_powered_off]
> 
>   1- halted is set to start_powered_off (1) in s390x_cpu_reset() by
>      parent_reset() (cpu_common_reset()
>   2- cpu_loop() is always entered with halted set to 1
> 
> So in the first case, cpu_loop() is always started with halted set to 0
> and in the second case with halted set to 1.
> 
> And I think, with linux-user, it should never be started with halted set
> to 1.
> 
> We can't add a "#ifdef CONFIG_USER_ONLY" in hw/core/cpu.c to set halted
> to 0 because it is in the common files, but we can do:
> 
> diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
> index 73d7d6007e8e..749cd548f0f3 100644
> --- a/target/s390x/cpu.c
> +++ b/target/s390x/cpu.c
> @@ -291,9 +291,9 @@ static void s390_cpu_initfn(Object *obj)
>      S390CPU *cpu = S390_CPU(obj);
> 
>      cpu_set_cpustate_pointers(cpu);
> -    cs->start_powered_off = true;
>      cs->exception_index = EXCP_HLT;
>  #if !defined(CONFIG_USER_ONLY)
> +    cs->start_powered_off = true;
>      object_property_add(obj, "crash-information", "GuestPanicInformation",
>                          s390_cpu_get_crash_info_qom, NULL, NULL, NULL);
>      cpu->env.tod_timer =

This looks like the correct fix indeed :)
(Maybe worth adding a comment around).

Thanks for investigating!

> 
> Thanks,
> Laurent
David Gibson Sept. 7, 2020, 11:50 p.m. UTC | #10
On Mon, Sep 07, 2020 at 09:46:28PM +0200, Philippe Mathieu-Daudé wrote:
> On 9/7/20 7:26 PM, Laurent Vivier wrote:
> > On 07/09/2020 18:29, Laurent Vivier wrote:
> >> On 07/09/2020 16:51, Cornelia Huck wrote:
> >>> On Mon, 7 Sep 2020 16:31:24 +0200
> >>> Laurent Vivier <lvivier@redhat.com> wrote:
> >>>
> >>>> On 07/09/2020 16:05, Philippe Mathieu-Daudé wrote:
> >>>>> Hi Thiago,
> >>>>>
> >>>>> On 9/7/20 3:29 PM, Laurent Vivier wrote:  
> >>>>>> On 07/09/2020 04:38, David Gibson wrote:  
> >>>>>>> On Sun, Sep 06, 2020 at 04:20:10PM +0100, Peter Maydell wrote:  
> >>>
> >>>>>>>> The 'check-tcg' tests for the linux-user static build also
> >>>>>>>> failed on an s390x test:
> >>>>>>>>
> >>>>>>>>   CHECK   debian-s390x-cross
> >>>>>>>>   BUILD   s390x-linux-user guest-tests with docker qemu/debian-s390x-cross
> >>>>>>>>   RUN     tests for s390x
> >>>>>>>>   TEST    threadcount on s390x
> >>>>>>>> Unhandled trap: 0x10003  
> >>>>>>
> >>>>>> This is EXCP_HALTED (include/exec/cpu-all.h)
> >>>>>>
> >>>>>> The message error comes from cpu_loop() in linux-user/s390x/cpu_loop.c.
> >>>>>>
> >>>>>> The trap can only come from accel/tcg/cpu-exec.c
> >>>>>>
> >>>>>>     679 int cpu_exec(CPUState *cpu)
> >>>>>>     680 {
> >>>>>> ...
> >>>>>>     688     if (cpu_handle_halt(cpu)) {
> >>>>>>     689         return EXCP_HALTED;
> >>>>>>     690     }
> >>>>>>
> >>>>>> and
> >>>>>>
> >>>>>>     428 static inline bool cpu_handle_halt(CPUState *cpu)
> >>>>>>     429 {
> >>>>>>     430     if (cpu->halted) {
> >>>>>> ...
> >>>>>>     441         if (!cpu_has_work(cpu)) {
> >>>>>>     442             return true;
> >>>>>>     443         }
> >>>>>>
> >>>>>> and
> >>>>>>
> >>>>>>      58 static bool s390_cpu_has_work(CPUState *cs)
> >>>>>>      59 {
> >>>>>>      60     S390CPU *cpu = S390_CPU(cs);
> >>>>>>      61
> >>>>>>      62     /* STOPPED cpus can never wake up */
> >>>>>>      63     if (s390_cpu_get_state(cpu) != S390_CPU_STATE_LOAD &&
> >>>>>>      64         s390_cpu_get_state(cpu) != S390_CPU_STATE_OPERATING) {
> >>>>>>      65         return false;
> >>>>>>      66     }
> >>>>>>      67
> >>>>>>      68     if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
> >>>>>>      69         return false;
> >>>>>>      70     }
> >>>>>>      71
> >>>>>>      72     return s390_cpu_has_int(cpu);
> >>>>>>      73 }
> >>>>>>
> >>>>>> and in target/s390x/cpu.h:
> >>>>>>
> >>>>>>     772 #ifndef CONFIG_USER_ONLY
> >>>>>>     773 unsigned int s390_cpu_set_state(uint8_t cpu_state, S390CPU *cpu);
> >>>>>>     774 #else
> >>>>>>     775 static inline unsigned int s390_cpu_set_state(uint8_t cpu_state,
> >>>>>> S390CPU *cpu)
> >>>>>>     776 {
> >>>>>>     777     return 0;
> >>>>>>     778 }
> >>>>>>     779 #endif /* CONFIG_USER_ONLY */
> >>>>>>     780 static inline uint8_t s390_cpu_get_state(S390CPU *cpu)
> >>>>>>     781 {
> >>>>>>     782     return cpu->env.cpu_state;
> >>>>>>     783 }
> >>>>>>
> >>>>>> As cpu_state is never set, perhaps in case of linux-user it should
> >>>>>> always return S390_CPU_STATE_OPERATING?
> >>>
> >>> Possibly, we should not have any state handling for linux-user.
> >>>
> >>
> >> I did that, but now 390_cpu_has_work() is false because
> >> CPU_INTERRUPT_HARD is not set in cs->interrupt_request.
> >>
> >> I think we should not enter in cpu_loop() with halted set to 1.
> >>
> >> Before the patch of this series, s390_cpu_reset() is called twice, and
> >> on the second call, halted is already 0.
> >>
> >> With start_powered_off set to true in initfn, on the first reset
> >> "halted" is 0 and on the second it is 1 (because it has been copied from
> >> start_powered_off) and so cpu_loop() starts with halted set to 1 and fails.
> > 
> > What is happening:
> > 
> > [without start_powered_off]
> > 
> >   1- halted is set to 1 in s390x_cpu_initfn()
> >   2- halted is set to 0 in s390x_cpu_reset() by parent_reset()
> >      (cpu_common_reset()
> >   3- cpu_loop() is always entered with halted set to 0
> > 
> > [with start_powered_off]
> > 
> >   1- halted is set to start_powered_off (1) in s390x_cpu_reset() by
> >      parent_reset() (cpu_common_reset()
> >   2- cpu_loop() is always entered with halted set to 1
> > 
> > So in the first case, cpu_loop() is always started with halted set to 0
> > and in the second case with halted set to 1.
> > 
> > And I think, with linux-user, it should never be started with halted set
> > to 1.
> > 
> > We can't add a "#ifdef CONFIG_USER_ONLY" in hw/core/cpu.c to set halted
> > to 0 because it is in the common files, but we can do:
> > 
> > diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
> > index 73d7d6007e8e..749cd548f0f3 100644
> > --- a/target/s390x/cpu.c
> > +++ b/target/s390x/cpu.c
> > @@ -291,9 +291,9 @@ static void s390_cpu_initfn(Object *obj)
> >      S390CPU *cpu = S390_CPU(obj);
> > 
> >      cpu_set_cpustate_pointers(cpu);
> > -    cs->start_powered_off = true;
> >      cs->exception_index = EXCP_HLT;
> >  #if !defined(CONFIG_USER_ONLY)
> > +    cs->start_powered_off = true;
> >      object_property_add(obj, "crash-information", "GuestPanicInformation",
> >                          s390_cpu_get_crash_info_qom, NULL, NULL, NULL);
> >      cpu->env.tod_timer =
> 
> This looks like the correct fix indeed :)
> (Maybe worth adding a comment around).
> 
> Thanks for investigating!

Yes, thanks for figuring this out.  I'll fix up my PR accordingly and
resend today.
Cornelia Huck Sept. 8, 2020, 6:11 a.m. UTC | #11
On Mon, 7 Sep 2020 21:46:28 +0200
Philippe Mathieu-Daudé <philmd@redhat.com> wrote:

> On 9/7/20 7:26 PM, Laurent Vivier wrote:
> > On 07/09/2020 18:29, Laurent Vivier wrote:  

> >> I think we should not enter in cpu_loop() with halted set to 1.
> >>
> >> Before the patch of this series, s390_cpu_reset() is called twice, and
> >> on the second call, halted is already 0.
> >>
> >> With start_powered_off set to true in initfn, on the first reset
> >> "halted" is 0 and on the second it is 1 (because it has been copied from
> >> start_powered_off) and so cpu_loop() starts with halted set to 1 and fails.  
> > 
> > What is happening:
> > 
> > [without start_powered_off]
> > 
> >   1- halted is set to 1 in s390x_cpu_initfn()
> >   2- halted is set to 0 in s390x_cpu_reset() by parent_reset()
> >      (cpu_common_reset()
> >   3- cpu_loop() is always entered with halted set to 0
> > 
> > [with start_powered_off]
> > 
> >   1- halted is set to start_powered_off (1) in s390x_cpu_reset() by
> >      parent_reset() (cpu_common_reset()
> >   2- cpu_loop() is always entered with halted set to 1
> > 
> > So in the first case, cpu_loop() is always started with halted set to 0
> > and in the second case with halted set to 1.
> > 
> > And I think, with linux-user, it should never be started with halted set
> > to 1.

linux-user always confuses me a bit, but this seems right.

> > 
> > We can't add a "#ifdef CONFIG_USER_ONLY" in hw/core/cpu.c to set halted
> > to 0 because it is in the common files, but we can do:
> > 
> > diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
> > index 73d7d6007e8e..749cd548f0f3 100644
> > --- a/target/s390x/cpu.c
> > +++ b/target/s390x/cpu.c
> > @@ -291,9 +291,9 @@ static void s390_cpu_initfn(Object *obj)
> >      S390CPU *cpu = S390_CPU(obj);
> > 
> >      cpu_set_cpustate_pointers(cpu);
> > -    cs->start_powered_off = true;
> >      cs->exception_index = EXCP_HLT;
> >  #if !defined(CONFIG_USER_ONLY)
> > +    cs->start_powered_off = true;
> >      object_property_add(obj, "crash-information", "GuestPanicInformation",
> >                          s390_cpu_get_crash_info_qom, NULL, NULL, NULL);
> >      cpu->env.tod_timer =  
> 
> This looks like the correct fix indeed :)
> (Maybe worth adding a comment around).

Agreed on both counts.

> Thanks for investigating!

And here as well :)
Thiago Jung Bauermann Sept. 8, 2020, 3:12 p.m. UTC | #12
Cornelia Huck <cohuck@redhat.com> writes:

> On Mon, 7 Sep 2020 21:46:28 +0200
> Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>
>> On 9/7/20 7:26 PM, Laurent Vivier wrote:
>>
>> > We can't add a "#ifdef CONFIG_USER_ONLY" in hw/core/cpu.c to set halted
>> > to 0 because it is in the common files, but we can do:
>> > 
>> > diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
>> > index 73d7d6007e8e..749cd548f0f3 100644
>> > --- a/target/s390x/cpu.c
>> > +++ b/target/s390x/cpu.c
>> > @@ -291,9 +291,9 @@ static void s390_cpu_initfn(Object *obj)
>> >      S390CPU *cpu = S390_CPU(obj);
>> > 
>> >      cpu_set_cpustate_pointers(cpu);
>> > -    cs->start_powered_off = true;
>> >      cs->exception_index = EXCP_HLT;
>> >  #if !defined(CONFIG_USER_ONLY)
>> > +    cs->start_powered_off = true;
>> >      object_property_add(obj, "crash-information", "GuestPanicInformation",
>> >                          s390_cpu_get_crash_info_qom, NULL, NULL, NULL);
>> >      cpu->env.tod_timer =  
>> 
>> This looks like the correct fix indeed :)
>> (Maybe worth adding a comment around).
>
> Agreed on both counts.
>
>> Thanks for investigating!
>
> And here as well :)

Thank you very much for investigating and fixing this problem, Laurent!

Sorry for not working on this issue. I was out on an extended weekend
and just came back.