Message ID | 20161115015049.2735-1-lersek@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 15/11/2016 02:50, Laszlo Ersek wrote: > The generic edk2 SMM infrastructure prefers > EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If > Trigger() only brings the current processor into SMM, then edk2 handles it > in the following ways: > > (1) If Trigger() is executed by the BSP (which is guaranteed before > ExitBootServices(), but is not necessarily true at runtime), then: > > (a) If edk2 has been configured for "traditional" SMM synchronization, > then the BSP sends directed SMIs to the APs with APIC delivery, > bringing them into SMM individually. Then the BSP runs the SMI > handler / dispatcher. > > (b) If edk2 has been configured for "relaxed" SMM synchronization, > then the APs that are not already in SMM are not brought in, and > the BSP runs the SMI handler / dispatcher. > > (2) If Trigger() is executed by an AP (which is possible after > ExitBootServices(), and can be forced e.g. by "taskset -c 1 > efibootmgr"), then the AP in question brings in the BSP with a > directed SMI, and the BSP runs the SMI handler / dispatcher. > > The smaller problem with (1a) and (2) is that the BSP and AP > synchronization is slow. For example, the "taskset -c 1 efibootmgr" > command from (2) can take more than 3 seconds to complete, because > efibootmgr accesses non-volatile UEFI variables intensively. > > The larger problem is that QEMU's current behavior diverges from the > behavior usually seen on physical hardware, and that keeps exposing > obscure corner cases, race conditions and other instabilities in edk2, > which generally expects / prefers a software SMI to affect all CPUs at > once. > > Therefore introduce a special APM_STS value (0x51) that causes QEMU to > inject the SMI on all VCPUs. OVMF's EFI_SMM_CONTROL2_PROTOCOL.Trigger() > can utilize this to accommodate edk2's preference about "broadcast" SMI. > > SeaBIOS uses values 0x00 and 0x01 for APM_STS (called PORT_SMI_STATUS in > the SeaBIOS code), so this change should be transparent to it. > > While the original posting of this patch > <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html> > only intended to speed up (2), based on our recent "stress testing" of SMM > this patch actually provides functional improvements. (There are no code > changes relative to the original posting.) > > Cc: "Michael S. Tsirkin" <mst@redhat.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Also-suggested-by: Paolo Bonzini <pbonzini@redhat.com> > Signed-off-by: Laszlo Ersek <lersek@redhat.com> I'm queuing this for 2.8, but I have a question---should this feature be detectable, and if so how? Paolo > --- > hw/isa/lpc_ich9.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c > index 10d1ee8b9310..f2fe644fdaa4 100644 > --- a/hw/isa/lpc_ich9.c > +++ b/hw/isa/lpc_ich9.c > @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) > > /* APM */ > > +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' > + > static void ich9_apm_ctrl_changed(uint32_t val, void *arg) > { > ICH9LPCState *lpc = arg; > @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg) > > /* SMI_EN = PMBASE + 30. SMI control and enable register */ > if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { > - cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > + if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { > + CPUState *cs; > + > + CPU_FOREACH(cs) { > + cpu_interrupt(cs, CPU_INTERRUPT_SMI); > + } > + } else { > + cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > + } > } > } > >
On 11/15/16 14:59, Paolo Bonzini wrote: > > > On 15/11/2016 02:50, Laszlo Ersek wrote: >> The generic edk2 SMM infrastructure prefers >> EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If >> Trigger() only brings the current processor into SMM, then edk2 handles it >> in the following ways: >> >> (1) If Trigger() is executed by the BSP (which is guaranteed before >> ExitBootServices(), but is not necessarily true at runtime), then: >> >> (a) If edk2 has been configured for "traditional" SMM synchronization, >> then the BSP sends directed SMIs to the APs with APIC delivery, >> bringing them into SMM individually. Then the BSP runs the SMI >> handler / dispatcher. >> >> (b) If edk2 has been configured for "relaxed" SMM synchronization, >> then the APs that are not already in SMM are not brought in, and >> the BSP runs the SMI handler / dispatcher. >> >> (2) If Trigger() is executed by an AP (which is possible after >> ExitBootServices(), and can be forced e.g. by "taskset -c 1 >> efibootmgr"), then the AP in question brings in the BSP with a >> directed SMI, and the BSP runs the SMI handler / dispatcher. >> >> The smaller problem with (1a) and (2) is that the BSP and AP >> synchronization is slow. For example, the "taskset -c 1 efibootmgr" >> command from (2) can take more than 3 seconds to complete, because >> efibootmgr accesses non-volatile UEFI variables intensively. >> >> The larger problem is that QEMU's current behavior diverges from the >> behavior usually seen on physical hardware, and that keeps exposing >> obscure corner cases, race conditions and other instabilities in edk2, >> which generally expects / prefers a software SMI to affect all CPUs at >> once. >> >> Therefore introduce a special APM_STS value (0x51) that causes QEMU to >> inject the SMI on all VCPUs. OVMF's EFI_SMM_CONTROL2_PROTOCOL.Trigger() >> can utilize this to accommodate edk2's preference about "broadcast" SMI. >> >> SeaBIOS uses values 0x00 and 0x01 for APM_STS (called PORT_SMI_STATUS in >> the SeaBIOS code), so this change should be transparent to it. >> >> While the original posting of this patch >> <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html> >> only intended to speed up (2), based on our recent "stress testing" of SMM >> this patch actually provides functional improvements. (There are no code >> changes relative to the original posting.) >> >> Cc: "Michael S. Tsirkin" <mst@redhat.com> >> Cc: Paolo Bonzini <pbonzini@redhat.com> >> Also-suggested-by: Paolo Bonzini <pbonzini@redhat.com> >> Signed-off-by: Laszlo Ersek <lersek@redhat.com> > > I'm queuing this for 2.8, Thank you! > but I have a question---should this feature be > detectable, and if so how? That's the exact question I wanted to ask you. :) ... How about an fw_cfg file? For example: - name: etc/broadcast_smi - value type: uint8_t - role: if the file exists, the broadcast SMI capability exists. Read the uint8_t value from the fw_cfg file. Write the uint8_t value read to ICH9_APM_STS first, before triggering the SMI via ICH9_APM_CNT, to request a broadcast SMI. The values 0 and 1 are reserved for SeaBIOS, so if the fw_cfg file exists, those values will never be provided. This would allow me to make the OVMF patches more robust: - first, I don't have to hardcode the value 'Q' in SmmControl2Dxe, - second, I can check for the presence of "etc/broadcast_smi" in PlatformPei, and set PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode dynamically. I would quite like this approach, as simply reverting the PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode settings in OVMF will break SMM hard on QEMUs that do not support the broadcast SMI feature. The default would remain the current setting. Also, if we add the fw_cfg item, we don't need to rush this into 2.8 I think (of course I wouldn't mind making 2.8 nevertheless). Do you think we should make the broadcast SMI capability (and the descriptor fw_cfg file) machine-type dependent? I do think so (if for nothing else then for "rather safe than sorry"), but I always struggle with this kind of question, so any advice is most welcome... Thank you! Laszlo > > Paolo > >> --- >> hw/isa/lpc_ich9.c | 12 +++++++++++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c >> index 10d1ee8b9310..f2fe644fdaa4 100644 >> --- a/hw/isa/lpc_ich9.c >> +++ b/hw/isa/lpc_ich9.c >> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) >> >> /* APM */ >> >> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' >> + >> static void ich9_apm_ctrl_changed(uint32_t val, void *arg) >> { >> ICH9LPCState *lpc = arg; >> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg) >> >> /* SMI_EN = PMBASE + 30. SMI control and enable register */ >> if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { >> - cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >> + if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { >> + CPUState *cs; >> + >> + CPU_FOREACH(cs) { >> + cpu_interrupt(cs, CPU_INTERRUPT_SMI); >> + } >> + } else { >> + cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >> + } >> } >> } >> >>
On Tue, Nov 15, 2016 at 04:39:04PM +0100, Laszlo Ersek wrote: > On 11/15/16 14:59, Paolo Bonzini wrote: > > > > > > On 15/11/2016 02:50, Laszlo Ersek wrote: > >> The generic edk2 SMM infrastructure prefers > >> EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If > >> Trigger() only brings the current processor into SMM, then edk2 handles it > >> in the following ways: > >> > >> (1) If Trigger() is executed by the BSP (which is guaranteed before > >> ExitBootServices(), but is not necessarily true at runtime), then: > >> > >> (a) If edk2 has been configured for "traditional" SMM synchronization, > >> then the BSP sends directed SMIs to the APs with APIC delivery, > >> bringing them into SMM individually. Then the BSP runs the SMI > >> handler / dispatcher. > >> > >> (b) If edk2 has been configured for "relaxed" SMM synchronization, > >> then the APs that are not already in SMM are not brought in, and > >> the BSP runs the SMI handler / dispatcher. > >> > >> (2) If Trigger() is executed by an AP (which is possible after > >> ExitBootServices(), and can be forced e.g. by "taskset -c 1 > >> efibootmgr"), then the AP in question brings in the BSP with a > >> directed SMI, and the BSP runs the SMI handler / dispatcher. > >> > >> The smaller problem with (1a) and (2) is that the BSP and AP > >> synchronization is slow. For example, the "taskset -c 1 efibootmgr" > >> command from (2) can take more than 3 seconds to complete, because > >> efibootmgr accesses non-volatile UEFI variables intensively. > >> > >> The larger problem is that QEMU's current behavior diverges from the > >> behavior usually seen on physical hardware, and that keeps exposing > >> obscure corner cases, race conditions and other instabilities in edk2, > >> which generally expects / prefers a software SMI to affect all CPUs at > >> once. > >> > >> Therefore introduce a special APM_STS value (0x51) that causes QEMU to > >> inject the SMI on all VCPUs. OVMF's EFI_SMM_CONTROL2_PROTOCOL.Trigger() > >> can utilize this to accommodate edk2's preference about "broadcast" SMI. > >> > >> SeaBIOS uses values 0x00 and 0x01 for APM_STS (called PORT_SMI_STATUS in > >> the SeaBIOS code), so this change should be transparent to it. > >> > >> While the original posting of this patch > >> <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html> > >> only intended to speed up (2), based on our recent "stress testing" of SMM > >> this patch actually provides functional improvements. (There are no code > >> changes relative to the original posting.) > >> > >> Cc: "Michael S. Tsirkin" <mst@redhat.com> > >> Cc: Paolo Bonzini <pbonzini@redhat.com> > >> Also-suggested-by: Paolo Bonzini <pbonzini@redhat.com> > >> Signed-off-by: Laszlo Ersek <lersek@redhat.com> > > > > I'm queuing this for 2.8, > > Thank you! > > > but I have a question---should this feature be > > detectable, and if so how? > > That's the exact question I wanted to ask you. :) > > ... How about an fw_cfg file? For example: > > - name: etc/broadcast_smi > - value type: uint8_t > - role: if the file exists, the broadcast SMI capability exists. Read > the uint8_t value from the fw_cfg file. Write the uint8_t value read to > ICH9_APM_STS first, before triggering the SMI via ICH9_APM_CNT, to > request a broadcast SMI. The values 0 and 1 are reserved for SeaBIOS, so > if the fw_cfg file exists, those values will never be provided. > > This would allow me to make the OVMF patches more robust: > - first, I don't have to hardcode the value 'Q' in SmmControl2Dxe, > - second, I can check for the presence of "etc/broadcast_smi" in > PlatformPei, and set PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode > dynamically. > > I would quite like this approach, as simply reverting the > PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode settings in OVMF will break > SMM hard on QEMUs that do not support the broadcast SMI feature. The > default would remain the current setting. > > Also, if we add the fw_cfg item, we don't need to rush this into 2.8 I > think (of course I wouldn't mind making 2.8 nevertheless). > > Do you think we should make the broadcast SMI capability (and the > descriptor fw_cfg file) machine-type dependent? I do think so (if for > nothing else then for "rather safe than sorry"), but I always struggle > with this kind of question, so any advice is most welcome... > > Thank you! > Laszlo Hmm it's a bugfix so why not just backport the fix? I don't see why do we want work-arounds in UEFI - does not seem much easier. > > > > Paolo > > > >> --- > >> hw/isa/lpc_ich9.c | 12 +++++++++++- > >> 1 file changed, 11 insertions(+), 1 deletion(-) > >> > >> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c > >> index 10d1ee8b9310..f2fe644fdaa4 100644 > >> --- a/hw/isa/lpc_ich9.c > >> +++ b/hw/isa/lpc_ich9.c > >> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) > >> > >> /* APM */ > >> > >> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' > >> + > >> static void ich9_apm_ctrl_changed(uint32_t val, void *arg) > >> { > >> ICH9LPCState *lpc = arg; > >> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg) > >> > >> /* SMI_EN = PMBASE + 30. SMI control and enable register */ > >> if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { > >> - cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > >> + if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { > >> + CPUState *cs; > >> + > >> + CPU_FOREACH(cs) { > >> + cpu_interrupt(cs, CPU_INTERRUPT_SMI); > >> + } > >> + } else { > >> + cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > >> + } > >> } > >> } > >> > >>
On 11/15/16 16:45, Michael S. Tsirkin wrote: > On Tue, Nov 15, 2016 at 04:39:04PM +0100, Laszlo Ersek wrote: >> On 11/15/16 14:59, Paolo Bonzini wrote: >>> >>> >>> On 15/11/2016 02:50, Laszlo Ersek wrote: >>>> The generic edk2 SMM infrastructure prefers >>>> EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If >>>> Trigger() only brings the current processor into SMM, then edk2 handles it >>>> in the following ways: >>>> >>>> (1) If Trigger() is executed by the BSP (which is guaranteed before >>>> ExitBootServices(), but is not necessarily true at runtime), then: >>>> >>>> (a) If edk2 has been configured for "traditional" SMM synchronization, >>>> then the BSP sends directed SMIs to the APs with APIC delivery, >>>> bringing them into SMM individually. Then the BSP runs the SMI >>>> handler / dispatcher. >>>> >>>> (b) If edk2 has been configured for "relaxed" SMM synchronization, >>>> then the APs that are not already in SMM are not brought in, and >>>> the BSP runs the SMI handler / dispatcher. >>>> >>>> (2) If Trigger() is executed by an AP (which is possible after >>>> ExitBootServices(), and can be forced e.g. by "taskset -c 1 >>>> efibootmgr"), then the AP in question brings in the BSP with a >>>> directed SMI, and the BSP runs the SMI handler / dispatcher. >>>> >>>> The smaller problem with (1a) and (2) is that the BSP and AP >>>> synchronization is slow. For example, the "taskset -c 1 efibootmgr" >>>> command from (2) can take more than 3 seconds to complete, because >>>> efibootmgr accesses non-volatile UEFI variables intensively. >>>> >>>> The larger problem is that QEMU's current behavior diverges from the >>>> behavior usually seen on physical hardware, and that keeps exposing >>>> obscure corner cases, race conditions and other instabilities in edk2, >>>> which generally expects / prefers a software SMI to affect all CPUs at >>>> once. >>>> >>>> Therefore introduce a special APM_STS value (0x51) that causes QEMU to >>>> inject the SMI on all VCPUs. OVMF's EFI_SMM_CONTROL2_PROTOCOL.Trigger() >>>> can utilize this to accommodate edk2's preference about "broadcast" SMI. >>>> >>>> SeaBIOS uses values 0x00 and 0x01 for APM_STS (called PORT_SMI_STATUS in >>>> the SeaBIOS code), so this change should be transparent to it. >>>> >>>> While the original posting of this patch >>>> <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html> >>>> only intended to speed up (2), based on our recent "stress testing" of SMM >>>> this patch actually provides functional improvements. (There are no code >>>> changes relative to the original posting.) >>>> >>>> Cc: "Michael S. Tsirkin" <mst@redhat.com> >>>> Cc: Paolo Bonzini <pbonzini@redhat.com> >>>> Also-suggested-by: Paolo Bonzini <pbonzini@redhat.com> >>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com> >>> >>> I'm queuing this for 2.8, >> >> Thank you! >> >>> but I have a question---should this feature be >>> detectable, and if so how? >> >> That's the exact question I wanted to ask you. :) >> >> ... How about an fw_cfg file? For example: >> >> - name: etc/broadcast_smi >> - value type: uint8_t >> - role: if the file exists, the broadcast SMI capability exists. Read >> the uint8_t value from the fw_cfg file. Write the uint8_t value read to >> ICH9_APM_STS first, before triggering the SMI via ICH9_APM_CNT, to >> request a broadcast SMI. The values 0 and 1 are reserved for SeaBIOS, so >> if the fw_cfg file exists, those values will never be provided. >> >> This would allow me to make the OVMF patches more robust: >> - first, I don't have to hardcode the value 'Q' in SmmControl2Dxe, >> - second, I can check for the presence of "etc/broadcast_smi" in >> PlatformPei, and set PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode >> dynamically. >> >> I would quite like this approach, as simply reverting the >> PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode settings in OVMF will break >> SMM hard on QEMUs that do not support the broadcast SMI feature. The >> default would remain the current setting. >> >> Also, if we add the fw_cfg item, we don't need to rush this into 2.8 I >> think (of course I wouldn't mind making 2.8 nevertheless). >> >> Do you think we should make the broadcast SMI capability (and the >> descriptor fw_cfg file) machine-type dependent? I do think so (if for >> nothing else then for "rather safe than sorry"), but I always struggle >> with this kind of question, so any advice is most welcome... >> >> Thank you! >> Laszlo > > Hmm it's a bugfix so why not just backport the fix? > I don't see why do we want work-arounds in UEFI - > does not seem much easier. If the consensus is that the patch is a QEMU bugfix (as opposed to a feature) and that it is eligible for the currently supported upstream stable branches, that's the best, no doubt. For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually correct; when I was writing the OVMF docs, I must have misunderstood the requirements and needlessly required 2.5+; 2.4+ should have been fine.) Which means the fix should be backported as far as stable-2.4. Should we proceed with that? CC'ing Mike Roth and the stable list. Thanks! Laszlo > > >>> >>> Paolo >>> >>>> --- >>>> hw/isa/lpc_ich9.c | 12 +++++++++++- >>>> 1 file changed, 11 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c >>>> index 10d1ee8b9310..f2fe644fdaa4 100644 >>>> --- a/hw/isa/lpc_ich9.c >>>> +++ b/hw/isa/lpc_ich9.c >>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) >>>> >>>> /* APM */ >>>> >>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' >>>> + >>>> static void ich9_apm_ctrl_changed(uint32_t val, void *arg) >>>> { >>>> ICH9LPCState *lpc = arg; >>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg) >>>> >>>> /* SMI_EN = PMBASE + 30. SMI control and enable register */ >>>> if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { >>>> - cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >>>> + if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { >>>> + CPUState *cs; >>>> + >>>> + CPU_FOREACH(cs) { >>>> + cpu_interrupt(cs, CPU_INTERRUPT_SMI); >>>> + } >>>> + } else { >>>> + cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >>>> + } >>>> } >>>> } >>>> >>>>
> If the consensus is that the patch is a QEMU bugfix (as opposed to a > feature) and that it is eligible for the currently supported upstream > stable branches, that's the best, no doubt. The currently supported upstream stable branches is just 2.7. :) I'm okay with bending the rules and including it in 2.8, but it's worrisome that you also needed to go back from relaxed to traditional delivery, meaning that old QEMU + new OVMF will take ages to boot. If this is the case, I still think this needs some kind of discovery mechanism, unless OVMF can just say "things were too broken, stop supporting SMM on QEMUs older than 2.8". For example: - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP setting is used for the PCD; this would be backwards compatibility mode. - we could have another magic 0xB2 value, which is implemented directly in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) to detect the new feature. It can fail to start if using traditional AP and the new feature is not there. By the way, in case OVMF needs to use SmmSwDispatch in the future, I would make QEMU use broadcast behavior for all values in the 0x10-0xff range, or something like that. Paolo > For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The > SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually > correct; when I was writing the OVMF docs, I must have misunderstood the > requirements and needlessly required 2.5+; 2.4+ should have been fine.) > > Which means the fix should be backported as far as stable-2.4. > > Should we proceed with that? CC'ing Mike Roth and the stable list. > > Thanks! > Laszlo > > > > > > >>> > >>> Paolo > >>> > >>>> --- > >>>> hw/isa/lpc_ich9.c | 12 +++++++++++- > >>>> 1 file changed, 11 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c > >>>> index 10d1ee8b9310..f2fe644fdaa4 100644 > >>>> --- a/hw/isa/lpc_ich9.c > >>>> +++ b/hw/isa/lpc_ich9.c > >>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool > >>>> smm_enabled) > >>>> > >>>> /* APM */ > >>>> > >>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' > >>>> + > >>>> static void ich9_apm_ctrl_changed(uint32_t val, void *arg) > >>>> { > >>>> ICH9LPCState *lpc = arg; > >>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, > >>>> void *arg) > >>>> > >>>> /* SMI_EN = PMBASE + 30. SMI control and enable register */ > >>>> if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { > >>>> - cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > >>>> + if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { > >>>> + CPUState *cs; > >>>> + > >>>> + CPU_FOREACH(cs) { > >>>> + cpu_interrupt(cs, CPU_INTERRUPT_SMI); > >>>> + } > >>>> + } else { > >>>> + cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > >>>> + } > >>>> } > >>>> } > >>>> > >>>> > >
On Wed, Nov 16, 2016 at 07:47:42AM -0500, Paolo Bonzini wrote: > > > If the consensus is that the patch is a QEMU bugfix (as opposed to a > > feature) and that it is eligible for the currently supported upstream > > stable branches, that's the best, no doubt. > > The currently supported upstream stable branches is just 2.7. :) > > I'm okay with bending the rules and including it in 2.8, but it's > worrisome that you also needed to go back from relaxed to traditional > delivery, meaning that old QEMU + new OVMF will take ages to boot. > > If this is the case, I still think this needs some kind of discovery > mechanism, unless OVMF can just say "things were too broken, stop > supporting SMM on QEMUs older than 2.8". > > For example: > > - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP > setting is used for the PCD; this would be backwards compatibility mode. > > - we could have another magic 0xB2 value, which is implemented directly > in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) > to detect the new feature. It can fail to start if using traditional > AP and the new feature is not there. If we keep collecting these magic values, should architect it and do a host/guest bitmap like virtio does? > By the way, in case OVMF needs to use SmmSwDispatch in the future, I > would make QEMU use broadcast behavior for all values in the 0x10-0xff > range, or something like that. > > Paolo It bothers me with all these ideas is that it's PV. Unavoidable? > > For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The > > SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually > > correct; when I was writing the OVMF docs, I must have misunderstood the > > requirements and needlessly required 2.5+; 2.4+ should have been fine.) > > > > Which means the fix should be backported as far as stable-2.4. > > > > Should we proceed with that? CC'ing Mike Roth and the stable list. > > > > Thanks! > > Laszlo > > > > > > > > > > >>> > > >>> Paolo > > >>> > > >>>> --- > > >>>> hw/isa/lpc_ich9.c | 12 +++++++++++- > > >>>> 1 file changed, 11 insertions(+), 1 deletion(-) > > >>>> > > >>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c > > >>>> index 10d1ee8b9310..f2fe644fdaa4 100644 > > >>>> --- a/hw/isa/lpc_ich9.c > > >>>> +++ b/hw/isa/lpc_ich9.c > > >>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool > > >>>> smm_enabled) > > >>>> > > >>>> /* APM */ > > >>>> > > >>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' > > >>>> + > > >>>> static void ich9_apm_ctrl_changed(uint32_t val, void *arg) > > >>>> { > > >>>> ICH9LPCState *lpc = arg; > > >>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, > > >>>> void *arg) > > >>>> > > >>>> /* SMI_EN = PMBASE + 30. SMI control and enable register */ > > >>>> if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { > > >>>> - cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > > >>>> + if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { > > >>>> + CPUState *cs; > > >>>> + > > >>>> + CPU_FOREACH(cs) { > > >>>> + cpu_interrupt(cs, CPU_INTERRUPT_SMI); > > >>>> + } > > >>>> + } else { > > >>>> + cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > > >>>> + } > > >>>> } > > >>>> } > > >>>> > > >>>> > > > >
On 16/11/2016 14:18, Michael S. Tsirkin wrote: > > - we could have another magic 0xB2 value, which is implemented directly > > in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it > > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) > > to detect the new feature. It can fail to start if using traditional > > AP and the new feature is not there. > > If we keep collecting these magic values, should architect it > and do a host/guest bitmap like virtio does? The value written in 0xB3 can certainly be a feature bitmap. For now we would have for example bit 0 if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI bit 1-7 zero Paolo
On 11/16/16 13:47, Paolo Bonzini wrote: > >> If the consensus is that the patch is a QEMU bugfix (as opposed to a >> feature) and that it is eligible for the currently supported upstream >> stable branches, that's the best, no doubt. > > The currently supported upstream stable branches is just 2.7. :) > > I'm okay with bending the rules and including it in 2.8, but it's > worrisome that you also needed to go back from relaxed to traditional > delivery, meaning that old QEMU + new OVMF will take ages to boot. > > If this is the case, I still think this needs some kind of discovery > mechanism, unless OVMF can just say "things were too broken, stop > supporting SMM on QEMUs older than 2.8". > > For example: > > - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP > setting is used for the PCD; this would be backwards compatibility mode. Okay, but this still means that the PCD has to become dynamic, and we must set the PCD earlier (likely in PlatformPei) based on something. I guess that's what the next paragraph is about: > - we could have another magic 0xB2 value, which is implemented directly > in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) > to detect the new feature. It can fail to start if using traditional > AP and the new feature is not there. Please explain in more detail. If I write to 0xB2 (by invoking the Trigger() method or somehow else), then on old QEMU's that will raise a sync / unicast SMI. The SMI handler in edk2 will run, but no request parameters will have been set up by OVMF, so the SMI handler will do... no clue what. I don't think this is a good idea. My preference is fw_cfg ATM. It provides a prove, flexible and extensible interface (it's easy to add new files for future features). If we expect more knobs in the area, I can modify my proposal to use "etc/smi/broadcast", so we can add "etc/smi/XXXX" later. Do you have any specific arguments against fw_cfg? As I suggested in my previous email, with fw_cfg I can implement the change in OVMF such that the default behavior wouldn't change -- the default delivery would remain relaxed, and the broadcast wouldn't be requested, unless the fw_cfg file told OVMF otherwise. > By the way, in case OVMF needs to use SmmSwDispatch in the future, I > would make QEMU use broadcast behavior for all values in the 0x10-0xff > range, or something like that. Are we talking control/command (0xB2) or scratch/data (0xB3) register values? My patches currently use the scratch/data register to provide the hint to QEMU; that register is less likely to interfere with anything the SMM core in edk2 does. I seem to recall that SmmSwDispatch uses command/control values to distinguish the called functions. Should we keep the broadcast / unicast decision separate from the control/command value ? Thanks Laszlo > > Paolo > >> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The >> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually >> correct; when I was writing the OVMF docs, I must have misunderstood the >> requirements and needlessly required 2.5+; 2.4+ should have been fine.) >> >> Which means the fix should be backported as far as stable-2.4. >> >> Should we proceed with that? CC'ing Mike Roth and the stable list. >> >> Thanks! >> Laszlo >> >>> >>> >>>>> >>>>> Paolo >>>>> >>>>>> --- >>>>>> hw/isa/lpc_ich9.c | 12 +++++++++++- >>>>>> 1 file changed, 11 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c >>>>>> index 10d1ee8b9310..f2fe644fdaa4 100644 >>>>>> --- a/hw/isa/lpc_ich9.c >>>>>> +++ b/hw/isa/lpc_ich9.c >>>>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool >>>>>> smm_enabled) >>>>>> >>>>>> /* APM */ >>>>>> >>>>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' >>>>>> + >>>>>> static void ich9_apm_ctrl_changed(uint32_t val, void *arg) >>>>>> { >>>>>> ICH9LPCState *lpc = arg; >>>>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, >>>>>> void *arg) >>>>>> >>>>>> /* SMI_EN = PMBASE + 30. SMI control and enable register */ >>>>>> if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { >>>>>> - cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >>>>>> + if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { >>>>>> + CPUState *cs; >>>>>> + >>>>>> + CPU_FOREACH(cs) { >>>>>> + cpu_interrupt(cs, CPU_INTERRUPT_SMI); >>>>>> + } >>>>>> + } else { >>>>>> + cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >>>>>> + } >>>>>> } >>>>>> } >>>>>> >>>>>> >> >>
On 11/16/16 14:18, Michael S. Tsirkin wrote: > On Wed, Nov 16, 2016 at 07:47:42AM -0500, Paolo Bonzini wrote: >> >>> If the consensus is that the patch is a QEMU bugfix (as opposed to a >>> feature) and that it is eligible for the currently supported upstream >>> stable branches, that's the best, no doubt. >> >> The currently supported upstream stable branches is just 2.7. :) >> >> I'm okay with bending the rules and including it in 2.8, but it's >> worrisome that you also needed to go back from relaxed to traditional >> delivery, meaning that old QEMU + new OVMF will take ages to boot. >> >> If this is the case, I still think this needs some kind of discovery >> mechanism, unless OVMF can just say "things were too broken, stop >> supporting SMM on QEMUs older than 2.8". >> >> For example: >> >> - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP >> setting is used for the PCD; this would be backwards compatibility mode. >> >> - we could have another magic 0xB2 value, which is implemented directly >> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it >> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) >> to detect the new feature. It can fail to start if using traditional >> AP and the new feature is not there. > > If we keep collecting these magic values, should architect it > and do a host/guest bitmap like virtio does? A feature bitmap is not a bad idea; I can modify my proposal to say, '"etc/smi/features" is a little-endian uint64_t feature bitmap, where bit #0 is the availability of broadcast SMIs. Request it by writing 'Q' to STS before triggering an SMI via writing CNT'. Another example where we use a feature bitmap is fw_cfg itself (the DMA capability is signaled by bit 1). However, feature *negotiation* is overkill, in my opinion. > >> By the way, in case OVMF needs to use SmmSwDispatch in the future, I >> would make QEMU use broadcast behavior for all values in the 0x10-0xff >> range, or something like that. >> >> Paolo > > It bothers me with all these ideas is that it's PV. > Unavoidable? It seems so, yes -- as I understand it, the software-initiated SMI on bare metal Q35 is meant to be broadcast unconditionally, but we had diverged from that in our Q35 implementation, historically. SeaBIOS came to rely on the unicast nature of QEMU's SMI (AIUI) and now we have to invent a way to select the non-historical broadcast. ( BTW, I foresee further Frankensteinization of Q35, as the maximum amount of SMRAM (TSEG) it provides, by spec, is 8MB, and that might not be enough for a very large VCPU count. (The SMM stack was originally tested against 255 VCPUs, yes, but the VCPU max continues to grow, plus edk2 developers keep adding SMM features that require more SMRAM -- sometimes more SMRAM even per CPU.) We have one unused bit pattern left in the TSEG_SZ bit field of the ESMRAMC register, namely binary 11, which stands for "reserved". We might want to commandeer that down the line, and associate a really large SMRAM / TSEG size with it -- 128MB or 256MB, for example. Or, we could use it to signal some other way for TSEG size configuration. The TSEG is carved out of the end of the <4GB RAM, so larger TSEGs than 8MB should fit, as long as the guest is started with enough memory. Anyway, I digress... ) Thanks Laszlo > >>> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The >>> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually >>> correct; when I was writing the OVMF docs, I must have misunderstood the >>> requirements and needlessly required 2.5+; 2.4+ should have been fine.) >>> >>> Which means the fix should be backported as far as stable-2.4. >>> >>> Should we proceed with that? CC'ing Mike Roth and the stable list. >>> >>> Thanks! >>> Laszlo >>> >>>> >>>> >>>>>> >>>>>> Paolo >>>>>> >>>>>>> --- >>>>>>> hw/isa/lpc_ich9.c | 12 +++++++++++- >>>>>>> 1 file changed, 11 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c >>>>>>> index 10d1ee8b9310..f2fe644fdaa4 100644 >>>>>>> --- a/hw/isa/lpc_ich9.c >>>>>>> +++ b/hw/isa/lpc_ich9.c >>>>>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool >>>>>>> smm_enabled) >>>>>>> >>>>>>> /* APM */ >>>>>>> >>>>>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' >>>>>>> + >>>>>>> static void ich9_apm_ctrl_changed(uint32_t val, void *arg) >>>>>>> { >>>>>>> ICH9LPCState *lpc = arg; >>>>>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, >>>>>>> void *arg) >>>>>>> >>>>>>> /* SMI_EN = PMBASE + 30. SMI control and enable register */ >>>>>>> if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { >>>>>>> - cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >>>>>>> + if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { >>>>>>> + CPUState *cs; >>>>>>> + >>>>>>> + CPU_FOREACH(cs) { >>>>>>> + cpu_interrupt(cs, CPU_INTERRUPT_SMI); >>>>>>> + } >>>>>>> + } else { >>>>>>> + cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >>>>>>> + } >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> >>> >>>
On 11/16/16 15:05, Paolo Bonzini wrote: > > > On 16/11/2016 14:18, Michael S. Tsirkin wrote: >>> - we could have another magic 0xB2 value, which is implemented directly >>> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it >>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) >>> to detect the new feature. It can fail to start if using traditional >>> AP and the new feature is not there. >> >> If we keep collecting these magic values, should architect it >> and do a host/guest bitmap like virtio does? > > The value written in 0xB3 can certainly be a feature bitmap. For now we > would have for example > > bit 0 if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI > bit 1-7 zero Doable, but: - doesn't address how OVMF learns about the broadcast SMI availability, - the command value OVMF currently writes is 0. How about this: - etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0 stands for broadcast SMI availability - 0xB2 is the command value (independent of 0xB3) - 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS reserves bit#0 already (uses values 0 and 1), so we can use the remaining 7 bits for requesting features. Bit#1 (value 2) could be the broadcast SMI. This does resemble a kind of feature negotiation, except the host cannot signal back an error (unsupported combination of features), like virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags. Thanks Laszlo
> I guess that's what the next paragraph is about: > > > - we could have another magic 0xB2 value, which is implemented directly > > in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it > > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) > > to detect the new feature. It can fail to start if using traditional > > AP and the new feature is not there. > > Please explain in more detail. If I write to 0xB2 (by invoking the > Trigger() method or somehow else), then on old QEMU's that will raise a > sync / unicast SMI. The SMI handler in edk2 will run, but no request > parameters will have been set up by OVMF, so the SMI handler will do... > no clue what. It should hopefully do nothing. A spurious SMI (such as the one caused by the write to 0xB2) should not crash OVMF. SMBASE relocation uses IPIs, so my hope was to use the SmmCpuFeaturesSmmRelocationComplete hook. > My preference is fw_cfg ATM. It provides a prove, flexible and > extensible interface (it's easy to add new files for future features). > If we expect more knobs in the area, I can modify my proposal to use > "etc/smi/broadcast", so we can add "etc/smi/XXXX" later. Did you know there are 16 entries only for fw_cfg files? :) And we're using already 20 in the worst case: genroms/linuxboot.bin genroms/kvmvapic.bin NVDIMM_DSM_MEM_FILE "etc/smbios/smbios-tables" "etc/smbios/smbios-anchor" "etc/acpi/tables" "etc/table-loader" ACPI_BUILD_TPMLOG_FILE ACPI_BUILD_RSDP_FILE "etc/e820" "etc/msr_feature_control" "etc/reserved-memory-end" "etc/pvpanic-port" "etc/boot-menu-wait" "bootsplash.jpg" "etc/boot-fail-wait" "etc/igd-opregion" "etc/igd-bdsm-size" "etc/extra-pci-roots" "bootorder" Therefore, so close to the release I'm a bit worried about doing changes to fw_cfg or adding more fw_cfg files. Though we just got rid of one file for the number of CPUs, so I guess we might not care. > Do you have any specific arguments against fw_cfg? As I suggested in my > previous email, with fw_cfg I can implement the change in OVMF such that > the default behavior wouldn't change -- the default delivery would > remain relaxed, and the broadcast wouldn't be requested, unless the > fw_cfg file told OVMF otherwise. > > > By the way, in case OVMF needs to use SmmSwDispatch in the future, I > > would make QEMU use broadcast behavior for all values in the 0x10-0xff > > range, or something like that. > > Are we talking control/command (0xB2) or scratch/data (0xB3) register > values? My patches currently use the scratch/data register to provide > the hint to QEMU; that register is less likely to interfere with > anything the SMM core in edk2 does. Sorry I confused the two registers. 0xb3 is more or less unused as far as I can see indeed. Paolo
On 11/16/16 19:04, Paolo Bonzini wrote: >> I guess that's what the next paragraph is about: >> >>> - we could have another magic 0xB2 value, which is implemented directly >>> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it >>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) >>> to detect the new feature. It can fail to start if using traditional >>> AP and the new feature is not there. >> >> Please explain in more detail. If I write to 0xB2 (by invoking the >> Trigger() method or somehow else), then on old QEMU's that will raise a >> sync / unicast SMI. The SMI handler in edk2 will run, but no request >> parameters will have been set up by OVMF, so the SMI handler will do... >> no clue what. > > It should hopefully do nothing. A spurious SMI (such as the one caused > by the write to 0xB2) should not crash OVMF. > > SMBASE relocation uses IPIs, so my hope was to use the > SmmCpuFeaturesSmmRelocationComplete hook. From a cursory look, SmmCpuFeaturesSmmRelocationComplete() seems to be called early enough from PiSmmCpuDxeSmm that we might be able to call PcdSet() from it, for updating PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode. I perceive it a bit too close to the edge :) >> My preference is fw_cfg ATM. It provides a prove, flexible and >> extensible interface (it's easy to add new files for future features). >> If we expect more knobs in the area, I can modify my proposal to use >> "etc/smi/broadcast", so we can add "etc/smi/XXXX" later. > > Did you know there are 16 entries only for fw_cfg files? :) Yes, I've known that, but it can be changed by redefining FW_CFG_FILE_SLOTS, can't it? The key type for fw_cfg is uint16_t, so we should have some reserves. > And we're > using already 20 in the worst case: > > genroms/linuxboot.bin > genroms/kvmvapic.bin > NVDIMM_DSM_MEM_FILE > "etc/smbios/smbios-tables" > "etc/smbios/smbios-anchor" > "etc/acpi/tables" > "etc/table-loader" > ACPI_BUILD_TPMLOG_FILE > ACPI_BUILD_RSDP_FILE > "etc/e820" > "etc/msr_feature_control" > "etc/reserved-memory-end" > "etc/pvpanic-port" > "etc/boot-menu-wait" > "bootsplash.jpg" > "etc/boot-fail-wait" > "etc/igd-opregion" > "etc/igd-bdsm-size" > "etc/extra-pci-roots" > "bootorder" > > Therefore, so close to the release I'm a bit worried about doing > changes to fw_cfg or adding more fw_cfg files. Though we just got > rid of one file for the number of CPUs, so I guess we might not care. I agree with your caution about this. I'm also perfectly fine if this update misses 2.8. :) > >> Do you have any specific arguments against fw_cfg? As I suggested in my >> previous email, with fw_cfg I can implement the change in OVMF such that >> the default behavior wouldn't change -- the default delivery would >> remain relaxed, and the broadcast wouldn't be requested, unless the >> fw_cfg file told OVMF otherwise. >> >>> By the way, in case OVMF needs to use SmmSwDispatch in the future, I >>> would make QEMU use broadcast behavior for all values in the 0x10-0xff >>> range, or something like that. >> >> Are we talking control/command (0xB2) or scratch/data (0xB3) register >> values? My patches currently use the scratch/data register to provide >> the hint to QEMU; that register is less likely to interfere with >> anything the SMM core in edk2 does. > > Sorry I confused the two registers. 0xb3 is more or less unused as far > as I can see indeed. Thanks Laszlo
On Wed, Nov 16, 2016 at 07:03:27PM +0100, Laszlo Ersek wrote: > On 11/16/16 15:05, Paolo Bonzini wrote: > > > > > > On 16/11/2016 14:18, Michael S. Tsirkin wrote: > >>> - we could have another magic 0xB2 value, which is implemented directly > >>> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it > >>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) > >>> to detect the new feature. It can fail to start if using traditional > >>> AP and the new feature is not there. > >> > >> If we keep collecting these magic values, should architect it > >> and do a host/guest bitmap like virtio does? > > > > The value written in 0xB3 can certainly be a feature bitmap. For now we > > would have for example > > > > bit 0 if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI > > bit 1-7 zero > > Doable, but: > - doesn't address how OVMF learns about the broadcast SMI availability, > - the command value OVMF currently writes is 0. > > How about this: > - etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0 > stands for broadcast SMI availability > - 0xB2 is the command value (independent of 0xB3) > - 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS > reserves bit#0 already (uses values 0 and 1), so we can use the > remaining 7 bits for requesting features. Bit#1 (value 2) could be the > broadcast SMI. > > This does resemble a kind of feature negotiation, except the host cannot > signal back an error (unsupported combination of features), like > virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags. > > Thanks > Laszlo I think that if you are going to do it, do it like 1.0: - same bitmap for host and guest. how about a writeable fw cfg file? - use 0XB3 bit for FEATURES_OK
On Wed, Nov 16, 2016 at 06:37:30PM +0100, Laszlo Ersek wrote: > On 11/16/16 13:47, Paolo Bonzini wrote: > > > >> If the consensus is that the patch is a QEMU bugfix (as opposed to a > >> feature) and that it is eligible for the currently supported upstream > >> stable branches, that's the best, no doubt. > > > > The currently supported upstream stable branches is just 2.7. :) > > > > I'm okay with bending the rules and including it in 2.8, but it's > > worrisome that you also needed to go back from relaxed to traditional > > delivery, meaning that old QEMU + new OVMF will take ages to boot. > > > > If this is the case, I still think this needs some kind of discovery > > mechanism, unless OVMF can just say "things were too broken, stop > > supporting SMM on QEMUs older than 2.8". > > > > For example: > > > > - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP > > setting is used for the PCD; this would be backwards compatibility mode. > > Okay, but this still means that the PCD has to become dynamic, and we > must set the PCD earlier (likely in PlatformPei) based on something. > > I guess that's what the next paragraph is about: > > > - we could have another magic 0xB2 value, which is implemented directly > > in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it > > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) > > to detect the new feature. It can fail to start if using traditional > > AP and the new feature is not there. > > Please explain in more detail. If I write to 0xB2 (by invoking the > Trigger() method or somehow else), then on old QEMU's that will raise a > sync / unicast SMI. The SMI handler in edk2 will run, but no request > parameters will have been set up by OVMF, so the SMI handler will do... > no clue what. I don't think this is a good idea. > > My preference is fw_cfg ATM. It provides a prove, flexible and > extensible interface (it's easy to add new files for future features). > If we expect more knobs in the area, I can modify my proposal to use > "etc/smi/broadcast", so we can add "etc/smi/XXXX" later. > > Do you have any specific arguments against fw_cfg? As I suggested in my > previous email, with fw_cfg I can implement the change in OVMF such that > the default behavior wouldn't change -- the default delivery would > remain relaxed, and the broadcast wouldn't be requested, unless the > fw_cfg file told OVMF otherwise. Only thing is, I think it's a good idea in the future to be able to build OVMF without legacy QEMU support. E.g. there are all people that want to speed up boot. Add some ifdefs in code for that? And add comments to document which version needs these hacks. > > By the way, in case OVMF needs to use SmmSwDispatch in the future, I > > would make QEMU use broadcast behavior for all values in the 0x10-0xff > > range, or something like that. > > Are we talking control/command (0xB2) or scratch/data (0xB3) register > values? My patches currently use the scratch/data register to provide > the hint to QEMU; that register is less likely to interfere with > anything the SMM core in edk2 does. I seem to recall that SmmSwDispatch > uses command/control values to distinguish the called functions. Should > we keep the broadcast / unicast decision separate from the > control/command value ? > > Thanks > Laszlo > > > > > Paolo > > > >> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The > >> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually > >> correct; when I was writing the OVMF docs, I must have misunderstood the > >> requirements and needlessly required 2.5+; 2.4+ should have been fine.) > >> > >> Which means the fix should be backported as far as stable-2.4. > >> > >> Should we proceed with that? CC'ing Mike Roth and the stable list. > >> > >> Thanks! > >> Laszlo > >> > >>> > >>> > >>>>> > >>>>> Paolo > >>>>> > >>>>>> --- > >>>>>> hw/isa/lpc_ich9.c | 12 +++++++++++- > >>>>>> 1 file changed, 11 insertions(+), 1 deletion(-) > >>>>>> > >>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c > >>>>>> index 10d1ee8b9310..f2fe644fdaa4 100644 > >>>>>> --- a/hw/isa/lpc_ich9.c > >>>>>> +++ b/hw/isa/lpc_ich9.c > >>>>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool > >>>>>> smm_enabled) > >>>>>> > >>>>>> /* APM */ > >>>>>> > >>>>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' > >>>>>> + > >>>>>> static void ich9_apm_ctrl_changed(uint32_t val, void *arg) > >>>>>> { > >>>>>> ICH9LPCState *lpc = arg; > >>>>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, > >>>>>> void *arg) > >>>>>> > >>>>>> /* SMI_EN = PMBASE + 30. SMI control and enable register */ > >>>>>> if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { > >>>>>> - cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > >>>>>> + if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { > >>>>>> + CPUState *cs; > >>>>>> + > >>>>>> + CPU_FOREACH(cs) { > >>>>>> + cpu_interrupt(cs, CPU_INTERRUPT_SMI); > >>>>>> + } > >>>>>> + } else { > >>>>>> + cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > >>>>>> + } > >>>>>> } > >>>>>> } > >>>>>> > >>>>>> > >> > >>
On Wed, Nov 16, 2016 at 01:04:00PM -0500, Paolo Bonzini wrote: > > I guess that's what the next paragraph is about: > > > > > - we could have another magic 0xB2 value, which is implemented directly > > > in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it > > > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) > > > to detect the new feature. It can fail to start if using traditional > > > AP and the new feature is not there. > > > > Please explain in more detail. If I write to 0xB2 (by invoking the > > Trigger() method or somehow else), then on old QEMU's that will raise a > > sync / unicast SMI. The SMI handler in edk2 will run, but no request > > parameters will have been set up by OVMF, so the SMI handler will do... > > no clue what. > > It should hopefully do nothing. A spurious SMI (such as the one caused > by the write to 0xB2) should not crash OVMF. > > SMBASE relocation uses IPIs, so my hope was to use the > SmmCpuFeaturesSmmRelocationComplete hook. > > > My preference is fw_cfg ATM. It provides a prove, flexible and > > extensible interface (it's easy to add new files for future features). > > If we expect more knobs in the area, I can modify my proposal to use > > "etc/smi/broadcast", so we can add "etc/smi/XXXX" later. > > Did you know there are 16 entries only for fw_cfg files? :) And we're > using already 20 in the worst case: > > genroms/linuxboot.bin > genroms/kvmvapic.bin > NVDIMM_DSM_MEM_FILE > "etc/smbios/smbios-tables" > "etc/smbios/smbios-anchor" > "etc/acpi/tables" > "etc/table-loader" > ACPI_BUILD_TPMLOG_FILE > ACPI_BUILD_RSDP_FILE > "etc/e820" > "etc/msr_feature_control" > "etc/reserved-memory-end" > "etc/pvpanic-port" > "etc/boot-menu-wait" > "bootsplash.jpg" > "etc/boot-fail-wait" > "etc/igd-opregion" > "etc/igd-bdsm-size" > "etc/extra-pci-roots" > "bootorder" > > Therefore, so close to the release I'm a bit worried about doing > changes to fw_cfg or adding more fw_cfg files. Indeed. Is an unconditional thing so bad? What would be the observed behaviour with new OVMF on old QEMU? Note you need to migrate during boot to notice this. > Though we just got > rid of one file for the number of CPUs, so I guess we might not care. > > > Do you have any specific arguments against fw_cfg? As I suggested in my > > previous email, with fw_cfg I can implement the change in OVMF such that > > the default behavior wouldn't change -- the default delivery would > > remain relaxed, and the broadcast wouldn't be requested, unless the > > fw_cfg file told OVMF otherwise. > > > > > By the way, in case OVMF needs to use SmmSwDispatch in the future, I > > > would make QEMU use broadcast behavior for all values in the 0x10-0xff > > > range, or something like that. > > > > Are we talking control/command (0xB2) or scratch/data (0xB3) register > > values? My patches currently use the scratch/data register to provide > > the hint to QEMU; that register is less likely to interfere with > > anything the SMM core in edk2 does. > > Sorry I confused the two registers. 0xb3 is more or less unused as far > as I can see indeed. > > Paolo
On 11/16/16 21:38, Michael S. Tsirkin wrote: > On Wed, Nov 16, 2016 at 01:04:00PM -0500, Paolo Bonzini wrote: >>> I guess that's what the next paragraph is about: >>> >>>> - we could have another magic 0xB2 value, which is implemented directly >>>> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it >>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) >>>> to detect the new feature. It can fail to start if using traditional >>>> AP and the new feature is not there. >>> >>> Please explain in more detail. If I write to 0xB2 (by invoking the >>> Trigger() method or somehow else), then on old QEMU's that will raise a >>> sync / unicast SMI. The SMI handler in edk2 will run, but no request >>> parameters will have been set up by OVMF, so the SMI handler will do... >>> no clue what. >> >> It should hopefully do nothing. A spurious SMI (such as the one caused >> by the write to 0xB2) should not crash OVMF. >> >> SMBASE relocation uses IPIs, so my hope was to use the >> SmmCpuFeaturesSmmRelocationComplete hook. >> >>> My preference is fw_cfg ATM. It provides a prove, flexible and >>> extensible interface (it's easy to add new files for future features). >>> If we expect more knobs in the area, I can modify my proposal to use >>> "etc/smi/broadcast", so we can add "etc/smi/XXXX" later. >> >> Did you know there are 16 entries only for fw_cfg files? :) And we're >> using already 20 in the worst case: >> >> genroms/linuxboot.bin >> genroms/kvmvapic.bin >> NVDIMM_DSM_MEM_FILE >> "etc/smbios/smbios-tables" >> "etc/smbios/smbios-anchor" >> "etc/acpi/tables" >> "etc/table-loader" >> ACPI_BUILD_TPMLOG_FILE >> ACPI_BUILD_RSDP_FILE >> "etc/e820" >> "etc/msr_feature_control" >> "etc/reserved-memory-end" >> "etc/pvpanic-port" >> "etc/boot-menu-wait" >> "bootsplash.jpg" >> "etc/boot-fail-wait" >> "etc/igd-opregion" >> "etc/igd-bdsm-size" >> "etc/extra-pci-roots" >> "bootorder" >> >> Therefore, so close to the release I'm a bit worried about doing >> changes to fw_cfg or adding more fw_cfg files. > > Indeed. Is an unconditional thing so bad? > What would be the observed behaviour with new OVMF on old QEMU? The SMM stack would expect broadcast SMIs but only unicast SMIs would occur. This would - introduce very long delays in the handling on each SMI as the synchronization algorithms time out and then force individual APs into SMM by LAPIC writes, - expose obscure synchronization bugs in the SMM stack, especially during S3 resume. The directed / unicast SMI model is less tested in edk2 and a number of super-obscure corner cases remain. Thanks Laszlo > Note you need to migrate during boot to notice this. > >> Though we just got >> rid of one file for the number of CPUs, so I guess we might not care. >> >>> Do you have any specific arguments against fw_cfg? As I suggested in my >>> previous email, with fw_cfg I can implement the change in OVMF such that >>> the default behavior wouldn't change -- the default delivery would >>> remain relaxed, and the broadcast wouldn't be requested, unless the >>> fw_cfg file told OVMF otherwise. >>> >>>> By the way, in case OVMF needs to use SmmSwDispatch in the future, I >>>> would make QEMU use broadcast behavior for all values in the 0x10-0xff >>>> range, or something like that. >>> >>> Are we talking control/command (0xB2) or scratch/data (0xB3) register >>> values? My patches currently use the scratch/data register to provide >>> the hint to QEMU; that register is less likely to interfere with >>> anything the SMM core in edk2 does. >> >> Sorry I confused the two registers. 0xb3 is more or less unused as far >> as I can see indeed. >> >> Paolo
On 11/16/16 21:27, Michael S. Tsirkin wrote: > On Wed, Nov 16, 2016 at 07:03:27PM +0100, Laszlo Ersek wrote: >> On 11/16/16 15:05, Paolo Bonzini wrote: >>> >>> >>> On 16/11/2016 14:18, Michael S. Tsirkin wrote: >>>>> - we could have another magic 0xB2 value, which is implemented directly >>>>> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it >>>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) >>>>> to detect the new feature. It can fail to start if using traditional >>>>> AP and the new feature is not there. >>>> >>>> If we keep collecting these magic values, should architect it >>>> and do a host/guest bitmap like virtio does? >>> >>> The value written in 0xB3 can certainly be a feature bitmap. For now we >>> would have for example >>> >>> bit 0 if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI >>> bit 1-7 zero >> >> Doable, but: >> - doesn't address how OVMF learns about the broadcast SMI availability, >> - the command value OVMF currently writes is 0. >> >> How about this: >> - etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0 >> stands for broadcast SMI availability >> - 0xB2 is the command value (independent of 0xB3) >> - 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS >> reserves bit#0 already (uses values 0 and 1), so we can use the >> remaining 7 bits for requesting features. Bit#1 (value 2) could be the >> broadcast SMI. >> >> This does resemble a kind of feature negotiation, except the host cannot >> signal back an error (unsupported combination of features), like >> virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags. >> >> Thanks >> Laszlo > > I think that if you are going to do it, do it like 1.0: > - same bitmap for host and guest. how about a writeable fw cfg file? fw_cfg files are not writeable since qemu 2.4 (see commits 023e3148567ac and 6cec43e178cde). How about this alternative, in STS: - bit 0: read and written transparently - bit 1: on write: 0 -- set features in bits 2-7 1 -- query host features into bits 2-7 on read: - after querying features: - reads back as 0 if the interface is supported - reads back as 1 if the interface is missing - after setting features: - reads back as 0 if the feature subset is valid - reads back as 1 otherwise - bit 2: on write: - when setting features: request broadcast SMI - when querying features: ignored on read: - after setting features: zero - after querying features: broadcast SMI availability (1 if available) - bit 3-7: future features (I think 5 more features for SMI handling should suffice), working similarly to bit 2 SeaBIOS writes values 0x00 and 0x01, and expects to find the same when reading back. Bit pattern 0000_000?b translates to "clear all features", which always succeeds and results in behavior identical to the current one, hence bits 1-7 read back as zero. OVMF: - write 0x02, read back value: - if bit 1 is set, interface is missing - otherwise feature bitmap was returned in bits 2-7 - select requested features in bits 2-7, set bit 1 to 0, write value, read back value - if bit 1 is set, the feature subset is invalid - okay otherwise Thanks Laszlo > - use 0XB3 bit for FEATURES_OK >
On Thu, Nov 17, 2016 at 02:16:35PM +0100, Laszlo Ersek wrote: > On 11/16/16 21:27, Michael S. Tsirkin wrote: > > On Wed, Nov 16, 2016 at 07:03:27PM +0100, Laszlo Ersek wrote: > >> On 11/16/16 15:05, Paolo Bonzini wrote: > >>> > >>> > >>> On 16/11/2016 14:18, Michael S. Tsirkin wrote: > >>>>> - we could have another magic 0xB2 value, which is implemented directly > >>>>> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it > >>>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) > >>>>> to detect the new feature. It can fail to start if using traditional > >>>>> AP and the new feature is not there. > >>>> > >>>> If we keep collecting these magic values, should architect it > >>>> and do a host/guest bitmap like virtio does? > >>> > >>> The value written in 0xB3 can certainly be a feature bitmap. For now we > >>> would have for example > >>> > >>> bit 0 if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI > >>> bit 1-7 zero > >> > >> Doable, but: > >> - doesn't address how OVMF learns about the broadcast SMI availability, > >> - the command value OVMF currently writes is 0. > >> > >> How about this: > >> - etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0 > >> stands for broadcast SMI availability > >> - 0xB2 is the command value (independent of 0xB3) > >> - 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS > >> reserves bit#0 already (uses values 0 and 1), so we can use the > >> remaining 7 bits for requesting features. Bit#1 (value 2) could be the > >> broadcast SMI. > >> > >> This does resemble a kind of feature negotiation, except the host cannot > >> signal back an error (unsupported combination of features), like > >> virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags. > >> > >> Thanks > >> Laszlo > > > > I think that if you are going to do it, do it like 1.0: > > - same bitmap for host and guest. how about a writeable fw cfg file? > > fw_cfg files are not writeable since qemu 2.4 (see commits 023e3148567ac > and 6cec43e178cde). > > How about this alternative, in STS: > - bit 0: read and written transparently > - bit 1: on write: > 0 -- set features in bits 2-7 > 1 -- query host features into bits 2-7 > on read: > - after querying features: > - reads back as 0 if the interface is supported > - reads back as 1 if the interface is missing > - after setting features: > - reads back as 0 if the feature subset is valid > - reads back as 1 otherwise > - bit 2: on write: > - when setting features: request broadcast SMI > - when querying features: ignored > on read: > - after setting features: zero > - after querying features: broadcast SMI availability (1 if > available) > > - bit 3-7: future features (I think 5 more features for SMI handling > should suffice), working similarly to bit 2 > > SeaBIOS writes values 0x00 and 0x01, and expects to find the same when > reading back. Bit pattern 0000_000?b translates to "clear all > features", which always succeeds and results in behavior identical to > the current one, hence bits 1-7 read back as zero. > > OVMF: > - write 0x02, read back value: > - if bit 1 is set, interface is missing > - otherwise feature bitmap was returned in bits 2-7 > - select requested features in bits 2-7, set bit 1 to 0, write value, > read back value > - if bit 1 is set, the feature subset is invalid > - okay otherwise > > Thanks > Laszlo It's all fine, or we can make fw cfg writeable again (I posted a patch for that a while ago), but it's all a bit too much for this release. Let's just defer it, or do you have a better idea? > > - use 0XB3 bit for FEATURES_OK > >
On 11/17/16 18:46, Michael S. Tsirkin wrote: > On Thu, Nov 17, 2016 at 02:16:35PM +0100, Laszlo Ersek wrote: >> On 11/16/16 21:27, Michael S. Tsirkin wrote: >>> On Wed, Nov 16, 2016 at 07:03:27PM +0100, Laszlo Ersek wrote: >>>> On 11/16/16 15:05, Paolo Bonzini wrote: >>>>> >>>>> >>>>> On 16/11/2016 14:18, Michael S. Tsirkin wrote: >>>>>>> - we could have another magic 0xB2 value, which is implemented directly >>>>>>> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it >>>>>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) >>>>>>> to detect the new feature. It can fail to start if using traditional >>>>>>> AP and the new feature is not there. >>>>>> >>>>>> If we keep collecting these magic values, should architect it >>>>>> and do a host/guest bitmap like virtio does? >>>>> >>>>> The value written in 0xB3 can certainly be a feature bitmap. For now we >>>>> would have for example >>>>> >>>>> bit 0 if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI >>>>> bit 1-7 zero >>>> >>>> Doable, but: >>>> - doesn't address how OVMF learns about the broadcast SMI availability, >>>> - the command value OVMF currently writes is 0. >>>> >>>> How about this: >>>> - etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0 >>>> stands for broadcast SMI availability >>>> - 0xB2 is the command value (independent of 0xB3) >>>> - 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS >>>> reserves bit#0 already (uses values 0 and 1), so we can use the >>>> remaining 7 bits for requesting features. Bit#1 (value 2) could be the >>>> broadcast SMI. >>>> >>>> This does resemble a kind of feature negotiation, except the host cannot >>>> signal back an error (unsupported combination of features), like >>>> virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags. >>>> >>>> Thanks >>>> Laszlo >>> >>> I think that if you are going to do it, do it like 1.0: >>> - same bitmap for host and guest. how about a writeable fw cfg file? >> >> fw_cfg files are not writeable since qemu 2.4 (see commits 023e3148567ac >> and 6cec43e178cde). >> >> How about this alternative, in STS: >> - bit 0: read and written transparently >> - bit 1: on write: >> 0 -- set features in bits 2-7 >> 1 -- query host features into bits 2-7 >> on read: >> - after querying features: >> - reads back as 0 if the interface is supported >> - reads back as 1 if the interface is missing >> - after setting features: >> - reads back as 0 if the feature subset is valid >> - reads back as 1 otherwise >> - bit 2: on write: >> - when setting features: request broadcast SMI >> - when querying features: ignored >> on read: >> - after setting features: zero >> - after querying features: broadcast SMI availability (1 if >> available) >> >> - bit 3-7: future features (I think 5 more features for SMI handling >> should suffice), working similarly to bit 2 >> >> SeaBIOS writes values 0x00 and 0x01, and expects to find the same when >> reading back. Bit pattern 0000_000?b translates to "clear all >> features", which always succeeds and results in behavior identical to >> the current one, hence bits 1-7 read back as zero. >> >> OVMF: >> - write 0x02, read back value: >> - if bit 1 is set, interface is missing >> - otherwise feature bitmap was returned in bits 2-7 >> - select requested features in bits 2-7, set bit 1 to 0, write value, >> read back value >> - if bit 1 is set, the feature subset is invalid >> - okay otherwise >> >> Thanks >> Laszlo > > > It's all fine, or we can make fw cfg writeable again (I posted > a patch for that a while ago), but it's all a bit too much > for this release. > > Let's just defer it, or do you have a better idea? I'm writing patches for the above proposal (including a document under docs/specs/), and I plan to post them soon. They're definitely 2.9 material though -- I don't mind if I have to wait a bit even just to get feedback on those patches :) So, to make it formal for the patch that started this thread: Self-NAK Will post something better / more flexible soon, for 2.9. (Hopefully I'll remember to put "[for-2.9]" in the subject.) Thanks! Laszlo >>> - use 0XB3 bit for FEATURES_OK >>>
diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c index 10d1ee8b9310..f2fe644fdaa4 100644 --- a/hw/isa/lpc_ich9.c +++ b/hw/isa/lpc_ich9.c @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled) /* APM */ +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' + static void ich9_apm_ctrl_changed(uint32_t val, void *arg) { ICH9LPCState *lpc = arg; @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg) /* SMI_EN = PMBASE + 30. SMI control and enable register */ if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { - cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); + if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { + CPUState *cs; + + CPU_FOREACH(cs) { + cpu_interrupt(cs, CPU_INTERRUPT_SMI); + } + } else { + cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); + } } }
The generic edk2 SMM infrastructure prefers EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If Trigger() only brings the current processor into SMM, then edk2 handles it in the following ways: (1) If Trigger() is executed by the BSP (which is guaranteed before ExitBootServices(), but is not necessarily true at runtime), then: (a) If edk2 has been configured for "traditional" SMM synchronization, then the BSP sends directed SMIs to the APs with APIC delivery, bringing them into SMM individually. Then the BSP runs the SMI handler / dispatcher. (b) If edk2 has been configured for "relaxed" SMM synchronization, then the APs that are not already in SMM are not brought in, and the BSP runs the SMI handler / dispatcher. (2) If Trigger() is executed by an AP (which is possible after ExitBootServices(), and can be forced e.g. by "taskset -c 1 efibootmgr"), then the AP in question brings in the BSP with a directed SMI, and the BSP runs the SMI handler / dispatcher. The smaller problem with (1a) and (2) is that the BSP and AP synchronization is slow. For example, the "taskset -c 1 efibootmgr" command from (2) can take more than 3 seconds to complete, because efibootmgr accesses non-volatile UEFI variables intensively. The larger problem is that QEMU's current behavior diverges from the behavior usually seen on physical hardware, and that keeps exposing obscure corner cases, race conditions and other instabilities in edk2, which generally expects / prefers a software SMI to affect all CPUs at once. Therefore introduce a special APM_STS value (0x51) that causes QEMU to inject the SMI on all VCPUs. OVMF's EFI_SMM_CONTROL2_PROTOCOL.Trigger() can utilize this to accommodate edk2's preference about "broadcast" SMI. SeaBIOS uses values 0x00 and 0x01 for APM_STS (called PORT_SMI_STATUS in the SeaBIOS code), so this change should be transparent to it. While the original posting of this patch <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html> only intended to speed up (2), based on our recent "stress testing" of SMM this patch actually provides functional improvements. (There are no code changes relative to the original posting.) Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Also-suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> --- hw/isa/lpc_ich9.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)