[edk2] KVM: MTRR: fix memory type handling if MTRR is completely disabled

Message ID 561F2952.5060300@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Xiao Guangrong Oct. 15, 2015, 4:19 a.m. UTC
On 10/15/2015 02:08 AM, Janusz wrote:
> On 14.10.2015 at 10:32, Xiao Guangrong wrote:
>>
>>
>> On 10/14/2015 04:24 PM, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/14/2015 03:37 PM, Janusz wrote:
>>>> I was able to run my virtual machine with this, but CPU usage was very
>>>> high whenever something happened in it, such as booting the system. Once,
>>>> my virtual machine hung and I couldn't even get my mouse / keyboard back
>>>> from QEMU. When I did VGA passthrough, I didn't get any video output, and
>>>> CPU usage was also high. I tried it on 4.3.
>>>
>>> Which tree are you using? Is it the kvm tree?
>>> Could you please work on the queue branch of the current kvm tree, based on
>>> top commit 73917739334c6509: KVM: x86: fix SMI to halted VCPU.
>>>
>>> Hmm... interesting, this diff works on my box...
>>
>> I forgot to say that I built my test environment following the instructions
>> on the kvm-wiki:
>> http://www.linux-kvm.org/page/OVMF
>>
>> My test script is attached, and I will try to make my environment match
>> yours as closely as possible...
> I cloned git://git.kernel.org/pub/scm/virt/kvm/kvm.git at commit
> 73917739334c6509, but it breaks my system: SLiM is not able to start i3,
> xdm does not kill X when I stop xdm, and qemu is not able to start unless
> I use the -nographic option.
> Log from qemu on that kernel version:
> xcb_connection_has_error() returned true
> No protocol specified
> Could not initialize SDL(No available video device) - exiting
>
> On the main kernel branch I don't have those problems.
>
> I tried running with -nographic, and also tried pc-i440fx-2.1, but got the
> same problem as before: high CPU usage and no graphics output on my GPU.
> I don't know if it will help, but this is my log from the options -global
> isa-debugcon.iobase=0x402 -debugcon file:fedora.ovmf.log:
> https://bpaste.net/show/36c54dba68c2

Well, the bug may not be in KVM. When this bug happened, I saw that OVMF
only detected 1 CPU. Here is the log from OVMF's debug output:

   Flushing GCD
   Flushing GCD
   Flushing GCD
   Flushing GCD
   Flushing GCD
   Flushing GCD
   Flushing GCD
   Flushing GCD
   Flushing GCD
   Flushing GCDs
Detect CPU count: 1

So the startup code has been freed while the APs are still running;
I think that is why we saw the vCPUs executing at unexpected addresses.

After digging into OVMF's code, I noticed that the BSP waits for the APs
for a fixed time period; however, KVM recent changes require zap all
mappings if CR0.CD is changed, that means the APs need more time to
startup.

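After the following change to OVMF, the bug is completely gone on my side:

--- a/UefiCpuPkg/CpuDxe/ApStartup.c
+++ b/UefiCpuPkg/CpuDxe/ApStartup.c
@@ -454,7 +454,9 @@ StartApsStackless (
   //
   // Wait 100 milliseconds for APs to arrive at the ApEntryPoint routine
   //
-  MicroSecondDelay (100 * 1000);
+  MicroSecondDelay (10 * 100 * 1000);

   return EFI_SUCCESS;
 }

Janusz, could you please check this instead? You can switch to your
previous kernel to do this test.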


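To make the race above concrete, here is a minimal stand-alone sketch in plain
C. It is not edk2 code. It assumes the expected AP count is known in advance,
which the CPU counting code discussed in this thread does not assume, and all
names in it (wait_for_aps, sim_aps, sim_delay) are hypothetical. It only shows
that a wait budget shorter than the AP startup time reports failure, while a
longer budget succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical delay hook; firmware would use its own delay service here. */
typedef void (*delay_fn)(uint32_t microseconds, void *ctx);

/*
 * Poll an arrival counter until `expected` APs have checked in or the
 * time budget is used up. Returns true if all APs arrived in time.
 */
static bool wait_for_aps(const volatile uint32_t *arrived, uint32_t expected,
                         uint32_t timeout_us, uint32_t step_us,
                         delay_fn delay, void *ctx)
{
    uint32_t waited = 0;

    assert(arrived != NULL && delay != NULL);
    if (step_us == 0)               /* avoid a loop that never advances */
        return *arrived >= expected;

    while (*arrived < expected) {
        if (waited >= timeout_us)
            return false;           /* budget exhausted, APs still missing */
        delay(step_us, ctx);
        waited += step_us;
    }
    return true;
}

/* Simulated APs: each one "arrives" once the simulated clock passes its time. */
struct sim_aps {
    volatile uint32_t arrived;
    uint32_t elapsed_us;
    uint32_t ready_at_us[3];
};

/* Advance the simulated clock and recount the APs that have arrived. */
static void sim_delay(uint32_t microseconds, void *ctx)
{
    struct sim_aps *sim = ctx;
    uint32_t count = 0;

    sim->elapsed_us += microseconds;
    for (size_t i = 0; i < 3; i++) {
        if (sim->ready_at_us[i] <= sim->elapsed_us)
            count++;
    }
    sim->arrived = count;
}
```

With three simulated APs that need 300 ms each, a 100 ms budget returns false
and a 1 s budget returns true. The 300 ms figure is made up for the example;
nobody in this thread has measured the real AP startup time.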
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Janusz Oct. 15, 2015, 6:19 a.m. UTC | #1
On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>
>
>
> Well, the bug may be not in KVM. When this bug happened, i saw OVMF
> only checked 1 CPU out, there is the log from OVMF's debug input:
>
>   Flushing GCD
>   Flushing GCD
>   Flushing GCD
>   Flushing GCD
>   Flushing GCD
>   Flushing GCD
>   Flushing GCD
>   Flushing GCD
>   Flushing GCD
>   Flushing GCDs
> Detect CPU count: 1
>
> So that the startup code has been freed however the APs are still
> running,
> i think that why we saw the vCPUs executed on unexpected address.
>
> After digging into OVMF's code, i noticed that BSP CPU waits for APs
> for a fixed timer period, however, KVM recent changes require zap all
> mappings if CR0.CD is changed, that means the APs need more time to
> startup.
>
> After following changes to OVMF, the bug is completely gone on my side:
>
> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
> @@ -454,7 +454,9 @@ StartApsStackless (
>    //
>    // Wait 100 milliseconds for APs to arrive at the ApEntryPoint routine
>    //
> -  MicroSecondDelay (100 * 1000);
> +  MicroSecondDelay (10 * 100 * 1000);
>
>    return EFI_SUCCESS;
>  }
>
> Janusz, could you please check this instead? You can switch to your
> previous kernel to do this test.
>
>
OK, the first time I started the VM, the system booted successfully.
When I turned it off and started it again, the VM restarted a couple of
times during system boot. Sometimes I also get very high CPU usage for no
apparent reason. Also, I get lower fps in GTA 5 than on kernel 4.1:
something like 30-55, whereas on 4.1 I get 60 fps all the time. This is
my new log: https://bpaste.net/show/61a122ad7fe5

Xiao Guangrong Oct. 15, 2015, 6:41 a.m. UTC | #2
On 10/15/2015 02:19 PM, Janusz wrote:
> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>
>>
>>
>> Well, the bug may be not in KVM. When this bug happened, i saw OVMF
>> only checked 1 CPU out, there is the log from OVMF's debug input:
>>
>>    Flushing GCD
>>    Flushing GCD
>>    Flushing GCD
>>    Flushing GCD
>>    Flushing GCD
>>    Flushing GCD
>>    Flushing GCD
>>    Flushing GCD
>>    Flushing GCD
>>    Flushing GCDs
>> Detect CPU count: 1
>>
>> So that the startup code has been freed however the APs are still
>> running,
>> i think that why we saw the vCPUs executed on unexpected address.
>>
>> After digging into OVMF's code, i noticed that BSP CPU waits for APs
>> for a fixed timer period, however, KVM recent changes require zap all
>> mappings if CR0.CD is changed, that means the APs need more time to
>> startup.
>>
>> After following changes to OVMF, the bug is completely gone on my side:
>>
>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>> @@ -454,7 +454,9 @@ StartApsStackless (
>>     //
>>     // Wait 100 milliseconds for APs to arrive at the ApEntryPoint routine
>>     //
>> -  MicroSecondDelay (100 * 1000);
>> +  MicroSecondDelay (10 * 100 * 1000);
>>
>>     return EFI_SUCCESS;
>>   }
>>
>> Janusz, could you please check this instead? You can switch to your
>> previous kernel to do this test.
>>
>>
> Ok, now first time when I started VM I was able to start system
> successfully. When I turned it off and started it again, it restarted my
> vm at system boot couple of times. Sometimes I also get very high cpu
> usage for no reason. Also, I get less fps in GTA 5 than in kernel 4.1, I
> get something like 30-55, but on 4.1 I get all the time 60 fps. This is
> my new log: https://bpaste.net/show/61a122ad7fe5
>

Just to confirm: the QEMU internal error no longer appears, right?
Janusz Oct. 15, 2015, 6:58 a.m. UTC | #3
On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>
>
> On 10/15/2015 02:19 PM, Janusz wrote:
>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>
>>>
>>>
>>> Well, the bug may be not in KVM. When this bug happened, i saw OVMF
>>> only checked 1 CPU out, there is the log from OVMF's debug input:
>>>
>>>    Flushing GCD
>>>    Flushing GCD
>>>    Flushing GCD
>>>    Flushing GCD
>>>    Flushing GCD
>>>    Flushing GCD
>>>    Flushing GCD
>>>    Flushing GCD
>>>    Flushing GCD
>>>    Flushing GCDs
>>> Detect CPU count: 1
>>>
>>> So that the startup code has been freed however the APs are still
>>> running,
>>> i think that why we saw the vCPUs executed on unexpected address.
>>>
>>> After digging into OVMF's code, i noticed that BSP CPU waits for APs
>>> for a fixed timer period, however, KVM recent changes require zap all
>>> mappings if CR0.CD is changed, that means the APs need more time to
>>> startup.
>>>
>>> After following changes to OVMF, the bug is completely gone on my side:
>>>
>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>     //
>>>     // Wait 100 milliseconds for APs to arrive at the ApEntryPoint
>>> routine
>>>     //
>>> -  MicroSecondDelay (100 * 1000);
>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>
>>>     return EFI_SUCCESS;
>>>   }
>>>
>>> Janusz, could you please check this instead? You can switch to your
>>> previous kernel to do this test.
>>>
>>>
>> Ok, now first time when I started VM I was able to start system
>> successfully. When I turned it off and started it again, it restarted my
>> vm at system boot couple of times. Sometimes I also get very high cpu
>> usage for no reason. Also, I get less fps in GTA 5 than in kernel 4.1, I
>> get something like 30-55, but on 4.1 I get all the time 60 fps. This is
>> my new log: https://bpaste.net/show/61a122ad7fe5
>>
>
> Just confirm: the Qemu internal error did not appear any more, right?
Yes. When I reverted your first patch, switched from -vga none to -vga
std, and didn't pass through my GPU (the case in which I got this internal
error), the VM started without problems. I didn't even get any VM restarts
like I do with passthrough.
Xiao Guangrong Oct. 15, 2015, 7:10 a.m. UTC | #4
On 10/15/2015 02:58 PM, Janusz wrote:
> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>
>>
>> On 10/15/2015 02:19 PM, Janusz wrote:
>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>
>>>>
>>>>
>>>> Well, the bug may be not in KVM. When this bug happened, i saw OVMF
>>>> only checked 1 CPU out, there is the log from OVMF's debug input:
>>>>
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCDs
>>>> Detect CPU count: 1
>>>>
>>>> So that the startup code has been freed however the APs are still
>>>> running,
>>>> i think that why we saw the vCPUs executed on unexpected address.
>>>>
>>>> After digging into OVMF's code, i noticed that BSP CPU waits for APs
>>>> for a fixed timer period, however, KVM recent changes require zap all
>>>> mappings if CR0.CD is changed, that means the APs need more time to
>>>> startup.
>>>>
>>>> After following changes to OVMF, the bug is completely gone on my side:
>>>>
>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>      //
>>>>      // Wait 100 milliseconds for APs to arrive at the ApEntryPoint
>>>> routine
>>>>      //
>>>> -  MicroSecondDelay (100 * 1000);
>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>
>>>>      return EFI_SUCCESS;
>>>>    }
>>>>
>>>> Janusz, could you please check this instead? You can switch to your
>>>> previous kernel to do this test.
>>>>
>>>>
>>> Ok, now first time when I started VM I was able to start system
>>> successfully. When I turned it off and started it again, it restarted my
>>> vm at system boot couple of times. Sometimes I also get very high cpu
>>> usage for no reason. Also, I get less fps in GTA 5 than in kernel 4.1, I
>>> get something like 30-55, but on 4.1 I get all the time 60 fps. This is
>>> my new log: https://bpaste.net/show/61a122ad7fe5
>>>
>>
>> Just confirm: the Qemu internal error did not appear any more, right?
> Yes, when I reverted your first patch, switched to -vga std from -vga
> none and didn't passthrough my GPU (case when I got this internal
> error), vm started without problem. I even didn't get any VM restarts
> like with passthrough
>

Wow, it seems we have fixed the QEMU internal error now. :)

Recently, Paolo reverted some MTRR patches; was your test run on a
kernel that includes those reverts?

The GPU passthrough issue may be related to vfio (not sure). Alex, do
you have any idea?

Laszlo, could you please check whether the root cause is reasonable, and
fix it in OVMF if it is right?

BTW, OVMF handles #UD with no trace - nothing is killed, and there is no
call trace in the debug output...

Janusz Oct. 15, 2015, 7:21 a.m. UTC | #5
On 15.10.2015 at 09:10, Xiao Guangrong wrote:
>
>
> On 10/15/2015 02:58 PM, Janusz wrote:
>> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/15/2015 02:19 PM, Janusz wrote:
>>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>>
>>>>>
>>>>>
>>>>> Well, the bug may be not in KVM. When this bug happened, i saw OVMF
>>>>> only checked 1 CPU out, there is the log from OVMF's debug input:
>>>>>
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCDs
>>>>> Detect CPU count: 1
>>>>>
>>>>> So that the startup code has been freed however the APs are still
>>>>> running,
>>>>> i think that why we saw the vCPUs executed on unexpected address.
>>>>>
>>>>> After digging into OVMF's code, i noticed that BSP CPU waits for APs
>>>>> for a fixed timer period, however, KVM recent changes require zap all
>>>>> mappings if CR0.CD is changed, that means the APs need more time to
>>>>> startup.
>>>>>
>>>>> After following changes to OVMF, the bug is completely gone on my
>>>>> side:
>>>>>
>>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>>      //
>>>>>      // Wait 100 milliseconds for APs to arrive at the ApEntryPoint
>>>>> routine
>>>>>      //
>>>>> -  MicroSecondDelay (100 * 1000);
>>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>>
>>>>>      return EFI_SUCCESS;
>>>>>    }
>>>>>
>>>>> Janusz, could you please check this instead? You can switch to your
>>>>> previous kernel to do this test.
>>>>>
>>>>>
>>>> Ok, now first time when I started VM I was able to start system
>>>> successfully. When I turned it off and started it again, it
>>>> restarted my
>>>> vm at system boot couple of times. Sometimes I also get very high cpu
>>>> usage for no reason. Also, I get less fps in GTA 5 than in kernel
>>>> 4.1, I
>>>> get something like 30-55, but on 4.1 I get all the time 60 fps.
>>>> This is
>>>> my new log: https://bpaste.net/show/61a122ad7fe5
>>>>
>>>
>>> Just confirm: the Qemu internal error did not appear any more, right?
>> Yes, when I reverted your first patch, switched to -vga std from -vga
>> none and didn't passthrough my GPU (case when I got this internal
>> error), vm started without problem. I even didn't get any VM restarts
>> like with passthrough
>>
>
> Wow, it seems we have fixed the QEMU internal error now. :)
>
> Recurrently, Paolo has reverted some MTRR patches, was your test
> based on these reverted patches?
>
> The GPU passthrough issue may be related to vfio (not sure), Alex, do
> you have any idea?
>
> Laszlo, could you please check the root case is reasonable and fix it in
> OVMF if it's right?
>
> BTW, OVMF handles #UD with no trace - nothing is killed, and no call
> trace
> in the debug input...
>
Yes, the reverted MTRR code is already in the kernel I use - 4.3-r5+
Laszlo Ersek Oct. 15, 2015, 4:18 p.m. UTC | #6
CC'ing Jordan and Chen Fan.

On 10/15/15 09:10, Xiao Guangrong wrote:
> 
> 
> On 10/15/2015 02:58 PM, Janusz wrote:
>> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/15/2015 02:19 PM, Janusz wrote:
>>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>>
>>>>>
>>>>>
>>>>> Well, the bug may be not in KVM. When this bug happened, i saw OVMF
>>>>> only checked 1 CPU out, there is the log from OVMF's debug input:
>>>>>
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCD
>>>>>     Flushing GCDs
>>>>> Detect CPU count: 1
>>>>>
>>>>> So that the startup code has been freed however the APs are still
>>>>> running,
>>>>> i think that why we saw the vCPUs executed on unexpected address.
>>>>>
>>>>> After digging into OVMF's code, i noticed that BSP CPU waits for APs
>>>>> for a fixed timer period, however, KVM recent changes require zap all
>>>>> mappings if CR0.CD is changed, that means the APs need more time to
>>>>> startup.
>>>>>
>>>>> After following changes to OVMF, the bug is completely gone on my
>>>>> side:
>>>>>
>>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>>      //
>>>>>      // Wait 100 milliseconds for APs to arrive at the ApEntryPoint
>>>>> routine
>>>>>      //
>>>>> -  MicroSecondDelay (100 * 1000);
>>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>>
>>>>>      return EFI_SUCCESS;
>>>>>    }
>>>>>
>>>>> Janusz, could you please check this instead? You can switch to your
>>>>> previous kernel to do this test.
>>>>>
>>>>>
>>>> Ok, now first time when I started VM I was able to start system
>>>> successfully. When I turned it off and started it again, it
>>>> restarted my
>>>> vm at system boot couple of times. Sometimes I also get very high cpu
>>>> usage for no reason. Also, I get less fps in GTA 5 than in kernel
>>>> 4.1, I
>>>> get something like 30-55, but on 4.1 I get all the time 60 fps. This is
>>>> my new log: https://bpaste.net/show/61a122ad7fe5
>>>>
>>>
>>> Just confirm: the Qemu internal error did not appear any more, right?
>> Yes, when I reverted your first patch, switched to -vga std from -vga
>> none and didn't passthrough my GPU (case when I got this internal
>> error), vm started without problem. I even didn't get any VM restarts
>> like with passthrough
>>
> 
> Wow, it seems we have fixed the QEMU internal error now. :)
> 
> Recurrently, Paolo has reverted some MTRR patches, was your test
> based on these reverted patches?
> 
> The GPU passthrough issue may be related to vfio (not sure), Alex, do
> you have any idea?
> 
> Laszlo, could you please check the root case is reasonable and fix it in
> OVMF if it's right?

The code that you have found is in edk2's EFI_MP_SERVICES_PROTOCOL
implementation -- more closely, its initial CPU counter code --, from
edk2 git commit 533263ee5a7f. It is not specific to OVMF -- it is
generic edk2 code for Intel processors. (I'm CC'ing Jordan and Chen Fan
because they authored the patch in question.)

If VCPUs need more time to rendezvous than written in the code, on
recent KVM, then I think we should introduce a new FixedPCD in
UefiCpuPkg (practically: a compile time constant) for the timeout. Which
is not hard to do.

However, we'll need two things:
- an idea about the concrete rendezvous timeout to set, from OvmfPkg

- a *detailed* explanation / elaboration on your words:

  "KVM recent changes require zap all mappings if CR0.CD is changed,
  that means the APs need more time to startup"

  Preferably with references to Linux kernel commits and the Intel SDM,
  so that n00bs like me can get a fleeting idea. Do you mean that with
  caching disabled, the APs execute their rendezvous code (from memory)
  more slowly?

> BTW, OVMF handles #UD with no trace - nothing is killed, and no call trace
> in the debug input...

There *is* a trace (of any unexpected exception -- at least for the
BSP), but unfortunately its location is not intuitive.

The exception handler that is built into OVMF
("UefiCpuPkg/Library/CpuExceptionHandlerLib") is again generic edk2
code, and it prints the trace directly to the serial port, regardless of
the fact that OVMF's DebugLib instance logs explicit DEBUGs to the QEMU
debug port. (The latter can be directed to the serial port as well, if
you build OVMF with -D DEBUG_ON_SERIAL_PORT, but this is not relevant here.)

If you reproduce the issue while looking at the (virtual) serial port of
the guest, I trust you will get a register dump.

Thanks!
Laszlo
Kinney, Michael D Oct. 15, 2015, 4:53 p.m. UTC | #7
Laszlo,

There is already a PCD for this timeout that is used by CpuMpPei.

	gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds

I noticed that CpuDxe is using a hard coded AP timeout.  I think we should just use this same PCD for both the PEI and DXE CPU module and then set it for OVMF to the compatible value.

Mike

>-----Original Message-----
>From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
>Laszlo Ersek
>Sent: Thursday, October 15, 2015 9:19 AM
>To: Xiao Guangrong
>Cc: kvm@vger.kernel.org; Justen, Jordan L; edk2-devel@ml01.01.org; Alex
>Williamson; Chen Fan; Paolo Bonzini; Wanpeng Li
>Subject: Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is
>completely disabled
>
>CC'ing Jordan and Chen Fan.
>
>On 10/15/15 09:10, Xiao Guangrong wrote:
>>
>>
>> On 10/15/2015 02:58 PM, Janusz wrote:
>>> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 10/15/2015 02:19 PM, Janusz wrote:
>>>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Well, the bug may be not in KVM. When this bug happened, i saw
>OVMF
>>>>>> only checked 1 CPU out, there is the log from OVMF's debug input:
>>>>>>
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCD
>>>>>>     Flushing GCDs
>>>>>> Detect CPU count: 1
>>>>>>
>>>>>> So that the startup code has been freed however the APs are still
>>>>>> running,
>>>>>> i think that why we saw the vCPUs executed on unexpected address.
>>>>>>
>>>>>> After digging into OVMF's code, i noticed that BSP CPU waits for APs
>>>>>> for a fixed timer period, however, KVM recent changes require zap all
>>>>>> mappings if CR0.CD is changed, that means the APs need more time to
>>>>>> startup.
>>>>>>
>>>>>> After following changes to OVMF, the bug is completely gone on my
>>>>>> side:
>>>>>>
>>>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>>>      //
>>>>>>      // Wait 100 milliseconds for APs to arrive at the ApEntryPoint
>>>>>> routine
>>>>>>      //
>>>>>> -  MicroSecondDelay (100 * 1000);
>>>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>>>
>>>>>>      return EFI_SUCCESS;
>>>>>>    }
>>>>>>
>>>>>> Janusz, could you please check this instead? You can switch to your
>>>>>> previous kernel to do this test.
>>>>>>
>>>>>>
>>>>> Ok, now first time when I started VM I was able to start system
>>>>> successfully. When I turned it off and started it again, it
>>>>> restarted my
>>>>> vm at system boot couple of times. Sometimes I also get very high cpu
>>>>> usage for no reason. Also, I get less fps in GTA 5 than in kernel
>>>>> 4.1, I
>>>>> get something like 30-55, but on 4.1 I get all the time 60 fps. This is
>>>>> my new log: https://bpaste.net/show/61a122ad7fe5
>>>>>
>>>>
>>>> Just confirm: the Qemu internal error did not appear any more, right?
>>> Yes, when I reverted your first patch, switched to -vga std from -vga
>>> none and didn't passthrough my GPU (case when I got this internal
>>> error), vm started without problem. I even didn't get any VM restarts
>>> like with passthrough
>>>
>>
>> Wow, it seems we have fixed the QEMU internal error now. :)
>>
>> Recurrently, Paolo has reverted some MTRR patches, was your test
>> based on these reverted patches?
>>
>> The GPU passthrough issue may be related to vfio (not sure), Alex, do
>> you have any idea?
>>
>> Laszlo, could you please check the root case is reasonable and fix it in
>> OVMF if it's right?
>
>The code that you have found is in edk2's EFI_MP_SERVICES_PROTOCOL
>implementation -- more closely, its initial CPU counter code --, from
>edk2 git commit 533263ee5a7f. It is not specific to OVMF -- it is
>generic edk2 code for Intel processors. (I'm CC'ing Jordan and Chen Fan
>because they authored the patch in question.)
>
>If VCPUs need more time to rendezvous than written in the code, on
>recent KVM, then I think we should introduce a new FixedPCD in
>UefiCpuPkg (practically: a compile time constant) for the timeout. Which
>is not hard to do.
>
>However, we'll need two things:
>- an idea about the concrete rendezvous timeout to set, from OvmfPkg
>
>- a *detailed* explanation / elaboration on your words:
>
>  "KVM recent changes require zap all mappings if CR0.CD is changed,
>  that means the APs need more time to startup"
>
>  Preferably with references to Linux kernel commits and the Intel SDM,
>  so that n00bs like me can get a fleeting idea. Do you mean that with
>  caching disabled, the APs execute their rendezvous code (from memory)
>  more slowly?
>
>> BTW, OVMF handles #UD with no trace - nothing is killed, and no call trace
>> in the debug input...
>
>There *is* a trace (of any unexpected exception -- at least for the
>BSP), but unfortunately its location is not intuitive.
>
>The exception handler that is built into OVMF
>("UefiCpuPkg/Library/CpuExceptionHandlerLib") is again generic edk2
>code, and it prints the trace directly to the serial port, regardless of
>the fact that OVMF's DebugLib instance logs explicit DEBUGs to the QEMU
>debug port. (The latter can be directed to the serial port as well, if
>you build OVMF with -D DEBUG_ON_SERIAL_PORT, but this is not relevant
>here.)
>
>If you reproduce the issue while looking at the (virtual) serial port of
>the guest, I trust you will get a register dump.
>
>Thanks!
>Laszlo
>_______________________________________________
>edk2-devel mailing list
>edk2-devel@lists.01.org
>https://lists.01.org/mailman/listinfo/edk2-devel
Laszlo Ersek Oct. 15, 2015, 6:46 p.m. UTC | #8
On 10/15/15 18:53, Kinney, Michael D wrote:
> Laszlo,
> 
> There is already a PCD for this timeout that is used by CpuMpPei.
> 
> 	gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds
> 
> I noticed that CpuDxe is using a hard coded AP timeout.  I think we should just use this same PCD for both the PEI and DXE CPU module and then set it for OVMF to the compatible value.

Perfect, thank you!

(I notice the default in the DEC file is 50000, which is half of what
the DXE driver hardcodes.)

Now we only need a recommended (or experimental) value for it, and an
explanation why 100*1000 is no longer sufficient on KVM :)
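
As a rough illustration of the compile-time-constant idea (a FixedPCD is,
practically, a build-time constant), here is a stand-alone C sketch. It is not
edk2 code: the macro AP_INIT_TIMEOUT_US and both helper functions are
hypothetical names, and real edk2 code would read
PcdCpuApInitTimeOutInMicroSeconds through PcdLib rather than use a macro. The
50000 default is the DEC value mentioned above; the other two values are the
old and the experimentally patched CpuDxe delays from this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a fixed-at-build PCD. A platform build could
 * override it, e.g. with -DAP_INIT_TIMEOUT_US=1000000u on the compiler
 * command line.
 */
#ifndef AP_INIT_TIMEOUT_US
#define AP_INIT_TIMEOUT_US 50000u   /* DEC default quoted in this thread */
#endif

static_assert(AP_INIT_TIMEOUT_US > 0, "AP init timeout must be non-zero");

/* Single place the rest of the code asks for the AP wait budget. */
static uint32_t ap_init_timeout_us(void)
{
    return AP_INIT_TIMEOUT_US;
}

/* Convert microseconds to whole milliseconds, for comparing the values. */
static uint32_t us_to_ms(uint32_t microseconds)
{
    return microseconds / 1000u;
}
```

In milliseconds, the default is 50, the hard-coded CpuDxe delay (100 * 1000)
is 100, and the experimental patch (10 * 100 * 1000) is 1000.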

Thanks!
Laszlo


> 
> Mike
> 
>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
>> Laszlo Ersek
>> Sent: Thursday, October 15, 2015 9:19 AM
>> To: Xiao Guangrong
>> Cc: kvm@vger.kernel.org; Justen, Jordan L; edk2-devel@ml01.01.org; Alex
>> Williamson; Chen Fan; Paolo Bonzini; Wanpeng Li
>> Subject: Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is
>> completely disabled
>>
>> CC'ing Jordan and Chen Fan.
>>
>> On 10/15/15 09:10, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/15/2015 02:58 PM, Janusz wrote:
>>>> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>>>>
>>>>>
>>>>> On 10/15/2015 02:19 PM, Janusz wrote:
>>>>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Well, the bug may be not in KVM. When this bug happened, i saw
>> OVMF
>>>>>>> only checked 1 CPU out, there is the log from OVMF's debug input:
>>>>>>>
>>>>>>>     Flushing GCD
>>>>>>>     Flushing GCD
>>>>>>>     Flushing GCD
>>>>>>>     Flushing GCD
>>>>>>>     Flushing GCD
>>>>>>>     Flushing GCD
>>>>>>>     Flushing GCD
>>>>>>>     Flushing GCD
>>>>>>>     Flushing GCD
>>>>>>>     Flushing GCDs
>>>>>>> Detect CPU count: 1
>>>>>>>
>>>>>>> So that the startup code has been freed however the APs are still
>>>>>>> running,
>>>>>>> i think that why we saw the vCPUs executed on unexpected address.
>>>>>>>
>>>>>>> After digging into OVMF's code, i noticed that BSP CPU waits for APs
>>>>>>> for a fixed timer period, however, KVM recent changes require zap all
>>>>>>> mappings if CR0.CD is changed, that means the APs need more time to
>>>>>>> startup.
>>>>>>>
>>>>>>> After following changes to OVMF, the bug is completely gone on my
>>>>>>> side:
>>>>>>>
>>>>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>>>>      //
>>>>>>>      // Wait 100 milliseconds for APs to arrive at the ApEntryPoint
>>>>>>> routine
>>>>>>>      //
>>>>>>> -  MicroSecondDelay (100 * 1000);
>>>>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>>>>
>>>>>>>      return EFI_SUCCESS;
>>>>>>>    }
>>>>>>>
>>>>>>> Janusz, could you please check this instead? You can switch to your
>>>>>>> previous kernel to do this test.
>>>>>>>
>>>>>>>
>>>>>> Ok, now first time when I started VM I was able to start system
>>>>>> successfully. When I turned it off and started it again, it
>>>>>> restarted my
>>>>>> vm at system boot couple of times. Sometimes I also get very high cpu
>>>>>> usage for no reason. Also, I get less fps in GTA 5 than in kernel
>>>>>> 4.1, I
>>>>>> get something like 30-55, but on 4.1 I get all the time 60 fps. This is
>>>>>> my new log: https://bpaste.net/show/61a122ad7fe5
>>>>>>
>>>>>
>>>>> Just confirm: the Qemu internal error did not appear any more, right?
>>>> Yes, when I reverted your first patch, switched to -vga std from -vga
>>>> none and didn't passthrough my GPU (case when I got this internal
>>>> error), vm started without problem. I even didn't get any VM restarts
>>>> like with passthrough
>>>>
>>>
>>> Wow, it seems we have fixed the QEMU internal error now. :)
>>>
>>> Recurrently, Paolo has reverted some MTRR patches, was your test
>>> based on these reverted patches?
>>>
>>> The GPU passthrough issue may be related to vfio (not sure), Alex, do
>>> you have any idea?
>>>
>>> Laszlo, could you please check the root case is reasonable and fix it in
>>> OVMF if it's right?
>>
>> The code that you have found is in edk2's EFI_MP_SERVICES_PROTOCOL
>> implementation -- more closely, its initial CPU counter code --, from
>> edk2 git commit 533263ee5a7f. It is not specific to OVMF -- it is
>> generic edk2 code for Intel processors. (I'm CC'ing Jordan and Chen Fan
>> because they authored the patch in question.)
>>
>> If VCPUs need more time to rendezvous than written in the code, on
>> recent KVM, then I think we should introduce a new FixedPCD in
>> UefiCpuPkg (practically: a compile time constant) for the timeout. Which
>> is not hard to do.
>>
>> However, we'll need two things:
>> - an idea about the concrete rendezvous timeout to set, from OvmfPkg
>>
>> - a *detailed* explanation / elaboration on your words:
>>
>>  "KVM recent changes require zap all mappings if CR0.CD is changed,
>>  that means the APs need more time to startup"
>>
>>  Preferably with references to Linux kernel commits and the Intel SDM,
>>  so that n00bs like me can get a fleeting idea. Do you mean that with
>>  caching disabled, the APs execute their rendezvous code (from memory)
>>  more slowly?
>>
>>> BTW, OVMF handles #UD with no trace - nothing is killed, and no call trace
>>> in the debug input...
>>
>> There *is* a trace (of any unexpected exception -- at least for the
>> BSP), but unfortunately its location is not intuitive.
>>
>> The exception handler that is built into OVMF
>> ("UefiCpuPkg/Library/CpuExceptionHandlerLib") is again generic edk2
>> code, and it prints the trace directly to the serial port, regardless of
>> the fact that OVMF's DebugLib instance logs explicit DEBUGs to the QEMU
>> debug port. (The latter can be directed to the serial port as well, if
>> you build OVMF with -D DEBUG_ON_SERIAL_PORT, but this is not relevant
>> here.)
>>
>> If you reproduce the issue while looking at the (virtual) serial port of
>> the guest, I trust you will get a register dump.
>>
>> Thanks!
>> Laszlo
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laszlo Ersek Oct. 16, 2015, 6:22 p.m. UTC | #9
On 10/16/15 05:05, Xiao Guangrong wrote:
> 
> 
> On 10/16/2015 12:18 AM, Laszlo Ersek wrote:
>> CC'ing Jordan and Chen Fan.
>>
>> On 10/15/15 09:10, Xiao Guangrong wrote:
>>>
>>>
>>> On 10/15/2015 02:58 PM, Janusz wrote:
>>>> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>>>>
>>>>>
>>>>> On 10/15/2015 02:19 PM, Janusz wrote:
>>>>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Well, the bug may not be in KVM. When this bug happened, I saw that
>>>>>>> OVMF detected only 1 CPU; here is the log from OVMF's debug output:
>>>>>>>
>>>>>>>      Flushing GCD
>>>>>>>      Flushing GCD
>>>>>>>      Flushing GCD
>>>>>>>      Flushing GCD
>>>>>>>      Flushing GCD
>>>>>>>      Flushing GCD
>>>>>>>      Flushing GCD
>>>>>>>      Flushing GCD
>>>>>>>      Flushing GCD
>>>>>>>      Flushing GCDs
>>>>>>> Detect CPU count: 1
>>>>>>>
>>>>>>> So the startup code has been freed while the APs are still running;
>>>>>>> I think that is why we saw the vCPUs executing at unexpected
>>>>>>> addresses.
>>>>>>>
>>>>>>> After digging into OVMF's code, i noticed that BSP CPU waits for APs
>>>>>>> for a fixed timer period, however, KVM recent changes require zap all
>>>>>>> mappings if CR0.CD is changed, that means the APs need more time to
>>>>>>> startup.
>>>>>>>
>>>>>>> After following changes to OVMF, the bug is completely gone on my
>>>>>>> side:
>>>>>>>
>>>>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>>>>       //
>>>>>>>       // Wait 100 milliseconds for APs to arrive at the ApEntryPoint routine
>>>>>>>       //
>>>>>>> -  MicroSecondDelay (100 * 1000);
>>>>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>>>>
>>>>>>>       return EFI_SUCCESS;
>>>>>>>     }
>>>>>>>
>>>>>>> Janusz, could you please check this instead? You can switch to your
>>>>>>> previous kernel to do this test.
>>>>>>>
>>>>>>>
>>>>>> OK, now the first time I started the VM, I was able to boot the
>>>>>> system successfully. When I turned it off and started it again, it
>>>>>> restarted my VM a couple of times at system boot. Sometimes I also
>>>>>> get very high CPU usage for no reason. Also, I get fewer fps in GTA 5
>>>>>> than on kernel 4.1: something like 30-55, whereas on 4.1 I get 60 fps
>>>>>> all the time. This is my new log: https://bpaste.net/show/61a122ad7fe5
>>>>>>
>>>>>
>>>>> Just to confirm: the QEMU internal error no longer appears, right?
>>>> Yes. When I reverted your first patch, switched from -vga none to
>>>> -vga std, and didn't pass through my GPU (the case in which I got this
>>>> internal error), the VM started without problems. I didn't even get
>>>> any VM restarts like I do with passthrough.
>>>>
>>>
>>> Wow, it seems we have fixed the QEMU internal error now. :)
>>>
>>> Recently, Paolo has reverted some MTRR patches; was your test
>>> based on these reverted patches?
>>>
>>> The GPU passthrough issue may be related to vfio (not sure), Alex, do
>>> you have any idea?
>>>
>>> Laszlo, could you please check whether the root cause is reasonable and
>>> fix it in OVMF if it's right?
>>
>> The code that you have found is in edk2's EFI_MP_SERVICES_PROTOCOL
>> implementation -- more closely, its initial CPU counter code --, from
>> edk2 git commit 533263ee5a7f. It is not specific to OVMF -- it is
>> generic edk2 code for Intel processors. (I'm CC'ing Jordan and Chen Fan
>> because they authored the patch in question.)
> 
> Okay, good to know. I do not have much knowledge of edk2 and OVMF... :(
> 
>>
>> If VCPUs need more time to rendezvous than written in the code, on
>> recent KVM, then I think we should introduce a new FixedPCD in
>> UefiCpuPkg (practically: a compile time constant) for the timeout. Which
>> is not hard to do.
>>
>> However, we'll need two things:
>> - an idea about the concrete rendezvous timeout to set, from OvmfPkg
>>
>> - a *detailed* explanation / elaboration on your words:
>>
>>    "KVM recent changes require zap all mappings if CR0.CD is changed,
>>    that means the APs need more time to startup"
>>
>>    Preferably with references to Linux kernel commits and the Intel SDM,
>>    so that n00bs like me can get a fleeting idea. Do you mean that with
>>    caching disabled, the APs execute their rendezvous code (from memory)
>>    more slowly?
> 
> Kernel commit b18d5431acc causes the vCPUs to need more time to start
> up, because:
> - it zaps all the mappings for the guest memory in the EPT or shadow
>   page table, so VM-exits are required to rebuild the mappings for all
>   memory accesses.
> 
> - if a device is passed through to the guest and the IOMMU lacks the
>   snooping control feature, the memory becomes UC after CR0.CD is set
>   to 1.
> 
> And a generic factor is that the more vCPUs the guest has, the more
> time is needed. That is why the bug is rarely triggered on guests with
> few vCPUs. I guess we need a self-adapting way to handle the case...

Thanks, this should be enough for composing a commit message.
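
As a rough illustration of the "self-adapting" idea quoted above (not what
edk2 does, and not a proposed patch): instead of sleeping for a fixed
period, the BSP could keep polling an AP check-in counter until no new AP
has arrived for some quiet window, bounded by an overall maximum. All names
below (wait_for_aps, the callback types, struct sim) are hypothetical, and
the simulated platform exists only so the logic can be exercised outside
firmware.

```c
#include <stdint.h>

/* Hypothetical hooks, not edk2 APIs: read the AP check-in counter, delay. */
typedef uint32_t (*read_count_fn)(void *ctx);
typedef void (*delay_us_fn)(void *ctx, uint32_t us);

/*
 * Poll until no new AP has checked in for quiet_us, or until max_us has
 * elapsed in total. Returns the number of APs seen so far.
 */
uint32_t wait_for_aps(read_count_fn read_count, delay_us_fn delay, void *ctx,
                      uint32_t step_us, uint32_t quiet_us, uint32_t max_us)
{
    uint32_t elapsed = 0;
    uint32_t quiet = 0;
    uint32_t seen = read_count(ctx);

    while (elapsed < max_us && quiet < quiet_us) {
        delay(ctx, step_us);
        elapsed += step_us;

        uint32_t now = read_count(ctx);
        if (now != seen) {
            seen = now;   /* progress: restart the quiet window */
            quiet = 0;
        } else {
            quiet += step_us;
        }
    }
    return seen;
}

/* Simulated platform for experimentation: one AP arrives every gap_us. */
struct sim {
    uint32_t now_us;
    uint32_t total_aps;
    uint32_t gap_us;
};

uint32_t sim_read_count(void *ctx)
{
    const struct sim *s = ctx;
    uint32_t arrived = s->now_us / s->gap_us;
    return arrived < s->total_aps ? arrived : s->total_aps;
}

void sim_delay(void *ctx, uint32_t us)
{
    struct sim *s = ctx;
    s->now_us += us;   /* advance simulated time instead of sleeping */
}
```

The weak point is visible in the parameters: the quiet window has to be
longer than the longest gap between two AP arrivals, which is exactly the
quantity that is unknown on a slow host. So this only moves the tuning
problem; it does not remove it.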

> 
>>
>>> BTW, OVMF handles #UD with no trace - nothing is killed, and no call
>>> trace in the debug output...
>>
>> There *is* a trace (of any unexpected exception -- at least for the
>> BSP), but unfortunately its location is not intuitive.
>>
>> The exception handler that is built into OVMF
>> ("UefiCpuPkg/Library/CpuExceptionHandlerLib") is again generic edk2
>> code, and it prints the trace directly to the serial port, regardless of
>> the fact that OVMF's DebugLib instance logs explicit DEBUGs to the QEMU
>> debug port. (The latter can be directed to the serial port as well, if
>> you build OVMF with -D DEBUG_ON_SERIAL_PORT, but this is not relevant
>> here.)
>>
>> If you reproduce the issue while looking at the (virtual) serial port of
>> the guest, I trust you will get a register dump.
> 
> Er... there seems to be no dump in the serial output; I attached it to
> this mail. The system continues to run with 1 CPU enabled...

Actually, the guest is in a reboot loop, it just may not be obvious from
the log. Whenever you see

SecCoreStartupWithStack(0xFFFCC000, 0x818000)

that means the guest has rebooted.

The fault handler that I described becomes active when a fault gets
injected visibly to the guest -- or happens within the guest entirely
-- for example, a null pointer dereference, and the fault handler can
actually handle it.

I guess a triple fault occurs or some such.
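
To spot such a reboot loop in a captured log without reading it line by
line, one can simply count how often the SEC entry message shows up. A
minimal sketch, assuming the whole log is available as one NUL-terminated
string; count_sec_entries is a made-up helper name, not part of edk2 or
QEMU:

```c
#include <stddef.h>
#include <string.h>

/* Message that OVMF's SEC phase logs on every (re)boot, as described above. */
#define SEC_ENTRY_MARKER "SecCoreStartupWithStack("

/* Count non-overlapping occurrences of the marker in a NUL-terminated log. */
size_t count_sec_entries(const char *log)
{
    size_t count = 0;
    const size_t marker_len = strlen(SEC_ENTRY_MARKER);
    const char *p = log;

    while ((p = strstr(p, SEC_ENTRY_MARKER)) != NULL) {
        count++;
        p += marker_len;   /* continue searching after this match */
    }
    return count;
}
```

A result greater than 1 for the log of a single QEMU run means the guest
went through SEC more than once, i.e. it rebooted at least once.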

Thanks
Laszlo
Janusz Oct. 20, 2015, 5:27 p.m. UTC | #10
On 15.10.2015 at 20:46, Laszlo Ersek wrote:
> On 10/15/15 18:53, Kinney, Michael D wrote:
>> Laszlo,
>>
>> There is already a PCD for this timeout that is used by CpuMpPei.
>>
>> 	gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds
>>
>> I noticed that CpuDxe is using a hard coded AP timeout.  I think we should just use this same PCD for both the PEI and DXE CPU module and then set it for OVMF to the compatible value.
> Perfect, thank you!
>
> (I notice the default in the DEC file is 50000, which is half of what
> the DXE driver hardcodes.)
>
> Now we only need a recommended (or experimental) value for it, and an
> explanation why 100*1000 is no longer sufficient on KVM :)
>
> Thanks!
> Laszlo
>
>
>
Laszlo,

I saw that there is already some change in ovmf for MicroSecondDelay
https://github.com/tianocore/edk2/commit/1e410eadd80c328e66868263b3006a274ce81ae0
Is that a fix for it? Because I tried it and it still doesn't work for
me: https://bpaste.net/show/2514b51bf41f
I still get internal error

Laszlo Ersek Oct. 20, 2015, 5:44 p.m. UTC | #11
Hi,

On 10/20/15 19:27, Janusz wrote:
> On 15.10.2015 at 20:46, Laszlo Ersek wrote:
>> On 10/15/15 18:53, Kinney, Michael D wrote:
>>> Laszlo,
>>>
>>> There is already a PCD for this timeout that is used by CpuMpPei.
>>>
>>> 	gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds
>>>
>>> I noticed that CpuDxe is using a hard coded AP timeout.  I think we should just use this same PCD for both the PEI and DXE CPU module and then set it for OVMF to the compatible value.
>> Perfect, thank you!
>>
>> (I notice the default in the DEC file is 50000, which is half of what
>> the DXE driver hardcodes.)
>>
>> Now we only need a recommended (or experimental) value for it, and an
>> explanation why 100*1000 is no longer sufficient on KVM :)
>>
>> Thanks!
>> Laszlo
>>
>>
>>
> Laszlo,
> 
> I saw that there is already some change in ovmf for MicroSecondDelay
> https://github.com/tianocore/edk2/commit/1e410eadd80c328e66868263b3006a274ce81ae0
> Is that a fix for it? Because I tried it and it still doesn't work for
> me: https://bpaste.net/show/2514b51bf41f
> I still get internal error

I think you guys are now "mature enough OVMF users" to start employing
the correct terminology.

"edk2" (also spelled as "EDK II") is: "a modern, feature-rich,
cross-platform firmware development environment for the UEFI and PI
specifications".

The source tree contains a whole bunch of modules (drivers,
applications, libraries), organized into packages.

"OVMF" usually denotes a firmware binary built from one of the
OvmfPkg/OvmfPkg*.dsc "platform description files". Think of them as "top
level makefiles". The difference between them is the target architecture
(there's Ia32, X64, and Ia32X64 -- the last one means that the SEC and
PEI phases are 32-bit, whereas the DXE and later phases are 64-bit.) In
practice you'll only care about full X64.

Now, each of OvmfPkg/OvmfPkg*.dsc builds the following three kinds of
modules into the final binary:
- platform-independent modules from various top-level packages
- platform- (i.e. Ia32/X64-) dependent modules from various top-level
  packages
- modules from under OvmfPkg that are specific to QEMU/KVM (and Xen, if
  you happen to use OVMF with Xen)

Now, when you reference a commit like 1e410ead above, you can look at
the diffstat, and decide if it is OvmfPkg-specific (third category
above) or not. Here you see UefiCpuPkg, which happens to be the second
category.

The important point is: please do *not* call any and all edk2 patches
"OVMF changes", indiscriminately. That's super confusing for people who
understand the above distinctions. Which now you do too. :)

Let me add that in edk2, patches that straddle top level packages are
generally forbidden -- you can't have a patch that modifies OvmfPkg and
UefiCpuPkg at the same time, modulo *very* rare exceptions. If a feature
or bugfix needs to touch several top-level packages, the series must be
built up carefully in stages.

Knowing all of the above, you can tell that the patch you referenced had
only *enabled* OvmfPkg to customize UefiCpuPkg, via
"PcdCpuApInitTimeOutInMicroSeconds". But for that customization to
actually occur, a small patch for OvmfPkg will be necessary too, in order to
set "PcdCpuApInitTimeOutInMicroSeconds" differently from the default.
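
The difference between "enabled to customize" and "actually customized" can
be modelled in a few lines of plain C. This is only a toy stand-in for the
PCD mechanism: ap_init_timeout_us and the macro names are invented for the
illustration, and the three numbers are the ones quoted in this thread (the
DEC default, the old hard-coded CpuDxe delay, and the experimental value
from the test diff), not a recommendation.

```c
#include <stddef.h>
#include <stdint.h>

/* Values quoted in the thread, all in microseconds. */
#define DEC_DEFAULT_TIMEOUT_US   50000u                /* UefiCpuPkg DEC default  */
#define DXE_HARDCODED_TIMEOUT_US (100u * 1000u)        /* old CpuDxe constant     */
#define EXPERIMENTAL_TIMEOUT_US  (10u * 100u * 1000u)  /* value from the test diff */

/*
 * Hypothetical stand-in for a FixedPCD lookup: a platform either supplies
 * its own value or silently inherits the package default.
 */
uint32_t ap_init_timeout_us(const uint32_t *platform_override)
{
    return platform_override != NULL ? *platform_override
                                     : DEC_DEFAULT_TIMEOUT_US;
}
```

With no override the result is 50000 (50 ms), which is half of the 100 ms
that CpuDxe used to hard-code; only once the platform passes its own value,
for instance the experimental 1000000 (1 s), does the wait get longer.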

I plan to send that patch soon. If you'd like to be CC'd, that's great
(reporting back with a Tested-by is even better!), but I'll need your
real name for that. (Or any name that looks like a real name.)

Thanks!
Laszlo
Janusz Oct. 20, 2015, 6:52 p.m. UTC | #12
On 20.10.2015 at 19:44, Laszlo Ersek wrote:
> Hi,
>
> On 10/20/15 19:27, Janusz wrote:
>> On 15.10.2015 at 20:46, Laszlo Ersek wrote:
>>> On 10/15/15 18:53, Kinney, Michael D wrote:
>>>> Laszlo,
>>>>
>>>> There is already a PCD for this timeout that is used by CpuMpPei.
>>>>
>>>> 	gUefiCpuPkgTokenSpaceGuid.PcdCpuApInitTimeOutInMicroSeconds
>>>>
>>>> I noticed that CpuDxe is using a hard coded AP timeout.  I think we should just use this same PCD for both the PEI and DXE CPU module and then set it for OVMF to the compatible value.
>>> Perfect, thank you!
>>>
>>> (I notice the default in the DEC file is 50000, which is half of what
>>> the DXE driver hardcodes.)
>>>
>>> Now we only need a recommended (or experimental) value for it, and an
>>> explanation why 100*1000 is no longer sufficient on KVM :)
>>>
>>> Thanks!
>>> Laszlo
>>>
>>>
>>>
>> Laszlo,
>>
>> I saw that there is already some change in ovmf for MicroSecondDelay
>> https://github.com/tianocore/edk2/commit/1e410eadd80c328e66868263b3006a274ce81ae0
>> Is that a fix for it? Because I tried it and it still doesn't work for
>> me: https://bpaste.net/show/2514b51bf41f
>> I still get internal error
> I think you guys are now "mature enough OVMF users" to start employing
> the correct terminology.

Sorry for that :)

> "edk2" (also spelled as "EDK II") is: "a modern, feature-rich,
> cross-platform firmware development environment for the UEFI and PI
> specifications".
>
> The source tree contains a whole bunch of modules (drivers,
> applications, libraries), organized into packages.
>
> "OVMF" usually denotes a firmware binary built from one of the
> OvmfPkg/OvmfPkg*.dsc "platform description files". Think of them as "top
> level makefiles". The difference between them is the target architecture
> (there's Ia32, X64, and Ia32X64 -- the last one means that the SEC and
> PEI phases are 32-bit, whereas the DXE and later phases are 64-bit.) In
> practice you'll only care about full X64.
>
> Now, each of OvmfPkg/OvmfPkg*.dsc builds the following three kinds of
> modules into the final binary:
> - platform-independent modules from various top-level packages
> - platform- (i.e. Ia32/X64-) dependent modules from various top-level
>   packages
> - modules from under OvmfPkg that are specific to QEMU/KVM (and Xen, if
>   you happen to use OVMF with Xen)
>
> Now, when you reference a commit like 1e410ead above, you can look at
> the diffstat, and decide if it is OvmfPkg-specific (third category
> above) or not. Here you see UefiCpuPkg, which happens to be the second
> category.
>
> The important point is: please do *not* call any and all edk2 patches
> "OVMF changes", indiscriminately. That's super confusing for people who
> understand the above distinctions. Which now you do too. :)
>
> Let me add that in edk2, patches that straddle top level packages are
> generally forbidden -- you can't have a patch that modifies OvmfPkg and
> UefiCpuPkg at the same time, modulo *very* rare exceptions. If a feature
> or bugfix needs to touch several top-level packages, the series must be
> built up carefully in stages.
>
> Knowing all of the above, you can tell that the patch you referenced had
> only *enabled* OvmfPkg to customize UefiCpuPkg, via
> "PcdCpuApInitTimeOutInMicroSeconds". But for that customization to
> actually occur, a small patch for OvmfPkg will be necessary too, in order to
> set "PcdCpuApInitTimeOutInMicroSeconds" differently from the default.
>
> I plan to send that patch soon. If you'd like to be CC'd, that's great
> (reporting back with a Tested-by is even better!), but I'll need your
> real name for that. (Or any name that looks like a real name.)
It would be great if you could add me to the CC list, thanks.
>
> Thanks!
> Laszlo


Patch

--- a/UefiCpuPkg/CpuDxe/ApStartup.c
+++ b/UefiCpuPkg/CpuDxe/ApStartup.c
@@ -454,7 +454,9 @@  StartApsStackless (
    //
    // Wait 100 milliseconds for APs to arrive at the ApEntryPoint routine
    //
-  MicroSecondDelay (100 * 1000);
+  MicroSecondDelay (10 * 100 * 1000);

    return EFI_SUCCESS;
  }

Janusz, could you please check this instead? You can switch to your