diff mbox

Xen-unstable 4.8: HVM domain_crash called from emulate.c:144 RIP: c000:[<000000000000336a>]

Message ID 5761463502000078000F5355@prv-mh.provo.novell.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jan Beulich June 15, 2016, 10:12 a.m. UTC
>>> On 15.06.16 at 11:38, <linux@eikelenboom.it> wrote:
> Wednesday, June 15, 2016, 10:57:03 AM, you wrote:
> 
>> Wednesday, June 15, 2016, 10:29:37 AM, you wrote:
> 
>>>>>> On 15.06.16 at 01:49, <linux@eikelenboom.it> wrote:
>>>> Just tested latest xen-unstable 4.8 (xen_changeset git:d337764),
>>>> but one of the latest commits seems to have broken boot of HVM guests
>>>> (using qemu-xen) previous build with xen_changeset git:6e908ee worked 
>>>> fine.
> 
>>> Primary suspects would seem to be 67fc274bbe and bfa84968b2,
>>> but (obviously) I didn't see any issues with them in my own
>>> testing, so could you
>>> - instead of doing a full bisect, revert just those two
> 
>> Will give reverting that a shot.
> 
> Reverting bfa84968b2 is sufficient.

Could you give this wild guess a try on top of the tree without the
revert?



>>> And then of course this domain_crash() could of course be
>>> accompanied by some helpful printk() ...
> 
> Do you have a debug patch of what you are interested in ?

Not yet - basically we should log all of the variables involved in the
condition leading to the domain_crash().

Jan

Comments

Sander Eikelenboom June 15, 2016, noon UTC | #1
Wednesday, June 15, 2016, 12:12:37 PM, you wrote:

>>>> On 15.06.16 at 11:38, <linux@eikelenboom.it> wrote:
>> Wednesday, June 15, 2016, 10:57:03 AM, you wrote:
>> 
>>> Wednesday, June 15, 2016, 10:29:37 AM, you wrote:
>> 
>>>>>>> On 15.06.16 at 01:49, <linux@eikelenboom.it> wrote:
>>>>> Just tested latest xen-unstable 4.8 (xen_changeset git:d337764),
>>>>> but one of the latest commits seems to have broken boot of HVM guests
>>>>> (using qemu-xen) previous build with xen_changeset git:6e908ee worked 
>>>>> fine.
>> 
>>>> Primary suspects would seem to be 67fc274bbe and bfa84968b2,
>>>> but (obviously) I didn't see any issues with them in my own
>>>> testing, so could you
>>>> - instead of doing a full bisect, revert just those two
>> 
>>> Will give reverting that a shot.
>> 
>> Reverting bfa84968b2 is sufficient.

> Could you give this wild guess a try on top of the tree without the
> revert?

> --- unstable.orig/xen/arch/x86/hvm/emulate.c
> +++ unstable/xen/arch/x86/hvm/emulate.c
> @@ -1180,7 +1180,7 @@ static int hvmemul_rep_movs(
>          pfec |= PFEC_user_mode;
>  
>      bytes = PAGE_SIZE - (saddr & ~PAGE_MASK);
-    if ( vio->>mmio_access.read_access &&
+    if ( vio->>mmio_access.read_access && !vio->mmio_access.write_access &&
>           (vio->mmio_gla == (saddr & PAGE_MASK)) &&
>           bytes >= bytes_per_rep )
>      {

Unfortunately still crashes.

--
Sander

>>>> And then of course this domain_crash() could of course be
>>>> accompanied by some helpful printk() ...
>> 
>> Do you have a debug patch of what you are interested in ?

> Not yet - basically we should log all of the variables involved in the
> condition leading to the domain_crash().

> Jan
Jan Beulich June 15, 2016, 12:48 p.m. UTC | #2
>>> On 15.06.16 at 14:00, <linux@eikelenboom.it> wrote:
> Wednesday, June 15, 2016, 12:12:37 PM, you wrote:
>>>>> On 15.06.16 at 11:38, <linux@eikelenboom.it> wrote:
>>> Wednesday, June 15, 2016, 10:57:03 AM, you wrote:
>>>> Wednesday, June 15, 2016, 10:29:37 AM, you wrote:
>>>>>>>> On 15.06.16 at 01:49, <linux@eikelenboom.it> wrote:
>>>>>> Just tested latest xen-unstable 4.8 (xen_changeset git:d337764),
>>>>>> but one of the latest commits seems to have broken boot of HVM guests
>>>>>> (using qemu-xen) previous build with xen_changeset git:6e908ee worked 
>>>>>> fine.
>>> 
>>>>> Primary suspects would seem to be 67fc274bbe and bfa84968b2,
>>>>> but (obviously) I didn't see any issues with them in my own
>>>>> testing, so could you
>>>>> - instead of doing a full bisect, revert just those two
>>> 
>>>> Will give reverting that a shot.
>>> 
>>> Reverting bfa84968b2 is sufficient.
> 
>> Could you give this wild guess a try on top of the tree without the
>> revert?
> 
>> --- unstable.orig/xen/arch/x86/hvm/emulate.c
>> +++ unstable/xen/arch/x86/hvm/emulate.c
>> @@ -1180,7 +1180,7 @@ static int hvmemul_rep_movs(
>>          pfec |= PFEC_user_mode;
>>  
>>      bytes = PAGE_SIZE - (saddr & ~PAGE_MASK);
> -    if ( vio->>mmio_access.read_access &&
> +    if ( vio->>mmio_access.read_access && !vio->mmio_access.write_access &&
>>           (vio->mmio_gla == (saddr & PAGE_MASK)) &&
>>           bytes >= bytes_per_rep )
>>      {
> 
> Unfortunately still crashes.

Thanks for trying. Which basically just leaves the p.count > *reps
part in that domain_crash() condition, as that's the only other thing
involved in that check which said commit could have an effect on (as
far as I can tell at least). Would you be up for another experiment,
removing that one line? Other things to try (just to understand the
issue) would be to
- revert only each half of said commit individually (the two hunks
  really are independent),
- remove just the two latch_linear_to_phys() calls.

Apart from that, and just to see whether there are other differences
between your guest(s) and mine, could you post a guest config from
one that's affected?

Jan
Sander Eikelenboom June 15, 2016, 1:58 p.m. UTC | #3
Wednesday, June 15, 2016, 2:48:55 PM, you wrote:

>>>> On 15.06.16 at 14:00, <linux@eikelenboom.it> wrote:
>> Wednesday, June 15, 2016, 12:12:37 PM, you wrote:
>>>>>> On 15.06.16 at 11:38, <linux@eikelenboom.it> wrote:
>>>> Wednesday, June 15, 2016, 10:57:03 AM, you wrote:
>>>>> Wednesday, June 15, 2016, 10:29:37 AM, you wrote:
>>>>>>>>> On 15.06.16 at 01:49, <linux@eikelenboom.it> wrote:
>>>>>>> Just tested latest xen-unstable 4.8 (xen_changeset git:d337764),
>>>>>>> but one of the latest commits seems to have broken boot of HVM guests
>>>>>>> (using qemu-xen) previous build with xen_changeset git:6e908ee worked 
>>>>>>> fine.
>>>> 
>>>>>> Primary suspects would seem to be 67fc274bbe and bfa84968b2,
>>>>>> but (obviously) I didn't see any issues with them in my own
>>>>>> testing, so could you
>>>>>> - instead of doing a full bisect, revert just those two
>>>> 
>>>>> Will give reverting that a shot.
>>>> 
>>>> Reverting bfa84968b2 is sufficient.
>> 
>>> Could you give this wild guess a try on top of the tree without the
>>> revert?
>> 
>>> --- unstable.orig/xen/arch/x86/hvm/emulate.c
>>> +++ unstable/xen/arch/x86/hvm/emulate.c
>>> @@ -1180,7 +1180,7 @@ static int hvmemul_rep_movs(
>>>          pfec |= PFEC_user_mode;
>>>  
>>>      bytes = PAGE_SIZE - (saddr & ~PAGE_MASK);
>> -    if ( vio->>mmio_access.read_access &&
>> +    if ( vio->>mmio_access.read_access && !vio->mmio_access.write_access &&
>>>           (vio->mmio_gla == (saddr & PAGE_MASK)) &&
>>>           bytes >= bytes_per_rep )
>>>      {
>> 
>> Unfortunately still crashes.

> Thanks for trying. Which basically just leaves the p.count > *reps
> part in that domain_crash() condition, as that's the only other thing
> involved in that check which said commit could have an effect on (as
> far as I can tell at least). Would you be up for another experiment,
> removing that one line? Other things to try (just to understand the
> issue) would be to
> - revert only each half of said commit individually (the two hunks
>   really are independent),
> - remove just the two latch_linear_to_phys() calls.

Will try some of that and let you know.

> Apart from that, and just to see whether there are other differences
> between your guest(s) and mine, could you post a guest config from
> one that's affected?

Hope you are not too disappointed it's rather sparse:

builder='hvm'
device_model_version = 'qemu-xen'
device_model_user = 'root'
memory = 512
name = 'test_guest'
vcpus = 4
cpu_weight = 768
vif = [ 'bridge=xen_bridge, ip=192.168.1.15, mac=00:16:3E:C4:72:83, model=e1000' ]
disk = [ 'phy:/dev/xen_vms/test_guest1,hda,w', 'phy:/dev/xen_vms/test_guest2,hdb,w' ]
on_crash = 'preserve'
boot='c'
vnc=0
serial='pty'

Both dom0 and guest run Debian Jessie, as said platform is AMD, running a 
4.7-rc3ish kernel.


> Jan
Jan Beulich June 15, 2016, 2:07 p.m. UTC | #4
>>> On 15.06.16 at 15:58, <linux@eikelenboom.it> wrote:
> Wednesday, June 15, 2016, 2:48:55 PM, you wrote:
>> Apart from that, and just to see whether there are other differences
>> between your guest(s) and mine, could you post a guest config from
>> one that's affected?
> 
> Hope you are not too disappointed it's rather sparse:

In no way.

> builder='hvm'
> device_model_version = 'qemu-xen'
> device_model_user = 'root'
> memory = 512
> name = 'test_guest'
> vcpus = 4
> cpu_weight = 768
> vif = [ 'bridge=xen_bridge, ip=192.168.1.15, mac=00:16:3E:C4:72:83, 
> model=e1000' ]
> disk = [ 'phy:/dev/xen_vms/test_guest1,hda,w', 
> 'phy:/dev/xen_vms/test_guest2,hdb,w' ]
> on_crash = 'preserve'
> boot='c'
> vnc=0
> serial='pty'

I wonder whether mine having

stdvga=0

matters. Albeit a quick test passing stdvga=1 works here. And I
don't think the vnc= setting should have an effect here.

Jan
Boris Ostrovsky June 15, 2016, 2:20 p.m. UTC | #5
On 06/15/2016 10:07 AM, Jan Beulich wrote:
>>>> On 15.06.16 at 15:58, <linux@eikelenboom.it> wrote:
>> Wednesday, June 15, 2016, 2:48:55 PM, you wrote:
>>> Apart from that, and just to see whether there are other differences
>>> between your guest(s) and mine, could you post a guest config from
>>> one that's affected?
>> Hope you are not too disappointed it's rather sparse:
> In no way.
>
>> builder='hvm'
>> device_model_version = 'qemu-xen'
>> device_model_user = 'root'
>> memory = 512
>> name = 'test_guest'
>> vcpus = 4
>> cpu_weight = 768
>> vif = [ 'bridge=xen_bridge, ip=192.168.1.15, mac=00:16:3E:C4:72:83, 
>> model=e1000' ]
>> disk = [ 'phy:/dev/xen_vms/test_guest1,hda,w', 
>> 'phy:/dev/xen_vms/test_guest2,hdb,w' ]
>> on_crash = 'preserve'
>> boot='c'
>> vnc=0
>> serial='pty'
> I wonder whether mine having
>
> stdvga=0
>
> matters. Albeit a quick test passing stdvga=1 works here. And I
> don't think the vnc= setting should have an effect here.

Our nightly picked up this crash as well on an AMD box (Intel passed).

I believe this is due to

+       if ( *reps * bytes_per_rep > bytes )
+            *reps = bytes / bytes_per_rep;

in hvmemul_rep_stos() and then, as you pointed out in another message,
we fail p.count > *reps comparison.

-boris

-boris
in hvmemul_rep_stos.
Jan Beulich June 15, 2016, 2:35 p.m. UTC | #6
>>> On 15.06.16 at 16:20, <boris.ostrovsky@oracle.com> wrote:
> On 06/15/2016 10:07 AM, Jan Beulich wrote:
>>>>> On 15.06.16 at 15:58, <linux@eikelenboom.it> wrote:
>>> Wednesday, June 15, 2016, 2:48:55 PM, you wrote:
>>>> Apart from that, and just to see whether there are other differences
>>>> between your guest(s) and mine, could you post a guest config from
>>>> one that's affected?
>>> Hope you are not too disappointed it's rather sparse:
>> In no way.
>>
>>> builder='hvm'
>>> device_model_version = 'qemu-xen'
>>> device_model_user = 'root'
>>> memory = 512
>>> name = 'test_guest'
>>> vcpus = 4
>>> cpu_weight = 768
>>> vif = [ 'bridge=xen_bridge, ip=192.168.1.15, mac=00:16:3E:C4:72:83, 
>>> model=e1000' ]
>>> disk = [ 'phy:/dev/xen_vms/test_guest1,hda,w', 
>>> 'phy:/dev/xen_vms/test_guest2,hdb,w' ]
>>> on_crash = 'preserve'
>>> boot='c'
>>> vnc=0
>>> serial='pty'
>> I wonder whether mine having
>>
>> stdvga=0
>>
>> matters. Albeit a quick test passing stdvga=1 works here. And I
>> don't think the vnc= setting should have an effect here.
> 
> Our nightly picked up this crash as well on an AMD box (Intel passed).
> 
> I believe this is due to
> 
> +       if ( *reps * bytes_per_rep > bytes )
> +            *reps = bytes / bytes_per_rep;
> 
> in hvmemul_rep_stos() and then, as you pointed out in another message,
> we fail p.count > *reps comparison.

But the really interesting thing then is - why only AMD?

Jan
diff mbox

Patch

--- unstable.orig/xen/arch/x86/hvm/emulate.c
+++ unstable/xen/arch/x86/hvm/emulate.c
@@ -1180,7 +1180,7 @@  static int hvmemul_rep_movs(
         pfec |= PFEC_user_mode;
 
     bytes = PAGE_SIZE - (saddr & ~PAGE_MASK);
-    if ( vio->mmio_access.read_access &&
+    if ( vio->mmio_access.read_access && !vio->mmio_access.write_access &&
          (vio->mmio_gla == (saddr & PAGE_MASK)) &&
          bytes >= bytes_per_rep )
     {