diff mbox

kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform

Message ID 20140725093127.GB5269@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Will Deacon July 25, 2014, 9:31 a.m. UTC
Hi Peter,

On Thu, Jul 24, 2014 at 09:05:39PM +0100, Peter Maydell wrote:
> On 24 July 2014 20:55, Will Deacon <will.deacon@arm.com> wrote:
> > Again, that can be solved by introduced Marc's attr for determining the
> > GICV offset within the 64k page. I don't think that's -stable material.
> 
> Agreed that we don't want to put Marc's patchset in -stable
> (and that without it systems with GICV in their host devicetree
> at pagebase+60K are unusable, so we're not actually regressing
> anything if we put this into stable). But...
> 
> >> I can't think of any way of determining whether a particular
> >> system gets this right or wrong automatically, which suggests
> >> perhaps we need to allow the device tree to specify that the
> >> GICV is 64k-page-safe...
> >
> > When we support such systems, I also think we'll need a device-tree change.
> > My main concern right now is stopping the ability to hose the entire machine
> > by trying to instantiate a virtual GIC.
> 
> ...I don't see how your patch prevents instantiating a VGIC
> and hosing the machine on a system where the 64K
> with the GICV registers in it goes
>  [GICV registers] [machine blows up if you read this]
>  0K                      8K                                             64K

True, if such a machine existed, then this patch wouldn't detect it. I don't
think we support anything like that in mainline at the moment, but the
following additional diff should solve the problem, no?


> Whether the 64K page contains Bad Stuff is completely
> orthogonal to whether the device tree offset the host has
> for the GICV is 0K, 60K or anything in between. What you
> should be checking for is "is this system design broken?",
> which is probably a device tree attribute.

Actually, I think we can just claim that the GICV occupies the full 64k
region and allow the offset within that region to be acquired via the new
ioctl.

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Peter Maydell July 25, 2014, 10:06 a.m. UTC | #1
On 25 July 2014 10:31, Will Deacon <will.deacon@arm.com> wrote:
> On Thu, Jul 24, 2014 at 09:05:39PM +0100, Peter Maydell wrote:
>> On 24 July 2014 20:55, Will Deacon <will.deacon@arm.com> wrote:
>> > Again, that can be solved by introduced Marc's attr for determining the
>> > GICV offset within the 64k page. I don't think that's -stable material.
>>
>> Agreed that we don't want to put Marc's patchset in -stable
>> (and that without it systems with GICV in their host devicetree
>> at pagebase+60K are unusable, so we're not actually regressing
>> anything if we put this into stable). But...
>>
>> >> I can't think of any way of determining whether a particular
>> >> system gets this right or wrong automatically, which suggests
>> >> perhaps we need to allow the device tree to specify that the
>> >> GICV is 64k-page-safe...
>> >
>> > When we support such systems, I also think we'll need a device-tree change.
>> > My main concern right now is stopping the ability to hose the entire machine
>> > by trying to instantiate a virtual GIC.
>>
>> ...I don't see how your patch prevents instantiating a VGIC
>> and hosing the machine on a system where the 64K
>> with the GICV registers in it goes
>>  [GICV registers] [machine blows up if you read this]
>>  0K                      8K                                             64K
>
> True, if such a machine existed, then this patch wouldn't detect it. I don't
> think we support anything like that in mainline at the moment, but the
> following additional diff should solve the problem, no?

Well, the apm-storm.dtsi specifies a 64K aligned GICV
base but a size of 8K, so presumably it qualifies.
(If the GICV size should really be 64K and that's an APM
device tree bug then this patch is going to make KVM
on 64K page hosts stop working on that board... somebody
with access to the h/w docs for the APM boards needs
to check that, I guess.)

> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index fa9a95b3ed19..476d3bf540a8 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1539,6 +1539,14 @@ int kvm_vgic_hyp_init(void)
>                 goto out_unmap;
>         }
>
> +       if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
> +               kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
> +                       (unsigned long long)resource_size(&vcpu_res),
> +                       PAGE_SIZE);
> +               ret = -ENXIO;
> +               goto out_unmap;
> +       }
> +
>         vgic_vcpu_base = vcpu_res.start;
>
>         kvm_info("%s@%llx IRQ%d\n", vgic_node->name,

Yes, I think if we check both start and size are page aligned
then this will both:
 (a) avoid using the VGIC on systems where it's going to
   allow the guest to take down the machine
 (b) avoid using the VGIC if it's not at a page boundary,
   pending Marc's patchset landing and making that case
   actually work rather than do weird things.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Schopp July 25, 2014, 2:02 p.m. UTC | #2
>>>> I can't think of any way of determining whether a particular
>>>> system gets this right or wrong automatically, which suggests
>>>> perhaps we need to allow the device tree to specify that the
>>>> GICV is 64k-page-safe...
>>> When we support such systems, I also think we'll need a device-tree change.
>>> My main concern right now is stopping the ability to hose the entire machine
>>> by trying to instantiate a virtual GIC.
>> ...I don't see how your patch prevents instantiating a VGIC
>> and hosing the machine on a system where the 64K
>> with the GICV registers in it goes
>>  [GICV registers] [machine blows up if you read this]
>>  0K                      8K                                             64K
> True, if such a machine existed, then this patch wouldn't detect it. I don't
> think we support anything like that in mainline at the moment, but the
> following additional diff should solve the problem, no?
>
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index fa9a95b3ed19..476d3bf540a8 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1539,6 +1539,14 @@ int kvm_vgic_hyp_init(void)
>                 goto out_unmap;
>         }
>  
> +       if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
> +               kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
> +                       (unsigned long long)resource_size(&vcpu_res),
> +                       PAGE_SIZE);
> +               ret = -ENXIO;
> +               goto out_unmap;
> +       }
> +
>         vgic_vcpu_base = vcpu_res.start;
>  
>         kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
This would break with my SOC device tree which looks like this.  Note
this device tree works just fine without checks.

    gic: interrupt-controller@e1101000 {
        compatible = "arm,gic-400-v2m";
        #interrupt-cells = <3>;
        #address-cells = <0>;
        interrupt-controller;
        msi-controller;
        reg = <0x0 0xe1110000 0 0x1000>, /* gic dist */
              <0x0 0xe112f000 0 0x2000>, /* gic cpu */
              <0x0 0xe114f000 0 0x2000>, /* gic virtual ic*/
              <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu*/
              <0x0 0xe1180000 0 0x1000>; /* gic msi */
        interrupts = <1 8 0xf04>;
    };

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Deacon July 25, 2014, 2:08 p.m. UTC | #3
Hi Joel,

On Fri, Jul 25, 2014 at 03:02:58PM +0100, Joel Schopp wrote:
> >>>> I can't think of any way of determining whether a particular
> >>>> system gets this right or wrong automatically, which suggests
> >>>> perhaps we need to allow the device tree to specify that the
> >>>> GICV is 64k-page-safe...
> >>> When we support such systems, I also think we'll need a device-tree change.
> >>> My main concern right now is stopping the ability to hose the entire machine
> >>> by trying to instantiate a virtual GIC.
> >> ...I don't see how your patch prevents instantiating a VGIC
> >> and hosing the machine on a system where the 64K
> >> with the GICV registers in it goes
> >>  [GICV registers] [machine blows up if you read this]
> >>  0K                      8K                                             64K
> > True, if such a machine existed, then this patch wouldn't detect it. I don't
> > think we support anything like that in mainline at the moment, but the
> > following additional diff should solve the problem, no?
> >
> > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> > index fa9a95b3ed19..476d3bf540a8 100644
> > --- a/virt/kvm/arm/vgic.c
> > +++ b/virt/kvm/arm/vgic.c
> > @@ -1539,6 +1539,14 @@ int kvm_vgic_hyp_init(void)
> >                 goto out_unmap;
> >         }
> >  
> > +       if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
> > +               kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
> > +                       (unsigned long long)resource_size(&vcpu_res),
> > +                       PAGE_SIZE);
> > +               ret = -ENXIO;
> > +               goto out_unmap;
> > +       }
> > +
> >         vgic_vcpu_base = vcpu_res.start;
> >  
> >         kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
> This would break with my SOC device tree which looks like this.  Note
> this device tree works just fine without checks.
> 
>     gic: interrupt-controller@e1101000 {
>         compatible = "arm,gic-400-v2m";
>         #interrupt-cells = <3>;
>         #address-cells = <0>;
>         interrupt-controller;
>         msi-controller;
>         reg = <0x0 0xe1110000 0 0x1000>, /* gic dist */
>               <0x0 0xe112f000 0 0x2000>, /* gic cpu */
>               <0x0 0xe114f000 0 0x2000>, /* gic virtual ic*/
>               <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu*/
>               <0x0 0xe1180000 0 0x1000>; /* gic msi */
>         interrupts = <1 8 0xf04>;
>     };

I appreciate it may work, but that's only because the kernel is actually
using an alias of GICV at 0xe1160000 by accident. I would say that you're
getting away with passing an incorrect description.

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Maydell July 25, 2014, 2:08 p.m. UTC | #4
On 25 July 2014 15:02, Joel Schopp <joel.schopp@amd.com> wrote:
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1539,6 +1539,14 @@ int kvm_vgic_hyp_init(void)
>>                 goto out_unmap;
>>         }
>>
>> +       if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
>> +               kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
>> +                       (unsigned long long)resource_size(&vcpu_res),
>> +                       PAGE_SIZE);
>> +               ret = -ENXIO;
>> +               goto out_unmap;
>> +       }
>> +
>>         vgic_vcpu_base = vcpu_res.start;
>>
>>         kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
> This would break with my SOC device tree which looks like this.  Note
> this device tree works just fine without checks.
>
>     gic: interrupt-controller@e1101000 {
>         compatible = "arm,gic-400-v2m";
>         #interrupt-cells = <3>;
>         #address-cells = <0>;
>         interrupt-controller;
>         msi-controller;
>         reg = <0x0 0xe1110000 0 0x1000>, /* gic dist */
>               <0x0 0xe112f000 0 0x2000>, /* gic cpu */
>               <0x0 0xe114f000 0 0x2000>, /* gic virtual ic*/
>               <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu*/
>               <0x0 0xe1180000 0 0x1000>; /* gic msi */
>         interrupts = <1 8 0xf04>;
>     };

That has a non-aligned GICV -- does it really work on a
mainline kernel without Marc's patches to cope with GICV
which aren't at the base of a page? I can't see how...

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Schopp July 25, 2014, 2:16 p.m. UTC | #5
On 07/25/2014 09:08 AM, Will Deacon wrote:
> Hi Joel,
>
> On Fri, Jul 25, 2014 at 03:02:58PM +0100, Joel Schopp wrote:
>>>>>> I can't think of any way of determining whether a particular
>>>>>> system gets this right or wrong automatically, which suggests
>>>>>> perhaps we need to allow the device tree to specify that the
>>>>>> GICV is 64k-page-safe...
>>>>> When we support such systems, I also think we'll need a device-tree change.
>>>>> My main concern right now is stopping the ability to hose the entire machine
>>>>> by trying to instantiate a virtual GIC.
>>>> ...I don't see how your patch prevents instantiating a VGIC
>>>> and hosing the machine on a system where the 64K
>>>> with the GICV registers in it goes
>>>>  [GICV registers] [machine blows up if you read this]
>>>>  0K                      8K                                             64K
>>> True, if such a machine existed, then this patch wouldn't detect it. I don't
>>> think we support anything like that in mainline at the moment, but the
>>> following additional diff should solve the problem, no?
>>>
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index fa9a95b3ed19..476d3bf540a8 100644
>>> --- a/virt/kvm/arm/vgic.c
>>> +++ b/virt/kvm/arm/vgic.c
>>> @@ -1539,6 +1539,14 @@ int kvm_vgic_hyp_init(void)
>>>                 goto out_unmap;
>>>         }
>>>  
>>> +       if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
>>> +               kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
>>> +                       (unsigned long long)resource_size(&vcpu_res),
>>> +                       PAGE_SIZE);
>>> +               ret = -ENXIO;
>>> +               goto out_unmap;
>>> +       }
>>> +
>>>         vgic_vcpu_base = vcpu_res.start;
>>>  
>>>         kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
>> This would break with my SOC device tree which looks like this.  Note
>> this device tree works just fine without checks.
>>
>>     gic: interrupt-controller@e1101000 {
>>         compatible = "arm,gic-400-v2m";
>>         #interrupt-cells = <3>;
>>         #address-cells = <0>;
>>         interrupt-controller;
>>         msi-controller;
>>         reg = <0x0 0xe1110000 0 0x1000>, /* gic dist */
>>               <0x0 0xe112f000 0 0x2000>, /* gic cpu */
>>               <0x0 0xe114f000 0 0x2000>, /* gic virtual ic*/
>>               <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu*/
>>               <0x0 0xe1180000 0 0x1000>; /* gic msi */
>>         interrupts = <1 8 0xf04>;
>>     };
> I appreciate it may work, but that's only because the kernel is actually
> using an alias of GICV at 0xe1160000 by accident. I would say that you're
> getting away with passing an incorrect description.
The problem is that by the spec the size is 0x2000 and was never
properly rearchitected from 4K to variable page size support. The
workaround is that each of the 4K in the 64K page (16 of them) are
really mapped to the same location in hardware. So the contents of
0xe1160000 is the same as the contents of 0xe116f000. See “Appendix F:
GIC-400 and 64KB Translation Granule” in v2.1 of the ARM _Server Base
System Architecture_ specification.

Now if we were to change the entry to <0x0 0xe1160000 0 0x2000> it would
be obviously broken because the second half would map to a duplicate of
the first half.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Will Deacon July 25, 2014, 2:23 p.m. UTC | #6
On Fri, Jul 25, 2014 at 03:16:15PM +0100, Joel Schopp wrote:
> On 07/25/2014 09:08 AM, Will Deacon wrote:
> >> This would break with my SOC device tree which looks like this.  Note
> >> this device tree works just fine without checks.
> >>
> >>     gic: interrupt-controller@e1101000 {
> >>         compatible = "arm,gic-400-v2m";
> >>         #interrupt-cells = <3>;
> >>         #address-cells = <0>;
> >>         interrupt-controller;
> >>         msi-controller;
> >>         reg = <0x0 0xe1110000 0 0x1000>, /* gic dist */
> >>               <0x0 0xe112f000 0 0x2000>, /* gic cpu */
> >>               <0x0 0xe114f000 0 0x2000>, /* gic virtual ic*/
> >>               <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu*/
> >>               <0x0 0xe1180000 0 0x1000>; /* gic msi */
> >>         interrupts = <1 8 0xf04>;
> >>     };
> > I appreciate it may work, but that's only because the kernel is actually
> > using an alias of GICV at 0xe1160000 by accident. I would say that you're
> > getting away with passing an incorrect description.
> The problem is that by the spec the size is 0x2000 and was never
> properly rearchitected from 4K to variable page size support. The
> workaround is that each of the 4K in the 64K page (16 of them) are
> really mapped to the same location in hardware. So the contents of
> 0xe1160000 is the same as the contents of 0xe116f000. See “Appendix F:
> GIC-400 and 64KB Translation Granule” in v2.1 of the ARM _Server Base
> System Architecture_ specification.

I've read that document, but it's not mandated and I don't think I have a
single piece of hardware that actually follows it. Even the CPUs don't seem
to perform the aliasing suggesting there (take a look at the A57 and A53
TRMs -- there are reserved regions in there).

> Now if we were to change the entry to <0x0 0xe1160000 0 0x2000> it would
> be obviously broken because the second half would map to a duplicate of
> the first half.

I think you'd need something like <0x0 0xe1160000 0 0x20000> and we'd have
to change the GIC driver to do the right thing. What we currently have is
unsafe on most hardware, yet happens to work on your system.

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Schopp July 25, 2014, 2:59 p.m. UTC | #7
On 07/25/2014 09:23 AM, Will Deacon wrote:
> On Fri, Jul 25, 2014 at 03:16:15PM +0100, Joel Schopp wrote:
>> On 07/25/2014 09:08 AM, Will Deacon wrote:
>>>> This would break with my SOC device tree which looks like this.  Note
>>>> this device tree works just fine without checks.
>>>>
>>>>     gic: interrupt-controller@e1101000 {
>>>>         compatible = "arm,gic-400-v2m";
>>>>         #interrupt-cells = <3>;
>>>>         #address-cells = <0>;
>>>>         interrupt-controller;
>>>>         msi-controller;
>>>>         reg = <0x0 0xe1110000 0 0x1000>, /* gic dist */
>>>>               <0x0 0xe112f000 0 0x2000>, /* gic cpu */
>>>>               <0x0 0xe114f000 0 0x2000>, /* gic virtual ic*/
>>>>               <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu*/
>>>>               <0x0 0xe1180000 0 0x1000>; /* gic msi */
>>>>         interrupts = <1 8 0xf04>;
>>>>     };
>>> I appreciate it may work, but that's only because the kernel is actually
>>> using an alias of GICV at 0xe1160000 by accident. I would say that you're
>>> getting away with passing an incorrect description.
>> The problem is that by the spec the size is 0x2000 and was never
>> properly rearchitected from 4K to variable page size support. The
>> workaround is that each of the 4K in the 64K page (16 of them) are
>> really mapped to the same location in hardware. So the contents of
>> 0xe1160000 is the same as the contents of 0xe116f000. See “Appendix F:
>> GIC-400 and 64KB Translation Granule” in v2.1 of the ARM _Server Base
>> System Architecture_ specification.
> I've read that document, but it's not mandated and I don't think I have a
> single piece of hardware that actually follows it. Even the CPUs don't seem
> to perform the aliasing suggesting there (take a look at the A57 and A53
> TRMs -- there are reserved regions in there).
Not mandated, but it is obviously highly encouraged, and at a minimum
valid.  The SOC sitting under my desk follows it.

>
>> Now if we were to change the entry to <0x0 0xe1160000 0 0x2000> it would
>> be obviously broken because the second half would map to a duplicate of
>> the first half.
> I think you'd need something like <0x0 0xe1160000 0 0x20000> and we'd have
> to change the GIC driver to do the right thing. What we currently have is
> unsafe on most hardware, yet happens to work on your system.
If you change the GIC driver,kvm, kvmtool, and qemu to do the right
thing I'd be happy to change the device tree entry to <0x0 0xe1160000 0
0x20000> or any other value of your choosing.  I really have no opinion
here on how it should be done other than what is there now currently
works and I'd like to have whatever we do in the future continue to work.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index fa9a95b3ed19..476d3bf540a8 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1539,6 +1539,14 @@  int kvm_vgic_hyp_init(void)
                goto out_unmap;
        }
 
+       if (!PAGE_ALIGNED(resource_size(&vcpu_res))) {
+               kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
+                       (unsigned long long)resource_size(&vcpu_res),
+                       PAGE_SIZE);
+               ret = -ENXIO;
+               goto out_unmap;
+       }
+
        vgic_vcpu_base = vcpu_res.start;
 
        kvm_info("%s@%llx IRQ%d\n", vgic_node->name,