diff mbox

Calling vmalloc_to_page() on ioremap memory?

Message ID CAG_fn=Vc5134sX6JRUoGp=W0to6eg56DuW3YErqeWuR_W_O9gQ@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Alexander Potapenko June 25, 2018, 2:59 p.m. UTC
Hi Ard, Mark, Andrew and others,

AFAIU, commit 029c54b09599573015a5c18dbe59cbdf42742237 ("mm/vmalloc.c:
huge-vmap: fail gracefully on unexpected huge vmap mappings") was
supposed to make vmalloc_to_page() return NULL for pointers not
returned by vmalloc().
But when I call vmalloc_to_page() for the pointer returned by
acpi_os_ioremap() (see the patch below) I see that the resulting
`struct page *` points to unmapped memory:

 }

Thanks,
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Comments

Ard Biesheuvel June 25, 2018, 3:37 p.m. UTC | #1
On 25 June 2018 at 16:59, Alexander Potapenko <glider@google.com> wrote:
> Hi Ard, Mark, Andrew and others,
>
> AFAIU, commit 029c54b09599573015a5c18dbe59cbdf42742237 ("mm/vmalloc.c:
> huge-vmap: fail gracefully on unexpected huge vmap mappings") was
> supposed to make vmalloc_to_page() return NULL for pointers not
> returned by vmalloc().
> But when I call vmalloc_to_page() for the pointer returned by
> acpi_os_ioremap() (see the patch below) I see that the resulting
> `struct page *` points to unmapped memory:
>

Why do you assume it maps memory? It could map a device's MMIO
registers as well, which don't have struct pages associated with them.

> ============================
> ACPI: Enabled 2 GPEs in block 00 to 0F
> phys: 00000000fed00000, vmalloc: ffffc9000019a000, page:

Isn't that phys address something like the HPET on a x86 system?

> ffff8800fed00000 [ffffea0003fb4000]
> BUG: unable to handle kernel paging request at ffffea0003fb4000
> PGD 3f7d5067 P4D 3f7d5067 PUD 3f7d4067 PMD 0
> Oops: 0000 [#1] SMP PTI
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc2+ #1325
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/=
> 2014
> RIP: 0010:acpi_os_map_iomem+0x1c5/0x210 ??:?
> Code: 00 88 ff ff 4d 89 f8 48 c1 f9 06 4c 89 f6 48 c7 c7 60 5f 01 82
> 48 c1 e1 0c 48 01 c1 e8 6d 42 70 ff 4d 85 ff 0f 84 14 ff ff ff <49> 8b
> 37 48 c7 c7 d2 61 01 82 e8 55 42 70 ff e9 00 ff ff ff 48 c7
> RSP: 0000:ffff88003e253840 EFLAGS: 00010286
> RAX: 000000000000005c RBX: ffff88003d857b80 RCX: ffffffff82245d38
> RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff8288e86c
> RBP: 00000000fed00000 R08: 00000000000000ae R09: 0000000000000007
> R10: 0000000000000000 R11: ffffffff828908ad R12: 0000000000001000
> R13: ffffc9000019a000 R14: 00000000fed00000 R15: ffffea0003fb4000
> FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:000000000000000=
> 0
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffea0003fb4000 CR3: 000000000220a000 CR4: 00000000000006f0
> Call Trace:
>  acpi_ex_system_memory_space_handler+0xca/0x19f ??:?
> ============================
>
> For memory error detection purposes I'm trying to map the addresses
> from the vmalloc range to valid struct pages, or at least make sure
> there's no struct page for a given address.
> Looking up the vmap_area_root rbtree isn't an option, as this must be
> done from instrumented code, including interrupt handlers.
> I've been trying to employ vmalloc_to_page(), but looks like it
> doesn't work for ioremapped addresses.
> Is this at all possible?
>
> Patch showing the problem follows. I'm building using GCC 7.1.1 on a
> defconfig for x86_64.
>
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -279,14 +279,23 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size si=
> ze)
>  static void __iomem *acpi_map(acpi_physical_address pg_off, unsigned
> long pg_sz)
>  {
>         unsigned long pfn;
> +       void __iomem *ret;
> +       struct page *page;
>
>         pfn =3D pg_off >> PAGE_SHIFT;
>         if (should_use_kmap(pfn)) {
>                 if (pg_sz > PAGE_SIZE)
>                         return NULL;
>                 return (void __iomem __force *)kmap(pfn_to_page(pfn));
> -       } else
> -               return acpi_os_ioremap(pg_off, pg_sz);
> +       } else {
> +               ret =3D acpi_os_ioremap(pg_off, pg_sz);
> +               BUG_ON(!is_vmalloc_addr(ret));
> +               page =3D vmalloc_to_page(ret);
> +               pr_err("phys: %px, vmalloc: %px, page: %px [%px]\n",
> pg_off, ret, page_address(page), page);
> +               if (page)
> +                       pr_err("flags: %d\n", page->flags);
> +               return ret;
> +       }
>  }
>
> Thanks,
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
Mark Rutland June 25, 2018, 4 p.m. UTC | #2
On Mon, Jun 25, 2018 at 04:59:23PM +0200, Alexander Potapenko wrote:
> Hi Ard, Mark, Andrew and others,
> 
> AFAIU, commit 029c54b09599573015a5c18dbe59cbdf42742237 ("mm/vmalloc.c:
> huge-vmap: fail gracefully on unexpected huge vmap mappings") was
> supposed to make vmalloc_to_page() return NULL for pointers not
> returned by vmalloc().

It's a little more subtle than that -- avoiding an edge case where we
unexpectedly hit huge mappings, rather than determining whether an
address same from vmalloc().

> For memory error detection purposes I'm trying to map the addresses
> from the vmalloc range to valid struct pages, or at least make sure
> there's no struct page for a given address.
> Looking up the vmap_area_root rbtree isn't an option, as this must be
> done from instrumented code, including interrupt handlers.

I'm not sure how you can do this without looking at VMAs.

In general, the vmalloc area can contain addresses which are not memory,
and this cannot be detremined from the address alone.

You *might* be able to get away with pfn_valid(vmalloc_to_pfn(x)), but
IIRC there's some disagreement on the precise meaning of pfn_valid(), so
that might just tell you that the address happens to fall close to some
valid memory.

Thanks,
Mark.
Alexander Potapenko June 25, 2018, 4:07 p.m. UTC | #3
On Mon, Jun 25, 2018 at 5:37 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On 25 June 2018 at 16:59, Alexander Potapenko <glider@google.com> wrote:
> > Hi Ard, Mark, Andrew and others,
> >
> > AFAIU, commit 029c54b09599573015a5c18dbe59cbdf42742237 ("mm/vmalloc.c:
> > huge-vmap: fail gracefully on unexpected huge vmap mappings") was
> > supposed to make vmalloc_to_page() return NULL for pointers not
> > returned by vmalloc().
> > But when I call vmalloc_to_page() for the pointer returned by
> > acpi_os_ioremap() (see the patch below) I see that the resulting
> > `struct page *` points to unmapped memory:
> >
>
> Why do you assume it maps memory? It could map a device's MMIO
> registers as well, which don't have struct pages associated with them.
I might have been unclear. I'm just assuming that vmalloc_to_page()
returns either a valid struct page or NULL for a valid pointer
belonging to vmalloc area.
In this case vmalloc_to_page() returns a wild pointer.

> > ============================
> > ACPI: Enabled 2 GPEs in block 00 to 0F
> > phys: 00000000fed00000, vmalloc: ffffc9000019a000, page:
>
> Isn't that phys address something like the HPET on a x86 system?
Yes, probably. I just came across it while trying to instrument every
memory access with code doing something along the lines of:
  if (is_vmalloc_addr(addr))
    return vmalloc_to_page(addr)->metadata;
  else
    if (virt_to_page(addr))
      return virt_to_page(addr)->metadata;

, I don't think there's anything specific to that physical address.
> > ffff8800fed00000 [ffffea0003fb4000]
> > BUG: unable to handle kernel paging request at ffffea0003fb4000
> > PGD 3f7d5067 P4D 3f7d5067 PUD 3f7d4067 PMD 0
> > Oops: 0000 [#1] SMP PTI
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc2+ #1325
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/=
> > 2014
> > RIP: 0010:acpi_os_map_iomem+0x1c5/0x210 ??:?
> > Code: 00 88 ff ff 4d 89 f8 48 c1 f9 06 4c 89 f6 48 c7 c7 60 5f 01 82
> > 48 c1 e1 0c 48 01 c1 e8 6d 42 70 ff 4d 85 ff 0f 84 14 ff ff ff <49> 8b
> > 37 48 c7 c7 d2 61 01 82 e8 55 42 70 ff e9 00 ff ff ff 48 c7
> > RSP: 0000:ffff88003e253840 EFLAGS: 00010286
> > RAX: 000000000000005c RBX: ffff88003d857b80 RCX: ffffffff82245d38
> > RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff8288e86c
> > RBP: 00000000fed00000 R08: 00000000000000ae R09: 0000000000000007
> > R10: 0000000000000000 R11: ffffffff828908ad R12: 0000000000001000
> > R13: ffffc9000019a000 R14: 00000000fed00000 R15: ffffea0003fb4000
> > FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:000000000000000=
> > 0
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: ffffea0003fb4000 CR3: 000000000220a000 CR4: 00000000000006f0
> > Call Trace:
> >  acpi_ex_system_memory_space_handler+0xca/0x19f ??:?
> > ============================
> >
> > For memory error detection purposes I'm trying to map the addresses
> > from the vmalloc range to valid struct pages, or at least make sure
> > there's no struct page for a given address.
> > Looking up the vmap_area_root rbtree isn't an option, as this must be
> > done from instrumented code, including interrupt handlers.
> > I've been trying to employ vmalloc_to_page(), but looks like it
> > doesn't work for ioremapped addresses.
> > Is this at all possible?
> >
> > Patch showing the problem follows. I'm building using GCC 7.1.1 on a
> > defconfig for x86_64.
> >
> > --- a/drivers/acpi/osl.c
> > +++ b/drivers/acpi/osl.c
> > @@ -279,14 +279,23 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size si=
> > ze)
> >  static void __iomem *acpi_map(acpi_physical_address pg_off, unsigned
> > long pg_sz)
> >  {
> >         unsigned long pfn;
> > +       void __iomem *ret;
> > +       struct page *page;
> >
> >         pfn =3D pg_off >> PAGE_SHIFT;
> >         if (should_use_kmap(pfn)) {
> >                 if (pg_sz > PAGE_SIZE)
> >                         return NULL;
> >                 return (void __iomem __force *)kmap(pfn_to_page(pfn));
> > -       } else
> > -               return acpi_os_ioremap(pg_off, pg_sz);
> > +       } else {
> > +               ret =3D acpi_os_ioremap(pg_off, pg_sz);
> > +               BUG_ON(!is_vmalloc_addr(ret));
> > +               page =3D vmalloc_to_page(ret);
> > +               pr_err("phys: %px, vmalloc: %px, page: %px [%px]\n",
> > pg_off, ret, page_address(page), page);
> > +               if (page)
> > +                       pr_err("flags: %d\n", page->flags);
> > +               return ret;
> > +       }
> >  }
> >
> > Thanks,
> > Alexander Potapenko
> > Software Engineer
> >
> > Google Germany GmbH
> > Erika-Mann-Straße, 33
> > 80636 München
> >
> > Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> > Registergericht und -nummer: Hamburg, HRB 86891
> > Sitz der Gesellschaft: Hamburg
Ard Biesheuvel June 25, 2018, 4:18 p.m. UTC | #4
On 25 June 2018 at 18:07, Alexander Potapenko <glider@google.com> wrote:
> On Mon, Jun 25, 2018 at 5:37 PM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>>
>> On 25 June 2018 at 16:59, Alexander Potapenko <glider@google.com> wrote:
>> > Hi Ard, Mark, Andrew and others,
>> >
>> > AFAIU, commit 029c54b09599573015a5c18dbe59cbdf42742237 ("mm/vmalloc.c:
>> > huge-vmap: fail gracefully on unexpected huge vmap mappings") was
>> > supposed to make vmalloc_to_page() return NULL for pointers not
>> > returned by vmalloc().
>> > But when I call vmalloc_to_page() for the pointer returned by
>> > acpi_os_ioremap() (see the patch below) I see that the resulting
>> > `struct page *` points to unmapped memory:
>> >
>>
>> Why do you assume it maps memory? It could map a device's MMIO
>> registers as well, which don't have struct pages associated with them.
> I might have been unclear. I'm just assuming that vmalloc_to_page()
> returns either a valid struct page or NULL for a valid pointer
> belonging to vmalloc area.
> In this case vmalloc_to_page() returns a wild pointer.
>

is_vmalloc_addr() only checks whether the mapping is within the
boundaries of the VMALLOC region, and does not check whether it is in
fact a VM_ALLOC or VM_MAP mapping, and vmalloc_to_page() should
probably only be called on mappings of that type. The reason it is
implemented like this may well be the issue that you highlight, i.e.,
that this cannot be done from every context.

As for the patch, it was intended to ensure that vmalloc_to_page()
does not blindly assume that VM_MAP regions are mapped down to pages,
which is not the case for mappings of the kernel image on arm64.



>> > ============================
>> > ACPI: Enabled 2 GPEs in block 00 to 0F
>> > phys: 00000000fed00000, vmalloc: ffffc9000019a000, page:
>>
>> Isn't that phys address something like the HPET on a x86 system?
> Yes, probably. I just came across it while trying to instrument every
> memory access with code doing something along the lines of:
>   if (is_vmalloc_addr(addr))
>     return vmalloc_to_page(addr)->metadata;
>   else
>     if (virt_to_page(addr))
>       return virt_to_page(addr)->metadata;
>
> , I don't think there's anything specific to that physical address.
>> > ffff8800fed00000 [ffffea0003fb4000]
>> > BUG: unable to handle kernel paging request at ffffea0003fb4000
>> > PGD 3f7d5067 P4D 3f7d5067 PUD 3f7d4067 PMD 0
>> > Oops: 0000 [#1] SMP PTI
>> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc2+ #1325
>> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/=
>> > 2014
>> > RIP: 0010:acpi_os_map_iomem+0x1c5/0x210 ??:?
>> > Code: 00 88 ff ff 4d 89 f8 48 c1 f9 06 4c 89 f6 48 c7 c7 60 5f 01 82
>> > 48 c1 e1 0c 48 01 c1 e8 6d 42 70 ff 4d 85 ff 0f 84 14 ff ff ff <49> 8b
>> > 37 48 c7 c7 d2 61 01 82 e8 55 42 70 ff e9 00 ff ff ff 48 c7
>> > RSP: 0000:ffff88003e253840 EFLAGS: 00010286
>> > RAX: 000000000000005c RBX: ffff88003d857b80 RCX: ffffffff82245d38
>> > RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff8288e86c
>> > RBP: 00000000fed00000 R08: 00000000000000ae R09: 0000000000000007
>> > R10: 0000000000000000 R11: ffffffff828908ad R12: 0000000000001000
>> > R13: ffffc9000019a000 R14: 00000000fed00000 R15: ffffea0003fb4000
>> > FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:000000000000000=
>> > 0
>> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > CR2: ffffea0003fb4000 CR3: 000000000220a000 CR4: 00000000000006f0
>> > Call Trace:
>> >  acpi_ex_system_memory_space_handler+0xca/0x19f ??:?
>> > ============================
>> >
>> > For memory error detection purposes I'm trying to map the addresses
>> > from the vmalloc range to valid struct pages, or at least make sure
>> > there's no struct page for a given address.
>> > Looking up the vmap_area_root rbtree isn't an option, as this must be
>> > done from instrumented code, including interrupt handlers.
>> > I've been trying to employ vmalloc_to_page(), but looks like it
>> > doesn't work for ioremapped addresses.
>> > Is this at all possible?
>> >
>> > Patch showing the problem follows. I'm building using GCC 7.1.1 on a
>> > defconfig for x86_64.
>> >
>> > --- a/drivers/acpi/osl.c
>> > +++ b/drivers/acpi/osl.c
>> > @@ -279,14 +279,23 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size si=
>> > ze)
>> >  static void __iomem *acpi_map(acpi_physical_address pg_off, unsigned
>> > long pg_sz)
>> >  {
>> >         unsigned long pfn;
>> > +       void __iomem *ret;
>> > +       struct page *page;
>> >
>> >         pfn =3D pg_off >> PAGE_SHIFT;
>> >         if (should_use_kmap(pfn)) {
>> >                 if (pg_sz > PAGE_SIZE)
>> >                         return NULL;
>> >                 return (void __iomem __force *)kmap(pfn_to_page(pfn));
>> > -       } else
>> > -               return acpi_os_ioremap(pg_off, pg_sz);
>> > +       } else {
>> > +               ret =3D acpi_os_ioremap(pg_off, pg_sz);
>> > +               BUG_ON(!is_vmalloc_addr(ret));
>> > +               page =3D vmalloc_to_page(ret);
>> > +               pr_err("phys: %px, vmalloc: %px, page: %px [%px]\n",
>> > pg_off, ret, page_address(page), page);
>> > +               if (page)
>> > +                       pr_err("flags: %d\n", page->flags);
>> > +               return ret;
>> > +       }
>> >  }
>> >
>> > Thanks,
>> > Alexander Potapenko
>> > Software Engineer
>> >
>> > Google Germany GmbH
>> > Erika-Mann-Straße, 33
>> > 80636 München
>> >
>> > Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>> > Registergericht und -nummer: Hamburg, HRB 86891
>> > Sitz der Gesellschaft: Hamburg
>
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
Alexander Potapenko June 25, 2018, 4:24 p.m. UTC | #5
On Mon, Jun 25, 2018 at 6:00 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Mon, Jun 25, 2018 at 04:59:23PM +0200, Alexander Potapenko wrote:
> > Hi Ard, Mark, Andrew and others,
> >
> > AFAIU, commit 029c54b09599573015a5c18dbe59cbdf42742237 ("mm/vmalloc.c:
> > huge-vmap: fail gracefully on unexpected huge vmap mappings") was
> > supposed to make vmalloc_to_page() return NULL for pointers not
> > returned by vmalloc().
>
> It's a little more subtle than that -- avoiding an edge case where we
> unexpectedly hit huge mappings, rather than determining whether an
> address same from vmalloc().
Ok, but anyway, acpi_os_ioremap() creates a huge page mapping via
__ioremap_caller() (see
https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/ioremap.c#L133)
Shouldn't these checks detect that as well?

> > For memory error detection purposes I'm trying to map the addresses
> > from the vmalloc range to valid struct pages, or at least make sure
> > there's no struct page for a given address.
> > Looking up the vmap_area_root rbtree isn't an option, as this must be
> > done from instrumented code, including interrupt handlers.
>
> I'm not sure how you can do this without looking at VMAs.
>
> In general, the vmalloc area can contain addresses which are not memory,
> and this cannot be detremined from the address alone.
I thought this was exactly what vmalloc_to_page() did, but apparently no.

> You *might* be able to get away with pfn_valid(vmalloc_to_pfn(x)), but
> IIRC there's some disagreement on the precise meaning of pfn_valid(), so
> that might just tell you that the address happens to fall close to some
> valid memory.
This appears to work, at least for ACPI mappings. I'll check other cases though.
Thank you!
> Thanks,
> Mark.
Mark Rutland June 25, 2018, 4:27 p.m. UTC | #6
On Mon, Jun 25, 2018 at 06:24:57PM +0200, Alexander Potapenko wrote:
> On Mon, Jun 25, 2018 at 6:00 PM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Mon, Jun 25, 2018 at 04:59:23PM +0200, Alexander Potapenko wrote:
> > > Hi Ard, Mark, Andrew and others,
> > >
> > > AFAIU, commit 029c54b09599573015a5c18dbe59cbdf42742237 ("mm/vmalloc.c:
> > > huge-vmap: fail gracefully on unexpected huge vmap mappings") was
> > > supposed to make vmalloc_to_page() return NULL for pointers not
> > > returned by vmalloc().
> >
> > It's a little more subtle than that -- avoiding an edge case where we
> > unexpectedly hit huge mappings, rather than determining whether an
> > address same from vmalloc().
> Ok, but anyway, acpi_os_ioremap() creates a huge page mapping via
> __ioremap_caller() (see
> https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/ioremap.c#L133)
> Shouldn't these checks detect that as well?

It should catch such mappings, yes.

> > > For memory error detection purposes I'm trying to map the addresses
> > > from the vmalloc range to valid struct pages, or at least make sure
> > > there's no struct page for a given address.
> > > Looking up the vmap_area_root rbtree isn't an option, as this must be
> > > done from instrumented code, including interrupt handlers.
> >
> > I'm not sure how you can do this without looking at VMAs.
> >
> > In general, the vmalloc area can contain addresses which are not memory,
> > and this cannot be detremined from the address alone.
> I thought this was exactly what vmalloc_to_page() did, but apparently no.
> 
> > You *might* be able to get away with pfn_valid(vmalloc_to_pfn(x)), but
> > IIRC there's some disagreement on the precise meaning of pfn_valid(), so
> > that might just tell you that the address happens to fall close to some
> > valid memory.
> This appears to work, at least for ACPI mappings. I'll check other cases though.
> Thank you!

Great!

Thanks,
Mark.
Alexander Potapenko June 26, 2018, 10 a.m. UTC | #7
On Mon, Jun 25, 2018 at 6:27 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Mon, Jun 25, 2018 at 06:24:57PM +0200, Alexander Potapenko wrote:
> > On Mon, Jun 25, 2018 at 6:00 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > >
> > > On Mon, Jun 25, 2018 at 04:59:23PM +0200, Alexander Potapenko wrote:
> > > > Hi Ard, Mark, Andrew and others,
> > > >
> > > > AFAIU, commit 029c54b09599573015a5c18dbe59cbdf42742237 ("mm/vmalloc.c:
> > > > huge-vmap: fail gracefully on unexpected huge vmap mappings") was
> > > > supposed to make vmalloc_to_page() return NULL for pointers not
> > > > returned by vmalloc().
> > >
> > > It's a little more subtle than that -- avoiding an edge case where we
> > > unexpectedly hit huge mappings, rather than determining whether an
> > > address same from vmalloc().
> > Ok, but anyway, acpi_os_ioremap() creates a huge page mapping via
> > __ioremap_caller() (see
> > https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/ioremap.c#L133)
> > Shouldn't these checks detect that as well?
>
> It should catch such mappings, yes.
>
> > > > For memory error detection purposes I'm trying to map the addresses
> > > > from the vmalloc range to valid struct pages, or at least make sure
> > > > there's no struct page for a given address.
> > > > Looking up the vmap_area_root rbtree isn't an option, as this must be
> > > > done from instrumented code, including interrupt handlers.
> > >
> > > I'm not sure how you can do this without looking at VMAs.
> > >
> > > In general, the vmalloc area can contain addresses which are not memory,
> > > and this cannot be detremined from the address alone.
> > I thought this was exactly what vmalloc_to_page() did, but apparently no.
> >
> > > You *might* be able to get away with pfn_valid(vmalloc_to_pfn(x)), but
> > > IIRC there's some disagreement on the precise meaning of pfn_valid(), so
> > > that might just tell you that the address happens to fall close to some
> > > valid memory.
> > This appears to work, at least for ACPI mappings. I'll check other cases though.
> > Thank you!
pfn_valid(vmalloc_to_pfn(x)) works for me, so I'll stick to this
solution for now. Thanks again!

But just to clarify, should vmalloc_to_page() return NULL for a huge
mapping returned by __ioremap_caller()?
Your answer and that of Ard seem to be contradictory.
Maybe it's a good idea to add the pfn_valid() check to
vmalloc_to_page() just to be sure?
> Great!
>
> Thanks,
> Mark.
Mark Rutland June 26, 2018, 12:10 p.m. UTC | #8
On Tue, Jun 26, 2018 at 12:00:00PM +0200, Alexander Potapenko wrote:
> On Mon, Jun 25, 2018 at 6:27 PM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Mon, Jun 25, 2018 at 06:24:57PM +0200, Alexander Potapenko wrote:
> > > On Mon, Jun 25, 2018 at 6:00 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > > >
> > > > On Mon, Jun 25, 2018 at 04:59:23PM +0200, Alexander Potapenko wrote:
> > > > > Hi Ard, Mark, Andrew and others,
> > > > >
> > > > > AFAIU, commit 029c54b09599573015a5c18dbe59cbdf42742237 ("mm/vmalloc.c:
> > > > > huge-vmap: fail gracefully on unexpected huge vmap mappings") was
> > > > > supposed to make vmalloc_to_page() return NULL for pointers not
> > > > > returned by vmalloc().
> > > >
> > > > It's a little more subtle than that -- avoiding an edge case where we
> > > > unexpectedly hit huge mappings, rather than determining whether an
> > > > address same from vmalloc().
> > > Ok, but anyway, acpi_os_ioremap() creates a huge page mapping via
> > > __ioremap_caller() (see
> > > https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/ioremap.c#L133)
> > > Shouldn't these checks detect that as well?
> >
> > It should catch such mappings, yes.
> >
> > > > > For memory error detection purposes I'm trying to map the addresses
> > > > > from the vmalloc range to valid struct pages, or at least make sure
> > > > > there's no struct page for a given address.
> > > > > Looking up the vmap_area_root rbtree isn't an option, as this must be
> > > > > done from instrumented code, including interrupt handlers.
> > > >
> > > > I'm not sure how you can do this without looking at VMAs.
> > > >
> > > > In general, the vmalloc area can contain addresses which are not memory,
> > > > and this cannot be detremined from the address alone.
> > > I thought this was exactly what vmalloc_to_page() did, but apparently no.
> > >
> > > > You *might* be able to get away with pfn_valid(vmalloc_to_pfn(x)), but
> > > > IIRC there's some disagreement on the precise meaning of pfn_valid(), so
> > > > that might just tell you that the address happens to fall close to some
> > > > valid memory.
> > > This appears to work, at least for ACPI mappings. I'll check other cases though.
> > > Thank you!
> pfn_valid(vmalloc_to_pfn(x)) works for me, so I'll stick to this
> solution for now. Thanks again!
> 
> But just to clarify, should vmalloc_to_page() return NULL for a huge
> mapping returned by __ioremap_caller()?

It will not always do so.

It *may* return NULL, or it may return a potentially invalid pointer to
struct page.

> Your answer and that of Ard seem to be contradictory.
> Maybe it's a good idea to add the pfn_valid() check to
> vmalloc_to_page() just to be sure?

Perhaps, though it really depends on the intended use case of
vmalloc_to_page().

Thanks,
Mark.
diff mbox

Patch

============================
ACPI: Enabled 2 GPEs in block 00 to 0F
phys: 00000000fed00000, vmalloc: ffffc9000019a000, page:
ffff8800fed00000 [ffffea0003fb4000]
BUG: unable to handle kernel paging request at ffffea0003fb4000
PGD 3f7d5067 P4D 3f7d5067 PUD 3f7d4067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc2+ #1325
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/=
2014
RIP: 0010:acpi_os_map_iomem+0x1c5/0x210 ??:?
Code: 00 88 ff ff 4d 89 f8 48 c1 f9 06 4c 89 f6 48 c7 c7 60 5f 01 82
48 c1 e1 0c 48 01 c1 e8 6d 42 70 ff 4d 85 ff 0f 84 14 ff ff ff <49> 8b
37 48 c7 c7 d2 61 01 82 e8 55 42 70 ff e9 00 ff ff ff 48 c7
RSP: 0000:ffff88003e253840 EFLAGS: 00010286
RAX: 000000000000005c RBX: ffff88003d857b80 RCX: ffffffff82245d38
RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff8288e86c
RBP: 00000000fed00000 R08: 00000000000000ae R09: 0000000000000007
R10: 0000000000000000 R11: ffffffff828908ad R12: 0000000000001000
R13: ffffc9000019a000 R14: 00000000fed00000 R15: ffffea0003fb4000
FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:000000000000000=
0
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffea0003fb4000 CR3: 000000000220a000 CR4: 00000000000006f0
Call Trace:
 acpi_ex_system_memory_space_handler+0xca/0x19f ??:?
============================

For memory error detection purposes I'm trying to map the addresses
from the vmalloc range to valid struct pages, or at least make sure
there's no struct page for a given address.
Looking up the vmap_area_root rbtree isn't an option, as this must be
done from instrumented code, including interrupt handlers.
I've been trying to employ vmalloc_to_page(), but looks like it
doesn't work for ioremapped addresses.
Is this at all possible?

Patch showing the problem follows. I'm building using GCC 7.1.1 on a
defconfig for x86_64.

--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -279,14 +279,23 @@  acpi_map_lookup_virt(void __iomem *virt, acpi_size si=
ze)
 static void __iomem *acpi_map(acpi_physical_address pg_off, unsigned
long pg_sz)
 {
        unsigned long pfn;
+       void __iomem *ret;
+       struct page *page;

        pfn =3D pg_off >> PAGE_SHIFT;
        if (should_use_kmap(pfn)) {
                if (pg_sz > PAGE_SIZE)
                        return NULL;
                return (void __iomem __force *)kmap(pfn_to_page(pfn));
-       } else
-               return acpi_os_ioremap(pg_off, pg_sz);
+       } else {
+               ret =3D acpi_os_ioremap(pg_off, pg_sz);
+               BUG_ON(!is_vmalloc_addr(ret));
+               page =3D vmalloc_to_page(ret);
+               pr_err("phys: %px, vmalloc: %px, page: %px [%px]\n",
pg_off, ret, page_address(page), page);
+               if (page)
+                       pr_err("flags: %d\n", page->flags);
+               return ret;
+       }