Message ID | 20201029221806.189523375@linutronix.de (mailing list archive)
---|---
Series | mm/highmem: Preemptible variant of kmap_atomic & friends
On Thu, Oct 29, 2020 at 3:32 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Though I wanted to share the current state of affairs before investigating
> that further. If there is consensus in going forward with this, I'll have a
> deeper look into this issue.

Me likee. I think this looks like the right thing to do.

I didn't actually apply the patches, but just from reading them it
_looks_ to me like you do the migrate_disable() unconditionally, even
if it's not a highmem page..

That sounds like it might be a good thing for debugging, but not
necessarily great in general.

Or am I misreading things?

            Linus
On Thu, Oct 29 2020 at 16:11, Linus Torvalds wrote:
> On Thu, Oct 29, 2020 at 3:32 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Though I wanted to share the current state of affairs before investigating
>> that further. If there is consensus in going forward with this, I'll have a
>> deeper look into this issue.
>
> Me likee. I think this looks like the right thing to do.
>
> I didn't actually apply the patches, but just from reading them it
> _looks_ to me like you do the migrate_disable() unconditionally, even
> if it's not a highmem page..
>
> That sounds like it might be a good thing for debugging, but not
> necessarily great in general.
>
> Or am I misreading things?

No, you're not misreading it, but doing it conditionally would be a
complete semantical disaster. kmap_atomic*() also disables preemption
and pagefaults unconditionally. If that weren't the case then every
caller would have to have conditionals like 'if (CONFIG_HIGHMEM)' or
worse 'if (PageHighMem(page))'.

Let's not go there.

Migrate disable is a less horrible plague than preempt and pagefault
disable even if the scheduler people disagree due to the lack of theory
backing that up :)

The charm of the new interface is that users still can rely on per
cpuness independent of being on a highmem plagued system. For non
highmem systems the extra migrate disable/enable is really a minor
nuisance.

Thanks,

        tglx
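The semantic difference being argued here can be sketched in a small
userspace model. Everything below is a made-up simulation for
illustration (the _sim names and the counters are not kernel code); it
just shows the contract: kmap_atomic() disables preemption and
pagefaults unconditionally, while the proposed kmap_local() only
disables migration, so the caller stays preemptible:

```c
#include <assert.h>

/* Userspace model of the two mapping disciplines discussed above.
 * Counters stand in for the real per-task/per-CPU state; the page
 * pointer is returned directly as in the !HIGHMEM case, where the
 * mapping is simply the page's lowmem address.
 */
static int preempt_disabled;
static int pagefaults_disabled;
static int migration_disabled;

static void *kmap_atomic_sim(void *page)
{
	preempt_disabled++;		/* caller may not schedule ...   */
	pagefaults_disabled++;		/* ... and may not take faults   */
	return page;
}

static void kunmap_atomic_sim(void *addr)
{
	(void)addr;
	pagefaults_disabled--;
	preempt_disabled--;
}

static void *kmap_local_sim(void *page)
{
	migration_disabled++;		/* only pin the task to its CPU's
					 * mapping so the address stays
					 * stable across preemption */
	return page;
}

static void kunmap_local_sim(void *addr)
{
	(void)addr;
	migration_disabled--;
}
```

The point of doing the migrate disable unconditionally is visible in
the model: callers get one contract regardless of CONFIG_HIGHMEM or
PageHighMem(), instead of having to open-code those conditionals.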
On Thu, Oct 29, 2020 at 11:18:06PM +0100, Thomas Gleixner wrote:
> This is achieved by:
...
> - Consolidating all kmap atomic implementations in generic code
...
> Though I wanted to share the current state of affairs before investigating
> that further. If there is consensus in going forward with this, I'll have a
> deeper look into this issue.

I think the consolidation is a winner no matter where we go next. Maybe
split it out in a prep series so we can get it in ASAP?
On Fri, Oct 30 2020 at 00:41, Thomas Gleixner wrote:
> On Thu, Oct 29 2020 at 16:11, Linus Torvalds wrote:
> No, you're not misreading it, but doing it conditionally would be a
> complete semantical disaster. kmap_atomic*() also disables preemption
> and pagefaults unconditionally. If that weren't the case then every
> caller would have to have conditionals like 'if (CONFIG_HIGHMEM)' or
> worse 'if (PageHighMem(page))'.
>
> Let's not go there.
>
> Migrate disable is a less horrible plague than preempt and pagefault
> disable even if the scheduler people disagree due to the lack of theory
> backing that up :)
>
> The charm of the new interface is that users still can rely on per
> cpuness independent of being on a highmem plagued system. For non
> highmem systems the extra migrate disable/enable is really a minor
> nuisance.

Thinking about it some more after having sleep and coffee, we actually
could hide the migrate disable in the actual highmem part. But then we
really should not name it kmap_local. 'local' suggests locality, think
local_irq*, local_bh* ... kmap_task would be more accurate then.

Thoughts?

        tglx
On Fri, Oct 30 2020 at 08:25, Christoph Hellwig wrote:
> On Thu, Oct 29, 2020 at 11:18:06PM +0100, Thomas Gleixner wrote:
>> - Consolidating all kmap atomic implementations in generic code
>
> I think the consolidation is a winner no matter where we go next. Maybe
> split it out in a prep series so we can get it in ASAP?

Yes, patches 2-15 can just go without any dependency. The only thing
which needs a bit of thought is naming. See the other reply to Linus.

Thanks,

        tglx
On Thu, Oct 29, 2020 at 11:18:06PM +0100, Thomas Gleixner wrote:
> This series provides kmap_local.* iomap_local variants which only disable
> migration to keep the virtual mapping address stable across preemption,
> but do neither disable pagefaults nor preemption. The new functions can be
> used in any context, but if used in atomic context the caller has to take
> care of eventually disabling pagefaults.

Could I ask for a CONFIG_KMAP_DEBUG which aliases all the kmap variants
to vmap()? I think we currently have a problem in iov_iter on HIGHMEM
configs:

copy_page_to_iter() calls page_copy_sane() which checks:

        head = compound_head(page);
        if (likely(n <= v && v <= page_size(head)))
                return true;

but then:

        void *kaddr = kmap_atomic(page);
        size_t wanted = copy_to_iter(kaddr + offset, bytes, i);
        kunmap_atomic(kaddr);

so if offset to offset+bytes is larger than PAGE_SIZE, this is going to
work for lowmem pages and fail miserably for highmem pages. I suggest
vmap() because vmap has a PAGE_SIZE gap between each allocation.

Alternatively if we could have a kmap_atomic_compound(), that would be
awesome, but probably not realistic to implement. I've more or less
resigned myself to having to map things one page at a time.
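The "map things one page at a time" discipline Willy resigns himself to
can be sketched in userspace. This is a hypothetical model, not kernel
code: pages[] stands in for a compound page's constituent pages, and
the map/unmap comments mark where kmap_atomic()/kunmap_atomic() would
sit, since on HIGHMEM each map only ever covers a single PAGE_SIZE
window:

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (PAGE_SIZE - 1)

/* Copy 'bytes' bytes from 'src' into a compound page starting at byte
 * 'offset', touching only one PAGE_SIZE mapping at a time.  'pages' is
 * an array of pointers to the constituent pages.  This is the safe
 * pattern; the bug described above comes from doing a single
 * kmap_atomic(page) and then writing past PAGE_SIZE.
 */
static void copy_to_pages(char **pages, size_t offset,
			  const char *src, size_t bytes)
{
	while (bytes) {
		size_t page_idx = offset / PAGE_SIZE;
		size_t page_off = offset & PAGE_MASK;
		size_t chunk = PAGE_SIZE - page_off;

		if (chunk > bytes)
			chunk = bytes;

		/* kaddr = kmap_atomic(pages[page_idx]) would go here */
		char *kaddr = pages[page_idx];
		memcpy(kaddr + page_off, src, chunk);
		/* kunmap_atomic(kaddr) would go here */

		src += chunk;
		offset += chunk;
		bytes -= chunk;
	}
}
```

A copy that straddles a page boundary ends up split into two map/copy/
unmap rounds, which works identically for lowmem and highmem pages.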
On Fri, Oct 30 2020 at 13:06, Matthew Wilcox wrote:
> On Thu, Oct 29, 2020 at 11:18:06PM +0100, Thomas Gleixner wrote:
>> This series provides kmap_local.* iomap_local variants which only disable
>> migration to keep the virtual mapping address stable across preemption,
>> but do neither disable pagefaults nor preemption. The new functions can be
>> used in any context, but if used in atomic context the caller has to take
>> care of eventually disabling pagefaults.
>
> Could I ask for a CONFIG_KMAP_DEBUG which aliases all the kmap variants
> to vmap()? I think we currently have a problem in iov_iter on HIGHMEM
> configs:

For kmap() that would work, but for kmap_atomic() not so much when it
is called in non-preemptible context, because vmap() might sleep.

> copy_page_to_iter() calls page_copy_sane() which checks:
>
>         head = compound_head(page);
>         if (likely(n <= v && v <= page_size(head)))
>                 return true;
>
> but then:
>
>         void *kaddr = kmap_atomic(page);
>         size_t wanted = copy_to_iter(kaddr + offset, bytes, i);
>         kunmap_atomic(kaddr);
>
> so if offset to offset+bytes is larger than PAGE_SIZE, this is going to
> work for lowmem pages and fail miserably for highmem pages. I suggest
> vmap() because vmap has a PAGE_SIZE gap between each allocation.

On 32bit highmem the kmap_atomic() case is easy: Double the number of
mapping slots and only use every second one, which gives you a guard
page between the maps.

For 64bit we could do something ugly: Enable the highmem kmap_atomic()
crud and enforce an alias mapping (at least on the architectures where
this is reasonable). Then you get the same as for 32bit.

> Alternatively if we could have a kmap_atomic_compound(), that would
> be awesome, but probably not realistic to implement. I've more
> or less resigned myself to having to map things one page at a time.

That might be horribly awesome on 32bit :)

Thanks,

        tglx
On Fri, Oct 30, 2020 at 2:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> But then we really should not name it kmap_local. 'local' suggests
> locality, think local_irq*, local_bh* ... kmap_task would be more
> accurate then.

So the main reason I'd like to see it is because I think on a
non-highmem machine, the new kmap should be a complete no-op. IOW, we'd
make sure that there are no costs, no need to increment any "restrict
migration" counts etc.

It's been a bit of a pain to have kmap_atomic() have magical side
semantics that people might then depend on.

I think "local" could still work as a name, because it would have to be
thread-local (and maybe we'd want a debug mode where that gets
verified, as actual HIGHMEM machines are getting rare).

I'd avoid "task", because that implies (to me, at least) that it
wouldn't be good for interrupts etc that don't have a task context.

I think the main issue is that it has to be released in the same
context as it was created (ie no passing those things around to other
contexts). I think "local" is fine for that, but I could imagine other
names. The ones that come to mind are somewhat cumbersome, though
("shortterm" or "local_ctx" or something along those lines).

I dunno.

            Linus
On Fri, Oct 30 2020 at 13:28, Linus Torvalds wrote:
> On Fri, Oct 30, 2020 at 2:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> But then we really should not name it kmap_local. 'local' suggests
>> locality, think local_irq*, local_bh* ... kmap_task would be more
>> accurate then.
>
> So the main reason I'd like to see it is because I think on a
> non-highmem machine, the new kmap should be a complete no-op. IOW,
> we'd make sure that there are no costs, no need to increment any
> "restrict migration" counts etc.

Fair enough.

> It's been a bit of a pain to have kmap_atomic() have magical side
> semantics that people might then depend on.

kmap_atomic() will still have the side semantics :)

> I think "local" could still work as a name, because it would have to
> be thread-local (and maybe we'd want a debug mode where that gets
> verified, as actual HIGHMEM machines are getting rare).
>
> I'd avoid "task", because that implies (to me, at least) that it
> wouldn't be good for interrupts etc that don't have a task context.
>
> I think the main issue is that it has to be released in the same
> context as it was created (ie no passing those things around to other
> contexts). I think "local" is fine for that, but I could imagine other
> names. The ones that come to mind are somewhat cumbersome, though
> ("shortterm" or "local_ctx" or something along those lines).

Yeah, not really intuitive either. Let's stick with _local and add
proper function documentation which clearly says that the side effect
of non-migratability applies only to the 32bit highmem case, in order
to make it work at all. So code which needs CPU locality cannot rely on
it, and we have enough debug stuff to catch something like:

     kmap_local()
       this_cpu_write(....)
     kunmap_local()

Let me redo the pile. While at it I might have a look at that debug
request from Willy in the other end of this thread. Any comment on
that?

  https://lore.kernel.org/r/87k0v7mrrd.fsf@nanos.tec.linutronix.de

Thanks,

        tglx
On Fri, Oct 30, 2020 at 3:26 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> While at it I might have a look at that debug request from Willy in the
> other end of this thread. Any comment on that?
>
> https://lore.kernel.org/r/87k0v7mrrd.fsf@nanos.tec.linutronix.de

I do think that it would be nice to have a debug mode, particularly
since over the last few years we've really lost a lot of HIGHMEM
coverage (to the point that I've wondered how worthwhile it really is
to support at all any more - I think it's Arnd who argued that it's
mainly some embedded ARM variants that will want it for the foreseeable
future).

So I'm honestly somewhat torn. I think HIGHMEM is dying, and yes that
argues for "non-HIGHMEM had better have some debug coverage", but at
the same time I think it doesn't even really matter any more. At some
point those embedded ARM platforms just aren't even interesting - they
might as well use older kernels if they are the only thing really
arguing for HIGHMEM at all.

This is one reason why I'd like the new kmap_local() to be a no-op, and
I'd prefer for it to have no other side effects - because I want to be
ready to remove it entirely some day.

And if we end up having some transition where people start rewriting
"kmap_atomic()" to be "kmap_local() + explicit preemption disable",
then I think that would be a good step on that whole "kmap will
eventually go away" path.

But I do *not* believe that we need to add _so_ much debug support that
we'd catch Willy's "more than one page" case. And I absolutely do not
believe for a second that we should start caring about compound pages.
NO. kmap() is almost dead already, we're not making it worse.

To me, your patch series has two big advantages:

 - more common code

 - kmap_local() becomes more of a no-op

and the last thing we want is to expand on kmap.

            Linus
On Fri, Oct 30 2020 at 15:46, Linus Torvalds wrote:
> On Fri, Oct 30, 2020 at 3:26 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> To me, your patch series has two big advantages:
>
>  - more common code
>
>  - kmap_local() becomes more of a no-op
>
> and the last thing we want is to expand on kmap.

Happy to go with that.

While trying to document the mess, I just stumbled over the abuse of
kmap_atomic_prot() in

  drivers/gpu/drm/ttm/ttm_bo_util.c:      dst = kmap_atomic_prot(d, prot);
  drivers/gpu/drm/ttm/ttm_bo_util.c:      src = kmap_atomic_prot(s, prot);
  drivers/gpu/drm/vmwgfx/vmwgfx_blit.c:   kmap_atomic_prot(d->dst_pages[dst_page],
  drivers/gpu/drm/vmwgfx/vmwgfx_blit.c:   kmap_atomic_prot(d->src_pages[src_page],

For !HIGHMEM kmap_atomic_prot() just ignores the 'prot' argument and
returns the page address.

Moar patches to be written ... Sigh!

Thanks,

        tglx
On Sat, Oct 31 2020 at 00:26, Thomas Gleixner wrote:
> On Fri, Oct 30 2020 at 15:46, Linus Torvalds wrote:
>> On Fri, Oct 30, 2020 at 3:26 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> To me, your patch series has two big advantages:
>>
>>  - more common code
>>
>>  - kmap_local() becomes more of a no-op
>>
>> and the last thing we want is to expand on kmap.
>
> Happy to go with that.
>
> While trying to document the mess, I just stumbled over the abuse of
> kmap_atomic_prot() in
>
>   drivers/gpu/drm/ttm/ttm_bo_util.c:      dst = kmap_atomic_prot(d, prot);
>   drivers/gpu/drm/ttm/ttm_bo_util.c:      src = kmap_atomic_prot(s, prot);
>   drivers/gpu/drm/vmwgfx/vmwgfx_blit.c:   kmap_atomic_prot(d->dst_pages[dst_page],
>   drivers/gpu/drm/vmwgfx/vmwgfx_blit.c:   kmap_atomic_prot(d->src_pages[src_page],
>
> For !HIGHMEM kmap_atomic_prot() just ignores the 'prot' argument and
> returns the page address.
>
> Moar patches to be written ... Sigh!

Or not. This is actually correct by some definition of correct. For the
non highmem case pgprot is set via the set_memory_*() functions and
this just handles the highmem case. Comments are overrated ...
On Fri, Oct 30, 2020 at 11:46 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Oct 30, 2020 at 3:26 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > While at it I might have a look at that debug request from Willy in the
> > other end of this thread. Any comment on that?
> >
> > https://lore.kernel.org/r/87k0v7mrrd.fsf@nanos.tec.linutronix.de
>
> I do think that it would be nice to have a debug mode, particularly
> since over the last few years we've really lost a lot of HIGHMEM
> coverage (to the point that I've wondered how worthwhile it really is
> to support at all any more - I think it's Arnd who argued that it's
> mainly some embedded ARM variants that will want it for the foreseeable
> future).
>
> So I'm honestly somewhat torn. I think HIGHMEM is dying, and yes that
> argues for "non-HIGHMEM had better have some debug coverage", but at
> the same time I think it doesn't even really matter any more.

Shifting the testing of highmem code to the actual users of highmem
sounds reasonable to me. This means it will get broken more often, but
if it doesn't happen all the time, it might serve as a canary: once a
bug survives for long enough, we have a good indication that users
stopped caring ;-)

> At some point those embedded ARM platforms just aren't even
> interesting - they might as well use older kernels if they are the
> only thing really arguing for HIGHMEM at all.

Agreed, but it does need a little time to get there. My best guess is
three to five years from now we can remove it for good, provided a few
things happen first:

1. The remaining users of TI Keystone2, NXP i.MX6 Quad(Plus) and
   Renesas R8A7790/R8A7791/R8A7742/R8A7743 that use the largest memory
   configuration must have stopped requiring kernel version updates.
   These were all introduced at the peak of 32-bit Arm embedded
   systems around 2013, but they have long (15+ year) product life
   cycles, and users pick these because they do promise kernel
   updates, unlike other SoC families that get stuck on old vendor
   kernels much earlier.

2. The plan to add a CONFIG_VMSPLIT_4G_4G option on arch/arm/ must
   work out. We don't have all the code yet, and so far it looks like
   it's going to be a bit ugly and a bit slow, but not nearly as ugly
   or slow as it was on x86 20 years ago. This would cover
   Cortex-A7/A15 systems from 2GB to 4GB, which are still fairly
   common and need to keep using mainline kernels for much longer than
   the systems in point 1. It won't help on Cortex-A9 machines with
   2GB, which I hope can migrate to CONFIG_VMSPLIT_2G_OPT as a
   fallback.

3. No new systems with larger memory must appear. I noticed that e.g.
   the newly introduced Rockchips RV1109 and Allwinner A50/R328/V316
   support LP-DDR4 on a dual Cortex-A7, but I hope that nobody will in
   practice combine a $2 SoC with a $15 memory chip. Most other 32-bit
   chips use DDR3, which tends to prohibit configurations over 4GB in
   new designs, with the cheapest ones limited to 512MB (a single
   256Mx16 chip) and the high end having already moved on to 64 bit.

Regarding 32-bit non-Arm systems, support for the MIPS-based Baikal T1
was merged earlier this year and is used in Russian PC style systems
with up to 8GB. There are also some users on 10+ year old 32-bit
netbooks or business laptops, both x86 and Apple G4. The longest-lived
32-bit embedded systems with large memory (other than Arm) are
probably NXP QorIQ P20xx/P40xx used in military VME bus systems, and
low-end embedded systems based on Vortex86. I'm less worried about all
of these because upstream kernel support for ppc32 and x86-32 is
already bitrotting and they will likely get stuck on the last working
kernel before the TI/Renesas/NXP Arm systems do.

        Arnd
> There are also some users on 10+ year old 32-bit netbooks or
> business laptops, both x86 and Apple G4.
> The longest-lived 32-bit embedded systems with large memory
> (other than Arm) are probably NXP QorIQ P20xx/P40xx used in
> military VME bus systems, and low-end embedded systems based
> on Vortex86.
> I'm less worried about all of these because upstream kernel
> support for ppc32 and x86-32 is already bitrotting and they will
> likely get stuck on the last working kernel before the
> TI/Renesas/NXP Arm systems do.

Upstream kernel support for ppc32 is bitrotting, seriously? What do
you mean exactly?

ppc32 is actively supported, with recent additions of support for
hugepages, kasan, uaccess protection, VMAP stack, etc ...

Christophe
On Sat, Oct 31, 2020 at 4:04 PM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
> > There are also some users on 10+ year old 32-bit netbooks or
> > business laptops, both x86 and Apple G4.
> > The longest-lived 32-bit embedded systems with large memory
> > (other than Arm) are probably NXP QorIQ P20xx/P40xx used in
> > military VME bus systems, and low-end embedded systems based
> > on Vortex86.
> > I'm less worried about all of these because upstream kernel
> > support for ppc32 and x86-32 is already bitrotting and they will
> > likely get stuck on the last working kernel before the
> > TI/Renesas/NXP Arm systems do.
>
> Upstream kernel support for ppc32 is bitrotting, seriously? What do
> you mean exactly?

I was thinking more of the platform support: out of the twelve 32-bit
platforms in arch/powerpc/platforms/, your 8xx is the only one listed
as 'maintained' or 'supported' in the maintainers list, and that seems
to accurately describe the current state. Freescale seems to have
practically stopped contributing to any of their 32-bit platforms in
2016 after the NXP acquisition, no longer employing either of the
maintainers. Similarly, Ben seems to have stopped working on powermac
in 2016, which was ten years after the last 32-bit hardware shipped
for that platform.

> ppc32 is actively supported, with recent additions of support for
> hugepages, kasan, uaccess protection, VMAP stack, etc ...

That is good to hear, I didn't know about these additions. What
platforms are people using to develop these? Is this mainly your 8xx
work, or is there ongoing development for platforms that need highmem?

        Arnd
On Thu, Oct 29 2020 at 23:18, Thomas Gleixner wrote:
>
> There is also a still to be investigated question from Linus on the
> initial posting versus the per cpu / per task mapping stack depth
> which might need to be made larger due to the ability to take page
> faults within a mapping region.

I looked deeper into that, and we have a stack depth of 20. That's
plenty, and I couldn't find a way to get above 10 nested ones including
faults, interrupts and softirqs. With some stress testing I was not
able to get over a maximum of 6 according to the trace_printk() I
added.

For some obscure reason, when CONFIG_DEBUG_HIGHMEM is enabled the
stack depth is increased from 20 to 41. But the only thing
DEBUG_HIGHMEM does is to enable a few BUG_ON()'s in the mapping code.

That's a leftover from the historical mapping code which had fixed
entries for various purposes. DEBUG_HIGHMEM inserted guard mappings
between the map types. But that all got ditched when kmap_atomic()
switched to a stack-based map management. Though the WITH_KM_FENCE
magic survived without being functional. All the thing does today is
to increase the stack depth.

I just made that functional again by keeping the stack depth increase
and utilizing every second slot. That should catch Willy's mapping
problem nicely if he bothers to test on 32bit :)

Thanks,

        tglx
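The "utilize every second slot" trick can be modeled in a few lines.
This is a hypothetical userspace sketch, not the actual patch: the base
address and slot-stack functions are invented for illustration. In the
real code the slots are fixmap entries and the odd slots are simply
left unmapped, so a copy that runs past PAGE_SIZE faults in the guard
instead of silently scribbling over the next mapping:

```c
#include <assert.h>

#define PAGE_SIZE  4096UL
#define KM_MAX_IDX 20	/* nominal stack depth from the mail above */

/* Debug variant of a stack-based kmap slot allocator: double the slot
 * space and hand out only every second slot, so each mapping is
 * followed by an unused guard slot.
 */
static unsigned long km_base = 0xfe000000UL;	/* made-up fixmap base */
static int km_depth;

static unsigned long kmap_slot_push(void)
{
	assert(km_depth < KM_MAX_IDX);
	/* slots 0, 2, 4, ... -- odd slots stay unmapped as guards */
	return km_base + (unsigned long)(2 * km_depth++) * PAGE_SIZE;
}

static void kmap_slot_pop(void)
{
	assert(km_depth > 0);
	km_depth--;
}
```

Consecutive mappings end up 2 * PAGE_SIZE apart, giving exactly the
PAGE_SIZE gap Willy asked vmap() for, without changing the map/unmap
fast path.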