diff mbox series

[RFC,v2,01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary

Message ID 20220429135108.2781579-2-schnelle@linux.ibm.com (mailing list archive)
State New
Headers show
Series [RFC,v2,01/39] Kconfig: introduce HAS_IOPORT option and select it as necessary | expand

Commit Message

Niklas Schnelle April 29, 2022, 1:49 p.m. UTC
We introduce a new HAS_IOPORT Kconfig option to indicate support for
I/O Port access. In a future patch HAS_IOPORT=n will disable compilation
of the I/O accessor functions inb()/outb() and friends on architectures
which can not meaningfully support legacy I/O spaces such as s390 or
where such support is optional. The "depends on" relations on HAS_IOPORT
in drivers as well as ifdefs for HAS_IOPORT specific sections will be
added in subsequent patches on a per subsystem basis.

Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 arch/alpha/Kconfig      | 1 +
 arch/arm/Kconfig        | 1 +
 arch/arm64/Kconfig      | 1 +
 arch/ia64/Kconfig       | 1 +
 arch/m68k/Kconfig       | 1 +
 arch/microblaze/Kconfig | 1 +
 arch/mips/Kconfig       | 1 +
 arch/parisc/Kconfig     | 1 +
 arch/powerpc/Kconfig    | 1 +
 arch/riscv/Kconfig      | 1 +
 arch/sh/Kconfig         | 1 +
 arch/sparc/Kconfig      | 1 +
 arch/x86/Kconfig        | 1 +
 drivers/bus/Kconfig     | 2 +-
 lib/Kconfig             | 4 ++++
 lib/Kconfig.kgdb        | 1 +
 16 files changed, 19 insertions(+), 1 deletion(-)

Comments

Maciej W. Rozycki May 1, 2022, 10:40 p.m. UTC | #1
On Fri, 29 Apr 2022, Niklas Schnelle wrote:

> We introduce a new HAS_IOPORT Kconfig option to indicate support for
> I/O Port access. In a future patch HAS_IOPORT=n will disable compilation
> of the I/O accessor functions inb()/outb() and friends on architectures
> which can not meaningfully support legacy I/O spaces such as s390 or
> where such support is optional. The "depends on" relations on HAS_IOPORT
> in drivers as well as ifdefs for HAS_IOPORT specific sections will be
> added in subsequent patches on a per subsystem basis.
[...]
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index de3b32a507d2..4c55df08d6f1 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -47,6 +47,7 @@ config MIPS
>  	select GENERIC_SMP_IDLE_THREAD
>  	select GENERIC_TIME_VSYSCALL
>  	select GUP_GET_PTE_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT
> +	select HAS_IOPORT
>  	select HAVE_ARCH_COMPILER_H
>  	select HAVE_ARCH_JUMP_LABEL
>  	select HAVE_ARCH_KGDB if MIPS_FP_SUPPORT

 NAK, not all MIPS systems have the port I/O space, and we have it already 
handled via the NO_IOPORT_MAP option.  We'll need to have HAS_IOPORT set 
to !NO_IOPORT_MAP (or vice versa) for the MIPS architecture.

  Maciej
Bjorn Helgaas May 4, 2022, 9:08 p.m. UTC | #2
On Fri, Apr 29, 2022 at 03:49:59PM +0200, Niklas Schnelle wrote:
> We introduce a new HAS_IOPORT Kconfig option to indicate support for
> I/O Port access. In a future patch HAS_IOPORT=n will disable compilation
> of the I/O accessor functions inb()/outb() and friends on architectures
> which can not meaningfully support legacy I/O spaces such as s390 or
> where such support is optional. 

So you plan to drop inb()/outb() on architectures where I/O port space
is optional?  So even platforms that have I/O port space may not be
able to use it?

This feels like a lot of work where the main benefit is to keep
Kconfig from offering drivers that aren't of interest on s390.

Granted, there may be issues where inb()/outb() does the wrong thing
such as dereferencing null pointers when I/O port space isn't
implemented.  I think that's a defect in inb()/outb() and could be
fixed there.

Bjorn
Arnd Bergmann May 4, 2022, 9:31 p.m. UTC | #3
On Wed, May 4, 2022 at 11:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, Apr 29, 2022 at 03:49:59PM +0200, Niklas Schnelle wrote:
> > We introduce a new HAS_IOPORT Kconfig option to indicate support for
> > I/O Port access. In a future patch HAS_IOPORT=n will disable compilation
> > of the I/O accessor functions inb()/outb() and friends on architectures
> > which can not meaningfully support legacy I/O spaces such as s390 or
> > where such support is optional.
>
> So you plan to drop inb()/outb() on architectures where I/O port space
> is optional?  So even platforms that have I/O port space may not be
> able to use it?
>
> This feels like a lot of work where the main benefit is to keep
> Kconfig from offering drivers that aren't of interest on s390.
>
> Granted, there may be issues where inb()/outb() does the wrong thing
> such as dereferencing null pointers when I/O port space isn't
> implemented.  I think that's a defect in inb()/outb() and could be
> fixed there.

The current implementation in asm-generic/io.h implements inb()/outb()
using readb()/writeb() with a fixed architecture specific offset.

There are three possible things that can happen here:

a) there is a host bridge driver that maps its I/O ports to this window,
    and everything works
b) the address range is reserved and accessible but no host bridge
   driver has mapped its registers there, so an access causes a
   page fault
c) the architecture does not define an offset, and accessing low I/O
    ports ends up as a NULL pointer dereference

The main goal is to avoid c), which is what happens on s390, but
can also happen elsewhere. Catching b) would be nice as well,
but is much harder to do from generic code as you'd need an
architecture specific inline asm statement to insert a ex_table
fixup, or a runtime conditional on each access.

         Arnd
Niklas Schnelle May 5, 2022, 8:10 a.m. UTC | #4
On Wed, 2022-05-04 at 23:31 +0200, Arnd Bergmann wrote:
> On Wed, May 4, 2022 at 11:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Apr 29, 2022 at 03:49:59PM +0200, Niklas Schnelle wrote:
> > > We introduce a new HAS_IOPORT Kconfig option to indicate support for
> > > I/O Port access. In a future patch HAS_IOPORT=n will disable compilation
> > > of the I/O accessor functions inb()/outb() and friends on architectures
> > > which can not meaningfully support legacy I/O spaces such as s390 or
> > > where such support is optional.
> > 
> > So you plan to drop inb()/outb() on architectures where I/O port space
> > is optional?  So even platforms that have I/O port space may not be
> > able to use it?
> > 
> > This feels like a lot of work where the main benefit is to keep
> > Kconfig from offering drivers that aren't of interest on s390.
> > 
> > Granted, there may be issues where inb()/outb() does the wrong thing
> > such as dereferencing null pointers when I/O port space isn't
> > implemented.  I think that's a defect in inb()/outb() and could be
> > fixed there.
> 
> The current implementation in asm-generic/io.h implements inb()/outb()
> using readb()/writeb() with a fixed architecture specific offset.
> 
> There are three possible things that can happen here:
> 
> a) there is a host bridge driver that maps its I/O ports to this window,
>     and everything works
> b) the address range is reserved and accessible but no host bridge
>    driver has mapped its registers there, so an access causes a
>    page fault
> c) the architecture does not define an offset, and accessing low I/O
>     ports ends up as a NULL pointer dereference
> 
> The main goal is to avoid c), which is what happens on s390, but
> can also happen elsewhere. Catching b) would be nice as well,
> but is much harder to do from generic code as you'd need an
> architecture specific inline asm statement to insert a ex_table
> fixup, or a runtime conditional on each access.
> 
>          Arnd

Yes and to add to this, we did try a local solution in inb()/outb()
before. This added a warning when they are used and we know at compile
time that we're dealing with case c). This approach was nacked by Linus
though as we were turning a compile time known broken case into a
runtime one:

https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/

I do agree with this assesment and think this is the rightâ„¢ approach
but it is more churn as can be seen by the size of this series. I think
longer term it could be valuable though especially if more platforms
phase out I/O port support like POWER9 for which this also allows
filtering what drivers will never work.
Bjorn Helgaas May 5, 2022, 4:10 p.m. UTC | #5
On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote:
> On Wed, May 4, 2022 at 11:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Apr 29, 2022 at 03:49:59PM +0200, Niklas Schnelle wrote:
> > > We introduce a new HAS_IOPORT Kconfig option to indicate support for
> > > I/O Port access. In a future patch HAS_IOPORT=n will disable compilation
> > > of the I/O accessor functions inb()/outb() and friends on architectures
> > > which can not meaningfully support legacy I/O spaces such as s390 or
> > > where such support is optional.
> >
> > So you plan to drop inb()/outb() on architectures where I/O port space
> > is optional?  So even platforms that have I/O port space may not be
> > able to use it?
> >
> > This feels like a lot of work where the main benefit is to keep
> > Kconfig from offering drivers that aren't of interest on s390.
> >
> > Granted, there may be issues where inb()/outb() does the wrong thing
> > such as dereferencing null pointers when I/O port space isn't
> > implemented.  I think that's a defect in inb()/outb() and could be
> > fixed there.
> 
> The current implementation in asm-generic/io.h implements inb()/outb()
> using readb()/writeb() with a fixed architecture specific offset.
> 
> There are three possible things that can happen here:
> 
> a) there is a host bridge driver that maps its I/O ports to this window,
>     and everything works
> b) the address range is reserved and accessible but no host bridge
>    driver has mapped its registers there, so an access causes a
>    page fault
> c) the architecture does not define an offset, and accessing low I/O
>     ports ends up as a NULL pointer dereference
> 
> The main goal is to avoid c), which is what happens on s390, but
> can also happen elsewhere. Catching b) would be nice as well,
> but is much harder to do from generic code as you'd need an
> architecture specific inline asm statement to insert a ex_table
> fixup, or a runtime conditional on each access.

Or s390 could implement its own inb().

I'm hearing that generic powerpc kernels have to run both on machines
that have I/O port space and those that don't.  That makes me think
s390 could do something similar.

Bjorn
Arnd Bergmann May 5, 2022, 5:39 p.m. UTC | #6
On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote:
> >
> > The main goal is to avoid c), which is what happens on s390, but
> > can also happen elsewhere. Catching b) would be nice as well,
> > but is much harder to do from generic code as you'd need an
> > architecture specific inline asm statement to insert a ex_table
> > fixup, or a runtime conditional on each access.
>
> Or s390 could implement its own inb().
>
> I'm hearing that generic powerpc kernels have to run both on machines
> that have I/O port space and those that don't.  That makes me think
> s390 could do something similar.

No, this is actually the current situation, and it makes absolutely no
sense. s390 has no way of implementing inb()/outb() because there
are no instructions for it and it cannot tunnel them through a virtual
address mapping like on most of the other architectures. (it has special
instructions for accessing memory space, which is not the same as
a pointer dereference here).

The existing implementation gets flagged as a NULL pointer dereference
by a compiler warning because it effectively is.

powerpc kernels generally map the I/O space into a section of the
physical address space, where it gets mapped into a fixed virtual
address and accessed through pointer dereference. This works on
any powerpc CPU as long as it is implemented in the PCI host
bridge in the usual way. The only difference between powerpc and
arm here is that there are fewer implementations, so one can
make assumptions about which PCI host bridge is used based on
a CPU core.

     Arnd
Bjorn Helgaas May 5, 2022, 7:53 p.m. UTC | #7
On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote:
> On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote:
> > >
> > > The main goal is to avoid c), which is what happens on s390, but
> > > can also happen elsewhere. Catching b) would be nice as well,
> > > but is much harder to do from generic code as you'd need an
> > > architecture specific inline asm statement to insert a ex_table
> > > fixup, or a runtime conditional on each access.
> >
> > Or s390 could implement its own inb().
> >
> > I'm hearing that generic powerpc kernels have to run both on machines
> > that have I/O port space and those that don't.  That makes me think
> > s390 could do something similar.
> 
> No, this is actually the current situation, and it makes absolutely no
> sense. s390 has no way of implementing inb()/outb() because there
> are no instructions for it and it cannot tunnel them through a virtual
> address mapping like on most of the other architectures. (it has special
> instructions for accessing memory space, which is not the same as
> a pointer dereference here).
> 
> The existing implementation gets flagged as a NULL pointer dereference
> by a compiler warning because it effectively is.

I think s390 currently uses the inb() in asm-generic/io.h, i.e.,
"__raw_readb(PCI_IOBASE + addr)".  I understand that's a NULL pointer
dereference because the default PCI_IOBASE is 0.

I mooted a s390 inb() implementation like "return ~0" because that's
what happens on most arches when there's no device to respond to the
inb().

The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter
drivers that use I/O ports in some cases but not others.  But maybe
it's the most practical way.

Bjorn
Finn Thain May 6, 2022, 9:12 a.m. UTC | #8
On Thu, 5 May 2022, Bjorn Helgaas wrote:

> On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote:
> > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote:
> > > >
> > > > The main goal is to avoid c), which is what happens on s390, but
> > > > can also happen elsewhere. Catching b) would be nice as well,
> > > > but is much harder to do from generic code as you'd need an
> > > > architecture specific inline asm statement to insert a ex_table
> > > > fixup, or a runtime conditional on each access.
> > >
> > > Or s390 could implement its own inb().
> > >
> > > I'm hearing that generic powerpc kernels have to run both on machines
> > > that have I/O port space and those that don't.  That makes me think
> > > s390 could do something similar.
> > 
> > No, this is actually the current situation, and it makes absolutely no
> > sense. s390 has no way of implementing inb()/outb() because there
> > are no instructions for it and it cannot tunnel them through a virtual
> > address mapping like on most of the other architectures. (it has special
> > instructions for accessing memory space, which is not the same as
> > a pointer dereference here).
> > 
> > The existing implementation gets flagged as a NULL pointer dereference
> > by a compiler warning because it effectively is.
> 
> I think s390 currently uses the inb() in asm-generic/io.h, i.e.,
> "__raw_readb(PCI_IOBASE + addr)".  I understand that's a NULL pointer
> dereference because the default PCI_IOBASE is 0.
> 
> I mooted a s390 inb() implementation like "return ~0" because that's
> what happens on most arches when there's no device to respond to the
> inb().
> 
> The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter
> drivers that use I/O ports in some cases but not others.  But maybe
> it's the most practical way.
> 

Do you mean, "the most practical way to avoid a compiler warning on s390"? 
What about "#pragma GCC diagnostic ignored"?
Niklas Schnelle May 6, 2022, 9:38 a.m. UTC | #9
On Thu, 2022-05-05 at 14:53 -0500, Bjorn Helgaas wrote:
> On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote:
> > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote:
> > > > The main goal is to avoid c), which is what happens on s390, but
> > > > can also happen elsewhere. Catching b) would be nice as well,
> > > > but is much harder to do from generic code as you'd need an
> > > > architecture specific inline asm statement to insert a ex_table
> > > > fixup, or a runtime conditional on each access.
> > > 
> > > Or s390 could implement its own inb().
> > > 
> > > I'm hearing that generic powerpc kernels have to run both on machines
> > > that have I/O port space and those that don't.  That makes me think
> > > s390 could do something similar.
> > 
> > No, this is actually the current situation, and it makes absolutely no
> > sense. s390 has no way of implementing inb()/outb() because there
> > are no instructions for it and it cannot tunnel them through a virtual
> > address mapping like on most of the other architectures. (it has special
> > instructions for accessing memory space, which is not the same as
> > a pointer dereference here).
> > 
> > The existing implementation gets flagged as a NULL pointer dereference
> > by a compiler warning because it effectively is.
> 
> I think s390 currently uses the inb() in asm-generic/io.h, i.e.,
> "__raw_readb(PCI_IOBASE + addr)".  I understand that's a NULL pointer
> dereference because the default PCI_IOBASE is 0.
> 
> I mooted a s390 inb() implementation like "return ~0" because that's
> what happens on most arches when there's no device to respond to the
> inb().
> 
> The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter
> drivers that use I/O ports in some cases but not others.  But maybe
> it's the most practical way.
> 
> Bjorn

I fear such stubs are kind of equivalent to my previous patch doing the
same in asm-generic/io.h that was pulled and then unpulled by Linus.
Maybe it would be slightly different if instead of a warning outX()
would just be a NOP and inX() just returned ~0 but we're in essence
pretending that we have these functions when we know they are nonsense.

Another argument I see is that as shown by POWER9 we might start to see
more platforms that just can't do I/O port access. E.g. I would also be
surprised if Apple's M1 has I/O port access. Sooner or later I expect
distributions on some platforms to only support such systems. For
example on ppc a server distribution might only support IBM POWER
without I/O port support before too long. Then having HAS_IOPORT allows
to get rid of drivers that won't work anyway.

There are also reports of probing a driver with I/O ports causing a
system crash on systems without I/O port support. For example in this
answer by John Garry (added so he may supply more information):

https://lore.kernel.org/lkml/db043b76-880d-5fad-69cf-96abcd9cd34f@huawei.com/
Maciej W. Rozycki May 6, 2022, 10:20 a.m. UTC | #10
On Thu, 5 May 2022, Arnd Bergmann wrote:

> > I'm hearing that generic powerpc kernels have to run both on machines
> > that have I/O port space and those that don't.  That makes me think
> > s390 could do something similar.
> 
> No, this is actually the current situation, and it makes absolutely no
> sense. s390 has no way of implementing inb()/outb() because there
> are no instructions for it and it cannot tunnel them through a virtual
> address mapping like on most of the other architectures. (it has special
> instructions for accessing memory space, which is not the same as
> a pointer dereference here).

 I think I'm missing something here.  IIUC we're talking about a PCI/PCIe 
bus used with s390 hardware, right?

 (It has to be PCI/PCIe, because other than x86/IA-64 host buses there are 
only PCI/PCIe and EISA/ISA buses out there that define I/O access cycles 
and EISA/ISA have long been obsoleted except perhaps from some niche use.)

 If this is PCI/PCIe indeed, then an I/O access is just a different bit 
pattern put on the bus/in the TLP in the address phase.  So what is there 
inherent to the s390 architecture that prevents that different bit pattern 
from being used?

 If anything, I could imagine the same limitation as with current POWER9 
implementations, that is whatever glue is used to wire PCI/PCIe to the 
rest of the system does not implement a way to use said bit pattern (which 
has nothing to do with the POWER9 processor instruction set).

 But that has nothing to do with the presence or absence of any specific 
processor instructions.  It's just a limitation of bus glue.  So I guess 
it's just that all PCI/PCIe glue logic implementations for s390 have such 
a limitation, right?

  Maciej
John Garry May 6, 2022, 11:07 a.m. UTC | #11
On 06/05/2022 10:38, Niklas Schnelle wrote:
> Another argument I see is that as shown by POWER9 we might start to see
> more platforms that just can't do I/O port access. E.g. I would also be
> surprised if Apple's M1 has I/O port access. Sooner or later I expect
> distributions on some platforms to only support such systems. For
> example on ppc a server distribution might only support IBM POWER
> without I/O port support before too long. Then having HAS_IOPORT allows
> to get rid of drivers that won't work anyway.
> 
> There are also reports of probing a driver with I/O ports causing a
> system crash on systems without I/O port support. For example in this
> answer by John Garry (added so he may supply more information):
> 
> https://lore.kernel.org/lkml/db043b76-880d-5fad-69cf-96abcd9cd34f@huawei.com/
> 
> .

That issue is that drivers like hwmon f71805f use inb/outb accessors 
with hardcoded IO port addresses to probe the driver. On archs like 
arm64 or powerpc - which do not natively support inb et al - this may 
crash the system when no PCI IO space is mapped [0]. Indeed, when PCI IO 
space is mapped, it is preferable these those drivers still would not 
access these ports.

So this series from Niklas could be used as a basis to solve that 
problem, in that we could also introduce something like HARDCODED_IOPORT 
[1] to stop those drivers being built at all for arm64.

[0] 
https://lore.kernel.org/linux-input/20210112055129.7840-1-song.bao.hua@hisilicon.com/T/#mf86445470160c44ac110e9d200b09245169dc5b6
[1] 
https://lore.kernel.org/lkml/CAK8P3a3HHeP+Gw_k2P7Qtig0OmErf0HN30G22+qHic_uZTh11Q@mail.gmail.com/

Thanks,
John
Niklas Schnelle May 6, 2022, 11:18 a.m. UTC | #12
On Fri, 2022-05-06 at 19:12 +1000, Finn Thain wrote:
> 
> On Thu, 5 May 2022, Bjorn Helgaas wrote:
> 
> > On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote:
> > > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote:
> > > > > The main goal is to avoid c), which is what happens on s390, but
> > > > > can also happen elsewhere. Catching b) would be nice as well,
> > > > > but is much harder to do from generic code as you'd need an
> > > > > architecture specific inline asm statement to insert a ex_table
> > > > > fixup, or a runtime conditional on each access.
> > > > 
> > > > Or s390 could implement its own inb().
> > > > 
> > > > I'm hearing that generic powerpc kernels have to run both on machines
> > > > that have I/O port space and those that don't.  That makes me think
> > > > s390 could do something similar.
> > > 
> > > No, this is actually the current situation, and it makes absolutely no
> > > sense. s390 has no way of implementing inb()/outb() because there
> > > are no instructions for it and it cannot tunnel them through a virtual
> > > address mapping like on most of the other architectures. (it has special
> > > instructions for accessing memory space, which is not the same as
> > > a pointer dereference here).
> > > 
> > > The existing implementation gets flagged as a NULL pointer dereference
> > > by a compiler warning because it effectively is.
> > 
> > I think s390 currently uses the inb() in asm-generic/io.h, i.e.,
> > "__raw_readb(PCI_IOBASE + addr)".  I understand that's a NULL pointer
> > dereference because the default PCI_IOBASE is 0.
> > 
> > I mooted a s390 inb() implementation like "return ~0" because that's
> > what happens on most arches when there's no device to respond to the
> > inb().
> > 
> > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter
> > drivers that use I/O ports in some cases but not others.  But maybe
> > it's the most practical way.
> > 
> 
> Do you mean, "the most practical way to avoid a compiler warning on s390"? 
> What about "#pragma GCC diagnostic ignored"?

This actually happens with clang. Apart from that, I think this would
also fall under the same argument as the original patch Linus unpulled.
We would just paint over someting that we know at compile time won't
work:

https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/
Arnd Bergmann May 6, 2022, 11:33 a.m. UTC | #13
On Fri, May 6, 2022 at 12:20 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote:
> On Thu, 5 May 2022, Arnd Bergmann wrote:
>  I think I'm missing something here.  IIUC we're talking about a PCI/PCIe
> bus used with s390 hardware, right?
>
>  (It has to be PCI/PCIe, because other than x86/IA-64 host buses there are
> only PCI/PCIe and EISA/ISA buses out there that define I/O access cycles
> and EISA/ISA have long been obsoleted except perhaps from some niche use.)
>
>  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> pattern put on the bus/in the TLP in the address phase.  So what is there
> inherent to the s390 architecture that prevents that different bit pattern
> from being used?

The hardware design for PCI on s390 is very different from any other
architecture, and more abstract. Rather than implementing MMIO register
access as pointer dereference, this is a separate CPU instruction that
takes a device/bar plus offset as arguments rather than a pointer, and
Linux encodes this back into a fake __iomem token.

>  If anything, I could imagine the same limitation as with current POWER9
> implementations, that is whatever glue is used to wire PCI/PCIe to the
> rest of the system does not implement a way to use said bit pattern (which
> has nothing to do with the POWER9 processor instruction set).
>
>  But that has nothing to do with the presence or absence of any specific
> processor instructions.  It's just a limitation of bus glue.  So I guess
> it's just that all PCI/PCIe glue logic implementations for s390 have such
> a limitation, right?

There are separate instructions for PCI memory and config space, but
no instructions for I/O space, or for non-PCI MMIO that it could be mapped
into.

       Arnd
Maciej W. Rozycki May 6, 2022, 12:27 p.m. UTC | #14
On Fri, 6 May 2022, Arnd Bergmann wrote:

> >  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> > pattern put on the bus/in the TLP in the address phase.  So what is there
> > inherent to the s390 architecture that prevents that different bit pattern
> > from being used?
> 
> The hardware design for PCI on s390 is very different from any other
> architecture, and more abstract. Rather than implementing MMIO register
> access as pointer dereference, this is a separate CPU instruction that
> takes a device/bar plus offset as arguments rather than a pointer, and
> Linux encodes this back into a fake __iomem token.

 OK, that seems to me like a reasonable and quite a clean design (on the 
hardware side).

 So what happens if the instruction is given an I/O rather than memory BAR 
as the relevant argument?  Is the address space indicator bit (bit #0) 
simply ignored or what?

> >  But that has nothing to do with the presence or absence of any specific
> > processor instructions.  It's just a limitation of bus glue.  So I guess
> > it's just that all PCI/PCIe glue logic implementations for s390 have such
> > a limitation, right?
> 
> There are separate instructions for PCI memory and config space, but
> no instructions for I/O space, or for non-PCI MMIO that it could be mapped
> into.

 The PCI configuration space was retrofitted into x86 systems (and is 
accessed in an awkward manner with them), but with a new design such a 
clean approach is most welcome IMHO.  Thank you for your explanation.

  Maciej
Niklas Schnelle May 6, 2022, 12:42 p.m. UTC | #15
On Fri, 2022-05-06 at 13:33 +0200, Arnd Bergmann wrote:
> On Fri, May 6, 2022 at 12:20 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote:
> > On Thu, 5 May 2022, Arnd Bergmann wrote:
> >  I think I'm missing something here.  IIUC we're talking about a PCI/PCIe
> > bus used with s390 hardware, right?
> > 
> >  (It has to be PCI/PCIe, because other than x86/IA-64 host buses there are
> > only PCI/PCIe and EISA/ISA buses out there that define I/O access cycles
> > and EISA/ISA have long been obsoleted except perhaps from some niche use.)
> > 
> >  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> > pattern put on the bus/in the TLP in the address phase.  So what is there
> > inherent to the s390 architecture that prevents that different bit pattern
> > from being used?
> 
> The hardware design for PCI on s390 is very different from any other
> architecture, and more abstract. Rather than implementing MMIO register
> access as pointer dereference, this is a separate CPU instruction that
> takes a device/bar plus offset as arguments rather than a pointer, and
> Linux encodes this back into a fake __iomem token.

Correct, we have since gained new PCI load/store instructions that
actually do take a virtual address that gets address translated to a
"physical address" for the PCI BARs. The "physical address" we get from
the platform (again via special instructions). I put "physical address"
in quotes because while they are conceptually physical addresses and
they are translated to by MMU translation tables, accessing their
virtual mapping with any instruction other then the special PCI
load/store instructions will cause addressing exceptions. So we still
don't have real MMIO but something that looks a lot more like MMIO than
we used to.

> 
> >  If anything, I could imagine the same limitation as with current POWER9
> > implementations, that is whatever glue is used to wire PCI/PCIe to the
> > rest of the system does not implement a way to use said bit pattern (which
> > has nothing to do with the POWER9 processor instruction set).
> > 
> >  But that has nothing to do with the presence or absence of any specific
> > processor instructions.  It's just a limitation of bus glue.  So I guess
> > it's just that all PCI/PCIe glue logic implementations for s390 have such
> > a limitation, right?
> 
> There are separate instructions for PCI memory and config space, but
> no instructions for I/O space, or for non-PCI MMIO that it could be mapped
> into.
> 
>        Arnd

The config space is still accessed with the old style PCI load/store
instructions via a special extra BAR.

But yes overall on s390x we can only access PCI(e) devices via special
instructions not via real MMIO and also the OS has no direct access to
the registers of the PHB which are only accessible to firmware. 

Maybe as a bit of further background it's also important to note that
on s390x all Operating Systems run inside a hypervisor. On the lowest
level any OS can run this is a non-paging machine level hypervisor. For
PCI that means that we always have a kind of pass-through access though
without paging and hardware support for the memory partitioning this is
of course relatively simple.
David Laight May 6, 2022, 12:53 p.m. UTC | #16
From: Maciej W. Rozycki
> Sent: 06 May 2022 13:27
> 
> On Fri, 6 May 2022, Arnd Bergmann wrote:
> 
> > >  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> > > pattern put on the bus/in the TLP in the address phase.  So what is there
> > > inherent to the s390 architecture that prevents that different bit pattern
> > > from being used?
> >
> > The hardware design for PCI on s390 is very different from any other
> > architecture, and more abstract. Rather than implementing MMIO register
> > access as pointer dereference, this is a separate CPU instruction that
> > takes a device/bar plus offset as arguments rather than a pointer, and
> > Linux encodes this back into a fake __iomem token.
> 
>  OK, that seems to me like a reasonable and quite a clean design (on the
> hardware side).
> 
>  So what happens if the instruction is given an I/O rather than memory BAR
> as the relevant argument?  Is the address space indicator bit (bit #0)
> simply ignored or what?

You don't understand something...

For a memory BAR pci_ioremap() returns a token that can be used with readl().
On most architectures readl() is just a memory access.
This all fails on an I/O BAR.

For an I/O BAR you want a similar pair of functions.
On x86 the access function would need to use the 'in/out' instructions but
there is no requirement for the token to be the native io port number.
On many non-x86 architectures the access function would be a simple memory
access - but a specific system (eg many ppc) may never actually allow
such mappings to succeed.

You might also want a third PCI mapping function that can map a memory
BAR or an I/O BAR - with a suitable access function.
On x86 this would need something like ioread8() for accesses.
Except that function uses small integers for io port numbers
(which is inherently dangerous).

Nothing except the architecture specific implementation of the function
to access an io BAR would ever go near a function called inb().

The same is really true for other bus type - including ISA and EISA.
(Ignoring the horrid of probing ISI bus devices - hopefully they
are in the ACPI tables??_
If a driver is probed on a ISA bus there ought to be functions
equivalent to pci_ioremap() (for both memory and IO addresses)
that return tokens appropriate for the specific bus.

That is all a different load of churn.


> > >  But that has nothing to do with the presence or absence of any specific
> > > processor instructions.  It's just a limitation of bus glue.  So I guess
> > > it's just that all PCI/PCIe glue logic implementations for s390 have such
> > > a limitation, right?
> >
> > There are separate instructions for PCI memory and config space, but
> > no instructions for I/O space, or for non-PCI MMIO that it could be mapped
> > into.
> 
>  The PCI configuration space was retrofitted into x86 systems (and is
> accessed in an awkward manner with them), but with a new design such a
> clean approach is most welcome IMHO.  Thank you for your explanation.

Actually I think x86 was the initial system for PCI.
The PCI config space 'mess' is all about trying to make
something that wouldn't break existing memory maps.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Arnd Bergmann May 6, 2022, 12:53 p.m. UTC | #17
On Fri, May 6, 2022 at 2:27 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote:
>
> On Fri, 6 May 2022, Arnd Bergmann wrote:
>
> > >  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> > > pattern put on the bus/in the TLP in the address phase.  So what is there
> > > inherent to the s390 architecture that prevents that different bit pattern
> > > from being used?
> >
> > The hardware design for PCI on s390 is very different from any other
> > architecture, and more abstract. Rather than implementing MMIO register
> > access as pointer dereference, this is a separate CPU instruction that
> > takes a device/bar plus offset as arguments rather than a pointer, and
> > Linux encodes this back into a fake __iomem token.
>
>  OK, that seems to me like a reasonable and quite a clean design (on the
> hardware side).
>
>  So what happens if the instruction is given an I/O rather than memory BAR
> as the relevant argument?  Is the address space indicator bit (bit #0)
> simply ignored or what?

Not sure. My best guess is that it would actually work as you'd expect,
but is deliberately left out of the architecture specification so they don't
have to to validate the correctness.  Note that only a small number of
PCIe cards are actually supported by IBM, and I think the firmware
only passes devices to the OS if they are whitelisted.

        Arnd
Niklas Schnelle May 6, 2022, 12:55 p.m. UTC | #18
On Fri, 2022-05-06 at 13:27 +0100, Maciej W. Rozycki wrote:
> On Fri, 6 May 2022, Arnd Bergmann wrote:
> 
> > >  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> > > pattern put on the bus/in the TLP in the address phase.  So what is there
> > > inherent to the s390 architecture that prevents that different bit pattern
> > > from being used?
> > 
> > The hardware design for PCI on s390 is very different from any other
> > architecture, and more abstract. Rather than implementing MMIO register
> > access as pointer dereference, this is a separate CPU instruction that
> > takes a device/bar plus offset as arguments rather than a pointer, and
> > Linux encodes this back into a fake __iomem token.
> 
>  OK, that seems to me like a reasonable and quite a clean design (on the 
> hardware side).
> 
>  So what happens if the instruction is given an I/O rather than memory BAR 
> as the relevant argument?  Is the address space indicator bit (bit #0) 
> simply ignored or what?

See my answer to Arnd for some more background but there simply isn't a
way to formulate an I/O access. In the old style PCI instructions the
BAR number and the function handle are put in a register before the
access. BAR number 15 is used to access config space. If there is no
BAR for that number the instruction fails with a non-zero CC.

> 
> > >  But that has nothing to do with the presence or absence of any specific
> > > processor instructions.  It's just a limitation of bus glue.  So I guess
> > > it's just that all PCI/PCIe glue logic implementations for s390 have such
> > > a limitation, right?
> > 
> > There are separate instructions for PCI memory and config space, but
> > no instructions for I/O space, or for non-PCI MMIO that it could be mapped
> > into.
> 
>  The PCI configuration space was retrofitted into x86 systems (and is 
> accessed in an awkward manner with them), but with a new design such a 
> clean approach is most welcome IMHO.  Thank you for your explanation.
> 
>   Maciej

Well our design is a retrofit too considering s390x is a direct
decendent of IBM System/360 which one could argue to have been the
first ISA. But yes as PCI support was only added with PCIe and with a
machine level hypervisor already in place we do get shielded a lot from
the gritty details.
Geert Uytterhoeven May 6, 2022, 1:08 p.m. UTC | #19
On Fri, May 6, 2022 at 2:56 PM David Laight <David.Laight@aculab.com> wrote:
> From: Maciej W. Rozycki
> > Sent: 06 May 2022 13:27
> > On Fri, 6 May 2022, Arnd Bergmann wrote:
> > > >  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> > > > pattern put on the bus/in the TLP in the address phase.  So what is there
> > > > inherent to the s390 architecture that prevents that different bit pattern
> > > > from being used?
> > >
> > > The hardware design for PCI on s390 is very different from any other
> > > architecture, and more abstract. Rather than implementing MMIO register
> > > access as pointer dereference, this is a separate CPU instruction that
> > > takes a device/bar plus offset as arguments rather than a pointer, and
> > > Linux encodes this back into a fake __iomem token.
> >
> >  OK, that seems to me like a reasonable and quite a clean design (on the
> > hardware side).
> >
> >  So what happens if the instruction is given an I/O rather than memory BAR
> > as the relevant argument?  Is the address space indicator bit (bit #0)
> > simply ignored or what?
>
> You don't understand something...
>
> For a memory BAR pci_ioremap() returns a token that can be used with readl().
> On most architectures readl() is just a memory access.
> This all fails on an I/O BAR.
>
> For an I/O BAR you want a similar pair of functions.
> On x86 the access function would need to use the 'in/out' instructions but
> there is no requirement for the token to be the native io port number.
> On many non-x86 architectures the access function would be a simple memory
> access - but a specific system (eg many ppc) may never actually allow
> such mappings to succeed.
>
> You might also want a third PCI mapping function that can map a memory
> BAR or an I/O BAR - with a suitable access function.
> On x86 this would need something like ioread8() for accesses.
> Except that function uses small integers for io port numbers
> (which is inherently dangerous).
>
> Nothing except the architecture specific implementation of the function
> to access an io BAR would ever go near a function called inb().
>
> The same is really true for other bus type - including ISA and EISA.
> (Ignoring the horrid of probing ISI bus devices - hopefully they
> are in the ACPI tables??_
> If a driver is probed on a ISA bus there ought to be functions
> equivalent to pci_ioremap() (for both memory and IO addresses)
> that return tokens appropriate for the specific bus.
>
> That is all a different load of churn.

A loooong time ago,  it was suggested to add register accessor
functions to struct device, so e.g. readl(dev, offset) would call
into these accessors, which would implement the bus-specific behavior.
No more worries about readl(), __raw_readl(), ioread32b(), or whatever
quirk is needed, at the (small on nowadays' machines) expense of
some indirection...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Maciej W. Rozycki May 6, 2022, 1:15 p.m. UTC | #20
On Fri, 6 May 2022, David Laight wrote:

> >  The PCI configuration space was retrofitted into x86 systems (and is
> > accessed in an awkward manner with them), but with a new design such a
> > clean approach is most welcome IMHO.  Thank you for your explanation.
> 
> Actually I think x86 was the initial system for PCI.
> The PCI config space 'mess' is all about trying to make
> something that wouldn't break existing memory maps.

 It was retrofitted in that x86 systems already existed for ~15 years when 
PCI came into picture.  Therefore the makers of the CPU ISA couldn't have 
envisaged the need for config access instructions like they did for memory 
and port access.

  Maciej
Niklas Schnelle May 6, 2022, 1:15 p.m. UTC | #21
On Fri, 2022-05-06 at 14:53 +0200, Arnd Bergmann wrote:
> On Fri, May 6, 2022 at 2:27 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote:
> > On Fri, 6 May 2022, Arnd Bergmann wrote:
> > 
> > > >  If this is PCI/PCIe indeed, then an I/O access is just a different bit
> > > > pattern put on the bus/in the TLP in the address phase.  So what is there
> > > > inherent to the s390 architecture that prevents that different bit pattern
> > > > from being used?
> > > 
> > > The hardware design for PCI on s390 is very different from any other
> > > architecture, and more abstract. Rather than implementing MMIO register
> > > access as pointer dereference, this is a separate CPU instruction that
> > > takes a device/bar plus offset as arguments rather than a pointer, and
> > > Linux encodes this back into a fake __iomem token.
> > 
> >  OK, that seems to me like a reasonable and quite a clean design (on the
> > hardware side).
> > 
> >  So what happens if the instruction is given an I/O rather than memory BAR
> > as the relevant argument?  Is the address space indicator bit (bit #0)
> > simply ignored or what?
> 
> Not sure. My best guess is that it would actually work as you'd expect,
> but is deliberately left out of the architecture specification so they don't
> have to to validate the correctness.  Note that only a small number of
> PCIe cards are actually supported by IBM, and I think the firmware
> only passes devices to the OS if they are whitelisted.
> 
>         Arnd

Yes, though in Linux we do try hard to work with whatever is plugged
in. We did benefit from this in the past working with a new NIC from a
different vendor with 0 additional changes. Also you can use vfio-pci
to pass-through arbitrary PCI devices to a QEMU emulating s390x.
Maciej W. Rozycki May 6, 2022, 1:16 p.m. UTC | #22
On Fri, 6 May 2022, Arnd Bergmann wrote:

> >  So what happens if the instruction is given an I/O rather than memory BAR
> > as the relevant argument?  Is the address space indicator bit (bit #0)
> > simply ignored or what?
> 
> Not sure. My best guess is that it would actually work as you'd expect,
> but is deliberately left out of the architecture specification so they don't
> have to to validate the correctness.  Note that only a small number of
> PCIe cards are actually supported by IBM, and I think the firmware
> only passes devices to the OS if they are whitelisted.

 That makes sense, thanks!

  Maciej
David Laight May 6, 2022, 1:28 p.m. UTC | #23
From: Maciej W. Rozycki
> Sent: 06 May 2022 14:15
> On Fri, 6 May 2022, David Laight wrote:
> 
> > >  The PCI configuration space was retrofitted into x86 systems (and is
> > > accessed in an awkward manner with them), but with a new design such a
> > > clean approach is most welcome IMHO.  Thank you for your explanation.
> >
> > Actually I think x86 was the initial system for PCI.
> > The PCI config space 'mess' is all about trying to make
> > something that wouldn't break existing memory maps.
> 
>  It was retrofitted in that x86 systems already existed for ~15 years when
> PCI came into picture.  Therefore the makers of the CPU ISA couldn't have
> envisaged the need for config access instructions like they did for memory
> and port access.

Rev 2.0 of the PCI spec (1993) defines two mechanisms for config cycles.
#2 is probably the first one and maps all of PCI config space into
4k of IO space (PCI bridges aren't supported).
#1 requires a pair of accesses (and SMP locking).

Neither is really horrid.

For horrid try the ISApnp interface.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Maciej W. Rozycki May 6, 2022, 1:40 p.m. UTC | #24
On Fri, 6 May 2022, Geert Uytterhoeven wrote:

> A loooong time ago,  it was suggested to add register accessor
> functions to struct device, so e.g. readl(dev, offset) would call
> into these accessors, which would implement the bus-specific behavior.
> No more worries about readl(), __raw_readl(), ioread32b(), or whatever
> quirk is needed, at the (small on nowadays' machines) expense of
> some indirection...

 I guess you'd need an additional parameter for the endianness policy 
required (to match either bit or byte lanes, according to ultimate data 
interpretation) where crossing between buses of a different endianness 
each.  Otherwise you'd end up with the mess elsewhere.

  Maciej
David Laight May 6, 2022, 2:03 p.m. UTC | #25
From: Geert Uytterhoeven
> Sent: 06 May 2022 14:09
...
> > The same is really true for other bus type - including ISA and EISA.
> > (Ignoring the horrid of probing ISI bus devices - hopefully they
> > are in the ACPI tables??_
> > If a driver is probed on a ISA bus there ought to be functions
> > equivalent to pci_ioremap() (for both memory and IO addresses)
> > that return tokens appropriate for the specific bus.
> >
> > That is all a different load of churn.
> 
> A loooong time ago,  it was suggested to add register accessor
> functions to struct device, so e.g. readl(dev, offset) would call
> into these accessors, which would implement the bus-specific behavior.
> No more worries about readl(), __raw_readl(), ioread32b(), or whatever
> quirk is needed, at the (small on nowadays' machines) expense of
> some indirection...

I was just thinking that the access functions might need a 'device'.
Although you also need the BAR (or equivalent).
So readl(dev, bar_token, offset) or readl(dev, bar_token + offset).
Clearly the 'dev' parameter could be compiled out for non-DEBUG
build on x86 - leaving the current(ish) object code.

You don't want an indirect call (this year), but maybe real
function call and a few tests won't make that much difference.
They might affect PCIe writes, but PCIe reads are so slow you
need to avoid them whenever possible.
I've not timed reads into something like an ethernet chip,
but into our fpga they are probably 1000 clocks+.

OTOH I wouldn't want any overhead on the PIO fifo reads
on one of our small ppc devices.
We push a lot of data though that fifo and anything extra
would kill performance.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Maciej W. Rozycki May 6, 2022, 2:44 p.m. UTC | #26
On Fri, 6 May 2022, David Laight wrote:

> >  It was retrofitted in that x86 systems already existed for ~15 years when
> > PCI came into picture.  Therefore the makers of the CPU ISA couldn't have
> > envisaged the need for config access instructions like they did for memory
> > and port access.
> 
> Rev 2.0 of the PCI spec (1993) defines two mechanisms for config cycles.
> #2 is probably the first one and maps all of PCI config space into
> 4k of IO space (PCI bridges aren't supported).

 This one is even more horrid than #1 in that it requires two separate 
preparatory I/O writes rather than just one, one to the Forward Register 
(at 0xcfa) to set the bus number, and another to the Configuration Space 
Enable Register (at 0xcf8) to set the function number, before you can 
issue a configuration read or write to a device.  So you need MP locking 
too.

 NB only peer bridges aren't supported with this mechanism, normal PCI-PCI 
bridges are, via the Forward Register.

> #1 requires a pair of accesses (and SMP locking).
> 
> Neither is really horrid.

 Both are.  First neither is MP-safe and second both are indirect in that 
you need to poke at some chipset registers before you can issue the actual 
read or write.

 Sane access would require a single CPU instruction to read or write from 
the configuration space.  To access the conventional PCI configuration 
space in a direct linear manner you need 256 * 21 * 8 * 256 = 10.5MiB of 
address space.  Such amount of address space seems affordable even with 
32-bit systems.

  Maciej
Geert Uytterhoeven May 6, 2022, 2:56 p.m. UTC | #27
Hi Maciej,

On Fri, May 6, 2022 at 4:44 PM Maciej W. Rozycki <macro@orcam.me.uk> wrote:
> On Fri, 6 May 2022, David Laight wrote:
> > >  It was retrofitted in that x86 systems already existed for ~15 years when
> > > PCI came into picture.  Therefore the makers of the CPU ISA couldn't have
> > > envisaged the need for config access instructions like they did for memory
> > > and port access.
> >
> > Rev 2.0 of the PCI spec (1993) defines two mechanisms for config cycles.
> > #2 is probably the first one and maps all of PCI config space into
> > 4k of IO space (PCI bridges aren't supported).
>
>  This one is even more horrid than #1 in that it requires two separate
> preparatory I/O writes rather than just one, one to the Forward Register
> (at 0xcfa) to set the bus number, and another to the Configuration Space
> Enable Register (at 0xcf8) to set the function number, before you can
> issue a configuration read or write to a device.  So you need MP locking
> too.
>
>  NB only peer bridges aren't supported with this mechanism, normal PCI-PCI
> bridges are, via the Forward Register.
>
> > #1 requires a pair of accesses (and SMP locking).
> >
> > Neither is really horrid.
>
>  Both are.  First neither is MP-safe and second both are indirect in that
> you need to poke at some chipset registers before you can issue the actual
> read or write.
>
>  Sane access would require a single CPU instruction to read or write from
> the configuration space.  To access the conventional PCI configuration
> space in a direct linear manner you need 256 * 21 * 8 * 256 = 10.5MiB of
> address space.  Such amount of address space seems affordable even with
> 32-bit systems.

Won't have fit in the legacy 1 MiB space ("640 KiB...").

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Geert Uytterhoeven May 6, 2022, 3:02 p.m. UTC | #28
Hi David

On Fri, May 6, 2022 at 4:05 PM David Laight <David.Laight@aculab.com> wrote:
> From: Geert Uytterhoeven
> > Sent: 06 May 2022 14:09
> > > The same is really true for other bus type - including ISA and EISA.
> > > (Ignoring the horrid of probing ISI bus devices - hopefully they
> > > are in the ACPI tables??_
> > > If a driver is probed on a ISA bus there ought to be functions
> > > equivalent to pci_ioremap() (for both memory and IO addresses)
> > > that return tokens appropriate for the specific bus.
> > >
> > > That is all a different load of churn.
> >
> > A loooong time ago,  it was suggested to add register accessor
> > functions to struct device, so e.g. readl(dev, offset) would call
> > into these accessors, which would implement the bus-specific behavior.
> > No more worries about readl(), __raw_readl(), ioread32b(), or whatever
> > quirk is needed, at the (small on nowadays' machines) expense of
> > some indirection...
>
> I was just thinking that the access functions might need a 'device'.
> Although you also need the BAR (or equivalent).
> So readl(dev, bar_token, offset) or readl(dev, bar_token + offset).

Note that we do have such a system: regmap.

> Clearly the 'dev' parameter could be compiled out for non-DEBUG
> build on x86 - leaving the current(ish) object code.

Assumed all devices are PCI devices.
E.g. USB devices would still need the indirection.

> You don't want an indirect call (this year), but maybe real
> function call and a few tests won't make that much difference.
> They might affect PCIe writes, but PCIe reads are so slow you
> need to avoid them whenever possible.
> I've not timed reads into something like an ethernet chip,
> but into our fpga they are probably 1000 clocks+.
>
> OTOH I wouldn't want any overhead on the PIO fifo reads
> on one of our small ppc devices.
> We push a lot of data though that fifo and anything extra
> would kill performance.

Right.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Maciej W. Rozycki May 6, 2022, 3:03 p.m. UTC | #29
Hi Geert,
> >  Sane access would require a single CPU instruction to read or write from
> > the configuration space.  To access the conventional PCI configuration
> > space in a direct linear manner you need 256 * 21 * 8 * 256 = 10.5MiB of
> > address space.  Such amount of address space seems affordable even with
> > 32-bit systems.
> 
> Won't have fit in the legacy 1 MiB space ("640 KiB...").

 Haha, but anyway you're supposed to use BIOS calls under DOS and the like 
so it doesn't really matter.  You can't poke at the APIC in the legacy 
space either.

  Maciej
Finn Thain May 7, 2022, 12:01 a.m. UTC | #30
On Fri, 6 May 2022, Niklas Schnelle wrote:

> On Fri, 2022-05-06 at 19:12 +1000, Finn Thain wrote:
> > 
> > On Thu, 5 May 2022, Bjorn Helgaas wrote:
> > 
> > > On Thu, May 05, 2022 at 07:39:42PM +0200, Arnd Bergmann wrote:
> > > > On Thu, May 5, 2022 at 6:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Wed, May 04, 2022 at 11:31:28PM +0200, Arnd Bergmann wrote:
> > > > > > The main goal is to avoid c), which is what happens on s390, 
> > > > > > but can also happen elsewhere. Catching b) would be nice as 
> > > > > > well, but is much harder to do from generic code as you'd need 
> > > > > > an architecture specific inline asm statement to insert a 
> > > > > > ex_table fixup, or a runtime conditional on each access.
> > > > > 
> > > > > Or s390 could implement its own inb().
> > > > > 
> > > > > I'm hearing that generic powerpc kernels have to run both on 
> > > > > machines that have I/O port space and those that don't.  That 
> > > > > makes me think s390 could do something similar.
> > > > 
> > > > No, this is actually the current situation, and it makes 
> > > > absolutely no sense. s390 has no way of implementing inb()/outb() 
> > > > because there are no instructions for it and it cannot tunnel them 
> > > > through a virtual address mapping like on most of the other 
> > > > architectures. (it has special instructions for accessing memory 
> > > > space, which is not the same as a pointer dereference here).
> > > > 
> > > > The existing implementation gets flagged as a NULL pointer 
> > > > dereference by a compiler warning because it effectively is.
> > > 
> > > I think s390 currently uses the inb() in asm-generic/io.h, i.e., 
> > > "__raw_readb(PCI_IOBASE + addr)".  I understand that's a NULL 
> > > pointer dereference because the default PCI_IOBASE is 0.
> > > 
> > > I mooted a s390 inb() implementation like "return ~0" because that's 
> > > what happens on most arches when there's no device to respond to the 
> > > inb().
> > > 
> > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter 
> > > drivers that use I/O ports in some cases but not others.  But maybe 
> > > it's the most practical way.
> > > 
> > 
> > Do you mean, "the most practical way to avoid a compiler warning on 
> > s390"? What about "#pragma GCC diagnostic ignored"?
> 
> This actually happens with clang.

That suggests a clang bug to me. If you believe GCC should behave like 
clang, then I guess the pragma above really is the one you want. If you 
somehow feel that the kernel should cater to gcc and clang even where they 
disagree then you would have to use "#pragma clang diagnostic ignored".

> Apart from that, I think this would also fall under the same argument as 
> the original patch Linus unpulled. We would just paint over someting 
> that we know at compile time won't work:
> 
> https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/
> 

I wasn't advocating adding any warnings.

If you know at compile time that a driver won't work, the usual solution 
is scripts/config -d CONFIG_SOME_UNDESIRED_DRIVER. Why is that no 
longer appropriate for drivers that use IO ports?
Arnd Bergmann May 7, 2022, 1:14 p.m. UTC | #31
On Sat, May 7, 2022 at 2:01 AM Finn Thain <fthain@linux-m68k.org> wrote:
> On Fri, 6 May 2022, Niklas Schnelle wrote:
> > On Fri, 2022-05-06 at 19:12 +1000, Finn Thain wrote:
> > > On Thu, 5 May 2022, Bjorn Helgaas wrote:
> > > >
> > > > I mooted a s390 inb() implementation like "return ~0" because that's
> > > > what happens on most arches when there's no device to respond to the
> > > > inb().
> > > >
> > > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter
> > > > drivers that use I/O ports in some cases but not others.  But maybe
> > > > it's the most practical way.
> > > >
> > >
> > > Do you mean, "the most practical way to avoid a compiler warning on
> > > s390"? What about "#pragma GCC diagnostic ignored"?
> >
> > This actually happens with clang.
>
> That suggests a clang bug to me. If you believe GCC should behave like
> clang, then I guess the pragma above really is the one you want. If you
> somehow feel that the kernel should cater to gcc and clang even where they
> disagree then you would have to use "#pragma clang diagnostic ignored".

I don't see how you can blame the compiler for this. On architectures
with a zero PCI_IOBASE, an inb(0x2f8) literally becomes

        var = *(u8*)((NULL + 0x2f8);

If you run a driver that does this, the kernel gets a page fault for
the NULL page
and reports an Oops. clang tells you 'warning: performing pointer
arithmetic on a null pointer has undefined behavior', which is not exactly
spot on, but close enough to warn you that you probably shouldn't do this. gcc
doesn't warn here, but it does warn about an array out-of-bounds access when
you pass such a pointer into memcpy or another string function.

> > Apart from that, I think this would also fall under the same argument as
> > the original patch Linus unpulled. We would just paint over someting
> > that we know at compile time won't work:
> >
> > https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/
> >
>
> I wasn't advocating adding any warnings.
>
> If you know at compile time that a driver won't work, the usual solution
> is scripts/config -d CONFIG_SOME_UNDESIRED_DRIVER. Why is that no
> longer appropriate for drivers that use IO ports?

This was never an option, we rely on 'make allmodconfig' to build without
warnings on all architectures for finding regressions. Any driver that depends
on architecture specific interfaces must not get selected on architectures that
don't have those interfaces.

         Arnd
Finn Thain May 7, 2022, 11:59 p.m. UTC | #32
Hi Arnd,

On Sat, 7 May 2022, Arnd Bergmann wrote:

> On Sat, May 7, 2022 at 2:01 AM Finn Thain <fthain@linux-m68k.org> wrote:
> > On Fri, 6 May 2022, Niklas Schnelle wrote:
> > > On Fri, 2022-05-06 at 19:12 +1000, Finn Thain wrote:
> > > > On Thu, 5 May 2022, Bjorn Helgaas wrote:
> > > > >
> > > > > I mooted a s390 inb() implementation like "return ~0" because that's
> > > > > what happens on most arches when there's no device to respond to the
> > > > > inb().
> > > > >
> > > > > The HAS_IOPORT dependencies are fairly ugly IMHO, and they clutter
> > > > > drivers that use I/O ports in some cases but not others.  But maybe
> > > > > it's the most practical way.
> > > > >
> > > >
> > > > Do you mean, "the most practical way to avoid a compiler warning on
> > > > s390"? What about "#pragma GCC diagnostic ignored"?
> > >
> > > This actually happens with clang.
> >
> > That suggests a clang bug to me. If you believe GCC should behave like
> > clang, then I guess the pragma above really is the one you want. If you
> > somehow feel that the kernel should cater to gcc and clang even where they
> > disagree then you would have to use "#pragma clang diagnostic ignored".
> 
> I don't see how you can blame the compiler for this. On architectures
> with a zero PCI_IOBASE, an inb(0x2f8) literally becomes
> 
>         var = *(u8*)((NULL + 0x2f8);
> 
> If you run a driver that does this, the kernel gets a page fault for
> the NULL page
> and reports an Oops. clang tells you 'warning: performing pointer
> arithmetic on a null pointer has undefined behavior', which is not exactly
> spot on, but close enough to warn you that you probably shouldn't do this. gcc
> doesn't warn here, but it does warn about an array out-of-bounds access when
> you pass such a pointer into memcpy or another string function.
> 

The appeal to UB is weak IMHO. Pointer arithmetic with a zero value is 
unambiguous and the compiler generates the code to implement the expected 
behaviour just fine.

UB is literally an omission in the standard. Well, low level programming 
has always been beyond the scope of C standards. If architectural-level 
code wants to do arithmetic with an arbitrary integer values, and the 
compiler doesn't like it, then the relevant warnings should be disabled 
for those expressions.

> > > Apart from that, I think this would also fall under the same argument as
> > > the original patch Linus unpulled. We would just paint over someting
> > > that we know at compile time won't work:
> > >
> > > https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/
> > >
> >
> > I wasn't advocating adding any warnings.
> >
> > If you know at compile time that a driver won't work, the usual solution
> > is scripts/config -d CONFIG_SOME_UNDESIRED_DRIVER. Why is that no
> > longer appropriate for drivers that use IO ports?
> 
> This was never an option, we rely on 'make allmodconfig' to build 
> without warnings on all architectures for finding regressions.

"All modules on all architectures with all compilers and checkers with all 
warnings enabled"? That's not even vaguely realistic.

How about, "All modules on all architectures with a nominated compiler 
with the appropriate warnings enabled."

> Any driver that depends on architecture specific interfaces must not get 
> selected on architectures that don't have those interfaces.
> 

Kconfig always met that need before we got saddled with -Werror.

That suggests to me that we need a "bool CONFIG_WARINGS_INTO_ERRORS" to 
control -Werror, which could be disabled for .config files (like make 
allmodconfig) where it is not helping.
Finn Thain May 8, 2022, 12:15 a.m. UTC | #33
On Sun, 8 May 2022, I wrote:

> 
> That suggests to me that we need a "bool CONFIG_WARINGS_INTO_ERRORS" to 
> control -Werror, which could be disabled for .config files (like make 
> allmodconfig) where it is not helping.
> 

I just noticed that we already have CONFIG_WERROR. So perhaps something 
like this would help.

diff --git a/init/Kconfig b/init/Kconfig
index ddcbefe535e9..765d83fb148e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -150,6 +150,8 @@ config WERROR
 
 	  However, if you have a new (or very old) compiler with odd and
 	  unusual warnings, or you have some architecture with problems,
+	  or if you are using a compiler that doesn't happen to interpret
+	  the C standards in quite the same way as some other compilers,
 	  you may need to disable this config option in order to
 	  successfully build the kernel.
diff mbox series

Patch

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 7d0d26b5b3f5..2b9cf1b0bdb8 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -27,6 +27,7 @@  config ALPHA
 	select AUDIT_ARCH
 	select GENERIC_CPU_VULNERABILITIES
 	select GENERIC_SMP_IDLE_THREAD
+	select HAS_IOPORT
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2e8091e2d8a8..603ce00033a5 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -69,6 +69,7 @@  config ARM
 	select GENERIC_SCHED_CLOCK
 	select GENERIC_SMP_IDLE_THREAD
 	select HARDIRQS_SW_RESEND
+	select HAS_IOPORT
 	select HAVE_ARCH_AUDITSYSCALL if AEABI && !OABI_COMPAT
 	select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
 	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 20ea89d9ac2f..234dc89a7654 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -136,6 +136,7 @@  config ARM64
 	select GENERIC_GETTIMEOFDAY
 	select GENERIC_VDSO_TIME_NS
 	select HARDIRQS_SW_RESEND
+	select HAS_IOPORT
 	select HAVE_MOVE_PMD
 	select HAVE_MOVE_PUD
 	select HAVE_PCI
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index cb93769a9f2a..0fffe5130a80 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -25,6 +25,7 @@  config IA64
 	select PCI_DOMAINS if PCI
 	select PCI_MSI
 	select PCI_SYSCALL if PCI
+	select HAS_IOPORT
 	select HAVE_ASM_MODVERSIONS
 	select HAVE_UNSTABLE_SCHED_CLOCK
 	select HAVE_EXIT_THREAD
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 936cce42ae9a..54bf0a40c2f0 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -18,6 +18,7 @@  config M68K
 	select GENERIC_CPU_DEVICES
 	select GENERIC_IOMAP
 	select GENERIC_IRQ_SHOW
+	select HAS_IOPORT if PCI || ISA || ATARI_ROM_ISA
 	select HAVE_ASM_MODVERSIONS
 	select HAVE_DEBUG_BUGVERBOSE
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !CPU_HAS_NO_UNALIGNED
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index 8cf429ad1c84..966a6682f1fc 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -21,6 +21,7 @@  config MICROBLAZE
 	select GENERIC_IRQ_SHOW
 	select GENERIC_PCI_IOMAP
 	select GENERIC_SCHED_CLOCK
+	select HAS_IOPORT if PCI
 	select HAVE_ARCH_HASH
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_SECCOMP
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index de3b32a507d2..4c55df08d6f1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -47,6 +47,7 @@  config MIPS
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_TIME_VSYSCALL
 	select GUP_GET_PTE_LOW_HIGH if CPU_MIPS32 && PHYS_ADDR_T_64BIT
+	select HAS_IOPORT
 	select HAVE_ARCH_COMPILER_H
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_KGDB if MIPS_FP_SUPPORT
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 52e550b45692..741c5c64c173 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -46,6 +46,7 @@  config PARISC
 	select MODULES_USE_ELF_RELA
 	select CLONE_BACKWARDS
 	select TTY # Needed for pdc_cons.c
+	select HAS_IOPORT if PCI || EISA
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_HASH
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 174edabb74fa..7133cc35b777 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -182,6 +182,7 @@  config PPC
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_TIME_VSYSCALL
 	select GENERIC_VDSO_TIME_NS
+	select HAS_IOPORT			if PCI
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_HUGE_VMALLOC		if HAVE_ARCH_HUGE_VMAP
 	select HAVE_ARCH_HUGE_VMAP		if PPC_RADIX_MMU || PPC_8xx
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 00fd9c548f26..27fc8a450478 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -67,6 +67,7 @@  config RISCV
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_TIME_VSYSCALL if MMU && 64BIT
 	select GENERIC_VDSO_TIME_NS if HAVE_GENERIC_VDSO
+	select HAS_IOPORT if MMU
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL
 	select HAVE_ARCH_JUMP_LABEL_RELATIVE if !XIP_KERNEL
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 5f220e903e5a..6c1694e82b89 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -25,6 +25,7 @@  config SUPERH
 	select GENERIC_SCHED_CLOCK
 	select GENERIC_SMP_IDLE_THREAD
 	select GUP_GET_PTE_LOW_HIGH if X2TLB
+	select HAS_IOPORT if HAS_IOPORT_MAP
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_SECCOMP_FILTER
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 9200bc04701c..64736476dde8 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -32,6 +32,7 @@  config SPARC
 	select GENERIC_IRQ_SHOW
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select GENERIC_PCI_IOMAP
+	select HAS_IOPORT
 	select HAVE_NMI_WATCHDOG if SPARC64
 	select HAVE_CBPF_JIT if SPARC32
 	select HAVE_EBPF_JIT if SPARC64
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b0142e01002e..9ef0438d1b7d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -155,6 +155,7 @@  config X86
 	select GUP_GET_PTE_LOW_HIGH		if X86_PAE
 	select HARDIRQS_SW_RESEND
 	select HARDLOCKUP_CHECK_TIMESTAMP	if X86_64
+	select HAS_IOPORT
 	select HAVE_ACPI_APEI			if ACPI
 	select HAVE_ACPI_APEI_NMI		if ACPI
 	select HAVE_ALIGNED_STRUCT_PAGE		if SLUB
diff --git a/drivers/bus/Kconfig b/drivers/bus/Kconfig
index 3c68e174a113..a61285100224 100644
--- a/drivers/bus/Kconfig
+++ b/drivers/bus/Kconfig
@@ -81,7 +81,7 @@  config MOXTET
 config HISILICON_LPC
 	bool "Support for ISA I/O space on HiSilicon Hip06/7"
 	depends on (ARM64 && ARCH_HISI) || (COMPILE_TEST && !ALPHA && !HEXAGON && !PARISC)
-	depends on HAS_IOMEM
+	depends on HAS_IOPORT
 	select INDIRECT_PIO if ARM64
 	help
 	  Driver to enable I/O access to devices attached to the Low Pin
diff --git a/lib/Kconfig b/lib/Kconfig
index 087e06b4cdfd..177fed9cf20a 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -91,6 +91,7 @@  config ARCH_USE_SYM_ANNOTATIONS
 config INDIRECT_PIO
 	bool "Access I/O in non-MMIO mode"
 	depends on ARM64
+	depends on HAS_IOPORT
 	help
 	  On some platforms where no separate I/O space exists, there are I/O
 	  hosts which can not be accessed in MMIO mode. Using the logical PIO
@@ -493,6 +494,9 @@  config HAS_IOMEM
 	depends on !NO_IOMEM
 	default y
 
+config HAS_IOPORT
+	def_bool ISA
+
 config HAS_IOPORT_MAP
 	bool
 	depends on HAS_IOMEM && !NO_IOPORT_MAP
diff --git a/lib/Kconfig.kgdb b/lib/Kconfig.kgdb
index 05dae05b6cc9..c68e4d9dcecb 100644
--- a/lib/Kconfig.kgdb
+++ b/lib/Kconfig.kgdb
@@ -121,6 +121,7 @@  config KDB_DEFAULT_ENABLE
 
 config KDB_KEYBOARD
 	bool "KGDB_KDB: keyboard as input device"
+	depends on HAS_IOPORT
 	depends on VT && KGDB_KDB
 	default n
 	help