[RFC,02/15] cxl/core/hdm: Bail on endpoint init fail

Message ID	20220413183720.2444089-3-ben.widawsky@intel.com
State	New, archived
Headers	show Return-Path: <linux-cxl-owner@kernel.org> From: Ben Widawsky <ben.widawsky@intel.com> To: linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev Cc: patches@lists.linux.dev, Ben Widawsky <ben.widawsky@intel.com>, Alison Schofield <alison.schofield@intel.com>, Dan Williams <dan.j.williams@intel.com>, Ira Weiny <ira.weiny@intel.com>, Jonathan Cameron <Jonathan.Cameron@huawei.com>, Vishal Verma <vishal.l.verma@intel.com> Subject: [RFC PATCH 02/15] cxl/core/hdm: Bail on endpoint init fail Date: Wed, 13 Apr 2022 11:37:07 -0700 Message-Id: <20220413183720.2444089-3-ben.widawsky@intel.com> In-Reply-To: <20220413183720.2444089-1-ben.widawsky@intel.com> References: <20220413183720.2444089-1-ben.widawsky@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Region driver \| expand [RFC,00/15] Region driver [RFC,01/15] cxl/core: Use is_endpoint_decoder [RFC,02/15] cxl/core/hdm: Bail on endpoint init fail [RFC,03/15] Revert "cxl/core: Convert decoder range to resource" [RFC,04/15] cxl/core: Create distinct decoder structs [RFC,05/15] cxl/acpi: Reserve CXL resources from request_free_mem_region [RFC,06/15] cxl/acpi: Manage root decoder's address space [RFC,07/15] cxl/port: Surface ram and pmem resources [RFC,08/15] cxl/core/hdm: Allocate resources from the media [RFC,09/15] cxl/core/port: Add attrs for size and volatility [RFC,10/15] cxl/core: Extract IW/IG decoding [RFC,11/15] cxl/acpi: Use common IW/IG decoding [RFC,12/15] cxl/region: Add region creation ABI [RFC,13/15] cxl/core/port: Add attrs for root ways & granularity [RFC,14/15] cxl/region: Introduce configuration [RFC,15/15] cxl/region: Introduce a cxl_region driver

Ben Widawsky April 13, 2022, 6:37 p.m. UTC

Endpoint decoder enumeration is the only way in which we can determine
Device Physical Address (DPA) -> Host Physical Address (HPA) mappings.
Information is obtained only when the register state can be read
sequentially. If when enumerating the decoders a failure occurs, all
other decoders must also fail since the decoders can no longer be
accurately managed (unless it's the last decoder in which case it can
still work).

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
 drivers/cxl/core/hdm.c | 2 ++
 1 file changed, 2 insertions(+)

Dan Williams April 13, 2022, 9:31 p.m. UTC | #1

On Wed, Apr 13, 2022 at 11:38 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> Endpoint decoder enumeration is the only way in which we can determine
> Device Physical Address (DPA) -> Host Physical Address (HPA) mappings.
> Information is obtained only when the register state can be read
> sequentially. If when enumerating the decoders a failure occurs, all
> other decoders must also fail since the decoders can no longer be
> accurately managed (unless it's the last decoder in which case it can
> still work).

I think this should be expanded to fail if any decoder fails to
allocate anywhere in the topology otherwise it leaves a mess for
future address translation code to work through cases where decoder
information is missing.

The current approach is based around the current expectation that
nothing is enumerating pre-existing regions, and nothing is performing
address translation.

>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> ---
>  drivers/cxl/core/hdm.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index bfc8ee876278..c3c021b54079 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -255,6 +255,8 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
>                                       cxlhdm->regs.hdm_decoder, i);
>                 if (rc) {
>                         put_device(&cxld->dev);
> +                       if (is_endpoint_decoder(&cxld->dev))
> +                               return rc;
>                         failed++;
>                         continue;
>                 }
> --
> 2.35.1
>

Adam Manzanares April 18, 2022, 4:37 p.m. UTC | #2

On Wed, Apr 13, 2022 at 02:31:42PM -0700, Dan Williams wrote:
> On Wed, Apr 13, 2022 at 11:38 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> >
> > Endpoint decoder enumeration is the only way in which we can determine
> > Device Physical Address (DPA) -> Host Physical Address (HPA) mappings.
> > Information is obtained only when the register state can be read
> > sequentially. If when enumerating the decoders a failure occurs, all
> > other decoders must also fail since the decoders can no longer be
> > accurately managed (unless it's the last decoder in which case it can
> > still work).
> 
> I think this should be expanded to fail if any decoder fails to
> allocate anywhere in the topology otherwise it leaves a mess for
> future address translation code to work through cases where decoder
> information is missing.
> 
> The current approach is based around the current expectation that
> nothing is enumerating pre-existing regions, and nothing is performing
> address translation.

Does the qemu support currently allow testing of this patch? If so, it would 
be good to reference qemu configurations. Any other alternatives would be 
welcome as well. 

+Luis on cc.

> 
> >
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > ---
> >  drivers/cxl/core/hdm.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > index bfc8ee876278..c3c021b54079 100644
> > --- a/drivers/cxl/core/hdm.c
> > +++ b/drivers/cxl/core/hdm.c
> > @@ -255,6 +255,8 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
> >                                       cxlhdm->regs.hdm_decoder, i);
> >                 if (rc) {
> >                         put_device(&cxld->dev);
> > +                       if (is_endpoint_decoder(&cxld->dev))
> > +                               return rc;
> >                         failed++;
> >                         continue;
> >                 }
> > --
> > 2.35.1
> >
>

Ben Widawsky May 12, 2022, 3:50 p.m. UTC | #3

On 22-04-18 16:37:12, Adam Manzanares wrote:
> On Wed, Apr 13, 2022 at 02:31:42PM -0700, Dan Williams wrote:
> > On Wed, Apr 13, 2022 at 11:38 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > >
> > > Endpoint decoder enumeration is the only way in which we can determine
> > > Device Physical Address (DPA) -> Host Physical Address (HPA) mappings.
> > > Information is obtained only when the register state can be read
> > > sequentially. If when enumerating the decoders a failure occurs, all
> > > other decoders must also fail since the decoders can no longer be
> > > accurately managed (unless it's the last decoder in which case it can
> > > still work).
> > 
> > I think this should be expanded to fail if any decoder fails to
> > allocate anywhere in the topology otherwise it leaves a mess for
> > future address translation code to work through cases where decoder
> > information is missing.
> > 
> > The current approach is based around the current expectation that
> > nothing is enumerating pre-existing regions, and nothing is performing
> > address translation.
> 
> Does the qemu support currently allow testing of this patch? If so, it would 
> be good to reference qemu configurations. Any other alternatives would be 
> welcome as well. 
> 
> +Luis on cc.
> 

No. This type of error injection would be cool to have, but I'm not sure of a
good way to support that in a scalable way. Maybe Jonathan has some ideas?

> > 
> > >
> > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > ---
> > >  drivers/cxl/core/hdm.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > > index bfc8ee876278..c3c021b54079 100644
> > > --- a/drivers/cxl/core/hdm.c
> > > +++ b/drivers/cxl/core/hdm.c
> > > @@ -255,6 +255,8 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
> > >                                       cxlhdm->regs.hdm_decoder, i);
> > >                 if (rc) {
> > >                         put_device(&cxld->dev);
> > > +                       if (is_endpoint_decoder(&cxld->dev))
> > > +                               return rc;
> > >                         failed++;
> > >                         continue;
> > >                 }
> > > --
> > > 2.35.1
> > >
> >

Luis Chamberlain May 12, 2022, 5:27 p.m. UTC | #4

On Thu, May 12, 2022 at 08:50:14AM -0700, Ben Widawsky wrote:
> On 22-04-18 16:37:12, Adam Manzanares wrote:
> > On Wed, Apr 13, 2022 at 02:31:42PM -0700, Dan Williams wrote:
> > > On Wed, Apr 13, 2022 at 11:38 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > >
> > > > Endpoint decoder enumeration is the only way in which we can determine
> > > > Device Physical Address (DPA) -> Host Physical Address (HPA) mappings.
> > > > Information is obtained only when the register state can be read
> > > > sequentially. If when enumerating the decoders a failure occurs, all
> > > > other decoders must also fail since the decoders can no longer be
> > > > accurately managed (unless it's the last decoder in which case it can
> > > > still work).
> > > 
> > > I think this should be expanded to fail if any decoder fails to
> > > allocate anywhere in the topology otherwise it leaves a mess for
> > > future address translation code to work through cases where decoder
> > > information is missing.
> > > 
> > > The current approach is based around the current expectation that
> > > nothing is enumerating pre-existing regions, and nothing is performing
> > > address translation.
> > 
> > Does the qemu support currently allow testing of this patch? If so, it would 
> > be good to reference qemu configurations. Any other alternatives would be 
> > welcome as well. 
> > 
> > +Luis on cc.
> > 
> 
> No. This type of error injection would be cool to have, but I'm not sure of a
> good way to support that in a scalable way. Maybe Jonathan has some ideas?

In case it helps on the Linux front the least intrusive way is to use
ALLOW_ERROR_INJECTION(). It's what I hope we'll slowly strive for on
the block layer and filesystems slowly. That incurs one macro call per error
routine you want to allow error injection on.

Then you use debugfs to dynamically enable / disable the error
injection / rate etc.

So I think this begs the question, what error injection mechanisms
exist for qemu and would new functionality be welcomed?

Linux builds off a brilliantly simple simple interface borrowed from
failmalloc [0]. The initial implementation on Linux then was also really
simple [1] [2] [3] however it required adding stubs on each call with a
respective build option to enable failure injection. Configuration was done
through debugfs.

Later Josef enabled us to use BPF to allow overriding kprobed functions
to return arbitrary values[4], and further generalized away from kprobes
by Masami [5].

If no failure injection is present in qemu something as simple as the initial
approach could be considered [1] [2] [3], but a dynamic interface
would certainly be wonderful long term.

[0] http://www.nongnu.org/failmalloc/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de1ba09b214056365d9082982905b255caafb7a2
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6ff1cb355e628f8fc55fa2d01e269e5e1bbc2fe9
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8a8b6502fb669c3a0638a08955442814cedc86b1
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=92ace9991da08827e809c2d120108a96a281e7fc
[5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=540adea3809f61115d2a1ea4ed6e627613452ba1

  Luis

> > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > ---
> > > >  drivers/cxl/core/hdm.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > > > index bfc8ee876278..c3c021b54079 100644
> > > > --- a/drivers/cxl/core/hdm.c
> > > > +++ b/drivers/cxl/core/hdm.c
> > > > @@ -255,6 +255,8 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
> > > >                                       cxlhdm->regs.hdm_decoder, i);
> > > >                 if (rc) {
> > > >                         put_device(&cxld->dev);
> > > > +                       if (is_endpoint_decoder(&cxld->dev))
> > > > +                               return rc;
> > > >                         failed++;
> > > >                         continue;
> > > >                 }
> > > > --
> > > > 2.35.1
> > > >
> > >

Jonathan Cameron May 13, 2022, 12:09 p.m. UTC | #5

On Thu, 12 May 2022 10:27:38 -0700
Luis Chamberlain <mcgrof@kernel.org> wrote:

> On Thu, May 12, 2022 at 08:50:14AM -0700, Ben Widawsky wrote:
> > On 22-04-18 16:37:12, Adam Manzanares wrote:  
> > > On Wed, Apr 13, 2022 at 02:31:42PM -0700, Dan Williams wrote:  
> > > > On Wed, Apr 13, 2022 at 11:38 AM Ben Widawsky <ben.widawsky@intel.com> wrote:  
> > > > >
> > > > > Endpoint decoder enumeration is the only way in which we can determine
> > > > > Device Physical Address (DPA) -> Host Physical Address (HPA) mappings.
> > > > > Information is obtained only when the register state can be read
> > > > > sequentially. If when enumerating the decoders a failure occurs, all
> > > > > other decoders must also fail since the decoders can no longer be
> > > > > accurately managed (unless it's the last decoder in which case it can
> > > > > still work).  
> > > > 
> > > > I think this should be expanded to fail if any decoder fails to
> > > > allocate anywhere in the topology otherwise it leaves a mess for
> > > > future address translation code to work through cases where decoder
> > > > information is missing.
> > > > 
> > > > The current approach is based around the current expectation that
> > > > nothing is enumerating pre-existing regions, and nothing is performing
> > > > address translation.  
> > > 
> > > Does the qemu support currently allow testing of this patch? If so, it would 
> > > be good to reference qemu configurations. Any other alternatives would be 
> > > welcome as well. 
> > > 
> > > +Luis on cc.
> > >   
> > 
> > No. This type of error injection would be cool to have, but I'm not sure of a
> > good way to support that in a scalable way. Maybe Jonathan has some ideas?  
> 
> In case it helps on the Linux front the least intrusive way is to use
> ALLOW_ERROR_INJECTION(). It's what I hope we'll slowly strive for on
> the block layer and filesystems slowly. That incurs one macro call per error
> routine you want to allow error injection on.
> 
> Then you use debugfs to dynamically enable / disable the error
> injection / rate etc.
> 
> So I think this begs the question, what error injection mechanisms
> exist for qemu and would new functionality be welcomed?

So what paths can actually cause this to fail? Looking at the upstream
code in init_hdm_decoder() looks like there are only a few things that
are checked. 

base or size being all fs or interleave ways not being a value the
kernel understands.

For all fs, I'm not sure how we'd get that value?

For interleave ways:
Our current verification of writes to these registers in QEMU is very
limited I think you can currently push in an invalid value. We are only
masking writes, not checking for mid range values that don't exist.
However, that's something I'll be looking to restrict soon as we add
more input verification so I wouldn't rely on it.

I'm not aware of anything general affecting QEMU devices emulation.
I've hacked cases in as temporary tests but not sure
we'd want to carry something specific for this one.

> 
> Linux builds off a brilliantly simple simple interface borrowed from
> failmalloc [0]. The initial implementation on Linux then was also really
> simple [1] [2] [3] however it required adding stubs on each call with a
> respective build option to enable failure injection. Configuration was done
> through debugfs.
> 
> Later Josef enabled us to use BPF to allow overriding kprobed functions
> to return arbitrary values[4], and further generalized away from kprobes
> by Masami [5].
> 
> If no failure injection is present in qemu something as simple as the initial
> approach could be considered [1] [2] [3], but a dynamic interface
> would certainly be wonderful long term.
> 
> [0] http://www.nongnu.org/failmalloc/
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de1ba09b214056365d9082982905b255caafb7a2
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6ff1cb355e628f8fc55fa2d01e269e5e1bbc2fe9
> [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8a8b6502fb669c3a0638a08955442814cedc86b1
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=92ace9991da08827e809c2d120108a96a281e7fc
> [5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=540adea3809f61115d2a1ea4ed6e627613452ba1
> 
>   Luis
> 
> > > > > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > > > > ---
> > > > >  drivers/cxl/core/hdm.c | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > > > > index bfc8ee876278..c3c021b54079 100644
> > > > > --- a/drivers/cxl/core/hdm.c
> > > > > +++ b/drivers/cxl/core/hdm.c
> > > > > @@ -255,6 +255,8 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
> > > > >                                       cxlhdm->regs.hdm_decoder, i);
> > > > >                 if (rc) {
> > > > >                         put_device(&cxld->dev);
> > > > > +                       if (is_endpoint_decoder(&cxld->dev))
> > > > > +                               return rc;
> > > > >                         failed++;
> > > > >                         continue;
> > > > >                 }
> > > > > --
> > > > > 2.35.1
> > > > >  
> > > >

Dan Williams May 13, 2022, 3:03 p.m. UTC | #6

On Fri, May 13, 2022 at 5:09 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 12 May 2022 10:27:38 -0700
> Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> > On Thu, May 12, 2022 at 08:50:14AM -0700, Ben Widawsky wrote:
> > > On 22-04-18 16:37:12, Adam Manzanares wrote:
> > > > On Wed, Apr 13, 2022 at 02:31:42PM -0700, Dan Williams wrote:
> > > > > On Wed, Apr 13, 2022 at 11:38 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > > > >
> > > > > > Endpoint decoder enumeration is the only way in which we can determine
> > > > > > Device Physical Address (DPA) -> Host Physical Address (HPA) mappings.
> > > > > > Information is obtained only when the register state can be read
> > > > > > sequentially. If when enumerating the decoders a failure occurs, all
> > > > > > other decoders must also fail since the decoders can no longer be
> > > > > > accurately managed (unless it's the last decoder in which case it can
> > > > > > still work).
> > > > >
> > > > > I think this should be expanded to fail if any decoder fails to
> > > > > allocate anywhere in the topology otherwise it leaves a mess for
> > > > > future address translation code to work through cases where decoder
> > > > > information is missing.
> > > > >
> > > > > The current approach is based around the current expectation that
> > > > > nothing is enumerating pre-existing regions, and nothing is performing
> > > > > address translation.
> > > >
> > > > Does the qemu support currently allow testing of this patch? If so, it would
> > > > be good to reference qemu configurations. Any other alternatives would be
> > > > welcome as well.
> > > >
> > > > +Luis on cc.
> > > >
> > >
> > > No. This type of error injection would be cool to have, but I'm not sure of a
> > > good way to support that in a scalable way. Maybe Jonathan has some ideas?
> >
> > In case it helps on the Linux front the least intrusive way is to use
> > ALLOW_ERROR_INJECTION(). It's what I hope we'll slowly strive for on
> > the block layer and filesystems slowly. That incurs one macro call per error
> > routine you want to allow error injection on.
> >
> > Then you use debugfs to dynamically enable / disable the error
> > injection / rate etc.
> >
> > So I think this begs the question, what error injection mechanisms
> > exist for qemu and would new functionality be welcomed?
>
> So what paths can actually cause this to fail? Looking at the upstream
> code in init_hdm_decoder() looks like there are only a few things that
> are checked.
>
> base or size being all fs or interleave ways not being a value the
> kernel understands.
>
> For all fs, I'm not sure how we'd get that value?
>
> For interleave ways:
> Our current verification of writes to these registers in QEMU is very
> limited I think you can currently push in an invalid value. We are only
> masking writes, not checking for mid range values that don't exist.
> However, that's something I'll be looking to restrict soon as we add
> more input verification so I wouldn't rely on it.
>
> I'm not aware of anything general affecting QEMU devices emulation.
> I've hacked cases in as temporary tests but not sure
> we'd want to carry something specific for this one.

This is another motivation for cxl_test. QEMU is meant to faithfully
emulate the hardware, not unit test drivers.

Luis Chamberlain May 13, 2022, 3:12 p.m. UTC | #7

On Fri, May 13, 2022 at 01:09:09PM +0100, Jonathan Cameron wrote:
> On Thu, 12 May 2022 10:27:38 -0700
> Luis Chamberlain <mcgrof@kernel.org> wrote:
> 
> > On Thu, May 12, 2022 at 08:50:14AM -0700, Ben Widawsky wrote:
> > > On 22-04-18 16:37:12, Adam Manzanares wrote:  
> > > > On Wed, Apr 13, 2022 at 02:31:42PM -0700, Dan Williams wrote:  
> > > > > On Wed, Apr 13, 2022 at 11:38 AM Ben Widawsky <ben.widawsky@intel.com> wrote:  
> > > > > >
> > > > > > Endpoint decoder enumeration is the only way in which we can determine
> > > > > > Device Physical Address (DPA) -> Host Physical Address (HPA) mappings.
> > > > > > Information is obtained only when the register state can be read
> > > > > > sequentially. If when enumerating the decoders a failure occurs, all
> > > > > > other decoders must also fail since the decoders can no longer be
> > > > > > accurately managed (unless it's the last decoder in which case it can
> > > > > > still work).  
> > > > > 
> > > > > I think this should be expanded to fail if any decoder fails to
> > > > > allocate anywhere in the topology otherwise it leaves a mess for
> > > > > future address translation code to work through cases where decoder
> > > > > information is missing.
> > > > > 
> > > > > The current approach is based around the current expectation that
> > > > > nothing is enumerating pre-existing regions, and nothing is performing
> > > > > address translation.  
> > > > 
> > > > Does the qemu support currently allow testing of this patch? If so, it would 
> > > > be good to reference qemu configurations. Any other alternatives would be 
> > > > welcome as well. 
> > > > 
> > > > +Luis on cc.
> > > >   
> > > 
> > > No. This type of error injection would be cool to have, but I'm not sure of a
> > > good way to support that in a scalable way. Maybe Jonathan has some ideas?  
> > 
> > In case it helps on the Linux front the least intrusive way is to use
> > ALLOW_ERROR_INJECTION(). It's what I hope we'll slowly strive for on
> > the block layer and filesystems slowly. That incurs one macro call per error
> > routine you want to allow error injection on.
> > 
> > Then you use debugfs to dynamically enable / disable the error
> > injection / rate etc.
> > 
> > So I think this begs the question, what error injection mechanisms
> > exist for qemu and would new functionality be welcomed?
> 
> So what paths can actually cause this to fail?

If you are asking about adopting something like the failmalloc
should_fail() strategy in qemu, you'd essentially open code a call to
a should_fail() and in it pass the arguments you want from your
own call down. If you want to ignore size you can just pass 0
for instance.

> Looking at the upstream
> code in init_hdm_decoder() looks like there are only a few things that
> are checked. 

If you mean in Linux, you would open code a should_fail()
specific to the area as in this commit old commit example, and
adding a respective kconfig entry for it:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8a8b6502fb669c3a0638a08955442814cedc86b1

Eech of these knobs then get its own probability, times, and space
debugfs entries which let the routine should_fail() fail when the
parameters set meet the criteria set by debugfs.

There are ways to make this much more scalable though, but I had not
seen many efforts to do so. I did start such an approach using debugfs
specific to *one* kconfig entry, for instance see this block layer proposed
change, which would in turn enable tons of different ways to enable failing
if CONFIG_FAIL_ADD_DISK would be used:

https://lore.kernel.org/linux-block/20210512064629.13899-9-mcgrof@kernel.org/

However, at the recent discussion at LSFMM for this we decided instead
to just sprinkle ALLOW_ERROR_INJECTION() after each routine. Otherwise
you are open coding tons of new "should_fail()" calls in your runtime
path and that can make it hard to review patches and is just a lot of
noise in code.

But with CONFIG_FAIL_FUNCTION this means you don't have to open code
should_fail() calls, but instead for each routine you want to add a failure
injection support you'd just use ALLOW_ERROR_INJECTION() per call.

Read Documentation/fault-injection/fault-injection.rst on
fail_function/injectable and fail_function/<function-name>/retval,
so we could do for instance, to avoid a namespace clash I just
added the cxl_ prefix:

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 0e89a7a932d4..2aff3bace698 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -149,8 +149,8 @@ static int to_interleave_ways(u32 ctrl)
 	}
 }

-static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
-			    int *target_map, void __iomem *hdm, int which)
+static int cxl_init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
+				int *target_map, void __iomem *hdm, int which)
 {
 	u64 size, base;
 	u32 ctrl;
@@ -207,6 +207,7 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,

 	return 0;
 }
+ALLOW_ERROR_INJECTION(cxl_init_hdm_decoder, ERRNO);

 /**
  * devm_cxl_enumerate_decoders - add decoder objects per HDM register set
@@ -251,8 +252,8 @@ int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm)
 			return PTR_ERR(cxld);
 		}

-		rc = init_hdm_decoder(port, cxld, target_map,
-				      cxlhdm->regs.hdm_decoder, i);
+		rc = cxl_init_hdm_decoder(port, cxld, target_map,
+					  cxlhdm->regs.hdm_decoder, i);
 		if (rc) {
 			put_device(&cxld->dev);
 			failed++;

This lets's us then modprobe the cxl modules and then:

root@kdevops-dev ~ # grep ^cxl /sys/kernel/debug/fail_function/injectable 
cxl_init_hdm_decoder [cxl_core] ERRNO

echo cxl_init_hdm_decoder > /sys/kernel/debug/fail_function/cxl_init_hdm_decoder/
printf %#x -6 > /sys/kernel/debug/fail_function/cxl_init_hdm_decoder/retval

Now this routine will return -ENXIO (-6) when called but I think
you still need to set the probability for this to not be 0:

cat /sys/kernel/debug/fail_function/probability 
0

In the world where ALLOW_ERROR_INJECTION() is not used we'd get one of
each of the probability, space and times for each new failure injection
function.

> base or size being all fs or interleave ways not being a value the
> kernel understands.

I think that if you want to make use of variable failures depending on
input data you might as well open code your own should_fail() calls
as I did in the above referenced block layer patches for add_disk with
your own kconfig entry.

ALLOW_ERROR_INJECTION() seems to be useful if you are OK to do simple
failures based on a return code and probably a probablity / times
values (times means that if you say 2 it will fail ever 2 calls).

There are odd uses of ALLOW_ERROR_INJECTION() but I can't say I'm a fan
of: drivers/platform/surface/aggregator/ssh_packet_layer.c

> For all fs, I'm not sure how we'd get that value?

You'd echo the return value you want to fake a failure for into the
debugfs retval. If you open code your own should_fail() you can use
size, space, probability in whatever you see fit.

> For interleave ways:
> Our current verification of writes to these registers in QEMU is very
> limited I think you can currently push in an invalid value. We are only
> masking writes, not checking for mid range values that don't exist.
> However, that's something I'll be looking to restrict soon as we add
> more input verification so I wouldn't rely on it.
> 
> I'm not aware of anything general affecting QEMU devices emulation.
> I've hacked cases in as temporary tests but not sure
> we'd want to carry something specific for this one.

I'd imagine as in any subsystem finding the few key areas you *do* want
test coverage for failure injection will be a nice fun next step. It is
understandable if init_hdm_decoder() is not it.

  Luis

Dan Williams May 13, 2022, 7:14 p.m. UTC | #8

On Fri, May 13, 2022 at 8:12 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Fri, May 13, 2022 at 01:09:09PM +0100, Jonathan Cameron wrote:
> > On Thu, 12 May 2022 10:27:38 -0700
> > Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > > On Thu, May 12, 2022 at 08:50:14AM -0700, Ben Widawsky wrote:
> > > > On 22-04-18 16:37:12, Adam Manzanares wrote:
> > > > > On Wed, Apr 13, 2022 at 02:31:42PM -0700, Dan Williams wrote:
> > > > > > On Wed, Apr 13, 2022 at 11:38 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
> > > > > > >
> > > > > > > Endpoint decoder enumeration is the only way in which we can determine
> > > > > > > Device Physical Address (DPA) -> Host Physical Address (HPA) mappings.
> > > > > > > Information is obtained only when the register state can be read
> > > > > > > sequentially. If when enumerating the decoders a failure occurs, all
> > > > > > > other decoders must also fail since the decoders can no longer be
> > > > > > > accurately managed (unless it's the last decoder in which case it can
> > > > > > > still work).
> > > > > >
> > > > > > I think this should be expanded to fail if any decoder fails to
> > > > > > allocate anywhere in the topology otherwise it leaves a mess for
> > > > > > future address translation code to work through cases where decoder
> > > > > > information is missing.
> > > > > >
> > > > > > The current approach is based around the current expectation that
> > > > > > nothing is enumerating pre-existing regions, and nothing is performing
> > > > > > address translation.
> > > > >
> > > > > Does the qemu support currently allow testing of this patch? If so, it would
> > > > > be good to reference qemu configurations. Any other alternatives would be
> > > > > welcome as well.
> > > > >
> > > > > +Luis on cc.
> > > > >
> > > >
> > > > No. This type of error injection would be cool to have, but I'm not sure of a
> > > > good way to support that in a scalable way. Maybe Jonathan has some ideas?
> > >
> > > In case it helps on the Linux front the least intrusive way is to use
> > > ALLOW_ERROR_INJECTION(). It's what I hope we'll slowly strive for on
> > > the block layer and filesystems slowly. That incurs one macro call per error
> > > routine you want to allow error injection on.
> > >
> > > Then you use debugfs to dynamically enable / disable the error
> > > injection / rate etc.
> > >
> > > So I think this begs the question, what error injection mechanisms
> > > exist for qemu and would new functionality be welcomed?
> >
> > So what paths can actually cause this to fail?
>
> If you are asking about adopting something like the failmalloc
> should_fail() strategy in qemu, you'd essentially open code a call to
> a should_fail() and in it pass the arguments you want from your
> own call down. If you want to ignore size you can just pass 0
> for instance.
>
> > Looking at the upstream
> > code in init_hdm_decoder() looks like there are only a few things that
> > are checked.
>
> If you mean in Linux, you would open code a should_fail()
> specific to the area as in this commit old commit example, and
> adding a respective kconfig entry for it:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8a8b6502fb669c3a0638a08955442814cedc86b1
>
> Eech of these knobs then get its own probability, times, and space
> debugfs entries which let the routine should_fail() fail when the
> parameters set meet the criteria set by debugfs.
>
> There are ways to make this much more scalable though, but I had not
> seen many efforts to do so. I did start such an approach using debugfs
> specific to *one* kconfig entry, for instance see this block layer proposed
> change, which would in turn enable tons of different ways to enable failing
> if CONFIG_FAIL_ADD_DISK would be used:
>
> https://lore.kernel.org/linux-block/20210512064629.13899-9-mcgrof@kernel.org/
>
> However, at the recent discussion at LSFMM for this we decided instead
> to just sprinkle ALLOW_ERROR_INJECTION() after each routine. Otherwise
> you are open coding tons of new "should_fail()" calls in your runtime
> path and that can make it hard to review patches and is just a lot of
> noise in code.
>
> But with CONFIG_FAIL_FUNCTION this means you don't have to open code
> should_fail() calls, but instead for each routine you want to add a failure
> injection support you'd just use ALLOW_ERROR_INJECTION() per call.

So cxl_test takes the opposite approach and tries not to pollute the
production code with test instrumentation. All of the infrastructure
to replace calls and inject mocked values is self contained in
tools/testing/cxl/ where it builds replacement modules with test
instrumentation. Otherwise its a maintenance burden, in my view, to
read the error injection macros in the nominal code paths.

> Read Documentation/fault-injection/fault-injection.rst on
> fail_function/injectable and fail_function/<function-name>/retval,
> so we could do for instance, to avoid a namespace clash I just
> added the cxl_ prefix:

Certainly those would be good to use behind the mocked interfaces.

Luis Chamberlain May 13, 2022, 7:31 p.m. UTC | #9

On Fri, May 13, 2022 at 12:14:51PM -0700, Dan Williams wrote:
> On Fri, May 13, 2022 at 8:12 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > But with CONFIG_FAIL_FUNCTION this means you don't have to open code
> > should_fail() calls, but instead for each routine you want to add a failure
> > injection support you'd just use ALLOW_ERROR_INJECTION() per call.
> 
> So cxl_test takes the opposite approach and tries not to pollute the
> production code with test instrumentation. All of the infrastructure
> to replace calls and inject mocked values is self contained in
> tools/testing/cxl/ where it builds replacement modules with test
> instrumentation. Otherwise its a maintenance burden, in my view, to
> read the error injection macros in the nominal code paths.

Is relying on just ALLOW_ERROR_INJECTION() per routine you'd want
to enable error injection for really too much to swallow?

  Luis

Dan Williams May 19, 2022, 5:09 a.m. UTC | #10

On Fri, May 13, 2022 at 12:32 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Fri, May 13, 2022 at 12:14:51PM -0700, Dan Williams wrote:
> > On Fri, May 13, 2022 at 8:12 AM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > > But with CONFIG_FAIL_FUNCTION this means you don't have to open code
> > > should_fail() calls, but instead for each routine you want to add a failure
> > > injection support you'd just use ALLOW_ERROR_INJECTION() per call.
> >
> > So cxl_test takes the opposite approach and tries not to pollute the
> > production code with test instrumentation. All of the infrastructure
> > to replace calls and inject mocked values is self contained in
> > tools/testing/cxl/ where it builds replacement modules with test
> > instrumentation. Otherwise its a maintenance burden, in my view, to
> > read the error injection macros in the nominal code paths.
>
> Is relying on just ALLOW_ERROR_INJECTION() per routine you'd want
> to enable error injection for really too much to swallow?

Inline? To me, yes. However, it seems the perfect thing to hide
out-of-line in a mocked call injected from tools/testing/.

[RFC,02/15] cxl/core/hdm: Bail on endpoint init fail

Commit Message

Comments

Patch