diff mbox

[PATCHv2] of: Add generic handling for ePAPR 1.1 fail-sss states

Message ID 20160829223542.18871-1-tony@atomide.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tony Lindgren Aug. 29, 2016, 10:35 p.m. UTC
We have devices that are in incomplete state, but still need to be
probed to allow properly idling them for PM. Some examples are
devices that are not pinned out on certain packages, or otherwise
unusable on some SoCs.

Setting status = "disabled" cannot be used for this case. Setting
"disabled" makes Linux ignore these devices so they are not probed
and can never get idled properly by Linux PM runtime.

The proper way deal with the incomplete devices is to probe them,
then allow the driver to check the state, and then disable or idle
the device using PM runtime. To do this, we need to often enable
the device and run some device specific code to idle the device.

Also boards may have needs to disable and idle unused devices.
This may be needed for example if all resources are not wired
for a certain board variant.

It seems we can use the ePAPR 1.1 status fail-sss to do this.
Quoting "Table 2-4 Values for status property" we have "fail-sss":

"Indicates that the device is not operational. A serious error was
 detected in the device and it is unlikely to become operational
 without repair. The sss portion of the value is specific to the
 device and indicates the error condition detected."

We can handle these fail states can be handled in a generic
way. So let's introduce a generic status = "fail-" that
describes a situation where a device driver probe should just
shut down or idle the device if possible and then bail out.
This allows the SoC variant and board specific dts files to set
the status as needed.

The suggested usage in a device driver probe is:

static int foo_probe(struct platform_device *pdev)
{
	int err;
	const char *status;
	...

        pm_runtime_enable(&pdev->dev);
        pm_runtime_get_sync(&pdev->dev);
        pm_runtime_use_autosuspend(&pdev->dev);
	...

	/* Configure device, load firmware, idle device */
	...

	if (of_device_is_incomplete(pdev->dev.of_node, status)) {
		if (!strcmp("hw-incomplete-pins", status)) {
			dev_info(&pdev->dev,
				 "Unusable hardware: Not pinned out\n");
			err = -ENODEV;
			goto out;
		}
		if (!strcmp("hw-missing-daughter-card")) {
			err = -EPROBE_DEFER;
			goto out;
		}
		if (!strcmp("hw-buggy-dma")) {
			dev_warn(&pdev->dev,
				 "Replace hardware for working DMA\n");
		}
	}

	/* Continue normal device probe */
	...

	return 0;

out:
	pm_runtime_dont_use_autosuspend(&pdev->dev);
	pm_runtime_put_sync(&pdev->dev);
	pm_runtime_disable(&pdev->dev);

	return err;
}

Cc: Nishanth Menon <nm@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Cc: Tom Rini <trini@konsulko.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---
Changes since v1 ([PATCH] of: Add generic handling for hardware incomplete
fail state):

- Make more generic as suggested by Frank but stick with
  "operational status of a device" approch most people seem
  to prefer that

- Pass the failure status back to caller so the caller may
  attempt to handle the failure status

- Improved the description with more examples

 drivers/of/base.c  | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/of.h |  8 +++++++
 2 files changed, 69 insertions(+), 3 deletions(-)

Comments

Rob Herring Aug. 30, 2016, 12:23 a.m. UTC | #1
On Mon, Aug 29, 2016 at 5:35 PM, Tony Lindgren <tony@atomide.com> wrote:
> We have devices that are in incomplete state, but still need to be
> probed to allow properly idling them for PM. Some examples are
> devices that are not pinned out on certain packages, or otherwise
> unusable on some SoCs.
>
> Setting status = "disabled" cannot be used for this case. Setting
> "disabled" makes Linux ignore these devices so they are not probed
> and can never get idled properly by Linux PM runtime.
>
> The proper way deal with the incomplete devices is to probe them,
> then allow the driver to check the state, and then disable or idle
> the device using PM runtime. To do this, we need to often enable
> the device and run some device specific code to idle the device.
>
> Also boards may have needs to disable and idle unused devices.
> This may be needed for example if all resources are not wired
> for a certain board variant.
>
> It seems we can use the ePAPR 1.1 status fail-sss to do this.
> Quoting "Table 2-4 Values for status property" we have "fail-sss":
>
> "Indicates that the device is not operational. A serious error was
>  detected in the device and it is unlikely to become operational
>  without repair. The sss portion of the value is specific to the
>  device and indicates the error condition detected."

I had read this as 'sss' is just 3 characters, but I guess that doesn't matter.

> We can handle these fail states can be handled in a generic
> way. So let's introduce a generic status = "fail-" that
> describes a situation where a device driver probe should just
> shut down or idle the device if possible and then bail out.
> This allows the SoC variant and board specific dts files to set
> the status as needed.
>
> The suggested usage in a device driver probe is:
>
> static int foo_probe(struct platform_device *pdev)
> {
>         int err;
>         const char *status;
>         ...
>
>         pm_runtime_enable(&pdev->dev);
>         pm_runtime_get_sync(&pdev->dev);
>         pm_runtime_use_autosuspend(&pdev->dev);
>         ...
>
>         /* Configure device, load firmware, idle device */
>         ...
>
>         if (of_device_is_incomplete(pdev->dev.of_node, status)) {

&status

>                 if (!strcmp("hw-incomplete-pins", status)) {
>                         dev_info(&pdev->dev,
>                                  "Unusable hardware: Not pinned out\n");
>                         err = -ENODEV;
>                         goto out;
>                 }
>                 if (!strcmp("hw-missing-daughter-card")) {
>                         err = -EPROBE_DEFER;

This implies we're going to change this on the fly? I guess
disabled->okay can already happen.

>                         goto out;
>                 }
>                 if (!strcmp("hw-buggy-dma")) {
>                         dev_warn(&pdev->dev,
>                                  "Replace hardware for working DMA\n");
>                 }
>         }
>
>         /* Continue normal device probe */
>         ...
>
>         return 0;
>
> out:
>         pm_runtime_dont_use_autosuspend(&pdev->dev);
>         pm_runtime_put_sync(&pdev->dev);
>         pm_runtime_disable(&pdev->dev);
>
>         return err;
> }
>
> Cc: Nishanth Menon <nm@ti.com>
> Cc: Tero Kristo <t-kristo@ti.com>
> Cc: Tom Rini <trini@konsulko.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
> Changes since v1 ([PATCH] of: Add generic handling for hardware incomplete
> fail state):
>
> - Make more generic as suggested by Frank but stick with
>   "operational status of a device" approch most people seem
>   to prefer that
>
> - Pass the failure status back to caller so the caller may
>   attempt to handle the failure status
>
> - Improved the description with more examples
>
>  drivers/of/base.c  | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++---
>  include/linux/of.h |  8 +++++++
>  2 files changed, 69 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index ff37f6d..4d39857 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -550,8 +550,8 @@ EXPORT_SYMBOL(of_machine_is_compatible);
>   *
>   *  @device: Node to check for availability, with locks already held
>   *
> - *  Returns true if the status property is absent or set to "okay" or "ok",
> - *  false otherwise
> + *  Returns true if the status property is absent or set to "okay", "ok",
> + *  or starting with "fail-", false otherwise
>   */
>  static bool __of_device_is_available(const struct device_node *device)
>  {
> @@ -566,7 +566,8 @@ static bool __of_device_is_available(const struct device_node *device)
>                 return true;
>
>         if (statlen > 0) {
> -               if (!strcmp(status, "okay") || !strcmp(status, "ok"))
> +               if (!strcmp(status, "okay") || !strcmp(status, "ok") ||
> +                   !strncmp(status, "fail-", 5))
>                         return true;
>         }
>
> @@ -595,6 +596,63 @@ bool of_device_is_available(const struct device_node *device)
>  EXPORT_SYMBOL(of_device_is_available);
>
>  /**
> + *  __of_device_is_incomplete - check if a device is incomplete
> + *
> + *  @device: Node to check for partial failure with locks already held
> + *  @status: Status string for optional handling of the fail-sss state
> + */
> +static bool __of_device_is_incomplete(const struct device_node *device,
> +                                     const char **status)
> +{
> +       const char *s, *m = "fail-";
> +       int slen, mlen;
> +
> +       if (!device)
> +               return false;
> +
> +       s = __of_get_property(device, "status", &slen);

You can use the string helper function here (or is the lock a problem?).

Overall, seems okay to me.

> +       if (!s)
> +               return false;
> +
> +       mlen = strlen(m);
> +       if (slen <= mlen)
> +               return false;
> +
> +       if (strncmp("fail-", s, mlen))
> +               return false;
> +
> +       *status = s + mlen;
> +
> +       return true;
> +}
> +
> +/**
> + *  of_device_is_incomplete - check if a device is incomplete
> + *
> + *  @device: Node to check for partial failure
> + *  @status: Status string for optional handling of the fail-sss state
> + *
> + *  Returns true if the status property is set to "fail-sss",
> + *  false otherwise. Fore more information, see fail-sss in ePAPR 1.1
> + *  "Table 2-4 Values for status property".
> + *
> + *  The caller can optionally try to handle the error based on the
> + *  status string.
> + */
> +bool of_device_is_incomplete(const struct device_node *device,
> +                            const char **status)
> +{
> +       unsigned long flags;
> +       bool res;
> +
> +       raw_spin_lock_irqsave(&devtree_lock, flags);
> +       res = __of_device_is_incomplete(device, status);
> +       raw_spin_unlock_irqrestore(&devtree_lock, flags);
> +       return res;
> +}
> +EXPORT_SYMBOL(of_device_is_incomplete);
> +
> +/**
>   *  of_device_is_big_endian - check if a device has BE registers
>   *
>   *  @device: Node to check for endianness
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 3d9ff8e..b593936 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -320,6 +320,8 @@ extern int of_device_is_compatible(const struct device_node *device,
>  extern int of_device_compatible_match(struct device_node *device,
>                                       const char *const *compat);
>  extern bool of_device_is_available(const struct device_node *device);
> +extern bool of_device_is_incomplete(const struct device_node *device,
> +                                   const char **status);
>  extern bool of_device_is_big_endian(const struct device_node *device);
>  extern const void *of_get_property(const struct device_node *node,
>                                 const char *name,
> @@ -504,6 +506,12 @@ static inline bool of_device_is_available(const struct device_node *device)
>         return false;
>  }
>
> +static inline bool of_device_is_incomplete(const struct device_node *device,
> +                                          const char **status)
> +{
> +       return false;
> +}
> +
>  static inline bool of_device_is_big_endian(const struct device_node *device)
>  {
>         return false;
> --
> 2.9.3
>
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tony Lindgren Aug. 30, 2016, 12:39 a.m. UTC | #2
* Rob Herring <robh+dt@kernel.org> [160829 17:24]:
> On Mon, Aug 29, 2016 at 5:35 PM, Tony Lindgren <tony@atomide.com> wrote:
> > It seems we can use the ePAPR 1.1 status fail-sss to do this.
> > Quoting "Table 2-4 Values for status property" we have "fail-sss":
> >
> > "Indicates that the device is not operational. A serious error was
> >  detected in the device and it is unlikely to become operational
> >  without repair. The sss portion of the value is specific to the
> >  device and indicates the error condition detected."
> 
> I had read this as 'sss' is just 3 characters, but I guess that doesn't matter.

Yeah I'd assume it does not matter for the length.

> > We can handle these fail states can be handled in a generic
> > way. So let's introduce a generic status = "fail-" that
> > describes a situation where a device driver probe should just
> > shut down or idle the device if possible and then bail out.
> > This allows the SoC variant and board specific dts files to set
> > the status as needed.
> >
> > The suggested usage in a device driver probe is:
> >
> > static int foo_probe(struct platform_device *pdev)
> > {
> >         int err;
> >         const char *status;
> >         ...
> >
> >         pm_runtime_enable(&pdev->dev);
> >         pm_runtime_get_sync(&pdev->dev);
> >         pm_runtime_use_autosuspend(&pdev->dev);
> >         ...
> >
> >         /* Configure device, load firmware, idle device */
> >         ...
> >
> >         if (of_device_is_incomplete(pdev->dev.of_node, status)) {
> 
> &status

Oops, I guess I should compile test the example.

> >                 if (!strcmp("hw-incomplete-pins", status)) {
> >                         dev_info(&pdev->dev,
> >                                  "Unusable hardware: Not pinned out\n");
> >                         err = -ENODEV;
> >                         goto out;
> >                 }
> >                 if (!strcmp("hw-missing-daughter-card")) {
> >                         err = -EPROBE_DEFER;
> 
> This implies we're going to change this on the fly? I guess
> disabled->okay can already happen.

Well let's assume the bootloader sets some i2c controlled daughter
board with "fail-hw-missing-daughter-card", then in theory kernel
could probe it if it pops up on the i2c bus later on. Not sure if
we want to do this, but it seems we could..

> > +static bool __of_device_is_incomplete(const struct device_node *device,
> > +                                     const char **status)
> > +{
> > +       const char *s, *m = "fail-";
> > +       int slen, mlen;
> > +
> > +       if (!device)
> > +               return false;
> > +
> > +       s = __of_get_property(device, "status", &slen);
> 
> You can use the string helper function here (or is the lock a problem?).

I'll check.

> Overall, seems okay to me.

OK thanks for looking.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frank Rowand Aug. 31, 2016, 8:50 p.m. UTC | #3
Hi Tony,

On 08/29/16 15:35, Tony Lindgren wrote:
> We have devices that are in incomplete state, but still need to be
> probed to allow properly idling them for PM. Some examples are
> devices that are not pinned out on certain packages, or otherwise
> unusable on some SoCs.
> 
> Setting status = "disabled" cannot be used for this case. Setting
> "disabled" makes Linux ignore these devices so they are not probed
> and can never get idled properly by Linux PM runtime.
> 
> The proper way deal with the incomplete devices is to probe them,
> then allow the driver to check the state, and then disable or idle
> the device using PM runtime. To do this, we need to often enable
> the device and run some device specific code to idle the device.
> 
> Also boards may have needs to disable and idle unused devices.
> This may be needed for example if all resources are not wired
> for a certain board variant.
> 
> It seems we can use the ePAPR 1.1 status fail-sss to do this.
> Quoting "Table 2-4 Values for status property" we have "fail-sss":
> 
> "Indicates that the device is not operational. A serious error was
>  detected in the device and it is unlikely to become operational
>  without repair. The sss portion of the value is specific to the
>  device and indicates the error condition detected."
> 
> We can handle these fail states can be handled in a generic
> way. So let's introduce a generic status = "fail-" that
> describes a situation where a device driver probe should just
> shut down or idle the device if possible and then bail out.
> This allows the SoC variant and board specific dts files to set
> the status as needed.
> 
> The suggested usage in a device driver probe is:
> 
> static int foo_probe(struct platform_device *pdev)
> {
> 	int err;
> 	const char *status;
> 	...
> 
>         pm_runtime_enable(&pdev->dev);
>         pm_runtime_get_sync(&pdev->dev);
>         pm_runtime_use_autosuspend(&pdev->dev);
> 	...
> 
> 	/* Configure device, load firmware, idle device */
> 	...
> 
> 	if (of_device_is_incomplete(pdev->dev.of_node, status)) {
> 		if (!strcmp("hw-incomplete-pins", status)) {
> 			dev_info(&pdev->dev,
> 				 "Unusable hardware: Not pinned out\n");
> 			err = -ENODEV;
> 			goto out;
> 		}
> 		if (!strcmp("hw-missing-daughter-card")) {
> 			err = -EPROBE_DEFER;
> 			goto out;
> 		}
> 		if (!strcmp("hw-buggy-dma")) {
> 			dev_warn(&pdev->dev,
> 				 "Replace hardware for working DMA\n");
> 		}
> 	}

What if the device has two issues to be reported?  You can not
specify two different values for the status property.

What if the firmware wants to report that the hardware failed
self-test (thus status = "fail-sss") but is already using
status to describe the hardware?


> 
> 	/* Continue normal device probe */
> 	...
> 
> 	return 0;
> 
> out:
> 	pm_runtime_dont_use_autosuspend(&pdev->dev);
> 	pm_runtime_put_sync(&pdev->dev);
> 	pm_runtime_disable(&pdev->dev);
> 
> 	return err;
> }
> 
> Cc: Nishanth Menon <nm@ti.com>
> Cc: Tero Kristo <t-kristo@ti.com>
> Cc: Tom Rini <trini@konsulko.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
> Changes since v1 ([PATCH] of: Add generic handling for hardware incomplete
> fail state):
> 
> - Make more generic as suggested by Frank but stick with
>   "operational status of a device" approch most people seem
>   to prefer that

I am still opposed to using the status property for this purpose.

The status property is intended to report an operational problem with
a device or a device that the kernel can cause to be operational (such
as a quiescent cpu being enabled).  It is the only property I am aware
of to report _state_.

It is unfortunate that Linux has adopted the practice of overloading status
to determine whether a piece of hardware exists or does not exist.  This
is extremely useful for the way we structure the .dts and .dtsi files but
should have used a new property name.  We are stuck with that choice of
using the status property for two purposes, first the state of a device,
and secondly the hardware description of existing or not existing.

Why not just create a new property that describes the hardware?
Perhaps something like:

   incomplete = "pins_output", "buggy_dma";


> - Pass the failure status back to caller so the caller may
>   attempt to handle the failure status
> 
> - Improved the description with more examples
> 
>  drivers/of/base.c  | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++---
>  include/linux/of.h |  8 +++++++
>  2 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index ff37f6d..4d39857 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -550,8 +550,8 @@ EXPORT_SYMBOL(of_machine_is_compatible);
>   *
>   *  @device: Node to check for availability, with locks already held
>   *
> - *  Returns true if the status property is absent or set to "okay" or "ok",
> - *  false otherwise
> + *  Returns true if the status property is absent or set to "okay", "ok",
> + *  or starting with "fail-", false otherwise
>   */
>  static bool __of_device_is_available(const struct device_node *device)
>  {
> @@ -566,7 +566,8 @@ static bool __of_device_is_available(const struct device_node *device)
>  		return true;
>  
>  	if (statlen > 0) {
> -		if (!strcmp(status, "okay") || !strcmp(status, "ok"))
> +		if (!strcmp(status, "okay") || !strcmp(status, "ok") ||
> +		    !strncmp(status, "fail-", 5))
>  			return true;
>  	}
>  
> @@ -595,6 +596,63 @@ bool of_device_is_available(const struct device_node *device)
>  EXPORT_SYMBOL(of_device_is_available);
>  
>  /**
> + *  __of_device_is_incomplete - check if a device is incomplete

It is not checking if a device is incomplete.  It is checking whether the
device is operational _or_ incomplete.

This is conflating concepts and likely to be confusing.  This is the problem
with overloading the status property for yet another purpose.


> + *
> + *  @device: Node to check for partial failure with locks already held
> + *  @status: Status string for optional handling of the fail-sss state
> + */
> +static bool __of_device_is_incomplete(const struct device_node *device,
> +				      const char **status)
> +{
> +	const char *s, *m = "fail-";
> +	int slen, mlen;
> +
> +	if (!device)
> +		return false;
> +
> +	s = __of_get_property(device, "status", &slen);
> +	if (!s)
> +		return false;
> +
> +	mlen = strlen(m);
> +	if (slen <= mlen)
> +		return false;
> +
> +	if (strncmp("fail-", s, mlen))
> +		return false;
> +
> +	*status = s + mlen;
> +
> +	return true;
> +}
> +
> +/**
> + *  of_device_is_incomplete - check if a device is incomplete
> + *
> + *  @device: Node to check for partial failure
> + *  @status: Status string for optional handling of the fail-sss state
> + *
> + *  Returns true if the status property is set to "fail-sss",
> + *  false otherwise. Fore more information, see fail-sss in ePAPR 1.1
> + *  "Table 2-4 Values for status property".
> + *
> + *  The caller can optionally try to handle the error based on the
> + *  status string.
> + */
> +bool of_device_is_incomplete(const struct device_node *device,
> +			     const char **status)
> +{
> +	unsigned long flags;
> +	bool res;
> +
> +	raw_spin_lock_irqsave(&devtree_lock, flags);
> +	res = __of_device_is_incomplete(device, status);
> +	raw_spin_unlock_irqrestore(&devtree_lock, flags);
> +	return res;
> +}
> +EXPORT_SYMBOL(of_device_is_incomplete);
> +
> +/**
>   *  of_device_is_big_endian - check if a device has BE registers
>   *
>   *  @device: Node to check for endianness
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 3d9ff8e..b593936 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -320,6 +320,8 @@ extern int of_device_is_compatible(const struct device_node *device,
>  extern int of_device_compatible_match(struct device_node *device,
>  				      const char *const *compat);
>  extern bool of_device_is_available(const struct device_node *device);
> +extern bool of_device_is_incomplete(const struct device_node *device,
> +				    const char **status);
>  extern bool of_device_is_big_endian(const struct device_node *device);
>  extern const void *of_get_property(const struct device_node *node,
>  				const char *name,
> @@ -504,6 +506,12 @@ static inline bool of_device_is_available(const struct device_node *device)
>  	return false;
>  }
>  
> +static inline bool of_device_is_incomplete(const struct device_node *device,
> +					   const char **status)
> +{
> +	return false;
> +}
> +
>  static inline bool of_device_is_big_endian(const struct device_node *device)
>  {
>  	return false;
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tony Lindgren Aug. 31, 2016, 9:41 p.m. UTC | #4
* Frank Rowand <frowand.list@gmail.com> [160831 13:51]:
> On 08/29/16 15:35, Tony Lindgren wrote:
> > 	if (of_device_is_incomplete(pdev->dev.of_node, status)) {
> > 		if (!strcmp("hw-incomplete-pins", status)) {
> > 			dev_info(&pdev->dev,
> > 				 "Unusable hardware: Not pinned out\n");
> > 			err = -ENODEV;
> > 			goto out;
> > 		}
> > 		if (!strcmp("hw-missing-daughter-card")) {
> > 			err = -EPROBE_DEFER;
> > 			goto out;
> > 		}
> > 		if (!strcmp("hw-buggy-dma")) {
> > 			dev_warn(&pdev->dev,
> > 				 "Replace hardware for working DMA\n");
> > 		}
> > 	}
> 
> What if the device has two issues to be reported?  You can not
> specify two different values for the status property.

That's a good point.

> What if the firmware wants to report that the hardware failed
> self-test (thus status = "fail-sss") but is already using
> status to describe the hardware?

Yeah that's true. Do you know what the "sss" stands for here?
Status Self teSt, or Side Scan Sonar? :)

> > - Make more generic as suggested by Frank but stick with
> >   "operational status of a device" approch most people seem
> >   to prefer that
> 
> I am still opposed to using the status property for this purpose.
> 
> The status property is intended to report an operational problem with
> a device or a device that the kernel can cause to be operational (such
> as a quiescent cpu being enabled).  It is the only property I am aware
> of to report _state_.
> 
> It is unfortunate that Linux has adopted the practice of overloading status
> to determine whether a piece of hardware exists or does not exist.  This
> is extremely useful for the way we structure the .dts and .dtsi files but
> should have used a new property name.  We are stuck with that choice of
> using the status property for two purposes, first the state of a device,
> and secondly the hardware description of existing or not existing.
> 
> Why not just create a new property that describes the hardware?
> Perhaps something like:
> 
>    incomplete = "pins_output", "buggy_dma";

New property for incomplete works for me. Rob, got any comments here?

> > + *  __of_device_is_incomplete - check if a device is incomplete
> 
> It is not checking if a device is incomplete.  It is checking whether the
> device is operational _or_ incomplete.
> 
> This is conflating concepts and likely to be confusing.  This is the problem
> with overloading the status property for yet another purpose.

Sure that's a valid point.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Herring Sept. 8, 2016, 1:38 p.m. UTC | #5
On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren <tony@atomide.com> wrote:
> * Frank Rowand <frowand.list@gmail.com> [160831 13:51]:
>> On 08/29/16 15:35, Tony Lindgren wrote:
>> >     if (of_device_is_incomplete(pdev->dev.of_node, status)) {
>> >             if (!strcmp("hw-incomplete-pins", status)) {
>> >                     dev_info(&pdev->dev,
>> >                              "Unusable hardware: Not pinned out\n");
>> >                     err = -ENODEV;
>> >                     goto out;
>> >             }
>> >             if (!strcmp("hw-missing-daughter-card")) {
>> >                     err = -EPROBE_DEFER;
>> >                     goto out;
>> >             }
>> >             if (!strcmp("hw-buggy-dma")) {
>> >                     dev_warn(&pdev->dev,
>> >                              "Replace hardware for working DMA\n");
>> >             }
>> >     }
>>
>> What if the device has two issues to be reported?  You can not
>> specify two different values for the status property.
>
> That's a good point.
>
>> What if the firmware wants to report that the hardware failed
>> self-test (thus status = "fail-sss") but is already using
>> status to describe the hardware?
>
> Yeah that's true. Do you know what the "sss" stands for here?
> Status Self teSt, or Side Scan Sonar? :)

String String String!!!?

No clue for me.

>
>> > - Make more generic as suggested by Frank but stick with
>> >   "operational status of a device" approch most people seem
>> >   to prefer that
>>
>> I am still opposed to using the status property for this purpose.
>>
>> The status property is intended to report an operational problem with
>> a device or a device that the kernel can cause to be operational (such
>> as a quiescent cpu being enabled).  It is the only property I am aware
>> of to report _state_.

Yes, in theory a device can go from disabled to okay, but that's
generally never been supported. Linux takes the simple approach of
"disabled" means ignore it. I think we'll see that change with
overlays.

>> It is unfortunate that Linux has adopted the practice of overloading status
>> to determine whether a piece of hardware exists or does not exist.  This
>> is extremely useful for the way we structure the .dts and .dtsi files but
>> should have used a new property name.  We are stuck with that choice of
>> using the status property for two purposes, first the state of a device,
>> and secondly the hardware description of existing or not existing.

I don't agree. Generally, disabled means the h/w is there, but don't
use it. There may be some cases where the hardware doesn't exist for
the convenience of having a single dts, but that's the exception.

>> Why not just create a new property that describes the hardware?
>> Perhaps something like:
>>
>>    incomplete = "pins_output", "buggy_dma";
>
> New property for incomplete works for me. Rob, got any comments here?

Pins not muxed out or connected on the board has to be the #1 reason
for disabled status. I don't think we need or want another way to
express that.

We may have discussed this, but why can't the driver that checks fail
state just check whatever was used to set the device to fail in the
first place?

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Sept. 8, 2016, 2:20 p.m. UTC | #6
On 09/08/2016 08:38 AM, Rob Herring wrote:
> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren <tony@atomide.com> wrote:
[...]
>>> It is unfortunate that Linux has adopted the practice of overloading status
>>> to determine whether a piece of hardware exists or does not exist.  This
>>> is extremely useful for the way we structure the .dts and .dtsi files but
>>> should have used a new property name.  We are stuck with that choice of
>>> using the status property for two purposes, first the state of a device,
>>> and secondly the hardware description of existing or not existing.
>
> I don't agree. Generally, disabled means the h/w is there, but don't
> use it. There may be some cases where the hardware doesn't exist for
> the convenience of having a single dts, but that's the exception.
>

Minor point here: when SoCs are manufactured, even though the silicon 
die may have a hardware block, it is completely efused out based on 
paper spin. in effect such a hardware block "does not exist". The 
number of such paper spins are not exception cases, but rather 
standard for SoC vendors - maintaining dts per paper spin is just too 
impossible to maintain (DRA7 as an example maintains a single 
dra7.dtsi as the root for all paper spins.. the variations if 
maintained as seperate dts might infact end up being larger in number 
than all the dts we have in arch/arm/boot/dts) - typically as an soc 
vendor pushes a specific SoC to multiple markets, this tends to be a norm.
Tony Lindgren Sept. 8, 2016, 3:58 p.m. UTC | #7
* Rob Herring <robh+dt@kernel.org> [160908 06:38]:
> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren <tony@atomide.com> wrote:
> > * Frank Rowand <frowand.list@gmail.com> [160831 13:51]:
> >> I am still opposed to using the status property for this purpose.
> >>
> >> The status property is intended to report an operational problem with
> >> a device or a device that the kernel can cause to be operational (such
> >> as a quiescent cpu being enabled).  It is the only property I am aware
> >> of to report _state_.
> 
> Yes, in theory a device can go from disabled to okay, but that's
> generally never been supported. Linux takes the simple approach of
> "disabled" means ignore it. I think we'll see that change with
> overlays.

Yeah I think we have to assume that.

> >> It is unfortunate that Linux has adopted the practice of overloading status
> >> to determine whether a piece of hardware exists or does not exist.  This
> >> is extremely useful for the way we structure the .dts and .dtsi files but
> >> should have used a new property name.  We are stuck with that choice of
> >> using the status property for two purposes, first the state of a device,
> >> and secondly the hardware description of existing or not existing.
> 
> I don't agree. Generally, disabled means the h/w is there, but don't
> use it. There may be some cases where the hardware doesn't exist for
> the convenience of having a single dts, but that's the exception.
> 
> >> Why not just create a new property that describes the hardware?
> >> Perhaps something like:
> >>
> >>    incomplete = "pins_output", "buggy_dma";
> >
> > New property for incomplete works for me. Rob, got any comments here?
> 
> Pins not muxed out or connected on the board has to be the #1 reason
> for disabled status. I don't think we need or want another way to
> express that.

Both status and and a separate property work for me.

If no other considerations, we should probably pick something with a
a limited set of states to avoid it getting out of control and being
misused for something weird like driver probe order..

For example, just status = "fail" would be enough for the cases I've
seen. That would still allow probe the device, then PM runtime idle
it and bail out with -ENODEV.

For whatever warnings or errors the driver needs to show, the driver
could probably figure it out. I don't know if we want to or need to
pass any informational messages with the incomplete status or
property :)

> We may have discussed this, but why can't the driver that checks fail
> state just check whatever was used to set the device to fail in the
> first place?

Well there may be no way to check if something is pinned out based on
the hardware. The same SoC can be packaged with different pins. In that
case only the die id or serial number for each produced chip is
different, not the revision numbers. So for cases like that the dtb
file or kernel cmdline is the only information available. And we can't
assume pinctrl is required as it's perfectly fine to do that only in
the bootloader for the static cases to save memory.

Just to consider other ways of doing it, we could use the compatible
flag to tag devices that need to be just idled on probe, but that does
not seem like generic solution to me.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frank Rowand Sept. 8, 2016, 7:05 p.m. UTC | #8
On 09/08/16 06:38, Rob Herring wrote:
> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren <tony@atomide.com> wrote:
>> * Frank Rowand <frowand.list@gmail.com> [160831 13:51]:
>>> On 08/29/16 15:35, Tony Lindgren wrote:
>>>>     if (of_device_is_incomplete(pdev->dev.of_node, status)) {
>>>>             if (!strcmp("hw-incomplete-pins", status)) {
>>>>                     dev_info(&pdev->dev,
>>>>                              "Unusable hardware: Not pinned out\n");
>>>>                     err = -ENODEV;
>>>>                     goto out;
>>>>             }
>>>>             if (!strcmp("hw-missing-daughter-card")) {
>>>>                     err = -EPROBE_DEFER;
>>>>                     goto out;
>>>>             }
>>>>             if (!strcmp("hw-buggy-dma")) {
>>>>                     dev_warn(&pdev->dev,
>>>>                              "Replace hardware for working DMA\n");
>>>>             }
>>>>     }
>>>
>>> What if the device has two issues to be reported?  You can not
>>> specify two different values for the status property.
>>
>> That's a good point.
>>
>>> What if the firmware wants to report that the hardware failed
>>> self-test (thus status = "fail-sss") but is already using
>>> status to describe the hardware?
>>
>> Yeah that's true. Do you know what the "sss" stands for here?
>> Status Self teSt, or Side Scan Sonar? :)
> 
> String String String!!!?
> 
> No clue for me.
> 
>>
>>>> - Make more generic as suggested by Frank but stick with
>>>>   "operational status of a device" approch most people seem
>>>>   to prefer that
>>>
>>> I am still opposed to using the status property for this purpose.
>>>
>>> The status property is intended to report an operational problem with
>>> a device or a device that the kernel can cause to be operational (such
>>> as a quiescent cpu being enabled).  It is the only property I am aware
>>> of to report _state_.
> 
> Yes, in theory a device can go from disabled to okay, but that's
> generally never been supported. Linux takes the simple approach of
> "disabled" means ignore it. I think we'll see that change with
> overlays.
> 
>>> It is unfortunate that Linux has adopted the practice of overloading status
>>> to determine whether a piece of hardware exists or does not exist.  This
>>> is extremely useful for the way we structure the .dts and .dtsi files but
>>> should have used a new property name.  We are stuck with that choice of
>>> using the status property for two purposes, first the state of a device,
>>> and secondly the hardware description of existing or not existing.
> 
> I don't agree. Generally, disabled means the h/w is there, but don't
> use it. There may be some cases where the hardware doesn't exist for
> the convenience of having a single dts, but that's the exception.

That it is not an exception, but instead a frequent pattern for .dtsi files.
A quick look in arm:

  $ grep status *.dtsi | wc -l
  4842

  $ grep status *.dtsi | grep '"disabled"' | wc -l
  3431


>>> Why not just create a new property that describes the hardware?
>>> Perhaps something like:
>>>
>>>    incomplete = "pins_output", "buggy_dma";
>>
>> New property for incomplete works for me. Rob, got any comments here?
> 
> Pins not muxed out or connected on the board has to be the #1 reason
> for disabled status. I don't think we need or want another way to
> express that.

How is that expressed now?


> We may have discussed this, but why can't the driver that checks fail
> state just check whatever was used to set the device to fail in the
> first place?
> 
> Rob
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frank Rowand Sept. 8, 2016, 7:09 p.m. UTC | #9
On 09/08/16 08:58, Tony Lindgren wrote:
> * Rob Herring <robh+dt@kernel.org> [160908 06:38]:
>> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren <tony@atomide.com> wrote:
>>> * Frank Rowand <frowand.list@gmail.com> [160831 13:51]:
>>>> I am still opposed to using the status property for this purpose.
>>>>
>>>> The status property is intended to report an operational problem with
>>>> a device or a device that the kernel can cause to be operational (such
>>>> as a quiescent cpu being enabled).  It is the only property I am aware
>>>> of to report _state_.
>>
>> Yes, in theory a device can go from disabled to okay, but that's
>> generally never been supported. Linux takes the simple approach of
>> "disabled" means ignore it. I think we'll see that change with
>> overlays.
> 
> Yeah I think we have to assume that.
> 
>>>> It is unfortunate that Linux has adopted the practice of overloading status
>>>> to determine whether a piece of hardware exists or does not exist.  This
>>>> is extremely useful for the way we structure the .dts and .dtsi files but
>>>> should have used a new property name.  We are stuck with that choice of
>>>> using the status property for two purposes, first the state of a device,
>>>> and secondly the hardware description of existing or not existing.
>>
>> I don't agree. Generally, disabled means the h/w is there, but don't
>> use it. There may be some cases where the hardware doesn't exist for
>> the convenience of having a single dts, but that's the exception.
>>
>>>> Why not just create a new property that describes the hardware?
>>>> Perhaps something like:
>>>>
>>>>    incomplete = "pins_output", "buggy_dma";
>>>
>>> New property for incomplete works for me. Rob, got any comments here?
>>
>> Pins not muxed out or connected on the board has to be the #1 reason
>> for disabled status. I don't think we need or want another way to
>> express that.
> 
> Both status and and a separate property work for me.
> 
> If no other considerations, we should probably pick something with a
> a limited set of states to avoid it getting out of control and being
> misused for something weird like driver probe order..

I do not want to add another property that conveys state.  The property
that I was proposing does not convey state -- it describes the hardware.


> For example, just status = "fail" would be enough for the cases I've
> seen. That would still allow probe the device, then PM runtime idle
> it and bail out with -ENODEV.
> 
> For whatever warnings or errors the driver needs to show, the driver
> could probably figure it out. I don't know if we want to or need to
> pass any informational messages with the incomplete status or
> property :)
> 
>> We may have discussed this, but why can't the driver that checks fail
>> state just check whatever was used to set the device to fail in the
>> first place?
> 
> Well there may be no way to check if something is pinned out based on
> the hardware. The same SoC can be packaged with different pins. In that
> case only the die id or serial number for each produced chip is
> different, not the revision numbers. So for cases like that the dtb
> file or kernel cmdline is the only information available. And we can't
> assume pinctrl is required as it's perfectly fine to do that only in
> the bootloader for the static cases to save memory.
> 
> Just to consider other ways of doing it, we could use the compatible
> flag to tag devices that need to be just idled on probe, but that does
> not seem like generic solution to me.

Yuck.  Again overloading a property to convey multiple pieces of
information.


> 
> Regards,
> 
> Tony
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frank Rowand Sept. 8, 2016, 7:17 p.m. UTC | #10
On 09/08/16 12:09, Frank Rowand wrote:
> On 09/08/16 08:58, Tony Lindgren wrote:
>> * Rob Herring <robh+dt@kernel.org> [160908 06:38]:
>>> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren <tony@atomide.com> wrote:
>>>> * Frank Rowand <frowand.list@gmail.com> [160831 13:51]:
>>>>> I am still opposed to using the status property for this purpose.
>>>>>
>>>>> The status property is intended to report an operational problem with
>>>>> a device or a device that the kernel can cause to be operational (such
>>>>> as a quiescent cpu being enabled).  It is the only property I am aware
>>>>> of to report _state_.
>>>
>>> Yes, in theory a device can go from disabled to okay, but that's
>>> generally never been supported. Linux takes the simple approach of
>>> "disabled" means ignore it. I think we'll see that change with
>>> overlays.
>>
>> Yeah I think we have to assume that.
>>
>>>>> It is unfortunate that Linux has adopted the practice of overloading status
>>>>> to determine whether a piece of hardware exists or does not exist.  This
>>>>> is extremely useful for the way we structure the .dts and .dtsi files but
>>>>> should have used a new property name.  We are stuck with that choice of
>>>>> using the status property for two purposes, first the state of a device,
>>>>> and secondly the hardware description of existing or not existing.
>>>
>>> I don't agree. Generally, disabled means the h/w is there, but don't
>>> use it. There may be some cases where the hardware doesn't exist for
>>> the convenience of having a single dts, but that's the exception.
>>>
>>>>> Why not just create a new property that describes the hardware?
>>>>> Perhaps something like:
>>>>>
>>>>>    incomplete = "pins_output", "buggy_dma";
>>>>
>>>> New property for incomplete works for me. Rob, got any comments here?
>>>
>>> Pins not muxed out or connected on the board has to be the #1 reason
>>> for disabled status. I don't think we need or want another way to
>>> express that.
>>
>> Both status and and a separate property work for me.
>>
>> If no other considerations, we should probably pick something with a
>> a limited set of states to avoid it getting out of control and being
>> misused for something weird like driver probe order..
> 
> I do not want to add another property that conveys state.  The property
> that I was proposing does not convey state -- it describes the hardware.
> 
> 
>> For example, just status = "fail" would be enough for the cases I've
>> seen. That would still allow probe the device, then PM runtime idle
>> it and bail out with -ENODEV.
>>
>> For whatever warnings or errors the driver needs to show, the driver
>> could probably figure it out. I don't know if we want to or need to
>> pass any informational messages with the incomplete status or
>> property :)
>>
>>> We may have discussed this, but why can't the driver that checks fail
>>> state just check whatever was used to set the device to fail in the
>>> first place?
>>
>> Well there may be no way to check if something is pinned out based on
>> the hardware. The same SoC can be packaged with different pins. In that
>> case only the die id or serial number for each produced chip is
>> different, not the revision numbers. So for cases like that the dtb
>> file or kernel cmdline is the only information available. And we can't
>> assume pinctrl is required as it's perfectly fine to do that only in
>> the bootloader for the static cases to save memory.
>>
>> Just to consider other ways of doing it, we could use the compatible
>> flag to tag devices that need to be just idled on probe, but that does
>> not seem like generic solution to me.
> 
> Yuck.  Again overloading a property to convey multiple pieces of
> information.

I should have been more explicit in that statement.

If the hardware device does not have "wires" routed to a connector or
bus then it is still the same device.  Thus the compatible should be
the same.

The difference is the way the device is used in the SOC or board.

>>
>> Regards,
>>
>> Tony
>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tony Lindgren Sept. 8, 2016, 8:19 p.m. UTC | #11
* Frank Rowand <frowand.list@gmail.com> [160908 12:18]:
> > On 09/08/16 08:58, Tony Lindgren wrote:
> >> Just to consider other ways of doing it, we could use the compatible
> >> flag to tag devices that need to be just idled on probe, but that does
> >> not seem like generic solution to me.
> > 
> > Yuck.  Again overloading a property to convey multiple pieces of
> > information.
> 
> I should have been more explicit in that statement.
> 
> If the hardware device does not have "wires" routed to a connector or
> bus then it is still the same device.  Thus the compatible should be
> the same.
> 
> The difference is the way the device is used in the SOC or board.

That is correct. The device is exactly the same.

Regards,

Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Herring Sept. 9, 2016, 2:43 a.m. UTC | #12
On Thu, Sep 8, 2016 at 2:05 PM, Frank Rowand <frowand.list@gmail.com> wrote:
> On 09/08/16 06:38, Rob Herring wrote:
>> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren <tony@atomide.com> wrote:
>>> * Frank Rowand <frowand.list@gmail.com> [160831 13:51]:
>>>> On 08/29/16 15:35, Tony Lindgren wrote:
>>>>>     if (of_device_is_incomplete(pdev->dev.of_node, status)) {
>>>>>             if (!strcmp("hw-incomplete-pins", status)) {
>>>>>                     dev_info(&pdev->dev,
>>>>>                              "Unusable hardware: Not pinned out\n");
>>>>>                     err = -ENODEV;
>>>>>                     goto out;
>>>>>             }
>>>>>             if (!strcmp("hw-missing-daughter-card")) {
>>>>>                     err = -EPROBE_DEFER;
>>>>>                     goto out;
>>>>>             }
>>>>>             if (!strcmp("hw-buggy-dma")) {
>>>>>                     dev_warn(&pdev->dev,
>>>>>                              "Replace hardware for working DMA\n");
>>>>>             }
>>>>>     }
>>>>
>>>> What if the device has two issues to be reported?  You can not
>>>> specify two different values for the status property.
>>>
>>> That's a good point.
>>>
>>>> What if the firmware wants to report that the hardware failed
>>>> self-test (thus status = "fail-sss") but is already using
>>>> status to describe the hardware?
>>>
>>> Yeah that's true. Do you know what the "sss" stands for here?
>>> Status Self teSt, or Side Scan Sonar? :)
>>
>> String String String!!!?
>>
>> No clue for me.
>>
>>>
>>>>> - Make more generic as suggested by Frank but stick with
>>>>>   "operational status of a device" approch most people seem
>>>>>   to prefer that
>>>>
>>>> I am still opposed to using the status property for this purpose.
>>>>
>>>> The status property is intended to report an operational problem with
>>>> a device or a device that the kernel can cause to be operational (such
>>>> as a quiescent cpu being enabled).  It is the only property I am aware
>>>> of to report _state_.
>>
>> Yes, in theory a device can go from disabled to okay, but that's
>> generally never been supported. Linux takes the simple approach of
>> "disabled" means ignore it. I think we'll see that change with
>> overlays.
>>
>>>> It is unfortunate that Linux has adopted the practice of overloading status
>>>> to determine whether a piece of hardware exists or does not exist.  This
>>>> is extremely useful for the way we structure the .dts and .dtsi files but
>>>> should have used a new property name.  We are stuck with that choice of
>>>> using the status property for two purposes, first the state of a device,
>>>> and secondly the hardware description of existing or not existing.
>>
>> I don't agree. Generally, disabled means the h/w is there, but don't
>> use it. There may be some cases where the hardware doesn't exist for
>> the convenience of having a single dts, but that's the exception.
>
> That it is not an exception, but instead a frequent pattern for .dtsi files.
> A quick look in arm:
>
>   $ grep status *.dtsi | wc -l
>   4842
>
>   $ grep status *.dtsi | grep '"disabled"' | wc -l
>   3431

That's not what I mean. You can't grep for this. Yes, marking blocks
as disabled by default in the SoC dtsi file and then enabling in the
board file is standard practice. If it is left disabled that doesn't
mean it doesn't exist is what I was getting at.

And yes, there are other cases where things get disabled with efuses
or don't have pins. Sometimes there is no difference in parts at all
and it's just a given block is not tested in factory test.

>>>> Why not just create a new property that describes the hardware?
>>>> Perhaps something like:
>>>>
>>>>    incomplete = "pins_output", "buggy_dma";
>>>
>>> New property for incomplete works for me. Rob, got any comments here?
>>
>> Pins not muxed out or connected on the board has to be the #1 reason
>> for disabled status. I don't think we need or want another way to
>> express that.
>
> How is that expressed now?

status = "disabled";

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Rini Sept. 9, 2016, 2:10 p.m. UTC | #13
On Thu, Sep 08, 2016 at 09:43:03PM -0500, Rob Herring wrote:
> On Thu, Sep 8, 2016 at 2:05 PM, Frank Rowand <frowand.list@gmail.com> wrote:
> > On 09/08/16 06:38, Rob Herring wrote:
> >> On Wed, Aug 31, 2016 at 4:41 PM, Tony Lindgren <tony@atomide.com> wrote:
> >>> * Frank Rowand <frowand.list@gmail.com> [160831 13:51]:
[snip]
> >>>> Why not just create a new property that describes the hardware?
> >>>> Perhaps something like:
> >>>>
> >>>>    incomplete = "pins_output", "buggy_dma";
> >>>
> >>> New property for incomplete works for me. Rob, got any comments here?
> >>
> >> Pins not muxed out or connected on the board has to be the #1 reason
> >> for disabled status. I don't think we need or want another way to
> >> express that.
> >
> > How is that expressed now?
> 
> status = "disabled";

It might be worth repeating that the problem that needs to be solved is
that we have an IP block that exists in some sense but needs to be shut
down (or some similar case the driver can handle) rather than ignored
like today.  This is not the same as having 5 UARTs available from the
SoC but only one is actually in some useful physical state, so list all
5 in the dtsi, status disabled, and then in the board dts file enable
the exposed ones.
Matthijs van Duin Sept. 10, 2016, 1:11 a.m. UTC | #14
On 8 September 2016 at 15:38, Rob Herring <robh+dt@kernel.org> wrote:
> Yes, in theory a device can go from disabled to okay, but that's
> generally never been supported. Linux takes the simple approach of
> "disabled" means ignore it. I think we'll see that change with
> overlays.

No need for future tense there, overlays are being used on a daily
basis on the BeagleBone and have already been for years.

> I don't agree. Generally, disabled means the h/w is there, but don't
> use it. There may be some cases where the hardware doesn't exist for
> the convenience of having a single dts, but that's the exception.

Yes and no. What matters is whether "don't use it" means "you can't
put it to good use" or "don't even try to reach the peripheral, bad
things will happen". Right now it's used for both cases.

Ideally the latter case would be removed from the kernel's view entirely.

On 8 September 2016 at 16:20, Nishanth Menon <nm@ti.com> wrote:
> Minor point here

It's not minor, it's quite crucial.

> maintaining dts per paper spin is just too impossible to maintain

Even if the per-spin dtsi just consists of including the common dtsi
and then applying /delete-node/ per peripheral that is disabled in
efuse? (or through firewalling, e.g. on HS devices)

> the variations if maintained as seperate dts might infact end up being
> larger in number than all the dts we have in arch/arm/boot/dts

There are 813 dts files there. Even if there were a dra7xx and am57xx
for every value of x (and there really isn't) you wouldn't get
anywhere near there.

But, fair enough, efuse bits are definitely an excellent way to get
combinatorial explosion.

Afaik those feature bits are readable through the control module on TI
SoCs though. Assuming such a thing is the norm in SoC world maybe a
"delete-if" property referencing some sort of test on register bits of
a referenced syscon node might do the trick?

On the cortex-A8 doing auto-detection would be a feasible alternative,
by reusing the existing exception mechanism to trap synchronous
aborts, but e.g. the Cortex-A15 seems to use async aborts for *every*
bus error, which makes things just very awful.

Matthijs
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Rini Sept. 12, 2016, 1:35 p.m. UTC | #15
On Sat, Sep 10, 2016 at 03:11:17AM +0200, Matthijs van Duin wrote:
> On 8 September 2016 at 15:38, Rob Herring <robh+dt@kernel.org> wrote:
> > Yes, in theory a device can go from disabled to okay, but that's
> > generally never been supported. Linux takes the simple approach of
> > "disabled" means ignore it. I think we'll see that change with
> > overlays.
> 
> No need for future tense there, overlays are being used on a daily
> basis on the BeagleBone and have already been for years.
> 
> > I don't agree. Generally, disabled means the h/w is there, but don't
> > use it. There may be some cases where the hardware doesn't exist for
> > the convenience of having a single dts, but that's the exception.
> 
> Yes and no. What matters is whether "don't use it" means "you can't
> put it to good use" or "don't even try to reach the peripheral, bad
> things will happen". Right now it's used for both cases.
> 
> Ideally the latter case would be removed from the kernel's view entirely.

What do you mean by "you can't put it to good use" ?  Is that the case
of stuff that's say exposed via a header and could be used but isn't (ie
the cape/hat/chip/etc case) or the IP block is still OK but just not
exposed at all?

What we're trying to address here is the case of "don't even try to
use the peripheral, bad things will happen.  But please properly idle
the IP block!".
Tom Rini Sept. 12, 2016, 1:38 p.m. UTC | #16
On Sat, Sep 10, 2016 at 03:11:17AM +0200, Matthijs van Duin wrote:

[snip]
> On 8 September 2016 at 16:20, Nishanth Menon <nm@ti.com> wrote:
> > Minor point here
> 
> It's not minor, it's quite crucial.
> 
> > maintaining dts per paper spin is just too impossible to maintain
> 
> Even if the per-spin dtsi just consists of including the common dtsi
> and then applying /delete-node/ per peripheral that is disabled in
> efuse? (or through firewalling, e.g. on HS devices)
> 
> > the variations if maintained as seperate dts might infact end up being
> > larger in number than all the dts we have in arch/arm/boot/dts
> 
> There are 813 dts files there. Even if there were a dra7xx and am57xx
> for every value of x (and there really isn't) you wouldn't get
> anywhere near there.
> 
> But, fair enough, efuse bits are definitely an excellent way to get
> combinatorial explosion.
> 
> Afaik those feature bits are readable through the control module on TI
> SoCs though. Assuming such a thing is the norm in SoC world maybe a
> "delete-if" property referencing some sort of test on register bits of
> a referenced syscon node might do the trick?
> 
> On the cortex-A8 doing auto-detection would be a feasible alternative,
> by reusing the existing exception mechanism to trap synchronous
> aborts, but e.g. the Cortex-A15 seems to use async aborts for *every*
> bus error, which makes things just very awful.

I think this still misses one of the keys which is that for the IP block
that has been efused (or whatever) into being unusable it's also still
true that the IP block needs to be properly turned off.  The IP in
question wasn't removed from the SoC but rather an ice pick was jammed
into a critical path meaning you can't use it and now it's in zombie
state.  The kernel needs to know about it so that it can finish the job.
Matthijs van Duin Sept. 12, 2016, 1:46 p.m. UTC | #17
On 12 September 2016 at 15:35, Tom Rini <trini@konsulko.com> wrote:
> What do you mean by "you can't put it to good use" ?  Is that the case
> of stuff that's say exposed via a header and could be used but isn't (ie
> the cape/hat/chip/etc case) or the IP block is still OK but just not
> exposed at all?
>
> What we're trying to address here is the case of "don't even try to
> use the peripheral, bad things will happen.  But please properly idle
> the IP block!".

I'm pretty sure IP blocks in the "don't even try" category are
necessarily idled. There's no reason they wouldn't be.

The hwmod infrastructure generally tries to access the sysconfig
register located in the IP block address space itself. If you try to
do that with a peripheral that's efused out you'll get a bus error.

Matthijs
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Rini Sept. 12, 2016, 1:49 p.m. UTC | #18
On Mon, Sep 12, 2016 at 03:46:23PM +0200, Matthijs van Duin wrote:
> On 12 September 2016 at 15:35, Tom Rini <trini@konsulko.com> wrote:
> > What do you mean by "you can't put it to good use" ?  Is that the case
> > of stuff that's say exposed via a header and could be used but isn't (ie
> > the cape/hat/chip/etc case) or the IP block is still OK but just not
> > exposed at all?
> >
> > What we're trying to address here is the case of "don't even try to
> > use the peripheral, bad things will happen.  But please properly idle
> > the IP block!".
> 
> I'm pretty sure IP blocks in the "don't even try" category are
> necessarily idled. There's no reason they wouldn't be.

OK.  But this is specifically about the blocks that we are being told by
the vendor are _not_ idled and need to be.
diff mbox

Patch

diff --git a/drivers/of/base.c b/drivers/of/base.c
index ff37f6d..4d39857 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -550,8 +550,8 @@  EXPORT_SYMBOL(of_machine_is_compatible);
  *
  *  @device: Node to check for availability, with locks already held
  *
- *  Returns true if the status property is absent or set to "okay" or "ok",
- *  false otherwise
+ *  Returns true if the status property is absent or set to "okay", "ok",
+ *  or starting with "fail-", false otherwise
  */
 static bool __of_device_is_available(const struct device_node *device)
 {
@@ -566,7 +566,8 @@  static bool __of_device_is_available(const struct device_node *device)
 		return true;
 
 	if (statlen > 0) {
-		if (!strcmp(status, "okay") || !strcmp(status, "ok"))
+		if (!strcmp(status, "okay") || !strcmp(status, "ok") ||
+		    !strncmp(status, "fail-", 5))
 			return true;
 	}
 
@@ -595,6 +596,63 @@  bool of_device_is_available(const struct device_node *device)
 EXPORT_SYMBOL(of_device_is_available);
 
 /**
+ *  __of_device_is_incomplete - check if a device is incomplete
+ *
+ *  @device: Node to check for partial failure with locks already held
+ *  @status: Status string for optional handling of the fail-sss state
+ */
+static bool __of_device_is_incomplete(const struct device_node *device,
+				      const char **status)
+{
+	const char *s, *m = "fail-";
+	int slen, mlen;
+
+	if (!device)
+		return false;
+
+	s = __of_get_property(device, "status", &slen);
+	if (!s)
+		return false;
+
+	mlen = strlen(m);
+	if (slen <= mlen)
+		return false;
+
+	if (strncmp("fail-", s, mlen))
+		return false;
+
+	*status = s + mlen;
+
+	return true;
+}
+
+/**
+ *  of_device_is_incomplete - check if a device is incomplete
+ *
+ *  @device: Node to check for partial failure
+ *  @status: Status string for optional handling of the fail-sss state
+ *
+ *  Returns true if the status property is set to "fail-sss",
+ *  false otherwise. Fore more information, see fail-sss in ePAPR 1.1
+ *  "Table 2-4 Values for status property".
+ *
+ *  The caller can optionally try to handle the error based on the
+ *  status string.
+ */
+bool of_device_is_incomplete(const struct device_node *device,
+			     const char **status)
+{
+	unsigned long flags;
+	bool res;
+
+	raw_spin_lock_irqsave(&devtree_lock, flags);
+	res = __of_device_is_incomplete(device, status);
+	raw_spin_unlock_irqrestore(&devtree_lock, flags);
+	return res;
+}
+EXPORT_SYMBOL(of_device_is_incomplete);
+
+/**
  *  of_device_is_big_endian - check if a device has BE registers
  *
  *  @device: Node to check for endianness
diff --git a/include/linux/of.h b/include/linux/of.h
index 3d9ff8e..b593936 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -320,6 +320,8 @@  extern int of_device_is_compatible(const struct device_node *device,
 extern int of_device_compatible_match(struct device_node *device,
 				      const char *const *compat);
 extern bool of_device_is_available(const struct device_node *device);
+extern bool of_device_is_incomplete(const struct device_node *device,
+				    const char **status);
 extern bool of_device_is_big_endian(const struct device_node *device);
 extern const void *of_get_property(const struct device_node *node,
 				const char *name,
@@ -504,6 +506,12 @@  static inline bool of_device_is_available(const struct device_node *device)
 	return false;
 }
 
+static inline bool of_device_is_incomplete(const struct device_node *device,
+					   const char **status)
+{
+	return false;
+}
+
 static inline bool of_device_is_big_endian(const struct device_node *device)
 {
 	return false;