[1/2,v2] tpm: cmd_ready command can be issued only after granting locality

Message ID 20180128075101.6883-2-tomas.winkler@intel.com (mailing list archive)
State New, archived

Commit Message

Winkler, Tomas Jan. 28, 2018, 7:51 a.m. UTC
The correct sequence is to first request locality and only after
that perform the cmd_ready handshake; otherwise the hardware will drop
the subsequent message, because from the device's point of view the
cmd_ready handshake wasn't performed. Symmetrically, locality has to be
relinquished only after the go_idle handshake has completed. This requires
that both go_idle and locality relinquish poll for completion, so that
the relinquish is not overridden in a back-to-back command flow.

The issue is only visible on devices that support multiple localities.
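
The ordering described above can be sketched in plain userspace C. This is
only an illustration under stated assumptions: struct fake_tpm and all of the
fake_* helpers are hypothetical stand-ins and do not model the real CRB
registers. The one property the sketch encodes is that the fake device ignores
cmd_ready until a locality has been granted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model, not the real CRB register layout. */
struct fake_tpm {
	bool locality_granted;
	bool ready;
};

static int fake_request_locality(struct fake_tpm *t)
{
	t->locality_granted = true;
	return 0;
}

/* The fake device drops cmd_ready unless a locality is already granted. */
static int fake_cmd_ready(struct fake_tpm *t)
{
	if (!t->locality_granted)
		return -ETIME;	/* the handshake never completes */
	t->ready = true;
	return 0;
}

static int fake_send(struct fake_tpm *t)
{
	return t->ready ? 0 : -EIO;
}

static int fake_go_idle(struct fake_tpm *t)
{
	t->ready = false;
	return 0;
}

static int fake_relinquish_locality(struct fake_tpm *t)
{
	assert(!t->ready);	/* go_idle must have completed first */
	t->locality_granted = false;
	return 0;
}

/* request locality -> cmd_ready -> send -> go_idle -> relinquish locality */
static int fake_transmit(struct fake_tpm *t)
{
	int rc;

	rc = fake_request_locality(t);
	if (rc)
		return rc;

	rc = fake_cmd_ready(t);
	if (rc)
		goto out_relinquish;

	rc = fake_send(t);
	fake_go_idle(t);

out_relinquish:
	fake_relinquish_locality(t);
	return rc;
}
```

Calling fake_cmd_ready() on a fresh device fails, while fake_transmit()
succeeds and leaves the device idle with no locality held.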

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
---
V2: poll for locality relinquish completion

 drivers/char/tpm/tpm-interface.c |  14 ++---
 drivers/char/tpm/tpm_crb.c       | 108 +++++++++++++++++++++++++++------------
 drivers/char/tpm/tpm_tis_core.c  |   4 +-
 include/linux/tpm.h              |   2 +-
 4 files changed, 87 insertions(+), 41 deletions(-)

Comments

Jason Gunthorpe Jan. 28, 2018, 8:15 p.m. UTC | #1
On Sun, Jan 28, 2018 at 09:51:00AM +0200, Tomas Winkler wrote:

> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index bcdd3790e94d..06639fb6ab85 100644
> +++ b/include/linux/tpm.h
> @@ -44,7 +44,7 @@ struct tpm_class_ops {
>  	bool (*update_timeouts)(struct tpm_chip *chip,
>  				unsigned long *timeout_cap);
>  	int (*request_locality)(struct tpm_chip *chip, int loc);
> -	void (*relinquish_locality)(struct tpm_chip *chip, int loc);
> +	int (*relinquish_locality)(struct tpm_chip *chip, int loc);

This seems wrong.. What is the core code supposed to do if relinquish
fails?

Just returning an error code from transmit doesn't really do anything
helpful from a broad subsystem perspective.

I think if a driver can fail relinquish then it needs some kind of
strategy to recover.

Suggest trying the relinquish again on every next request until
success, otherwise fail request locality, potentially permanently.

Jason
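
One way to read the suggestion above is sketched below in userspace C. It is
not code from the patch or from the TPM core: struct fake_chip, hw_relinquish()
and the chip_* helpers are hypothetical, and the choice of -EBUSY is an
assumption made for illustration. The relinquish callback stays void towards
the core, remembers a failed release, and the next request retries it before
granting anything.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chip state for illustration only. */
struct fake_chip {
	bool hw_stuck;			/* hardware refuses to release the locality */
	bool relinquish_pending;	/* an earlier relinquish did not complete */
};

/* Stand-in for the hardware release handshake. */
static int hw_relinquish(struct fake_chip *c)
{
	return c->hw_stuck ? -ETIME : 0;
}

/* Stays void towards the core; a failure is remembered for later. */
static void chip_relinquish_locality(struct fake_chip *c)
{
	assert(c != NULL);
	c->relinquish_pending = (hw_relinquish(c) != 0);
}

/* Retry a pending relinquish first; keep failing until it succeeds. */
static int chip_request_locality(struct fake_chip *c)
{
	assert(c != NULL);
	if (c->relinquish_pending) {
		if (hw_relinquish(c))
			return -EBUSY;
		c->relinquish_pending = false;
	}
	return 0;
}
```

With this shape, a stuck device makes every later request fail until the
release finally goes through, which matches the "potentially permanently" case.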
Winkler, Tomas Jan. 28, 2018, 9:17 p.m. UTC | #2
> 
> On Sun, Jan 28, 2018 at 09:51:00AM +0200, Tomas Winkler wrote:
> 
> > diff --git a/include/linux/tpm.h b/include/linux/tpm.h index
> > bcdd3790e94d..06639fb6ab85 100644
> > +++ b/include/linux/tpm.h
> > @@ -44,7 +44,7 @@ struct tpm_class_ops {
> >  	bool (*update_timeouts)(struct tpm_chip *chip,
> >  				unsigned long *timeout_cap);
> >  	int (*request_locality)(struct tpm_chip *chip, int loc);
> > -	void (*relinquish_locality)(struct tpm_chip *chip, int loc);
> > +	int (*relinquish_locality)(struct tpm_chip *chip, int loc);
> 
> This seems wrong.. What is the core code supposed to do if relinquish fails?

Not much, just propagate the error to the caller and leave the policy decision to it.

> Just returning an error code from transmit doesn't really do anything helpful
> from a broad subsystem perspective.

Yes, you are right, but I'm not sure the subsystem is broad enough to understand
the system setup, or, in the other direction, specific enough to act on HW limitations.
> 
> I think if a driver can fail relinquish then it needs some kind of strategy to
> recover.
Maybe some driver can and some not, but if it doesn't succeed it should return an error.
> 
> Suggest trying the relinquish again on every next request until success,
> otherwise fail request locality, potentially permanently.

This is something I'd rather prevent, because it leaves the HW in a kind of undefined state
(and we should probably work on that a bit more later).
As far as I've debugged the flow so far, the driver just fails, and the error goes up to the
user space caller or the internal flow is stopped.
A user can reboot the system or do whatever helps in his/her particular setup.

Make sense?

Anyhow, I will dig more into how fatal that relinquish failure is.

Thanks
Tomas
Jason Gunthorpe Jan. 29, 2018, 5:57 p.m. UTC | #3
On Sun, Jan 28, 2018 at 09:17:53PM +0000, Winkler, Tomas wrote:

> > I think if a driver can fail relinquish then it needs some kind of strategy to
> > recover.

> Maybe some driver can and some not, but if it doesn't succeed it
> should return an error.

But you can't just leave the driver in some inconsistent state..

Every time I've audited something to do with 'add error codes to
destroy/free/release' I find driver design issues..

> > Suggest trying the relinquish again on every next request until success,
> > otherwise fail request locality, potentially permanently.
>
> This is something I'd rather prevent, because it leaves the HW in a kind of undefined state
> (and we should probably work on that a bit more later).
> As far as I've debugged the flow so far, the driver just fails, and the error goes up to the
> user space caller or the internal flow is stopped.

But transmit_command will be called again - then what does the driver
do? The driver needs an answer for that..

If you don't want to retry then I'd rather see request_locality
permanently fail than adding a return code to release.

Jason
Winkler, Tomas Jan. 29, 2018, 7:40 p.m. UTC | #4
> On Sun, Jan 28, 2018 at 09:17:53PM +0000, Winkler, Tomas wrote:
> 
> > > I think if a driver can fail relinquish then it needs some kind of
> > > strategy to recover.
> 
> > Maybe some driver can and some not, but if it doesn't succeed it
> > should return an error.
> 
> But you can't just leave the driver in some inconsistent state..
> 
> Every time I've audited something to do with 'add error codes to
> destroy/free/release' I find driver design issues..

I'm sure of it, but from this particular point the driver itself is stateless, 
it's just reading HW state via registers. It's not going through driver state changes.

> > > Suggest trying the relinquish again on every next request until
> > > success, otherwise fail request locality, potentially permanently.
> >
> > This is something I'd rather prevent, because it leaves the HW in a kind of
> > undefined state (and we should probably work on that a bit more later).
> > As far as I've debugged the flow so far, the driver just fails, and the
> > error goes up to the user space caller or the internal flow is stopped.
>
> But transmit_command will be called again - then what does the driver do?
> The driver needs an answer for that..

It will just fail again.

> If you don't want to retry then I'd rather see request_locality permanently
> fail than adding a return code to release.

What exactly do you mean by permanently fail?
My current assumption is that it will fail permanently because the HW is not responsive,
or it will indicate an error on any subsequent command, unless the HW recovers somehow.
Currently I'm not aware of any way to reset the device except rebooting the system.


Thanks
Tomas
Jarkko Sakkinen Feb. 6, 2018, 8:02 p.m. UTC | #5
On Sun, Jan 28, 2018 at 09:17:53PM +0000, Winkler, Tomas wrote:
> 
> > 
> > On Sun, Jan 28, 2018 at 09:51:00AM +0200, Tomas Winkler wrote:
> > 
> > > diff --git a/include/linux/tpm.h b/include/linux/tpm.h index
> > > bcdd3790e94d..06639fb6ab85 100644
> > > +++ b/include/linux/tpm.h
> > > @@ -44,7 +44,7 @@ struct tpm_class_ops {
> > >  	bool (*update_timeouts)(struct tpm_chip *chip,
> > >  				unsigned long *timeout_cap);
> > >  	int (*request_locality)(struct tpm_chip *chip, int loc);
> > > -	void (*relinquish_locality)(struct tpm_chip *chip, int loc);
> > > +	int (*relinquish_locality)(struct tpm_chip *chip, int loc);
> > 
> > This seems wrong.. What is the core code supposed to do if relinquish fails?
> 
> Not much, just propagate the error to the caller and leave the policy
> decision to it.

Your patch set must either cover this or keep it as void.

A better idea is to print an error to klog.

/Jarkko
Winkler, Tomas Feb. 6, 2018, 9:26 p.m. UTC | #6
> 
> On Sun, Jan 28, 2018 at 09:17:53PM +0000, Winkler, Tomas wrote:
> >
> > >
> > > On Sun, Jan 28, 2018 at 09:51:00AM +0200, Tomas Winkler wrote:
> > >
> > > > diff --git a/include/linux/tpm.h b/include/linux/tpm.h index
> > > > bcdd3790e94d..06639fb6ab85 100644
> > > > +++ b/include/linux/tpm.h
> > > > @@ -44,7 +44,7 @@ struct tpm_class_ops {
> > > >  	bool (*update_timeouts)(struct tpm_chip *chip,
> > > >  				unsigned long *timeout_cap);
> > > >  	int (*request_locality)(struct tpm_chip *chip, int loc);
> > > > -	void (*relinquish_locality)(struct tpm_chip *chip, int loc);
> > > > +	int (*relinquish_locality)(struct tpm_chip *chip, int loc);
> > >
> > > This seems wrong.. What is the core code supposed to do if relinquish
> > > fails?
> >
> > Not much, just propagate the error to the caller and leave the policy
> > decision to it.
>
> Your patch set must either cover this or keep it as void.


How does the code cover other failures in the transmit functions, and
how is this one different from, for example, a request_locality failure?
Why should we not propagate this error up?

> 
> A better idea is to print an error to klog.
We can do that in addition.


Thanks
Tomas
Jarkko Sakkinen Feb. 8, 2018, 12:44 p.m. UTC | #7
On Tue, Feb 06, 2018 at 09:26:15PM +0000, Winkler, Tomas wrote:
> > 
> > On Sun, Jan 28, 2018 at 09:17:53PM +0000, Winkler, Tomas wrote:
> > >
> > > >
> > > > On Sun, Jan 28, 2018 at 09:51:00AM +0200, Tomas Winkler wrote:
> > > >
> > > > > diff --git a/include/linux/tpm.h b/include/linux/tpm.h index
> > > > > bcdd3790e94d..06639fb6ab85 100644
> > > > > +++ b/include/linux/tpm.h
> > > > > @@ -44,7 +44,7 @@ struct tpm_class_ops {
> > > > >  	bool (*update_timeouts)(struct tpm_chip *chip,
> > > > >  				unsigned long *timeout_cap);
> > > > >  	int (*request_locality)(struct tpm_chip *chip, int loc);
> > > > > -	void (*relinquish_locality)(struct tpm_chip *chip, int loc);
> > > > > +	int (*relinquish_locality)(struct tpm_chip *chip, int loc);
> > > >
> > > > This seems wrong.. What is the core code supposed to do if relinquish
> > > > fails?
> > >
> > > Not much, just propagate the error to the caller and leave the policy
> > > decision to it.
> >
> > Your patch set must either cover this or keep it as void.
>
>
> How does the code cover other failures in the transmit functions, and
> how is this one different from, for example, a request_locality failure?
> Why should we not propagate this error up?
>
> >
> > A better idea is to print an error to klog.
> We can do that in addition.

I guess you are right. This can be propagated to the user space so that
it knows that there is a problem. To make the root cause more visible, the
klog message would make sense.

/Jarkko
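
The outcome agreed above (propagate the failure and also log it) could look
roughly like the userspace sketch below. finish_transmit() is a hypothetical
helper, not part of the patch, and fprintf() merely stands in for a kernel log
call. The sketch also keeps an earlier transmit error instead of replacing it;
as posted, the tpm_transmit() hunk assigns the relinquish result straight to
rc, which appears to overwrite whatever error led to the out: label.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: merge the result of the transmit path with the
 * result of relinquishing the locality. Both follow the kernel
 * convention of 0 on success and a negative errno on failure.
 */
static int finish_transmit(int transmit_rc, int relinquish_rc)
{
	assert(transmit_rc <= 0 && relinquish_rc <= 0);

	if (relinquish_rc) {
		/* Stand-in for a klog message such as dev_err(). */
		fprintf(stderr, "relinquish locality failed: %d\n",
			relinquish_rc);
		/* Report it only when nothing failed earlier. */
		if (!transmit_rc)
			return relinquish_rc;
	}

	return transmit_rc;
}
```

The first error wins, so user space sees the original cause, while the log
line still records that the locality was not released.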
Winkler, Tomas Feb. 8, 2018, 12:46 p.m. UTC | #8
> 
> On Tue, Feb 06, 2018 at 09:26:15PM +0000, Winkler, Tomas wrote:
> > >
> > > On Sun, Jan 28, 2018 at 09:17:53PM +0000, Winkler, Tomas wrote:
> > > >
> > > > >
> > > > > On Sun, Jan 28, 2018 at 09:51:00AM +0200, Tomas Winkler wrote:
> > > > >
> > > > > > diff --git a/include/linux/tpm.h b/include/linux/tpm.h index
> > > > > > bcdd3790e94d..06639fb6ab85 100644
> > > > > > +++ b/include/linux/tpm.h
> > > > > > @@ -44,7 +44,7 @@ struct tpm_class_ops {
> > > > > >  	bool (*update_timeouts)(struct tpm_chip *chip,
> > > > > >  				unsigned long *timeout_cap);
> > > > > >  	int (*request_locality)(struct tpm_chip *chip, int loc);
> > > > > > -	void (*relinquish_locality)(struct tpm_chip *chip, int loc);
> > > > > > +	int (*relinquish_locality)(struct tpm_chip *chip, int loc);
> > > > >
> > > > > This seems wrong.. What is the core code supposed to do if
> > > > > relinquish fails?
> > > >
> > > > Not much, just propagate the error to the caller and leave the policy
> > > > decision to it.
> > >
> > > Your patch set must either cover this or keep it as void.
> >
> >
> > How does the code cover other failures in the transmit functions, and how
> > is this one different from, for example, a request_locality failure?
> > Why should we not propagate this error up?
> >
> > >
> > > A better idea is to print an error to klog.
> > We can do that in addition.
>
> I guess you are right. This can be propagated to the user space so that it
> knows that there is a problem. To make the root cause more visible, the klog
> message would make sense.

Thanks, will add an error message.
Tomas
Patch

diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 76df4fbcf089..9fb3d406b078 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -422,8 +422,6 @@  ssize_t tpm_transmit(struct tpm_chip *chip, struct tpm_space *space,
 	if (!(flags & TPM_TRANSMIT_UNLOCKED))
 		mutex_lock(&chip->tpm_mutex);
 
-	if (chip->dev.parent)
-		pm_runtime_get_sync(chip->dev.parent);
 
 	if (chip->ops->clk_enable != NULL)
 		chip->ops->clk_enable(chip, true);
@@ -439,6 +437,9 @@  ssize_t tpm_transmit(struct tpm_chip *chip, struct tpm_space *space,
 		chip->locality = rc;
 	}
 
+	if (chip->dev.parent)
+		pm_runtime_get_sync(chip->dev.parent);
+
 	rc = tpm2_prepare_space(chip, space, ordinal, buf);
 	if (rc)
 		goto out;
@@ -499,17 +500,18 @@  ssize_t tpm_transmit(struct tpm_chip *chip, struct tpm_space *space,
 	rc = tpm2_commit_space(chip, space, ordinal, buf, &len);
 
 out:
+	if (chip->dev.parent)
+		pm_runtime_put_sync(chip->dev.parent);
+
 	if (need_locality && chip->ops->relinquish_locality) {
-		chip->ops->relinquish_locality(chip, chip->locality);
+		rc = chip->ops->relinquish_locality(chip, chip->locality);
 		chip->locality = -1;
 	}
+
 out_no_locality:
 	if (chip->ops->clk_enable != NULL)
 		chip->ops->clk_enable(chip, false);
 
-	if (chip->dev.parent)
-		pm_runtime_put_sync(chip->dev.parent);
-
 	if (!(flags & TPM_TRANSMIT_UNLOCKED))
 		mutex_unlock(&chip->tpm_mutex);
 	return rc ? rc : len;
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 7b3c2a8aa9de..497edd9848cd 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -112,6 +112,25 @@  struct tpm2_crb_smc {
 	u32 smc_func_id;
 };
 
+static bool crb_wait_for_reg_32(u32 __iomem *reg, u32 mask, u32 value,
+				unsigned long timeout)
+{
+	ktime_t start;
+	ktime_t stop;
+
+	start = ktime_get();
+	stop = ktime_add(start, ms_to_ktime(timeout));
+
+	do {
+		if ((ioread32(reg) & mask) == value)
+			return true;
+
+		usleep_range(50, 100);
+	} while (ktime_before(ktime_get(), stop));
+
+	return ((ioread32(reg) & mask) == value);
+}
+
 /**
  * crb_go_idle - request tpm crb device to go the idle state
  *
@@ -128,7 +147,7 @@  struct tpm2_crb_smc {
  *
  * Return: 0 always
  */
-static int __maybe_unused crb_go_idle(struct device *dev, struct crb_priv *priv)
+static int crb_go_idle(struct device *dev, struct crb_priv *priv)
 {
 	if ((priv->sm == ACPI_TPM2_START_METHOD) ||
 	    (priv->sm == ACPI_TPM2_COMMAND_BUFFER_WITH_START_METHOD) ||
@@ -136,30 +155,17 @@  static int __maybe_unused crb_go_idle(struct device *dev, struct crb_priv *priv)
 		return 0;
 
 	iowrite32(CRB_CTRL_REQ_GO_IDLE, &priv->regs_t->ctrl_req);
-	/* we don't really care when this settles */
 
+	if (!crb_wait_for_reg_32(&priv->regs_t->ctrl_req,
+				 CRB_CTRL_REQ_GO_IDLE/* mask */,
+				 0, /* value */
+				 TPM2_TIMEOUT_C)) {
+		dev_warn(dev, "goIdle timed out\n");
+		return -ETIME;
+	}
 	return 0;
 }
 
-static bool crb_wait_for_reg_32(u32 __iomem *reg, u32 mask, u32 value,
-				unsigned long timeout)
-{
-	ktime_t start;
-	ktime_t stop;
-
-	start = ktime_get();
-	stop = ktime_add(start, ms_to_ktime(timeout));
-
-	do {
-		if ((ioread32(reg) & mask) == value)
-			return true;
-
-		usleep_range(50, 100);
-	} while (ktime_before(ktime_get(), stop));
-
-	return false;
-}
-
 /**
  * crb_cmd_ready - request tpm crb device to enter ready state
  *
@@ -175,8 +181,7 @@  static bool crb_wait_for_reg_32(u32 __iomem *reg, u32 mask, u32 value,
  *
  * Return: 0 on success -ETIME on timeout;
  */
-static int __maybe_unused crb_cmd_ready(struct device *dev,
-					struct crb_priv *priv)
+static int crb_cmd_ready(struct device *dev, struct crb_priv *priv)
 {
 	if ((priv->sm == ACPI_TPM2_START_METHOD) ||
 	    (priv->sm == ACPI_TPM2_COMMAND_BUFFER_WITH_START_METHOD) ||
@@ -195,11 +200,11 @@  static int __maybe_unused crb_cmd_ready(struct device *dev,
 	return 0;
 }
 
-static int crb_request_locality(struct tpm_chip *chip, int loc)
+static int __crb_request_locality(struct device *dev,
+				  struct crb_priv *priv, int loc)
 {
-	struct crb_priv *priv = dev_get_drvdata(&chip->dev);
 	u32 value = CRB_LOC_STATE_LOC_ASSIGNED |
-		CRB_LOC_STATE_TPM_REG_VALID_STS;
+		    CRB_LOC_STATE_TPM_REG_VALID_STS;
 
 	if (!priv->regs_h)
 		return 0;
@@ -207,21 +212,45 @@  static int crb_request_locality(struct tpm_chip *chip, int loc)
 	iowrite32(CRB_LOC_CTRL_REQUEST_ACCESS, &priv->regs_h->loc_ctrl);
 	if (!crb_wait_for_reg_32(&priv->regs_h->loc_state, value, value,
 				 TPM2_TIMEOUT_C)) {
-		dev_warn(&chip->dev, "TPM_LOC_STATE_x.requestAccess timed out\n");
+		dev_warn(dev, "TPM_LOC_STATE_x.requestAccess timed out\n");
 		return -ETIME;
 	}
 
 	return 0;
 }
 
-static void crb_relinquish_locality(struct tpm_chip *chip, int loc)
+static int crb_request_locality(struct tpm_chip *chip, int loc)
 {
 	struct crb_priv *priv = dev_get_drvdata(&chip->dev);
 
+	return __crb_request_locality(&chip->dev, priv, loc);
+}
+
+static int __crb_relinquish_locality(struct device *dev,
+				     struct crb_priv *priv, int loc)
+{
+	u32 mask = CRB_LOC_STATE_LOC_ASSIGNED |
+		   CRB_LOC_STATE_TPM_REG_VALID_STS;
+	u32 value = CRB_LOC_STATE_TPM_REG_VALID_STS;
+
 	if (!priv->regs_h)
-		return;
+		return 0;
 
 	iowrite32(CRB_LOC_CTRL_RELINQUISH, &priv->regs_h->loc_ctrl);
+	if (!crb_wait_for_reg_32(&priv->regs_h->loc_state, mask, value,
+				 TPM2_TIMEOUT_C)) {
+		dev_warn(dev, "relinquish locality timed out\n");
+		return -ETIME;
+	}
+
+	return 0;
+}
+
+static int crb_relinquish_locality(struct tpm_chip *chip, int loc)
+{
+	struct crb_priv *priv = dev_get_drvdata(&chip->dev);
+
+	return __crb_relinquish_locality(&chip->dev, priv, loc);
 }
 
 static u8 crb_status(struct tpm_chip *chip)
@@ -475,6 +504,10 @@  static int crb_map_io(struct acpi_device *device, struct crb_priv *priv,
 			dev_warn(dev, FW_BUG "Bad ACPI memory layout");
 	}
 
+	ret = __crb_request_locality(dev, priv, 0);
+	if (ret)
+		return ret;
+
 	priv->regs_t = crb_map_res(dev, priv, &io_res, buf->control_address,
 				   sizeof(struct crb_regs_tail));
 	if (IS_ERR(priv->regs_t))
@@ -531,6 +564,8 @@  static int crb_map_io(struct acpi_device *device, struct crb_priv *priv,
 
 	crb_go_idle(dev, priv);
 
+	__crb_relinquish_locality(dev, priv, 0);
+
 	return ret;
 }
 
@@ -588,10 +623,14 @@  static int crb_acpi_add(struct acpi_device *device)
 	chip->acpi_dev_handle = device->handle;
 	chip->flags = TPM_CHIP_FLAG_TPM2;
 
-	rc  = crb_cmd_ready(dev, priv);
+	rc = __crb_request_locality(dev, priv, 0);
 	if (rc)
 		return rc;
 
+	rc  = crb_cmd_ready(dev, priv);
+	if (rc)
+		goto out;
+
 	pm_runtime_get_noresume(dev);
 	pm_runtime_set_active(dev);
 	pm_runtime_enable(dev);
@@ -601,12 +640,15 @@  static int crb_acpi_add(struct acpi_device *device)
 		crb_go_idle(dev, priv);
 		pm_runtime_put_noidle(dev);
 		pm_runtime_disable(dev);
-		return rc;
+		goto out;
 	}
 
-	pm_runtime_put(dev);
+	pm_runtime_put_sync(dev);
 
-	return 0;
+out:
+	__crb_relinquish_locality(dev, priv, 0);
+
+	return rc;
 }
 
 static int crb_acpi_remove(struct acpi_device *device)
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 183a5f54d875..a22b12adbdfd 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -143,11 +143,13 @@  static bool check_locality(struct tpm_chip *chip, int l)
 	return false;
 }
 
-static void release_locality(struct tpm_chip *chip, int l)
+static int release_locality(struct tpm_chip *chip, int l)
 {
 	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
 
 	tpm_tis_write8(priv, TPM_ACCESS(l), TPM_ACCESS_ACTIVE_LOCALITY);
+
+	return 0;
 }
 
 static int request_locality(struct tpm_chip *chip, int l)
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index bcdd3790e94d..06639fb6ab85 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -44,7 +44,7 @@  struct tpm_class_ops {
 	bool (*update_timeouts)(struct tpm_chip *chip,
 				unsigned long *timeout_cap);
 	int (*request_locality)(struct tpm_chip *chip, int loc);
-	void (*relinquish_locality)(struct tpm_chip *chip, int loc);
+	int (*relinquish_locality)(struct tpm_chip *chip, int loc);
 	void (*clk_enable)(struct tpm_chip *chip, bool value);
 };
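
The polling helper that the patch moves and reuses can be approximated in
userspace as follows. This is a sketch under assumptions, not the kernel code:
wait_for_reg_32() and now_ms() are hypothetical names, a plain volatile word
replaces the ioread32() access, and nanosleep() replaces usleep_range(). It
keeps the two properties the patch relies on: the match is made on
(reg & mask) == value, so a bit can be required to be clear while another must
be set, and the register is read once more after the deadline.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in milliseconds. */
static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll until (reg & mask) == value or until timeout_ms has elapsed. */
static bool wait_for_reg_32(const volatile uint32_t *reg, uint32_t mask,
			    uint32_t value, int64_t timeout_ms)
{
	const struct timespec nap = { 0, 50 * 1000 };	/* 50 microseconds */
	int64_t stop = now_ms() + timeout_ms;

	/* Bits of value outside mask could never match. */
	assert((value & ~mask) == 0);

	do {
		if ((*reg & mask) == value)
			return true;
		nanosleep(&nap, NULL);
	} while (now_ms() < stop);

	/* Final read in case the last sleep overran the deadline. */
	return (*reg & mask) == value;
}
```

For a relinquish-style wait, mask would cover both the "assigned" and "valid"
bits while value holds only the "valid" bit, so the call returns once the
locality is no longer assigned and the register contents are valid.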