diff mbox series

platform/chrome: cros_ec_proto: Lock device when updating MKBP version

Message ID 20240725175714.1769080-1-patrykd@google.com (mailing list archive)
State Superseded
Headers show
Series platform/chrome: cros_ec_proto: Lock device when updating MKBP version | expand

Commit Message

Patryk Duda July 25, 2024, 5:57 p.m. UTC
The cros_ec_get_host_command_version_mask() function requires that the
caller must have ec_dev->lock mutex before calling it. This requirement
was not met and as a result it was possible that two commands were sent
to the device at the same time.

The problem was observed while using UART backend which doesn't use any
additional locks, unlike SPI backend which locks the controller until
response is received.

Fixes: f74c7557ed0d ("platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure")
Cc: stable@vger.kernel.org
Signed-off-by: Patryk Duda <patrykd@google.com>
---
The f74c7557ed0d patch was tested with Fingerprint MCU (FPMCU) that uses
SPI to communicate with chipset. At that time, UART FPMCU had the same
version of GET_NEXT_EVENT command in RO and RW, so the MKBP version was
never updated in this case.

Version 3 of GET_NEXT_EVENT command was recently added to CrosEC. As a
result MKBP version is also updated when UART FPMCU is used which
exposed this problem.

Best regards,
Patryk

 drivers/platform/chrome/cros_ec_proto.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Tzung-Bi Shih July 29, 2024, 3:47 a.m. UTC | #1
On Thu, Jul 25, 2024 at 05:57:13PM +0000, Patryk Duda wrote:
> The cros_ec_get_host_command_version_mask() function requires that the
> caller must have ec_dev->lock mutex before calling it. This requirement
> was not met and as a result it was possible that two commands were sent
> to the device at the same time.

To clarify:
- What would happen if multiple cros_ec_get_host_command_version_mask() calls
  at the same time?
- What are the callees?  I'm trying to understand the source of parallelism.

Also, the patch also needs an unlock at [1].

[1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819

> The problem was observed while using UART backend which doesn't use any
> additional locks, unlike SPI backend which locks the controller until
> response is received.

Is it a general issue if multiple commands send to EC at a time?  If yes, it
should serialize that in the UART transportation.
Patryk Duda July 29, 2024, 11:57 a.m. UTC | #2
On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
>
> On Thu, Jul 25, 2024 at 05:57:13PM +0000, Patryk Duda wrote:
> > The cros_ec_get_host_command_version_mask() function requires that the
> > caller must have ec_dev->lock mutex before calling it. This requirement
> > was not met and as a result it was possible that two commands were sent
> > to the device at the same time.
>
> To clarify:
> - What would happen if multiple cros_ec_get_host_command_version_mask() calls
>   at the same time?
In the best case, MCU will receive both commands glued together and
will ignore them.
It will result in a timeout in the kernel. In the worst case, request
and/or response buffers will be
corrupted.

> - What are the callees?  I'm trying to understand the source of parallelism.
This is a race between interrupt handling and ioctl call from userspace

Handling interrupt path
cros_ec_irq_thread()
cros_ec_handle_event()
cros_ec_get_next_event() - Queries host command version without taking
ec_dev->lock mutex first
cros_ec_get_host_command_version_mask()
cros_ec_send_command()
cros_ec_xfer_command()
cros_ec_uart_pkt_xfer()

Command from userspace
cros_ec_chardev_ioctl()
cros_ec_chardev_ioctl_xcmd()
cros_ec_cmd_xfer() - Locks ec_dev->lock mutex before sending command
cros_ec_send_command()
cros_ec_xfer_command()
cros_ec_uart_pkt_xfer()

>
> Also, the patch also needs an unlock at [1].
>
> [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819

Yeah. I'll fix it in v2

>
> > The problem was observed while using UART backend which doesn't use any
> > additional locks, unlike SPI backend which locks the controller until
> > response is received.
>
> Is it a general issue if multiple commands send to EC at a time?  If yes, it
> should serialize that in the UART transportation.

Host Commands only support one command at a time. It's enforced by 'lock' mutex
from cros_ec_device structure. We just need to use it properly.
Tzung-Bi Shih July 30, 2024, 6:04 a.m. UTC | #3
On Mon, Jul 29, 2024 at 01:57:09PM +0200, Patryk Duda wrote:
> On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> >
> > On Thu, Jul 25, 2024 at 05:57:13PM +0000, Patryk Duda wrote:
> > > The cros_ec_get_host_command_version_mask() function requires that the
> > > caller must have ec_dev->lock mutex before calling it. This requirement
> > > was not met and as a result it was possible that two commands were sent
> > > to the device at the same time.
> >
> > To clarify:
> > - What would happen if multiple cros_ec_get_host_command_version_mask() calls
> >   at the same time?
> In the best case, MCU will receive both commands glued together and
> will ignore them.
> It will result in a timeout in the kernel. In the worst case, request
> and/or response buffers will be
> corrupted.
> 
> > - What are the callees?  I'm trying to understand the source of parallelism.
> This is a race between interrupt handling and ioctl call from userspace
> 
> Handling interrupt path
> cros_ec_irq_thread()
> cros_ec_handle_event()
> cros_ec_get_next_event() - Queries host command version without taking
> ec_dev->lock mutex first
> cros_ec_get_host_command_version_mask()
> cros_ec_send_command()
> cros_ec_xfer_command()
> cros_ec_uart_pkt_xfer()
> 
> Command from userspace
> cros_ec_chardev_ioctl()
> cros_ec_chardev_ioctl_xcmd()
> cros_ec_cmd_xfer() - Locks ec_dev->lock mutex before sending command
> cros_ec_send_command()
> cros_ec_xfer_command()
> cros_ec_uart_pkt_xfer()
> 
> >
> > Also, the patch also needs an unlock at [1].
> >
> > [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819
> 
> Yeah. I'll fix it in v2

I'm wondering if it's simpler to just lock and unlock around calling
cros_ec_get_host_command_version_mask().  What do you think?

> > > The problem was observed while using UART backend which doesn't use any
> > > additional locks, unlike SPI backend which locks the controller until
> > > response is received.
> >
> > Is it a general issue if multiple commands send to EC at a time?  If yes, it
> > should serialize that in the UART transportation.
> 
> Host Commands only support one command at a time. It's enforced by 'lock' mutex
> from cros_ec_device structure. We just need to use it properly.

I see.  Please use the fixes tag if you'd have chance to send next version:
Fixes: f74c7557ed0d ("platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure")
Patryk Duda July 30, 2024, 8:05 a.m. UTC | #4
On Tue, Jul 30, 2024 at 8:04 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
>
> On Mon, Jul 29, 2024 at 01:57:09PM +0200, Patryk Duda wrote:
> > On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> > >
> > > On Thu, Jul 25, 2024 at 05:57:13PM +0000, Patryk Duda wrote:
> > > > The cros_ec_get_host_command_version_mask() function requires that the
> > > > caller must have ec_dev->lock mutex before calling it. This requirement
> > > > was not met and as a result it was possible that two commands were sent
> > > > to the device at the same time.
> > >
> > > To clarify:
> > > - What would happen if multiple cros_ec_get_host_command_version_mask() calls
> > >   at the same time?
> > In the best case, MCU will receive both commands glued together and
> > will ignore them.
> > It will result in a timeout in the kernel. In the worst case, request
> > and/or response buffers will be
> > corrupted.
> >
> > > - What are the callees?  I'm trying to understand the source of parallelism.
> > This is a race between interrupt handling and ioctl call from userspace
> >
> > Handling interrupt path
> > cros_ec_irq_thread()
> > cros_ec_handle_event()
> > cros_ec_get_next_event() - Queries host command version without taking
> > ec_dev->lock mutex first
> > cros_ec_get_host_command_version_mask()
> > cros_ec_send_command()
> > cros_ec_xfer_command()
> > cros_ec_uart_pkt_xfer()
> >
> > Command from userspace
> > cros_ec_chardev_ioctl()
> > cros_ec_chardev_ioctl_xcmd()
> > cros_ec_cmd_xfer() - Locks ec_dev->lock mutex before sending command
> > cros_ec_send_command()
> > cros_ec_xfer_command()
> > cros_ec_uart_pkt_xfer()
> >
> > >
> > > Also, the patch also needs an unlock at [1].
> > >
> > > [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819
> >
> > Yeah. I'll fix it in v2
>
> I'm wondering if it's simpler to just lock and unlock around calling
> cros_ec_get_host_command_version_mask().  What do you think?
>
Initially, I thought it would be good to keep ec_dev->mkbp_event_supported
update under the mutex (similar to cros_ec_query_all() which is called with
locked mutex), but mkbp_event_supported is also used without locked mutex.

I don't see any obvious risks with updating the MKBP version outside mutex.
Do you want me to change it?

> > > > The problem was observed while using UART backend which doesn't use any
> > > > additional locks, unlike SPI backend which locks the controller until
> > > > response is received.
> > >
> > > Is it a general issue if multiple commands send to EC at a time?  If yes, it
> > > should serialize that in the UART transportation.
> >
> > Host Commands only support one command at a time. It's enforced by 'lock' mutex
> > from cros_ec_device structure. We just need to use it properly.
>
> I see.  Please use the fixes tag if you'd have chance to send next version:
> Fixes: f74c7557ed0d ("platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure")
Tzung-Bi Shih July 30, 2024, 10:14 a.m. UTC | #5
On Tue, Jul 30, 2024 at 10:05:16AM +0200, Patryk Duda wrote:
> On Tue, Jul 30, 2024 at 8:04 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> > On Mon, Jul 29, 2024 at 01:57:09PM +0200, Patryk Duda wrote:
> > > On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> > > > Also, the patch also needs an unlock at [1].
> > > >
> > > > [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819
> > >
> > > Yeah. I'll fix it in v2
> >
> > I'm wondering if it's simpler to just lock and unlock around calling
> > cros_ec_get_host_command_version_mask().  What do you think?
> >
> Initially, I thought it would be good to keep ec_dev->mkbp_event_supported
> update under the mutex (similar to cros_ec_query_all() which is called with
> locked mutex), but mkbp_event_supported is also used without locked mutex.
> 
> I don't see any obvious risks with updating the MKBP version outside mutex.
> Do you want me to change it?

Yes.
Patryk Duda July 30, 2024, 10:55 a.m. UTC | #6
On Tue, Jul 30, 2024 at 12:14 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
>
> On Tue, Jul 30, 2024 at 10:05:16AM +0200, Patryk Duda wrote:
> > On Tue, Jul 30, 2024 at 8:04 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> > > On Mon, Jul 29, 2024 at 01:57:09PM +0200, Patryk Duda wrote:
> > > > On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> > > > > Also, the patch also needs an unlock at [1].
> > > > >
> > > > > [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819
> > > >
> > > > Yeah. I'll fix it in v2
> > >
> > > I'm wondering if it's simpler to just lock and unlock around calling
> > > cros_ec_get_host_command_version_mask().  What do you think?
> > >
> > Initially, I thought it would be good to keep ec_dev->mkbp_event_supported
> > update under the mutex (similar to cros_ec_query_all() which is called with
> > locked mutex), but mkbp_event_supported is also used without locked mutex.
> >
> > I don't see any obvious risks with updating the MKBP version outside mutex.
> > Do you want me to change it?
>
> Yes.
Fixed in v3: https://lore.kernel.org/lkml/20240730104425.607083-1-patrykd@google.com
diff mbox series

Patch

diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c
index f776fd42244f..cf7ef5f9db84 100644
--- a/drivers/platform/chrome/cros_ec_proto.c
+++ b/drivers/platform/chrome/cros_ec_proto.c
@@ -813,6 +813,7 @@  int cros_ec_get_next_event(struct cros_ec_device *ec_dev,
 	if (ret == -ENOPROTOOPT) {
 		dev_dbg(ec_dev->dev,
 			"GET_NEXT_EVENT returned invalid version error.\n");
+		mutex_lock(&ec_dev->lock);
 		ret = cros_ec_get_host_command_version_mask(ec_dev,
 							EC_CMD_GET_NEXT_EVENT,
 							&ver_mask);
@@ -829,6 +830,7 @@  int cros_ec_get_next_event(struct cros_ec_device *ec_dev,
 		ec_dev->mkbp_event_supported = fls(ver_mask);
 		dev_dbg(ec_dev->dev, "MKBP support version changed to %u\n",
 			ec_dev->mkbp_event_supported - 1);
+		mutex_unlock(&ec_dev->lock);
 
 		/* Try to get next event with new MKBP support version set. */
 		ret = get_next_event(ec_dev);