Message ID | 20240725175714.1769080-1-patrykd@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | platform/chrome: cros_ec_proto: Lock device when updating MKBP version | expand |
On Thu, Jul 25, 2024 at 05:57:13PM +0000, Patryk Duda wrote: > The cros_ec_get_host_command_version_mask() function requires that the > caller must have ec_dev->lock mutex before calling it. This requirement > was not met and as a result it was possible that two commands were sent > to the device at the same time. To clarify: - What would happen if multiple cros_ec_get_host_command_version_mask() calls at the same time? - What are the callees? I'm trying to understand the source of parallelism. Also, the patch also needs an unlock at [1]. [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819 > The problem was observed while using UART backend which doesn't use any > additional locks, unlike SPI backend which locks the controller until > response is received. Is it a general issue if multiple commands send to EC at a time? If yes, it should serialize that in the UART transportation.
On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > On Thu, Jul 25, 2024 at 05:57:13PM +0000, Patryk Duda wrote: > > The cros_ec_get_host_command_version_mask() function requires that the > > caller must have ec_dev->lock mutex before calling it. This requirement > > was not met and as a result it was possible that two commands were sent > > to the device at the same time. > > To clarify: > - What would happen if multiple cros_ec_get_host_command_version_mask() calls > at the same time? In the best case, MCU will receive both commands glued together and will ignore them. It will result in a timeout in the kernel. In the worst case, request and/or response buffers will be corrupted. > - What are the callees? I'm trying to understand the source of parallelism. This is a race between interrupt handling and ioctl call from userspace Handling interrupt path cros_ec_irq_thread() cros_ec_handle_event() cros_ec_get_next_event() - Queries host command version without taking ec_dev->lock mutex first cros_ec_get_host_command_version_mask() cros_ec_send_command() cros_ec_xfer_command() cros_ec_uart_pkt_xfer() Command from userspace cros_ec_chardev_ioctl() cros_ec_chardev_ioctl_xcmd() cros_ec_cmd_xfer() - Locks ec_dev->lock mutex before sending command cros_ec_send_command() cros_ec_xfer_command() cros_ec_uart_pkt_xfer() > > Also, the patch also needs an unlock at [1]. > > [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819 Yeah. I'll fix it in v2 > > > The problem was observed while using UART backend which doesn't use any > > additional locks, unlike SPI backend which locks the controller until > > response is received. > > Is it a general issue if multiple commands send to EC at a time? If yes, it > should serialize that in the UART transportation. Host Commands only support one command at a time. It's enforced by 'lock' mutex from cros_ec_device structure. We just need to use it properly.
On Mon, Jul 29, 2024 at 01:57:09PM +0200, Patryk Duda wrote: > On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > > > On Thu, Jul 25, 2024 at 05:57:13PM +0000, Patryk Duda wrote: > > > The cros_ec_get_host_command_version_mask() function requires that the > > > caller must have ec_dev->lock mutex before calling it. This requirement > > > was not met and as a result it was possible that two commands were sent > > > to the device at the same time. > > > > To clarify: > > - What would happen if multiple cros_ec_get_host_command_version_mask() calls > > at the same time? > In the best case, MCU will receive both commands glued together and > will ignore them. > It will result in a timeout in the kernel. In the worst case, request > and/or response buffers will be > corrupted. > > > - What are the callees? I'm trying to understand the source of parallelism. > This is a race between interrupt handling and ioctl call from userspace > > Handling interrupt path > cros_ec_irq_thread() > cros_ec_handle_event() > cros_ec_get_next_event() - Queries host command version without taking > ec_dev->lock mutex first > cros_ec_get_host_command_version_mask() > cros_ec_send_command() > cros_ec_xfer_command() > cros_ec_uart_pkt_xfer() > > Command from userspace > cros_ec_chardev_ioctl() > cros_ec_chardev_ioctl_xcmd() > cros_ec_cmd_xfer() - Locks ec_dev->lock mutex before sending command > cros_ec_send_command() > cros_ec_xfer_command() > cros_ec_uart_pkt_xfer() > > > > > Also, the patch also needs an unlock at [1]. > > > > [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819 > > Yeah. I'll fix it in v2 I'm wondering if it's simpler to just lock and unlock around calling cros_ec_get_host_command_version_mask(). What do you think? > > > The problem was observed while using UART backend which doesn't use any > > > additional locks, unlike SPI backend which locks the controller until > > > response is received. > > > > Is it a general issue if multiple commands send to EC at a time? If yes, it > > should serialize that in the UART transportation. > > Host Commands only support one command at a time. It's enforced by 'lock' mutex > from cros_ec_device structure. We just need to use it properly. I see. Please use the fixes tag if you'd have chance to send next version: Fixes: f74c7557ed0d ("platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure")
On Tue, Jul 30, 2024 at 8:04 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > On Mon, Jul 29, 2024 at 01:57:09PM +0200, Patryk Duda wrote: > > On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > > > > > On Thu, Jul 25, 2024 at 05:57:13PM +0000, Patryk Duda wrote: > > > > The cros_ec_get_host_command_version_mask() function requires that the > > > > caller must have ec_dev->lock mutex before calling it. This requirement > > > > was not met and as a result it was possible that two commands were sent > > > > to the device at the same time. > > > > > > To clarify: > > > - What would happen if multiple cros_ec_get_host_command_version_mask() calls > > > at the same time? > > In the best case, MCU will receive both commands glued together and > > will ignore them. > > It will result in a timeout in the kernel. In the worst case, request > > and/or response buffers will be > > corrupted. > > > > > - What are the callees? I'm trying to understand the source of parallelism. > > This is a race between interrupt handling and ioctl call from userspace > > > > Handling interrupt path > > cros_ec_irq_thread() > > cros_ec_handle_event() > > cros_ec_get_next_event() - Queries host command version without taking > > ec_dev->lock mutex first > > cros_ec_get_host_command_version_mask() > > cros_ec_send_command() > > cros_ec_xfer_command() > > cros_ec_uart_pkt_xfer() > > > > Command from userspace > > cros_ec_chardev_ioctl() > > cros_ec_chardev_ioctl_xcmd() > > cros_ec_cmd_xfer() - Locks ec_dev->lock mutex before sending command > > cros_ec_send_command() > > cros_ec_xfer_command() > > cros_ec_uart_pkt_xfer() > > > > > > > > Also, the patch also needs an unlock at [1]. > > > > > > [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819 > > > > Yeah. I'll fix it in v2 > > I'm wondering if it's simpler to just lock and unlock around calling > cros_ec_get_host_command_version_mask(). What do you think? > Initially, I thought it would be good to keep ec_dev->mkbp_event_supported update under the mutex (similar to cros_ec_query_all() which is called with locked mutex), but mkbp_event_supported is also used without locked mutex. I don't see any obvious risks with updating the MKBP version outside mutex. Do you want me to change it? > > > > The problem was observed while using UART backend which doesn't use any > > > > additional locks, unlike SPI backend which locks the controller until > > > > response is received. > > > > > > Is it a general issue if multiple commands send to EC at a time? If yes, it > > > should serialize that in the UART transportation. > > > > Host Commands only support one command at a time. It's enforced by 'lock' mutex > > from cros_ec_device structure. We just need to use it properly. > > I see. Please use the fixes tag if you'd have chance to send next version: > Fixes: f74c7557ed0d ("platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure")
On Tue, Jul 30, 2024 at 10:05:16AM +0200, Patryk Duda wrote: > On Tue, Jul 30, 2024 at 8:04 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > On Mon, Jul 29, 2024 at 01:57:09PM +0200, Patryk Duda wrote: > > > On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > > > Also, the patch also needs an unlock at [1]. > > > > > > > > [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819 > > > > > > Yeah. I'll fix it in v2 > > > > I'm wondering if it's simpler to just lock and unlock around calling > > cros_ec_get_host_command_version_mask(). What do you think? > > > Initially, I thought it would be good to keep ec_dev->mkbp_event_supported > update under the mutex (similar to cros_ec_query_all() which is called with > locked mutex), but mkbp_event_supported is also used without locked mutex. > > I don't see any obvious risks with updating the MKBP version outside mutex. > Do you want me to change it? Yes.
On Tue, Jul 30, 2024 at 12:14 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > On Tue, Jul 30, 2024 at 10:05:16AM +0200, Patryk Duda wrote: > > On Tue, Jul 30, 2024 at 8:04 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > > On Mon, Jul 29, 2024 at 01:57:09PM +0200, Patryk Duda wrote: > > > > On Mon, Jul 29, 2024 at 5:47 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > > > > Also, the patch also needs an unlock at [1]. > > > > > > > > > > [1]: https://elixir.bootlin.com/linux/v6.10/source/drivers/platform/chrome/cros_ec_proto.c#L819 > > > > > > > > Yeah. I'll fix it in v2 > > > > > > I'm wondering if it's simpler to just lock and unlock around calling > > > cros_ec_get_host_command_version_mask(). What do you think? > > > > > Initially, I thought it would be good to keep ec_dev->mkbp_event_supported > > update under the mutex (similar to cros_ec_query_all() which is called with > > locked mutex), but mkbp_event_supported is also used without locked mutex. > > > > I don't see any obvious risks with updating the MKBP version outside mutex. > > Do you want me to change it? > > Yes. Fixed in v3: https://lore.kernel.org/lkml/20240730104425.607083-1-patrykd@google.com
diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c index f776fd42244f..cf7ef5f9db84 100644 --- a/drivers/platform/chrome/cros_ec_proto.c +++ b/drivers/platform/chrome/cros_ec_proto.c @@ -813,6 +813,7 @@ int cros_ec_get_next_event(struct cros_ec_device *ec_dev, if (ret == -ENOPROTOOPT) { dev_dbg(ec_dev->dev, "GET_NEXT_EVENT returned invalid version error.\n"); + mutex_lock(&ec_dev->lock); ret = cros_ec_get_host_command_version_mask(ec_dev, EC_CMD_GET_NEXT_EVENT, &ver_mask); @@ -829,6 +830,7 @@ int cros_ec_get_next_event(struct cros_ec_device *ec_dev, ec_dev->mkbp_event_supported = fls(ver_mask); dev_dbg(ec_dev->dev, "MKBP support version changed to %u\n", ec_dev->mkbp_event_supported - 1); + mutex_unlock(&ec_dev->lock); /* Try to get next event with new MKBP support version set. */ ret = get_next_event(ec_dev);
The cros_ec_get_host_command_version_mask() function requires that the caller must have ec_dev->lock mutex before calling it. This requirement was not met and as a result it was possible that two commands were sent to the device at the same time. The problem was observed while using UART backend which doesn't use any additional locks, unlike SPI backend which locks the controller until response is received. Fixes: f74c7557ed0d ("platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure") Cc: stable@vger.kernel.org Signed-off-by: Patryk Duda <patrykd@google.com> --- The f74c7557ed0d patch was tested with Fingerprint MCU (FPMCU) that uses SPI to communicate with chipset. At that time, UART FPMCU had the same version of GET_NEXT_EVENT command in RO and RW, so the MKBP version was never updated in this case. Version 3 of GET_NEXT_EVENT command was recently added to CrosEC. As a result MKBP version is also updated when UART FPMCU is used which exposed this problem. Best regards, Patryk drivers/platform/chrome/cros_ec_proto.c | 2 ++ 1 file changed, 2 insertions(+)