mbox series

[v3,0/9] Introduce a unified API for SCMI Server testing

Message ID 20220903183042.3913053-1-cristian.marussi@arm.com (mailing list archive)
Headers show
Series Introduce a unified API for SCMI Server testing | expand

Message

Cristian Marussi Sept. 3, 2022, 6:30 p.m. UTC
Hi all,

This series aims to introduce a new SCMI unified userspace interface meant
to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
from the perspective of the OSPM agent (non-secure world only ...)

It is proposed as a testing/development facility, it is NOT meant to be a
feature to use in production, but only enabled in Kconfig for test
deployments.

Currently an SCMI Compliance Suite like the one at [1] can only work by
injecting SCMI messages at the SCMI transport layer using the mailbox test
driver (CONFIG_MAILBOX_TEST) via its few debugfs entries and looking at
the related replies from the SCMI backend Server.

This approach has a few drawbacks:

- the SCMI Server under test MUST be reachable through a mailbox based
  SCMI transport: any other SCMI Server placement is not possible (like in
  a VM reachable via SCMI Virtio). In order to cover other placements in
  the current scenario we should write some sort of test driver for each
  and every existent SCMI transport and for any future additional transport
  ...this clearly does not scale.

- even in the mailbox case the userspace Compliance suite cannot simply
  send and receive bare SCMI messages BUT it has to properly lay them out
  into the shared memory exposed by the mailbox test driver as expected by
  the transport definitions. In other words such a userspace test
  application has to, not only use a proper transport driver for the system
  at hand, but it also has to have a comprehensive knowledge of the
  internals of the underlying transport in order to operate.

- last but not least, the system under test has to be specifically
  configured and built, in terms of Kconfig and DT, to perform such kind of
  testing, it cannot be used for anything else, which is unfortunate for
  CI/CD deployments.

This series introduces a new SCMI Raw mode support feature that, when
configured and enabled exposes a new interface in debugfs through which:

- a userspace application can inject bare SCMI binary messages into the
  SCMI core stack; such messages will be routed by the SCMI regular kernel
  stack to the backend Server using the currently configured transport
  transparently: in other words you can test the SCMI server, no matter
  where it is placed, as long as it is reachable from the currently
  configured SCMI stack.
  Same goes the other way around on the reading path: any SCMI server reply
  can be read as a bare SCMI binary message from the same debugfs path.

- as a direct consequence of this way of injecting bare messages in the
  middle of the SCMI stack (instead of beneath it at the transport layer)
  the user application has to handle only bare SCMI messages without having
  to worry about the specific underlying transport internals that will be
  taken care of by the SCMI core stack itself using its own machinery,
  without duplicating such logic.

- a system under test, once configured with SCMI Raw support enabled in
  Kconfig, can be booted without any particular DT change.

In V2 the runtime enable/disable switching capability has been removed
(for now) since still not deemed to be stable/reliable enough: as a
consequence when SCMI Raw support is compiled in, the regular SCMI stack
drivers are now inhibited permanently for that Kernel.

A quick and trivial example from the shell...reading from a sensor
injecting a properly crafted packet in raw mode:

	# INJECT THE SENSOR_READING MESSAGE FOR SENSOR ID=1 (binary little endian)
	root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message

	# READING BACK THE REPLY...
	root@deb-buster-arm64:~# cat /sys/kernel/debug/scmi_raw/message | od --endian=little -t x4
	0000000 00005406 00000000 00000335 00000000
	0000020

while doing that, since Raw mode makes (partial) use of the regular SCMI
stack, you can observe the messages going through the SCMI stack with the
usual traces:

              bash-329     [000] ..... 14183.446808: scmi_msg_dump: pt=15 t=CMND msg_id=06 seq=0000 s=0 pyld=0100000000000000
   irq/35-mhu_db_l-81      [000] ..... 14183.447809: scmi_msg_dump: pt=15 t=RESP msg_id=06 seq=0000 s=0 pyld=3503000000000000


..trying to read in async when the backend server does NOT supports asyncs:

	# AN ASYNC SENSOR READING REQUEST...
	root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x01\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message_async

              bash-329     [000] ..... 16415.938739: scmi_msg_dump: pt=15 t=CMND msg_id=06 seq=0000 s=0 pyld=0100000001000000
   irq/35-mhu_db_l-81      [000] ..... 16415.944129: scmi_msg_dump: pt=15 t=RESP msg_id=06 seq=0000 s=-1 pyld=

	# RETURNS A STATUS -1 FROM THE SERVER NOT SUPPORTING IT
	root@deb-buster-arm64:~# cat /sys/kernel/debug/scmi_raw/message | od --endian=little -t x4
	0000000 00005406 ffffffff
	0000010

Note that this was on a JUNO, BUT exactly the same steps can be used to
reach an SCMI Server living on a VM reachable via virtio as long as the
system under test if properly configured to work with a virtio transport.

In a nutshell the exposed API is as follows:

/sys/kernel/debug/scmi_raw/
├── errors
├── message
├── message_async
├── notification
├── reset
├── transport_max_msg_size
├── transport_rx_timeout_ms
└── transport_tx_max_msg

where:

 - message*: used to send sync/async commands and read back immediate and
   delayed responses (if any)
 - errors: used to report timeout and unexpected replies
 - reset: used to reset the SCMI Raw stack, flushing all queues from
   received messages still pending to be read out (useful to be sure to
   cleanup between test suite runs...)
 - notification: used to read any notification being spit by the system
   (if previously enabled by the user app)
 - transport*: a bunch of configuration useful to setup the user
   application expectations in terms of timeouts and message
   characteristics.

Each write corresponds to one command request and the replies or delayed
response are read back one message at time (receiving an EOF at each
message boundary).

The user application running the test is in charge of handling timeouts
and properly choosing SCMI sequence numbers for the outgoing requests: note
that the same fixed number can be re-used (...though discouraged...) as
long as the suite does NOT expect to send multiple in-flight commands
concurrently.

Since the SCMI core regular stack is partially used to deliver and collect
the messages, late replies after timeouts and any other sort of unexpected
message sent by the SCMI server platform back can be identified by the SCMI
core as usual and it will be reported under /errors for later analysis.
(a userspace test-app will have anyway properly detected the timeout on
 /message* ...)

All of the above has been roughly tested against a standard JUNO SCP SCMI
Server (mailbox trans) and an emulated SCMI Server living in a VM (virtio
trans) using a custom experimental version of the scmi-tests Compliance
suite patched to support Raw mode and posted at [2]. (still in development
...certainly not up for review as of now...it is just a mean for me to
test the testing API ... O_o)

The series is based on sudeep/for-next/scmi [3] on top of:

commit 40d30cf680cb ("firmware: arm_scmi: Harmonize SCMI tracing message format")

Still todo:

- needs more complete testing, ideally running to completion at least the full
  SCMI compliance suite at [2] to use it as a reference
- more feedback on the API

Having said that (in such a concise and brief way :P) ...
	
...any feedback/comments are welcome !

Thanks,
Cristian

---
v2 --> v3
- fixed some sparse warning on LE and __poll_t
- reworked and simplified deferred worker in charge of xfer delayed waiting
- allow for injection of DT-unknown protocols messages when in Raw mode
  (needed for any kind of fuzzing...)

v1 --> v2
- added comments and debugfs docs
- added dedicated transport devices for channels initialization
- better channels handling in Raw mode
- removed runtime enable, moved to static compile time exclusion
  of SCMI regular stack

[1]: https://gitlab.arm.com/tests/scmi-tests
[2]: https://gitlab.arm.com/linux-arm/scmi-tests-cm/-/commits/raw_mode_support_V3/
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux.git/log/?h=for-next/scmi

Cristian Marussi (9):
  firmware: arm_scmi: Refactor xfer in-flight registration routines
  firmware: arm_scmi: Simplify chan_available transport operation
  firmware: arm_scmi: Use dedicated devices to initialize channels
  firmware: arm_scmi: Add xfer raw helpers
  firmware: arm_scmi: Move errors defs and code to common.h
  firmware: arm_scmi: Add raw transmission support
  firmware: arm_scmi: Add debugfs ABI documentation for Raw mode
  firmware: arm_scmi: Reject SCMI drivers while in Raw mode
  firmware: arm_scmi: Call Raw mode hooks from the core stack

 Documentation/ABI/testing/debugfs-scmi-raw |   88 ++
 drivers/firmware/arm_scmi/Kconfig          |   13 +
 drivers/firmware/arm_scmi/Makefile         |    1 +
 drivers/firmware/arm_scmi/common.h         |   51 +-
 drivers/firmware/arm_scmi/driver.c         |  389 ++++--
 drivers/firmware/arm_scmi/mailbox.c        |    4 +-
 drivers/firmware/arm_scmi/optee.c          |    4 +-
 drivers/firmware/arm_scmi/raw_mode.c       | 1235 ++++++++++++++++++++
 drivers/firmware/arm_scmi/raw_mode.h       |   29 +
 drivers/firmware/arm_scmi/smc.c            |    4 +-
 drivers/firmware/arm_scmi/virtio.c         |    2 +-
 11 files changed, 1714 insertions(+), 106 deletions(-)
 create mode 100644 Documentation/ABI/testing/debugfs-scmi-raw
 create mode 100644 drivers/firmware/arm_scmi/raw_mode.c
 create mode 100644 drivers/firmware/arm_scmi/raw_mode.h

Comments

Vincent Guittot Oct. 7, 2022, 2:23 p.m. UTC | #1
Hi Cristian,

On Sat, 3 Sept 2022 at 20:31, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> Hi all,
>
> This series aims to introduce a new SCMI unified userspace interface meant
> to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> from the perspective of the OSPM agent (non-secure world only ...)
>
> It is proposed as a testing/development facility, it is NOT meant to be a
> feature to use in production, but only enabled in Kconfig for test
> deployments.
>
> Currently an SCMI Compliance Suite like the one at [1] can only work by
> injecting SCMI messages at the SCMI transport layer using the mailbox test
> driver (CONFIG_MAILBOX_TEST) via its few debugfs entries and looking at
> the related replies from the SCMI backend Server.
>

...

>
> In V2 the runtime enable/disable switching capability has been removed
> (for now) since still not deemed to be stable/reliable enough: as a
> consequence when SCMI Raw support is compiled in, the regular SCMI stack
> drivers are now inhibited permanently for that Kernel.
>
> A quick and trivial example from the shell...reading from a sensor
> injecting a properly crafted packet in raw mode:
>
>         # INJECT THE SENSOR_READING MESSAGE FOR SENSOR ID=1 (binary little endian)
>         root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message

I have tried your patchset with an SCMI server using an optee-os
transport channel but I have a timed out error when trying your
example above to read sensor1

#  echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00
> /sys/kernel/debug/scmi_raw/message
# [   93.306690] arm-scmi firmware:scmi0: timed out in RAW response -
HDR:00005406

and there no response available when trying to read it with
# cat /sys/kernel/debug/scmi_raw/message


The sensor 1 can be successfully read in normal mode:
# cat /sys/class/hwmon/hwmon0/temp1_input
25000
#

In both case, the SCMI server received the requests and replied successfully

Could it be that you process the answer differently in case of raw mode ?

>
>         # READING BACK THE REPLY...
>         root@deb-buster-arm64:~# cat /sys/kernel/debug/scmi_raw/message | od --endian=little -t x4
>         0000000 00005406 00000000 00000335 00000000
>         0000020
>
> while doing that, since Raw mode makes (partial) use of the regular SCMI
> stack, you can observe the messages going through the SCMI stack with the
> usual traces:
>
>               bash-329     [000] ..... 14183.446808: scmi_msg_dump: pt=15 t=CMND msg_id=06 seq=0000 s=0 pyld=0100000000000000
>    irq/35-mhu_db_l-81      [000] ..... 14183.447809: scmi_msg_dump: pt=15 t=RESP msg_id=06 seq=0000 s=0 pyld=3503000000000000
>
>
> ..trying to read in async when the backend server does NOT supports asyncs:
>
>         # AN ASYNC SENSOR READING REQUEST...
>         root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x01\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message_async
>
>               bash-329     [000] ..... 16415.938739: scmi_msg_dump: pt=15 t=CMND msg_id=06 seq=0000 s=0 pyld=0100000001000000
>    irq/35-mhu_db_l-81      [000] ..... 16415.944129: scmi_msg_dump: pt=15 t=RESP msg_id=06 seq=0000 s=-1 pyld=
>
>         # RETURNS A STATUS -1 FROM THE SERVER NOT SUPPORTING IT
>         root@deb-buster-arm64:~# cat /sys/kernel/debug/scmi_raw/message | od --endian=little -t x4
>         0000000 00005406 ffffffff
>         0000010
>
> Note that this was on a JUNO, BUT exactly the same steps can be used to
> reach an SCMI Server living on a VM reachable via virtio as long as the
> system under test if properly configured to work with a virtio transport.
>
> In a nutshell the exposed API is as follows:
>
> /sys/kernel/debug/scmi_raw/
> ├── errors
> ├── message
> ├── message_async
> ├── notification
> ├── reset
> ├── transport_max_msg_size
> ├── transport_rx_timeout_ms
> └── transport_tx_max_msg
>
> where:
>
>  - message*: used to send sync/async commands and read back immediate and
>    delayed responses (if any)
>  - errors: used to report timeout and unexpected replies
>  - reset: used to reset the SCMI Raw stack, flushing all queues from
>    received messages still pending to be read out (useful to be sure to
>    cleanup between test suite runs...)
>  - notification: used to read any notification being spit by the system
>    (if previously enabled by the user app)
>  - transport*: a bunch of configuration useful to setup the user
>    application expectations in terms of timeouts and message
>    characteristics.
>
> Each write corresponds to one command request and the replies or delayed
> response are read back one message at time (receiving an EOF at each
> message boundary).
>
> The user application running the test is in charge of handling timeouts
> and properly choosing SCMI sequence numbers for the outgoing requests: note
> that the same fixed number can be re-used (...though discouraged...) as
> long as the suite does NOT expect to send multiple in-flight commands
> concurrently.
>
> Since the SCMI core regular stack is partially used to deliver and collect
> the messages, late replies after timeouts and any other sort of unexpected
> message sent by the SCMI server platform back can be identified by the SCMI
> core as usual and it will be reported under /errors for later analysis.
> (a userspace test-app will have anyway properly detected the timeout on
>  /message* ...)
>
> All of the above has been roughly tested against a standard JUNO SCP SCMI
> Server (mailbox trans) and an emulated SCMI Server living in a VM (virtio
> trans) using a custom experimental version of the scmi-tests Compliance
> suite patched to support Raw mode and posted at [2]. (still in development
> ...certainly not up for review as of now...it is just a mean for me to
> test the testing API ... O_o)
>
> The series is based on sudeep/for-next/scmi [3] on top of:
>
> commit 40d30cf680cb ("firmware: arm_scmi: Harmonize SCMI tracing message format")
>
> Still todo:
>
> - needs more complete testing, ideally running to completion at least the full
>   SCMI compliance suite at [2] to use it as a reference
> - more feedback on the API
>
> Having said that (in such a concise and brief way :P) ...
>
> ...any feedback/comments are welcome !
>
> Thanks,
> Cristian
>
> ---
> v2 --> v3
> - fixed some sparse warning on LE and __poll_t
> - reworked and simplified deferred worker in charge of xfer delayed waiting
> - allow for injection of DT-unknown protocols messages when in Raw mode
>   (needed for any kind of fuzzing...)
>
> v1 --> v2
> - added comments and debugfs docs
> - added dedicated transport devices for channels initialization
> - better channels handling in Raw mode
> - removed runtime enable, moved to static compile time exclusion
>   of SCMI regular stack
>
> [1]: https://gitlab.arm.com/tests/scmi-tests
> [2]: https://gitlab.arm.com/linux-arm/scmi-tests-cm/-/commits/raw_mode_support_V3/
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux.git/log/?h=for-next/scmi
>
> Cristian Marussi (9):
>   firmware: arm_scmi: Refactor xfer in-flight registration routines
>   firmware: arm_scmi: Simplify chan_available transport operation
>   firmware: arm_scmi: Use dedicated devices to initialize channels
>   firmware: arm_scmi: Add xfer raw helpers
>   firmware: arm_scmi: Move errors defs and code to common.h
>   firmware: arm_scmi: Add raw transmission support
>   firmware: arm_scmi: Add debugfs ABI documentation for Raw mode
>   firmware: arm_scmi: Reject SCMI drivers while in Raw mode
>   firmware: arm_scmi: Call Raw mode hooks from the core stack
>
>  Documentation/ABI/testing/debugfs-scmi-raw |   88 ++
>  drivers/firmware/arm_scmi/Kconfig          |   13 +
>  drivers/firmware/arm_scmi/Makefile         |    1 +
>  drivers/firmware/arm_scmi/common.h         |   51 +-
>  drivers/firmware/arm_scmi/driver.c         |  389 ++++--
>  drivers/firmware/arm_scmi/mailbox.c        |    4 +-
>  drivers/firmware/arm_scmi/optee.c          |    4 +-
>  drivers/firmware/arm_scmi/raw_mode.c       | 1235 ++++++++++++++++++++
>  drivers/firmware/arm_scmi/raw_mode.h       |   29 +
>  drivers/firmware/arm_scmi/smc.c            |    4 +-
>  drivers/firmware/arm_scmi/virtio.c         |    2 +-
>  11 files changed, 1714 insertions(+), 106 deletions(-)
>  create mode 100644 Documentation/ABI/testing/debugfs-scmi-raw
>  create mode 100644 drivers/firmware/arm_scmi/raw_mode.c
>  create mode 100644 drivers/firmware/arm_scmi/raw_mode.h
>
> --
> 2.32.0
>
Cristian Marussi Oct. 7, 2022, 3:37 p.m. UTC | #2
On Fri, Oct 07, 2022 at 04:23:33PM +0200, Vincent Guittot wrote:
> Hi Cristian,
> 

Hi Vincent

thanks for give it a try !

> On Sat, 3 Sept 2022 at 20:31, Cristian Marussi <cristian.marussi@arm.com> wrote:
> >
> > Hi all,
> >
> > This series aims to introduce a new SCMI unified userspace interface meant
> > to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> > from the perspective of the OSPM agent (non-secure world only ...)
> >
> > It is proposed as a testing/development facility, it is NOT meant to be a
> > feature to use in production, but only enabled in Kconfig for test
> > deployments.
> >
> > Currently an SCMI Compliance Suite like the one at [1] can only work by
> > injecting SCMI messages at the SCMI transport layer using the mailbox test
> > driver (CONFIG_MAILBOX_TEST) via its few debugfs entries and looking at
> > the related replies from the SCMI backend Server.
> >
> 
> ...
> 
> >
> > In V2 the runtime enable/disable switching capability has been removed
> > (for now) since still not deemed to be stable/reliable enough: as a
> > consequence when SCMI Raw support is compiled in, the regular SCMI stack
> > drivers are now inhibited permanently for that Kernel.
> >
> > A quick and trivial example from the shell...reading from a sensor
> > injecting a properly crafted packet in raw mode:
> >
> >         # INJECT THE SENSOR_READING MESSAGE FOR SENSOR ID=1 (binary little endian)
> >         root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message
> 
> I have tried your patchset with an SCMI server using an optee-os
> transport channel but I have a timed out error when trying your
> example above to read sensor1
> 
> #  echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00
> > /sys/kernel/debug/scmi_raw/message
> # [   93.306690] arm-scmi firmware:scmi0: timed out in RAW response -
> HDR:00005406
> 
> and there no response available when trying to read it with
> # cat /sys/kernel/debug/scmi_raw/message
> 

is there anything cat'ting /sys/kernel/debug/scmi_raw/errors ?

> 
> The sensor 1 can be successfully read in normal mode:
> # cat /sys/class/hwmon/hwmon0/temp1_input
> 25000
> #
> 
> In both case, the SCMI server received the requests and replied successfully
> 
> Could it be that you process the answer differently in case of raw mode ?
> 

Well, absolutely, when in raw mode the reply is picked up directly into
the RX path and copied in a message queue to be read from asyncrhnously
later via debugfs.

... mmm I think I found the problem...the reply is picked up on the RX *IRQ*
path as of now...but in optee/SMC there is no interrupt (sometime there is in
SMC) and the reply is instead read back straight away (transport is marked as
sync_cmds_completed_on_ret=true in fact).... so this is the issue probably ...
I have NOT tested on anything but mailbox and virtio till now...and I
missed this possibility that this NO-irq reply was a thing even when NOT
in polling mode (which I do not support)...my bad :<

Ok, next week I'll rework the series to fix this big_BUG and some other minor
things...in the meantime if you want to try this snippet down below...

... this won't definitely be the final patch but it could let you experiment
more (only build tested though )

Thanks for testing !
Cristian

--->8-------

diff --git a/drivers/firmware/arm_scmi/raw_mode.c b/drivers/firmware/arm_scmi/raw_mode.c
index 13eeebe4b7a8..b9fcb66a1b6a 100644
--- a/drivers/firmware/arm_scmi/raw_mode.c
+++ b/drivers/firmware/arm_scmi/raw_mode.c
@@ -197,6 +197,8 @@ struct scmi_dbg_raw_data {
        size_t rx_size;
 };
 
+void scmi_raw_message_report(void *r, struct scmi_xfer *xfer, unsigned int idx);
+
 static inline
 struct scmi_raw_buffer *scmi_raw_buffer_get(struct scmi_raw_mode_info *raw,
                                            unsigned int idx)
@@ -389,22 +391,34 @@ static void scmi_xfer_raw_worker(struct work_struct *work)
 
                xfer = rw->xfer;
 
-               /*
-                * Waiters are queued by wait-deadline at the end, so some of
-                * them could have been already expired when processed, BUT we
-                * have to check the completion status anyway just in case a
-                * virtually expired (aged) transaction was indeed completed
-                * fine and we'll have to wait for the asynchronous part (if
-                * any).
-                */
-               aging = jiffies - rw->start_jiffies;
-               tmo = max_tmo > aging ? max_tmo - aging : 0;
-
-               if ((tmo && !wait_for_completion_timeout(&xfer->done, tmo)) ||
-                   (!tmo && !try_wait_for_completion(&xfer->done))) {
-                       dev_err(dev, "timed out in RAW response - HDR:%08X\n",
-                               pack_scmi_header(&xfer->hdr));
-                       ret = -ETIMEDOUT;
+               if (!raw->desc->sync_cmds_completed_on_ret) {
+                       /*
+                        * Waiters are queued by wait-deadline at the end, so some of
+                        * them could have been already expired when processed, BUT we
+                        * have to check the completion status anyway just in case a
+                        * virtually expired (aged) transaction was indeed completed
+                        * fine and we'll have to wait for the asynchronous part (if
+                        * any).
+                        */
+                       aging = jiffies - rw->start_jiffies;
+                       tmo = max_tmo > aging ? max_tmo - aging : 0;
+
+                       if ((tmo &&
+                            !wait_for_completion_timeout(&xfer->done, tmo)) ||
+                            (!tmo && !try_wait_for_completion(&xfer->done))) {
+                               dev_err(dev,
+                                       "timed out in RAW response - HDR:%08X\n",
+                                       pack_scmi_header(&xfer->hdr));
+                               ret = -ETIMEDOUT;
+                       }
+               } else {
+                       raw->desc->ops->fetch_response(rw->cinfo, xfer);
+                       /* Trace polled replies. */
+                       trace_scmi_msg_dump(xfer->hdr.protocol_id, xfer->hdr.id,
+                                           "RESP",
+                                           xfer->hdr.seq, xfer->hdr.status,
+                                           xfer->rx.buf, xfer->rx.len);
+                       scmi_raw_message_report(raw, xfer, SCMI_RAW_REPLY_QUEUE);
                }
 
                /* Avoid unneeded async waits */


---8<-------
Vincent Guittot Oct. 7, 2022, 3:58 p.m. UTC | #3
On Fri, 7 Oct 2022 at 17:37, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> On Fri, Oct 07, 2022 at 04:23:33PM +0200, Vincent Guittot wrote:
> > Hi Cristian,
> >
>
> Hi Vincent
>
> thanks for give it a try !
>
> > On Sat, 3 Sept 2022 at 20:31, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > >
> > > Hi all,
> > >
> > > This series aims to introduce a new SCMI unified userspace interface meant
> > > to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> > > from the perspective of the OSPM agent (non-secure world only ...)
> > >
> > > It is proposed as a testing/development facility, it is NOT meant to be a
> > > feature to use in production, but only enabled in Kconfig for test
> > > deployments.
> > >
> > > Currently an SCMI Compliance Suite like the one at [1] can only work by
> > > injecting SCMI messages at the SCMI transport layer using the mailbox test
> > > driver (CONFIG_MAILBOX_TEST) via its few debugfs entries and looking at
> > > the related replies from the SCMI backend Server.
> > >
> >
> > ...
> >
> > >
> > > In V2 the runtime enable/disable switching capability has been removed
> > > (for now) since still not deemed to be stable/reliable enough: as a
> > > consequence when SCMI Raw support is compiled in, the regular SCMI stack
> > > drivers are now inhibited permanently for that Kernel.
> > >
> > > A quick and trivial example from the shell...reading from a sensor
> > > injecting a properly crafted packet in raw mode:
> > >
> > >         # INJECT THE SENSOR_READING MESSAGE FOR SENSOR ID=1 (binary little endian)
> > >         root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message
> >
> > I have tried your patchset with an SCMI server using an optee-os
> > transport channel but I have a timed out error when trying your
> > example above to read sensor1
> >
> > #  echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00
> > > /sys/kernel/debug/scmi_raw/message
> > # [   93.306690] arm-scmi firmware:scmi0: timed out in RAW response -
> > HDR:00005406
> >
> > and there no response available when trying to read it with
> > # cat /sys/kernel/debug/scmi_raw/message
> >
>
> is there anything cat'ting /sys/kernel/debug/scmi_raw/errors ?

It was empty

>
> >
> > The sensor 1 can be successfully read in normal mode:
> > # cat /sys/class/hwmon/hwmon0/temp1_input
> > 25000
> > #
> >
> > In both case, the SCMI server received the requests and replied successfully
> >
> > Could it be that you process the answer differently in case of raw mode ?
> >
>
> Well, absolutely, when in raw mode the reply is picked up directly into
> the RX path and copied in a message queue to be read from asyncrhnously
> later via debugfs.
>
> ... mmm I think I found the problem...the reply is picked up on the RX *IRQ*
> path as of now...but in optee/SMC there is no interrupt (sometime there is in
> SMC) and the reply is instead read back straight away (transport is marked as
> sync_cmds_completed_on_ret=true in fact).... so this is the issue probably ...
> I have NOT tested on anything but mailbox and virtio till now...and I
> missed this possibility that this NO-irq reply was a thing even when NOT
> in polling mode (which I do not support)...my bad :<
>
> Ok, next week I'll rework the series to fix this big_BUG and some other minor
> things...in the meantime if you want to try this snippet down below...
>
> ... this won't definitely be the final patch but it could let you experiment
> more (only build tested though )

Thanks.
The patch below fixes my problem with optee transport layer

>
> Thanks for testing !
> Cristian
>
> --->8-------
>
> diff --git a/drivers/firmware/arm_scmi/raw_mode.c b/drivers/firmware/arm_scmi/raw_mode.c
> index 13eeebe4b7a8..b9fcb66a1b6a 100644
> --- a/drivers/firmware/arm_scmi/raw_mode.c
> +++ b/drivers/firmware/arm_scmi/raw_mode.c
> @@ -197,6 +197,8 @@ struct scmi_dbg_raw_data {
>         size_t rx_size;
>  };
>
> +void scmi_raw_message_report(void *r, struct scmi_xfer *xfer, unsigned int idx);
> +
>  static inline
>  struct scmi_raw_buffer *scmi_raw_buffer_get(struct scmi_raw_mode_info *raw,
>                                             unsigned int idx)
> @@ -389,22 +391,34 @@ static void scmi_xfer_raw_worker(struct work_struct *work)
>
>                 xfer = rw->xfer;
>
> -               /*
> -                * Waiters are queued by wait-deadline at the end, so some of
> -                * them could have been already expired when processed, BUT we
> -                * have to check the completion status anyway just in case a
> -                * virtually expired (aged) transaction was indeed completed
> -                * fine and we'll have to wait for the asynchronous part (if
> -                * any).
> -                */
> -               aging = jiffies - rw->start_jiffies;
> -               tmo = max_tmo > aging ? max_tmo - aging : 0;
> -
> -               if ((tmo && !wait_for_completion_timeout(&xfer->done, tmo)) ||
> -                   (!tmo && !try_wait_for_completion(&xfer->done))) {
> -                       dev_err(dev, "timed out in RAW response - HDR:%08X\n",
> -                               pack_scmi_header(&xfer->hdr));
> -                       ret = -ETIMEDOUT;
> +               if (!raw->desc->sync_cmds_completed_on_ret) {
> +                       /*
> +                        * Waiters are queued by wait-deadline at the end, so some of
> +                        * them could have been already expired when processed, BUT we
> +                        * have to check the completion status anyway just in case a
> +                        * virtually expired (aged) transaction was indeed completed
> +                        * fine and we'll have to wait for the asynchronous part (if
> +                        * any).
> +                        */
> +                       aging = jiffies - rw->start_jiffies;
> +                       tmo = max_tmo > aging ? max_tmo - aging : 0;
> +
> +                       if ((tmo &&
> +                            !wait_for_completion_timeout(&xfer->done, tmo)) ||
> +                            (!tmo && !try_wait_for_completion(&xfer->done))) {
> +                               dev_err(dev,
> +                                       "timed out in RAW response - HDR:%08X\n",
> +                                       pack_scmi_header(&xfer->hdr));
> +                               ret = -ETIMEDOUT;
> +                       }
> +               } else {
> +                       raw->desc->ops->fetch_response(rw->cinfo, xfer);
> +                       /* Trace polled replies. */
> +                       trace_scmi_msg_dump(xfer->hdr.protocol_id, xfer->hdr.id,
> +                                           "RESP",
> +                                           xfer->hdr.seq, xfer->hdr.status,
> +                                           xfer->rx.buf, xfer->rx.len);
> +                       scmi_raw_message_report(raw, xfer, SCMI_RAW_REPLY_QUEUE);
>                 }
>
>                 /* Avoid unneeded async waits */
>
>
> ---8<-------
>
Cristian Marussi Oct. 7, 2022, 4:07 p.m. UTC | #4
On Fri, Oct 07, 2022 at 05:58:59PM +0200, Vincent Guittot wrote:
> On Fri, 7 Oct 2022 at 17:37, Cristian Marussi <cristian.marussi@arm.com> wrote:
> >
> > On Fri, Oct 07, 2022 at 04:23:33PM +0200, Vincent Guittot wrote:
> > > Hi Cristian,
> > >
> >
> > Hi Vincent
> >
> > thanks for give it a try !
> >
> > > On Sat, 3 Sept 2022 at 20:31, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > This series aims to introduce a new SCMI unified userspace interface meant
> > > > to ease testing an SCMI Server implementation for compliance, fuzzing etc.,
> > > > from the perspective of the OSPM agent (non-secure world only ...)
> > > >
> > > > It is proposed as a testing/development facility, it is NOT meant to be a
> > > > feature to use in production, but only enabled in Kconfig for test
> > > > deployments.
> > > >
> > > > Currently an SCMI Compliance Suite like the one at [1] can only work by
> > > > injecting SCMI messages at the SCMI transport layer using the mailbox test
> > > > driver (CONFIG_MAILBOX_TEST) via its few debugfs entries and looking at
> > > > the related replies from the SCMI backend Server.
> > > >
> > >
> > > ...
> > >
> > > >
> > > > In V2 the runtime enable/disable switching capability has been removed
> > > > (for now) since still not deemed to be stable/reliable enough: as a
> > > > consequence when SCMI Raw support is compiled in, the regular SCMI stack
> > > > drivers are now inhibited permanently for that Kernel.
> > > >
> > > > A quick and trivial example from the shell...reading from a sensor
> > > > injecting a properly crafted packet in raw mode:
> > > >
> > > >         # INJECT THE SENSOR_READING MESSAGE FOR SENSOR ID=1 (binary little endian)
> > > >         root@deb-buster-arm64:~# echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00 > /sys/kernel/debug/scmi_raw/message
> > >
> > > I have tried your patchset with an SCMI server using an optee-os
> > > transport channel but I have a timed out error when trying your
> > > example above to read sensor1
> > >
> > > #  echo -e -n \\x06\\x54\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00
> > > > /sys/kernel/debug/scmi_raw/message
> > > # [   93.306690] arm-scmi firmware:scmi0: timed out in RAW response -
> > > HDR:00005406
> > >
> > > and there no response available when trying to read it with
> > > # cat /sys/kernel/debug/scmi_raw/message
> > >
> >
> > is there anything cat'ting /sys/kernel/debug/scmi_raw/errors ?
> 
> It was empty
> 
> >
> > >
> > > The sensor 1 can be successfully read in normal mode:
> > > # cat /sys/class/hwmon/hwmon0/temp1_input
> > > 25000
> > > #
> > >
> > > In both case, the SCMI server received the requests and replied successfully
> > >
> > > Could it be that you process the answer differently in case of raw mode ?
> > >
> >
> > Well, absolutely, when in raw mode the reply is picked up directly into
> > the RX path and copied in a message queue to be read from asyncrhnously
> > later via debugfs.
> >
> > ... mmm I think I found the problem...the reply is picked up on the RX *IRQ*
> > path as of now...but in optee/SMC there is no interrupt (sometime there is in
> > SMC) and the reply is instead read back straight away (transport is marked as
> > sync_cmds_completed_on_ret=true in fact).... so this is the issue probably ...
> > I have NOT tested on anything but mailbox and virtio till now...and I
> > missed this possibility that this NO-irq reply was a thing even when NOT
> > in polling mode (which I do not support)...my bad :<
> >
> > Ok, next week I'll rework the series to fix this big_BUG and some other minor
> > things...in the meantime if you want to try this snippet down below...
> >
> > ... this won't definitely be the final patch but it could let you experiment
> > more (only build tested though )
> 
> Thanks.
> The patch below fixes my problem with optee transport layer
> 

Good, thanks for the patience.

Thanks,
Cristian

> >
> > Thanks for testing !
> > Cristian
> >
> > --->8-------
> >
> > diff --git a/drivers/firmware/arm_scmi/raw_mode.c b/drivers/firmware/arm_scmi/raw_mode.c
> > index 13eeebe4b7a8..b9fcb66a1b6a 100644
> > --- a/drivers/firmware/arm_scmi/raw_mode.c
> > +++ b/drivers/firmware/arm_scmi/raw_mode.c
> > @@ -197,6 +197,8 @@ struct scmi_dbg_raw_data {
> >         size_t rx_size;
> >  };
> >
> > +void scmi_raw_message_report(void *r, struct scmi_xfer *xfer, unsigned int idx);
> > +
> >  static inline
> >  struct scmi_raw_buffer *scmi_raw_buffer_get(struct scmi_raw_mode_info *raw,
> >                                             unsigned int idx)
> > @@ -389,22 +391,34 @@ static void scmi_xfer_raw_worker(struct work_struct *work)
> >
> >                 xfer = rw->xfer;
> >
> > -               /*
> > -                * Waiters are queued by wait-deadline at the end, so some of
> > -                * them could have been already expired when processed, BUT we
> > -                * have to check the completion status anyway just in case a
> > -                * virtually expired (aged) transaction was indeed completed
> > -                * fine and we'll have to wait for the asynchronous part (if
> > -                * any).
> > -                */
> > -               aging = jiffies - rw->start_jiffies;
> > -               tmo = max_tmo > aging ? max_tmo - aging : 0;
> > -
> > -               if ((tmo && !wait_for_completion_timeout(&xfer->done, tmo)) ||
> > -                   (!tmo && !try_wait_for_completion(&xfer->done))) {
> > -                       dev_err(dev, "timed out in RAW response - HDR:%08X\n",
> > -                               pack_scmi_header(&xfer->hdr));
> > -                       ret = -ETIMEDOUT;
> > +               if (!raw->desc->sync_cmds_completed_on_ret) {
> > +                       /*
> > +                        * Waiters are queued by wait-deadline at the end, so some of
> > +                        * them could have been already expired when processed, BUT we
> > +                        * have to check the completion status anyway just in case a
> > +                        * virtually expired (aged) transaction was indeed completed
> > +                        * fine and we'll have to wait for the asynchronous part (if
> > +                        * any).
> > +                        */
> > +                       aging = jiffies - rw->start_jiffies;
> > +                       tmo = max_tmo > aging ? max_tmo - aging : 0;
> > +
> > +                       if ((tmo &&
> > +                            !wait_for_completion_timeout(&xfer->done, tmo)) ||
> > +                            (!tmo && !try_wait_for_completion(&xfer->done))) {
> > +                               dev_err(dev,
> > +                                       "timed out in RAW response - HDR:%08X\n",
> > +                                       pack_scmi_header(&xfer->hdr));
> > +                               ret = -ETIMEDOUT;
> > +                       }
> > +               } else {
> > +                       raw->desc->ops->fetch_response(rw->cinfo, xfer);
> > +                       /* Trace polled replies. */
> > +                       trace_scmi_msg_dump(xfer->hdr.protocol_id, xfer->hdr.id,
> > +                                           "RESP",
> > +                                           xfer->hdr.seq, xfer->hdr.status,
> > +                                           xfer->rx.buf, xfer->rx.len);
> > +                       scmi_raw_message_report(raw, xfer, SCMI_RAW_REPLY_QUEUE);
> >                 }
> >
> >                 /* Avoid unneeded async waits */
> >
> >
> > ---8<-------
> >