diff mbox series

bus: mhi: core: Use cached values for calculating the shared write pointer

Message ID 1629997263-11147-1-git-send-email-quic_jhugo@quicinc.com (mailing list archive)
State Superseded
Headers show
Series bus: mhi: core: Use cached values for calculating the shared write pointer | expand

Commit Message

Jeffrey Hugo Aug. 26, 2021, 5:01 p.m. UTC
mhi_recycle_ev_ring() computes the shared write pointer for the ring
(ctxt_wp) using a read/modify/write pattern where the ctxt_wp value in the
shared memory is read, incremented, and written back.  There are no checks
on the read value, it is assumed that it is kept in sync with the locally
cached value.  Per the MHI spec, this is correct.  The device should only
read ctxt_wp, never write it.

However, there are devices in the wild that violate the spec, and can
update the ctxt_wp in a specific scenario.  This can cause corruption, and
violate the above assumption that the ctxt_wp is in sync with the cached
value.

This can occur when the device has loaded firmware from the host, and is
transitioning from the SBL EE to the AMSS EE.  As part of shutting down
SBL, the SBL flushes it's local MHI context to the shared memory since
the local context will not persist across an EE change.  In the case of
the event ring, SBL will flush its entire context, not just the parts that
it is allowed to update.  This means SBL will write to ctxt_wp, and
possibly corrupt it.

An example:

Host				Device
----				---
Update ctxt_wp to 0x1f0
				SBL observes 0x1f0
Update ctxt_wp to 0x0
				Starts transition to AMSS EE
				Context flush, writes 0x1f0 to ctxt_wp
Update ctxt_wp to 0x200
Update ctxt_wp to 0x210
				AMSS observes 0x210
				0x210 exceeds ring size
				AMSS signals syserr

The reason the ctxt_wp goes off the end of the ring is that the rollover
check is only performed on the cached wp, which is out of sync with
ctxt_wp.

Since the host is the authority of the value of ctxt_wp per the MHI spec,
we can fix this issue by not reading ctxt_wp from the shared memory, and
instead compute it based on the cached value.  If SBL corrupts ctxt_wp,
the host won't observe it, and will correct the value at some point later.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
---
 drivers/bus/mhi/core/main.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

Comments

Hemant Kumar Aug. 26, 2021, 7:54 p.m. UTC | #1
On 8/26/2021 10:01 AM, Jeffrey Hugo wrote:
> mhi_recycle_ev_ring() computes the shared write pointer for the ring
> (ctxt_wp) using a read/modify/write pattern where the ctxt_wp value in the
> shared memory is read, incremented, and written back.  There are no checks
> on the read value, it is assumed that it is kept in sync with the locally
> cached value.  Per the MHI spec, this is correct.  The device should only
> read ctxt_wp, never write it.
> 
> However, there are devices in the wild that violate the spec, and can
> update the ctxt_wp in a specific scenario.  This can cause corruption, and
> violate the above assumption that the ctxt_wp is in sync with the cached
> value.
> 
> This can occur when the device has loaded firmware from the host, and is
> transitioning from the SBL EE to the AMSS EE.  As part of shutting down
> SBL, the SBL flushes it's local MHI context to the shared memory since
> the local context will not persist across an EE change.  In the case of
> the event ring, SBL will flush its entire context, not just the parts that
> it is allowed to update.  This means SBL will write to ctxt_wp, and
> possibly corrupt it.
> 
> An example:
> 
> Host				Device
> ----				---
> Update ctxt_wp to 0x1f0
> 				SBL observes 0x1f0
> Update ctxt_wp to 0x0
> 				Starts transition to AMSS EE
> 				Context flush, writes 0x1f0 to ctxt_wp
> Update ctxt_wp to 0x200
> Update ctxt_wp to 0x210
> 				AMSS observes 0x210
> 				0x210 exceeds ring size
> 				AMSS signals syserr
> 
> The reason the ctxt_wp goes off the end of the ring is that the rollover
> check is only performed on the cached wp, which is out of sync with
> ctxt_wp.
> 
> Since the host is the authority of the value of ctxt_wp per the MHI spec,
> we can fix this issue by not reading ctxt_wp from the shared memory, and
> instead compute it based on the cached value.  If SBL corrupts ctxt_wp,
> the host won't observe it, and will correct the value at some point later.
> 
> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> ---
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>

LGTM
kernel test robot Aug. 26, 2021, 8:07 p.m. UTC | #2
Hi Jeffrey,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.14-rc7 next-20210826]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jeffrey-Hugo/bus-mhi-core-Use-cached-values-for-calculating-the-shared-write-pointer/20210827-010402
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 73f3af7b4611d77bdaea303fb639333eb28e37d7
config: riscv-randconfig-r042-20210826 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project ea08c4cd1c0869ec5024a8bb3f5cdf06ab03ae83)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/82f8b573ef9eb6f068ed842f7ce1009c6bd5b63e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jeffrey-Hugo/bus-mhi-core-Use-cached-values-for-calculating-the-shared-write-pointer/20210827-010402
        git checkout 82f8b573ef9eb6f068ed842f7ce1009c6bd5b63e
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=riscv SHELL=/bin/bash drivers/bus/mhi/core/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> drivers/bus/mhi/core/main.c:542:50: error: use of undeclared identifier 'ring_base'; did you mean 'pfn_base'?
           *ring->ctxt_wp = ring->iommu_base + (ring->wp - ring_base);
                                                           ^~~~~~~~~
                                                           pfn_base
   arch/riscv/include/asm/page.h:82:22: note: 'pfn_base' declared here
   extern unsigned long pfn_base;
                        ^
>> drivers/bus/mhi/core/main.c:542:17: warning: incompatible pointer to integer conversion assigning to 'u64' (aka 'unsigned long long') from 'void *' [-Wint-conversion]
           *ring->ctxt_wp = ring->iommu_base + (ring->wp - ring_base);
                          ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1 warning and 1 error generated.


vim +542 drivers/bus/mhi/core/main.c

   532	
   533	static void mhi_recycle_ev_ring_element(struct mhi_controller *mhi_cntrl,
   534						struct mhi_ring *ring)
   535	{
   536		/* Update the WP */
   537		ring->wp += ring->el_size;
   538	
   539		if (ring->wp >= (ring->base + ring->len))
   540			ring->wp = ring->base;
   541	
 > 542		*ring->ctxt_wp = ring->iommu_base + (ring->wp - ring_base);
   543	
   544		/* Update the RP */
   545		ring->rp += ring->el_size;
   546		if (ring->rp >= (ring->base + ring->len))
   547			ring->rp = ring->base;
   548	
   549		/* Update to all cores */
   550		smp_wmb();
   551	}
   552	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
diff mbox series

Patch

diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index c01ec2f..1e7e7bb 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -533,18 +533,13 @@  irqreturn_t mhi_intvec_handler(int irq_number, void *dev)
 static void mhi_recycle_ev_ring_element(struct mhi_controller *mhi_cntrl,
 					struct mhi_ring *ring)
 {
-	dma_addr_t ctxt_wp;
-
 	/* Update the WP */
 	ring->wp += ring->el_size;
-	ctxt_wp = *ring->ctxt_wp + ring->el_size;
 
-	if (ring->wp >= (ring->base + ring->len)) {
+	if (ring->wp >= (ring->base + ring->len))
 		ring->wp = ring->base;
-		ctxt_wp = ring->iommu_base;
-	}
 
-	*ring->ctxt_wp = ctxt_wp;
+	*ring->ctxt_wp = ring->iommu_base + (ring->wp - ring_base);
 
 	/* Update the RP */
 	ring->rp += ring->el_size;