diff mbox series

wifi: wilc1000: Rework bus locking

Message ID 20241022013855.284783-1-marex@denx.de (mailing list archive)
State New
Delegated to: Kalle Valo
Headers show
Series wifi: wilc1000: Rework bus locking | expand

Commit Message

Marek Vasut Oct. 22, 2024, 1:38 a.m. UTC
The bus locking in this driver is broken and produces subtle race
condition with ksdioirqd and its mmc_claim_host()/mmc_release_host()
usage in case of SDIO bus. Rework the locking to avoid this race
condition.

The problem is the hif_cs mutex used in acquire_bus()/release_bus(),
which makes it look like calling acquire_bus() results in exclusive
access to the bus, but that is not true for SDIO bus. For SDIO bus,
to obtain exclusive access (any access, really), it is necessary to
call sdio_claim_host(), which is a wrapper around mmc_claim_host(),
which does its own locking. The acquire_bus() does not do that, but
the SDIO interface implementation does call sdio_claim_host() and
sdio_release_host() every single command, which is problematic. To
make things worse, wilc_sdio_interrupt() implementation called from
ksdioirqd first calls sdio_release_host(), then interrupt handling
and finally sdio_claim_host().

The core problem is that sdio_claim_host() cannot be done per command,
but has to be done per register/data IO which consists of multiple
commands. Usually the WILC register read/write consists of 3x CMD52
to push in CSA pointer address and 1x CMD53 to read/write data to that
address. Most other accesses are also composed of multiple commands.

Currently, if ksdioirqd wakes up and attempts to read SDIO_CCCR_INTx
to get pending SDIO IRQs in sdio_get_pending_irqs(), it can easily
perform that transfer between two consecutive CMD52 which are pushing
in the CSA pointer address and possibly disrupt the WILC operation.
This is undesired behavior.

Rework the locking.

Introduce new .hif_claim/.hif_release callbacks which implement bus
specific locking. Lock/unlock SDIO bus access using sdio_claim_host()
and sdio_release_host(), lock/unlock SPI bus access using the current
hif_cs mutex moved purely into the spi.c interface. Make acquire_bus()
and release_bus() call the .hif_claim/.hif_release() callbacks and do
not access the hif_cs mutex from there at all.

Remove any SDIO bus locking used directly in commands and the broken
SDIO bus unlocking in wilc_sdio_interrupt(), this is no longer needed.
Fix up SDIO initialization code which newly needs sdio_claim_host()
and sdio_release_host(), since it cannot depend on the locking being
done per-command anymore.

Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adham Abozaeid <adham.abozaeid@microchip.com>
Cc: Ajay Singh <ajay.kathat@microchip.com>
Cc: Alexis Lothoré <alexis.lothore@bootlin.com>
Cc: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Cc: Conor Dooley <conor+dt@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Krzysztof Kozlowski <krzk+dt@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
---
NOTE: I only tested the SDIO part
---
 .../wireless/microchip/wilc1000/cfg80211.c    |  2 -
 .../net/wireless/microchip/wilc1000/netdev.c  |  2 -
 .../net/wireless/microchip/wilc1000/netdev.h  |  3 --
 .../net/wireless/microchip/wilc1000/sdio.c    | 40 ++++++++++++-------
 drivers/net/wireless/microchip/wilc1000/spi.c | 15 +++++++
 .../net/wireless/microchip/wilc1000/wlan.c    | 30 ++++++++++----
 .../net/wireless/microchip/wilc1000/wlan.h    |  2 +
 7 files changed, 65 insertions(+), 29 deletions(-)

Comments

Alexis Lothoré Oct. 22, 2024, 10:43 a.m. UTC | #1
Hi Marek,

On 10/22/24 03:38, Marek Vasut wrote:
> The bus locking in this driver is broken and produces subtle race
> condition with ksdioirqd and its mmc_claim_host()/mmc_release_host()
> usage in case of SDIO bus. Rework the locking to avoid this race
> condition.
> 
> The problem is the hif_cs mutex used in acquire_bus()/release_bus(),
> which makes it look like calling acquire_bus() results in exclusive
> access to the bus, but that is not true for SDIO bus. For SDIO bus,
> to obtain exclusive access (any access, really), it is necessary to
> call sdio_claim_host(), which is a wrapper around mmc_claim_host(),
> which does its own locking. The acquire_bus() does not do that, but
> the SDIO interface implementation does call sdio_claim_host() and
> sdio_release_host() every single command, which is problematic. To
> make things worse, wilc_sdio_interrupt() implementation called from
> ksdioirqd first calls sdio_release_host(), then interrupt handling
> and finally sdio_claim_host().
> 
> The core problem is that sdio_claim_host() cannot be done per command,
> but has to be done per register/data IO which consists of multiple
> commands.

Is it really true ? What makes you say that we can not perform multiple
operations under the same exclusive sdio section ?

Usually the WILC register read/write consists of 3x CMD52
> to push in CSA pointer address and 1x CMD53 to read/write data to that
> address. Most other accesses are also composed of multiple commands.
> 
> Currently, if ksdioirqd wakes up and attempts to read SDIO_CCCR_INTx
> to get pending SDIO IRQs in sdio_get_pending_irqs(), it can easily
> perform that transfer between two consecutive CMD52 which are pushing
> in the CSA pointer address and possibly disrupt the WILC operation.
> This is undesired behavior.

I agree about the observation, and then I disagree about the statement above on
sdio_claim_host/sdio_release_host not meant to be used for multiple commands.
I see plenty of sdio wireless drivers performing multiple sdio operations under
the same sdio exclusive bus access section, either explicitely in their code, or
through a sdio dedicated helper (eg: sdio_enable_func, sdio_disable_func).

But more concerns below
> 
> Rework the locking.
> 
> Introduce new .hif_claim/.hif_release callbacks which implement bus
> specific locking. Lock/unlock SDIO bus access using sdio_claim_host()
> and sdio_release_host(), lock/unlock SPI bus access using the current
> hif_cs mutex moved purely into the spi.c interface. Make acquire_bus()
> and release_bus() call the .hif_claim/.hif_release() callbacks and do
> not access the hif_cs mutex from there at all.
> 
> Remove any SDIO bus locking used directly in commands and the broken
> SDIO bus unlocking in wilc_sdio_interrupt(), this is no longer needed.
> Fix up SDIO initialization code which newly needs sdio_claim_host()
> and sdio_release_host(), since it cannot depend on the locking being
> done per-command anymore.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>

[...]

>  
> -static void wilc_sdio_interrupt(struct sdio_func *func)
> +static void wilc_sdio_claim(struct wilc *wilc)
> +{
> +	struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev);
> +
> +	sdio_claim_host(func);
> +}
> +
> +static void wilc_sdio_release(struct wilc *wilc)
>  {
> +	struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev);
> +
>  	sdio_release_host(func);
> +}

So with this series, we end up with some bus-specific operations needing some
locking, but is now up to the upper layer to control this locking. This feels
wrong. The driver has a dedicated sdio layer, so if we need some locking for
sdio-specific operations, it should be handled autonomously by the sdio layer,
right ?

[...]

>  static int wilc_wlan_cfg_commit(struct wilc_vif *vif, int type,
> diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.h b/drivers/net/wireless/microchip/wilc1000/wlan.h
> index b9e7f9222eadd..ade2db95e8a0f 100644
> --- a/drivers/net/wireless/microchip/wilc1000/wlan.h
> +++ b/drivers/net/wireless/microchip/wilc1000/wlan.h
> @@ -403,6 +403,8 @@ struct wilc_hif_func {
>  	void (*disable_interrupt)(struct wilc *nic);
>  	int (*hif_reset)(struct wilc *wilc);
>  	bool (*hif_is_init)(struct wilc *wilc);
> +	void (*hif_claim)(struct wilc *wilc);
> +	void (*hif_release)(struct wilc *wilc);

So IIUC, your series push the hif_cs lock into each bus layer of the driver,
remove any explicit call to bus-specific locking mechanism from those layers,
and makes the upper layer control the locking. As mentioned above, I don't
understand why those layers can not manage the bus-specific locking by
themselves (which would be a big win for the upper layer).
For SDIO specifically, I feel like letting the layer handle those claim/release
would even allow to remove this hif_cs mutex (but we may still need a lock for
SPI side)

But I may be missing something, so feel free to prove me wrong.
Marek Vasut Oct. 22, 2024, 1:19 p.m. UTC | #2
On 10/22/24 12:43 PM, Alexis Lothoré wrote:
> Hi Marek,

Hi,

> On 10/22/24 03:38, Marek Vasut wrote:
>> The bus locking in this driver is broken and produces subtle race
>> condition with ksdioirqd and its mmc_claim_host()/mmc_release_host()
>> usage in case of SDIO bus. Rework the locking to avoid this race
>> condition.
>>
>> The problem is the hif_cs mutex used in acquire_bus()/release_bus(),
>> which makes it look like calling acquire_bus() results in exclusive
>> access to the bus, but that is not true for SDIO bus. For SDIO bus,
>> to obtain exclusive access (any access, really), it is necessary to
>> call sdio_claim_host(), which is a wrapper around mmc_claim_host(),
>> which does its own locking. The acquire_bus() does not do that, but
>> the SDIO interface implementation does call sdio_claim_host() and
>> sdio_release_host() every single command, which is problematic. To
>> make things worse, wilc_sdio_interrupt() implementation called from
>> ksdioirqd first calls sdio_release_host(), then interrupt handling
>> and finally sdio_claim_host().
>>
>> The core problem is that sdio_claim_host() cannot be done per command,
>> but has to be done per register/data IO which consists of multiple
>> commands.
> 
> Is it really true ? What makes you say that we can not perform multiple
> operations under the same exclusive sdio section ?

What I am trying to say is this:

With current code, this can happen, which is not good, because transfers 
from multiple threads can be interleaved and interfere with each other:

thread 1                         thread2
do_some_higher_level_op() {
  ...
  read_register_0x3b0000() {
   claim_bus
   CMD52 0x00
   release bus                    ksdioirqd() {
                                    claim_bus
                                    CMD52 0x0f, lets read SDIO_CCCR_INTx
                                    release_bus
   claim bus                      }
   CMD52 0x00
   release_bus
   claim_bus
   CMD52 0x3b
   release_bus
   claim_bus
   CMD53 lets read data
   release_bus
  }
  ...
}

What should happen is either:

thread 1                         thread2
                                  ksdioirqd() { // option 1
                                    claim_bus
                                    CMD52 0x0f, lets read SDIO_CCCR_INTx
                                    release_bus
                                  }
do_some_higher_level_op() {
  claim_bus
  ...
  read_register_0x3b0000 {
   CMD52 0x00
   CMD52 0x00
   CMD52 0x3b
   CMD53 lets read data
  }
  ...
  read_another_register()
  ...
  release_bus
}
                                  ksdioirqd() { // option 2
                                    claim_bus
                                    CMD52 0x0f, lets read SDIO_CCCR_INTx
                                    release_bus
                                  }

That's what this patch implements, to avoid the interference.

Maybe I should include the infographics? Or reword this somehow?

> Usually the WILC register read/write consists of 3x CMD52
>> to push in CSA pointer address and 1x CMD53 to read/write data to that
>> address. Most other accesses are also composed of multiple commands.
>>
>> Currently, if ksdioirqd wakes up and attempts to read SDIO_CCCR_INTx
>> to get pending SDIO IRQs in sdio_get_pending_irqs(), it can easily
>> perform that transfer between two consecutive CMD52 which are pushing
>> in the CSA pointer address and possibly disrupt the WILC operation.
>> This is undesired behavior.
> 
> I agree about the observation, and then I disagree about the statement above on
> sdio_claim_host/sdio_release_host not meant to be used for multiple commands.

I think we have a misunderstanding here, see above.

> I see plenty of sdio wireless drivers performing multiple sdio operations under
> the same sdio exclusive bus access section, either explicitely in their code, or
> through a sdio dedicated helper (eg: sdio_enable_func, sdio_disable_func).
> 
> But more concerns below
>>
>> Rework the locking.
>>
>> Introduce new .hif_claim/.hif_release callbacks which implement bus
>> specific locking. Lock/unlock SDIO bus access using sdio_claim_host()
>> and sdio_release_host(), lock/unlock SPI bus access using the current
>> hif_cs mutex moved purely into the spi.c interface. Make acquire_bus()
>> and release_bus() call the .hif_claim/.hif_release() callbacks and do
>> not access the hif_cs mutex from there at all.
>>
>> Remove any SDIO bus locking used directly in commands and the broken
>> SDIO bus unlocking in wilc_sdio_interrupt(), this is no longer needed.
>> Fix up SDIO initialization code which newly needs sdio_claim_host()
>> and sdio_release_host(), since it cannot depend on the locking being
>> done per-command anymore.
>>
>> Signed-off-by: Marek Vasut <marex@denx.de>
> 
> [...]
> 
>>   
>> -static void wilc_sdio_interrupt(struct sdio_func *func)
>> +static void wilc_sdio_claim(struct wilc *wilc)
>> +{
>> +	struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev);
>> +
>> +	sdio_claim_host(func);
>> +}
>> +
>> +static void wilc_sdio_release(struct wilc *wilc)
>>   {
>> +	struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev);
>> +
>>   	sdio_release_host(func);
>> +}
> 
> So with this series, we end up with some bus-specific operations needing some
> locking, but is now up to the upper layer to control this locking. This feels
> wrong.

It always was the upper layer (wlan.c) that attempted to do bus locking, 
except it was incomplete. The acquire_bus()/release_bus() primitives 
seems to be an attempt at serializing bus access across multiple 
operations (which really boils down to multiple SPI or SDIO commands).

The problem is, acquire_bus()/release_bus() does not really work, 
because it does not prevent e.g. ksdioirqd from inserting a command on 
the SDIO bus. SDIO (or really, mmc framework) has its own way of doing 
bus locking, that's the sdio_claim_host()/sdio_release_host(), SPI does 
have spi_bus_lock()/spi_bus_unlock() which I should use in V2.

Those sdio_claim_host()/sdio_release_host() and 
spi_bus_lock()/spi_bus_unlock() should be called in 
acquire_bus()/release_bus() to correctly serialize bus access across 
multiple operations. That will also eliminate the custom and not really 
functional hif_cs mutex.

> The driver has a dedicated sdio layer, so if we need some locking for
> sdio-specific operations, it should be handled autonomously by the sdio layer,
> right ?

Not quite, I don't think the intention was to let anything communicate 
with the WILC device within block "protected" by 
acquire_bus()/release_bus() pair. That's why I believe this is where bus 
lock and unlock should happen too.

> [...]
> 
>>   static int wilc_wlan_cfg_commit(struct wilc_vif *vif, int type,
>> diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.h b/drivers/net/wireless/microchip/wilc1000/wlan.h
>> index b9e7f9222eadd..ade2db95e8a0f 100644
>> --- a/drivers/net/wireless/microchip/wilc1000/wlan.h
>> +++ b/drivers/net/wireless/microchip/wilc1000/wlan.h
>> @@ -403,6 +403,8 @@ struct wilc_hif_func {
>>   	void (*disable_interrupt)(struct wilc *nic);
>>   	int (*hif_reset)(struct wilc *wilc);
>>   	bool (*hif_is_init)(struct wilc *wilc);
>> +	void (*hif_claim)(struct wilc *wilc);
>> +	void (*hif_release)(struct wilc *wilc);
> 
> So IIUC, your series push the hif_cs lock into each bus layer of the driver,
> remove any explicit call to bus-specific locking mechanism from those layers,
> and makes the upper layer control the locking. As mentioned above, I don't
> understand why those layers can not manage the bus-specific locking by
> themselves (which would be a big win for the upper layer).

Because of acquire_bus()/release_bus() which I think is an attempt to 
serialize bus access across multiple complex operations (=commands sent 
to the card), see above.

> For SDIO specifically, I feel like letting the layer handle those claim/release
> would even allow to remove this hif_cs mutex (but we may still need a lock for
> SPI side)
> 
> But I may be missing something, so feel free to prove me wrong.
With spi_bus_lock()/unlock() we can actually dispose of the hif_cs mutex 
altogether.
kernel test robot Oct. 22, 2024, 8:33 p.m. UTC | #3
Hi Marek,

kernel test robot noticed the following build errors:

[auto build test ERROR on wireless-next/main]
[also build test ERROR on next-20241022]
[cannot apply to wireless/main linus/master v6.12-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Marek-Vasut/wifi-wilc1000-Rework-bus-locking/20241022-093954
base:   https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git main
patch link:    https://lore.kernel.org/r/20241022013855.284783-1-marex%40denx.de
patch subject: [PATCH] wifi: wilc1000: Rework bus locking
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20241023/202410230402.Cgu8obYR-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241023/202410230402.Cgu8obYR-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410230402.Cgu8obYR-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/net/wireless/microchip/wilc1000/spi.c:1113:20: error: no member named 'hif_cs' in 'struct wilc'
    1113 |         mutex_lock(&wilc->hif_cs);
         |                     ~~~~  ^
   include/linux/mutex.h:166:44: note: expanded from macro 'mutex_lock'
     166 | #define mutex_lock(lock) mutex_lock_nested(lock, 0)
         |                                            ^~~~
   drivers/net/wireless/microchip/wilc1000/spi.c:1118:22: error: no member named 'hif_cs' in 'struct wilc'
    1118 |         mutex_unlock(&wilc->hif_cs);
         |                       ~~~~  ^
   drivers/net/wireless/microchip/wilc1000/spi.c:1147:23: error: no member named 'hif_cs' in 'struct wilc'
    1147 |         mutex_destroy(&wilc->hif_cs);
         |                        ~~~~  ^
>> drivers/net/wireless/microchip/wilc1000/spi.c:1156:14: error: use of undeclared identifier 'spi'
    1156 |         mutex_init(&spi->hif_cs);
         |                     ^
>> drivers/net/wireless/microchip/wilc1000/spi.c:1156:14: error: use of undeclared identifier 'spi'
   5 errors generated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for MODVERSIONS
   Depends on [n]: MODULES [=y] && !COMPILE_TEST [=y]
   Selected by [y]:
   - RANDSTRUCT_FULL [=y] && (CC_HAS_RANDSTRUCT [=y] || GCC_PLUGINS [=n]) && MODULES [=y]


vim +1113 drivers/net/wireless/microchip/wilc1000/spi.c

  1105	
  1106	/********************************************
  1107	 *
  1108	 *      Bus interfaces
  1109	 *
  1110	 ********************************************/
  1111	static void wilc_spi_claim(struct wilc *wilc)
  1112	{
> 1113		mutex_lock(&wilc->hif_cs);
  1114	}
  1115	
  1116	static void wilc_spi_release(struct wilc *wilc)
  1117	{
> 1118		mutex_unlock(&wilc->hif_cs);
  1119	}
  1120	
  1121	static int wilc_spi_reset(struct wilc *wilc)
  1122	{
  1123		struct spi_device *spi = to_spi_device(wilc->dev);
  1124		struct wilc_spi *spi_priv = wilc->bus_data;
  1125		int result;
  1126	
  1127		result = wilc_spi_special_cmd(wilc, CMD_RESET);
  1128		if (result && !spi_priv->probing_crc)
  1129			dev_err(&spi->dev, "Failed cmd reset\n");
  1130	
  1131		return result;
  1132	}
  1133	
  1134	static bool wilc_spi_is_init(struct wilc *wilc)
  1135	{
  1136		struct wilc_spi *spi_priv = wilc->bus_data;
  1137	
  1138		return spi_priv->isinit;
  1139	}
  1140	
  1141	static int wilc_spi_deinit(struct wilc *wilc)
  1142	{
  1143		struct wilc_spi *spi_priv = wilc->bus_data;
  1144	
  1145		spi_priv->isinit = false;
  1146		wilc_wlan_power(wilc, false);
  1147		mutex_destroy(&wilc->hif_cs);
  1148		return 0;
  1149	}
  1150	
  1151	static int wilc_spi_init(struct wilc *wilc, bool resume)
  1152	{
  1153		struct wilc_spi *spi_priv = wilc->bus_data;
  1154		int ret;
  1155	
> 1156		mutex_init(&spi->hif_cs);
  1157	
  1158		if (spi_priv->isinit) {
  1159			/* Confirm we can read chipid register without error: */
  1160			if (wilc_validate_chipid(wilc) == 0)
  1161				return 0;
  1162		}
  1163	
  1164		wilc_wlan_power(wilc, true);
  1165	
  1166		ret = wilc_spi_configure_bus_protocol(wilc);
  1167		if (ret) {
  1168			wilc_wlan_power(wilc, false);
  1169			return ret;
  1170		}
  1171	
  1172		spi_priv->isinit = true;
  1173	
  1174		return 0;
  1175	}
  1176
kernel test robot Oct. 23, 2024, 1:53 a.m. UTC | #4
Hi Marek,

kernel test robot noticed the following build errors:

[auto build test ERROR on wireless-next/main]
[also build test ERROR on next-20241022]
[cannot apply to wireless/main linus/master v6.12-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Marek-Vasut/wifi-wilc1000-Rework-bus-locking/20241022-093954
base:   https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git main
patch link:    https://lore.kernel.org/r/20241022013855.284783-1-marex%40denx.de
patch subject: [PATCH] wifi: wilc1000: Rework bus locking
config: loongarch-allmodconfig (https://download.01.org/0day-ci/archive/20241023/202410230937.zWSJkSuE-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241023/202410230937.zWSJkSuE-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410230937.zWSJkSuE-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/notifier.h:14,
                    from include/linux/clk.h:14,
                    from drivers/net/wireless/microchip/wilc1000/spi.c:7:
   drivers/net/wireless/microchip/wilc1000/spi.c: In function 'wilc_spi_claim':
>> drivers/net/wireless/microchip/wilc1000/spi.c:1113:25: error: 'struct wilc' has no member named 'hif_cs'
    1113 |         mutex_lock(&wilc->hif_cs);
         |                         ^~
   include/linux/mutex.h:166:44: note: in definition of macro 'mutex_lock'
     166 | #define mutex_lock(lock) mutex_lock_nested(lock, 0)
         |                                            ^~~~
   drivers/net/wireless/microchip/wilc1000/spi.c: In function 'wilc_spi_release':
   drivers/net/wireless/microchip/wilc1000/spi.c:1118:27: error: 'struct wilc' has no member named 'hif_cs'
    1118 |         mutex_unlock(&wilc->hif_cs);
         |                           ^~
   drivers/net/wireless/microchip/wilc1000/spi.c: In function 'wilc_spi_deinit':
   drivers/net/wireless/microchip/wilc1000/spi.c:1147:28: error: 'struct wilc' has no member named 'hif_cs'
    1147 |         mutex_destroy(&wilc->hif_cs);
         |                            ^~
   drivers/net/wireless/microchip/wilc1000/spi.c: In function 'wilc_spi_init':
>> drivers/net/wireless/microchip/wilc1000/spi.c:1156:21: error: 'spi' undeclared (first use in this function)
    1156 |         mutex_init(&spi->hif_cs);
         |                     ^~~
   include/linux/mutex.h:64:23: note: in definition of macro 'mutex_init'
      64 |         __mutex_init((mutex), #mutex, &__key);                          \
         |                       ^~~~~
   drivers/net/wireless/microchip/wilc1000/spi.c:1156:21: note: each undeclared identifier is reported only once for each function it appears in
    1156 |         mutex_init(&spi->hif_cs);
         |                     ^~~
   include/linux/mutex.h:64:23: note: in definition of macro 'mutex_init'
      64 |         __mutex_init((mutex), #mutex, &__key);                          \
         |                       ^~~~~
   drivers/net/wireless/microchip/wilc1000/spi.c: At top level:
>> drivers/net/wireless/microchip/wilc1000/spi.c:1116:13: warning: 'wilc_spi_release' defined but not used [-Wunused-function]
    1116 | static void wilc_spi_release(struct wilc *wilc)
         |             ^~~~~~~~~~~~~~~~
>> drivers/net/wireless/microchip/wilc1000/spi.c:1111:13: warning: 'wilc_spi_claim' defined but not used [-Wunused-function]
    1111 | static void wilc_spi_claim(struct wilc *wilc)
         |             ^~~~~~~~~~~~~~


vim +1113 drivers/net/wireless/microchip/wilc1000/spi.c

  1105	
  1106	/********************************************
  1107	 *
  1108	 *      Bus interfaces
  1109	 *
  1110	 ********************************************/
> 1111	static void wilc_spi_claim(struct wilc *wilc)
  1112	{
> 1113		mutex_lock(&wilc->hif_cs);
  1114	}
  1115	
> 1116	static void wilc_spi_release(struct wilc *wilc)
  1117	{
  1118		mutex_unlock(&wilc->hif_cs);
  1119	}
  1120	
  1121	static int wilc_spi_reset(struct wilc *wilc)
  1122	{
  1123		struct spi_device *spi = to_spi_device(wilc->dev);
  1124		struct wilc_spi *spi_priv = wilc->bus_data;
  1125		int result;
  1126	
  1127		result = wilc_spi_special_cmd(wilc, CMD_RESET);
  1128		if (result && !spi_priv->probing_crc)
  1129			dev_err(&spi->dev, "Failed cmd reset\n");
  1130	
  1131		return result;
  1132	}
  1133	
  1134	static bool wilc_spi_is_init(struct wilc *wilc)
  1135	{
  1136		struct wilc_spi *spi_priv = wilc->bus_data;
  1137	
  1138		return spi_priv->isinit;
  1139	}
  1140	
  1141	static int wilc_spi_deinit(struct wilc *wilc)
  1142	{
  1143		struct wilc_spi *spi_priv = wilc->bus_data;
  1144	
  1145		spi_priv->isinit = false;
  1146		wilc_wlan_power(wilc, false);
  1147		mutex_destroy(&wilc->hif_cs);
  1148		return 0;
  1149	}
  1150	
  1151	static int wilc_spi_init(struct wilc *wilc, bool resume)
  1152	{
  1153		struct wilc_spi *spi_priv = wilc->bus_data;
  1154		int ret;
  1155	
> 1156		mutex_init(&spi->hif_cs);
  1157	
  1158		if (spi_priv->isinit) {
  1159			/* Confirm we can read chipid register without error: */
  1160			if (wilc_validate_chipid(wilc) == 0)
  1161				return 0;
  1162		}
  1163	
  1164		wilc_wlan_power(wilc, true);
  1165	
  1166		ret = wilc_spi_configure_bus_protocol(wilc);
  1167		if (ret) {
  1168			wilc_wlan_power(wilc, false);
  1169			return ret;
  1170		}
  1171	
  1172		spi_priv->isinit = true;
  1173	
  1174		return 0;
  1175	}
  1176
Ajay Singh Oct. 23, 2024, 7:17 a.m. UTC | #5
Hi,

On 10/22/24 06:19, Marek Vasut wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the
> content is safe
> 
> On 10/22/24 12:43 PM, Alexis Lothoré wrote:
>> Hi Marek,
> 
> Hi,
> 
>> On 10/22/24 03:38, Marek Vasut wrote:
>>> The bus locking in this driver is broken and produces subtle race
>>> condition with ksdioirqd and its mmc_claim_host()/mmc_release_host()
>>> usage in case of SDIO bus. Rework the locking to avoid this race
>>> condition.
>>>
>>> The problem is the hif_cs mutex used in acquire_bus()/release_bus(),
>>> which makes it look like calling acquire_bus() results in exclusive
>>> access to the bus, but that is not true for SDIO bus. For SDIO bus,
>>> to obtain exclusive access (any access, really), it is necessary to
>>> call sdio_claim_host(), which is a wrapper around mmc_claim_host(),
>>> which does its own locking. The acquire_bus() does not do that, but
>>> the SDIO interface implementation does call sdio_claim_host() and
>>> sdio_release_host() every single command, which is problematic. To
>>> make things worse, wilc_sdio_interrupt() implementation called from
>>> ksdioirqd first calls sdio_release_host(), then interrupt handling
>>> and finally sdio_claim_host().
>>>
>>> The core problem is that sdio_claim_host() cannot be done per command,
>>> but has to be done per register/data IO which consists of multiple
>>> commands.
>>
>> Is it really true ? What makes you say that we can not perform multiple
>> operations under the same exclusive sdio section ?
> 
> What I am trying to say is this:
> 
> With current code, this can happen, which is not good, because transfers
> from multiple threads can be interleaved and interfere with each other:

Did you observe any bus failure in your test setup with SDIO. Is there any
configure to recreate it.

The SDIO transfer may appear to be split into into multiple transaction but
these calls should not impact each other since most of them are updating the
different registers except WILC_SDIO_FBR_CSA_REG register, which is used for
CSA read/write. If needed, wilc_sdio_set_func0_csa_address() can be modified
to club the 3x CMD52 and 1x CMD53 into a single transfer API.

In my opinion, If sdio_claim_host() is moved to acquire_bus() API then the
SDIO bus will be claimed longer than required especially in
wilc_wlan_handle_txq() API. Ideally, calling sdio_claim_host() call just
before the transfer is enough but now the SDIO I/O bus would be continuously
blocked for multiple independent SDIO transactions that is not necessary.

> 
> thread 1                         thread2
> do_some_higher_level_op() {
>  ...
>  read_register_0x3b0000() {
>   claim_bus
>   CMD52 0x00
>   release bus                    ksdioirqd() {
>                                    claim_bus
>                                    CMD52 0x0f, lets read SDIO_CCCR_INTx
>                                    release_bus
>   claim bus                      }
>   CMD52 0x00
>   release_bus
>   claim_bus
>   CMD52 0x3b
>   release_bus
>   claim_bus
>   CMD53 lets read data
>   release_bus
>  }
>  ...
> }
> 
> What should happen is either:
> 
> thread 1                         thread2
>                                  ksdioirqd() { // option 1
>                                    claim_bus
>                                    CMD52 0x0f, lets read SDIO_CCCR_INTx
>                                    release_bus
>                                  }
> do_some_higher_level_op() {
>  claim_bus
>  ...
>  read_register_0x3b0000 {
>   CMD52 0x00
>   CMD52 0x00
>   CMD52 0x3b
>   CMD53 lets read data
>  }
>  ...
>  read_another_register()
>  ...
>  release_bus
> }
>                                  ksdioirqd() { // option 2
>                                    claim_bus
>                                    CMD52 0x0f, lets read SDIO_CCCR_INTx
>                                    release_bus
>                                  }
> 
> That's what this patch implements, to avoid the interference.
> 
> Maybe I should include the infographics? Or reword this somehow?
> 
>> Usually the WILC register read/write consists of 3x CMD52
>>> to push in CSA pointer address and 1x CMD53 to read/write data to that
>>> address. Most other accesses are also composed of multiple commands.
>>>
>>> Currently, if ksdioirqd wakes up and attempts to read SDIO_CCCR_INTx
>>> to get pending SDIO IRQs in sdio_get_pending_irqs(), it can easily
>>> perform that transfer between two consecutive CMD52 which are pushing
>>> in the CSA pointer address and possibly disrupt the WILC operation.
>>> This is undesired behavior.
>>
>> I agree about the observation, and then I disagree about the statement above on
>> sdio_claim_host/sdio_release_host not meant to be used for multiple commands.
> 
> I think we have a misunderstanding here, see above.
> 
>> I see plenty of sdio wireless drivers performing multiple sdio operations under
>> the same sdio exclusive bus access section, either explicitely in their
>> code, or
>> through a sdio dedicated helper (eg: sdio_enable_func, sdio_disable_func).
>>
>> But more concerns below
>>>
>>> Rework the locking.
>>>
>>> Introduce new .hif_claim/.hif_release callbacks which implement bus
>>> specific locking. Lock/unlock SDIO bus access using sdio_claim_host()
>>> and sdio_release_host(), lock/unlock SPI bus access using the current
>>> hif_cs mutex moved purely into the spi.c interface. Make acquire_bus()
>>> and release_bus() call the .hif_claim/.hif_release() callbacks and do
>>> not access the hif_cs mutex from there at all.
>>>
>>> Remove any SDIO bus locking used directly in commands and the broken
>>> SDIO bus unlocking in wilc_sdio_interrupt(), this is no longer needed.
>>> Fix up SDIO initialization code which newly needs sdio_claim_host()
>>> and sdio_release_host(), since it cannot depend on the locking being
>>> done per-command anymore.
>>>
>>> Signed-off-by: Marek Vasut <marex@denx.de>
>>
>> [...]
>>
>>>
>>> -static void wilc_sdio_interrupt(struct sdio_func *func)
>>> +static void wilc_sdio_claim(struct wilc *wilc)
>>> +{
>>> +    struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev);
>>> +
>>> +    sdio_claim_host(func);
>>> +}
>>> +
>>> +static void wilc_sdio_release(struct wilc *wilc)
>>>   {
>>> +    struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev);
>>> +
>>>      sdio_release_host(func);
>>> +}
>>
>> So with this series, we end up with some bus-specific operations needing some
>> locking, but is now up to the upper layer to control this locking. This feels
>> wrong.
> 
> It always was the upper layer (wlan.c) that attempted to do bus locking,
> except it was incomplete. The acquire_bus()/release_bus() primitives
> seems to be an attempt at serializing bus access across multiple
> operations (which really boils down to multiple SPI or SDIO commands).
> 
> The problem is, acquire_bus()/release_bus() does not really work,
> because it does not prevent e.g. ksdioirqd from inserting a command on
> the SDIO bus. SDIO (or really, mmc framework) has its own way of doing
> bus locking, that's the sdio_claim_host()/sdio_release_host(), SPI does
> have spi_bus_lock()/spi_bus_unlock() which I should use in V2.
> 
> Those sdio_claim_host()/sdio_release_host() and
> spi_bus_lock()/spi_bus_unlock() should be called in
> acquire_bus()/release_bus() to correctly serialize bus access across
> multiple operations. That will also eliminate the custom and not really
> functional hif_cs mutex.
> 
>> The driver has a dedicated sdio layer, so if we need some locking for
>> sdio-specific operations, it should be handled autonomously by the sdio layer,
>> right ?
> 
> Not quite, I don't think the intention was to let anything communicate
> with the WILC device within block "protected" by
> acquire_bus()/release_bus() pair. That's why I believe this is where bus
> lock and unlock should happen too.
> 
>> [...]
>>
>>>   static int wilc_wlan_cfg_commit(struct wilc_vif *vif, int type,
>>> diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.h
>>> b/drivers/net/wireless/microchip/wilc1000/wlan.h
>>> index b9e7f9222eadd..ade2db95e8a0f 100644
>>> --- a/drivers/net/wireless/microchip/wilc1000/wlan.h
>>> +++ b/drivers/net/wireless/microchip/wilc1000/wlan.h
>>> @@ -403,6 +403,8 @@ struct wilc_hif_func {
>>>      void (*disable_interrupt)(struct wilc *nic);
>>>      int (*hif_reset)(struct wilc *wilc);
>>>      bool (*hif_is_init)(struct wilc *wilc);
>>> +    void (*hif_claim)(struct wilc *wilc);
>>> +    void (*hif_release)(struct wilc *wilc);
>>
>> So IIUC, your series push the hif_cs lock into each bus layer of the driver,
>> remove any explicit call to bus-specific locking mechanism from those layers,
>> and makes the upper layer control the locking. As mentioned above, I don't
>> understand why those layers can not manage the bus-specific locking by
>> themselves (which would be a big win for the upper layer).
> 
> Because of acquire_bus()/release_bus() which I think is an attempt to
> serialize bus access across multiple complex operations (=commands sent
> to the card), see above.
> 
>> For SDIO specifically, I feel like letting the layer handle those claim/release
>> would even allow to remove this hif_cs mutex (but we may still need a lock for
>> SPI side)
>>
>> But I may be missing something, so feel free to prove me wrong.
> With spi_bus_lock()/unlock() we can actually dispose of the hif_cs mutex
> altogether.
Alexis Lothoré Oct. 23, 2024, 7:54 a.m. UTC | #6
Hello Marek,

On 10/22/24 15:19, Marek Vasut wrote:
> On 10/22/24 12:43 PM, Alexis Lothoré wrote:
>> Hi Marek,
> 
> Hi,
> 
>> On 10/22/24 03:38, Marek Vasut wrote:
>>> The bus locking in this driver is broken and produces subtle race
>>> condition with ksdioirqd and its mmc_claim_host()/mmc_release_host()
>>> usage in case of SDIO bus. Rework the locking to avoid this race
>>> condition.
>>>
>>> The problem is the hif_cs mutex used in acquire_bus()/release_bus(),
>>> which makes it look like calling acquire_bus() results in exclusive
>>> access to the bus, but that is not true for SDIO bus. For SDIO bus,
>>> to obtain exclusive access (any access, really), it is necessary to
>>> call sdio_claim_host(), which is a wrapper around mmc_claim_host(),
>>> which does its own locking. The acquire_bus() does not do that, but
>>> the SDIO interface implementation does call sdio_claim_host() and
>>> sdio_release_host() every single command, which is problematic. To
>>> make things worse, wilc_sdio_interrupt() implementation called from
>>> ksdioirqd first calls sdio_release_host(), then interrupt handling
>>> and finally sdio_claim_host().
>>>
>>> The core problem is that sdio_claim_host() cannot be done per command,
>>> but has to be done per register/data IO which consists of multiple
>>> commands.
>>
>> Is it really true ? What makes you say that we can not perform multiple
>> operations under the same exclusive sdio section ?
> 
> What I am trying to say is this:
> 
> With current code, this can happen, which is not good, because transfers from
> multiple threads can be interleaved and interfere with each other:
> 
> thread 1                         thread2
> do_some_higher_level_op() {
>  ...
>  read_register_0x3b0000() {
>   claim_bus
>   CMD52 0x00
>   release bus                    ksdioirqd() {
>                                    claim_bus
>                                    CMD52 0x0f, lets read SDIO_CCCR_INTx
>                                    release_bus
>   claim bus                      }
>   CMD52 0x00
>   release_bus
>   claim_bus
>   CMD52 0x3b
>   release_bus
>   claim_bus
>   CMD53 lets read data
>   release_bus
>  }
>  ...
> }
> 
> What should happen is either:
> 
> thread 1                         thread2
>                                  ksdioirqd() { // option 1
>                                    claim_bus
>                                    CMD52 0x0f, lets read SDIO_CCCR_INTx
>                                    release_bus
>                                  }
> do_some_higher_level_op() {
>  claim_bus
>  ...
>  read_register_0x3b0000 {
>   CMD52 0x00
>   CMD52 0x00
>   CMD52 0x3b
>   CMD53 lets read data
>  }
>  ...
>  read_another_register()
>  ...
>  release_bus
> }
>                                  ksdioirqd() { // option 2
>                                    claim_bus
>                                    CMD52 0x0f, lets read SDIO_CCCR_INTx
>                                    release_bus
>                                  }
> 
> That's what this patch implements, to avoid the interference.
> 
> Maybe I should include the infographics? Or reword this somehow?

What I may have misunderstood is your first sentence ("sdio_claim_host() cannot
be done per command, but has to be done per register/data IO which consists of
multiple commands", especially command VS reg/data io), but your graph clarified
it for me, thanks, so in the end we agree on this :) That may just be me having
poorly interpreted, so no need to add the graphs to the commit

[...]

>>>   static int wilc_wlan_cfg_commit(struct wilc_vif *vif, int type,
>>> diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.h b/drivers/net/
>>> wireless/microchip/wilc1000/wlan.h
>>> index b9e7f9222eadd..ade2db95e8a0f 100644
>>> --- a/drivers/net/wireless/microchip/wilc1000/wlan.h
>>> +++ b/drivers/net/wireless/microchip/wilc1000/wlan.h
>>> @@ -403,6 +403,8 @@ struct wilc_hif_func {
>>>       void (*disable_interrupt)(struct wilc *nic);
>>>       int (*hif_reset)(struct wilc *wilc);
>>>       bool (*hif_is_init)(struct wilc *wilc);
>>> +    void (*hif_claim)(struct wilc *wilc);
>>> +    void (*hif_release)(struct wilc *wilc);
>>
>> So IIUC, your series push the hif_cs lock into each bus layer of the driver,
>> remove any explicit call to bus-specific locking mechanism from those layers,
>> and makes the upper layer control the locking. As mentioned above, I don't
>> understand why those layers can not manage the bus-specific locking by
>> themselves (which would be a big win for the upper layer).
> 
> Because of acquire_bus()/release_bus() which I think is an attempt to serialize
> bus access across multiple complex operations (=commands sent to the card), see
> above.

Taking a further look at some examples in the driver, I see that indeed the
"scope" of acquire_bus/release_bus is larger than simple bus operations. So I
withdraw my proposal which was wrong.

Thanks,

Alexis
Marek Vasut Oct. 23, 2024, 2:26 p.m. UTC | #7
On 10/23/24 9:54 AM, Alexis Lothoré wrote:

Hello Alexis,

>>                                   ksdioirqd() { // option 2
>>                                     claim_bus
>>                                     CMD52 0x0f, lets read SDIO_CCCR_INTx
>>                                     release_bus
>>                                   }
>>
>> That's what this patch implements, to avoid the interference.
>>
>> Maybe I should include the infographics? Or reword this somehow?
> 
> What I may have misunderstood is your first sentence ("sdio_claim_host() cannot
> be done per command, but has to be done per register/data IO which consists of
> multiple commands", especially command VS reg/data io), but your graph clarified
> it for me, thanks, so in the end we agree on this :) That may just be me having
> poorly interpreted, so no need to add the graphs to the commit

You're welcome. As long as we can understand each other with one extra 
round trip, all is good :)

> [...]
> 
>>>>    static int wilc_wlan_cfg_commit(struct wilc_vif *vif, int type,
>>>> diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.h b/drivers/net/
>>>> wireless/microchip/wilc1000/wlan.h
>>>> index b9e7f9222eadd..ade2db95e8a0f 100644
>>>> --- a/drivers/net/wireless/microchip/wilc1000/wlan.h
>>>> +++ b/drivers/net/wireless/microchip/wilc1000/wlan.h
>>>> @@ -403,6 +403,8 @@ struct wilc_hif_func {
>>>>        void (*disable_interrupt)(struct wilc *nic);
>>>>        int (*hif_reset)(struct wilc *wilc);
>>>>        bool (*hif_is_init)(struct wilc *wilc);
>>>> +    void (*hif_claim)(struct wilc *wilc);
>>>> +    void (*hif_release)(struct wilc *wilc);
>>>
>>> So IIUC, your series push the hif_cs lock into each bus layer of the driver,
>>> remove any explicit call to bus-specific locking mechanism from those layers,
>>> and makes the upper layer control the locking. As mentioned above, I don't
>>> understand why those layers can not manage the bus-specific locking by
>>> themselves (which would be a big win for the upper layer).
>>
>> Because of acquire_bus()/release_bus() which I think is an attempt to serialize
>> bus access across multiple complex operations (=commands sent to the card), see
>> above.
> 
> Taking a further look at some examples in the driver, I see that indeed the
> "scope" of acquire_bus/release_bus is larger than simple bus operations. So I
> withdraw my proposal which was wrong.
All right.
Marek Vasut Oct. 23, 2024, 2:44 p.m. UTC | #8
On 10/23/24 9:17 AM, Ajay.Kathat@microchip.com wrote:

Hello Ajay,

>> What I am trying to say is this:
>>
>> With current code, this can happen, which is not good, because transfers
>> from multiple threads can be interleaved and interfere with each other:
> 
> Did you observe any bus failure in your test setup with SDIO. Is there any
> configure to recreate it.

I am observing sporadic command and data CRC errors on STM32MP157F 
system with SDIO WILC3000.

> The SDIO transfer may appear to be split into into multiple transaction but
> these calls should not impact each other since most of them are updating the
> different registers except WILC_SDIO_FBR_CSA_REG register, which is used for
> CSA read/write. If needed, wilc_sdio_set_func0_csa_address() can be modified
> to club the 3x CMD52 and 1x CMD53 into a single transfer API.
> 
> In my opinion, If sdio_claim_host() is moved to acquire_bus() API then the
> SDIO bus will be claimed longer than required especially in
> wilc_wlan_handle_txq() API. Ideally, calling sdio_claim_host() call just
> before the transfer is enough but now the SDIO I/O bus would be continuously
> blocked for multiple independent SDIO transactions that is not necessary.

Why would that pose a problem ?

I am more concerned that ksdioirqd can insert a command transfer right 
in the middle of CMD52/CMD53 register read composite transfer, because 
while ksdioirqd does use proper sdio_claim/release_host, this driver 
does it per-SDIO-command instead of per the whole e.g. register read 
"transaction".

[...]
Ajay Singh Oct. 23, 2024, 5:54 p.m. UTC | #9
Hello Marek,


On 10/23/24 07:44, Marek Vasut wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the
> content is safe
> 
> On 10/23/24 9:17 AM, Ajay.Kathat@microchip.com wrote:
> 
> Hello Ajay,
> 
>>> What I am trying to say is this:
>>>
>>> With current code, this can happen, which is not good, because transfers
>>> from multiple threads can be interleaved and interfere with each other:
>>
>> Did you observe any bus failure in your test setup with SDIO. Is there any
>> configure to recreate it.
> 
> I am observing sporadic command and data CRC errors on STM32MP157F
> system with SDIO WILC3000.

Does this patch help to resolve the CRC errors?

Do you observe the CRC errors during the bulk data transfer(iPerf) or does it
even happen during the basic WiFi operation like(scan/connect or basic ping
operation).
Is power-save enabled during the test. With PS enabled, The SDIO commands may
fail momentarily but it should recover.

> 
>> The SDIO transfer may appear to be split into into multiple transaction but
>> these calls should not impact each other since most of them are updating the
>> different registers except WILC_SDIO_FBR_CSA_REG register, which is used for
>> CSA read/write. If needed, wilc_sdio_set_func0_csa_address() can be modified
>> to club the 3x CMD52 and 1x CMD53 into a single transfer API.
>>
>> In my opinion, If sdio_claim_host() is moved to acquire_bus() API then the
>> SDIO bus will be claimed longer than required especially in
>> wilc_wlan_handle_txq() API. Ideally, calling sdio_claim_host() call just
>> before the transfer is enough but now the SDIO I/O bus would be continuously
>> blocked for multiple independent SDIO transactions that is not necessary.
> 
> Why would that pose a problem ?

wilc_wlan_handle_txq() performs many operations on different registers which
are not related. It will block the SDIO bux for longer. Otherwise those
registers are allowed to update separately from the WILC SD side.

> 
> I am more concerned that ksdioirqd can insert a command transfer right
> in the middle of CMD52/CMD53 register read composite transfer, because
> while ksdioirqd does use proper sdio_claim/release_host, this driver
> does it per-SDIO-command instead of per the whole e.g. register read
> "transaction".
> 

I think, using sdio_claim/release for each-SDIO command should suffice because
the CMD52/CMD53 modify the specific registers that are unrelated. Each
CMD52/53 should work properly and independently provided they are protected
with sdio_claim/release.
Additionally, there is no WILC SD module limitation requiring a strict order
for CMD52/CMD53, except for Card Storage Area (CSA) read/write operations.

For CSA read/write operations, which are necessary to read/write any specific
address from the card, multiple CMD52 commands are used to pass the desired
address to be read/written using registers (0x10c, 0x10d, 0x10e). Then, CMD53
is used to read/write to address 0x10f (CSA data window register). This
complete command sequence is required for a single CSA address read/write.
Based on this requirement, CSA read/write operations can cause issues if they
run in parallel with another CSA read/write operation, but not with other
CMD52 and CMD53 commands.
Therefore, one approach to resolve this issue is to add sdio_claim/release
around wilc_sdio_set_func0_csa_address(). This way, this function will be
treated as a single operation and it will only modify the required command flow.


Regards,
Ajay
Marek Vasut Oct. 23, 2024, 6:47 p.m. UTC | #10
On 10/23/24 7:54 PM, Ajay.Kathat@microchip.com wrote:
> Hello Marek,

Hi,

>>>> What I am trying to say is this:
>>>>
>>>> With current code, this can happen, which is not good, because transfers
>>>> from multiple threads can be interleaved and interfere with each other:
>>>
>>> Did you observe any bus failure in your test setup with SDIO. Is there any
>>> configure to recreate it.
>>
>> I am observing sporadic command and data CRC errors on STM32MP157F
>> system with SDIO WILC3000.
> 
> Does this patch help to resolve the CRC errors?

No

> Do you observe the CRC errors during the bulk data transfer(iPerf) or does it
> even happen during the basic WiFi operation like(scan/connect or basic ping
> operation).

I can trigger them during this test:

$ while true ; ifconfig wlan0 up ; ifconfig wlan0 down ; done

But they happen sporadically and seemingly at random.

I already stuffed the driver with trace_printk()s through and through, 
but the trace log does not indicate anything that would be off in any way.

> Is power-save enabled during the test. With PS enabled, The SDIO commands may
> fail momentarily but it should recover.

It seems it gets enabled after first ifconfig up, that's a good hint, 
I'll try to disable it and see if that makes them errors go away. Thanks!

Do you have any details on WHY would such sporadic errors occur and how 
to make those go away even with PS enabled ?

>>> The SDIO transfer may appear to be split into into multiple transaction but
>>> these calls should not impact each other since most of them are updating the
>>> different registers except WILC_SDIO_FBR_CSA_REG register, which is used for
>>> CSA read/write. If needed, wilc_sdio_set_func0_csa_address() can be modified
>>> to club the 3x CMD52 and 1x CMD53 into a single transfer API.
>>>
>>> In my opinion, If sdio_claim_host() is moved to acquire_bus() API then the
>>> SDIO bus will be claimed longer than required especially in
>>> wilc_wlan_handle_txq() API. Ideally, calling sdio_claim_host() call just
>>> before the transfer is enough but now the SDIO I/O bus would be continuously
>>> blocked for multiple independent SDIO transactions that is not necessary.
>>
>> Why would that pose a problem ?
> 
> wilc_wlan_handle_txq() performs many operations on different registers which
> are not related. It will block the SDIO bux for longer. Otherwise those
> registers are allowed to update separately from the WILC SD side.

And is that blocking of SD bus actually a problem ?

>> I am more concerned that ksdioirqd can insert a command transfer right
>> in the middle of CMD52/CMD53 register read composite transfer, because
>> while ksdioirqd does use proper sdio_claim/release_host, this driver
>> does it per-SDIO-command instead of per the whole e.g. register read
>> "transaction".
>>
> 
> I think, using sdio_claim/release for each-SDIO command should suffice because
> the CMD52/CMD53 modify the specific registers that are unrelated. Each
> CMD52/53 should work properly and independently provided they are protected
> with sdio_claim/release.
> Additionally, there is no WILC SD module limitation requiring a strict order
> for CMD52/CMD53, except for Card Storage Area (CSA) read/write operations.

Can multiple register IO operations like that be interleaved with 
ksdioirqd access to SDIO_CCCR_INTx too ?

Take e.g. wilc_wlan_handle_txq(), is the following order legal?

thread1                                | thread2
ret = func->hif_write_reg(wilc,        |
            WILC_CORTUS_INTERRUPT_BASE, |
            1);                         |
                                        | ksdioirqd {
                                        | ret = mmc_io_rw_direct(card, 0,
                                        | 0, SDIO_CCCR_INTx, 0, pending);
                                        | }
ret = func->hif_read_reg(wilc,         |
            WILC_CORTUS_INTERRUPT_BASE, |
            &reg);                      |

> For CSA read/write operations, which are necessary to read/write any specific
> address from the card, multiple CMD52 commands are used to pass the desired
> address to be read/written using registers (0x10c, 0x10d, 0x10e). Then, CMD53
> is used to read/write to address 0x10f (CSA data window register). This
> complete command sequence is required for a single CSA address read/write.
> Based on this requirement, CSA read/write operations can cause issues if they
> run in parallel with another CSA read/write operation, but not with other
> CMD52 and CMD53 commands.

Unrelated to this, but ... is it possible to send the entire 3-Byte 
register CSA address using a single command instead of 3 separate commands ?

> Therefore, one approach to resolve this issue is to add sdio_claim/release
> around wilc_sdio_set_func0_csa_address(). This way, this function will be
> treated as a single operation and it will only modify the required command flow.

I can do the serialization per-command, but please see above.
diff mbox series

Patch

diff --git a/drivers/net/wireless/microchip/wilc1000/cfg80211.c b/drivers/net/wireless/microchip/wilc1000/cfg80211.c
index b0dae6f7c633b..9a9fc8e8c8354 100644
--- a/drivers/net/wireless/microchip/wilc1000/cfg80211.c
+++ b/drivers/net/wireless/microchip/wilc1000/cfg80211.c
@@ -1730,7 +1730,6 @@  static const struct cfg80211_ops wilc_cfg80211_ops = {
 
 static void wlan_init_locks(struct wilc *wl)
 {
-	mutex_init(&wl->hif_cs);
 	mutex_init(&wl->rxq_cs);
 	mutex_init(&wl->cfg_cmd_lock);
 	mutex_init(&wl->vif_mutex);
@@ -1748,7 +1747,6 @@  static void wlan_init_locks(struct wilc *wl)
 
 void wlan_deinit_locks(struct wilc *wilc)
 {
-	mutex_destroy(&wilc->hif_cs);
 	mutex_destroy(&wilc->rxq_cs);
 	mutex_destroy(&wilc->cfg_cmd_lock);
 	mutex_destroy(&wilc->txq_add_to_head_cs);
diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.c b/drivers/net/wireless/microchip/wilc1000/netdev.c
index 7e84fc0fd9118..22c91a0b9b648 100644
--- a/drivers/net/wireless/microchip/wilc1000/netdev.c
+++ b/drivers/net/wireless/microchip/wilc1000/netdev.c
@@ -468,9 +468,7 @@  static void wilc_wlan_deinitialize(struct net_device *dev)
 
 		if (!wl->dev_irq_num &&
 		    wl->hif_func->disable_interrupt) {
-			mutex_lock(&wl->hif_cs);
 			wl->hif_func->disable_interrupt(wl);
-			mutex_unlock(&wl->hif_cs);
 		}
 		complete(&wl->txq_event);
 
diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.h b/drivers/net/wireless/microchip/wilc1000/netdev.h
index 95bc8b8fe65a5..8bdc27edf00af 100644
--- a/drivers/net/wireless/microchip/wilc1000/netdev.h
+++ b/drivers/net/wireless/microchip/wilc1000/netdev.h
@@ -240,9 +240,6 @@  struct wilc {
 	/* protect rxq_entry_t receiver queue */
 	struct mutex rxq_cs;
 
-	/* lock to protect hif access */
-	struct mutex hif_cs;
-
 	struct completion cfg_event;
 	struct completion sync_event;
 	struct completion txq_event;
diff --git a/drivers/net/wireless/microchip/wilc1000/sdio.c b/drivers/net/wireless/microchip/wilc1000/sdio.c
index 170470d1c2092..0796e2ccfe32e 100644
--- a/drivers/net/wireless/microchip/wilc1000/sdio.c
+++ b/drivers/net/wireless/microchip/wilc1000/sdio.c
@@ -56,11 +56,23 @@  struct sdio_cmd53 {
 
 static const struct wilc_hif_func wilc_hif_sdio;
 
-static void wilc_sdio_interrupt(struct sdio_func *func)
+static void wilc_sdio_claim(struct wilc *wilc)
+{
+	struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev);
+
+	sdio_claim_host(func);
+}
+
+static void wilc_sdio_release(struct wilc *wilc)
 {
+	struct sdio_func *func = container_of(wilc->dev, struct sdio_func, dev);
+
 	sdio_release_host(func);
+}
+
+static void wilc_sdio_interrupt(struct sdio_func *func)
+{
 	wilc_handle_isr(sdio_get_drvdata(func));
-	sdio_claim_host(func);
 }
 
 static int wilc_sdio_cmd52(struct wilc *wilc, struct sdio_cmd52 *cmd)
@@ -69,8 +81,6 @@  static int wilc_sdio_cmd52(struct wilc *wilc, struct sdio_cmd52 *cmd)
 	int ret;
 	u8 data;
 
-	sdio_claim_host(func);
-
 	func->num = cmd->function;
 	if (cmd->read_write) {  /* write */
 		if (cmd->raw) {
@@ -85,8 +95,6 @@  static int wilc_sdio_cmd52(struct wilc *wilc, struct sdio_cmd52 *cmd)
 		cmd->data = data;
 	}
 
-	sdio_release_host(func);
-
 	if (ret)
 		dev_err(&func->dev, "%s..failed, err(%d)\n", __func__, ret);
 	return ret;
@@ -99,8 +107,6 @@  static int wilc_sdio_cmd53(struct wilc *wilc, struct sdio_cmd53 *cmd)
 	struct wilc_sdio *sdio_priv = wilc->bus_data;
 	u8 *buf = cmd->buffer;
 
-	sdio_claim_host(func);
-
 	func->num = cmd->function;
 	func->cur_blksize = cmd->block_size;
 	if (cmd->block_mode)
@@ -128,8 +134,6 @@  static int wilc_sdio_cmd53(struct wilc *wilc, struct sdio_cmd53 *cmd)
 			memcpy(cmd->buffer, buf, size);
 	}
 out:
-	sdio_release_host(func);
-
 	if (ret)
 		dev_err(&func->dev, "%s..failed, err(%d)\n", __func__,  ret);
 
@@ -180,9 +184,11 @@  static int wilc_sdio_probe(struct sdio_func *func,
 		goto dispose_irq;
 	}
 
+	wilc_sdio_claim(wilc);
 	wilc_sdio_init(wilc, false);
 
 	ret = wilc_get_chipid(wilc);
+	wilc_sdio_release(wilc);
 	if (ret)
 		goto dispose_irq;
 
@@ -196,7 +202,9 @@  static int wilc_sdio_probe(struct sdio_func *func,
 		goto dispose_irq;
 	}
 
+	wilc_sdio_claim(wilc);
 	wilc_sdio_deinit(wilc);
+	wilc_sdio_release(wilc);
 
 	vif = wilc_netdev_ifc_init(wilc, "wlan%d", WILC_STATION_MODE,
 				   NL80211_IFTYPE_STATION, false);
@@ -258,9 +266,9 @@  static int wilc_sdio_enable_interrupt(struct wilc *dev)
 	struct sdio_func *func = container_of(dev->dev, struct sdio_func, dev);
 	int ret = 0;
 
-	sdio_claim_host(func);
+	wilc_sdio_claim(dev);
 	ret = sdio_claim_irq(func, wilc_sdio_interrupt);
-	sdio_release_host(func);
+	wilc_sdio_release(dev);
 
 	if (ret < 0) {
 		dev_err(&func->dev, "can't claim sdio_irq, err(%d)\n", ret);
@@ -274,11 +282,11 @@  static void wilc_sdio_disable_interrupt(struct wilc *dev)
 	struct sdio_func *func = container_of(dev->dev, struct sdio_func, dev);
 	int ret;
 
-	sdio_claim_host(func);
+	wilc_sdio_claim(dev);
 	ret = sdio_release_irq(func);
 	if (ret < 0)
 		dev_err(&func->dev, "can't release sdio_irq, err(%d)\n", ret);
-	sdio_release_host(func);
+	wilc_sdio_release(dev);
 }
 
 /********************************************
@@ -1013,6 +1021,8 @@  static const struct wilc_hif_func wilc_hif_sdio = {
 	.disable_interrupt = wilc_sdio_disable_interrupt,
 	.hif_reset = wilc_sdio_reset,
 	.hif_is_init = wilc_sdio_is_init,
+	.hif_claim = wilc_sdio_claim,
+	.hif_release = wilc_sdio_release,
 };
 
 static int wilc_sdio_suspend(struct device *dev)
@@ -1053,7 +1063,9 @@  static int wilc_sdio_resume(struct device *dev)
 	if (!IS_ERR(wilc->rtc_clk))
 		clk_prepare_enable(wilc->rtc_clk);
 
+	wilc_sdio_claim(wilc);
 	wilc_sdio_init(wilc, true);
+	wilc_sdio_release(wilc);
 	wilc_sdio_enable_interrupt(wilc);
 
 	return host_wakeup_notify(wilc);
diff --git a/drivers/net/wireless/microchip/wilc1000/spi.c b/drivers/net/wireless/microchip/wilc1000/spi.c
index ce2a9cdd6aa78..7af25ea6068f4 100644
--- a/drivers/net/wireless/microchip/wilc1000/spi.c
+++ b/drivers/net/wireless/microchip/wilc1000/spi.c
@@ -50,6 +50,9 @@  struct wilc_spi {
 		struct gpio_desc *enable;	/* ENABLE GPIO or NULL */
 		struct gpio_desc *reset;	/* RESET GPIO or NULL */
 	} gpios;
+
+	/* lock to protect hif access */
+	struct mutex hif_cs;
 };
 
 static const struct wilc_hif_func wilc_hif_spi;
@@ -1105,6 +1108,15 @@  static int wilc_spi_write(struct wilc *wilc, u32 addr, u8 *buf, u32 size)
  *      Bus interfaces
  *
  ********************************************/
+static void wilc_spi_claim(struct wilc *wilc)
+{
+	mutex_lock(&wilc->hif_cs);
+}
+
+static void wilc_spi_release(struct wilc *wilc)
+{
+	mutex_unlock(&wilc->hif_cs);
+}
 
 static int wilc_spi_reset(struct wilc *wilc)
 {
@@ -1132,6 +1144,7 @@  static int wilc_spi_deinit(struct wilc *wilc)
 
 	spi_priv->isinit = false;
 	wilc_wlan_power(wilc, false);
+	mutex_destroy(&wilc->hif_cs);
 	return 0;
 }
 
@@ -1140,6 +1153,8 @@  static int wilc_spi_init(struct wilc *wilc, bool resume)
 	struct wilc_spi *spi_priv = wilc->bus_data;
 	int ret;
 
+	mutex_init(&spi->hif_cs);
+
 	if (spi_priv->isinit) {
 		/* Confirm we can read chipid register without error: */
 		if (wilc_validate_chipid(wilc) == 0)
diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.c b/drivers/net/wireless/microchip/wilc1000/wlan.c
index 9d80adc45d6be..b149734f19a05 100644
--- a/drivers/net/wireless/microchip/wilc1000/wlan.c
+++ b/drivers/net/wireless/microchip/wilc1000/wlan.c
@@ -767,25 +767,37 @@  static int chip_wakeup(struct wilc *wilc)
 
 static inline int acquire_bus(struct wilc *wilc, enum bus_acquire acquire)
 {
-	int ret = 0;
+	const struct wilc_hif_func *hif_func = wilc->hif_func;
+	int ret;
 
-	mutex_lock(&wilc->hif_cs);
-	if (acquire == WILC_BUS_ACQUIRE_AND_WAKEUP && wilc->power_save_mode) {
-		ret = chip_wakeup(wilc);
-		if (ret)
-			mutex_unlock(&wilc->hif_cs);
-	}
+	hif_func->hif_claim(wilc);
+
+	if (!wilc->power_save_mode)
+		return 0;
 
+	if (acquire != WILC_BUS_ACQUIRE_AND_WAKEUP)
+		return 0;
+
+	ret = chip_wakeup(wilc);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	hif_func->hif_release(wilc);
 	return ret;
 }
 
 static inline int release_bus(struct wilc *wilc, enum bus_release release)
 {
+	const struct wilc_hif_func *hif_func = wilc->hif_func;
 	int ret = 0;
 
 	if (release == WILC_BUS_RELEASE_ALLOW_SLEEP && wilc->power_save_mode)
 		ret = chip_allow_sleep(wilc);
-	mutex_unlock(&wilc->hif_cs);
+
+	hif_func->hif_release(wilc);
 
 	return ret;
 }
@@ -1447,7 +1459,9 @@  void wilc_wlan_cleanup(struct net_device *dev)
 	wilc->rx_buffer = NULL;
 	kfree(wilc->tx_buffer);
 	wilc->tx_buffer = NULL;
+	acquire_bus(wilc, WILC_BUS_ACQUIRE_AND_WAKEUP);
 	wilc->hif_func->hif_deinit(wilc);
+	release_bus(wilc, WILC_BUS_RELEASE_ALLOW_SLEEP);
 }
 
 static int wilc_wlan_cfg_commit(struct wilc_vif *vif, int type,
diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.h b/drivers/net/wireless/microchip/wilc1000/wlan.h
index b9e7f9222eadd..ade2db95e8a0f 100644
--- a/drivers/net/wireless/microchip/wilc1000/wlan.h
+++ b/drivers/net/wireless/microchip/wilc1000/wlan.h
@@ -403,6 +403,8 @@  struct wilc_hif_func {
 	void (*disable_interrupt)(struct wilc *nic);
 	int (*hif_reset)(struct wilc *wilc);
 	bool (*hif_is_init)(struct wilc *wilc);
+	void (*hif_claim)(struct wilc *wilc);
+	void (*hif_release)(struct wilc *wilc);
 };
 
 #define WILC_MAX_CFG_FRAME_SIZE		1468