diff mbox series

wl1251: dynamically allocate memory used for DMA

Message ID 1676021ae8b6d7aada0b1806fed99b1b8359bdc4.1651495112.git.hns@goldelico.com (mailing list archive)
State New
Headers show
Series wl1251: dynamically allocate memory used for DMA | expand

Commit Message

H. Nikolaus Schaller May 2, 2022, 12:38 p.m. UTC
With introduction of vmap'ed stacks, stack parameters can no
longer be used for DMA and now leads to kernel panic.

It happens at several places for the wl1251 (e.g. when
accessed through SDIO) making it unuseable on e.g. the
OpenPandora.

We solve this by allocating temporary buffers or use wl1251_read32().

Tested on v5.18-rc5 with OpenPandora.

Fixes: a1c510d0adc6 ("ARM: implement support for vmap'ed stacks")
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
---
 drivers/net/wireless/ti/wl1251/event.c | 22 ++++++++++++++--------
 drivers/net/wireless/ti/wl1251/io.c    | 20 ++++++++++++++------
 drivers/net/wireless/ti/wl1251/tx.c    | 15 +++++++++++----
 3 files changed, 39 insertions(+), 18 deletions(-)

Comments

Arnd Bergmann May 2, 2022, 2:06 p.m. UTC | #1
On Mon, May 2, 2022 at 2:38 PM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> With introduction of vmap'ed stacks, stack parameters can no
> longer be used for DMA and now leads to kernel panic.
>
> It happens at several places for the wl1251 (e.g. when
> accessed through SDIO) making it unuseable on e.g. the
> OpenPandora.
>
> We solve this by allocating temporary buffers or use wl1251_read32().

This looks all correct to me. I had another look at the related wlcore
driver now,
and see that the same problem existed there but was fixed back in 2012
in a different way, see 690142e98826 ("wl12xx: fix DMA-API-related warnings").

The approach in the wlcore driver appears to be simpler because it
avoids dynamic memory allocation and the associated error handling.
However, it probably makes another problem worse that also exists
here:

 static inline u32 wl1251_read32(struct wl1251 *wl, int addr)
 {
       u32 response;
       wl->if_ops->read(wl, addr, &wl->buffer_32, sizeof(wl->buffer_32));
       return le32_to_cpu(wl->buffer_32);
 }

I think the 'buffer_32' member of 'struct wl1251' needs an explicit
'__cacheline_aligned' attribute to avoid potentially clobbering
some of the structure during a DMA write.

I don't know if anyone cares enough about the two drivers to
have an opinion. I've added Luca to Cc, but he hasn't maintained
the driver since 2013 and probably doesn't.

It's probably ok to just apply your patch for the moment to fix
the regression we saw on the machines that we know use this.

One more detail:

> diff --git a/drivers/net/wireless/ti/wl1251/event.c b/drivers/net/wireless/ti/wl1251/event.c
> index e6d426edab56b..e945aafd88ee5 100644
> --- a/drivers/net/wireless/ti/wl1251/event.c
> +++ b/drivers/net/wireless/ti/wl1251/event.c
> @@ -169,11 +169,9 @@ int wl1251_event_wait(struct wl1251 *wl, u32 mask, int timeout_ms)
>                 msleep(1);
>
>                 /* read from both event fields */
> -               wl1251_mem_read(wl, wl->mbox_ptr[0], &events_vector,
> -                               sizeof(events_vector));
> +               events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[0]);
>                 event = events_vector & mask;
> -               wl1251_mem_read(wl, wl->mbox_ptr[1], &events_vector,
> -                               sizeof(events_vector));
> +               events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[1]);
>                 event |= events_vector & mask;

This appears to change endianness of the data, on big-endian kernels.
Is that intentional?

My first guess would be that the driver never worked correctly on big-endian
machines, and that the change is indeed correct, but on the other hand
the conversion was added in commit ac9e2d9afa90 ("wl1251: convert
32-bit values to le32 before writing to the chip") in a way that suggests it
was meant to work on both.

       Arnd
H. Nikolaus Schaller May 2, 2022, 2:47 p.m. UTC | #2
Hi Arnd,

> Am 02.05.2022 um 16:06 schrieb Arnd Bergmann <arnd@arndb.de>:
> 
> On Mon, May 2, 2022 at 2:38 PM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>> With introduction of vmap'ed stacks, stack parameters can no
>> longer be used for DMA and now leads to kernel panic.
>> 
>> It happens at several places for the wl1251 (e.g. when
>> accessed through SDIO) making it unuseable on e.g. the
>> OpenPandora.
>> 
>> We solve this by allocating temporary buffers or use wl1251_read32().
> 
> This looks all correct to me. I had another look at the related wlcore
> driver now,
> and see that the same problem existed there but was fixed back in 2012
> in a different way, see 690142e98826 ("wl12xx: fix DMA-API-related warnings").

Interesting!

> 
> The approach in the wlcore driver appears to be simpler because it
> avoids dynamic memory allocation and the associated error handling.

It looks as if it just avoids kmalloc/free sequences in event handling
by allocating a big enough buffer once.

wl1271_cmd_wait_for_event_or_timeout() allocates it like we do now.

> However, it probably makes another problem worse that also exists
> here:
> 
> static inline u32 wl1251_read32(struct wl1251 *wl, int addr)
> {
>       u32 response;
>       wl->if_ops->read(wl, addr, &wl->buffer_32, sizeof(wl->buffer_32));
>       return le32_to_cpu(wl->buffer_32);
> }
> 
> I think the 'buffer_32' member of 'struct wl1251' needs an explicit
> '__cacheline_aligned' attribute to avoid potentially clobbering
> some of the structure during a DMA write.
> 
> I don't know if anyone cares enough about the two drivers to
> have an opinion. I've added Luca to Cc, but he hasn't maintained
> the driver since 2013 and probably doesn't.

Well, there seems to be quite some common code but indeed devices
using these older chips are getting rare so it is probably not worth
combining code. And testing needs someone who owns boards
with both chips...

> 
> It's probably ok to just apply your patch for the moment to fix
> the regression we saw on the machines that we know use this.
> 
> One more detail:
> 
>> diff --git a/drivers/net/wireless/ti/wl1251/event.c b/drivers/net/wireless/ti/wl1251/event.c
>> index e6d426edab56b..e945aafd88ee5 100644
>> --- a/drivers/net/wireless/ti/wl1251/event.c
>> +++ b/drivers/net/wireless/ti/wl1251/event.c
>> @@ -169,11 +169,9 @@ int wl1251_event_wait(struct wl1251 *wl, u32 mask, int timeout_ms)
>>                msleep(1);
>> 
>>                /* read from both event fields */
>> -               wl1251_mem_read(wl, wl->mbox_ptr[0], &events_vector,
>> -                               sizeof(events_vector));
>> +               events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[0]);
>>                event = events_vector & mask;
>> -               wl1251_mem_read(wl, wl->mbox_ptr[1], &events_vector,
>> -                               sizeof(events_vector));
>> +               events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[1]);
>>                event |= events_vector & mask;
> 
> This appears to change endianness of the data, on big-endian kernels.
> Is that intentional?

Hm. I didn't think about it. I just noticed that wl1251_mem_read32 uses the
internal buffer so we don't have to allocate its own buffer any more.

> 
> My first guess would be that the driver never worked correctly on big-endian
> machines, and that the change is indeed correct, but on the other hand
> the conversion was added in commit ac9e2d9afa90 ("wl1251: convert
> 32-bit values to le32 before writing to the chip") in a way that suggests it
> was meant to work on both.

wl1251_event_wait() seems to work with the masks provided by code.
So I guess the conversion to le32 is harmless on the OpenPandora.
Most likely it should be done on big endian devices. I.e. we might have
done the right thing.

Let's see if someone compains or knows more. Otherwise we should
fix it just for the Pandora and N900 (both omap3 based) as the only
upstream users of this chip.

BR and thanks,
Nikolaus
Arnd Bergmann May 2, 2022, 3:04 p.m. UTC | #3
On Mon, May 2, 2022 at 4:47 PM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> > Am 02.05.2022 um 16:06 schrieb Arnd Bergmann <arnd@arndb.de>:
> > On Mon, May 2, 2022 at 2:38 PM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >
> > The approach in the wlcore driver appears to be simpler because it
> > avoids dynamic memory allocation and the associated error handling.
>
> It looks as if it just avoids kmalloc/free sequences in event handling
> by allocating a big enough buffer once.
>
> wl1271_cmd_wait_for_event_or_timeout() allocates it like we do now.

Ah right, I missed that one.

> > However, it probably makes another problem worse that also exists
> > here:
> >
> > static inline u32 wl1251_read32(struct wl1251 *wl, int addr)
> > {
> >       u32 response;
> >       wl->if_ops->read(wl, addr, &wl->buffer_32, sizeof(wl->buffer_32));
> >       return le32_to_cpu(wl->buffer_32);
> > }
> >
> > I think the 'buffer_32' member of 'struct wl1251' needs an explicit
> > '__cacheline_aligned' attribute to avoid potentially clobbering
> > some of the structure during a DMA write.
> >
> > I don't know if anyone cares enough about the two drivers to
> > have an opinion. I've added Luca to Cc, but he hasn't maintained
> > the driver since 2013 and probably doesn't.
>
> Well, there seems to be quite some common code but indeed devices
> using these older chips are getting rare so it is probably not worth
> combining code. And testing needs someone who owns boards
> with both chips...

No, I wasn't even thinking of combining code, just whether there
is value in keeping the similar bits in sync to ensure we have the
same set of bugs on both ;-)

I think the best fix for both drivers would be to keep the DMA
members (partition and buffer_32) in the respective device
structures, but mark each one as aligned.

> > My first guess would be that the driver never worked correctly on big-endian
> > machines, and that the change is indeed correct, but on the other hand
> > the conversion was added in commit ac9e2d9afa90 ("wl1251: convert
> > 32-bit values to le32 before writing to the chip") in a way that suggests it
> > was meant to work on both.
>
> wl1251_event_wait() seems to work with the masks provided by code.
> So I guess the conversion to le32 is harmless on the OpenPandora.
> Most likely it should be done on big endian devices. I.e. we might have
> done the right thing.
>
> Let's see if someone compains or knows more. Otherwise we should
> fix it just for the Pandora and N900 (both omap3 based) as the only
> upstream users.

Ok. In general I like ensure we keep things working for big-endian
kernels, which are meant to work on any ARMv6+ or newer machine.
Most of the time it's just a couple of drivers (like this one) that get
in the way of actually doing it, but then again very few people ever
care about big-endian ARMv6/v7.

If we don't have a reason to believe this one is actually wrong, I think
fixing the endian issue silently is fine, as is ignoring the potential
other endian issues in the same driver that nobody complained about
in the past decade ;-)

       Arnd
Kalle Valo May 6, 2022, 6:11 a.m. UTC | #4
"H. Nikolaus Schaller" <hns@goldelico.com> wrote:

> With introduction of vmap'ed stacks, stack parameters can no
> longer be used for DMA and now leads to kernel panic.
> 
> It happens at several places for the wl1251 (e.g. when
> accessed through SDIO) making it unuseable on e.g. the
> OpenPandora.
> 
> We solve this by allocating temporary buffers or use wl1251_read32().
> 
> Tested on v5.18-rc5 with OpenPandora.
> 
> Fixes: a1c510d0adc6 ("ARM: implement support for vmap'ed stacks")
> Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>

Patch applied to wireless-next.git, thanks.

454744754cbf wl1251: dynamically allocate memory used for DMA
diff mbox series

Patch

diff --git a/drivers/net/wireless/ti/wl1251/event.c b/drivers/net/wireless/ti/wl1251/event.c
index e6d426edab56b..e945aafd88ee5 100644
--- a/drivers/net/wireless/ti/wl1251/event.c
+++ b/drivers/net/wireless/ti/wl1251/event.c
@@ -169,11 +169,9 @@  int wl1251_event_wait(struct wl1251 *wl, u32 mask, int timeout_ms)
 		msleep(1);
 
 		/* read from both event fields */
-		wl1251_mem_read(wl, wl->mbox_ptr[0], &events_vector,
-				sizeof(events_vector));
+		events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[0]);
 		event = events_vector & mask;
-		wl1251_mem_read(wl, wl->mbox_ptr[1], &events_vector,
-				sizeof(events_vector));
+		events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[1]);
 		event |= events_vector & mask;
 	} while (!event);
 
@@ -202,7 +200,7 @@  void wl1251_event_mbox_config(struct wl1251 *wl)
 
 int wl1251_event_handle(struct wl1251 *wl, u8 mbox_num)
 {
-	struct event_mailbox mbox;
+	struct event_mailbox *mbox;
 	int ret;
 
 	wl1251_debug(DEBUG_EVENT, "EVENT on mbox %d", mbox_num);
@@ -210,12 +208,20 @@  int wl1251_event_handle(struct wl1251 *wl, u8 mbox_num)
 	if (mbox_num > 1)
 		return -EINVAL;
 
+	mbox = kmalloc(sizeof(*mbox), GFP_KERNEL);
+	if (!mbox) {
+		wl1251_error("can not allocate mbox buffer");
+		return -ENOMEM;
+	}
+
 	/* first we read the mbox descriptor */
-	wl1251_mem_read(wl, wl->mbox_ptr[mbox_num], &mbox,
-			    sizeof(struct event_mailbox));
+	wl1251_mem_read(wl, wl->mbox_ptr[mbox_num], mbox,
+			sizeof(*mbox));
 
 	/* process the descriptor */
-	ret = wl1251_event_process(wl, &mbox);
+	ret = wl1251_event_process(wl, mbox);
+	kfree(mbox);
+
 	if (ret < 0)
 		return ret;
 
diff --git a/drivers/net/wireless/ti/wl1251/io.c b/drivers/net/wireless/ti/wl1251/io.c
index 5ebe7958ed5c7..e8d567af74b4b 100644
--- a/drivers/net/wireless/ti/wl1251/io.c
+++ b/drivers/net/wireless/ti/wl1251/io.c
@@ -121,7 +121,13 @@  void wl1251_set_partition(struct wl1251 *wl,
 			  u32 mem_start, u32 mem_size,
 			  u32 reg_start, u32 reg_size)
 {
-	struct wl1251_partition partition[2];
+	struct wl1251_partition_set *partition;
+
+	partition = kmalloc(sizeof(*partition), GFP_KERNEL);
+	if (!partition) {
+		wl1251_error("can not allocate partition buffer");
+		return;
+	}
 
 	wl1251_debug(DEBUG_SPI, "mem_start %08X mem_size %08X",
 		     mem_start, mem_size);
@@ -164,10 +170,10 @@  void wl1251_set_partition(struct wl1251 *wl,
 			     reg_start, reg_size);
 	}
 
-	partition[0].start = mem_start;
-	partition[0].size  = mem_size;
-	partition[1].start = reg_start;
-	partition[1].size  = reg_size;
+	partition->mem.start = mem_start;
+	partition->mem.size  = mem_size;
+	partition->reg.start = reg_start;
+	partition->reg.size  = reg_size;
 
 	wl->physical_mem_addr = mem_start;
 	wl->physical_reg_addr = reg_start;
@@ -176,5 +182,7 @@  void wl1251_set_partition(struct wl1251 *wl,
 	wl->virtual_reg_addr = mem_size;
 
 	wl->if_ops->write(wl, HW_ACCESS_PART0_SIZE_ADDR, partition,
-		sizeof(partition));
+		sizeof(*partition));
+
+	kfree(partition);
 }
diff --git a/drivers/net/wireless/ti/wl1251/tx.c b/drivers/net/wireless/ti/wl1251/tx.c
index 98cd39619d579..e9dc3c72bb110 100644
--- a/drivers/net/wireless/ti/wl1251/tx.c
+++ b/drivers/net/wireless/ti/wl1251/tx.c
@@ -443,19 +443,25 @@  static void wl1251_tx_packet_cb(struct wl1251 *wl,
 void wl1251_tx_complete(struct wl1251 *wl)
 {
 	int i, result_index, num_complete = 0, queue_len;
-	struct tx_result result[FW_TX_CMPLT_BLOCK_SIZE], *result_ptr;
+	struct tx_result *result, *result_ptr;
 	unsigned long flags;
 
 	if (unlikely(wl->state != WL1251_STATE_ON))
 		return;
 
+	result = kmalloc_array(FW_TX_CMPLT_BLOCK_SIZE, sizeof(*result), GFP_KERNEL);
+	if (!result) {
+		wl1251_error("can not allocate result buffer");
+		return;
+	}
+
 	/* First we read the result */
-	wl1251_mem_read(wl, wl->data_path->tx_complete_addr,
-			    result, sizeof(result));
+	wl1251_mem_read(wl, wl->data_path->tx_complete_addr, result,
+			FW_TX_CMPLT_BLOCK_SIZE * sizeof(*result));
 
 	result_index = wl->next_tx_complete;
 
-	for (i = 0; i < ARRAY_SIZE(result); i++) {
+	for (i = 0; i < FW_TX_CMPLT_BLOCK_SIZE; i++) {
 		result_ptr = &result[result_index];
 
 		if (result_ptr->done_1 == 1 &&
@@ -538,6 +544,7 @@  void wl1251_tx_complete(struct wl1251 *wl)
 
 	}
 
+	kfree(result);
 	wl->next_tx_complete = result_index;
 }