Message ID | 1676021ae8b6d7aada0b1806fed99b1b8359bdc4.1651495112.git.hns@goldelico.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 454744754cbf2c21b3fc7344e46e10bee2768094 |
Delegated to: | Kalle Valo |
Headers | show |
Series | wl1251: dynamically allocate memory used for DMA | expand |
On Mon, May 2, 2022 at 2:38 PM H. Nikolaus Schaller <hns@goldelico.com> wrote: > With introduction of vmap'ed stacks, stack parameters can no > longer be used for DMA and now leads to kernel panic. > > It happens at several places for the wl1251 (e.g. when > accessed through SDIO) making it unuseable on e.g. the > OpenPandora. > > We solve this by allocating temporary buffers or use wl1251_read32(). This looks all correct to me. I had another look at the related wlcore driver now, and see that the same problem existed there but was fixed back in 2012 in a different way, see 690142e98826 ("wl12xx: fix DMA-API-related warnings"). The approach in the wlcore driver appears to be simpler because it avoids dynamic memory allocation and the associated error handling. However, it probably makes another problem worse that also exists here: static inline u32 wl1251_read32(struct wl1251 *wl, int addr) { u32 response; wl->if_ops->read(wl, addr, &wl->buffer_32, sizeof(wl->buffer_32)); return le32_to_cpu(wl->buffer_32); } I think the 'buffer_32' member of 'struct wl1251' needs an explicit '__cacheline_aligned' attribute to avoid potentially clobbering some of the structure during a DMA write. I don't know if anyone cares enough about the two drivers to have an opinion. I've added Luca to Cc, but he hasn't maintained the driver since 2013 and probably doesn't. It's probably ok to just apply your patch for the moment to fix the regression we saw on the machines that we know use this. One more detail: > diff --git a/drivers/net/wireless/ti/wl1251/event.c b/drivers/net/wireless/ti/wl1251/event.c > index e6d426edab56b..e945aafd88ee5 100644 > --- a/drivers/net/wireless/ti/wl1251/event.c > +++ b/drivers/net/wireless/ti/wl1251/event.c > @@ -169,11 +169,9 @@ int wl1251_event_wait(struct wl1251 *wl, u32 mask, int timeout_ms) > msleep(1); > > /* read from both event fields */ > - wl1251_mem_read(wl, wl->mbox_ptr[0], &events_vector, > - sizeof(events_vector)); > + events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[0]); > event = events_vector & mask; > - wl1251_mem_read(wl, wl->mbox_ptr[1], &events_vector, > - sizeof(events_vector)); > + events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[1]); > event |= events_vector & mask; This appears to change endianness of the data, on big-endian kernels. Is that intentional? My first guess would be that the driver never worked correctly on big-endian machines, and that the change is indeed correct, but on the other hand the conversion was added in commit ac9e2d9afa90 ("wl1251: convert 32-bit values to le32 before writing to the chip") in a way that suggests it was meant to work on both. Arnd
Hi Arnd, > Am 02.05.2022 um 16:06 schrieb Arnd Bergmann <arnd@arndb.de>: > > On Mon, May 2, 2022 at 2:38 PM H. Nikolaus Schaller <hns@goldelico.com> wrote: >> With introduction of vmap'ed stacks, stack parameters can no >> longer be used for DMA and now leads to kernel panic. >> >> It happens at several places for the wl1251 (e.g. when >> accessed through SDIO) making it unuseable on e.g. the >> OpenPandora. >> >> We solve this by allocating temporary buffers or use wl1251_read32(). > > This looks all correct to me. I had another look at the related wlcore > driver now, > and see that the same problem existed there but was fixed back in 2012 > in a different way, see 690142e98826 ("wl12xx: fix DMA-API-related warnings"). Interesting! > > The approach in the wlcore driver appears to be simpler because it > avoids dynamic memory allocation and the associated error handling. It looks as if it just avoids kmalloc/free sequences in event handling by allocating a big enough buffer once. wl1271_cmd_wait_for_event_or_timeout() allocates it like we do now. > However, it probably makes another problem worse that also exists > here: > > static inline u32 wl1251_read32(struct wl1251 *wl, int addr) > { > u32 response; > wl->if_ops->read(wl, addr, &wl->buffer_32, sizeof(wl->buffer_32)); > return le32_to_cpu(wl->buffer_32); > } > > I think the 'buffer_32' member of 'struct wl1251' needs an explicit > '__cacheline_aligned' attribute to avoid potentially clobbering > some of the structure during a DMA write. > > I don't know if anyone cares enough about the two drivers to > have an opinion. I've added Luca to Cc, but he hasn't maintained > the driver since 2013 and probably doesn't. Well, there seems to be quite some common code but indeed devices using these older chips are getting rare so it is probably not worth combining code. And testing needs someone who owns boards with both chips... > > It's probably ok to just apply your patch for the moment to fix > the regression we saw on the machines that we know use this. > > One more detail: > >> diff --git a/drivers/net/wireless/ti/wl1251/event.c b/drivers/net/wireless/ti/wl1251/event.c >> index e6d426edab56b..e945aafd88ee5 100644 >> --- a/drivers/net/wireless/ti/wl1251/event.c >> +++ b/drivers/net/wireless/ti/wl1251/event.c >> @@ -169,11 +169,9 @@ int wl1251_event_wait(struct wl1251 *wl, u32 mask, int timeout_ms) >> msleep(1); >> >> /* read from both event fields */ >> - wl1251_mem_read(wl, wl->mbox_ptr[0], &events_vector, >> - sizeof(events_vector)); >> + events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[0]); >> event = events_vector & mask; >> - wl1251_mem_read(wl, wl->mbox_ptr[1], &events_vector, >> - sizeof(events_vector)); >> + events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[1]); >> event |= events_vector & mask; > > This appears to change endianness of the data, on big-endian kernels. > Is that intentional? Hm. I didn't think about it. I just noticed that wl1251_mem_read32 uses the internal buffer so we don't have to allocate its own buffer any more. > > My first guess would be that the driver never worked correctly on big-endian > machines, and that the change is indeed correct, but on the other hand > the conversion was added in commit ac9e2d9afa90 ("wl1251: convert > 32-bit values to le32 before writing to the chip") in a way that suggests it > was meant to work on both. wl1251_event_wait() seems to work with the masks provided by code. So I guess the conversion to le32 is harmless on the OpenPandora. Most likely it should be done on big endian devices. I.e. we might have done the right thing. Let's see if someone compains or knows more. Otherwise we should fix it just for the Pandora and N900 (both omap3 based) as the only upstream users of this chip. BR and thanks, Nikolaus
On Mon, May 2, 2022 at 4:47 PM H. Nikolaus Schaller <hns@goldelico.com> wrote: > > Am 02.05.2022 um 16:06 schrieb Arnd Bergmann <arnd@arndb.de>: > > On Mon, May 2, 2022 at 2:38 PM H. Nikolaus Schaller <hns@goldelico.com> wrote: > > > > The approach in the wlcore driver appears to be simpler because it > > avoids dynamic memory allocation and the associated error handling. > > It looks as if it just avoids kmalloc/free sequences in event handling > by allocating a big enough buffer once. > > wl1271_cmd_wait_for_event_or_timeout() allocates it like we do now. Ah right, I missed that one. > > However, it probably makes another problem worse that also exists > > here: > > > > static inline u32 wl1251_read32(struct wl1251 *wl, int addr) > > { > > u32 response; > > wl->if_ops->read(wl, addr, &wl->buffer_32, sizeof(wl->buffer_32)); > > return le32_to_cpu(wl->buffer_32); > > } > > > > I think the 'buffer_32' member of 'struct wl1251' needs an explicit > > '__cacheline_aligned' attribute to avoid potentially clobbering > > some of the structure during a DMA write. > > > > I don't know if anyone cares enough about the two drivers to > > have an opinion. I've added Luca to Cc, but he hasn't maintained > > the driver since 2013 and probably doesn't. > > Well, there seems to be quite some common code but indeed devices > using these older chips are getting rare so it is probably not worth > combining code. And testing needs someone who owns boards > with both chips... No, I wasn't even thinking of combining code, just whether there is value in keeping the similar bits in sync to ensure we have the same set of bugs on both ;-) I think the best fix for both drivers would be to keep the DMA members (partition and buffer_32) in the respective device structures, but mark each one as aligned. > > My first guess would be that the driver never worked correctly on big-endian > > machines, and that the change is indeed correct, but on the other hand > > the conversion was added in commit ac9e2d9afa90 ("wl1251: convert > > 32-bit values to le32 before writing to the chip") in a way that suggests it > > was meant to work on both. > > wl1251_event_wait() seems to work with the masks provided by code. > So I guess the conversion to le32 is harmless on the OpenPandora. > Most likely it should be done on big endian devices. I.e. we might have > done the right thing. > > Let's see if someone compains or knows more. Otherwise we should > fix it just for the Pandora and N900 (both omap3 based) as the only > upstream users. Ok. In general I like ensure we keep things working for big-endian kernels, which are meant to work on any ARMv6+ or newer machine. Most of the time it's just a couple of drivers (like this one) that get in the way of actually doing it, but then again very few people ever care about big-endian ARMv6/v7. If we don't have a reason to believe this one is actually wrong, I think fixing the endian issue silently is fine, as is ignoring the potential other endian issues in the same driver that nobody complained about in the past decade ;-) Arnd
"H. Nikolaus Schaller" <hns@goldelico.com> wrote: > With introduction of vmap'ed stacks, stack parameters can no > longer be used for DMA and now leads to kernel panic. > > It happens at several places for the wl1251 (e.g. when > accessed through SDIO) making it unuseable on e.g. the > OpenPandora. > > We solve this by allocating temporary buffers or use wl1251_read32(). > > Tested on v5.18-rc5 with OpenPandora. > > Fixes: a1c510d0adc6 ("ARM: implement support for vmap'ed stacks") > Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Patch applied to wireless-next.git, thanks. 454744754cbf wl1251: dynamically allocate memory used for DMA
diff --git a/drivers/net/wireless/ti/wl1251/event.c b/drivers/net/wireless/ti/wl1251/event.c index e6d426edab56b..e945aafd88ee5 100644 --- a/drivers/net/wireless/ti/wl1251/event.c +++ b/drivers/net/wireless/ti/wl1251/event.c @@ -169,11 +169,9 @@ int wl1251_event_wait(struct wl1251 *wl, u32 mask, int timeout_ms) msleep(1); /* read from both event fields */ - wl1251_mem_read(wl, wl->mbox_ptr[0], &events_vector, - sizeof(events_vector)); + events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[0]); event = events_vector & mask; - wl1251_mem_read(wl, wl->mbox_ptr[1], &events_vector, - sizeof(events_vector)); + events_vector = wl1251_mem_read32(wl, wl->mbox_ptr[1]); event |= events_vector & mask; } while (!event); @@ -202,7 +200,7 @@ void wl1251_event_mbox_config(struct wl1251 *wl) int wl1251_event_handle(struct wl1251 *wl, u8 mbox_num) { - struct event_mailbox mbox; + struct event_mailbox *mbox; int ret; wl1251_debug(DEBUG_EVENT, "EVENT on mbox %d", mbox_num); @@ -210,12 +208,20 @@ int wl1251_event_handle(struct wl1251 *wl, u8 mbox_num) if (mbox_num > 1) return -EINVAL; + mbox = kmalloc(sizeof(*mbox), GFP_KERNEL); + if (!mbox) { + wl1251_error("can not allocate mbox buffer"); + return -ENOMEM; + } + /* first we read the mbox descriptor */ - wl1251_mem_read(wl, wl->mbox_ptr[mbox_num], &mbox, - sizeof(struct event_mailbox)); + wl1251_mem_read(wl, wl->mbox_ptr[mbox_num], mbox, + sizeof(*mbox)); /* process the descriptor */ - ret = wl1251_event_process(wl, &mbox); + ret = wl1251_event_process(wl, mbox); + kfree(mbox); + if (ret < 0) return ret; diff --git a/drivers/net/wireless/ti/wl1251/io.c b/drivers/net/wireless/ti/wl1251/io.c index 5ebe7958ed5c7..e8d567af74b4b 100644 --- a/drivers/net/wireless/ti/wl1251/io.c +++ b/drivers/net/wireless/ti/wl1251/io.c @@ -121,7 +121,13 @@ void wl1251_set_partition(struct wl1251 *wl, u32 mem_start, u32 mem_size, u32 reg_start, u32 reg_size) { - struct wl1251_partition partition[2]; + struct wl1251_partition_set *partition; + + partition = kmalloc(sizeof(*partition), GFP_KERNEL); + if (!partition) { + wl1251_error("can not allocate partition buffer"); + return; + } wl1251_debug(DEBUG_SPI, "mem_start %08X mem_size %08X", mem_start, mem_size); @@ -164,10 +170,10 @@ void wl1251_set_partition(struct wl1251 *wl, reg_start, reg_size); } - partition[0].start = mem_start; - partition[0].size = mem_size; - partition[1].start = reg_start; - partition[1].size = reg_size; + partition->mem.start = mem_start; + partition->mem.size = mem_size; + partition->reg.start = reg_start; + partition->reg.size = reg_size; wl->physical_mem_addr = mem_start; wl->physical_reg_addr = reg_start; @@ -176,5 +182,7 @@ void wl1251_set_partition(struct wl1251 *wl, wl->virtual_reg_addr = mem_size; wl->if_ops->write(wl, HW_ACCESS_PART0_SIZE_ADDR, partition, - sizeof(partition)); + sizeof(*partition)); + + kfree(partition); } diff --git a/drivers/net/wireless/ti/wl1251/tx.c b/drivers/net/wireless/ti/wl1251/tx.c index 98cd39619d579..e9dc3c72bb110 100644 --- a/drivers/net/wireless/ti/wl1251/tx.c +++ b/drivers/net/wireless/ti/wl1251/tx.c @@ -443,19 +443,25 @@ static void wl1251_tx_packet_cb(struct wl1251 *wl, void wl1251_tx_complete(struct wl1251 *wl) { int i, result_index, num_complete = 0, queue_len; - struct tx_result result[FW_TX_CMPLT_BLOCK_SIZE], *result_ptr; + struct tx_result *result, *result_ptr; unsigned long flags; if (unlikely(wl->state != WL1251_STATE_ON)) return; + result = kmalloc_array(FW_TX_CMPLT_BLOCK_SIZE, sizeof(*result), GFP_KERNEL); + if (!result) { + wl1251_error("can not allocate result buffer"); + return; + } + /* First we read the result */ - wl1251_mem_read(wl, wl->data_path->tx_complete_addr, - result, sizeof(result)); + wl1251_mem_read(wl, wl->data_path->tx_complete_addr, result, + FW_TX_CMPLT_BLOCK_SIZE * sizeof(*result)); result_index = wl->next_tx_complete; - for (i = 0; i < ARRAY_SIZE(result); i++) { + for (i = 0; i < FW_TX_CMPLT_BLOCK_SIZE; i++) { result_ptr = &result[result_index]; if (result_ptr->done_1 == 1 && @@ -538,6 +544,7 @@ void wl1251_tx_complete(struct wl1251 *wl) } + kfree(result); wl->next_tx_complete = result_index; }
With introduction of vmap'ed stacks, stack parameters can no longer be used for DMA and now leads to kernel panic. It happens at several places for the wl1251 (e.g. when accessed through SDIO) making it unuseable on e.g. the OpenPandora. We solve this by allocating temporary buffers or use wl1251_read32(). Tested on v5.18-rc5 with OpenPandora. Fixes: a1c510d0adc6 ("ARM: implement support for vmap'ed stacks") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> --- drivers/net/wireless/ti/wl1251/event.c | 22 ++++++++++++++-------- drivers/net/wireless/ti/wl1251/io.c | 20 ++++++++++++++------ drivers/net/wireless/ti/wl1251/tx.c | 15 +++++++++++---- 3 files changed, 39 insertions(+), 18 deletions(-)