Message ID | 20240910124841.2205629-2-quic_jsuraj@quicinc.com
---|---
State | Changes Requested
Delegated to | Netdev Maintainers
Series | [v2] net: stmmac: allocate separate page for buffer
On Tue, 10 Sep 2024 18:18:41 +0530 Suraj Jaiswal wrote:
> Currently for TSO page is mapped with dma_map_single()
> and then resulting dma address is referenced (and offset)
> by multiple descriptors until the whole region is
> programmed into the descriptors.
> This makes it possible for stmmac_tx_clean() to dma_unmap()
> the first of the already processed descriptors, while the
> rest are still being processed by the DMA engine. This leads
> to an iommu fault due to the DMA engine using unmapped memory
> as seen below:
>
> arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402,
> iova=0xfc401000, fsynr=0x60003, cbfrsynra=0x121, cb=38
>
> Descriptor content:
>        TDES0       TDES1  TDES2  TDES3
> 317:   0xfc400800  0x0    0x36   0xa02c0b68
> 318:   0xfc400836  0x0    0xb68  0x90000000
>
> As we can see above descriptor 317 holding a page address
> and 318 holding the buffer address by adding offset to page
> addess. Now if 317 descritor is cleaned as part of tx_clean()
> then we will get SMMU fault if 318 descriptor is getting accessed.

The device is completing earlier chunks of the payload before the entire
payload is sent? That's very unusual, is there a manual you can quote
on this?

> To fix this, let's map each descriptor's memory reference individually.
> This way there's no risk of unmapping a region that's still being
> referenced by the DMA engine in a later descriptor.

This adds overhead. Why not wait with unmapping until the full skb is
done? Presumably you can't free half an skb, anyway.

Please add a Fixes tag and use "PATCH net" as the subject tag/prefix.
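The alternative raised here would keep unmapping out of the per-descriptor
path entirely. A rough sketch of that idea, assuming the bookkeeping fields
shown elsewhere in this thread (tx_skbuff_dma[].buf/.len/.map_as_page/
.last_segment, tx_skbuff[]); the helper name, the loop shape and the point
at which it is called are assumptions, not the driver's actual
stmmac_tx_clean():

/* Illustrative only, not the driver's real cleanup path.  The caller
 * invokes this only once the descriptor flagged last_segment for a
 * given skb has completed, so no mapping is torn down while a later
 * descriptor of the same skb is still in flight.
 */
static void stmmac_tx_clean_one_skb(struct stmmac_priv *priv,
				    struct stmmac_tx_queue *tx_q,
				    unsigned int entry)
{
	for (;;) {
		bool last = tx_q->tx_skbuff_dma[entry].last_segment;

		if (tx_q->tx_skbuff_dma[entry].buf) {
			if (tx_q->tx_skbuff_dma[entry].map_as_page)
				dma_unmap_page(priv->device,
					       tx_q->tx_skbuff_dma[entry].buf,
					       tx_q->tx_skbuff_dma[entry].len,
					       DMA_TO_DEVICE);
			else
				dma_unmap_single(priv->device,
						 tx_q->tx_skbuff_dma[entry].buf,
						 tx_q->tx_skbuff_dma[entry].len,
						 DMA_TO_DEVICE);
			tx_q->tx_skbuff_dma[entry].buf = 0;
		}

		/* Only one entry of the skb carries the skb pointer. */
		if (tx_q->tx_skbuff[entry]) {
			dev_consume_skb_any(tx_q->tx_skbuff[entry]);
			tx_q->tx_skbuff[entry] = NULL;
		}

		if (last)
			break;
		entry = (entry + 1) % priv->dma_conf.dma_tx_size;
	}
}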
On Tue, Sep 10, 2024 at 06:18:41PM +0530, Suraj Jaiswal wrote:
> Currently for TSO page is mapped with dma_map_single()
> and then resulting dma address is referenced (and offset)
> by multiple descriptors until the whole region is
> programmed into the descriptors.
> This makes it possible for stmmac_tx_clean() to dma_unmap()
> the first of the already processed descriptors, while the
> rest are still being processed by the DMA engine. This leads
> to an iommu fault due to the DMA engine using unmapped memory
> as seen below:
>
> arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402,
> iova=0xfc401000, fsynr=0x60003, cbfrsynra=0x121, cb=38
>
> Descriptor content:
>        TDES0       TDES1  TDES2  TDES3
> 317:   0xfc400800  0x0    0x36   0xa02c0b68
> 318:   0xfc400836  0x0    0xb68  0x90000000
>
> As we can see above descriptor 317 holding a page address
> and 318 holding the buffer address by adding offset to page
> addess. Now if 317 descritor is cleaned as part of tx_clean()

Hi Suraj,

As it looks like there will be a v3 anyway, some minor nits from my side.

addess -> address

Flagged by checkpatch.pl --codespell

> then we will get SMMU fault if 318 descriptor is getting accessed.
>
> To fix this, let's map each descriptor's memory reference individually.
> This way there's no risk of unmapping a region that's still being
> referenced by the DMA engine in a later descriptor.
>
> Signed-off-by: Suraj Jaiswal <quic_jsuraj@quicinc.com>
> ---
>
> Changes since v2:
> - Update commit text with more details.
> - fixed Reverse xmas tree order issue.
>
>
> Changes since v1:
> - Fixed function description
> - Fixed handling of return value.
>
>
>  .../net/ethernet/stmicro/stmmac/stmmac_main.c | 63 ++++++++++++-------
>  1 file changed, 42 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 83b654b7a9fd..98d5a4b64cac 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -4136,21 +4136,25 @@ static bool stmmac_vlan_insert(struct stmmac_priv *priv, struct sk_buff *skb,
>  /**
>   * stmmac_tso_allocator - close entry point of the driver
>   * @priv: driver private structure
> - * @des: buffer start address
> + * @addr: Contains either skb frag address or skb->data address
>   * @total_len: total length to fill in descriptors
>   * @last_segment: condition for the last descriptor
>   * @queue: TX queue index
> + * @is_skb_frag: condition to check whether skb data is part of fragment or not
>   * Description:
>   * This function fills descriptor and request new descriptors according to
>   * buffer length to fill
> + * This function returns 0 on success else -ERRNO on fail

Please consider using a "Return:" or "Returns:" section to document
return values.

Flagged by ./scripts/kernel-doc -none -Wall .../stmmac_main.c

>   */
> -static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
> -				 int total_len, bool last_segment, u32 queue)
> +static int stmmac_tso_allocator(struct stmmac_priv *priv, void *addr,
> +				int total_len, bool last_segment, u32 queue, bool is_skb_frag)

The line above could be trivially wrapped to <= 80 columns wide, as is
still preferred for networking code. Likewise a little further below.

Likewise elsewhere in this patch.

You can pass an option to checkpatch.pl to check for this.

>  {
>  	struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue];
>  	struct dma_desc *desc;
>  	u32 buff_size;
>  	int tmp_len;
> +	unsigned char *data = addr;
> +	unsigned int offset = 0;

Please consider arranging local variables in Networking code in
reverse xmas tree order - longest line to shortest.

Edward Cree's xmastree tool can be of assistance here:
https://github.com/ecree-solarflare/xmastree

>
>  	tmp_len = total_len;
>
> @@ -4161,20 +4165,44 @@ static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
>  					priv->dma_conf.dma_tx_size);
>  		WARN_ON(tx_q->tx_skbuff[tx_q->cur_tx]);
>
> +		buff_size = tmp_len >= TSO_MAX_BUFF_SIZE ? TSO_MAX_BUFF_SIZE : tmp_len;

FWIW, I think that min() would allow the intent of the line above to
be expressed more succinctly.

> +
>  		if (tx_q->tbs & STMMAC_TBS_AVAIL)
>  			desc = &tx_q->dma_entx[tx_q->cur_tx].basic;
>  		else
>  			desc = &tx_q->dma_tx[tx_q->cur_tx];
>
> -		curr_addr = des + (total_len - tmp_len);
> +		offset = total_len - tmp_len;
> +		if (!is_skb_frag) {
> +			curr_addr = dma_map_single(priv->device, data + offset, buff_size,
> +						   DMA_TO_DEVICE);
> +
> +			if (dma_mapping_error(priv->device, curr_addr))
> +				return -ENOMEM;
> +
> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = curr_addr;
> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].len = buff_size;
> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
> +		} else {
> +			curr_addr = skb_frag_dma_map(priv->device, addr, offset,
> +						     buff_size,
> +						     DMA_TO_DEVICE);
> +
> +			if (dma_mapping_error(priv->device, curr_addr))
> +				return -ENOMEM;
> +
> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = curr_addr;
> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].len = buff_size;
> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = true;
> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
> +		}

Maybe my eyes are deceiving me, but there seems to be quite a lot of
repetition in the two arms of the if/else condition above. If so, can it be
consolidated by moving everything other than the assignment of curr out of
the conditional blocks? (And dropping the {}.)

> +
>  		if (priv->dma_cap.addr64 <= 32)
>  			desc->des0 = cpu_to_le32(curr_addr);
>  		else
>  			stmmac_set_desc_addr(priv, desc, curr_addr);
>
> -		buff_size = tmp_len >= TSO_MAX_BUFF_SIZE ?
> -			    TSO_MAX_BUFF_SIZE : tmp_len;
> -
>  		stmmac_prepare_tso_tx_desc(priv, desc, 0, buff_size,
>  					   0, 1,
>  					   (last_segment) && (tmp_len <= TSO_MAX_BUFF_SIZE),

...
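Putting the two suggestions above together (min() and hoisting the common
assignments), the body of the loop might end up roughly as below. This is an
untested sketch, not code from any posted revision of the patch; min_t() is
used because buff_size is u32 while tmp_len is int:

		/* Only curr_addr depends on the mapping type; the
		 * bookkeeping is identical for both cases.
		 */
		buff_size = min_t(u32, tmp_len, TSO_MAX_BUFF_SIZE);

		offset = total_len - tmp_len;
		if (is_skb_frag)
			curr_addr = skb_frag_dma_map(priv->device, addr, offset,
						     buff_size, DMA_TO_DEVICE);
		else
			curr_addr = dma_map_single(priv->device, data + offset,
						   buff_size, DMA_TO_DEVICE);

		if (dma_mapping_error(priv->device, curr_addr))
			return -ENOMEM;

		tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = curr_addr;
		tx_q->tx_skbuff_dma[tx_q->cur_tx].len = buff_size;
		tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = is_skb_frag;
		tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;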
On 9/12/2024 5:21 AM, Jakub Kicinski wrote:
> On Tue, 10 Sep 2024 18:18:41 +0530 Suraj Jaiswal wrote:
>> Currently for TSO page is mapped with dma_map_single()
>> and then resulting dma address is referenced (and offset)
>> by multiple descriptors until the whole region is
>> programmed into the descriptors.
>> This makes it possible for stmmac_tx_clean() to dma_unmap()
>> the first of the already processed descriptors, while the
>> rest are still being processed by the DMA engine. This leads
>> to an iommu fault due to the DMA engine using unmapped memory
>> as seen below:
>>
>> arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402,
>> iova=0xfc401000, fsynr=0x60003, cbfrsynra=0x121, cb=38
>>
>> Descriptor content:
>>        TDES0       TDES1  TDES2  TDES3
>> 317:   0xfc400800  0x0    0x36   0xa02c0b68
>> 318:   0xfc400836  0x0    0xb68  0x90000000
>>
>> As we can see above descriptor 317 holding a page address
>> and 318 holding the buffer address by adding offset to page
>> addess. Now if 317 descritor is cleaned as part of tx_clean()
>> then we will get SMMU fault if 318 descriptor is getting accessed.
>
> The device is completing earlier chunks of the payload before the entire
> payload is sent? That's very unusual, is there a manual you can quote
> on this?

Here, if the first descriptor is cleaned as part of tx_clean() before the
tx complete of the next descriptor is done, then we run into this issue.
For the non-TSO case, the xmit code already has logic to map each fragment
separately, and we are trying the same logic for the TSO case.

>> To fix this, let's map each descriptor's memory reference individually.
>> This way there's no risk of unmapping a region that's still being
>> referenced by the DMA engine in a later descriptor.
>
> This adds overhead. Why not wait with unmapping until the full skb is
> done? Presumably you can't free half an skb, anyway.
>
> Please add a Fixes tag and use "PATCH net" as the subject tag/prefix.
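The non-TSO pattern being referred to is essentially the per-fragment
mapping loop visible in the lines this patch removes from stmmac_tso_xmit(),
which mirrors how the fragment path of the regular transmit code behaves. A
simplified sketch, with ring-index advancing and error unwinding omitted:

	unsigned int nfrags = skb_shinfo(skb)->nr_frags;
	unsigned int i;

	for (i = 0; i < nfrags; i++) {
		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
		dma_addr_t des;

		/* Each fragment gets its own DMA mapping ... */
		des = skb_frag_dma_map(priv->device, frag, 0,
				       skb_frag_size(frag), DMA_TO_DEVICE);
		if (dma_mapping_error(priv->device, des))
			goto dma_map_err;

		/* ... and its own bookkeeping entry, so tx_clean() can
		 * unmap it independently of the other descriptors.
		 */
		tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
		tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_frag_size(frag);
		tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = true;
		tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
	}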
On 9/12/2024 2:17 PM, Simon Horman wrote:
> On Tue, Sep 10, 2024 at 06:18:41PM +0530, Suraj Jaiswal wrote:
>> Currently for TSO page is mapped with dma_map_single()
>> and then resulting dma address is referenced (and offset)
>> by multiple descriptors until the whole region is
>> programmed into the descriptors.
>> This makes it possible for stmmac_tx_clean() to dma_unmap()
>> the first of the already processed descriptors, while the
>> rest are still being processed by the DMA engine. This leads
>> to an iommu fault due to the DMA engine using unmapped memory
>> as seen below:
>>
>> arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402,
>> iova=0xfc401000, fsynr=0x60003, cbfrsynra=0x121, cb=38
>>
>> Descriptor content:
>>        TDES0       TDES1  TDES2  TDES3
>> 317:   0xfc400800  0x0    0x36   0xa02c0b68
>> 318:   0xfc400836  0x0    0xb68  0x90000000
>>
>> As we can see above descriptor 317 holding a page address
>> and 318 holding the buffer address by adding offset to page
>> addess. Now if 317 descritor is cleaned as part of tx_clean()
>
> Hi Suraj,
>
> As it looks like there will be a v3 anyway, some minor nits from my side.
>
> addess -> address
>
> Flagged by checkpatch.pl --codespell

Sure, we will take care of all comments and update the latest patch
after verification.

>
>> then we will get SMMU fault if 318 descriptor is getting accessed.
>>
>> To fix this, let's map each descriptor's memory reference individually.
>> This way there's no risk of unmapping a region that's still being
>> referenced by the DMA engine in a later descriptor.
>>
>> Signed-off-by: Suraj Jaiswal <quic_jsuraj@quicinc.com>
>> ---
>>
>> Changes since v2:
>> - Update commit text with more details.
>> - fixed Reverse xmas tree order issue.
>>
>>
>> Changes since v1:
>> - Fixed function description
>> - Fixed handling of return value.
>>
>>
>>  .../net/ethernet/stmicro/stmmac/stmmac_main.c | 63 ++++++++++++-------
>>  1 file changed, 42 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index 83b654b7a9fd..98d5a4b64cac 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -4136,21 +4136,25 @@ static bool stmmac_vlan_insert(struct stmmac_priv *priv, struct sk_buff *skb,
>>  /**
>>   * stmmac_tso_allocator - close entry point of the driver
>>   * @priv: driver private structure
>> - * @des: buffer start address
>> + * @addr: Contains either skb frag address or skb->data address
>>   * @total_len: total length to fill in descriptors
>>   * @last_segment: condition for the last descriptor
>>   * @queue: TX queue index
>> + * @is_skb_frag: condition to check whether skb data is part of fragment or not
>>   * Description:
>>   * This function fills descriptor and request new descriptors according to
>>   * buffer length to fill
>> + * This function returns 0 on success else -ERRNO on fail
>
> Please consider using a "Return:" or "Returns:" section to document
> return values.
>
> Flagged by ./scripts/kernel-doc -none -Wall .../stmmac_main.c
>
>>   */
>> -static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
>> -				 int total_len, bool last_segment, u32 queue)
>> +static int stmmac_tso_allocator(struct stmmac_priv *priv, void *addr,
>> +				int total_len, bool last_segment, u32 queue, bool is_skb_frag)
>
> The line above could be trivially wrapped to <= 80 columns wide, as is
> still preferred for networking code. Likewise a little further below.
>
> Likewise elsewhere in this patch.
>
> You can pass an option to checkpatch.pl to check for this.
>
>>  {
>>  	struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue];
>>  	struct dma_desc *desc;
>>  	u32 buff_size;
>>  	int tmp_len;
>> +	unsigned char *data = addr;
>> +	unsigned int offset = 0;
>
> Please consider arranging local variables in Networking code in
> reverse xmas tree order - longest line to shortest.
>
> Edward Cree's xmastree tool can be of assistance here:
> https://github.com/ecree-solarflare/xmastree
>
>>
>>  	tmp_len = total_len;
>>
>> @@ -4161,20 +4165,44 @@ static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
>>  					priv->dma_conf.dma_tx_size);
>>  		WARN_ON(tx_q->tx_skbuff[tx_q->cur_tx]);
>>
>> +		buff_size = tmp_len >= TSO_MAX_BUFF_SIZE ? TSO_MAX_BUFF_SIZE : tmp_len;
>
> FWIW, I think that min() would allow the intent of the line above to
> be expressed more succinctly.
>
>> +
>>  		if (tx_q->tbs & STMMAC_TBS_AVAIL)
>>  			desc = &tx_q->dma_entx[tx_q->cur_tx].basic;
>>  		else
>>  			desc = &tx_q->dma_tx[tx_q->cur_tx];
>>
>> -		curr_addr = des + (total_len - tmp_len);
>> +		offset = total_len - tmp_len;
>> +		if (!is_skb_frag) {
>> +			curr_addr = dma_map_single(priv->device, data + offset, buff_size,
>> +						   DMA_TO_DEVICE);
>> +
>> +			if (dma_mapping_error(priv->device, curr_addr))
>> +				return -ENOMEM;
>> +
>> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = curr_addr;
>> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].len = buff_size;
>> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
>> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
>> +		} else {
>> +			curr_addr = skb_frag_dma_map(priv->device, addr, offset,
>> +						     buff_size,
>> +						     DMA_TO_DEVICE);
>> +
>> +			if (dma_mapping_error(priv->device, curr_addr))
>> +				return -ENOMEM;
>> +
>> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = curr_addr;
>> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].len = buff_size;
>> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = true;
>> +			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
>> +		}
>
> Maybe my eyes are deceiving me, but there seems to be quite a lot of
> repetition in the two arms of the if/else condition above. If so, can it be
> consolidated by moving everything other than the assignment of curr out of
> the conditional blocks? (And dropping the {}.)
>
>> +
>>  		if (priv->dma_cap.addr64 <= 32)
>>  			desc->des0 = cpu_to_le32(curr_addr);
>>  		else
>>  			stmmac_set_desc_addr(priv, desc, curr_addr);
>>
>> -		buff_size = tmp_len >= TSO_MAX_BUFF_SIZE ?
>> -			    TSO_MAX_BUFF_SIZE : tmp_len;
>> -
>>  		stmmac_prepare_tso_tx_desc(priv, desc, 0, buff_size,
>>  					   0, 1,
>>  					   (last_segment) && (tmp_len <= TSO_MAX_BUFF_SIZE),
>
> ...
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 83b654b7a9fd..98d5a4b64cac 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4136,21 +4136,25 @@ static bool stmmac_vlan_insert(struct stmmac_priv *priv, struct sk_buff *skb,
 /**
  * stmmac_tso_allocator - close entry point of the driver
  * @priv: driver private structure
- * @des: buffer start address
+ * @addr: Contains either skb frag address or skb->data address
  * @total_len: total length to fill in descriptors
  * @last_segment: condition for the last descriptor
  * @queue: TX queue index
+ * @is_skb_frag: condition to check whether skb data is part of fragment or not
  * Description:
  * This function fills descriptor and request new descriptors according to
  * buffer length to fill
+ * This function returns 0 on success else -ERRNO on fail
  */
-static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
-				 int total_len, bool last_segment, u32 queue)
+static int stmmac_tso_allocator(struct stmmac_priv *priv, void *addr,
+				int total_len, bool last_segment, u32 queue, bool is_skb_frag)
 {
 	struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue];
 	struct dma_desc *desc;
 	u32 buff_size;
 	int tmp_len;
+	unsigned char *data = addr;
+	unsigned int offset = 0;
 
 	tmp_len = total_len;
 
@@ -4161,20 +4165,44 @@ static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
 					priv->dma_conf.dma_tx_size);
 		WARN_ON(tx_q->tx_skbuff[tx_q->cur_tx]);
 
+		buff_size = tmp_len >= TSO_MAX_BUFF_SIZE ? TSO_MAX_BUFF_SIZE : tmp_len;
+
 		if (tx_q->tbs & STMMAC_TBS_AVAIL)
 			desc = &tx_q->dma_entx[tx_q->cur_tx].basic;
 		else
 			desc = &tx_q->dma_tx[tx_q->cur_tx];
 
-		curr_addr = des + (total_len - tmp_len);
+		offset = total_len - tmp_len;
+		if (!is_skb_frag) {
+			curr_addr = dma_map_single(priv->device, data + offset, buff_size,
+						   DMA_TO_DEVICE);
+
+			if (dma_mapping_error(priv->device, curr_addr))
+				return -ENOMEM;
+
+			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = curr_addr;
+			tx_q->tx_skbuff_dma[tx_q->cur_tx].len = buff_size;
+			tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
+			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
+		} else {
+			curr_addr = skb_frag_dma_map(priv->device, addr, offset,
+						     buff_size,
+						     DMA_TO_DEVICE);
+
+			if (dma_mapping_error(priv->device, curr_addr))
+				return -ENOMEM;
+
+			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = curr_addr;
+			tx_q->tx_skbuff_dma[tx_q->cur_tx].len = buff_size;
+			tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = true;
+			tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
+		}
+
 		if (priv->dma_cap.addr64 <= 32)
 			desc->des0 = cpu_to_le32(curr_addr);
 		else
 			stmmac_set_desc_addr(priv, desc, curr_addr);
 
-		buff_size = tmp_len >= TSO_MAX_BUFF_SIZE ?
-			    TSO_MAX_BUFF_SIZE : tmp_len;
-
 		stmmac_prepare_tso_tx_desc(priv, desc, 0, buff_size,
 					   0, 1,
 					   (last_segment) && (tmp_len <= TSO_MAX_BUFF_SIZE),
@@ -4182,6 +4210,7 @@ static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
 		tmp_len -= TSO_MAX_BUFF_SIZE;
 	}
+	return 0;
 }
 
 static void stmmac_flush_tx_descriptors(struct stmmac_priv *priv, int queue)
@@ -4351,25 +4380,17 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 		pay_len = 0;
 	}
 
-	stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
+	if (stmmac_tso_allocator(priv, (skb->data + proto_hdr_len),
+				 tmp_pay_len, nfrags == 0, queue, false))
+		goto dma_map_err;
 
 	/* Prepare fragments */
 	for (i = 0; i < nfrags; i++) {
-		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 
-		des = skb_frag_dma_map(priv->device, frag, 0,
-				       skb_frag_size(frag),
-				       DMA_TO_DEVICE);
-		if (dma_mapping_error(priv->device, des))
+		if (stmmac_tso_allocator(priv, frag, skb_frag_size(frag),
+					 (i == nfrags - 1), queue, true))
 			goto dma_map_err;
-
-		stmmac_tso_allocator(priv, des, skb_frag_size(frag),
-				     (i == nfrags - 1), queue);
-
-		tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
-		tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_frag_size(frag);
-		tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = true;
-		tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
 	}
 
 	tx_q->tx_skbuff_dma[tx_q->cur_tx].last_segment = true;
Currently for TSO page is mapped with dma_map_single()
and then resulting dma address is referenced (and offset)
by multiple descriptors until the whole region is
programmed into the descriptors.
This makes it possible for stmmac_tx_clean() to dma_unmap()
the first of the already processed descriptors, while the
rest are still being processed by the DMA engine. This leads
to an iommu fault due to the DMA engine using unmapped memory
as seen below:

arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402,
iova=0xfc401000, fsynr=0x60003, cbfrsynra=0x121, cb=38

Descriptor content:
       TDES0       TDES1  TDES2  TDES3
317:   0xfc400800  0x0    0x36   0xa02c0b68
318:   0xfc400836  0x0    0xb68  0x90000000

As we can see above descriptor 317 holding a page address
and 318 holding the buffer address by adding offset to page
addess. Now if 317 descritor is cleaned as part of tx_clean()
then we will get SMMU fault if 318 descriptor is getting accessed.

To fix this, let's map each descriptor's memory reference individually.
This way there's no risk of unmapping a region that's still being
referenced by the DMA engine in a later descriptor.

Signed-off-by: Suraj Jaiswal <quic_jsuraj@quicinc.com>
---

Changes since v2:
- Update commit text with more details.
- fixed Reverse xmas tree order issue.

Changes since v1:
- Fixed function description
- Fixed handling of return value.

 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 63 ++++++++++++-------
 1 file changed, 42 insertions(+), 21 deletions(-)
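For reference, a condensed sketch (not the literal driver code) of the
pre-patch scheme described above: a single dma_map_single() covers the
region, several descriptors reference offsets into that one mapping, and an
early unmap invalidates the IOVA still used by a later descriptor. The
descriptor numbers and the unmap call site are an interpretation of the
commit message, not quoted code:

	/* Pre-patch scheme, condensed: one mapping, many descriptors. */
	dma_addr_t des = dma_map_single(priv->device,
					skb->data + proto_hdr_len,
					tmp_pay_len, DMA_TO_DEVICE);

	/* Descriptor N   (e.g. 317): programmed with des            */
	/* Descriptor N+1 (e.g. 318): programmed with des + offset   */

	/*
	 * If stmmac_tx_clean() unmaps the mapping once descriptor N
	 * completes:
	 *
	 *	dma_unmap_single(priv->device, des, len, DMA_TO_DEVICE);
	 *
	 * descriptor N+1 still points into the now-unmapped region, and
	 * the next DMA access faults in the SMMU (the "Unhandled context
	 * fault" shown above).
	 */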