diff mbox series

[v4,net] ps3/gelic: Fix SKB allocation

Message ID 4a6ab7b8-0dcc-43b8-a647-9be2a767b06d@infradead.org (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [v4,net] ps3/gelic: Fix SKB allocation | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 6 maintainers not CCed: aneesh.kumar@kernel.org npiggin@gmail.com mpe@ellerman.id.au edumazet@google.com naveen.n.rao@linux.ibm.com linuxppc-dev@lists.ozlabs.org
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 67 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-02-13--00-00 (tests: 1436)

Commit Message

Geoff Levand Feb. 10, 2024, 8:15 a.m. UTC
Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of
6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a
kernel panic when the SKB variable (struct gelic_descr.skb) was accessed.

This fix changes the way the napi buffer and corresponding SKB are
allocated and managed.

Reported-by: sambat goson <sombat3960@gmail.com>
Fixes: 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures")
Signed-off-by: Geoff Levand <geoff@infradead.org>

Comments

Paolo Abeni Feb. 13, 2024, 12:07 p.m. UTC | #1
On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote:
> Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of
> 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a
> kernel panic when the SKB variable (struct gelic_descr.skb) was accessed.
> 
> This fix changes the way the napi buffer and corresponding SKB are
> allocated and managed.

I think this is not what Jakub asked on v3.

Isn't something alike the following enough to fix the NULL ptr deref?

Thanks,

Paolo
---
diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
index d5b75af163d3..51ee6075653f 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -395,7 +395,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card,
        descr->hw_regs.data_error = 0;
        descr->hw_regs.payload.dev_addr = 0;
        descr->hw_regs.payload.size = 0;
-       descr->skb = NULL;
 
        offset = ((unsigned long)descr->skb->data) &
                (GELIC_NET_RXBUF_ALIGN - 1);
Christophe Leroy Feb. 13, 2024, 12:56 p.m. UTC | #2
Le 13/02/2024 à 13:07, Paolo Abeni a écrit :
> On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote:
>> Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of
>> 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a
>> kernel panic when the SKB variable (struct gelic_descr.skb) was accessed.
>>
>> This fix changes the way the napi buffer and corresponding SKB are
>> allocated and managed.
> 
> I think this is not what Jakub asked on v3.
> 
> Isn't something alike the following enough to fix the NULL ptr deref?

If you think it is enough, please explain in more details.

 From my point of view, when looking at commit 3ce4f9c3fbb3 
("net/ps3_gelic_net: Add gelic_descr structures") that introduced the 
problem, it is not obvious.

Christophe

> 
> Thanks,
> 
> Paolo
> ---
> diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> index d5b75af163d3..51ee6075653f 100644
> --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> @@ -395,7 +395,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card,
>          descr->hw_regs.data_error = 0;
>          descr->hw_regs.payload.dev_addr = 0;
>          descr->hw_regs.payload.size = 0;
> -       descr->skb = NULL;
>   
>          offset = ((unsigned long)descr->skb->data) &
>                  (GELIC_NET_RXBUF_ALIGN - 1);
>
Paolo Abeni Feb. 13, 2024, 1:09 p.m. UTC | #3
On Tue, 2024-02-13 at 12:56 +0000, Christophe Leroy wrote:
> 
> Le 13/02/2024 à 13:07, Paolo Abeni a écrit :
> > On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote:
> > > Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of
> > > 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a
> > > kernel panic when the SKB variable (struct gelic_descr.skb) was accessed.
> > > 
> > > This fix changes the way the napi buffer and corresponding SKB are
> > > allocated and managed.
> > 
> > I think this is not what Jakub asked on v3.
> > 
> > Isn't something alike the following enough to fix the NULL ptr deref?
> 
> If you think it is enough, please explain in more details.

I'm unsure it will be enough, but at least the quoted line causes a
NULL ptr dereference:

	descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size);
	if (!descr->skb) {
		descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */
		return -ENOMEM;
	}
	// here descr->skb is not NULL...

	descr->hw_regs.dmac_cmd_status = 0;
	descr->hw_regs.result_size = 0;
	descr->hw_regs.valid_size = 0;
	descr->hw_regs.data_error = 0;
	descr->hw_regs.payload.dev_addr = 0;
	descr->hw_regs.payload.size = 0;
	descr->skb = NULL;
	// ... and now it's NULL for no apparent good reason

	offset = ((unsigned long)descr->skb->data) &
        //                            ^^^^^^^ NULL ptr deref

		(GELIC_NET_RXBUF_ALIGN - 1);
	if (offset)
		skb_reserve(descr->skb, GELIC_NET_RXBUF_ALIGN - offset);
	/* io-mmu-map the skb */
	cpu_addr = dma_map_single(ctodev(card), descr->skb->data,
				  GELIC_NET_MAX_FRAME, DMA_FROM_DEVICE);


The buggy line in introduced by the blamed commit.

>  From my point of view, when looking at commit 3ce4f9c3fbb3 
> ("net/ps3_gelic_net: Add gelic_descr structures") that introduced the 
> problem, it is not obvious.

That change is quite complex. It could includes other issues - quickly
skimming over it I could not see them.

Reporting the decoded stack trace generated by the NULL ptr deref could
help.

Cheers,

Paolo
Geoff Levand Feb. 16, 2024, 9:37 a.m. UTC | #4
On 2/13/24 21:07, Paolo Abeni wrote:
> On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote:
>> Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of
>> 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a
>> kernel panic when the SKB variable (struct gelic_descr.skb) was accessed.
>>
>> This fix changes the way the napi buffer and corresponding SKB are
>> allocated and managed.
> 
> I think this is not what Jakub asked on v3.
> 
> Isn't something alike the following enough to fix the NULL ptr deref?
> 
> Thanks,
> 
> Paolo
> ---
> diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> index d5b75af163d3..51ee6075653f 100644
> --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> @@ -395,7 +395,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card,
>         descr->hw_regs.data_error = 0;
>         descr->hw_regs.payload.dev_addr = 0;
>         descr->hw_regs.payload.size = 0;
> -       descr->skb = NULL;

The reason we set the SKB pointer to NULL here is so we can
detect if an SKB has been allocated or not.  If the SKB pointer
is not NULL, then we delete it.

If we just let the SKB pointer be some random value then later
we will try to delete some random address.

-Geoff
Paolo Abeni Feb. 16, 2024, 10:06 a.m. UTC | #5
On Fri, 2024-02-16 at 18:37 +0900, Geoff Levand wrote:
> On 2/13/24 21:07, Paolo Abeni wrote:
> > On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote:
> > > Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of
> > > 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a
> > > kernel panic when the SKB variable (struct gelic_descr.skb) was accessed.
> > > 
> > > This fix changes the way the napi buffer and corresponding SKB are
> > > allocated and managed.
> > 
> > I think this is not what Jakub asked on v3.
> > 
> > Isn't something alike the following enough to fix the NULL ptr deref?
> > 
> > Thanks,
> > 
> > Paolo
> > ---
> > diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> > index d5b75af163d3..51ee6075653f 100644
> > --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> > +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
> > @@ -395,7 +395,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card,
> >         descr->hw_regs.data_error = 0;
> >         descr->hw_regs.payload.dev_addr = 0;
> >         descr->hw_regs.payload.size = 0;
> > -       descr->skb = NULL;
> 
> The reason we set the SKB pointer to NULL here is so we can
> detect if an SKB has been allocated or not.  If the SKB pointer
> is not NULL, then we delete it.
> 
> If we just let the SKB pointer be some random value then later
> we will try to delete some random address.

Note that this specific 'skb = NULL' assignment happens just after a
successful allocation and just before unconditional dereference of such
ptr:

        descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size);
        if (!descr->skb) {
                descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */
                return -ENOMEM;
        }

        descr->hw_regs.dmac_cmd_status = 0;
        descr->hw_regs.result_size = 0;
        descr->hw_regs.valid_size = 0;
        descr->hw_regs.data_error = 0;
        descr->hw_regs.payload.dev_addr = 0;
        descr->hw_regs.payload.size = 0;
	// XXX here skb is not NULL and valid 
        descr->skb = NULL;

	offset = ((unsigned long)descr->skb->data) &
	// XXX here              ^^^^^^^^^^ 
	// you unconditionally get null ptr deref
                (GELIC_NET_RXBUF_ALIGN - 1);

unless ENOCOFFEE here which is totally possible.

Cheers,

Paolo
Geoff Levand Feb. 16, 2024, 10:37 a.m. UTC | #6
On 2/13/24 22:09, Paolo Abeni wrote:
> On Tue, 2024-02-13 at 12:56 +0000, Christophe Leroy wrote:
>>
>> Le 13/02/2024 à 13:07, Paolo Abeni a écrit :
>>> On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote:
>>>> Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of
>>>> 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a
>>>> kernel panic when the SKB variable (struct gelic_descr.skb) was accessed.
>>>>
>>>> This fix changes the way the napi buffer and corresponding SKB are
>>>> allocated and managed.
>>>
>>> I think this is not what Jakub asked on v3.
>>>
>>> Isn't something alike the following enough to fix the NULL ptr deref?
>>
>> If you think it is enough, please explain in more details.
> 
> I'm unsure it will be enough, but at least the quoted line causes a
> NULL ptr dereference:
> 
> 	descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size);
> 	if (!descr->skb) {
> 		descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */
> 		return -ENOMEM;
> 	}
> 	// here descr->skb is not NULL...
> 
> 	descr->hw_regs.dmac_cmd_status = 0;
> 	descr->hw_regs.result_size = 0;
> 	descr->hw_regs.valid_size = 0;
> 	descr->hw_regs.data_error = 0;
> 	descr->hw_regs.payload.dev_addr = 0;
> 	descr->hw_regs.payload.size = 0;
> 	descr->skb = NULL;
> 	// ... and now it's NULL for no apparent good reason

As I mentioned in an earlier mail, the SKB pointer is set to NULL
so we can detect if an SKB has been allocated.
 
> 
> 	offset = ((unsigned long)descr->skb->data) &
>         //                            ^^^^^^^ NULL ptr deref
> 
> 		(GELIC_NET_RXBUF_ALIGN - 1);
> 	if (offset)
> 		skb_reserve(descr->skb, GELIC_NET_RXBUF_ALIGN - offset);
> 	/* io-mmu-map the skb */
> 	cpu_addr = dma_map_single(ctodev(card), descr->skb->data,
> 				  GELIC_NET_MAX_FRAME, DMA_FROM_DEVICE);
> 
> 
> The buggy line in introduced by the blamed commit.
> 
>>  From my point of view, when looking at commit 3ce4f9c3fbb3 
>> ("net/ps3_gelic_net: Add gelic_descr structures") that introduced the 
>> problem, it is not obvious.
> 
> That change is quite complex. It could includes other issues - quickly
> skimming over it I could not see them.

The proposed change was tested by both Sambat and I.  It fixes the
problem, and no ill effects were seen.

-Geoff
Geoff Levand Feb. 19, 2024, 8:20 a.m. UTC | #7
On 2/16/24 19:06, Paolo Abeni wrote:
> On Fri, 2024-02-16 at 18:37 +0900, Geoff Levand wrote:
>> On 2/13/24 21:07, Paolo Abeni wrote:
>>> On Sat, 2024-02-10 at 17:15 +0900, Geoff Levand wrote:
>>>> Commit 3ce4f9c3fbb3 ("net/ps3_gelic_net: Add gelic_descr structures") of
>>>> 6.8-rc1 did not allocate a network SKB for the gelic_descr, resulting in a
>>>> kernel panic when the SKB variable (struct gelic_descr.skb) was accessed.
>>>>
>>>> This fix changes the way the napi buffer and corresponding SKB are
>>>> allocated and managed.
>>>
>>> I think this is not what Jakub asked on v3.
>>>
>>> Isn't something alike the following enough to fix the NULL ptr deref?
>>>
>>> Thanks,
>>>
>>> Paolo
>>> ---
>>> diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
>>> index d5b75af163d3..51ee6075653f 100644
>>> --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
>>> +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
>>> @@ -395,7 +395,6 @@ static int gelic_descr_prepare_rx(struct gelic_card *card,
>>>         descr->hw_regs.data_error = 0;
>>>         descr->hw_regs.payload.dev_addr = 0;
>>>         descr->hw_regs.payload.size = 0;
>>> -       descr->skb = NULL;
>>
>> The reason we set the SKB pointer to NULL here is so we can
>> detect if an SKB has been allocated or not.  If the SKB pointer
>> is not NULL, then we delete it.
>>
>> If we just let the SKB pointer be some random value then later
>> we will try to delete some random address.
> 
> Note that this specific 'skb = NULL' assignment happens just after a
> successful allocation and just before unconditional dereference of such
> ptr:
> 
>         descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size);
>         if (!descr->skb) {
>                 descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */
>                 return -ENOMEM;
>         }
> 
>         descr->hw_regs.dmac_cmd_status = 0;
>         descr->hw_regs.result_size = 0;
>         descr->hw_regs.valid_size = 0;
>         descr->hw_regs.data_error = 0;
>         descr->hw_regs.payload.dev_addr = 0;
>         descr->hw_regs.payload.size = 0;
> 	// XXX here skb is not NULL and valid 
>         descr->skb = NULL;

I see your point now.  I'll send out a fix-up patch that just
moves the initialization of the descr to before the allocation
of the SKB.  Thanks.

-Geoff
diff mbox series

Patch

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
index d5b75af163d3..a09b534efa32 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -375,20 +375,14 @@  static int gelic_card_init_chain(struct gelic_card *card,
 static int gelic_descr_prepare_rx(struct gelic_card *card,
 				  struct gelic_descr *descr)
 {
-	static const unsigned int rx_skb_size =
-		ALIGN(GELIC_NET_MAX_FRAME, GELIC_NET_RXBUF_ALIGN) +
-		GELIC_NET_RXBUF_ALIGN - 1;
+	static const unsigned int napi_buff_size =
+		round_up(GELIC_NET_MAX_FRAME, GELIC_NET_RXBUF_ALIGN);
 	dma_addr_t cpu_addr;
-	int offset;
+	void *napi_buff;
 
 	if (gelic_descr_get_status(descr) !=  GELIC_DESCR_DMA_NOT_IN_USE)
 		dev_info(ctodev(card), "%s: ERROR status\n", __func__);
 
-	descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size);
-	if (!descr->skb) {
-		descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */
-		return -ENOMEM;
-	}
 	descr->hw_regs.dmac_cmd_status = 0;
 	descr->hw_regs.result_size = 0;
 	descr->hw_regs.valid_size = 0;
@@ -397,24 +391,32 @@  static int gelic_descr_prepare_rx(struct gelic_card *card,
 	descr->hw_regs.payload.size = 0;
 	descr->skb = NULL;
 
-	offset = ((unsigned long)descr->skb->data) &
-		(GELIC_NET_RXBUF_ALIGN - 1);
-	if (offset)
-		skb_reserve(descr->skb, GELIC_NET_RXBUF_ALIGN - offset);
-	/* io-mmu-map the skb */
-	cpu_addr = dma_map_single(ctodev(card), descr->skb->data,
-				  GELIC_NET_MAX_FRAME, DMA_FROM_DEVICE);
-	descr->hw_regs.payload.dev_addr = cpu_to_be32(cpu_addr);
+	napi_buff = napi_alloc_frag_align(napi_buff_size,
+					  GELIC_NET_RXBUF_ALIGN);
+
+	if (unlikely(!napi_buff))
+		return -ENOMEM;
+
+	descr->skb = napi_build_skb(napi_buff, napi_buff_size);
+
+	if (unlikely(!descr->skb)) {
+		skb_free_frag(napi_buff);
+		return -ENOMEM;
+	}
+
+	cpu_addr = dma_map_single(ctodev(card), napi_buff, napi_buff_size,
+				  DMA_FROM_DEVICE);
+
 	if (dma_mapping_error(ctodev(card), cpu_addr)) {
-		dev_kfree_skb_any(descr->skb);
+		skb_free_frag(napi_buff);
 		descr->skb = NULL;
-		dev_info(ctodev(card),
-			 "%s:Could not iommu-map rx buffer\n", __func__);
+		dev_err_once(ctodev(card), "%s:Could not iommu-map rx buffer\n",
+			     __func__);
 		gelic_descr_set_status(descr, GELIC_DESCR_DMA_NOT_IN_USE);
 		return -ENOMEM;
 	}
 
-	descr->hw_regs.payload.size = cpu_to_be32(GELIC_NET_MAX_FRAME);
+	descr->hw_regs.payload.size = cpu_to_be32(napi_buff_size);
 	descr->hw_regs.payload.dev_addr = cpu_to_be32(cpu_addr);
 
 	gelic_descr_set_status(descr, GELIC_DESCR_DMA_CARDOWNED);