
[net] ionic: Fix allocation of q/cq info structures from device local node

Message ID 20230407233645.35561-1-brett.creeley@amd.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [net] ionic: Fix allocation of q/cq info structures from device local node

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 18 this patch: 18
netdev/cc_maintainers warning 2 maintainers not CCed: edumazet@google.com pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 18 this patch: 18
netdev/checkpatch warning WARNING: Possible unnecessary 'out of memory' message
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Brett Creeley April 7, 2023, 11:36 p.m. UTC
Commit 116dce0ff047 ("ionic: Use vzalloc for large per-queue related
buffers") made a change to relieve memory pressure by making use of
vzalloc() due to the structures not requiring DMA mapping. However,
it overlooked that these structures are used in the fast path of the
driver and allocations on the non-local node could cause performance
degradation. Fix this by first attempting to use vzalloc_node()
using the device's local node and if that fails try again with
vzalloc().

Fixes: 116dce0ff047 ("ionic: Use vzalloc for large per-queue related buffers")
Signed-off-by: Neel Patel <neel.patel@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
---
 .../net/ethernet/pensando/ionic/ionic_lif.c   | 24 ++++++++++++-------
 1 file changed, 16 insertions(+), 8 deletions(-)

Comments

Leon Romanovsky April 9, 2023, 10:52 a.m. UTC | #1
On Fri, Apr 07, 2023 at 04:36:45PM -0700, Brett Creeley wrote:
> Commit 116dce0ff047 ("ionic: Use vzalloc for large per-queue related
> buffers") made a change to relieve memory pressure by making use of
> vzalloc() due to the structures not requiring DMA mapping. However,
> it overlooked that these structures are used in the fast path of the
> driver and allocations on the non-local node could cause performance
> degradation. Fix this by first attempting to use vzalloc_node()
> using the device's local node and if that fails try again with
> vzalloc().
> 
> Fixes: 116dce0ff047 ("ionic: Use vzalloc for large per-queue related buffers")
> Signed-off-by: Neel Patel <neel.patel@amd.com>
> Signed-off-by: Brett Creeley <brett.creeley@amd.com>
> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
> ---
>  .../net/ethernet/pensando/ionic/ionic_lif.c   | 24 ++++++++++++-------
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> index 957027e546b3..2c4e226b8cf1 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> @@ -560,11 +560,15 @@ static int ionic_qcq_alloc(struct ionic_lif *lif, unsigned int type,
>  	new->q.dev = dev;
>  	new->flags = flags;
>  
> -	new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
> +	new->q.info = vzalloc_node(num_descs * sizeof(*new->q.info),
> +				   dev_to_node(dev));
>  	if (!new->q.info) {
> -		netdev_err(lif->netdev, "Cannot allocate queue info\n");
> -		err = -ENOMEM;
> -		goto err_out_free_qcq;
> +		new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
> +		if (!new->q.info) {
> +			netdev_err(lif->netdev, "Cannot allocate queue info\n");

The kernel memory allocator will try the local node first and, if its
memory is depleted, go to remote nodes. So basically, you have
open-coded that behaviour, but with an OOM splat when the first call to
vzalloc_node() fails and with a custom error message about the
allocation failure.

Thanks
Brett Creeley April 10, 2023, 6:16 p.m. UTC | #2
On 4/9/2023 3:52 AM, Leon Romanovsky wrote:
> On Fri, Apr 07, 2023 at 04:36:45PM -0700, Brett Creeley wrote:
>> Commit 116dce0ff047 ("ionic: Use vzalloc for large per-queue related
>> buffers") made a change to relieve memory pressure by making use of
>> vzalloc() due to the structures not requiring DMA mapping. However,
>> it overlooked that these structures are used in the fast path of the
>> driver and allocations on the non-local node could cause performance
>> degradation. Fix this by first attempting to use vzalloc_node()
>> using the device's local node and if that fails try again with
>> vzalloc().
>>
>> Fixes: 116dce0ff047 ("ionic: Use vzalloc for large per-queue related buffers")
>> Signed-off-by: Neel Patel <neel.patel@amd.com>
>> Signed-off-by: Brett Creeley <brett.creeley@amd.com>
>> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
>> ---
>>   .../net/ethernet/pensando/ionic/ionic_lif.c   | 24 ++++++++++++-------
>>   1 file changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
>> index 957027e546b3..2c4e226b8cf1 100644
>> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
>> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
>> @@ -560,11 +560,15 @@ static int ionic_qcq_alloc(struct ionic_lif *lif, unsigned int type,
>>        new->q.dev = dev;
>>        new->flags = flags;
>>
>> -     new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
>> +     new->q.info = vzalloc_node(num_descs * sizeof(*new->q.info),
>> +                                dev_to_node(dev));
>>        if (!new->q.info) {
>> -             netdev_err(lif->netdev, "Cannot allocate queue info\n");
>> -             err = -ENOMEM;
>> -             goto err_out_free_qcq;
>> +             new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
>> +             if (!new->q.info) {
>> +                     netdev_err(lif->netdev, "Cannot allocate queue info\n");
> 
> Kernel memory allocator will try local node first and if memory is
> depleted it will go to remote nodes. So basically, you open-coded that
> behaviour but with OOM splash when first call to vzalloc_node fails and
> with custom error message about memory allocation failure.
> 
> Thanks

Leon,

We want to allocate memory from the node local to our PCI device, which
is not necessarily the node that the thread is running on, where
vzalloc() first tries to allocate. Since it wasn't clear to us that
vzalloc_node() does any fallback, we followed the example in the ena
driver and follow up with a more generic vzalloc() request.

Also, the custom message helps us quickly figure out exactly which 
allocation failed.

Thanks,

Brett
Leon Romanovsky April 11, 2023, 12:47 p.m. UTC | #3
On Mon, Apr 10, 2023 at 11:16:03AM -0700, Brett Creeley wrote:
> On 4/9/2023 3:52 AM, Leon Romanovsky wrote:
> > On Fri, Apr 07, 2023 at 04:36:45PM -0700, Brett Creeley wrote:
> > > Commit 116dce0ff047 ("ionic: Use vzalloc for large per-queue related
> > > buffers") made a change to relieve memory pressure by making use of
> > > vzalloc() due to the structures not requiring DMA mapping. However,
> > > it overlooked that these structures are used in the fast path of the
> > > driver and allocations on the non-local node could cause performance
> > > degradation. Fix this by first attempting to use vzalloc_node()
> > > using the device's local node and if that fails try again with
> > > vzalloc().
> > > 
> > > Fixes: 116dce0ff047 ("ionic: Use vzalloc for large per-queue related buffers")
> > > Signed-off-by: Neel Patel <neel.patel@amd.com>
> > > Signed-off-by: Brett Creeley <brett.creeley@amd.com>
> > > Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
> > > ---
> > >   .../net/ethernet/pensando/ionic/ionic_lif.c   | 24 ++++++++++++-------
> > >   1 file changed, 16 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> > > index 957027e546b3..2c4e226b8cf1 100644
> > > --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> > > +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> > > @@ -560,11 +560,15 @@ static int ionic_qcq_alloc(struct ionic_lif *lif, unsigned int type,
> > >        new->q.dev = dev;
> > >        new->flags = flags;
> > > 
> > > -     new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
> > > +     new->q.info = vzalloc_node(num_descs * sizeof(*new->q.info),
> > > +                                dev_to_node(dev));
> > >        if (!new->q.info) {
> > > -             netdev_err(lif->netdev, "Cannot allocate queue info\n");
> > > -             err = -ENOMEM;
> > > -             goto err_out_free_qcq;
> > > +             new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
> > > +             if (!new->q.info) {
> > > +                     netdev_err(lif->netdev, "Cannot allocate queue info\n");
> > 
> > Kernel memory allocator will try local node first and if memory is
> > depleted it will go to remote nodes. So basically, you open-coded that
> > behaviour but with OOM splash when first call to vzalloc_node fails and
> > with custom error message about memory allocation failure.
> > 
> > Thanks
> 
> Leon,
> 
> We want to allocate memory from the node local to our PCI device, which is
> not necessarily the same as the node that the thread is running on where
> vzalloc() first tries to alloc.

I'm not sure about that, as you are running a kernel thread which is
triggered directly by the device and will most likely run on the same
node as the PCI device.

> Since it wasn't clear to us that vzalloc_node() does any fallback, 

vzalloc_node() doesn't do fallback, but vzalloc() will find the right
node for you.

> we followed the example in the ena driver to follow up with a more
> generic vzalloc() request.

I don't know about ENA implementation, maybe they have right reasons to
do it, but maybe they don't.

> 
> Also, the custom message helps us quickly figure out exactly which
> allocation failed.

If OOM is missing some info to help debug allocation failures, let's add
it there, but please do not add any custom prints after alloc failures.

Thanks

> 
> Thanks,
> 
> Brett
Jakub Kicinski April 11, 2023, 7:49 p.m. UTC | #4
On Tue, 11 Apr 2023 15:47:04 +0300 Leon Romanovsky wrote:
> > We want to allocate memory from the node local to our PCI device, which is
> > not necessarily the same as the node that the thread is running on where
> > vzalloc() first tries to alloc.  
> 
> I'm not sure about it as you are running kernel thread which is
> triggered directly by device and most likely will run on same node as
> PCI device.

Isn't that true only for bus-side probing?
If you bind/unbind via sysfs does it still try to move to the right
node? Same for resources allocated during ifup?

> > Since it wasn't clear to us that vzalloc_node() does any fallback,   
> 
> vzalloc_node() doesn't do fallback, but vzalloc will find the right node
> for you.

Sounds like we may want a vzalloc_node_with_fallback or some GFP flag?
All the _node() helpers which don't fall back lead to unpleasant code
in the users.

> > we followed the example in the ena driver to follow up with a more
> > generic vzalloc() request.  
> 
> I don't know about ENA implementation, maybe they have right reasons to
> do it, but maybe they don't.
> 
> > 
> > Also, the custom message helps us quickly figure out exactly which
> > allocation failed.  
> 
> If OOM is missing some info to help debug allocation failures, let's add
> it there, but please do not add any custom prints after alloc failures.

+1
Leon Romanovsky April 12, 2023, 4:58 p.m. UTC | #5
On Tue, Apr 11, 2023 at 12:49:45PM -0700, Jakub Kicinski wrote:
> On Tue, 11 Apr 2023 15:47:04 +0300 Leon Romanovsky wrote:
> > > We want to allocate memory from the node local to our PCI device, which is
> > > not necessarily the same as the node that the thread is running on where
> > > vzalloc() first tries to alloc.  
> > 
> > I'm not sure about it as you are running kernel thread which is
> > triggered directly by device and most likely will run on same node as
> > PCI device.
> 
> Isn't that true only for bus-side probing?
> If you bind/unbind via sysfs does it still try to move to the right
> node? Same for resources allocated during ifup?

Kernel threads are a more interesting case, as they are not controlled
through mempolicy (maybe that is no longer true in 2023, I'm not sure).

User-triggered threads are subject to mempolicy, and all allocations
are expected to follow it. So users who want specific memory behaviour
should use it.

https://docs.kernel.org/6.1/admin-guide/mm/numa_memory_policy.html

There is a huge chance that the fallback mechanisms proposed here in
ionic and implemented in ENA "break" this interface.

> 
> > > Since it wasn't clear to us that vzalloc_node() does any fallback,   
> > 
> > vzalloc_node() doesn't do fallback, but vzalloc will find the right node
> > for you.
> 
> Sounds like we may want a vzalloc_node_with_fallback or some GFP flag?
> All the _node() helpers which don't fall back lead to unpleasant code
> in the users.

I would challenge the whole idea of having *_node() allocations in
driver code in the first place. Even in RDMA, where we are super
focused on performance and allocating memory in the right place is
critical, we rely on the general kzalloc().

There is one exception in the RDMA world (hfi1), but that is more
because of a legacy implementation than a specific need; at least, the
Intel folks didn't succeed in convincing me with real data.

> 
> > > we followed the example in the ena driver to follow up with a more
> > > generic vzalloc() request.  
> > 
> > I don't know about ENA implementation, maybe they have right reasons to
> > do it, but maybe they don't.
> > 
> > > 
> > > Also, the custom message helps us quickly figure out exactly which
> > > allocation failed.  
> > 
> > If OOM is missing some info to help debug allocation failures, let's add
> > it there, but please do not add any custom prints after alloc failures.
> 
> +1
Jakub Kicinski April 12, 2023, 7:44 p.m. UTC | #6
On Wed, 12 Apr 2023 19:58:16 +0300 Leon Romanovsky wrote:
> > > I'm not sure about it as you are running kernel thread which is
> > > triggered directly by device and most likely will run on same node as
> > > PCI device.  
> > 
> > Isn't that true only for bus-side probing?
> > If you bind/unbind via sysfs does it still try to move to the right
> > node? Same for resources allocated during ifup?  
> 
> Kernel threads are more interesting case, as they are not controlled
> through mempolicy (maybe it is not true in 2023, I'm not sure).
> 
> User triggered threads are subjected to mempolicy and all allocations
> are expected to follow it. So users, who wants specific memory behaviour
> should use it.
> 
> https://docs.kernel.org/6.1/admin-guide/mm/numa_memory_policy.html
> 
> There is a huge chance that fallback mechanisms proposed here in ionic
> and implemented in ENA are "break" this interface.

Ack, that's what I would have answered while working for a vendor
myself, 5 years ago. Now, after seeing how NICs get configured in
practice, and all the random tools which may decide to tweak some
random param and forget to pin themselves - I'm not as sure.

Having a policy configured per netdev and maybe netdev helpers for
memory allocation could be an option. We already link netdev to 
the struct device.

> > > vzalloc_node() doesn't do fallback, but vzalloc will find the right node
> > > for you.  
> > 
> > Sounds like we may want a vzalloc_node_with_fallback or some GFP flag?
> > All the _node() helpers which don't fall back lead to unpleasant code
> > in the users.  
> 
> I would challenge the whole idea of having *_node() allocations in
> driver code at the first place. Even in RDMA, where we super focused
> on performance and allocation of memory in right place is super
> critical, we rely on general kzalloc().
> 
> There is one exception in RDMA world (hfi1), but it is more because of
> legacy implementation and not because of specific need, at least Intel
> folks didn't success to convince me with real data.

Yes, but RDMA is much heavier on the application side, and much more
tightly integrated in general.
Leon Romanovsky April 13, 2023, 6:43 a.m. UTC | #7
On Wed, Apr 12, 2023 at 12:44:09PM -0700, Jakub Kicinski wrote:
> On Wed, 12 Apr 2023 19:58:16 +0300 Leon Romanovsky wrote:
> > > > I'm not sure about it as you are running kernel thread which is
> > > > triggered directly by device and most likely will run on same node as
> > > > PCI device.  
> > > 
> > > Isn't that true only for bus-side probing?
> > > If you bind/unbind via sysfs does it still try to move to the right
> > > node? Same for resources allocated during ifup?  
> > 
> > Kernel threads are more interesting case, as they are not controlled
> > through mempolicy (maybe it is not true in 2023, I'm not sure).
> > 
> > User triggered threads are subjected to mempolicy and all allocations
> > are expected to follow it. So users, who wants specific memory behaviour
> > should use it.
> > 
> > https://docs.kernel.org/6.1/admin-guide/mm/numa_memory_policy.html
> > 
> > There is a huge chance that fallback mechanisms proposed here in ionic
> > and implemented in ENA are "break" this interface.
> 
> Ack, that's what I would have answered while working for a vendor
> myself, 5 years ago. Now, after seeing how NICs get configured in
> practice, and all the random tools which may decide to tweak some
> random param and forget to pin themselves - I'm not as sure.

I would like to distinguish between tweaks to driver internals and
general kernel core functionality. Everything that falls under the
latter category should be avoided in drivers and, to some extent, in
subsystems too.

NUMA, IRQs, etc. are such general features.

> 
> Having a policy configured per netdev and maybe netdev helpers for
> memory allocation could be an option. We already link netdev to 
> the struct device.

I don't think that it is really needed; I personally have never seen
real data supporting the claim that the system default policy doesn't
work for NICs. I have seen a lot of synthetic testing results where
allocations were forced onto a far node, but even in those cases the
performance difference wasn't huge.

From reading the NUMA Locality docs, I can imagine that NICs already
get the right NUMA node from the beginning.
https://docs.kernel.org/6.1/admin-guide/mm/numaperf.html

> 
> > > > vzalloc_node() doesn't do fallback, but vzalloc will find the right node
> > > > for you.  
> > > 
> > > Sounds like we may want a vzalloc_node_with_fallback or some GFP flag?
> > > All the _node() helpers which don't fall back lead to unpleasant code
> > > in the users.  
> > 
> > I would challenge the whole idea of having *_node() allocations in
> > driver code at the first place. Even in RDMA, where we super focused
> > on performance and allocation of memory in right place is super
> > critical, we rely on general kzalloc().
> > 
> > There is one exception in RDMA world (hfi1), but it is more because of
> > legacy implementation and not because of specific need, at least Intel
> > folks didn't success to convince me with real data.
> 
> Yes, but RDMA is much more heavy on the application side, much more
> tightly integrated in general.

Yes and no; we have a vast number of in-kernel RDMA users (NVMe, RDS,
NFS, etc.) who care about performance.

Thanks

Patch

diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
index 957027e546b3..2c4e226b8cf1 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
@@ -560,11 +560,15 @@  static int ionic_qcq_alloc(struct ionic_lif *lif, unsigned int type,
 	new->q.dev = dev;
 	new->flags = flags;
 
-	new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
+	new->q.info = vzalloc_node(num_descs * sizeof(*new->q.info),
+				   dev_to_node(dev));
 	if (!new->q.info) {
-		netdev_err(lif->netdev, "Cannot allocate queue info\n");
-		err = -ENOMEM;
-		goto err_out_free_qcq;
+		new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
+		if (!new->q.info) {
+			netdev_err(lif->netdev, "Cannot allocate queue info\n");
+			err = -ENOMEM;
+			goto err_out_free_qcq;
+		}
 	}
 
 	new->q.type = type;
@@ -581,11 +585,15 @@  static int ionic_qcq_alloc(struct ionic_lif *lif, unsigned int type,
 	if (err)
 		goto err_out;
 
-	new->cq.info = vzalloc(num_descs * sizeof(*new->cq.info));
+	new->cq.info = vzalloc_node(num_descs * sizeof(*new->cq.info),
+				    dev_to_node(dev));
 	if (!new->cq.info) {
-		netdev_err(lif->netdev, "Cannot allocate completion queue info\n");
-		err = -ENOMEM;
-		goto err_out_free_irq;
+		new->cq.info = vzalloc(num_descs * sizeof(*new->cq.info));
+		if (!new->cq.info) {
+			netdev_err(lif->netdev, "Cannot allocate completion queue info\n");
+			err = -ENOMEM;
+			goto err_out_free_irq;
+		}
 	}
 
 	err = ionic_cq_init(lif, &new->cq, &new->intr, num_descs, cq_desc_size);