Message ID | 20230407233645.35561-1-brett.creeley@amd.com
---|---
State | Changes Requested
Delegated to | Netdev Maintainers
Series | [net] ionic: Fix allocation of q/cq info structures from device local node
On Fri, Apr 07, 2023 at 04:36:45PM -0700, Brett Creeley wrote:
> Commit 116dce0ff047 ("ionic: Use vzalloc for large per-queue related
> buffers") made a change to relieve memory pressure by making use of
> vzalloc() due to the structures not requiring DMA mapping. However,
> it overlooked that these structures are used in the fast path of the
> driver and allocations on the non-local node could cause performance
> degradation. Fix this by first attempting to use vzalloc_node()
> using the device's local node and if that fails try again with
> vzalloc().
>
> Fixes: 116dce0ff047 ("ionic: Use vzalloc for large per-queue related buffers")
> Signed-off-by: Neel Patel <neel.patel@amd.com>
> Signed-off-by: Brett Creeley <brett.creeley@amd.com>
> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
> ---
>  .../net/ethernet/pensando/ionic/ionic_lif.c | 24 ++++++++++++-------
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> index 957027e546b3..2c4e226b8cf1 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> @@ -560,11 +560,15 @@ static int ionic_qcq_alloc(struct ionic_lif *lif, unsigned int type,
>  	new->q.dev = dev;
>  	new->flags = flags;
>
> -	new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
> +	new->q.info = vzalloc_node(num_descs * sizeof(*new->q.info),
> +				   dev_to_node(dev));
>  	if (!new->q.info) {
> -		netdev_err(lif->netdev, "Cannot allocate queue info\n");
> -		err = -ENOMEM;
> -		goto err_out_free_qcq;
> +		new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
> +		if (!new->q.info) {
> +			netdev_err(lif->netdev, "Cannot allocate queue info\n");

The kernel memory allocator will try the local node first, and if its
memory is depleted it will go to remote nodes. So basically, you have
open-coded that behaviour, but with an OOM splat when the first call to
vzalloc_node() fails, and with a custom error message about the memory
allocation failure.

Thanks
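For context, Leon's description matches how vzalloc() itself is written;
the sketch below is paraphrased from mm/vmalloc.c of roughly that era and
is not part of the patch under review. NUMA_NO_NODE asks the allocator to
prefer the node of the CPU doing the allocating and to fall back to other
nodes as needed. Note that the preferred node is the calling thread's
node, which is exactly the distinction Brett raises in his reply below:
it is not necessarily dev_to_node(dev).

/* Paraphrased from mm/vmalloc.c (~v6.x) for reference -- not new code
 * in this series. NUMA_NO_NODE = "prefer the current CPU's node, fall
 * back to others", i.e. the built-in fallback Leon refers to.
 */
void *vzalloc(unsigned long size)
{
	return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO,
			      NUMA_NO_NODE, __builtin_return_address(0));
}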
On 4/9/2023 3:52 AM, Leon Romanovsky wrote:
> On Fri, Apr 07, 2023 at 04:36:45PM -0700, Brett Creeley wrote:
>> Commit 116dce0ff047 ("ionic: Use vzalloc for large per-queue related
>> buffers") made a change to relieve memory pressure by making use of
>> vzalloc() due to the structures not requiring DMA mapping. However,
>> it overlooked that these structures are used in the fast path of the
>> driver and allocations on the non-local node could cause performance
>> degradation. Fix this by first attempting to use vzalloc_node()
>> using the device's local node and if that fails try again with
>> vzalloc().
>>
>> [...]
>>
>> -	new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
>> +	new->q.info = vzalloc_node(num_descs * sizeof(*new->q.info),
>> +				   dev_to_node(dev));
>>  	if (!new->q.info) {
>> -		netdev_err(lif->netdev, "Cannot allocate queue info\n");
>> -		err = -ENOMEM;
>> -		goto err_out_free_qcq;
>> +		new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
>> +		if (!new->q.info) {
>> +			netdev_err(lif->netdev, "Cannot allocate queue info\n");
>
> The kernel memory allocator will try the local node first, and if its
> memory is depleted it will go to remote nodes. So basically, you have
> open-coded that behaviour, but with an OOM splat when the first call
> to vzalloc_node() fails, and with a custom error message about the
> memory allocation failure.
>
> Thanks

Leon,

We want to allocate memory from the node local to our PCI device, which
is not necessarily the same as the node that the thread is running on,
where vzalloc() first tries to alloc.

Since it wasn't clear to us that vzalloc_node() does any fallback, we
followed the example in the ena driver and follow up with a more generic
vzalloc() request.

Also, the custom message helps us quickly figure out exactly which
allocation failed.

Thanks,

Brett
On Mon, Apr 10, 2023 at 11:16:03AM -0700, Brett Creeley wrote:
> On 4/9/2023 3:52 AM, Leon Romanovsky wrote:
> > On Fri, Apr 07, 2023 at 04:36:45PM -0700, Brett Creeley wrote:
> > > [...]
> >
> > The kernel memory allocator will try the local node first, and if
> > its memory is depleted it will go to remote nodes. So basically, you
> > have open-coded that behaviour, but with an OOM splat when the first
> > call to vzalloc_node() fails, and with a custom error message about
> > the memory allocation failure.
>
> Leon,
>
> We want to allocate memory from the node local to our PCI device,
> which is not necessarily the same as the node that the thread is
> running on, where vzalloc() first tries to alloc.

I'm not sure about that, as you are running a kernel thread which is
triggered directly by the device and will most likely run on the same
node as the PCI device.

> Since it wasn't clear to us that vzalloc_node() does any fallback,

vzalloc_node() doesn't do fallback, but vzalloc() will find the right
node for you.

> we followed the example in the ena driver and follow up with a more
> generic vzalloc() request.

I don't know about the ENA implementation; maybe they have the right
reasons to do it, but maybe they don't.

> Also, the custom message helps us quickly figure out exactly which
> allocation failed.

If the OOM report is missing some info to help debug allocation
failures, let's add it there, but please do not add any custom prints
after alloc failures.

Thanks

> Thanks,
>
> Brett
On Tue, 11 Apr 2023 15:47:04 +0300 Leon Romanovsky wrote:
> > We want to allocate memory from the node local to our PCI device,
> > which is not necessarily the same as the node that the thread is
> > running on, where vzalloc() first tries to alloc.
>
> I'm not sure about that, as you are running a kernel thread which is
> triggered directly by the device and will most likely run on the same
> node as the PCI device.

Isn't that true only for bus-side probing?
If you bind/unbind via sysfs, does it still try to move to the right
node? Same for resources allocated during ifup?

> > Since it wasn't clear to us that vzalloc_node() does any fallback,
>
> vzalloc_node() doesn't do fallback, but vzalloc() will find the right
> node for you.

Sounds like we may want a vzalloc_node_with_fallback() or some GFP flag?
All the _node() helpers which don't fall back lead to unpleasant code in
the users.

> > we followed the example in the ena driver and follow up with a more
> > generic vzalloc() request.
>
> I don't know about the ENA implementation; maybe they have the right
> reasons to do it, but maybe they don't.
>
> > Also, the custom message helps us quickly figure out exactly which
> > allocation failed.
>
> If the OOM report is missing some info to help debug allocation
> failures, let's add it there, but please do not add any custom prints
> after alloc failures.

+1
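A sketch of what Jakub's suggested helper might look like; no such
function exists in the tree, and it would likely have to live in core mm
code since __vmalloc_node() is not generally available to modules.
__GFP_NOWARN on the first attempt avoids the failure splat Leon objected
to, since a fallback is about to run anyway:

/* Hypothetical vzalloc_node_with_fallback() -- an illustration of the
 * idea, not an existing kernel API.
 */
static void *vzalloc_node_with_fallback(unsigned long size, int node)
{
	/* Try the requested node quietly; a warning here would be
	 * noise because we retry with the default policy below.
	 */
	void *p = __vmalloc_node(size, 1,
				 GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN,
				 node, __builtin_return_address(0));

	return p ? p : vzalloc(size);
}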
On Tue, Apr 11, 2023 at 12:49:45PM -0700, Jakub Kicinski wrote:
> On Tue, 11 Apr 2023 15:47:04 +0300 Leon Romanovsky wrote:
> > > We want to allocate memory from the node local to our PCI device,
> > > which is not necessarily the same as the node that the thread is
> > > running on, where vzalloc() first tries to alloc.
> >
> > I'm not sure about that, as you are running a kernel thread which is
> > triggered directly by the device and will most likely run on the
> > same node as the PCI device.
>
> Isn't that true only for bus-side probing?
> If you bind/unbind via sysfs, does it still try to move to the right
> node? Same for resources allocated during ifup?

Kernel threads are the more interesting case, as they are not controlled
through mempolicy (maybe that is no longer true in 2023, I'm not sure).

User-triggered threads are subject to mempolicy, and all allocations are
expected to follow it. So users who want specific memory behaviour
should use it.

https://docs.kernel.org/6.1/admin-guide/mm/numa_memory_policy.html

There is a huge chance that the fallback mechanisms proposed here in
ionic and implemented in ENA break this interface.

> > vzalloc_node() doesn't do fallback, but vzalloc() will find the
> > right node for you.
>
> Sounds like we may want a vzalloc_node_with_fallback() or some GFP
> flag? All the _node() helpers which don't fall back lead to unpleasant
> code in the users.

I would challenge the whole idea of having *_node() allocations in
driver code in the first place. Even in RDMA, where we are super focused
on performance and allocating memory in the right place is super
critical, we rely on the general kzalloc().

There is one exception in the RDMA world (hfi1), but that is more
because of a legacy implementation than a specific need; at least the
Intel folks didn't succeed in convincing me with real data.

> > we followed the example in the ena driver and follow up with a more
> > generic vzalloc() request.
>
> I don't know about the ENA implementation; maybe they have the right
> reasons to do it, but maybe they don't.
>
> > Also, the custom message helps us quickly figure out exactly which
> > allocation failed.
>
> If the OOM report is missing some info to help debug allocation
> failures, let's add it there, but please do not add any custom prints
> after alloc failures.
>
> +1
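To illustrate the interface Leon points to: a user-space configuration
tool can steer its own (and, by Leon's argument, its kernel-side)
allocations toward the NIC's node by setting a memory policy on itself
before acting. A minimal sketch using the libnuma syscall wrappers;
prefer_nic_node() and the way the node is obtained are illustrative
assumptions:

#include <numaif.h>	/* set_mempolicy(), MPOL_PREFERRED; link with -lnuma */

/* Hypothetical helper: prefer allocations from the NIC's node. The
 * caller is assumed to have read the node from something like
 * /sys/class/net/<ifname>/device/numa_node beforehand.
 */
long prefer_nic_node(int nic_node)
{
	unsigned long nodemask = 1UL << nic_node;

	/* MPOL_PREFERRED: allocate on nic_node when possible, with
	 * kernel-managed fallback to other nodes -- the same shape of
	 * fallback the driver patch open-codes.
	 */
	return set_mempolicy(MPOL_PREFERRED, &nodemask,
			     8 * sizeof(nodemask));
}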
On Wed, 12 Apr 2023 19:58:16 +0300 Leon Romanovsky wrote:
> > > I'm not sure about that, as you are running a kernel thread which
> > > is triggered directly by the device and will most likely run on
> > > the same node as the PCI device.
> >
> > Isn't that true only for bus-side probing?
> > If you bind/unbind via sysfs, does it still try to move to the right
> > node? Same for resources allocated during ifup?
>
> Kernel threads are the more interesting case, as they are not
> controlled through mempolicy (maybe that is no longer true in 2023,
> I'm not sure).
>
> User-triggered threads are subject to mempolicy, and all allocations
> are expected to follow it. So users who want specific memory behaviour
> should use it.
>
> https://docs.kernel.org/6.1/admin-guide/mm/numa_memory_policy.html
>
> There is a huge chance that the fallback mechanisms proposed here in
> ionic and implemented in ENA break this interface.

Ack, that's what I would have answered while working for a vendor
myself, 5 years ago. Now, after seeing how NICs get configured in
practice, and all the random tools which may decide to tweak some random
param and forget to pin themselves - I'm not as sure.

Having a policy configured per netdev, and maybe netdev helpers for
memory allocation, could be an option. We already link the netdev to the
struct device.

> > vzalloc_node() doesn't do fallback, but vzalloc() will find the
> > right node for you.
>
> Sounds like we may want a vzalloc_node_with_fallback() or some GFP
> flag? All the _node() helpers which don't fall back lead to unpleasant
> code in the users.

I would challenge the whole idea of having *_node() allocations in
driver code in the first place. Even in RDMA, where we are super focused
on performance and allocating memory in the right place is super
critical, we rely on the general kzalloc().

There is one exception in the RDMA world (hfi1), but that is more
because of a legacy implementation than a specific need; at least the
Intel folks didn't succeed in convincing me with real data.

Yes, but RDMA is much more heavy on the application side, much more
tightly integrated in general.
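For reference on the netdev-to-device link Jakub mentions: drivers
record the parent device with SET_NETDEV_DEV() at probe time, so a
per-netdev policy or allocation helper could derive NUMA placement in
one place rather than in each driver. A hypothetical sketch
(netdev_numa_node() is not an existing API):

#include <linux/netdevice.h>

/* Hypothetical helper -- not in the tree. Returns the NUMA node of
 * the netdev's parent (e.g. PCI) device as recorded by
 * SET_NETDEV_DEV(), or NUMA_NO_NODE when there is no parent.
 */
static inline int netdev_numa_node(struct net_device *netdev)
{
	return netdev->dev.parent ? dev_to_node(netdev->dev.parent)
				  : NUMA_NO_NODE;
}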
On Wed, Apr 12, 2023 at 12:44:09PM -0700, Jakub Kicinski wrote:
> On Wed, 12 Apr 2023 19:58:16 +0300 Leon Romanovsky wrote:
> > [...]
> >
> > There is a huge chance that the fallback mechanisms proposed here in
> > ionic and implemented in ENA break this interface.
>
> Ack, that's what I would have answered while working for a vendor
> myself, 5 years ago. Now, after seeing how NICs get configured in
> practice, and all the random tools which may decide to tweak some
> random param and forget to pin themselves - I'm not as sure.

I would like to separate tweaks to driver internals from general kernel
core functionality. Everything that falls under the latter category
should be avoided in drivers and, to some extent, in subsystems too.
NUMA, IRQs, etc. are such general features.

> Having a policy configured per netdev, and maybe netdev helpers for
> memory allocation, could be an option. We already link the netdev to
> the struct device.

I don't think that is really needed. I have personally never seen real
data supporting the claim that the system default policy doesn't work
for NICs. I have seen a lot of synthetic testing results where
allocations were forced to be taken from the far node, but even in that
case the performance difference wasn't huge.

From reading the NUMA locality docs, I can imagine that NICs already get
the right NUMA node from the beginning.

https://docs.kernel.org/6.1/admin-guide/mm/numaperf.html

> > > > vzalloc_node() doesn't do fallback, but vzalloc() will find the
> > > > right node for you.
> > >
> > > Sounds like we may want a vzalloc_node_with_fallback() or some GFP
> > > flag? All the _node() helpers which don't fall back lead to
> > > unpleasant code in the users.
> >
> > I would challenge the whole idea of having *_node() allocations in
> > driver code in the first place. Even in RDMA, where we are super
> > focused on performance and allocating memory in the right place is
> > super critical, we rely on the general kzalloc().
> >
> > There is one exception in the RDMA world (hfi1), but that is more
> > because of a legacy implementation than a specific need; at least
> > the Intel folks didn't succeed in convincing me with real data.
>
> Yes, but RDMA is much more heavy on the application side, much more
> tightly integrated in general.

Yes and no, we have a vast number of in-kernel RDMA users (NVMe, RDS,
NFS, etc.) who care about performance.

Thanks
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
index 957027e546b3..2c4e226b8cf1 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
@@ -560,11 +560,15 @@ static int ionic_qcq_alloc(struct ionic_lif *lif, unsigned int type,
 	new->q.dev = dev;
 	new->flags = flags;
 
-	new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
+	new->q.info = vzalloc_node(num_descs * sizeof(*new->q.info),
+				   dev_to_node(dev));
 	if (!new->q.info) {
-		netdev_err(lif->netdev, "Cannot allocate queue info\n");
-		err = -ENOMEM;
-		goto err_out_free_qcq;
+		new->q.info = vzalloc(num_descs * sizeof(*new->q.info));
+		if (!new->q.info) {
+			netdev_err(lif->netdev, "Cannot allocate queue info\n");
+			err = -ENOMEM;
+			goto err_out_free_qcq;
+		}
 	}
 
 	new->q.type = type;
@@ -581,11 +585,15 @@ static int ionic_qcq_alloc(struct ionic_lif *lif, unsigned int type,
 	if (err)
 		goto err_out;
 
-	new->cq.info = vzalloc(num_descs * sizeof(*new->cq.info));
+	new->cq.info = vzalloc_node(num_descs * sizeof(*new->cq.info),
+				    dev_to_node(dev));
 	if (!new->cq.info) {
-		netdev_err(lif->netdev, "Cannot allocate completion queue info\n");
-		err = -ENOMEM;
-		goto err_out_free_irq;
+		new->cq.info = vzalloc(num_descs * sizeof(*new->cq.info));
+		if (!new->cq.info) {
+			netdev_err(lif->netdev, "Cannot allocate completion queue info\n");
+			err = -ENOMEM;
+			goto err_out_free_irq;
+		}
 	}
 
 	err = ionic_cq_init(lif, &new->cq, &new->intr, num_descs, cq_desc_size);
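A possible follow-up on the diff above: the node-then-fallback sequence
now appears twice in ionic_qcq_alloc(). If the approach survives review,
a v2 could factor it into a small local helper; a sketch under that
assumption (ionic_vzalloc_info() is hypothetical, not part of the posted
patch):

/* Hypothetical local helper to deduplicate the two fallback blocks. */
static void *ionic_vzalloc_info(struct device *dev, size_t size)
{
	/* Prefer the device's node for fast-path metadata. */
	void *info = vzalloc_node(size, dev_to_node(dev));

	if (!info)
		info = vzalloc(size);	/* any node, default policy */
	return info;
}

The call sites would then reduce to, e.g.:

	new->q.info = ionic_vzalloc_info(dev, num_descs * sizeof(*new->q.info));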