[v1,net-next,2/2] net: ena: Extend customer metrics reporting support

Message ID 20240811100711.12921-3-darinzon@amazon.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series ENA driver metrics changes | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 29 this patch: 29
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 12 of 12 maintainers
netdev/build_clang success Errors and warnings before: 29 this patch: 29
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 29 this patch: 29
netdev/checkpatch warning WARNING: line length of 100 exceeds 80 columns WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns WARNING: line length of 98 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-11--21-00 (tests: 707)

Commit Message

David Arinzon Aug. 11, 2024, 10:07 a.m. UTC
Customers use ethtool to extract traffic
allowance metrics for the interface.

The interface between the driver and the device
has been upgraded to allow more flexibility and
expandability. The first five metrics (0-4) are
already reported by the driver as ENI stats;
this change reworks how they are retrieved and
is required to allow the addition of a new
metric - `conntrack_allowance_available` - and
additional metrics in the future.
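
For illustration, the metrics remain visible
through the regular ethtool stats path; a
hypothetical session (interface name and values
below are examples only) might look like:

  $ ethtool -S eth0 | grep allowance
       bw_in_allowance_exceeded: 0
       bw_out_allowance_exceeded: 0
       pps_allowance_exceeded: 0
       conntrack_allowance_exceeded: 12
       linklocal_allowance_exceeded: 0
       conntrack_allowance_available: 150000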

Signed-off-by: Ron Beider <rbeider@amazon.com>
Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
---
 .../net/ethernet/amazon/ena/ena_admin_defs.h  |  30 ++++
 drivers/net/ethernet/amazon/ena/ena_com.c     | 154 +++++++++++++---
 drivers/net/ethernet/amazon/ena/ena_com.h     |  59 +++++++
 drivers/net/ethernet/amazon/ena/ena_ethtool.c | 164 ++++++++++++------
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  14 +-
 5 files changed, 343 insertions(+), 78 deletions(-)

Comments

Jakub Kicinski Aug. 13, 2024, 1:58 a.m. UTC | #1
On Sun, 11 Aug 2024 13:07:11 +0300 David Arinzon wrote:
> +	ENA_ADMIN_BW_IN_ALLOWANCE_EXCEEDED         = 0,
> +	ENA_ADMIN_BW_OUT_ALLOWANCE_EXCEEDED        = 1,
> +	ENA_ADMIN_PPS_ALLOWANCE_EXCEEDED           = 2,
> +	ENA_ADMIN_CONNTRACK_ALLOWANCE_EXCEEDED     = 3,
> +	ENA_ADMIN_LINKLOCAL_ALLOWANCE_EXCEEDED     = 4,
> +	ENA_ADMIN_CONNTRACK_ALLOWANCE_AVAILABLE    = 5,

We have similar stats in the standard "queue-capable" stats:

https://docs.kernel.org/next/networking/netlink_spec/netdev.html#rx-hw-drop-ratelimits-uint
https://docs.kernel.org/next/networking/netlink_spec/netdev.html#tx-hw-drop-ratelimits-uint

they were added based on the virtio stats spec. They appear to map very
neatly to the first stats you have. Drivers must report the stats via
a common API if one exists.
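
For reference, a minimal sketch of the common API in question
(struct netdev_stat_ops, from include/net/netdev_queues.h); the
ena_* names below are hypothetical placeholders, not existing
driver functions:

	#include <net/netdev_queues.h>

	/* Hypothetical sketch: hooking a driver into the queue stats API. */
	static void ena_get_queue_stats_rx(struct net_device *dev, int idx,
					   struct netdev_queue_stats_rx *rx)
	{
		/* Fill per-queue RX counters, e.g. rx->hw_drop_ratelimits. */
	}

	static void ena_get_queue_stats_tx(struct net_device *dev, int idx,
					   struct netdev_queue_stats_tx *tx)
	{
		/* Fill per-queue TX counters, e.g. tx->hw_drop_ratelimits. */
	}

	static const struct netdev_stat_ops ena_stat_ops = {
		.get_queue_stats_rx = ena_get_queue_stats_rx,
		.get_queue_stats_tx = ena_get_queue_stats_tx,
	};

	/* Registered once at probe time: netdev->stat_ops = &ena_stat_ops; */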
David Arinzon Aug. 13, 2024, 11:29 a.m. UTC | #2
> On Sun, 11 Aug 2024 13:07:11 +0300 David Arinzon wrote:
> > +     ENA_ADMIN_BW_IN_ALLOWANCE_EXCEEDED         = 0,
> > +     ENA_ADMIN_BW_OUT_ALLOWANCE_EXCEEDED        = 1,
> > +     ENA_ADMIN_PPS_ALLOWANCE_EXCEEDED           = 2,
> > +     ENA_ADMIN_CONNTRACK_ALLOWANCE_EXCEEDED     = 3,
> > +     ENA_ADMIN_LINKLOCAL_ALLOWANCE_EXCEEDED     = 4,
> > +     ENA_ADMIN_CONNTRACK_ALLOWANCE_AVAILABLE    = 5,
> 
> We have similar stats in the standard "queue-capable" stats:
> 
> https://docs.kernel.org/next/networking/netlink_spec/netdev.html#rx-hw-drop-ratelimits-uint
> https://docs.kernel.org/next/networking/netlink_spec/netdev.html#tx-hw-drop-ratelimits-uint
> 
> they were added based on the virtio stats spec. They appear to map very
> neatly to the first stats you have. Drivers must report the stats via a common
> API if one exists.
> --
> pw-bot: cr

Thank you for bringing this to our attention, Jakub.

I will note that this patch modifies the infrastructure/logic by which these stats are retrieved, to allow expandability and flexibility of the interface between the driver and the device (as noted in the commit message).
The first five metrics (0-4) are already part of the upstream code; the last one (5) is added in this patch.

The statistics discussed here, which are exposed by ENA, are not at a queue level but at an interface level; therefore, I am not sure that the ones you pointed out would be a good fit for us.

In any case, would it be possible, from your point of view, to progress along two paths: one being this patchset with the addition of the new metric, and the other exploring whether such interface-level stats can be exposed?

Thanks,
David
Jakub Kicinski Aug. 13, 2024, 3:10 p.m. UTC | #3
On Tue, 13 Aug 2024 11:29:50 +0000 Arinzon, David wrote:
> I will note that this patch modifies the infrastructure/logic in
> which these stats are retrieved to allow expandability and
> flexibility of the interface between the driver and the device
> (written in the commit message). The top five (0 - 4) are already
> part of the upstream code and the last one (5) is added in this patch.

That's not clear at all from the one sentence in the commit message.
Please don't assume that the reviewers are familiar with your driver.

> The statistics discussed here and are exposed by ENA are not on a
> queue level but on an interface level, therefore, I am not sure that
> the ones pointed out by you would be a good fit for us.

The API in question is queue-capable, but it also supports reporting
the stats for the overall device, without per-queue breakdown (via
the "get_base_stats" callback).

> But in any case, would it be possible from your point of view to
> progress in two paths, one would be this patchset with the addition
> of the new metric and another would be to explore whether there are
> such stats on an interface level that can be exposed?

Adding a callback and filling in two stats is not a large ask.
Just do it, please.
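
For reference, a sketch of the device-level path described above:
"get_base_stats" reports device-wide totals without a per-queue
breakdown, and would be wired into the netdev_stat_ops structure via
the .get_base_stats member. The ena_read_* helpers are placeholders,
not existing functions:

	static void ena_get_base_stats(struct net_device *dev,
				       struct netdev_queue_stats_rx *rx,
				       struct netdev_queue_stats_tx *tx)
	{
		struct ena_adapter *adapter = netdev_priv(dev);

		/* Placeholder aggregation of the interface-level
		 * allowance counters into the two standard stats.
		 */
		rx->hw_drop_ratelimits = ena_read_rx_allowance_exceeded(adapter);
		tx->hw_drop_ratelimits = ena_read_tx_allowance_exceeded(adapter);
	}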
David Arinzon Aug. 14, 2024, 3:31 p.m. UTC | #4
> > I will note that this patch modifies the infrastructure/logic in which
> > these stats are retrieved to allow expandability and flexibility of
> > the interface between the driver and the device (written in the commit
> > message). The top five (0 - 4) are already part of the upstream code
> > and the last one (5) is added in this patch.
> 
> That's not clear at all from the one sentence in the commit message.
> Please don't assume that the reviewers are familiar with your driver.
> 
> > The statistics discussed here and are exposed by ENA are not on a
> > queue level but on an interface level, therefore, I am not sure that
> > the ones pointed out by you would be a good fit for us.
> 
> The API in question is queue-capable, but it also supports reporting the stats
> for the overall device, without per-queue breakdown (via the
> "get_base_stats" callback).
> 
> > But in any case, would it be possible from your point of view to
> > progress in two paths, one would be this patchset with the addition of
> > the new metric and another would be to explore whether there are such
> > stats on an interface level that can be exposed?
> 
> Adding a callback and filling in two stats is not a large ask.
> Just do it, please.

Hi Jakub,

I've looked into the definitions of the metrics in question.

Based on AWS documentation (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html)

bw_in_allowance_exceeded: The number of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.
bw_out_allowance_exceeded: The number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.

Based on the netlink spec (https://docs.kernel.org/next/networking/netlink_spec/netdev.html)

rx-hw-drop-ratelimits (uint)
doc: Number of the packets dropped by the device due to the received packets bitrate exceeding the device rate limit.
tx-hw-drop-ratelimits (uint)
doc: Number of the packets dropped by the device due to the transmit packets bitrate exceeding the device rate limit.

The AWS metrics count packets that are either dropped or queued (delayed, i.e. still sent/received, but with a delay); a change in these metrics is an indication for customers to check their applications and workloads due to the risk of exceeding limits.
There is no distinction between dropped and queued in these metrics; therefore, they do not match the ratelimits counters in the netlink spec.
If these metrics are separated into dropped and queued in the future, we'll be able to add support for hw-drop-ratelimits.

Thanks,
David
Jakub Kicinski Aug. 14, 2024, 7:11 p.m. UTC | #5
On Wed, 14 Aug 2024 15:31:49 +0000 Arinzon, David wrote:
> I've looked into the definition of the metrics under question
> 
> Based on AWS documentation (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html)
> 
> bw_in_allowance_exceeded: The number of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.
> bw_out_allowance_exceeded: The number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.
> 
> Based on the netlink spec (https://docs.kernel.org/next/networking/netlink_spec/netdev.html)
> 
> rx-hw-drop-ratelimits (uint)
> doc: Number of the packets dropped by the device due to the received packets bitrate exceeding the device rate limit.
> tx-hw-drop-ratelimits (uint)
> doc: Number of the packets dropped by the device due to the transmit packets bitrate exceeding the device rate limit.
> 
> The AWS metrics are counting for packets dropped or queued (delayed, but are sent/received with a delay), a change in these metrics is an indication to customers to check their applications and workloads due to risk of exceeding limits.
> There's no distinction between dropped and queued in these metrics, therefore, they do not match the ratelimits in the netlink spec.
> In case there will be a separation of these metrics in the future to dropped and queued, we'll be able to add the support for hw-drop-ratelimits.

Xuan, Michael, the virtio spec calls out drops due to b/w limit being
exceeded, but AWS people say their NICs also count packets buffered
but not dropped towards a similar metric.

I presume the virtio spec is supposed to cover the same use cases.
Have the stats been approved? Is it reasonable to extend the definition
of the "exceeded" stats in the virtio spec to cover what AWS specifies? 
Looks like PR is still open:
https://github.com/oasis-tcs/virtio-spec/issues/180
David Arinzon Aug. 16, 2024, 5:32 p.m. UTC | #6
> > I've looked into the definition of the metrics under question
> >
> > Based on AWS documentation
> > (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html)
> >
> > bw_in_allowance_exceeded: The number of packets queued or dropped
> because the inbound aggregate bandwidth exceeded the maximum for the
> instance.
> > bw_out_allowance_exceeded: The number of packets queued or dropped
> because the outbound aggregate bandwidth exceeded the maximum for the
> instance.
> >
> > Based on the netlink spec
> > (https://docs.kernel.org/next/networking/netlink_spec/netdev.html)
> >
> > rx-hw-drop-ratelimits (uint)
> > doc: Number of the packets dropped by the device due to the received
> packets bitrate exceeding the device rate limit.
> > tx-hw-drop-ratelimits (uint)
> > doc: Number of the packets dropped by the device due to the transmit
> packets bitrate exceeding the device rate limit.
> >
> > The AWS metrics are counting for packets dropped or queued (delayed,
> but are sent/received with a delay), a change in these metrics is an indication
> to customers to check their applications and workloads due to risk of
> exceeding limits.
> > There's no distinction between dropped and queued in these metrics,
> therefore, they do not match the ratelimits in the netlink spec.
> > In case there will be a separation of these metrics in the future to dropped
> and queued, we'll be able to add the support for hw-drop-ratelimits.
> 
> Xuan, Michael, the virtio spec calls out drops due to b/w limit being
> exceeded, but AWS people say their NICs also count packets buffered but
> not dropped towards a similar metric.
> 
> I presume the virtio spec is supposed to cover the same use cases.
> Have the stats been approved? Is it reasonable to extend the definition of
> the "exceeded" stats in the virtio spec to cover what AWS specifies?
> Looks like PR is still open:
> https://github.com/oasis-tcs/virtio-spec/issues/180

How do we move forward with this patchset?
Regarding the counter itself, even though we don't support this at the moment, I would recommend keeping queued and dropped
split (for example, adding tx/rx-hw-queued-ratelimits, or something similar, if that makes sense).

Thanks
David
Jakub Kicinski Aug. 17, 2024, 2:01 a.m. UTC | #7
On Fri, 16 Aug 2024 17:32:56 +0000 Arinzon, David wrote:
> > Xuan, Michael, the virtio spec calls out drops due to b/w limit being
> > exceeded, but AWS people say their NICs also count packets buffered but
> > not dropped towards a similar metric.
> > 
> > I presume the virtio spec is supposed to cover the same use cases.
> > Have the stats been approved? Is it reasonable to extend the definition of
> > the "exceeded" stats in the virtio spec to cover what AWS specifies?
> > Looks like PR is still open:
> > https://github.com/oasis-tcs/virtio-spec/issues/180  
> 
> How do we move forward with this patchset?
> Regarding the counter itself, even though we don't support this at
> the moment, I would recommend to keep the queued and dropped as split
> (for example, add tx/rx-hw-queued-ratelimits, or something similar,
> if that makes sense). 

Could you share some background for your recommendation?
As you say, the advice contradicts your own code :S
Let's iron this out for virtio's benefit.

You can resend the first patch separately in the meantime.
David Arinzon Aug. 17, 2024, 4:42 a.m. UTC | #8
> > > Xuan, Michael, the virtio spec calls out drops due to b/w limit
> > > being exceeded, but AWS people say their NICs also count packets
> > > buffered but not dropped towards a similar metric.
> > >
> > > I presume the virtio spec is supposed to cover the same use cases.
> > > Have the stats been approved? Is it reasonable to extend the
> > > definition of the "exceeded" stats in the virtio spec to cover what AWS
> specifies?
> > > Looks like PR is still open:
> > > https://github.com/oasis-tcs/virtio-spec/issues/180
> >
> > How do we move forward with this patchset?
> > Regarding the counter itself, even though we don't support this at the
> > moment, I would recommend to keep the queued and dropped as split (for
> > example, add tx/rx-hw-queued-ratelimits, or something similar, if that
> > makes sense).
> 
> Could you share some background for your recommendation?
> As you say, the advice contradicts your own code :S Let's iron this out for
> virtio's benefit.
> 

The links I've shared before are to public AWS documentation; therefore, this is what AWS currently supports.
When looking at the definitions of what queued and dropped mean, having such a separation
will benefit customers, as it will provide them more detailed information about the limits
that they're about to exceed or are already exceeding. A queued packet will be received with a
delay, while a dropped packet never arrives at the destination.
In both cases, customers need to look into their applications and network loads and see what should
be changed, but a case where packets are dropped is more dire (in some use-cases) than one where
packets are being delayed, which may be transparent to network loads that are not
sensitive to low latency.

Even though the ENA driver can't support it at the moment, given that the stats interface is
meant for other drivers to implement (based on their level of support), this level of granularity and separation
will be more generic and more beneficial to customers. In my opinion, basing the suggestion to virtio
on what AWS currently supports imposes a limitation, rather than creating something
generic that other drivers will hopefully implement based on their NICs.

> You can resend the first patch separately in the meantime.

I prefer them to be picked up together.
David Arinzon Aug. 21, 2024, 6:03 p.m. UTC | #9
> > > > Xuan, Michael, the virtio spec calls out drops due to b/w limit
> > > > being exceeded, but AWS people say their NICs also count packets
> > > > buffered but not dropped towards a similar metric.
> > > >
> > > > I presume the virtio spec is supposed to cover the same use cases.
> > > > Have the stats been approved? Is it reasonable to extend the
> > > > definition of the "exceeded" stats in the virtio spec to cover
> > > > what AWS
> > specifies?
> > > > Looks like PR is still open:
> > > > https://github.com/oasis-tcs/virtio-spec/issues/180
> > >
> > > How do we move forward with this patchset?
> > > Regarding the counter itself, even though we don't support this at
> > > the moment, I would recommend to keep the queued and dropped as
> > > split (for
> > > example, add tx/rx-hw-queued-ratelimits, or something similar, if
> > > that makes sense).
> >
> > Could you share some background for your recommendation?
> > As you say, the advice contradicts your own code :S Let's iron this
> > out for virtio's benefit.
> >
> 
> The links I've shared before are of public AWS documentation, therefore,
> this is what AWS currently supports.
> When looking at the definition of what queued and what dropped means,
> having such a separation will benefit customers better as it will provide them
> more detailed information about the limits that they're about to exceed or
> are already exceeding. A queued packet will be received with a delay, while a
> dropped packet wouldn't arrive to the destination.
> In both cases, customers need to look into their applications and network
> loads and see what should be changed, but when I'm looking at a case where
> packets are dropped, it is more dire (in some use-cases) that when packets
> are being delayed, which is possibly more transparent to some network loads
> that are not looking for cases like low latency.
> 
> Even though the ENA driver can't support it at the moment, given that the
> stats interface is aiming for other drivers to implement (based on their level
> of support), the level of granularity and separation will be more generic and
> more beneficial to customers. In my opinion, the suggestion to virtio is more
> posing a limitation based on what AWS currently supports than creating
> something generic that other drivers will hopefully implement based on their
> NICs.
> 
> > You can resend the first patch separately in the meantime.
> 
> I prefer them to be picked up together.
> 

I see that there's no feedback from Xuan or Michael.

Jakub, what are your thoughts about my suggestion?
Jakub Kicinski Aug. 21, 2024, 10:18 p.m. UTC | #10
On Wed, 21 Aug 2024 18:03:27 +0000 Arinzon, David wrote:
> I see that there's no feedback from Xuan or Michael.
> 
> Jakub, what are your thoughts about my suggestion?

I suggest you stop pinging me.
Gal Pressman Aug. 27, 2024, 4:41 p.m. UTC | #11
On 22/08/2024 1:18, Jakub Kicinski wrote:
> On Wed, 21 Aug 2024 18:03:27 +0000 Arinzon, David wrote:
>> I see that there's no feedback from Xuan or Michael.
>>
>> Jakub, what are your thoughts about my suggestion?
> 
> I suggest you stop pinging me.
> 

Note: my reply does not mean I like/agree with anything I saw in this
patch, nor do I mind if it gets merged eventually.

Still, given that the counters in question are already exposed through
ethtool (which was very unclear from the poor commit message/cover
letter), it's kinda unfair to hold back a patch that changes the way the
counters are queried, or adds an additional counter which so far no one
has objected to.

Perhaps David can show some good will and help sort out the virtio
stuff, or push his team to expose counters that match the netlink
semantics, but this should've been the "blocker" when they first
introduced these counters; now it's too late.
Jakub Kicinski Aug. 27, 2024, 6:04 p.m. UTC | #12
On Tue, 27 Aug 2024 19:41:47 +0300 Gal Pressman wrote:
> Perhaps David can show some good will and help sort out the virtio
> stuff,

Why do you say "perhaps he could". AFAICT all he did is say "they
aren't replying" after I CCed them. Do I need to spell out how to
engage with development community on the Internet?
Gal Pressman Aug. 27, 2024, 6:33 p.m. UTC | #13
On 27/08/2024 21:04, Jakub Kicinski wrote:
> On Tue, 27 Aug 2024 19:41:47 +0300 Gal Pressman wrote:
>> Perhaps David can show some good will and help sort out the virtio
>> stuff,
> 
> Why do you say "perhaps he could". AFAICT all he did is say "they
> aren't replying" after I CCed them. Do I need to spell out how to
> engage with development community on the Internet?

My phrasing is due to the fact that I'm in no position to tell David
what to do... I just got the feeling that he didn't get your hint.

I understand your motivation, but my point is that even without him
being a "good citizen", these counters are already out there; should they
really block new ones?
Jakub Kicinski Aug. 27, 2024, 6:39 p.m. UTC | #14
On Tue, 27 Aug 2024 21:33:55 +0300 Gal Pressman wrote:
> > Why do you say "perhaps he could". AFAICT all he did is say "they
> > aren't replying" after I CCed them. Do I need to spell out how to
> > engage with development community on the Internet?  
> 
> My phrasing is due to the fact that I'm in no position to tell David
> what to do.. I just got the feeling that he didn't get your hint.
> 
> I understand your motivation, but my point is that even without him
> being a "good citizen" these counters are already out there, should they
> really block new ones?

Agreed, not great to block them... Unfortunately it's literally
the only lever we have as maintainers.

Amazon's experience and recommendation need to be communicated to
the folks working on the virtio spec. IDK if they follow the list,
but there's a process for the spec, and a GH PR to which I linked.
Parav Pandit Aug. 28, 2024, 3:59 a.m. UTC | #15
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, August 15, 2024 12:42 AM
> 
> On Wed, 14 Aug 2024 15:31:49 +0000 Arinzon, David wrote:
> > I've looked into the definition of the metrics under question
> >
> > Based on AWS documentation
> > (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html)
> >
> > bw_in_allowance_exceeded: The number of packets queued or dropped
> because the inbound aggregate bandwidth exceeded the maximum for the
> instance.
> > bw_out_allowance_exceeded: The number of packets queued or dropped
> because the outbound aggregate bandwidth exceeded the maximum for the
> instance.
> >
> > Based on the netlink spec
> > (https://docs.kernel.org/next/networking/netlink_spec/netdev.html)
> >
> > rx-hw-drop-ratelimits (uint)
> > doc: Number of the packets dropped by the device due to the received
> packets bitrate exceeding the device rate limit.
> > tx-hw-drop-ratelimits (uint)
> > doc: Number of the packets dropped by the device due to the transmit
> packets bitrate exceeding the device rate limit.
> >
> > The AWS metrics are counting for packets dropped or queued (delayed, but
> are sent/received with a delay), a change in these metrics is an indication to
> customers to check their applications and workloads due to risk of exceeding
> limits.
> > There's no distinction between dropped and queued in these metrics,
> therefore, they do not match the ratelimits in the netlink spec.
> > In case there will be a separation of these metrics in the future to dropped
> and queued, we'll be able to add the support for hw-drop-ratelimits.
> 
> Xuan, Michael, the virtio spec calls out drops due to b/w limit being
> exceeded, but AWS people say their NICs also count packets buffered but not
> dropped towards a similar metric.
> 
> I presume the virtio spec is supposed to cover the same use cases.
On the tx side, packets may not be queued, and may not even be DMAed, if the rate has been exceeded.
This is a hw NIC implementation detail and a choice with trade-offs.

Similarly on rx, one may implement drop or queue or both (queue up to some limit, and drop beyond it).

> Have the stats been approved?
Yes, it was approved last year; I also reviewed it. It has been part of the spec for nearly 10 months, at [1].
The GH PR is merged but GH is not updated yet.

[1] https://github.com/oasis-tcs/virtio-spec/commit/42f389989823039724f95bbbd243291ab0064f82

> Is it reasonable to extend the definition of the
> "exceeded" stats in the virtio spec to cover what AWS specifies?
Virtio may add new "exceeded" stats in the future.
But I do not understand how the AWS ENA NIC is related to a virtio PCI HW NIC.

Should virtio implement it? Maybe yes; it looks useful to me.
Should it be in the virtio spec now? Not sure; this depends on the virtio community and on actual hw/sw supporting it.

> Looks like PR is still open:
> https://github.com/oasis-tcs/virtio-spec/issues/180
The spec already has it at [1] for drops. The GH PR is not up to date.
David Arinzon Sept. 3, 2024, 4:29 a.m. UTC | #16
> > > I've looked into the definition of the metrics under question
> > >
> > > Based on AWS documentation
> > > (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html)
> > >
> > > bw_in_allowance_exceeded: The number of packets queued or dropped
> > because the inbound aggregate bandwidth exceeded the maximum for the
> > instance.
> > > bw_out_allowance_exceeded: The number of packets queued or
> dropped
> > because the outbound aggregate bandwidth exceeded the maximum for
> the
> > instance.
> > >
> > > Based on the netlink spec
> > > (https://docs.kernel.org/next/networking/netlink_spec/netdev.html)
> > >
> > > rx-hw-drop-ratelimits (uint)
> > > doc: Number of the packets dropped by the device due to the received
> > packets bitrate exceeding the device rate limit.
> > > tx-hw-drop-ratelimits (uint)
> > > doc: Number of the packets dropped by the device due to the transmit
> > packets bitrate exceeding the device rate limit.
> > >
> > > The AWS metrics are counting for packets dropped or queued (delayed,
> > > but
> > are sent/received with a delay), a change in these metrics is an
> > indication to customers to check their applications and workloads due
> > to risk of exceeding limits.
> > > There's no distinction between dropped and queued in these metrics,
> > therefore, they do not match the ratelimits in the netlink spec.
> > > In case there will be a separation of these metrics in the future to
> > > dropped
> > and queued, we'll be able to add the support for hw-drop-ratelimits.
> >
> > Xuan, Michael, the virtio spec calls out drops due to b/w limit being
> > exceeded, but AWS people say their NICs also count packets buffered
> > but not dropped towards a similar metric.
> >
> > I presume the virtio spec is supposed to cover the same use cases.
> On tx side, number of packets may not be queued, but may not be even
> DMAed if the rate has exceeded.
> This is hw nic implementation detail and a choice with trade-offs.
> 
> Similarly on rx, one may implement drop or queue or both (queue upto some
> limit, and drop beyond it).
> 
> > Have the stats been approved?
> Yes. it is approved last year; I have also reviewed it; It is part of the spec
> nearly 10 months ago at [1].
> GH PR is merged but GH is not updated yet.
> 
> [1] https://github.com/oasis-tcs/virtio-spec/commit/42f389989823039724f95bbbd243291ab0064f82
> 
> > Is it reasonable to extend the definition of the "exceeded" stats in
> > the virtio spec to cover what AWS specifies?
> Virtio may add new stats for exceeded stats in future.
> But I do not understand how AWS ENA nic is related to virtio PCI HW nic.
> 
> Should virtio implement it? may be yes. Looks useful to me.
> Should it be now in virtio spec, not sure, this depends on virtio community
> and actual hw/sw supporting it.
> 
> > Looks like PR is still open:
> > https://github.com/oasis-tcs/virtio-spec/issues/180
> Spec already has it at [1] for drops. GH PR is not upto date.

Thank you for the reply, Parav.
I've raised the query, along with a summary of this discussion, in the above-mentioned GitHub ticket.
Xuan Zhuo Sept. 4, 2024, 8:05 a.m. UTC | #17
On Tue, 3 Sep 2024 04:29:18 +0000, "Arinzon, David" <darinzon@amazon.com> wrote:
> > > > I've looked into the definition of the metrics under question
> > > >
> > > > Based on AWS documentation
> > > > (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html)
> > > >
> > > > bw_in_allowance_exceeded: The number of packets queued or dropped
> > > because the inbound aggregate bandwidth exceeded the maximum for the
> > > instance.
> > > > bw_out_allowance_exceeded: The number of packets queued or
> > dropped
> > > because the outbound aggregate bandwidth exceeded the maximum for
> > the
> > > instance.
> > > >
> > > > Based on the netlink spec
> > > > (https://docs.kernel.org/next/networking/netlink_spec/netdev.html)
> > > >
> > > > rx-hw-drop-ratelimits (uint)
> > > > doc: Number of the packets dropped by the device due to the received
> > > packets bitrate exceeding the device rate limit.
> > > > tx-hw-drop-ratelimits (uint)
> > > > doc: Number of the packets dropped by the device due to the transmit
> > > packets bitrate exceeding the device rate limit.
> > > >
> > > > The AWS metrics are counting for packets dropped or queued (delayed,
> > > > but
> > > are sent/received with a delay), a change in these metrics is an
> > > indication to customers to check their applications and workloads due
> > > to risk of exceeding limits.
> > > > There's no distinction between dropped and queued in these metrics,
> > > therefore, they do not match the ratelimits in the netlink spec.
> > > > In case there will be a separation of these metrics in the future to
> > > > dropped
> > > and queued, we'll be able to add the support for hw-drop-ratelimits.
> > >
> > > Xuan, Michael, the virtio spec calls out drops due to b/w limit being
> > > exceeded, but AWS people say their NICs also count packets buffered
> > > but not dropped towards a similar metric.
> > >
> > > I presume the virtio spec is supposed to cover the same use cases.
> > On tx side, number of packets may not be queued, but may not be even
> > DMAed if the rate has exceeded.
> > This is hw nic implementation detail and a choice with trade-offs.
> >
> > Similarly on rx, one may implement drop or queue or both (queue upto some
> > limit, and drop beyond it).
> >
> > > Have the stats been approved?
> > Yes. it is approved last year; I have also reviewed it; It is part of the spec
> > nearly 10 months ago at [1].
> > GH PR is merged but GH is not updated yet.
> >
> > [1] https://github.com/oasis-tcs/virtio-spec/commit/42f389989823039724f95bbbd243291ab0064f82
> >
> > > Is it reasonable to extend the definition of the "exceeded" stats in
> > > the virtio spec to cover what AWS specifies?
> > Virtio may add new stats for exceeded stats in future.
> > But I do not understand how AWS ENA nic is related to virtio PCI HW nic.
> >
> > Should virtio implement it? may be yes. Looks useful to me.
> > Should it be now in virtio spec, not sure, this depends on virtio community
> > and actual hw/sw supporting it.
> >
> > > Looks like PR is still open:
> > > https://github.com/oasis-tcs/virtio-spec/issues/180
> > Spec already has it at [1] for drops. GH PR is not upto date.
>
> Thank you for the reply, Parav.
> I've raised the query and the summary of this discussion in the above mentioned github ticket.
>

I saw your reply on github.

So what is the question?

Now the stats are rx/tx_hw_drop_ratelimits, so I think these stats should only
count the number of dropped packets.

Yes, I also think stats for queued packets are good. But those may be
new stats in the next version of the virtio spec, or come with a new
virtio feature.

But for the user, I think these are important. In my opinion, the NIC
should provide all of these stats.

Thanks.

Patch

diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
index 74772b00..9d9fa655 100644
--- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
+++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
@@ -7,6 +7,21 @@ 
 
 #define ENA_ADMIN_RSS_KEY_PARTS              10
 
+#define ENA_ADMIN_CUSTOMER_METRICS_SUPPORT_MASK 0x3F
+#define ENA_ADMIN_CUSTOMER_METRICS_MIN_SUPPORT_MASK 0x1F
+
+ /* customer metrics - in correlation with
+  * ENA_ADMIN_CUSTOMER_METRICS_SUPPORT_MASK
+  */
+enum ena_admin_customer_metrics_id {
+	ENA_ADMIN_BW_IN_ALLOWANCE_EXCEEDED         = 0,
+	ENA_ADMIN_BW_OUT_ALLOWANCE_EXCEEDED        = 1,
+	ENA_ADMIN_PPS_ALLOWANCE_EXCEEDED           = 2,
+	ENA_ADMIN_CONNTRACK_ALLOWANCE_EXCEEDED     = 3,
+	ENA_ADMIN_LINKLOCAL_ALLOWANCE_EXCEEDED     = 4,
+	ENA_ADMIN_CONNTRACK_ALLOWANCE_AVAILABLE    = 5,
+};
+
 enum ena_admin_aq_opcode {
 	ENA_ADMIN_CREATE_SQ                         = 1,
 	ENA_ADMIN_DESTROY_SQ                        = 2,
@@ -53,6 +68,7 @@  enum ena_admin_aq_caps_id {
 	ENA_ADMIN_ENI_STATS                         = 0,
 	/* ENA SRD customer metrics */
 	ENA_ADMIN_ENA_SRD_INFO                      = 1,
+	ENA_ADMIN_CUSTOMER_METRICS                  = 2,
 };
 
 enum ena_admin_placement_policy_type {
@@ -103,6 +119,7 @@  enum ena_admin_get_stats_type {
 	ENA_ADMIN_GET_STATS_TYPE_ENI                = 2,
 	/* extra HW stats for ENA SRD */
 	ENA_ADMIN_GET_STATS_TYPE_ENA_SRD            = 3,
+	ENA_ADMIN_GET_STATS_TYPE_CUSTOMER_METRICS   = 4,
 };
 
 enum ena_admin_get_stats_scope {
@@ -377,6 +394,9 @@  struct ena_admin_aq_get_stats_cmd {
 	 * stats of other device
 	 */
 	u16 device_id;
+
+	/* a bitmap representing the requested metric values */
+	u64 requested_metrics;
 };
 
 /* Basic Statistics Command. */
@@ -459,6 +479,14 @@  struct ena_admin_ena_srd_info {
 	struct ena_admin_ena_srd_stats ena_srd_stats;
 };
 
+/* Customer Metrics Command. */
+struct ena_admin_customer_metrics {
+	/* A bitmap representing the reported customer metrics according to
+	 * the order they are reported
+	 */
+	u64 reported_metrics;
+};
+
 struct ena_admin_acq_get_stats_resp {
 	struct ena_admin_acq_common_desc acq_common_desc;
 
@@ -470,6 +498,8 @@  struct ena_admin_acq_get_stats_resp {
 		struct ena_admin_eni_stats eni_stats;
 
 		struct ena_admin_ena_srd_info ena_srd_info;
+
+		struct ena_admin_customer_metrics customer_metrics;
 	} u;
 };
 
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index 3cc3830d..d958cda9 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -1881,6 +1881,56 @@  int ena_com_get_link_params(struct ena_com_dev *ena_dev,
 	return ena_com_get_feature(ena_dev, resp, ENA_ADMIN_LINK_CONFIG, 0);
 }
 
+static int ena_get_dev_stats(struct ena_com_dev *ena_dev,
+			     struct ena_com_stats_ctx *ctx,
+			     enum ena_admin_get_stats_type type)
+{
+	struct ena_admin_acq_get_stats_resp *get_resp = &ctx->get_resp;
+	struct ena_admin_aq_get_stats_cmd *get_cmd = &ctx->get_cmd;
+	struct ena_com_admin_queue *admin_queue;
+	int ret;
+
+	admin_queue = &ena_dev->admin_queue;
+
+	get_cmd->aq_common_descriptor.opcode = ENA_ADMIN_GET_STATS;
+	get_cmd->aq_common_descriptor.flags = 0;
+	get_cmd->type = type;
+
+	ret = ena_com_execute_admin_command(admin_queue,
+					    (struct ena_admin_aq_entry *)get_cmd,
+					    sizeof(*get_cmd),
+					    (struct ena_admin_acq_entry *)get_resp,
+					    sizeof(*get_resp));
+
+	if (unlikely(ret))
+		netdev_err(ena_dev->net_device, "Failed to get stats. error: %d\n", ret);
+
+	return ret;
+}
+
+static void ena_com_set_supported_customer_metrics(struct ena_com_dev *ena_dev)
+{
+	struct ena_customer_metrics *customer_metrics;
+	struct ena_com_stats_ctx ctx;
+	int ret;
+
+	customer_metrics = &ena_dev->customer_metrics;
+	if (!ena_com_get_cap(ena_dev, ENA_ADMIN_CUSTOMER_METRICS)) {
+		customer_metrics->supported_metrics = ENA_ADMIN_CUSTOMER_METRICS_MIN_SUPPORT_MASK;
+		return;
+	}
+
+	memset(&ctx, 0x0, sizeof(ctx));
+	ctx.get_cmd.requested_metrics = ENA_ADMIN_CUSTOMER_METRICS_SUPPORT_MASK;
+	ret = ena_get_dev_stats(ena_dev, &ctx, ENA_ADMIN_GET_STATS_TYPE_CUSTOMER_METRICS);
+	if (likely(ret == 0))
+		customer_metrics->supported_metrics =
+			ctx.get_resp.u.customer_metrics.reported_metrics;
+	else
+		netdev_err(ena_dev->net_device,
+			   "Failed to query customer metrics support. error: %d\n", ret);
+}
+
 int ena_com_get_dev_attr_feat(struct ena_com_dev *ena_dev,
 			      struct ena_com_dev_get_features_ctx *get_feat_ctx)
 {
@@ -1960,6 +2010,8 @@  int ena_com_get_dev_attr_feat(struct ena_com_dev *ena_dev,
 	else
 		return rc;
 
+	ena_com_set_supported_customer_metrics(ena_dev);
+
 	return 0;
 }
 
@@ -2104,33 +2156,6 @@  int ena_com_dev_reset(struct ena_com_dev *ena_dev,
 	return 0;
 }
 
-static int ena_get_dev_stats(struct ena_com_dev *ena_dev,
-			     struct ena_com_stats_ctx *ctx,
-			     enum ena_admin_get_stats_type type)
-{
-	struct ena_admin_aq_get_stats_cmd *get_cmd = &ctx->get_cmd;
-	struct ena_admin_acq_get_stats_resp *get_resp = &ctx->get_resp;
-	struct ena_com_admin_queue *admin_queue;
-	int ret;
-
-	admin_queue = &ena_dev->admin_queue;
-
-	get_cmd->aq_common_descriptor.opcode = ENA_ADMIN_GET_STATS;
-	get_cmd->aq_common_descriptor.flags = 0;
-	get_cmd->type = type;
-
-	ret =  ena_com_execute_admin_command(admin_queue,
-					     (struct ena_admin_aq_entry *)get_cmd,
-					     sizeof(*get_cmd),
-					     (struct ena_admin_acq_entry *)get_resp,
-					     sizeof(*get_resp));
-
-	if (unlikely(ret))
-		netdev_err(ena_dev->net_device, "Failed to get stats. error: %d\n", ret);
-
-	return ret;
-}
-
 int ena_com_get_eni_stats(struct ena_com_dev *ena_dev,
 			  struct ena_admin_eni_stats *stats)
 {
@@ -2188,6 +2213,50 @@  int ena_com_get_dev_basic_stats(struct ena_com_dev *ena_dev,
 	return ret;
 }
 
+int ena_com_get_customer_metrics(struct ena_com_dev *ena_dev, char *buffer, u32 len)
+{
+	struct ena_admin_aq_get_stats_cmd *get_cmd;
+	struct ena_com_stats_ctx ctx;
+	int ret;
+
+	if (unlikely(len > ena_dev->customer_metrics.buffer_len)) {
+		netdev_err(ena_dev->net_device,
+			   "Invalid buffer size %u. The given buffer is too big.\n", len);
+		return -EINVAL;
+	}
+
+	if (!ena_com_get_cap(ena_dev, ENA_ADMIN_CUSTOMER_METRICS)) {
+		netdev_err(ena_dev->net_device, "Capability %d not supported.\n",
+			   ENA_ADMIN_CUSTOMER_METRICS);
+		return -EOPNOTSUPP;
+	}
+
+	if (!ena_dev->customer_metrics.supported_metrics) {
+		netdev_err(ena_dev->net_device, "No supported customer metrics.\n");
+		return -EOPNOTSUPP;
+	}
+
+	get_cmd = &ctx.get_cmd;
+	memset(&ctx, 0x0, sizeof(ctx));
+	ret = ena_com_mem_addr_set(ena_dev,
+				   &get_cmd->u.control_buffer.address,
+				   ena_dev->customer_metrics.buffer_dma_addr);
+	if (unlikely(ret)) {
+		netdev_err(ena_dev->net_device, "Memory address set failed.\n");
+		return ret;
+	}
+
+	get_cmd->u.control_buffer.length = ena_dev->customer_metrics.buffer_len;
+	get_cmd->requested_metrics = ena_dev->customer_metrics.supported_metrics;
+	ret = ena_get_dev_stats(ena_dev, &ctx, ENA_ADMIN_GET_STATS_TYPE_CUSTOMER_METRICS);
+	if (likely(ret == 0))
+		memcpy(buffer, ena_dev->customer_metrics.buffer_virt_addr, len);
+	else
+		netdev_err(ena_dev->net_device, "Failed to get customer metrics. error: %d\n", ret);
+
+	return ret;
+}
+
 int ena_com_set_dev_mtu(struct ena_com_dev *ena_dev, u32 mtu)
 {
 	struct ena_com_admin_queue *admin_queue;
@@ -2727,6 +2796,24 @@  int ena_com_allocate_debug_area(struct ena_com_dev *ena_dev,
 	return 0;
 }
 
+int ena_com_allocate_customer_metrics_buffer(struct ena_com_dev *ena_dev)
+{
+	struct ena_customer_metrics *customer_metrics = &ena_dev->customer_metrics;
+
+	customer_metrics->buffer_len = ENA_CUSTOMER_METRICS_BUFFER_SIZE;
+	customer_metrics->buffer_virt_addr = NULL;
+
+	customer_metrics->buffer_virt_addr =
+		dma_alloc_coherent(ena_dev->dmadev, customer_metrics->buffer_len,
+				   &customer_metrics->buffer_dma_addr, GFP_KERNEL);
+	if (!customer_metrics->buffer_virt_addr) {
+		customer_metrics->buffer_len = 0;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 void ena_com_delete_host_info(struct ena_com_dev *ena_dev)
 {
 	struct ena_host_attribute *host_attr = &ena_dev->host_attr;
@@ -2749,6 +2836,19 @@  void ena_com_delete_debug_area(struct ena_com_dev *ena_dev)
 	}
 }
 
+void ena_com_delete_customer_metrics_buffer(struct ena_com_dev *ena_dev)
+{
+	struct ena_customer_metrics *customer_metrics = &ena_dev->customer_metrics;
+
+	if (customer_metrics->buffer_virt_addr) {
+		dma_free_coherent(ena_dev->dmadev, customer_metrics->buffer_len,
+				  customer_metrics->buffer_virt_addr,
+				  customer_metrics->buffer_dma_addr);
+		customer_metrics->buffer_virt_addr = NULL;
+		customer_metrics->buffer_len = 0;
+	}
+}
+
 int ena_com_set_host_attributes(struct ena_com_dev *ena_dev)
 {
 	struct ena_host_attribute *host_attr = &ena_dev->host_attr;
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h b/drivers/net/ethernet/amazon/ena/ena_com.h
index 372066e0..a372c5e7 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_com.h
@@ -42,6 +42,8 @@ 
 #define ADMIN_CQ_SIZE(depth)	((depth) * sizeof(struct ena_admin_acq_entry))
 #define ADMIN_AENQ_SIZE(depth)	((depth) * sizeof(struct ena_admin_aenq_entry))
 
+#define ENA_CUSTOMER_METRICS_BUFFER_SIZE 512
+
 /*****************************************************************************/
 /*****************************************************************************/
 /* ENA adaptive interrupt moderation settings */
@@ -278,6 +280,16 @@  struct ena_rss {
 
 };
 
+struct ena_customer_metrics {
+	/* in correlation with ENA_ADMIN_CUSTOMER_METRICS_SUPPORT_MASK
+	 * and ena_admin_customer_metrics_id
+	 */
+	u64 supported_metrics;
+	dma_addr_t buffer_dma_addr;
+	void *buffer_virt_addr;
+	u32 buffer_len;
+};
+
 struct ena_host_attribute {
 	/* Debug area */
 	u8 *debug_area_virt_addr;
@@ -327,6 +339,8 @@  struct ena_com_dev {
 	struct ena_intr_moder_entry *intr_moder_tbl;
 
 	struct ena_com_llq_info llq_info;
+
+	struct ena_customer_metrics customer_metrics;
 };
 
 struct ena_com_dev_get_features_ctx {
@@ -604,6 +618,15 @@  int ena_com_get_eni_stats(struct ena_com_dev *ena_dev,
 int ena_com_get_ena_srd_info(struct ena_com_dev *ena_dev,
 			     struct ena_admin_ena_srd_info *info);
 
+/* ena_com_get_customer_metrics - Get customer metrics for network interface
+ * @ena_dev: ENA communication layer struct
+ * @buffer: buffer for returned customer metrics
+ * @len: size of the buffer
+ *
+ * @return: 0 on Success and negative value otherwise.
+ */
+int ena_com_get_customer_metrics(struct ena_com_dev *ena_dev, char *buffer, u32 len);
+
 /* ena_com_set_dev_mtu - Configure the device mtu.
  * @ena_dev: ENA communication layer struct
  * @mtu: mtu value
@@ -814,6 +837,13 @@  int ena_com_allocate_host_info(struct ena_com_dev *ena_dev);
 int ena_com_allocate_debug_area(struct ena_com_dev *ena_dev,
 				u32 debug_area_size);
 
+/* ena_com_allocate_customer_metrics_buffer - Allocate customer metrics resources.
+ * @ena_dev: ENA communication layer struct
+ *
+ * @return: 0 on Success and negative value otherwise.
+ */
+int ena_com_allocate_customer_metrics_buffer(struct ena_com_dev *ena_dev);
+
 /* ena_com_delete_debug_area - Free the debug area resources.
  * @ena_dev: ENA communication layer struct
  *
@@ -828,6 +858,13 @@  void ena_com_delete_debug_area(struct ena_com_dev *ena_dev);
  */
 void ena_com_delete_host_info(struct ena_com_dev *ena_dev);
 
+/* ena_com_delete_customer_metrics_buffer - Free the customer metrics resources.
+ * @ena_dev: ENA communication layer struct
+ *
+ * Free the allocated customer metrics area.
+ */
+void ena_com_delete_customer_metrics_buffer(struct ena_com_dev *ena_dev);
+
 /* ena_com_set_host_attributes - Update the device with the host
  * attributes (debug area and host info) base address.
  * @ena_dev: ENA communication layer struct
@@ -984,6 +1021,28 @@  static inline bool ena_com_get_cap(struct ena_com_dev *ena_dev,
 	return !!(ena_dev->capabilities & BIT(cap_id));
 }
 
+/* ena_com_get_customer_metric_support - query whether device supports a given customer metric.
+ * @ena_dev: ENA communication layer struct
+ * @metric_id: enum value representing the customer metric
+ *
+ * @return - true if customer metric is supported or false otherwise
+ */
+static inline bool ena_com_get_customer_metric_support(struct ena_com_dev *ena_dev,
+						       enum ena_admin_customer_metrics_id metric_id)
+{
+	return !!(ena_dev->customer_metrics.supported_metrics & BIT(metric_id));
+}
+
+/* ena_com_get_customer_metric_count - return the number of supported customer metrics.
+ * @ena_dev: ENA communication layer struct
+ *
+ * @return - the number of supported customer metrics
+ */
+static inline int ena_com_get_customer_metric_count(struct ena_com_dev *ena_dev)
+{
+	return hweight64(ena_dev->customer_metrics.supported_metrics);
+}
+
 /* ena_com_update_intr_reg - Prepare interrupt register
  * @intr_reg: interrupt register to update.
  * @rx_delay_interval: Rx interval in usecs
diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 5efd3e43..1386f5df 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -14,6 +14,10 @@  struct ena_stats {
 	int stat_offset;
 };
 
+struct ena_hw_metrics {
+	char name[ETH_GSTRING_LEN];
+};
+
 #define ENA_STAT_ENA_COM_ENTRY(stat) { \
 	.name = #stat, \
 	.stat_offset = offsetof(struct ena_com_stats_admin, stat) / sizeof(u64) \
@@ -49,6 +53,10 @@  struct ena_stats {
 	.stat_offset = offsetof(struct ena_admin_ena_srd_info, flags) / sizeof(u64) \
 }
 
+#define ENA_METRIC_ENI_ENTRY(stat) { \
+	.name = #stat \
+}
+
 static const struct ena_stats ena_stats_global_strings[] = {
 	ENA_STAT_GLOBAL_ENTRY(tx_timeout),
 	ENA_STAT_GLOBAL_ENTRY(suspend),
@@ -60,6 +68,9 @@  static const struct ena_stats ena_stats_global_strings[] = {
 	ENA_STAT_GLOBAL_ENTRY(reset_fail),
 };
 
+/* A partial list of hw stats. Used when admin command
+ * with type ENA_ADMIN_GET_STATS_TYPE_CUSTOMER_METRICS is not supported
+ */
 static const struct ena_stats ena_stats_eni_strings[] = {
 	ENA_STAT_ENI_ENTRY(bw_in_allowance_exceeded),
 	ENA_STAT_ENI_ENTRY(bw_out_allowance_exceeded),
@@ -68,6 +79,15 @@  static const struct ena_stats ena_stats_eni_strings[] = {
 	ENA_STAT_ENI_ENTRY(linklocal_allowance_exceeded),
 };
 
+static const struct ena_hw_metrics ena_hw_stats_strings[] = {
+	ENA_METRIC_ENI_ENTRY(bw_in_allowance_exceeded),
+	ENA_METRIC_ENI_ENTRY(bw_out_allowance_exceeded),
+	ENA_METRIC_ENI_ENTRY(pps_allowance_exceeded),
+	ENA_METRIC_ENI_ENTRY(conntrack_allowance_exceeded),
+	ENA_METRIC_ENI_ENTRY(linklocal_allowance_exceeded),
+	ENA_METRIC_ENI_ENTRY(conntrack_allowance_available),
+};
+
 static const struct ena_stats ena_srd_info_strings[] = {
 	ENA_STAT_ENA_SRD_MODE_ENTRY(ena_srd_mode),
 	ENA_STAT_ENA_SRD_ENTRY(ena_srd_tx_pkts),
@@ -130,6 +150,7 @@  static const struct ena_stats ena_stats_ena_com_strings[] = {
 #define ENA_STATS_ARRAY_ENA_COM		ARRAY_SIZE(ena_stats_ena_com_strings)
 #define ENA_STATS_ARRAY_ENI		ARRAY_SIZE(ena_stats_eni_strings)
 #define ENA_STATS_ARRAY_ENA_SRD		ARRAY_SIZE(ena_srd_info_strings)
+#define ENA_METRICS_ARRAY_ENI		ARRAY_SIZE(ena_hw_stats_strings)
 
 static void ena_safe_update_stat(u64 *src, u64 *dst,
 				 struct u64_stats_sync *syncp)
@@ -142,6 +163,57 @@  static void ena_safe_update_stat(u64 *src, u64 *dst,
 	} while (u64_stats_fetch_retry(syncp, start));
 }
 
+static void ena_metrics_stats(struct ena_adapter *adapter, u64 **data)
+{
+	struct ena_com_dev *dev = adapter->ena_dev;
+	const struct ena_stats *ena_stats;
+	u64 *ptr;
+	int i;
+
+	if (ena_com_get_cap(dev, ENA_ADMIN_CUSTOMER_METRICS)) {
+		u32 supported_metrics_count;
+		int len;
+
+		supported_metrics_count = ena_com_get_customer_metric_count(dev);
+		len = supported_metrics_count * sizeof(u64);
+
+		/* Fill the data buffer, and advance its pointer */
+		ena_com_get_customer_metrics(adapter->ena_dev, (char *)(*data), len);
+		(*data) += supported_metrics_count;
+
+	} else if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENI_STATS)) {
+		ena_com_get_eni_stats(adapter->ena_dev, &adapter->eni_stats);
+		/* Updating regardless of rc - once we told ethtool how many stats we have
+		 * it will print that much stats. We can't leave holes in the stats
+		 */
+		for (i = 0; i < ENA_STATS_ARRAY_ENI; i++) {
+			ena_stats = &ena_stats_eni_strings[i];
+
+			ptr = (u64 *)&adapter->eni_stats +
+				ena_stats->stat_offset;
+
+			ena_safe_update_stat(ptr, (*data)++, &adapter->syncp);
+		}
+	}
+
+	if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENA_SRD_INFO)) {
+		ena_com_get_ena_srd_info(adapter->ena_dev, &adapter->ena_srd_info);
+		/* Get ENA SRD mode */
+		ptr = (u64 *)&adapter->ena_srd_info;
+		ena_safe_update_stat(ptr, (*data)++, &adapter->syncp);
+		for (i = 1; i < ENA_STATS_ARRAY_ENA_SRD; i++) {
+			ena_stats = &ena_srd_info_strings[i];
+			/* Wrapped within an outer struct - need to accommodate an
+			 * additional offset of the ENA SRD mode that was already processed
+			 */
+			ptr = (u64 *)&adapter->ena_srd_info +
+				ena_stats->stat_offset + 1;
+
+			ena_safe_update_stat(ptr, (*data)++, &adapter->syncp);
+		}
+	}
+}
+
 static void ena_queue_stats(struct ena_adapter *adapter, u64 **data)
 {
 	const struct ena_stats *ena_stats;
@@ -210,39 +282,8 @@  static void ena_get_stats(struct ena_adapter *adapter,
 		ena_safe_update_stat(ptr, data++, &adapter->syncp);
 	}
 
-	if (hw_stats_needed) {
-		if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENI_STATS)) {
-			ena_com_get_eni_stats(adapter->ena_dev, &adapter->eni_stats);
-			/* Updating regardless of rc - once we told ethtool how many stats we have
-			 * it will print that much stats. We can't leave holes in the stats
-			 */
-			for (i = 0; i < ENA_STATS_ARRAY_ENI; i++) {
-				ena_stats = &ena_stats_eni_strings[i];
-
-				ptr = (u64 *)&adapter->eni_stats +
-					ena_stats->stat_offset;
-
-				ena_safe_update_stat(ptr, data++, &adapter->syncp);
-			}
-		}
-
-		if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENA_SRD_INFO)) {
-			ena_com_get_ena_srd_info(adapter->ena_dev, &adapter->ena_srd_info);
-			/* Get ENA SRD mode */
-			ptr = (u64 *)&adapter->ena_srd_info;
-			ena_safe_update_stat(ptr, data++, &adapter->syncp);
-			for (i = 1; i < ENA_STATS_ARRAY_ENA_SRD; i++) {
-				ena_stats = &ena_srd_info_strings[i];
-				/* Wrapped within an outer struct - need to accommodate an
-				 * additional offset of the ENA SRD mode that was already processed
-				 */
-				ptr = (u64 *)&adapter->ena_srd_info +
-					ena_stats->stat_offset + 1;
-
-				ena_safe_update_stat(ptr, data++, &adapter->syncp);
-			}
-		}
-	}
+	if (hw_stats_needed)
+		ena_metrics_stats(adapter, &data);
 
 	ena_queue_stats(adapter, &data);
 	ena_dev_admin_queue_stats(adapter, &data);
@@ -266,8 +307,16 @@  static int ena_get_sw_stats_count(struct ena_adapter *adapter)
 
 static int ena_get_hw_stats_count(struct ena_adapter *adapter)
 {
-	return ENA_STATS_ARRAY_ENI * ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENI_STATS) +
-	       ENA_STATS_ARRAY_ENA_SRD * ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENA_SRD_INFO);
+	struct ena_com_dev *dev = adapter->ena_dev;
+	int count = ENA_STATS_ARRAY_ENA_SRD *
+			ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENA_SRD_INFO);
+
+	if (ena_com_get_cap(dev, ENA_ADMIN_CUSTOMER_METRICS))
+		count += ena_com_get_customer_metric_count(dev);
+	else if (ena_com_get_cap(dev, ENA_ADMIN_ENI_STATS))
+		count += ENA_STATS_ARRAY_ENI;
+
+	return count;
 }
 
 int ena_get_sset_count(struct net_device *netdev, int sset)
@@ -283,6 +332,35 @@  int ena_get_sset_count(struct net_device *netdev, int sset)
 	return -EOPNOTSUPP;
 }
 
+static void ena_metrics_stats_strings(struct ena_adapter *adapter, u8 **data)
+{
+	struct ena_com_dev *dev = adapter->ena_dev;
+	const struct ena_hw_metrics *ena_metrics;
+	const struct ena_stats *ena_stats;
+	int i;
+
+	if (ena_com_get_cap(dev, ENA_ADMIN_CUSTOMER_METRICS)) {
+		for (i = 0; i < ENA_METRICS_ARRAY_ENI; i++) {
+			if (ena_com_get_customer_metric_support(dev, i)) {
+				ena_metrics = &ena_hw_stats_strings[i];
+				ethtool_puts(data, ena_metrics->name);
+			}
+		}
+	} else if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENI_STATS)) {
+		for (i = 0; i < ENA_STATS_ARRAY_ENI; i++) {
+			ena_stats = &ena_stats_eni_strings[i];
+			ethtool_puts(data, ena_stats->name);
+		}
+	}
+
+	if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENA_SRD_INFO)) {
+		for (i = 0; i < ENA_STATS_ARRAY_ENA_SRD; i++) {
+			ena_stats = &ena_srd_info_strings[i];
+			ethtool_puts(data, ena_stats->name);
+		}
+	}
+}
+
 static void ena_queue_strings(struct ena_adapter *adapter, u8 **data)
 {
 	const struct ena_stats *ena_stats;
@@ -338,20 +416,8 @@  static void ena_get_strings(struct ena_adapter *adapter,
 		ethtool_puts(&data, ena_stats->name);
 	}
 
-	if (hw_stats_needed) {
-		if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENI_STATS)) {
-			for (i = 0; i < ENA_STATS_ARRAY_ENI; i++) {
-				ena_stats = &ena_stats_eni_strings[i];
-				ethtool_puts(&data, ena_stats->name);
-			}
-		}
-		if (ena_com_get_cap(adapter->ena_dev, ENA_ADMIN_ENA_SRD_INFO)) {
-			for (i = 0; i < ENA_STATS_ARRAY_ENA_SRD; i++) {
-				ena_stats = &ena_srd_info_strings[i];
-				ethtool_puts(&data, ena_stats->name);
-			}
-		}
-	}
+	if (hw_stats_needed)
+		ena_metrics_stats_strings(adapter, &data);
 
 	ena_queue_strings(adapter, &data);
 	ena_com_dev_strings(&data);
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 0883c9a2..c5b50cfa 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3931,10 +3931,16 @@  static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	pci_set_drvdata(pdev, adapter);
 
+	rc = ena_com_allocate_customer_metrics_buffer(ena_dev);
+	if (rc) {
+		netdev_err(netdev, "ena_com_allocate_customer_metrics_buffer failed\n");
+		goto err_netdev_destroy;
+	}
+
 	rc = ena_map_llq_mem_bar(pdev, ena_dev, bars);
 	if (rc) {
 		dev_err(&pdev->dev, "ENA LLQ bar mapping failed\n");
-		goto err_netdev_destroy;
+		goto err_metrics_destroy;
 	}
 
 	rc = ena_device_init(adapter, pdev, &get_feat_ctx, &wd_state);
@@ -3942,7 +3948,7 @@  static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev_err(&pdev->dev, "ENA device init failed\n");
 		if (rc == -ETIME)
 			rc = -EPROBE_DEFER;
-		goto err_netdev_destroy;
+		goto err_metrics_destroy;
 	}
 
 	/* Initial TX and RX interrupt delay. Assumes 1 usec granularity.
@@ -4063,6 +4069,8 @@  err_worker_destroy:
 err_device_destroy:
 	ena_com_delete_host_info(ena_dev);
 	ena_com_admin_destroy(ena_dev);
+err_metrics_destroy:
+	ena_com_delete_customer_metrics_buffer(ena_dev);
 err_netdev_destroy:
 	free_netdev(netdev);
 err_free_region:
@@ -4126,6 +4134,8 @@  static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
 
 	ena_com_delete_host_info(ena_dev);
 
+	ena_com_delete_customer_metrics_buffer(ena_dev);
+
 	ena_release_bars(ena_dev, pdev);
 
 	pci_disable_device(pdev);