[rdma-next,00/10] Optional counter statistics support

Message ID: 20210818112428.209111-1-markzhang@nvidia.com
Series: Optional counter statistics support

Message

Mark Zhang Aug. 18, 2021, 11:24 a.m. UTC
Hi,

This series from Aharon and Neta extends the rdma statistics tool so
that optional counters can be added and removed dynamically, using new
netlink commands.

The idea behind optional counters is to let users get statistics from
counters whose collection hurts performance.

Once an optional counter has been added, its statistics are shown along
with all the other counters by the show command.

Binding objects to optional counters is currently not supported, in
either auto mode or manual mode.

To get the list of optional counters that are supported on this device,
use "rdma statistic mode supported". To see which counters are currently
enabled, use "rdma statistic mode".

$ rdma statistic mode supported
link rocep8s0f0/1
    Optional-set: cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts
link rocep8s0f1/1
    Optional-set: cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts

$ sudo rdma statistic add link rocep8s0f0/1 optional-set cc_rx_ce_pkts
$ rdma statistic mode
link rocep8s0f0/1
    Optional-set: cc_rx_ce_pkts
$ sudo rdma statistic add link rocep8s0f0/1 optional-set cc_tx_cnp_pkts
$ rdma statistic mode
link rocep8s0f0/1
    Optional-set: cc_rx_ce_pkts cc_tx_cnp_pkts

$ rdma statistic show link rocep8s0f0/1
link rocep8s0f0/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0
out_of_sequence 0 duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0
local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0 req_cqe_error 0
req_remote_invalid_request 0 req_remote_access_errors 0 resp_remote_access_errors 0
resp_cqe_flush_error 0 req_cqe_flush_error 0 roce_adp_retrans 0 roce_adp_retrans_to 0
roce_slow_restart 0 roce_slow_restart_cnps 0 roce_slow_restart_trans 0 rp_cnp_ignored 0
rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0 rx_icrc_encapsulated 0
    Optional-set: cc_rx_ce_pkts 0 cc_tx_cnp_pkts 0

$ sudo rdma statistic remove link rocep8s0f0/1 optional-set cc_rx_ce_pkts
$ sudo rdma statistic remove link rocep8s0f0/1 optional-set cc_tx_cnp_pkts
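The add/mode/remove session above amounts to per-port bookkeeping: a fixed set of supported optional counters plus a record of which ones are currently enabled. A minimal C sketch of that bookkeeping follows, assuming the three counter names from the example output; `op_port_stats`, `op_add`, `op_remove` and `op_is_enabled` are hypothetical names, not the RDMA core or mlx5 interfaces the series adds.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical supported set, taken from the example output above. */
static const char *const op_names[] = {
	"cc_rx_ce_pkts", "cc_rx_cnp_pkts", "cc_tx_cnp_pkts",
};
#define NUM_OP ((int)(sizeof(op_names) / sizeof(op_names[0])))

/* Hypothetical per-port state: bit i set means op_names[i] is enabled. */
struct op_port_stats {
	uint32_t enabled;
};

/* Return the index of a supported optional counter, or -1. */
static int op_lookup(const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < NUM_OP; i++)
		if (strcmp(op_names[i], name) == 0)
			return i;
	return -1;
}

/* Roughly what "rdma statistic add ... optional-set NAME" asks for. */
static int op_add(struct op_port_stats *p, const char *name)
{
	int i = op_lookup(name);

	if (i < 0)
		return -EINVAL;         /* not in the supported set */
	p->enabled |= 1u << i;      /* adding twice is harmless here */
	return 0;
}

/* Roughly what "rdma statistic remove ... optional-set NAME" asks for. */
static int op_remove(struct op_port_stats *p, const char *name)
{
	int i = op_lookup(name);

	if (i < 0)
		return -EINVAL;
	if (!(p->enabled & (1u << i)))
		return -ENOENT;         /* supported but not enabled */
	p->enabled &= ~(1u << i);
	return 0;
}

static int op_is_enabled(const struct op_port_stats *p, const char *name)
{
	int i = op_lookup(name);

	return i >= 0 && (p->enabled & (1u << i)) != 0;
}
```

In this model, `rdma statistic mode supported` would list the whole `op_names` table and `rdma statistic mode` only the entries whose bit is set.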

Thanks

Aharon Landau (9):
  net/mlx5: Add support in bth_opcode as a match criteria
  net/mlx5: Add priorities for counters in RDMA namespaces
  RDMA/counters: Support to allocate per-port optional counter
    statistics
  RDMA/mlx5: Add alloc_op_port_stats() support
  RDMA/mlx5: Add steering support in optional flow counters
  RDMA/nldev: Add support to add and remove optional counters
  RDMA/mlx5: Add add_op_stat() and remove_op_stat() support
  RDMA/mlx5: Add get_op_stats() support
  RDMA/nldev: Add support to get current enabled optional counters

Neta Ostrovsky (1):
  RDMA/nldev: Add support to get optional counters statistics

 drivers/infiniband/core/counters.c            |  86 +++++
 drivers/infiniband/core/device.c              |   4 +
 drivers/infiniband/core/nldev.c               | 297 ++++++++++++++++--
 drivers/infiniband/hw/mlx5/counters.c         | 157 ++++++++-
 drivers/infiniband/hw/mlx5/fs.c               | 111 +++++++
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  21 ++
 .../net/ethernet/mellanox/mlx5/core/fs_core.c |  54 +++-
 include/linux/mlx5/device.h                   |   2 +
 include/linux/mlx5/fs.h                       |   2 +
 include/linux/mlx5/mlx5_ifc.h                 |  20 +-
 include/rdma/ib_hdrs.h                        |   1 +
 include/rdma/ib_verbs.h                       |  36 +++
 include/rdma/rdma_counter.h                   |   8 +
 include/rdma/rdma_netlink.h                   |   1 +
 include/uapi/rdma/rdma_netlink.h              |  11 +
 15 files changed, 775 insertions(+), 36 deletions(-)

Comments

Jason Gunthorpe Aug. 23, 2021, 7:33 p.m. UTC | #1
On Wed, Aug 18, 2021 at 02:24:18PM +0300, Mark Zhang wrote:
> Hi,
> 
> This series from Aharon and Neta extends the rdma statistics tool so
> that optional counters can be added and removed dynamically, using new
> netlink commands.
>
> The idea behind optional counters is to let users get statistics from
> counters whose collection hurts performance.
>
> Once an optional counter has been added, its statistics are shown along
> with all the other counters by the show command.
>
> Binding objects to optional counters is currently not supported, in
> either auto mode or manual mode.
> 
> To get the list of optional counters that are supported on this device,
> use "rdma statistic mode supported". To see which counters are currently
> enabled, use "rdma statistic mode".
> 
> $ rdma statistic mode supported
> link rocep8s0f0/1
>     Optional-set: cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts
> link rocep8s0f1/1
>     Optional-set: cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts
> 
> $ sudo rdma statistic add link rocep8s0f0/1 optional-set cc_rx_ce_pkts
> $ rdma statistic mode
> link rocep8s0f0/1
>     Optional-set: cc_rx_ce_pkts
> $ sudo rdma statistic add link rocep8s0f0/1 optional-set cc_tx_cnp_pkts
> $ rdma statistic mode
> link rocep8s0f0/1
>     Optional-set: cc_rx_ce_pkts cc_tx_cnp_pkts

This doesn't look like the right output for iproute to me. The two
commands should not be using the same tag, and the output of iproute
should always be formed so that it is valid input to iproute.


> $ rdma statistic show link rocep8s0f0/1
> link rocep8s0f0/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0
> out_of_sequence 0 duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0
> local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0 req_cqe_error 0
> req_remote_invalid_request 0 req_remote_access_errors 0 resp_remote_access_errors 0
> resp_cqe_flush_error 0 req_cqe_flush_error 0 roce_adp_retrans 0 roce_adp_retrans_to 0
> roce_slow_restart 0 roce_slow_restart_cnps 0 roce_slow_restart_trans 0 rp_cnp_ignored 0
> rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0 rx_icrc_encapsulated 0
>     Optional-set: cc_rx_ce_pkts 0 cc_tx_cnp_pkts 0

This also looks bad; optional counters should not be marked specially
at this point.

> Aharon Landau (9):
>   net/mlx5: Add support in bth_opcode as a match criteria
>   net/mlx5: Add priorities for counters in RDMA namespaces
>   RDMA/counters: Support to allocate per-port optional counter
>     statistics
>   RDMA/mlx5: Add alloc_op_port_stats() support
>   RDMA/mlx5: Add steering support in optional flow counters
>   RDMA/nldev: Add support to add and remove optional counters
>   RDMA/mlx5: Add add_op_stat() and remove_op_stat() support
>   RDMA/mlx5: Add get_op_stats() support
>   RDMA/nldev: Add support to get current enabled optional counters
> 
> Neta Ostrovsky (1):
>   RDMA/nldev: Add support to get optional counters statistics

This series is poorly ordered: all the core updates should come first,
and the commit messages should explain what is going on when building
out the new APIs.

The RDMA/mlx5 patches can go last.

Jason

Mark Zhang Aug. 24, 2021, 1:44 a.m. UTC | #2
On 8/24/2021 3:33 AM, Jason Gunthorpe wrote:
> On Wed, Aug 18, 2021 at 02:24:18PM +0300, Mark Zhang wrote:
>> Hi,
>>
>> This series from Aharon and Neta extends the rdma statistics tool so
>> that optional counters can be added and removed dynamically, using new
>> netlink commands.
>>
>> The idea behind optional counters is to let users get statistics from
>> counters whose collection hurts performance.
>>
>> Once an optional counter has been added, its statistics are shown along
>> with all the other counters by the show command.
>>
>> Binding objects to optional counters is currently not supported, in
>> either auto mode or manual mode.
>>
>> To get the list of optional counters that are supported on this device,
>> use "rdma statistic mode supported". To see which counters are currently
>> enabled, use "rdma statistic mode".
>>
>> $ rdma statistic mode supported
>> link rocep8s0f0/1
>>      Optional-set: cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts
>> link rocep8s0f1/1
>>      Optional-set: cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts
>>
>> $ sudo rdma statistic add link rocep8s0f0/1 optional-set cc_rx_ce_pkts
>> $ rdma statistic mode
>> link rocep8s0f0/1
>>      Optional-set: cc_rx_ce_pkts
>> $ sudo rdma statistic add link rocep8s0f0/1 optional-set cc_tx_cnp_pkts
>> $ rdma statistic mode
>> link rocep8s0f0/1
>>      Optional-set: cc_rx_ce_pkts cc_tx_cnp_pkts
> 
> This doesn't look like the right output for iproute to me. The two
> commands should not be using the same tag, and the output of iproute
> should always be formed so that it is valid input to iproute.

So it should be like this:

$ rdma statistic mode supported
link rocep8s0f0/1 optional-set cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts
link rocep8s0f1/1 optional-set cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts

$ sudo rdma statistic add link rocep8s0f0/1 optional-set cc_rx_ce_pkts
$ rdma statistic mode
link rocep8s0f0/1 optional-set cc_rx_ce_pkts
$ sudo rdma statistic add link rocep8s0f0/1 optional-set cc_tx_cnp_pkts
$ rdma statistic mode
link rocep8s0f0/1 optional-set cc_rx_ce_pkts cc_tx_cnp_pkts
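Jason's rule that output should be valid input is easiest to see as a round trip: a line in the flat form proposed above can be produced and then tokenized back into the same fields, whereas the earlier two-line `Optional-set:` layout cannot. The C sketch below assumes exactly that one-line syntax; `struct mode_line`, `mode_line_format` and `mode_line_parse` are hypothetical helpers, not code from iproute2's rdma tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 8
#define NAME_LEN 32

/* Hypothetical in-memory form of one "rdma statistic mode" output line. */
struct mode_line {
	char link[NAME_LEN];            /* e.g. "rocep8s0f0/1" */
	int n;                          /* number of counter names */
	char names[MAX_NAMES][NAME_LEN];
};

/* Write "link DEV/PORT optional-set NAME ..." into buf; 0 on success. */
static int mode_line_format(const struct mode_line *m, char *buf, size_t len)
{
	int off = snprintf(buf, len, "link %s optional-set", m->link);

	for (int i = 0; i < m->n && off >= 0 && (size_t)off < len; i++)
		off += snprintf(buf + off, len - off, " %s", m->names[i]);
	return (off >= 0 && (size_t)off < len) ? 0 : -1;
}

/* Parse the same space-separated syntax back; 0 on success. */
static int mode_line_parse(const char *line, struct mode_line *m)
{
	char copy[512];
	char *tok;

	assert(line != NULL && m != NULL);
	if (strlen(line) >= sizeof(copy))
		return -1;
	strcpy(copy, line);
	memset(m, 0, sizeof(*m));

	tok = strtok(copy, " ");
	if (!tok || strcmp(tok, "link") != 0)
		return -1;
	tok = strtok(NULL, " ");
	if (!tok || strlen(tok) >= NAME_LEN)
		return -1;
	strcpy(m->link, tok);
	tok = strtok(NULL, " ");
	if (!tok || strcmp(tok, "optional-set") != 0)
		return -1;              /* e.g. the old "Optional-set:" label */
	while ((tok = strtok(NULL, " ")) != NULL) {
		if (m->n == MAX_NAMES || strlen(tok) >= NAME_LEN)
			return -1;
		strcpy(m->names[m->n++], tok);
	}
	return 0;
}
```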

> 
>> $ rdma statistic show link rocep8s0f0/1
>> link rocep8s0f0/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0
>> out_of_sequence 0 duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0
>> local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0 req_cqe_error 0
>> req_remote_invalid_request 0 req_remote_access_errors 0 resp_remote_access_errors 0
>> resp_cqe_flush_error 0 req_cqe_flush_error 0 roce_adp_retrans 0 roce_adp_retrans_to 0
>> roce_slow_restart 0 roce_slow_restart_cnps 0 roce_slow_restart_trans 0 rp_cnp_ignored 0
>> rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0 rx_icrc_encapsulated 0
>>      Optional-set: cc_rx_ce_pkts 0 cc_tx_cnp_pkts 0
> 
> This also looks bad; optional counters should not be marked specially
> at this point.

I will put the optional counters last, like this:

$ rdma statistic show link rocep8s0f0/1
link rocep8s0f0/1 rx_write_requests 0 rx_read_requests 0 rx_atomic_requests 0 out_of_buffer 0
out_of_sequence 0 duplicate_request 0 rnr_nak_retry_err 0 packet_seq_err 0 implied_nak_seq_err 0
local_ack_timeout_err 0 resp_local_length_error 0 resp_cqe_error 0 req_cqe_error 0
req_remote_invalid_request 0 req_remote_access_errors 0 resp_remote_access_errors 0
resp_cqe_flush_error 0 req_cqe_flush_error 0 roce_adp_retrans 0 roce_adp_retrans_to 0
roce_slow_restart 0 roce_slow_restart_cnps 0 roce_slow_restart_trans 0 rp_cnp_ignored 0
rp_cnp_handled 0 np_ecn_marked_roce_packets 0 np_cnp_sent 0 rx_icrc_encapsulated 0 cc_rx_ce_pkts 0 cc_tx_cnp_pkts 0
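The ordering Mark proposes, with mandatory counters first and the enabled optional counters appended without a separate label, can be sketched as a two-pass formatter. The `struct counter` layout and `format_show_line` are hypothetical, chosen only to illustrate the output shape; they are not the nldev or iproute2 code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counter record, not the kernel's rdma_hw_stats layout. */
struct counter {
	const char *name;
	uint64_t value;
	int optional;   /* 1 for an optional counter, 0 otherwise */
	int enabled;    /* only consulted for optional counters */
};

/*
 * Format "link DEV/PORT name value ..." into buf: pass 0 emits the
 * mandatory counters, pass 1 appends the enabled optional ones, with no
 * "Optional-set:" marker. Returns the number of counters written, or -1
 * if buf is too small.
 */
static int format_show_line(char *buf, size_t len, const char *link,
			    const struct counter *c, int n)
{
	int printed = 0;
	int off;

	assert(buf != NULL && link != NULL);
	off = snprintf(buf, len, "link %s", link);
	for (int pass = 0; pass < 2; pass++) {
		for (int i = 0; i < n; i++) {
			if (c[i].optional != pass)
				continue;
			if (c[i].optional && !c[i].enabled)
				continue;
			if (off < 0 || (size_t)off >= len)
				return -1;
			off += snprintf(buf + off, len - off, " %s %llu",
					c[i].name,
					(unsigned long long)c[i].value);
			printed++;
		}
	}
	return (off >= 0 && (size_t)off < len) ? printed : -1;
}
```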

>> Aharon Landau (9):
>>    net/mlx5: Add support in bth_opcode as a match criteria
>>    net/mlx5: Add priorities for counters in RDMA namespaces
>>    RDMA/counters: Support to allocate per-port optional counter
>>      statistics
>>    RDMA/mlx5: Add alloc_op_port_stats() support
>>    RDMA/mlx5: Add steering support in optional flow counters
>>    RDMA/nldev: Add support to add and remove optional counters
>>    RDMA/mlx5: Add add_op_stat() and remove_op_stat() support
>>    RDMA/mlx5: Add get_op_stats() support
>>    RDMA/nldev: Add support to get current enabled optional counters
>>
>> Neta Ostrovsky (1):
>>    RDMA/nldev: Add support to get optional counters statistics
> 
> This series is poorly ordered: all the core updates should come first,
> and the commit messages should explain what is going on when building
> out the new APIs.
> 
> The RDMA/mlx5 patches can go last.

Will fix it, thanks Jason.

> Jason
Jason Gunthorpe Aug. 24, 2021, 1:11 p.m. UTC | #3
On Tue, Aug 24, 2021 at 09:44:26AM +0800, Mark Zhang wrote:
> On 8/24/2021 3:33 AM, Jason Gunthorpe wrote:
> > On Wed, Aug 18, 2021 at 02:24:18PM +0300, Mark Zhang wrote:
> > > Hi,
> > > 
> > > This series from Aharon and Neta extends the rdma statistics tool so
> > > that optional counters can be added and removed dynamically, using new
> > > netlink commands.
> > >
> > > The idea behind optional counters is to let users get statistics from
> > > counters whose collection hurts performance.
> > >
> > > Once an optional counter has been added, its statistics are shown along
> > > with all the other counters by the show command.
> > >
> > > Binding objects to optional counters is currently not supported, in
> > > either auto mode or manual mode.
> > > 
> > > To get the list of optional counters that are supported on this device,
> > > use "rdma statistic mode supported". To see which counters are currently
> > > enabled, use "rdma statistic mode".
> > > 
> > > $ rdma statistic mode supported
> > > link rocep8s0f0/1
> > >      Optional-set: cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts
> > > link rocep8s0f1/1
> > >      Optional-set: cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts
> > > 
> > > $ sudo rdma statistic add link rocep8s0f0/1 optional-set cc_rx_ce_pkts
> > > $ rdma statistic mode
> > > link rocep8s0f0/1
> > >      Optional-set: cc_rx_ce_pkts
> > > $ sudo rdma statistic add link rocep8s0f0/1 optional-set cc_tx_cnp_pkts
> > > $ rdma statistic mode
> > > link rocep8s0f0/1
> > >      Optional-set: cc_rx_ce_pkts cc_tx_cnp_pkts
> > 
> > This doesn't look like the right output for iproute to me. The two
> > commands should not be using the same tag, and the output of iproute
> > should always be formed so that it is valid input to iproute.
> 
> So it should be like this:
> 
> $ rdma statistic mode supported
> link rocep8s0f0/1 optional-set cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts
> link rocep8s0f1/1 optional-set cc_rx_ce_pkts cc_rx_cnp_pkts cc_tx_cnp_pkts

Each netlink tag in the protocol should have a unique string in the
output. So you need distinct strings that mean "optional set supported"
and "optional set currently enabled".

Jason
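Jason's rule can be made mechanical with one table that maps each netlink attribute to its own output keyword, plus a check that no keyword is reused; the reverse lookup is what lets a parser tell "supported" from "currently enabled" when reading output back. In this C sketch the attribute IDs and keyword strings are hypothetical placeholders; the series' real attributes (it modifies include/uapi/rdma/rdma_netlink.h) and the strings iproute2 eventually prints are not shown in this thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical attribute IDs standing in for the series' netlink attributes. */
enum op_attr {
	OP_ATTR_SUPPORTED,      /* optional counters the port supports */
	OP_ATTR_ENABLED,        /* optional counters currently enabled */
	OP_ATTR_MAX
};

/* One distinct output keyword per attribute (placeholder strings). */
static const char *const op_attr_str[OP_ATTR_MAX] = {
	[OP_ATTR_SUPPORTED] = "optional-counters-supported",
	[OP_ATTR_ENABLED]   = "optional-counters",
};

/* Keywords must be injective: no two attributes may share a string. */
static int op_attr_strings_unique(void)
{
	for (int i = 0; i < OP_ATTR_MAX; i++)
		for (int j = i + 1; j < OP_ATTR_MAX; j++)
			if (strcmp(op_attr_str[i], op_attr_str[j]) == 0)
				return 0;
	return 1;
}

/* Map a keyword read from the output back to its attribute, or -1. */
static int op_attr_from_str(const char *s)
{
	assert(s != NULL);
	for (int i = 0; i < OP_ATTR_MAX; i++)
		if (strcmp(op_attr_str[i], s) == 0)
			return i;
	return -1;
}
```

With a single shared `optional-set` keyword for both attributes, `op_attr_strings_unique()` would return 0, which is exactly the ambiguity Jason objects to.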