diff mbox series

[v8,net-next,5/5] net: ena: Add PHC documentation

Message ID 20250304190504.3743-6-darinzon@amazon.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series PHC support in ENA driver | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 2 maintainers not CCed: corbet@lwn.net linux-doc@vger.kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 111 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2025-03-04--21-00 (tests: 894)

Commit Message

Arinzon, David March 4, 2025, 7:05 p.m. UTC
Provide the relevant information and guidelines
about the feature support in the ENA driver.

Signed-off-by: Amit Bernstein <amitbern@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
---
 .../device_drivers/ethernet/amazon/ena.rst    | 96 +++++++++++++++++++
 1 file changed, 96 insertions(+)

Comments

Andrew Lunn March 4, 2025, 9:10 p.m. UTC | #1
> +The feature is turned off by default, in order to turn the feature on,
> +please use the following:
> +
> +- sysfs (during runtime):
> +
> +.. code-block:: shell
> +
> +  echo 1 > /sys/bus/pci/devices/<domain:bus:slot.function>/phc_enable
> +
> +All available PTP clock sources can be tracked here:
> +
> +.. code-block:: shell
> +
> +  ls /sys/class/ptp
> +
> +PHC support and capabilities can be verified using ethtool:
> +
> +.. code-block:: shell
> +
> +  ethtool -T <interface>
> +
> +**PHC timestamp**
> +
> +To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example using `testptp`_:
> +
> +.. code-block:: shell
> +
> +  testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware Clock:/ {print $NF}') -k 1

Why is not opening /dev/ptpX sufficient to enable the PHC?

    Andrew
Arinzon, David March 5, 2025, 12:50 p.m. UTC | #2
> > +The feature is turned off by default, in order to turn the feature
> > +on, please use the following:
> > +
> > +- sysfs (during runtime):
> > +
> > +.. code-block:: shell
> > +
> > +  echo 1 > /sys/bus/pci/devices/<domain:bus:slot.function>/phc_enable
> > +
> > +All available PTP clock sources can be tracked here:
> > +
> > +.. code-block:: shell
> > +
> > +  ls /sys/class/ptp
> > +
> > +PHC support and capabilities can be verified using ethtool:
> > +
> > +.. code-block:: shell
> > +
> > +  ethtool -T <interface>
> > +
> > +**PHC timestamp**
> > +
> > +To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example
> using `testptp`_:
> > +
> > +.. code-block:: shell
> > +
> > +  testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware
> > + Clock:/ {print $NF}') -k 1
> 
> Why is not opening /dev/ptpX sufficient to enable the PHC?
> 
>     Andrew

Hi Andrew,

The reasoning for the enablement option of PHC was explained in patch 3 in the series
https://lore.kernel.org/netdev/20250304190504.3743-4-darinzon@amazon.com/

David
David Woodhouse March 5, 2025, 12:59 p.m. UTC | #3
On Wed, 2025-03-05 at 12:50 +0000, Arinzon, David wrote:
>  
> > Why is not opening /dev/ptpX sufficient to enable the PHC?
> > 
> >      Andrew
> 
> Hi Andrew,
> 
> The reasoning for the enablement option of PHC was explained in patch 3 in the series
> https://lore.kernel.org/netdev/20250304190504.3743-4-darinzon@amazon.com/

Not really. That says,

"The PHC feature is disabled by default, although its
footprint is small, most customers do not require such
high-precision timestamping. Enabling PHC unnecessarily
in environments that do not need it might introduce a
minor but unnecessary overhead."

Disable by default, sure. But it gives no explanation for why we need a
separate knob in sysfs to enable/disable it, and we can't just enable
it automatically when /dev/ptpX is opened as Andrew suggested.

If you read the actual code in that patch, there's a hint though.
Because the actual process in ena_phc_enable_set() does the following:

+	ena_destroy_device(adapter, false);
+	rc = ena_restore_device(adapter);

Is that actually tearing down the whole netdev and recreating it when
the PHC enable is flipped? Does it really have to? That doesn't seem
ideal even if it *is* a separate knob in sysfs.
Andrew Lunn March 5, 2025, 3:35 p.m. UTC | #4
> If you read the actual code in that patch, there's a hint though.
> Because the actual process in ena_phc_enable_set() does the following:
> 
> +	ena_destroy_device(adapter, false);
> +	rc = ena_restore_device(adapter);
> 
> Is that actually tearing down the whole netdev and recreating it when
> the PHC enable is flipped?

Well Jakub said it is a pure clock, not related to the netdev. If that
is true, i don't see why this should be needed. But i've not looked at
the code...

	Andrew
Leon Romanovsky March 5, 2025, 5:52 p.m. UTC | #5
On Wed, Mar 05, 2025 at 04:35:48PM +0100, Andrew Lunn wrote:
> > If you read the actual code in that patch, there's a hint though.
> > Because the actual process in ena_phc_enable_set() does the following:
> > 
> > +	ena_destroy_device(adapter, false);
> > +	rc = ena_restore_device(adapter);
> > 
> > Is that actually tearing down the whole netdev and recreating it when
> > the PHC enable is flipped?
> 
> Well Jakub said it is a pure clock, not related to the netdev. 

So why are you ready to merge the code which is not netdev, doesn't have
any association with netdev and doesn't follow netdev semantics (no
custom sysfs files)?

Thanks
Arinzon, David March 26, 2025, 3:32 p.m. UTC | #6
> > > If you read the actual code in that patch, there's a hint though.
> > > Because the actual process in ena_phc_enable_set() does the following:
> > >
> > > +   ena_destroy_device(adapter, false);
> > > +   rc = ena_restore_device(adapter);
> > >
> > > Is that actually tearing down the whole netdev and recreating it
> > > when the PHC enable is flipped?
> >
> > Well Jakub said it is a pure clock, not related to the netdev.

The PHC device is a PTP clock integrated with the networking device under the same PCI device.  
As previously mentioned in this thread, enabling or disabling the ENA PHC requires reconfiguring the ENA network device, 
which involves tearing down and recreating the netdev. 
This necessitates having a separate control knob.

Thanks,
David

> 
> So why are you ready to merge the code which is not netdev, doesn't have
> any association with netdev and doesn't follow netdev semantics (no custom
> sysfs files)?
> 
> Thanks
Leon Romanovsky March 27, 2025, 11:15 a.m. UTC | #7
On Wed, Mar 26, 2025 at 03:32:00PM +0000, Arinzon, David wrote:
> > > > If you read the actual code in that patch, there's a hint though.
> > > > Because the actual process in ena_phc_enable_set() does the following:
> > > >
> > > > +   ena_destroy_device(adapter, false);
> > > > +   rc = ena_restore_device(adapter);
> > > >
> > > > Is that actually tearing down the whole netdev and recreating it
> > > > when the PHC enable is flipped?
> > >
> > > Well Jakub said it is a pure clock, not related to the netdev.
> 
> The PHC device is a PTP clock integrated with the networking device under the same PCI device.  
> As previously mentioned in this thread, enabling or disabling the ENA PHC requires reconfiguring the ENA network device, 
> which involves tearing down and recreating the netdev. 
> This necessitates having a separate control knob.

I appreciate the desire to keep everything in one place and provide
custom, vendor specific sysfs files. However, there is standard way
(auxbus??) to solve it without need to reinvent anything.

Thanks

> 
> Thanks,
> David
> 
> > 
> > So why are you ready to merge the code which is not netdev, doesn't have
> > any association with netdev and doesn't follow netdev semantics (no custom
> > sysfs files)?
> > 
> > Thanks
David Woodhouse March 27, 2025, 12:01 p.m. UTC | #8
On Thu, 2025-03-27 at 13:15 +0200, Leon Romanovsky wrote:
> On Wed, Mar 26, 2025 at 03:32:00PM +0000, Arinzon, David wrote:
> > > > > If you read the actual code in that patch, there's a hint though.
> > > > > Because the actual process in ena_phc_enable_set() does the following:
> > > > > 
> > > > > +   ena_destroy_device(adapter, false);
> > > > > +   rc = ena_restore_device(adapter);
> > > > > 
> > > > > Is that actually tearing down the whole netdev and recreating it
> > > > > when the PHC enable is flipped?
> > > > 
> > > > Well Jakub said it is a pure clock, not related to the netdev.
> > 
> > The PHC device is a PTP clock integrated with the networking device under the same PCI device.  
> > As previously mentioned in this thread, enabling or disabling the
> > ENA PHC requires reconfiguring the ENA network device, 
> > which involves tearing down and recreating the netdev. 
> > This necessitates having a separate control knob.
> 
> I appreciate the desire to keep everything in one place and provide
> custom, vendor specific sysfs files. However, there is standard way
> (auxbus??) to solve it without need to reinvent anything.

Using auxbus would allow the PHC driver to be loaded as a separate
module. But AIUI enabling the PHC involves reconfiguring the device and
thus bringing the network device down and back up. Which isn't ideal.
It *isn't* as separate as auxbus would make it look.

We don't want to unconditionally enable the PHC support in the device
even when it's not being used, because of the potential blast radius of
such a change if *every* guest starts enabling it even when they don't
need to use it.
David Woodhouse April 1, 2025, 8:02 a.m. UTC | #9
On Tue, 2025-03-04 at 22:10 +0100, Andrew Lunn wrote:
> > +The feature is turned off by default, in order to turn the feature
> > on,
> > +please use the following:
> > +
> > +- sysfs (during runtime):
> > +
> > +.. code-block:: shell
> > +
> > +  echo 1 >
> > /sys/bus/pci/devices/<domain:bus:slot.function>/phc_enable
> > +
> > +All available PTP clock sources can be tracked here:
> > +
> > +.. code-block:: shell
> > +
> > +  ls /sys/class/ptp
> > +
> > +PHC support and capabilities can be verified using ethtool:
> > +
> > +.. code-block:: shell
> > +
> > +  ethtool -T <interface>
> > +
> > +**PHC timestamp**
> > +
> > +To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example
> > using `testptp`_:
> > +
> > +.. code-block:: shell
> > +
> > +  testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware
> > Clock:/ {print $NF}') -k 1
> 
> Why is not opening /dev/ptpX sufficient to enable the PHC?

There's currently no hook from ptp_open() and ptp_release() into the
implementation to do anything like that. It could be added, I suppose,
but it isn't a great experience because the act of opening it for the
first time would then take down the netdev.

None of the alternative bikeshed options that David is being bombarded
with here make sense to me, like auxdev or implicit opening, because
it's tied to the netdev.

In a future revision of the device itself, perhaps we can enable the
PHC completely independently of the network device. And then of course
we can just do it on ptp_open(), and we'll add the required hooks in
struct ptp_clock_info. But for now, that doesn't make sense.

I've been a maintainer. I remember looking at submissions and thinking
"there must be a better way to do this", then attempting five different
options and concluding that the original submission is the one I
dislike least after all. This seems like one of those times, with a
bunch of suggestions which ultimately give a worse user experience.
And a contributor caught in the cross-fire with no coherent guidance
about how to proceed.

I think the sysfs control is the best option here.

On the device side, I am internally applying the cluebat because this
whole thing was *predictable*. The PHC feature *should* have been
orthogonal, in the device, to the actual networking. So it can be
enabled or disabled at will. Eventually I hope a later version of the
device should fix that, and we can have a nice clean user interface —
the /dev/ptpX node can always exist, but it does nothing at all if it
isn't being used. In the interim, a sysfs knob to explicitly turn it on
seems like a reasonable approach to me.

I am ambivalent about *also* using auxdev, so that when the netdev is
told to enable the PHC feature, the auxdev 'appears' and a separate PTP
driver binds to it. It's kind of cute, but probably overkill, and not
what any other NIC driver does, is it? CONFIG_PTP_1588_CLOCK_OPTIONAL
exists to handle the case of NICs with optional PTP support.
(Actually it turns out I'm less ambivalent by the end of that paragraph
than I was at the start... :)
David Woodhouse April 1, 2025, 8:46 a.m. UTC | #10
On Tue, 2025-04-01 at 09:02 +0100, David Woodhouse wrote:
> 
> I think the sysfs control is the best option here.

Actually, it occurs to me that the best option is probably a module
parameter. If you have to take the network down and up to change the
mode, why not just unload and reload the module?

I understand there's a policy about that too, because we don't want
bespoke module parameters changing the behaviour of network devices,
which ought to be uniform. But this isn't changing the behaviour of the
network device per se.

Think of it as three drivers. One for the PCI device itself, which
registers one or two auxdevs (depending on a module option). This
module option is not on the network device driver. Then there's the
netdev and the PTP which bind to those auxdevs.

Now in *practice*, actually using auxdev is overkill and not consistent
with what other network+PTP drivers do, and I think it should remain in
a single driver. I'm using the separation to explain why a module
parameter wouldn't be conceptually on the "network" device, and why we
should consider this one of the cases where "policy" is just a
euphemism for not doing our own thinking.

Just put a module param on the PCI device driver and have done with it.
Jakub Kicinski April 2, 2025, 4:23 p.m. UTC | #11
On Tue, 01 Apr 2025 09:46:35 +0100 David Woodhouse wrote:
> On Tue, 2025-04-01 at 09:02 +0100, David Woodhouse wrote:
> > 
> > I think the sysfs control is the best option here.  
> 
> Actually, it occurs to me that the best option is probably a module
> parameter. If you have to take the network down and up to change the
> mode, why not just unload and reload the module?

We have something called devlink params, which support "configuration
modes" (= what level of reset is required to activate the new setting).
Maybe devlink param with cmode of "driver init" would be the best fit?

Module params are annoying because they are scoped to code / module not
instances of the device.
Leon Romanovsky April 2, 2025, 7:38 p.m. UTC | #12
On Wed, Apr 2, 2025, at 19:23, Jakub Kicinski wrote:
> On Tue, 01 Apr 2025 09:46:35 +0100 David Woodhouse wrote:
>> On Tue, 2025-04-01 at 09:02 +0100, David Woodhouse wrote:
>> > 
>> > I think the sysfs control is the best option here.  
>> 
>> Actually, it occurs to me that the best option is probably a module
>> parameter. If you have to take the network down and up to change the
>> mode, why not just unload and reload the module?
>
> We have something called devlink params, which support "configuration
> modes" (= what level of reset is required to activate the new setting).
> Maybe devlink param with cmode of "driver init" would be the best fit?

I had same feeling when I wrote my auxbus response. There is no reason to believe that ptp enable/disable knob won't be usable by other drivers 

It's universally usable, just not related to netdev sysfs layout.

Thanks 

>
> Module params are annoying because they are scoped to code / module not
> instances of the device.
diff mbox series

Patch

diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
index 4561e8ab..158d7100 100644
--- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
+++ b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst
@@ -53,9 +53,11 @@  ena_eth_io_defs.h   Definition of ENA data path interface.
 ena_common_defs.h   Common definitions for ena_com layer.
 ena_regs_defs.h     Definition of ENA PCI memory-mapped (MMIO) registers.
 ena_netdev.[ch]     Main Linux kernel driver.
+ena_sysfs.[ch]      Sysfs files.
 ena_ethtool.c       ethtool callbacks.
 ena_xdp.[ch]        XDP files
 ena_pci_id_tbl.h    Supported device IDs.
+ena_phc.[ch]        PTP hardware clock infrastructure (see `PHC`_ for more info)
 =================   ======================================================
 
 Management Interface:
@@ -221,6 +223,100 @@  descriptor it was received on would be recycled. When a packet smaller
 than RX copybreak bytes is received, it is copied into a new memory
 buffer and the RX descriptor is returned to HW.
 
+.. _`PHC`:
+
+PTP Hardware Clock (PHC)
+========================
+.. _`ptp-userspace-api`: https://docs.kernel.org/driver-api/ptp.html#ptp-hardware-clock-user-space-api
+.. _`testptp`: https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/ptp/testptp.c
+
+ENA Linux driver supports PTP hardware clock providing timestamp reference to achieve nanosecond resolution.
+
+**PHC support**
+
+PHC depends on the PTP module, which needs to be either loaded as a module or compiled into the kernel.
+
+Verify if the PTP module is present:
+
+.. code-block:: shell
+
+  grep -w '^CONFIG_PTP_1588_CLOCK=[ym]' /boot/config-`uname -r`
+
+- If no output is provided, the ENA driver cannot be loaded with PHC support.
+
+- ``CONFIG_PTP_1588_CLOCK=y``: the PTP module is already compiled and loaded inside the kernel binary file.
+
+- ``CONFIG_PTP_1588_CLOCK=m``: the PTP module needs to be loaded prior to loading the ENA driver:
+
+Load PTP module:
+
+.. code-block:: shell
+
+  sudo modprobe ptp
+
+**PHC activation**
+
+The feature is turned off by default, in order to turn the feature on,
+please use the following:
+
+- sysfs (during runtime):
+
+.. code-block:: shell
+
+  echo 1 > /sys/bus/pci/devices/<domain:bus:slot.function>/phc_enable
+
+All available PTP clock sources can be tracked here:
+
+.. code-block:: shell
+
+  ls /sys/class/ptp
+
+PHC support and capabilities can be verified using ethtool:
+
+.. code-block:: shell
+
+  ethtool -T <interface>
+
+**PHC timestamp**
+
+To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example using `testptp`_:
+
+.. code-block:: shell
+
+  testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware Clock:/ {print $NF}') -k 1
+
+PHC get time requests should be within reasonable bounds,
+avoid excessive utilization to ensure optimal performance and efficiency.
+The ENA device restricts the frequency of PHC get time requests to a maximum
+of 125 requests per second. If this limit is surpassed, the get time request
+will fail, leading to an increment in the phc_err statistic.
+
+**PHC statistics**
+
+PHC counters can be monitored using the following:
+
+sysfs:
+
+.. code-block:: shell
+
+  cat /sys/bus/pci/devices/<domain:bus:slot.function>/phc_stats/<stat_name>
+
+=================   ======================================================
+**phc_cnt**         Number of successful retrieved timestamps (below expire timeout).
+**phc_exp**         Number of expired retrieved timestamps (above expire timeout).
+**phc_skp**         Number of skipped get time attempts (during block period).
+**phc_err**         Number of failed get time attempts (entering into block state).
+=================   ======================================================
+
+PHC timeouts:
+
+=================   ======================================================
+**expire**          Max time for a valid timestamp retrieval, passing this threshold will fail
+                    the get time request and block new requests until block timeout.
+**block**           Blocking period starts once get time request expires or fails, all get time
+                    requests during block period will be skipped.
+=================   ======================================================
+
 Statistics
 ==========