Message ID | 20250304190504.3743-6-darinzon@amazon.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | PHC support in ENA driver | expand |
> +The feature is turned off by default, in order to turn the feature on, > +please use the following: > + > +- sysfs (during runtime): > + > +.. code-block:: shell > + > + echo 1 > /sys/bus/pci/devices/<domain:bus:slot.function>/phc_enable > + > +All available PTP clock sources can be tracked here: > + > +.. code-block:: shell > + > + ls /sys/class/ptp > + > +PHC support and capabilities can be verified using ethtool: > + > +.. code-block:: shell > + > + ethtool -T <interface> > + > +**PHC timestamp** > + > +To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example using `testptp`_: > + > +.. code-block:: shell > + > + testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware Clock:/ {print $NF}') -k 1 Why is not opening /dev/ptpX sufficient to enable the PHC? Andrew
> > +The feature is turned off by default, in order to turn the feature > > +on, please use the following: > > + > > +- sysfs (during runtime): > > + > > +.. code-block:: shell > > + > > + echo 1 > /sys/bus/pci/devices/<domain:bus:slot.function>/phc_enable > > + > > +All available PTP clock sources can be tracked here: > > + > > +.. code-block:: shell > > + > > + ls /sys/class/ptp > > + > > +PHC support and capabilities can be verified using ethtool: > > + > > +.. code-block:: shell > > + > > + ethtool -T <interface> > > + > > +**PHC timestamp** > > + > > +To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example > using `testptp`_: > > + > > +.. code-block:: shell > > + > > + testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware > > + Clock:/ {print $NF}') -k 1 > > Why is not opening /dev/ptpX sufficient to enable the PHC? > > Andrew Hi Andrew, The reasoning for the enablement option of PHC was explained in patch 3 in the series https://lore.kernel.org/netdev/20250304190504.3743-4-darinzon@amazon.com/ David
On Wed, 2025-03-05 at 12:50 +0000, Arinzon, David wrote: > > > Why is not opening /dev/ptpX sufficient to enable the PHC? > > > > Andrew > > Hi Andrew, > > The reasoning for the enablement option of PHC was explained in patch 3 in the series > https://lore.kernel.org/netdev/20250304190504.3743-4-darinzon@amazon.com/ Not really. That says, "The PHC feature is disabled by default, although its footprint is small, most customers do not require such high-precision timestamping. Enabling PHC unnecessarily in environments that do not need it might introduce a minor but unnecessary overhead." Disable by default, sure. But it gives no explanation for why we need a separate knob in sysfs to enable/disable it, and we can't just enable it automatically when /dev/ptpX is opened as Andrew suggested. If you read the actual code in that patch, there's a hint though. Because the actual process in ena_phc_enable_set() does the following: + ena_destroy_device(adapter, false); + rc = ena_restore_device(adapter); Is that actually tearing down the whole netdev and recreating it when the PHC enable is flipped? Does it really have to? That doesn't seem ideal even if it *is* a separate knob in sysfs.
> If you read the actual code in that patch, there's a hint though. > Because the actual process in ena_phc_enable_set() does the following: > > + ena_destroy_device(adapter, false); > + rc = ena_restore_device(adapter); > > Is that actually tearing down the whole netdev and recreating it when > the PHC enable is flipped? Well Jakub said it is a pure clock, not related to the netdev. If that is true, i don't see why this should be needed. But i've not looked at the code... Andrew
On Wed, Mar 05, 2025 at 04:35:48PM +0100, Andrew Lunn wrote: > > If you read the actual code in that patch, there's a hint though. > > Because the actual process in ena_phc_enable_set() does the following: > > > > + ena_destroy_device(adapter, false); > > + rc = ena_restore_device(adapter); > > > > Is that actually tearing down the whole netdev and recreating it when > > the PHC enable is flipped? > > Well Jakub said it is a pure clock, not related to the netdev. So why are you ready to merge the code which is not netdev, doesn't have any association with netdev and doesn't follow netdev semantics (no custom sysfs files)? Thanks
> > > If you read the actual code in that patch, there's a hint though. > > > Because the actual process in ena_phc_enable_set() does the following: > > > > > > + ena_destroy_device(adapter, false); > > > + rc = ena_restore_device(adapter); > > > > > > Is that actually tearing down the whole netdev and recreating it > > > when the PHC enable is flipped? > > > > Well Jakub said it is a pure clock, not related to the netdev. The PHC device is a PTP clock integrated with the networking device under the same PCI device. As previously mentioned in this thread, enabling or disabling the ENA PHC requires reconfiguring the ENA network device, which involves tearing down and recreating the netdev. This necessitates having a separate control knob. Thanks, David > > So why are you ready to merge the code which is not netdev, doesn't have > any association with netdev and doesn't follow netdev semantics (no custom > sysfs files)? > > Thanks
On Wed, Mar 26, 2025 at 03:32:00PM +0000, Arinzon, David wrote: > > > > If you read the actual code in that patch, there's a hint though. > > > > Because the actual process in ena_phc_enable_set() does the following: > > > > > > > > + ena_destroy_device(adapter, false); > > > > + rc = ena_restore_device(adapter); > > > > > > > > Is that actually tearing down the whole netdev and recreating it > > > > when the PHC enable is flipped? > > > > > > Well Jakub said it is a pure clock, not related to the netdev. > > The PHC device is a PTP clock integrated with the networking device under the same PCI device. > As previously mentioned in this thread, enabling or disabling the ENA PHC requires reconfiguring the ENA network device, > which involves tearing down and recreating the netdev. > This necessitates having a separate control knob. I appreciate the desire to keep everything in one place and provide custom, vendor specific sysfs files. However, there is standard way (auxbus??) to solve it without need to reinvent anything. Thanks > > Thanks, > David > > > > > So why are you ready to merge the code which is not netdev, doesn't have > > any association with netdev and doesn't follow netdev semantics (no custom > > sysfs files)? > > > > Thanks
On Thu, 2025-03-27 at 13:15 +0200, Leon Romanovsky wrote: > On Wed, Mar 26, 2025 at 03:32:00PM +0000, Arinzon, David wrote: > > > > > If you read the actual code in that patch, there's a hint though. > > > > > Because the actual process in ena_phc_enable_set() does the following: > > > > > > > > > > + ena_destroy_device(adapter, false); > > > > > + rc = ena_restore_device(adapter); > > > > > > > > > > Is that actually tearing down the whole netdev and recreating it > > > > > when the PHC enable is flipped? > > > > > > > > Well Jakub said it is a pure clock, not related to the netdev. > > > > The PHC device is a PTP clock integrated with the networking device under the same PCI device. > > As previously mentioned in this thread, enabling or disabling the > > ENA PHC requires reconfiguring the ENA network device, > > which involves tearing down and recreating the netdev. > > This necessitates having a separate control knob. > > I appreciate the desire to keep everything in one place and provide > custom, vendor specific sysfs files. However, there is standard way > (auxbus??) to solve it without need to reinvent anything. Using auxbus would allow the PHC driver to be loaded as a separate module. But AIUI enabling the PHC involves reconfiguring the device and thus bringing the network device down and back up. Which isn't ideal. It *isn't* as separate as auxbus would make it look. We don't want to unconditionally enable the PHC support in the device even when it's not being used, because of the potential blast radius of such a change if *every* guest starts enabling it even when they don't need to use it.
On Tue, 2025-03-04 at 22:10 +0100, Andrew Lunn wrote: > > +The feature is turned off by default, in order to turn the feature > > on, > > +please use the following: > > + > > +- sysfs (during runtime): > > + > > +.. code-block:: shell > > + > > + echo 1 > > > /sys/bus/pci/devices/<domain:bus:slot.function>/phc_enable > > + > > +All available PTP clock sources can be tracked here: > > + > > +.. code-block:: shell > > + > > + ls /sys/class/ptp > > + > > +PHC support and capabilities can be verified using ethtool: > > + > > +.. code-block:: shell > > + > > + ethtool -T <interface> > > + > > +**PHC timestamp** > > + > > +To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example > > using `testptp`_: > > + > > +.. code-block:: shell > > + > > + testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware > > Clock:/ {print $NF}') -k 1 > > Why is not opening /dev/ptpX sufficient to enable the PHC? There's currently no hook from ptp_open() and ptp_release() into the implementation to do anything like that. It could be added, I suppose, but it isn't a great experience because the act of opening it for the first time would then take down the netdev. None of the alternative bikeshed options that David is being bombarded with here make sense to me, like auxdev or implicit opening, because it's tied to the netdev. In a future revision of the device itself, perhaps we can enable the PHC completely independently of the network device. And then of course we can just do it on ptp_open(), and we'll add the required hooks in struct ptp_clock_info. But for now, that doesn't make sense. I've been a maintainer. I remember looking at submissions and thinking "there must be a better way to do this", then attempting five different options and concluding that the original submission is the one I dislike least after all. This seems like one of those times, with a bunch of suggestions which ultimately give a worse user experience. And a contributor caught in the cross-fire with no coherent guidance about how to proceed. I think the sysfs control is the best option here. On the device side, I am internally applying the cluebat because this whole thing was *predictable*. The PHC feature *should* have been orthogonal, in the device, to the actual networking. So it can be enabled or disabled at will. Eventually I hope a later version of the device should fix that, and we can have a nice clean user interface — the /dev/ptpX node can always exist, but it does nothing at all if it isn't being used. In the interim, a sysfs knob to explicitly turn it on seems like a reasonable approach to me. I am ambivalent about *also* using auxdev, so that when the netdev is told to enable the PHC feature, the auxdev 'appears' and a separate PTP driver binds to it. It's kind of cute, but probably overkill, and not what any other NIC driver does, is it? CONFIG_PTP_1588_CLOCK_OPTIONAL exists to handle the case of NICs with optional PTP support. (Actually it turns out I'm less ambivalent by the end of that paragraph than I was at the start... :)
On Tue, 2025-04-01 at 09:02 +0100, David Woodhouse wrote: > > I think the sysfs control is the best option here. Actually, it occurs to me that the best option is probably a module parameter. If you have to take the network down and up to change the mode, why not just unload and reload the module? I understand there's a policy about that too, because we don't want bespoke module parameters changing the behaviour of network devices, which ought to be uniform. But this isn't changing the behaviour of the network device per se. Think of it as three drivers. One for the PCI device itself, which registers one or two auxdevs (depending on a module option). This module option is not on the network device driver. Then there's the netdev and the PTP which bind to those auxdevs. Now in *practice*, actually using auxdev is overkill and not consistent with what other network+PTP drivers do, and I think it should remain in a single driver. I'm using the separation to explain why a module parameter wouldn't be conceptually on the "network" device, and why we should consider this one of the cases where "policy" is just a euphemism for not doing our own thinking. Just put a module param on the PCI device driver and have done with it.
On Tue, 01 Apr 2025 09:46:35 +0100 David Woodhouse wrote: > On Tue, 2025-04-01 at 09:02 +0100, David Woodhouse wrote: > > > > I think the sysfs control is the best option here. > > Actually, it occurs to me that the best option is probably a module > parameter. If you have to take the network down and up to change the > mode, why not just unload and reload the module? We have something called devlink params, which support "configuration modes" (= what level of reset is required to activate the new setting). Maybe devlink param with cmode of "driver init" would be the best fit? Module params are annoying because they are scoped to code / module not instances of the device.
On Wed, Apr 2, 2025, at 19:23, Jakub Kicinski wrote: > On Tue, 01 Apr 2025 09:46:35 +0100 David Woodhouse wrote: >> On Tue, 2025-04-01 at 09:02 +0100, David Woodhouse wrote: >> > >> > I think the sysfs control is the best option here. >> >> Actually, it occurs to me that the best option is probably a module >> parameter. If you have to take the network down and up to change the >> mode, why not just unload and reload the module? > > We have something called devlink params, which support "configuration > modes" (= what level of reset is required to activate the new setting). > Maybe devlink param with cmode of "driver init" would be the best fit? I had same feeling when I wrote my auxbus response. There is no reason to believe that ptp enable/disable knob won't be usable by other drivers It's universally usable, just not related to netdev sysfs layout. Thanks > > Module params are annoying because they are scoped to code / module not > instances of the device.
diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst index 4561e8ab..158d7100 100644 --- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst +++ b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst @@ -53,9 +53,11 @@ ena_eth_io_defs.h Definition of ENA data path interface. ena_common_defs.h Common definitions for ena_com layer. ena_regs_defs.h Definition of ENA PCI memory-mapped (MMIO) registers. ena_netdev.[ch] Main Linux kernel driver. +ena_sysfs.[ch] Sysfs files. ena_ethtool.c ethtool callbacks. ena_xdp.[ch] XDP files ena_pci_id_tbl.h Supported device IDs. +ena_phc.[ch] PTP hardware clock infrastructure (see `PHC`_ for more info) ================= ====================================================== Management Interface: @@ -221,6 +223,100 @@ descriptor it was received on would be recycled. When a packet smaller than RX copybreak bytes is received, it is copied into a new memory buffer and the RX descriptor is returned to HW. +.. _`PHC`: + +PTP Hardware Clock (PHC) +======================== +.. _`ptp-userspace-api`: https://docs.kernel.org/driver-api/ptp.html#ptp-hardware-clock-user-space-api +.. _`testptp`: https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/ptp/testptp.c + +ENA Linux driver supports PTP hardware clock providing timestamp reference to achieve nanosecond resolution. + +**PHC support** + +PHC depends on the PTP module, which needs to be either loaded as a module or compiled into the kernel. + +Verify if the PTP module is present: + +.. code-block:: shell + + grep -w '^CONFIG_PTP_1588_CLOCK=[ym]' /boot/config-`uname -r` + +- If no output is provided, the ENA driver cannot be loaded with PHC support. + +- ``CONFIG_PTP_1588_CLOCK=y``: the PTP module is already compiled and loaded inside the kernel binary file. + +- ``CONFIG_PTP_1588_CLOCK=m``: the PTP module needs to be loaded prior to loading the ENA driver: + +Load PTP module: + +.. code-block:: shell + + sudo modprobe ptp + +**PHC activation** + +The feature is turned off by default, in order to turn the feature on, +please use the following: + +- sysfs (during runtime): + +.. code-block:: shell + + echo 1 > /sys/bus/pci/devices/<domain:bus:slot.function>/phc_enable + +All available PTP clock sources can be tracked here: + +.. code-block:: shell + + ls /sys/class/ptp + +PHC support and capabilities can be verified using ethtool: + +.. code-block:: shell + + ethtool -T <interface> + +**PHC timestamp** + +To retrieve PHC timestamp, use `ptp-userspace-api`_, usage example using `testptp`_: + +.. code-block:: shell + + testptp -d /dev/ptp$(ethtool -T <interface> | awk '/PTP Hardware Clock:/ {print $NF}') -k 1 + +PHC get time requests should be within reasonable bounds, +avoid excessive utilization to ensure optimal performance and efficiency. +The ENA device restricts the frequency of PHC get time requests to a maximum +of 125 requests per second. If this limit is surpassed, the get time request +will fail, leading to an increment in the phc_err statistic. + +**PHC statistics** + +PHC counters can be monitored using the following: + +sysfs: + +.. code-block:: shell + + cat /sys/bus/pci/devices/<domain:bus:slot.function>/phc_stats/<stat_name> + +================= ====================================================== +**phc_cnt** Number of successful retrieved timestamps (below expire timeout). +**phc_exp** Number of expired retrieved timestamps (above expire timeout). +**phc_skp** Number of skipped get time attempts (during block period). +**phc_err** Number of failed get time attempts (entering into block state). +================= ====================================================== + +PHC timeouts: + +================= ====================================================== +**expire** Max time for a valid timestamp retrieval, passing this threshold will fail + the get time request and block new requests until block timeout. +**block** Blocking period starts once get time request expires or fails, all get time + requests during block period will be skipped. +================= ====================================================== + Statistics ==========