diff mbox series

HSR/PRP sequence counter issue with Cisco Redbox

Message ID 69ec2fd1a9a048e8b3305a4bc36aad01@EXCH-SVR2013.eberle.local (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series HSR/PRP sequence counter issue with Cisco Redbox | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Wenzel, Marco Jan. 27, 2021, 12:15 p.m. UTC
Hi,

we have figured out an issue with the current PRP driver when trying to communicate with Cisco IE 2000 industrial Ethernet switches in Redbox mode. The Cisco always resets the HSR/PRP sequence counter to "1" at low traffic (<= 1 frame in 400 ms). It can be reproduced by a simple ICMP echo request with 1 s interval between a Linux box running with PRP and a VDAN behind the Cisco Redbox. The Linux box then always receives frames with sequence counter "1" and drops them. The behavior is not configurable at the Cisco Redbox.

I fixed it by ignoring sequence counters with value "1" at the sequence counter check in hsr_register_frame_out ():



Do you think this could be a solution? Should this patch be officially applied in order to avoid other users running into these communication issues?

Thanks
Marco Wenzel

Comments

George McCollister Jan. 28, 2021, 11:11 p.m. UTC | #1
On Wed, Jan 27, 2021 at 6:32 AM Wenzel, Marco <Marco.Wenzel@a-eberle.de> wrote:
>
> Hi,
>
> we have figured out an issue with the current PRP driver when trying to communicate with Cisco IE 2000 industrial Ethernet switches in Redbox mode. The Cisco always resets the HSR/PRP sequence counter to "1" at low traffic (<= 1 frame in 400 ms). It can be reproduced by a simple ICMP echo request with 1 s interval between a Linux box running with PRP and a VDAN behind the Cisco Redbox. The Linux box then always receives frames with sequence counter "1" and drops them. The behavior is not configurable at the Cisco Redbox.
>
> I fixed it by ignoring sequence counters with value "1" at the sequence counter check in hsr_register_frame_out ():
>
> diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
> index 5c97de459905..630c238e81f0 100644
> --- a/net/hsr/hsr_framereg.c
> +++ b/net/hsr/hsr_framereg.c
> @@ -411,7 +411,7 @@ void hsr_register_frame_in(struct hsr_node *node, struct hsr_port *port,
>  int hsr_register_frame_out(struct hsr_port *port, struct hsr_node *node,
>                            u16 sequence_nr)
>  {
> -       if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]))
> +       if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]) && (sequence_nr != 1))
>                 return 1;
>
>         node->seq_out[port->type] = sequence_nr;
>
>
> Do you think this could be a solution? Should this patch be officially applied in order to avoid other users running into these communication issues?

This isn't the correct way to solve the problem. IEC 62439-3 defines
EntryForgetTime as "Time after which an entry is removed from the
duplicate table" with a value of 400ms and states devices should
usually be configured to keep entries in the table for a much shorter
time. hsr_framereg.c needs to be reworked to handle this according to
the specification.

>
> Thanks
> Marco Wenzel

Regards,
George McCollister
diff mbox series

Patch

diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
index 5c97de459905..630c238e81f0 100644
--- a/net/hsr/hsr_framereg.c
+++ b/net/hsr/hsr_framereg.c
@@ -411,7 +411,7 @@  void hsr_register_frame_in(struct hsr_node *node, struct hsr_port *port,
 int hsr_register_frame_out(struct hsr_port *port, struct hsr_node *node,
                           u16 sequence_nr)
 {
-       if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]))
+       if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]) && (sequence_nr != 1))
                return 1;

        node->seq_out[port->type] = sequence_nr;