Message ID | 20240608025347.90680-1-chengen.du@canonical.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v6] af_packet: Handle outgoing VLAN packets without hardware offloading | expand |
Hi, I would like to provide some additional explanations about the patch. On Sat, Jun 8, 2024 at 10:54 AM Chengen Du <chengen.du@canonical.com> wrote: > > The issue initially stems from libpcap. The ethertype will be overwritten > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > In the outbound packet path, if hardware VLAN offloading is unavailable, > the VLAN tag is inserted into the payload but then cleared from the sk_buff > struct. Consequently, this can lead to a false negative when checking for > the presence of a VLAN tag, causing the packet sniffing outcome to lack > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > tool may be unable to parse packets as expected. > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > is stripped, preventing the packet capturing tool from determining the > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > which means the packet capturing tool cannot parse the L3 header correctly. > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > Cc: stable@vger.kernel.org > Signed-off-by: Chengen Du <chengen.du@canonical.com> > --- > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 55 insertions(+), 2 deletions(-) > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index ea3ebc160e25..8cffbe1f912d 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > return packet_lookup_frame(po, rb, rb->head, status); > } > > +static u16 vlan_get_tci(struct sk_buff *skb) > +{ > + struct vlan_hdr vhdr, *vh; > + u8 *skb_orig_data = skb->data; > + int skb_orig_len = skb->len; > + > + skb_push(skb, skb->data - skb_mac_header(skb)); > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); > + if (skb_orig_data != skb->data) { > + skb->data = skb_orig_data; > + skb->len = skb_orig_len; > + } The reason for not directly using skb_header_pointer(skb, skb_mac_header(skb) + ETH_HLEN, ...) to get the VLAN header is due to the check logic in skb_header_pointer. In the SOCK_DGRAM and PACKET_OUTGOING scenarios, the offset can be a negative number, which causes the check logic (i.e., likely(hlen - offset >= len)) in __skb_header_pointer() to not work as expected. While it is possible to modify __skb_header_pointer() to handle cases where the offset is negative, this change could affect a wider range of code. Please kindly share your thoughts on this approach. > + if (unlikely(!vh)) > + return 0; > + > + return ntohs(vh->h_vlan_TCI); > +} > + > +static __be16 vlan_get_protocol_dgram(struct sk_buff *skb) > +{ > + __be16 proto = skb->protocol; > + > + if (unlikely(eth_type_vlan(proto))) { > + u8 *skb_orig_data = skb->data; > + int skb_orig_len = skb->len; > + > + skb_push(skb, skb->data - skb_mac_header(skb)); > + proto = __vlan_get_protocol(skb, proto, NULL); > + if (skb_orig_data != skb->data) { > + skb->data = skb_orig_data; > + skb->len = skb_orig_len; > + } > + } > + > + return proto; > +} > + > static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc) > { > del_timer_sync(&pkc->retire_blk_timer); > @@ -1007,10 +1044,16 @@ static void prb_clear_rxhash(struct tpacket_kbdq_core *pkc, > static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc, > struct tpacket3_hdr *ppd) > { > + struct packet_sock *po = container_of(pkc, struct packet_sock, rx_ring.prb_bdqc); > + > if (skb_vlan_tag_present(pkc->skb)) { > ppd->hv1.tp_vlan_tci = skb_vlan_tag_get(pkc->skb); > ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->vlan_proto); > ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > + } else if (unlikely(po->sk.sk_type == SOCK_DGRAM && eth_type_vlan(pkc->skb->protocol))) { > + ppd->hv1.tp_vlan_tci = vlan_get_tci(pkc->skb); > + ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->protocol); > + ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > } else { > ppd->hv1.tp_vlan_tci = 0; > ppd->hv1.tp_vlan_tpid = 0; > @@ -2428,6 +2471,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, > h.h2->tp_vlan_tci = skb_vlan_tag_get(skb); > h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto); > status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > + } else if (unlikely(sk->sk_type == SOCK_DGRAM && eth_type_vlan(skb->protocol))) { > + h.h2->tp_vlan_tci = vlan_get_tci(skb); > + h.h2->tp_vlan_tpid = ntohs(skb->protocol); > + status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > } else { > h.h2->tp_vlan_tci = 0; > h.h2->tp_vlan_tpid = 0; > @@ -2457,7 +2504,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, > sll->sll_halen = dev_parse_header(skb, sll->sll_addr); > sll->sll_family = AF_PACKET; > sll->sll_hatype = dev->type; > - sll->sll_protocol = skb->protocol; > + sll->sll_protocol = (sk->sk_type == SOCK_DGRAM) ? > + vlan_get_protocol_dgram(skb) : skb->protocol; > sll->sll_pkttype = skb->pkt_type; > if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV))) > sll->sll_ifindex = orig_dev->ifindex; > @@ -3482,7 +3530,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, > /* Original length was stored in sockaddr_ll fields */ > origlen = PACKET_SKB_CB(skb)->sa.origlen; > sll->sll_family = AF_PACKET; > - sll->sll_protocol = skb->protocol; > + sll->sll_protocol = (sock->type == SOCK_DGRAM) ? > + vlan_get_protocol_dgram(skb) : skb->protocol; > } > > sock_recv_cmsgs(msg, sk, skb); > @@ -3539,6 +3588,10 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, > aux.tp_vlan_tci = skb_vlan_tag_get(skb); > aux.tp_vlan_tpid = ntohs(skb->vlan_proto); > aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > + } else if (unlikely(sock->type == SOCK_DGRAM && eth_type_vlan(skb->protocol))) { > + aux.tp_vlan_tci = vlan_get_tci(skb); > + aux.tp_vlan_tpid = ntohs(skb->protocol); > + aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > } else { > aux.tp_vlan_tci = 0; > aux.tp_vlan_tpid = 0; > -- > 2.43.0 >
Chengen Du wrote: > The issue initially stems from libpcap. The ethertype will be overwritten > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > In the outbound packet path, if hardware VLAN offloading is unavailable, > the VLAN tag is inserted into the payload but then cleared from the sk_buff > struct. Consequently, this can lead to a false negative when checking for > the presence of a VLAN tag, causing the packet sniffing outcome to lack > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > tool may be unable to parse packets as expected. > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > is stripped, preventing the packet capturing tool from determining the > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > which means the packet capturing tool cannot parse the L3 header correctly. > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > Cc: stable@vger.kernel.org > Signed-off-by: Chengen Du <chengen.du@canonical.com> Overall, solid. > --- > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 55 insertions(+), 2 deletions(-) > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index ea3ebc160e25..8cffbe1f912d 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > return packet_lookup_frame(po, rb, rb->head, status); > } > > +static u16 vlan_get_tci(struct sk_buff *skb) > +{ > + struct vlan_hdr vhdr, *vh; > + u8 *skb_orig_data = skb->data; > + int skb_orig_len = skb->len; > + > + skb_push(skb, skb->data - skb_mac_header(skb)); > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); Don't harcode Ethernet. According to documentation VLANs are used with other link layers. More importantly, in practice PF_PACKET allows inserting this skb->protocol on any device. We don't use link layer specific constants anywhere in the packet socket code for this reason. But instead dev->hard_header_len. One caveat there is variable length link layer headers, where dev->min_header_len != dev->hard_header_len. Will just have to fail on those. > + if (skb_orig_data != skb->data) { > + skb->data = skb_orig_data; > + skb->len = skb_orig_len; > + } > + if (unlikely(!vh)) > + return 0; > + > + return ntohs(vh->h_vlan_TCI); > +} > + Only since I had to respond above: this is non-obvious enough to deserve a function comment. Something like the following? /* For SOCK_DGRAM, data starts at the network protocol, after any VLAN * headers. sll_protocol must point to the network protocol. The * (outer) VLAN TCI is still accessible as auxdata. */ > +static __be16 vlan_get_protocol_dgram(struct sk_buff *skb) > +{ > + __be16 proto = skb->protocol; > + > + if (unlikely(eth_type_vlan(proto))) { > + u8 *skb_orig_data = skb->data; > + int skb_orig_len = skb->len; > + > + skb_push(skb, skb->data - skb_mac_header(skb)); > + proto = __vlan_get_protocol(skb, proto, NULL); > + if (skb_orig_data != skb->data) { > + skb->data = skb_orig_data; > + skb->len = skb_orig_len; > + } > + } > + > + return proto; > +} > + > static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc) > { > del_timer_sync(&pkc->retire_blk_timer); > @@ -1007,10 +1044,16 @@ static void prb_clear_rxhash(struct tpacket_kbdq_core *pkc, > static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc, > struct tpacket3_hdr *ppd) > { > + struct packet_sock *po = container_of(pkc, struct packet_sock, rx_ring.prb_bdqc); > + > if (skb_vlan_tag_present(pkc->skb)) { > ppd->hv1.tp_vlan_tci = skb_vlan_tag_get(pkc->skb); > ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->vlan_proto); > ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > + } else if (unlikely(po->sk.sk_type == SOCK_DGRAM && eth_type_vlan(pkc->skb->protocol))) { > + ppd->hv1.tp_vlan_tci = vlan_get_tci(pkc->skb); > + ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->protocol); > + ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > } else { > ppd->hv1.tp_vlan_tci = 0; > ppd->hv1.tp_vlan_tpid = 0; > @@ -2428,6 +2471,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, > h.h2->tp_vlan_tci = skb_vlan_tag_get(skb); > h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto); > status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > + } else if (unlikely(sk->sk_type == SOCK_DGRAM && eth_type_vlan(skb->protocol))) { > + h.h2->tp_vlan_tci = vlan_get_tci(skb); > + h.h2->tp_vlan_tpid = ntohs(skb->protocol); > + status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > } else { > h.h2->tp_vlan_tci = 0; > h.h2->tp_vlan_tpid = 0; > @@ -2457,7 +2504,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, > sll->sll_halen = dev_parse_header(skb, sll->sll_addr); > sll->sll_family = AF_PACKET; > sll->sll_hatype = dev->type; > - sll->sll_protocol = skb->protocol; > + sll->sll_protocol = (sk->sk_type == SOCK_DGRAM) ? > + vlan_get_protocol_dgram(skb) : skb->protocol; > sll->sll_pkttype = skb->pkt_type; > if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV))) > sll->sll_ifindex = orig_dev->ifindex; > @@ -3482,7 +3530,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, > /* Original length was stored in sockaddr_ll fields */ > origlen = PACKET_SKB_CB(skb)->sa.origlen; > sll->sll_family = AF_PACKET; > - sll->sll_protocol = skb->protocol; > + sll->sll_protocol = (sock->type == SOCK_DGRAM) ? > + vlan_get_protocol_dgram(skb) : skb->protocol; > } > > sock_recv_cmsgs(msg, sk, skb); > @@ -3539,6 +3588,10 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, > aux.tp_vlan_tci = skb_vlan_tag_get(skb); > aux.tp_vlan_tpid = ntohs(skb->vlan_proto); > aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > + } else if (unlikely(sock->type == SOCK_DGRAM && eth_type_vlan(skb->protocol))) { > + aux.tp_vlan_tci = vlan_get_tci(skb); > + aux.tp_vlan_tpid = ntohs(skb->protocol); > + aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > } else { > aux.tp_vlan_tci = 0; > aux.tp_vlan_tpid = 0; > -- > 2.43.0 >
Chengen Du wrote: > Hi, > > I would like to provide some additional explanations about the patch. > > > On Sat, Jun 8, 2024 at 10:54 AM Chengen Du <chengen.du@canonical.com> wrote: > > > > The issue initially stems from libpcap. The ethertype will be overwritten > > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > > In the outbound packet path, if hardware VLAN offloading is unavailable, > > the VLAN tag is inserted into the payload but then cleared from the sk_buff > > struct. Consequently, this can lead to a false negative when checking for > > the presence of a VLAN tag, causing the packet sniffing outcome to lack > > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > > tool may be unable to parse packets as expected. > > > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > > is stripped, preventing the packet capturing tool from determining the > > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > > which means the packet capturing tool cannot parse the L3 header correctly. > > > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > > Cc: stable@vger.kernel.org > > Signed-off-by: Chengen Du <chengen.du@canonical.com> > > --- > > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > > 1 file changed, 55 insertions(+), 2 deletions(-) > > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > index ea3ebc160e25..8cffbe1f912d 100644 > > --- a/net/packet/af_packet.c > > +++ b/net/packet/af_packet.c > > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > > return packet_lookup_frame(po, rb, rb->head, status); > > } > > > > +static u16 vlan_get_tci(struct sk_buff *skb) > > +{ > > + struct vlan_hdr vhdr, *vh; > > + u8 *skb_orig_data = skb->data; > > + int skb_orig_len = skb->len; > > + > > + skb_push(skb, skb->data - skb_mac_header(skb)); > > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); > > + if (skb_orig_data != skb->data) { > > + skb->data = skb_orig_data; > > + skb->len = skb_orig_len; > > + } > > > The reason for not directly using skb_header_pointer(skb, > skb_mac_header(skb) + ETH_HLEN, ...) to get the VLAN header is due to > the check logic in skb_header_pointer. In the SOCK_DGRAM and > PACKET_OUTGOING scenarios, the offset can be a negative number, which > causes the check logic (i.e., likely(hlen - offset >= len)) in > __skb_header_pointer() to not work as expected. The calculation is still correct? I think that this is not the first situation where negative offsets can be given to skb_header_pointer. > While it is possible to modify __skb_header_pointer() to handle cases > where the offset is negative, this change could affect a wider range > of code.
Hi Willem, I'm sorry, but I would like to confirm the issue further. On Mon, Jun 10, 2024 at 4:19 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Chengen Du wrote: > > The issue initially stems from libpcap. The ethertype will be overwritten > > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > > In the outbound packet path, if hardware VLAN offloading is unavailable, > > the VLAN tag is inserted into the payload but then cleared from the sk_buff > > struct. Consequently, this can lead to a false negative when checking for > > the presence of a VLAN tag, causing the packet sniffing outcome to lack > > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > > tool may be unable to parse packets as expected. > > > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > > is stripped, preventing the packet capturing tool from determining the > > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > > which means the packet capturing tool cannot parse the L3 header correctly. > > > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > > Cc: stable@vger.kernel.org > > Signed-off-by: Chengen Du <chengen.du@canonical.com> > > Overall, solid. > > > --- > > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > > 1 file changed, 55 insertions(+), 2 deletions(-) > > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > index ea3ebc160e25..8cffbe1f912d 100644 > > --- a/net/packet/af_packet.c > > +++ b/net/packet/af_packet.c > > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > > return packet_lookup_frame(po, rb, rb->head, status); > > } > > > > +static u16 vlan_get_tci(struct sk_buff *skb) > > +{ > > + struct vlan_hdr vhdr, *vh; > > + u8 *skb_orig_data = skb->data; > > + int skb_orig_len = skb->len; > > + > > + skb_push(skb, skb->data - skb_mac_header(skb)); > > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); > > Don't harcode Ethernet. > > According to documentation VLANs are used with other link layers. > > More importantly, in practice PF_PACKET allows inserting this > skb->protocol on any device. > > We don't use link layer specific constants anywhere in the packet > socket code for this reason. But instead dev->hard_header_len. > > One caveat there is variable length link layer headers, where > dev->min_header_len != dev->hard_header_len. Will just have to fail > on those. Thank you for pointing out this error. I would like to confirm if I need to use dev->hard_header_len to get the correct header length and return zero if dev->min_header_len != dev->hard_header_len to handle variable-length link layer headers. Is there something I misunderstand, or are there other aspects I need to consider further? > > > + if (skb_orig_data != skb->data) { > > + skb->data = skb_orig_data; > > + skb->len = skb_orig_len; > > + } > > + if (unlikely(!vh)) > > + return 0; > > + > > + return ntohs(vh->h_vlan_TCI); > > +} > > + > > Only since I had to respond above: this is non-obvious enough to > deserve a function comment. Something like the following? > > /* For SOCK_DGRAM, data starts at the network protocol, after any VLAN > * headers. sll_protocol must point to the network protocol. The > * (outer) VLAN TCI is still accessible as auxdata. > */ > > > +static __be16 vlan_get_protocol_dgram(struct sk_buff *skb) > > +{ > > + __be16 proto = skb->protocol; > > + > > + if (unlikely(eth_type_vlan(proto))) { > > + u8 *skb_orig_data = skb->data; > > + int skb_orig_len = skb->len; > > + > > + skb_push(skb, skb->data - skb_mac_header(skb)); > > + proto = __vlan_get_protocol(skb, proto, NULL); > > + if (skb_orig_data != skb->data) { > > + skb->data = skb_orig_data; > > + skb->len = skb_orig_len; > > + } > > + } > > + > > + return proto; > > +} > > + > > static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc) > > { > > del_timer_sync(&pkc->retire_blk_timer); > > @@ -1007,10 +1044,16 @@ static void prb_clear_rxhash(struct tpacket_kbdq_core *pkc, > > static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc, > > struct tpacket3_hdr *ppd) > > { > > + struct packet_sock *po = container_of(pkc, struct packet_sock, rx_ring.prb_bdqc); > > + > > if (skb_vlan_tag_present(pkc->skb)) { > > ppd->hv1.tp_vlan_tci = skb_vlan_tag_get(pkc->skb); > > ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->vlan_proto); > > ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > > + } else if (unlikely(po->sk.sk_type == SOCK_DGRAM && eth_type_vlan(pkc->skb->protocol))) { > > + ppd->hv1.tp_vlan_tci = vlan_get_tci(pkc->skb); > > + ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->protocol); > > + ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > > } else { > > ppd->hv1.tp_vlan_tci = 0; > > ppd->hv1.tp_vlan_tpid = 0; > > @@ -2428,6 +2471,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, > > h.h2->tp_vlan_tci = skb_vlan_tag_get(skb); > > h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto); > > status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > > + } else if (unlikely(sk->sk_type == SOCK_DGRAM && eth_type_vlan(skb->protocol))) { > > + h.h2->tp_vlan_tci = vlan_get_tci(skb); > > + h.h2->tp_vlan_tpid = ntohs(skb->protocol); > > + status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > > } else { > > h.h2->tp_vlan_tci = 0; > > h.h2->tp_vlan_tpid = 0; > > @@ -2457,7 +2504,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, > > sll->sll_halen = dev_parse_header(skb, sll->sll_addr); > > sll->sll_family = AF_PACKET; > > sll->sll_hatype = dev->type; > > - sll->sll_protocol = skb->protocol; > > + sll->sll_protocol = (sk->sk_type == SOCK_DGRAM) ? > > + vlan_get_protocol_dgram(skb) : skb->protocol; > > sll->sll_pkttype = skb->pkt_type; > > if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV))) > > sll->sll_ifindex = orig_dev->ifindex; > > @@ -3482,7 +3530,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, > > /* Original length was stored in sockaddr_ll fields */ > > origlen = PACKET_SKB_CB(skb)->sa.origlen; > > sll->sll_family = AF_PACKET; > > - sll->sll_protocol = skb->protocol; > > + sll->sll_protocol = (sock->type == SOCK_DGRAM) ? > > + vlan_get_protocol_dgram(skb) : skb->protocol; > > } > > > > sock_recv_cmsgs(msg, sk, skb); > > @@ -3539,6 +3588,10 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, > > aux.tp_vlan_tci = skb_vlan_tag_get(skb); > > aux.tp_vlan_tpid = ntohs(skb->vlan_proto); > > aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > > + } else if (unlikely(sock->type == SOCK_DGRAM && eth_type_vlan(skb->protocol))) { > > + aux.tp_vlan_tci = vlan_get_tci(skb); > > + aux.tp_vlan_tpid = ntohs(skb->protocol); > > + aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; > > } else { > > aux.tp_vlan_tci = 0; > > aux.tp_vlan_tpid = 0; > > -- > > 2.43.0 > > > >
Hi Willem, On Mon, Jun 10, 2024 at 4:21 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Chengen Du wrote: > > Hi, > > > > I would like to provide some additional explanations about the patch. > > > > > > On Sat, Jun 8, 2024 at 10:54 AM Chengen Du <chengen.du@canonical.com> wrote: > > > > > > The issue initially stems from libpcap. The ethertype will be overwritten > > > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > > > In the outbound packet path, if hardware VLAN offloading is unavailable, > > > the VLAN tag is inserted into the payload but then cleared from the sk_buff > > > struct. Consequently, this can lead to a false negative when checking for > > > the presence of a VLAN tag, causing the packet sniffing outcome to lack > > > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > > > tool may be unable to parse packets as expected. > > > > > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > > > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > > > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > > > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > > > is stripped, preventing the packet capturing tool from determining the > > > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > > > which means the packet capturing tool cannot parse the L3 header correctly. > > > > > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > > > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > > > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Chengen Du <chengen.du@canonical.com> > > > --- > > > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > > > 1 file changed, 55 insertions(+), 2 deletions(-) > > > > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > > index ea3ebc160e25..8cffbe1f912d 100644 > > > --- a/net/packet/af_packet.c > > > +++ b/net/packet/af_packet.c > > > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > > > return packet_lookup_frame(po, rb, rb->head, status); > > > } > > > > > > +static u16 vlan_get_tci(struct sk_buff *skb) > > > +{ > > > + struct vlan_hdr vhdr, *vh; > > > + u8 *skb_orig_data = skb->data; > > > + int skb_orig_len = skb->len; > > > + > > > + skb_push(skb, skb->data - skb_mac_header(skb)); > > > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); > > > + if (skb_orig_data != skb->data) { > > > + skb->data = skb_orig_data; > > > + skb->len = skb_orig_len; > > > + } > > > > > > The reason for not directly using skb_header_pointer(skb, > > skb_mac_header(skb) + ETH_HLEN, ...) to get the VLAN header is due to > > the check logic in skb_header_pointer. In the SOCK_DGRAM and > > PACKET_OUTGOING scenarios, the offset can be a negative number, which > > causes the check logic (i.e., likely(hlen - offset >= len)) in > > __skb_header_pointer() to not work as expected. > > The calculation is still correct? > > I think that this is not the first situation where negative offsets > can be given to skb_header_pointer. The check will pass even if the offset is negative, but I believe this may not be the right approach. In my humble opinion, the expected check should be similar to the skb_push check, which ensures that after moving forward by the offset bytes, skb->data remains larger than or equal to skb->head to avoid accessing out-of-bound data. It might be worth considering adding a check in __skb_header_pointer to handle negative offsets, as this seems logical. However, this change could impact a wider range of code. Please correct me if I am mistaken. > > > While it is possible to modify __skb_header_pointer() to handle cases > > where the offset is negative, this change could affect a wider range > > of code. >
Chengen Du wrote: > Hi Willem, > > I'm sorry, but I would like to confirm the issue further. > > On Mon, Jun 10, 2024 at 4:19 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Chengen Du wrote: > > > The issue initially stems from libpcap. The ethertype will be overwritten > > > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > > > In the outbound packet path, if hardware VLAN offloading is unavailable, > > > the VLAN tag is inserted into the payload but then cleared from the sk_buff > > > struct. Consequently, this can lead to a false negative when checking for > > > the presence of a VLAN tag, causing the packet sniffing outcome to lack > > > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > > > tool may be unable to parse packets as expected. > > > > > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > > > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > > > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > > > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > > > is stripped, preventing the packet capturing tool from determining the > > > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > > > which means the packet capturing tool cannot parse the L3 header correctly. > > > > > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > > > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > > > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Chengen Du <chengen.du@canonical.com> > > > > Overall, solid. > > > > > --- > > > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > > > 1 file changed, 55 insertions(+), 2 deletions(-) > > > > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > > index ea3ebc160e25..8cffbe1f912d 100644 > > > --- a/net/packet/af_packet.c > > > +++ b/net/packet/af_packet.c > > > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > > > return packet_lookup_frame(po, rb, rb->head, status); > > > } > > > > > > +static u16 vlan_get_tci(struct sk_buff *skb) > > > +{ > > > + struct vlan_hdr vhdr, *vh; > > > + u8 *skb_orig_data = skb->data; > > > + int skb_orig_len = skb->len; > > > + > > > + skb_push(skb, skb->data - skb_mac_header(skb)); > > > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); > > > > Don't harcode Ethernet. > > > > According to documentation VLANs are used with other link layers. > > > > More importantly, in practice PF_PACKET allows inserting this > > skb->protocol on any device. > > > > We don't use link layer specific constants anywhere in the packet > > socket code for this reason. But instead dev->hard_header_len. > > > > One caveat there is variable length link layer headers, where > > dev->min_header_len != dev->hard_header_len. Will just have to fail > > on those. > > Thank you for pointing out this error. I would like to confirm if I > need to use dev->hard_header_len to get the correct header length and > return zero if dev->min_header_len != dev->hard_header_len to handle > variable-length link layer headers. Is there something I > misunderstand, or are there other aspects I need to consider further? That's right. The min_header_len != hard_header_len check is annoying and may seem pedantic. But it's the only way to trust that the next header starts at hard_header_len.
Chengen Du wrote: > Hi Willem, > > On Mon, Jun 10, 2024 at 4:21 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Chengen Du wrote: > > > Hi, > > > > > > I would like to provide some additional explanations about the patch. > > > > > > > > > On Sat, Jun 8, 2024 at 10:54 AM Chengen Du <chengen.du@canonical.com> wrote: > > > > > > > > The issue initially stems from libpcap. The ethertype will be overwritten > > > > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > > > > In the outbound packet path, if hardware VLAN offloading is unavailable, > > > > the VLAN tag is inserted into the payload but then cleared from the sk_buff > > > > struct. Consequently, this can lead to a false negative when checking for > > > > the presence of a VLAN tag, causing the packet sniffing outcome to lack > > > > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > > > > tool may be unable to parse packets as expected. > > > > > > > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > > > > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > > > > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > > > > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > > > > is stripped, preventing the packet capturing tool from determining the > > > > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > > > > which means the packet capturing tool cannot parse the L3 header correctly. > > > > > > > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > > > > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > > > > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > > > > Cc: stable@vger.kernel.org > > > > Signed-off-by: Chengen Du <chengen.du@canonical.com> > > > > --- > > > > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > > > > 1 file changed, 55 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > > > index ea3ebc160e25..8cffbe1f912d 100644 > > > > --- a/net/packet/af_packet.c > > > > +++ b/net/packet/af_packet.c > > > > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > > > > return packet_lookup_frame(po, rb, rb->head, status); > > > > } > > > > > > > > +static u16 vlan_get_tci(struct sk_buff *skb) > > > > +{ > > > > + struct vlan_hdr vhdr, *vh; > > > > + u8 *skb_orig_data = skb->data; > > > > + int skb_orig_len = skb->len; > > > > + > > > > + skb_push(skb, skb->data - skb_mac_header(skb)); > > > > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); > > > > + if (skb_orig_data != skb->data) { > > > > + skb->data = skb_orig_data; > > > > + skb->len = skb_orig_len; > > > > + } > > > > > > > > > The reason for not directly using skb_header_pointer(skb, > > > skb_mac_header(skb) + ETH_HLEN, ...) to get the VLAN header is due to > > > the check logic in skb_header_pointer. In the SOCK_DGRAM and > > > PACKET_OUTGOING scenarios, the offset can be a negative number, which > > > causes the check logic (i.e., likely(hlen - offset >= len)) in > > > __skb_header_pointer() to not work as expected. > > > > The calculation is still correct? > > > > I think that this is not the first situation where negative offsets > > can be given to skb_header_pointer. > > The check will pass even if the offset is negative, but I believe this > may not be the right approach. In my humble opinion, the expected > check should be similar to the skb_push check, which ensures that > after moving forward by the offset bytes, skb->data remains larger > than or equal to skb->head to avoid accessing out-of-bound data. It > might be worth considering adding a check in __skb_header_pointer to > handle negative offsets, as this seems logical. However, this change > could impact a wider range of code. Please correct me if I am > mistaken. Your current approach is fine too. A negative offset greater than skb_headroom would certainly be a problem. But in these cases where skb->mac_header is known to be correct, the offset skb_mac_offset() against skb->data must be within bounds, even if it may be negative.
Hi Willem, On Tue, Jun 11, 2024 at 7:18 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Chengen Du wrote: > > Hi Willem, > > > > I'm sorry, but I would like to confirm the issue further. > > > > On Mon, Jun 10, 2024 at 4:19 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Chengen Du wrote: > > > > The issue initially stems from libpcap. The ethertype will be overwritten > > > > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > > > > In the outbound packet path, if hardware VLAN offloading is unavailable, > > > > the VLAN tag is inserted into the payload but then cleared from the sk_buff > > > > struct. Consequently, this can lead to a false negative when checking for > > > > the presence of a VLAN tag, causing the packet sniffing outcome to lack > > > > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > > > > tool may be unable to parse packets as expected. > > > > > > > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > > > > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > > > > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > > > > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > > > > is stripped, preventing the packet capturing tool from determining the > > > > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > > > > which means the packet capturing tool cannot parse the L3 header correctly. > > > > > > > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > > > > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > > > > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > > > > Cc: stable@vger.kernel.org > > > > Signed-off-by: Chengen Du <chengen.du@canonical.com> > > > > > > Overall, solid. > > > > > > > --- > > > > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > > > > 1 file changed, 55 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > > > index ea3ebc160e25..8cffbe1f912d 100644 > > > > --- a/net/packet/af_packet.c > > > > +++ b/net/packet/af_packet.c > > > > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > > > > return packet_lookup_frame(po, rb, rb->head, status); > > > > } > > > > > > > > +static u16 vlan_get_tci(struct sk_buff *skb) > > > > +{ > > > > + struct vlan_hdr vhdr, *vh; > > > > + u8 *skb_orig_data = skb->data; > > > > + int skb_orig_len = skb->len; > > > > + > > > > + skb_push(skb, skb->data - skb_mac_header(skb)); > > > > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); > > > > > > Don't harcode Ethernet. > > > > > > According to documentation VLANs are used with other link layers. > > > > > > More importantly, in practice PF_PACKET allows inserting this > > > skb->protocol on any device. > > > > > > We don't use link layer specific constants anywhere in the packet > > > socket code for this reason. But instead dev->hard_header_len. > > > > > > One caveat there is variable length link layer headers, where > > > dev->min_header_len != dev->hard_header_len. Will just have to fail > > > on those. > > > > Thank you for pointing out this error. I would like to confirm if I > > need to use dev->hard_header_len to get the correct header length and > > return zero if dev->min_header_len != dev->hard_header_len to handle > > variable-length link layer headers. Is there something I > > misunderstand, or are there other aspects I need to consider further? > > That's right. > > The min_header_len != hard_header_len check is annoying and may seem > pedantic. But it's the only way to trust that the next header starts > at hard_header_len. Thank you for your advice. I have implemented the modification, but I found that the (min_header_len != hard_header_len) check results in unexpected behavior in the following test scenario: ip link add link ens18 ens18.24 type vlan proto 802.1ad id 24 ip link add link ens18.24 ens18.24.25 type vlan proto 802.1Q id 25 ifconfig ens18.24 1.0.24.1/24 ifconfig ens18.24.25 1.0.25.1/24 ping -n 1.0.25.3 > /dev/null 2>&1 & tcpdump -nn -i any -y LINUX_SLL -Q out not tcp and not udp While receiving a packet from ens18.24.25 (802.1Q), the min_header_len and hard_header_len are 14 and 18, respectively. This check results in the TCI being 0 instead of 25. Should we skip this check to display the correct value, or is there another check that can achieve the same purpose?
Chengen Du wrote: > Hi Willem, > > On Tue, Jun 11, 2024 at 7:18 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Chengen Du wrote: > > > Hi Willem, > > > > > > I'm sorry, but I would like to confirm the issue further. > > > > > > On Mon, Jun 10, 2024 at 4:19 AM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > Chengen Du wrote: > > > > > The issue initially stems from libpcap. The ethertype will be overwritten > > > > > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > > > > > In the outbound packet path, if hardware VLAN offloading is unavailable, > > > > > the VLAN tag is inserted into the payload but then cleared from the sk_buff > > > > > struct. Consequently, this can lead to a false negative when checking for > > > > > the presence of a VLAN tag, causing the packet sniffing outcome to lack > > > > > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > > > > > tool may be unable to parse packets as expected. > > > > > > > > > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > > > > > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > > > > > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > > > > > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > > > > > is stripped, preventing the packet capturing tool from determining the > > > > > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > > > > > which means the packet capturing tool cannot parse the L3 header correctly. > > > > > > > > > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > > > > > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > > > > > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > > > > > Cc: stable@vger.kernel.org > > > > > Signed-off-by: Chengen Du <chengen.du@canonical.com> > > > > > > > > Overall, solid. > > > > > > > > > --- > > > > > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > > > > > 1 file changed, 55 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > > > > index ea3ebc160e25..8cffbe1f912d 100644 > > > > > --- a/net/packet/af_packet.c > > > > > +++ b/net/packet/af_packet.c > > > > > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > > > > > return packet_lookup_frame(po, rb, rb->head, status); > > > > > } > > > > > > > > > > +static u16 vlan_get_tci(struct sk_buff *skb) > > > > > +{ > > > > > + struct vlan_hdr vhdr, *vh; > > > > > + u8 *skb_orig_data = skb->data; > > > > > + int skb_orig_len = skb->len; > > > > > + > > > > > + skb_push(skb, skb->data - skb_mac_header(skb)); > > > > > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); > > > > > > > > Don't harcode Ethernet. > > > > > > > > According to documentation VLANs are used with other link layers. > > > > > > > > More importantly, in practice PF_PACKET allows inserting this > > > > skb->protocol on any device. > > > > > > > > We don't use link layer specific constants anywhere in the packet > > > > socket code for this reason. But instead dev->hard_header_len. > > > > > > > > One caveat there is variable length link layer headers, where > > > > dev->min_header_len != dev->hard_header_len. Will just have to fail > > > > on those. > > > > > > Thank you for pointing out this error. I would like to confirm if I > > > need to use dev->hard_header_len to get the correct header length and > > > return zero if dev->min_header_len != dev->hard_header_len to handle > > > variable-length link layer headers. Is there something I > > > misunderstand, or are there other aspects I need to consider further? > > > > That's right. > > > > The min_header_len != hard_header_len check is annoying and may seem > > pedantic. But it's the only way to trust that the next header starts > > at hard_header_len. > > Thank you for your advice. > I have implemented the modification, but I found that the > (min_header_len != hard_header_len) check results in unexpected > behavior in the following test scenario: > ip link add link ens18 ens18.24 type vlan proto 802.1ad id 24 > ip link add link ens18.24 ens18.24.25 type vlan proto 802.1Q id 25 > ifconfig ens18.24 1.0.24.1/24 > ifconfig ens18.24.25 1.0.25.1/24 > ping -n 1.0.25.3 > /dev/null 2>&1 & > tcpdump -nn -i any -y LINUX_SLL -Q out not tcp and not udp > > While receiving a packet from ens18.24.25 (802.1Q), the min_header_len > and hard_header_len are 14 and 18, respectively. > This check results in the TCI being 0 instead of 25. > Should we skip this check to display the correct value, or is there > another check that can achieve the same purpose? Interesting. Glad you found this. Makes sense, as VLAN devices have vlandev->hard_header_len = dev->hard_header_len + VLAN_HLEN; Does if (min_header_len && min_header_len != hard_header_len) resolve it? Few devices actually set min_header_len. Initially, only Ethernet in ether_setup() and loopback. It was introduced for validation in dev_validate_header, and a min_header_len of 0 just skips some basic validation. As long as VLAN devices do not initialize min_header_len (e.g., by inheriting it from the physical device and incorrectly setting it to ETH_HLEN), then this should be fine.
Hi Willem, Thank you for the suggestion. I have conducted further tests and found that the results are not as we expected. I would like to explain my findings based on the following tests: ip link add link ens18 ens18.24 type vlan proto 802.1ad id 24 ip link add link ens18.24 ens18.24.25 type vlan proto 802.1Q id 25 ifconfig ens18.24 1.0.24.1/24 ifconfig ens18.24.25 1.0.25.1/24 ping -n 1.0.25.3 > /dev/null 2>&1 & tcpdump -nn -i any -y LINUX_SLL -Q out not tcp and not udp I have added more logs and found the following results: af_packet: tpacket_rcv: dev->name [ens18.24.25] af_packet: tpacket_rcv: dev->name [ens18.24] af_packet: vlan_get_tci: dev->name [ens18.24], min_header_len [14], hard_header_len [18] af_packet: prb_fill_vlan_info: ppd->hv1.tp_vlan_tci [0], ppd->hv1.tp_vlan_tpid [8100] af_packet: prb_fill_vlan_info: currect vlan_tci [19], tp_vlan_tpid [8100] af_packet: tpacket_rcv: dev->name [ens18] af_packet: vlan_get_tci: dev->name [ens18], min_header_len [14], hard_header_len [14] af_packet: prb_fill_vlan_info: ppd->hv1.tp_vlan_tci [18], ppd->hv1.tp_vlan_tpid [88a8] af_packet: prb_fill_vlan_info: currect vlan_tci [18], tp_vlan_tpid [88a8] It seems that the min_header_len has been set even though the device is ens18.24. I will continue investigating this issue. Thank you for your ongoing assistance. Best regards, Chengen Du On Wed, Jun 12, 2024 at 10:07 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Chengen Du wrote: > > Hi Willem, > > > > On Tue, Jun 11, 2024 at 7:18 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Chengen Du wrote: > > > > Hi Willem, > > > > > > > > I'm sorry, but I would like to confirm the issue further. > > > > > > > > On Mon, Jun 10, 2024 at 4:19 AM Willem de Bruijn > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > Chengen Du wrote: > > > > > > The issue initially stems from libpcap. The ethertype will be overwritten > > > > > > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > > > > > > In the outbound packet path, if hardware VLAN offloading is unavailable, > > > > > > the VLAN tag is inserted into the payload but then cleared from the sk_buff > > > > > > struct. Consequently, this can lead to a false negative when checking for > > > > > > the presence of a VLAN tag, causing the packet sniffing outcome to lack > > > > > > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > > > > > > tool may be unable to parse packets as expected. > > > > > > > > > > > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > > > > > > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > > > > > > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > > > > > > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > > > > > > is stripped, preventing the packet capturing tool from determining the > > > > > > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > > > > > > which means the packet capturing tool cannot parse the L3 header correctly. > > > > > > > > > > > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > > > > > > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > > > > > > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > > > > > > Cc: stable@vger.kernel.org > > > > > > Signed-off-by: Chengen Du <chengen.du@canonical.com> > > > > > > > > > > Overall, solid. > > > > > > > > > > > --- > > > > > > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > > > > > > 1 file changed, 55 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > > > > > index ea3ebc160e25..8cffbe1f912d 100644 > > > > > > --- a/net/packet/af_packet.c > > > > > > +++ b/net/packet/af_packet.c > > > > > > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > > > > > > return packet_lookup_frame(po, rb, rb->head, status); > > > > > > } > > > > > > > > > > > > +static u16 vlan_get_tci(struct sk_buff *skb) > > > > > > +{ > > > > > > + struct vlan_hdr vhdr, *vh; > > > > > > + u8 *skb_orig_data = skb->data; > > > > > > + int skb_orig_len = skb->len; > > > > > > + > > > > > > + skb_push(skb, skb->data - skb_mac_header(skb)); > > > > > > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); > > > > > > > > > > Don't harcode Ethernet. > > > > > > > > > > According to documentation VLANs are used with other link layers. > > > > > > > > > > More importantly, in practice PF_PACKET allows inserting this > > > > > skb->protocol on any device. > > > > > > > > > > We don't use link layer specific constants anywhere in the packet > > > > > socket code for this reason. But instead dev->hard_header_len. > > > > > > > > > > One caveat there is variable length link layer headers, where > > > > > dev->min_header_len != dev->hard_header_len. Will just have to fail > > > > > on those. > > > > > > > > Thank you for pointing out this error. I would like to confirm if I > > > > need to use dev->hard_header_len to get the correct header length and > > > > return zero if dev->min_header_len != dev->hard_header_len to handle > > > > variable-length link layer headers. Is there something I > > > > misunderstand, or are there other aspects I need to consider further? > > > > > > That's right. > > > > > > The min_header_len != hard_header_len check is annoying and may seem > > > pedantic. But it's the only way to trust that the next header starts > > > at hard_header_len. > > > > Thank you for your advice. > > I have implemented the modification, but I found that the > > (min_header_len != hard_header_len) check results in unexpected > > behavior in the following test scenario: > > ip link add link ens18 ens18.24 type vlan proto 802.1ad id 24 > > ip link add link ens18.24 ens18.24.25 type vlan proto 802.1Q id 25 > > ifconfig ens18.24 1.0.24.1/24 > > ifconfig ens18.24.25 1.0.25.1/24 > > ping -n 1.0.25.3 > /dev/null 2>&1 & > > tcpdump -nn -i any -y LINUX_SLL -Q out not tcp and not udp > > > > While receiving a packet from ens18.24.25 (802.1Q), the min_header_len > > and hard_header_len are 14 and 18, respectively. > > This check results in the TCI being 0 instead of 25. > > Should we skip this check to display the correct value, or is there > > another check that can achieve the same purpose? > > Interesting. Glad you found this. > > Makes sense, as VLAN devices have > > vlandev->hard_header_len = dev->hard_header_len + VLAN_HLEN; > > Does > > if (min_header_len && min_header_len != hard_header_len) > > resolve it? > > Few devices actually set min_header_len. Initially, only Ethernet in > ether_setup() and loopback. It was introduced for validation in > dev_validate_header, and a min_header_len of 0 just skips some basic > validation. > > As long as VLAN devices do not initialize min_header_len (e.g., by > inheriting it from the physical device and incorrectly setting it to > ETH_HLEN), then this should be fine. >
Chengen Du wrote: > Hi Willem, > > Thank you for the suggestion. > I have conducted further tests and found that the results are not as > we expected. > > I would like to explain my findings based on the following tests: > ip link add link ens18 ens18.24 type vlan proto 802.1ad id 24 > ip link add link ens18.24 ens18.24.25 type vlan proto 802.1Q id 25 > ifconfig ens18.24 1.0.24.1/24 > ifconfig ens18.24.25 1.0.25.1/24 > ping -n 1.0.25.3 > /dev/null 2>&1 & > tcpdump -nn -i any -y LINUX_SLL -Q out not tcp and not udp > > I have added more logs and found the following results: > af_packet: tpacket_rcv: dev->name [ens18.24.25] > af_packet: tpacket_rcv: dev->name [ens18.24] > af_packet: vlan_get_tci: dev->name [ens18.24], min_header_len > [14], hard_header_len [18] > af_packet: prb_fill_vlan_info: ppd->hv1.tp_vlan_tci [0], > ppd->hv1.tp_vlan_tpid [8100] > af_packet: prb_fill_vlan_info: currect vlan_tci [19], tp_vlan_tpid [8100] > af_packet: tpacket_rcv: dev->name [ens18] > af_packet: vlan_get_tci: dev->name [ens18], min_header_len [14], > hard_header_len [14] > af_packet: prb_fill_vlan_info: ppd->hv1.tp_vlan_tci [18], > ppd->hv1.tp_vlan_tpid [88a8] > af_packet: prb_fill_vlan_info: currect vlan_tci [18], tp_vlan_tpid [88a8] > > It seems that the min_header_len has been set even though the device > is ens18.24. > I will continue investigating this issue. > Thank you for your ongoing assistance. Thanks. Apparently min_header_len cannot be relied on for this. It is not explicitly set in most drivers, including in the vlan driver. Let's not make this series conditional on that. The number of variable length link layer protocols is minimal, and egregious errors are caught by skb_header_pointer. Just use dev->hard_header_len as is, without the min_header_len check. Btw, please remember not to top-post. https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying > Best regards, > Chengen Du > > On Wed, Jun 12, 2024 at 10:07 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Chengen Du wrote: > > > Hi Willem, > > > > > > On Tue, Jun 11, 2024 at 7:18 AM Willem de Bruijn > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > Chengen Du wrote: > > > > > Hi Willem, > > > > > > > > > > I'm sorry, but I would like to confirm the issue further. > > > > > > > > > > On Mon, Jun 10, 2024 at 4:19 AM Willem de Bruijn > > > > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > > > > > > > Chengen Du wrote: > > > > > > > The issue initially stems from libpcap. The ethertype will be overwritten > > > > > > > as the VLAN TPID if the network interface lacks hardware VLAN offloading. > > > > > > > In the outbound packet path, if hardware VLAN offloading is unavailable, > > > > > > > the VLAN tag is inserted into the payload but then cleared from the sk_buff > > > > > > > struct. Consequently, this can lead to a false negative when checking for > > > > > > > the presence of a VLAN tag, causing the packet sniffing outcome to lack > > > > > > > VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing > > > > > > > tool may be unable to parse packets as expected. > > > > > > > > > > > > > > The TCI-TPID is missing because the prb_fill_vlan_info() function does not > > > > > > > modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the > > > > > > > payload and not in the sk_buff struct. The skb_vlan_tag_present() function > > > > > > > only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header > > > > > > > is stripped, preventing the packet capturing tool from determining the > > > > > > > correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, > > > > > > > which means the packet capturing tool cannot parse the L3 header correctly. > > > > > > > > > > > > > > Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 > > > > > > > Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u > > > > > > > Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") > > > > > > > Cc: stable@vger.kernel.org > > > > > > > Signed-off-by: Chengen Du <chengen.du@canonical.com> > > > > > > > > > > > > Overall, solid. > > > > > > > > > > > > > --- > > > > > > > net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- > > > > > > > 1 file changed, 55 insertions(+), 2 deletions(-) > > > > > > > > > > > > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > > > > > > > index ea3ebc160e25..8cffbe1f912d 100644 > > > > > > > --- a/net/packet/af_packet.c > > > > > > > +++ b/net/packet/af_packet.c > > > > > > > @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, > > > > > > > return packet_lookup_frame(po, rb, rb->head, status); > > > > > > > } > > > > > > > > > > > > > > +static u16 vlan_get_tci(struct sk_buff *skb) > > > > > > > +{ > > > > > > > + struct vlan_hdr vhdr, *vh; > > > > > > > + u8 *skb_orig_data = skb->data; > > > > > > > + int skb_orig_len = skb->len; > > > > > > > + > > > > > > > + skb_push(skb, skb->data - skb_mac_header(skb)); > > > > > > > + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); > > > > > > > > > > > > Don't harcode Ethernet. > > > > > > > > > > > > According to documentation VLANs are used with other link layers. > > > > > > > > > > > > More importantly, in practice PF_PACKET allows inserting this > > > > > > skb->protocol on any device. > > > > > > > > > > > > We don't use link layer specific constants anywhere in the packet > > > > > > socket code for this reason. But instead dev->hard_header_len. > > > > > > > > > > > > One caveat there is variable length link layer headers, where > > > > > > dev->min_header_len != dev->hard_header_len. Will just have to fail > > > > > > on those. > > > > > > > > > > Thank you for pointing out this error. I would like to confirm if I > > > > > need to use dev->hard_header_len to get the correct header length and > > > > > return zero if dev->min_header_len != dev->hard_header_len to handle > > > > > variable-length link layer headers. Is there something I > > > > > misunderstand, or are there other aspects I need to consider further? > > > > > > > > That's right. > > > > > > > > The min_header_len != hard_header_len check is annoying and may seem > > > > pedantic. But it's the only way to trust that the next header starts > > > > at hard_header_len. > > > > > > Thank you for your advice. > > > I have implemented the modification, but I found that the > > > (min_header_len != hard_header_len) check results in unexpected > > > behavior in the following test scenario: > > > ip link add link ens18 ens18.24 type vlan proto 802.1ad id 24 > > > ip link add link ens18.24 ens18.24.25 type vlan proto 802.1Q id 25 > > > ifconfig ens18.24 1.0.24.1/24 > > > ifconfig ens18.24.25 1.0.25.1/24 > > > ping -n 1.0.25.3 > /dev/null 2>&1 & > > > tcpdump -nn -i any -y LINUX_SLL -Q out not tcp and not udp > > > > > > While receiving a packet from ens18.24.25 (802.1Q), the min_header_len > > > and hard_header_len are 14 and 18, respectively. > > > This check results in the TCI being 0 instead of 25. > > > Should we skip this check to display the correct value, or is there > > > another check that can achieve the same purpose? > > > > Interesting. Glad you found this. > > > > Makes sense, as VLAN devices have > > > > vlandev->hard_header_len = dev->hard_header_len + VLAN_HLEN; > > > > Does > > > > if (min_header_len && min_header_len != hard_header_len) > > > > resolve it? > > > > Few devices actually set min_header_len. Initially, only Ethernet in > > ether_setup() and loopback. It was introduced for validation in > > dev_validate_header, and a min_header_len of 0 just skips some basic > > validation. > > > > As long as VLAN devices do not initialize min_header_len (e.g., by > > inheriting it from the physical device and incorrectly setting it to > > ETH_HLEN), then this should be fine. > >
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index ea3ebc160e25..8cffbe1f912d 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -538,6 +538,43 @@ static void *packet_current_frame(struct packet_sock *po, return packet_lookup_frame(po, rb, rb->head, status); } +static u16 vlan_get_tci(struct sk_buff *skb) +{ + struct vlan_hdr vhdr, *vh; + u8 *skb_orig_data = skb->data; + int skb_orig_len = skb->len; + + skb_push(skb, skb->data - skb_mac_header(skb)); + vh = skb_header_pointer(skb, ETH_HLEN, sizeof(vhdr), &vhdr); + if (skb_orig_data != skb->data) { + skb->data = skb_orig_data; + skb->len = skb_orig_len; + } + if (unlikely(!vh)) + return 0; + + return ntohs(vh->h_vlan_TCI); +} + +static __be16 vlan_get_protocol_dgram(struct sk_buff *skb) +{ + __be16 proto = skb->protocol; + + if (unlikely(eth_type_vlan(proto))) { + u8 *skb_orig_data = skb->data; + int skb_orig_len = skb->len; + + skb_push(skb, skb->data - skb_mac_header(skb)); + proto = __vlan_get_protocol(skb, proto, NULL); + if (skb_orig_data != skb->data) { + skb->data = skb_orig_data; + skb->len = skb_orig_len; + } + } + + return proto; +} + static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc) { del_timer_sync(&pkc->retire_blk_timer); @@ -1007,10 +1044,16 @@ static void prb_clear_rxhash(struct tpacket_kbdq_core *pkc, static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc, struct tpacket3_hdr *ppd) { + struct packet_sock *po = container_of(pkc, struct packet_sock, rx_ring.prb_bdqc); + if (skb_vlan_tag_present(pkc->skb)) { ppd->hv1.tp_vlan_tci = skb_vlan_tag_get(pkc->skb); ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->vlan_proto); ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; + } else if (unlikely(po->sk.sk_type == SOCK_DGRAM && eth_type_vlan(pkc->skb->protocol))) { + ppd->hv1.tp_vlan_tci = vlan_get_tci(pkc->skb); + ppd->hv1.tp_vlan_tpid = ntohs(pkc->skb->protocol); + ppd->tp_status = TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; } else { ppd->hv1.tp_vlan_tci = 0; ppd->hv1.tp_vlan_tpid = 0; @@ -2428,6 +2471,10 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, h.h2->tp_vlan_tci = skb_vlan_tag_get(skb); h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto); status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; + } else if (unlikely(sk->sk_type == SOCK_DGRAM && eth_type_vlan(skb->protocol))) { + h.h2->tp_vlan_tci = vlan_get_tci(skb); + h.h2->tp_vlan_tpid = ntohs(skb->protocol); + status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; } else { h.h2->tp_vlan_tci = 0; h.h2->tp_vlan_tpid = 0; @@ -2457,7 +2504,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, sll->sll_halen = dev_parse_header(skb, sll->sll_addr); sll->sll_family = AF_PACKET; sll->sll_hatype = dev->type; - sll->sll_protocol = skb->protocol; + sll->sll_protocol = (sk->sk_type == SOCK_DGRAM) ? + vlan_get_protocol_dgram(skb) : skb->protocol; sll->sll_pkttype = skb->pkt_type; if (unlikely(packet_sock_flag(po, PACKET_SOCK_ORIGDEV))) sll->sll_ifindex = orig_dev->ifindex; @@ -3482,7 +3530,8 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, /* Original length was stored in sockaddr_ll fields */ origlen = PACKET_SKB_CB(skb)->sa.origlen; sll->sll_family = AF_PACKET; - sll->sll_protocol = skb->protocol; + sll->sll_protocol = (sock->type == SOCK_DGRAM) ? + vlan_get_protocol_dgram(skb) : skb->protocol; } sock_recv_cmsgs(msg, sk, skb); @@ -3539,6 +3588,10 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, aux.tp_vlan_tci = skb_vlan_tag_get(skb); aux.tp_vlan_tpid = ntohs(skb->vlan_proto); aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; + } else if (unlikely(sock->type == SOCK_DGRAM && eth_type_vlan(skb->protocol))) { + aux.tp_vlan_tci = vlan_get_tci(skb); + aux.tp_vlan_tpid = ntohs(skb->protocol); + aux.tp_status |= TP_STATUS_VLAN_VALID | TP_STATUS_VLAN_TPID_VALID; } else { aux.tp_vlan_tci = 0; aux.tp_vlan_tpid = 0;
The issue initially stems from libpcap. The ethertype will be overwritten as the VLAN TPID if the network interface lacks hardware VLAN offloading. In the outbound packet path, if hardware VLAN offloading is unavailable, the VLAN tag is inserted into the payload but then cleared from the sk_buff struct. Consequently, this can lead to a false negative when checking for the presence of a VLAN tag, causing the packet sniffing outcome to lack VLAN tag information (i.e., TCI-TPID). As a result, the packet capturing tool may be unable to parse packets as expected. The TCI-TPID is missing because the prb_fill_vlan_info() function does not modify the tp_vlan_tci/tp_vlan_tpid values, as the information is in the payload and not in the sk_buff struct. The skb_vlan_tag_present() function only checks vlan_all in the sk_buff struct. In cooked mode, the L2 header is stripped, preventing the packet capturing tool from determining the correct TCI-TPID value. Additionally, the protocol in SLL is incorrect, which means the packet capturing tool cannot parse the L3 header correctly. Link: https://github.com/the-tcpdump-group/libpcap/issues/1105 Link: https://lore.kernel.org/netdev/20240520070348.26725-1-chengen.du@canonical.com/T/#u Fixes: 393e52e33c6c ("packet: deliver VLAN TCI to userspace") Cc: stable@vger.kernel.org Signed-off-by: Chengen Du <chengen.du@canonical.com> --- net/packet/af_packet.c | 57 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 55 insertions(+), 2 deletions(-)