Message ID | 20250219211102.225324-2-nnac123@linux.ibm.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Use new for_each macro to create hexdumps | expand |
On Wed, 19 Feb 2025 15:11:00 -0600 Nick Child <nnac123@linux.ibm.com> wrote: > Define for_each_line_in_hex_dump which loops over a buffer and calls > hex_dump_to_buffer for each segment in the buffer. This allows the > caller to decide what to do with the resulting string and is not > limited by a specific printing format like print_hex_dump. > > Signed-off-by: Nick Child <nnac123@linux.ibm.com> > --- > include/linux/printk.h | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/include/linux/printk.h b/include/linux/printk.h > index 4217a9f412b2..12e51b1cdca5 100644 > --- a/include/linux/printk.h > +++ b/include/linux/printk.h > @@ -755,6 +755,26 @@ enum { > extern int hex_dump_to_buffer(const void *buf, size_t len, int rowsize, > int groupsize, char *linebuf, size_t linebuflen, > bool ascii); > +/** > + * for_each_line_in_hex_dump - iterate over buffer, converting into hex ASCII > + * @i: offset in @buff > + * @rowsize: number of bytes to print per line; must be 16 or 32 > + * @linebuf: where to put the converted data > + * @linebuflen: total size of @linebuf, including space for terminating NUL > + * IOW >= (@rowsize * 2) + ((@rowsize - 1 / @groupsize)) + 1 > + * @groupsize: number of bytes to print at a time (1, 2, 4, 8; default = 1) > + * @buf: data blob to dump > + * @len: number of bytes in the @buf > + */ > + #define for_each_line_in_hex_dump(i, rowsize, linebuf, linebuflen, groupsize, \ > + buf, len) \ > + for ((i) = 0; \ > + (i) < (len) && \ > + hex_dump_to_buffer((unsigned char *)(buf) + (i), \ > + (len) - (i), (rowsize), (groupsize), \ > + (linebuf), (linebuflen), false); \ You can avoid the compiler actually checking the function result it you try a bit harder - see below. > + (i) += (rowsize) == 32 ? 32 : 16 \ > + ) If you are doing this as a #define you really shouldn't evaluate the arguments more than once. I'd also not add more code that relies on the perverse and pointless code that enforces rowsize of 16 or 32. Maybe document it, but there is no point changing the stride without doing the equivalent change to the rowsize passed to hex_dump_to_buffer. You could do: #define for_each_line_in_hex_dump(buf_offset, rowsize, linebuf, linebuflen, groupsize, buf, len, ascii) \ for (unsigned int _offset = 0, _rowsize = (rowsize), _len = (len); \ ((offset) = _offset) < _len && (hex_dump_to_buffer((const char *)(buf) + _offset, _len - _offset, \ _rowsize, (groupsize), (linebuf), (linebuflen), (ascii)), 1); \ _offset += _rowsize ) (Assuming I've not mistyped it.) As soon as 'ascii' gets replaced by 'flags' you'll need to pass it through. David
Hi David, On Thu, Feb 20, 2025 at 10:00:50PM +0000, David Laight wrote: > On Wed, 19 Feb 2025 15:11:00 -0600 > Nick Child <nnac123@linux.ibm.com> wrote: > > > --- > > include/linux/printk.h | 20 ++++++++++++++++++++ > > 1 file changed, 20 insertions(+) > > > > diff --git a/include/linux/printk.h b/include/linux/printk.h > > index 4217a9f412b2..12e51b1cdca5 100644 > > --- a/include/linux/printk.h > > +++ b/include/linux/printk.h > > + buf, len) \ > > + for ((i) = 0; \ > > + (i) < (len) && \ > > + hex_dump_to_buffer((unsigned char *)(buf) + (i), \ > > + (len) - (i), (rowsize), (groupsize), \ > > + (linebuf), (linebuflen), false); \ > > You can avoid the compiler actually checking the function result > it you try a bit harder - see below. > This was an extra precaution against infinite loops, breaking when hex_dump_to_buffer returns 0 when len is 0. Technically this won't happen since we check i < len first, and increment i by at least 16 (though your proposal removes the latter assertion). My other thought was to check for error case by checking if the return value was > linebuflen. But I actually prefer the behavior of continuing with the truncated result. I think I prefer it how it is rather than completely ignoring it. Open to other opinons though. > > + (i) += (rowsize) == 32 ? 32 : 16 \ > > + ) > > If you are doing this as a #define you really shouldn't evaluate the > arguments more than once. > I'd also not add more code that relies on the perverse and pointless > code that enforces rowsize of 16 or 32. > Maybe document it, but there is no point changing the stride without > doing the equivalent change to the rowsize passed to hex_dump_to_buffer. > The equivalent conditonal exists in hex_dump_to_buffer so doing it again seemed unnecessary. I understand your recent patch [1] is trying to replace the rowsize is 16 or 32 rule with rowsize is a power of 2 and multiple of groupsize. I suppose the most straightforward and flexible thing the for_each loop can do is to just assume rowsize is valid. > You could do: > #define for_each_line_in_hex_dump(buf_offset, rowsize, linebuf, linebuflen, groupsize, buf, len, ascii) \ > for (unsigned int _offset = 0, _rowsize = (rowsize), _len = (len); \ > ((offset) = _offset) < _len && (hex_dump_to_buffer((const char *)(buf) + _offset, _len - _offset, \ > _rowsize, (groupsize), (linebuf), (linebuflen), (ascii)), 1); \ > _offset += _rowsize ) > > (Assuming I've not mistyped it.) > Trying to understand the reasoning for declaring new tmp variables; Is this to prevent the values from changing in the body of the loop? I tried to avoid declaring new vars in this design because I thought it would recive pushback due to possible name collision and variable declaration inside for loop initializer. I suppose both implementations come with tradeoffs. > As soon as 'ascii' gets replaced by 'flags' you'll need to pass it through. > Yes, if hex_dump_to_buffer becomes a wrapper around a new function (which accepts flag arg), I think there is an opportunity for a lot of confusion to clear up. Old behaviour of hex_dump_to_buffer will be respected but the underlying function will be more flexible. Appreciate the review! - Nick [1] - https://lore.kernel.org/lkml/20250216201901.161781-1-david.laight.linux@gmail.com/
On Fri, 21 Feb 2025 11:37:46 -0600 Nick Child <nnac123@linux.ibm.com> wrote: > Hi David, > > On Thu, Feb 20, 2025 at 10:00:50PM +0000, David Laight wrote: > > On Wed, 19 Feb 2025 15:11:00 -0600 > > Nick Child <nnac123@linux.ibm.com> wrote: > > > > > --- > > > include/linux/printk.h | 20 ++++++++++++++++++++ > > > 1 file changed, 20 insertions(+) > > > > > > diff --git a/include/linux/printk.h b/include/linux/printk.h > > > index 4217a9f412b2..12e51b1cdca5 100644 > > > --- a/include/linux/printk.h > > > +++ b/include/linux/printk.h > > > + buf, len) \ > > > + for ((i) = 0; \ > > > + (i) < (len) && \ > > > + hex_dump_to_buffer((unsigned char *)(buf) + (i), \ > > > + (len) - (i), (rowsize), (groupsize), \ > > > + (linebuf), (linebuflen), false); \ > > > > You can avoid the compiler actually checking the function result > > it you try a bit harder - see below. > > > > This was an extra precaution against infinite loops, breaking when > hex_dump_to_buffer returns 0 when len is 0. Technically this won't happen > since we check i < len first, and increment i by at least 16 (though > your proposal removes the latter assertion). > > My other thought was to check for error case by checking if > the return value was > linebuflen. But I actually prefer the behavior > of continuing with the truncated result. > > I think I prefer it how it is rather than completely ignoring it. > Open to other opinons though. There are plenty of ways to generate infinite loops. I wouldn't worry about someone passing 0 for rowsize. > > > > + (i) += (rowsize) == 32 ? 32 : 16 \ > > > + ) > > > > If you are doing this as a #define you really shouldn't evaluate the > > arguments more than once. > > I'd also not add more code that relies on the perverse and pointless > > code that enforces rowsize of 16 or 32. > > Maybe document it, but there is no point changing the stride without > > doing the equivalent change to the rowsize passed to hex_dump_to_buffer. > > > > The equivalent conditonal exists in hex_dump_to_buffer so doing it > again seemed unnecessary. I understand your recent patch [1] is trying > to replace the rowsize is 16 or 32 rule with rowsize is a power of 2 > and multiple of groupsize. I suppose the most straightforward and > flexible thing the for_each loop can do is to just assume rowsize is > valid. > > > You could do: > > #define for_each_line_in_hex_dump(buf_offset, rowsize, linebuf, linebuflen, groupsize, buf, len, ascii) \ > > for (unsigned int _offset = 0, _rowsize = (rowsize), _len = (len); \ > > ((offset) = _offset) < _len && (hex_dump_to_buffer((const char *)(buf) + _offset, _len - _offset, \ ^ needs to be buf_offset. > > _rowsize, (groupsize), (linebuf), (linebuflen), (ascii)), 1); \ > > _offset += _rowsize ) > > > > (Assuming I've not mistyped it.) > > > > Trying to understand the reasoning for declaring new tmp variables; > Is this to prevent the values from changing in the body of the loop? No, it is to prevent side-effects happening more than once. Think about what would happen if someone passed 'foo -= 4' for len. > I tried to avoid declaring new vars in this design because I thought it > would recive pushback due to possible name collision and variable > declaration inside for loop initializer. The c std level got upped recently to allow declarations inside loops. Usually for a 'loop iterator' - but I think you needed that to be exposed outsize the loop. (Otherwise you don't need _offset and buf_offset. > I suppose both implementations come with tradeoffs. > > > As soon as 'ascii' gets replaced by 'flags' you'll need to pass it through. > > > > Yes, if hex_dump_to_buffer becomes a wrapper around a new function > (which accepts flag arg), I think there is an opportunity for a lot > of confusion to clear up. Old behaviour of hex_dump_to_buffer will be > respected but the underlying function will be more flexible. My changed version (honoring row_size) passes all the self tests. I've not done a grep (yet) to if anything other that 16 or 32 is actually passed. David > > Appreciate the review! > - Nick > > [1] - https://lore.kernel.org/lkml/20250216201901.161781-1-david.laight.linux@gmail.com/
On Fri, Feb 21, 2025 at 06:04:35PM +0000, David Laight wrote: > On Fri, 21 Feb 2025 11:37:46 -0600 > Nick Child <nnac123@linux.ibm.com> wrote: > > On Thu, Feb 20, 2025 at 10:00:50PM +0000, David Laight wrote: > > > You could do: > > > #define for_each_line_in_hex_dump(buf_offset, rowsize, linebuf, linebuflen, groupsize, buf, len, ascii) \ > > > for (unsigned int _offset = 0, _rowsize = (rowsize), _len = (len); \ > > > ((offset) = _offset) < _len && (hex_dump_to_buffer((const char *)(buf) + _offset, _len - _offset, \ > ^ needs to be buf_offset. > > > > _rowsize, (groupsize), (linebuf), (linebuflen), (ascii)), 1); \ > > > _offset += _rowsize ) > > > > > > (Assuming I've not mistyped it.) > > > > > > > Trying to understand the reasoning for declaring new tmp variables; > > Is this to prevent the values from changing in the body of the loop? > > No, it is to prevent side-effects happening more than once. > Think about what would happen if someone passed 'foo -= 4' for len. > If we are protecting against those cases then linebuf, linebuflen, groupsize and ascii should also be stored into tmp variables since they are referenced in the loop conditional every iteration. At which point the loop becomes too messy IMO. Are any other for_each implementations taking these precautions? Not trying to come off dismissive, I genuinely appreciate all the insight, trying to learn more for next time. > > I tried to avoid declaring new vars in this design because I thought it > > would recive pushback due to possible name collision and variable > > declaration inside for loop initializer. > > The c std level got upped recently to allow declarations inside loops. > Usually for a 'loop iterator' - but I think you needed that to be > exposed outsize the loop. > (Otherwise you don't need _offset and buf_offset. > As in decrementing _len and increasing a _buf var rather than tracking offset? I don't really care for exposing the offset, during design I figured some caller may make use of it but I think it is worth removing to reduce the number of arguments. Thanks again, Nick
On Fri, 21 Feb 2025 12:50:59 -0600 Nick Child <nnac123@linux.ibm.com> wrote: > On Fri, Feb 21, 2025 at 06:04:35PM +0000, David Laight wrote: > > On Fri, 21 Feb 2025 11:37:46 -0600 > > Nick Child <nnac123@linux.ibm.com> wrote: > > > On Thu, Feb 20, 2025 at 10:00:50PM +0000, David Laight wrote: > > > > You could do: > > > > #define for_each_line_in_hex_dump(buf_offset, rowsize, linebuf, linebuflen, groupsize, buf, len, ascii) \ > > > > for (unsigned int _offset = 0, _rowsize = (rowsize), _len = (len); \ > > > > ((offset) = _offset) < _len && (hex_dump_to_buffer((const char *)(buf) + _offset, _len - _offset, \ > > ^ needs to be buf_offset. > > > > > > _rowsize, (groupsize), (linebuf), (linebuflen), (ascii)), 1); \ > > > > _offset += _rowsize ) > > > > > > > > (Assuming I've not mistyped it.) > > > > > > > > > > Trying to understand the reasoning for declaring new tmp variables; > > > Is this to prevent the values from changing in the body of the loop? > > > > No, it is to prevent side-effects happening more than once. > > Think about what would happen if someone passed 'foo -= 4' for len. > > > > If we are protecting against those cases then linebuf, linebuflen, > groupsize and ascii should also be stored into tmp variables since they > are referenced in the loop conditional every iteration. > At which point the loop becomes too messy IMO. > Are any other for_each implementations taking these precautions? No, it only matters if they appear in the text expansion of the #define more than once. It may be unlikely here, but for things like min(a,b) where: #define min(a, b) ((a) < (b) ? (a) : (b)) causes: a = 3; b = min(a += 3, 7); to set b to 9 it has to be avoided. > > Not trying to come off dismissive, I genuinely appreciate all the > insight, trying to learn more for next time. > > > > I tried to avoid declaring new vars in this design because I thought it > > > would recive pushback due to possible name collision and variable > > > declaration inside for loop initializer. > > > > The c std level got upped recently to allow declarations inside loops. > > Usually for a 'loop iterator' - but I think you needed that to be > > exposed outsize the loop. > > (Otherwise you don't need _offset and buf_offset. > > > > As in decrementing _len and increasing a _buf var rather than tracking > offset? > I don't really care for exposing the offset, during design I figured > some caller may make use of it but I think it is worth removing to reduce > the number of arguments. Except the loop body needs it - so it needs to be a caller-defined name, even if they don't declare the variable. David > > Thanks again, > Nick
On Fri, Feb 21, 2025 at 10:18:15PM +0000, David Laight wrote: > On Fri, 21 Feb 2025 12:50:59 -0600 > Nick Child <nnac123@linux.ibm.com> wrote: > > > On Fri, Feb 21, 2025 at 06:04:35PM +0000, David Laight wrote: > > > On Fri, 21 Feb 2025 11:37:46 -0600 > > > Nick Child <nnac123@linux.ibm.com> wrote: > > > > On Thu, Feb 20, 2025 at 10:00:50PM +0000, David Laight wrote: > > > > > You could do: > > > > > #define for_each_line_in_hex_dump(buf_offset, rowsize, linebuf, linebuflen, groupsize, buf, len, ascii) \ > > > > > for (unsigned int _offset = 0, _rowsize = (rowsize), _len = (len); \ > > > > > ((offset) = _offset) < _len && (hex_dump_to_buffer((const char *)(buf) + _offset, _len - _offset, \ > > > ^ needs to be buf_offset. > > > > > > > > _rowsize, (groupsize), (linebuf), (linebuflen), (ascii)), 1); \ > > > > > _offset += _rowsize ) > > > > > > > > > > (Assuming I've not mistyped it.) > > > > > > > > > > > > > Trying to understand the reasoning for declaring new tmp variables; > > > > Is this to prevent the values from changing in the body of the loop? > > > > > > No, it is to prevent side-effects happening more than once. > > > Think about what would happen if someone passed 'foo -= 4' for len. > > > > > > > If we are protecting against those cases then linebuf, linebuflen, > > groupsize and ascii should also be stored into tmp variables since they > > are referenced in the loop conditional every iteration. > > At which point the loop becomes too messy IMO. > > Are any other for_each implementations taking these precautions? > > No, it only matters if they appear in the text expansion of the #define > more than once. But the operation is still executed more than once when the variable appears in the loop conditional. This still sounds like the same type of unexpected behaviour. For example, when I set groupsize = 1 then invoke for_each_line_in_hex_dump with groupsize *= 2 I get: [ 4.688870][ T145] HD: 0100 0302 0504 0706 0908 0b0a 0d0c 0f0e [ 4.688949][ T145] HD: 13121110 17161514 1b1a1918 1f1e1d1c [ 4.688969][ T145] HD: 2726252423222120 2f2e2d2c2b2a2928 [ 4.688983][ T145] HD: 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f Similarly if I run with buf: buf += 8: [ 5.019031][ T149] HD: 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 [ 5.019057][ T149] HD: 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f [ 5.019069][ T149] HD: 38 39 3a 3b 3c 3d 3e 3f 98 1a 6a 95 de e6 9a 71 [ 5.019081][ T149] HD: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 The operations are getting executed more than once. Should this be classified as expected behaviour just because those vars are technically only expanded once in the macro? > > Not trying to come off dismissive, I genuinely appreciate all the > > insight, trying to learn more for next time. > > > > > > I tried to avoid declaring new vars in this design because I thought it > > > > would recive pushback due to possible name collision and variable > > > > declaration inside for loop initializer. > > > > > > The c std level got upped recently to allow declarations inside loops. > > > Usually for a 'loop iterator' - but I think you needed that to be > > > exposed outsize the loop. > > > (Otherwise you don't need _offset and buf_offset. > > > > > > > As in decrementing _len and increasing a _buf var rather than tracking > > offset? > > I don't really care for exposing the offset, during design I figured > > some caller may make use of it but I think it is worth removing to reduce > > the number of arguments. > > Except the loop body needs it - so it needs to be a caller-defined name, > even if they don't declare the variable. > > David > > > > > Thanks again, > > Nick >
On Sat, 22 Feb 2025 12:58:48 -0600 Nick Child <nnac123@linux.ibm.com> wrote: > On Fri, Feb 21, 2025 at 10:18:15PM +0000, David Laight wrote: > > On Fri, 21 Feb 2025 12:50:59 -0600 > > Nick Child <nnac123@linux.ibm.com> wrote: > > > > > On Fri, Feb 21, 2025 at 06:04:35PM +0000, David Laight wrote: > > > > On Fri, 21 Feb 2025 11:37:46 -0600 > > > > Nick Child <nnac123@linux.ibm.com> wrote: > > > > > On Thu, Feb 20, 2025 at 10:00:50PM +0000, David Laight wrote: > > > > > > You could do: > > > > > > #define for_each_line_in_hex_dump(buf_offset, rowsize, linebuf, linebuflen, groupsize, buf, len, ascii) \ > > > > > > for (unsigned int _offset = 0, _rowsize = (rowsize), _len = (len); \ > > > > > > ((offset) = _offset) < _len && (hex_dump_to_buffer((const char *)(buf) + _offset, _len - _offset, \ > > > > ^ needs to be buf_offset. > > > > > > > > > > _rowsize, (groupsize), (linebuf), (linebuflen), (ascii)), 1); \ > > > > > > _offset += _rowsize ) > > > > > > > > > > > > (Assuming I've not mistyped it.) > > > > > > > > > > > > > > > > Trying to understand the reasoning for declaring new tmp variables; > > > > > Is this to prevent the values from changing in the body of the loop? > > > > > > > > No, it is to prevent side-effects happening more than once. > > > > Think about what would happen if someone passed 'foo -= 4' for len. > > > > > > > > > > If we are protecting against those cases then linebuf, linebuflen, > > > groupsize and ascii should also be stored into tmp variables since they > > > are referenced in the loop conditional every iteration. > > > At which point the loop becomes too messy IMO. > > > Are any other for_each implementations taking these precautions? > > > > No, it only matters if they appear in the text expansion of the #define > > more than once. > > But the operation is still executed more than once when the variable > appears in the loop conditional. This still sounds like the same type > of unexpected behaviour. For example, when I set groupsize = 1 then > invoke for_each_line_in_hex_dump with groupsize *= 2 I get: > [ 4.688870][ T145] HD: 0100 0302 0504 0706 0908 0b0a 0d0c 0f0e > [ 4.688949][ T145] HD: 13121110 17161514 1b1a1918 1f1e1d1c > [ 4.688969][ T145] HD: 2726252423222120 2f2e2d2c2b2a2928 > [ 4.688983][ T145] HD: 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f > Similarly if I run with buf: buf += 8: > [ 5.019031][ T149] HD: 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 > [ 5.019057][ T149] HD: 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f > [ 5.019069][ T149] HD: 38 39 3a 3b 3c 3d 3e 3f 98 1a 6a 95 de e6 9a 71 > [ 5.019081][ T149] HD: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > The operations are getting executed more than once. Should this be > classified as expected behaviour just because those vars are technically > only expanded once in the macro? Brain fade on my side :-( While you can copy all the integers, all the variables defined in a for(;;) statement have to have the same type - so you can't take a copy of the pointers. So maybe this is actually unwritable without having odd side effects and probably doesn't really save much text in any place it is used. I did a scan of the kernel sources earlier. Everything sets rowsize to 16 or 32, so it doesn't matter if hexdump_to_buffer() just uses the supplied value. I didn't look at the history. David
diff --git a/include/linux/printk.h b/include/linux/printk.h index 4217a9f412b2..12e51b1cdca5 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -755,6 +755,26 @@ enum { extern int hex_dump_to_buffer(const void *buf, size_t len, int rowsize, int groupsize, char *linebuf, size_t linebuflen, bool ascii); +/** + * for_each_line_in_hex_dump - iterate over buffer, converting into hex ASCII + * @i: offset in @buff + * @rowsize: number of bytes to print per line; must be 16 or 32 + * @linebuf: where to put the converted data + * @linebuflen: total size of @linebuf, including space for terminating NUL + * IOW >= (@rowsize * 2) + ((@rowsize - 1 / @groupsize)) + 1 + * @groupsize: number of bytes to print at a time (1, 2, 4, 8; default = 1) + * @buf: data blob to dump + * @len: number of bytes in the @buf + */ + #define for_each_line_in_hex_dump(i, rowsize, linebuf, linebuflen, groupsize, \ + buf, len) \ + for ((i) = 0; \ + (i) < (len) && \ + hex_dump_to_buffer((unsigned char *)(buf) + (i), \ + (len) - (i), (rowsize), (groupsize), \ + (linebuf), (linebuflen), false); \ + (i) += (rowsize) == 32 ? 32 : 16 \ + ) #ifdef CONFIG_PRINTK extern void print_hex_dump(const char *level, const char *prefix_str, int prefix_type, int rowsize, int groupsize,
Define for_each_line_in_hex_dump which loops over a buffer and calls hex_dump_to_buffer for each segment in the buffer. This allows the caller to decide what to do with the resulting string and is not limited by a specific printing format like print_hex_dump. Signed-off-by: Nick Child <nnac123@linux.ibm.com> --- include/linux/printk.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)