diff mbox series

[RFC,3/6] contrib/cgit-rs: introduce Rust wrapper for libgit.a

Message ID 9a846c17c891e17566a9907b3627210a6a08ea76.1723054623.git.steadmon@google.com (mailing list archive)
State Superseded
Headers show
Series Introduce cgit-rs, a Rust wrapper around libgit.a | expand

Commit Message

Josh Steadmon Aug. 7, 2024, 6:21 p.m. UTC
Introduce cgit-rs, a Rust wrapper crate that allows Rust code to call
functions in libgit.a. This initial patch defines build rules and an
interface that exposes user agent string getter functions as a proof of
concept. A proof-of-concept library consumer is provided in
contrib/cgit-rs/src/main.rs. This executable can be run with `cargo run`

Symbols in cgit can collide with symbols from other libraries such as
libgit2. We avoid this by first exposing library symbols in
public_symbol_export.[ch]. These symbols are prepended with "libgit_" to
avoid collisions and set to visible using a visibility pragma. In
build.rs, Rust builds contrib/cgit-rs/libcgit.a, which also contains
libgit.a and other dependent libraries, with -fvisibility=hidden to hide
all symbols within those libraries that haven't been exposed with a
visibility pragma.

Co-authored-by: Kyle Lippincott <spectral@google.com>
Co-authored-by: Calvin Wan <calvinwan@google.com>
Co-authored-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 .gitignore                             |  1 +
 Makefile                               | 13 ++++++++++
 contrib/cgit-rs/Cargo.lock             | 16 +++++++++++++
 contrib/cgit-rs/Cargo.toml             | 16 +++++++++++++
 contrib/cgit-rs/README.md              | 15 ++++++++++++
 contrib/cgit-rs/build.rs               | 33 ++++++++++++++++++++++++++
 contrib/cgit-rs/public_symbol_export.c | 20 ++++++++++++++++
 contrib/cgit-rs/public_symbol_export.h |  8 +++++++
 contrib/cgit-rs/src/lib.rs             |  7 ++++++
 contrib/cgit-rs/src/main.rs            | 10 ++++++++
 10 files changed, 139 insertions(+)
 create mode 100644 contrib/cgit-rs/Cargo.lock
 create mode 100644 contrib/cgit-rs/Cargo.toml
 create mode 100644 contrib/cgit-rs/README.md
 create mode 100644 contrib/cgit-rs/build.rs
 create mode 100644 contrib/cgit-rs/public_symbol_export.c
 create mode 100644 contrib/cgit-rs/public_symbol_export.h
 create mode 100644 contrib/cgit-rs/src/lib.rs
 create mode 100644 contrib/cgit-rs/src/main.rs

Comments

brian m. carlson Aug. 7, 2024, 9:21 p.m. UTC | #1
On 2024-08-07 at 18:21:28, Josh Steadmon wrote:
> Introduce cgit-rs, a Rust wrapper crate that allows Rust code to call
> functions in libgit.a. This initial patch defines build rules and an
> interface that exposes user agent string getter functions as a proof of
> concept. A proof-of-concept library consumer is provided in
> contrib/cgit-rs/src/main.rs. This executable can be run with `cargo run`
> 
> Symbols in cgit can collide with symbols from other libraries such as
> libgit2. We avoid this by first exposing library symbols in
> public_symbol_export.[ch]. These symbols are prepended with "libgit_" to
> avoid collisions and set to visible using a visibility pragma. In
> build.rs, Rust builds contrib/cgit-rs/libcgit.a, which also contains
> libgit.a and other dependent libraries, with -fvisibility=hidden to hide
> all symbols within those libraries that haven't been exposed with a
> visibility pragma.

I think this is a good idea.  It's optional and it allows us to add
functionality as we go along.  Platforms that don't have Rust can just
omit building it.

> +[dependencies]
> +libc = "0.2.155"

I don't love that we're using libc here.  It would be better to use
rustix because that provides safe APIs that are compatible with POSIX,
but I think for now we need this because rustix doesn't offer memory
management like free(3).  I'd really prefer that we didn't have to do
memory management in Rust, but maybe that can come in with a future
series.

libc also comes with the downside that it calls the actual libc
functions, so you have to write things like stat64 on Linux if you want
a 64-bit stat, but that doesn't exist on some of the BSDs, so you have
to write something else and compile it conditionally, and all of that
makes the portability of it worse than with C.

In any event, I have the intention to send a patch to replace libc with
rustix in the future if this series lands.

> diff --git a/contrib/cgit-rs/public_symbol_export.c b/contrib/cgit-rs/public_symbol_export.c
> new file mode 100644
> index 0000000000..3d1cd6cc4f
> --- /dev/null
> +++ b/contrib/cgit-rs/public_symbol_export.c
> @@ -0,0 +1,20 @@
> +// Shim to publicly export Git symbols. These must be renamed so that the
> +// original symbols can be hidden. Renaming these with a "libgit_" prefix also
> +// avoid conflicts with other libraries such as libgit2.
> +
> +#include "contrib/cgit-rs/public_symbol_export.h"
> +#include "version.h"
> +
> +#pragma GCC visibility push(default)

I assume this also works in clang?
Randall S. Becker Aug. 7, 2024, 9:40 p.m. UTC | #2
On Wednesday, August 7, 2024 5:21 PM, brian m. carlson wrote:
>On 2024-08-07 at 18:21:28, Josh Steadmon wrote:
>> Introduce cgit-rs, a Rust wrapper crate that allows Rust code to call
>> functions in libgit.a. This initial patch defines build rules and an
>> interface that exposes user agent string getter functions as a proof
>> of concept. A proof-of-concept library consumer is provided in
>> contrib/cgit-rs/src/main.rs. This executable can be run with `cargo
>> run`
>>
>> Symbols in cgit can collide with symbols from other libraries such as
>> libgit2. We avoid this by first exposing library symbols in
>> public_symbol_export.[ch]. These symbols are prepended with "libgit_"
>> to avoid collisions and set to visible using a visibility pragma. In
>> build.rs, Rust builds contrib/cgit-rs/libcgit.a, which also contains
>> libgit.a and other dependent libraries, with -fvisibility=hidden to
>> hide all symbols within those libraries that haven't been exposed with
>> a visibility pragma.
>
>I think this is a good idea.  It's optional and it allows us to add functionality as we go
>along.  Platforms that don't have Rust can just omit building it.
>
>> +[dependencies]
>> +libc = "0.2.155"
>
>I don't love that we're using libc here.  It would be better to use rustix because that
>provides safe APIs that are compatible with POSIX, but I think for now we need this
>because rustix doesn't offer memory management like free(3).  I'd really prefer that
>we didn't have to do memory management in Rust, but maybe that can come in
>with a future series.

This is a good point. Libc is not portable, but because I can't build with RUST anyway,
I hope that libc is restricted to this facility if used. It should not be included in the
git C build. It is probably moot for me anyway for this series, but I have to mention it
in case anyone else gets the idea to include it as a dependency for git C.

>libc also comes with the downside that it calls the actual libc functions, so you have
>to write things like stat64 on Linux if you want a 64-bit stat, but that doesn't exist
>on some of the BSDs, so you have to write something else and compile it
>conditionally, and all of that makes the portability of it worse than with C.
>
>In any event, I have the intention to send a patch to replace libc with rustix in the
>future if this series lands.
>
>> diff --git a/contrib/cgit-rs/public_symbol_export.c
>> b/contrib/cgit-rs/public_symbol_export.c
>> new file mode 100644
>> index 0000000000..3d1cd6cc4f
>> --- /dev/null
>> +++ b/contrib/cgit-rs/public_symbol_export.c
>> @@ -0,0 +1,20 @@
>> +// Shim to publicly export Git symbols. These must be renamed so that
>> +the // original symbols can be hidden. Renaming these with a
>> +"libgit_" prefix also // avoid conflicts with other libraries such as libgit2.
>> +
>> +#include "contrib/cgit-rs/public_symbol_export.h"
>> +#include "version.h"
>> +
>> +#pragma GCC visibility push(default)
>
>I assume this also works in clang?
>--
>brian m. carlson (they/them or he/him)
>Toronto, Ontario, CA
Mike Hommey Aug. 7, 2024, 10:47 p.m. UTC | #3
On Wed, Aug 07, 2024 at 11:21:28AM -0700, Josh Steadmon wrote:
> +contrib/cgit-rs/hidden_symbol_export.o: contrib/cgit-rs/partial_symbol_export.o
> +	$(OBJCOPY) --localize-hidden $^ $@

This is ELF-specific and won't work on Mac or Windows.

> +    let make_output = std::process::Command::new("make")
> +        .env_remove("PROFILE")
> +        .current_dir(git_root.clone())
> +        .args(&[
> +            "CC=clang",

You should probably not set CC at all here.

> +            "CFLAGS=-fvisibility=hidden",

This won't work for Windows targets.

> +            "contrib/cgit-rs/libcgit.a"
> +        ])
> +        .output()
> +        .expect("Make failed to run");
> +    if !make_output.status.success() {
> +        panic!(
> +                "Make failed:\n  stdout = {}\n  stderr = {}\n",
> +                String::from_utf8(make_output.stdout).unwrap(),
> +                String::from_utf8(make_output.stderr).unwrap()
> +        );
> +    }
> +    std::fs::copy(crate_root.join("libcgit.a"), dst.join("libcgit.a"))?;
> +    println!("cargo::rustc-link-search=native={}", dst.into_os_string().into_string().unwrap());

You might as well use `dst.display()`.

> +    println!("cargo::rustc-link-lib=cgit");
> +    println!("cargo::rustc-link-lib=z");
> +    println!("cargo::rerun-if-changed={}", git_root.into_os_string().into_string().unwrap());

Likewise.

Mike
Josh Steadmon Aug. 7, 2024, 11:05 p.m. UTC | #4
On 2024.08.07 21:21, brian m. carlson wrote:
> On 2024-08-07 at 18:21:28, Josh Steadmon wrote:
> > Introduce cgit-rs, a Rust wrapper crate that allows Rust code to call
> > functions in libgit.a. This initial patch defines build rules and an
> > interface that exposes user agent string getter functions as a proof of
> > concept. A proof-of-concept library consumer is provided in
> > contrib/cgit-rs/src/main.rs. This executable can be run with `cargo run`
> > 
> > Symbols in cgit can collide with symbols from other libraries such as
> > libgit2. We avoid this by first exposing library symbols in
> > public_symbol_export.[ch]. These symbols are prepended with "libgit_" to
> > avoid collisions and set to visible using a visibility pragma. In
> > build.rs, Rust builds contrib/cgit-rs/libcgit.a, which also contains
> > libgit.a and other dependent libraries, with -fvisibility=hidden to hide
> > all symbols within those libraries that haven't been exposed with a
> > visibility pragma.
> 
> I think this is a good idea.  It's optional and it allows us to add
> functionality as we go along.  Platforms that don't have Rust can just
> omit building it.
> 
> > +[dependencies]
> > +libc = "0.2.155"
> 
> I don't love that we're using libc here.  It would be better to use
> rustix because that provides safe APIs that are compatible with POSIX,
> but I think for now we need this because rustix doesn't offer memory
> management like free(3).  I'd really prefer that we didn't have to do
> memory management in Rust, but maybe that can come in with a future
> series.

Yeah, needing to free() is the only thing we striclty need from libc
right now. Please correct me if I'm wrong, but IIUC then any memory that
is allocated on the C side and then passed to Rust needs one of:
1) freed by libc::free() on the Rust side,
2) passed back to the C side to be freed there, or
3) leaked

Am I correct in assuming that your opinion is that writing additional
*_free() functions on the C side is worth it to avoid libc? If so, then
I'm fine with including that in V2.

> libc also comes with the downside that it calls the actual libc
> functions, so you have to write things like stat64 on Linux if you want
> a 64-bit stat, but that doesn't exist on some of the BSDs, so you have
> to write something else and compile it conditionally, and all of that
> makes the portability of it worse than with C.
> 
> In any event, I have the intention to send a patch to replace libc with
> rustix in the future if this series lands.

Even if we can't fully eliminate libc in V2, I'll keep this in mind and
avoid adding additional usage of libc as much as possible.


> > diff --git a/contrib/cgit-rs/public_symbol_export.c b/contrib/cgit-rs/public_symbol_export.c
> > new file mode 100644
> > index 0000000000..3d1cd6cc4f
> > --- /dev/null
> > +++ b/contrib/cgit-rs/public_symbol_export.c
> > @@ -0,0 +1,20 @@
> > +// Shim to publicly export Git symbols. These must be renamed so that the
> > +// original symbols can be hidden. Renaming these with a "libgit_" prefix also
> > +// avoid conflicts with other libraries such as libgit2.
> > +
> > +#include "contrib/cgit-rs/public_symbol_export.h"
> > +#include "version.h"
> > +
> > +#pragma GCC visibility push(default)
> 
> I assume this also works in clang?

Yep.


> -- 
> brian m. carlson (they/them or he/him)
> Toronto, Ontario, CA
Josh Steadmon Aug. 7, 2024, 11:07 p.m. UTC | #5
On 2024.08.07 17:40, rsbecker@nexbridge.com wrote:
> On Wednesday, August 7, 2024 5:21 PM, brian m. carlson wrote:
> >On 2024-08-07 at 18:21:28, Josh Steadmon wrote:
> >> Introduce cgit-rs, a Rust wrapper crate that allows Rust code to call
> >> functions in libgit.a. This initial patch defines build rules and an
> >> interface that exposes user agent string getter functions as a proof
> >> of concept. A proof-of-concept library consumer is provided in
> >> contrib/cgit-rs/src/main.rs. This executable can be run with `cargo
> >> run`
> >>
> >> Symbols in cgit can collide with symbols from other libraries such as
> >> libgit2. We avoid this by first exposing library symbols in
> >> public_symbol_export.[ch]. These symbols are prepended with "libgit_"
> >> to avoid collisions and set to visible using a visibility pragma. In
> >> build.rs, Rust builds contrib/cgit-rs/libcgit.a, which also contains
> >> libgit.a and other dependent libraries, with -fvisibility=hidden to
> >> hide all symbols within those libraries that haven't been exposed with
> >> a visibility pragma.
> >
> >I think this is a good idea.  It's optional and it allows us to add functionality as we go
> >along.  Platforms that don't have Rust can just omit building it.
> >
> >> +[dependencies]
> >> +libc = "0.2.155"
> >
> >I don't love that we're using libc here.  It would be better to use rustix because that
> >provides safe APIs that are compatible with POSIX, but I think for now we need this
> >because rustix doesn't offer memory management like free(3).  I'd really prefer that
> >we didn't have to do memory management in Rust, but maybe that can come in
> >with a future series.
> 
> This is a good point. Libc is not portable, but because I can't build with RUST anyway,
> I hope that libc is restricted to this facility if used. It should not be included in the
> git C build. It is probably moot for me anyway for this series, but I have to mention it
> in case anyone else gets the idea to include it as a dependency for git C.

I know you don't have access to Rust, but would you be able to test the
symbol visibility steps with `make contrib/cgit-rs/libcgit.a`?
Josh Steadmon Aug. 7, 2024, 11:29 p.m. UTC | #6
On 2024.08.08 07:47, Mike Hommey wrote:
> On Wed, Aug 07, 2024 at 11:21:28AM -0700, Josh Steadmon wrote:
> > +contrib/cgit-rs/hidden_symbol_export.o: contrib/cgit-rs/partial_symbol_export.o
> > +	$(OBJCOPY) --localize-hidden $^ $@
> 
> This is ELF-specific and won't work on Mac or Windows.
> 
> > +    let make_output = std::process::Command::new("make")
> > +        .env_remove("PROFILE")
> > +        .current_dir(git_root.clone())
> > +        .args(&[
> > +            "CC=clang",
> 
> You should probably not set CC at all here.

Ack, fixed in V2.

> > +            "CFLAGS=-fvisibility=hidden",
> 
> This won't work for Windows targets.

Understood. I'll have to do some studying on the symbol visibility
options.


> > +            "contrib/cgit-rs/libcgit.a"
> > +        ])
> > +        .output()
> > +        .expect("Make failed to run");
> > +    if !make_output.status.success() {
> > +        panic!(
> > +                "Make failed:\n  stdout = {}\n  stderr = {}\n",
> > +                String::from_utf8(make_output.stdout).unwrap(),
> > +                String::from_utf8(make_output.stderr).unwrap()
> > +        );
> > +    }
> > +    std::fs::copy(crate_root.join("libcgit.a"), dst.join("libcgit.a"))?;
> > +    println!("cargo::rustc-link-search=native={}", dst.into_os_string().into_string().unwrap());
> 
> You might as well use `dst.display()`.

Wouldn't that fail silently in the event that the path is non-UTF-8? I
think I'd prefer to explicitly fail in that case, even if it seems
unlikely.

> > +    println!("cargo::rustc-link-lib=cgit");
> > +    println!("cargo::rustc-link-lib=z");
> > +    println!("cargo::rerun-if-changed={}", git_root.into_os_string().into_string().unwrap());
> 
> Likewise.
> 
> Mike
Randall S. Becker Aug. 7, 2024, 11:51 p.m. UTC | #7
On Wednesday, August 7, 2024 7:08 PM, Josh Steadmon wrote:
>On 2024.08.07 17:40, rsbecker@nexbridge.com wrote:
>> On Wednesday, August 7, 2024 5:21 PM, brian m. carlson wrote:
>> >On 2024-08-07 at 18:21:28, Josh Steadmon wrote:
>> >> Introduce cgit-rs, a Rust wrapper crate that allows Rust code to
>> >> call functions in libgit.a. This initial patch defines build rules
>> >> and an interface that exposes user agent string getter functions as
>> >> a proof of concept. A proof-of-concept library consumer is provided
>> >> in contrib/cgit-rs/src/main.rs. This executable can be run with
>> >> `cargo run`
>> >>
>> >> Symbols in cgit can collide with symbols from other libraries such
>> >> as libgit2. We avoid this by first exposing library symbols in
>> >> public_symbol_export.[ch]. These symbols are prepended with "libgit_"
>> >> to avoid collisions and set to visible using a visibility pragma.
>> >> In build.rs, Rust builds contrib/cgit-rs/libcgit.a, which also
>> >> contains libgit.a and other dependent libraries, with
>> >> -fvisibility=hidden to hide all symbols within those libraries that
>> >> haven't been exposed with a visibility pragma.
>> >
>> >I think this is a good idea.  It's optional and it allows us to add
>> >functionality as we go along.  Platforms that don't have Rust can just
omit
>building it.
>> >
>> >> +[dependencies]
>> >> +libc = "0.2.155"
>> >
>> >I don't love that we're using libc here.  It would be better to use
>> >rustix because that provides safe APIs that are compatible with
>> >POSIX, but I think for now we need this because rustix doesn't offer
>> >memory management like free(3).  I'd really prefer that we didn't
>> >have to do memory management in Rust, but maybe that can come in with a
>future series.
>>
>> This is a good point. Libc is not portable, but because I can't build
>> with RUST anyway, I hope that libc is restricted to this facility if
>> used. It should not be included in the git C build. It is probably
>> moot for me anyway for this series, but I have to mention it in case
anyone else
>gets the idea to include it as a dependency for git C.
>
>I know you don't have access to Rust, but would you be able to test the
symbol
>visibility steps with `make contrib/cgit-rs/libcgit.a`?

Of course. Branch? URI?
brian m. carlson Aug. 7, 2024, 11:55 p.m. UTC | #8
On 2024-08-07 at 23:05:00, Josh Steadmon wrote:
> Yeah, needing to free() is the only thing we striclty need from libc
> right now. Please correct me if I'm wrong, but IIUC then any memory that
> is allocated on the C side and then passed to Rust needs one of:
> 1) freed by libc::free() on the Rust side,
> 2) passed back to the C side to be freed there, or
> 3) leaked
> 
> Am I correct in assuming that your opinion is that writing additional
> *_free() functions on the C side is worth it to avoid libc? If so, then
> I'm fine with including that in V2.

I think if we're going to be writing a general purpose API for
libification, we probably should provide free functions.  Normally, that
will be a call to free(3), but in some cases we may need more complex
logic, and by providing those, we're making the API more consistent and
easy to use.
Mike Hommey Aug. 8, 2024, 12:17 a.m. UTC | #9
On Wed, Aug 07, 2024 at 04:29:54PM -0700, Josh Steadmon wrote:
> > You might as well use `dst.display()`.
> 
> Wouldn't that fail silently in the event that the path is non-UTF-8? I
> think I'd prefer to explicitly fail in that case, even if it seems
> unlikely.

That's the theory, unfortunately, reality is that even the most central
Rust crates don't care:
https://github.com/rust-lang/cc-rs/blob/main/src/lib.rs#L1357-L1360

Even better, last time I tried, cargo or rustc (I don't remember which
one it was) would blatantly fail to work if the path is not UTF-8 in the
first place.

Mike
Junio C Hamano Aug. 8, 2024, 2:21 a.m. UTC | #10
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I think if we're going to be writing a general purpose API for
> libification, we probably should provide free functions.  Normally, that
> will be a call to free(3), but in some cases we may need more complex
> logic, and by providing those, we're making the API more consistent and
> easy to use.

Do you mean that we should have variants of free() that are specific
to each data structure?  E.g., Patrick taught fetch_task_release()
to release the task structure itself, in addition to the resources
it holds, while renaming fetch_task_release() to fetch_task_free(),
with ff25992c (submodule: fix leaking fetch tasks, 2024-08-07), so
if cgit-sys wants to expose fetch_task object to the outside world,
the consumers would call fetch_task_free() on it?
Josh Steadmon Aug. 8, 2024, 5:13 p.m. UTC | #11
On 2024.08.07 19:51, rsbecker@nexbridge.com wrote:
> On Wednesday, August 7, 2024 7:08 PM, Josh Steadmon wrote:
> >On 2024.08.07 17:40, rsbecker@nexbridge.com wrote:
> >> On Wednesday, August 7, 2024 5:21 PM, brian m. carlson wrote:
> >> >On 2024-08-07 at 18:21:28, Josh Steadmon wrote:
> >> >> Introduce cgit-rs, a Rust wrapper crate that allows Rust code to
> >> >> call functions in libgit.a. This initial patch defines build rules
> >> >> and an interface that exposes user agent string getter functions as
> >> >> a proof of concept. A proof-of-concept library consumer is provided
> >> >> in contrib/cgit-rs/src/main.rs. This executable can be run with
> >> >> `cargo run`
> >> >>
> >> >> Symbols in cgit can collide with symbols from other libraries such
> >> >> as libgit2. We avoid this by first exposing library symbols in
> >> >> public_symbol_export.[ch]. These symbols are prepended with "libgit_"
> >> >> to avoid collisions and set to visible using a visibility pragma.
> >> >> In build.rs, Rust builds contrib/cgit-rs/libcgit.a, which also
> >> >> contains libgit.a and other dependent libraries, with
> >> >> -fvisibility=hidden to hide all symbols within those libraries that
> >> >> haven't been exposed with a visibility pragma.
> >> >
> >> >I think this is a good idea.  It's optional and it allows us to add
> >> >functionality as we go along.  Platforms that don't have Rust can just
> omit
> >building it.
> >> >
> >> >> +[dependencies]
> >> >> +libc = "0.2.155"
> >> >
> >> >I don't love that we're using libc here.  It would be better to use
> >> >rustix because that provides safe APIs that are compatible with
> >> >POSIX, but I think for now we need this because rustix doesn't offer
> >> >memory management like free(3).  I'd really prefer that we didn't
> >> >have to do memory management in Rust, but maybe that can come in with a
> >future series.
> >>
> >> This is a good point. Libc is not portable, but because I can't build
> >> with RUST anyway, I hope that libc is restricted to this facility if
> >> used. It should not be included in the git C build. It is probably
> >> moot for me anyway for this series, but I have to mention it in case
> anyone else
> >gets the idea to include it as a dependency for git C.
> >
> >I know you don't have access to Rust, but would you be able to test the
> symbol
> >visibility steps with `make contrib/cgit-rs/libcgit.a`?
> 
> Of course. Branch? URI?

https://github.com/steadmon/git/tree/cgit-dev
Josh Steadmon Aug. 8, 2024, 5:15 p.m. UTC | #12
On 2024.08.07 19:21, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > I think if we're going to be writing a general purpose API for
> > libification, we probably should provide free functions.  Normally, that
> > will be a call to free(3), but in some cases we may need more complex
> > logic, and by providing those, we're making the API more consistent and
> > easy to use.
> 
> Do you mean that we should have variants of free() that are specific
> to each data structure?  E.g., Patrick taught fetch_task_release()
> to release the task structure itself, in addition to the resources
> it holds, while renaming fetch_task_release() to fetch_task_free(),
> with ff25992c (submodule: fix leaking fetch tasks, 2024-08-07), so
> if cgit-sys wants to expose fetch_task object to the outside world,
> the consumers would call fetch_task_free() on it?

Yes, although hopefully the higher-level API we build on top of cgit-sys
will be able to hide this from the Rust consumers.
Josh Steadmon Aug. 8, 2024, 5:23 p.m. UTC | #13
On 2024.08.08 09:17, Mike Hommey wrote:
> On Wed, Aug 07, 2024 at 04:29:54PM -0700, Josh Steadmon wrote:
> > > You might as well use `dst.display()`.
> > 
> > Wouldn't that fail silently in the event that the path is non-UTF-8? I
> > think I'd prefer to explicitly fail in that case, even if it seems
> > unlikely.
> 
> That's the theory, unfortunately, reality is that even the most central
> Rust crates don't care:
> https://github.com/rust-lang/cc-rs/blob/main/src/lib.rs#L1357-L1360
> 
> Even better, last time I tried, cargo or rustc (I don't remember which
> one it was) would blatantly fail to work if the path is not UTF-8 in the
> first place.
> 
> Mike

Thanks for the pointers. Fixed in V2.
Josh Steadmon Aug. 8, 2024, 6:22 p.m. UTC | #14
On 2024.08.07 23:55, brian m. carlson wrote:
> On 2024-08-07 at 23:05:00, Josh Steadmon wrote:
> > Yeah, needing to free() is the only thing we striclty need from libc
> > right now. Please correct me if I'm wrong, but IIUC then any memory that
> > is allocated on the C side and then passed to Rust needs one of:
> > 1) freed by libc::free() on the Rust side,
> > 2) passed back to the C side to be freed there, or
> > 3) leaked
> > 
> > Am I correct in assuming that your opinion is that writing additional
> > *_free() functions on the C side is worth it to avoid libc? If so, then
> > I'm fine with including that in V2.
> 
> I think if we're going to be writing a general purpose API for
> libification, we probably should provide free functions.  Normally, that
> will be a call to free(3)

[snip]

So in this case, does that mean we'd replace our call to `libc::free()`
with just `free()`, and then add a declaration for `free` in our
`extern "C"` section of cgit-sys? It seems to work on my machine, but is
that actually the more portable option compared to using libc::free? Or
have I misunderstood something?
Randall S. Becker Aug. 8, 2024, 6:43 p.m. UTC | #15
On Thursday, August 8, 2024 1:13 PM, Josh Steadmon wrote:
>On 2024.08.07 19:51, rsbecker@nexbridge.com wrote:
>> On Wednesday, August 7, 2024 7:08 PM, Josh Steadmon wrote:
>> >On 2024.08.07 17:40, rsbecker@nexbridge.com wrote:
>> >> On Wednesday, August 7, 2024 5:21 PM, brian m. carlson wrote:
>> >> >On 2024-08-07 at 18:21:28, Josh Steadmon wrote:
>> >> >> Introduce cgit-rs, a Rust wrapper crate that allows Rust code to
>> >> >> call functions in libgit.a. This initial patch defines build
>> >> >> rules and an interface that exposes user agent string getter
>> >> >> functions as a proof of concept. A proof-of-concept library
>> >> >> consumer is provided in contrib/cgit-rs/src/main.rs. This
>> >> >> executable can be run with `cargo run`
>> >> >>
>> >> >> Symbols in cgit can collide with symbols from other libraries
>> >> >> such as libgit2. We avoid this by first exposing library symbols
>> >> >> in public_symbol_export.[ch]. These symbols are prepended with
"libgit_"
>> >> >> to avoid collisions and set to visible using a visibility pragma.
>> >> >> In build.rs, Rust builds contrib/cgit-rs/libcgit.a, which also
>> >> >> contains libgit.a and other dependent libraries, with
>> >> >> -fvisibility=hidden to hide all symbols within those libraries
>> >> >> that haven't been exposed with a visibility pragma.
>> >> >
>> >> >I think this is a good idea.  It's optional and it allows us to
>> >> >add functionality as we go along.  Platforms that don't have Rust
>> >> >can just
>> omit
>> >building it.
>> >> >
>> >> >> +[dependencies]
>> >> >> +libc = "0.2.155"
>> >> >
>> >> >I don't love that we're using libc here.  It would be better to
>> >> >use rustix because that provides safe APIs that are compatible
>> >> >with POSIX, but I think for now we need this because rustix
>> >> >doesn't offer memory management like free(3).  I'd really prefer
>> >> >that we didn't have to do memory management in Rust, but maybe
>> >> >that can come in with a
>> >future series.
>> >>
>> >> This is a good point. Libc is not portable, but because I can't
>> >> build with RUST anyway, I hope that libc is restricted to this
>> >> facility if used. It should not be included in the git C build. It
>> >> is probably moot for me anyway for this series, but I have to
>> >> mention it in case
>> anyone else
>> >gets the idea to include it as a dependency for git C.
>> >
>> >I know you don't have access to Rust, but would you be able to test
>> >the
>> symbol
>> >visibility steps with `make contrib/cgit-rs/libcgit.a`?
>>
>> Of course. Branch? URI?
>
>https://github.com/steadmon/git/tree/cgit-dev

I got to:

ld -r contrib/cgit-rs/public_symbol_export.o libgit.a reftable/libreftable.a
xdiff/lib.a -o contrib/cgit-rs/partial_symbol_export.o
/usr/bin/c89: illegal option -- r

The -r option is not supported on NonStop. I think we had discussed this.
Junio C Hamano Aug. 8, 2024, 7:57 p.m. UTC | #16
<rsbecker@nexbridge.com> writes:

> I got to:
>
> ld -r contrib/cgit-rs/public_symbol_export.o libgit.a reftable/libreftable.a
> xdiff/lib.a -o contrib/cgit-rs/partial_symbol_export.o
> /usr/bin/c89: illegal option -- r
>
> The -r option is not supported on NonStop. I think we had discussed this.

Does it happen to call the feature under different name?  'r' stands
for relocatable output, but some linkers may call the feature as
incremental linking and use '-i'.
Randall S. Becker Aug. 8, 2024, 8:14 p.m. UTC | #17
On Thursday, August 8, 2024 3:58 PM, Junio C Hamano wrote:
><rsbecker@nexbridge.com> writes:
>
>> I got to:
>>
>> ld -r contrib/cgit-rs/public_symbol_export.o libgit.a
>> reftable/libreftable.a xdiff/lib.a -o
>> contrib/cgit-rs/partial_symbol_export.o
>> /usr/bin/c89: illegal option -- r
>>
>> The -r option is not supported on NonStop. I think we had discussed this.
>
>Does it happen to call the feature under different name?  'r' stands for
relocatable
>output, but some linkers may call the feature as incremental linking and
use '-i'.

Well... depends on some factors. ld should not be run directly. Other links
are done
via the CC setting in config.mak.uname. That would invoke c99 for us for the
linker
as with the rest of git C, and that would make thinks somewhat easier as the
linker
is either xld or eld depending on the hardware.

The -Wr option to c99 creates a load file in this situation, which might or
might not
work. I need to look at it after built. I don't think it is what we want.
What I think
we need is -Wxld=unres_symbols ignore which builds an object that can be
linked to other objects even with undefined link symbols - although maybe
not.
I need to be able to select a different option on ia64, but that may not
matter
much. I still worry that we are depending on non-portable linker semantics
that
may not give the result we ultimately want.

Would not the build option be best selected in config.mak.uname?

Maybe GITRS_LINKER= or GITRS_CFLAGS=, etc.
Kyle Lippincott Aug. 8, 2024, 8:18 p.m. UTC | #18
On Thu, Aug 8, 2024 at 11:22 AM Josh Steadmon <steadmon@google.com> wrote:
>
> On 2024.08.07 23:55, brian m. carlson wrote:
> > On 2024-08-07 at 23:05:00, Josh Steadmon wrote:
> > > Yeah, needing to free() is the only thing we striclty need from libc
> > > right now. Please correct me if I'm wrong, but IIUC then any memory that
> > > is allocated on the C side and then passed to Rust needs one of:
> > > 1) freed by libc::free() on the Rust side,
> > > 2) passed back to the C side to be freed there, or
> > > 3) leaked
> > >
> > > Am I correct in assuming that your opinion is that writing additional
> > > *_free() functions on the C side is worth it to avoid libc? If so, then
> > > I'm fine with including that in V2.
> >
> > I think if we're going to be writing a general purpose API for
> > libification, we probably should provide free functions.  Normally, that
> > will be a call to free(3)
>
> [snip]
>
> So in this case, does that mean we'd replace our call to `libc::free()`
> with just `free()`, and then add a declaration for `free` in our
> `extern "C"` section of cgit-sys? It seems to work on my machine, but is
> that actually the more portable option compared to using libc::free? Or
> have I misunderstood something?

I think both having a generic 'free' function, or requiring your API
consumer to have a compatible 'free' function is undesirable. If the
API hands you something that you must return/free, there should be a
function for that specifically. So I would expect if the API has a
`libgit_foo_get(foo** f)` function, there'd be a paired
`libgit_foo_release(foo* f)` (ignoring whatever squabbles we want to
have about the names). Requiring `libgit_foo_get(foo** f)` to be
paired with `libc::free(f)` limits us to always using libc malloc;
pairing it with `libgit_free((void*)f)` means we can't refcount it /
ignore it if it's a part of a parent object, etc.
Josh Steadmon Aug. 8, 2024, 8:43 p.m. UTC | #19
On 2024.08.08 13:18, Kyle Lippincott wrote:
> On Thu, Aug 8, 2024 at 11:22 AM Josh Steadmon <steadmon@google.com> wrote:
> >
> > On 2024.08.07 23:55, brian m. carlson wrote:
> > > On 2024-08-07 at 23:05:00, Josh Steadmon wrote:
> > > > Yeah, needing to free() is the only thing we striclty need from libc
> > > > right now. Please correct me if I'm wrong, but IIUC then any memory that
> > > > is allocated on the C side and then passed to Rust needs one of:
> > > > 1) freed by libc::free() on the Rust side,
> > > > 2) passed back to the C side to be freed there, or
> > > > 3) leaked
> > > >
> > > > Am I correct in assuming that your opinion is that writing additional
> > > > *_free() functions on the C side is worth it to avoid libc? If so, then
> > > > I'm fine with including that in V2.
> > >
> > > I think if we're going to be writing a general purpose API for
> > > libification, we probably should provide free functions.  Normally, that
> > > will be a call to free(3)
> >
> > [snip]
> >
> > So in this case, does that mean we'd replace our call to `libc::free()`
> > with just `free()`, and then add a declaration for `free` in our
> > `extern "C"` section of cgit-sys? It seems to work on my machine, but is
> > that actually the more portable option compared to using libc::free? Or
> > have I misunderstood something?
> 
> I think both having a generic 'free' function, or requiring your API
> consumer to have a compatible 'free' function is undesirable. If the
> API hands you something that you must return/free, there should be a
> function for that specifically. So I would expect if the API has a
> `libgit_foo_get(foo** f)` function, there'd be a paired
> `libgit_foo_release(foo* f)` (ignoring whatever squabbles we want to
> have about the names). Requiring `libgit_foo_get(foo** f)` to be
> paired with `libc::free(f)` limits us to always using libc malloc;
> pairing it with `libgit_free((void*)f)` means we can't refcount it /
> ignore it if it's a part of a parent object, etc.

Sorry, the diff context got removed, the the particular case I was
talking about here is where we get back a strdup()ed string from some of
the config API calls. I guess we could add a c_str_free() somewhere on
the Git side and call that, but it seems like a bit of overkill.

I do agree with you about providing _free() [or in our case,
_clear_and_free()] functions for more complicated data types though.
Randall S. Becker Aug. 12, 2024, 2 a.m. UTC | #20
On Wednesday, August 7, 2024 7:08 PM, Josh Steadmon wrote:
>On 2024.08.07 17:40, rsbecker@nexbridge.com wrote:
>> On Wednesday, August 7, 2024 5:21 PM, brian m. carlson wrote:
>> >On 2024-08-07 at 18:21:28, Josh Steadmon wrote:
>> >> Introduce cgit-rs, a Rust wrapper crate that allows Rust code to
>> >> call functions in libgit.a. This initial patch defines build rules
>> >> and an interface that exposes user agent string getter functions as
>> >> a proof of concept. A proof-of-concept library consumer is provided
>> >> in contrib/cgit-rs/src/main.rs. This executable can be run with
>> >> `cargo run`
>> >>
>> >> Symbols in cgit can collide with symbols from other libraries such
>> >> as libgit2. We avoid this by first exposing library symbols in
>> >> public_symbol_export.[ch]. These symbols are prepended with "libgit_"
>> >> to avoid collisions and set to visible using a visibility pragma.
>> >> In build.rs, Rust builds contrib/cgit-rs/libcgit.a, which also
>> >> contains libgit.a and other dependent libraries, with
>> >> -fvisibility=hidden to hide all symbols within those libraries that
>> >> haven't been exposed with a visibility pragma.
>> >
>> >I think this is a good idea.  It's optional and it allows us to add
>> >functionality as we go along.  Platforms that don't have Rust can just
omit
>building it.
>> >
>> >> +[dependencies]
>> >> +libc = "0.2.155"
>> >
>> >I don't love that we're using libc here.  It would be better to use
>> >rustix because that provides safe APIs that are compatible with
>> >POSIX, but I think for now we need this because rustix doesn't offer
>> >memory management like free(3).  I'd really prefer that we didn't
>> >have to do memory management in Rust, but maybe that can come in with a
>future series.
>>
>> This is a good point. Libc is not portable, but because I can't build
>> with RUST anyway, I hope that libc is restricted to this facility if
>> used. It should not be included in the git C build. It is probably
>> moot for me anyway for this series, but I have to mention it in case
anyone else
>gets the idea to include it as a dependency for git C.
>
>I know you don't have access to Rust, but would you be able to test the
symbol
>visibility steps with `make contrib/cgit-rs/libcgit.a`?

This target is no longer valid. Is there another target I can try?
diff mbox series

Patch

diff --git a/.gitignore b/.gitignore
index 8caf3700c2..7031012330 100644
--- a/.gitignore
+++ b/.gitignore
@@ -248,3 +248,4 @@  Release/
 /git.VC.db
 *.dSYM
 /contrib/buildsystems/out
+/contrib/cgit-rs/target
diff --git a/Makefile b/Makefile
index 1cac51a4f7..fcd06af123 100644
--- a/Makefile
+++ b/Makefile
@@ -653,6 +653,8 @@  CURL_CONFIG = curl-config
 GCOV = gcov
 STRIP = strip
 SPATCH = spatch
+LD = ld
+OBJCOPY = objcopy
 
 export TCL_PATH TCLTK_PATH
 
@@ -2712,6 +2714,7 @@  OBJECTS += $(XDIFF_OBJS)
 OBJECTS += $(FUZZ_OBJS)
 OBJECTS += $(REFTABLE_OBJS) $(REFTABLE_TEST_OBJS)
 OBJECTS += $(UNIT_TEST_OBJS)
+OBJECTS += contrib/cgit-rs/public_symbol_export.o
 
 ifndef NO_CURL
 	OBJECTS += http.o http-walker.o remote-curl.o
@@ -3719,6 +3722,7 @@  clean: profile-clean coverage-clean cocciclean
 	$(RM) $(htmldocs).tar.gz $(manpages).tar.gz
 	$(MAKE) -C Documentation/ clean
 	$(RM) Documentation/GIT-EXCLUDED-PROGRAMS
+	$(RM) -r contrib/cgit-rs/target
 ifndef NO_PERL
 	$(RM) -r perl/build/
 endif
@@ -3864,3 +3868,12 @@  $(UNIT_TEST_PROGS): $(UNIT_TEST_BIN)/%$X: $(UNIT_TEST_DIR)/%.o \
 build-unit-tests: $(UNIT_TEST_PROGS)
 unit-tests: $(UNIT_TEST_PROGS) t/helper/test-tool$X
 	$(MAKE) -C t/ unit-tests
+
+contrib/cgit-rs/partial_symbol_export.o: contrib/cgit-rs/public_symbol_export.o libgit.a reftable/libreftable.a xdiff/lib.a
+	$(LD) -r $^ -o $@
+
+contrib/cgit-rs/hidden_symbol_export.o: contrib/cgit-rs/partial_symbol_export.o
+	$(OBJCOPY) --localize-hidden $^ $@
+
+contrib/cgit-rs/libcgit.a: contrib/cgit-rs/hidden_symbol_export.o
+	$(AR) $(ARFLAGS) $@ $^
diff --git a/contrib/cgit-rs/Cargo.lock b/contrib/cgit-rs/Cargo.lock
new file mode 100644
index 0000000000..f50c593995
--- /dev/null
+++ b/contrib/cgit-rs/Cargo.lock
@@ -0,0 +1,16 @@ 
+# This file is automatically @generated by Cargo.
+# It is not intended for manual editing.
+version = 3
+
+[[package]]
+name = "cgit"
+version = "0.1.0"
+dependencies = [
+ "libc",
+]
+
+[[package]]
+name = "libc"
+version = "0.2.155"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "97b3888a4aecf77e811145cadf6eef5901f4782c53886191b2f693f24761847c"
diff --git a/contrib/cgit-rs/Cargo.toml b/contrib/cgit-rs/Cargo.toml
new file mode 100644
index 0000000000..7b55e6f3e1
--- /dev/null
+++ b/contrib/cgit-rs/Cargo.toml
@@ -0,0 +1,16 @@ 
+[package]
+name = "cgit"
+version = "0.1.0"
+edition = "2021"
+build = "build.rs"
+links = "git"
+
+[[bin]]
+name = "cgit-test"
+path = "src/main.rs"
+
+[lib]
+path = "src/lib.rs"
+
+[dependencies]
+libc = "0.2.155"
diff --git a/contrib/cgit-rs/README.md b/contrib/cgit-rs/README.md
new file mode 100644
index 0000000000..7a59602c30
--- /dev/null
+++ b/contrib/cgit-rs/README.md
@@ -0,0 +1,15 @@ 
+# cgit-info
+
+A small hacky proof-of-concept showing how to provide a Rust FFI for the Git
+library.
+
+## Building
+
+`cargo build` automatically builds and picks up on changes made to both
+the Rust wrapper and git.git code so there is no need to run `make`
+beforehand.
+
+## Running
+
+Assuming you don't make any changes to the Git source, you can just work from
+`contrib/cgit-rs` and use `cargo build` or `cargo run` as usual.
diff --git a/contrib/cgit-rs/build.rs b/contrib/cgit-rs/build.rs
new file mode 100644
index 0000000000..0c1ec06737
--- /dev/null
+++ b/contrib/cgit-rs/build.rs
@@ -0,0 +1,33 @@ 
+use std::env;
+use std::path::PathBuf;
+
+pub fn main() -> std::io::Result<()> {
+    let crate_root = PathBuf::from(env::var_os("CARGO_MANIFEST_DIR").unwrap());
+    let git_root = crate_root.join("../..");
+    let dst = PathBuf::from(env::var_os("OUT_DIR").unwrap());
+
+    let make_output = std::process::Command::new("make")
+        .env_remove("PROFILE")
+        .current_dir(git_root.clone())
+        .args(&[
+            "CC=clang",
+            "CFLAGS=-fvisibility=hidden",
+            "contrib/cgit-rs/libcgit.a"
+        ])
+        .output()
+        .expect("Make failed to run");
+    if !make_output.status.success() {
+        panic!(
+                "Make failed:\n  stdout = {}\n  stderr = {}\n",
+                String::from_utf8(make_output.stdout).unwrap(),
+                String::from_utf8(make_output.stderr).unwrap()
+        );
+    }
+    std::fs::copy(crate_root.join("libcgit.a"), dst.join("libcgit.a"))?;
+    println!("cargo::rustc-link-search=native={}", dst.into_os_string().into_string().unwrap());
+    println!("cargo::rustc-link-lib=cgit");
+    println!("cargo::rustc-link-lib=z");
+    println!("cargo::rerun-if-changed={}", git_root.into_os_string().into_string().unwrap());
+
+    Ok(())
+}
diff --git a/contrib/cgit-rs/public_symbol_export.c b/contrib/cgit-rs/public_symbol_export.c
new file mode 100644
index 0000000000..3d1cd6cc4f
--- /dev/null
+++ b/contrib/cgit-rs/public_symbol_export.c
@@ -0,0 +1,20 @@ 
+// Shim to publicly export Git symbols. These must be renamed so that the
+// original symbols can be hidden. Renaming these with a "libgit_" prefix also
+// avoid conflicts with other libraries such as libgit2.
+
+#include "contrib/cgit-rs/public_symbol_export.h"
+#include "version.h"
+
+#pragma GCC visibility push(default)
+
+const char *libgit_user_agent(void)
+{
+	return git_user_agent();
+}
+
+const char *libgit_user_agent_sanitized(void)
+{
+	return git_user_agent_sanitized();
+}
+
+#pragma GCC visibility pop
diff --git a/contrib/cgit-rs/public_symbol_export.h b/contrib/cgit-rs/public_symbol_export.h
new file mode 100644
index 0000000000..a3372f93fa
--- /dev/null
+++ b/contrib/cgit-rs/public_symbol_export.h
@@ -0,0 +1,8 @@ 
+#ifndef PUBLIC_SYMBOL_EXPORT_H
+#define PUBLIC_SYMBOL_EXPORT_H
+
+const char *libgit_user_agent(void);
+
+const char *libgit_user_agent_sanitized(void);
+
+#endif /* PUBLIC_SYMBOL_EXPORT_H */
diff --git a/contrib/cgit-rs/src/lib.rs b/contrib/cgit-rs/src/lib.rs
new file mode 100644
index 0000000000..dc46e7ff42
--- /dev/null
+++ b/contrib/cgit-rs/src/lib.rs
@@ -0,0 +1,7 @@ 
+use libc::c_char;
+
+extern "C" {
+    // From version.c
+    pub fn libgit_user_agent() -> *const c_char;
+    pub fn libgit_user_agent_sanitized() -> *const c_char;
+}
diff --git a/contrib/cgit-rs/src/main.rs b/contrib/cgit-rs/src/main.rs
new file mode 100644
index 0000000000..1794e3f43e
--- /dev/null
+++ b/contrib/cgit-rs/src/main.rs
@@ -0,0 +1,10 @@ 
+use std::ffi::CStr;
+
+fn main() {
+    println!("Let's print some strings provided by Git");
+    let c_buf = unsafe { cgit::libgit_user_agent() };
+    let c_str = unsafe { CStr::from_ptr(c_buf) };
+    println!("git_user_agent() = {:?}", c_str);
+    println!("git_user_agent_sanitized() = {:?}",
+        unsafe { CStr::from_ptr(cgit::libgit_user_agent_sanitized()) });
+}