diff mbox series

[v2,5/5] cgit: add higher-level cgit crate

Message ID 514c744ba41cf48c4ffd85640b7ef0c5783cc77d.1723242556.git.steadmon@google.com (mailing list archive)
State New
Headers show
Series Introduce cgit-rs, a Rust wrapper around libgit.a | expand

Commit Message

Josh Steadmon Aug. 9, 2024, 10:41 p.m. UTC
From: Calvin Wan <calvinwan@google.com>

Wrap `struct config_set` and a few of its associated functions in
cgit-sys. Also introduce a higher-level "cgit" crate which provides a
more Rust-friendly interface to config_set structs.

Co-authored-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
---
 .gitignore                                    |  1 +
 Makefile                                      |  2 +-
 contrib/cgit-rs/Cargo.lock                    | 14 ++++
 contrib/cgit-rs/Cargo.toml                    | 10 +++
 .../cgit-rs/cgit-sys/public_symbol_export.c   | 30 +++++++
 .../cgit-rs/cgit-sys/public_symbol_export.h   | 12 +++
 contrib/cgit-rs/cgit-sys/src/lib.rs           | 31 ++++++-
 contrib/cgit-rs/src/lib.rs                    | 82 +++++++++++++++++++
 8 files changed, 180 insertions(+), 2 deletions(-)
 create mode 100644 contrib/cgit-rs/Cargo.lock
 create mode 100644 contrib/cgit-rs/Cargo.toml
 create mode 100644 contrib/cgit-rs/src/lib.rs

Comments

Phillip Wood Aug. 12, 2024, 9:26 a.m. UTC | #1
Hi Josh

On 09/08/2024 23:41, Josh Steadmon wrote:
> From: Calvin Wan <calvinwan@google.com>
> 
> Wrap `struct config_set` and a few of its associated functions in
> cgit-sys. Also introduce a higher-level "cgit" crate which provides a
> more Rust-friendly interface to config_set structs.

Having an ergonamic interface is a really good idea. As far as the 
naming goes I think the suggestion of "libgit-rs" is a good one.

> diff --git a/contrib/cgit-rs/cgit-sys/public_symbol_export.h b/contrib/cgit-rs/cgit-sys/public_symbol_export.h
> index 64332f30de..882c7932e8 100644
> --- a/contrib/cgit-rs/cgit-sys/public_symbol_export.h
> +++ b/contrib/cgit-rs/cgit-sys/public_symbol_export.h
> @@ -9,6 +9,18 @@ void libgit_init_git(const char **argv);
>   
>   int libgit_parse_maybe_bool(const char *val);

I'm suprised the compiler does not complain that 'struct config_set' is 
not declared in this header - I was expecting to see

	struct config_set;

before the function declarations. As I said in my comments on the last 
patch I think we'd be better to namespace our types as well as our 
functions in this library layer so this can be resued by other language 
bindings.

 > [...]
> +    pub fn get_str(&mut self, key: &str) -> Option<CString> {

If we're adding an ergonomic api then having return CString isn't ideal. 
I think the equivalent function in libgit2-rs has variants that return a 
String which is convinent if the caller is expecting utf8 values or 
Vec<u8> for non-utf8 values.

Best Wishes

Phillip
Calvin Wan Aug. 21, 2024, 6:46 p.m. UTC | #2
Phillip Wood <phillip.wood123@gmail.com> writes:
> Hi Josh
> 
> On 09/08/2024 23:41, Josh Steadmon wrote:
> > From: Calvin Wan <calvinwan@google.com>
> > 
> > Wrap `struct config_set` and a few of its associated functions in
> > cgit-sys. Also introduce a higher-level "cgit" crate which provides a
> > more Rust-friendly interface to config_set structs.
> 
> Having an ergonamic interface is a really good idea. As far as the 
> naming goes I think the suggestion of "libgit-rs" is a good one.

Agreed -- we plan on renaming it to "libgit-rs" in the next reroll.

> 
> > diff --git a/contrib/cgit-rs/cgit-sys/public_symbol_export.h b/contrib/cgit-rs/cgit-sys/public_symbol_export.h
> > index 64332f30de..882c7932e8 100644
> > --- a/contrib/cgit-rs/cgit-sys/public_symbol_export.h
> > +++ b/contrib/cgit-rs/cgit-sys/public_symbol_export.h
> > @@ -9,6 +9,18 @@ void libgit_init_git(const char **argv);
> >   
> >   int libgit_parse_maybe_bool(const char *val);
> 
> I'm suprised the compiler does not complain that 'struct config_set' is 
> not declared in this header - I was expecting to see
> 
> 	struct config_set;

I'm surprised as well actually. Removing the forward declaration of
"struct config_set" in repository.h doesn't result in any complaints
from the compiler either. Will add it in the reroll, but am curious if
anyone has any ideas why the compiler isn't complaining.

> before the function declarations. As I said in my comments on the last 
> patch I think we'd be better to namespace our types as well as our 
> functions in this library layer so this can be resued by other language 
> bindings.

Are you suggesting something like "#define libgit_config_set
config_set"? I wouldn't be comfortable renaming config_set in git.git
just yet until config/config_set can be a standalone library by itself.

> 
>  > [...]
> > +    pub fn get_str(&mut self, key: &str) -> Option<CString> {
> 
> If we're adding an ergonomic api then having return CString isn't ideal. 
> I think the equivalent function in libgit2-rs has variants that return a 
> String which is convinent if the caller is expecting utf8 values or 
> Vec<u8> for non-utf8 values.

Having both get_cstr() and get_str() makes sense to me.
Kyle Lippincott Aug. 21, 2024, 7:23 p.m. UTC | #3
On Wed, Aug 21, 2024 at 11:46 AM Calvin Wan <calvinwan@google.com> wrote:
>
> Phillip Wood <phillip.wood123@gmail.com> writes:
> > Hi Josh
> >
> > On 09/08/2024 23:41, Josh Steadmon wrote:
> > > From: Calvin Wan <calvinwan@google.com>
> > >
> > > Wrap `struct config_set` and a few of its associated functions in
> > > cgit-sys. Also introduce a higher-level "cgit" crate which provides a
> > > more Rust-friendly interface to config_set structs.
> >
> > Having an ergonamic interface is a really good idea. As far as the
> > naming goes I think the suggestion of "libgit-rs" is a good one.
>
> Agreed -- we plan on renaming it to "libgit-rs" in the next reroll.
>
> >
> > > diff --git a/contrib/cgit-rs/cgit-sys/public_symbol_export.h b/contrib/cgit-rs/cgit-sys/public_symbol_export.h
> > > index 64332f30de..882c7932e8 100644
> > > --- a/contrib/cgit-rs/cgit-sys/public_symbol_export.h
> > > +++ b/contrib/cgit-rs/cgit-sys/public_symbol_export.h
> > > @@ -9,6 +9,18 @@ void libgit_init_git(const char **argv);
> > >
> > >   int libgit_parse_maybe_bool(const char *val);
> >
> > I'm suprised the compiler does not complain that 'struct config_set' is
> > not declared in this header - I was expecting to see
> >
> >       struct config_set;
>
> I'm surprised as well actually. Removing the forward declaration of
> "struct config_set" in repository.h doesn't result in any complaints
> from the compiler either. Will add it in the reroll, but am curious if
> anyone has any ideas why the compiler isn't complaining.

C doesn't require structs be forward declared separately. You can
change the name to be anything you want, and as long as the forward
declaration of the function and the function definition agree, you're
fine. If they don't agree, well, let's hope you don't encounter that
(it's the same problem as if you have a forward declaration that's
*not* visible from the definition that disagrees with the definition:
`void some_func();` vs. `int some_func(int arg) { ... }` -- if the
forward declaration wasn't made in the same translation unit that
defines `some_func`, nothing detects this misuse in C).

For this reason, you should only ever use forward declarations that
are provided by "the code" that's being forward declared. i.e. if
you're trying to use a function from foo.c, please forward declare it
in foo.h, and only there. This way, assuming foo.c includes foo.h,
you'll detect mismatches.

[apologies if people got multiple copies of this, I sent with HTML
mode enabled the first time]

>
> > before the function declarations. As I said in my comments on the last
> > patch I think we'd be better to namespace our types as well as our
> > functions in this library layer so this can be resued by other language
> > bindings.
>
> Are you suggesting something like "#define libgit_config_set
> config_set"? I wouldn't be comfortable renaming config_set in git.git
> just yet until config/config_set can be a standalone library by itself.
>
> >
> >  > [...]
> > > +    pub fn get_str(&mut self, key: &str) -> Option<CString> {
> >
> > If we're adding an ergonomic api then having return CString isn't ideal.
> > I think the equivalent function in libgit2-rs has variants that return a
> > String which is convinent if the caller is expecting utf8 values or
> > Vec<u8> for non-utf8 values.
>
> Having both get_cstr() and get_str() makes sense to me.
Phillip Wood Aug. 22, 2024, 9:12 a.m. UTC | #4
Hi Calvin

On 21/08/2024 19:46, Calvin Wan wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
 >
>> before the function declarations. As I said in my comments on the last
>> patch I think we'd be better to namespace our types as well as our
>> functions in this library layer so this can be resued by other language
>> bindings.
> 
> Are you suggesting something like "#define libgit_config_set
> config_set"? I wouldn't be comfortable renaming config_set in git.git
> just yet until config/config_set can be a standalone library by itself.

I was suggesting[1] adding

     struct libgit_configset {
	    struct config_set set;
     };

to public_symbol_export.c and rewriting the wrappers in that file to use 
this struct e.g.

     int libgit_configset_get_int(struct libgit_configset *cs,
				 const char *key, int *dest)
     {
	    return git_configset_get_int(&cs.set, key, dest);
     }

In public_symbol_export.h we'd then have

     struct libgit_configset;

     int libgit_configset_get_int(struct libgit_configset *,
				 const char *, int *);

If we want the symbol exports to be useful outside of the rust bindings 
I think we need to namespace our types as well as our functions.

[1] 
https://lore.kernel.org/git/5720d5b9-a850-4024-a1fd-54acc6b15a74@gmail.com

>>> +    pub fn get_str(&mut self, key: &str) -> Option<CString> {
>>
>> If we're adding an ergonomic api then having return CString isn't ideal.
>> I think the equivalent function in libgit2-rs has variants that return a
>> String which is convinent if the caller is expecting utf8 values or
>> Vec<u8> for non-utf8 values.
> 
> Having both get_cstr() and get_str() makes sense to me.

Just to be clear get_cstr() would return Vec<u8>? I'm far from a rust 
expert but my understanding was that crates that wrap C libraries 
generally avoid using CString in their API.

Best Wishes

Phillip
Phillip Wood Aug. 22, 2024, 1:24 p.m. UTC | #5
On 21/08/2024 20:23, Kyle Lippincott wrote:
> On Wed, Aug 21, 2024 at 11:46 AM Calvin Wan <calvinwan@google.com> wrote:
>>
>> Phillip Wood <phillip.wood123@gmail.com> writes:
>>> 
>>> I'm suprised the compiler does not complain that 'struct config_set' is
>>> not declared in this header - I was expecting to see
>>>
>>>        struct config_set;
>>
>> I'm surprised as well actually. Removing the forward declaration of
>> "struct config_set" in repository.h doesn't result in any complaints
>> from the compiler either. Will add it in the reroll, but am curious if
>> anyone has any ideas why the compiler isn't complaining.
> 
> C doesn't require structs be forward declared separately. You can
> change the name to be anything you want, and as long as the forward
> declaration of the function and the function definition agree, you're
> fine.

Oh, TIL. The declaration

     struct foo *f(struct bar*);

declares "struct foo" and "struct bar" as well as function "f". The 
declaration of "struct bar" is scoped to the parameter list and so is 
not visible to the rest of the file. This is why we forward declare 
structs that are used in parameter lists and why the compiler complains 
if we don't. The declaration of "struct foo" does have file scope though 
which explains why the compiler does not complain about 
public_symbol_export.h because "struct config_set" is declared as a 
return value before it is used in any parameter lists.

Thanks

Phillip

> If they don't agree, well, let's hope you don't encounter that
> (it's the same problem as if you have a forward declaration that's
> *not* visible from the definition that disagrees with the definition:
> `void some_func();` vs. `int some_func(int arg) { ... }` -- if the
> forward declaration wasn't made in the same translation unit that
> defines `some_func`, nothing detects this misuse in C).
> 
> For this reason, you should only ever use forward declarations that
> are provided by "the code" that's being forward declared. i.e. if
> you're trying to use a function from foo.c, please forward declare it
> in foo.h, and only there. This way, assuming foo.c includes foo.h,
> you'll detect mismatches.
> 
> [apologies if people got multiple copies of this, I sent with HTML
> mode enabled the first time]
> 
>>
>>> before the function declarations. As I said in my comments on the last
>>> patch I think we'd be better to namespace our types as well as our
>>> functions in this library layer so this can be resued by other language
>>> bindings.
>>
>> Are you suggesting something like "#define libgit_config_set
>> config_set"? I wouldn't be comfortable renaming config_set in git.git
>> just yet until config/config_set can be a standalone library by itself.
>>
>>>
>>>   > [...]
>>>> +    pub fn get_str(&mut self, key: &str) -> Option<CString> {
>>>
>>> If we're adding an ergonomic api then having return CString isn't ideal.
>>> I think the equivalent function in libgit2-rs has variants that return a
>>> String which is convinent if the caller is expecting utf8 values or
>>> Vec<u8> for non-utf8 values.
>>
>> Having both get_cstr() and get_str() makes sense to me.
>
diff mbox series

Patch

diff --git a/.gitignore b/.gitignore
index 567cc9888f..7f2ee89b8a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -248,4 +248,5 @@  Release/
 /git.VC.db
 *.dSYM
 /contrib/buildsystems/out
+/contrib/cgit-rs/target
 /contrib/cgit-rs/cgit-sys/target
diff --git a/Makefile b/Makefile
index db8af99f20..3a71d10fbe 100644
--- a/Makefile
+++ b/Makefile
@@ -3723,7 +3723,7 @@  clean: profile-clean coverage-clean cocciclean
 	$(RM) $(htmldocs).tar.gz $(manpages).tar.gz
 	$(MAKE) -C Documentation/ clean
 	$(RM) Documentation/GIT-EXCLUDED-PROGRAMS
-	$(RM) -r contrib/cgit-rs/cgit-sys/target
+	$(RM) -r contrib/cgit-rs/target contrib/cgit-rs/cgit-sys/target
 ifndef NO_PERL
 	$(RM) -r perl/build/
 endif
diff --git a/contrib/cgit-rs/Cargo.lock b/contrib/cgit-rs/Cargo.lock
new file mode 100644
index 0000000000..991e775c34
--- /dev/null
+++ b/contrib/cgit-rs/Cargo.lock
@@ -0,0 +1,14 @@ 
+# This file is automatically @generated by Cargo.
+# It is not intended for manual editing.
+version = 3
+
+[[package]]
+name = "cgit"
+version = "0.1.0"
+dependencies = [
+ "cgit-sys",
+]
+
+[[package]]
+name = "cgit-sys"
+version = "0.1.0"
diff --git a/contrib/cgit-rs/Cargo.toml b/contrib/cgit-rs/Cargo.toml
new file mode 100644
index 0000000000..31f4d84522
--- /dev/null
+++ b/contrib/cgit-rs/Cargo.toml
@@ -0,0 +1,10 @@ 
+[package]
+name = "cgit"
+version = "0.1.0"
+edition = "2021"
+
+[lib]
+path = "src/lib.rs"
+
+[dependencies]
+cgit-sys = { version = "0.1.0", path = "cgit-sys" }
diff --git a/contrib/cgit-rs/cgit-sys/public_symbol_export.c b/contrib/cgit-rs/cgit-sys/public_symbol_export.c
index 62a91f76d0..adf8b21f11 100644
--- a/contrib/cgit-rs/cgit-sys/public_symbol_export.c
+++ b/contrib/cgit-rs/cgit-sys/public_symbol_export.c
@@ -33,6 +33,36 @@  int libgit_parse_maybe_bool(const char *val)
 	return git_parse_maybe_bool(val);
 }
 
+struct config_set *libgit_configset_alloc(void)
+{
+	return git_configset_alloc();
+}
+
+void libgit_configset_clear_and_free(struct config_set *cs)
+{
+	git_configset_clear_and_free(cs);
+}
+
+void libgit_configset_init(struct config_set *cs)
+{
+	git_configset_init(cs);
+}
+
+int libgit_configset_add_file(struct config_set *cs, const char *filename)
+{
+	return git_configset_add_file(cs, filename);
+}
+
+int libgit_configset_get_int(struct config_set *cs, const char *key, int *dest)
+{
+	return git_configset_get_int(cs, key, dest);
+}
+
+int libgit_configset_get_string(struct config_set *cs, const char *key, char **dest)
+{
+	return git_configset_get_string(cs, key, dest);
+}
+
 const char *libgit_user_agent(void)
 {
 	return git_user_agent();
diff --git a/contrib/cgit-rs/cgit-sys/public_symbol_export.h b/contrib/cgit-rs/cgit-sys/public_symbol_export.h
index 64332f30de..882c7932e8 100644
--- a/contrib/cgit-rs/cgit-sys/public_symbol_export.h
+++ b/contrib/cgit-rs/cgit-sys/public_symbol_export.h
@@ -9,6 +9,18 @@  void libgit_init_git(const char **argv);
 
 int libgit_parse_maybe_bool(const char *val);
 
+struct config_set *libgit_configset_alloc(void);
+
+void libgit_configset_clear_and_free(struct config_set *cs);
+
+void libgit_configset_init(struct config_set *cs);
+
+int libgit_configset_add_file(struct config_set *cs, const char *filename);
+
+int libgit_configset_get_int(struct config_set *cs, const char *key, int *dest);
+
+int libgit_configset_get_string(struct config_set *cs, const char *key, char **dest);
+
 const char *libgit_user_agent(void);
 
 const char *libgit_user_agent_sanitized(void);
diff --git a/contrib/cgit-rs/cgit-sys/src/lib.rs b/contrib/cgit-rs/cgit-sys/src/lib.rs
index 8c3ccc2859..d5072ea587 100644
--- a/contrib/cgit-rs/cgit-sys/src/lib.rs
+++ b/contrib/cgit-rs/cgit-sys/src/lib.rs
@@ -1,6 +1,15 @@ 
-use std::ffi::{c_char, c_int};
+use std::ffi::{c_char, c_int, c_void};
+
+#[allow(non_camel_case_types)]
+#[repr(C)]
+pub struct config_set {
+    _data: [u8; 0],
+    _marker: core::marker::PhantomData<(*mut u8, core::marker::PhantomPinned)>,
+}
 
 extern "C" {
+    pub fn free(ptr: *mut c_void);
+
     pub fn libgit_setup_git_directory() -> *const c_char;
 
     // From config.c
@@ -15,6 +24,26 @@  extern "C" {
     // From version.c
     pub fn libgit_user_agent() -> *const c_char;
     pub fn libgit_user_agent_sanitized() -> *const c_char;
+
+    pub fn libgit_configset_alloc() -> *mut config_set;
+    pub fn libgit_configset_clear_and_free(cs: *mut config_set);
+
+    pub fn libgit_configset_init(cs: *mut config_set);
+
+    pub fn libgit_configset_add_file(cs: *mut config_set, filename: *const c_char) -> c_int;
+
+    pub fn libgit_configset_get_int(
+        cs: *mut config_set,
+        key: *const c_char,
+        int: *mut c_int,
+    ) -> c_int;
+
+    pub fn libgit_configset_get_string(
+        cs: *mut config_set,
+        key: *const c_char,
+        dest: *mut *mut c_char,
+    ) -> c_int;
+
 }
 
 #[cfg(test)]
diff --git a/contrib/cgit-rs/src/lib.rs b/contrib/cgit-rs/src/lib.rs
new file mode 100644
index 0000000000..74c83d161a
--- /dev/null
+++ b/contrib/cgit-rs/src/lib.rs
@@ -0,0 +1,82 @@ 
+use std::ffi::{c_char, c_int, c_void, CStr, CString};
+
+use cgit_sys::*;
+
+pub struct ConfigSet(*mut config_set);
+impl ConfigSet {
+    pub fn new() -> Self {
+        unsafe {
+            let ptr = libgit_configset_alloc();
+            libgit_configset_init(ptr);
+            ConfigSet(ptr)
+        }
+    }
+
+    // NEEDSWORK: maybe replace &str with &Path
+    pub fn add_files(&mut self, files: &[&str]) {
+        for file in files {
+            let rs = CString::new(*file).expect("Couldn't convert to CString");
+            unsafe {
+                libgit_configset_add_file(self.0, rs.as_ptr());
+            }
+        }
+    }
+
+    pub fn get_int(&mut self, key: &str) -> Option<c_int> {
+        let key = CString::new(key).expect("Couldn't convert to CString");
+        let mut val: c_int = 0;
+        unsafe {
+            if libgit_configset_get_int(self.0, key.as_ptr(), &mut val as *mut c_int) != 0 {
+                return None;
+            }
+        }
+
+        Some(val)
+    }
+
+    pub fn get_str(&mut self, key: &str) -> Option<CString> {
+        let key = CString::new(key).expect("Couldn't convert to CString");
+        let mut val: *mut c_char = std::ptr::null_mut();
+        unsafe {
+            if libgit_configset_get_string(self.0, key.as_ptr(), &mut val as *mut *mut c_char) != 0
+            {
+                return None;
+            }
+            let borrowed_str = CStr::from_ptr(val);
+            let owned_str = CString::from_vec_with_nul(borrowed_str.to_bytes_with_nul().to_vec());
+            free(val as *mut c_void); // Free the xstrdup()ed pointer from the C side
+            Some(owned_str.unwrap())
+        }
+    }
+}
+
+impl Default for ConfigSet {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl Drop for ConfigSet {
+    fn drop(&mut self) {
+        unsafe {
+            libgit_configset_clear_and_free(self.0);
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn load_configs_via_configset() {
+        // NEEDSWORK: we need to supply a testdata config
+        let mut cs = ConfigSet::new();
+        let mut path = std::env::home_dir().expect("cannot get home directory path");
+        path.push(".gitconfig");
+        let path: String = path.into_os_string().into_string().unwrap();
+        cs.add_files(&["/etc/gitconfig", ".gitconfig", &path]);
+        assert_eq!(cs.get_int("trace2.eventNesting"), Some(5));
+        assert_eq!(cs.get_str("no_such_config_item"), None);
+    }
+}