Add unicode & bytes c-api support#7904
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds C-API bindings for Python bytes and unicode objects: type-check helpers, bytes constructors/accessors, unicode constructors/accessors/encoding/interning/comparison, public module exports, and crate-visible macro re-export. ChangesC-API Bytes and Unicode Object Bindings
Sequence Diagram(s)sequenceDiagram
participant CCaller as C caller
participant PyBytes_FromStringAndSize
participant VM as RustPython VM
participant Ctx as VM Context
CCaller->>PyBytes_FromStringAndSize: (bytes: *mut c_char, len: isize)
PyBytes_FromStringAndSize->>VM: with_vm_context
alt bytes is NULL
VM->>Ctx: allocate uninitialized Vec<u8>
else bytes not NULL
VM->>Ctx: copy from pointer slice into Vec<u8>
end
Ctx-->>VM: new PyBytes PyObject*
VM-->>PyBytes_FromStringAndSize: PyObject*
PyBytes_FromStringAndSize-->>CCaller: PyObject*
sequenceDiagram
participant CCaller as C caller
participant PyUnicode_InternInPlace
participant VM as RustPython VM
participant Ctx as VM Context
CCaller->>PyUnicode_InternInPlace: string: *mut *mut PyObject
PyUnicode_InternInPlace->>VM: downcast *string to PyStr
VM->>Ctx: intern string
Ctx-->>VM: interned PyObject*
VM-->>PyUnicode_InternInPlace: interned PyObject*
PyUnicode_InternInPlace-->>CCaller: write interned pointer back to *string
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/capi/src/bytesobject.rs`:
- Around line 10-26: Validate that the incoming len is non-negative at the start
of PyBytes_FromStringAndSize and bail out immediately if it's negative: check if
len < 0, set an appropriate Python exception (e.g., raise ValueError or call the
existing C-API error setter) on the VM, and return NULL instead of converting
len to usize; only after this check convert len to usize and proceed with the
current branches that allocate or slice using that usize value (refer to
function PyBytes_FromStringAndSize and the branches that call
Vec::with_capacity/set_len and slice::from_raw_parts).
In `@crates/capi/src/unicodeobject.rs`:
- Around line 108-126: The function PyUnicode_EqualToUTF8AndSize uses
slice::from_raw_parts with size cast unsafely, which overflows when size is
negative; add a guard at the start of PyUnicode_EqualToUTF8AndSize that checks
if size < 0 and immediately returns false (0) via the with_vm/Ok(false) path (or
direct c_int 0) to avoid creating an oversized slice, then proceed with the
existing logic (locate the unicode downcast to PyStr and the slice/from_utf8
steps) only when size is non-negative.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 9999d9a7-88c7-4f09-a7bf-21b46eaf41bf
📒 Files selected for processing (4)
crates/capi/src/bytesobject.rscrates/capi/src/lib.rscrates/capi/src/object.rscrates/capi/src/unicodeobject.rs
|
@bschoenmaeckers may be worth to check coderabbit comments |
Will do 👍 |
|
Addressed review comments |
Summary by CodeRabbit