Add unicode & bytes c-api support by bschoenmaeckers · Pull Request #7904 · RustPython/RustPython

bschoenmaeckers · 2026-05-17T14:34:19Z

Summary by CodeRabbit

New Features
- Added bytes C-API support: create bytes, get size, and access raw byte data from extensions.
- Added Unicode C-API support: create/inspect UTF‑8 strings, encode, compare, and intern strings.
- Expanded public C-API surface to expose the new bytes and unicode capabilities.

coderabbitai · 2026-05-17T14:34:34Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 9b69a26b-4ae8-47b4-b059-6b9ad16a5fad

📥 Commits

Reviewing files that changed from the base of the PR and between 4447635 and 78ba5e0.

📒 Files selected for processing (2)

crates/capi/src/bytesobject.rs
crates/capi/src/unicodeobject.rs

🚧 Files skipped from review as they are similar to previous changes (2)

crates/capi/src/bytesobject.rs
crates/capi/src/unicodeobject.rs

📝 Walkthrough

Walkthrough

Adds C-API bindings for Python bytes and unicode objects: type-check helpers, bytes constructors/accessors, unicode constructors/accessors/encoding/interning/comparison, public module exports, and crate-visible macro re-export.

Changes

C-API Bytes and Unicode Object Bindings

Layer / File(s)	Summary
C-API Type Check Macro Re-export and Module Declaration `crates/capi/src/object.rs`, `crates/capi/src/lib.rs`	`define_py_check` is re-exported with crate visibility and `bytesobject` and `unicodeobject` are declared public.
Bytes Object C-API Functions `crates/capi/src/bytesobject.rs`	Adds `PyBytes_Check`/`PyBytes_CheckExact`, `PyBytes_FromStringAndSize` (handles null pointer/uninitialized buffer and negative lengths), `PyBytes_Size`, `PyBytes_AsString`, and disabled pyo3 tests.
Unicode Object C-API Functions `crates/capi/src/unicodeobject.rs`	Adds `PyUnicode_Check`/`PyUnicode_CheckExact`, `PyUnicode_FromStringAndSize`, `PyUnicode_AsUTF8AndSize`, `PyUnicode_AsEncodedString`, `PyUnicode_InternInPlace`, `PyUnicode_EqualToUTF8AndSize`, and disabled pyo3 tests.

Sequence Diagram(s)

sequenceDiagram
  participant CCaller as C caller
  participant PyBytes_FromStringAndSize
  participant VM as RustPython VM
  participant Ctx as VM Context
  CCaller->>PyBytes_FromStringAndSize: (bytes: *mut c_char, len: isize)
  PyBytes_FromStringAndSize->>VM: with_vm_context
  alt bytes is NULL
    VM->>Ctx: allocate uninitialized Vec<u8>
  else bytes not NULL
    VM->>Ctx: copy from pointer slice into Vec<u8>
  end
  Ctx-->>VM: new PyBytes PyObject*
  VM-->>PyBytes_FromStringAndSize: PyObject*
  PyBytes_FromStringAndSize-->>CCaller: PyObject*
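
The branch in the diagram above can be sketched in plain Rust. This is a minimal illustration using only the standard library, not the code from `crates/capi/src/bytesobject.rs`: the function name `buffer_from_ptr_and_len` is hypothetical, it returns a `Vec<u8>` instead of a `PyObject*`, and it zero-fills the NULL-pointer case rather than leaving the buffer uninitialized.

```rust
use std::os::raw::c_char;
use std::slice;

/// Hypothetical stand-in for the buffer-building step (not the PR's code).
/// Returns `None` for a negative length.
///
/// # Safety
/// If `bytes` is non-null, it must point to at least `len` readable bytes.
pub unsafe fn buffer_from_ptr_and_len(bytes: *const c_char, len: isize) -> Option<Vec<u8>> {
    // Reject negative lengths before converting to usize.
    if len < 0 {
        return None;
    }
    let len = len as usize;

    if bytes.is_null() {
        // NULL input: hand back a buffer of the requested size.
        // Assumption: zero-filled here to keep the sketch free of
        // uninitialized memory.
        Some(vec![0u8; len])
    } else {
        // Non-NULL input: copy `len` bytes out of the caller's memory.
        let src = unsafe { slice::from_raw_parts(bytes.cast::<u8>(), len) };
        Some(src.to_vec())
    }
}
```

Copying into an owned `Vec<u8>` means the result does not depend on the caller keeping the source pointer alive.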

sequenceDiagram
  participant CCaller as C caller
  participant PyUnicode_InternInPlace
  participant VM as RustPython VM
  participant Ctx as VM Context
  CCaller->>PyUnicode_InternInPlace: string: *mut *mut PyObject
  PyUnicode_InternInPlace->>VM: downcast *string to PyStr
  VM->>Ctx: intern string
  Ctx-->>VM: interned PyObject*
  VM-->>PyUnicode_InternInPlace: interned PyObject*
  PyUnicode_InternInPlace-->>CCaller: write interned pointer back to *string
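
The "write the interned pointer back" step can be modelled without any VM types. The sketch below is an assumption-laden illustration: `Interner` and `intern_in_place` are hypothetical names, `Rc<str>` stands in for a `PyObject*` to a string, and a `HashSet` stands in for whatever intern table the VM context keeps.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical string pool standing in for the VM context's intern table.
#[derive(Default)]
pub struct Interner {
    pool: HashSet<Rc<str>>,
}

impl Interner {
    /// Replaces `*slot` with the pooled copy of an equal string,
    /// adding the string to the pool if it is not there yet.
    pub fn intern_in_place(&mut self, slot: &mut Rc<str>) {
        // Look up an existing entry with the same contents.
        let pooled = self.pool.get(&**slot).cloned();
        match pooled {
            // Already interned: overwrite the caller's handle.
            Some(existing) => *slot = existing,
            // First occurrence: this handle becomes the canonical one.
            None => {
                self.pool.insert(Rc::clone(slot));
            }
        }
    }
}
```

After two equal strings pass through `intern_in_place`, both handles point at the same allocation, so equality can be checked by pointer comparison.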

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

RustPython/RustPython#7871: Introduces define_py_check macro-generated C-API type-check functions; related to this PR's usage and re-export of that macro.

Suggested reviewers

youknowone
ShaharNaveh

Poem

🐰 I hopped through bytes and strings today,
From raw C pointers to UTF-8 play.
I copied, interned, and checked with care,
Rust bridges C so Python can share.
Hooray for bindings — nibble, hop, hooray! 🥕

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main objective of the changeset: adding C-API support for unicode and bytes types through new FFI functions in bytesobject.rs and unicodeobject.rs.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/capi/src/bytesobject.rs`:
- Around line 10-26: Validate that the incoming len is non-negative at the start
of PyBytes_FromStringAndSize and bail out immediately if it's negative: check if
len < 0, set an appropriate Python exception (e.g., raise ValueError or call the
existing C-API error setter) on the VM, and return NULL instead of converting
len to usize; only after this check convert len to usize and proceed with the
current branches that allocate or slice using that usize value (refer to
function PyBytes_FromStringAndSize and the branches that call
Vec::with_capacity/set_len and slice::from_raw_parts).
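
To illustrate why the check has to come before the conversion: casting a negative `isize` to `usize` wraps around to a very large value. The helper below is a hypothetical sketch (`checked_len` is not a function in the PR) and it reports the error as a `String`, whereas the real fix would set a Python exception on the VM and return NULL.

```rust
/// Hypothetical helper: converts a C-style signed length to `usize`,
/// returning an error message for negative values.
pub fn checked_len(len: isize) -> Result<usize, String> {
    if len < 0 {
        // Without this guard, `len as usize` would wrap around,
        // e.g. -1 becomes usize::MAX.
        return Err(format!("negative size: {}", len));
    }
    Ok(len as usize)
}
```

Only the `Ok` value should reach the allocation or `slice::from_raw_parts` call.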

In `@crates/capi/src/unicodeobject.rs`:
- Around line 108-126: The function PyUnicode_EqualToUTF8AndSize uses
slice::from_raw_parts with size cast unsafely, which overflows when size is
negative; add a guard at the start of PyUnicode_EqualToUTF8AndSize that checks
if size < 0 and immediately returns false (0) via the with_vm/Ok(false) path (or
direct c_int 0) to avoid creating an oversized slice, then proceed with the
existing logic (locate the unicode downcast to PyStr and the slice/from_utf8
steps) only when size is non-negative.
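
A guarded comparison along these lines might look like the sketch below. It is not the PR's implementation: `equal_to_utf8_and_size` is a hypothetical free function that takes a `&str` in place of the downcast `PyStr`, and the NULL-pointer check is an extra assumption added because `slice::from_raw_parts` requires a non-null pointer even for a zero length.

```rust
use std::os::raw::{c_char, c_int};
use std::slice;
use std::str;

/// Hypothetical sketch: returns 1 if `s` equals the UTF-8 data at `utf8`, else 0.
///
/// # Safety
/// If `utf8` is non-null and `size >= 0`, `utf8` must point to at least
/// `size` readable bytes.
pub unsafe fn equal_to_utf8_and_size(s: &str, utf8: *const c_char, size: isize) -> c_int {
    // Guard first: a negative size must never reach from_raw_parts.
    // The null check is an added assumption, not part of the review comment.
    if size < 0 || utf8.is_null() {
        return 0;
    }
    let raw = unsafe { slice::from_raw_parts(utf8.cast::<u8>(), size as usize) };
    // Invalid UTF-8 cannot be equal to a valid string.
    match str::from_utf8(raw) {
        Ok(other) => (other == s) as c_int,
        Err(_) => 0,
    }
}
```

Returning 0 for negative sizes and invalid UTF-8 keeps the function infallible from the C caller's point of view.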

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 9999d9a7-88c7-4f09-a7bf-21b46eaf41bf

📥 Commits

Reviewing files that changed from the base of the PR and between a1a87dc and 4447635.

📒 Files selected for processing (4)

crates/capi/src/bytesobject.rs
crates/capi/src/lib.rs
crates/capi/src/object.rs
crates/capi/src/unicodeobject.rs

youknowone · 2026-05-17T15:24:46Z

@bschoenmaeckers it may be worth checking the CodeRabbit comments

bschoenmaeckers · 2026-05-17T15:37:18Z

> @bschoenmaeckers it may be worth checking the CodeRabbit comments

Will do 👍

bschoenmaeckers · 2026-05-17T15:51:07Z

Addressed review comments

Add unicode & bytes c-api support

4447635

coderabbitai Bot reviewed May 17, 2026

Comment thread crates/capi/src/bytesobject.rs

Comment thread crates/capi/src/unicodeobject.rs

youknowone approved these changes May 17, 2026

Check for negative size

78ba5e0

youknowone merged commit 20cb884 into RustPython:main May 19, 2026
26 checks passed

bschoenmaeckers deleted the c-api-strings branch May 19, 2026 07:45

youknowone merged 2 commits into RustPython:main from bschoenmaeckers:c-api-strings