Skip to content

Align cleanup bytecode layout with CPython#7935

Open
youknowone wants to merge 3 commits into
RustPython:mainfrom
youknowone:bytecode-parity
Open

Align cleanup bytecode layout with CPython#7935
youknowone wants to merge 3 commits into
RustPython:mainfrom
youknowone:bytecode-parity

Conversation

@youknowone
Copy link
Copy Markdown
Member

@youknowone youknowone commented May 20, 2026

Summary

Align exception, finally, and with cleanup bytecode layout with CPython 3.14 compiler behavior for issue youknowone#32.

This updates shared codegen/CFG assembly behavior rather than editing Lib/, using the local CPython compiler sources as reference:

  • Python/codegen.c for try/finally, with, while, break/continue, and named-except cleanup layout
  • Python/flowgraph.c for late CFG cleanup and load-fast borrow behavior
  • Python/assemble.c for jump offset fixed-point behavior around EXTENDED_ARG

Validation

  • cargo fmt --check
  • cargo test -p rustpython-codegen --lib
  • cargo build --release -p rustpython
  • cargo clippy -p rustpython-codegen --lib
  • git diff --check
  • scripts/compare_bytecode.py with release RustPython for the issue Add proper support for kwargs #32 grouped scope:
    • multiprocessing/forkserver.py
    • test/test_compile.py
    • test/test_faulthandler.py
    • test/test_hashlib.py
    • test/test_importlib/extension/test_loader.py
    • test/test_poplib.py
    • test/test_socket.py
    • test/test_sys_settrace.py
    • test/test_unittest/testmock/testsealable.py

All grouped-scope compare reports are at 0 differences.

Summary by CodeRabbit

  • Refactor

    • Improved compiler control-flow and bytecode layout generation for complex constructs (match OR-patterns, try/finally, infinite-loop patterns, multi-with), with better NOP/location preservation and more consistent jump/epilogue ordering to match reference behavior.
  • Tests

    • Added extensive tests covering bytecode/layout edge cases (NOP preservation, location overrides, jump/return epilogues, and various try/while/match interactions).

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

📝 Walkthrough

Walkthrough

This PR refines RustPython's bytecode codegen to achieve closer CPython-parity by improving control-flow handling in try/finally, while-true loops, match statements, and jump assembly. Changes span both high-level AST compilation and low-level IR finalization, introducing new statement-shape predicates, refined NOP preservation logic, location-override propagation, and conditional jump retargeting.

Changes

CPython-parity bytecode codegen refinements

Layer / File(s) Summary
Statement-shape predicates for control-flow decisions
crates/codegen/src/compile.rs
New predicates (statements_are_single_with, statements_end_with_conditional, statements_end_with_while_true_without_break) enable precise detection of statement patterns used to optimize control-flow and NOP preservation decisions in try/finally and loop blocks.
Try/finally optimized-entry and NOP preservation
crates/codegen/src/compile.rs
Refactored try/finally entry logic to use statement-shape predicates for finalbody checks, compute force_remove_finally_entry_nop for with-while-true patterns, remove fast-borrow disabling for bare-except handlers, and adjust finally-exit label conditions.
While-True cleanup and nested-with interactions
crates/codegen/src/compile.rs
Extended cleanup NOP removal to cover while-true-without-break patterns, reworked nested-multiline-with cleanup-target boolean logic, and added tests asserting with/for/inner-infinite-loop cleanup adjacency shapes.
Match OR-pattern detection and case emission
crates/codegen/src/compile.rs
Detect Pattern::MatchOr early (contains_match_or), compute has_match_or_case to disable load-fast borrowing for match-or cases, adjust case-body emission and lineno_override handling to preserve CPython pass/NOP shapes after subject POP_TOP.
Emit edge no-location before JumpNoInterrupt
crates/codegen/src/compile.rs
Insert set_no_location() immediately prior to emitting a JumpNoInterrupt on specific cleanup/control-flow edges.
Compile.rs bytecode layout validation tests
crates/codegen/src/compile.rs
Add and update tests validating match-or load-fast preservation, NEXT_LOCATION NOP retention, nested-finally conditional fallthrough, try/finally boolop-while variants, named-except shared cleanup paths, async match/comprehension jump layout, and multi-with infinite-loop interactions.
IR finalize pipeline restructuring and early cleanups
crates/codegen/src/ir.rs
Run remove_redundant_const_pop_top_pairs() immediately after insert_superinstructions(), insert retarget_conditional_jumps_to_empty_while_exit_epilogue() before duplication/DCE/unreachable-elimination, and mirror ordering in debug tracing.
Resolve next-location overrides before linearization
crates/codegen/src/ir.rs
Add resolve_next_location_overrides() reverse-CFG pass that rewrites lineno_override == -2 by copying location/end-location/lineno_override from following instructions and forces -1 at scope-exit/jump/chain-end boundaries; call it during finalize_code before cache precomputation.
Jump-argument sizing and assembly adjustments
crates/codegen/src/ir.rs
Adjust jump-argument resolution (mutable new_arg for EndAsyncFor) and force EXTENDED_ARG sizing for certain backward jumps when resolved deltas reach the 0xff boundary with targets already outside one-byte range.
Unreachable-block and CPython-shaped epilogue preservation
crates/codegen/src/ir.rs
Add preserves_cpython_unreachable_fallthrough_return_epilogue() to detect and retain unreachable blocks that match CPython-shaped no-location LOAD_CONST + RETURN_VALUE epilogues instead of clearing them unconditionally.
Jump reordering with while-body anchoring and conditional retargeting
crates/codegen/src/ir.rs
Add follows_while_body_anchor_order() to avoid reordering when anchor order holds, track block-push presence in conditional chains to skip false-path reordering, and add retarget_conditional_jumps_to_empty_while_exit_epilogue() to rewrite conditional targets around line-marker implicit-return epilogues.
Return duplication gating and exception cleanup cloning
crates/codegen/src/ir.rs
Prevent duplicating final implicit-return epilogue when NoInterrupt jumps target the last block or a lineful fallthrough exists; collect conditional sources before cloning named-except cleanup returns; skip inlining when controlling jump is NoInterrupt.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • RustPython/RustPython#7870: Overlapping changes around LOAD_FAST_BORROW, empty-block fallthrough, and while/orelse block reuse in codegen/compile passes.
  • RustPython/RustPython#7781: Similar CPython-parity control-flow and no-location NOP handling in compile.rs and the retargeting pass in ir.rs.
  • RustPython/RustPython#7721: Related work on preserving CPython-shaped no-location NOPs and JumpNoInterrupt/set_no_location emission.

Poem

🐰 I hopped through NOPs and linenos bright,

Chased while True loops deep into the night,
I nudged a jump, I copied a line,
Now epilogues fall through, preserved just fine,
A rabbit's bytecode dance — CPython-shaped delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Align cleanup bytecode layout with CPython' directly summarizes the main change—updating RustPython's code generation to match CPython's bytecode layout for exception, finally, and with cleanup patterns.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone changed the title [codex] Align cleanup bytecode layout with CPython Align cleanup bytecode layout with CPython May 20, 2026
@youknowone youknowone marked this pull request as ready for review May 20, 2026 12:28
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/codegen/src/compile.rs`:
- Around line 8728-8731: The current check only sees root-level OR patterns by
testing has_match_or_case over m.pattern with matches!(...,
ast::Pattern::MatchOr(_)); implement a recursive detector (e.g.,
contains_match_or(pattern: &ast::Pattern) -> bool) that walks all pattern
variants and returns true if any nested ast::Pattern::MatchOr is found, then
replace the .any(...) calls (including the similar check at the later block
around the 8788-8793 area) to use contains_match_or(&m.pattern) so nested forms
like [1 | 2, x] or Foo(bar=1 | 2) are correctly detected.
- Around line 1096-1099: The helper statements_end_with_conditional currently
only checks for a top-level ast::Stmt::If; broaden it to treat other conditional
shapes (e.g., ast::Stmt::Match) and container statements whose inner body ends
with a conditional (e.g., ast::Stmt::With whose body.last() is an If, nested
blocks, etc.) by making statements_end_with_conditional recursively inspect the
last statement: return true for If or Match, and for With (and other block-like
stmts) call the same helper on their inner body(s); update any call sites
(notably the final-body layout logic that uses statements_end_with_conditional)
to rely on this expanded predicate so trailing match/with-nested-if forms are
recognized as conditional finalizers.

In `@crates/codegen/src/ir.rs`:
- Around line 10882-10898: The current booleans
has_nointerrupt_jump_to_last_block and
has_lineful_return_fallthrough_to_last_block are global to the function and
incorrectly suppress duplication for all predecessors of last_block; change them
to be edge-specific by computing these predicates per predecessor (or per (pred,
last_block) edge) instead of once for the whole function. Concretely, replace
the top-level uses of has_nointerrupt_jump_to_last_block and
has_lineful_return_fallthrough_to_last_block with code that for each predecessor
block computes: scan that predecessor's instructions with jump_thread_kind(...)
== Some(JumpThreadKind::NoInterrupt) and target==last_block (using
next_nonempty_block to compare), and check whether the specific predecessor's
fallthrough (block.next) ends with a lineful ReturnValue (use
instruction_lineno(...) and matches!(instr.instr.real(),
Some(Instruction::ReturnValue))). Do the same edge-specific refactor at the
other occurrence referenced (lines ~10924-10948), so suppression decisions are
made per-edge rather than function-wide.
- Around line 10787-10798: The current pattern in
ends_with_line_marker_implicit_return(block: &Block) uses [.., marker, load,
ret] which will match blocks that have real instructions before the suffix;
change the match so we only consider truly empty epilogues by requiring the
block be exactly [marker, load, ret] or, if you want to allow only padding nops,
add a guard that verifies all instructions before marker are non-real padding
(e.g. instr.real() is None or Instruction::Nop and/or have the appropriate
no-location flags) before retargeting; update the match and/or add that extra
guard around marker, load, ret to avoid skipping earlier real instructions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 3a0c4118-c74c-4b6a-beaa-4402e66577f3

📥 Commits

Reviewing files that changed from the base of the PR and between ae3804f and 8d1b959.

📒 Files selected for processing (2)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs

Comment thread crates/codegen/src/compile.rs
Comment thread crates/codegen/src/compile.rs Outdated
Comment thread crates/codegen/src/ir.rs
Comment thread crates/codegen/src/ir.rs Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
crates/codegen/src/ir.rs (1)

10931-10936: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check the fallthrough successor for the lineful return.

This predicate still inspects the current block instead of the fallthrough block, so it never becomes true for the jump cases this branch is trying to suppress. That means duplicate_end_returns() can still clone the final epilogue even when the specific fallthrough edge already reaches a lineful return.

Suggested fix
-                let has_lineful_return_fallthrough_to_last_block = block.next != BlockIdx::NULL
-                    && next_nonempty_block(blocks, block.next) == last_block
-                    && block.instructions.last().is_some_and(|instr| {
+                let has_lineful_return_fallthrough_to_last_block = next != BlockIdx::NULL
+                    && next == last_block
+                    && blocks[next.idx()].instructions.last().is_some_and(|instr| {
                         matches!(instr.instr.real(), Some(Instruction::ReturnValue))
                             && instruction_lineno(instr) > 0
                     });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/codegen/src/ir.rs` around lines 10931 - 10936, The predicate
has_lineful_return_fallthrough_to_last_block is checking the current block's
last instruction instead of the fallthrough block; update it to inspect the
fallthrough block returned by next_nonempty_block(blocks, block.next) (e.g., let
fallthrough = next_nonempty_block(blocks, block.next)) and then test
blocks[fallthrough].instructions.last().is_some_and(|instr|
matches!(instr.instr.real(), Some(Instruction::ReturnValue)) &&
instruction_lineno(instr) > 0); this prevents duplicate_end_returns() from
cloning the final epilogue when the fallthrough already ends with a lineful
return.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@crates/codegen/src/ir.rs`:
- Around line 10931-10936: The predicate
has_lineful_return_fallthrough_to_last_block is checking the current block's
last instruction instead of the fallthrough block; update it to inspect the
fallthrough block returned by next_nonempty_block(blocks, block.next) (e.g., let
fallthrough = next_nonempty_block(blocks, block.next)) and then test
blocks[fallthrough].instructions.last().is_some_and(|instr|
matches!(instr.instr.real(), Some(Instruction::ReturnValue)) &&
instruction_lineno(instr) > 0); this prevents duplicate_end_returns() from
cloning the final epilogue when the fallthrough already ends with a lineful
return.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: e7ad65d4-609a-468e-88b2-90f7b2e590d2

📥 Commits

Reviewing files that changed from the base of the PR and between 8d1b959 and aa50896.

📒 Files selected for processing (2)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant