Is your feature request related to a problem? Please describe.
When materializing features to an online store via LocalOutputNode, the current implementation converts the entire Arrow Table into a Python list of ValueProto objects before any write occurs. Peak memory therefore scales with the full table rather than with a single write batch. At hundreds of thousands of rows, this causes severe memory pressure and can trigger out-of-memory (OOM) failures in practice.
Root Cause
The call chain in LocalOutputNode.execute():
    rows_to_write = _convert_arrow_to_proto(
        input_table, self.feature_view, join_key_to_value_type
    )
    online_store.online_write_batch(..., data=rows_to_write, ...)
_convert_arrow_to_proto (utils.py:325) performs three full-data copies sequentially:
- Arrow → NumPy (to_numpy(zero_copy_only=False)): necessary to bridge Arrow nulls to the Python type system
- NumPy → List[ValueProto]: each scalar becomes an independent Python protobuf heap object (~200 bytes overhead per value vs 4–8 bytes raw)
- Column-wise → row-wise (list(zip(...))): full materialization into a Python list
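The gap between raw and per-object storage can be illustrated with the standard library alone. The sketch below is not Feast code: it uses plain Python integers as a stand-in for ValueProto objects, and `bytes_per_value` is a hypothetical helper name. Protobuf messages are likely heavier than plain integers, so treat the result as a rough lower bound on the effect described above, not a measurement of it.

```python
import sys
from array import array


def bytes_per_value(n: int = 100_000) -> tuple[float, float]:
    """Return approximate (raw, boxed) bytes per value for n 64-bit integers.

    Assumption: plain Python ints stand in for ValueProto objects here.
    """
    values = range(1_000, 1_000 + n)

    # Contiguous buffer of signed 64-bit integers, similar in spirit to a raw column.
    raw = array("q", values)

    # One separate heap object per value, plus the list's pointer array.
    boxed = list(values)

    raw_per_value = float(raw.itemsize)
    boxed_per_value = (sys.getsizeof(boxed) + sum(sys.getsizeof(v) for v in boxed)) / n
    return raw_per_value, boxed_per_value


raw_size, boxed_size = bytes_per_value()
print(f"raw: {raw_size:.0f} B/value, boxed: {boxed_size:.1f} B/value")
```

The exact numbers depend on the interpreter and platform, so none are quoted here.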
Describe the solution you'd like
Chunk iteration in LocalOutputNode (minimal, low-risk):
    BATCH_SIZE = 10_000
    for batch in input_table.to_batches(max_chunksize=BATCH_SIZE):
        rows_to_write = _convert_arrow_to_proto(
            batch, self.feature_view, join_key_to_value_type
        )
        online_store.online_write_batch(
            config=context.repo_config,
            table=self.feature_view,
            data=rows_to_write,
            progress=lambda x: None,
        )
        # rows_to_write eligible for GC after each iteration
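To show why chunking bounds peak memory, here is a minimal, dependency-free sketch of the same pattern. It is not the Feast implementation: a plain dict of columns stands in for the Arrow table, and `iter_row_chunks`, `write_in_chunks`, and the `write_batch` callback are hypothetical names standing in for the conversion step and `online_write_batch`.

```python
from itertools import islice
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Row = Tuple[object, ...]


def iter_row_chunks(
    columns: Dict[str, Sequence[object]], chunk_size: int
) -> Iterator[List[Row]]:
    """Yield row-wise chunks of at most chunk_size rows from column-wise data."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    rows = zip(*columns.values())  # lazy: no full row-wise copy is built
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def write_in_chunks(
    columns: Dict[str, Sequence[object]],
    write_batch: Callable[[List[Row]], None],
    chunk_size: int = 10_000,
) -> int:
    """Write all rows chunk by chunk; return the largest chunk held at once."""
    peak_rows = 0
    for chunk in iter_row_chunks(columns, chunk_size):
        write_batch(chunk)  # hypothetical stand-in for the online store write
        peak_rows = max(peak_rows, len(chunk))
    return peak_rows


# Usage: 25 rows written in chunks of 10; `written` plays the role of the store.
written: List[Row] = []
columns = {"id": list(range(25)), "value": [i * 0.5 for i in range(25)]}
peak = write_in_chunks(columns, written.extend, chunk_size=10)
```

Only one chunk of converted rows is alive at a time, so the converted-row footprint is bounded by the chunk size instead of the table size.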