fix(trino): Clean up temporary entity tables after retrieval#6381
Merged
Conversation
Contributor
Author
|
Friendly ping @woop -- this has been open ~6 days. Happy to address any feedback. |
Member
|
@Jwrede can you please fix linting here as well? |
1c95355 to
cb6b73d
Compare
Contributor
Author
|
@ntkathole Fixed -- rebased onto master and resolved the ruff import sorting issue. |
cb6b73d to
f571bff
Compare
ntkathole
reviewed
May 14, 2026
| """Returns the SQL query that will be executed in Trino to build the historical feature table""" | ||
| return self._query | ||
| with self._query_generator() as query: | ||
| return query |
Member
There was a problem hiding this comment.
Anyone who calls to_sql() for logging, debugging, or query inspection before materializing results will now silently destroy the temp table, causing subsequent calls to fail with a "table not found" error.
job = store.get_historical_features(entity_df=my_dataframe, ...)
sql = job.to_sql() # table gets dropped here on context exit
df = job.to_df() # FAILS — temp table no longer exists
Member
There was a problem hiding this comment.
same for
First to_df() → succeeds → drops table
Second to_df() → query references a table that's gone → fails
Contributor
Author
There was a problem hiding this comment.
Good catch -- redesigned the cleanup approach. The context manager pattern was wrong here since every method exit triggered a table drop.
Changes:
- Query is now stored as a plain string; to_sql() returns it directly with no side effects
- Cleanup moved to a dedicated _drop_temp_table() method, called only after actual query execution (to_df, to_trino)
- _cleaned_up flag makes the drop idempotent -- safe to call multiple times
- del fallback ensures cleanup if execution methods are never called
ntkathole
approved these changes
May 14, 2026
eb09ca7 to
d821b8d
Compare
TrinoOfflineStore.get_historical_features() creates a temporary table for the entity DataFrame but never drops it, leaking tables indefinitely. Apply the same context manager pattern used by BigQuery, Redshift, and Athena offline stores: wrap the query in a generator that issues DROP TABLE IF EXISTS in a finally block. Fixes feast-dev#6306 Signed-off-by: Jonathan Wrede <[email protected]>
Signed-off-by: Jonathan Wrede <[email protected]>
Avoid dropping the temporary entity table on to_sql() calls. Previously, every method used a context manager that dropped the table on exit, so calling to_sql() before to_df() would destroy the table and cause subsequent queries to fail. Now the query is stored as a plain string and cleanup is handled by a dedicated _drop_temp_table() method called only after query execution (to_df, to_trino). A __del__ fallback ensures cleanup if execution methods are never called. The _cleaned_up flag makes the drop idempotent. Signed-off-by: Jonathan Wrede <[email protected]>
d821b8d to
3ddef73
Compare
rpathade
pushed a commit
to rpathade/feast
that referenced
this pull request
May 21, 2026
…ev#6381) * fix(trino): Clean up temporary entity tables after retrieval TrinoOfflineStore.get_historical_features() creates a temporary table for the entity DataFrame but never drops it, leaking tables indefinitely. Apply the same context manager pattern used by BigQuery, Redshift, and Athena offline stores: wrap the query in a generator that issues DROP TABLE IF EXISTS in a finally block. Fixes feast-dev#6306 Signed-off-by: Jonathan Wrede <[email protected]> * fix: sort imports for ruff compliance Signed-off-by: Jonathan Wrede <[email protected]> * fix: decouple temp table cleanup from query access Avoid dropping the temporary entity table on to_sql() calls. Previously, every method used a context manager that dropped the table on exit, so calling to_sql() before to_df() would destroy the table and cause subsequent queries to fail. Now the query is stored as a plain string and cleanup is handled by a dedicated _drop_temp_table() method called only after query execution (to_df, to_trino). A __del__ fallback ensures cleanup if execution methods are never called. The _cleaned_up flag makes the drop idempotent. Signed-off-by: Jonathan Wrede <[email protected]> --------- Signed-off-by: Jonathan Wrede <[email protected]> Signed-off-by: RutujaPathade <[email protected]>
rpathade
pushed a commit
to rpathade/feast
that referenced
this pull request
May 21, 2026
…ev#6381) * fix(trino): Clean up temporary entity tables after retrieval TrinoOfflineStore.get_historical_features() creates a temporary table for the entity DataFrame but never drops it, leaking tables indefinitely. Apply the same context manager pattern used by BigQuery, Redshift, and Athena offline stores: wrap the query in a generator that issues DROP TABLE IF EXISTS in a finally block. Fixes feast-dev#6306 Signed-off-by: Jonathan Wrede <[email protected]> * fix: sort imports for ruff compliance Signed-off-by: Jonathan Wrede <[email protected]> * fix: decouple temp table cleanup from query access Avoid dropping the temporary entity table on to_sql() calls. Previously, every method used a context manager that dropped the table on exit, so calling to_sql() before to_df() would destroy the table and cause subsequent queries to fail. Now the query is stored as a plain string and cleanup is handled by a dedicated _drop_temp_table() method called only after query execution (to_df, to_trino). A __del__ fallback ensures cleanup if execution methods are never called. The _cleaned_up flag makes the drop idempotent. Signed-off-by: Jonathan Wrede <[email protected]> --------- Signed-off-by: Jonathan Wrede <[email protected]> Signed-off-by: RutujaPathade <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
TrinoOfflineStore.get_historical_features()creates a temporary table(
feast_entity_df_<uuid>) for the entity DataFrame but never drops it,leaking tables in the Trino catalog indefinitely.
This applies the same context manager pattern already used by the
BigQuery, Redshift, and Athena offline stores: wrap the query in a
generator that issues
DROP TABLE IF EXISTSin afinallyblock.The cleanup only runs when the entity_df is a DataFrame (when a temp
table was actually created), not when it is a SQL string.
Fixes #6306
How Has This Been Tested?
(
bigquery.py:304), Redshift (redshift.py:236), and Athena(
athena.py:224).TrinoRetrievalJobthat pass a plain query string(e.g.
pull_latest_from_table_or_query) continue to work via thefallback
contextmanagerwrapper.