Skip to content

fix(trino): Clean up temporary entity tables after retrieval#6381

Merged
ntkathole merged 3 commits into
feast-dev:masterfrom
Jwrede:fix/trino-temp-table-cleanup
May 14, 2026
Merged

fix(trino): Clean up temporary entity tables after retrieval#6381
ntkathole merged 3 commits into
feast-dev:masterfrom
Jwrede:fix/trino-temp-table-cleanup

Conversation

@Jwrede
Copy link
Copy Markdown
Contributor

@Jwrede Jwrede commented May 7, 2026

Summary

TrinoOfflineStore.get_historical_features() creates a temporary table
(feast_entity_df_<uuid>) for the entity DataFrame but never drops it,
leaking tables in the Trino catalog indefinitely.

This applies the same context manager pattern already used by the
BigQuery, Redshift, and Athena offline stores: wrap the query in a
generator that issues DROP TABLE IF EXISTS in a finally block.

The cleanup only runs when the entity_df is a DataFrame (when a temp
table was actually created), not when it is a SQL string.

Fixes #6306

How Has This Been Tested?

  • Verified the change follows the same pattern as BigQuery
    (bigquery.py:304), Redshift (redshift.py:236), and Athena
    (athena.py:224).
  • All callers of TrinoRetrievalJob that pass a plain query string
    (e.g. pull_latest_from_table_or_query) continue to work via the
    fallback contextmanager wrapper.

@Jwrede Jwrede requested a review from a team as a code owner May 7, 2026 21:11
@Jwrede
Copy link
Copy Markdown
Contributor Author

Jwrede commented May 13, 2026

Friendly ping @woop -- this has been open ~6 days. Happy to address any feedback.

@ntkathole
Copy link
Copy Markdown
Member

@Jwrede can you please fix linting here as well?

@Jwrede Jwrede force-pushed the fix/trino-temp-table-cleanup branch from 1c95355 to cb6b73d Compare May 14, 2026 06:49
@Jwrede
Copy link
Copy Markdown
Contributor Author

Jwrede commented May 14, 2026

@ntkathole Fixed -- rebased onto master and resolved the ruff import sorting issue.

@Jwrede Jwrede force-pushed the fix/trino-temp-table-cleanup branch from cb6b73d to f571bff Compare May 14, 2026 06:50
"""Returns the SQL query that will be executed in Trino to build the historical feature table"""
return self._query
with self._query_generator() as query:
return query
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyone who calls to_sql() for logging, debugging, or query inspection before materializing results will now silently destroy the temp table, causing subsequent calls to fail with a "table not found" error.

job = store.get_historical_features(entity_df=my_dataframe, ...)
sql = job.to_sql()       # table gets dropped here on context exit
df = job.to_df()         # FAILS — temp table no longer exists

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for

First to_df() → succeeds → drops table
Second to_df() → query references a table that's gone → fails

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch -- redesigned the cleanup approach. The context manager pattern was wrong here since every method exit triggered a table drop.

Changes:

  • Query is now stored as a plain string; to_sql() returns it directly with no side effects
  • Cleanup moved to a dedicated _drop_temp_table() method, called only after actual query execution (to_df, to_trino)
  • _cleaned_up flag makes the drop idempotent -- safe to call multiple times
  • del fallback ensures cleanup if execution methods are never called

@ntkathole ntkathole force-pushed the fix/trino-temp-table-cleanup branch from eb09ca7 to d821b8d Compare May 14, 2026 07:18
Jwrede added 3 commits May 14, 2026 13:23
TrinoOfflineStore.get_historical_features() creates a temporary table
for the entity DataFrame but never drops it, leaking tables
indefinitely. Apply the same context manager pattern used by
BigQuery, Redshift, and Athena offline stores: wrap the query in a
generator that issues DROP TABLE IF EXISTS in a finally block.

Fixes feast-dev#6306

Signed-off-by: Jonathan Wrede <[email protected]>
Avoid dropping the temporary entity table on to_sql() calls.
Previously, every method used a context manager that dropped
the table on exit, so calling to_sql() before to_df() would
destroy the table and cause subsequent queries to fail.

Now the query is stored as a plain string and cleanup is
handled by a dedicated _drop_temp_table() method called only
after query execution (to_df, to_trino). A __del__ fallback
ensures cleanup if execution methods are never called. The
_cleaned_up flag makes the drop idempotent.

Signed-off-by: Jonathan Wrede <[email protected]>
@ntkathole ntkathole force-pushed the fix/trino-temp-table-cleanup branch from d821b8d to 3ddef73 Compare May 14, 2026 07:53
@ntkathole ntkathole merged commit d86b13d into feast-dev:master May 14, 2026
19 of 25 checks passed
rpathade pushed a commit to rpathade/feast that referenced this pull request May 21, 2026
…ev#6381)

* fix(trino): Clean up temporary entity tables after retrieval

TrinoOfflineStore.get_historical_features() creates a temporary table
for the entity DataFrame but never drops it, leaking tables
indefinitely. Apply the same context manager pattern used by
BigQuery, Redshift, and Athena offline stores: wrap the query in a
generator that issues DROP TABLE IF EXISTS in a finally block.

Fixes feast-dev#6306

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: sort imports for ruff compliance

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: decouple temp table cleanup from query access

Avoid dropping the temporary entity table on to_sql() calls.
Previously, every method used a context manager that dropped
the table on exit, so calling to_sql() before to_df() would
destroy the table and cause subsequent queries to fail.

Now the query is stored as a plain string and cleanup is
handled by a dedicated _drop_temp_table() method called only
after query execution (to_df, to_trino). A __del__ fallback
ensures cleanup if execution methods are never called. The
_cleaned_up flag makes the drop idempotent.

Signed-off-by: Jonathan Wrede <[email protected]>

---------

Signed-off-by: Jonathan Wrede <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>
rpathade pushed a commit to rpathade/feast that referenced this pull request May 21, 2026
…ev#6381)

* fix(trino): Clean up temporary entity tables after retrieval

TrinoOfflineStore.get_historical_features() creates a temporary table
for the entity DataFrame but never drops it, leaking tables
indefinitely. Apply the same context manager pattern used by
BigQuery, Redshift, and Athena offline stores: wrap the query in a
generator that issues DROP TABLE IF EXISTS in a finally block.

Fixes feast-dev#6306

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: sort imports for ruff compliance

Signed-off-by: Jonathan Wrede <[email protected]>

* fix: decouple temp table cleanup from query access

Avoid dropping the temporary entity table on to_sql() calls.
Previously, every method used a context manager that dropped
the table on exit, so calling to_sql() before to_df() would
destroy the table and cause subsequent queries to fail.

Now the query is stored as a plain string and cleanup is
handled by a dedicated _drop_temp_table() method called only
after query execution (to_df, to_trino). A __del__ fallback
ensures cleanup if execution methods are never called. The
_cleaned_up flag makes the drop idempotent.

Signed-off-by: Jonathan Wrede <[email protected]>

---------

Signed-off-by: Jonathan Wrede <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TrinoOfflineStore does not cleanup temporary tables

2 participants