Skip to content

feat: DuckDB historical retrieval without entity dataframe#6108

Open
Vperiodt wants to merge 12 commits into
feast-dev:masterfrom
Vperiodt:patch-dataframe
Open

feat: DuckDB historical retrieval without entity dataframe#6108
Vperiodt wants to merge 12 commits into
feast-dev:masterfrom
Vperiodt:patch-dataframe

Conversation

@Vperiodt
Copy link
Copy Markdown
Contributor

@Vperiodt Vperiodt commented Mar 14, 2026

What this PR does / why we need it:

Adds date-range historical retrieval for the DuckDB offline store when entity_df is omitted.

Which issue(s) this PR fixes:

fixes #5832 related to #1611

Misc


Open with Devin

@Vperiodt Vperiodt marked this pull request as ready for review March 15, 2026 18:17
@Vperiodt Vperiodt requested a review from a team as a code owner March 15, 2026 18:17
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

full_feature_names: bool = False,
**kwargs,
) -> RetrievalJob:
start_date: Optional[datetime] = kwargs.get("start_date", None)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead keep it as optional params in get_historical_features

DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL = "event_timestamp"


def _build_entity_df_from_sources(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add docstring for this

)


DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL = "event_timestamp"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import this from from feast.infra.offline_stores.offline_utils import DEFAULT_ENTITY_DF_EVENT_TIMESTAMP_COL

@ntkathole
Copy link
Copy Markdown
Member

@Vperiodt There is integration test tests/integration/offline_store/test_non_entity_mode.py, see if duckdb coverage can be included there

start_date = end_date - timedelta(seconds=max_ttl_seconds)
else:
start_date = end_date - timedelta(days=30)
start_date = make_tzaware(start_date)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 229 - Line 244 is common across the offline store to figure out the start_date & end_date. If its not too much, is it possible to create utility function which can be re-used?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I’ll refactor this into a reusable utility function and apply it to similar date range calculations across the codebase in a follow-up PR

devin-ai-integration[bot]

This comment was marked as resolved.

@jyejare
Copy link
Copy Markdown
Collaborator

jyejare commented Apr 7, 2026

@Vperiodt The lint checks are failing please check and rebase the PR.

Copy link
Copy Markdown
Collaborator

@jyejare jyejare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

The approach is clean — making entity_df optional and building a synthetic entity DataFrame from the feature view sources is a good way to enable date-range retrieval for DuckDB. Test coverage is solid (553 lines of unit tests + integration test updates).

However, CI is currently failing (lint-python and unit-test-python 3.10), so this is not merge-ready yet.

Blockers

  1. CI failureslint-python and unit-test-python (3.10, ubuntu-latest) are red on the latest run (Apr 13). Please investigate and fix.
  2. Unconditional ibis import in repo_configuration.py — breaks all tests when ibis is not installed. See inline comment for the fix.

Minor Items

  1. Missing docstring on _build_entity_df_from_sources (flagged by @ntkathole, still pending).
  2. Duplicate date-range logic — the start_date/end_date defaulting is copy-pasted across offline stores. Please extract to a shared utility or reuse compute_non_entity_date_range.

Will ACK once CI is green and the blockers are addressed.

)


def _build_entity_df_from_sources(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring for this function. Given the non-trivial logic around join_key_map handling, field_mapping, and timestamp filtering, a brief docstring explaining the purpose ("Build a synthetic entity DataFrame from feature view sources for non-entity retrieval mode") and the return value would help future maintainers.

Comment on lines +229 to +243
make_tzaware(end_date) if end_date else datetime.now(timezone.utc)
)

if start_date is None:
max_ttl_seconds = 0
for fv in feature_views:
if fv.ttl and isinstance(fv.ttl, timedelta):
max_ttl_seconds = max(
max_ttl_seconds, int(fv.ttl.total_seconds())
)
if max_ttl_seconds > 0:
start_date = end_date - timedelta(seconds=max_ttl_seconds)
else:
start_date = end_date - timedelta(days=30)
start_date = make_tzaware(start_date)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This start_date/end_date derivation logic (defaulting end_date to now, computing start_date from max TTL, falling back to 30 days) is duplicated across the DuckDB, Postgres, and File offline stores. The Postgres PR (#6057) has the same pattern inlined.

Please extract this into a shared utility (e.g. in offline_utils.py or reuse the existing compute_non_entity_date_range from feast.utils). This would keep the stores consistent and avoid divergence over time.

Comment on lines +94 to +99
"duckdb": (
"local",
importlib.import_module(
"tests.universal.feature_repos.duckdb_repo_configuration"
).DuckDBDataSourceCreator,
),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This importlib.import_module call runs at module load time, which transitively imports ibis (via duckdb_repo_configurationfeast.infra.offline_stores.duckdb). If ibis is not installed, this breaks all tests that import from repo_configuration.py — not just DuckDB-specific tests.

Other non-core offline stores (BigQuery, Redshift, Snowflake at lines 116-137) are added conditionally behind environment variable checks. Please wrap this in a try/except ImportError:

try:
    from tests.universal.feature_repos.duckdb_repo_configuration import DuckDBDataSourceCreator
    OFFLINE_STORE_TO_PROVIDER_CONFIG["duckdb"] = ("local", DuckDBDataSourceCreator)
except ImportError:
    pass

Vperiodt added 11 commits April 17, 2026 17:12
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Signed-off-by: Vanshika Vanshika <[email protected]>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DuckDB - historical retrieval without entity dataframe

4 participants