
fix(compute-engine/local): Honor field_mapping on join keys in dedup + join nodes #6395

Merged
franciscojavierarceo merged 3 commits into feast-dev:master from 1fanwang:fix/materialize-field-mapping-on-join-keys
May 13, 2026

Conversation

@1fanwang
Contributor

What this PR does

Fixes #5942.

The bug isn't actually Snowflake-specific despite the issue title — it lives in the local compute engine's DAG nodes.

Path:

  1. _get_column_names() returns the reverse-mapped join keys (e.g. ["USERID"]).
  2. pull_latest_from_table_or_query is called with those raw names — correct, the offline store needs them.
  3. LocalSourceReadNode.execute renames the result columns via field_mapping.get(col, col) — the table now has user_id.
  4. LocalDedupNode and LocalJoinNode then look up column_info.join_keys (still ["USERID"]) on a df whose columns are user_id. pandas raises KeyError: Index(['USERID'], dtype='object') — the exact error in the issue.
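
The four steps above can be reproduced in miniature without Feast. The sketch below is plain Python and is not the engine's code: rows are dictionaries standing in for the pulled table, `read_source` and `dedup_latest` are hypothetical stand-ins for `LocalSourceReadNode` and `LocalDedupNode`, and the `event_ts` and `score` columns are made up for illustration.

```python
# Hypothetical stand-ins for the source-read and dedup nodes; not Feast code.
field_mapping = {"USERID": "user_id"}  # source column -> mapped name
join_keys = ["USERID"]                 # reverse-mapped names, as in step 1


def read_source(rows, field_mapping):
    """Step 3: rename every column via field_mapping.get(col, col)."""
    return [
        {field_mapping.get(col, col): val for col, val in row.items()}
        for row in rows
    ]


def dedup_latest(rows, keys, ts_col):
    """Step 4: keep the newest row per join-key tuple."""
    latest = {}
    for row in rows:
        # Raises KeyError when `keys` holds names the table no longer has.
        key = tuple(row[k] for k in keys)
        if key not in latest or row[ts_col] > latest[key][ts_col]:
            latest[key] = row
    return list(latest.values())


raw = [
    {"USERID": 1, "event_ts": 10, "score": 0.1},
    {"USERID": 1, "event_ts": 20, "score": 0.2},
    {"USERID": 2, "event_ts": 15, "score": 0.3},
]
table = read_source(raw, field_mapping)  # columns are now user_id, ...

try:
    dedup_latest(table, join_keys, "event_ts")  # pre-mapping names
    failure = None
except KeyError as exc:
    failure = exc.args[0]  # name of the column that was not found

# Translating the keys through the same mapping makes the lookup succeed.
mapped_keys = [field_mapping.get(k, k) for k in join_keys]
deduped = dedup_latest(table, mapped_keys, "event_ts")
```

With a pandas DataFrame in place of the dictionaries, the same mismatch surfaces as the `KeyError: Index(['USERID'], dtype='object')` quoted in the issue.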

The existing ColumnInfo class already has mapped properties for timestamp_column and created_timestamp_column (precedent: PR #4886, which fixed the analogous bug for get_historical_features). This PR adds the missing join_keys_columns mapped property and uses it in LocalDedupNode and LocalJoinNode.
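
As a rough illustration of what such a mapped property could look like, here is a minimal sketch. `ColumnInfoSketch` is a hypothetical, cut-down class and not Feast's actual `ColumnInfo`; only the names `join_keys`, `field_mapping`, and `join_keys_columns` come from this PR, and the real class has more fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColumnInfoSketch:
    """Hypothetical stand-in for ColumnInfo, reduced to the join-key part."""

    join_keys: List[str]                           # names as stored in the source
    field_mapping: Optional[Dict[str, str]] = None  # source name -> mapped name

    @property
    def join_keys_columns(self) -> List[str]:
        """Join keys translated through field_mapping; unmapped keys pass through."""
        mapping = self.field_mapping or {}
        return [mapping.get(key, key) for key in self.join_keys]


info = ColumnInfoSketch(
    join_keys=["USERID", "region"],
    field_mapping={"USERID": "user_id"},
)
mapped = info.join_keys_columns
unmapped = ColumnInfoSketch(join_keys=["USERID"]).join_keys_columns
```

Nodes that run after the source-read rename would then read `join_keys_columns` instead of `join_keys`, so their lookups use the column names the table actually carries at that point.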

How was this tested?

  • New regression test test_local_dedup_node_with_field_mapping_on_join_key in tests/unit/infra/compute_engines/local/test_nodes.py. Reproduces the exact KeyError(['USERID']) on master; passes with this PR.
  • All 9 existing tests in test_nodes.py still pass.
  • All 62 tests under tests/unit/infra/compute_engines/ still pass.
  • ruff check, ruff format --check, mypy all clean.

Scope

Local-engine only. The Spark, Ray, AWS Lambda, k8s, and Snowflake compute engines have the same column_info.join_keys lookup pattern in their nodes and likely the same latent bug. Each fix is a separate follow-up of under 50 LoC; happy to add them to this PR or file them as stacked PRs, whichever maintainers prefer.

When a batch source defines a `field_mapping` that renames an entity join
key (e.g. `USERID` -> `user_id`), the source-read node renames the columns
on the pulled Arrow table to their mapped names. Downstream `LocalDedupNode`
and `LocalJoinNode` then look up the *pre-mapping* names from
`column_info.join_keys`, which raises `KeyError: Index(['USERID'])` during
materialization (or returns an empty join).

Add a `join_keys_columns` property on `ColumnInfo` that mirrors the existing
`timestamp_column` / `created_timestamp_column` properties — returning join
keys translated through `field_mapping` — and use it from the dedup and
join nodes.

Fixes feast-dev#5942.

Signed-off-by: 1fanwang <[email protected]>
1fanwang requested a review from a team as a code owner May 12, 2026 08:36
@HaoXuAI
Collaborator

HaoXuAI commented May 12, 2026

LGTM
The initial work didn't handle the field mapping because I was planning to unify the field mapping interface, but that never happened.

1fanwang changed the title from "fix(compute-engine/local): honor field_mapping on join keys in dedup + join nodes" to "fix(compute-engine/local): Honor field_mapping on join keys in dedup + join nodes" May 12, 2026
franciscojavierarceo merged commit bd01824 into feast-dev:master May 13, 2026
26 checks passed
rpathade pushed a commit to rpathade/feast that referenced this pull request May 21, 2026
…+ join nodes (feast-dev#6395)

* fix: Apply field mapping to join keys in local compute engine nodes

* test: also cover LocalJoinNode field_mapping case

Signed-off-by: 1fanwang <[email protected]>

---------

Signed-off-by: 1fanwang <[email protected]>
Signed-off-by: RutujaPathade <[email protected]>

Development

Successfully merging this pull request may close these issues.

Materialization fails when field mappings are used to rename entity join keys (SnowflakeSource)
