
fix: Ambiguous truth value of array during materialization #6259

Merged
ntkathole merged 2 commits into feast-dev:master from alan-gauthier-jt:fix-array-materialize
Apr 14, 2026

Conversation

Contributor

@alan-gauthier-jt alan-gauthier-jt commented Apr 10, 2026

What this PR does / why we need it:

`feast materialize` crashes with `ValueError: The truth value of an empty array is ambiguous` when a scalar feature column contains an empty numpy array (e.g. `np.array([])`). This happens in practice when a DataFrame row represents a missing value as an empty array rather than `None` or `np.nan`.

Root cause: In `_convert_scalar_values_to_proto` (`sdk/python/feast/type_map.py`), the null check evaluates `not pd.isnull(value)` for every value in the loop. `pd.isnull()` is vectorised: when `value` is a numpy array, it returns a boolean array instead of a scalar, and applying Python's `not` operator to that array raises `ValueError`. The same issue exists in:

  • the BOOL scalar path (`not pd.isnull(value)` in a list comprehension)
  • the UNIX_TIMESTAMP early-return path (`_python_datetime_to_int_timestamp(values)` called with the raw `values` list, including any array-like values)
  • the sample type-validation check (`sample == 0`)
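The failure mode described above can be reproduced outside Feast. This is a minimal sketch, assuming only that numpy and pandas are installed. It uses a two-element array (one of the inputs from the table in this description) and does not call any Feast code.

```python
import numpy as np
import pandas as pd

value = np.array([np.nan, 1.0])  # array-like value sitting in a scalar column

# pd.isnull is element-wise: an array goes in, a boolean array comes out.
mask = pd.isnull(value)
print(mask.tolist())

# `not` needs a single bool, and NumPy refuses to collapse a multi-element mask.
try:
    negated = not mask
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```

For a true scalar such as `None` or `3.5`, `pd.isnull` returns a plain boolean, which is why the original check only breaks on array-valued cells.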

Fix: Before calling `pd.isnull()`, guard both scalar conversion loops (generic and BOOL) and the sample type-validation with an explicit `isinstance(value, np.ndarray)` check. Any array-like value in a scalar feature column is unmappable to a protobuf scalar field anyway, so it is safely treated as null → `ProtoValue()`.

| Input value | Behaviour before | Behaviour after |
|---|---|---|
| `np.array([])` (empty) | `ValueError` crash | `ProtoValue()` (null) |
| `np.array([np.nan, 1.0])` | `ValueError` crash | `ProtoValue()` (null) |
| `np.array([1.0, 2.0])` | `ValueError` crash | `ProtoValue()` (null) |
| `None` | `ProtoValue()` (null) | unchanged |
| scalar non-null | `ProtoValue(field=value)` | unchanged |
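The before/after behaviour in the table can be sketched as follows. This is an illustration under stated assumptions, not the code in `type_map.py`: `convert_scalar` is a hypothetical stand-in for one loop iteration, and a plain dict plays the role of `ProtoValue` so the snippet runs without Feast.

```python
import numpy as np
import pandas as pd


def convert_scalar(value, field="double_val"):
    """Hypothetical stand-in for one conversion step; a dict replaces ProtoValue."""
    # Test for arrays first so that pd.isnull never receives one.
    if isinstance(value, np.ndarray) or pd.isnull(value):
        return {}  # plays the role of ProtoValue() (null)
    return {field: value}


inputs = [np.array([]), np.array([np.nan, 1.0]), np.array([1.0, 2.0]), None, 3.5]
results = [convert_scalar(v) for v in inputs]
```

The ordering of the two conditions matters: `or` short-circuits, so `pd.isnull` is only evaluated for values that are not arrays.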

Which issue(s) this PR fixes:

Fixes #6255

Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Unit tests
  • Integration tests
  • Manual tests
  • Testing is not required for this change

Misc



@alan-gauthier-jt alan-gauthier-jt requested a review from a team as a code owner April 10, 2026 13:30
devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread sdk/python/feast/type_map.py Outdated
        return [ProtoValue(unix_timestamp_val=ts) for ts in int_timestamps]  # type: ignore
    out = []
    for value in values:
        if isinstance(value, np.ndarray) or (
Member

Let's extract this into a small helper and use it everywhere to avoid duplication:

def _is_array_like(value: Any) -> bool:
    return isinstance(value, np.ndarray) or (
        hasattr(value, "__len__") and not isinstance(value, (str, bytes))
    )

Contributor Author

Done

@ntkathole ntkathole changed the title fix: ambiguous truth value of array during materialization fix: Ambiguous truth value of array during materialization Apr 13, 2026
Comment thread sdk/python/feast/type_map.py Outdated
        out.append(ProtoValue())
    else:
        (ts,) = _python_datetime_to_int_timestamp([value])
        out.append(ProtoValue(unix_timestamp_val=ts))  # type: ignore
Member

The if/else logic converts timestamps one row at a time; better to pre-filter first and convert the clean values in a single batch:

if feast_value_type == ValueType.UNIX_TIMESTAMP:
    out = [None] * len(values)
    clean_indices = []
    clean_values = []
    for i, value in enumerate(values):
        if _is_array_like(value) or value is None:
            out[i] = ProtoValue()
        else:
            clean_indices.append(i)
            clean_values.append(value)
    if clean_values:
        timestamps = _python_datetime_to_int_timestamp(clean_values)
        for i, ts in zip(clean_indices, timestamps):
            out[i] = ProtoValue(unix_timestamp_val=ts)
    return out
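A runnable version of this pre-filter pattern, for trying it outside Feast. It is a sketch under assumptions: `to_int_timestamps` is a hypothetical stand-in for `_python_datetime_to_int_timestamp`, dicts replace `ProtoValue`, and `batch_calls` exists only to show that the converter runs once.

```python
from datetime import datetime, timezone
from typing import Any, List

import numpy as np

batch_calls = []  # records every batch handed to the converter


def _is_array_like(value: Any) -> bool:
    return isinstance(value, np.ndarray) or (
        hasattr(value, "__len__") and not isinstance(value, (str, bytes))
    )


def to_int_timestamps(values: List[datetime]) -> List[int]:
    """Hypothetical stand-in for _python_datetime_to_int_timestamp."""
    batch_calls.append(list(values))
    return [int(v.timestamp()) for v in values]


def convert_timestamps(values: List[Any]) -> List[Any]:
    """Mark nulls first, then convert the remaining values in one batch."""
    out: List[Any] = [None] * len(values)
    clean_indices, clean_values = [], []
    for i, value in enumerate(values):
        if _is_array_like(value) or value is None:
            out[i] = {}  # null placeholder, like ProtoValue()
        else:
            clean_indices.append(i)
            clean_values.append(value)
    if clean_values:
        for i, ts in zip(clean_indices, to_int_timestamps(clean_values)):
            out[i] = {"unix_timestamp_val": ts}
    return out


first = datetime(2026, 4, 10, tzinfo=timezone.utc)
second = datetime(2026, 4, 14, tzinfo=timezone.utc)
result = convert_timestamps([first, np.array([]), None, second])
```

Tracking `clean_indices` alongside `clean_values` is what keeps the output aligned with the input rows even though the null rows never reach the converter.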

Contributor Author

Thanks, I implemented your solution

@ntkathole
Member

@alan-gauthier-jt I think it would be even better to fix this in https://github.com/feast-dev/feast/blob/master/sdk/python/feast/type_map.py by adding logic so that, for scalar columns, array-like values are skipped when picking a sample.
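One possible reading of this suggestion, sketched as standalone code. `pick_scalar_sample` is a hypothetical helper invented for illustration and is not a function in `type_map.py`; it reuses the `_is_array_like` helper proposed earlier in this review.

```python
from typing import Any, Iterable, Optional

import numpy as np


def _is_array_like(value: Any) -> bool:
    return isinstance(value, np.ndarray) or (
        hasattr(value, "__len__") and not isinstance(value, (str, bytes))
    )


def pick_scalar_sample(values: Iterable[Any]) -> Optional[Any]:
    """Hypothetical helper: first value that is neither None nor array-like."""
    return next(
        (v for v in values if v is not None and not _is_array_like(v)),
        None,
    )


sample = pick_scalar_sample([np.array([]), None, 0, 7])
no_sample = pick_scalar_sample([np.array([]), None])
```

Because the sample is then guaranteed to be a scalar (or `None`), a later comparison such as `sample == 0` yields a plain boolean rather than an array.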

Comment thread sdk/python/feast/type_map.py
Member

@ntkathole ntkathole left a comment


Thanks @alan-gauthier-jt looks good

@ntkathole ntkathole force-pushed the fix-array-materialize branch from 9037bdc to b91978e Compare April 14, 2026 07:31
@ntkathole ntkathole merged commit d0c8984 into feast-dev:master Apr 14, 2026
3 of 6 checks passed
franciscojavierarceo pushed a commit that referenced this pull request May 4, 2026
# [0.63.0](v0.62.0...v0.63.0) (2026-05-04)

### Bug Fixes

* Add project filter to apply_data_source and delete_data_source (closes [#6206](#6206)) ([#6322](#6322)) ([96562c4](96562c4))
* Add project_id filter to SnowflakeRegistry UPDATE path ([#6243](#6243)) ([6658b71](6658b71)), closes [#6208](#6208) [#6208](#6208)
* Add subprocess timeouts to prevent test_e2e_local hanging on Dask atexit handler ([3de6556](3de6556))
* Ambiguous truth value of array during materialization ([#6259](#6259)) ([d0c8984](d0c8984))
* Auto-detect GCS/S3 registry store when registry is passed as string ([#6260](#6260)) ([7ebcf03](7ebcf03))
* **bigquery:** Prefer query over table in get_table_query_string ([#6360](#6360)) ([77ed779](77ed779)), closes [#6200](#6200)
* correct project_id scoping in get_user_metadata and delete_project ([0c469a7](0c469a7))
* disable Redis RDB persistence in test deployments ([44cd682](44cd682))
* Disable snowflake tests temporarily in CI ([#6356](#6356)) ([31d5a98](31d5a98))
* Filter empty SQL commands at execute_snowflake_statement call sites ([#6249](#6249)) ([92ffbb9](92ffbb9))
* Fix five bugs in milvus online store ([#6275](#6275)) ([212504b](212504b))
* Fix issue with apply feature view ([835cda8](835cda8))
* Fix streaming materialization for exotic sources with lazy UDF pipelines ([c07972d](c07972d))
* Handle missing features gracefully instead of panicking ([7d00b3a](7d00b3a))
* Harden informer cache with label selectors and memory optimizations ([#6242](#6242)) ([3f11356](3f11356))
* **helm:** Avoid nil pointer for metrics.enabled inside podAnnotations ([#6251](#6251)) ([c833f1a](c833f1a))
* Include git in feast server image ([fb03c46](fb03c46))
* Include StreamFeatureView in freshness metric ([#6269](#6269)) ([463f16c](463f16c))
* Pre-create S3A event log dir before SparkContext init ([#6317](#6317)) ([9feca77](9feca77))
* Remote Online Store Type Inference Error with All-NULL Columns ([#6063](#6063)) ([de67bdd](de67bdd))
* Remove selector with kustomize overlay using a JSON 6902 patch ([9107a43](9107a43))
* Resolve multiple bugs in SnowflakeRegistry and Snowflake connection handling ([#6315](#6315)) ([7e66a2e](7e66a2e))
* **spark:** BatchFeatureView with TransformationMode.PYTHON now reads all source columns ([a310eaf](a310eaf))
* **spark:** Use SELECT * when feature_name_columns is empty in pull_all_from_table_or_query ([e1b1d2d](e1b1d2d))
* Support pandas mode in feature builder and fix dask column extraction ([863315e](863315e))
* support SQL string as entity_df in RemoteOfflineStore.get_historical_features ([c559889](c559889))
* Wrap LocalOutputNode return value in ArrowTableValue for consist… ([#6286](#6286)) ([a16cd55](a16cd55))

### Features

* Add agent skills and Cursor/Claude rules for Feast development ([312eea3](312eea3))
* Add feature view versioning support to FAISS online store ([b36acb7](b36acb7))
* Add feature view versioning support to Redis and DynamoDB online stores ([#6257](#6257)) ([edf25af](edf25af)), closes [#6164](#6164) [#6163](#6163)
* Add optional 'org' in feature view ([#6288](#6288)) ([#6301](#6301)) ([608b105](608b105))
* Add RaySource, to_ray_dataset first-class method, docs, and tests ([1c98157](1c98157))
* Add TLS support for Go Feature Server ([#6229](#6229)) ([28a58d0](28a58d0))
* Add Vector Search support to MongoDBOnlineStore ([#6344](#6344)) ([c102738](c102738))
* Add versioning support to Milvus online store ([#6330](#6330)) ([3268ced](3268ced))
* Addresses performance issues in the Redis online store ([2e50da0](2e50da0))
* Allow to set gpu for ray ([5580ab4](5580ab4))
* Bump redis-py version cap from <5 to <8 ([#6339](#6339)) ([9538180](9538180))
* Expose feature_server, materialization, and openlineage configuration via FeatureStore CRD ([ec6ecfd](ec6ecfd))
* Make online_write_batch_size configurable in MaterializationConfig ([#6268](#6268)) ([d41becf](d41becf))
* Make udf optional if agg defined ([#5689](#5689)) ([#6328](#6328)) ([f630056](f630056))
* MongoDB offline store ([#6138](#6138)) ([8eebad7](8eebad7))
* Optional input_schema for ODFV ([#6308](#6308)) ([#6312](#6312)) ([f08b4e8](f08b4e8))
* Provision minimal TokenReview RBAC for OIDC auth and add SSL error logging in token parser ([#6240](#6240)) ([dca57e8](dca57e8))
* **spark:** Add compute-on-read support for BatchFeatureView in get_… ([#6357](#6357)) ([630d9f8](630d9f8))


Development

Successfully merging this pull request may close these issues.

ValueError: The truth value of an empty array is ambiguous during materialization

3 participants