feat: Cassandra online store, concurrent fetching for multiple entities #3356
Merged
feast-ci-bot merged 1 commit on Nov 29, 2022
minimal handling of exceptions in concurrent query execution

read_concurrency parameter in Cassandra online store config yaml

Signed-off-by: Stefano Lottini <[email protected]>
Contributor (Author):
/lgtm
Collaborator:
@hemidactylus: you cannot LGTM your own PR.
Collaborator:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, hemidactylus
kevjumba pushed a commit that referenced this pull request on Dec 5, 2022
# [0.27.0](v0.26.0...v0.27.0) (2022-12-05)

### Bug Fixes

* Changing Snowflake template code to avoid query not implemented … ([#3319](#3319)) ([1590d6b](1590d6b))
* Dask zero division error if parquet dataset has only one partition ([#3236](#3236)) ([69e4a7d](69e4a7d))
* Enable Spark materialization on Yarn ([#3370](#3370)) ([0c20a4e](0c20a4e))
* Ensure that Snowflake accounts for number columns that overspecify precision ([#3306](#3306)) ([0ad0ace](0ad0ace))
* Fix memory leak from usage.py not properly cleaning up call stack ([#3371](#3371)) ([a0c6fde](a0c6fde))
* Fix workflow to contain env vars ([#3379](#3379)) ([548bed9](548bed9))
* Update bytewax materialization ([#3368](#3368)) ([4ebe00f](4ebe00f))
* Update the version counts ([#3378](#3378)) ([8112db5](8112db5))
* Updated AWS Athena template ([#3322](#3322)) ([5956981](5956981))
* Wrong UI data source type display ([#3276](#3276)) ([8f28062](8f28062))

### Features

* Cassandra online store, concurrency in bulk write operations ([#3367](#3367)) ([eaf354c](eaf354c))
* Cassandra online store, concurrent fetching for multiple entities ([#3356](#3356)) ([00fa21f](00fa21f))
* Get Snowflake Query Output As Pyspark Dataframe ([#2504](#2504)) ([#3358](#3358)) ([2f18957](2f18957))
This changes how features are retrieved from the Cassandra online store, leveraging the
Cassandra driver's native concurrency capabilities.

When several entities are requested, the reads are no longer issued sequentially, one entity after another.
They are executed concurrently: the driver keeps the results in the same order as the requests, and the call
returns once all results are available.

As measured in realistic environments, this yields a 2-3x speedup when retrieving 20 to 100 entities at once.

Using the Cassandra driver's `execute_concurrent_with_args` function requires a new parameter that controls the maximum amount of concurrency to use (somewhat bounded by the number of vCPUs at hand). For transparency, this is exposed in the feature store configuration yaml as a new parameter, `read_concurrency`, which is documented and correctly handled by the guided procedure of `feast init -t cassandra`.
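
To illustrate the behaviour described above, here is a minimal, standard-library-only sketch of bounded, order-preserving concurrent reads. It is not the code from this PR and it does not call the Cassandra driver: a `ThreadPoolExecutor` stands in for `execute_concurrent_with_args`, and `fetch_entities_concurrently`, `FakeStore`, the entity keys, and the default of 4 are all hypothetical names and values chosen for the example. The `(success, value_or_exception)` pairs are meant to resemble the kind of per-request outcome that allows minimal exception handling without aborting the whole batch.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]


def fetch_entities_concurrently(
    read_one: Callable[[str], Row],
    entity_keys: Sequence[str],
    read_concurrency: int = 4,  # illustrative default, not Feast's actual one
) -> List[Tuple[bool, object]]:
    """Read all keys with at most `read_concurrency` reads in flight.

    Returns one (success, row_or_exception) pair per key, in input order.
    """
    if read_concurrency < 1:
        raise ValueError("read_concurrency must be a positive integer")

    def guarded(key: str) -> Tuple[bool, object]:
        # Capture a failure per key instead of failing the whole batch.
        try:
            return True, read_one(key)
        except Exception as exc:
            return False, exc

    with ThreadPoolExecutor(max_workers=read_concurrency) as pool:
        # Executor.map yields results in the order of its inputs,
        # regardless of which read finishes first.
        return list(pool.map(guarded, entity_keys))


class FakeStore:
    """Hypothetical in-memory stand-in for an online store (no Cassandra)."""

    def __init__(self, rows: Dict[str, Row]) -> None:
        self._rows = rows
        self._lock = threading.Lock()
        self._in_flight = 0
        self.max_in_flight = 0  # highest number of simultaneous reads seen

    def read(self, key: str) -> Row:
        with self._lock:
            self._in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self._in_flight)
        try:
            time.sleep(0.01)  # simulate a network round trip
            return self._rows[key]  # raises KeyError for an unknown key
        finally:
            with self._lock:
                self._in_flight -= 1


store = FakeStore({f"driver_{i}": {"trips": i} for i in range(20)})
keys = [f"driver_{i}" for i in range(20)] + ["missing"]
results = fetch_entities_concurrently(store.read, keys, read_concurrency=5)
```

In this sketch the concurrency cap plays the role of `read_concurrency`: raising it lets more reads overlap, while the results still line up with `keys` position by position, and the failed lookup for `"missing"` is reported in its own slot rather than interrupting the other reads.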