feat(operator): Auto-create KubeRay RBAC for Feast service account #6411

Open
aravind-n wants to merge 1 commit into feast-dev:master from aravind-n:feat/operator-auto-rbac-kuberay

@aravind-n

What this PR does / why we need it:

When a FeatureStore selects the Ray compute engine in KubeRay mode (type: ray.engine, use_kuberay: true), the Feast pod uses the CodeFlare SDK to discover a RayCluster and read mTLS Secrets. Previously the Feast service account had no permissions on ray.io/rayclusters or core/secrets, so every materialization failed with 403 Forbidden and users had to hand-apply a Role + RoleBinding before each deployment.

This PR makes the operator provision that RBAC automatically:

  • New internal/controller/services/ray_rbac.go detects KubeRay mode from the batch-engine ConfigMap and CreateOrUpdates a namespace-scoped Role + RoleBinding named feast-<crName>-kuberay, owner-referenced to the FeatureStore for automatic GC. When use_kuberay flips back to false (or batchEngine is removed) the resources are deleted on the next reconcile.
  • Role rules match exactly what CodeFlare needs:
    • ray.io/rayclusters: get, list, watch
    • core/secrets: get, list, watch, create, update, delete
  • Wired into FeastServices.Deploy() right after createServiceAccount() so the binding subject exists before the Role applies.
  • The operator's own kubebuilder RBAC markers are widened to match, because Kubernetes RBAC escalation rules require the granter to hold the verbs it grants. config/rbac/role.yaml and dist/install.yaml are regenerated via make manifests / make build-installer.
  • No CRD changes. use_kuberay: true already exists in the Ray compute engine config; the operator just learns to act on it.
  • Docs updated in docs/how-to-guides/feast-operator/06-batch-and-jobs.md so users know manual RBAC setup is no longer required.
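
The naming scheme, the rule set, and the escalation constraint described above can be sketched roughly as follows. This is not the code in ray_rbac.go: `policyRule` is a local stand-in for the Kubernetes `rbacv1.PolicyRule` type so that the example compiles with the standard library alone, and `kubeRayRBACName`, `kubeRayRules`, and `covers` are hypothetical helper names. The `covers` check is deliberately simplified (exact matches only, no wildcards or resourceNames) and only illustrates why the operator's own role has to include every verb it hands to the Feast service account.

```go
package main

import "fmt"

// policyRule is a local stand-in for rbacv1.PolicyRule (assumption: the real
// operator uses the Kubernetes API types instead).
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// kubeRayRBACName returns the shared Role/RoleBinding name,
// feast-<crName>-kuberay, for a FeatureStore CR.
func kubeRayRBACName(crName string) string {
	return "feast-" + crName + "-kuberay"
}

// kubeRayRules lists the permissions named in the PR description.
// The core API group is written as the empty string in RBAC rules.
func kubeRayRules() []policyRule {
	return []policyRule{
		{APIGroups: []string{"ray.io"}, Resources: []string{"rayclusters"},
			Verbs: []string{"get", "list", "watch"}},
		{APIGroups: []string{""}, Resources: []string{"secrets"},
			Verbs: []string{"get", "list", "watch", "create", "update", "delete"}},
	}
}

// expand flattens rules into a set of "group/resource/verb" keys.
func expand(rules []policyRule) map[string]bool {
	out := map[string]bool{}
	for _, r := range rules {
		for _, g := range r.APIGroups {
			for _, res := range r.Resources {
				for _, v := range r.Verbs {
					out[g+"/"+res+"/"+v] = true
				}
			}
		}
	}
	return out
}

// covers reports whether the granter holds every permission in granted.
// Simplified: exact string matches only, no wildcard handling.
func covers(granter, granted []policyRule) bool {
	held := expand(granter)
	for key := range expand(granted) {
		if !held[key] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(kubeRayRBACName("sample"))
	// An operator role that can only read Secrets could not grant write verbs.
	readOnly := []policyRule{
		{APIGroups: []string{"ray.io"}, Resources: []string{"rayclusters"},
			Verbs: []string{"get", "list", "watch"}},
		{APIGroups: []string{""}, Resources: []string{"secrets"},
			Verbs: []string{"get", "list", "watch"}},
	}
	fmt.Println(covers(readOnly, kubeRayRules()))
	fmt.Println(covers(kubeRayRules(), kubeRayRules()))
}
```

In the sketch, a granter limited to read-only access on Secrets fails the `covers` check, which mirrors the reason the kubebuilder markers had to be widened.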

Which issue(s) this PR fixes:

Fixes #6408

Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Unit tests
  • Integration tests
  • Manual tests
  • Testing is not required for this change

Misc

New ginkgo suite at internal/controller/featurestore_controller_kuberay_rbac_test.go covers three reconciler-level flows against an envtest API server:

  • Role + RoleBinding are created with the correct rules and owner reference when the batch-engine ConfigMap has type: ray.engine and use_kuberay: true.
  • Both are deleted on the next reconcile when use_kuberay flips to false.
  • Neither is created when no batchEngine is configured.
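
The three flows amount to a small decision table, sketched below under stated assumptions. `batchEngineConfig` is a hypothetical, already-parsed view of the batch-engine ConfigMap, and `planRBAC` and the `action` constants are illustrative names rather than identifiers from the PR. The real suite exercises the reconciler against envtest; this only captures the expected outcome per input.

```go
package main

import "fmt"

// batchEngineConfig is a hypothetical parsed form of the batch-engine
// ConfigMap; a nil pointer means no batchEngine is configured.
type batchEngineConfig struct {
	Type       string
	UseKubeRay bool
}

type action string

const (
	actionEnsure action = "create-or-update"
	actionDelete action = "delete"
	actionNone   action = "none"
)

// isKubeRayMode is true only for type: ray.engine with use_kuberay: true.
func isKubeRayMode(cfg *batchEngineConfig) bool {
	return cfg != nil && cfg.Type == "ray.engine" && cfg.UseKubeRay
}

// planRBAC decides what a reconcile should do with the Role + RoleBinding,
// given the current config and whether the resources already exist.
func planRBAC(cfg *batchEngineConfig, exists bool) action {
	switch {
	case isKubeRayMode(cfg):
		return actionEnsure
	case exists:
		return actionDelete
	default:
		return actionNone
	}
}

func main() {
	enabled := &batchEngineConfig{Type: "ray.engine", UseKubeRay: true}
	disabled := &batchEngineConfig{Type: "ray.engine", UseKubeRay: false}

	fmt.Println(planRBAC(enabled, false)) // KubeRay selected
	fmt.Println(planRBAC(disabled, true)) // use_kuberay flipped to false
	fmt.Println(planRBAC(nil, false))     // no batchEngine configured
}
```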

The operator's full suite (make fmt vet lint test) passes locally with these changes.

@aravind-n aravind-n requested a review from a team as a code owner May 18, 2026 00:07
@aravind-n aravind-n changed the title from "feat(operator): auto-create KubeRay RBAC for Feast service account" to "feat(operator): Auto-create KubeRay RBAC for Feast service account" May 18, 2026
When the batch-engine ConfigMap selects the Ray engine in KubeRay mode
(type: ray.engine, use_kuberay: true), the Feast service pod uses the
CodeFlare SDK to discover RayCluster resources and read mTLS Secrets.
Previously the Feast SA had no permissions on either, so the SDK calls
returned 403 and users had to apply the Role + RoleBinding by hand
before every materialization run.

This change makes the operator provision them automatically:

  - New services/ray_rbac.go reads the batch-engine ConfigMap once per
    reconcile and, when KubeRay is selected, CreateOrUpdates a
    namespace-scoped Role + RoleBinding named feast-<crName>-kuberay
    granting the Feast SA:
      ray.io/rayclusters: get, list, watch
      core/secrets:       get, list, watch, create, update, delete
  - Both resources are owner-referenced to the FeatureStore so they
    GC with the CR. When use_kuberay flips back to false (or the
    batchEngine block is removed), they are deleted on the next
    reconcile.
  - The operator's own kubebuilder RBAC markers are widened to match
    so it can hand those verbs to the Feast SA (k8s RBAC escalation
    rules require the granter to hold the granted verbs). config/rbac
    and dist/install.yaml are regenerated accordingly.
  - New ginkgo suite covers create-on-enable, delete-on-disable, and
    no-op when batchEngine is absent.
  - 06-batch-and-jobs.md documents the new auto-RBAC behavior so users
    know manual setup is no longer required.

Fixes feast-dev#6408

Signed-off-by: Aravind Nidadavolu <[email protected]>
@aravind-n aravind-n force-pushed the feat/operator-auto-rbac-kuberay branch from 08a3fc9 to 2f10a08 Compare May 18, 2026 04:28

Development

Successfully merging this pull request may close these issues.

Feast Operator should auto-create RBAC for Feast service account to access KubeRay cluster