feat: Data Quality Monitoring added in feast UI #6422
Draft
jyejare wants to merge 12 commits into
Signed-off-by: Jitendra Yejare <[email protected]>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Pull request overview
Adds Data Quality Monitoring (DQM) across the Feast stack, including a new Monitoring section in the Feast UI, new monitoring REST endpoints + CLI, and multi-backend offline-store support for computing/storing monitoring metrics (plus metrics/audit logging enhancements).
Changes:
- UI: Adds Monitoring pages (dashboard, feature detail, feature tab) and react-query hooks for monitoring endpoints.
- SDK/Backend: Adds monitoring compute/storage abstractions to `OfflineStore` and implements them for multiple backends; adds monitoring REST router and `feast monitor` CLI.
- Ops/Docs: Adds operator CRD + repo-config mapping for DQM config, expands metrics/audit logging, and adds monitoring docs + quickstart references.
Reviewed changes
Copilot reviewed 55 out of 59 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| ui/src/queries/useMonitoringApi.ts | New react-query hooks and fetch helpers for monitoring endpoints |
| ui/src/pages/Sidebar.tsx | Adds “Monitoring” item to the sidebar |
| ui/src/pages/monitoring/Index.tsx | Monitoring landing page with tabs + filters + compute action |
| ui/src/pages/monitoring/FeatureViewMetricsPanel.tsx | Feature-view aggregate metrics panel/table |
| ui/src/pages/monitoring/FeatureServiceMetricsPanel.tsx | Feature-service aggregate metrics panel/table |
| ui/src/pages/monitoring/FeatureMetricsTable.tsx | Feature metrics table with inline mini-histograms |
| ui/src/pages/monitoring/FeatureMetricsDetail.tsx | Feature-level detail view (distribution + stats + null-rate timeline) |
| ui/src/pages/monitoring/components/StatsPanel.tsx | Stats panel for a single feature metric (with baseline comparison) |
| ui/src/pages/monitoring/components/MetricsFilters.tsx | Filters UI for monitoring queries |
| ui/src/pages/monitoring/components/HistogramChart.tsx | SVG histogram rendering for numeric/categorical features |
| ui/src/pages/features/FeatureMonitoringTab.tsx | Adds Monitoring tab content on feature detail pages |
| ui/src/pages/features/FeatureInstance.tsx | Adds “Monitoring” tab to feature instance navigation/routes |
| ui/src/FeastUISansProviders.tsx | Wires Monitoring routes and Monitoring context into the UI app |
| ui/src/contexts/MonitoringContext.ts | New context for monitoring API base URL and enable flag |
| ui/package-lock.json | Updates UI package lock (including version bump) |
| sdk/python/tests/unit/monitoring/test_metrics_calculator.py | Unit tests for metrics calculator + NaN/Inf sanitization |
| sdk/python/tests/unit/monitoring/\_\_init\_\_.py | Adds unit test package init for monitoring |
| sdk/python/tests/integration/monitoring/\_\_init\_\_.py | Adds integration test package init for monitoring |
| sdk/python/feast/repo_config.py | Adds DqmConfig and dqm repo config field |
| sdk/python/feast/monitoring/monitoring_utils.py | Shared monitoring constants + helpers for normalization/aggregation |
| sdk/python/feast/monitoring/metrics_calculator.py | PyArrow/NumPy fallback metrics calculator |
| sdk/python/feast/monitoring/dqm_job_manager.py | DQM job persistence/status manager using offline store storage |
| sdk/python/feast/monitoring/\_\_init\_\_.py | Exposes monitoring public API symbols |
| sdk/python/feast/metrics.py | Adds offline retrieval metrics + structured audit logging helpers |
| sdk/python/feast/infra/offline_stores/offline_store.py | Adds monitoring compute/storage abstract methods; adds offline retrieval instrumentation |
| sdk/python/feast/infra/offline_stores/duckdb.py | Implements monitoring compute + parquet-backed storage for DuckDB |
| sdk/python/feast/infra/offline_stores/dask.py | Implements monitoring compute + parquet-backed storage for Dask |
| sdk/python/feast/infra/offline_stores/contrib/spark_offline_store/spark.py | Implements monitoring compute + SparkSQL storage for Spark |
| sdk/python/feast/infra/offline_stores/contrib/oracle_offline_store/oracle.py | Implements monitoring compute + Oracle storage via MERGE |
| sdk/python/feast/infra/feature_servers/base_config.py | Adds new metrics config flags: offline_features + audit_logging |
| sdk/python/feast/feature_server.py | Emits online audit logs around get-online-features calls |
| sdk/python/feast/cli/monitor.py | Adds feast monitor run CLI for batch/log monitoring compute |
| sdk/python/feast/cli/cli.py | Registers the new monitor CLI command group |
| sdk/python/feast/api/registry/rest/monitoring.py | Adds FastAPI router for monitoring compute/read endpoints |
| sdk/python/feast/api/registry/rest/\_\_init\_\_.py | Registers monitoring router with the registry REST API |
| Makefile | Avoids recreating .venv in CI install target |
| infra/feast-operator/internal/controller/services/services_types.go | Adds DQM YAML config struct to operator repo config |
| infra/feast-operator/internal/controller/services/repo_config.go | Maps operator DQM spec to repo config YAML |
| infra/feast-operator/internal/controller/services/repo_config_test.go | Tests operator repo config YAML includes dqm.auto_baseline |
| infra/feast-operator/docs/api/markdown/ref.md | Documents operator DQM config API fields |
| infra/feast-operator/dist/install.yaml | Updates CRD schema with spec.dqm.autoBaseline |
| infra/feast-operator/config/samples/v1_featurestore_serving.yaml | Documents new metrics flags in sample config |
| infra/feast-operator/config/crd/bases/feast.dev_featurestores.yaml | Updates CRD base schema with DQM config |
| infra/feast-operator/api/v1/zz_generated.deepcopy.go | Adds deepcopy support for DQM config |
| infra/feast-operator/api/v1/featurestore_types.go | Adds dqm field + type to operator API |
| docs/SUMMARY.md | Adds links to monitoring quickstart and how-to guide |
| docs/reference/feature-servers/python-feature-server.md | Documents offline retrieval metrics + audit logging |
| docs/how-to-guides/feature-monitoring.md | New how-to guide for feature monitoring |
| .secrets.baseline | Updates secrets baseline for new notebook content |
Files not reviewed (2)
- infra/feast-operator/api/v1/zz_generated.deepcopy.go: Language not supported
- ui/package-lock.json: Language not supported
Comments suppressed due to low confidence (3)
ui/src/FeastUISansProviders.tsx:161
- The routing JSX appears malformed (nested duplicate `/p/:projectName/*` `Route` blocks and inconsistent indentation), suggesting one of the `<Route>` elements isn’t being properly closed before sibling routes are declared. This will either fail compilation or produce an unexpected route hierarchy; please re-check the `<Route>` nesting and ensure each opened `<Route>` is closed before adding siblings like `data-set/`, `permissions/`, `monitoring/`, etc.
This issue also appears on line 221 of the same file.
ui/src/FeastUISansProviders.tsx:226
- The provider closing tags are unbalanced here: `</FeatureFlagsContext.Provider>` is present but there is no corresponding `<FeatureFlagsContext.Provider>` opening tag in this file, and `DataModeContext.Provider` (opened above) is never closed. This will break compilation and/or context propagation; please fix the provider nesting and ensure every opened provider is properly closed.
ui/src/queries/useMonitoringApi.ts:223
- `useComputeMetrics` POST to `/monitoring/compute` also ignores `fetchOptions`/credentials used elsewhere in the UI. If the registry server is protected via cookies or auth headers, the compute call may fail. Consider passing through the same headers/credentials strategy used by `restFetch` for consistency.
Comment on lines +110 to +118

      const qs = buildQueryString(params);
      const res = await fetch(`${baseUrl}${path}${qs}`);
      if (!res.ok) {
        throw new Error(`Failed to fetch ${path}: ${res.status} ${res.statusText}`);
      }
      const text = await res.text();
      const sanitized = text.replace(/:\s*NaN/g, ": null").replace(/:\s*Infinity/g, ": null").replace(/:\s*-Infinity/g, ": null");
      return JSON.parse(sanitized);
    };

Comment on lines +225 to +231

    {
      onSuccess: () => {
        queryClient.invalidateQueries("monitoring-features");
        queryClient.invalidateQueries("monitoring-feature-views");
        queryClient.invalidateQueries("monitoring-feature-services");
      },
    },

Comment on lines +164 to +184

    const useBaselineMetrics = (
      project: string,
      featureViewName?: string,
      featureName?: string,
      dataSourceType?: string,
    ) => {
      const { apiBaseUrl, enabled } = useContext(MonitoringContext);
      return useQuery<FeatureMetric[]>(
        ["monitoring-baseline", project, featureViewName, featureName],
        () =>
          fetchMonitoring<FeatureMetric[]>(
            apiBaseUrl,
            "/monitoring/metrics/baseline",
            {
              project,
              feature_view_name: featureViewName,
              feature_name: featureName,
              data_source_type: dataSourceType,
            },
          ),
        { staleTime: STALE_TIME, enabled, retry: 1 },

Comment on lines +97 to +101

    const hasError =
      featureQuery.isError && fvQuery.isError && fsQuery.isError;
    const hasData =
      (featureQuery.data && featureQuery.data.length > 0) ||
      (fvQuery.data && fvQuery.data.length > 0);

Comment on lines +209 to +214

    <h4 style={{ fontSize: 14, fontWeight: 600, marginBottom: 8 }}>
      Null Rate Over Time
    </h4>
    <svg width={chartWidth} height={chartHeight + 20} role="img">
      <polyline
        points={polyline}

Comment on lines +116 to +139

    if job_type == "auto_compute":
        result = monitoring_service.auto_compute(
            project=project,
            feature_view_name=job.get("feature_view_name"),
        )
    elif job_type == "baseline":
        result = monitoring_service.compute_baseline(
            project=project,
            feature_view_name=job.get("feature_view_name"),
            feature_names=params.get("feature_names"),
        )
    elif job_type == "compute":
        result = monitoring_service.compute_metrics(
            project=project,
            feature_view_name=job.get("feature_view_name"),
            feature_names=params.get("feature_names"),
            start_date=date.fromisoformat(params["start_date"])
            if params.get("start_date")
            else None,
            end_date=date.fromisoformat(params["end_date"])
            if params.get("end_date")
            else None,
            granularity=params.get("granularity", "daily"),
        )

Comment on lines +97 to +112

    float_array = pc.cast(valid, pa.float64())
    result["mean"] = _safe_float(pc.mean(float_array).as_py())  # type: ignore[attr-defined]
    result["stddev"] = _safe_float(pc.stddev(float_array, ddof=1).as_py())  # type: ignore[attr-defined]

    min_max = pc.min_max(float_array)  # type: ignore[attr-defined]
    result["min_val"] = min_max["min"].as_py()
    result["max_val"] = min_max["max"].as_py()

    quantiles = pc.quantile(float_array, q=[0.50, 0.75, 0.90, 0.95, 0.99])  # type: ignore[attr-defined]
    q_values = quantiles.to_pylist()
    result["p50"] = q_values[0]
    result["p75"] = q_values[1]
    result["p90"] = q_values[2]
    result["p95"] = q_values[3]
    result["p99"] = q_values[4]

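In the excerpt above, `mean` and `stddev` pass through `_safe_float`, while `min_val`, `max_val`, and the percentiles are stored as returned. The body of `_safe_float` is not shown on this page, so the sketch below is an assumption about what such a helper might do, using only the standard library. `safe_float` and `sanitize_stats` are hypothetical names, and the sample values are made up for illustration.

```python
import json
import math
from typing import Any, Dict, Optional


def safe_float(value: Optional[float]) -> Optional[float]:
    """Return value as a float, or None when it is missing, NaN, or infinite."""
    if value is None:
        return None
    number = float(value)
    return number if math.isfinite(number) else None


def sanitize_stats(stats: Dict[str, Any]) -> Dict[str, Any]:
    """Apply safe_float to every numeric entry; leave other values untouched."""
    clean: Dict[str, Any] = {}
    for key, val in stats.items():
        is_number = isinstance(val, (int, float)) and not isinstance(val, bool)
        clean[key] = safe_float(val) if is_number else val
    return clean


# Illustrative values only, not taken from the PR.
raw = {"mean": 1.5, "stddev": float("nan"), "min_val": float("-inf"), "max_val": 3.0}
clean = sanitize_stats(raw)
# allow_nan=False makes json.dumps raise ValueError instead of emitting
# the non-standard NaN/Infinity tokens, so the payload stays strict JSON.
payload = json.dumps(clean, allow_nan=False)
```

Sanitizing every statistic before serialization would mean clients receive `null` rather than `NaN` or `Infinity` tokens, which strict JSON parsers reject.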
Comment on lines +75 to +92

    @router.post("/monitoring/compute", tags=["Monitoring"])
    async def compute_metrics(request: ComputeMetricsRequest):
        """Submit a DQM job to compute and store metrics. Returns job_id."""
        if request.granularity not in VALID_GRANULARITIES:
            raise HTTPException(
                status_code=400,
                detail=f"Invalid granularity '{request.granularity}'. "
                f"Must be one of {VALID_GRANULARITIES}",
            )

        store = _get_store()
        if request.feature_view_name:
            fv = store.registry.get_feature_view(
                name=request.feature_view_name, project=request.project
            )
            assert_permissions(fv, actions=[AuthzedAction.UPDATE])

        svc = _get_monitoring_service()

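The granularity check in the endpoint above can be illustrated in isolation. This is a framework-free sketch, not the PR's code: the contents of `VALID_GRANULARITIES` are not shown on this page, so the tuple below is an assumption (only `"daily"` appears in the excerpts, as a default), and `validate_granularity` is a hypothetical helper that raises `ValueError` where the endpoint raises an HTTP 400.

```python
from typing import Tuple

# Assumed values for illustration; the PR's actual constant is not shown here.
VALID_GRANULARITIES: Tuple[str, ...] = ("daily", "weekly", "monthly")


def validate_granularity(granularity: str) -> str:
    """Return the granularity unchanged if allowed, else raise ValueError."""
    if granularity not in VALID_GRANULARITIES:
        raise ValueError(
            f"Invalid granularity '{granularity}'. "
            f"Must be one of {VALID_GRANULARITIES}"
        )
    return granularity


checked = validate_granularity("daily")
```

Keeping the check in a plain function like this would let the REST router and the CLI share one validation path, if that fits the PR's design.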
Comment on lines +105 to +114

    const fetchMonitoring = async <T>(
      baseUrl: string,
      path: string,
      params: Record<string, string | undefined>,
    ): Promise<T> => {
      const qs = buildQueryString(params);
      const res = await fetch(`${baseUrl}${path}${qs}`);
      if (!res.ok) {
        throw new Error(`Failed to fetch ${path}: ${res.status} ${res.statusText}`);
      }

What this PR does / why we need it:
Adds a Data Quality Monitoring UI to the Feast web interface. Users can view feature-level metrics (distributions, null rates, statistics), feature view aggregates, and feature service health — all from a new Monitoring sidebar section.
Key additions:
react-query) for all monitoring REST endpoints
Which issue(s) this PR fixes:
Part of the Feast monitoring initiative — provides the UI counterpart for the monitoring backend APIs.
Other PR that needs to be merged first
#6202
DEMO
Screen.Recording.2026-05-20.at.9.52.54.PM.mov
Checks
git commit -s)
Testing Strategy