
feat: Data Quality Monitoring added in feast UI #6422

Draft
jyejare wants to merge 12 commits into feast-dev:master from jyejare:monitoring_ui


@jyejare (Collaborator) commented May 20, 2026

What this PR does / why we need it:

Adds a Data Quality Monitoring UI to the Feast web interface. Users can view feature-level metrics (distributions, null rates, statistics), feature view aggregates, and feature service health — all from a new Monitoring sidebar section.

Key additions:

  • Monitoring dashboard with paginated feature metrics table, inline mini-histograms, and health indicators
  • Feature detail view with full numeric/categorical distribution charts, stats panel, and null rate timeline
  • Monitoring tab on individual feature pages for contextual access
  • Feature View and Feature Service aggregate metrics panels
  • Filters for feature view, granularity, data source, and date range
  • API hooks (react-query) for all monitoring REST endpoints

Which issue(s) this PR fixes:

Part of the Feast monitoring initiative — provides the UI counterpart for the monitoring backend APIs.

Other PR that needs to be merged first

#6202

DEMO

Screen.Recording.2026-05-20.at.9.52.54.PM.mov

Checks

  • I've made sure the tests are passing.
  • My commits are signed off (git commit -s)
  • My PR title follows conventional commits format

Testing Strategy

  • Manual tests

jyejare and others added 12 commits May 20, 2026 20:57
Signed-off-by: Jitendra Yejare <[email protected]>
Signed-off-by: Jitendra Yejare <[email protected]>
Signed-off-by: Jitendra Yejare <[email protected]>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Signed-off-by: Jitendra Yejare <[email protected]>
@jyejare requested review from a team and sudohainguyen as code owners May 20, 2026 15:37
@jyejare requested review from ejscribner, robhowley and tokoko and removed request for a team May 20, 2026 15:37
@jyejare changed the title from "feat: Data Quality Monitoring in feast" to "feat: Data Quality Monitoring added in feast UI" May 20, 2026
@jyejare marked this pull request as draft May 20, 2026 15:57
Copilot AI (Contributor) left a comment

Pull request overview

Adds Data Quality Monitoring (DQM) across the Feast stack, including a new Monitoring section in the Feast UI, new monitoring REST endpoints + CLI, and multi-backend offline-store support for computing/storing monitoring metrics (plus metrics/audit logging enhancements).

Changes:

  • UI: Adds Monitoring pages (dashboard, feature detail, feature tab) and react-query hooks for monitoring endpoints.
  • SDK/Backend: Adds monitoring compute/storage abstractions to OfflineStore and implements them for multiple backends; adds monitoring REST router and feast monitor CLI.
  • Ops/Docs: Adds operator CRD + repo-config mapping for DQM config, expands metrics/audit logging, and adds monitoring docs + quickstart references.

Reviewed changes

Copilot reviewed 55 out of 59 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `ui/src/queries/useMonitoringApi.ts` | New react-query hooks and fetch helpers for monitoring endpoints |
| `ui/src/pages/Sidebar.tsx` | Adds “Monitoring” item to the sidebar |
| `ui/src/pages/monitoring/Index.tsx` | Monitoring landing page with tabs + filters + compute action |
| `ui/src/pages/monitoring/FeatureViewMetricsPanel.tsx` | Feature-view aggregate metrics panel/table |
| `ui/src/pages/monitoring/FeatureServiceMetricsPanel.tsx` | Feature-service aggregate metrics panel/table |
| `ui/src/pages/monitoring/FeatureMetricsTable.tsx` | Feature metrics table with inline mini-histograms |
| `ui/src/pages/monitoring/FeatureMetricsDetail.tsx` | Feature-level detail view (distribution + stats + null-rate timeline) |
| `ui/src/pages/monitoring/components/StatsPanel.tsx` | Stats panel for a single feature metric (with baseline comparison) |
| `ui/src/pages/monitoring/components/MetricsFilters.tsx` | Filters UI for monitoring queries |
| `ui/src/pages/monitoring/components/HistogramChart.tsx` | SVG histogram rendering for numeric/categorical features |
| `ui/src/pages/features/FeatureMonitoringTab.tsx` | Adds Monitoring tab content on feature detail pages |
| `ui/src/pages/features/FeatureInstance.tsx` | Adds “Monitoring” tab to feature instance navigation/routes |
| `ui/src/FeastUISansProviders.tsx` | Wires Monitoring routes and Monitoring context into the UI app |
| `ui/src/contexts/MonitoringContext.ts` | New context for monitoring API base URL and enable flag |
| `ui/package-lock.json` | Updates UI package lock (including version bump) |
| `sdk/python/tests/unit/monitoring/test_metrics_calculator.py` | Unit tests for metrics calculator + NaN/Inf sanitization |
| `sdk/python/tests/unit/monitoring/__init__.py` | Adds unit test package init for monitoring |
| `sdk/python/tests/integration/monitoring/__init__.py` | Adds integration test package init for monitoring |
| `sdk/python/feast/repo_config.py` | Adds DqmConfig and dqm repo config field |
| `sdk/python/feast/monitoring/monitoring_utils.py` | Shared monitoring constants + helpers for normalization/aggregation |
| `sdk/python/feast/monitoring/metrics_calculator.py` | PyArrow/NumPy fallback metrics calculator |
| `sdk/python/feast/monitoring/dqm_job_manager.py` | DQM job persistence/status manager using offline store storage |
| `sdk/python/feast/monitoring/__init__.py` | Exposes monitoring public API symbols |
| `sdk/python/feast/metrics.py` | Adds offline retrieval metrics + structured audit logging helpers |
| `sdk/python/feast/infra/offline_stores/offline_store.py` | Adds monitoring compute/storage abstract methods; adds offline retrieval instrumentation |
| `sdk/python/feast/infra/offline_stores/duckdb.py` | Implements monitoring compute + parquet-backed storage for DuckDB |
| `sdk/python/feast/infra/offline_stores/dask.py` | Implements monitoring compute + parquet-backed storage for Dask |
| `sdk/python/feast/infra/offline_stores/contrib/spark_offline_store/spark.py` | Implements monitoring compute + SparkSQL storage for Spark |
| `sdk/python/feast/infra/offline_stores/contrib/oracle_offline_store/oracle.py` | Implements monitoring compute + Oracle storage via MERGE |
| `sdk/python/feast/infra/feature_servers/base_config.py` | Adds new metrics config flags: offline_features + audit_logging |
| `sdk/python/feast/feature_server.py` | Emits online audit logs around get-online-features calls |
| `sdk/python/feast/cli/monitor.py` | Adds `feast monitor run` CLI for batch/log monitoring compute |
| `sdk/python/feast/cli/cli.py` | Registers the new monitor CLI command group |
| `sdk/python/feast/api/registry/rest/monitoring.py` | Adds FastAPI router for monitoring compute/read endpoints |
| `sdk/python/feast/api/registry/rest/__init__.py` | Registers monitoring router with the registry REST API |
| `Makefile` | Avoids recreating .venv in CI install target |
| `infra/feast-operator/internal/controller/services/services_types.go` | Adds DQM YAML config struct to operator repo config |
| `infra/feast-operator/internal/controller/services/repo_config.go` | Maps operator DQM spec to repo config YAML |
| `infra/feast-operator/internal/controller/services/repo_config_test.go` | Tests operator repo config YAML includes dqm.auto_baseline |
| `infra/feast-operator/docs/api/markdown/ref.md` | Documents operator DQM config API fields |
| `infra/feast-operator/dist/install.yaml` | Updates CRD schema with spec.dqm.autoBaseline |
| `infra/feast-operator/config/samples/v1_featurestore_serving.yaml` | Documents new metrics flags in sample config |
| `infra/feast-operator/config/crd/bases/feast.dev_featurestores.yaml` | Updates CRD base schema with DQM config |
| `infra/feast-operator/api/v1/zz_generated.deepcopy.go` | Adds deepcopy support for DQM config |
| `infra/feast-operator/api/v1/featurestore_types.go` | Adds dqm field + type to operator API |
| `docs/SUMMARY.md` | Adds links to monitoring quickstart and how-to guide |
| `docs/reference/feature-servers/python-feature-server.md` | Documents offline retrieval metrics + audit logging |
| `docs/how-to-guides/feature-monitoring.md` | New how-to guide for feature monitoring |
| `.secrets.baseline` | Updates secrets baseline for new notebook content |

Files not reviewed (2)
  • infra/feast-operator/api/v1/zz_generated.deepcopy.go: Language not supported
  • ui/package-lock.json: Language not supported
Comments suppressed due to low confidence (3)

ui/src/FeastUISansProviders.tsx:161

  • The routing JSX appears malformed (nested duplicate /p/:projectName/* Route blocks and inconsistent indentation), suggesting one of the <Route> elements isn’t being properly closed before sibling routes are declared. This will either fail compilation or produce an unexpected route hierarchy; please re-check the <Route> nesting and ensure each opened <Route> is closed before adding siblings like data-set/, permissions/, monitoring/, etc.

This issue also appears on line 221 of the same file.

ui/src/FeastUISansProviders.tsx:226

  • The provider closing tags are unbalanced here: </FeatureFlagsContext.Provider> is present but there is no corresponding <FeatureFlagsContext.Provider> opening tag in this file, and DataModeContext.Provider (opened above) is never closed. This will break compilation and/or context propagation—please fix the provider nesting and ensure every opened provider is properly closed.

ui/src/queries/useMonitoringApi.ts:223

  • useComputeMetrics POST to /monitoring/compute also ignores fetchOptions/credentials used elsewhere in the UI. If the registry server is protected via cookies or auth headers, the compute call may fail. Consider passing through the same headers/credentials strategy used by restFetch for consistency.
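
The last suppressed comment suggests that the compute POST reuse the headers and credentials that other requests already send. A minimal sketch of that idea follows. `SharedFetchOptions`, `PostInit`, and `buildComputeInit` are hypothetical names made up for illustration; the actual shape of `fetchOptions` and `restFetch` in the Feast UI is not shown on this page.

```typescript
// Hypothetical subset of the options a shared fetch helper might carry.
interface SharedFetchOptions {
  credentials?: "omit" | "same-origin" | "include";
  headers?: Record<string, string>;
}

// Request init for the POST call; mirrors the fields fetch() accepts.
interface PostInit extends SharedFetchOptions {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Merge the shared options into the POST request so that auth headers and
// the credentials mode match what the read requests use.
function buildComputeInit(
  payload: unknown,
  shared: SharedFetchOptions = {},
): PostInit {
  return {
    ...shared,
    method: "POST",
    headers: { ...(shared.headers ?? {}), "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

The returned object could then be passed as the second argument to `fetch` when calling `/monitoring/compute`, assuming the shared options are available where the mutation is defined.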


Comment on lines +110 to +118
  const qs = buildQueryString(params);
  const res = await fetch(`${baseUrl}${path}${qs}`);
  if (!res.ok) {
    throw new Error(`Failed to fetch ${path}: ${res.status} ${res.statusText}`);
  }
  const text = await res.text();
  const sanitized = text.replace(/:\s*NaN/g, ": null").replace(/:\s*Infinity/g, ": null").replace(/:\s*-Infinity/g, ": null");
  return JSON.parse(sanitized);
};
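
The excerpt above rewrites `NaN`, `Infinity`, and `-Infinity` with regular expressions before calling `JSON.parse`. Because the patterns run over the raw text, they also match inside string values (for example a value containing `: NaN`), and they only match a token that directly follows a colon, so a bare `NaN` inside an array is left in place. One possible alternative is a small scanner that tracks whether it is inside a string literal. This is a sketch only; `sanitizeNonFinite` is a hypothetical name and is not part of the PR.

```typescript
// Tokens that Python's json module may emit but JSON.parse rejects.
// "-Infinity" is listed first so the sign is consumed with the token.
const NON_FINITE_TOKENS: string[] = ["-Infinity", "Infinity", "NaN"];

// Replace bare non-finite tokens with null, leaving string contents untouched.
function sanitizeNonFinite(text: string): string {
  let out = "";
  let inString = false;
  let i = 0;
  while (i < text.length) {
    const ch = text.charAt(i);
    if (inString) {
      out += ch;
      if (ch === "\\" && i + 1 < text.length) {
        // Copy the escaped character verbatim so \" does not end the string.
        out += text.charAt(i + 1);
        i += 2;
        continue;
      }
      if (ch === '"') inString = false;
      i += 1;
      continue;
    }
    if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
      continue;
    }
    let matched = "";
    for (const token of NON_FINITE_TOKENS) {
      if (text.slice(i, i + token.length) === token) {
        matched = token;
        break;
      }
    }
    if (matched !== "") {
      out += "null";
      i += matched.length;
      continue;
    }
    out += ch;
    i += 1;
  }
  return out;
}
```

If the server can be changed instead, emitting `null` for non-finite values at serialization time would make client-side rewriting unnecessary; which side should own this is a design decision for the PR authors.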
Comment on lines +225 to +231
    {
      onSuccess: () => {
        queryClient.invalidateQueries("monitoring-features");
        queryClient.invalidateQueries("monitoring-feature-views");
        queryClient.invalidateQueries("monitoring-feature-services");
      },
    },
Comment on lines +164 to +184
const useBaselineMetrics = (
  project: string,
  featureViewName?: string,
  featureName?: string,
  dataSourceType?: string,
) => {
  const { apiBaseUrl, enabled } = useContext(MonitoringContext);
  return useQuery<FeatureMetric[]>(
    ["monitoring-baseline", project, featureViewName, featureName],
    () =>
      fetchMonitoring<FeatureMetric[]>(
        apiBaseUrl,
        "/monitoring/metrics/baseline",
        {
          project,
          feature_view_name: featureViewName,
          feature_name: featureName,
          data_source_type: dataSourceType,
        },
      ),
    { staleTime: STALE_TIME, enabled, retry: 1 },
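
In the excerpt above, `dataSourceType` is sent as a request parameter but does not appear in the query key. react-query identifies cached results by key, so two calls that differ only in data source would share one cache entry. A sketch of a key builder that includes every request parameter is shown below. `baselineQueryKey` and `BaselineKey` are hypothetical names, and the `"batch"` and `"log"` values in the usage note are illustrative only.

```typescript
// Tuple type for the key: a fixed prefix followed by every request parameter.
type BaselineKey = [
  string,
  string,
  string | null,
  string | null,
  string | null,
];

// Build a key in which each parameter that changes the response also changes
// the key. Missing optional values are normalized to null.
function baselineQueryKey(
  project: string,
  featureViewName?: string,
  featureName?: string,
  dataSourceType?: string,
): BaselineKey {
  return [
    "monitoring-baseline",
    project,
    featureViewName ?? null,
    featureName ?? null,
    dataSourceType ?? null,
  ];
}
```

With this helper, `baselineQueryKey("demo", "fv", "f", "batch")` and `baselineQueryKey("demo", "fv", "f", "log")` produce different keys, so their results would be cached separately.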
Comment on lines +97 to +101
  const hasError =
    featureQuery.isError && fvQuery.isError && fsQuery.isError;
  const hasData =
    (featureQuery.data && featureQuery.data.length > 0) ||
    (fvQuery.data && fvQuery.data.length > 0);
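
As written in the excerpt above, `hasError` is true only when all three queries fail, and `hasData` does not look at `fsQuery`. Whether that is intended is not stated on this page. The sketch below shows one way to derive both "all failed" and "any failed" flags from a list of query results, so that a page could show a partial-failure notice. `QueryLike`, `QuerySummary`, and `summarizeQueries` are hypothetical names for illustration.

```typescript
// Minimal view of a query result: just the fields the summary needs.
interface QueryLike {
  isError: boolean;
  data?: unknown[];
}

interface QuerySummary {
  allFailed: boolean;
  anyFailed: boolean;
  hasData: boolean;
}

// Count failures and non-empty results across all queries.
function summarizeQueries(queries: QueryLike[]): QuerySummary {
  let failed = 0;
  let withData = 0;
  for (const q of queries) {
    if (q.isError) failed += 1;
    if (q.data !== undefined && q.data.length > 0) withData += 1;
  }
  return {
    allFailed: queries.length > 0 && failed === queries.length,
    anyFailed: failed > 0,
    hasData: withData > 0,
  };
}
```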
Comment on lines +209 to +214
<h4 style={{ fontSize: 14, fontWeight: 600, marginBottom: 8 }}>
  Null Rate Over Time
</h4>
<svg width={chartWidth} height={chartHeight + 20} role="img">
  <polyline
    points={polyline}
Comment on lines +116 to +139
if job_type == "auto_compute":
    result = monitoring_service.auto_compute(
        project=project,
        feature_view_name=job.get("feature_view_name"),
    )
elif job_type == "baseline":
    result = monitoring_service.compute_baseline(
        project=project,
        feature_view_name=job.get("feature_view_name"),
        feature_names=params.get("feature_names"),
    )
elif job_type == "compute":
    result = monitoring_service.compute_metrics(
        project=project,
        feature_view_name=job.get("feature_view_name"),
        feature_names=params.get("feature_names"),
        start_date=date.fromisoformat(params["start_date"])
        if params.get("start_date")
        else None,
        end_date=date.fromisoformat(params["end_date"])
        if params.get("end_date")
        else None,
        granularity=params.get("granularity", "daily"),
    )
Comment on lines +97 to +112
float_array = pc.cast(valid, pa.float64())
result["mean"] = _safe_float(pc.mean(float_array).as_py()) # type: ignore[attr-defined]
result["stddev"] = _safe_float(pc.stddev(float_array, ddof=1).as_py()) # type: ignore[attr-defined]

min_max = pc.min_max(float_array) # type: ignore[attr-defined]
result["min_val"] = min_max["min"].as_py()
result["max_val"] = min_max["max"].as_py()

quantiles = pc.quantile(float_array, q=[0.50, 0.75, 0.90, 0.95, 0.99]) # type: ignore[attr-defined]
q_values = quantiles.to_pylist()
result["p50"] = q_values[0]
result["p75"] = q_values[1]
result["p90"] = q_values[2]
result["p95"] = q_values[3]
result["p99"] = q_values[4]

Comment on lines +75 to +92
@router.post("/monitoring/compute", tags=["Monitoring"])
async def compute_metrics(request: ComputeMetricsRequest):
    """Submit a DQM job to compute and store metrics. Returns job_id."""
    if request.granularity not in VALID_GRANULARITIES:
        raise HTTPException(
            status_code=400,
            detail=f"Invalid granularity '{request.granularity}'. "
            f"Must be one of {VALID_GRANULARITIES}",
        )

    store = _get_store()
    if request.feature_view_name:
        fv = store.registry.get_feature_view(
            name=request.feature_view_name, project=request.project
        )
        assert_permissions(fv, actions=[AuthzedAction.UPDATE])

    svc = _get_monitoring_service()
Comment on lines +105 to +114
const fetchMonitoring = async <T>(
  baseUrl: string,
  path: string,
  params: Record<string, string | undefined>,
): Promise<T> => {
  const qs = buildQueryString(params);
  const res = await fetch(`${baseUrl}${path}${qs}`);
  if (!res.ok) {
    throw new Error(`Failed to fetch ${path}: ${res.status} ${res.statusText}`);
  }
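
The excerpt above calls `buildQueryString`, whose body is not shown on this page. A minimal sketch of what such a helper might do is given below, assuming it drops `undefined` values and URL-encodes the rest. The real implementation in the PR may differ, so the function is named `buildQueryStringSketch` here to keep the two apart.

```typescript
// Turn a params record into "?a=1&b=2", skipping undefined values.
// Returns an empty string when no parameter has a value.
function buildQueryStringSketch(
  params: Record<string, string | undefined>,
): string {
  const parts: string[] = [];
  for (const key of Object.keys(params)) {
    const value = params[key];
    if (value === undefined) continue;
    parts.push(`${encodeURIComponent(key)}=${encodeURIComponent(value)}`);
  }
  return parts.length > 0 ? `?${parts.join("&")}` : "";
}
```

Skipping `undefined` matters for the optional filters in the hooks (`feature_view_name`, `feature_name`, `data_source_type`): without it, the literal text `undefined` would be sent to the server as a filter value.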

2 participants