feat: HyperANF implementation by SemyonSinchenko · Pull Request #841 · graphframes/graphframes

SemyonSinchenko · 2026-06-02T12:58:14Z

What changes were proposed in this pull request?

- HyperANF
- Bump versions in the CI matrix

Why are the changes needed?

Closes #840
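HyperANF (Boldi, Rosa and Vigna) approximates a graph's neighborhood function by keeping one HyperLogLog counter per vertex and repeatedly merging the counters of neighbors. In this PR the counters come from Spark's built-in HLL functions (hll_sketch_agg, hll_union_agg, hll_sketch_estimate), which are backed by Apache DataSketches. The toy counter below is only a minimal sketch of the primitive, under stated assumptions: the class and its names are hypothetical, lg_k (registers = 2^lg_k) is only loosely analogous to lgNomEntries, and it hashes with SHA-256 rather than the hash DataSketches uses. It is not the DataSketches implementation.

```python
import hashlib
import math


class ToyHLL:
    """Hypothetical, minimal HyperLogLog counter (illustration only)."""

    def __init__(self, lg_k: int = 10):
        self.p = lg_k
        self.m = 1 << lg_k  # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item) -> int:
        # Hash the item's string form to a 64-bit integer (assumption: SHA-256).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def update(self, item) -> None:
        h = self._hash64(item)
        idx = h >> (64 - self.p)  # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other: "ToyHLL") -> "ToyHLL":
        # The union of two sketches is the register-wise maximum.
        out = ToyHLL(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    a, b = ToyHLL(), ToyHLL()
    for i in range(1000):
        a.update(i)
    for i in range(500, 1500):
        b.update(i)
    # Close to 1500, the true number of distinct items in the union.
    print(round(a.union(b).estimate()))
```

Because union is a register-wise max, merging is idempotent and order-independent, which is what lets HyperANF merge neighbor counters repeatedly without double counting.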


SemyonSinchenko · 2026-06-02T15:32:45Z

@james-willis I will add Python bindings (Connect/classic) and docs once the API design is pre-approved.

I have thought about this a lot... It looks like this one is still necessary.

SemyonSinchenko · 2026-06-02T18:03:08Z

The final goals are HyperBall, approximate closeness centrality, etc. All of these are just simple transformations on top of HyperANF. Should I add them in this PR or in follow-up PRs?
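To illustrate why such centralities are "simple transformations": if |B(x, t)| is the number of vertices within t hops of x, then |B(x, t)| - |B(x, t-1)| is the number of vertices at distance exactly t, and closeness (1 / sum of distances) and harmonic centrality (sum of 1 / distance) follow by summing over t. A minimal sketch, assuming a hypothetical list of cumulative ball sizes per vertex with |B(x, 0)| = 1 (the vertex itself); the function names are illustrative and not part of GraphFrames or this PR.

```python
from typing import List, Sequence


def distance_histogram(ball_sizes: Sequence[float]) -> List[float]:
    """Vertices at exactly distance t, for t = 1 .. len(ball_sizes) - 1.

    ball_sizes[t] is |B(x, t)|, the number of vertices within t hops of x
    (assumption: ball_sizes[0] == 1, i.e. x itself).
    """
    return [ball_sizes[t] - ball_sizes[t - 1] for t in range(1, len(ball_sizes))]


def closeness(ball_sizes: Sequence[float]) -> float:
    # 1 / (sum of distances to every vertex reached within the hop limit).
    total = sum(t * n for t, n in enumerate(distance_histogram(ball_sizes), start=1))
    return 1.0 / total if total > 0 else 0.0


def harmonic(ball_sizes: Sequence[float]) -> float:
    # Sum of 1 / d(x, y) over vertices y != x reached within the hop limit.
    return sum(n / t for t, n in enumerate(distance_histogram(ball_sizes), start=1))


if __name__ == "__main__":
    # Path 0 -> 1 -> 2 -> 3, seen from vertex 0: one new vertex per hop.
    sizes = [1, 2, 3, 4]
    print(closeness(sizes))  # 1 / (1 + 2 + 3)
    print(harmonic(sizes))   # 1 + 1/2 + 1/3
```

With a hop limit k, vertices farther than k hops are ignored, so both values are truncated approximations; with HLL estimates in place of exact ball sizes they are approximate twice over.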

james-willis · 2026-06-09T18:06:34Z

+    val hop0func = udf(HyperANF.hll(lgNomEntries))
+    var state = edges
+      .groupBy(col(GraphFrame.SRC).alias(GraphFrame.ID))
+      .agg(hll_sketch_agg(GraphFrame.DST, lgNomEntries).alias("hop_1"))


Is it important to always pass the same type into the HLL sketch functions? On line 157 we convert to string, so maybe we should do that here as well.

Related to this, how does this function deal with cycles? Do we need a test for the case where there is a cycle back to the hop-0 node?

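Cycles are not a problem. The number of iterations is bounded by the hop limit (hHops), and when the user takes the union of the hop sketches and then estimates, a vertex reached more than once through a cycle is counted only once.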
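Reading the two hunks, each hop_k column appears to hold the vertices reachable from a source by a walk of exactly k edges (join on DST == ID, group by SRC, union the previous hop), and the caller unions the columns before estimating. Under that reading, which is an assumption since the full implementation is not shown here, exact Python sets can stand in for the sketches to show why a cycle does not inflate the final count. The function names below are hypothetical.

```python
from collections import defaultdict


def hop_sets(edges, n_hops):
    """hops[k - 1][x] = vertices reachable from x by a walk of exactly k edges.

    Exact sets stand in for HLL sketches; each step mirrors the join on
    dst == id followed by a group-by on src and a union aggregate.
    """
    hops = [defaultdict(set)]
    for src, dst in edges:  # hop 1: out-neighbours of src
        hops[0][src].add(dst)
    for _ in range(1, n_hops):
        prev, cur = hops[-1], defaultdict(set)
        for src, dst in edges:
            # Left-join semantics: a dst with no state contributes nothing.
            cur[src] |= prev.get(dst, set())
        hops.append(cur)
    return hops


def ball(hops, x):
    # Union over all hops, then count. Repeats caused by cycles collapse,
    # and x itself is counted if a cycle leads back to it.
    reached = set()
    for level in hops:
        reached |= level.get(x, set())
    return len(reached)


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # cycle 0 -> 1 -> 2 -> 0, plus 2 -> 3
    hops = hop_sets(edges, n_hops=6)
    print(sum(len(level.get(0, set())) for level in hops))  # 8: per-hop counts revisit the cycle
    print(ball(hops, 0))                                    # 4: distinct vertices {0, 1, 2, 3}
```

Summing the per-hop estimates would over-count around the cycle, while union-then-estimate saturates at the number of distinct reachable vertices no matter how large the hop limit is.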

james-willis · 2026-06-09T18:12:42Z

 publishArtifact := false

 lazy val commonSetting = Seq(
  libraryDependencies ++= Seq(


Please add org.apache.datasketches.hll.HllSketch here

Why? It is already part of the Spark runtime.

james-willis · 2026-06-09T18:16:49Z

+          col(GraphFrame.DST) === col(GraphFrame.ID),
+          "left")
+        .groupBy(col(GraphFrame.SRC).alias(GraphFrame.ID))
+        .agg(hll_union_agg(s"hop_${hop - 1}").alias(s"hop_${hop}"))


hll_union_agg(s"hop_${hop - 1}") will return null if hop_n is null. We should probably handle this with a coalesce to an empty sketch?

To be honest, I don't see how it can be null. It can be empty, and that case is handled correctly. But how can it be null, unless some vertex ID is null? P.S. A graph with null vertex IDs is considered invalid: at the moment most GraphFrames algorithms simply fail on null IDs, and handling them is very expensive (it requires a full scan).

Let me check this.

You are right: there can be nulls. On the other hand, this code:

    from pyspark.sql import functions as F

    (
        spark.createDataFrame(
            [(1, None), (1, None), (1, None)], schema="k: int, v: binary"
        )
        .toDF("k", "v")
        .groupBy("k")
        .agg(F.hll_union_agg("v").alias("v"))
        .select(F.hll_sketch_estimate("v").alias("v"))
        .show()
    )

returns

    +---+
    |  v|
    +---+
    |  0|
    +---+

So it is actually not a problem.

I added a test for that case.

SemyonSinchenko added 2 commits June 2, 2026 14:55

feat: HyperANF implementation

8268e2a

- HyperANF - bump versions in CI matrix

feat: add a top-level entry point

d6f6150

SemyonSinchenko changed the title ~~[WIP] feat: HyperANF implementation~~ feat: HyperANF implementation Jun 2, 2026

SemyonSinchenko requested a review from james-willis June 2, 2026 15:32

SemyonSinchenko added 2 commits June 2, 2026 17:44

fix: typo

e41ab79

feat: hop0

c7eb69c


SemyonSinchenko self-assigned this Jun 9, 2026

james-willis requested changes Jun 9, 2026


SemyonSinchenko added 3 commits June 11, 2026 12:11

Merge remote-tracking branch 'graphframes/main' into 840-hyperANF

c2fddff

fix: addressing comments

71e4044

fix: addressing comments

beb088b

SemyonSinchenko requested a review from james-willis June 11, 2026 14:29
