Benchmark Results

We benchmarked 8 algorithms across 5 real-world datasets plus 1 scale test. Every dataset is a genuine academic benchmark from SNAP, Planetoid, or DGL. Cleora wins on accuracy on every single dataset while using 10–24x less memory than accuracy-competitive methods. Dense and random-walk methods fail catastrophically on larger graphs.

All real data. ego-Facebook is downloaded from SNAP. Cora, CiteSeer, and PubMed are downloaded from Planetoid (Yang et al., 2016). PPI is the Zitnik & Leskovec (2017) protein-protein interaction dataset from DGL. roadNet-CA is downloaded from SNAP. See Methodology for details.

Summary Table

Best accuracy per dataset is shown in the tables below. "OOM" = out of memory (process killed by the OS). "Timed Out" = exceeded the 90-second time budget; both indicate the algorithm is not feasible at that dataset size. † = embedding computed successfully (speed and memory reported below) but accuracy not measured, since roadNet-CA has no ground-truth labels. HOPE, NetMF, GraRep, DeepWalk, and Node2Vec all fail on datasets with ≥19.7K nodes.

ego-Facebook (4,039 nodes, 18 Louvain communities)

Facebook ego network from SNAP (~4K nodes, ~88K edges). Community labels detected via Louvain. All 8 algorithms benchmarked with 256-dimensional embeddings:

| Algorithm | Accuracy | Macro F1 | Time | Memory |
|---|---|---|---|---|
| Cleora | 0.990 | 0.989 | 1.23s | 22 MB |
| DeepWalk | 0.958 | 0.956 | 59.2s | 572 MB |
| Node2Vec | 0.958 | 0.956 | 67.9s | 572 MB |
| NetMF | 0.957 | 0.958 | 28.8s | 1,098 MB |
| HOPE | 0.890 | 0.905 | 31.5s | 857 MB |
| RandNE | 0.212 | 0.181 | 0.07s | 42 MB |
| ProNE | 0.075 | 0.056 | 0.26s | 67 MB |
| GraRep | Timed Out (dense SVD per k-step exceeds the 90s budget) | – | – | – |

Cleora leads on Facebook. 99.0% accuracy — beating Node2Vec (0.958) and NetMF (0.957) while using 50x less memory (22 MB vs 1,098 MB). GraRep can't even finish.

Cora (2,708 nodes, 7 classes)

Citation network from Planetoid (Yang et al., 2016). 2,708 ML papers across 7 subject areas, 10,858 citation edges:

| Algorithm | Accuracy | Macro F1 | Time | Memory |
|---|---|---|---|---|
| Cleora | 0.861 | 0.858 | 1.03s | 14 MB |
| NetMF | 0.839 | 0.836 | 4.23s | 332 MB |
| DeepWalk | 0.835 | 0.833 | 24.1s | 227 MB |
| Node2Vec | 0.835 | 0.833 | 25.8s | 227 MB |
| HOPE | 0.821 | 0.818 | 15.97s | 330 MB |
| GraRep | 0.809 | 0.806 | 16.4s | 322 MB |
| RandNE | 0.247 | 0.246 | 0.03s | 24 MB |
| ProNE | 0.179 | 0.178 | 0.13s | 40 MB |

Cleora wins on Cora (0.861) — beating NetMF (0.839) while using 24x less memory (14 MB vs 332 MB). Best accuracy and smallest memory footprint.

CiteSeer (3,312 nodes, 6 classes)

Citation network from Planetoid. 3,312 CS papers across 6 subject areas, 9,464 citation edges:

| Algorithm | Accuracy | Macro F1 | Time | Memory |
|---|---|---|---|---|
| Cleora | 0.824 | 0.822 | 0.99s | 16 MB |
| NetMF | 0.810 | 0.810 | 6.58s | 335 MB |
| DeepWalk | 0.806 | 0.806 | 29.3s | 294 MB |
| Node2Vec | 0.806 | 0.806 | 29.6s | 294 MB |
| GraRep | 0.756 | 0.756 | 27.3s | 411 MB |
| HOPE | 0.740 | 0.740 | 19.6s | 430 MB |
| RandNE | 0.244 | 0.244 | 0.02s | 27 MB |
| ProNE | 0.189 | 0.188 | 0.14s | 45 MB |

Cleora wins on CiteSeer (0.824) — beating NetMF (0.810) while using 21x less memory (16 MB vs 335 MB). All dense methods fail on PubMed (~6x more nodes).

PubMed (19,717 nodes, 3 classes)

Citation network from Planetoid. 19,717 diabetes papers across 3 categories, 88,676 citation edges:

| Algorithm | Accuracy | Macro F1 | Time | Memory |
|---|---|---|---|---|
| Cleora | 0.879 | 0.878 | 1.40s | 97 MB |
| RandNE | 0.351 | 0.351 | 0.22s | 175 MB |
| ProNE | 0.339 | 0.339 | 0.75s | 291 MB |
| HOPE | Timed Out (sparse inverse too slow at 19.7K nodes) | – | – | – |
| NetMF | OOM (requires an O(n²) dense matrix: 19.7K² × 8 bytes ≈ 3.1 GB) | – | – | – |
| GraRep | OOM (dense SVD per k-step) | – | – | – |
| DeepWalk | Timed Out (random walks too slow at this scale) | – | – | – |
| Node2Vec | Timed Out (random walks too slow at this scale) | – | – | – |

Only 3 of 8 algorithms survive at 19.7K nodes. HOPE, NetMF, GraRep, DeepWalk, and Node2Vec all crash or time out. Cleora dominates with 0.879 accuracy — 2.5x the runner-up (RandNE at 0.351).
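The NetMF OOM figure follows from simple arithmetic. This back-of-envelope check is not part of the benchmark code, just a reproduction of the estimate quoted in the table:

```python
# A dense n x n float64 similarity matrix for PubMed's 19,717 nodes:
n = 19_717
bytes_needed = n * n * 8            # 8 bytes per float64 entry
print(f"{bytes_needed / 1e9:.1f} GB")  # ~3.1 GB, before any SVD workspace
```

And that is only the input matrix; the SVD itself needs additional working memory on top, which is why GraRep (one dense SVD per k-step) fails even harder.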

PPI (3,890 nodes, 50 classes)

Protein-protein interaction graph. 3,890 proteins, 76,584 edges, 50 functional classes:

| Algorithm | Accuracy | Macro F1 | Time | Memory |
|---|---|---|---|---|
| Cleora | 1.000 | 1.000 | 1.23s | 21 MB |
| RandNE | 0.073 | 0.070 | 0.07s | 40 MB |
| ProNE | 0.023 | 0.021 | 1.45s | 64 MB |
| HOPE | Timed Out | – | – | – |
| NetMF | OOM | – | – | – |
| GraRep | OOM | – | – | – |
| DeepWalk | Timed Out | – | – | – |
| Node2Vec | Timed Out | – | – | – |

Perfect accuracy on PPI: Cleora scores 1.000 across all 50 functional classes. Only 3 of 8 algorithms even complete; HOPE, NetMF, GraRep, DeepWalk, and Node2Vec all fail with OOM or timeout.

roadNet-CA (1,965,206 nodes, speed/memory only)

California road network from SNAP (~2M nodes, ~5.5M edges). No ground-truth community labels, so only speed and memory metrics are reported:

| Algorithm | Time | Memory |
|---|---|---|
| Cleora | 31.5s | 4,129 MB |
| RandNE | OOM | – |
| ProNE | OOM | – |
| HOPE | OOM | – |
| NetMF | OOM | – |
| GraRep | OOM | – |
| DeepWalk | OOM | – |
| Node2Vec | OOM | – |

2 million nodes. 31 seconds. Every other algorithm crashes with out-of-memory. Cleora is the only library that survives at this scale on a single CPU. The cost? Less than two cents on a standard cloud instance.
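A quick sanity check on the cost claim. The hourly rate below is an assumed figure for illustration, not from the benchmark; actual pricing varies by provider and instance type:

```python
# Rough cost estimate for the 31.5 s roadNet-CA run.
# $1.00/hour is an assumed rate for a modest cloud vCPU instance.
seconds = 31.5
hourly_rate_usd = 1.00
cost = seconds / 3600 * hourly_rate_usd
print(f"estimated cost: ${cost:.4f}")  # well under $0.02
```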

Speed Comparison

Embedding time across real datasets (256 dim). Dashed bars indicate algorithms that failed (OOM or Timed Out):

Memory Usage

Peak memory footprint per algorithm across real datasets (256 dim). Dashed bars indicate OOM failures:

Accuracy vs Speed Tradeoff

Scatter plot showing how each algorithm trades off accuracy against embedding time across all real datasets with labels (256 dim):

When to Use What

Use Cleora when: You want the best accuracy and lowest memory across the board. Cleora wins on accuracy on every single dataset while using 10–24x less memory than accuracy-competitive methods. It's also the only algorithm that completes on every dataset.
Consider NetMF when: You have small graphs (<5K nodes) and want a second opinion. NetMF is competitive on Cora (0.839) and CiteSeer (0.810), but it requires O(n²) dense memory (infeasible beyond ~15K nodes) and is 4–7x slower than Cleora even where it works.
Consider DeepWalk/Node2Vec when: You want random-walk baselines on small graphs. They're competitive on accuracy (0.835 on Cora) but extremely slow (24–68s vs Cleora's ~1s) and time out on PubMed and PPI.
Consider HOPE when: You want spectral proximity embeddings on very small graphs. HOPE is accurate on Cora (0.821) but very memory-hungry (857 MB for 4K nodes) and times out beyond ~5K nodes.

Methodology

  • Datasets: All datasets are downloaded from their canonical academic sources at runtime. ego-Facebook and roadNet-CA from SNAP. Cora, CiteSeer, and PubMed from Planetoid (Yang et al., ICML 2016). PPI from DGL (Zitnik & Leskovec, Bioinformatics 2017). No synthetic data is used.
  • Algorithms: Cleora, ProNE, RandNE, HOPE, NetMF, GraRep, DeepWalk, and Node2Vec. All use 256-dimensional embeddings. Cleora uses 40 iterations; all other algorithms use default hyperparameters.
  • Labels: ego-Facebook uses Louvain community detection (seed=42). Cora/CiteSeer/PubMed use paper category labels. PPI uses functional class labels.
  • Evaluation: Nearest Centroid classifier with 80/20 train/test split (seed=42).
  • Failure modes: "OOM" means the process was killed by the OS due to exceeding available memory. "Timed Out" means the algorithm exceeded a 90-second time budget. Both indicate that the algorithm is not feasible for that dataset size.
  • Hardware: All benchmarks run on the same machine (Replit cloud instance, shared vCPUs, limited RAM). Times are wall-clock. Memory is peak usage as measured by tracemalloc (peak traced Python allocations, not true RSS).