#1 Accuracy.
Every Dataset.
Tested on 5 canonical academic datasets against 7 competing algorithms — HOPE, NetMF, GraRep, DeepWalk, Node2Vec, ProNE, RandNE — Cleora wins on accuracy on every single dataset, and is the only algorithm that scales to every graph without crashing.
The Algorithm That Shouldn't Exist
Every other library needs random walks, negative sampling, and GPU clusters to approximate what Cleora computes exactly — with iterated sparse matrix products on a single CPU core. The result? Highest accuracy on real-world graphs where others score single digits.
Sparse Markov Matrix
Constructs a sparse transition matrix from your input graph. Handles heterogeneous hypergraphs with typed, multi-relational edges natively.
Matrix Powers = All Walk Distributions
Each iteration multiplies the embedding matrix by the sparse transition matrix — Mᵏ captures the full distribution of all walks of length k. No sampling, no noise, no stochastic approximation. This is what makes Cleora deterministic and orders of magnitude faster.
L2-Normalized Propagation
Each iteration replaces every node's embedding with the L2-normalized average of its neighbors' embeddings. 3-4 iterations for co-occurrence similarity, 7+ for contextual similarity like skip-gram.
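The three steps above can be sketched in a few lines of numpy/scipy. This is an illustration of the math, not pycleora's Rust internals — the toy graph, dimensions, and iteration count are placeholders:

```python
import numpy as np
from scipy import sparse

# Toy undirected 4-cycle: edges 0-1, 1-2, 2-3, 3-0 (both directions)
rows = [0, 1, 1, 2, 2, 3, 3, 0]
cols = [1, 0, 2, 1, 3, 2, 0, 3]
A = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))

# Row-normalize: M[i, j] = P(one step from i lands at j) — a Markov transition matrix
M = sparse.diags(1.0 / np.asarray(A.sum(axis=1)).ravel()) @ A

rng = np.random.default_rng(0)        # fixed seed -> deterministic output
E = rng.uniform(-1, 1, size=(4, 8))   # initial 8-dim node embeddings

for _ in range(4):                    # ~3-4 iterations for co-occurrence similarity
    E = M @ E                                          # average neighbors' embeddings
    E /= np.linalg.norm(E, axis=1, keepdims=True)      # L2-normalize each row
```

Each pass is one sparse matrix product plus a row-wise L2 normalization — no walk sampling anywhere, so rerunning with the same seed reproduces the output bit-for-bit.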
What Makes Cleora Different
No Sampling, No Training
Unlike DeepWalk, Node2Vec, and LINE, Cleora eliminates both random walk sampling AND skip-gram training entirely. It captures all walk distributions exactly via matrix powers. No noise, perfect reproducibility.
240x Faster Than GraphSAGE
Zomato reported embedding generation in under 5 minutes with Cleora, compared to 20 hours with GraphSAGE on the same dataset. Rust core with adaptive parallelism makes every CPU cycle count.
Deterministic Embeddings
Same input always produces the same output. Deterministic by default — no stochastic variation, no "run it 5 times and average" workflows. Critical for reproducible research and production ML pipelines.
Heterogeneous Hypergraphs
Natively handles multi-type nodes and edges, bipartite graphs, and hypergraphs. TSV input with typed columns like complex::reflexive::product. No graph preprocessing needed.
5 MB, No Heavy Dependencies
The entire library is ~5 MB with only numpy and scipy. Compare: PyTorch Geometric is 500 MB+, DGL is 400 MB+. Cleora ships as a single compiled Rust extension. No CUDA, no cuDNN, no GPU driver headaches.
Stable & Inductive
Embeddings are stable across runs and support inductive learning: new nodes can be embedded without retraining the entire graph. Production-ready from day one.
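Inductive embedding of an unseen node reduces to the same propagation rule: average the already-computed embeddings of its neighbors and L2-normalize. A minimal numpy sketch — `embed_new_node` and the stand-in embeddings are illustrative, not pycleora's API:

```python
import numpy as np

# Stand-in embeddings for 4 already-embedded nodes (8 dims, L2-normalized)
rng = np.random.default_rng(1)
E = rng.uniform(-1, 1, size=(4, 8))
E /= np.linalg.norm(E, axis=1, keepdims=True)

def embed_new_node(neighbor_ids, E):
    """Embed an unseen node from its neighbors' embeddings — no retraining."""
    v = E[neighbor_ids].mean(axis=0)   # average the neighbors
    return v / np.linalg.norm(v)       # L2-normalize, same rule as training

new_vec = embed_new_node([0, 2], E)    # new node connected to nodes 0 and 2
```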
How Zomato Replaced GraphSAGE with Cleora
From 20 hours to under 5 minutes — powering recommendations for 80M+ users across 500+ cities
The Problem
Zomato's ML team needed graph embeddings to power "People Like You" restaurant recommendations. Their initial approach with GraphSAGE took ~20 hours just to process customer-restaurant interaction data for a single city region — making it impossible to scale across 500+ cities.
Customer-Restaurant Graph
Bipartite graph of customer orders and restaurant interactions across the Zomato platform
Cleora Embeddings < 5 minutes
240x faster than GraphSAGE and DeepWalk, as measured by Zomato. No walk sampling, no skip-gram training. Purely structure-based — iterative weighted averaging of neighbor embeddings + L2 normalization.
EMDE Density Estimation
Customer preferences modeled as probability density functions. Locality-sensitive hashing compresses multiple embedding vectors into single representations.
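The sketching idea behind EMDE can be illustrated with a SimHash-style toy in numpy: random hyperplanes assign each item vector to a bucket, and a customer's ordered items become a normalized histogram over buckets. Everything here (`sketch`, the plane count, the dimensions) is an illustrative assumption, not EMDE's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_planes = 8, 6                         # 6 hyperplanes -> 2**6 = 64 buckets
planes = rng.normal(size=(n_planes, dim))    # random LSH hyperplanes

def sketch(vectors):
    """Compress a set of embedding vectors into one bucket-count histogram."""
    bits = (vectors @ planes.T > 0).astype(int)       # which side of each hyperplane
    buckets = bits @ (1 << np.arange(n_planes))       # bit pattern -> bucket id
    hist = np.bincount(buckets, minlength=2 ** n_planes)
    return hist / hist.sum()                 # normalize: a density over buckets

item_vecs = rng.normal(size=(5, dim))        # e.g. 5 restaurants a customer ordered from
density = sketch(item_vecs)                  # one fixed-size vector per customer
```

Nearby embeddings tend to share buckets, so the histogram behaves like a coarse density estimate of the customer's preferences in embedding space.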
Production Recommendations
Restaurant recommendations, search ranking, dish suggestions, and "People Like You" lookalikes — all powered by Cleora embeddings across 500+ cities.
From Raw Graph to Embeddings in Seconds
A deterministic pipeline that replaces random walks, skip-gram, and GPU training with pure linear algebra.
Input Data
Feed edge lists, interaction logs, or knowledge triples. Cleora accepts any TSV with typed columns — entities, relations, and modifiers in a single file.
Hypergraph Construction
Builds a heterogeneous hypergraph where a single edge can connect multiple entities of different types. No bipartite projections needed.
Sparse Markov Matrix
Constructs a sparse transition matrix from the graph. Rows are normalized so each row sums to 1 — a proper Markov chain over the entity space.
Matrix Power = All Walk Distributions
Each iteration applies one sparse matrix power — Mᵏ captures the full distribution of all walks of length k. No sampling, no noise — this is what makes Cleora deterministic and fast.
L2-Normalized Propagation
Each iteration replaces every node's embedding with the L2-normalized average of its neighbors. 3-4 iterations for co-occurrence similarity, 7+ for contextual similarity.
Embeddings Ready
Dense, deterministic embedding vectors for every entity — ready for downstream ML. Same input always yields same output, guaranteed reproducibility.
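The Markov-matrix and matrix-power steps above admit a tiny sanity check in scipy (a toy illustration, not pycleora's Rust core): stochastic matrices stay stochastic under multiplication, and entry (i, j) of Mᵏ is exactly the probability that a length-k walk from i ends at j.

```python
import numpy as np
from scipy import sparse

# Path graph 0-1-2: adjacency, then row-normalize into a transition matrix M
A = sparse.csr_matrix(np.array([[0, 1, 0],
                                [1, 0, 1],
                                [0, 1, 0]], dtype=float))
M = sparse.diags(1.0 / np.asarray(A.sum(axis=1)).ravel()) @ A

M2 = (M @ M).toarray()   # exact distribution of all length-2 walks
# From node 0 the only 2-step walks are 0->1->0 and 0->1->2, each with prob 0.5,
# so M2[0] == [0.5, 0.0, 0.5] — no sampling required to know this exactly.
```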
Everything You Need in One Package
Minimal dependencies (just numpy + scipy). No GPU. Production-ready graph embeddings.
7 Alternative Algorithms
ProNE, RandNE, HOPE, NetMF, GraRep, DeepWalk, Node2Vec — all included as comparison baselines under one API. Cleora is faster and leaner than every one of them, and beats them on accuracy across every benchmark.
MLP Classifier
MLP classifier and Label Propagation included — pure numpy/scipy, no PyTorch, no GPU. Evaluate embedding quality directly without external dependencies.
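Label propagation itself is only a few lines of scipy over the same sparse transition matrix — a minimal sketch of the technique (not pycleora's built-in classifier), on a toy 4-node path graph with two seed labels:

```python
import numpy as np
from scipy import sparse

# Path graph 0-1-2-3; nodes 0 and 3 carry known labels (classes 0 and 1)
A = sparse.csr_matrix(np.array([[0, 1, 0, 0],
                                [1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0]], dtype=float))
M = sparse.diags(1.0 / np.asarray(A.sum(axis=1)).ravel()) @ A

Y = np.zeros((4, 2))
Y[0, 0] = Y[3, 1] = 1.0            # one-hot seed labels

for _ in range(20):
    Y = M @ Y                      # diffuse labels along edges
    Y[0] = [1, 0]; Y[3] = [0, 1]   # clamp the known labels each step

pred = Y.argmax(axis=1)            # unlabeled nodes adopt the nearest seed's class
```

Node 1 ends up in class 0 and node 2 in class 1, matching their distances to the seeds — the same diffusion machinery the embeddings use, repurposed for classification.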
Rust-Powered Core
Sparse matrix operations in Rust with PyO3 bindings. Adaptive parallelism. 10-100x faster than pure Python implementations.
Rich Evaluation Suite
AUC, MRR, Hits@K, MAP@K, nDCG, ARI, Silhouette Score, and k-fold cross-validation. Evaluate without leaving the library.
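Two of these ranking metrics are compact enough to show in plain numpy. A sketch of the definitions (the helper names `mrr` and `hits_at_k` are illustrative, not the library's API):

```python
import numpy as np

def mrr(ranks):
    """Mean reciprocal rank over 1-based ranks of the true item."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))

def hits_at_k(ranks, k):
    """Fraction of queries whose true item lands in the top k."""
    return float(np.mean(np.asarray(ranks) <= k))

ranks = [1, 3, 2, 10]           # rank of the held-out true item for 4 queries
score_mrr = mrr(ranks)          # (1 + 1/3 + 1/2 + 1/10) / 4 ≈ 0.4833
score_h3 = hits_at_k(ranks, 3)  # 3 of the 4 ranks are <= 3 -> 0.75
```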
Graph Sampling
Neighborhood, subgraph, and GraphSAINT mini-batching. Negative sampling and train/test edge splits for scalable link prediction.
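The edge-split and negative-sampling pieces of a link-prediction setup look roughly like this in numpy — a hedged sketch on a toy graph, not the library's sampler:

```python
import numpy as np

rng = np.random.default_rng(0)
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0], [0, 2]])  # toy undirected edge list
n_nodes = 4

# Random 80/20 train/test split of the positive edges
perm = rng.permutation(len(edges))
cut = int(0.8 * len(edges))
train, test = edges[perm[:cut]], edges[perm[cut:]]

# Negative sampling: random node pairs that are not existing edges
existing = {tuple(e) for e in edges} | {tuple(e[::-1]) for e in edges}
negatives = []
while len(negatives) < len(test):
    u, v = rng.integers(0, n_nodes, size=2)
    if u != v and (u, v) not in existing:
        negatives.append((u, v))
```

A link predictor is then scored on how well it separates `test` (held-out true edges) from `negatives` (non-edges), e.g. via AUC.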
Heterogeneous Graphs
Multi-type nodes and edges. Per-relation embedding, metapath-based embedding, and homogeneous export. Real-world data doesn't fit in simple graphs.
Hyperparameter Tuning
Grid search and random search with automatic evaluation. Find the optimal embedding configuration in one call across all 7 alternative algorithms.
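Grid search over embedding hyperparameters is conceptually a max over a Cartesian product. A minimal sketch, where `evaluate` is a hypothetical stand-in for "embed with these settings and score the result":

```python
import itertools

# Hypothetical scoring function standing in for one embed-and-evaluate run
def evaluate(dim, iterations):
    return 0.8 + 0.001 * (dim / 1024) - 0.01 * abs(iterations - 4)

grid = {"dim": [256, 1024], "iterations": [3, 4, 7]}
candidates = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]
best = max(candidates, key=lambda p: evaluate(**p))   # highest-scoring setting
```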
Benchmarking Suite
Compare all 7 alternative algorithms against Cleora with time, memory, and accuracy metrics. Benchmark on your own graphs or use the 5 built-in graph generators. Publication-ready formatted tables included.
CLI Tool
`pycleora embed --input graph.tsv --dim 1024` for scripting and CI/CD pipelines. Embed graphs without writing Python.
8 Algorithms. 5 Datasets. Honest Results.
Every dataset below is a genuine academic benchmark — from SNAP, Planetoid, and DGL. We test against 7 competing algorithms (HOPE, NetMF, GraRep, DeepWalk, Node2Vec, ProNE, RandNE). Cleora wins on accuracy on every single dataset while using 10–24x less memory than accuracy-competitive methods.
ego-Facebook
Cora
CiteSeer
PubMed
PPI
roadNet-CA
Memory: Cleora Uses 10–50x Less Than Competitors
100% Free. 100% Accurate. 100% Yours.
Cleora is open-source software, free to use, modify, and deploy — no license fees, no API keys, no usage limits. Run it on your laptop, your server, or a cloud instance. Here's what the infrastructure costs look like when you do deploy:
16 Libraries. One Winner.
We compared pycleora against every major graph embedding library. The result is unambiguous.
| Feature | pycleora 3.2 | PyG | KarateClub | Original Cleora | DGL | Node2Vec | StellarGraph | GEM | GraphVite | DeepWalk | LINE | SDNE | graspologic | GraphSAGE | Struc2Vec | VERSE | NetSMF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CPU-only (no GPU needed) | Yes | Optional | Yes | Yes | Optional | Yes | Optional | Yes | No (GPU) | Yes | Yes | Optional | Yes | Optional | Yes | Yes | Yes |
| Rust-powered core | Yes | No (C++) | No | Yes | No (C++) | No | No (TF) | No | No (C++) | No | No (C++) | No | No | No | No | No (C++) | No (C++) |
| No negative sampling needed | Yes | No | No | Yes | No | No | No | Partial | No | No | No | Yes | Yes | No | No | No | Yes |
| Deterministic output | Yes | No | No | Yes | No | No | No | No | No | No | No | No | Partial | No | No | No | No |
| Node2Vec / DeepWalk | Built-in | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No | No |
| Built-in classifiers (no PyTorch) | MLP + Label Propagation | Requires PyTorch | No | No | Requires PyTorch | No | Requires TF | No | No | No | No | No | No | Requires TF | No | No | No |
| Graph generators | 5 | Some | No | No | Some | No | No | No | No | No | No | No | No | No | No | No | No |
| Graph sampling | 6 methods | Yes | No | No | Yes | No | Yes | No | Yes | No | No | No | No | Yes | No | No | No |
| Hyperparameter tuning | Grid + Random | Manual | No | No | Manual | No | Manual | No | No | No | No | No | No | No | No | No | No |
| Install size | ~5 MB | ~500 MB+ | ~15 MB | ~3 MB | ~400 MB+ | ~2 MB | ~600 MB+ | ~50 MB | ~200 MB+ | ~5 MB | ~5 MB | ~300 MB+ | ~50 MB | ~500 MB+ | ~5 MB | ~5 MB | ~10 MB |
| Multi-GPU support | Not Needed | Yes | No | No | Yes | No | Limited | No | Yes | No | No | No | No | No | No | No | No |
| Actively maintained | Yes | Yes | Yes | Minimal | Yes | Yes | Archived | Inactive | Inactive | Inactive | Inactive | Inactive | Yes | Inactive | Inactive | Inactive | Inactive |
Feature comparison only. Performance benchmarks are on the Benchmarks page (5 real-world datasets from SNAP, Planetoid & DGL + 1 scale test).
From Edges to Embeddings in 5 Lines
```python
from pycleora import SparseMatrix, embed, find_most_similar

# Build graph from edge list
edges = ["alice item_laptop", "alice item_mouse", "bob item_keyboard"]
graph = SparseMatrix.from_iterator(iter(edges), "complex::reflexive::product")

# Generate 1024-dimensional embeddings
embeddings = embed(graph, feature_dim=1024, num_iterations=4)

# Find similar entities
similar = find_most_similar(graph, embeddings, "alice", top_k=5)
for r in similar:
    print(f"{r['entity_id']}: {r['similarity']:.4f}")
```
Ready to Embed Your Graph?
Join Zomato, Dailymotion, Synerise, and ML teams worldwide using Cleora in production. Install in seconds, embed in minutes.
pip install pycleora