Use Cases & Tutorials

Real-world applications of graph embeddings with pycleora. Most use cases include a short code example.

Recommendation Systems

Graph embeddings transform user-item interactions into dense vectors where proximity encodes preference similarity. Cleora's deterministic output and sub-minute embedding time make it well suited to production recommendation pipelines that retrain daily.

How It Works

Model users and items as nodes in a bipartite graph. Each purchase, view, or rating becomes an edge. Cleora propagates structural information so that users who interact with similar items get similar embeddings — and items consumed by similar users cluster together.

  • Collaborative filtering — find users with similar taste profiles
  • Item similarity — "customers who bought X also bought Y"
  • Cold-start mitigation — new items with even 1-2 interactions get meaningful embeddings through neighbor propagation

from pycleora import SparseMatrix, embed, find_most_similar

# User-item interaction edges
edges = [
    "user_1 product_laptop", "user_1 product_mouse",
    "user_2 product_laptop", "user_2 product_keyboard",
    "user_3 product_phone",  "user_3 product_case",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::product"
)
embeddings = embed(graph, feature_dim=64, num_iterations=4)

# Find products similar to "product_laptop"
similar = find_most_similar(graph, embeddings, "product_laptop", top_k=5)
for r in similar:
    print(f"{r['entity_id']}: {r['similarity']:.4f}")
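
The propagation idea under "How It Works" can be sketched without the library. The block below is a toy stand-in, not Cleora's actual algorithm: it assumes one-hot starting vectors (so the result is easy to check by hand), counts each node in its own neighborhood, and repeatedly replaces every vector with the normalized mean of that neighborhood. All helper names are hypothetical.

```python
import math
from collections import defaultdict

# Same toy interactions as the example above, as (user, product) pairs
edges = [
    ("user_1", "product_laptop"), ("user_1", "product_mouse"),
    ("user_2", "product_laptop"), ("user_2", "product_keyboard"),
    ("user_3", "product_phone"), ("user_3", "product_case"),
]

# Neighborhoods; each node also counts itself so its own signal is retained
neighbors = defaultdict(set)
for user, product in edges:
    neighbors[user].update({user, product})
    neighbors[product].update({product, user})

nodes = sorted(neighbors)
index = {name: i for i, name in enumerate(nodes)}

# One-hot start vectors (an assumption of this sketch, chosen for easy checking)
vectors = {
    n: [1.0 if i == index[n] else 0.0 for i in range(len(nodes))] for n in nodes
}

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def propagate(current):
    """Replace every vector with the normalized mean of its neighborhood."""
    updated = {}
    for node, nbrs in neighbors.items():
        mean = [sum(current[m][i] for m in nbrs) / len(nbrs) for i in range(len(nodes))]
        updated[node] = normalize(mean)
    return updated

for _ in range(3):
    vectors = propagate(vectors)

def cosine(a, b):
    """Dot product; inputs are already unit length."""
    return sum(x * y for x, y in zip(a, b))

sim_12 = cosine(vectors["user_1"], vectors["user_2"])  # share product_laptop
sim_13 = cosine(vectors["user_1"], vectors["user_3"])  # share nothing
print(f"user_1 vs user_2: {sim_12:.3f}")
print(f"user_1 vs user_3: {sim_13:.3f}")
```

Because user_3 sits in a separate component of the graph, no information reaches it from user_1, while user_1 and user_2 pull toward each other through the shared laptop node.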

Fraud Detection

Fraudulent accounts often share devices, IPs, phone numbers, or payment methods with other fraudulent accounts. Graph embeddings capture these transitive relationships even when direct signals are weak.

How It Works

Build a heterogeneous graph connecting accounts to their devices, IPs, emails, and payment instruments. Cleora embeds all entity types jointly — fraudulent accounts that share infrastructure with known bad actors will cluster together in embedding space, even across 3-4 hops of indirection.

  • Account takeover — detect when legitimate accounts start sharing infrastructure with fraud rings
  • Synthetic identity — catch fabricated identities that reuse real SSNs, addresses, or phone numbers
  • Money laundering — trace layered transactions through intermediary accounts

from pycleora import SparseMatrix, embed, cosine_similarity

# Heterogeneous fraud graph: accounts, devices, IPs
edges = [
    "account_1 device_abc ip_1.2.3.4",
    "account_2 device_abc ip_5.6.7.8",  # shares device with account_1
    "account_3 device_xyz ip_5.6.7.8",  # shares IP with account_2
    "account_4 device_xyz ip_9.0.1.2",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=32, num_iterations=5)

# account_1 and account_3 are linked only indirectly, through account_2
idx1 = graph.get_entity_index("account_1")
idx3 = graph.get_entity_index("account_3")
print(f"Similarity: {cosine_similarity(embeddings[idx1], embeddings[idx3]):.4f}")
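
To make the "hops of indirection" concrete, the following library-free sketch counts how many steps separate two accounts in the same toy data. It treats the first token of each line as the account and the remaining tokens as its attributes, which is a modelling choice of this sketch; `hop_distance` is a hypothetical helper, not part of pycleora.

```python
from collections import defaultdict, deque

records = [
    "account_1 device_abc ip_1.2.3.4",
    "account_2 device_abc ip_5.6.7.8",
    "account_3 device_xyz ip_5.6.7.8",
    "account_4 device_xyz ip_9.0.1.2",
]

# Link each account to its own device and IP (account <-> attribute edges only)
adjacency = defaultdict(set)
for record in records:
    account, *attributes = record.split()
    for attribute in attributes:
        adjacency[account].add(attribute)
        adjacency[attribute].add(account)

def hop_distance(start, goal):
    """Breadth-first search; returns the number of edges on a shortest path, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

hops_1_3 = hop_distance("account_1", "account_3")
hops_1_4 = hop_distance("account_1", "account_4")
print(hops_1_3, hops_1_4)
```

Here account_1 reaches account_3 via device_abc, account_2, and the shared IP, so no direct signal links the two accounts even though they belong to the same chain of shared infrastructure.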

Social Network Analysis

Social networks are natural graphs. Embeddings capture community structure, influence patterns, and user similarity without manual feature engineering.

Applications

  • Community detection — Cleora embeddings + clustering reveal natural communities without explicit label data
  • Influence scoring — central users get embeddings distinct from peripheral users
  • Friend suggestions — link prediction using embedding similarity
  • Content recommendation — users with similar social neighborhoods see similar content

from pycleora import SparseMatrix, embed
from pycleora.community import detect_communities_louvain
from pycleora.datasets import load_dataset

# Load Facebook social network (4,039 nodes)
data = load_dataset("facebook")
graph = SparseMatrix.from_iterator(
    iter(data["edges"]), "complex::reflexive::user"
)
embeddings = embed(graph, feature_dim=1024, num_iterations=4)

# Discover communities
communities = detect_communities_louvain(graph)
print(f"Found {len(set(communities.values()))} communities")
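
The "embeddings + clustering" route mentioned in the list above can be illustrated with a minimal k-means in plain Python. The 2-D points are hypothetical stand-ins for user embeddings, and the starting centroids are picked by hand so the run is deterministic; a real pipeline would more likely use a library clusterer on the full embedding matrix.

```python
import math

# Hypothetical 2-D vectors standing in for user embeddings
points = {
    "u1": (0.9, 0.1), "u2": (1.0, 0.2), "u3": (0.8, 0.0),
    "u4": (0.1, 0.9), "u5": (0.0, 1.0), "u6": (0.2, 0.8),
}

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign to nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    assignment = {}
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point
        assignment = {
            name: min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
            for name, vec in points.items()
        }
        # Update step: each centroid becomes the mean of its members
        for c in range(len(centroids)):
            members = [points[n] for n, a in assignment.items() if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment

clusters = kmeans(points, centroids=[points["u1"], points["u4"]])
for name, cluster in sorted(clusters.items()):
    print(name, cluster)
```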

Knowledge Graphs

Knowledge graphs encode facts as (subject, predicate, object) triples. Embedding these graphs enables relation prediction, entity typing, and question answering.

Applications

  • Link prediction — predict missing facts ("Who directed Inception?")
  • Entity resolution — match entities across different knowledge bases
  • Semantic search — find entities related to a query concept

from pycleora import SparseMatrix, embed, predict_links

# Knowledge graph facts, one line each; relation tokens such as directed_by
# share the line with the entities they connect
edges = [
    "inception directed_by nolan",
    "inception genre_scifi",
    "interstellar directed_by nolan",
    "interstellar genre_scifi",
    "tenet directed_by nolan",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=64, num_iterations=4)

# Predict links: what else might Nolan direct?
predictions = predict_links(graph, embeddings, top_k=5, source_entities=["nolan"])

Entity Resolution

When the same real-world entity appears under different names across datasets, graph embeddings can match them by structural similarity rather than string matching.

How It Works

Connect entities to their attributes (address fragments, phone prefixes, email domains). Entities sharing multiple attributes will have similar embeddings even if their names differ completely. This captures the "guilt by association" pattern that string matching cannot.
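
As a rough illustration of matching by shared attributes, the sketch below scores candidate pairs by Jaccard overlap of their attribute sets. This is a simpler baseline swapped in for embeddings, since it only sees direct overlap, and every record, attribute token, and the 0.4 threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical records: entity name -> attribute tokens
records = {
    "acme_corp": {"addr_12_main_st", "phone_555_0100", "domain_acme"},
    "roadrunner_supplies": {"addr_12_main_st", "phone_555_0100", "domain_rrs"},
    "globex_ltd": {"addr_9_oak_ave", "phone_555_0199", "domain_globex"},
}

def jaccard(a, b):
    """Share of attributes two entities have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_matches(records, threshold=0.4):
    """Return (left, right, score) for every pair at or above the threshold."""
    matches = []
    for left, right in combinations(sorted(records), 2):
        score = jaccard(records[left], records[right])
        if score >= threshold:
            matches.append((left, right, score))
    return matches

matches = candidate_matches(records)
for left, right, score in matches:
    print(f"{left} ~ {right}: {score:.2f}")
```

The two names have nothing in common as strings, yet they pair up because they share an address and a phone number. In the graph version, the same attribute tokens would sit on the edge lines next to each entity.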

Drug Discovery

Molecular interaction networks, protein-protein interaction graphs, and drug-target networks are natural applications for graph embeddings. Cleora's speed enables rapid iteration over large biomedical graphs.

Applications

  • Drug repurposing — find existing drugs that may treat new diseases based on network proximity
  • Side effect prediction — drugs with similar target profiles tend to share side effects
  • Protein function prediction — annotate uncharacterized proteins based on their interaction partners
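
The protein function bullet describes annotating a protein from its interaction partners. A minimal sketch of that idea is a majority vote over annotated neighbors, shown here on made-up data. The protein names, labels, and `predict_function` helper are all hypothetical, and an embedding-based approach would replace the direct neighbor lookup with similarity in vector space.

```python
from collections import Counter, defaultdict

# Hypothetical interactions and annotations, invented for illustration
interactions = [
    ("prot_x", "prot_a"), ("prot_x", "prot_b"), ("prot_x", "prot_c"),
    ("prot_a", "prot_b"),
]
known_function = {"prot_a": "kinase", "prot_b": "kinase", "prot_c": "transporter"}

# Undirected partner lists
partners = defaultdict(set)
for left, right in interactions:
    partners[left].add(right)
    partners[right].add(left)

def predict_function(protein):
    """Return (label, share of annotated partners voting for it), or None."""
    votes = Counter(
        known_function[p] for p in partners[protein] if p in known_function
    )
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())

prediction = predict_function("prot_x")
print(prediction)
```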

E-commerce / Retail

Product catalogs, customer journeys, and purchase histories form rich graphs. Graph embeddings let retailers personalize at scale — from cross-selling to dynamic pricing — by encoding the latent structure of who buys what, when, and alongside which other products.

Key Applications

  • Catalog personalization — embed products and customers jointly so each visitor sees a storefront tailored to their purchase neighborhood
  • Cross-selling & bundling — find products frequently co-purchased by similar customer clusters
  • Dynamic pricing — use embedding similarity to identify substitute products and model price elasticity across the graph
  • Customer segmentation — cluster customers by behavioral similarity without hand-crafted features

from pycleora import SparseMatrix, embed, find_most_similar

# Customer-product purchase graph with session context
edges = [
    "customer_1 sku_shoes sku_socks session_morning",
    "customer_1 sku_jacket session_evening",
    "customer_2 sku_shoes sku_belt session_morning",
    "customer_3 sku_jacket sku_scarf session_evening",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=64, num_iterations=4)

# Cross-sell: what pairs well with shoes?
similar = find_most_similar(graph, embeddings, "sku_shoes", top_k=5)
for r in similar:
    print(f"{r['entity_id']}: {r['similarity']:.4f}")

Healthcare / Medicine

Medical data is inherently relational: drugs interact with other drugs, symptoms co-occur, and patients follow similar diagnostic paths. Graph embeddings capture multi-hop clinical relationships that tabular models often miss.

Key Applications

  • Drug-drug interaction prediction — embed drugs and their targets to predict adverse interactions before they occur
  • Symptom-based diagnostics — map symptom-disease-treatment relationships for differential diagnosis support
  • Patient journey mapping — embed patient trajectories to find cohorts with similar disease progression
  • Contact tracing / epidemiology — model infection spread through contact networks to identify super-spreader nodes

from pycleora import SparseMatrix, embed, find_most_similar

# Drug-drug interaction network
edges = [
    "drug_aspirin target_cox1 target_cox2",
    "drug_ibuprofen target_cox1 target_cox2",
    "drug_warfarin target_vkorc1",
    "drug_aspirin interaction_bleeding drug_warfarin",
    "drug_metformin target_ampk pathway_glucose",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=64, num_iterations=5)

# Find drugs with similar interaction profiles to aspirin
similar = find_most_similar(graph, embeddings, "drug_aspirin", top_k=5)
for r in similar:
    print(f"{r['entity_id']}: {r['similarity']:.4f}")

Supply Chain / Logistics

Supply chains are directed graphs of suppliers, factories, warehouses, and customers. Embedding these networks reveals hidden dependencies, bottleneck risks, and optimization opportunities that traditional ERP systems cannot surface.

Key Applications

  • Route optimization — embed logistics networks to find efficient paths considering multi-hop transfers
  • Disruption prediction — suppliers with similar network positions carry similar risk profiles
  • Supplier risk scoring — score suppliers by their structural importance and substitutability in the graph
  • Demand propagation — trace how demand shocks travel upstream through the supply network

from pycleora import SparseMatrix, embed, find_most_similar

# Supply chain: supplier → factory → warehouse → retailer
edges = [
    "supplier_china factory_shenzhen",
    "supplier_vietnam factory_hanoi",
    "factory_shenzhen warehouse_la warehouse_rotterdam",
    "factory_hanoi warehouse_la",
    "warehouse_la retailer_us_west retailer_us_east",
    "warehouse_rotterdam retailer_eu_north retailer_eu_south",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::node"
)
embeddings = embed(graph, feature_dim=32, num_iterations=5)

# Find nodes structurally similar to a disrupted supplier
similar = find_most_similar(graph, embeddings, "supplier_china", top_k=3)
for r in similar:
    print(f"Alternative: {r['entity_id']} (similarity: {r['similarity']:.4f})")

Telecommunications

Telecom networks are massive graphs of subscribers, cell towers, call records, and data sessions. Graph embeddings compress these billion-edge networks into actionable vectors for churn prediction, network planning, and anomaly detection.

Key Applications

  • Churn prediction — subscribers embedded near churned users are at higher risk, capturing social influence effects
  • Subscriber communities — detect calling communities for targeted marketing and plan optimization
  • Network anomaly detection — unusual traffic patterns produce outlier embeddings in the call graph
  • Cell tower optimization — embed tower-subscriber relationships to plan capacity and coverage

from pycleora import SparseMatrix, embed, cosine_similarity

# Call detail records: caller, callee, cell tower
edges = [
    "sub_alice sub_bob tower_north",
    "sub_alice sub_carol tower_north",
    "sub_bob sub_dave tower_south",
    "sub_eve sub_frank tower_south",
    "sub_eve sub_dave tower_east",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=64, num_iterations=4)

# Check if two subscribers are in the same community
idx_a = graph.get_entity_index("sub_alice")
idx_e = graph.get_entity_index("sub_eve")
print(f"Community similarity: {cosine_similarity(embeddings[idx_a], embeddings[idx_e]):.4f}")

Cybersecurity

Cyber threats operate on graphs — attack paths traverse networks, threat actors share infrastructure, and lateral movement follows credential relationships. Graph embeddings turn complex security telemetry into vectors that power detection models.

Key Applications

  • Attack graph analysis — embed network topology to score attack paths and prioritize patching
  • Threat intelligence — cluster indicators of compromise (IPs, domains, hashes) by shared infrastructure
  • Lateral movement detection — anomalous credential usage creates unusual embedding trajectories
  • Insider threat — employees accessing resources outside their embedding neighborhood trigger alerts

from pycleora import SparseMatrix, embed, find_most_similar

# Network security graph: hosts, credentials, services
edges = [
    "host_web cred_admin svc_http",
    "host_db cred_admin svc_postgres",
    "host_web cred_deploy svc_ssh",
    "host_attacker cred_stolen svc_ssh",
    "host_attacker cred_admin svc_http",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=32, num_iterations=5)

# Hosts similar to the attacker node — potential lateral movement targets
similar = find_most_similar(graph, embeddings, "host_attacker", top_k=5)
for r in similar:
    print(f"At risk: {r['entity_id']} (proximity: {r['similarity']:.4f})")

Energy / Utilities

Power grids, gas pipelines, and water networks are physical graphs where failures cascade. Graph embeddings model these topological dependencies to predict outages, optimize load balancing, and plan smart grid upgrades.

Key Applications

  • Failure prediction — embed grid components to identify nodes whose failure would cascade most broadly
  • Smart grid optimization — balance load across the network using embedding-based clustering of consumption patterns
  • Renewable integration — model the structural impact of adding solar/wind nodes to the existing grid graph
  • Anomaly detection — unusual consumption patterns produce outlier embeddings indicating theft or equipment failure

from pycleora import SparseMatrix, embed, find_most_similar

# Power grid topology: substations, transformers, feeders
edges = [
    "substation_a transformer_1 feeder_north",
    "substation_a transformer_2 feeder_south",
    "substation_b transformer_3 feeder_north",
    "substation_b transformer_4 feeder_east",
    "feeder_north meter_cluster_1 meter_cluster_2",
    "feeder_south meter_cluster_3",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::node"
)
embeddings = embed(graph, feature_dim=32, num_iterations=5)

# Which components are most similar to a failed transformer?
similar = find_most_similar(graph, embeddings, "transformer_1", top_k=3)
for r in similar:
    print(f"Cascade risk: {r['entity_id']} ({r['similarity']:.4f})")

Real Estate / PropTech

Property valuation depends on neighborhood context — nearby amenities, transport links, school quality, and comparable sales. Graph embeddings encode this spatial and relational context into property vectors that capture location value beyond latitude/longitude.

Key Applications

  • Neighborhood-aware valuation — properties connected to similar amenities and infrastructure get similar embeddings
  • Comparable property search — find truly comparable sales by structural similarity, not just distance
  • Investment hotspot detection — identify emerging neighborhoods by embedding trajectory over time
  • Amenity impact scoring — quantify how new developments (transit, parks) change property embeddings

from pycleora import SparseMatrix, embed, find_most_similar

# Property-amenity graph: properties linked to nearby features
edges = [
    "property_101 school_lincoln park_central metro_blue",
    "property_102 school_lincoln grocery_fresh metro_blue",
    "property_201 school_washington park_riverside",
    "property_202 school_washington grocery_organic metro_red",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=64, num_iterations=4)

# Find comparable properties to property_101
similar = find_most_similar(graph, embeddings, "property_101", top_k=5)
for r in similar:
    print(f"Comparable: {r['entity_id']} (similarity: {r['similarity']:.4f})")

HR / Talent Management

Skills, roles, employees, and projects form a rich organizational graph. Graph embeddings surface hidden talent connections — who has transferable skills, which teams collaborate implicitly, and where skill gaps emerge before they become critical.

Key Applications

  • Skill graph matching — embed skills and candidates together to find non-obvious fits beyond keyword matching
  • Internal mobility — identify employees whose skill embeddings are close to open roles in other departments
  • Team composition — build balanced teams by selecting members with complementary (distant) embeddings
  • Skill gap analysis — compare the embedding centroid of your team against industry benchmark roles

from pycleora import SparseMatrix, embed, find_most_similar

# Employee-skill-project graph
edges = [
    "emp_alice skill_python skill_ml project_recsys",
    "emp_bob skill_python skill_devops project_infra",
    "emp_carol skill_ml skill_stats project_recsys",
    "emp_dave skill_java skill_devops project_infra",
    "role_ml_eng skill_python skill_ml skill_devops",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=64, num_iterations=4)

# Who best fits the ML Engineer role?
similar = find_most_similar(graph, embeddings, "role_ml_eng", top_k=5)
for r in similar:
    print(f"Candidate: {r['entity_id']} (fit: {r['similarity']:.4f})")
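
The team composition bullet asks for members with complementary (distant) embeddings. One way to sketch that is greedy farthest-point selection: start from a seed member and keep adding whoever is farthest from the current team. The 2-D vectors are hypothetical stand-ins for employee embeddings, and `pick_team` is an illustrative helper rather than a pycleora function.

```python
import math

# Hypothetical 2-D skill embeddings
employees = {
    "emp_alice": (1.0, 0.0),
    "emp_bob": (0.9, 0.1),
    "emp_carol": (0.0, 1.0),
    "emp_dave": (-1.0, 0.0),
}

def pick_team(vectors, seed, size):
    """Greedily add the candidate whose nearest team member is farthest away."""
    team = [seed]
    size = min(size, len(vectors))
    while len(team) < size:
        candidates = [name for name in vectors if name not in team]
        best = max(
            candidates,
            key=lambda name: min(math.dist(vectors[name], vectors[m]) for m in team),
        )
        team.append(best)
    return team

team = pick_team(employees, seed="emp_alice", size=3)
print(team)
```

emp_bob is skipped because his vector nearly duplicates emp_alice's, which is the behavior the bullet describes: distance in embedding space is used as a proxy for complementary skills.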

Research / Academia

Academic literature forms a citation graph where papers, authors, institutions, and topics are deeply interconnected. Graph embeddings reveal research trends, collaboration opportunities, and the true impact of work beyond simple citation counts.

Key Applications

  • Citation network analysis — embed papers to find thematically related work that string-based search misses
  • Collaboration graphs — predict fruitful co-author pairings based on embedding proximity
  • Research trend prediction — track how topic embeddings shift over time to identify emerging fields
  • Reviewer matching — find qualified reviewers whose expertise embeddings align with a submission

from pycleora import SparseMatrix, embed, find_most_similar

# Citation + co-authorship graph
edges = [
    "paper_gnn author_kipf topic_graphs venue_iclr",
    "paper_attention author_vaswani topic_nlp venue_neurips",
    "paper_graphsage author_hamilton topic_graphs venue_neurips",
    "paper_gnn cites paper_graphsage",
    "paper_gat author_velickovic topic_graphs cites paper_gnn",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=64, num_iterations=4)

# Find papers related to graph neural networks
similar = find_most_similar(graph, embeddings, "paper_gnn", top_k=5)
for r in similar:
    print(f"Related: {r['entity_id']} ({r['similarity']:.4f})")

Education / EdTech

Learning is a graph problem — courses build on prerequisites, students follow similar paths, and knowledge domains interconnect. Graph embeddings power adaptive learning by encoding these relationships into vectors that personalize the educational experience.

Key Applications

  • Learning path optimization — embed courses and prerequisites to recommend the most efficient study sequence
  • Course recommendations — students with similar learning trajectories get personalized course suggestions
  • Knowledge gap analysis — identify missing prerequisite knowledge by comparing student embeddings to successful completers
  • Peer matching — pair students with complementary strengths for collaborative learning

from pycleora import SparseMatrix, embed, find_most_similar

# Student-course-topic knowledge graph
edges = [
    "student_1 course_calc course_linear_alg topic_math",
    "student_1 course_ml topic_cs",
    "student_2 course_calc course_stats topic_math",
    "student_3 course_linear_alg course_ml topic_cs",
    "course_ml prereq course_calc prereq course_linear_alg",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=64, num_iterations=4)

# Recommend next course for student_2
similar = find_most_similar(graph, embeddings, "student_2", top_k=5)
courses = [r for r in similar if r["entity_id"].startswith("course_")]
for c in courses:
    print(f"Suggested: {c['entity_id']} ({c['similarity']:.4f})")

Gaming / Metaverse

Online games generate enormous interaction graphs — players team up, trade items, compete, and form guilds. Graph embeddings capture player behavior and social dynamics for matchmaking, anti-cheat, and personalized in-game experiences.

Key Applications

  • Player matchmaking — embed players by play style, skill, and social connections for balanced matches
  • Bot detection — bots exhibit abnormal graph patterns (e.g., no social ties, repetitive trade partners)
  • In-game recommendations — suggest items, quests, or guilds based on player embedding similarity
  • Toxicity modeling — players in toxic clusters get flagged through embedding proximity to known offenders

from pycleora import SparseMatrix, embed, find_most_similar

# Player interaction graph: co-play, trades, guild membership
edges = [
    "player_x player_y guild_alpha match_12345",
    "player_x player_z guild_alpha",
    "player_y item_sword trade_001",
    "player_z item_shield trade_002",
    "bot_1 bot_2 trade_003 trade_004 trade_005",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=32, num_iterations=4)

# Find players similar to player_x for matchmaking
similar = find_most_similar(graph, embeddings, "player_x", top_k=5)
for r in similar:
    print(f"Match: {r['entity_id']} (compatibility: {r['similarity']:.4f})")

Music / Entertainment

Music discovery thrives on graphs: listeners, artists, tracks, genres, and playlists form a rich heterogeneous network. Graph embeddings enable content discovery that goes beyond genre tags, capturing the latent taste dimensions that make a listener enjoy both jazz and lo-fi hip-hop.

Key Applications

  • Music discovery — embed tracks and listeners to recommend songs from unfamiliar artists that match taste profiles
  • Playlist generation — create coherent playlists by selecting tracks with smoothly transitioning embeddings
  • Artist collaboration prediction — artists with similar audience embeddings but different genres are candidates for unexpected collaborations
  • Trend forecasting — track how genre embeddings shift over time to predict the next breakout sound

from pycleora import SparseMatrix, embed, find_most_similar

# Listener-track-artist-genre graph
edges = [
    "user_1 track_midnight artist_jazz genre_jazz",
    "user_1 track_lofi_rain artist_chillhop genre_hiphop",
    "user_2 track_midnight artist_jazz genre_jazz",
    "user_2 track_blue_note artist_jazz",
    "user_3 track_lofi_rain genre_hiphop track_boom_bap",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=64, num_iterations=4)

# Discover tracks for user_1 beyond their usual genres
similar = find_most_similar(graph, embeddings, "user_1", top_k=10)
tracks = [r for r in similar if r["entity_id"].startswith("track_")]
for t in tracks:
    print(f"Discover: {t['entity_id']} ({t['similarity']:.4f})")
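
For the playlist generation bullet, "smoothly transitioning embeddings" can be approximated by a greedy ordering that always moves to the nearest unplayed track. The sketch below assumes hypothetical 2-D track vectors, and `order_playlist` is an illustrative helper, not a library call.

```python
import math

# Hypothetical 2-D track embeddings, listed in no particular order
tracks = {
    "track_boom_bap": (1.0, 0.1),
    "track_midnight": (0.0, 1.0),
    "track_lofi_rain": (0.7, 0.5),
    "track_blue_note": (0.2, 0.9),
}

def order_playlist(vectors, start):
    """Greedy ordering: always step to the closest track not yet played."""
    playlist = [start]
    remaining = set(vectors) - {start}
    while remaining:
        current = vectors[playlist[-1]]
        nearest = min(remaining, key=lambda name: math.dist(current, vectors[name]))
        playlist.append(nearest)
        remaining.remove(nearest)
    return playlist

playlist = order_playlist(tracks, start="track_midnight")
print(" -> ".join(playlist))
```

Greedy ordering keeps each individual step small but does not minimize the total path length, so it is a starting point rather than an optimal sequencing.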

Agriculture / AgriTech

Agriculture is an underappreciated graph domain — crop varieties, soil types, weather patterns, pest species, and supply chain nodes are all interconnected. Graph embeddings help farmers and agronomists make data-driven decisions across these complex interdependencies.

Key Applications

  • Crop similarity networks — embed crop varieties by shared growing conditions, pest vulnerabilities, and nutrient needs
  • Fertilizer optimization — model soil-crop-nutrient interactions to recommend optimal fertilization schedules
  • Food supply chain tracing — track products from farm to table through the distribution graph for safety recalls
  • Pest outbreak prediction — pests spread through spatial and crop-similarity networks; embeddings model this diffusion

from pycleora import SparseMatrix, embed, find_most_similar

# Crop-soil-pest interaction graph
edges = [
    "crop_wheat soil_clay pest_aphid nutrient_nitrogen",
    "crop_barley soil_clay pest_aphid nutrient_potassium",
    "crop_rice soil_loam pest_borer nutrient_nitrogen",
    "crop_corn soil_sandy pest_rootworm nutrient_phosphorus",
    "crop_soybean soil_loam nutrient_nitrogen pest_nematode",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=32, num_iterations=4)

# If aphids hit wheat, which other crops are at risk?
similar = find_most_similar(graph, embeddings, "crop_wheat", top_k=3)
for r in similar:
    print(f"At risk: {r['entity_id']} (similarity: {r['similarity']:.4f})")

Legal Tech

Legal reasoning is inherently graph-structured: cases cite precedents, statutes reference other statutes, and regulatory requirements link across jurisdictions. Graph embeddings support semantic legal search and compliance analysis that keyword matching alone handles poorly.

Key Applications

  • Case law networks — embed cases by citation relationships to find the most relevant precedents
  • Precedent analysis — identify how landmark decisions influence downstream rulings through embedding proximity
  • Compliance mapping — connect regulations, controls, and business processes to identify coverage gaps
  • Contract clause similarity — embed contract provisions linked to their legal concepts for template matching

from pycleora import SparseMatrix, embed, find_most_similar

# Case law citation graph with legal topics
edges = [
    "case_brown_v_board topic_equal_protection cites case_plessy",
    "case_roe_v_wade topic_privacy topic_due_process",
    "case_griswold topic_privacy cites case_meyer",
    "case_obergefell topic_equal_protection topic_due_process cites case_griswold",
    "regulation_gdpr topic_privacy topic_data_protection",
]
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::entity"
)
embeddings = embed(graph, feature_dim=64, num_iterations=4)

# Find cases most related to privacy law
similar = find_most_similar(graph, embeddings, "topic_privacy", top_k=5)
for r in similar:
    print(f"Related: {r['entity_id']} ({r['similarity']:.4f})")

Tutorial: Build a Recommendation System

End-to-end example: from raw interaction data to production-ready recommendations.

from pycleora import SparseMatrix, embed, find_most_similar
from pycleora.io_utils import save_embeddings

# Step 1: Load interaction data
with open("interactions.tsv") as f:
    edges = [line.strip() for line in f if line.strip()]

# Step 2: Build graph and embed
graph = SparseMatrix.from_iterator(
    iter(edges), "complex::reflexive::product"
)
embeddings = embed(graph, feature_dim=1024, num_iterations=4)

# Step 3: Generate recommendations for a user
recs = find_most_similar(graph, embeddings, "user_42", top_k=20)
# Keep only item nodes (assumes item IDs in the data start with "item_")
item_recs = [r for r in recs if r["entity_id"].startswith("item_")]

# Step 4: Save for serving
save_embeddings(graph, embeddings, "embeddings.npz")
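
At serving time, a recommender usually also needs to skip items the user has already interacted with. The sketch below shows that step on toy data in plain Python. The vectors, the `seen` mapping, and the `recommend` helper are hypothetical, and the "item_" prefix simply mirrors the filter used in Step 3.

```python
import heapq
import math

# Hypothetical embeddings keyed by entity ID (toy 2-D values)
vectors = {
    "user_42": (1.0, 0.0),
    "user_7": (1.0, 0.05),
    "item_a": (0.9, 0.1),
    "item_b": (0.8, 0.3),
    "item_c": (0.1, 1.0),
    "item_d": (0.7, 0.7),
}
# Items each user has already interacted with
seen = {"user_42": {"item_a"}}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def recommend(user, top_k=2):
    """Top-k unseen items ranked by cosine similarity to the user vector."""
    already_seen = seen.get(user, set())
    scored = (
        (cosine(vectors[user], vec), name)
        for name, vec in vectors.items()
        if name.startswith("item_") and name not in already_seen
    )
    return [name for _, name in heapq.nlargest(top_k, scored)]

recs = recommend("user_42")
print(recs)
```

item_a is the closest vector to user_42 but is excluded because the user has already interacted with it, so the next two items are returned instead.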

Tutorial: Community Detection

Discover communities in a social network and visualize the results.

from pycleora import SparseMatrix, embed
from pycleora.community import detect_communities_louvain
from pycleora.viz import visualize
from pycleora.datasets import load_dataset

# Load and embed
data = load_dataset("facebook")
graph = SparseMatrix.from_iterator(iter(data["edges"]), "complex::reflexive::user")
embeddings = embed(graph, feature_dim=1024)

# Detect communities
communities = detect_communities_louvain(graph)

# Visualize with t-SNE, colored by community
visualize(graph, embeddings, labels=communities,
          method="tsne", save_path="communities.png")

Tutorial: Scikit-learn Pipeline

Use CleoraEmbedder with scikit-learn's familiar fit/transform API.

from pycleora import CleoraEmbedder

# Scikit-learn compatible interface
embedder = CleoraEmbedder(
    feature_dim=1024,
    num_iterations=4,
    columns="complex::reflexive::product"
)

# fit_transform returns embeddings directly
edges = ["alice item_1", "alice item_2", "bob item_2", "bob item_3"]
embeddings = embedder.fit_transform(edges)

# Access entity IDs and the embedding matrix after fitting
print(embedder.entity_ids_)
print(embedder.embeddings_.shape)

# get_params/set_params for sklearn compatibility
print(embedder.get_params())