
🕸️ graphnlp-intel


graphnlp-intel is an open-source Python library and REST API that transforms unstructured documents into interactive knowledge graphs using named-entity recognition (NER), relationship extraction, and GNN-based sentiment propagation.

🚀 Quickstart

Install the library and download the required spaCy model:

pip install graphnlp-intel
python -m spacy download en_core_web_sm

Run the pipeline in 6 lines of code:

from graphnlp import Pipeline

pipe = Pipeline(domain="finance")
result = pipe.run(["Goldman Sachs acquired a 5% stake in Microsoft for $2.3 billion."])

# Visualize, export, and summarize
result.graph.visualize("output.html") # Generates a Pyvis interactive HTML graph
result.export_json("output.json")    # Exports D3 compatible JSON
print(result.summary())              # Output stats on nodes, edges, sentiment, and communities

🧠 How it works

The system processes unstructured text through a 5-stage pipeline:

 📄 Ingestion       🔍 Extraction          🕸️ Graph Build        🧠 GNN         📈 Output
 DocumentLoader  →  NERExtractor        →  GraphBuilder       →  GraphGNN    →  Pyvis HTML /
 TextChunker        RelationExtractor      CommunityDetector                    D3 JSON /
 EmailParser        EmbeddingExtractor                                          Neo4j / Redis

Standalone Extractor Usage

from graphnlp.extraction.ner import NERExtractor
from graphnlp.extraction.relations import RelationExtractor

ner = NERExtractor()
entities = ner.extract("Apple Inc reported revenue of $120 billion.")

rel_ext = RelationExtractor()
triples = rel_ext.extract("Apple Inc reported revenue of $120 billion.")

Standalone Graph Construction Usage

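from graphnlp.graph.builder import GraphBuilder
from graphnlp.graph.community import CommunityDetector

# `triples` and `entities` are the outputs of the extractor example above.
# `embeddings_dict` is not defined in this README; it presumably comes from
# EmbeddingExtractor (check the extraction module for the exact format).
builder = GraphBuilder()
graph = builder.build(triples, entities, embeddings_dict)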

detector = CommunityDetector()
communities = detector.detect(graph)
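
The shape of a graph built from extracted triples can be illustrated without the library. The sketch below is a standalone approximation, not graphnlp-intel's implementation: it assumes triples are (subject, predicate, object) tuples and emits the node-link layout that D3 force layouts commonly consume. The helper name `triples_to_node_link`, the field names, and the sample triples are all hypothetical.

```python
import json

def triples_to_node_link(triples):
    """Convert (subject, predicate, object) tuples into a node-link dict.

    Hypothetical helper for illustration; the field names are assumptions,
    not the schema graphnlp-intel actually exports.
    """
    nodes = {}  # entity text -> node record (dicts keep insertion order)
    links = []
    for subj, pred, obj in triples:
        for name in (subj, obj):
            nodes.setdefault(name, {"id": name})
        links.append({"source": subj, "target": obj, "predicate": pred})
    return {"nodes": list(nodes.values()), "links": links}

# Made-up sample triples, shaped like the finance examples in this README.
triples = [
    ("Goldman Sachs", "acquired_stake_in", "Microsoft"),
    ("Microsoft", "competitor_of", "Apple Inc"),
]
graph_json = triples_to_node_link(triples)
print(json.dumps(graph_json, indent=2))
```

Each entity becomes one node no matter how many triples mention it, and each triple becomes one directed link carrying its predicate.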

🧩 Domain adapters

Domain adapters supply industry-specific logic: schema mappings, preprocessing, and post-processing steps.

| Adapter   | Entity Types              | Use Case                                                                    |
|-----------|---------------------------|-----------------------------------------------------------------------------|
| finance   | TICKER, ORG, AMOUNT, DATE | Parse fund records, expand ticker symbols, build COMPETITOR_OF graphs       |
| email     | PERSON, MERCHANT, MONEY   | Strip HTML/headers, parse invoices, generate PAID_TO expense clusters       |
| feedback  | PRODUCT, SCORE, FEATURE   | Normalize 5-star ratings, cluster feature complaints, link reviews          |
| incidents | SERVICE, ERROR, SEV       | Standardize P0/P1 flags, deduplicate logs, build AFFECTS topological graphs |

Using the Email Adapter

from graphnlp.adapters.base import get_adapter
from graphnlp.adapters.email import EmailAdapter
import networkx as nx

# raw_email_string: the raw email source (headers and body) as a str
adapter = get_adapter("email")
clean_text = adapter.preprocess(raw_email_string)

# Graph integration
g = nx.DiGraph()
g.add_edge("$234.56", "Amazon", predicate="paid_to")
spend_clusters = EmailAdapter.monthly_spend_summary(g)

Custom Adapter Implementation

from graphnlp.adapters.base import DomainAdapter

class HealthcareAdapter(DomainAdapter):
    @property
    def domain(self) -> str:
        return "healthcare"
        
    @property
    def entity_types(self) -> list[str]:
        return ["PATIENT", "SYMPTOM", "DRUG"]
        
    def preprocess(self, text: str) -> str:
        return text.replace("Pt.", "Patient")

⚡ API Server

Deploy the multi-tenant REST API via Docker:

make docker-up

Endpoints

| Method | Path                         | Description                                                   |
|--------|------------------------------|---------------------------------------------------------------|
| GET    | /health                      | Check service health and system status.                       |
| POST   | /v1/analyze                  | Submit documents for processing (sync or async).              |
| GET    | /v1/analyze/{job_id}         | Poll status of an async analysis job.                         |
| GET    | /v1/graph/{graph_id}         | Retrieve D3.js compatible graph JSON by ID.                   |
| GET    | /v1/graph/{graph_id}/summary | Retrieve summarized stats of the graph.                       |
| POST   | /v1/webhooks                 | Register a new webhook endpoint for async completion events.  |
| GET    | /v1/webhooks                 | List registered webhooks for the given tenant.                |

Auth, Submit, and Poll

# Submit Sync
curl -X POST http://localhost:8000/v1/analyze \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"documents": ["Invoice 123 for $500 to AWS"], "domain": "finance", "async": false}'

# Submit Async
curl -X POST http://localhost:8000/v1/analyze \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"documents": ["Massive batch 1...", "Massive batch 2..."], "async": true}'

# Poll Async Status
curl -X GET http://localhost:8000/v1/analyze/job-1234 \
  -H "Authorization: Bearer sk-your-api-key"
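
For async jobs, a client typically polls GET /v1/analyze/{job_id} until the job reaches a terminal state. The sketch below shows one way to structure that loop with capped exponential backoff. The response fields (a "status" key with the values "completed" and "failed") are assumptions, since this README does not document the job schema. `fetch_status` is a hypothetical callable standing in for the HTTP request, so the loop can be exercised without a running server.

```python
import time

def poll_job(fetch_status, job_id, *, initial_delay=1.0, max_delay=30.0,
             timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reports a terminal status or the timeout expires.

    fetch_status(job_id) must return a dict. The "status" key and the
    terminal values below are assumed, not taken from the API docs.
    """
    terminal = {"completed", "failed"}
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        job = fetch_status(job_id)
        if job.get("status") in terminal:
            return job
        if clock() + delay > deadline:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} after {timeout}s"
            )
        sleep(delay)
        delay = min(delay * 2, max_delay)  # capped exponential backoff

# Stand-in for the HTTP call: reports "pending" twice, then "completed".
_responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "completed", "graph_id": "graph-5678"},
])
result = poll_job(lambda job_id: next(_responses), "job-1234",
                  sleep=lambda seconds: None)
print(result["graph_id"])
```

In real use, `fetch_status` would issue the authenticated GET request shown above and return the parsed JSON body. Registering a webhook avoids polling altogether.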

📦 SDK Integration

Python SDK

pip install graphnlp-client

from graphnlp_client.client import GraphNLPClient

client = GraphNLPClient(api_key="sk-your-api-key", base_url="http://localhost:8000")

# Sync
result = client.analyze(["Azure bill $300"], domain="email")
print(result["graph_id"])

# Get Graph data
graph = client.get_graph(result["graph_id"])

TypeScript / JavaScript SDK

npm install graphnlp-client

import { GraphNLPClient } from 'graphnlp-client';

const client = new GraphNLPClient({ apiKey: 'sk-your-api-key' });

async function analyze() {
  const result = await client.analyze(['Q4 earnings were up 12%'], { domain: 'finance' });
  const graph = await client.getGraph(result.graph_id);
  console.log(graph.nodes);
}

🪝 Webhooks

Register webhooks to receive JSON payloads upon async task completion.

curl -X POST http://localhost:8000/v1/webhooks \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://yourapp.com/hook", "events": ["graph.ready"], "secret": "wh_sec_123"}'

Webhook Payload Example

{
  "event": "graph.ready",
  "job_id": "job-1234",
  "graph_id": "graph-5678",
  "tenant_id": "tenant-abc",
  "timestamp": "2026-04-18T10:00:00Z",
  "signature": "sha256=d2b8b9a..."
}
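
A receiver should check the signature before trusting a payload. This README does not specify which bytes are signed, so the sketch below rests on a loudly stated assumption: the signature is an HMAC-SHA256, keyed with the webhook secret, over the payload with its "signature" field removed and serialized as compact JSON with sorted keys. The helper names are hypothetical; consult the webhooks module for the scheme the dispatcher actually uses.

```python
import hashlib
import hmac
import json

def _canonical_body(payload):
    """Serialize the payload without its signature field (assumed scheme)."""
    unsigned = {k: v for k, v in payload.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_payload(payload, secret):
    """Compute 'sha256=<hex digest>' for the payload under the assumed scheme."""
    digest = hmac.new(secret.encode("utf-8"), _canonical_body(payload), hashlib.sha256)
    return "sha256=" + digest.hexdigest()

def verify_payload(payload, secret):
    """Return True if payload["signature"] matches the recomputed HMAC."""
    received = payload.get("signature", "")
    expected = sign_payload(payload, secret)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(received, expected)

payload = {
    "event": "graph.ready",
    "job_id": "job-1234",
    "graph_id": "graph-5678",
    "tenant_id": "tenant-abc",
    "timestamp": "2026-04-18T10:00:00Z",
}
payload["signature"] = sign_payload(payload, "wh_sec_123")
print(verify_payload(payload, "wh_sec_123"))    # True
print(verify_payload(payload, "wrong-secret"))  # False
```

Whatever the real scheme is, the comparison should use a constant-time function such as `hmac.compare_digest` rather than `==`.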

⚙️ Configuration

Configure the platform using config/default.yaml or environment variables:

# config/default.yaml
environment: production
neo4j:
  uri: bolt://localhost:7687
redis:
  url: redis://localhost:6379
api:
  rate_limit_per_minute: 100
nlp:
  ner_model: en_core_web_sm
  embedding_model: all-MiniLM-L6-v2

# .env
GRAPHNLP_ENVIRONMENT=production
GRAPHNLP_NEO4J_URI=bolt://neo4j:7687
GRAPHNLP_NEO4J_USER=neo4j
GRAPHNLP_NEO4J_PASSWORD=supersecret
GRAPHNLP_REDIS_URL=redis://redis:6379
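
This README does not state how the two sources interact. A common convention is that environment variables override file values, and the variable names above suggest a GRAPHNLP_<SECTION>_<KEY> mapping onto the YAML structure. The sketch below illustrates that convention only; both the precedence and the mapping are assumptions, `apply_env_overrides` is a hypothetical helper, and the real behavior is defined by graphnlp/config.py.

```python
import copy

PREFIX = "GRAPHNLP_"

def apply_env_overrides(defaults, environ):
    """Return a copy of `defaults` with GRAPHNLP_* variables layered on top.

    Assumed mapping (inferred from the variable names, not confirmed):
    GRAPHNLP_NEO4J_URI -> settings["neo4j"]["uri"]. A name whose first
    segment is not an existing section is treated as a top-level key.
    """
    settings = copy.deepcopy(defaults)
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        key = name[len(PREFIX):].lower()
        section, _, field = key.partition("_")
        if field and isinstance(settings.get(section), dict):
            settings[section][field] = value
        else:
            settings[key] = value
    return settings

defaults = {
    "environment": "development",
    "neo4j": {"uri": "bolt://localhost:7687"},
    "redis": {"url": "redis://localhost:6379"},
    "api": {"rate_limit_per_minute": 100},
}
env = {
    "GRAPHNLP_ENVIRONMENT": "production",
    "GRAPHNLP_NEO4J_URI": "bolt://neo4j:7687",
    "GRAPHNLP_REDIS_URL": "redis://redis:6379",
    "PATH": "/usr/bin",  # unrelated variables are ignored
}
settings = apply_env_overrides(defaults, env)
print(settings["neo4j"]["uri"])
```

Overridden values stay strings in this sketch; a settings library would normally coerce them to the declared types.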

🛠️ CLI Reference

Manage the platform using the built-in Typer CLI:

  • graphnlp run --domain finance --file data.csv : Run pipeline on a local file.
  • graphnlp serve --port 8000 --reload : Start the FastAPI server.
  • graphnlp worker --concurrency 4 : Start the Celery async worker.
  • graphnlp generate-key -t my-tenant : Generate a new API key for the specified tenant.

🏗️ Architecture

graphnlp-intel/
├── graphnlp/
│   ├── config.py              # Pydantic Settings
│   ├── pipeline.py            # Main Orchestrator
│   ├── ingestion/             # Loaders, Chunkers, Email Parsers
│   ├── extraction/            # NER, Relations, SBERT Embeddings
│   ├── graph/                 # NetworkX Builder, PyG GNN, Diff, Louvain
│   ├── adapters/              # Domain-specific logic
│   ├── storage/               # Neo4j & Redis handlers
│   ├── api/                   # FastAPI routes, Auth, Tenant Middleware
│   ├── queue/                 # Celery workers & tasks
│   └── webhooks/              # HMAC Dispatcher
├── tests/
│   ├── unit/                  # Isolated logic blocks
│   ├── integration/           # E2E API tests
│   └── fixtures/              # CSV/JSON samples
├── sdk/
│   ├── python/                # PyPI API wrapper
│   └── js/                    # NPM API wrapper
├── docker/
│   ├── docker-compose.yml     # Local orchestration
│   ├── Dockerfile             # API Container
│   └── Dockerfile.worker      # Celery Container
└── pyproject.toml             # Dependencies & metadata

📚 Open Source Stack

We stand on the shoulders of giants.

| Component             | Library               |
|-----------------------|-----------------------|
| NLP Base              | spacy                 |
| Deep Learning         | torch                 |
| Graph Neural Nets     | torch-geometric       |
| Language Models       | transformers          |
| Sentence Embeddings   | sentence-transformers |
| Graph Analytics       | networkx              |
| Async Queue           | celery                |
| Web Framework         | fastapi               |
| Configuration         | pydantic              |
| Caching & Rate Limits | redis.asyncio         |
| Graph Persistence     | neo4j (async driver)  |
| CLI Generation        | typer                 |

🗺️ Roadmap

| Phase   | Milestone                                                    | Expected |
|---------|--------------------------------------------------------------|----------|
| Phase 1 | Streaming Engine (Kafka integration, real-time diffing)      | Q3 2026  |
| Phase 2 | Custom Model Fine-Tuning (LoRA automated pipeline)           | Q4 2026  |
| Phase 3 | Visual Graph Dashboard (React SPA for interactive analytics) | Q1 2027  |

💼 Custom Builds & Enterprise

| Tier        | Price            | Features                                            |
|-------------|------------------|-----------------------------------------------------|
| Open Source | Free             | Apache 2.0 · Self-hosted · All adapters · CLI       |
| Custom NER  | $800–2,000       | Fine-tune NER · HF model delivery · Eval report     |
| Hosted API  | $2,500 + $400/mo | FEATURED · AWS/GCP/Azure deploy · Docker + TF · SDK |
| Enterprise  | $8,000+          | Streaming · Dashboard · Alerting SLA · White-label  |

Interested in Hosted API or Enterprise tiers? Get a quote on our site.

🤝 Contributing

We welcome contributions!

git clone https://github.com/samvardhan03/GraphNLP-Intel.git
cd GraphNLP-Intel
./setup_dev.sh
make test

📄 License

This project is licensed under the Apache License 2.0.

@software{graphnlpintel2026,
  author = {GraphNLP Team},
  title = {graphnlp-intel: Hybrid Graph-NLP Intelligence Platform},
  year = {2026},
  url = {https://github.com/samvardhan03/GraphNLP-Intel}
}

About

An enterprise-grade hybrid Graph-NLP intelligence platform. Transforms unstructured text into interactive knowledge graphs using zero-shot NER, relationship extraction, and GNN sentiment propagation.
