Universal Inference Protocol

Intelligence everywhere.
Permission from no one.

The shortest path between a model and silicon. One binary routes any AI model to any hardware. 19 backends. Zero dependencies. ESP32 to datacenter.

# Install and run in 30 seconds
git clone https://git.inference-x.com/salka/inference-x.git
cd inference-x && make
./inference-x model.gguf --serve 8080
305 KB · Binary Size
0 · Dependencies
23 · Quant Formats
19 · Backends
01

What is this, exactly?

Three things to know. Nothing more.

📦

It's a tiny file

305 kilobytes. Smaller than a photo on your phone. This file lets your computer run AI — any AI — without the internet. Download it, run it. That's it.

🔒

Your words stay yours

When you use AI online, your questions travel to a distant server. Someone can read them. With Inference-X, nothing leaves your machine. Ever. It's just you and your computer.

🌍

It runs on anything

Old laptop, new phone, Raspberry Pi, datacenter. Same file. It detects your hardware and uses it. No configuration. No expertise needed.

02

What can YOUR computer do?

Move the slider to your RAM. See what's possible.

RAM slider: 1 GB to 128 GB (shown: 8 GB)

Your AI runs locally. No internet. No account. Free forever.
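The slider's math is easy to sketch. Below is a hypothetical rule of thumb (not from the Inference-X source): weight memory is parameters times bits per weight, plus a flat overhead factor for the KV cache and activations. Using roughly 4.5 bits per weight for Q4_K_M and 8.5 for Q8_0, the results line up with the RAM figures in the benchmark table in section 07.

```cpp
// Hypothetical back-of-envelope sizing (illustrative, not engine code).
// params_b: parameters in billions; bits_per_weight: ~4.5 for Q4_K_M,
// ~8.5 for Q8_0; overhead: flat factor for KV cache and activations.
double estimate_ram_gb(double params_b, double bits_per_weight,
                       double overhead = 1.2) {
    // billions of params * bits / 8 gives gigabytes of weights directly
    return params_b * bits_per_weight / 8.0 * overhead;
}
```

For example, an 8B model at Q4_K_M comes out near 5.4 GB, which is why it sits in the 8 GB RAM tier.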

03

How small is 305 KB?

Smaller than you think.

The entire engine — all 19 hardware targets, all 23 formats — fits in less space than a single photo.

04

Where do your words go?

Cloud AI

☁️ → 🏢 → 👁️

Your question leaves your device, crosses the internet, reaches a server in another country, gets processed, stored, and analyzed. You pay per word.

Inference-X

💻 → 🧠 → ✅

Your question stays on your desk. The answer is computed by your own processor. Nothing leaves. Nothing is stored. You pay nothing.

05

One binary to run them all

12,571 lines of C++17. Six architectures. The model describes itself. The engine reads.

FUSED

Zero-Copy Inference

Dequantization and matrix multiply in one instruction loop. No intermediate buffer. Throughput closer to the model's theoretical maximum.
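The fused idea can be shown with a toy symmetric int8 scheme (not a real GGUF block format; function names here are illustrative). The fused loop dequantizes and multiplies in the same iteration, so no full-precision weight buffer is ever written to memory; the unfused version pays for a temporary and a second pass.

```cpp
#include <cstdint>
#include <vector>

// Fused: dequantize and accumulate in one loop, no intermediate buffer.
float fused_dot(const int8_t* q, float scale, const float* x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += static_cast<float>(q[i]) * scale * x[i];  // dequant + MAC fused
    return acc;
}

// Unfused: materialize the weights first, then multiply (extra n*4 bytes).
float unfused_dot(const int8_t* q, float scale, const float* x, int n) {
    std::vector<float> w(n);
    for (int i = 0; i < n; ++i) w[i] = static_cast<float>(q[i]) * scale;
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) acc += w[i] * x[i];
    return acc;
}
```

Both return the same value; only the memory traffic differs.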

MoE

Trillion-Parameter Native

Only active experts exist in memory. A 1-trillion-parameter model runs on 64 GB RAM. Prefetch next layer while current computes.
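A hypothetical sizing sketch makes the claim concrete (the split below is illustrative, not the engine's accounting): only the routed experts need to be resident per token, so the expert share of the parameters scales with top_k / n_experts. With 95% of a 1-trillion-parameter model in experts, 2 of 64 experts routed, and 4.5-bit quantization, the resident set lands around 45 GB, inside 64 GB of RAM.

```cpp
// Illustrative MoE residency model (assumed numbers, not engine internals).
// total_b: total parameters in billions; expert_frac: fraction of params
// in experts; top_k of n_experts are routed; bits: bits per weight.
double resident_gb(double total_b, double expert_frac, int top_k,
                   int n_experts, double bits) {
    double shared_b = total_b * (1.0 - expert_frac);  // attention, norms, ...
    double active_b = total_b * expert_frac *
                      static_cast<double>(top_k) / n_experts;
    return (shared_b + active_b) * bits / 8.0;        // billions -> GB
}
```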

DISPATCH

19 Silicon Targets

kernel_dispatch.h routes computation to 19 backends through one abstraction. Same source, same call, automatic detection.
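The pattern can be sketched in miniature (the real kernel_dispatch.h is not shown here; every name below is hypothetical): each backend registers behind one function-pointer interface, and the first available backend in priority order wins. Only a CPU reference kernel is implemented in this sketch.

```cpp
// Hypothetical one-abstraction dispatch, modeled on the description above.
using matmul_fn = void (*)(const float* a, const float* b, float* c,
                           int m, int n, int k);

// Reference CPU kernel: c[m x n] = a[m x k] * b[k x n].
void matmul_cpu(const float* a, const float* b, float* c,
                int m, int n, int k) {
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
}

struct Backend {
    const char* name;
    bool (*available)();
    matmul_fn matmul;
};

bool always_true() { return true; }

// CUDA, Metal, Vulkan, ... would sit above CPU in this priority list.
static const Backend kBackends[] = {
    {"cpu", always_true, matmul_cpu},
};

const Backend& select_backend() {
    for (const Backend& b : kBackends)
        if (b.available()) return b;
    return kBackends[0];  // CPU always matches, so this is unreachable
}
```

Same source, same call: the caller only ever sees `select_backend().matmul`.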

ADAPTIVE

Smart Precision

Simple questions get compressed early layers. Complex reasoning gets full precision. The engine adapts to each query.
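One way such a policy could look, as a purely illustrative sketch (the engine's actual heuristic is not public, and both the threshold and the signal are assumptions): short factual prompts route through compressed layers, long or reasoning-heavy prompts get full precision.

```cpp
// Illustrative precision policy only; not the engine's real heuristic.
enum class Precision { Q4, FP16 };

Precision pick_precision(int prompt_tokens, bool needs_reasoning) {
    if (needs_reasoning || prompt_tokens > 256)
        return Precision::FP16;   // complex query: full precision
    return Precision::Q4;         // simple query: compressed early layers
}
```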

AIR-GAP

No Cloud. Ever.

No network calls. No telemetry. No phone-home. Models are local files. Works on a plane, in a submarine, on the moon.

AUTO

Zero Configuration

Chat templates, EOS tokens, architecture — all auto-detected from GGUF metadata. Download a model and run.
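Metadata-driven configuration can be sketched like this. GGUF files carry key/value metadata; the key name below follows the GGUF convention ("general.architecture"), while the map simply stands in for a parsed file header, since the engine's actual parser is not shown here.

```cpp
#include <map>
#include <string>

// Look up a metadata key with a fallback, as a stand-in for GGUF parsing.
std::string meta_or(const std::map<std::string, std::string>& meta,
                    const std::string& key, const std::string& fallback) {
    auto it = meta.find(key);
    return it != meta.end() ? it->second : fallback;
}

// The architecture is read from the file, never asked of the user.
std::string detect_architecture(
        const std::map<std::string, std::string>& meta) {
    return meta_or(meta, "general.architecture", "unknown");
}
```

The same lookup pattern covers chat templates and EOS tokens: every decision the user would normally make is already written in the model file.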

06

Any silicon. One binary.

The Makefile detects your hardware. You don't configure it — it configures itself.

CPU AVX2/512 · Intel, AMD
CUDA · NVIDIA
ROCm · AMD GPU
Metal · Apple Silicon
Vulkan · Cross-platform
ARM NEON · ARM / RPi
Snapdragon · Qualcomm
Hexagon HVX · Qualcomm DSP
TPU · Google
Inferentia · AWS
Gaudi · Intel HPU
Maia · Microsoft
SambaNova · RDU
Graphcore · IPU
Groq · LPU
Cerebras · 850K cores
FPGA · Xilinx
WebGPU · Browser
OpenCL · Universal
07

Tested models

Runs any GGUF model. Here are a few we've benchmarked.

SmolLM2 135M · Q8_0
130 tok/s · 2 GB RAM
Quick answers. Tiny device. Lightning fast.
Phi-3 Mini 3.8B · Q4_K_M
~4 tok/s · 4 GB RAM
Smart conversations, code help, translations.
Llama 3.2 3B · Q4_K_M
3.82 tok/s · 4 GB RAM
Meta's compact model. Great reasoning.
Mistral 7B · Q4_K_M
2.06 tok/s · 8 GB RAM
Creative writing, analysis, multilingual.
Llama 3.1 8B · Q4_K_M
1.75 tok/s · 8 GB RAM
Full-featured assistant. Code. Math. Logic.
DeepSeek R1 14B · Q4_K_M
0.97 tok/s · 16 GB RAM
Advanced reasoning. Expert-level answers.
08

OpenAI-compatible API

Start with --serve 8080. Drop-in replacement. Any client library works.

# Works with any OpenAI-compatible client
from openai import OpenAI

client = OpenAI(
  base_url="http://localhost:8080/v1",
  api_key="none"
)

resp = client.chat.completions.create(
  model="local",
  messages=[{"role": "user", "content": "Hello!"}],
  stream=True
)

Endpoints: POST /v1/chat/completions · POST /v1/completions · GET /v1/models · GET /health

09

How much does AI cost?

Using AI 1 hour per day, every day, for a year.

Cloud API (GPT-4 class)
~$360
/year
Inference-X (your hardware)
$0
forever · electricity only

No API key. No subscription. No limit. Your hardware, your AI.

10

Ready? Three steps.

Pick your system.

macOS
  1. Open Terminal: git clone https://git.inference-x.com/salka/inference-x.git && cd inference-x && make
  2. Download a model from HuggingFace (any .gguf file)
  3. Run: ./inference-x your-model.gguf — AI on your Mac.

Linux
  1. Install: sudo apt install build-essential git
  2. Build: git clone https://git.inference-x.com/salka/inference-x.git && cd inference-x && make
  3. Run: ./inference-x your-model.gguf --serve 8080 — open localhost:8080

Windows
  1. Install Git and a C++ compiler (MinGW or Visual Studio)
  2. PowerShell: git clone https://git.inference-x.com/salka/inference-x.git; cd inference-x; make
  3. Run: inference-x.exe your-model.gguf

Raspberry Pi
  1. On your Pi: sudo apt install build-essential git
  2. Build: git clone https://git.inference-x.com/salka/inference-x.git && cd inference-x && make
  3. Run: ./inference-x smollm2-135m.gguf — AI on a $35 board.
11

Free for those who need it. Fair for those who profit.

No tricks. No limits. The engine is the same everywhere.

Community
$0
forever
Individuals, researchers, students, open-source projects, businesses under $1M.
  • Full engine
  • All models
  • Community support
  • All 19 backends
Get Started
Business
Custom
annual
Companies with $1M+ revenue using IX in production.
  • Commercial license
  • Priority SLA
  • Custom optimization
  • Hardware consulting
Contact
OEM
Custom
per unit
Hardware manufacturers embedding IX in products.
  • Binary redistribution
  • Custom backends
  • On-site integration
  • Co-branding
Contact
12

Neural Surgery

Extract, measure, and transplant components between AI models. Like organ transplants — for neural networks.

🔬

Scan

Analyze model architecture — layers, attention heads, FFN dimensions, expert topology. Non-invasive. Complete.

7 models scannable
✂️

Extract

Isolate individual layers, attention mechanisms, or expert networks. Clean cuts. Preserves signal integrity.

Precision: layer-level
🧬

Graft

Transplant components between compatible models. A reasoning layer from one, creativity from another. Chimeric intelligence.

Families: auto-detected
13

Model Forge

Build custom AI models from components. Select a base, choose precision, optimize for your hardware. No training required.

1

Select Base

Choose from 7+ GGUF models. Each pre-analyzed for organ compatibility.

2

Configure

Set quantization (Q2→FP32), precision strategy, expert selection. 23 formats supported.

3

Deploy

One binary. Your hardware. Adaptive precision matches model to silicon automatically.

14

Model Store

Pre-configured models for specific industries. Healthcare, agriculture, legal, finance. Deploy in seconds.

🏥

Healthcare

Medical diagnosis, drug interaction, radiology AI. Privacy-first. Runs locally.

Q2 2026
🌾

Agriculture

Crop disease detection, irrigation optimization, yield prediction. Edge-ready.

Q2 2026
⚖️

Legal

Contract analysis, compliance checking, case research. Your data stays yours.

Q2 2026
💰

Finance

Risk assessment, market analysis, regulatory compliance. Zero cloud dependency.

Q2 2026
🔧

Engineering

Code generation, CAD analysis, technical documentation. Runs on your workstation.

Q2 2026
🎓

Education

Tutoring, curriculum generation, assessment. Works offline. Perfect for schools.

Q2 2026


Start in 30 seconds.

Clone. Build. Run. No signup. No API key. No cloud.


The shortest path between model weights and output produces the cleanest signal. Every buffer removed, every conversion eliminated — that is Inference-X.

Built in Morocco for the world.