Building a Neuron.
The smallest possible Cosmonapse program - one LLM Neuron backed by Hugging Face, one Axon, one Dendrite, one TASK, one reply. It runs in a single process on an in-memory Synapse, with no broker to start. Read this first: every other example adds something on top of this shape, and using an LLM adds no boilerplate.
Python 3.11 or newer. httpx powers the HuggingFace Neuron source. Grab a token at huggingface.co/settings/tokens - read scope is enough.
# Python 3.11+. httpx powers the HuggingFace Neuron source.
$ pip install cosmonapse httpx

# Read scope is enough - the token grants access to the public
# Inference Providers router at https://router.huggingface.co.
$ export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
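The examples read the token with os.environ["HF_TOKEN"], which raises a bare KeyError when the variable is missing. A minimal sketch of a friendlier check follows; the helper name require_hf_token is hypothetical and is not part of Cosmonapse.

```python
import os


def require_hf_token(env=os.environ) -> str:
    """Return the Hugging Face token, or raise with a hint on how to get one.

    Hypothetical helper for illustration; Cosmonapse does not ship it.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Create a read-scope token at "
            "huggingface.co/settings/tokens and export it first."
        )
    return token


if __name__ == "__main__":
    try:
        require_hf_token()
        print("HF_TOKEN found")
    except RuntimeError as err:
        print(err)
```

Calling it once at startup, before any Neuron is built, turns a missing export into a readable message instead of a traceback.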
An LLM, behind the same interface.
A Neuron is anything that satisfies async fn(input, context) → output. Neuron(source="huggingface", ...) is the unified factory: it returns an async callable with that shape, wrapped around any OpenAI-compatible chat endpoint. Switch to source="ollama" or source="flask" and the rest of the program is unchanged.
# A Neuron is anything that satisfies the NeuronFn contract:
#     async fn(input, context) -> output
#
# The unified factory wraps any source behind that interface. Here it's
# a HuggingFace endpoint; it could equally be Ollama, a Flask app, or an
# MCP server - the Axon never knows the difference.
import os
from cosmonapse import Neuron

greeter = Neuron(
    source="huggingface",
    endpoint="https://router.huggingface.co",
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_key=os.environ["HF_TOKEN"],
    use_chat_api=True,
    max_new_tokens=128,
    temperature=0.7,
)

# Input the orchestrator sends: {"prompt": "..."} or {"messages": [...]}
# Output the Neuron returns:    {"response": "<text>", "meta": <raw>}
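Because the contract is just async fn(input, context) -> output, a hand-written function can stand in for the LLM, which is handy for offline tests. The sketch below mirrors the input and output shapes listed above. The function name echo_greeter is hypothetical, the OpenAI-style message dicts ({"role": ..., "content": ...}) are an assumption, and the real type of context is not shown in this guide, so None is passed here.

```python
import asyncio
from typing import Any


async def echo_greeter(input: dict[str, Any], context: Any) -> dict[str, Any]:
    """Hand-written stand-in for the LLM Neuron: same input and output shape."""
    if "prompt" in input:
        prompt = input["prompt"]
    else:
        # Assumption: messages are OpenAI-style dicts; use the last one's content.
        prompt = input["messages"][-1]["content"]
    text = f"Hello! You asked: {prompt}"
    # For the LLM Neuron "meta" holds the raw provider reply; here it is a stub.
    return {"response": text, "meta": {"source": "echo"}}


if __name__ == "__main__":
    out = asyncio.run(echo_greeter({"prompt": "Say hello."}, None))
    print(out["response"])
```

Since the shape matches, such a function could in principle be handed to an Axon in place of the greeter while developing without a token.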
Identity, capabilities, validation.
The Axon wraps the Neuron, gives it an addressable id on the bus, and turns return values into protocol-valid AGENT_OUTPUT Signals. It never touches the Synapse itself - that boundary is enforced in code, not convention. This snippet is identical whether the Neuron is an LLM, a function, or a Flask app.
# The Axon declares identity + capabilities and owns the Neuron.
# It doesn't know it's wrapping an LLM - this code is byte-for-byte the
# same as it would be for a hand-written async function.
from cosmonapse import Axon

axon = Axon(
    neuron_id="greeter",
    neuron_fn=greeter,
    capabilities=["text-generation", "chat", "greet"],
)
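To make the "return value in, AGENT_OUTPUT Signal out" idea concrete, here is a toy model of that wrapping step. It is a sketch for intuition only, not Cosmonapse's implementation: ToyAxon, ToySignal, ToySignalType, and the handle method are hypothetical names, and the Signal fields are assumptions apart from type and payload["output"], which the reply in this guide exposes.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Any, Awaitable, Callable

NeuronFn = Callable[[dict[str, Any], Any], Awaitable[dict[str, Any]]]


class ToySignalType(Enum):
    AGENT_OUTPUT = "AGENT_OUTPUT"


@dataclass(frozen=True)
class ToySignal:
    type: ToySignalType
    neuron_id: str
    trace_id: str
    payload: dict[str, Any]


class ToyAxon:
    """Hypothetical stand-in: owns a Neuron and wraps whatever it returns."""

    def __init__(self, neuron_id: str, neuron_fn: NeuronFn, capabilities: list[str]):
        self.neuron_id = neuron_id
        self.neuron_fn = neuron_fn
        self.capabilities = list(capabilities)

    async def handle(self, task_input: dict[str, Any], trace_id: str) -> ToySignal:
        # The toy Axon only calls the Neuron; it holds no reference to any bus.
        output = await self.neuron_fn(task_input, None)
        return ToySignal(
            ToySignalType.AGENT_OUTPUT, self.neuron_id, trace_id, {"output": output}
        )


async def shout(input: dict[str, Any], context: Any) -> dict[str, Any]:
    """Trivial Neuron used to exercise the wrapper."""
    return {"response": input["prompt"].upper(), "meta": {}}


if __name__ == "__main__":
    axon = ToyAxon("greeter", shout, ["greet"])
    signal = asyncio.run(axon.handle({"prompt": "hello"}, "trace-1"))
    print(signal.type.value, signal.payload["output"]["response"])
```

The point of the design is visible even in the toy: the wrapper has no bus handle at all, so transport concerns cannot leak into it.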
The only thing that touches the Synapse.
The Dendrite hosts Axons, emits REGISTER / HEARTBEAT / DEREGISTER on their behalf, routes inbound TASKs, and exposes the dispatch API. We build two - a role="worker" Dendrite that serves requests, and an orchestrator (default role) that sends them. Both share the same in-memory Synapse.
# The Dendrite is the only component that touches the Synapse.
# role="worker" is a protocol guard: workers can serve TASKs and bid,
# but cannot emit orchestration signals (TASK / FINAL / etc.).
from cosmonapse import Dendrite, MemorySynapse

synapse = MemorySynapse()  # in-process - no socket
await synapse.connect()

worker = Dendrite(
    synapse=synapse,
    namespace="demo",
    role="worker",
)
worker.attach_axon(axon)

orchestrator = Dendrite(synapse=synapse, namespace="demo")
One TASK, one Pathway, one reply.
dispatch_and_wait is sugar over a Pathway: emit a TASK on a new trace_id, open a Pathway scoped to that trace, await the first terminal Signal, close the Pathway, and return the Signal. The LLM Neuron returns {"response": "...", "meta": {...}}, so the answer lives at reply.payload["output"]["response"].
# dispatch_and_wait is sugar over a Pathway:
#   1. emit a TASK on this trace_id
#   2. open a Pathway scoped to the trace
#   3. await the first terminal Signal (AGENT_OUTPUT here)
#   4. close the Pathway, return the Signal
async with worker, orchestrator:
    reply = await orchestrator.dispatch_and_wait(
        neuron="greeter",
        input={"prompt": "Say hello to a project called Cosmonapse in one line."},
        timeout_s=30.0,
    )
    print(f"[{reply.type.value}] {reply.payload['output']['response']}")
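The four steps can be modelled with plain asyncio to show the control flow. This is a toy under stated assumptions, not how Cosmonapse implements it: ToyBus, toy_dispatch_and_wait, and the dict-shaped Signals are all hypothetical, and AGENT_OUTPUT is treated as the only terminal type.

```python
import asyncio
import uuid

TERMINAL_TYPES = {"AGENT_OUTPUT"}  # toy assumption: the only terminal type


class ToyBus:
    """Hypothetical in-memory bus; not Cosmonapse's MemorySynapse."""

    def __init__(self):
        self._handlers = {}    # neuron id -> async fn(input, context)
        self._pathways = {}    # trace_id -> asyncio.Queue of Signals
        self._running = set()  # strong refs so tasks are not garbage-collected

    def register(self, neuron_id, fn):
        self._handlers[neuron_id] = fn

    def open_pathway(self, trace_id):
        queue = asyncio.Queue()
        self._pathways[trace_id] = queue
        return queue

    def close_pathway(self, trace_id):
        self._pathways.pop(trace_id, None)

    def emit_task(self, neuron_id, trace_id, task_input):
        async def run():
            output = await self._handlers[neuron_id](task_input, None)
            queue = self._pathways.get(trace_id)
            if queue is not None:  # replies for closed pathways are dropped
                queue.put_nowait(
                    {"type": "AGENT_OUTPUT", "trace_id": trace_id,
                     "payload": {"output": output}}
                )

        # Scheduling only: the handler cannot start before the caller awaits.
        task = asyncio.create_task(run())
        self._running.add(task)
        task.add_done_callback(self._running.discard)


async def toy_dispatch_and_wait(bus, neuron, task_input, timeout_s):
    trace_id = uuid.uuid4().hex
    bus.emit_task(neuron, trace_id, task_input)  # 1. emit a TASK on a new trace_id
    inbox = bus.open_pathway(trace_id)           # 2. open a Pathway for that trace

    async def first_terminal():
        while True:
            signal = await inbox.get()
            if signal["type"] in TERMINAL_TYPES:
                return signal

    try:
        # 3. await the first terminal Signal, bounded by the timeout
        return await asyncio.wait_for(first_terminal(), timeout_s)
    finally:
        bus.close_pathway(trace_id)              # 4. close the Pathway


async def demo_neuron(input, context):
    return {"response": "hi " + input["prompt"], "meta": {}}


async def demo():
    bus = ToyBus()
    bus.register("greeter", demo_neuron)
    return await toy_dispatch_and_wait(bus, "greeter", {"prompt": "there"}, 1.0)


if __name__ == "__main__":
    print(asyncio.run(demo())["payload"]["output"]["response"])
```

Closing the Pathway in a finally block means a timeout or cancellation still releases it, which is the property that makes the one-call form safe to use.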
The whole program.
About 25 lines of real code, including the LLM. Save as main.py and run.
import asyncio
import os

from cosmonapse import Axon, Dendrite, MemorySynapse, Neuron

greeter = Neuron(
    source="huggingface",
    endpoint="https://router.huggingface.co",
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_key=os.environ["HF_TOKEN"],
    use_chat_api=True,
    max_new_tokens=128,
    temperature=0.7,
)


async def main():
    synapse = MemorySynapse()
    await synapse.connect()
    try:
        axon = Axon(
            neuron_id="greeter",
            neuron_fn=greeter,
            capabilities=["text-generation", "chat", "greet"],
        )
        worker = Dendrite(synapse=synapse, namespace="demo", role="worker")
        worker.attach_axon(axon)
        orchestrator = Dendrite(synapse=synapse, namespace="demo")

        async with worker, orchestrator:
            reply = await orchestrator.dispatch_and_wait(
                neuron="greeter",
                input={"prompt": "Say hello to a project called Cosmonapse in one line."},
                timeout_s=30.0,
            )
            print(f"[{reply.type.value}] {reply.payload['output']['response']}")
    finally:
        await synapse.close()


asyncio.run(main())
$ python main.py
[AGENT_OUTPUT] Hello, Cosmonapse! Welcome aboard - let's build something cool.
Exact text varies - the model is stochastic.
One line moves between providers.
The endpoint is the only HuggingFace-specific line. Point it at a dedicated HF endpoint, a local TGI / vLLM server, or LM Studio - the Neuron, Axon, and Dendrite code never changes. For Ollama, swap the source.
# The endpoint is the only HF-specific line. Point it elsewhere for any
# OpenAI-compatible chat server - your Neuron code never changes.
endpoint="https://router.huggingface.co"                        # default
endpoint="https://<your-endpoint>.endpoints.huggingface.cloud"  # dedicated HF endpoint
endpoint="http://localhost:8080"                                # local TGI / vLLM / LM Studio

# For Ollama, switch source - same Axon, same Dendrite.
greeter = Neuron(source="ollama", model="llama3")
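One way to make the provider switch a deployment setting rather than a code edit is to build the factory's keyword arguments from the environment. A minimal sketch, assuming hypothetical variable names NEURON_SOURCE, NEURON_ENDPOINT, and NEURON_MODEL (none of them are read by Cosmonapse itself); the keyword names are the ones used in the snippets above.

```python
import os


def neuron_settings(env=os.environ) -> dict:
    """Build keyword arguments for the Neuron factory from the environment.

    NEURON_SOURCE / NEURON_ENDPOINT / NEURON_MODEL are hypothetical names
    chosen for this sketch.
    """
    source = env.get("NEURON_SOURCE", "huggingface")
    if source == "ollama":
        # Mirrors the Ollama example: only source and model are given.
        return {"source": "ollama", "model": env.get("NEURON_MODEL", "llama3")}
    return {
        "source": "huggingface",
        "endpoint": env.get("NEURON_ENDPOINT", "https://router.huggingface.co"),
        "model": env.get("NEURON_MODEL", "meta-llama/Llama-3.1-8B-Instruct"),
        "api_key": env.get("HF_TOKEN", ""),
        "use_chat_api": True,
    }


if __name__ == "__main__":
    # Print everything except the token itself.
    print({k: v for k, v in neuron_settings().items() if k != "api_key"})
```

With the package installed, the result could be passed along as Neuron(**neuron_settings()), leaving the Axon and Dendrite code untouched.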
See the Signals animate in the browser.
cosmo doppler --prism opens a live, read-only visualization of every Signal on the bus - REGISTER, TASK, AGENT_OUTPUT - as the greeter answers.
# This example runs in-process on MemorySynapse, which Prism cannot
# attach to. To watch it live, run a dev synapse and point the code at it:

# terminal 1 - the bus
$ cosmo synapse start memory --namespace=demo

# terminal 2 - Prism, the live browser view (http://127.0.0.1:7071)
$ cosmo doppler --prism --url=cosmo://127.0.0.1:7070 -n demo

# in the code - swap one line:
# synapse = MemorySynapse()
synapse = await connect_synapse("cosmo://127.0.0.1:7070")

Integrating an Engram
Bind shared memory and call recall() / imprint() from inside the Neuron.
Pathway - three shapes
The full surface dispatch_and_wait is built on. Sequential, reactive, streaming.
Round-robin orchestrator
Split worker and orchestrator across processes, load-balance HuggingFace workers.