Beam AI LLM Setup for Synq Core

Date: 2026-05-07
System: synq-backups-20260507
Ollama URL: http://localhost:11434


Overview

This document describes the Beam AI LLM setup for the Synq Core runtime. All models are served via a single Ollama instance on localhost:11434.

Hardware Note: This system has no GPU. All inference runs on CPU via Ollama. Response times will be slow for large models. For production clinical use, deploy on the DGX Spark with CUDA support.
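
To confirm the server is reachable and see exactly which models it is serving, two standard Ollama endpoints are handy:

# Verify the Ollama server and list installed models
curl http://localhost:11434/api/version
curl http://localhost:11434/api/tags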


Installed Models

Core Beam AI Service Mesh (per Wiki v3.1)

| Service | Model Tag | Role | Status | Size |
|---------|-----------|------|--------|------|
| Triage | gemma4:2.3b | Patient routing | Alias → gemma3:4b | 3.3 GB |
| Messaging | medgemma | Clinical communication | Pulled from registry | 3.3 GB |
| Search | gemma4:26b | Staff-only research | Alias → gemma3:27b | 17 GB |
| Doctor Beam | gemma4:31b | Clinical decision support | Downloading | ~20-30 GB |
| Twin | weclone | Avatar/personality | Alias → gemma3:4b | 3.3 GB |
| AVA Voice | whisper | Voice interface | Alias → llama3.2-vision | 7.8 GB |

Additional Router Models (per router.rs)

| Model Tag | Role | Status | Size |
|-----------|------|--------|------|
| qwen2.5:14b | Chain / reasoning | Pulled | 9.0 GB |
| deepseek-r1:7b | Fast deep reasoning | Pulled | 4.7 GB |
| deepseek-r1:14b | Deep reasoning | Pulled | 9.0 GB |
| mxbai-embed-large | Embeddings / vector search | Pulled | 669 MB |
| huatuogpt-o1-7b | Patient-facing medical | Alias → medgemma | 3.3 GB |

Base Models Pulled from Registry

| Model Tag | Size | Notes |
|-----------|------|-------|
| gemma3:4b | 3.3 GB | Base for small aliases |
| gemma3:27b | 17 GB | Base for gemma4:26b alias |
| llama3.2-vision | 7.8 GB | Base for whisper alias |

Alias Models

Several model names requested in the wiki and codebase do not exist in the Ollama public registry. These have been created as alias models using Ollama Modelfiles:

gemma4:2.3b   → FROM gemma3:4b   + Triage system prompt
gemma4:9b     → FROM gemma3:4b   + Draft system prompt
gemma4:26b    → FROM gemma3:27b  + Search system prompt
doctor-beam   → FROM gemma4:31b  + Doctor Beam clinical prompt (pending download)
huatuogpt-o1-7b → FROM medgemma  + Patient assistant prompt
weclone       → FROM gemma3:4b   + Twin personality prompt
whisper       → FROM llama3.2-vision + AVA Voice prompt

Modelfiles are stored in synq-core-runtime/models/.
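
For illustration, an alias can be rebuilt from scratch with a heredoc; the SYSTEM prompt below is a stand-in, not the actual Triage prompt shipped in synq-core-runtime/models/:

# Sketch: recreate the Triage alias from its base model (placeholder prompt)
cat > Modelfile.gemma4-2.3b <<'EOF'
FROM gemma3:4b
SYSTEM """You are the Synq Triage assistant. Route the patient to the right service."""
EOF
ollama create gemma4:2.3b -f Modelfile.gemma4-2.3b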


Missing / Substituted Models

| Requested | Issue | Substitute |
|-----------|-------|------------|
| gemma4:2.3b | Not in Ollama registry | gemma3:4b alias |
| gemma4:26b | Not in Ollama registry | gemma3:27b alias |
| huatuogpt-o1-7b | Not in Ollama registry | medgemma alias |
| weclone | Custom proprietary LoRA | gemma3:4b placeholder |
| whisper | Not in Ollama registry | llama3.2-vision text fallback |

WeClone LoRA

The weclone model is a placeholder. To replace it with the actual WeClone weights (a verification sketch follows the steps):

  1. Export your WeClone LoRA to GGUF format
  2. Place the .gguf file in synq-core-runtime/models/
  3. Update models/Modelfile.weclone:
    FROM ./weclone-lora.gguf
    
  4. Run: ollama create weclone -f models/Modelfile.weclone
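
A quick way to verify the swap took effect (weclone-lora.gguf is the example file name from step 1):

# Confirm the alias now points at the GGUF rather than gemma3:4b
ollama show weclone --modelfile
ollama run weclone "Introduce yourself."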

Whisper / AVA Voice

The whisper alias is not true speech-to-text. For production voice input, run a real Whisper pipeline alongside Ollama:

# Install OpenAI Whisper
pip install openai-whisper

# Run inference
whisper audio.wav --model medium

The Ollama whisper model serves as a text-based voice assistant backend.
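
Until a real voice pipeline exists, the two pieces can be glued together by hand. A sketch only (audio.wav is a placeholder, and jq is used to build the JSON safely):

# Transcribe locally, then feed the text to the Ollama 'whisper' alias
whisper audio.wav --model medium --output_format txt --output_dir /tmp
PROMPT=$(cat /tmp/audio.txt)
curl http://localhost:11434/api/generate \
  -d "$(jq -n --arg p "$PROMPT" '{model:"whisper", prompt:$p, stream:false}')"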


Environment Configuration

The following variables in synq-core-runtime/.env control model selection:

SYNQ_OLLAMA_URL=http://localhost:11434
SYNQ_OLLAMA_TIMEOUT_SECS=30

SYNQ_LOCAL_INTENT_MODEL=gemma4:2.3b
SYNQ_LOCAL_CHAIN_MODEL=qwen2.5:14b
SYNQ_LOCAL_DRAFT_MODEL=gemma4:9b
SYNQ_LOCAL_EMBED_MODEL=mxbai-embed-large
SYNQ_LOCAL_PATIENT_MODEL=huatuogpt-o1-7b
SYNQ_LOCAL_NEWS_MODEL=deepseek-r1:7b
SYNQ_LOCAL_DEEP_MODEL=deepseek-r1:14b
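
A quick sanity check that every model named in .env is actually installed (the list is hardcoded here for simplicity):

# Flag any .env model missing from the local Ollama store
for m in gemma4:2.3b qwen2.5:14b gemma4:9b mxbai-embed-large \
         huatuogpt-o1-7b deepseek-r1:7b deepseek-r1:14b; do
  ollama show "$m" >/dev/null 2>&1 || echo "MISSING: $m"
done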

Quick Commands

# List all models
ollama list

# Test a model
curl http://localhost:11434/api/generate \
  -d '{"model":"gemma4:2.3b","prompt":"Hello","stream":false}'

# Pull a new model
ollama pull <model-name>

# Rebuild an alias model
cd synq-core-runtime/models
ollama create gemma4:2.3b -f Modelfile.gemma4-2.3b

# Run the full setup script
./scripts/setup-beam-models.sh
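
The embedding model goes through a different endpoint than generate; for example:

# Test the embedding model (returns a JSON vector)
curl http://localhost:11434/api/embeddings \
  -d '{"model":"mxbai-embed-large","prompt":"chest pain triage"}'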

Disk Usage

Current models: ~58 GB installed
After gemma4:31b completes: ~78-88 GB estimated
Free disk: ~1.6 TB
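
These figures can be re-checked at any time; the model store location depends on how Ollama was installed, so both common Linux paths are tried:

# Measure the Ollama model store and remaining disk space
du -sh /usr/share/ollama/.ollama/models 2>/dev/null || du -sh ~/.ollama/models
df -h /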


Ports & Service Mesh (Wiki Reference)

The wiki specifies dedicated ports for the DGX Spark deployment:

| Service | Port | Model |
|---------|------|-------|
| Triage | 8082 | Gemma 4 2.3B |
| Search | 8083 | Gemma 4 26B |
| Messaging | 8084 | MedGemma 4B |
| Doctor Beam | 8085 | Gemma 4 31B |
| AVA Voice | 8086 | Whisper + TTS |
| Twin | 8087 | WeClone LoRA |

This dev system uses a single Ollama instance on port 11434 that serves all of the models above, loading each on demand. For DGX Spark deployment, run a separate Ollama instance per port or put a reverse proxy (nginx/haproxy) in front to map each port to its model.
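
One way to realize the per-port layout without a proxy is one Ollama instance per service, since OLLAMA_HOST sets the bind address of ollama serve. A sketch only; the shared OLLAMA_MODELS path is an assumption to avoid duplicating model files:

# Dedicated Ollama instances per Beam service (ports from the table above)
OLLAMA_HOST=127.0.0.1:8082 OLLAMA_MODELS=/srv/ollama/models ollama serve &   # Triage
OLLAMA_HOST=127.0.0.1:8083 OLLAMA_MODELS=/srv/ollama/models ollama serve &   # Search
OLLAMA_HOST=127.0.0.1:8085 OLLAMA_MODELS=/srv/ollama/models ollama serve &   # Doctor Beam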


Troubleshooting

Ollama won't start

sudo systemctl status ollama
sudo systemctl restart ollama
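
If the restart does not help, the service journal usually shows why (assuming the standard systemd install):

# Inspect recent Ollama service logs
journalctl -u ollama -n 50 --no-pager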

Model download interrupted

# Ollama resumes automatically
ollama pull <model-name>

Out of memory on CPU

# Reduce context window or use smaller quantization
# Edit the Modelfile and add:
PARAMETER num_ctx 2048
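
A parameter change only takes effect after the model is rebuilt; for the Triage alias, for example:

# Rebuild the alias so the new num_ctx applies
ollama create gemma4:2.3b -f Modelfile.gemma4-2.3b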

Slow inference

Expected on CPU. For the 31B model, expect 1-2 tokens/second on this hardware. Use gemma3:4b or deepseek-r1:7b for faster responses during development.
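
Actual throughput can be measured from the generate API's timing fields, since eval_duration is reported in nanoseconds (requires jq):

# Rough tokens/second for a given model
curl -s http://localhost:11434/api/generate \
  -d '{"model":"gemma3:4b","prompt":"Count to ten.","stream":false}' \
  | jq '.eval_count / .eval_duration * 1e9'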


Known Issues

gemma4:31b / doctor-beam Empty Responses

Status: Model loads and runs, but returns empty content via API.

Symptoms:

  • eval_count shows tokens are being generated
  • response field is empty
  • done_reason is length

Root Cause: Ollama's built-in chat template for gemma4:31b may not be fully compatible with this model version. The model generates control tokens (<start_of_turn>) instead of content.

Workarounds:

  1. Use gemma3:27b or gemma4:26b for large-model tasks until fixed
  2. Try updating Ollama: curl -fsSL https://ollama.com/install.sh | sh
  3. Create a custom Modelfile with an explicit chat template:
    FROM gemma4:31b
    TEMPLATE """{{ .System }}
    {{ range .Messages }}<start_of_turn>{{ .Role }}
    {{ .Content }}<end_of_turn>
    {{ end }}<start_of_turn>model
    """
    

This is an upstream Ollama/Gemma 4 compatibility issue, not a Synq setup issue.