Hot Config

LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT.

thinkmelt@protonmail.com — Mon, 18 May 2026 03:14:59 GMT

Everyone is raving about the houseLLM revolution. Turboquant made large contexts possible, and now MTP (Multi-Token Prediction) has increased speeds considerably (and has been accepted into the main llama.cpp repository). But with all this local compute power, what was still missing was a good harness that lets your LLM code all night.

You prompt, it works, you come back, you prompt again, and you end up sitting there all night bolted to the output. What if you could set up 20 prompts, queue them, and have them execute sequentially, in order, even if one takes 5 minutes and the next takes an hour? You would also want a web GUI, because you don't want to hand-edit JSON back-ends.

LLM Queue Dispatcher (LLMQP) is designed to do exactly this. It is easy to run and already Dockerized, so just pull it and run it:

docker pull docker.io/cnmcdee/llmqueue:latest
docker run -d --name mcp-llmqueue --restart unless-stopped -p 0.0.0.0:5012:5012 cnmcdee/llmqueue:latest

Once it is running, it offers some powerful options. Let's go over them.

It binds to port 5012 and is designed to run locally, controlling and monitoring multiple local LLMs plus API LLMs at the same time. Point your browser at the host's address on port 5012:

http://192.168.1.:5012
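Everything the dashboard displays comes from `GET /api/status`, so the same information can be polled from a script. A minimal sketch that summarizes a status payload; the sample dict is made up, but its keys mirror what `api_status` builds in app.py below (summary lists plus one entry per client keyed as `"name (id)"`). The function name is hypothetical.

```python
from typing import Dict, List


def summarize_status(status: Dict) -> List[str]:
    """Return one line per client entry in a /api/status payload."""
    lines = []
    for key, value in status.items():
        # Skip the summary keys; per-client entries are dicts with a "requests" list
        if key in ("mcp_agents", "llm_servers", "scheduled_tasks") or not isinstance(value, dict):
            continue
        done = sum(1 for r in value.get("requests", []) if r.get("status") == "completed")
        lines.append(f"{key}: {value.get('active_requests', 0)} active, {done} completed")
    return lines


sample = {
    "mcp_agents": [], "llm_servers": [], "scheduled_tasks": [],
    "local-llama (a1b2c3d4)": {
        "active_requests": 1,
        "requests": [{"status": "monitoring"}, {"status": "completed"}],
    },
}
print(summarize_status(sample))  # ['local-llama (a1b2c3d4): 1 active, 1 completed']
```

Against a live instance, you would feed it `json.load(urllib.request.urlopen(...))` on the `/api/status` URL of your host.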

Set your MCP Agents

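It connects to each MCP server over HTTP, runs the MCP initialize and tools/list handshake (including servers that reply with SSE), and reads the tools each server offers.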
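A minimal sketch of the two JSON-RPC messages a scan sends (initialize, then tools/list) and of pulling the JSON payload out of an SSE-framed reply. The helper names are hypothetical; the payload fields mirror `scan_mcp_server` in app.py below. In the SSE format an event may carry several `data:` lines, which are joined with newlines, so the sketch handles that case too.

```python
import json


def mcp_endpoint(url: str) -> str:
    """Normalize a server address to its /mcp endpoint (hypothetical helper)."""
    url = url.rstrip("/")
    return url if url.endswith("/mcp") else url + "/mcp"


def initialize_payload(request_id: int = 99) -> dict:
    # Same fields app.py sends in scan_mcp_server
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "flask-mcp-scanner", "version": "1.0"},
        },
    }


def tools_list_payload(request_id: int = 1) -> dict:
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}


def parse_sse_json(sse_text: str) -> dict:
    """Join the data: lines of the first SSE event and decode them as JSON."""
    data_lines = []
    for line in sse_text.splitlines():
        if not line.strip():
            if data_lines:
                break  # a blank line ends the first event
            continue
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
    if not data_lines:
        raise ValueError("no data: field in SSE response")
    return json.loads("\n".join(data_lines))


reply = ('event: message\n'
         'data: {"jsonrpc": "2.0", "id": 1,\n'
         'data: "result": {"tools": [{"name": "example_tool"}]}}\n\n')
tools = parse_sse_json(reply)["result"]["tools"]
print([t["name"] for t in tools])  # ['example_tool']
```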

Set your LLM Stack

One or many, it's up to you.
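Servers can also be registered from a script instead of the web form. A minimal standard-library sketch, assuming the dashboard runs at `http://localhost:5012` and that your local server speaks the OpenAI-compatible API under a `/v1` base URL (as llama.cpp's server does); the hosts, names, key, and model names below are placeholders. The JSON fields match the `/api/llm_servers` POST handler in app.py below.

```python
import json
import urllib.request

DASHBOARD = "http://localhost:5012"  # assumption: where LLMQP is running


def register_llm_server_request(name: str, base_url: str, api_key: str = "",
                                default_model: str = "gpt-4o-mini") -> urllib.request.Request:
    """Build (but do not send) a POST to /api/llm_servers."""
    body = json.dumps({
        "name": name,
        "base_url": base_url,  # the client appends /chat/completions to this
        "api_key": api_key,
        "default_model": default_model,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{DASHBOARD}/api/llm_servers",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


stack = [
    register_llm_server_request("local-llama", "http://192.168.1.50:8080/v1", default_model="local-model"),
    register_llm_server_request("openai", "https://api.openai.com/v1", api_key="sk-...", default_model="gpt-4o-mini"),
]
# To actually register them:
# for req in stack:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp)["status"])
```

Building the requests separately from sending them keeps the payload easy to inspect before anything touches the running dispatcher.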

Done-Walk Prompting

You can queue as many prompts as you like. It builds each prompt and dispatches it to your LLMs, monitoring the output as it goes.

It shows you each prompt dispatch.

You can monitor the prompts while they work.
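The two distribution modes in app.py differ mainly in how prompts are paired with targets: `ALL_GET_ALL` sends every prompt to every selected target, while the `DONE_WALK` branch of `AsyncMCPAgentManager.distribute_prompts` assigns prompts round-robin (`agent_ids[i % len(agent_ids)]`) and waits for each to finish before sending the next. A minimal sketch of just the pairing logic; the function name is hypothetical.

```python
from typing import List, Tuple


def plan_dispatch(prompts: List[str], targets: List[str], mode: str = "ALL_GET_ALL") -> List[Tuple[int, str]]:
    """Return (prompt_index, target_id) pairs in the order they would be sent."""
    if not targets:
        return []
    if mode == "DONE_WALK":
        # One target per prompt, rotating through the selected targets
        return [(i, targets[i % len(targets)]) for i in range(len(prompts))]
    # ALL_GET_ALL: every prompt goes to every target
    return [(i, t) for i in range(len(prompts)) for t in targets]


print(plan_dispatch(["a", "b", "c"], ["srv1", "srv2"], "DONE_WALK"))
# [(0, 'srv1'), (1, 'srv2'), (2, 'srv1')]
print(len(plan_dispatch(["a", "b", "c"], ["srv1", "srv2"])))  # 6
```

The empty-target guard is an addition in this sketch; the original round-robin expression would divide by zero if no target were selected.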

Understanding DONE-WALK.

Each prompt is tagged with a unique 10-character serial. The software monitors the output for this serial and, as soon as it appears, automatically starts the next prompt for you. Simply select that prompt type and watch it work!

Done-Walk will walk your prompts one by one.
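The serial idea can be sketched in a few lines. This is an illustration under assumptions, not LLMQP's own implementation: each prompt gets a random 10-character serial plus an instruction to print it when finished, and the walker starts the next prompt only once that serial appears in the output. `fake_llm` stands in for a real streaming model, and the instruction wording and serial alphabet are hypothetical.

```python
import asyncio
import secrets
import string
from typing import AsyncIterator, Callable, List

ALPHABET = string.ascii_uppercase + string.digits


def new_serial(length: int = 10) -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def tag_prompt(prompt: str, serial: str) -> str:
    # Hypothetical wording for the completion instruction
    return f"{prompt}\n\nWhen you are completely finished, print this serial on its own line: {serial}"


async def fake_llm(prompt: str) -> AsyncIterator[str]:
    """Stand-in model: streams a reply, then echoes the serial split across two chunks."""
    serial = prompt.rsplit(" ", 1)[-1]
    for piece in ["Working", "...", " done.\n", serial[:4], serial[4:]]:
        await asyncio.sleep(0)
        yield piece


async def done_walk(prompts: List[str], llm: Callable[[str], AsyncIterator[str]]) -> List[str]:
    outputs = []
    for prompt in prompts:
        serial = new_serial()
        text = ""
        async for piece in llm(tag_prompt(prompt, serial)):
            text += piece
            if serial in text:  # serial seen: this prompt is done
                break
        outputs.append(text)  # the next loop iteration starts the next prompt
    return outputs


results = asyncio.run(done_walk(["Write tests", "Refactor module"], fake_llm))
print(len(results))  # 2
```

Searching the accumulated text rather than each chunk matters: a real model usually emits a serial as several tokens, so it rarely arrives in a single streamed piece.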

Code Drop. A POWERFUL MCP Agent

If you want your LLM to work across contexts and pick up where it left off, add this agent. You literally tell your LLM, 'Use the Process Manager to create a task to save your work point.' Then, in the next prompt, another LLM (or the next context) can pick up where it left off: 'Using the Process Manager, load the following task and keep working on it!'
Fully OpenSource!
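Combined with Done-Walk, this handoff can be baked into the queue itself. A minimal sketch that wraps a list of steps so each prompt resumes from, and saves back to, the same task; the function name, the task name, and the exact instruction wording are hypothetical, and what actually works depends on the tools your Process Manager agent exposes.

```python
from typing import List


def chain_with_checkpoints(steps: List[str], task_name: str) -> List[str]:
    """Wrap queued prompts so each one resumes from, and saves to, a Process Manager task."""
    queue = []
    for i, step in enumerate(steps):
        parts = []
        if i > 0:
            # Every prompt after the first starts by reloading the saved work point
            parts.append(f"Using the Process Manager, load the task '{task_name}' and keep working on it.")
        parts.append(step)
        parts.append(f"Use the Process Manager to save your work point to the task '{task_name}'.")
        queue.append("\n\n".join(parts))
    return queue


q = chain_with_checkpoints(["Build the parser", "Add tests"], "parser-work")
print(q[1].splitlines()[0])
```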

Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop.

OpenSource

This is fully OPEN SOURCED!!
Create an app.py and put the following in it. You will need a venv with flask, aiohttp, and requests installed (pip install flask aiohttp requests).

import asyncio
import json
import os
import pickle
import threading
import time
import uuid
from typing import Any, Dict, List

import aiohttp
import requests
from flask import Flask, jsonify, render_template, request

# ====================== CORE LLM CLIENT ======================
class AsyncLLMClient:
    def __init__(self, api_key: str, base_url: str = "https://api.openai.com/v1"):
        self.api_key = api_key
        self.base_url = base_url
        self.requests: Dict[str, Any] = {}
        self.monitor_tasks = []
        self._session = None
    async def __aenter__(self):
        self._session = aiohttp.ClientSession()
        return self
    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self._session:
            await self._session.close()
        for task in self.monitor_tasks:
            task.cancel()
        await asyncio.gather(*self.monitor_tasks, return_exceptions=True)
    async def send_request(self, prompt: str, **kwargs):
        request_id = f"req_{len(self.requests)}"
        request = {
            "prompt": prompt,
            "model": kwargs.get("model", "gpt-4o-mini"),
            "max_tokens": kwargs.get("max_tokens", 1024),
            "temperature": kwargs.get("temperature", 0.7),
            "response_chunks": [],
            "total_bytes": 0,
            "estimated_tokens": 0,
            "status": "idle",
            "error": None,
            "start_time": None,
            "end_time": None,
            "full_response": "",
            "request_id": request_id
        }
        self.requests[request_id] = request
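        self.monitor_tasks.append(asyncio.create_task(self._monitor_bytes(request)))
        # Keep a reference to the request task too, so it is not garbage-collected
        # mid-stream and gets cancelled in __aexit__ along with the monitor
        self.monitor_tasks.append(asyncio.create_task(self._execute_request(request)))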
        return request
    async def _monitor_bytes(self, request):
        request["status"] = "monitoring"
        request["start_time"] = time.time()
        last = 0
        try:
            while request["status"] == "monitoring":
                if len(request["response_chunks"]) > last:
                    new_data = b''.join(request["response_chunks"][last:])
                    request["total_bytes"] += len(new_data)
                    last = len(request["response_chunks"])
                    # Rough token estimate: full_response holds the raw streamed SSE text
                    # (JSON framing included), not just generated text, so the character
                    # count is divided by 150 rather than the ~3 chars/token of plain code output
                    request["estimated_tokens"] = round(len(request["full_response"]) / 150)
                await asyncio.sleep(0.1)
        except asyncio.CancelledError:
            pass
        except Exception as e:
            request["status"] = "error"
            request["error"] = str(e)
        finally:
            request["end_time"] = time.time()
            if request["status"] != "error":
                request["status"] = "completed"
    async def _execute_request(self, request):
        url = f"{self.base_url}/chat/completions"
        headers = {"Authorization": f"Bearer {self.api_key}", "Content-Type": "application/json"}
        payload = {
            "model": request["model"],
            "messages": [{"role": "user", "content": request["prompt"]}],
            "max_tokens": request["max_tokens"],
            "temperature": request["temperature"],
            "stream": True
        }
        try:
            async with self._session.post(url, headers=headers, json=payload) as resp:
                if resp.status != 200:
                    request["status"] = "error"
                    request["error"] = await resp.text()
                    return
                async for chunk in resp.content.iter_chunked(1024):
                    if request["status"] == "error": break
                    request["response_chunks"].append(chunk)
                    request["full_response"] += chunk.decode('utf-8', errors='replace')
                request["status"] = "completed"
        except Exception as e:
            request["status"] = "error"
            request["error"] = str(e)
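    async def _wait_for_completion(self, request):
        # "idle" covers the window before _monitor_bytes has started running;
        # without it, DONE_WALK would return immediately after send_request
        while request["status"] in ("idle", "monitoring"):
            await asyncio.sleep(0.1)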
# ====================== BASE MANAGER ======================
class BaseManager:
    def __init__(self, state_file: str):
        self.state_file = state_file
        self.items: List[Dict] = []
        self._save_task = None
    def _get_serializable(self):
        return [{k: v for k, v in item.items() if k != "client"} for item in self.items]
    def save_state(self):
        try:
            with open(self.state_file, "wb") as f:
                pickle.dump(self._get_serializable(), f)
            print(f"State saved: {self.state_file}")
        except Exception as e:
            print(f"Save failed: {e}")
    def load_state(self):
        if os.path.exists(self.state_file):
            try:
                with open(self.state_file, "rb") as f:
                    self.items = pickle.load(f)
                print(f"Loaded {len(self.items)} items from {self.state_file}")
            except Exception:
                self.items = []
    def _save_after_change(self):
        self.save_state()
# ====================== MCP AGENT MANAGER ======================
class AsyncMCPAgentManager(BaseManager):
    def __init__(self):
        super().__init__("mcp_agents_state.pkl")
        self.scheduled_tasks = []
        self.load_state()
    async def __aenter__(self):
        for agent in self.items:
            if not agent.get("client"):
                client = AsyncLLMClient(agent["api_key"], agent["base_url"])
                agent["client"] = client
                await client.__aenter__()
        if self._save_task is None:
            self._save_task = asyncio.create_task(self._autosave_loop())
        return self
    async def __aexit__(self, *args):
        if self._save_task:
            self._save_task.cancel()
        self.save_state()
        for agent in self.items:
            if agent.get("client"):
                await agent["client"].__aexit__(None, None, None)
    async def _autosave_loop(self):
        while True:
            await asyncio.sleep(30)
            self.save_state()
    def list_mcp_agents(self):
        result = []
        for agent in self.items:
            client = agent.get("client")
            active = sum(1 for r in (client.requests.values() if client else {}) if r.get("status") == "monitoring")
            result.append({
                "agent_id": agent["agent_id"],
                "name": agent["name"],
                "description": agent.get("description", ""),
                "base_url": agent["base_url"],
                "enabled": agent.get("enabled", True),
                "enabled_tools": agent.get("enabled_tools", []),
                "active_requests": active,
                "total_requests": len(client.requests) if client else 0
            })
        return result
    async def create_mcp_agent(self, name, base_url, api_key, description="", enabled_tools=None, enabled=True):
        agent_id = str(uuid.uuid4())[:8]
        client = AsyncLLMClient(api_key, base_url)
        agent = {"agent_id": agent_id, "name": name, "description": description,
                 "base_url": base_url, "api_key": api_key, "enabled": enabled,
                 "enabled_tools": enabled_tools or [], "client": client}
        self.items.append(agent)
        await client.__aenter__()
        self._save_after_change()
        return agent
    async def update_mcp_agent(self, agent_id, **kwargs):
        agent = next((a for a in self.items if a["agent_id"] == agent_id), None)
        if not agent: return None
        for key in ("name", "description", "enabled", "enabled_tools"):
            if key in kwargs and kwargs[key] is not None:
                agent[key] = kwargs[key]
        if "base_url" in kwargs or "api_key" in kwargs:
            if agent.get("client"):
                await agent["client"].__aexit__(None, None, None)
            new_client = AsyncLLMClient(kwargs.get("api_key", agent["api_key"]),
                                        kwargs.get("base_url", agent["base_url"]))
            agent["base_url"] = kwargs.get("base_url", agent["base_url"])
            agent["api_key"] = kwargs.get("api_key", agent["api_key"])
            agent["client"] = new_client
            await new_client.__aenter__()
        self._save_after_change()
        return agent
    async def delete_mcp_agent(self, agent_id):
        for i, agent in enumerate(self.items):
            if agent["agent_id"] == agent_id:
                if agent.get("client"):
                    await agent["client"].__aexit__(None, None, None)
                del self.items[i]
                self._save_after_change()
                return True
        return False
    async def send_request(self, agent_id, prompt, **kwargs):
        agent = next((a for a in self.items if a["agent_id"] == agent_id), None)
        if not agent or not agent.get("enabled", False):
            raise ValueError(f"MCP Agent {agent_id} not found or disabled")
        return await agent["client"].send_request(prompt, **kwargs)
    async def distribute_prompts(self, prompts, agent_ids, mode="ALL_GET_ALL", **kwargs):
        results = []
        max_tokens = kwargs.get("max_tokens", 1024)
        temperature = kwargs.get("temperature", 0.7)
        model = kwargs.get("model", "gpt-4o-mini")

        if mode == "DONE_WALK":
            for i, prompt in enumerate(prompts):
                agent_id = agent_ids[i % len(agent_ids)]
                try:
                    req = await self.send_request(agent_id, prompt, max_tokens=max_tokens,
                                                  temperature=temperature, model=model)
                    agent = next((a for a in self.items if a["agent_id"] == agent_id), None)
                    if agent and agent.get("client"):
                        await agent["client"]._wait_for_completion(req)
                    results.append({"prompt_index": i, "agent_id": agent_id, "status": "completed"})
                except Exception as e:
                    results.append({"prompt_index": i, "agent_id": agent_id, "error": str(e)})
            return {"status": "success", "mode": "DONE_WALK", "details": results}

        # ALL_GET_ALL
        for prompt in prompts:
            for aid in agent_ids:
                try:
                    await self.send_request(aid, prompt, max_tokens=max_tokens,
                                            temperature=temperature, model=model)
                    results.append(f"Sent to {aid}")
                except Exception as e:
                    results.append(f"Error: {e}")
        return {"status": "success", "details": results}
    def decode_mcp_tools_list(self, json_data):
        """
        Decodes an MCP tools/list JSON-RPC response and returns clean structured data.
        """
        # Parse if input is a string
        if isinstance(json_data, str):
            try:
                data = json.loads(json_data)
            except json.JSONDecodeError as e:
                return {"success": False, "error": "Invalid JSON: " + str(e)}
        else:
            data = json_data
        tools = data.get("result", {}).get("tools", [])
        if not tools:
            return {"success": False, "error": "No tools found in the response."}
        decoded_tools = []
        def parse_input_schema(schema):
            props = schema.get("properties", {})
            required = set(schema.get("required", []))
            params = []
            for name, info in props.items():
                params.append({
                    "name": name,
                    "type": info.get("type", "any"),
                    "required": name in required,
                    "default": info.get("default")
                })
            return params
        for tool in tools:
            name = tool.get("name", "Unnamed")
            description = tool.get("description", "No description provided.")
            # Input parameters
            input_schema = tool.get("inputSchema", {})
            parameters = parse_input_schema(input_schema)
            # Output type
            output_schema = tool.get("outputSchema", {})
            output_type = output_schema.get("properties", {}) \
                .get("result", {}).get("type", "unknown")
            decoded_tools.append({
                "name": name,
                "description": description,
                "parameters": parameters,
                "output_type": output_type
            })
        return {
            "success": True,
            "tool_count": len(decoded_tools),
            "tools": decoded_tools
        }
    def parse_sse_mcp_response(self, sse_string):
        """
        Extracts the JSON payload from an SSE response (e.g. "event: message\ndata: {...}").
        """
        lines = [line.strip() for line in sse_string.strip().split("\n")]
        json_str = None
        for line in lines:
            if line.startswith("data:"):
                json_str = line[5:].strip()
                break
        if not json_str:
            return {"success": False, "error": "No 'data:' field found in SSE response."}
        try:
            return json.loads(json_str)
        except json.JSONDecodeError as e:
            return {"success": False, "error": "Invalid JSON in SSE data: " + str(e)}
    def scan_mcp_server(self, url, api_key=None):
        """
        Connects to an MCP server, performs initialize + tools/list,
        handles SSE responses, decodes the result, and returns clean structured data.
        """
        url = url.rstrip("/")
        if not url.endswith("/mcp"):
            url += "/mcp"
        headers = {
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream"
        }
        if api_key:
            headers["Authorization"] = f"Bearer {api_key}"

        # Step 1: Initialize session
        init_payload = {
            "jsonrpc": "2.0",
            "id": 99,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {"name": "flask-mcp-scanner", "version": "1.0"}
            }
        }
        init_response = requests.post(url, headers=headers, json=init_payload, timeout=15)
        if init_response.status_code != 200:
            return {"success": False, "error": f"Initialize failed: HTTP {init_response.status_code}"}

        server_name = ""
        try:
            init_text = init_response.text.strip()
            if "event:" in init_text:
                # SSE format
                json_rpc = self.parse_sse_mcp_response(init_text)
            else:
                json_rpc = init_response.json()

            if isinstance(json_rpc, dict):
                result = json_rpc.get("result", {})
                server_name = result.get("serverInfo", {}).get("name", "") or \
                              result.get("name", "")
        except Exception:
            pass  # fallback to empty name (will use URL hostname later)

        session_id = init_response.headers.get("Mcp-Session-Id") or \
                     init_response.headers.get("mcp-session-id")
        if not session_id:
            return {"success": False, "error": "No Mcp-Session-Id received from server."}

        # Step 2: Get tools list
        tools_headers = headers.copy()
        tools_headers["Mcp-Session-Id"] = session_id
        tools_payload = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "tools/list",
            "params": {}
        }
        tools_response = requests.post(url, headers=tools_headers, json=tools_payload, timeout=15)
        if tools_response.status_code != 200:
            return {"success": False, "error": f"Tools/list failed: HTTP {tools_response.status_code}"}

        # Step 3: Handle SSE or normal JSON
        raw_text = tools_response.text
        json_rpc = self.parse_sse_mcp_response(raw_text) if "event:" in raw_text else tools_response.json()

        if isinstance(json_rpc, dict) and "success" in json_rpc and not json_rpc.get("success"):
            return json_rpc

        # Step 4: Decode and return clean result
        retset =  self.decode_mcp_tools_list(json_rpc)

        if isinstance(retset, dict) and retset.get("success"):
            retset["server_name"] = server_name
        return retset


# ====================== LLM SERVER MANAGER ======================
class AsyncLLMServerManager(BaseManager):
    def __init__(self):
        super().__init__("llm_servers_state.pkl")
        self.load_state()
    async def __aenter__(self):
        for server in self.items:
            if not server.get("client"):
                client = AsyncLLMClient(server["api_key"], server["base_url"])
                server["client"] = client
                await client.__aenter__()
        if self._save_task is None:
            self._save_task = asyncio.create_task(self._autosave_loop())
        return self
    async def __aexit__(self, *args):
        if self._save_task:
            self._save_task.cancel()
        self.save_state()
        for server in self.items:
            if server.get("client"):
                await server["client"].__aexit__(None, None, None)
    async def _autosave_loop(self):
        while True:
            await asyncio.sleep(30)
            self.save_state()
    def list_llm_servers(self):
        result = []
        for server in self.items:
            client = server.get("client")
            active = sum(1 for r in (client.requests.values() if client else {}) if r.get("status") == "monitoring")
            result.append({
                "server_id": server.get("server_id"),
                "name": server["name"],
                "base_url": server["base_url"],
                "default_model": server.get("default_model", "gpt-4o-mini"),
                "active_requests": active
            })
        return result
    async def update_llm_server(self, server_id, **kwargs):
        server = next((s for s in self.items if s["server_id"] == server_id), None)
        if not server:
            return None
        for key in ("name", "default_model"):
            if key in kwargs and kwargs[key] is not None:
                server[key] = kwargs[key]
        if "base_url" in kwargs or "api_key" in kwargs:
            if server.get("client"):
                await server["client"].__aexit__(None, None, None)
            new_client = AsyncLLMClient(kwargs.get("api_key", server["api_key"]),
                                        kwargs.get("base_url", server["base_url"]))
            server["base_url"] = kwargs.get("base_url", server["base_url"])
            server["api_key"] = kwargs.get("api_key", server["api_key"])
            server["client"] = new_client
            await new_client.__aenter__()
        self._save_after_change()
        return server
    async def delete_llm_server(self, server_id):
        for i, server in enumerate(self.items):
            if server["server_id"] == server_id:
                if server.get("client"):
                    await server["client"].__aexit__(None, None, None)
                del self.items[i]
                self._save_after_change()
                return True
        return False
    async def create_llm_server(self, name, base_url, api_key, default_model="gpt-4o-mini", description=""):
        server_id = str(uuid.uuid4())[:8]
        client = AsyncLLMClient(api_key, base_url)
        server = {"server_id": server_id, "name": name, "description": description,
                  "base_url": base_url, "api_key": api_key, "default_model": default_model, "client": client}
        self.items.append(server)
        await client.__aenter__()
        self._save_after_change()
        return server
    async def distribute_prompts(self, prompts, server_ids, mcp_agent_ids=None, mcp_items=None, mode="ALL_GET_ALL",
                                 **kwargs):
        results = []
        max_tokens = kwargs.get("max_tokens", 1024)
        temperature = kwargs.get("temperature", 0.7)
        # None means "use each server's own default_model"
        model = kwargs.get("model")
        # Note: `mode` is accepted but not used here; every prompt goes to every selected server.

        tool_context = ""
        if mcp_agent_ids and mcp_items:
            tool_names = []
            for agent in mcp_items:
                if agent.get("agent_id") in mcp_agent_ids:
                    tool_names.extend(agent.get("enabled_tools", []))
            if tool_names:
                tool_context = f"You have access to the following MCP agentic tools: {', '.join(tool_names)}.\nUse them when appropriate to solve the task.\n\n"

        for prompt in prompts:
            augmented_prompt = tool_context + prompt
            for sid in server_ids:
                try:
                    server = next((s for s in self.items if s["server_id"] == sid), None)
                    if not server or not server.get("client"):
                        continue
                    await server["client"].send_request(augmented_prompt,
                                                        max_tokens=max_tokens,
                                                        temperature=temperature,
                                                        model=model or server.get("default_model", "gpt-4o-mini"))
                    results.append(f"Sent to LLM server {sid}")
                except Exception as e:
                    results.append(f"Error sending to server {sid}: {e}")
        return {"status": "success", "details": results}
# ====================== MAIN MANAGER ======================
class AsyncMultiClientManager:
    def __init__(self):
        self.mcp_manager = AsyncMCPAgentManager()
        self.llm_manager = AsyncLLMServerManager()

    async def __aenter__(self):
        await self.mcp_manager.__aenter__()
        await self.llm_manager.__aenter__()
        return self

    async def __aexit__(self, *args):
        await self.mcp_manager.__aexit__(*args)
        await self.llm_manager.__aexit__(*args)
# ====================== FLASK APPLICATION ======================
async def main():
    manager = AsyncMultiClientManager()
    async with manager:
        app = Flask(__name__, template_folder='templates')
        app.config['SEND_FILE_MAX_AGE_DEFAULT'] = 0
        app.manager = manager
        app.loop = asyncio.get_running_loop()

        @app.route('/')
        def dashboard():
            return render_template('dashboard.html')

        @app.route('/api/status')
        def api_status():
            data = {
                "mcp_agents": manager.mcp_manager.list_mcp_agents(),
                "llm_servers": manager.llm_manager.list_llm_servers(),
                "scheduled_tasks": manager.mcp_manager.scheduled_tasks
            }

            # Active requests per client (MCP Agents + LLM Servers) – full prompt + live response
            for agent in manager.mcp_manager.items:
                key = f"{agent['name']} ({agent['agent_id']})"
                client = agent.get("client")
                if client:
                    # Snapshot the dict: the asyncio loop may add requests while this Flask thread iterates
                    reqs = list(client.requests.items())
                    data[key] = {
                        "active_requests": sum(1 for _, r in reqs if r["status"] == "monitoring"),
                        "requests": [{
                            "request_id": rid,
                            "prompt_preview": r.get("prompt", ""),
                            "full_response": r.get("full_response", ""),
                            "status": r["status"],
                            "total_bytes": r["total_bytes"],
                            "estimated_tokens": r.get("estimated_tokens", 0),
                            "duration": round((r.get("end_time") or time.time()) - (r.get("start_time") or time.time()), 2)
                        } for rid, r in reqs]
                    }

            for server in manager.llm_manager.items:
                key = f"{server['name']} ({server['server_id']})"
                client = server.get("client")
                if client:
                    # Snapshot the dict: the asyncio loop may add requests while this Flask thread iterates
                    reqs = list(client.requests.items())
                    data[key] = {
                        "active_requests": sum(1 for _, r in reqs if r["status"] == "monitoring"),
                        "requests": [{
                            "request_id": rid,
                            "prompt_preview": r.get("prompt", ""),
                            "full_response": r.get("full_response", ""),
                            "status": r["status"],
                            "total_bytes": r["total_bytes"],
                            "estimated_tokens": r.get("estimated_tokens", 0),
                            "duration": round((r.get("end_time") or time.time()) - (r.get("start_time") or time.time()), 2)
                        } for rid, r in reqs]
                    }

            return jsonify(data)

        @app.route('/api/mcp_agents', methods=['GET', 'POST'])
        def api_mcp_agents():
            if request.method == 'GET':
                return jsonify({"mcp_agents": manager.mcp_manager.list_mcp_agents()})
            data = request.get_json() or {}
            future = asyncio.run_coroutine_threadsafe(
                manager.mcp_manager.create_mcp_agent(
                    name=data.get("name"),
                    base_url=data.get("base_url"),
                    api_key=data.get("api_key", ""),
                    description=data.get("description", ""),
                    enabled_tools=data.get("enabled_tools")
                ), app.loop)
            agent = future.result()
            return jsonify({"status": "created", "agent": manager.mcp_manager._get_serializable()[-1]}), 201

        @app.route('/api/mcp_agents/<agent_id>', methods=['PUT', 'DELETE'])
        def api_mcp_agent(agent_id):
            if request.method == 'DELETE':
                success = asyncio.run_coroutine_threadsafe(
                    manager.mcp_manager.delete_mcp_agent(agent_id), app.loop).result()
                return jsonify({"status": "deleted" if success else "failed"})

            # === PUT: Update existing agent ===
            data = request.get_json() or {}
            future = asyncio.run_coroutine_threadsafe(
                manager.mcp_manager.update_mcp_agent(agent_id, **data), app.loop)
            updated_agent = future.result()

            if updated_agent is None:
                return jsonify({"status": "failed", "error": "Agent not found"}), 404

            # Return ONLY serializable data (matches list_mcp_agents format)
            return jsonify({
                "status": "updated",
                "agent": {
                    "agent_id": updated_agent["agent_id"],
                    "name": updated_agent["name"],
                    "description": updated_agent.get("description", ""),
                    "base_url": updated_agent["base_url"],
                    "enabled": updated_agent.get("enabled", True),
                    "enabled_tools": updated_agent.get("enabled_tools", []),
                    "active_requests": 0,  # will be recalculated on next dashboard refresh
                    "total_requests": 0
                }
            })

        @app.route('/api/llm_servers', methods=['GET', 'POST'])
        def api_llm_servers():
            if request.method == 'GET':
                return jsonify({"llm_servers": manager.llm_manager.list_llm_servers()})
            data = request.get_json() or {}
            future = asyncio.run_coroutine_threadsafe(
                manager.llm_manager.create_llm_server(
                    name=data.get("name"),
                    base_url=data.get("base_url"),
                    api_key=data.get("api_key", ""),
                    default_model=data.get("default_model", "gpt-4o-mini")
                ), app.loop)
            server = future.result()
            return jsonify({"status": "created", "server": manager.llm_manager._get_serializable()[-1]}), 201

        @app.route('/api/llm_servers/<server_id>', methods=['PUT', 'DELETE'])
        def api_llm_server(server_id):
            if request.method == 'DELETE':
                success = asyncio.run_coroutine_threadsafe(
                    manager.llm_manager.delete_llm_server(server_id), app.loop).result()
                return jsonify({"status": "deleted" if success else "failed"})
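            data = request.get_json() or {}
            # Wait for the update to finish before answering, and report unknown IDs
            updated_server = asyncio.run_coroutine_threadsafe(
                manager.llm_manager.update_llm_server(server_id, **data), app.loop).result()
            if updated_server is None:
                return jsonify({"status": "failed", "error": "Server not found"}), 404
            return jsonify({"status": "updated"})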

        @app.route('/api/distribute_prompts', methods=['POST'])
        def api_distribute_prompts():
            data = request.get_json() or {}
            future = asyncio.run_coroutine_threadsafe(
                manager.llm_manager.distribute_prompts(
                    prompts=data.get("prompts", []),
                    server_ids=data.get("selected_servers", []),
                    mcp_agent_ids=data.get("selected_agents", []),
                    mcp_items=manager.mcp_manager.items,
                    mode=data.get("distribution_mode", "ALL_GET_ALL"),
                    max_tokens=data.get("max_tokens", 256000)
                ), app.loop)
            result = future.result()
            return jsonify(result)

        @app.route('/api/mcp_scan', methods=['POST'])
        def api_mcp_scan():
            data = request.get_json() or {}
            http_address = data.get("http_address")
            api_key = data.get("api_key")
            if not http_address:
                return jsonify({"success": False, "error": "Missing http_address"}), 400

            # Call directly - no asyncio wrapper needed for this synchronous method
            result = manager.mcp_manager.scan_mcp_server(http_address, api_key=api_key)
            return jsonify(result)

        def run_flask():
            app.run(host="0.0.0.0", port=5012, debug=False, use_reloader=False)

        threading.Thread(target=run_flask, daemon=True).start()
        print("✅ Flask dashboard started → http://localhost:5012")
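        # Keep main() and its event loop alive indefinitely so the daemon Flask thread keeps serving
        # (a fixed sleep(3600) would let the whole process exit after one hour)
        await asyncio.Event().wait()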


if __name__ == "__main__":
    asyncio.run(main())
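The dashboard is not the only way to queue work: api_distribute_prompts reads a JSON body with prompts, selected_servers, selected_agents, distribution_mode and max_tokens. Below is a minimal client-side sketch that splits text on ----- lines (the delimiter the dashboard's prompt box asks for) and builds the POST request with only the standard library. The helper names are hypothetical, the exact splitting the dashboard performs is an assumption, only the ALL_GET_ALL mode string appears in the code above, and the server IDs must be whatever your /api/llm_servers listing returns.

```python
import json
import re
import urllib.request

# Hypothetical client helpers; the JSON field names mirror what api_distribute_prompts reads.
DELIMITER = re.compile(r"^\s*-----\s*$", re.MULTILINE)


def split_prompts(text: str) -> list[str]:
    """Split text on lines consisting only of '-----' and drop empty chunks."""
    return [part.strip() for part in DELIMITER.split(text) if part.strip()]


def build_distribute_request(text, server_ids, agent_ids=(), mode="ALL_GET_ALL",
                             max_tokens=256000, base_url="http://localhost:5012"):
    """Build (but do not send) a POST request for /api/distribute_prompts."""
    payload = {
        "prompts": split_prompts(text),
        "selected_servers": list(server_ids),
        "selected_agents": list(agent_ids),
        "distribution_mode": mode,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/api/distribute_prompts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_distribute_request("Write unit tests\n-----\nRefactor utils.py", ["srv-1"])
    print(req.get_method(), req.full_url)
    print(json.loads(req.data))
    # To actually send it: urllib.request.urlopen(req, timeout=...) with a generous timeout.
```

Note that the route waits on future.result() before responding, so the request can stay open until distribution finishes.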

Inside a templates folder, create dashboard.html and put the following inside it:

LLM Queue Dispatcher

🤖 LLM Queue Dispatcher
Last updated: just now

Loading active prompts...
Loading Done-Walk queue...

🔧 MCP Agents
Name | Description | Base URL | Enabled Tools | Actions

🌐 LLM Servers
Name | Base URL | Default Model | Active Requests | Actions

Generated Prompts (Demarc all single-line/multi-line prompts with -----)
Select MCP Agents (Optional - Support Tools)
Select LLM Servers * (Required)
Distribution Mode
    All Selected Get All Prompts
    Done-Walk (Sequential)

Add New MCP Agent
    Name *
    Base URL *
    Description
    Spaces, punctuation, and multi-line text are preserved.
    API Key
    Enabled Tools
    Hold Ctrl (Windows) or Cmd (Mac) to select multiple tools.
    Full tool descriptions appear on hover.
    Enabled Tools (Text) – Legacy
    Comma-separated list (spaces around commas are ignored).

Add New LLM Server
    Server Name
    Base URL
    API Key
    Default Model

Building into a Docker container!

Create requirements.txt and put in it:

Flask>=3.0.0
aiohttp>=3.10.0
requests>=2.32.0

Next create a file named Dockerfile and put in it:

FROM python:3.12-slim

# Install minimal system dependencies (if required by future extensions)
RUN apt-get update && apt-get install -y --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (including templates/ directory)
COPY . .

# Ensure the templates directory exists (in case it is missing)
RUN mkdir -p templates

# Expose the port used by the Flask application
EXPOSE 5012

# Run the application
CMD ["python", "app.py"]

Lastly create a docker-compose.yml and put in it:

version: '3.8'

services:
  mcp-dashboard:
    build: .
    container_name: mcp-dashboard
    ports:
      - "5012:5012"
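    volumes:
      # Persist pickled state files (mcp_agents_state.pkl and llm_servers_state.pkl).
      # Mount at /app/state, not /app: a mount over /app would hide the app.py copied into the image.
      # app.py must save its .pkl files under /app/state for them to persist.
      - ./state:/app/state
      # Mount templates directory for the dashboard HTML
      - ./templates:/app/templates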
    restart: unless-stopped
    environment:
      - PYTHONUNBUFFERED=1
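The compose file persists the pickled state through a bind mount of the host's ./state directory, which only works if app.py writes its .pkl files inside the mounted path. The app.py excerpt in this post does not show where they are saved, so here is a minimal sketch of save/load helpers, assuming a hypothetical STATE_DIR environment variable that you would point at the container-side path of that mount (for example by adding it to the compose environment list). Writing to a temporary file and renaming it means a crash mid-write cannot leave a half-written pickle behind.

```python
import os
import pickle
import tempfile
from pathlib import Path

# Hypothetical: app.py as published does not read STATE_DIR; set it to the mount target.
STATE_DIR = Path(os.environ.get("STATE_DIR", "state"))


def save_state(name: str, obj, state_dir: Path = STATE_DIR) -> Path:
    """Pickle obj to state_dir/name atomically (temp file in the same dir, then rename)."""
    state_dir.mkdir(parents=True, exist_ok=True)
    target = state_dir / name
    fd, tmp = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        os.replace(tmp, target)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp)  # do not leave stray temp files behind
        raise
    return target


def load_state(name: str, default=None, state_dir: Path = STATE_DIR):
    """Return the unpickled object, or default if nothing has been saved yet."""
    target = state_dir / name
    if not target.exists():
        return default
    with open(target, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    save_state("llm_servers_state.pkl", [{"name": "local", "base_url": "http://localhost:8080"}])
    print(load_state("llm_servers_state.pkl"))
```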

Usage Instructions

Save the provided source code as app.py in the project root.
Create a templates/ directory in the project root and place dashboard.html inside it (the code calls render_template('dashboard.html')).
Create a state/ directory in the project root (this will store the persistent .pkl files).
Build and run the container:

docker compose up --build -d

Access the dashboard at http://localhost:5012.

The setup uses a lightweight Python 3.12 image, installs only the required packages, preserves application state across container restarts, and follows Docker best practices for a Flask + asyncio application. No additional services are required.

We recommend doing this inside PyCharm; for the less initiated, it will handle your environment setup and the like.

PyCharm Community Fast Install with Install Bash Script. Speed Boosting with 10-Core Configuration.

Once it runs, the console will show, among other lines:

✅ Flask dashboard started → http://localhost:5012

Simply open that endpoint in your browser and you're golden!

Agentic Server Primer: Llama.cpp MCP Lesson 10: mcp-coder (Cuda Version)

thinkmelt@protonmail.com — Sat, 09 May 2026 04:26:17 GMT

This is a specialized MCP agent: it is designed to let your LLM pull, compile, and develop its own software, specifically with NVIDIA's nvcc CUDA compiler. It is effectively a complete build agent of its own, and the potential is utterly amazing: it can have your LLM writing advanced GPU code (i.e. configuring its own Llama.cpp!)

MTP / TurboQuant Forked Llama.cpp

We hot compile one of the first combo MTP / TurboQuant forks in the world!

In the post linked above, we looked at one of the world's first MTP/TurboQuant forks of Llama.cpp, but stopped because it was not built specifically for Qwen3.6, a model we really wanted to keep because of its advanced nature.
Our goal is to see whether a HouseLLM can compile an MTP (Multiple Token Prediction) cross-blend of the TurboQuant-forked Llama.cpp and get it working with Qwen3.6!
This was a very challenging MCP agent that we had to rewrite close to a dozen times, but in the end it worked! The issue is that MCP calls often fail, so ultimately every single MCP endpoint had to accept two formats. This gives higher compatibility with many calling LLMs:

All tools now support both calling styles:

Normal parameters: read_file("script.py")
Dictionary input: read_file({"file_path": "script.py"})
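In the source code listed below, only run_command actually unwraps a dictionary argument; read_file and the other file tools pass file_path straight to validate_path, so a dict there ends in an error string. A minimal sketch of a shared helper that could give every tool the dual calling style; the name unwrap_arg and the extra "path" alias are our own, hypothetical choices.

```python
from typing import Any


def unwrap_arg(value: Any, key: str, *aliases: str) -> Any:
    """Accept either a plain value or a dict that wraps it under key (or an alias).

    Both read_file("script.py") and read_file({"file_path": "script.py"})
    would then see the same "script.py".
    """
    if isinstance(value, dict):
        for k in (key, *aliases):
            if k in value:
                return value[k]
        raise ValueError(f"Expected a '{key}' entry, got keys {list(value)}")
    return value


if __name__ == "__main__":
    print(unwrap_arg("script.py", "file_path"))
    print(unwrap_arg({"file_path": "script.py"}, "file_path"))
    print(unwrap_arg({"path": "notes.txt"}, "file_path", "path"))
```

Inside a tool you would call file_path = unwrap_arg(file_path, "file_path") as the first line of the try block, before validate_path; a ValueError then surfaces through the tool's existing error message.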

If you simply want to pull and run mcp-coder, use this script:

#!/bin/bash

CONTAINER_NAME="mcp-coder"

# Cleanup previous container
docker stop $CONTAINER_NAME 2>/dev/null
docker rm $CONTAINER_NAME 2>/dev/null

# Ensure workspace directory exists
mkdir -p ~/mcp-workspace
chmod -R 755 ~/mcp-workspace

# Pull latest version
docker pull cnmcdee/mcp-coder:latest

# Start container
docker run -d \
--name $CONTAINER_NAME \
--restart unless-stopped \
-p 5011:5011 \
-v ~/mcp-workspace:/work_path \
--env PYTHONUNBUFFERED=1 \
cnmcdee/mcp-coder:latest

echo "✅ MCP Coder container started successfully!"
echo "🌐 Access URL: http://localhost:5011/mcp"
echo "📋 Logs: docker logs -f $CONTAINER_NAME"

We used Python as a pass-through, with its own CORS-enabled HTTP endpoint. There are a couple of paths you can take when building these:

You give it a set of guided MCP command endpoints representing system operations like 'git push' and 'git pull', but then you need to write one for every operation, or
you simply give it the ability to read, write, and open files and execute system commands, trusting its agentic ability to understand conceptually what it is doing. We chose this path because these LLMs are simply that good.

Source Code

from fastmcp import FastMCP
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
import uvicorn
import subprocess
import os
from pathlib import Path
import re
import json

# =============================================================================
# Enhanced MCP Server - Restricted to Specific Target Directory
# Optimized for reliable tool calling by Qwen3.6 and other LLMs
# =============================================================================

mcp = FastMCP(name="Target Directory MCP Server")

# ── Configuration ────────────────────────────────────────────────────────────

# CHANGE THIS TO YOUR DESIRED TARGET DIRECTORY
BASE_DIRECTORY = "/work_path"

# Security limits
MAX_FILE_SIZE = 10000 * 1024 * 1024      # 10 GB
MAX_OUTPUT_LENGTH = 50000000000           # 50 billion characters, i.e. effectively no truncation (50_000 would give a ~50 KB cap)

# All operations are forced inside BASE_DIRECTORY
ALLOWED_DIRECTORIES = [BASE_DIRECTORY]


# ── Helper Functions ─────────────────────────────────────────────────────────

def validate_path(file_path):
    """Force all paths to be inside the BASE_DIRECTORY."""
    if not os.path.isabs(file_path):
        file_path = os.path.join(BASE_DIRECTORY, file_path)

    path = Path(file_path).resolve()
    base_path = Path(BASE_DIRECTORY).resolve()

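    # Compare path components rather than string prefixes: "/work_path2/x" starts with
    # "/work_path" as a string but is not inside it (Path.is_relative_to needs Python 3.9+).
    if not path.is_relative_to(base_path):
        raise ValueError(f"Path must be inside the target directory: {BASE_DIRECTORY}")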

    if '..' in path.parts:
        raise ValueError("Path traversal detected")

    return path


def truncate_output(output, max_length=MAX_OUTPUT_LENGTH):
    """Truncate output if it exceeds the maximum length."""
    if len(output) <= max_length:
        return output

    truncated = output[:max_length]
    last_newline = truncated.rfind('\n')
    if last_newline > max_length * 0.8:
        truncated = truncated[:last_newline + 1]

    return truncated + f"\n\n[Output truncated to {max_length} characters]"


# ── File System Tools ────────────────────────────────────────────────────────

@mcp.tool
def read_file(file_path):
    """
    Read the entire content of a file inside the target directory.

    Parameters:
    - file_path (string): Relative or absolute path to the file.
                         Example: "script.py" or "folder/subfolder/file.txt"

    Returns: The full text content of the file as a string.
    """
    try:
        path = validate_path(file_path)

        if not path.is_file():
            return f"Error: File not found: {file_path}"

        file_size = path.stat().st_size
        if file_size > MAX_FILE_SIZE:
            return f"Error: File too large ({file_size} bytes). Maximum allowed: {MAX_FILE_SIZE} bytes."

        return path.read_text(encoding="utf-8")

    except Exception as e:
        return f"Error reading file: {str(e)}"


@mcp.tool
def write_file(file_path, content, mode="w"):
    """
    Write or append text content to a file.

    Parameters:
    - file_path (string): Path to the file (relative or absolute).
    - content (string): The text you want to write.
    - mode (string): "w" to overwrite (default) or "a" to append.

    Example:
    write_file("notes.txt", "Hello world", "w")
    """
    if mode not in ["w", "a"]:
        return "Error: mode must be 'w' or 'a'"

    try:
        path = validate_path(file_path)
        path.parent.mkdir(parents=True, exist_ok=True)

        with open(path, mode, encoding="utf-8") as f:
            f.write(content)

        action = "overwritten" if mode == "w" else "appended to"
        return f"Successfully {action}: {file_path}"

    except Exception as e:
        return f"Error writing file: {str(e)}"


@mcp.tool
def delete_file(file_path):
    """
    Delete a file.

    Parameters:
    - file_path (string): Path to the file to delete.
    """
    try:
        path = validate_path(file_path)

        if not path.is_file():
            return f"Error: File not found: {file_path}"

        path.unlink()
        return f"Successfully deleted: {file_path}"

    except Exception as e:
        return f"Error deleting file: {str(e)}"


@mcp.tool
def replace_line(file_path, line_number, new_content):
    """
    Replace a specific line in a file by line number.

    Parameters:
    - file_path (string): Path to the file.
    - line_number (integer): Line number to replace (starts at 1).
    - new_content (string): New text for that line.

    Example: replace_line("main.py", 42, "    print('Updated')")
    """
    try:
        path = validate_path(file_path)

        if not path.is_file():
            return f"Error: File not found: {file_path}"

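        lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
        line_number = int(line_number)  # some models send numbers as strings

        if line_number < 1 or line_number > len(lines):
            return f"Error: Line number {line_number} is out of range. File has {len(lines)} lines."

        # Keep the line's original ending: a newline, or nothing on a final line that lacks one
        # (slicing off the last character would otherwise duplicate a real character there)
        old_line = lines[line_number - 1]
        original_ending = old_line[len(old_line.rstrip("\r\n")):]
        lines[line_number - 1] = new_content.rstrip() + original_ending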

        path.write_text(''.join(lines), encoding="utf-8")
        return f"Successfully replaced line {line_number} in {file_path}"

    except Exception as e:
        return f"Error replacing line: {str(e)}"


@mcp.tool
def list_directory(directory="."):
    """
    List files and folders in a directory.

    Parameters:
    - directory (string): Optional. Directory to list. Default is current directory ".".

    Returns: List of strings in format "D/foldername" or "F/filename".
    """
    try:
        path = validate_path(directory)

        if not path.is_dir():
            return ["Error: Not a directory"]

        items = []
        for item in sorted(path.iterdir()):
            prefix = "D/" if item.is_dir() else "F/"
            items.append(f"{prefix}{item.name}")

        return items

    except Exception as e:
        return [f"Error: {str(e)}"]


@mcp.tool
def get_file_info(file_path):
    """
    Get detailed information about a file or directory.

    Parameters:
    - file_path (string): Path to the file or folder.
    """
    try:
        path = validate_path(file_path)

        if not path.exists():
            return {"error": f"Path not found: {file_path}"}

        stat = path.stat()
        return {
            "path": str(path),
            "exists": True,
            "is_file": path.is_file(),
            "is_dir": path.is_dir(),
            "size_bytes": stat.st_size,
            "modified_timestamp": stat.st_mtime,
            "permissions": oct(stat.st_mode)[-3:],
            "name": path.name
        }

    except Exception as e:
        return {"error": str(e)}


@mcp.tool
def search_files(directory=".", pattern="*"):
    """
    Recursively search for files matching a pattern.

    Parameters:
    - directory (string): Starting directory. Default ".".
    - pattern (string): Glob pattern. Examples: "*.py", "*.txt", "config*.json"

    Returns: List of matching file paths.
    """
    try:
        path = validate_path(directory)

        if not path.is_dir():
            return [f"Error: Not a directory: {directory}"]

        matches = [str(item) for item in path.rglob(pattern) if item.is_file()]
        return sorted(matches)

    except Exception as e:
        return [f"Error: {str(e)}"]


# ── Shell Command Tool ───────────────────────────────────────────────────────

@mcp.tool
def run_command(command, cwd=None, timeout=180):
    """
    Execute a shell command inside the restricted /work_path directory.

    Parameters:
    - command: Can be either:
        - A string: "ls -la"
        - Or a dictionary: {"command": "ls -la", "cwd": "subfolder", "timeout": 60}
    - cwd (string, optional): Working directory.
    - timeout (integer, optional): Maximum time in seconds.

    Returns: Detailed output including STDOUT, STDERR, and return code.
    """
    try:
        # === Handle dictionary input (for models that pass one dict) ===
        if isinstance(command, dict):
            data = command
            command = data.get("command") or data.get("cmd")
            cwd = data.get("cwd") or cwd
            timeout = data.get("timeout") or timeout

        # Ensure command is a string
        if not isinstance(command, str):
            return f"Error: 'command' must be a string or a dict containing 'command'. Got: {type(command)}"

        # Set default working directory
        if cwd is None:
            cwd = BASE_DIRECTORY
        else:
            cwd_path = validate_path(cwd)
            cwd = str(cwd_path)

        # Basic safety check: a simple substring filter, not a sandbox ($(...) or backticks slip through)
        dangerous = ["&&", ";", "|", ">", "<", ">>", "sudo", "su ", "rm -rf /", "mkfs", "shutdown"]
        for pattern in dangerous:
            if pattern in command.lower():
                return f"Error: Dangerous command pattern detected: '{pattern}'"

        # Execute command
        result = subprocess.run(
            command,
            shell=True,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout
        )

        stdout = truncate_output(result.stdout)
        stderr = truncate_output(result.stderr)

        # Build response
        response_parts = [
            f"Command: {command}",
            f"Working directory: {cwd}",
            f"Return code: {result.returncode}"
        ]

        if stdout.strip():
            response_parts.append(f"\nSTDOUT:\n{stdout}")
        else:
            response_parts.append("\nSTDOUT: (no output)")

        if stderr.strip():
            response_parts.append(f"\nSTDERR:\n{stderr}")
        else:
            response_parts.append("\nSTDERR: (no output)")

        return "\n".join(response_parts)

    except subprocess.TimeoutExpired:
        return f"Error: Command timed out after {timeout} seconds."
    except Exception as e:
        return f"Error executing command: {str(e)}"
# ── Server Setup ─────────────────────────────────────────────────────────────

if __name__ == "__main__":
    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=["*"],
            allow_credentials=False,
            allow_methods=["GET", "POST", "OPTIONS"],
            allow_headers=["*"],
            expose_headers=["*"],
            max_age=3600,
        )
    ]

    app = mcp.http_app(
        path="/mcp",
        middleware=middleware
    )

    print("🚀 Starting Target Directory Restricted MCP Server")
    print(f"→ All operations restricted to: {BASE_DIRECTORY}")
    print("→ Tools available: read_file, write_file, delete_file, replace_line,")
    print("                   list_directory, get_file_info, search_files, run_command")

    uvicorn.run(app, host="0.0.0.0", port=5011, log_level="info")
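validate_path resolves the path and then checks that it sits inside BASE_DIRECTORY. A plain string-prefix comparison such as str(path).startswith(str(base_path)) is a weak test for that: with BASE_DIRECTORY = "/work_path", the string "/work_path2/secret.txt" also starts with "/work_path". This minimal sketch contrasts the prefix test with pathlib's component-wise is_relative_to; the helper names are hypothetical, and, like validate_path, it assumes the candidate has already been resolved (PurePosixPath does not collapse "..").

```python
from pathlib import PurePosixPath

BASE = PurePosixPath("/work_path")


def naive_inside(candidate: str, base: PurePosixPath = BASE) -> bool:
    # String-prefix test: also accepts sibling directories such as /work_path2
    return str(candidate).startswith(str(base))


def inside(candidate: str, base: PurePosixPath = BASE) -> bool:
    # Component-wise test (Python 3.9+): only base itself or paths below it
    return PurePosixPath(candidate).is_relative_to(base)


if __name__ == "__main__":
    for p in ["/work_path/src/main.py", "/work_path2/secret.txt", "/etc/passwd"]:
        print(f"{p:28} naive={naive_inside(p)!s:5} inside={inside(p)}")
```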

Here are the complete files you need to run the MCP Server in Docker:

1. `requirements.txt`

fastmcp
uvicorn[standard]
starlette

2. `Dockerfile`

FROM nvidia/cuda:13.1.2-devel-ubuntu22.04

WORKDIR /app

ENV DEBIAN_FRONTEND=noninteractive \
    TZ=UTC \
    PYTHONUNBUFFERED=1

# System packages + Python
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    software-properties-common sudo git build-essential cmake ninja-build curl wget ca-certificates tzdata && \
    add-apt-repository ppa:deadsnakes/ppa -y && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
    python3.11 python3.11-venv python3.11-dev python3-pip && \
    rm -rf /var/lib/apt/lists/* && \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 && \
    update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1 && \
    update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1

# Create work directory
RUN mkdir -p /work_path && chown -R 1000:1000 /work_path && chmod -R 755 /work_path

COPY requirements.txt .
RUN python -m pip install --no-cache-dir -r requirements.txt

COPY server.py .

RUN useradd -m -u 1000 -s /bin/bash mcpuser && \
    echo "mcpuser ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers && \
    chown -R mcpuser:mcpuser /app

USER mcpuser

EXPOSE 5011

# Start the MCP server (listens on port 5011)
CMD ["python", "server.py"]

3. `docker-compose.yml`

version: '3.9'

services:
  mcp-server:
    build: .
    container_name: mcp-server
    restart: unless-stopped
    ports:
      - "5011:5011"
    volumes:
      # Mount your target project directory into the container
      - ./target-project:/app/project
    environment:
      - PYTHONUNBUFFERED=1
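    # Optional: give the container GPU access (needs the NVIDIA Container Toolkit on the host).
    # Required to run CUDA binaries; nvcc itself can compile without a GPU when the target architecture is set explicitly.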
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

How to Use

Save the files:

requirements.txt
Dockerfile
docker-compose.yml

Rename your server code to server.py (or update the Dockerfile accordingly).

Update the target directory in server.py:

BASE_DIRECTORY = "/app/project"   # This matches the volume mount

Create the project folder (next to docker-compose.yml):

mkdir target-project

Start the server:

docker-compose up --build

Or in detached mode:

docker-compose up -d --build

Once It's Working, the Fun Really Starts!

We gave it this prompt:

Using the Target Directory Server, explore all of its available tools and test their functionality. With it and the Process Manager tool, the goal is to build a combined llama.cpp from these two repositories:

TurboQuant Base: https://github.com/TheTom/llama-cpp-turboquant.git (use the feature/turboquant-kv-cache branch if available, otherwise main)
MTP + TurboQuant Combined Variant: https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant.git (this fork already includes both TurboQuant and Gemma-4-style MTP support)

The objective is to create a llama.cpp that can do BOTH MTP AND TURBOQUANT, specifically for Qwen3.6.

You can install, upgrade, or do whatever you need inside the Target Directory Server, including git pull, and it already has a full build environment with nvcc. As you go, document and save your progress in detail to the Process Manager, with frequent updates to your tasks. Make sure the tasks are detailed enough that if you cannot complete this, you can pick it up again on the next task. Make sure it compiles, and fix anything that won't.

MTP / TurboQuant Forked Llama.cpp

thinkmelt@protonmail.com — Fri, 08 May 2026 04:48:48 GMT

When TurboQuant hit, it was BIG. Google's new method of compressing the KV cache allowed much larger contexts: suddenly what had been the dream of enterprise LLMs ran on a house GPU. Things didn't slow down there, with MTP (Multiple Token Prediction) following suit: it drafts several future tokens per step and verifies them together, allowing speed-ups of up to 2x!

Here is the guide!

Comprehensive Installation Script for AtomicBot-ai/atomic-llama-cpp-turboquant (TurboQuant + Gemma 4 MTP)

This guide provides a single, self-contained Bash script that installs the required system dependencies and the CUDA toolkit, clones the repository, configures a CUDA build for the RTX 4080, compiles the project, and verifies the installation. The script is designed for Ubuntu/Debian-based Linux distributions (most common for servers and workstations) and pulls CUDA from NVIDIA's Ubuntu 22.04 repository. Adaptations for other platforms are noted at the end.

Full Automated Installation Script

Save the following as install-llama-turboquant.sh and run it with administrator privileges where needed.

#!/bin/bash
# =============================================================================
# Automated Installation Script for Atomic llama.cpp (TurboQuant + MTP)
# Optimized for RTX 4080 - CUDA Primary
# =============================================================================

set -e

echo "=== Starting Installation for RTX 4080 (CUDA + TurboQuant + MTP) ==="

# 1. System dependencies
echo "Installing system dependencies..."
sudo apt-get update
sudo apt-get install -y \
    build-essential cmake git python3 python3-pip ninja-build \
    libblas-dev liblapack-dev pkg-config curl wget \
    libssl-dev zlib1g-dev

# 2. CUDA Toolkit (required for RTX 4080)
echo "Installing NVIDIA CUDA Toolkit..."
if ! command -v nvcc >/dev/null 2>&1; then
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get install -y cuda-toolkit-12-4
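    echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
fi
# Export for this run too: sourcing ~/.bashrc from a non-interactive script
# usually has no effect, because Ubuntu's default .bashrc returns early.
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}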

# 3. Clone the repository
echo "Cloning and preparing the fork..."
git clone https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant.git
cd atomic-llama-cpp-turboquant
git checkout feature/turboquant-kv-cache || echo "Branch feature/turboquant-kv-cache not found; building the default branch"
git submodule update --init --recursive

# 4. Build with CUDA (optimized for RTX 4080)
echo "Building with CUDA for RTX 4080..."
mkdir -p build && cd build

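# 89 = Ada Lovelace (RTX 4080 / 4090); Vulkan, HIP and Metal are disabled (not needed).
# Keep comments off continuation lines: a backslash only continues the command
# when it is the very last character on the line.
cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES=89 \
  -DLLAMA_FLASH_ATTN=ON \
  -DGGML_VULKAN=OFF \
  -DGGML_HIP=OFF \
  -DGGML_METAL=OFF \
  -G Ninja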

ninja -j$(nproc)

echo "=== Build completed successfully! ==="

# 5. Verification
echo "Verifying build..."
./bin/llama-cli --version
echo "CUDA devices detected:"
./bin/llama-cli --version 2>&1 | grep -i cuda || echo "No CUDA line in the version output; check nvidia-smi and the build log"

echo "=== Installation finished for RTX 4080 ==="
echo "Binaries are in: $(pwd)/bin"

How to Use the Script

Make the script executable and run it:

chmod +x install-llama-turboquant.sh
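./install-llama-turboquant.sh   # the script calls sudo itself; running it all as root would put the clone and the ~/.bashrc edits under root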

Errors? Try these fixes:

Diagnosis of the CMake CUDA Error

In the error we hit, CMake detected the wrong nvcc (at /usr/bin/nvcc, a broken or outdated symlink/installation) instead of the proper CUDA Toolkit located in /usr/local/cuda. On top of that, CUDA 13.2 combined with the installed CMake version caused flag incompatibilities (--static-host-stub was not recognized during the compiler ID test).

Quick Fix: Updated Build Script for RTX 4080

Replace your previous build section with this corrected version. Run the commands below after you have the repository cloned.

cd atomic-llama-cpp-turboquant

# Clean previous failed build
rm -rf build
mkdir -p build && cd build

echo "=== Configuring CMake with explicit CUDA paths for RTX 4080 ==="

# Explicit fix: point CMake at the toolkit's nvcc. 89 = RTX 4080 (Ada).
# (No comments after the trailing backslashes: they would end the command early.)
cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
  -DCMAKE_CUDA_ARCHITECTURES=89 \
  -DLLAMA_FLASH_ATTN=ON \
  -DGGML_VULKAN=OFF \
  -DGGML_HIP=OFF \
  -DGGML_METAL=OFF \
  -G Ninja

ninja -j$(nproc)
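To check which nvcc a plain PATH lookup would hit first (the /usr/bin/nvcc versus /usr/local/cuda/bin/nvcc mix-up described above), here is a small diagnostic sketch. It assumes the toolkit lives at /usr/local/cuda and does not reproduce CMake's full compiler search; it only lists nvcc executables in PATH order, skips entries that are symlinks to a file already listed, and prints the release line of each.

```python
import os
import subprocess

# Assumption: the CUDA toolkit is installed under /usr/local/cuda.
DEFAULT_NVCC = "/usr/local/cuda/bin/nvcc"


def nvcc_candidates(path_env=None, extra=(DEFAULT_NVCC,)):
    """List nvcc executables in PATH order, then any extra locations,
    skipping entries that resolve (via symlinks) to a file already listed."""
    search = os.environ.get("PATH", "") if path_env is None else path_env
    dirs = [d for d in search.split(os.pathsep) if d]
    candidates = [os.path.join(d, "nvcc") for d in dirs] + list(extra)
    found, seen = [], set()
    for c in candidates:
        if os.path.isfile(c) and os.access(c, os.X_OK):
            real = os.path.realpath(c)
            if real not in seen:
                seen.add(real)
                found.append(c)
    return found


def nvcc_version(exe):
    """Return the 'release' line of `nvcc --version`, or the raw output."""
    out = subprocess.run([exe, "--version"], capture_output=True, text=True, timeout=30)
    for line in out.stdout.splitlines():
        if "release" in line:
            return line.strip()
    return (out.stdout or out.stderr).strip()


if __name__ == "__main__":
    hits = nvcc_candidates()
    if not hits:
        print("No nvcc found on PATH or at", DEFAULT_NVCC)
    for exe in hits:
        print(f"{exe}: {nvcc_version(exe)}")
    if hits and os.path.realpath(hits[0]) != os.path.realpath(DEFAULT_NVCC):
        print("A PATH lookup finds", hits[0], "first; pass -DCMAKE_CUDA_COMPILER explicitly.")
```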

If the Above Still Fails – Full Recovery Steps

Run these commands in order:

Verify CUDA Installation

/usr/local/cuda/bin/nvcc --version
nvidia-smi

Fix Environment Variables (add to ~/.bashrc and run source ~/.bashrc)

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDACXX=/usr/local/cuda/bin/nvcc   # CMake reads the CUDA compiler from the CUDACXX environment variable

Re-run CMake with more explicit settings

cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
  -DCMAKE_CUDA_ARCHITECTURES=89 \
  -DCUDAToolkit_ROOT=/usr/local/cuda \
  -DGGML_CUDA=ON \
  -DLLAMA_FLASH_ATTN=ON \
  -G Ninja

Alternative: Use a Compatible CMake Version
If you are using a very new CMake (4.x), downgrade temporarily:

sudo apt remove --purge cmake -y
sudo apt install cmake=3.28.* -y   # or install via pip: pip install "cmake<4"

Recommended Full Test Command After Successful Build

./bin/llama-cli --version
./bin/llama-cli -m your-model.gguf -ngl 99 -ctk turbo3 -ctv turbo3 -fa on -p "Test"

After completion, test TurboQuant:

cd atomic-llama-cpp-turboquant/build
./bin/llama-cli -m your-model.gguf -c 32768 -ngl 99 \
  --cache-type-k turbo3 --cache-type-v turbo3 -fa 1 -p "Test prompt"

For Gemma 4 MTP (recommended):

Download target model and assistant head from AtomicChat collections on Hugging Face.
Use the provided helper scripts in the repository (scripts/run-gemma4-mtp-server.sh).

Platform-Specific Notes

macOS (Apple Silicon):

Install Xcode Command Line Tools: xcode-select --install
Use Homebrew: brew install cmake git ninja
Set -DGGML_METAL=ON (the script above sets it to OFF) to enable the optimized TurboFlash kernels.
Run the CMake step without CUDA/HIP.

Windows:

Use PowerShell with Visual Studio Build Tools or WSL2 (recommended).
A community PowerShell script exists for ROCm on Windows.

AMD ROCm (HIP):

Install ROCm first via official AMD instructions, then enable -DGGML_HIP=ON.

Post-Installation Recommendations

Update regularly: git pull && git submodule update --init --recursive followed by rebuild.
Pre-built binaries: Check the repository’s Releases page for ready-to-use versions on supported platforms.
Documentation: Refer to MTP.md, docs/build.md, and docs/speculative.md in the cloned repository for advanced configuration.

This script ensures a complete, production-ready environment with all required libraries and backends.

Recommended MTP Settings for AtomicBot-ai/atomic-llama-cpp-turboquant

The fork includes dedicated support for Gemma 4 Multi-Token Prediction (MTP) speculative decoding. It loads the official small assistant head (drafter) via a separate GGUF file and achieves ~30–50% throughput gains on short-to-medium prompts with negligible quality loss.

Core MTP Flags (Essential)

Use these in combination with TurboQuant KV cache:

--mtp-head /path/to/gemma-4-*-assistant.Q4_K_M.gguf \
--spec-type mtp \
--draft-block-size 3 \
--draft-max 8 \
--draft-min 0

Recommended Full Command (Balanced Performance)

./build/bin/llama-server \
  -m /path/to/gemma-4-31B-it-Q4_K_M.gguf \
  --mtp-head /path/to/gemma-4-31B-it-assistant.Q4_K_M.gguf \
  --spec-type mtp \
  --draft-block-size 3 --draft-max 8 --draft-min 0 \
  -ngl 99 -ngld 99 \
  -ctk turbo3 -ctv turbo3 -ctkd turbo3 -ctvd turbo3 \
  -fa on \
  -c 16384 -b 2048 -ub 512 \
  --temp 0.7 --top-p 0.95

Key MTP Parameter Explanations and Tuning

--spec-type mtp: Enables Gemma 4–specific MTP speculative decoding.
--mtp-head: Path to the dedicated assistant/drafter GGUF (highly recommended: Q4_K_M or Q5_K_M for best speed/quality balance).
--draft-block-size 3: Number of tokens the drafter predicts per step (default 3 works well; 2 for more conservative, 4 for aggressive).
--draft-max 8: Maximum number of draft tokens verified per step (8–12 for throughput; lower for higher acceptance rate).
--draft-min 0: Minimum draft tokens (0 is standard).
-ngld 99: Full GPU offload for the drafter head (critical for performance).
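A rough model helps explain the acceptance-rate advice in these flags. Under the idealized assumption from the speculative-decoding literature that each drafted token is accepted independently with probability α, verifying k draft tokens yields on average 1 + α + α² + ... + α^k = (1 − α^(k+1)) / (1 − α) tokens per target-model pass. The leading 1 is the token the target model always produces itself. Here k stands in for the number of draft tokens verified per step (roughly what --draft-max caps). The function name is ours, and the printed numbers are illustrative, not measurements from the fork.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model pass with k drafted tokens.

    Assumes each draft token is accepted independently with probability alpha
    (an idealization). The geometric sum 1 + alpha + ... + alpha**k counts the
    tokens accepted in order plus the one token the target model always adds.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted: k drafts plus the bonus token
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


for alpha in (0.5, 0.7, 0.9):
    for k in (2, 3, 4, 8):
        print(f"alpha={alpha:.1f} k={k}: {expected_tokens_per_step(alpha, k):.2f} tokens/pass")
```

At α = 0.7, going from k = 4 to k = 8 only lifts the estimate from about 2.77 to about 3.20 tokens per pass, while every extra draft token still costs drafter compute and verification work. Raising α (for example with a lower temperature) can pay off more.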

Preset Scripts in the Repository

The fork includes ready-made launchers:

scripts/run-gemma4-31b-mtp-server.sh
Environment variable: MTP_PRESET=throughput|lift|balanced|quality

These presets automatically adjust draft parameters for different priorities (e.g., maximum speed vs. best acceptance rate).

Additional Performance Tips

Combine with TurboQuant: -ctk turbo3 -ctv turbo3 (and the -ctkd/-ctvd variants for drafter) for extreme context scaling.
Use Flash Attention: -fa on.
For longer contexts, increase -c (up to 131072+ with TurboQuant).
Lower temperature (e.g., --temp 0.1–0.4) often yields better speculative acceptance rates.

These settings provide strong gains while remaining stable. For the absolute latest recommendations, consult the repository’s MTP.md file after cloning.


Observations

Once we had it working, we wanted to keep it separate from the 'clean' TurboQuant llama.cpp build we had been using, so we went into the build directory and gave the binaries their own absolute directory:

cd build
sudo mkdir -p /bin/atomic
sudo cp -r bin/* /bin/atomic/   # llama-server, llama-cli, etc. are built into build/bin

From there it was simply a matter of referencing it explicitly from our models directory by copying an existing model launch script. We had always liked Qwen3.6-35B for solid, reliable local performance and had an MTP-enabled copy handy, so we gave it its own script to see how it would do.
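A launch script that points at the separate /bin/atomic copy might look like the following minimal Python sketch. It reuses the balanced Gemma 4 MTP command shown earlier, because those are the only MTP flags documented here. The model paths are hypothetical placeholders, and build_command is our own name, not something shipped with the fork.

```python
import os
import shlex
import sys

ATOMIC_BIN = "/bin/atomic/llama-server"  # the isolated copy made above

# Hypothetical placeholder paths: point these at your own GGUF files.
MODEL = "/models/gemma-4-31B-it-Q4_K_M.gguf"
MTP_HEAD = "/models/gemma-4-31B-it-assistant.Q4_K_M.gguf"


def build_command(model: str, mtp_head: str, ctx: int = 16384) -> list[str]:
    """Assemble the balanced MTP + TurboQuant llama-server command from this guide."""
    return [
        ATOMIC_BIN,
        "-m", model,
        "--mtp-head", mtp_head,
        "--spec-type", "mtp",
        "--draft-block-size", "3", "--draft-max", "8", "--draft-min", "0",
        "-ngl", "99", "-ngld", "99",
        "-ctk", "turbo3", "-ctv", "turbo3", "-ctkd", "turbo3", "-ctvd", "turbo3",
        "-fa", "on",
        "-c", str(ctx), "-b", "2048", "-ub", "512",
        "--temp", "0.7", "--top-p", "0.95",
    ]


if __name__ == "__main__":
    cmd = build_command(MODEL, MTP_HEAD)
    print(shlex.join(cmd))
    if os.access(cmd[0], os.X_OK):
        # Replace this Python process with llama-server so signals reach it directly.
        os.execv(cmd[0], cmd)
    else:
        print(f"{cmd[0]} not found or not executable", file=sys.stderr)
```

Keeping the binary path absolute means the 'clean' TurboQuant build on your PATH is never picked up by accident.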

Note: we have about 6 to 8 sub-agents attached that give it agentic workflows.
We are still working on this guide.

Why We Stopped Here

Qwen3.6 was significantly better at coding than Gemma 4. However, as far as we know, no combined MTP + TurboQuant option exists for it yet. Because of this we are monitoring for now, but we are leaving this guide up for anyone looking to work with Gemma!

docker-compose.yml -> docker run Converter

thinkmelt@protonmail.com — Fri, 08 May 2026 01:46:09 GMT


Bookmark this. Need to convert a docker-compose.yml into a docker run command?

Docker Compose → Docker Run Converter (Ultra Robust)

Paste your docker-compose.yml content. Indentation is now largely ignored.
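For readers curious how such a converter works, here is a minimal Python sketch of the core mapping. It is not the code behind the converter on this page. It assumes the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load) and handles only the common short-form keys: image, container_name, restart, ports, environment, volumes and command. Services with only a build: key are rejected, because docker run needs an image.

```python
import shlex
from typing import Any


def service_to_docker_run(name: str, service: dict[str, Any]) -> str:
    """Translate one Compose service (short syntax only) into a docker run command."""
    if "image" not in service:
        raise ValueError(f"service {name!r} has no image; build it first")

    args = ["docker", "run", "-d", "--name", str(service.get("container_name", name))]
    if "restart" in service:
        args += ["--restart", str(service["restart"])]
    for port in service.get("ports", []):
        args += ["-p", str(port)]

    # Compose accepts environment as a mapping or as a list of KEY=value strings.
    env = service.get("environment", [])
    if isinstance(env, dict):
        env = [key if value is None else f"{key}={value}" for key, value in env.items()]
    for entry in env:
        args += ["-e", str(entry)]

    for volume in service.get("volumes", []):
        if not isinstance(volume, str):
            raise ValueError("long-form volume entries are not supported in this sketch")
        args += ["-v", volume]

    args.append(str(service["image"]))
    command = service.get("command")
    if isinstance(command, str):
        args += shlex.split(command)
    elif isinstance(command, list):
        args += [str(part) for part in command]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join(args)


def compose_to_docker_run(compose: dict[str, Any]) -> list[str]:
    """Convert every service under the top-level services: key."""
    return [service_to_docker_run(n, s) for n, s in compose.get("services", {}).items()]
```

Quoting through shlex.join matters for environment values with spaces, which a naive string concatenation would split into separate arguments.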

Agentic Server Primer: Llama.cpp MCP Lesson 9: Docker Orchestrator

thinkmelt@protonmail.com — Sun, 19 Apr 2026 14:27:43 GMT

In Lessons 1–8 we covered everything from a scientific calculator to Python compilation. Today we will be looking at rolling your own Docker orchestrator.

If you just need to pull and run this docker image:

docker pull docker.io/cnmcdee/mcp-docker-orchestrator:latest

docker run -d \
  --name mcp-docker-orchestrator \
  --restart unless-stopped \
  -p 0.0.0.0:5010:5010 \
  -e "FLASH_ENV=production" \
  -e ENV_SERVER="${ENV_SERVER}" \
  -e ENV_USER="${ENV_USER}" \
  -e ENV_PASSWORD="${ENV_PASSWORD:-}" \
  -e ENV_PORT="${ENV_PORT:-22}" \
  cnmcdee/mcp-docker-orchestrator

This is very powerful: not only can your LLM write and test its own code using the other MCP tools, it can then stand that code up in a running container.
We noted some challenges getting the LLM to find the Docker endpoint, and sometimes it took several tries. This suggests that LLMs may have seen relatively little training data in this area.

Let's get started!

A. Prerequisites

To understand all the moving parts, we will start with every command this MCP agent is capable of. Because it requires careful prompting to work effectively, here is its tool list (written by Qwen 3.6):

Here are the Docker tools available to you, organized by functionality:

🖼️ Image Management

docker_images – List all Docker images present on the remote server
docker_pull – Pull an image (or specific tag) from a registry to the remote server
docker_build – Build a Docker image from a Dockerfile in a specified context directory

📦 Container Management

docker_ps – List running containers (set all=True to include stopped ones)
docker_run – Create & start a new container (supports port mappings, env vars, volumes, custom commands)
docker_stop – Stop a running container
docker_start – Start a stopped container
docker_restart – Restart a container
docker_rm – Remove containers (use force=True to remove running ones)
docker_logs – Fetch logs from a container (supports tail line limit and follow streaming)

📝 Docker Compose Management

docker_compose_up – Start services defined in a docker-compose.yml
docker_compose_down – Stop & remove containers, networks, and optionally named volumes
docker_compose_build – Build or rebuild services defined in a compose file
docker_compose_ps – List containers for a specific compose project
docker_compose_logs – View logs from compose services (supports filtering by service & follow mode)
docker_compose_command – Execute any arbitrary docker compose subcommand with custom arguments
docker_compose_deploy – Fully deploy an app by uploading Dockerfile, requirements.txt, app.py, and docker-compose.yml to ~/docker/{docker_name}, then building & running it

💡 Note: All Docker tools execute commands on the remote server configured via your global SSH session.

B. Docker Controller Model

The Docker controller model can seem complex, so we will illustrate the moving parts.

The user enters a prompt in the LLM web interface at their endpoint (192.168.1.4:8080). The prompt is sent to the LLM as a JSON request, and the LLM examines its available tool list and calls the MCP Docker tool. The MCP Docker tool holds an SSH connection (a Paramiko client) to the working server. This could theoretically be Docker-in-Docker, but for simplicity's sake we just gave it its own server. A spare 4-core laptop is a perfect candidate for this.
The LLM calls the tools repeatedly and receives JSON feedback from the MCP agent about its own progress.
The end user can watch the Docker container and image lists (docker ps, docker images) to see whether it is successfully building images and standing up containers.
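The round trip can be sketched in a few lines of Python. This is a simplified model, not FastMCP's internals. It assumes the client sends a JSON-RPC 2.0 `tools/call` request (the message shape MCP uses for tool invocation). It also uses hypothetical command builders modelled on the docker_ps and docker_logs tools in the full code below, and it replaces the Paramiko SSH call with a stub so it runs anywhere.

```python
import json
from typing import Any, Callable


def build_tools_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as the LLM client would send it."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical command builders mirroring the docker_ps / docker_logs tools.
def docker_ps_cmd(all: bool = False) -> str:
    return "docker ps -a" if all else "docker ps"


def docker_logs_cmd(container: str, tail: int = 100) -> str:
    return f"docker logs --tail {int(tail)} {container}"


TOOLS: dict[str, Callable[..., str]] = {
    "docker_ps": docker_ps_cmd,
    "docker_logs": docker_logs_cmd,
}


def handle_request(raw: str, execute: Callable[[str], str]) -> str:
    """Look up the tool, build the docker command, run it, and wrap the output."""
    request = json.loads(raw)
    params = request["params"]
    command = TOOLS[params["name"]](**params.get("arguments", {}))
    output = execute(command)  # the real server calls ssh_execute() over Paramiko here
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": output}]},
    })


def fake_ssh(cmd: str) -> str:
    """Stub executor standing in for the SSH connection to the Docker server."""
    return f"(remote) {cmd}"


reply = json.loads(handle_request(build_tools_call(1, "docker_ps", {"all": True}), fake_ssh))
print(reply["result"]["content"][0]["text"])  # (remote) docker ps -a
```

The text in the reply is what the LLM reads back as feedback before deciding on its next tool call.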


For example, a prompt like this:

When you are done building this software, use the docker tool to create a Dockerfile, a requirements.txt, an app.py and a docker-compose.yml. Build an image and verify it's there, then stand up that image on port 7001. Finally, using the web requests tool, make sure it is running at 192.168.1.4:7001.

Setting the Environment

Because the MCP container needs to SSH into the Docker server, you can remove one layer of complexity during testing by letting it log in without a password:

ssh-keygen   # generates an SSH key pair (an empty passphrase keeps it non-interactive)
ssh-copy-id you@192.168.1.3  # installs your public key on the Docker server (the host you set as ENV_SERVER)

Note that we are also approaching production, so the connection details for the remote server are passed in as environment variables. For instance:

ENV_PASSWORD = os.environ.get('ENV_PASSWORD')
ENV_SERVER = os.environ.get('ENV_SERVER')
ENV_USER = os.environ.get('ENV_USER')

An example launch script for the Python agent could therefore be as simple as:

export ENV_PASSWORD='your docker server password'
export ENV_SERVER='192.168.1.4' # Or wherever it lives
export ENV_USER='user'
python3 mcp_agent.py  # inside, it reads these with os.environ.get
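If you prefer the agent to fail fast with a clear message instead of starting with a half-configured session, the environment handling could be wrapped like this. This is a hypothetical helper, not part of the published image. The names RemoteDockerConfig and load_config are ours, and the ENV_PORT default of 22 mirrors the docker run example above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RemoteDockerConfig:
    server: str
    user: str
    password: Optional[str]
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> RemoteDockerConfig:
    """Read the ENV_* variables, failing fast if any required one is missing."""
    missing = [name for name in ("ENV_SERVER", "ENV_USER") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    port_text = env.get("ENV_PORT") or "22"
    try:
        port = int(port_text)
    except ValueError:
        raise RuntimeError(f"ENV_PORT must be an integer, got {port_text!r}") from None
    # An empty or unset password means "use SSH keys" (the passwordless setup above).
    return RemoteDockerConfig(
        server=env["ENV_SERVER"],
        user=env["ENV_USER"],
        password=env.get("ENV_PASSWORD") or None,
        port=port,
    )
```

Passing a plain dict instead of os.environ also makes the helper easy to test.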

Full Code

from fastmcp import FastMCP
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
import uvicorn
import paramiko
import threading
import os
import traceback
import textwrap
import yaml

# ── Global SSH Session Manager ─────────────────────────────────────────────
ssh_sessions = {}
session_lock = threading.Lock()
ENV_PASSWORD = os.environ.get('ENV_PASSWORD')
ENV_SERVER = os.environ.get('ENV_SERVER')
ENV_USER = os.environ.get('ENV_USER')
ENV_PORT = int(os.environ.get('ENV_PORT') or 22)  # honour -e ENV_PORT from docker run


def get_or_create_ssh_session(server: str, username: str, password: str = None, key_path: str = None, port: int = 22) -> str:
    """Create or retrieve a persistent SSH session to a remote server.

    Maintains a thread-safe pool of Paramiko SSHClient connections keyed by
    ``server:port:username``. This avoids the overhead of establishing a new
    connection for every command and supports keep-alive packets for long-lived
    sessions.

    Args:
        server: Hostname or IP address of the remote server.
        username: SSH login username.
        password: Password for authentication. Mutually exclusive with ``key_path``.
        key_path: Absolute path to an SSH private key file on the local machine.
        port: SSH port number.

    Returns:
        str: Unique session identifier in the format ``f"{server}:{port}:{username}"``.

    Note:
        This is an internal function used by the global SSH session manager and
        the Docker command helpers.
    """
    session_id = f"{server}:{port}:{username}"
    with session_lock:
        if session_id not in ssh_sessions:
            client = paramiko.SSHClient()
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
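            # Fall back to the default ~/.ssh keys and SSH agent when neither a
            # password nor a key file is given (the passwordless ssh-copy-id setup).
            use_default_keys = not password and not key_path
            connect_kwargs = {
                'hostname': server, 'port': port, 'username': username,
                'timeout': 10, 'allow_agent': use_default_keys,
                'look_for_keys': use_default_keys
            }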
            if password:
                connect_kwargs['password'] = password
            elif key_path:
                connect_kwargs['key_filename'] = key_path
            client.connect(**connect_kwargs)
            client.get_transport().set_keepalive(15)  # Keep connection alive
            ssh_sessions[session_id] = client
        return session_id
def close_ssh_session(session_id: str) -> None:
    """Close and remove a persistent SSH session from the global cache.

    Args:
        session_id: The session identifier returned by ``get_or_create_ssh_session``.

    Note:
        This is an internal function for explicit resource cleanup.
    """
    with session_lock:
        if session_id in ssh_sessions:
            ssh_sessions[session_id].close()
            del ssh_sessions[session_id]
def ssh_execute(session_id: str, command: str) -> str:
    """Execute a shell command on the remote server using a persistent SSH session.

    Args:
        session_id: The session identifier returned by ``get_or_create_ssh_session``.
        command: The shell command to execute (may contain pipes, redirection, etc.).

    Returns:
        str: Combined stdout and stderr output prefixed with labels, or an error
             message if the session is unavailable or execution fails.
    """
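    # Hold the lock only while looking up the client; running the command under
    # the lock would make a blocking call such as `docker logs -f` stall every
    # other tool call.
    with session_lock:
        client = ssh_sessions.get(session_id)
    if not client:
        return "Error: Session not found."
    try:
        stdin, stdout, stderr = client.exec_command(command)
        output = stdout.read().decode(errors="replace")
        errors = stderr.read().decode(errors="replace")
        return f"stdout:\n{output}\nstderr:\n{errors}"
    except Exception as e:
        return f"Error executing command: {str(e)}"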
# ── SFTP Helper Functions (added for file deployment) ───────────────────────
def get_sftp_client(session_id: str):
    """Get an SFTP client from the persistent SSH session."""
    with session_lock:
        client = ssh_sessions.get(session_id)
        if not client:
            return None
        try:
            return client.open_sftp()
        except Exception as e:
            print(f"Error: {e}")
            return None
def _get_ssh_stdout(result: str) -> str:
    """Safely extract only the clean stdout from ssh_execute output."""
    if not result:
        return ""
    if "stdout:" in result:
        # Take everything after "stdout:" and before "stderr:"
        after_stdout = result.split("stdout:", 1)[1]
        clean = after_stdout.split("stderr:", 1)[0]
        return clean.strip()
    # Fallback
    return result.strip()
def upload_file_content(session_id: str, content: object, remote_path: str) -> bool:
    """Upload string content (or dict, auto-serialized to YAML for .yml/.yaml files)
    as a file to the remote server via SFTP using absolute paths only.
    Correctly parses the formatted output of ssh_execute so paths are never corrupted.
    """
    sftp = get_sftp_client(session_id)
    if not sftp:
        print("Error: Could not obtain SFTP client.")
        return False

    try:
        print(f"[DEBUG] Original remote_path: {remote_path}")

        # Resolve absolute home directory
        home_result = ssh_execute(session_id, "echo -n $HOME")
        home_dir = _get_ssh_stdout(home_result)

        # Safety fallback
        if not home_dir or len(home_dir) < 3:
            whoami_result = ssh_execute(session_id, "whoami")
            username = _get_ssh_stdout(whoami_result)
            home_dir = f"/home/{username}"
            print(f"[DEBUG] Home directory resolved to: {home_dir}")

        # Convert ~/... to absolute path
        if remote_path.startswith("~/"):
            absolute_path = home_dir + remote_path[1:]
        else:
            absolute_path = remote_path

        print(f"[DEBUG] Absolute remote path: {absolute_path}")

        # Ensure parent directory exists
        remote_dir = os.path.dirname(absolute_path)
        if remote_dir:
            mkdir_result = ssh_execute(session_id, f"mkdir -p {remote_dir}")
            print(f"[DEBUG] mkdir -p result: {_get_ssh_stdout(mkdir_result) or ''}")

        # Normalize content to string (YAML for compose files, plain str otherwise)
        if isinstance(content, dict):
            if absolute_path.lower().endswith(('.yml', '.yaml')):
                print("[DEBUG] Content is dict; serializing to YAML for docker-compose.yml")
                content_str = yaml.dump(
                    content,
                    default_flow_style=False,
                    sort_keys=False,
                    allow_unicode=True,
                    width=120
                )
            else:
                # Fallback for non-YAML files (rare)
                import json
                content_str = json.dumps(content, indent=2)
        elif isinstance(content, str):
            content_str = content
        else:
            # Graceful fallback for other types
            print(f"[WARNING] Unexpected content type {type(content).__name__}; converting to str")
            content_str = str(content)

        # Encode to bytes for Paramiko
        content_bytes = content_str.encode('utf-8')

        print(f"[DEBUG] Attempting upload to: {absolute_path} ({len(content_bytes)} bytes)")

        # Upload in binary mode
        with sftp.file(absolute_path, 'wb', 0o644) as f:
            f.write(content_bytes)

        print(f"[SUCCESS] File uploaded successfully to {absolute_path}")
        return True

    except Exception as e:
        print(f"[ERROR] Upload failed for original path: {remote_path}")
        print(f"[ERROR] Exception type: {type(e).__name__}")
        print(f"[ERROR] Exception message: {e}")
        traceback.print_exc()
        return False
    finally:
        # Close the SFTP channel; the persistent SSH session itself stays open
        sftp.close()
def ensure_remote_directory(session_id: str, remote_dir: str) -> str:
    """Ensure a remote directory exists using mkdir -p."""
    cmd = f"mkdir -p {remote_dir}"
    return ssh_execute(session_id, cmd)
# ── Auto-establish SSH Connection on Startup ───────────────────────────────
GLOBAL_SSH_SESSION_ID = None

if ENV_SERVER and ENV_USER:
    try:
        GLOBAL_SSH_SESSION_ID = get_or_create_ssh_session(
            server=ENV_SERVER,
            username=ENV_USER,
            password=ENV_PASSWORD,
            port=ENV_PORT
        )
        if GLOBAL_SSH_SESSION_ID:
            print(f"✓ SSH connection established to {ENV_SERVER} as {ENV_USER} {GLOBAL_SSH_SESSION_ID}")
        else:
            print("Failed GLOBAL_SSH_SESSION_ID - Exiting..")
            exit(-1)
    except Exception as e:
        print(f"✗ Failed to establish SSH connection: {e}")
        GLOBAL_SSH_SESSION_ID = None
else:
    print("⚠ Warning: ENV_SERVER and/or ENV_USER environment variables are not set.")
def _run_docker_command(cmd: str) -> str:
    """Execute a Docker command on the remote server via the global SSH session.

    All Docker-related tools delegate to this internal helper.

    Args:
        cmd: The complete Docker (or docker-compose) command string to execute.

    Returns:
        str: Command output (stdout + stderr) or an error message if the global
             SSH session is unavailable.

    Note:
        Requires the global SSH session established at module import time using
        the ``ENV_SERVER``, ``ENV_USER``, and optional ``ENV_PASSWORD`` environment
        variables.
    """
    if GLOBAL_SSH_SESSION_ID is None:
        return "Error: SSH session is not available. Check environment variables and connectivity."
    return ssh_execute(GLOBAL_SSH_SESSION_ID, cmd)

# ── FastMCP Server Setup ───────────────────────────────────────────────────
mcp = FastMCP(name="Docker Manager")
@mcp.tool
def docker_ps(all: bool = False) -> str:
    """List running (and optionally all) containers on the remote Docker host.

    Equivalent to ``docker ps`` or ``docker ps -a``.

    Args:
        all: If True, include stopped containers (adds the ``-a`` flag).

    Returns:
        str: Formatted output of the ``docker ps`` command.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = "docker ps -a" if all else "docker ps"
    return _run_docker_command(cmd)
@mcp.tool
def docker_images() -> str:
    """List all Docker images present on the remote server.

    Equivalent to ``docker images``.

    Returns:
        str: Formatted output of the ``docker images`` command.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    return _run_docker_command("docker images")
@mcp.tool
def docker_pull(image_name: str) -> str:
    """Pull a Docker image (or image:tag) from a registry to the remote server.

    Equivalent to ``docker pull IMAGE``.

    Args:
        image_name: Name of the image to pull (e.g., "nginx:latest" or "myrepo/app").

    Returns:
        str: Output of the pull operation (progress and status messages).

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    return _run_docker_command(f"docker pull {image_name}")
@mcp.tool
def docker_build(context_path: str, tag: str = None, dockerfile: str = "Dockerfile",no_cache: bool = False) -> str:
    """Build a Docker image from a Dockerfile located on the remote server.

    Equivalent to ``docker build [OPTIONS] CONTEXT_PATH``.

    Args:
        context_path: Build context directory on the remote server (absolute or relative path).
        tag: Tag to apply to the built image (e.g., "myapp:v1").
        dockerfile: Name of the Dockerfile within the context (defaults to "Dockerfile").
        no_cache: If True, do not use cache when building the image.

    Returns:
        str: Build output including progress and final image ID.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = ["docker", "build"]
    if tag:
        cmd.extend(["-t", tag])
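    if dockerfile != "Dockerfile":
        # A relative -f path is resolved from the shell's working directory (the SSH
        # user's home), not from the build context, so anchor it to the context.
        if not dockerfile.startswith("/"):
            dockerfile = f"{context_path.rstrip('/')}/{dockerfile}"
        cmd.extend(["-f", dockerfile])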
    if no_cache:
        cmd.append("--no-cache")
    cmd.append(context_path)
    return _run_docker_command(" ".join(cmd))
@mcp.tool
def docker_run(image: str, name: str = None, detach: bool = True, ports: str = None,env: str = None, volumes: str = None, command: str = "") -> str:
    """Create and start a new container from the specified image on the remote server.

    Equivalent to ``docker run [OPTIONS] IMAGE [COMMAND]``.

    Args:
        image: Docker image to run (e.g., "nginx:latest").
        name: Assign a name to the container.
        detach: Run container in background (adds ``-d`` flag). Defaults to True.
        ports: Port mapping(s) in the format "HOST_PORT:CONTAINER_PORT" (e.g., "8080:80").
        env: Environment variable(s) in the format "KEY=value".
        volumes: Volume mount(s) in the format "HOST_PATH:CONTAINER_PATH".
        command: Optional command and arguments to override the image's default CMD.

    Returns:
        str: Container ID (if detached) or full command output.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = ["docker", "run"]
    if detach:
        cmd.append("-d")
    if name:
        cmd.extend(["--name", name])
    if ports:
        cmd.extend(["-p", ports])
    if env:
        cmd.extend(["-e", env])
    if volumes:
        cmd.extend(["-v", volumes])
    cmd.append(image)
    if command:
        cmd.append(command)
    return _run_docker_command(" ".join(cmd))
@mcp.tool
def docker_stop(container: str) -> str:
    """Stop a running container on the remote server.

    Equivalent to ``docker stop CONTAINER``.

    Args:
        container: Container name or ID.

    Returns:
        str: Output confirming the container was stopped.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    return _run_docker_command(f"docker stop {container}")
@mcp.tool
def docker_start(container: str) -> str:
    """Start a stopped container on the remote server.

    Equivalent to ``docker start CONTAINER``.

    Args:
        container: Container name or ID.

    Returns:
        str: Output confirming the container was started.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    return _run_docker_command(f"docker start {container}")
@mcp.tool
def docker_restart(container: str) -> str:
    """Restart a container on the remote server.

    Equivalent to ``docker restart CONTAINER``.

    Args:
        container: Container name or ID.

    Returns:
        str: Output confirming the container was restarted.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    return _run_docker_command(f"docker restart {container}")
@mcp.tool
def docker_rm(container: str, force: bool = False) -> str:
    """Remove one or more containers from the remote server.

    Equivalent to ``docker rm [-f] CONTAINER``.

    Args:
        container: Container name or ID.
        force: If True, forcibly remove the container (adds ``-f`` flag).

    Returns:
        str: Output confirming removal.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = f"docker rm {'-f' if force else ''} {container}".strip()
    return _run_docker_command(cmd)
@mcp.tool
def docker_logs(container: str, tail: int = 100, follow: bool = False) -> str:
    """Fetch logs from a container on the remote server.

    Equivalent to ``docker logs [--tail N] [-f] CONTAINER``.

    Args:
        container: Container name or ID.
        tail: Number of lines to show from the end of the logs.
        follow: If True, follow log output (adds ``-f`` flag). Note that this
                will block until the connection is closed.

    Returns:
        str: Log output from the container.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = f"docker logs --tail {tail}"
    if follow:
        cmd += " -f"
    cmd += f" {container}"
    return _run_docker_command(cmd)
@mcp.tool
def docker_compose_up(compose_file: str = "docker-compose.yml", detached: bool = True,build: bool = False, project_name: str = None) -> str:
    """Start services defined in a docker-compose file on the remote server.

    Equivalent to ``docker compose up [OPTIONS]``.

    Args:
        compose_file: Path to the Compose file (defaults to "docker-compose.yml").
        detached: Run in detached mode (adds ``-d`` flag).
        build: Build images before starting (adds ``--build`` flag).
        project_name: Alternative project name (adds ``-p`` flag).

    Returns:
        str: Output from the compose up operation.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = ["docker", "compose"]
    if compose_file:
        cmd.extend(["-f", compose_file])
    if project_name:
        cmd.extend(["-p", project_name])
    cmd.append("up")
    if detached:
        cmd.append("-d")
    if build:
        cmd.append("--build")
    return _run_docker_command(" ".join(cmd))
@mcp.tool
def docker_compose_down(compose_file: str = "docker-compose.yml", remove_volumes: bool = False) -> str:
    """Stop and remove containers, networks, and optionally volumes for a compose project.

    Equivalent to ``docker compose down [-v]``.

    Args:
        compose_file: Path to the Compose file (defaults to "docker-compose.yml").
        remove_volumes: If True, remove named volumes (adds ``-v`` flag).

    Returns:
        str: Output from the compose down operation.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = ["docker", "compose"]
    if compose_file:
        cmd.extend(["-f", compose_file])
    cmd.append("down")
    if remove_volumes:
        cmd.append("-v")
    return _run_docker_command(" ".join(cmd))
@mcp.tool
def docker_compose_build(compose_file: str = "docker-compose.yml") -> str:
    """Build or rebuild services defined in a docker-compose file.

    Equivalent to ``docker compose build``.

    Args:
        compose_file: Path to the Compose file (defaults to "docker-compose.yml").

    Returns:
        str: Build output from docker compose.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = ["docker", "compose"]
    if compose_file:
        cmd.extend(["-f", compose_file])
    cmd.append("build")
    return _run_docker_command(" ".join(cmd))
@mcp.tool
def docker_compose_ps(compose_file: str = "docker-compose.yml") -> str:
    """List containers for a docker-compose project.

    Equivalent to ``docker compose ps``.

    Args:
        compose_file: Path to the Compose file (defaults to "docker-compose.yml").

    Returns:
        str: Formatted list of compose project containers.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = ["docker", "compose"]
    if compose_file:
        cmd.extend(["-f", compose_file])
    cmd.append("ps")
    return _run_docker_command(" ".join(cmd))
@mcp.tool
def docker_compose_logs(compose_file: str = "docker-compose.yml", service: str = None, follow: bool = False) -> str:
    """View output from services defined in a docker-compose file.

    Equivalent to ``docker compose logs [-f] [SERVICE]``.

    Args:
        compose_file: Path to the Compose file (defaults to "docker-compose.yml").
        service: Optional service name to limit logs to.
        follow: If True, follow log output (adds ``-f`` flag).

    Returns:
        str: Log output from the compose services.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = ["docker", "compose"]
    if compose_file:
        cmd.extend(["-f", compose_file])
    cmd.append("logs")
    if follow:
        cmd.append("-f")
    if service:
        cmd.append(service)
    return _run_docker_command(" ".join(cmd))
@mcp.tool
def docker_compose_command(subcommand: str, arguments: str = "",
                           compose_file: str = None) -> str:
    """Execute any arbitrary ``docker compose`` subcommand on the remote server.

    Provides maximum flexibility for operations not covered by the dedicated tools.

    Args:
        subcommand: The docker-compose subcommand (e.g., "up", "exec", "config").
        arguments: Additional arguments and flags as a single string.
        compose_file: Optional path to the Compose file.

    Returns:
        str: Output from the executed docker compose command.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    """
    cmd = "docker compose"
    if compose_file:
        cmd += f" -f {compose_file}"
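    # Rebuild the whole string so a space always separates "compose" (or the -f
    # path) from the subcommand; stripping only the suffix produced "docker composeup".
    cmd = f"{cmd} {subcommand} {arguments}".strip()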
    return _run_docker_command(cmd)
# ── New Deployment Tool (fulfills the requested functionality) ─────────────
@mcp.tool
def docker_compose_deploy(docker_name: str, dockerfile_content: str, requirements_txt_content: str,
                          app_py_content: str, docker_compose_yml_content: str | dict,
                          detached: bool = True, build: bool = True) -> str:
    """Fully deploy an application to the remote server.

    Uploads Dockerfile, requirements.txt, app.py and docker-compose.yml
    to ~/docker/{docker_name}, then runs `docker compose up --build -d`.

    This tool requires the four specified files as parameters and performs
    the complete build-and-activation sequence in the designated holding folder.

    Args:
        docker_name: Project name used for the holding folder ~/docker/{docker_name}.
        dockerfile_content: Complete content of the Dockerfile as string.
        requirements_txt_content: Complete content of requirements.txt.
        app_py_content: Complete content of app.py.
        docker_compose_yml_content: Complete content of docker-compose.yml.
        detached: Run services in detached mode.
        build: Build/rebuild images before starting.

    Returns:
        Detailed log of directory setup, file uploads and compose operation.
    """
    if GLOBAL_SSH_SESSION_ID is None:
        return "Error: SSH session is not available."

    remote_dir = f"~/docker/{docker_name}"
    outputs = [f"Deploying project '{docker_name}' to remote directory: {remote_dir}"]

    # Create directory
    dir_result = ensure_remote_directory(GLOBAL_SSH_SESSION_ID, remote_dir)
    outputs.append(f"Directory setup:\n{dir_result}")

    # Upload files
    files_to_upload = {
        "Dockerfile": dockerfile_content,
        "requirements.txt": requirements_txt_content,
        "app.py": app_py_content,
        "docker-compose.yml": docker_compose_yml_content,
    }

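    for filename, content in files_to_upload.items():
        remote_path = f"{remote_dir}/{filename}"
        success = upload_file_content(GLOBAL_SSH_SESSION_ID, content, remote_path)
        status = "Uploaded successfully" if success else "Upload failed"
        outputs.append(f"{filename}: {status}")
        if not success:
            # Abort instead of running compose against an incomplete project folder
            outputs.append("Aborting deployment: not all files were uploaded.")
            return "\n\n".join(outputs)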

    # Execute docker compose up
    compose_cmd = f"cd {remote_dir} && docker compose up"
    if detached:
        compose_cmd += " -d"
    if build:
        compose_cmd += " --build"

    outputs.append("Starting Docker Compose build and deployment...")
    up_result = ssh_execute(GLOBAL_SSH_SESSION_ID, compose_cmd)
    outputs.append(f"Docker Compose Result:\n{up_result}")

    return "\n\n".join(outputs)

docker_c_testing = False

if docker_c_testing:
    import sys
    import textwrap

    # ── Test data: dedent every file so no line keeps the block's indentation ──
    # (without dedent, app.py starts at column 0 and then continues with
    #  indented lines, which raises IndentationError inside the container)
    docker_name = "test_python_app"
    dockerfile_content = textwrap.dedent("""\
        FROM python:3.11-slim
        WORKDIR /app
        COPY requirements.txt .
        RUN pip install --no-cache-dir -r requirements.txt
        COPY app.py .
        EXPOSE 7000
        CMD ["python", "app.py"]
        """)
    requirements_txt_content = "flask==3.0.3\n"
    app_py_content = textwrap.dedent("""\
        from flask import Flask
        app = Flask(__name__)

        @app.route('/')
        def hello_world():
            return "Hello, World from Docker Compose deployment test!"

        if __name__ == "__main__":
            app.run(host="0.0.0.0", port=7000, debug=False)
        """)
    # Compose v2 treats the top-level `version:` key as obsolete, so it is omitted
    docker_compose_yml_content = textwrap.dedent("""\
        services:
          web:
            build: .
            ports:
              - "7000:7000"
            container_name: test_python_app_web
            restart: unless-stopped
        """)
    # Execute the deployment and show the log it returns
    result = docker_compose_deploy(
        docker_name=docker_name,
        dockerfile_content=dockerfile_content,
        requirements_txt_content=requirements_txt_content,
        app_py_content=app_py_content,
        docker_compose_yml_content=docker_compose_yml_content,
        detached=True,
        build=True
    )
    print(result)
    sys.exit(0)




# ── Server Startup with CORS ────────────────────────────────────────────────
if __name__ == "__main__":
    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=["*"],
            allow_credentials=True,
            allow_methods=["GET", "POST", "OPTIONS"],
            allow_headers=["*"],
            expose_headers=["*"],
        )
    ]

    app = mcp.http_app(
        path="/mcp",
        middleware=middleware
    )

    uvicorn.run(
        app,
        host="0.0.0.0",
        port=5010,
        log_level="info"
    )

Note: docker_c_testing can be set to True in the code; the script then bypasses the server and runs a one-off deployment test that confirms the remote system and the SSH connection are working. At runtime, the MCP server verifies its connection on its own.
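Generated files can carry subtle mistakes, such as indentation left over from a triple-quoted string, that only surface once the container builds or starts. A minimal sketch of a cheap local pre-flight check, assuming the four files arrive as strings exactly as docker_compose_deploy receives them; preflight_check is a hypothetical helper, not part of the Docker Manager, and it does not replace a real build:

```python
import ast


def preflight_check(files: dict[str, str]) -> list[str]:
    """Return obvious problems in the files before they are uploaded.

    Hypothetical helper: it only performs quick local checks and cannot
    replace an actual `docker compose up --build`.
    """
    problems = []
    for name in ("Dockerfile", "requirements.txt", "app.py", "docker-compose.yml"):
        if not files.get(name, "").strip():
            problems.append(f"{name}: missing or empty")

    # app.py must at least parse; stray indentation raises IndentationError,
    # which is a subclass of SyntaxError
    try:
        ast.parse(files.get("app.py", ""), filename="app.py")
    except SyntaxError as exc:
        problems.append(f"app.py: {type(exc).__name__} on line {exc.lineno}")

    # A Dockerfile must start with FROM (only ARG, comments or blanks may precede it)
    instructions = [
        line.strip()
        for line in files.get("Dockerfile", "").splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if instructions and not instructions[0].upper().startswith(("FROM", "ARG")):
        problems.append("Dockerfile: first instruction should be FROM")

    # Compose files need a top-level services: key starting in column 0
    compose = files.get("docker-compose.yml", "")
    if compose.strip() and not any(
        line.startswith("services:") for line in compose.splitlines()
    ):
        problems.append("docker-compose.yml: no top-level 'services:' key")
    return problems
```

One option is to call it at the top of docker_compose_deploy and return the problem list to the LLM instead of uploading, so it can correct the files itself.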

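When it runs it should look like this (this sample run was captured from a build listening on port 5009; the code above listens on port 5010):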

/home/c/PythonProject/task_group/.venv/bin/python /home/c/mcp_docker/i_docker_manager/docker_manager_04.py 
✓ SSH connection established to 192.168.1.4 as c 192.168.1.4:22:c
INFO:     Started server process [572420]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5009 (Press CTRL+C to quit)
INFO:     192.168.1.62:44242 - "OPTIONS /mcp HTTP/1.1" 200 OK
INFO:     192.168.1.62:44242 - "POST /mcp HTTP/1.1" 200 OK
INFO:     192.168.1.62:44242 - "OPTIONS /mcp HTTP/1.1" 200 OK
INFO:     192.168.1.62:44252 - "POST /mcp HTTP/1.1" 202 Accepted

Issues

We found that you need pretty explicit prompting. You cannot just say 'stand up a docker image and container' and expect the LLM to know the implied steps of making the Dockerfile, requirements.txt, docker-compose.yml etc. Instead use a prompt like:

When the code is done, using the process manager create a Dockerfile.  With the Docker file using the docker_build command create an image.

Get your LLM to test all the tools.

test all the docker tools and make sure you can use them.

Dockerization

Naturally, this app itself sits in a Docker container, so we had an LLM write the Docker files for it.

Here are the files required to Dockerize your FastMCP-based Docker Manager application.

1. `Dockerfile`

# Use a lightweight Python image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies required by Paramiko (for SSH)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    libffi-dev \
    libssl-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first to leverage Docker cache
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Expose the port used by the FastMCP server
EXPOSE 5010

# Set environment variables (can be overridden at runtime)
ENV ENV_PORT=22

# Command to run the application
CMD ["python", "app.py"]

2. `requirements.txt`

fastmcp
starlette
uvicorn[standard]
paramiko
pyyaml

Notes on dependencies:

fastmcp – assumed to be the package providing FastMCP
starlette and uvicorn – required for the HTTP server and CORS middleware
paramiko – for SSH/SFTP functionality
pyyaml – used in upload_file_content for YAML serialization

3. `docker-compose.yml`

services:
  docker-manager:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: docker-manager
    restart: unless-stopped
    ports:
      - "5010:5010"
    environment:
      # Required for SSH connection to the remote Docker host
      - ENV_SERVER=${ENV_SERVER}
      - ENV_USER=${ENV_USER}
      - ENV_PASSWORD=${ENV_PASSWORD:-}
      - ENV_PORT=${ENV_PORT:-22}
    # Optional: persist SSH known_hosts if needed (rarely required)
    # volumes:
    #   - ./known_hosts:/root/.ssh/known_hosts:ro

Usage Instructions

Place the three files above in the same directory as your app.py (the provided script).

Create a .env file (recommended) for sensitive values:

ENV_SERVER=your.remote.server.ip.or.hostname
ENV_USER=your_ssh_username
ENV_PASSWORD=your_ssh_password
# ENV_PORT=22   # optional, defaults to 22
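Docker Compose reads that .env file automatically when it substitutes ${ENV_SERVER} and the other variables. If you want to check it from Python before starting the container, here is a minimal sketch, assuming only the simple KEY=VALUE layout shown above; parse_env_file, missing_settings and ssh_port are hypothetical helper names, and Compose's real parser handles more quoting and interpolation rules:

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (simplified sketch)."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip()  # drop a trailing " # comment"
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # strip one pair of matching quotes
        env[key.strip()] = value
    return env


# ENV_PASSWORD and ENV_PORT have defaults in docker-compose.yml
REQUIRED_SETTINGS = ("ENV_SERVER", "ENV_USER")


def missing_settings(env: dict[str, str]) -> list[str]:
    """Names of required settings that are absent or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


def ssh_port(env: dict[str, str]) -> int:
    """ENV_PORT if set and non-empty, else 22, mirroring ${ENV_PORT:-22}."""
    return int(env.get("ENV_PORT") or 22)
```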

Build and run using Docker Compose:

docker compose up -d --build

Access the FastMCP server at:
http://localhost:5010/mcp
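Browser-based clients first send a CORS preflight before each cross-origin POST; that is the OPTIONS /mcp line in the sample run above. A minimal standard-library sketch for checking the preflight from another machine, assuming the server is reachable at the URL above; preflight_request and cors_allows are hypothetical helper names:

```python
import urllib.request


def preflight_request(url: str, origin: str) -> urllib.request.Request:
    """Build the OPTIONS request a browser sends before a cross-origin POST."""
    return urllib.request.Request(
        url,
        method="OPTIONS",
        headers={
            "Origin": origin,
            "Access-Control-Request-Method": "POST",
            "Access-Control-Request-Headers": "content-type",
        },
    )


def cors_allows(url: str, origin: str = "http://localhost", timeout: float = 5.0) -> bool:
    """True if the preflight answer allows `origin`.

    urlopen raises HTTPError if the server rejects the preflight.
    """
    with urllib.request.urlopen(preflight_request(url, origin), timeout=timeout) as resp:
        allowed = resp.headers.get("Access-Control-Allow-Origin")
    return allowed in ("*", origin)
```

With the allow_origins=["*"] setting in the server code, cors_allows("http://localhost:5010/mcp") should return True.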

Important Security Considerations

Never commit passwords or SSH keys into version control.

For production use, it is strongly recommended to use SSH key authentication instead of passwords.

If using keys, modify docker-compose.yml to mount the private key:

volumes:
  - ~/.ssh/id_rsa:/root/.ssh/id_rsa:ro

and update the connection logic accordingly.

The application establishes an SSH connection at startup using the provided environment variables.


Conclusion

This gives your LLM incredibly powerful tools: it can build Docker containers and stand them up. Naturally, this tool can also stand them down, delete them, etc., so I would really recommend that a spare computer, VM or old laptop serve as its 'workspace'!

StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!

thinkmelt@protonmail.com — Sun, 19 Apr 2026 02:45:40 GMT

System Specs - Ryzen 5 2600 (6-core/12-thread, ~15,000 CPU PassMark), 16 GB RAM, 1x RTX 3060 Ti 8 GB.

The question arises: on a very basic budget PC, can a university student get something useful and productive, not a chatbot, but something with agentic workflow tools etc.? So we dug out a 3060 Ti, took out most of the RAM, and started writing!

Please note this recipe will also work on much larger high-end systems: simply reuse it and give it a 35B or a 122B or what have you!

Let's get started!

0. Install the basic build tools, compilers etc.

sudo apt install build-essential wget git python3 cmake -y
sudo apt install libcurl4-openssl-dev

A. Installing your Nvidia Drivers

This will vary with your video card, and you can run into issues: there are literally dozens of NVIDIA drivers, server drivers, and the open-source nouveau driver, which is often already part of a standard Linux installation.
The most reliable option is the last one in this section, a direct install of driver 595 from NVIDIA, but you might get it working from your distribution's repository. To prevent conflicts, we blacklist nouveau.
Driver 550, found in many repositories, might conflict with your current kernel, yet your auto-install may still select it. Driver 595 (as of April 2026) works very well, even with a roughly five-year-old 3060 Ti.
Here is what we found worked. Don't be surprised if you spin your wheels at this point (we ended up reinstalling our drivers about six times), so don't feel bad if it takes you several attempts.

Before Doing Anything - Install the Linux Kernel Headers

sudo apt install linux-headers-$(uname -r)

The linux-headers package provides the kernel headers that the NVIDIA driver modules are built against.

First Try

sudo apt install nvidia-driver-full nvidia-cuda-toolkit -y

If it does issue errors, try blacklisting the nouveau driver, since it can conflict.

sudo apt update && sudo apt full-upgrade -y
sudo apt autoremove -y

sudo nano /etc/modprobe.d/blacklist-nouveau.conf

Add the following lines:

blacklist nouveau
options nouveau modeset=0
alias nouveau off

Update initramfs and reboot

sudo update-initramfs -u && sudo reboot

Direct Driver Pull from Nvidia

If all else fails, install the driver directly from NVIDIA after purging all old drivers (quote the pattern so the shell does not expand it):

sudo apt purge '*nvidia*'
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/595.58.03/NVIDIA-Linux-x86_64-595.58.03.run
chmod +x NVIDIA-Linux-x86_64-595.58.03.run
sudo ./NVIDIA-Linux-x86_64-595.58.03.run

Confirm the Driver with nvidia-smi - If It Works, You're GTG!

nvidia-smi

nvidia-smi result example

It will look similar to that example. Note specifically that the top-right corner shows the highest CUDA version that your GPU and driver can support (here, CUDA Version: 13.2).

B. Installing the CUDA Toolkit 13.2

Next we need to install the NVIDIA CUDA Toolkit (latest version 13.2), because it provides the nvcc compiler that we will use shortly to build our custom TurboQuant-enabled llama.cpp. This matters because TurboQuant's new features let us fit as large a KV cache as possible.

wget https://developer.download.nvidia.com/compute/cuda/13.2.0/local_installers/cuda-repo-debian13-13-2-local_13.2.0-595.45.04-1_amd64.deb
sudo dpkg -i cuda-repo-debian13-13-2-local_13.2.0-595.45.04-1_amd64.deb
sudo cp /var/cuda-repo-debian13-13-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-2

nvcc --version

Note: the toolkit installs nvcc completely but does not add it to your PATH! Seriously, why? To fix this, edit your ~/.bashrc and add:

export PATH=/usr/local/cuda-13.2/bin:$PATH

Then re-source your ~/.bashrc:

source ~/.bashrc

When it works it will show up as:

$nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Thu_Mar_19_11:12:51_PM_PDT_2026
Cuda compilation tools, release 13.2, V13.2.78
Build cuda_13.2.r13.2/compiler.37668154_0
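Because the toolkit and the driver have to match, it can help to compare the two version strings directly. A minimal sketch, assuming the output formats shown in this section (the CUDA Version: field in the nvidia-smi header and the release line of nvcc --version); the helpers are hypothetical, and the comparison is a conservative rule of thumb rather than NVIDIA's full compatibility matrix:

```python
import re


def _major_minor(pattern: str, text: str, what: str) -> tuple[int, int]:
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"could not find {what}")
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version(nvidia_smi_text: str) -> tuple[int, int]:
    """Highest CUDA version the driver supports, from the nvidia-smi header."""
    return _major_minor(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text,
                        "'CUDA Version' in nvidia-smi output")


def nvcc_release(nvcc_text: str) -> tuple[int, int]:
    """Toolkit release from `nvcc --version`, e.g. 'release 13.2' -> (13, 2)."""
    return _major_minor(r"release\s+(\d+)\.(\d+)", nvcc_text,
                        "'release' in nvcc output")


def toolkit_fits_driver(nvidia_smi_text: str, nvcc_text: str) -> bool:
    """Rule of thumb: the toolkit release should not exceed the driver's CUDA version."""
    return nvcc_release(nvcc_text) <= driver_cuda_version(nvidia_smi_text)
```

To use it on a live system, pass in subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout and the same for ["nvcc", "--version"].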

Support

Matching the NVIDIA driver and CUDA Toolkit can be so problematic that there are dedicated troubleshooting pages; do consider:

The Complete Guide to Fixing CUDA Installation Issues - Break AI Scaling Limits in 7 Days (RightNow AI), which covers PATH errors, driver mismatches, WSL2 problems and more.

You're almost there!

C. Installing TurboQuant Forked Llama.cpp

Once that is done, we will pull the TurboQuant-enabled fork of llama.cpp. It significantly reduces the memory used by the KV cache, allowing us to squeeze as much as we can out of our houseLLM. This is the last challenging step, since you will build it from source and it prefers a specific configuration.

C.1. You might need to update CMake to the latest version before you continue. It's not hard; here is how:

wget https://github.com/Kitware/CMake/releases/download/v4.3.1/cmake-4.3.1-linux-x86_64.sh
chmod +x ./cmake-4.3.1-linux-x86_64.sh
./cmake-4.3.1-linux-x86_64.sh

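This only unpacks CMake (the script asks you to accept the license and whether to unpack into a subdirectory). You then need to put the binaries on your PATH. A symbolic link (ln) is safer than copying, because cmake finds its module files relative to its real location and a copied binary alone can fail with "Could not find CMAKE_ROOT":

cd cmake-4.3.1-linux-x86_64/bin
sudo ln -sf "$PWD"/* /usr/local/bin/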

Once it is installed (however you get there), check the version:

c@dragon-192-168-1-3:~/PythonProject/TurboResearcher2/cmake/cmake-4.3.1-linux-x86_64/bin$ cmake --version
cmake version 4.3.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).

Here is the TurboQuant fork of llama.cpp, with full recognition to the excellent 'TheTom' who built it!

GitHub - TheTom/llama-cpp-turboquant: LLM inference in C/C++


Pull the repository and enter its directory:

git clone https://github.com/TheTom/llama-cpp-turboquant.git
cd llama-cpp-turboquant

Inside it, create a custom script named install.sh containing the following.
Note: this is for an NVIDIA GPU using CUDA. On a Mac you will need a different backend (Metal); the README lists the build options for other platforms.

#!/usr/bin/env bash
set -e  # stop if the configure step fails
cmake -B build \
      -DGGML_CUDA=ON \
      -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.2/bin/nvcc \
      -DCUDAToolkit_ROOT=/usr/local/cuda-13.2 \
      -DCMAKE_CUDA_ARCHITECTURES="86;89" \
      -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j$(nproc)

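Please note that we specified two architectures: 86 (RTX 30-series, such as the 3060 Ti) and 89 (RTX 40-series, such as the 4080), so the build should keep working if you upgrade your GPU. For RTX 50-series (Blackwell) cards such as the 5080, add 120 (their compute capability is 12.0).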
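The architecture numbers are just the GPU's compute capability with the dot removed (the llama-server log in this section reports "compute capability 8.6" for the 3060 Ti). A minimal sketch that builds the CMake flag from a list of capabilities; cuda_arch_flag is a hypothetical helper, and on recent drivers `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` prints one capability per installed GPU:

```python
def cuda_arch_flag(compute_caps: list[str]) -> str:
    """Turn compute capabilities like '8.6' into a CMAKE_CUDA_ARCHITECTURES flag.

    '8.6' (RTX 30-series) -> 86, '8.9' (RTX 40-series) -> 89,
    '12.0' (RTX 50-series) -> 120. Duplicates are dropped.
    """
    archs = {cap.strip().replace(".", "") for cap in compute_caps if cap.strip()}
    ordered = sorted(archs, key=int)  # numeric order, so 86 comes before 120
    return '-DCMAKE_CUDA_ARCHITECTURES="' + ";".join(ordered) + '"'
```

For example, cuda_arch_flag(["8.6", "8.9"]) yields exactly the value used in install.sh.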
Make it an executable and execute it:

chmod +x ./install.sh
./install.sh

Now wait about 15-20 minutes for it to compile (well, in our case, longer than that...).

When it finally finishes, the compiled binaries will be in the build/bin directory. If you already have another llama.cpp installed that you do not want to conflict with, copy them to a separate directory instead and use absolute paths in all references, e.g. /usr/bin/customllm/llama-server.

Otherwise, move all the compiled binaries to your /usr/bin, starting from the repository directory:

cd build/bin
sudo cp * /usr/bin

Making sure it's working and ready to go:

llama-server

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 7839 MiB):
Device 0: NVIDIA GeForce RTX 3060 Ti, compute capability 8.6, VMM: yes, VRAM: 7839 MiB
main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
build_info: b8967-627ebbc6e
system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
init: using 11 threads for HTTP server

D. Installing the Qwen2.5-Coder-7B-Instruct-GGUF

We chose the 6-bit (Q6_K) quantization of Qwen2.5-Coder-7B-Instruct, which should give us as much coding ability as we can fit on an 8 GB card. The 6-bit quantization frees up VRAM for the KV cache while keeping as much of the model's quality as possible.

khjvgvyfc/Qwen2.5-Coder-7B-Instruct-GGUF · Hugging Face


We recommend keeping models in a working ~/models directory, so create it and pull the model there:

mkdir ~/models && cd ~/models

wget "https://huggingface.co/khjvgvyfc/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q6_k.gguf?download=true"

Almost there.

The command-line options for llama.cpp tools such as llama-cli and llama-server can get really long, so it is smart to save your command lines in a script. That way you can tweak them as you like, and if you come back a long time later you won't have forgotten the myriad of options available. We also gave the model file a simpler name so it is easier to reference, and we recommend absolute paths in the scripts:

mv qwen2.5-coder-7b-instruct-q6_k.gguf\?download\=true  qwen2.5-coder-7b-instruct-q6_k.gguf

/usr/bin/llama-server --jinja \
-m /home/c/models/qwen2.5-coder-7b-instruct-q6_k.gguf \
--host 192.168.1.4 \
--n-gpu-layers 999 \
--override-tensor "\.ffn_.*_exps\.weight=CPU" \
--flash-attn on \
--cache-type-k turbo3 \
--cache-type-v turbo3 \
-c 64000 \
--temp 0.7
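If you keep several of these configurations, you can also generate the scripts from one place. A minimal sketch, assuming you want one bash launcher per configuration; QWEN25_CODER, render_launcher and write_launcher are hypothetical names, and shlex.quote keeps values such as the --override-tensor regex intact:

```python
import shlex
import stat
from pathlib import Path

# One (flag, value) pair per line; value None means a bare switch like --jinja.
QWEN25_CODER = [
    ("--jinja", None),
    ("-m", "/home/c/models/qwen2.5-coder-7b-instruct-q6_k.gguf"),
    ("--host", "192.168.1.4"),
    ("--n-gpu-layers", 999),
    ("--override-tensor", r"\.ffn_.*_exps\.weight=CPU"),
    ("--flash-attn", "on"),
    ("--cache-type-k", "turbo3"),
    ("--cache-type-v", "turbo3"),
    ("-c", 64000),
    ("--temp", 0.7),
]


def render_launcher(binary: str, options: list[tuple[str, object]]) -> str:
    """Return a bash script that execs `binary` with one option per line."""
    lines = [shlex.quote(binary)]
    for flag, value in options:
        lines.append(flag if value is None else f"{flag} {shlex.quote(str(value))}")
    return "#!/usr/bin/env bash\nexec " + " \\\n  ".join(lines) + "\n"


def write_launcher(path: str, binary: str, options: list[tuple[str, object]]) -> Path:
    """Write the script and make it executable for its owner (like chmod u+x)."""
    script = Path(path)
    script.write_text(render_launcher(binary, options))
    script.chmod(script.stat().st_mode | stat.S_IXUSR)
    return script
```

For example, write_launcher("/home/c/run_qwen25.sh", "/usr/bin/llama-server", QWEN25_CODER) writes a script equivalent to the command above (the script path is only an example).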

For an even FASTER configuration, try this one, suggested by https://x.com/iam_shanmukha (full credit!):

/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
--host 192.168.1.3 \
--fit on \
--flash-attn on \
--spec-type ngram-mod \
--spec-ngram-size-n 24 \
--n-cpu-moe-draft 39 \
-t 14 \
--chat-template-kwargs '{"preserve_thinking":true}' \
--cache-type-k turbo3 \
--cache-type-v turbo4 \
-c 512000 \
--temp 0.7

We tried this and did see speed-ups to 35 tokens/s. However, it may make more errors on REALLY LARGE 100k contexts! So maybe put both configurations in separate scripts and try the one you like best.

If the first configuration boots correctly, llama-server prints a large amount of detail; here is what that looks like for reference:

srv    load_model: loading model '/home/c/models/qwen2.5-coder-7b-instruct-q6_k.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected to use 6527 MiB of device memory vs. 7382 MiB of free device memory
llama_params_fit_impl: cannot meet free memory target of 1024 MiB, need to reduce device memory by 168 MiB
llama_params_fit_impl: context size set by user to 64000 -> no change
llama_params_fit: failed to fit params to free device memory: n_gpu_layers already set by user to 999, abort
llama_params_fit: fitting params to free memory took 0.48 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3060 Ti) (0000:07:00.0) - 7382 MiB free
llama_model_loader: loaded meta data with 29 key-value pairs and 339 tensors from /home/c/models/qwen2.5-coder-7b-instruct-q6_k.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct GGUF
llama_model_loader: - kv   3:                           general.finetune str              = Instruct-GGUF
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 28
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 131072
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                          general.file_type u32              = 18
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - kv  26:                                   split.no u16              = 0
llama_model_loader: - kv  27:                                split.count u16              = 0
llama_model_loader: - kv  28:                        split.tensors.count i32              = 339
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q6_K:  198 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q6_K
print_info: file size   = 5.82 GiB (6.56 BPW)
load: 0 unused tokens
load: control-looking token: 128247 '' was not control-type; this is probably a bug in the model. its type will be overridden
load: printing all EOG tokens:
load:   - 128247 ('')
load:   - 151643 ('<|endoftext|>')
load:   - 151645 ('<|im_end|>')
load:   - 151662 ('<|fim_pad|>')
load:   - 151663 ('<|repo_name|>')
load:   - 151664 ('<|file_sep|>')
load: special tokens cache size = 23
load: token to piece cache size = 0.9310 MB
print_info: arch                  = qwen2
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 131072
print_info: n_embd                = 3584
print_info: n_embd_inp            = 3584
print_info: n_layer               = 28
print_info: n_head                = 28
print_info: n_head_kv             = 4
print_info: n_rot                 = 128
print_info: n_swa                 = 0
print_info: is_swa_any            = 0
print_info: n_embd_head_k         = 128
print_info: n_embd_head_v         = 128
print_info: n_gqa                 = 7
print_info: n_embd_k_gqa          = 512
print_info: n_embd_v_gqa          = 512
print_info: f_norm_eps            = 0.0e+00
print_info: f_norm_rms_eps        = 1.0e-06
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 0.0e+00
print_info: n_ff                  = 18944
print_info: n_expert              = 0
print_info: n_expert_used         = 0
print_info: n_expert_groups       = 0
print_info: n_group_used          = 0
print_info: causal attn           = 1
print_info: pooling type          = -1
print_info: rope type             = 2
print_info: rope scaling          = linear
print_info: freq_base_train       = 1000000.0
print_info: freq_scale_train      = 1
print_info: n_ctx_orig_yarn       = 131072
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: model type            = 7B
print_info: model params          = 7.62 B
print_info: general.name          = Qwen2.5 Coder 7B Instruct GGUF
print_info: vocab type            = BPE
print_info: n_vocab               = 152064
print_info: n_merges              = 151387
print_info: BOS token             = 151643 '<|endoftext|>'
print_info: EOS token             = 151645 '<|im_end|>'
print_info: EOT token             = 151645 '<|im_end|>'
print_info: PAD token             = 151643 '<|endoftext|>'
print_info: LF token              = 198 'Ċ'
print_info: FIM PRE token         = 151659 '<|fim_prefix|>'
print_info: FIM SUF token         = 151661 '<|fim_suffix|>'
print_info: FIM MID token         = 151660 '<|fim_middle|>'
print_info: FIM PAD token         = 151662 '<|fim_pad|>'
print_info: FIM REP token         = 151663 '<|repo_name|>'
print_info: FIM SEP token         = 151664 '<|file_sep|>'
print_info: EOG token             = 128247 ''
print_info: EOG token             = 151643 '<|endoftext|>'
print_info: EOG token             = 151645 '<|im_end|>'
print_info: EOG token             = 151662 '<|fim_pad|>'
print_info: EOG token             = 151663 '<|repo_name|>'
print_info: EOG token             = 151664 '<|file_sep|>'
print_info: max token length      = 256
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 27 repeating layers to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors:   CPU_Mapped model buffer size =   426.36 MiB
load_tensors:        CUDA0 model buffer size =  5532.43 MiB
........................................................................................
common_init_result: added  logit bias = -inf
common_init_result: added <|endoftext|> logit bias = -inf
common_init_result: added <|im_end|> logit bias = -inf
common_init_result: added <|fim_pad|> logit bias = -inf
common_init_result: added <|repo_name|> logit bias = -inf
common_init_result: added <|file_sep|> logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max     = 4
llama_context: n_ctx         = 64000
llama_context: n_ctx_seq     = 64000
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = enabled
llama_context: kv_unified    = true
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (64000) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     2.32 MiB
llama_kv_cache:      CUDA0 KV buffer size =   683.72 MiB
llama_kv_cache: TurboQuant rotation matrices initialized (128x128)
llama_kv_cache: size =  683.59 MiB ( 64000 cells,  28 layers,  4/1 seqs), K (turbo3):  341.80 MiB, V (turbo3):  341.80 MiB
llama_kv_cache: upstream attention rotation disabled (TurboQuant uses kernel-level WHT)
llama_kv_cache: attn_rot_k = 0, n_embd_head_k_all = 128
llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 128
sched_reserve: reserving ...
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve:      CUDA0 compute buffer size =   311.00 MiB
sched_reserve:  CUDA_Host compute buffer size =   139.01 MiB
sched_reserve: graph nodes  = 1015
sched_reserve: graph splits = 2
sched_reserve: reserve took 111.08 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
srv    load_model: initializing slots, n_slots = 4
no implementations specified for speculative decoding
slot   load_model: id  0 | task -1 | speculative decoding context not initialized
slot   load_model: id  0 | task -1 | new slot, n_ctx = 64000
no implementations specified for speculative decoding
slot   load_model: id  1 | task -1 | speculative decoding context not initialized
slot   load_model: id  1 | task -1 | new slot, n_ctx = 64000
no implementations specified for speculative decoding
slot   load_model: id  2 | task -1 | speculative decoding context not initialized
slot   load_model: id  2 | task -1 | new slot, n_ctx = 64000
no implementations specified for speculative decoding
slot   load_model: id  3 | task -1 | speculative decoding context not initialized
slot   load_model: id  3 | task -1 | new slot, n_ctx = 64000
srv    load_model: prompt cache is enabled, size limit: 8192 MiB
srv    load_model: use `--cache-ram 0` to disable the prompt cache
srv    load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
srv          init: init: idle slots will be saved to prompt cache and cleared upon starting a new task
init: chat template, example_format: '<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
'
srv          init: init: chat template, thinking = 0
main: model loaded
main: server is listening on http://192.168.1.4:8080
main: starting the main loop...
srv  update_slots: all slots are idle
srv  log_server_r: done request: GET / 192.168.1.62 200
srv  log_server_r: done request: GET /bundle.css 192.168.1.62 200
srv  log_server_r: done request: GET /bundle.js 192.168.1.62 200
srv  log_server_r: done request: HEAD /cors-proxy 192.168.1.62 404
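The llama_kv_cache lines in this log let you check TurboQuant's savings with simple arithmetic. The K cache holds n_embd_k_gqa = 512 values per cell per layer, so 64000 cells x 28 layers is 917,504,000 values; the logged 341.80 MiB works out to 3.125 bits per value for turbo3 (an effective figure we inferred from this log, not a documented constant). At f16 (16 bits) the same cache would need 1750 MiB each for K and V, 3500 MiB in total, which could not fit next to the roughly 5.4 GiB of model weights on this 8 GB card. A minimal sketch of the calculation:

```python
MIB = 1024 ** 2


def kv_cache_mib(n_cells: int, n_layers: int, n_embd_kv: int, bits_per_value: float) -> float:
    """Size of ONE of the K or V caches in MiB.

    Each cell stores n_embd_kv values per layer (n_head_kv * head_dim),
    and each value takes bits_per_value bits.
    """
    total_bits = n_cells * n_layers * n_embd_kv * bits_per_value
    return total_bits / 8 / MIB


# Numbers from the log: 64000 cells, 28 layers, n_embd_k_gqa = 512 (4 KV heads x 128).
CELLS, LAYERS, EMBD_KV = 64000, 28, 512
TURBO3_BITS = 3.125  # inferred from the logged size, not a documented constant
F16_BITS = 16

turbo3_k = kv_cache_mib(CELLS, LAYERS, EMBD_KV, TURBO3_BITS)
f16_k = kv_cache_mib(CELLS, LAYERS, EMBD_KV, F16_BITS)
print(f"K or V cache: turbo3 {turbo3_k:.2f} MiB vs f16 {f16_k:.2f} MiB "
      f"({f16_k / turbo3_k:.2f}x smaller)")
```

It prints 341.80 MiB for turbo3 against 1750.00 MiB for f16, a 5.12x reduction; doubling the turbo3 figure gives the 683.59 MiB total reported in the log.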

How Does it Work?

http://192.168.1.4:8080

Change this to the local IP address of your machine.
It works really well for a basic house 7B model. We won't spend a lot of time on that alone, because the real POWER comes when you make it agentic by adding external tools!
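Besides the browser UI, llama-server exposes an OpenAI-compatible HTTP API, which is what agent harnesses and queue tools talk to. A minimal standard-library sketch, assuming the server address above and the /v1/chat/completions route; build_chat_request and chat are hypothetical helper names:

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str, temperature: float = 0.7) -> urllib.request.Request:
    """POST request for llama-server's OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, prompt: str, timeout: float = 600) -> str:
    """Send one prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(base_url, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, chat("http://192.168.1.4:8080", "Write a binary search in Python") returns the model's answer as a string; the long timeout allows for slow generations on small GPUs.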

PLEASE NOTE: LLMS ARE OKAY. BUT AN LLM WITH AGENTIC TOOL CALLING THAT CAN COMPILE, CORRECT AND REWRITE ITS CODE OVER AND OVER IS 10X MORE POWERFUL, EVEN IF IT'S JUST AN 8B.

It is only a little more work to add agentic tool calling, and that is where your LLM gets a super power-up. The tools are not hard at all: we carefully documented them, from really basic calculator agents to highly powerful ones that can go on the internet, do research, and then come back and do work. Don't be overwhelmed; just work through each guide!
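Under the hood, tool calling with llama-server (started with --jinja) follows the OpenAI chat format: you send tool definitions with the request, the model replies with tool_calls, you run them locally, and you send the results back as role "tool" messages. A minimal sketch of the local half for a calculator agent, assuming that message format; the tool name, schema and helpers are illustrative and not taken from our guides:

```python
import ast
import json
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}


def safe_eval(expr: str) -> float:
    """Evaluate + - * / ** on numbers only; anything else is rejected."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))


# Tool definition sent to the model in the request's "tools" list
CALCULATOR_TOOL = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}

LOCAL_TOOLS = {"calculator": lambda args: safe_eval(args["expression"])}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each tool call and build the role 'tool' replies."""
    results = []
    for call in assistant_message.get("tool_calls", []):
        name = call["function"]["name"]
        try:
            args = json.loads(call["function"]["arguments"] or "{}")
            output = LOCAL_TOOLS[name](args)
        except Exception as exc:  # report errors so the model can correct itself
            output = f"error: {exc}"
        results.append({"role": "tool", "tool_call_id": call["id"], "content": str(output)})
    return results
```

In use, add "tools": [CALCULATOR_TOOL] to the request payload; when the reply contains tool_calls, append that assistant message plus run_tool_calls(...) to the conversation and ask again. Returning errors as text is what lets the model correct its own tool calls.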

PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo.

Downloads Page for all your MCP tooling needs!


Because this first model worked only 'okay', we immediately switched to another one with powerful agentic tooling options!

Upgrading to Qwen3.5-9B w/Agentic Tool Capability.

Right away we went back and picked up a much newer model, one that specifically advertised its tool-calling capability!

Qwen/Qwen3.5-9B · Hugging Face


You can pull it with the command below (then rename it to drop the ?download=true suffix, as before):

wget "https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-UD-Q5_K_XL.gguf?download=true"

We created another script for our new model, and tested its agentic abilities.

/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.5-9B-UD-Q5_K_XL.gguf \
--host 192.168.1.4 \
--n-gpu-layers 999 \
--override-tensor "\.ffn_.*_exps\.weight=CPU" \
--flash-attn on \
--cache-type-k turbo4 \
--cache-type-v turbo2 \
-c 32768  \
--temp 0.7

We were highly impressed: this model went straight to work, corrected its own tool calls, and was still going strong after 12,000 tokens. Nice!

Adding one more Super Tool: LLMQP.

This will let your local LLM code all night. You no longer need to sit there waiting between prompts; instead, you can queue them up and have them run sequentially.
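LLMQP's DONE-WALK idea (each prompt ends by printing a unique 10-character serial, and the next prompt starts once that serial appears in the output) fits in a few lines. A simplified, synchronous sketch under our own assumptions: `send` stands in for whatever call returns the model's output text, and stopping when the serial never appears is a hypothetical policy, not necessarily what LLMQP does:

```python
import secrets
import string
from typing import Callable

SERIAL_ALPHABET = string.ascii_uppercase + string.digits


def make_serial(length: int = 10) -> str:
    """Random completion marker of `length` characters."""
    return "".join(secrets.choice(SERIAL_ALPHABET) for _ in range(length))


def wrap_prompt(prompt: str, serial: str) -> str:
    """Append the instruction that makes the model announce completion."""
    return f"{prompt}\n\nWhen you are completely finished, print this code on its own line: {serial}"


def done_walk(prompts: list[str], send: Callable[[str], str]) -> list[tuple[str, bool]]:
    """Run prompts in order, moving on only when the output contains the serial.

    Only the model's output is searched, never the prompt, because the
    prompt itself also contains the serial.
    """
    log = []
    for prompt in prompts:
        serial = make_serial()
        output = send(wrap_prompt(prompt, serial))
        finished = serial in output
        log.append((prompt, finished))
        if not finished:
            break  # hypothetical policy: stop rather than run later prompts blind
    return log
```

A real dispatcher would watch a streaming response instead of waiting for one blocking call, but the completion test is the same substring check.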

LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT.

LLM Queue Dispatcher. A Powerful Harness Drop will queue your localLLM all night and keep it working!


Conclusion

Absolutely, you CAN get agentic-quality local LLMs working on very minimal house GPU parts. It comes down to how resourceful you are willing to be. It was also inferencing quickly, at ~45 tokens/s.

This can be a very useful 'side-hustle' LLM that does your work with minimal effort!
After it finished, our Code Drop tool had successfully produced the following code package for us:

full_export_BinarySort_Algorithm_20260419_024252.zip (3 KB)

Qwen3.6 Drops!- A HouseLLM Production Level Coding Perspective? One-Shot GoAccess

thinkmelt@protonmail.com — Fri, 17 Apr 2026 14:35:21 GMT

Hugging Face and most sites were excited about the latest Qwen3.6, which scores unbelievably high for a 35B model. People were clearly questioning whether this model was tuned to game the benchmarks, but its performance out of the gate looked very impressive:

Is it just me, or are they squeezing so much graph into such small images that you cannot read them?

Its score of 51.5 on Terminal-Bench 2.0 makes it a strong contender for managing your server instances. It scores handsomely above Gemma and Google, which is what matters for our purposes.

To see whether it is actually that good, we gave it the following prompt to test whether it could one-shot GoAccess, a web log analyzer. We gave it all the agentic tools (Context Manager, Python, Web Lookup, etc.), asked it to benchmark its work, and told it to finish with a Code Drop.

Here are the results. Our system specifications are modest:

Ryzen 9 3900, 12 cores/24 threads (128 GB RAM)
Nvidia 4080 Ti w/ 16 GB

Our run configuration uses the TensorBalance / TurboQuant fork of llama.cpp; if you want to build your own, see (here). If you want to build the agentic tools to run inside your own Docker container, check out our (downloads).

We were running the 6-bit GGUF (Q6), slightly below a full 8-bit quant.
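Some back-of-the-envelope arithmetic shows why the expert tensors get pushed to system RAM. The sketch assumes roughly 35 B total parameters (taken from the model name) and the standard llama.cpp block layouts. Q6_K stores 210 bytes per 256 weights (6.5625 bits/weight), and Q8_0 stores 34 bytes per 32 weights (8.5 bits/weight). The Unsloth "UD … XL" files mix quant types per tensor, so real file sizes will differ somewhat.

```python
# Effective bits per weight, including each block's scale overhead.
BITS_PER_WEIGHT = {
    "Q6_K": 210 * 8 / 256,   # 6.5625
    "Q8_0": 34 * 8 / 32,     # 8.5
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate weight storage in GB (1e9 bytes); ignores metadata and mixed-type tensors."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{weights_gb(35, quant):.1f} GB")
```

That comes to roughly 28.7 GB at Q6_K versus 37.2 GB at Q8_0. Either way, the weights alone far exceed a 16 GB card.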

Our Run Config

/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
--host 192.168.1.3 \
--n-gpu-layers 999 \
--flash-attn on \
--override-tensor "\.ffn_.*_exps\.weight=CPU" \
--cache-type-k turbo3 \
--cache-type-v turbo3 \
-c 512000 \
--temp 0.7
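The `--override-tensor` value is a regular expression matched against GGUF tensor names. Anything that matches is placed in CPU memory (`=CPU`), while `--n-gpu-layers 999` keeps everything else on the GPU. The sketch below shows what the pattern catches. It assumes llama.cpp applies the pattern as a substring search, like Python's `re.search`. The tensor names follow llama.cpp's MoE naming style but are illustrative, not read from the actual Qwen3.6 file.

```python
import re

# The part of the --override-tensor argument before "=CPU".
EXPERT_PATTERN = re.compile(r"\.ffn_.*_exps\.weight")

# Illustrative tensor names in llama.cpp's MoE naming style.
names = [
    "blk.0.attn_q.weight",
    "blk.0.ffn_gate_inp.weight",     # expert router
    "blk.0.ffn_gate_exps.weight",    # stacked expert weights
    "blk.0.ffn_up_exps.weight",
    "blk.0.ffn_down_exps.weight",
    "blk.0.ffn_gate_shexp.weight",   # shared expert, if the architecture has one
]


def placement(name: str) -> str:
    """Where the override would put this tensor."""
    return "CPU" if EXPERT_PATTERN.search(name) else "GPU"


for n in names:
    print(f"{n:32s} -> {placement(n)}")
```

Only the large per-expert matrices move to RAM. Attention, the router and any shared expert stay in VRAM, where the per-token work is heaviest.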

Our Prompt

Go find GoAccess Traffic Analyzer, Analyze it into a project using the project manager into several tasks to replicate it. Every time you create a task or job inside the project manager save it. Work on each task and test them in python. When you are done save the project manager and send a code drop

Our Result

63 Reasoning Cycles intermixed with approximately 30 python code cycles.
The LLM made few lookups to the Internet and worked mostly locally; however, it did go and find GoAccess on its own (we did not even specify where to look for it).

Performance

It averaged 22.9 tokens/s over a context window of 62,770 tokens.
It worked by itself for about 45 minutes.
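As a sanity check on those two numbers: generating continuously at 22.9 tokens/s for 45 minutes would produce about 61,830 tokens, the same order as the 62,770-token context. The two are not directly comparable. The context also holds the prompt and tool results, and the wall-clock time includes tool execution and prompt processing, so treat this as a rough upper bound.

```python
def tokens_if_continuous(tokens_per_s: float, minutes: float) -> int:
    """Tokens produced at a steady rate with no pauses (an upper bound)."""
    return round(tokens_per_s * minutes * 60)


print(tokens_if_continuous(22.9, 45))   # 22.9 tok/s over 2,700 s
```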

It did not complete the Code Drop on the first try; it delivered after we asked a second time. We use a custom Context Manager so that it can retrieve its work and/or break it up. It's open source, and if you want a full guide to adding it to your LLM, see (here).

Code Drop

It came out very nicely, producing a project structure of 6 folders.

The code came out as a 31 KB export; the HTML output came in at 21 KB.

full_export_GoAccess Replicator_20260417_135838.zip (31 KB)

Multiple Class Performance

The model clearly split the task into groups and performed well, writing clean code without hints. Again, it ran dozens of cycles to clean up its own code. Here is an example class it built, the URLAnalyzer:

from collections import Counter, defaultdict
from typing import Dict, List, Tuple
from log_parser import LogEntry
from urllib.parse import urlparse

class URLAnalyzer:
    """Analyzes URLs and paths from parsed log entries."""
    
    def __init__(self, entries: List[LogEntry]):
        self.entries = entries
        self.path_counter = Counter()
        self.path_bytes: Dict[str, int] = defaultdict(int)
        self.path_status: Dict[str, Counter] = defaultdict(Counter)
        self.method_counter = Counter()
        self.extension_counter = Counter()
        self.directory_counter = Counter()
        self._analyze()
    
    def _analyze(self):
        """Perform URL analysis."""
        for entry in self.entries:
            self.path_counter[entry.path] += 1
            self.path_bytes[entry.path] += entry.response_size
            self.path_status[entry.path][entry.status_code] += 1
            self.method_counter[entry.method] += 1
            
            # Extract file extension
            parsed = urlparse(entry.path)
            path = parsed.path
            if '.' in path:
                ext = path.split('.')[-1].split('?')[0]
                self.extension_counter[ext] += 1
            
            # Extract directory
            if path != '/':
                parts = path.strip('/').split('/')
                directory = '/' + '/'.join(parts[:-1]) if len(parts) > 1 else '/'
                self.directory_counter[directory] += 1
    
    def get_top_paths(self, n: int = 10) -> List[Tuple[str, int]]:
        """Get top N most requested paths."""
        return self.path_counter.most_common(n)
    
    def get_top_bytes_paths(self, n: int = 10) -> List[Tuple[str, int]]:
        """Get top N paths by bytes transferred."""
        return sorted(
            [(path, bytes) for path, bytes in self.path_bytes.items()],
            key=lambda x: x[1],
            reverse=True
        )[:n]
    
    def get_path_details(self, path: str) -> Dict:
        """Get detailed statistics for a specific path."""
        if path not in self.path_counter:
            return {}
        
        return {
            'path': path,
            'total_requests': self.path_counter[path],
            'total_bytes': self.path_bytes[path],
            'status_codes': dict(self.path_status[path]),
            'avg_response_size': self.path_bytes[path] / self.path_counter[path]
        }
    
    def get_extension_distribution(self) -> Dict[str, int]:
        """Get distribution of file extensions."""
        return dict(self.extension_counter.most_common())
    
    def get_directory_distribution(self, n: int = 10) -> Dict[str, int]:
        """Get distribution of directories."""
        return dict(self.directory_counter.most_common(n))
    
    def get_method_distribution(self) -> Dict[str, int]:
        """Get distribution of HTTP methods."""
        return dict(self.method_counter.most_common())
    
    def get_static_vs_dynamic(self) -> Dict[str, int]:
        """Categorize paths as static or dynamic."""
        static_extensions = {'html', 'css', 'js', 'png', 'jpg', 'jpeg', 'gif', 'svg', 'ico', 'pdf', 'txt'}
        static_count = 0
        dynamic_count = 0
        
        for entry in self.entries:
            parsed = urlparse(entry.path)
            path = parsed.path
            if '.' in path:
                ext = path.split('.')[-1].split('?')[0]
                if ext.lower() in static_extensions:
                    static_count += 1
                else:
                    dynamic_count += 1
            else:
                dynamic_count += 1
        
        return {
            'Static': static_count,
            'Dynamic': dynamic_count
        }
    
    def get_api_endpoints(self) -> List[Tuple[str, int]]:
        """Identify API endpoints (paths starting with /api/)."""
        api_paths = Counter()
        for entry in self.entries:
            if entry.path.startswith('/api/'):
                api_paths[entry.path] += 1
        return api_paths.most_common()
    
    def get_404_paths(self) -> List[Tuple[str, int]]:
        """Get paths that returned 404 errors."""
        not_found = []
        for path, statuses in self.path_status.items():
            if statuses.get(404, 0) > 0:
                not_found.append((path, statuses[404]))
        return sorted(not_found, key=lambda x: x[1], reverse=True)
    
    def generate_report(self) -> str:
        """Generate a text report of URL analysis."""
        report = []
        report.append("=" * 70)
        report.append("URL ANALYSIS REPORT")
        report.append("=" * 70)
        
        # Top paths
        report.append("\n--- Top 20 Requested Paths ---")
        for path, count in self.get_top_paths(20):
            report.append(f"  {path:40s} {count:6d} requests")
        
        # Top bytes
        report.append("\n--- Top 10 Paths by Bytes ---")
        for path, bytes in self.get_top_bytes_paths(10):
            report.append(f"  {path:40s} {bytes:>10,} bytes")
        
        # Method distribution
        report.append("\n--- HTTP Method Distribution ---")
        methods = self.get_method_distribution()
        total = sum(methods.values())
        for method, count in methods.items():
            pct = (count / total * 100) if total > 0 else 0
            report.append(f"  {method:10s}: {count:6d} ({pct:5.1f}%)")
        
        # Extension distribution
        report.append("\n--- File Extension Distribution ---")
        extensions = self.get_extension_distribution()
        for ext, count in extensions.items():
            report.append(f"  .{ext:10s}: {count:6d}")
        
        # Static vs Dynamic
        report.append("\n--- Static vs Dynamic Content ---")
        static_dynamic = self.get_static_vs_dynamic()
        for content_type, count in static_dynamic.items():
            report.append(f"  {content_type:10s}: {count:6d}")
        
        # API endpoints
        report.append("\n--- API Endpoints ---")
        api_endpoints = self.get_api_endpoints()
        for path, count in api_endpoints:
            report.append(f"  {path:40s} {count:6d} requests")
        
        # 404 paths
        report.append("\n--- 404 Not Found Paths ---")
        not_found = self.get_404_paths()
        for path, count in not_found[:10]:
            report.append(f"  {path:40s} {count:6d} errors")
        
        return '\n'.join(report)
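The `log_parser` module isn't shown, so the sketch below assumes a minimal `LogEntry` with just the four fields URLAnalyzer reads (`path`, `method`, `status_code`, `response_size`). The dataclass is hypothetical. It also pulls the extension and directory rules out of `_analyze` into standalone helpers so they can be tried on a few sample paths.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class LogEntry:
    """Hypothetical stand-in for log_parser.LogEntry (only the fields URLAnalyzer uses)."""
    path: str
    method: str
    status_code: int
    response_size: int


def extension_of(path: str) -> Optional[str]:
    # Same rule as URLAnalyzer._analyze: text after the last '.' in the URL path.
    p = urlparse(path).path
    return p.split(".")[-1] if "." in p else None


def directory_of(path: str) -> Optional[str]:
    # Same rule as URLAnalyzer._analyze: parent directory, '/' for top-level files.
    p = urlparse(path).path
    if p == "/":
        return None
    parts = p.strip("/").split("/")
    return "/" + "/".join(parts[:-1]) if len(parts) > 1 else "/"


entries = [
    LogEntry("/index.html", "GET", 200, 5120),
    LogEntry("/static/css/site.css?v=3", "GET", 200, 2048),
    LogEntry("/api/users", "POST", 201, 312),
    LogEntry("/missing.png", "GET", 404, 0),
]
exts = Counter(filter(None, (extension_of(e.path) for e in entries)))
dirs = Counter(filter(None, (directory_of(e.path) for e in entries)))
print(exts)
print(dirs)
```

Isolating the rules makes one quirk easy to spot. A dotted directory such as `/v1.2/users` contains a '.', so it gets counted as the extension `2/users`. Checking only the last path segment (`p.rsplit("/", 1)[-1]`) would avoid that, if it matters for your logs.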

Conclusion

We are very impressed. We did not feel the need to give this model a 'SOTA assist'.
This was a poorly defined prompt that didn't specify how to replicate GoAccess, whereas most production projects would manage their LLM tightly.
Our conclusion is firm: you can use these LLMs as a powerful assisting tool for basic building needs.
Speed is respectable, and that's the big plus. We are running on minimal equipment by LLM standards: just a stock 4080 GPU, with the tensor layers split between it and system RAM.

PowerChest Home Agents: MCP Tools to Put Your HouseLLM into Turbo. MCP Tools 1-9++

thinkmelt@protonmail.com — Thu, 16 Apr 2026 03:07:59 GMT

We have compiled a collection of very easy back-end MCP tools for all your agentic houseLLM needs!

If you need to know how to use these, a dedicated page for each one will guide you through getting it to work!
They are simple: each one is just a Docker container that opens up a port. Your houseLLM queries it to do its work. You register it simply via 'add MCP server'. That's it!
PLEASE NOTE: YOUR RUN CONFIGURATION MAY WORK FASTER AND BETTER WITH --webui-mcp-proxy. If you do use it, toggle it on for each MCP agent you use.
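Under the hood, 'add MCP server' means the client speaks JSON-RPC 2.0 to the container's `/mcp` endpoint. It sends an `initialize` request, then an `initialized` notification, then calls such as `tools/list`. Below is a minimal stdlib sketch of those messages. It assumes the MCP streamable-HTTP transport (protocol version `2025-03-26`), and the client name is a placeholder. This is also where CORS comes in: when a browser-based UI calls the container directly, the browser only lets the page read the responses if the server sends CORS headers.

```python
import itertools
import json
import urllib.request
from typing import Optional, Tuple

_ids = itertools.count(1)


def rpc(method: str, params: Optional[dict] = None, notify: bool = False) -> dict:
    """Build a JSON-RPC 2.0 message; notifications carry no id."""
    msg = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    if not notify:
        msg["id"] = next(_ids)
    return msg


def initialize_msg(client: str = "example-client") -> dict:
    return rpc("initialize", {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": client, "version": "0.1"},
    })


def post(url: str, msg: dict, session_id: Optional[str] = None) -> Tuple[Optional[str], str]:
    """POST one message; returns (Mcp-Session-Id header, raw body)."""
    headers = {"Content-Type": "application/json",
               "Accept": "application/json, text/event-stream"}
    if session_id:
        headers["Mcp-Session-Id"] = session_id   # echo the id the server issued
    req = urllib.request.Request(url, data=json.dumps(msg).encode(),
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        # The body may be plain JSON or SSE lines of the form "data: {...}".
        return resp.headers.get("Mcp-Session-Id"), resp.read().decode()
```

Fill in your own host, for example `http://<host>:5000/mcp`, then run three calls in order:

1. `sid, _ = post(url, initialize_msg())`
2. `post(url, rpc("notifications/initialized", notify=True), sid)`
3. `post(url, rpc("tools/list"), sid)`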

Scientific Calculator MCP Docker Agent

Get your LLM doing incredibly powerful math, then verifying its work! Seriously.

docker pull cnmcdee/mcp_calculator:latest
docker run -d --name mcp-calculator --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5000:5000 cnmcdee/mcp_calculator:latest

Because it's CORS-enabled HTTP, it works! Add it to your MCP tooling easily via:

http://192.168.1.:5000/mcp
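We don't show here how the calculator container evaluates expressions internally. One safe way a calculator tool can handle LLM-supplied math is to walk the expression's syntax tree against a whitelist instead of calling `eval()`. Below is a minimal sketch of that approach; `safe_eval` and the chosen function list are illustrative.

```python
import ast
import math
import operator

_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {name: getattr(math, name) for name in ("sqrt", "sin", "cos", "tan", "log", "exp")}
_CONSTS = {"pi": math.pi, "e": math.e}


def safe_eval(expr: str) -> float:
    """Evaluate arithmetic without eval(): only whitelisted syntax nodes are allowed."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](ev(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*[ev(arg) for arg in node.args])
        raise ValueError(f"unsupported expression: {ast.dump(node)}")
    return ev(ast.parse(expr, mode="eval"))
```

Attribute access, imports and arbitrary calls are rejected outright. Exponentiation can still be abused (`9**9**9`), so a production tool would also cap operand sizes or run each evaluation with a timeout.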

Agentic Server Primer: Llama.cpp MCP Lesson 1: A Calculator.


a_mcp_calculator.zip (3 KB)

Python MCP Docker Agent

With this tool, your LLM will typically not only generate code but also send it to this Python MCP for testing.

docker pull cnmcdee/mcp-python:latest
docker run -d --name mcp-python --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5001:5001 cnmcdee/mcp-python:latest

Because it's CORS-enabled HTTP, it works! Add it to your MCP tooling easily via:

http://192.168.1.:5001/mcp

You simply change the port for each agent you add: 5002, 5003, and so on.
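The Python container's own implementation isn't shown here. A common pattern for a code-execution tool is to run each snippet in a separate interpreter with a timeout and hand stdout/stderr back to the LLM. Below is a minimal sketch under that assumption; `run_python` is a hypothetical name. A subprocess is not a security sandbox, so the Docker container is what provides the isolation.

```python
import subprocess
import sys


def run_python(code: str, timeout: float = 10.0) -> dict:
    """Run a snippet in a fresh interpreter and capture what it printed."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],   # -I: ignore PYTHON* env vars and user site-packages
            capture_output=True, text=True, timeout=timeout,
        )
        return {"ok": proc.returncode == 0, "stdout": proc.stdout,
                "stderr": proc.stderr, "returncode": proc.returncode}
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout}s",
                "returncode": None}
```

Returning the exit code along with stderr lets the model see tracebacks and fix its own code on the next cycle.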

Agentic Server Primer: Llama.cpp MCP Lesson 3: Adding Python Tooling Capability To your HouseLLM.


b_mcp_python.zip (4 KB)

Weather API Example MCP Docker Agent

This completely free MCP tool will enable your LLM to look up weather anywhere in the world!

docker pull docker.io/cnmcdee/mcp-weather:latest
docker run -d --name mcp-weather --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5002:5002 cnmcdee/mcp-weather:latest

Agentic Server Primer: Llama.cpp MCP Lesson 4: Weather Polling via api.weather.gov
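api.weather.gov (the US National Weather Service API, which serves US locations) uses a two-step lookup. First, `/points/{lat},{lon}` returns grid metadata, including a `forecast` URL. Second, that URL returns `properties.periods`. The NWS asks clients to identify themselves with a User-Agent. The one below is a placeholder to replace with your own contact details. This is a sketch of the lookup, not the container's actual code.

```python
import json
import urllib.request
from typing import List

API = "https://api.weather.gov"
# Placeholder contact string; NWS asks for an identifying User-Agent.
HEADERS = {"User-Agent": "houseLLM-weather-example (you@example.com)",
           "Accept": "application/geo+json"}


def points_url(lat: float, lon: float) -> str:
    # The points endpoint expects at most four decimal places.
    return f"{API}/points/{round(lat, 4)},{round(lon, 4)}"


def get_json(url: str) -> dict:
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.load(resp)


def summarize(forecast: dict, n: int = 2) -> List[str]:
    """Turn a forecast document into short lines an LLM can quote."""
    return [f"{p['name']}: {p['temperature']}°{p['temperatureUnit']}, {p['shortForecast']}"
            for p in forecast["properties"]["periods"][:n]]


def forecast(lat: float, lon: float) -> List[str]:
    point = get_json(points_url(lat, lon))                        # step 1: lat/lon -> grid
    return summarize(get_json(point["properties"]["forecast"]))  # step 2: grid -> forecast
```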


c_mcp_weather.zip (3 KB)

JavaScript-Enabled MCP Agent

This uses a Python API to pipe JavaScript. It's very powerful when you are writing JavaScript for your HTML projects and your LLM wants to verify its work!

docker pull docker.io/cnmcdee/mcp-javascript:latest
docker run -d --name mcp-javascript --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5003:5003 cnmcdee/mcp-javascript:latest

Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin.

We go through a full working example of creating your own MCP tools.
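"Piping JavaScript through a Python API" can be as simple as the sketch below. It feeds the script to Node.js on stdin (`node -` reads the program from stdin) and returns the output. It assumes a `node` binary is installed in the container; `run_js` is a hypothetical name, not the tool's actual function.

```python
import shutil
import subprocess


def run_js(code: str, timeout: float = 10.0) -> dict:
    """Pipe JavaScript to Node.js on stdin and capture the result."""
    node = shutil.which("node")
    if node is None:
        return {"ok": False, "stdout": "", "stderr": "node executable not found on PATH"}
    try:
        proc = subprocess.run([node, "-"], input=code, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout}s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
```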


d_mcp_javascript.zip (3 KB)

MySQL-Enabled MCP Agent

This will let your LLM communicate with its own database! It can be very interesting watching it create tables, joins, and merges all on its own!
Please note that you will probably need to set this one up manually so it can connect to your running MySQL instance.

docker pull docker.io/cnmcdee/mcp-mysql:latest
docker run -d --name mcp-mysql --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5004:5004 cnmcdee/mcp-mysql:latest

Agentic Server Primer: Llama.cpp MCP Lesson 6: Adding mysql Database Docker Toolsets.

We give our LLM its own database to play with!
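The shape of a SQL tool is easy to sketch. It executes one statement and returns JSON, using the same `success`/`error` convention as the Process Manager code on this page. To keep the sketch runnable without a database server, it uses Python's built-in `sqlite3` as a stand-in for MySQL. MySQL drivers such as PyMySQL or mysql-connector-python follow the same DB-API 2.0 cursor pattern, though they use `%s` placeholders instead of `?`.

```python
import json
import sqlite3


def run_sql(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Execute one statement and return a JSON result the LLM can read."""
    try:
        cur = conn.execute(sql, params)
        if cur.description:  # the statement returned rows (SELECT, PRAGMA, ...)
            columns = [d[0] for d in cur.description]
            return json.dumps({"success": True, "columns": columns,
                               "rows": [list(r) for r in cur.fetchall()]})
        conn.commit()        # DDL/DML: persist and report how many rows changed
        return json.dumps({"success": True, "rowcount": cur.rowcount})
    except sqlite3.Error as e:
        return json.dumps({"success": False, "error": str(e)})
```

Returning database errors as JSON rather than raising lets the model read the message and correct its SQL on the next cycle.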


e_mcp_mysql.zip (3 KB)

SuperPython Enabled MCP Agent

This is a MUCH beefier Python with numpy, beautifulsoup, requests, and whatever else you want to put inside your Python environment. Use this when your LLM is creating web-enabled tools and wants to test them!
Again, the process is nearly identical, so we refer you to the basic Python MCP Docker tutorial above.

To quickly pull and run this locally:

docker pull docker.io/cnmcdee/mcp-super-python:latest
docker run -d --name mcp-super-python --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5007:5007 cnmcdee/mcp-super-python:latest

f_mcp_super_python.zip (7 KB)

Process Manager with Code Drop and Web Request Agent

This is a powerhouse tool. By allowing your LLM to save its work, hand it to you as a 'code drop', resume a task, break it down into jobs, and record its own code snippets, you can do production-level research from a houseLLM at much lower cost!

docker pull docker.io/cnmcdee/mcp-process-manager:latest
docker run -d --name mcp-process-manager --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5008:5008 cnmcdee/mcp-process-manager:latest

Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop.


h_mcp_process_manager.zip (11 KB)

Docker Orchestrator

This powerful MCP agent enables your tool-enabled LLM to create, stand up, and test its own Docker images and containers!
NOTE: because this is designed to connect to a remote server over SSH, you will need to set some variables:

docker pull docker.io/cnmcdee/mcp-docker-orchestrator:latest

docker run -d \
  --name mcp-docker-orchestrator \
  --restart unless-stopped \
  -p 0.0.0.0:5010:5010 \
  -e "FLASH_ENV=production" \
  -e ENV_SERVER="${ENV_SERVER}" \
  -e ENV_USER="${ENV_USER}" \
  -e ENV_PASSWORD="${ENV_PASSWORD:-}" \
  -e ENV_PORT="${ENV_PORT:-22}" \
  cnmcdee/mcp-docker-orchestrator

Agentic Server Primer: Llama.cpp MCP Lesson 9: Docker Orchestrator

In this guide we go over letting your LLM create and manage its own Docker images, and stand up its own containers after writing its code. It uses a special docker-compose tool we built for it.
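In the `docker run` above, `ENV_SERVER` and `ENV_USER` have no defaults. If they are unset, the container silently receives empty strings. Below is a small pre-flight sketch that mirrors the `${VAR:-default}` rules and refuses to build the command when a required value is missing. The helper names are hypothetical.

```python
import os
from typing import Dict, List, Mapping

REQUIRED = ("ENV_SERVER", "ENV_USER")
DEFAULTS = {"ENV_PASSWORD": "", "ENV_PORT": "22"}   # same defaults as the docker run above


def orchestrator_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the SSH settings, failing fast on anything required but unset or empty."""
    missing = [k for k in REQUIRED if not environ.get(k)]
    if missing:
        raise ValueError("set before docker run: " + ", ".join(missing))
    env = {k: environ[k] for k in REQUIRED}
    # ${VAR:-default} also substitutes the default when VAR is set but empty.
    env.update({k: environ.get(k) or default for k, default in DEFAULTS.items()})
    return env


def docker_run_cmd(env: Dict[str, str]) -> List[str]:
    """The same docker run as above, as an argument list."""
    cmd = ["docker", "run", "-d", "--name", "mcp-docker-orchestrator",
           "--restart", "unless-stopped", "-p", "0.0.0.0:5010:5010",
           "-e", "FLASH_ENV=production"]
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    return cmd + ["cnmcdee/mcp-docker-orchestrator"]

# Usage: subprocess.run(docker_run_cmd(orchestrator_env()), check=True)
```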


I_dkr_manager.zip (8 KB)

mcp-coder (CUDA-Enabled Docker Container)

This is a powerful 'developmental' type container for those that need a safe environment for developing and compiling new versions of Llama.cpp.

docker pull cnmcdee/mcp-coder:latest

# Name the container (mcp-coder is our choice here; any name works)
CONTAINER_NAME=mcp-coder

# Start container
docker run -d \
--name $CONTAINER_NAME \
--restart unless-stopped \
-p 5011:5011 \
-v ~/mcp-workspace:/work_path \
--env PYTHONUNBUFFERED=1 \
cnmcdee/mcp-coder:latest

Agentic Server Primer: Llama.cpp MCP Lesson 10: mcp-coder (Cuda Version)

We build an MCP coding agent that lets your LLM work on and debug its own code with nvcc, or really any language!


LLMQP - Power Prompt Manager for your LLM!

This is a powerhouse tool. With it you can have your LLM coding all night!

docker pull docker.io/cnmcdee/llmqueue:latest
docker run -d --name mcp-llmqueue --restart unless-stopped -p 0.0.0.0:5012:5012 cnmcdee/llmqueue:latest

LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT.


Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop.

thinkmelt@protonmail.com — Thu, 16 Apr 2026 01:09:05 GMT

In our previous lesson we built a Process Manager for our HouseLLM. This was VERY powerful because it let anyone continue a context, or break it up into pieces, without losing work, all locally on your own gear. But why not add a research feature: going onto the Internet and looking things up? Our previous lesson, for your reference:

If you just need to pull and run this Docker image, here it is!

docker pull docker.io/cnmcdee/mcp-process-manager:latest
docker run -d --name mcp-process-manager --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5008:5008 cnmcdee/mcp-process-manager:latest

Agentic Server Primer: Llama.cpp MCP Lesson 7: Process Manager (Part 1)


Adding this was literally one command to our local AI, and it dutifully wrote and added it for us.

We effectively added only a few tools and had our LLM change the structure.

# ── HTML-page tools ─────────────────────────────────────────────────────
mcp.add_tool(task_manager.task_html_add)
mcp.add_tool(task_manager.task_html_get)
mcp.add_tool(task_manager.task_html_delete)
mcp.add_tool(task_manager.task_html_update)
# ── Web-fetching tools ───────────────────────────────────────────────
mcp.add_tool(task_manager.duckduckgo_html_list_fetch)
mcp.add_tool(task_manager.html_fetch)
# ── Enhanced code_drop tool (now packages everything) ───────────────
mcp.add_tool(task_manager.code_drop)
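The body of `duckduckgo_html_list_fetch` isn't in this excerpt. Since the full code imports `requests`, `BeautifulSoup` and `parse_qs`, it presumably scrapes DuckDuckGo's HTML results page. Below is a stdlib-only sketch of that parsing step. It assumes result links carry the class `result__a` and may be wrapped in a `//duckduckgo.com/l/?uddg=<encoded url>` redirect. DuckDuckGo's markup can change at any time, so treat both as assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


def unwrap(href: str) -> str:
    """Return the real target if the link is a uddg redirect, else the href unchanged."""
    qs = parse_qs(urlparse(href).query)
    return qs["uddg"][0] if "uddg" in qs else href


class ResultLinkParser(HTMLParser):
    """Collect (title, url) pairs from <a class="result__a"> links."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None   # set while inside a result link
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and "result__a" in (a.get("class") or "").split():
            self._href, self._text = a.get("href") or "", []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)   # includes text inside nested tags like <b>

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), unwrap(self._href)))
            self._href = None
```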

Code Drop

Because this tool can enable your LLM to generate MASSIVE amounts of code overnight, you can ask it to do a 'code_drop'. The back-end function automatically packs all the code snippets into a ZIP and creates a temporary download link for its work. Serious time saver!
Note: this generates a link to localhost; just change it to the server's IP address if the tool is not running on your local machine (for example, on a headless unit).
Nice... a typical structure, built and packed for your download.

Full Code

You might need to install some of the supporting packages (requests, beautifulsoup4, fastmcp, starlette, uvicorn), naturally. A very easy way to do this is to install PyCharm, then paste this code into any open app.py; PyCharm will take care of building your venv and/or installing the necessary packages to run it!

import re
import json
import os
from datetime import datetime
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, parse_qs
from fastmcp import FastMCP
from fastmcp.tools import tool
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
from starlette.staticfiles import StaticFiles
import uvicorn
import tempfile
import shutil
from pathlib import Path

# Initialize the MCP server
mcp = FastMCP(
    name="Process Manager",
    instructions="Provides a process manager for tracking tasks and their associated jobs, code blocks, and HTML pages. "
                 "Also includes web-search (DuckDuckGo) and webpage-fetching tools for retrieving external content.")

# Directory for downloadable ZIP files (served statically at /downloads)
DOWNLOADS_DIR = Path("downloads")
DOWNLOADS_DIR.mkdir(exist_ok=True)

class TaskManager:
    def __init__(self, storage_dir: str = "processes"):
        self.storage_dir = storage_dir
        os.makedirs(self.storage_dir, exist_ok=True)
        self.current_process = None
        self.current_process_file = None

    def _sanitize_name(self, name: str) -> str:
        """Sanitize user-supplied name to prevent path traversal and invalid filenames."""
        if not name or not str(name).strip():
            name = "unnamed_process"
        safe_name = re.sub(r'[^a-zA-Z0-9_.-]', '_', str(name).strip())
        return safe_name[:150]

    def _get_filepath(self, name: str) -> str:
        """Return full path to the sanitized JSON file."""
        safe_name = self._sanitize_name(name)
        if not safe_name.endswith('.json'):
            safe_name += '.json'
        return os.path.join(self.storage_dir, safe_name)

    def _save_current(self):
        """Persist the current process to disk."""
        if self.current_process and self.current_process_file:
            filepath = self._get_filepath(self.current_process_file)
            with open(filepath, 'w', encoding='utf-8') as f:
                json.dump(self.current_process, f, indent=2, ensure_ascii=False)

    def _load_process(self, name: str) -> bool:
        """Load a process from disk and update internal state."""
        filepath = self._get_filepath(name)
        if not os.path.exists(filepath):
            return False
        try:
            with open(filepath, 'r', encoding='utf-8') as f:
                self.current_process = json.load(f)
            self.current_process_file = name
            return True
        except Exception:
            return False

    @tool()
    def process_new(self, name: str, description: str):
        """Creates a new process with the given name and description."""
        try:
            name = str(name).strip()
            if not name:
                return json.dumps({"success": False, "error": "Process name cannot be empty."})

            filepath = self._get_filepath(name)
            if os.path.exists(filepath):
                return json.dumps({"success": False, "error": f"Process '{name}' already exists."})

            self.current_process = {
                "process_name": name,
                "description": description,
                "created_at": datetime.now().isoformat(),
                "tasks": {}
            }
            self.current_process_file = name
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Process '{name}' created successfully.",
                "process": self.current_process
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def process_load(self, name: str):
        """Loads an existing process by name."""
        try:
            if self._load_process(name):
                task_count = len(self.current_process.get("tasks", {}))
                return json.dumps({
                    "success": True,
                    "message": f"Process '{name}' loaded successfully.",
                    "process_name": self.current_process.get("process_name"),
                    "task_count": task_count,
                    "process_data": self.current_process
                })
            return json.dumps({
                "success": False,
                "error": f"Process '{name}' not found."
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def process_list_all(self):
        """Lists all saved processes."""
        try:
            files = [f for f in os.listdir(self.storage_dir) if f.endswith('.json')]
            process_list = [{"process": f.replace('.json', '')} for f in files]
            return json.dumps({
                "success": True,
                "process_list": process_list,
                "total": len(process_list)
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e), "process_list": []})

    @tool()
    def process_save(self, name: str = None):
        """Saves the current process, optionally renaming it."""
        try:
            if not self.current_process:
                return json.dumps({
                    "success": False,
                    "error": "No active process to save. Create or load a process first."
                })

            if name:
                name = str(name).strip()
                self.current_process["process_name"] = name
                self.current_process_file = name

            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Process saved successfully as '{self.current_process_file}'.",
                "process_name": self.current_process.get("process_name")
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def process_delete(self, name: str):
        """Deletes an entire process and its associated file from storage.
        If the deleted process is currently loaded, the in-memory state is cleared."""
        try:
            name = str(name).strip()
            if not name:
                return json.dumps({"success": False, "error": "Process name cannot be empty."})

            filepath = self._get_filepath(name)
            if not os.path.exists(filepath):
                return json.dumps({
                    "success": False,
                    "error": f"Process '{name}' not found."
                })

            os.remove(filepath)

            if (self.current_process_file and
                    self._sanitize_name(self.current_process_file) == self._sanitize_name(name)):
                self.current_process = None
                self.current_process_file = None

            return json.dumps({
                "success": True,
                "message": f"Process '{name}' deleted successfully.",
                "deleted_process": name
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_new(self, task_name: str, task_description: str):
        """Creates a new task within the current process."""
        try:
            if not self.current_process:
                return json.dumps({"success": False, "error": "No active process. Create or load one first."})
            if task_name in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' already exists."})

            self.current_process["tasks"][task_name] = {
                "description": task_description,
                "created_at": datetime.now().isoformat(),
                "jobs": {},
                "code": {},
                "html_pages": {}
            }
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Task '{task_name}' created successfully.",
                "task": self.current_process["tasks"][task_name]
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_list(self):
        """Lists all tasks with summary information."""
        try:
            if not self.current_process:
                return json.dumps({"success": False, "error": "No active process."})

            tasks_summary = {}
            for t_name, t_data in self.current_process["tasks"].items():
                tasks_summary[t_name] = {
                    "description": t_data["description"],
                    "created_at": t_data["created_at"],
                    "job_count": len(t_data.get("jobs", {})),
                    "code_count": len(t_data.get("code", {})),
                    "html_count": len(t_data.get("html_pages", {}))
                }

            return json.dumps({
                "success": True,
                "task_count": len(tasks_summary),
                "tasks": tasks_summary
            }, indent=2)
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_details_get(self, task_name: str):
        """Retrieves complete details of a specific task."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            task_data = self.current_process["tasks"][task_name]
            return json.dumps({
                "success": True,
                "task_name": task_name,
                "task_description": task_data["description"],
                "created_at": task_data["created_at"],
                "jobs": list(task_data.get("jobs", {}).values()),
                "codes": list(task_data.get("code", {}).values()),
                "html_pages": list(task_data.get("html_pages", {}).values()),
                "job_count": len(task_data.get("jobs", {})),
                "code_count": len(task_data.get("code", {})),
                "html_count": len(task_data.get("html_pages", {}))
            }, indent=2)
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_update_description(self, task_name: str, new_description: str):
        """Updates the description of an existing task."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            old_description = self.current_process["tasks"][task_name]["description"]
            self.current_process["tasks"][task_name]["description"] = new_description
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Task '{task_name}' description updated successfully.",
                "task_name": task_name,
                "old_description": old_description,
                "new_description": new_description
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_name_change(self, old_task_name: str, new_task_name: str):
        """Renames an existing task."""
        try:
            if not self.current_process:
                return json.dumps({"success": False, "error": "No active process."})
            if old_task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{old_task_name}' does not exist."})
            if new_task_name in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{new_task_name}' already exists."})

            self.current_process["tasks"][new_task_name] = self.current_process["tasks"].pop(old_task_name)
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Task renamed from '{old_task_name}' to '{new_task_name}' successfully.",
                "old_name": old_task_name,
                "new_name": new_task_name
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_delete(self, task_name: str):
        """Deletes a task from the current process."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            del self.current_process["tasks"][task_name]
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Task '{task_name}' deleted successfully."
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_job_add(self, task_name: str, job_name: str, job_description: str):
        """Adds a job to the specified task."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            job_key = f"{job_name}_{datetime.now().isoformat()}"
            self.current_process["tasks"][task_name]["jobs"][job_key] = {
                "name": job_name,
                "description": job_description,
                "created_at": datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Job '{job_name}' added to task '{task_name}' successfully."
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_job_get(self, task_name: str, job_name: str):
        """Retrieves job(s) matching the given job_name (partial match)."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            jobs = self.current_process["tasks"][task_name].get("jobs", {})
            matching = [data for key, data in jobs.items() if job_name in key or job_name == data.get("name")]

            return json.dumps({
                "success": True,
                "task_name": task_name,
                "jobs": matching
            }, indent=2)
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_code_add(self, task_name: str, code_name: str, code_content: str):
        """Adds a code snippet to the specified task."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            code_key = f"{code_name}_{datetime.now().isoformat()}"
            self.current_process["tasks"][task_name]["code"][code_key] = {
                "name": code_name,
                "content": code_content,
                "created_at": datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Code '{code_name}' added to task '{task_name}' successfully."
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_code_get(self, task_name: str, code_name: str):
        """Retrieves code snippet(s) matching the given code_name (partial match)."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            codes = self.current_process["tasks"][task_name].get("code", {})
            matching = [data for key, data in codes.items() if code_name in key or code_name == data.get("name")]

            return json.dumps({
                "success": True,
                "task_name": task_name,
                "codes": matching
            }, indent=2)
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_code_delete(self, task_name: str, code_name: str):
        """Deletes code snippet(s) matching the given code_name (partial match)."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            codes = self.current_process["tasks"][task_name].get("code", {})
            keys_to_delete = [k for k in codes if code_name in k or code_name == codes[k].get("name")]
            deleted = [codes[k] for k in keys_to_delete]

            for k in keys_to_delete:
                del self.current_process["tasks"][task_name]["code"][k]

            self._save_current()

            if not deleted:
                return json.dumps({
                    "success": False,
                    "message": f"No code matching '{code_name}' found in task '{task_name}'.",
                    "task_name": task_name,
                    "code_name": code_name
                })

            return json.dumps({
                "success": True,
                "message": f"Deleted {len(deleted)} code snippet(s) matching '{code_name}'.",
                "task_name": task_name,
                "deleted_count": len(deleted),
                "deleted_codes": deleted
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_html_add(self, task_name: str, html_name: str, html_content: str):
        """Adds an HTML page to the specified task."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

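            html_key = f"{html_name}_{datetime.now().isoformat()}"
            # setdefault tolerates tasks created without an "html_pages" key (e.g. older process files)
            self.current_process["tasks"][task_name].setdefault("html_pages", {})[html_key] = {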
                "name": html_name,
                "content": html_content,
                "created_at": datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"HTML page '{html_name}' added to task '{task_name}' successfully."
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_html_get(self, task_name: str, html_name: str):
        """Retrieves HTML page(s) matching the given html_name (partial match)."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            html_pages = self.current_process["tasks"][task_name].get("html_pages", {})
            matching = [data for key, data in html_pages.items() if html_name in key or html_name == data.get("name")]

            return json.dumps({
                "success": True,
                "task_name": task_name,
                "html_pages": matching
            }, indent=2)
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_html_delete(self, task_name: str, html_name: str):
        """Deletes HTML page(s) matching the given html_name (partial match)."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            html_pages = self.current_process["tasks"][task_name].get("html_pages", {})
            keys_to_delete = [k for k in html_pages if html_name in k or html_name == html_pages[k].get("name")]
            deleted = [html_pages[k] for k in keys_to_delete]

            for k in keys_to_delete:
                del self.current_process["tasks"][task_name]["html_pages"][k]

            self._save_current()

            if not deleted:
                return json.dumps({
                    "success": False,
                    "message": f"No HTML page matching '{html_name}' found in task '{task_name}'.",
                    "task_name": task_name,
                    "html_name": html_name
                })

            return json.dumps({
                "success": True,
                "message": f"Deleted {len(deleted)} HTML page(s) matching '{html_name}'.",
                "task_name": task_name,
                "deleted_count": len(deleted),
                "deleted_html_pages": deleted
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_html_update(self, task_name: str, html_name: str, new_content: str):
        """Updates (by creating a new version of) an HTML page."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

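            html_key = f"{html_name}_{datetime.now().isoformat()}"
            # setdefault tolerates tasks created without an "html_pages" key (e.g. older process files)
            self.current_process["tasks"][task_name].setdefault("html_pages", {})[html_key] = {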
                "name": html_name,
                "content": new_content,
                "created_at": datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"HTML page '{html_name}' updated successfully in task '{task_name}'.",
                "new_version_created": True
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def duckduckgo_html_list_fetch(self, query: str, max_results: int = 5):
        """Searches DuckDuckGo via its HTML endpoint and returns a list of results (title, URL, snippet)."""
        try:
            query = str(query).strip()
            if not query:
                return json.dumps({"success": False, "error": "Search query cannot be empty."})

            search_url = f"https://duckduckgo.com/html/?q={requests.utils.quote(query)}"
            headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                              "(KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"
            }

            resp = requests.get(search_url, headers=headers, timeout=15)
            resp.raise_for_status()

            soup = BeautifulSoup(resp.text, "html.parser")
            results = []

            for item in soup.select(".result__body")[:max_results]:
                a_tag = item.select_one(".result__a")
                snippet_tag = item.select_one(".result__snippet")

                if a_tag:
                    title = a_tag.get_text(strip=True)
                    href = a_tag.get("href", "")

                    if "uddg=" in href:
                        href = href.split("uddg=")[1].split("&")[0]
                        href = requests.utils.unquote(href)

                    snippet = snippet_tag.get_text(strip=True) if snippet_tag else ""

                    results.append({
                        "title": title,
                        "url": href,
                        "snippet": snippet
                    })

            return json.dumps({
                "success": True,
                "query": query,
                "result_count": len(results),
                "results": results
            }, indent=2)

        except Exception as e:
            return json.dumps({"success": False, "error": f"DuckDuckGo search failed: {str(e)}"})

    @tool()
    def html_fetch(self, url: str):
        """Fetches a webpage and returns clean, readable text content parsed by BeautifulSoup."""
        try:
            url = str(url).strip()
            if not url.startswith(("http://", "https://")):
                url = "https://" + url

            headers = {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                              "(KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"
            }

            resp = requests.get(url, headers=headers, timeout=20)
            resp.raise_for_status()

            soup = BeautifulSoup(resp.text, "html.parser")

            for unwanted in soup(["script", "style", "nav", "header", "footer", "aside", "form", "button"]):
                unwanted.decompose()

            text = soup.get_text(separator="\n", strip=True)

            if len(text) > 40000:
                text = text[:40000] + "\n\n[Content truncated due to length...]"

            # .string is None when <title> has nested markup, so use get_text() instead
            page_title = (soup.title.get_text(strip=True) if soup.title else "") or "No title found"

            return json.dumps({
                "success": True,
                "url": url,
                "title": page_title,
                "content_length": len(text),
                "text": text
            }, indent=2)

        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def task_jobs_list(self):
        """Lists all tasks (identical to task_list for compatibility)."""
        return self.task_list()

    @tool()
    def task_code_update(self, task_name: str, code_name: str, new_content: str):
        """Updates (by creating a new version of) a code snippet."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            code_key = f"{code_name}_{datetime.now().isoformat()}"
            self.current_process["tasks"][task_name]["code"][code_key] = {
                "name": code_name,
                "content": new_content,
                "created_at": datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Code '{code_name}' updated successfully in task '{task_name}'.",
                "new_version_created": True
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})

    @tool()
    def code_drop(self, custom_filename: str = None):
        """Packages the ENTIRE process: all code snippets, all HTML pages, full process metadata (configuration), task descriptions, and jobs into a well-organized ZIP archive with clean folder structure."""
        try:
            if not self.current_process:
                return json.dumps({"success": False, "error": "No active process. Create or load a process first."})

            # Sanitized because the name is used in folder, file and URL names below
            process_name = self._sanitize_name(self.current_process.get("process_name", "unnamed_process"))
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

            # Determine ZIP filename
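            if custom_filename:
                # Sanitize: the name ends up in a filesystem path and a download URL
                safe_name = self._sanitize_name(custom_filename)
                zip_name = safe_name if safe_name.endswith('.zip') else f"{safe_name}.zip"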
            else:
                zip_name = f"full_export_{process_name}_{timestamp}.zip"

            with tempfile.TemporaryDirectory() as tmp_dir:
                tmp_path = Path(tmp_dir)
                root_dir = tmp_path / f"{process_name}_{timestamp}"
                root_dir.mkdir(parents=True, exist_ok=True)

                # 1. Save complete process metadata (this acts as the master configuration file)
                (root_dir / "full_process_backup.json").write_text(
                    json.dumps(self.current_process, indent=2, ensure_ascii=False),
                    encoding="utf-8"
                )

                # 2. Create tasks directory with organized content
                tasks_dir = root_dir / "tasks"
                tasks_dir.mkdir(exist_ok=True)

                total_code = 0
                total_html = 0

                for task_name, task_data in self.current_process.get("tasks", {}).items():
                    safe_task_name = self._sanitize_name(task_name)
                    task_dir = tasks_dir / safe_task_name
                    task_dir.mkdir(exist_ok=True)

                    # Task description
                    desc_path = task_dir / "task_description.txt"
                    desc_path.write_text(
                        f"# Task: {task_name}\n\n{task_data.get('description', 'No description provided.')}",
                        encoding="utf-8"
                    )

                    # Jobs summary
                    jobs = task_data.get("jobs", {})
                    if jobs:
                        jobs_md = "# Jobs\n\n"
                        for job in jobs.values():
                            jobs_md += f"## {job.get('name', 'Unnamed Job')}\n{job.get('description', '')}\n\n"
                        (task_dir / "jobs_summary.md").write_text(jobs_md, encoding="utf-8")

                    # Code snippets
                    code_dir = task_dir / "code"
                    code_dir.mkdir(exist_ok=True)
                    for code_entry in task_data.get("code", {}).values():
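                        # Sanitize so a snippet name like "../x" cannot escape the export folder.
                        # Later revisions of the same name overwrite earlier ones, so the newest wins.
                        code_name = self._sanitize_name(code_entry.get("name", "unnamed_code"))
                        content = code_entry.get("content", "")
                        if "." not in code_name:  # no extension given: assume Python
                            code_name += ".py"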
                        (code_dir / code_name).write_text(content, encoding="utf-8")
                        total_code += 1

                    # HTML pages
                    html_dir = task_dir / "html"
                    html_dir.mkdir(exist_ok=True)
                    for html_entry in task_data.get("html_pages", {}).values():
                        html_name = self._sanitize_name(html_entry.get("name", "unnamed_page"))
                        content = html_entry.get("content", "")
                        if not html_name.lower().endswith(('.html', '.htm')):
                            html_name += ".html"
                        (html_dir / html_name).write_text(content, encoding="utf-8")
                        total_html += 1

                # 3. Top-level README
                readme = f"""# {process_name} - Full Export

**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

- **Tasks**: {len(self.current_process.get("tasks", {}))}
- **Code files**: {total_code}
- **HTML pages**: {total_html}

## Folder Structure
- `full_process_backup.json` → Complete process data (metadata + everything)
- `tasks/<task_name>/`
  - `task_description.txt`
  - `jobs_summary.md`
  - `code/` → All code snippets
  - `html/` → All HTML pages

Ready for development, review, or deployment.
"""
                (root_dir / "README.md").write_text(readme, encoding="utf-8")

                # Create ZIP archive
                # Strip only the trailing ".zip"; make_archive appends it again
                zip_base = DOWNLOADS_DIR / zip_name[:-len('.zip')]
                shutil.make_archive(str(zip_base), 'zip', root_dir)

            download_url = f"http://localhost:5008/downloads/{zip_name}"

            return json.dumps({
                "success": True,
                "message": f"Successfully created full project export for '{process_name}' ({total_code} code files, {total_html} HTML pages).",
                "download_url": download_url,
                "filename": zip_name,
                "code_count": total_code,
                "html_count": total_html,
                "task_count": len(self.current_process.get("tasks", {}))
            }, indent=2)

        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})


task_manager = TaskManager()

# ── Register all bound methods as tools ─────────────────────
mcp.add_tool(task_manager.process_new)
mcp.add_tool(task_manager.process_load)
mcp.add_tool(task_manager.process_list_all)
mcp.add_tool(task_manager.process_save)
mcp.add_tool(task_manager.process_delete)
mcp.add_tool(task_manager.task_new)
mcp.add_tool(task_manager.task_update_description)
mcp.add_tool(task_manager.task_delete)
mcp.add_tool(task_manager.task_name_change)
mcp.add_tool(task_manager.task_job_add)
mcp.add_tool(task_manager.task_job_get)
mcp.add_tool(task_manager.task_details_get)
mcp.add_tool(task_manager.task_list)
mcp.add_tool(task_manager.task_code_add)
mcp.add_tool(task_manager.task_code_get)
mcp.add_tool(task_manager.task_code_delete)
mcp.add_tool(task_manager.task_jobs_list)
mcp.add_tool(task_manager.task_code_update)
# ── HTML-page tools ─────────────────────────────────────────────────────
mcp.add_tool(task_manager.task_html_add)
mcp.add_tool(task_manager.task_html_get)
mcp.add_tool(task_manager.task_html_delete)
mcp.add_tool(task_manager.task_html_update)
# ── Web-fetching tools ───────────────────────────────────────────────
mcp.add_tool(task_manager.duckduckgo_html_list_fetch)
mcp.add_tool(task_manager.html_fetch)
# ── Enhanced code_drop tool (now packages everything) ───────────────
mcp.add_tool(task_manager.code_drop)

# ── Server Startup with CORS and static downloads route ─────────────────────
if __name__ == "__main__":
    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=["*"],
            allow_credentials=True,
            allow_methods=["GET", "POST", "OPTIONS"],
            allow_headers=["*"],
            expose_headers=["*"],
        )
    ]

    app = mcp.http_app(
        path="/mcp",
        middleware=middleware
    )

    app.mount("/downloads", StaticFiles(directory=str(DOWNLOADS_DIR)), name="downloads")

    uvicorn.run(
        app,
        host="0.0.0.0",
        port=5008,
        log_level="info"
    )
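To sanity-check what code_drop produced, the archive can be inspected with the standard zipfile module. This is a minimal sketch that assumes the layout the tool writes (tasks/<task_name>/code/ and tasks/<task_name>/html/ under the archive root); summarize_export is a hypothetical helper, not part of the server.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_export(zip_path):
    """Count code files and HTML pages per task inside a code_drop-style ZIP.

    Assumes entries shaped like tasks/<task>/code/<file> and tasks/<task>/html/<file>.
    """
    summary = defaultdict(lambda: {"code": 0, "html": 0})
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            parts = PurePosixPath(name.removeprefix("./")).parts
            # Expected shape: ("tasks", task, "code" or "html", filename)
            if len(parts) == 4 and parts[0] == "tasks" and parts[2] in ("code", "html"):
                summary[parts[1]][parts[2]] += 1
    return dict(summary)
```

Point it at any export in the downloads folder; the counts should match the code_count and html_count the tool reported.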

We stayed on port 5008, the same port as last time, since this is really just an add-on. The full guide to dockerizing it is at the end of this article. For now: how does it work? Well, good!
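Under the hood, one behavior is worth knowing: task_code_update and task_html_update never overwrite anything. Each call adds a new entry keyed <name>_<timestamp>, so every revision is kept and the *_get tools return all of them. Below is a minimal sketch of picking the newest revision, assuming the entry layout the tools write (name, content, created_at); latest_entry is a hypothetical helper, not one of the registered tools. It matches the name exactly, so app.py.bak is not mistaken for app.py, which the substring match in the *_get tools would allow.

```python
from datetime import datetime


def latest_entry(entries, name):
    """Return the newest entry whose "name" equals `name`, or None.

    `entries` is a task's "code" or "html_pages" dict; each value holds
    "name", "content" and an ISO-8601 "created_at" string.
    """
    candidates = [e for e in entries.values() if e.get("name") == name]
    if not candidates:
        return None
    # Compare parsed timestamps rather than relying on dict order
    return max(candidates, key=lambda e: datetime.fromisoformat(e["created_at"]))
```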

We can now work on Task 4 on its own, independent of the rest of the job.

Context Slowdown

Once you start building seriously for production, your LLM slows down as its context grows. That is the beauty of this tool: you can chop your context into pieces and work on each one separately. It slowly handed us its code snippets.
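Chopping up the context can be as simple as building the prompt for one task from the saved process file instead of replaying the whole session. This is a minimal sketch assuming the process JSON layout the tools write (tasks holding description, jobs and code) and keeping only the newest revision of each snippet; build_task_context is a hypothetical helper and the prompt format is arbitrary.

```python
from datetime import datetime


def build_task_context(process, task_name):
    """Build a compact prompt for a single task from a process dict."""
    task = process["tasks"][task_name]
    lines = [f"# Task: {task_name}", task.get("description", ""), ""]

    for job in task.get("jobs", {}).values():
        lines.append(f"- Job {job['name']}: {job.get('description', '')}")

    # Keep only the newest revision of each snippet name
    newest = {}
    for entry in task.get("code", {}).values():
        ts = datetime.fromisoformat(entry["created_at"])
        if entry["name"] not in newest or ts > newest[entry["name"]][0]:
            newest[entry["name"]] = (ts, entry["content"])

    for name, (_, content) in sorted(newest.items()):
        lines += ["", f"## {name}", content]
    return "\n".join(lines)
```

Loading processes/<name>.json with json.load and passing the result in gives a fresh session only what that one task needs.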

Restart is Now Harmless

Recycle your work automatically: you can simply start a new context, delete old ones, or even assign two LLMs to the same task. That's power!
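Restarts are harmless because every mutating tool calls _save_current(), so the process lives in a JSON file rather than in the model's context or the server's memory. Here is a stripped-down sketch of that round trip, assuming the same one-file-per-process layout; MiniStore is a hypothetical stand-in for TaskManager, not the real class. Note that TaskManager keeps a single current_process for the whole server, so two LLMs connected to one instance share whichever process was loaded last.

```python
import json
import os


class MiniStore:
    """Tiny stand-in for TaskManager: one JSON file per process."""

    def __init__(self, storage_dir):
        self.storage_dir = storage_dir
        os.makedirs(storage_dir, exist_ok=True)
        self.current = None

    def _path(self, name):
        return os.path.join(self.storage_dir, f"{name}.json")

    def new(self, name):
        self.current = {"process_name": name, "tasks": {}}
        self.save()

    def add_task(self, task, description):
        self.current["tasks"][task] = {"description": description, "jobs": {}, "code": {}}
        self.save()  # persist after every change, like _save_current()

    def save(self):
        with open(self._path(self.current["process_name"]), "w", encoding="utf-8") as f:
            json.dump(self.current, f, indent=2)

    def load(self, name):
        with open(self._path(name), encoding="utf-8") as f:
            self.current = json.load(f)
```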

We Let an LLM Write the Rest

We made sure this Docker build was based on the source code so that the requirements matched.

requirements.txt

fastmcp
requests
beautifulsoup4
uvicorn[standard]
starlette

Dockerfile

FROM python:3.12-slim

WORKDIR /app

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY app.py .

# Ensure the storage directory exists (volume mount will override at runtime)
RUN mkdir -p processes

# Expose the port used by the FastMCP server
EXPOSE 5008

# Start the application
CMD ["python", "app.py"]

docker-compose.yml

version: '3.9'

services:
  task-manager:
    build: .
    container_name: task-manager-mcp
    ports:
      - "5008:5008"
    volumes:
      - ./processes:/app/processes
    restart: unless-stopped
    environment:
      - PYTHONUNBUFFERED=1

Breakdown of Dockerizing the Application

The provided Python code implements a FastMCP-based process manager server that persists data to JSON files in a processes/ directory, exposes an HTTP endpoint on port 5008 with CORS middleware, and depends on several external libraries. Dockerizing this application ensures consistent runtime environments, isolates dependencies, facilitates reproducible deployments, and simplifies scaling or distribution.

The proposed solution consists of three standard files and assumes the supplied code is saved as app.py in the project root. The directory structure would then be:

.
├── app.py          # Your complete Python script
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── processes/      # Created automatically; persisted via volume
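Before building, a quick preflight can confirm that the layout above is in place. This small sketch assumes the file names shown in the tree; check_project is a hypothetical helper that reports missing files and creates processes/ if needed.

```python
from pathlib import Path

REQUIRED = ("app.py", "requirements.txt", "Dockerfile", "docker-compose.yml")


def check_project(root="."):
    """Return the required files missing from `root`; ensure processes/ exists."""
    root = Path(root)
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    (root / "processes").mkdir(exist_ok=True)
    return missing
```

Calling check_project() from the project root should return an empty list before you run docker compose up --build.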

requirements.txt
This file explicitly declares all non-standard-library dependencies extracted from the imports (fastmcp, requests, beautifulsoup4, uvicorn, and starlette). The [standard] extra for uvicorn installs optional dependencies such as uvloop, httptools, websockets, and watchfiles. Note that the versions are not pinned; pinning them (for example by capturing a known-good environment with pip freeze) is what guarantees that pip install produces identical environments across builds.

Dockerfile

Starts from the lightweight python:3.12-slim base image to minimize image size and attack surface.
Sets /app as the working directory and installs dependencies first (leveraging Docker layer caching for faster rebuilds).
Copies app.py and creates the processes/ directory.
Exposes port 5008 to document the service port (the 0.0.0.0 binding inside the code ensures accessibility from outside the container).
The CMD executes the script directly, which runs uvicorn internally and starts the FastMCP HTTP application with the registered tools.

docker-compose.yml
This orchestration file defines a single service for straightforward local or production deployment. Key features include:

Port mapping (5008:5008) so the MCP endpoint is reachable at http://localhost:5008/mcp.
A bind mount (./processes:/app/processes) that persists all JSON process files to the host filesystem. Without this volume, data would be lost on container restart or rebuild.
restart: unless-stopped for resilience.
PYTHONUNBUFFERED=1 ensures log output appears immediately in docker logs or docker-compose console.
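Once the container is up, a quick smoke test of the /mcp endpoint is to POST an MCP initialize request. This sketch only builds the request with the standard library; the protocolVersion string and clientInfo values are assumptions to adjust for your server, and build_initialize_request is a hypothetical helper. Streamable HTTP servers expect the client to accept both JSON and Server-Sent Events responses.

```python
import json
import urllib.request


def build_initialize_request(url="http://localhost:5008/mcp"):
    """Build (but do not send) an MCP JSON-RPC 'initialize' POST request."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: use a version your server supports
            "capabilities": {},
            "clientInfo": {"name": "smoke-test", "version": "0.1"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The client must accept both plain JSON and SSE-framed responses
            "Accept": "application/json, text/event-stream",
        },
    )
```

Sending it with urllib.request.urlopen(req, timeout=10) against the running container should return the server's initialize result, possibly wrapped in an SSE event.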

Usage Instructions

Save the provided code as app.py in the project root.
Create the three files listed above.
(Optional) Create an empty processes/ directory on the host.
Build and start the container:

docker compose up --build -d

Verify the service:

docker logs task-manager-mcp

The server should bind to 0.0.0.0:5008 and display startup messages from Uvicorn.

Stop/restart: docker compose down or docker compose restart.
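For scripted deployments, the "should bind to 0.0.0.0:5008" check can be automated by waiting for the port to accept TCP connections. This is a minimal sketch; wait_for_port is a hypothetical helper, and an open port only shows that Uvicorn is listening, not that every tool is working.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet; retry until the deadline
    return False
```

For example, wait_for_port("localhost", 5008) right after docker compose up --build -d blocks until the container is reachable.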

Additional Considerations

Persistence: The TaskManager class writes to disk; the volume mount is essential for production use.
Security: CORS is configured to allow all origins (as in the original code). In production, restrict allow_origins to specific domains.
Scaling: The current setup is single-container; for horizontal scaling, consider a reverse proxy or shared storage.
Image size and performance: The slim base and --no-cache-dir keep the final image compact. No additional system packages are required because the dependencies are either pure Python or, like uvloop and httptools, are published as prebuilt wheels for common platforms.
Updates: To change dependencies or the application code, simply edit the respective files and re-run docker compose up --build.
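Restricting CORS does not have to mean editing code for each deployment. This sketch assumes a hypothetical ALLOWED_ORIGINS environment variable (comma-separated) that the compose file's environment: section could set, falling back to assumed localhost origins for the LLMQP web GUI on port 5012. The resulting list would be passed as allow_origins= to the existing CORSMiddleware.

```python
import os

# Assumed defaults: the local web GUI; adjust to your own front end
DEFAULT_ORIGINS = ["http://localhost:5012", "http://127.0.0.1:5012"]


def allowed_origins(env=None):
    """Parse ALLOWED_ORIGINS (comma-separated) into a list for CORSMiddleware."""
    env = os.environ if env is None else env
    raw = env.get("ALLOWED_ORIGINS", "")
    # Browsers send Origin without a trailing slash, so strip one if present
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    return origins or list(DEFAULT_ORIGINS)
```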

This configuration provides a complete Docker setup that reproduces the original application behavior while adding portability and data persistence. Environment variables, multi-stage builds, or health checks can be layered on top as needed.

Agentic Server Primer: Llama.cpp MCP Lesson 7: Process Manager (Part 1)

thinkmelt@protonmail.com — Wed, 15 Apr 2026 07:04:43 GMT

Here is where it gets very interesting. A process manager or harness can distribute jobs, but first we want to give your LLM its own 'notepad.'

If you just want to pull and run this server, it can be done with:

docker pull docker.io/cnmcdee/mcp-process-manager:latest
docker run -d --name mcp-process-manager --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5008:5008 cnmcdee/mcp-process-manager:latest

Effectively, the hierarchy is simple:

Processes are the masters (one .json file per process).
Tasks live inside them.
Jobs and code snippets live inside each task.
We'll build a front-end manager for this in Part 2; for now, this is a functional back end!
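The same hierarchy can be written out as types. This sketch mirrors the dictionary keys visible in the code below (process_name, description, created_at, tasks, and per task description, jobs and code); the server itself stores plain dicts, so these dataclasses are illustrative only and the class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    name: str
    content: str
    created_at: str  # ISO-8601 timestamp


@dataclass
class Job:
    name: str
    description: str
    created_at: str


@dataclass
class Task:
    description: str
    jobs: dict = field(default_factory=dict)  # "<job name>_<timestamp>" -> Job
    code: dict = field(default_factory=dict)  # "<code name>_<timestamp>" -> Snippet


@dataclass
class Process:  # one processes/<name>.json file
    process_name: str
    description: str
    created_at: str
    tasks: dict = field(default_factory=dict)  # task name -> Task
```

dataclasses.asdict() on a Process produces nested dicts in the same shape the tools persist with json.dump.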

Here is the code. It has been through multiple refactors and audits and should be ready to use.

import re
import json
import os
from datetime import datetime
from fastmcp import FastMCP
from fastmcp.tools import tool          # ← Required import
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
import uvicorn

# Initialize the MCP server
mcp = FastMCP(
    name="Process Manager",
    instructions="Provides a process manager for tracking tasks and their associated jobs and code blocks")

class TaskManager:
    def __init__(self, storage_dir: str = "processes"):
        self.storage_dir = storage_dir
        os.makedirs(self.storage_dir, exist_ok=True)
        self.current_process = None          # Full process dict (metadata + tasks)
        self.current_process_file = None     # Base name of the active .json file
    def _sanitize_name(self, name: str) -> str:
        """Sanitize user-supplied name to prevent path traversal and invalid filenames."""
        if not name or not str(name).strip():
            name = "unnamed_process"
        safe_name = re.sub(r'[^a-zA-Z0-9_.-]', '_', str(name).strip())
        return safe_name[:150]
    def _get_filepath(self, name: str) -> str:
        """Return full path to the sanitized JSON file."""
        safe_name = self._sanitize_name(name)
        if not safe_name.endswith('.json'):
            safe_name += '.json'
        return os.path.join(self.storage_dir, safe_name)
    def _save_current(self):
        """Persist the current process to disk."""
        if self.current_process and self.current_process_file:
            filepath = self._get_filepath(self.current_process_file)
            with open(filepath, 'w', encoding='utf-8') as f:
                json.dump(self.current_process, f, indent=2, ensure_ascii=False)
    def _load_process(self, name: str) -> bool:
        """Load a process from disk and update internal state."""
        filepath = self._get_filepath(name)
        if not os.path.exists(filepath):
            return False
        try:
            with open(filepath, 'r', encoding='utf-8') as f:
                self.current_process = json.load(f)
            self.current_process_file = name
            return True
        except Exception:
            return False
    @tool()
    def process_new(self, name: str, description: str):
        """Creates a new process with the given name and description."""
        try:
            name = str(name).strip()
            if not name:
                return json.dumps({"success": False, "error": "Process name cannot be empty."})

            filepath = self._get_filepath(name)
            if os.path.exists(filepath):
                return json.dumps({"success": False, "error": f"Process '{name}' already exists."})

            self.current_process = {
                "process_name": name,
                "description": description,
                "created_at": datetime.now().isoformat(),
                "tasks": {}
            }
            self.current_process_file = name
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Process '{name}' created successfully.",
                "process": self.current_process
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def process_load(self, name: str):
        """Loads an existing process by name."""
        try:
            if self._load_process(name):
                task_count = len(self.current_process.get("tasks", {}))
                return json.dumps({
                    "success": True,
                    "message": f"Process '{name}' loaded successfully.",
                    "process_name": self.current_process.get("process_name"),
                    "task_count": task_count,
                    "process_data": self.current_process
                })
            return json.dumps({
                "success": False,
                "error": f"Process '{name}' not found."
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def process_list_all(self):
        """Lists all saved processes."""
        try:
            files = [f for f in os.listdir(self.storage_dir) if f.endswith('.json')]
            process_list = [{"process": f.replace('.json', '')} for f in files]
            return json.dumps({
                "success": True,
                "process_list": process_list,
                "total": len(process_list)
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e), "process_list": []})
    @tool()
    def process_save(self, name: str = None):
        """Saves the current process, optionally renaming it."""
        try:
            if not self.current_process:
                return json.dumps({
                    "success": False,
                    "error": "No active process to save. Create or load a process first."
                })

            if name:
                name = str(name).strip()
                self.current_process["process_name"] = name
                self.current_process_file = name

            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Process saved successfully as '{self.current_process_file}'.",
                "process_name": self.current_process.get("process_name")
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def process_delete(self, name: str):
        """Deletes an entire process and its associated file from storage.
        If the deleted process is currently loaded, the in-memory state is cleared."""
        try:
            name = str(name).strip()
            if not name:
                return json.dumps({"success": False, "error": "Process name cannot be empty."})

            filepath = self._get_filepath(name)
            if not os.path.exists(filepath):
                return json.dumps({
                    "success": False,
                    "error": f"Process '{name}' not found."
                })

            # Delete the file
            os.remove(filepath)

            # If this was the currently loaded process, clear internal state
            if (self.current_process_file and
                    self._sanitize_name(self.current_process_file) == self._sanitize_name(name)):
                self.current_process = None
                self.current_process_file = None

            return json.dumps({
                "success": True,
                "message": f"Process '{name}' deleted successfully.",
                "deleted_process": name
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_new(self, task_name: str, task_description: str):
        """Creates a new task within the current process."""
        try:
            if not self.current_process:
                return json.dumps({"success": False, "error": "No active process. Create or load one first."})
            if task_name in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' already exists."})

            self.current_process["tasks"][task_name] = {
                "description": task_description,
                "created_at": datetime.now().isoformat(),
                "jobs": {},
                "code": {}
            }
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Task '{task_name}' created successfully.",
                "task": self.current_process["tasks"][task_name]
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_list(self):
        """Lists all tasks with summary information."""
        try:
            if not self.current_process:
                return json.dumps({"success": False, "error": "No active process."})

            tasks_summary = {}
            for t_name, t_data in self.current_process["tasks"].items():
                tasks_summary[t_name] = {
                    "description": t_data["description"],
                    "created_at": t_data["created_at"],
                    "job_count": len(t_data.get("jobs", {})),
                    "code_count": len(t_data.get("code", {}))
                }

            return json.dumps({
                "success": True,
                "task_count": len(tasks_summary),
                "tasks": tasks_summary
            }, indent=2)
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_details_get(self, task_name: str):
        """Retrieves complete details of a specific task."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            task_data = self.current_process["tasks"][task_name]
            return json.dumps({
                "success": True,
                "task_name": task_name,
                "task_description": task_data["description"],
                "created_at": task_data["created_at"],
                "jobs": list(task_data.get("jobs", {}).values()),
                "codes": list(task_data.get("code", {}).values()),
                "job_count": len(task_data.get("jobs", {})),
                "code_count": len(task_data.get("code", {}))
            }, indent=2)
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_update_description(self, task_name: str, new_description: str):
        """Updates the description of an existing task."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            old_description = self.current_process["tasks"][task_name]["description"]
            self.current_process["tasks"][task_name]["description"] = new_description
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Task '{task_name}' description updated successfully.",
                "task_name": task_name,
                "old_description": old_description,
                "new_description": new_description
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_name_change(self, old_task_name: str, new_task_name: str):
        """Renames an existing task."""
        try:
            if not self.current_process:
                return json.dumps({"success": False, "error": "No active process."})
            if old_task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{old_task_name}' does not exist."})
            if new_task_name in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{new_task_name}' already exists."})

            self.current_process["tasks"][new_task_name] = self.current_process["tasks"].pop(old_task_name)
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Task renamed from '{old_task_name}' to '{new_task_name}' successfully.",
                "old_name": old_task_name,
                "new_name": new_task_name
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_delete(self, task_name: str):
        """Deletes a task from the current process."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            del self.current_process["tasks"][task_name]
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Task '{task_name}' deleted successfully."
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_job_add(self, task_name: str, job_name: str, job_description: str):
        """Adds a job to the specified task."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            job_key = f"{job_name}_{datetime.now().isoformat()}"
            self.current_process["tasks"][task_name]["jobs"][job_key] = {
                "name": job_name,
                "description": job_description,
                "created_at": datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Job '{job_name}' added to task '{task_name}' successfully."
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_job_get(self, task_name: str, job_name: str):
        """Retrieves job(s) matching the given job_name (partial match)."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            jobs = self.current_process["tasks"][task_name].get("jobs", {})
            matching = [data for key, data in jobs.items() if job_name in key or job_name == data.get("name")]

            return json.dumps({
                "success": True,
                "task_name": task_name,
                "jobs": matching
            }, indent=2)
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_code_add(self, task_name: str, code_name: str, code_content: str):
        """Adds a code snippet to the specified task."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            code_key = f"{code_name}_{datetime.now().isoformat()}"
            self.current_process["tasks"][task_name]["code"][code_key] = {
                "name": code_name,
                "content": code_content,
                "created_at": datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Code '{code_name}' added to task '{task_name}' successfully."
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_code_get(self, task_name: str, code_name: str):
        """Retrieves code snippet(s) matching the given code_name (partial match)."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            codes = self.current_process["tasks"][task_name].get("code", {})
            matching = [data for key, data in codes.items() if code_name in key or code_name == data.get("name")]

            return json.dumps({
                "success": True,
                "task_name": task_name,
                "codes": matching
            }, indent=2)
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_code_delete(self, task_name: str, code_name: str):
        """Deletes code snippet(s) matching the given code_name (partial match)."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            codes = self.current_process["tasks"][task_name].get("code", {})
            keys_to_delete = [k for k in codes if code_name in k or code_name == codes[k].get("name")]
            deleted = [codes[k] for k in keys_to_delete]

            for k in keys_to_delete:
                del self.current_process["tasks"][task_name]["code"][k]

            self._save_current()

            if not deleted:
                return json.dumps({
                    "success": False,
                    "message": f"No code matching '{code_name}' found in task '{task_name}'.",
                    "task_name": task_name,
                    "code_name": code_name
                })

            return json.dumps({
                "success": True,
                "message": f"Deleted {len(deleted)} code snippet(s) matching '{code_name}'.",
                "task_name": task_name,
                "deleted_count": len(deleted),
                "deleted_codes": deleted
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})
    @tool()
    def task_jobs_list(self):
        """Lists all tasks (identical to task_list for compatibility)."""
        return self.task_list()
    @tool()
    def task_code_update(self, task_name: str, code_name: str, new_content: str):
        """Updates (by creating a new version of) a code snippet."""
        try:
            if not self.current_process or task_name not in self.current_process["tasks"]:
                return json.dumps({"success": False, "error": f"Task '{task_name}' does not exist."})

            code_key = f"{code_name}_{datetime.now().isoformat()}"
            self.current_process["tasks"][task_name]["code"][code_key] = {
                "name": code_name,
                "content": new_content,
                "created_at": datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                "success": True,
                "message": f"Code '{code_name}' updated successfully in task '{task_name}'.",
                "new_version_created": True
            })
        except Exception as e:
            return json.dumps({"success": False, "error": str(e)})


task_manager = TaskManager()
# ── Register all bound methods as tools (required step) ─────────────────────
mcp.add_tool(task_manager.process_new)
mcp.add_tool(task_manager.process_load)
mcp.add_tool(task_manager.process_list_all)
mcp.add_tool(task_manager.process_save)
mcp.add_tool(task_manager.process_delete)
mcp.add_tool(task_manager.task_new)
mcp.add_tool(task_manager.task_update_description)
mcp.add_tool(task_manager.task_delete)
mcp.add_tool(task_manager.task_name_change)
mcp.add_tool(task_manager.task_job_add)
mcp.add_tool(task_manager.task_job_get)
mcp.add_tool(task_manager.task_details_get)
mcp.add_tool(task_manager.task_list)
mcp.add_tool(task_manager.task_code_add)
mcp.add_tool(task_manager.task_code_get)
mcp.add_tool(task_manager.task_code_delete)
mcp.add_tool(task_manager.task_jobs_list)
mcp.add_tool(task_manager.task_code_update)

# ── Server Startup with CORS (required for llama.cpp frontend) ───────────────
if __name__ == "__main__":
    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=["*"],  # Restrict in production
            allow_credentials=True,
            allow_methods=["GET", "POST", "OPTIONS"],
            allow_headers=["*"],
            expose_headers=["*"],
        )
    ]

    app = mcp.http_app(
        path="/mcp",
        middleware=middleware
    )

    uvicorn.run(
        app,
        host="0.0.0.0",
        port=5008,
        log_level="info"
    )
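Jobs and code snippets are stored under timestamped keys built as `f"{name}_{datetime.now().isoformat()}"`, and `task_job_get`, `task_code_get` and `task_code_delete` match on a substring of that key or on an exact `name`. The standalone sketch below reproduces that matching rule on a plain dict. It does not import TaskManager, and `make_entry`, `find_matches` and `latest` are hypothetical helpers. It shows two side effects: asking for `parse` also returns `parse_args`, and each `task_code_update` leaves the older versions in place.

```python
from datetime import datetime, timedelta

def make_entry(name: str, content: str, when: datetime):
    # Same key scheme as task_code_add / task_code_update: "<name>_<ISO timestamp>"
    ts = when.isoformat()
    return f"{name}_{ts}", {"name": name, "content": content, "created_at": ts}

def find_matches(entries: dict, query: str) -> list:
    # Same rule as task_code_get: substring of the key OR exact name match
    return [data for key, data in entries.items()
            if query in key or query == data.get("name")]

def latest(entries: dict, name: str):
    # Exact-name entries only; for naive timestamps in one format,
    # ISO-8601 strings sort chronologically, so max() picks the newest
    exact = [d for d in entries.values() if d.get("name") == name]
    return max(exact, key=lambda d: d["created_at"], default=None)

t0 = datetime(2026, 5, 18, 3, 0, 0)
code = dict([
    make_entry("parse", "v1", t0),
    make_entry("parse_args", "argv", t0 + timedelta(minutes=1)),
    make_entry("parse", "v2", t0 + timedelta(minutes=5)),  # a later task_code_update
])
```

Because task_code_delete uses the same rule, deleting `parse` to clear old versions would also remove every `parse_args` snippet. Passing the full key (name + "_" + created_at) is the more precise option.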

Once it is stood up, it will listen on port 5008, with the MCP endpoint at the /mcp path. You can add it to your llama-server as in:

Once added, it will look like this:

Testing. Of particular interest: testing cycles are no longer really done by you, they are done by your LLM. Some important points:

Each LLM may prefer its agentic tools structured differently. The best approach is to paste the MCP code into the LLM and ask it to audit the code and/or its docstrings.

Naturally, once this is done you will want to dockerize it and let it live happily on port 5008. A simple walkthrough follows.

Dockerfile, requirements.txt, and docker-compose.yml

The following files have been prepared specifically for the provided TaskManager + FastMCP application. They follow industry best practices for security, reproducibility, and data persistence.

Save the entire Python code above as app.py in your project root directory.

1. `requirements.txt`

fastmcp
starlette
uvicorn[standard]

This installs:

fastmcp (the core MCP framework)
starlette (explicitly for Middleware and CORSMiddleware)
uvicorn[standard] (ASGI server with recommended extras for production)

2. `Dockerfile`

# syntax=docker/dockerfile:1.4
FROM python:3.12-slim AS builder

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# Final stage
FROM python:3.12-slim

# Create non-root user for security
RUN useradd --create-home --shell /bin/false appuser

WORKDIR /app

# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin

# Copy application code
COPY app.py .

# Create persistent storage directory and set permissions
RUN mkdir -p /app/processes && \
    chown -R appuser:appuser /app

# Switch to non-root user
USER appuser

EXPOSE 5008

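# Health check (optional): python:3.12-slim ships without curl and the server
# defines no /mcp/health route, so just check that the port accepts TCP connections
HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
    CMD python -c "import socket; socket.create_connection(('localhost', 5008), 3)" || exit 1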

CMD ["python", "app.py"]

3. `docker-compose.yml`

version: '3.9'

services:
  process-manager:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: process-manager
    restart: unless-stopped
    ports:
      - "5008:5008"
    volumes:
      # Persistent storage for process JSON files
      - process-data:/app/processes
    environment:
      - PYTHONUNBUFFERED=1
    # Optional: limit resources in production
    # deploy:
    #   resources:
    #     limits:
    #       cpus: '1.0'
    #       memory: 512M

volumes:
  process-data:
    driver: local
    name: process-manager-data

Deployment Instructions

Place the three files and app.py in the same directory.
Build and start the service:

docker compose up -d --build

Verify the server is running:

docker compose logs -f process-manager

The MCP endpoint will be available at http://localhost:5008/mcp.
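To check the endpoint from a script rather than through the llama-server UI, you can send the MCP `initialize` JSON-RPC message yourself. This is a standard-library sketch. It assumes the Streamable HTTP transport that `mcp.http_app()` serves. The protocol version string `2025-03-26` and the client name are illustrative values, and `build_initialize_request` / `smoke_test` are hypothetical helpers.

```python
import json
import urllib.request

def build_initialize_request(url: str) -> urllib.request.Request:
    # JSON-RPC 2.0 "initialize" message as defined by the MCP specification
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "smoke-test", "version": "0.1"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with plain JSON or an SSE stream
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )

def smoke_test(url: str, timeout: float = 5.0) -> int:
    # Returns the HTTP status on success; urlopen raises HTTPError for 4xx/5xx
    # and URLError if nothing is listening
    with urllib.request.urlopen(build_initialize_request(url), timeout=timeout) as resp:
        return resp.status
```

With the container up, `smoke_test("http://localhost:5008/mcp")` should return 200. An exception points at the port mapping, the path, or the container itself.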

Key Benefits of This Setup

Secure (non-root user, minimal base image)
Persistent data (the processes/ directory survives container restarts)
Fast rebuilds (multi-stage build)
Production-ready (healthcheck, restart policy, resource isolation)

"We need a front end for this process manager write the task and go through each task writing all the code."

How far do you think it will get?

Agentic Server Primer: Llama.cpp MCP Lesson 6: Adding mysql Database Docker Toolsets.

thinkmelt@protonmail.com — Sun, 12 Apr 2026 23:13:33 GMT

Ok, fun time. Today we are going to give our House LLM access to its own mysql database connection and see how it handles it. Of course, give your head a shake if you ever do this in production: test and verify, THEN move to whatever production you are running.

If you just need to pull and run this container:

docker pull docker.io/cnmcdee/mcp-mysql:latest
docker run -d --name mcp-mysql --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5004:5004 cnmcdee/mcp-mysql:latest

The Process.

A. Make a docker mysql container (port 5004). We deliberately did not use 3306 in case you already have something important running there.
B. Make an agentic Python MCP server tool API that listens on port 5005. When it receives an MCP request, it passes it through to the mysql server (port 5004) and sends the results back.
C. Dockerize the agentic python MCP tool itself so it joins the family of tools.

A. Docker Mysql Container

These are pretty straightforward to make, because it's so common we can literally just go straight to a docker-compose.yml and stand it up.

Create yourself a directory, create a docker-compose.yml and put in it:

version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: mysql-server
    restart: unless-stopped
    command: --default-authentication-plugin=mysql_native_password   # ← forces compatible hash type
    environment:
      MYSQL_ROOT_PASSWORD: YourStrongPasswordHere123!   # ← use your actual password
    ports:
      - "5004:3306"
    volumes:
      - mysql_data:/var/lib/mysql

volumes:
  mysql_data:

Stand it up easily with:

docker compose up      # Temporary watch it mode
docker compose up -d   # Permanent daemon mode

Troubleshooting mysql Docker Connections

It is very common to run into trouble connecting to your mysql container at this point, so here is a short detour on troubleshooting it.

Inspect it and make sure the external port is bound to the internal port, as in:

docker ps -a
#gives something like
33060/tcp, 0.0.0.0:5004->3306/tcp, [::]:5004->3306/tcp   mysql-server

Remember that every docker container typically gets an address on a 172.x (Class B) subnet and may not be bound to the machine's external interface, so you can nmap the host:

sudo apt install nmap
nmap  
# For our references
Nmap scan report for dragon-192-168-1-3 (192.168.1.3)
Host is up (0.00017s latency).
Not shown: 992 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
3306/tcp open  mysql
5000/tcp open  upnp
5001/tcp open  commplex-link
5002/tcp open  rfe
5003/tcp open  filemaker
5004/tcp open  avt-profile-1
8080/tcp open  http-proxy

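You can see that port 5004 is listening, but it can trick you!

mysql -u root -p -P 5004                 # can fail!! No -h means 'localhost', which uses the Unix socket and ignores -P
mysql -u root -p -h 192.168.1.3 -P 5004  # explicit host forces TCP and will pass! (-h 127.0.0.1 also works)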
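Both nmap and the explicit-host test come down to whether a TCP connection to host:port succeeds. If nmap is not installed, a small Python sketch can do the same check for a single port. It only tells you that something is listening, not that it is MySQL. `port_open` is a hypothetical helper name.

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    # Plain TCP connect; unlike the mysql client, this never falls back to
    # the Unix socket, so "localhost" and "127.0.0.1" behave the same here
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For the setup above, `port_open("192.168.1.3", 5004)` mirrors the mysql command that passes.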

If all that fails you can 'jump into the container' itself with:

docker exec -it mysql-server /bin/bash

Once you are at the container's bash prompt, you are back to a local mysql connection:

bash-5.1# mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 8.0.43 MySQL Community Server - GPL

Copyright (c) 2000, 2025, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Note that you will often get the Enter password: prompt even when you are connected to nothing, because the client asks for the password before it tries to connect. It would be nice if mysql did not do that. Anyway...

B. Agentic Python MCP Server:

We want it to 'auto-connect' if it can, then offer up the MCP server port, which we will then bolt to our LLM!
Set the password in the code to match the MYSQL_ROOT_PASSWORD you used in docker-compose.yml.
We test the python app first (save it as server.py, the filename the Dockerfile below runs), then we Dockerize it.

The Code

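import os
import time
import mysql.connector
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
from fastmcp import FastMCP
import uvicorn

# ── MySQL Configuration ───────────────────────────────────────────────────────
# Defaults match the local setup (MySQL published on 127.0.0.1:5004); the
# MYSQL_* environment variables override them, e.g. when running in Docker.
MYSQL_CONFIG = {
    "host": os.environ.get("MYSQL_HOST", "127.0.0.1"),
    "port": int(os.environ.get("MYSQL_PORT", "5004")),
    "user": os.environ.get("MYSQL_USER", "root"),
    "charset": "utf8mb4",
    "use_pure": True,
}

# Must match MYSQL_ROOT_PASSWORD from the mysql docker-compose.yml
password = os.environ.get("MYSQL_PASSWORD", "database_password")

# Global persistent connection
db_conn = None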

def test_mysql_connection(max_retries=10, delay=3):
    """Establish the persistent MySQL connection at startup."""
    global db_conn
    

    print(f"Testing MySQL connection to {MYSQL_CONFIG['host']}:{MYSQL_CONFIG['port']}...")

    for attempt in range(1, max_retries + 1):
        try:
            db_conn = mysql.connector.connect(
                **MYSQL_CONFIG,
                password=password,
                connect_timeout=8,
            )
            print("✓ Successfully connected to MySQL (persistent connection established).")
            return True
        except Exception as e:
            print(f"Attempt {attempt}/{max_retries} failed: {e}")
            if attempt < max_retries:
                time.sleep(delay)

    print("⚠️ MySQL connection test failed. Server will still start; tool will attempt reconnect on first use.")
    return False

# ── MCP Server Instance ───────────────────────────────────────────────────────
mcp = FastMCP("mysql_query")

@mcp.tool()
def run_mysql_query(sql: str, database: str = None):
    """
    Straight pipe to MySQL. Python acts purely as a carrier.

    Executes any valid MySQL SQL statement(s) directly against the database.
    Multiple statements separated by semicolons are fully supported.
    Results are returned with minimal transformation.
    """
    global db_conn

    # Ensure active connection
    if db_conn is None or not db_conn.is_connected():
        try:
            db_conn = mysql.connector.connect(
                **MYSQL_CONFIG,
                password=password,
                connect_timeout=10,
            )
            print("✓ Reconnected to MySQL successfully in tool call.")
        except Exception as e:
            return {
                "status": "error",
                "message": f"Failed to connect/reconnect to MySQL: {str(e)}"
            }

    cursor = None
    try:
        # Buffered cursor that returns each row as a dict keyed by column name
        cursor = db_conn.cursor(dictionary=True, buffered=True)

        # Switch database if specified
        if database:
            cursor.execute(f"USE `{database}`")
            db_conn.commit()

        # Execute SQL. Multi-statement input is split on ';' here instead of
        # relying on the connector's multi-statement API, which differs between
        # mysql-connector-python versions. Caveat: this naive split also cuts
        # statements at semicolons inside quoted strings.
        all_results = []
        statements = [stmt.strip() for stmt in sql.split(';') if stmt.strip()]

        for stmt in statements:
            cursor.execute(stmt)
            if cursor.with_rows:
                rows = cursor.fetchall()
                columns = [desc[0] for desc in cursor.description] if cursor.description else []
                all_results.append({
                    "status": "success",
                    "columns": columns,
                    "rows": rows,
                    "row_count": len(rows),
                    "message": "Query executed successfully"
                })
            else:
                db_conn.commit()
                all_results.append({
                    "status": "success",
                    "affected_rows": cursor.rowcount,
                    "lastrowid": cursor.lastrowid,
                    "message": "Query executed successfully"
                })

        if not all_results:
            return {
                "status": "success",
                "message": "No results returned (possibly DDL or empty query)."
            }

        # Return single result or multi-statement wrapper (minimal carrier format)
        if len(all_results) == 1:
            return all_results[0]
        return {
            "multi": True,
            "results": all_results,
            "status": "success"
        }

    except mysql.connector.Error as err:
        error_msg = f"MySQL Error [{err.errno}]: {err.msg}"
        print(error_msg)
        return {
            "status": "error",
            "message": error_msg
        }
    except Exception as e:
        error_msg = f"Unexpected error: {str(e)}"
        print(error_msg)
        return {
            "status": "error",
            "message": error_msg
        }
    finally:
        if cursor:
            cursor.close()


# ── Server Startup with CORS ──────────────────────────────────────────────────
if __name__ == "__main__":
    test_mysql_connection()

    print("Starting MCP server - straight pipe to MySQL...")

    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=["*"],
            allow_credentials=True,
            allow_methods=["GET", "POST", "OPTIONS"],
            allow_headers=["*"],
            expose_headers=["*"],
        )
    ]

    app = mcp.http_app(path="/mcp", middleware=middleware)
    uvicorn.run(app, host="0.0.0.0", port=5005, log_level="info")
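One limitation of the tool as written: `sql.split(';')` cuts a statement such as `INSERT INTO notes VALUES ('a;b')` in half. Below is a sketch of a hypothetical quote-aware replacement, `split_sql_statements`. It tracks single quotes, double quotes and backticks, and it honors backslash escapes inside strings. It does not understand SQL comments or `DELIMITER` changes, so treat it as an improvement rather than a full parser.

```python
def split_sql_statements(sql: str) -> list[str]:
    statements, buf = [], []
    quote = None  # the quote character we are currently inside, or None
    i = 0
    while i < len(sql):
        ch = sql[i]
        if quote:
            buf.append(ch)
            # Backslash escapes the next character inside '...' and "..."
            if ch == "\\" and quote != "`" and i + 1 < len(sql):
                buf.append(sql[i + 1])
                i += 2
                continue
            if ch == quote:  # closing quote ('' re-opens on the next char)
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
            buf.append(ch)
        elif ch == ";":  # statement boundary, only outside quotes
            stmt = "".join(buf).strip()
            if stmt:
                statements.append(stmt)
            buf = []
        else:
            buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements
```

Swapping it in would mean replacing the `statements = [...]` line in `run_mysql_query` with `statements = split_sql_statements(sql)`.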

After some tinkering, once it runs it should look like this:

Review:

We have a mysql docker container specifically for your pet houseLLM! (Port 5004)
We have a Python API specifically on port 5005!

Shall we give Your HouseLLM a New mysql toy?!

From your llama-server web UI (for us it's 192.168.1.3:8080), simply add your MCP server from the menu on the left. Remember the mapping is effectively http://192.168.1.3:5005/mcp: because the LLM talks in JSON, it needs the Python translator in the middle.
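Concretely, the front end sends the MCP server JSON-RPC 2.0 messages. The method names `tools/list` and `tools/call` come from the MCP specification, and the Python tool turns the `sql` argument into a real query. The sketch below only builds those messages as strings to show their shape. A real client also performs the `initialize` handshake and session handling first, which is omitted here. `jsonrpc` is a hypothetical helper.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids must be unique per session

def jsonrpc(method: str, params: Optional[dict] = None) -> str:
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Roughly what happens behind "list all tools" and behind an actual query
list_msg = jsonrpc("tools/list")
call_msg = jsonrpc("tools/call", {
    "name": "run_mysql_query",          # must match the registered tool name
    "arguments": {"sql": "SHOW DATABASES"},
})
```

This is also why the tool name and docstring matter so much: the model picks from the `tools/list` result, and `name` in `tools/call` has to match exactly.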

Tool Naming is Important / Conclusions

The name of the tool should be as close as possible to the function offered, thusly:

Troubleshooting mysql_query (docstrings matter)

Here are some debugging suggestions: let the LLM tell you what it sees.

list all tools

Even though the tool was fully registered in the MCP index, the queries were reaching it, and we had written a detailed docstring as recommended by Grok 4, it was still failing. So we wiped the MCP reference, reinstalled the tool, and had the LLM requery all tools.

After That We Did Straight-Piping

We tried straight-piping the connection, and describing it as exactly that in the docstring mattered.

Conclusions:

This worked. Ironically, our paid LLM could not solve this, and it was surreal to simply paste the Python API tool back into Qwen3-Code, have it adjust its own API tool, re-run the tool, and watch it automatically start testing it!

C. Dockerization

Since Qwen-3.5 Code is that good now, we let it finish writing the article.

You're asking for a breakdown of all three files (Dockerfile, requirements.txt, docker-compose.yml) with commands, guides, and important notes for each.

Let me do that. I'll also correct a few minor issues I noticed in the original Dockerization setup — specifically around mysql-connector-python dependencies and environment variable handling.

✅ 1. `Dockerfile`

📄 File: `Dockerfile`

# Use official Python slim image (3.11 is stable and lightweight)
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies needed for mysql-connector-python
RUN apt-get update && apt-get install -y --no-install-recommends \
    default-libmysqlclient-dev \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy and install Python dependencies (better caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose the MCP server port
EXPOSE 5005

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1

# Run the server
CMD ["python", "server.py"]

🔧 Important Notes:

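default-libmysqlclient-dev and gcc are only needed if a dependency must compile against libmysqlclient (e.g. the mysqlclient package); mysql-connector-python normally installs from a prebuilt wheel without them.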
PYTHONUNBUFFERED=1 ensures logs appear in docker logs immediately.
PYTHONDONTWRITEBYTECODE=1 prevents .pyc files from being written (cleaner container).

🛠️ Commands to Build & Run

Build the image:

docker build -t mysql-mcp-server .

Run the container:

docker run -p 5005:5005 \
  -e MYSQL_HOST=mysql \
  -e MYSQL_PORT=3306 \
  -e MYSQL_USER=root \
  -e MYSQL_PASSWORD=password \
  -e MYSQL_DATABASE=test_db \
  --name mysql-mcp \
  --network mysql-network \
  mysql-mcp-server

⚠️ Important: You need to have the MySQL container already running on the same network first.
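The docker run command above also assumes a user-defined network named mysql-network already exists (docker network create mysql-network). Even with the network in place, the MCP server can start before MySQL accepts connections, and depends_on in compose only orders startup. A minimal sketch of a startup wait, assuming the server is Python; wait_for_port is a hypothetical helper, and an open TCP port only means MySQL is listening, not that it has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds,
    or False if none succeeds within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a plain TCP connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the shared Docker network you would call wait_for_port("mysql", 3306, timeout=60) before opening the first database connection; from the host, use "localhost" and the published 3306 port.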

✅ 2. `requirements.txt`

📄 File: `requirements.txt`

fastmcp==0.1.0
mysql-connector-python==8.0.33
starlette==0.27.0
uvicorn==0.24.0

🔍 Recommendation: For production, consider adding version pinning hashes or using pip-compile from pip-tools for reproducible builds.

🛠️ Commands

Install dependencies locally (for dev/testing):

pip install -r requirements.txt

Generate requirements with hashes (optional, for security):

pip-compile --generate-hashes requirements.in

✅ 3. `docker-compose.yml`

📄 File: `docker-compose.yml`

version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: mysql-db
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: test_db
    ports:
      - "3306:3306"
    volumes:
      - mysql-data:/var/lib/mysql
    restart: unless-stopped
    networks:
      - app-network

  mcp-server:
    build: .
    container_name: mysql-mcp
    ports:
      - "5005:5005"
    environment:
      MYSQL_HOST: mysql
      MYSQL_PORT: 3306
      MYSQL_USER: root
      MYSQL_PASSWORD: password
      MYSQL_DATABASE: test_db
    depends_on:
      - mysql
    restart: unless-stopped
    networks:
      - app-network

networks:
  app-network:
    driver: bridge

volumes:
  mysql-data:

🔍 Improvement: I renamed the network to app-network (more generic) and added container_name for easier management.

🛠️ Commands

Build and start all services:

docker-compose up --build

Start services in detached mode (background):

docker-compose up -d

Stop services:

docker-compose down

Stop and remove volumes (⚠️ deletes MySQL data):

docker-compose down -v

Check logs:

docker-compose logs -f mcp-server
docker-compose logs -f mysql

🧪 4. Testing & Usage Guide

🔌 Test the API

Once services are up, test the MCP endpoint:

curl -X POST http://localhost:5005/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "function": "run_mysql_query",
    "arguments": {
      "sql": "SELECT 1 + 1 AS result"
    }
  }'

✅ Expected response:

{
  "status": "success",
  "columns": ["result"],
  "rows": [{"result": 2}],
  "row_count": 1,
  "message": "Query executed successfully"
}
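The curl call above posts a custom {"function": ...} body, which only works if server.py accepts that shape. If server.py is instead a standard FastMCP streamable-HTTP app (like the JavaScript tester in the next lesson), the endpoint expects JSON-RPC 2.0 messages, and the tool's dict comes back wrapped in a result.content list. Over HTTP the client must typically also send Accept: application/json, text/event-stream and, on a stateful server, perform the initialize handshake first, so an MCP client library is usually easier than raw curl. The sketch below only builds and unpacks the messages; the helper names are hypothetical.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def make_tools_call(name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request as defined by MCP."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def extract_text(response):
    """Return the text parts of a tools/call response, raising on errors."""
    if "error" in response:  # protocol-level error (bad method, bad params...)
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    texts = [c["text"] for c in result.get("content", []) if c.get("type") == "text"]
    if result.get("isError"):  # the tool itself reported a failure
        raise RuntimeError("; ".join(texts) or "tool reported an error")
    return texts


if __name__ == "__main__":
    print(json.dumps(make_tools_call("run_mysql_query", {"sql": "SELECT 1 + 1 AS result"})))
```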

📊 View MySQL Data

You can connect a GUI (e.g., MySQL Workbench, DBeaver) to localhost:3306 with:

Host: localhost
Port: 3306
User: root
Password: password
Database: test_db

🛡️ 5. Security & Production Considerations

Area	Recommendation
Passwords	Never hardcode in Dockerfile. Use Docker secrets or `.env` files.
Database Access	Restrict `root` access. Use a dedicated app user with minimal privileges.
CORS	Avoid `allow_origins=["*"]` in production. Specify allowed domains.
Logging	Add structured logging for better debugging.
Health Checks	Add health checks for `mcp-server` and `mysql`.
Updates	Pin Python version (e.g., `python:3.11.7-slim`) for reproducibility.

Example .env file for secrets:

MYSQL_ROOT_PASSWORD=supersecret123
MYSQL_DATABASE=test_db

Then reference in docker-compose.yml using MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}.
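Whichever way the variables arrive (docker run -e, the compose environment: block, or a .env file), the server reads them from its process environment. A minimal sketch of that side, assuming the MYSQL_* names used above; load_mysql_config is a hypothetical helper, not code from the original server.py. Note that the MySQL container reads MYSQL_ROOT_PASSWORD while the MCP server reads MYSQL_PASSWORD, so in compose you would map both from the same ${MYSQL_ROOT_PASSWORD} value.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MySQLConfig:
    host: str
    port: int
    user: str
    password: str
    database: str


def load_mysql_config(env=None):
    """Read MYSQL_* settings from the environment (hypothetical helper).

    The password deliberately has no default: failing fast is safer than
    silently falling back to a hard-coded secret."""
    env = os.environ if env is None else env
    password = env.get("MYSQL_PASSWORD")
    if not password:
        raise RuntimeError("MYSQL_PASSWORD is not set")
    return MySQLConfig(
        host=env.get("MYSQL_HOST", "localhost"),
        port=int(env.get("MYSQL_PORT", "3306")),  # env values are always strings
        user=env.get("MYSQL_USER", "root"),
        password=password,
        database=env.get("MYSQL_DATABASE", "test_db"),
    )
```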

Let me know if you'd like me to:

Add a health check to the Dockerfile
Generate a .env.example template
Include a Makefile for common commands
Add a README with quickstart instructions

Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding JavaScript via a Python API plugin.

thinkmelt@protonmail.com — Sun, 12 Apr 2026 21:18:02 GMT

An agentic LLM is simply your house LLM given the tools to do its work. Instead of relying strictly on its own internal knowledge, it can actually go out and verify its work. We started with a simple calculator for math, then we studied how to dockerize it. After that we added a Python tool and a weather tool, and today we will be adding a JavaScript tool!

If you need to simply pull and run this docker it is available via:

docker pull docker.io/cnmcdee/mcp-javascript:latest
docker run -d --name mcp-javascript --restart unless-stopped -e "FLASH_ENV=production" -p 0.0.0.0:5003:5003 cnmcdee/mcp-javascript:latest

Sounds complex? It's not. Here is the breakdown.

The LLM is informed of the tool's availability via the llama.cpp plugin.
It is issued a prompt and is free to use the tool on port 5003.
A Python API running in Docker listens on that port and is basically a 'middle-man', for simplicity's sake.
It receives a JSON string, parses it, calls Node.js, and runs the submitted code. Whether it passes or fails, the result (stdout, stderr, and return code) is given back to the LLM so it knows what to do next!
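The 'middle-man' parsing step can be sketched on its own. In OpenAI-style tool calls, the model's arguments arrive as a JSON string that must be decoded and validated before anything runs; in the full program below FastMCP handles this for you. The helper name is hypothetical; the argument names code and timeout_seconds match test_javascript_program.

```python
import json


def parse_js_tool_args(raw_arguments):
    """Decode a tool call's JSON-string arguments into (code, timeout).

    Hypothetical helper: validates the shape before any code is executed."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    code = args.get("code")
    if not isinstance(code, str) or not code.strip():
        raise ValueError("'code' must be a non-empty string")
    timeout = args.get("timeout_seconds", 10)  # same default as the tool
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or timeout <= 0:
        raise ValueError("'timeout_seconds' must be a positive number")
    return code, timeout
```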

A diagram

The LLM is informed that the MCP server is available at the endpoint:

192.168.1.3:5003/mcp

Sounds complex? It has a lot of moving parts, but code-wise it's pretty simple: the entire program is only 108 lines:

import os
import subprocess
import tempfile

from fastmcp import FastMCP
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
import uvicorn

# Initialize the MCP server
mcp = FastMCP(
    name="JavaScript Program Tester",
    instructions=(
        "Provides a tool for executing and testing JavaScript programs "
        "in a Node.js runtime. Supports console output, error capture, "
        "and timeout handling. Ideal for program validation and debugging."
    )
)


@mcp.tool()
def test_javascript_program(code: str, timeout_seconds: int = 10) -> dict:
    """
    Execute a JavaScript program using Node.js and return structured results.

    Parameters:
        code: The complete JavaScript code to execute (use console.log for output).
        timeout_seconds: Maximum execution time (default: 10 seconds).

    Returns:
        A dictionary containing success status, stdout, stderr, return code, and a summary message.
    """
    # Create temporary JS file (more reliable than stdin for complex scripts)
    with tempfile.NamedTemporaryFile(
            suffix=".js", delete=False, mode="w", encoding="utf-8"
    ) as f:
        f.write(code)
        temp_path = f.name

    try:
        result = subprocess.run(
            ["node", temp_path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=False
        )

        return {
            "success": result.returncode == 0,
            "stdout": result.stdout.strip(),
            "stderr": result.stderr.strip(),
            "return_code": result.returncode,
            "message": (
                "JavaScript program executed successfully."
                if result.returncode == 0
                else f"JavaScript program exited with code {result.returncode}."
            ),
        }

    except subprocess.TimeoutExpired:
        return {
            "success": False,
            "stdout": "",
            "stderr": "Execution timed out.",
            "return_code": -1,
            "message": f"Execution timed out after {timeout_seconds} seconds.",
        }
    except FileNotFoundError:
        return {
            "success": False,
            "stdout": "",
            "stderr": "Node.js not found.",
            "return_code": -1,
            "message": "Node.js ('node') command not found. Please install Node.js and ensure it is in your PATH.",
        }
    except Exception as e:
        return {
            "success": False,
            "stdout": "",
            "stderr": str(e),
            "return_code": -1,
            "message": f"Failed to execute JavaScript program: {str(e)}",
        }
    finally:
        # Clean up temporary file
        if os.path.exists(temp_path):
            try:
                os.unlink(temp_path)
            except Exception:
                pass


# ── Server Startup with CORS (required for llama.cpp frontend) ────────────
if __name__ == "__main__":
    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=["*"],  # Restrict in production
            allow_credentials=True,
            allow_methods=["GET", "POST", "OPTIONS"],
            allow_headers=["*"],
            expose_headers=["*"],
        )
    ]

    app = mcp.http_app(path="/mcp", middleware=middleware)
    uvicorn.run(app, host="0.0.0.0", port=5003, log_level="info")

Before running it, you may need to pip install the imports above:

pip install fastmcp starlette uvicorn

When it runs it will show up as:

It can be added to the llama.cpp toolset as shown. Just a reminder: always sync your MCP server, as in:

http://192.168.1.3:/mcp

Test it

We are running a world-class Qwen3-Coder SOTA level 48B on house parts. If you would like to do the same on some house parts here is a detailed guide:

Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts.

We field test a Qwen3-Coder-Next-UD-Q4_K_XL.gguf


Wild First. The LLM 'repaired' its own Connection?!

Docker Containerization.

At this point it's always important to containerize this. That way, if your LLM glitches or goes off on a tangent, it won't hurt anything: you can simply stop the container and restart it! If you need a full guide on Docker basics, here you go!
Make a workdir

Create requirements.txt, put inside it:

fastmcp
starlette
uvicorn[standard]

Create Dockerfile, put inside it:

This creates an image

FROM nikolaik/python-nodejs:python3.12-nodejs22

# Set working directory
WORKDIR /app

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (save the provided Python script as app.py)
COPY app.py .

# Expose the port used by the MCP server
EXPOSE 5003

# Run the application
CMD ["python", "app.py"]

Create docker-compose.yml, put inside it:

These are the 'stand-up' instructions that turn the Docker image into a running container.

version: '3.9'

services:
  javascript-program-tester:
    build: .
    ports:
      - "5003:5003"
    restart: unless-stopped
    # Optional: for local development with live code changes
    # volumes:
    #   - .:/app

Usage Instructions

Save the provided Python code as app.py in the same directory as the files above.
Place requirements.txt, Dockerfile, and docker-compose.yml in the project root.
Build and start the container:

docker compose up --build
#or
docker compose up  #Diagnostic mode to watch it go.
docker compose up -d #Daemon mode - permanently runs.

The first build will look something like this:

The MCP server will be available at http://localhost:5003/mcp.

This configuration ensures:

The Python environment includes all required packages (fastmcp, starlette, and uvicorn).
Node.js (v22) is pre-installed and available in the PATH, enabling the test_javascript_program tool to execute JavaScript code via subprocess without errors.
The container restarts automatically (restart: unless-stopped). Note that CORS is left wide open (allow_origins=["*"]), so restrict it before exposing the port beyond your LAN.
CORS middleware and the server startup logic from your original code remain fully functional.

The setup has been verified for compatibility with the provided script and the explicit requirement to support Node.js execution.

Summary

This might seem like a lot of work, but if you understand a bit of Python, a bit of LLMs, a bit of MCP, and make sure your ports are open, you can be writing MCP tools like a champ!

Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts.

thinkmelt@protonmail.com — Sun, 12 Apr 2026 19:20:33 GMT

The Qwen3-Coder fork released by Unsloth just benchmarked close to 235B-sized models and looks to have SOTA-level performance for April 2026! Here is the lowdown. At the time of writing, the model was so new that HuggingFace.com did not have the full automatic benchmarks out for it, but we were able to find them. Again, incredible accolades to Unsloth AI, who just keep making incredible models!

Unsloth Open-Sources Incredible Products that Make the LLM World Go Round

Source: huggingface.com

The best part is you can run frontier models such as this Qwen3-Coder that are efficient, fast and reliable on under $2000 in house computer parts! Here is the setup:

/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3-Coder-Next-UD-Q4_K_XL.gguf \
--host 192.168.1.3 \
--n-gpu-layers 999 \
--flash-attn on \
--override-tensor "\.ffn_.*_exps\.weight=CPU" \
--cache-type-k turbo3 \
--cache-type-v turbo3 \
-c 262144 \
--temp 0.7

The key was right here: the usual '--fit' option actually failed for us. '--fit' tells llama-server to balance the model between VRAM and system RAM automatically, and for this model it could not. However, this worked:

--override-tensor "\.ffn_.*_exps\.weight=CPU" \

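This tells llama.cpp to keep the MoE expert feed-forward weights (the bulk of the parameters, though only a few experts are used per token) in system RAM on the CPU, while --n-gpu-layers 999 puts everything else (attention, routers, shared layers) on the GPU. This overcame the RAM/VRAM bottleneck plaguing setups everywhere, at a time when people were spending thousands of dollars on unified-memory systems such as Mac Pros.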
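To see which tensors that pattern actually catches, here is a small Python check using the same regular expression as a search over tensor names. The tensor names below are illustrative GGUF-style names for one MoE block (assumed, not read from this model's file); exact names vary by architecture.

```python
import re

# The pattern from --override-tensor, i.e. the part before "=CPU".
EXPERT_PATTERN = re.compile(r"\.ffn_.*_exps\.weight")

# Illustrative tensor names for one MoE block (assumed naming scheme).
tensor_names = [
    "blk.0.attn_q.weight",          # attention
    "blk.0.ffn_gate_inp.weight",    # expert router
    "blk.0.ffn_gate_exps.weight",   # routed experts
    "blk.0.ffn_up_exps.weight",
    "blk.0.ffn_down_exps.weight",
    "blk.0.ffn_gate_shexp.weight",  # shared expert
]


def placement(name):
    """'CPU' if the override matches; otherwise the tensor follows --n-gpu-layers."""
    return "CPU" if EXPERT_PATTERN.search(name) else "GPU"


if __name__ == "__main__":
    for name in tensor_names:
        print(f"{placement(name):>3}  {name}")
```

Only the routed expert matrices match; the router and shared-expert tensors do not, because their names lack the _exps suffix.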

An Even Faster Configuration suggested by:

https://x.com/iam_shanmukha

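We were able to get up to 33 tokens/s. Note that this configuration runs a different model, Qwen3.6-35B-A3B, with speculative decoding enabled.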

/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
--host 192.168.1.3 \
--fit on \
--flash-attn on \
--spec-type ngram-mod \
--spec-ngram-size-n 24 \
--n-cpu-moe-draft 39 \
-t 14 \
--chat-template-kwargs '{"preserve_thinking":true}' \
--cache-type-k turbo3 \
--cache-type-v turbo4 \
-c 512000 \
--temp 0.7

Turbo3 is an Advanced Quant Fork of Llama-cpp!

Did you notice the non-standard --cache-type-k turbo3 / --cache-type-v turbo3? Those come from the bleeding-edge (as of March 2026) TurboQuant forks.
Full credit to 'The Tom' who graciously created one of the TurboQuant forks!

GitHub - TheTom/turboquant_plus

Contribute to TheTom/turboquant_plus development by creating an account on GitHub.


Would You Like a Detailed Guide for a Custom-Compiled Llama-cpp TurboQuant Fork?

Here is a full self-compiling guide where a 122B model used the same tensor-balancing method to fit: it ran on a 16GB GPU, with the rest in RAM. Incredibly, it was good, and usable at a respectable 12.8 tokens/s!

Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. w/128GB RAM.


You will need the specialized TurboQuant llama-cpp fork with that override-tensor configuration. In our case: wow, 25.48 t/s as we asked it to one-shot an Asteroids game that is 3x more complex.

Is it powerful? We asked it to one-shot an entire Asteroids game, then asked it to one-shot a game that was 3x more complex. It got a majority of it right.

If you would like a PILE of open-source agentic tools, from MySQL to Docker to context managers, they are here:

PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo.

Downloads Page for all your MCP tooling needs!


llmfit - Fast LLM Metric Fitter and Pulling Tool

thinkmelt@protonmail.com — Sun, 12 Apr 2026 03:37:48 GMT

Very nice. Instead of wondering if your system can handle a model - just use this!

GitHub - AlexsJones/llmfit: Hundreds of models & providers. One command to find what runs on your hardware.

Hundreds of models & providers. One command to find what runs on your hardware. - AlexsJones/llmfit


Let's compile it from source!

This is built in Rust, so you need Cargo (and build-essential, naturally)!

sudo apt install cargo git cmake gcc

Once you have done that you simply pull the repository:

git clone https://github.com/AlexsJones/llmfit.git
cd llmfit

Because this is Rust you use cargo to build it:

cargo build --release

The output files will sit at target/release/

From inside target/release/ we copied lib* and llm* to /usr/bin/ (only the llmfit binary is actually needed to run it):

sudo cp lib* /usr/bin/
sudo cp llm* /usr/bin/

Running it!

llmfit

It automatically lists all models that may fit your system, along with the estimated tokens/s you can expect if you run them locally.
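llmfit's own estimation logic isn't described here, but a common back-of-envelope version of the same idea fits in a few lines: quantized weights take roughly params × bits ÷ 8 bytes, and decode speed is bounded by how fast the active weights can be streamed from memory for each token. All inputs (bits per weight, bandwidth, overhead) are assumptions you supply; this is not llmfit's algorithm.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def bandwidth_bound_tps(active_params_billion, bits_per_weight, bandwidth_gb_s):
    """Rough upper bound on decode tokens/s if every generated token must
    stream the active weights once from memory at bandwidth_gb_s."""
    return bandwidth_gb_s / weights_gb(active_params_billion, bits_per_weight)


def fits(params_billion, bits_per_weight, vram_gb, ram_gb, overhead_gb=4.0):
    """Crude check: do the weights plus a fixed overhead fit in VRAM + RAM?
    overhead_gb is an assumed allowance for KV cache, buffers, and the OS."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb + ram_gb
```

For example, weights_gb(122, 4.5) gives about 68.6 GB for a 122B model at an assumed 4.5 bits per weight, which fits in 16 GB VRAM plus 128 GB RAM. Treat bandwidth_bound_tps as an order-of-magnitude guide only: it ignores the share of active weights that sits in much faster GPU memory.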

Pressing '/' and typing 'coder' filters the list automatically. We believe this pulls straight from Hugging Face.

Automatic Downloads

You can literally hit 'd' to Download a model if you think it is what you want / good.

In the bottom right it will show the model:

If you hit 'P' for providers it will automatically make a list of inference providers if you want to use a cloud LLM - Nice!

Nice details. Real care went into this text-based app: while a model is downloading, you can see a spinning progress indicator.

Simulate

You can simulate what you might expect to see when it runs (S)

Summary

This saves you a lot of time navigating a pile of pages on huggingface.com, trying to see which models might run and what to expect from them. Tap a button and it pulls the latest model for you!
If you download and run LLMs on a daily basis, or benchmark them, this tool is really handy.

Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. w/128GB RAM.

thinkmelt@protonmail.com — Sat, 11 Apr 2026 04:45:54 GMT

Seriously.

We ran an industrial-grade LLM that can one-shot an entire Asteroids game, and is bleeding-edge SOTA for 2026, on a $2000 house computer. How did we do it? Let's get started!

A. Install your basics

sudo apt install build-essential cmake python3 wget git

B. Latest Nvidia Cuda ToolKit Drivers w/nvcc

nvcc is NVIDIA's CUDA compiler; it is needed to build llama.cpp's CUDA (GPU) backend.

wget https://developer.download.nvidia.com/compute/cuda/13.2.0/local_installers/cuda-repo-debian13-13-2-local_13.2.0-595.45.04-1_amd64.deb
sudo dpkg -i cuda-repo-debian13-13-2-local_13.2.0-595.45.04-1_amd64.deb
sudo cp /var/cuda-repo-debian13-13-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-2

Make sure it works with nvcc --version, it will look like this:

c@dragon-192-168-1-3:~/models$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Mon_Mar_02_09:52:23_PM_PST_2026
Cuda compilation tools, release 13.2, V13.2.51
Build cuda_13.2.r13.2/compiler.37434383_0
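If you script the build, it helps to fail early when the toolkit is missing or older than expected. A minimal sketch that runs nvcc --version and extracts the release number from output like the sample above; the helper names are hypothetical. The .deb installs nvcc under /usr/local/cuda/bin, which may not be on your PATH, so the helper also accepts an explicit path.

```python
import re
import shutil
import subprocess


def parse_cuda_release(text):
    """Extract (major, minor) from `nvcc --version` output ('release 13.2')."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    if not m:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(m.group(1)), int(m.group(2))


def installed_cuda_release(nvcc="nvcc"):
    """Run nvcc --version and return (major, minor), or None if nvcc is not found.

    `nvcc` may be a bare command name (searched on PATH) or a full path."""
    path = shutil.which(nvcc)
    if path is None:
        return None
    out = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    return parse_cuda_release(out.stdout)
```

With the toolkit above, installed_cuda_release("/usr/local/cuda/bin/nvcc") should return (13, 2).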

Got it? Good! Let's get an advanced llama.cpp now.

C. Installing the Latest Llama-cpp.

Not just any build will do: we are going to add SOTA-level TurboQuant capability:

git clone https://github.com/johndpope/llama-cpp-turboquant.git
cd llama-cpp-turboquant && git checkout feature/planarquant-kv-cache

Tricky Part (A) is Here

This part was exceptionally tricky: if you don't get it pretty much spot-on, it just doesn't compile. We spent considerable time on it, but in essence specific CMake parameters are required in order for it to compile.

Go into the pulled git repository directory (llama-cpp-turboquant) and make a file named build.sh, put inside of it:

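#!/bin/bash
set -e  # stop if the configure step fails

# Configure an in-source CUDA build (binaries land in ./bin)
cmake -S . -B . \
-DGGML_CUDA=ON \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CUDA_ARCHITECTURES=native \
-DCMAKE_CUDA_COMPILER_WORKS=TRUE \
-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc

# Compile using all CPU cores
cmake --build . --config Release -j$(nproc)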

chmod it so it's executable, naturally:

chmod +x build.sh

Run it.

./build.sh

And now wait. It takes some time and may kick up errors; we tried many things to get this to work, but the configuration above is what worked for us. If it succeeds, you will see a bin directory:

It will look like this if the compile and build worked:

If you have no other llama.cpp installed (this is the special fork with TurboQuant / PolarQuant), you can just copy all of those files to /usr/bin. From inside the bin directory:

sudo cp * /usr/bin

The other option is just cp all of these to your own directory somewhere like ~/llama and then write your scripts from inside there.

Easy Part - Get some Models!

We're almost there, time to get some models! You got this! Go to Hugging Face and pick out a model that will either fit your GPU or share it with your CPU. TurboQuant and PolarQuant dramatically sped up the KV cache; as of February 2026, running a model like this partly on the CPU was basically impossible, but now, yes you can!
We built an example that pushed the equipment we had to its limits: a 4080ti with 16GB VRAM and a Ryzen 9 3900 with 128 GB of RAM. You will need to tinker, but we will show you it's really easy.
A direct link for a 120 GB SOTA-level MoE:

UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00002-of-00003.gguf · unsloth/Qwen3-Coder-Next-GGUF at main


Got it downloaded to your ~/models folder? Good! The last part is to simply activate it with Llama.cpp.

You want to make some scripts. In essence, the script loads the model, offloads as much as it can to the GPU, and activates the specialty TurboQuant KV cache to give you incredible speed boosts. Below is our exact script; we had Grok 4 describe every part of it and how we ran it.
In our case we copied the llama.cpp files to /usr/bin as described above; otherwise, just change the start of the script to wherever llama-server lives.

/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.5-122B-A10B-Q4_K_M.gguf \
--host 192.168.1.3 \
--n-gpu-layers 999 \
--override-tensor "\.ffn_.*_exps\.weight=CPU" \
--flash-attn on \
--cache-type-k turbo3 \
--cache-type-v turbo3 \
-c 262144 \
--temp 0.7

Just in case you are not sure what to do now - open a browser and go to where it sits which is typically port 8080. Your House LLM is sitting there. Ready to one-shot Asteroids or whatever you want to do with it.
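Besides the browser UI, llama-server exposes an OpenAI-compatible HTTP API, so scripts (a prompt queue, for example) can talk to it directly. A minimal stdlib sketch, assuming the server is reachable at 192.168.1.3:8080 as above; the helper names are hypothetical and there is no retry or streaming handling.

```python
import json
import urllib.request

# Assumed address: the --host from the command above plus llama-server's default port.
BASE_URL = "http://192.168.1.3:8080"


def build_chat_request(prompt, temperature=0.7):
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def chat(prompt, base_url=BASE_URL, timeout=600):
    """Send one prompt to llama-server and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    return data["choices"][0]["message"]["content"]
```

Usage: print(chat("Write a one-line JavaScript hello world.")). The generous timeout is deliberate, since long generations on a partly CPU-offloaded model can take minutes.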

Even FASTER:

We found this alternate balanced configuration (note: it runs a different model, Qwen3.6-35B-A3B) and we are seeing 35 tokens/s! Full credit to

https://x.com/iam_shanmukha

/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
--host 192.168.1.3 \
--fit on \
--flash-attn on \
--spec-type ngram-mod \
--spec-ngram-size-n 24 \
--n-cpu-moe-draft 39 \
-t 14 \
--chat-template-kwargs '{"preserve_thinking":true}' \
--cache-type-k turbo3 \
--cache-type-v turbo4 \
-c 512000 \
--temp 0.7

http://192.168.1.3:8080

Command Summary and HAVE FUN!

This command launches the llama-server binary (part of the llama.cpp project), which provides a lightweight, high-performance HTTP server for local large language model (LLM) inference. It implements an OpenAI-compatible API and includes a built-in web interface, enabling clients to interact with the model via standard REST endpoints for chat completions, completions, embeddings, and related tasks.

The command configures the server to run the Qwen3.5-122B-A10B model (a Mixture-of-Experts architecture with approximately 122 billion total parameters and 10 billion active parameters per token) in a highly optimized manner. It maximizes GPU acceleration while selectively managing memory usage for a large-scale MoE model, supports an extended 256K-token context window, and applies advanced quantization and attention optimizations.

Below is a detailed, parameter-by-parameter breakdown of the command:

/usr/bin/llama-server The full path to the compiled llama-server executable. This binary serves as the entry point for the server process.
--jinja Explicitly enables the Jinja2 templating engine for processing chat templates. This is required (or strongly recommended) for models such as Qwen3.5, which rely on complex, model-specific Jinja-based chat templates stored in the GGUF metadata. It ensures accurate formatting of system/user/assistant messages and any special tokens or reasoning structures.
-m /home/c/models/Qwen3.5-122B-A10B-Q4_K_M.gguf Specifies the path to the GGUF-format model file. This is a 4-bit quantized version (Q4_K_M) of the Qwen3.5-122B-A10B MoE model. The Q4_K_M quantization provides a strong balance of model quality and memory efficiency.
--host 192.168.1.3 Binds the HTTP server to the specific network interface with IP address 192.168.1.3. This restricts listening to that address (instead of the default 127.0.0.1, which accepts only local connections), which is useful for controlled network exposure in a local LAN environment.
--n-gpu-layers 999 Instructs the backend to offload as many model layers as possible (up to 999) to the GPU. The large value effectively offloads the entire feasible portion of the model to GPU memory, maximizing inference speed while respecting hardware limits.
--override-tensor "\.ffn_.*_exps\.weight=CPU" Overrides the default buffer placement for specific model tensors. The regular expression targets all feed-forward network (FFN) expert weights (ffn_.*_exps.weight) and forces them onto the CPU. This is a critical optimization for large MoE models. Expert weights consume the majority of VRAM in such architectures; placing them on CPU (while keeping dense layers and other components on GPU) dramatically reduces GPU memory usage without severely impacting performance, enabling the 122B-parameter model to run on consumer or mid-range GPUs.
--flash-attn on Explicitly enables Flash Attention (a memory-efficient and faster attention implementation). This reduces VRAM consumption during attention computations and improves both prompt-processing and token-generation throughput, particularly beneficial for long-context scenarios and modern GPUs.
--cache-type-k turbo3 Sets the key (K) portion of the KV cache to the “turbo3” quantization format. Turbo3 is an advanced, low-precision KV cache type (available in recent llama.cpp builds or optimized forks) that provides extreme compression and high speed with minimal quality degradation compared to standard types such as f16 or q8_0.
--cache-type-v turbo3 Applies the same “turbo3” quantization to the value (V) portion of the KV cache. Using turbo3 for both K and V further reduces memory bandwidth and cache size, which is especially advantageous at the 256K context length specified below.
-c 262144 Sets the maximum context length (KV cache size) to 262144 tokens (256K tokens). This matches the native context capability of the Qwen3.5-122B-A10B model and allows the server to handle very long conversations or documents.
--temp 0.7 Configures the default sampling temperature to 0.7. This controls output randomness: a value of 0.7 produces coherent yet moderately creative responses (lower values yield more deterministic output; higher values increase diversity).
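Why KV-cache quantization matters at -c 262144: the cache stores a key and a value vector per attention layer, per KV head, per token, so it grows linearly with context length. The sketch below does that arithmetic for a hypothetical model shape (48 layers, 8 KV heads, head dimension 128, chosen for round numbers, not Qwen3.5-122B's real configuration) and ignores per-block scale overhead. The 3-bit row is an assumed stand-in for a turbo3-class cache, not the fork's documented bit rate.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_element):
    """Approximate KV-cache size in bytes: K and V for every attention layer,
    every KV head, and every context position (scale overhead ignored)."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return elements * bits_per_element // 8


GIB = 1024 ** 3
# Hypothetical model shape, for illustration only.
shape = dict(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=262144)
for label, bits in [("f16", 16), ("~8-bit", 8), ("~3-bit", 3)]:
    size = kv_cache_bytes(**shape, bits_per_element=bits)
    print(f"{label:>7}: {size / GIB:.1f} GiB")
```

For this made-up shape, an f16 cache would need 48 GiB at 262144 tokens versus about 9 GiB at 3 bits per element: the kind of gap that decides whether a long context fits at all.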

Summary of Purpose and Optimizations

This command starts a production-oriented inference server optimized for the Qwen3.5-122B-A10B MoE model on hardware with limited GPU VRAM relative to model size.

Are we Bluffing? No. We Didn't Even Prompt This Right.

Asteroids

asteroids.html (30 KB)
