<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Hot Config]]></title><description><![CDATA[Hot Config]]></description><link>https://www.hotconfig.com/</link><image><url>https://www.hotconfig.com/favicon.png</url><title>Hot Config</title><link>https://www.hotconfig.com/</link></image><generator>Ghost 4.48</generator><lastBuildDate>Thu, 21 May 2026 05:39:02 GMT</lastBuildDate><atom:link href="https://www.hotconfig.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT.]]></title><description><![CDATA[LLM Queue Dispatcher. A Powerful Harness Drop will queue your localLLM all night and keep it working!]]></description><link>https://www.hotconfig.com/llm-queue-dispatcher/</link><guid isPermaLink="false">6a0a80f19e9ad20001df40d8</guid><category><![CDATA[localLLM]]></category><category><![CDATA[HouseGPU]]></category><category><![CDATA[DoneWALK]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Mon, 18 May 2026 03:14:59 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/05/rect4.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/05/rect4.png" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT."><p>Everyone is raving about the houseLLM revolution. &#xA0;<a href="https://www.hotconfig.com/studentllm-examinin/">Turboquant</a> made large contexts possible, and now <a href="https://github.com/ggml-org/llama.cpp">MTP (Multi-Token Prediction)</a> has increased speeds considerably (and has been accepted into the main fork of llama.cpp). &#xA0;With all this local compute power, a good harness that lets your LLM code all night was in order. 
&#xA0;</p><ul><li>You prompt, it works, you come back, you prompt again - and you end up sitting there all night bolted to the output. &#xA0;What if you could set up 20 prompts, queue them, and have them execute one after another, even if one takes 5 minutes and the next takes an hour? &#xA0;This harness is designed to do exactly that!</li><li>You want a web GUI because you just don&apos;t want to wrangle JSON back-ends by hand.</li><li>It&apos;s already dockerized: pull it and run it!</li></ul><p>LLM Queue Dispatcher (LLMQP) is easy to run and completely Docker ready!</p><pre><code class="language-bash">docker pull docker.io/cnmcdee/llmqueue:latest
docker run -d --name mcp-llmqueue --restart unless-stopped -p 0.0.0.0:5012:5012 cnmcdee/llmqueue:latest</code></pre><p>As soon as it is running, it offers you some powerful options. Let&apos;s go over them.</p><ul><li>It binds to port 5012 and is designed to run locally, controlling and monitoring <em>multiple</em> localLLMs plus API LLMs at the same time.</li></ul><pre><code class="language-bash">http://192.168.1.&lt;your ip&gt;:5012</code></pre><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/05/image-3.png" class="kg-image" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT." loading="lazy" width="1003" height="266" srcset="https://www.hotconfig.com/content/images/size/w600/2026/05/image-3.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/05/image-3.png 1000w, https://www.hotconfig.com/content/images/2026/05/image-3.png 1003w" sizes="(min-width: 720px) 720px"></figure><p><strong>Set your MCP Agents</strong></p><ul><li>It fully recognizes MCP Tools and uses CORS polling to read them.</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/05/image-4.png" class="kg-image" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT." loading="lazy" width="1336" height="352" srcset="https://www.hotconfig.com/content/images/size/w600/2026/05/image-4.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/05/image-4.png 1000w, https://www.hotconfig.com/content/images/2026/05/image-4.png 1336w" sizes="(min-width: 720px) 720px"></figure><p><strong>Set your LLM Stack</strong></p><ul><li>One or many, it&apos;s up to you.</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/05/image-5.png" class="kg-image" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT." 
loading="lazy" width="1336" height="352" srcset="https://www.hotconfig.com/content/images/size/w600/2026/05/image-5.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/05/image-5.png 1000w, https://www.hotconfig.com/content/images/2026/05/image-5.png 1336w" sizes="(min-width: 720px) 720px"></figure><h3 id="done-walk-prompting">Done-Walk Prompting</h3><ul><li>You can queue as many prompts as you like. It will build each prompt, dispatch it to your LLMs, and monitor the output!</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/05/image-6.png" class="kg-image" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT." loading="lazy" width="1336" height="809" srcset="https://www.hotconfig.com/content/images/size/w600/2026/05/image-6.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/05/image-6.png 1000w, https://www.hotconfig.com/content/images/2026/05/image-6.png 1336w" sizes="(min-width: 720px) 720px"></figure><p>It will show you the prompt dispatches.</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/05/image-7.png" class="kg-image" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT." loading="lazy" width="481" height="230"></figure><p>You can monitor the prompts as they work. &#xA0;</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/05/image-8.png" class="kg-image" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT." 
loading="lazy" width="1342" height="759" srcset="https://www.hotconfig.com/content/images/size/w600/2026/05/image-8.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/05/image-8.png 1000w, https://www.hotconfig.com/content/images/2026/05/image-8.png 1342w" sizes="(min-width: 720px) 720px"></figure><h3 id="understanding-done-walk">Understanding DONE-WALK.</h3><p>Each prompt finishes with a unique 10-character serial. &#xA0;The software monitors the output for this serial. &#xA0;Upon seeing it, it automatically starts the next prompt for you! Simply select that type of prompt and watch it work!<br></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.hotconfig.com/content/images/2026/05/image-9.png" class="kg-image" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT." loading="lazy" width="666" height="117" srcset="https://www.hotconfig.com/content/images/size/w600/2026/05/image-9.png 600w, https://www.hotconfig.com/content/images/2026/05/image-9.png 666w"><figcaption>Done-Walk will walk your prompts one by one.</figcaption></figure><h3 id="code-drop-a-powerful-mcp-agent">Code Drop. A POWERFUL MCP Agent</h3><ul><li>If you want your LLM to work across contexts and pick up where it left off, add in this agent. &#xA0;You literally tell your LLM &apos;Use the Process Manager to Create a Task to save your work point.&apos; &#xA0;Then in the next prompt another LLM (or the next context) can pick up where it left off: &apos;Using the Process Manager - load the following task and keep working on it!&apos; 
</li><li>Fully open source!</li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-7-process-manager-part-1/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop.</div><div class="kg-bookmark-description">Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT."><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/surf_bot_clipped.png" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT."></div></a></figure><h3 id="opensource">OpenSource</h3><ul><li>This is fully OPEN SOURCE!!</li><li>Create an <code>app.py</code> with the following contents. You will need a <code>venv</code> with <code>flask</code>, <code>aiohttp</code>, <code>requests</code>, etc.</li></ul><pre><code class="language-python">import asyncio
import time
from flask import Flask, jsonify, render_template, request
import threading
import pickle
import os
import aiohttp
import json
import uuid
from typing import Dict, Any, List
import requests
import sys

# ====================== CORE LLM CLIENT ======================
class AsyncLLMClient:
    def __init__(self, api_key: str, base_url: str = &quot;https://api.openai.com/v1&quot;):
        self.api_key = api_key
        self.base_url = base_url
        self.requests: Dict[str, Any] = {}
        self.monitor_tasks = []
        self._session = None
    async def __aenter__(self):
        self._session = aiohttp.ClientSession()
        return self
    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self._session:
            await self._session.close()
        for task in self.monitor_tasks:
            task.cancel()
        await asyncio.gather(*self.monitor_tasks, return_exceptions=True)
    async def send_request(self, prompt: str, **kwargs):
        request_id = f&quot;req_{len(self.requests)}&quot;
        request = {
            &quot;prompt&quot;: prompt,
            &quot;model&quot;: kwargs.get(&quot;model&quot;, &quot;gpt-4o-mini&quot;),
            &quot;max_tokens&quot;: kwargs.get(&quot;max_tokens&quot;, 1024),
            &quot;temperature&quot;: kwargs.get(&quot;temperature&quot;, 0.7),
            &quot;response_chunks&quot;: [],
            &quot;total_bytes&quot;: 0,
            &quot;estimated_tokens&quot;: 0,
            &quot;status&quot;: &quot;idle&quot;,
            &quot;error&quot;: None,
            &quot;start_time&quot;: None,
            &quot;end_time&quot;: None,
            &quot;full_response&quot;: &quot;&quot;,
            &quot;request_id&quot;: request_id
        }
        self.requests[request_id] = request
        self.monitor_tasks.append(asyncio.create_task(self._monitor_bytes(request)))
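        # Keep a reference so the request task is not garbage-collected mid-flight;
        # it is cancelled together with the monitor tasks in __aexit__.
        self.monitor_tasks.append(asyncio.create_task(self._execute_request(request)))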
        return request
    async def _monitor_bytes(self, request):
        request[&quot;status&quot;] = &quot;monitoring&quot;
        request[&quot;start_time&quot;] = time.time()
        last = 0
        try:
            while request[&quot;status&quot;] == &quot;monitoring&quot;:
                if len(request[&quot;response_chunks&quot;]) &gt; last:
                    new_data = b&apos;&apos;.join(request[&quot;response_chunks&quot;][last:])
                    request[&quot;total_bytes&quot;] += len(new_data)
                    last = len(request[&quot;response_chunks&quot;])
                    # Rough estimate only: full_response holds the raw streamed body, so each
                    # token presumably arrives wrapped in a JSON chunk. Dividing by ~150
                    # characters per chunk is a heuristic; plain text is closer to ~3 per token.
                    request[&quot;estimated_tokens&quot;] = round(len(request[&quot;full_response&quot;]) / 150)
                await asyncio.sleep(0.1)
        except asyncio.CancelledError:
            pass
        except Exception as e:
            request[&quot;status&quot;] = &quot;error&quot;
            request[&quot;error&quot;] = str(e)
        finally:
            request[&quot;end_time&quot;] = time.time()
            if request[&quot;status&quot;] != &quot;error&quot;:
                request[&quot;status&quot;] = &quot;completed&quot;
    async def _execute_request(self, request):
        url = f&quot;{self.base_url}/chat/completions&quot;
        headers = {&quot;Authorization&quot;: f&quot;Bearer {self.api_key}&quot;, &quot;Content-Type&quot;: &quot;application/json&quot;}
        payload = {
            &quot;model&quot;: request[&quot;model&quot;],
            &quot;messages&quot;: [{&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: request[&quot;prompt&quot;]}],
            &quot;max_tokens&quot;: request[&quot;max_tokens&quot;],
            &quot;temperature&quot;: request[&quot;temperature&quot;],
            &quot;stream&quot;: True
        }
        try:
            async with self._session.post(url, headers=headers, json=payload) as resp:
                if resp.status != 200:
                    request[&quot;status&quot;] = &quot;error&quot;
                    request[&quot;error&quot;] = await resp.text()
                    return
                async for chunk in resp.content.iter_chunked(1024):
                    if request[&quot;status&quot;] == &quot;error&quot;: break
                    request[&quot;response_chunks&quot;].append(chunk)
                    request[&quot;full_response&quot;] += chunk.decode(&apos;utf-8&apos;, errors=&apos;replace&apos;)
                request[&quot;status&quot;] = &quot;completed&quot;
        except Exception as e:
            request[&quot;status&quot;] = &quot;error&quot;
            request[&quot;error&quot;] = str(e)
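    async def _wait_for_completion(self, request):
        # send_request() only schedules its tasks, so the status is still &quot;idle&quot; here.
        # Wait through both states, or DONE_WALK would move on to the next prompt at once.
        while request[&quot;status&quot;] in (&quot;idle&quot;, &quot;monitoring&quot;):
            await asyncio.sleep(0.1)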
# ====================== BASE MANAGER ======================
class BaseManager:
    def __init__(self, state_file: str):
        self.state_file = state_file
        self.items: List[Dict] = []
        self._save_task = None
    def _get_serializable(self):
        return [{k: v for k, v in item.items() if k != &quot;client&quot;} for item in self.items]
    def save_state(self):
        try:
            with open(self.state_file, &quot;wb&quot;) as f:
                pickle.dump(self._get_serializable(), f)
            print(f&quot;State saved: {self.state_file}&quot;)
        except Exception as e:
            print(f&quot;Save failed: {e}&quot;)
    def load_state(self):
        if os.path.exists(self.state_file):
            try:
                with open(self.state_file, &quot;rb&quot;) as f:
                    self.items = pickle.load(f)
                print(f&quot;Loaded {len(self.items)} items from {self.state_file}&quot;)
            except Exception:
                self.items = []
    def _save_after_change(self):
        self.save_state()
# ====================== MCP AGENT MANAGER ======================
class AsyncMCPAgentManager(BaseManager):
    def __init__(self):
        super().__init__(&quot;mcp_agents_state.pkl&quot;)
        self.scheduled_tasks = []
        self.load_state()
    async def __aenter__(self):
        for agent in self.items:
            if not agent.get(&quot;client&quot;):
                client = AsyncLLMClient(agent[&quot;api_key&quot;], agent[&quot;base_url&quot;])
                agent[&quot;client&quot;] = client
                await client.__aenter__()
        if self._save_task is None:
            self._save_task = asyncio.create_task(self._autosave_loop())
        return self
    async def __aexit__(self, *args):
        if self._save_task:
            self._save_task.cancel()
        self.save_state()
        for agent in self.items:
            if agent.get(&quot;client&quot;):
                await agent[&quot;client&quot;].__aexit__(None, None, None)
    async def _autosave_loop(self):
        while True:
            await asyncio.sleep(30)
            self.save_state()
    def list_mcp_agents(self):
        result = []
        for agent in self.items:
            client = agent.get(&quot;client&quot;)
            active = sum(1 for r in (client.requests.values() if client else {}) if r.get(&quot;status&quot;) == &quot;monitoring&quot;)
            result.append({
                &quot;agent_id&quot;: agent[&quot;agent_id&quot;],
                &quot;name&quot;: agent[&quot;name&quot;],
                &quot;description&quot;: agent.get(&quot;description&quot;, &quot;&quot;),
                &quot;base_url&quot;: agent[&quot;base_url&quot;],
                &quot;enabled&quot;: agent.get(&quot;enabled&quot;, True),
                &quot;enabled_tools&quot;: agent.get(&quot;enabled_tools&quot;, []),
                &quot;active_requests&quot;: active,
                &quot;total_requests&quot;: len(client.requests) if client else 0
            })
        return result
    async def create_mcp_agent(self, name, base_url, api_key, description=&quot;&quot;, enabled_tools=None, enabled=True):
        agent_id = str(uuid.uuid4())[:8]
        client = AsyncLLMClient(api_key, base_url)
        agent = {&quot;agent_id&quot;: agent_id, &quot;name&quot;: name, &quot;description&quot;: description,
                 &quot;base_url&quot;: base_url, &quot;api_key&quot;: api_key, &quot;enabled&quot;: enabled,
                 &quot;enabled_tools&quot;: enabled_tools or [], &quot;client&quot;: client}
        self.items.append(agent)
        await client.__aenter__()
        self._save_after_change()
        return agent
    async def update_mcp_agent(self, agent_id, **kwargs):
        agent = next((a for a in self.items if a[&quot;agent_id&quot;] == agent_id), None)
        if not agent: return None
        for key in (&quot;name&quot;, &quot;description&quot;, &quot;enabled&quot;, &quot;enabled_tools&quot;):
            if key in kwargs and kwargs[key] is not None:
                agent[key] = kwargs[key]
        if &quot;base_url&quot; in kwargs or &quot;api_key&quot; in kwargs:
            if agent.get(&quot;client&quot;):
                await agent[&quot;client&quot;].__aexit__(None, None, None)
            new_client = AsyncLLMClient(kwargs.get(&quot;api_key&quot;, agent[&quot;api_key&quot;]),
                                        kwargs.get(&quot;base_url&quot;, agent[&quot;base_url&quot;]))
            agent[&quot;base_url&quot;] = kwargs.get(&quot;base_url&quot;, agent[&quot;base_url&quot;])
            agent[&quot;api_key&quot;] = kwargs.get(&quot;api_key&quot;, agent[&quot;api_key&quot;])
            agent[&quot;client&quot;] = new_client
            await new_client.__aenter__()
        self._save_after_change()
        return agent
    async def delete_mcp_agent(self, agent_id):
        for i, agent in enumerate(self.items):
            if agent[&quot;agent_id&quot;] == agent_id:
                if agent.get(&quot;client&quot;):
                    await agent[&quot;client&quot;].__aexit__(None, None, None)
                del self.items[i]
                self._save_after_change()
                return True
        return False
    async def send_request(self, agent_id, prompt, **kwargs):
        agent = next((a for a in self.items if a[&quot;agent_id&quot;] == agent_id), None)
        if not agent or not agent.get(&quot;enabled&quot;, False):
            raise ValueError(f&quot;MCP Agent {agent_id} not found or disabled&quot;)
        return await agent[&quot;client&quot;].send_request(prompt, **kwargs)
    async def distribute_prompts(self, prompts, agent_ids, mode=&quot;ALL_GET_ALL&quot;, **kwargs):
        results = []
        max_tokens = kwargs.get(&quot;max_tokens&quot;, 1024)
        temperature = kwargs.get(&quot;temperature&quot;, 0.7)
        model = kwargs.get(&quot;model&quot;, &quot;gpt-4o-mini&quot;)

        if mode == &quot;DONE_WALK&quot;:
            for i, prompt in enumerate(prompts):
                agent_id = agent_ids[i % len(agent_ids)]
                try:
                    req = await self.send_request(agent_id, prompt, max_tokens=max_tokens,
                                                  temperature=temperature, model=model)
                    agent = next((a for a in self.items if a[&quot;agent_id&quot;] == agent_id), None)
                    if agent and agent.get(&quot;client&quot;):
                        await agent[&quot;client&quot;]._wait_for_completion(req)
                    results.append({&quot;prompt_index&quot;: i, &quot;agent_id&quot;: agent_id, &quot;status&quot;: &quot;completed&quot;})
                except Exception as e:
                    results.append({&quot;prompt_index&quot;: i, &quot;agent_id&quot;: agent_id, &quot;error&quot;: str(e)})
            return {&quot;status&quot;: &quot;success&quot;, &quot;mode&quot;: &quot;DONE_WALK&quot;, &quot;details&quot;: results}

        # ALL_GET_ALL
        for prompt in prompts:
            for aid in agent_ids:
                try:
                    await self.send_request(aid, prompt, max_tokens=max_tokens,
                                            temperature=temperature, model=model)
                    results.append(f&quot;Sent to {aid}&quot;)
                except Exception as e:
                    results.append(f&quot;Error: {e}&quot;)
        return {&quot;status&quot;: &quot;success&quot;, &quot;details&quot;: results}
    def decode_mcp_tools_list(self, json_data):
        &quot;&quot;&quot;
        Decodes an MCP tools/list JSON-RPC response and returns clean structured data.
        &quot;&quot;&quot;
        # Parse if input is a string
        if isinstance(json_data, str):
            try:
                data = json.loads(json_data)
            except json.JSONDecodeError as e:
                return {&quot;success&quot;: False, &quot;error&quot;: &quot;Invalid JSON: &quot; + str(e)}
        else:
            data = json_data
        tools = data.get(&quot;result&quot;, {}).get(&quot;tools&quot;, [])
        if not tools:
            return {&quot;success&quot;: False, &quot;error&quot;: &quot;No tools found in the response.&quot;}
        decoded_tools = []
        def parse_input_schema(schema):
            props = schema.get(&quot;properties&quot;, {})
            required = set(schema.get(&quot;required&quot;, []))
            params = []
            for name, info in props.items():
                params.append({
                    &quot;name&quot;: name,
                    &quot;type&quot;: info.get(&quot;type&quot;, &quot;any&quot;),
                    &quot;required&quot;: name in required,
                    &quot;default&quot;: info.get(&quot;default&quot;)
                })
            return params
        for tool in tools:
            name = tool.get(&quot;name&quot;, &quot;Unnamed&quot;)
            description = tool.get(&quot;description&quot;, &quot;No description provided.&quot;)
            # Input parameters
            input_schema = tool.get(&quot;inputSchema&quot;, {})
            parameters = parse_input_schema(input_schema)
            # Output type
            output_schema = tool.get(&quot;outputSchema&quot;, {})
            output_type = output_schema.get(&quot;properties&quot;, {}) \
                .get(&quot;result&quot;, {}).get(&quot;type&quot;, &quot;unknown&quot;)
            decoded_tools.append({
                &quot;name&quot;: name,
                &quot;description&quot;: description,
                &quot;parameters&quot;: parameters,
                &quot;output_type&quot;: output_type
            })
        return {
            &quot;success&quot;: True,
            &quot;tool_count&quot;: len(decoded_tools),
            &quot;tools&quot;: decoded_tools
        }
    def parse_sse_mcp_response(self, sse_string):
        &quot;&quot;&quot;
        Extracts the JSON payload from an SSE response (e.g. &quot;event: message\ndata: {...}&quot;).
        &quot;&quot;&quot;
        lines = [line.strip() for line in sse_string.strip().split(&quot;\n&quot;)]
        json_str = None
        for line in lines:
            if line.startswith(&quot;data:&quot;):
                json_str = line[5:].strip()
                break
        if not json_str:
            return {&quot;success&quot;: False, &quot;error&quot;: &quot;No &apos;data:&apos; field found in SSE response.&quot;}
        try:
            return json.loads(json_str)
        except json.JSONDecodeError as e:
            return {&quot;success&quot;: False, &quot;error&quot;: &quot;Invalid JSON in SSE data: &quot; + str(e)}
    def scan_mcp_server(self, url, api_key=None):
        &quot;&quot;&quot;
        Connects to an MCP server, performs initialize + tools/list,
        handles SSE responses, decodes the result, and returns clean structured data.
        &quot;&quot;&quot;
        url = url.rstrip(&quot;/&quot;)
        if not url.endswith(&apos;mcp&apos;):
            # The trailing slash was stripped above, so add it back with the path.
            url += &apos;/mcp&apos;
        headers = {
            &quot;Content-Type&quot;: &quot;application/json&quot;,
            &quot;Accept&quot;: &quot;application/json, text/event-stream&quot;
        }
        if api_key:
            headers[&quot;Authorization&quot;] = f&quot;Bearer {api_key}&quot;

        # Step 1: Initialize session
        init_payload = {
            &quot;jsonrpc&quot;: &quot;2.0&quot;,
            &quot;id&quot;: 99,
            &quot;method&quot;: &quot;initialize&quot;,
            &quot;params&quot;: {
                &quot;protocolVersion&quot;: &quot;2024-11-05&quot;,
                &quot;capabilities&quot;: {},
                &quot;clientInfo&quot;: {&quot;name&quot;: &quot;flask-mcp-scanner&quot;, &quot;version&quot;: &quot;1.0&quot;}
            }
        }
        init_response = requests.post(url, headers=headers, json=init_payload, timeout=15)
        if init_response.status_code != 200:
            return {&quot;success&quot;: False, &quot;error&quot;: f&quot;Initialize failed: HTTP {init_response.status_code}&quot;}

        server_name = &quot;&quot;
        try:
            init_text = init_response.text.strip()
            if &quot;event:&quot; in init_text:
                # SSE format
                json_rpc = self.parse_sse_mcp_response(init_text)
            else:
                json_rpc = init_response.json()

            if isinstance(json_rpc, dict):
                result = json_rpc.get(&quot;result&quot;, {})
                server_name = result.get(&quot;serverInfo&quot;, {}).get(&quot;name&quot;, &quot;&quot;) or \
                              result.get(&quot;name&quot;, &quot;&quot;)
        except Exception:
            pass  # fallback to empty name (will use URL hostname later)

        session_id = init_response.headers.get(&quot;Mcp-Session-Id&quot;) or \
                     init_response.headers.get(&quot;mcp-session-id&quot;)
        if not session_id:
            return {&quot;success&quot;: False, &quot;error&quot;: &quot;No Mcp-Session-Id received from server.&quot;}

        # Step 2: Get tools list
        tools_headers = headers.copy()
        tools_headers[&quot;Mcp-Session-Id&quot;] = session_id
        tools_payload = {
            &quot;jsonrpc&quot;: &quot;2.0&quot;,
            &quot;id&quot;: 1,
            &quot;method&quot;: &quot;tools/list&quot;,
            &quot;params&quot;: {}
        }
        tools_response = requests.post(url, headers=tools_headers, json=tools_payload, timeout=15)
        if tools_response.status_code != 200:
            return {&quot;success&quot;: False, &quot;error&quot;: f&quot;Tools/list failed: HTTP {tools_response.status_code}&quot;}

        # Step 3: Handle SSE or normal JSON
        raw_text = tools_response.text
        json_rpc = self.parse_sse_mcp_response(raw_text) if &quot;event:&quot; in raw_text else tools_response.json()

        if isinstance(json_rpc, dict) and &quot;success&quot; in json_rpc and not json_rpc.get(&quot;success&quot;):
            return json_rpc

        # Step 4: Decode and return clean result
        retset = self.decode_mcp_tools_list(json_rpc)

        if isinstance(retset, dict) and retset.get(&quot;success&quot;):
            retset[&quot;server_name&quot;] = server_name
        return retset


# ====================== LLM SERVER MANAGER ======================
class AsyncLLMServerManager(BaseManager):
    def __init__(self):
        super().__init__(&quot;llm_servers_state.pkl&quot;)
        self.load_state()
    async def __aenter__(self):
        for server in self.items:
            if not server.get(&quot;client&quot;):
                client = AsyncLLMClient(server[&quot;api_key&quot;], server[&quot;base_url&quot;])
                server[&quot;client&quot;] = client
                await client.__aenter__()
        if self._save_task is None:
            self._save_task = asyncio.create_task(self._autosave_loop())
        return self
    async def __aexit__(self, *args):
        if self._save_task:
            self._save_task.cancel()
        self.save_state()
        for server in self.items:
            if server.get(&quot;client&quot;):
                await server[&quot;client&quot;].__aexit__(None, None, None)
    async def _autosave_loop(self):
        while True:
            await asyncio.sleep(30)
            self.save_state()
    def list_llm_servers(self):
        result = []
        for server in self.items:
            client = server.get(&quot;client&quot;)
            active = sum(1 for r in (client.requests.values() if client else {}) if r.get(&quot;status&quot;) == &quot;monitoring&quot;)
            result.append({
                &quot;server_id&quot;: server.get(&quot;server_id&quot;),
                &quot;name&quot;: server[&quot;name&quot;],
                &quot;base_url&quot;: server[&quot;base_url&quot;],
                &quot;default_model&quot;: server.get(&quot;default_model&quot;, &quot;gpt-4o-mini&quot;),
                &quot;active_requests&quot;: active
            })
        return result
    async def update_llm_server(self, server_id, **kwargs):
        server = next((s for s in self.items if s[&quot;server_id&quot;] == server_id), None)
        if not server:
            return None
        for key in (&quot;name&quot;, &quot;default_model&quot;):
            if key in kwargs and kwargs[key] is not None:
                server[key] = kwargs[key]
        if &quot;base_url&quot; in kwargs or &quot;api_key&quot; in kwargs:
            if server.get(&quot;client&quot;):
                await server[&quot;client&quot;].__aexit__(None, None, None)
            new_client = AsyncLLMClient(kwargs.get(&quot;api_key&quot;, server[&quot;api_key&quot;]),
                                        kwargs.get(&quot;base_url&quot;, server[&quot;base_url&quot;]))
            server[&quot;base_url&quot;] = kwargs.get(&quot;base_url&quot;, server[&quot;base_url&quot;])
            server[&quot;api_key&quot;] = kwargs.get(&quot;api_key&quot;, server[&quot;api_key&quot;])
            server[&quot;client&quot;] = new_client
            await new_client.__aenter__()
        self._save_after_change()
        return server
    async def delete_llm_server(self, server_id):
        for i, server in enumerate(self.items):
            if server[&quot;server_id&quot;] == server_id:
                if server.get(&quot;client&quot;):
                    await server[&quot;client&quot;].__aexit__(None, None, None)
                del self.items[i]
                self._save_after_change()
                return True
        return False
    async def create_llm_server(self, name, base_url, api_key, default_model=&quot;gpt-4o-mini&quot;, description=&quot;&quot;):
        server_id = str(uuid.uuid4())[:8]
        client = AsyncLLMClient(api_key, base_url)
        server = {&quot;server_id&quot;: server_id, &quot;name&quot;: name, &quot;description&quot;: description,
                  &quot;base_url&quot;: base_url, &quot;api_key&quot;: api_key, &quot;default_model&quot;: default_model, &quot;client&quot;: client}
        self.items.append(server)
        await client.__aenter__()
        self._save_after_change()
        return server
    async def distribute_prompts(self, prompts, server_ids, mcp_agent_ids=None, mcp_items=None, mode=&quot;ALL_GET_ALL&quot;,
                                 **kwargs):
        results = []
        max_tokens = kwargs.get(&quot;max_tokens&quot;, 1024)
        temperature = kwargs.get(&quot;temperature&quot;, 0.7)
        model = kwargs.get(&quot;model&quot;, &quot;gpt-4o-mini&quot;)

        tool_context = &quot;&quot;
        if mcp_agent_ids and mcp_items:
            tool_names = []
            for agent in mcp_items:
                if agent.get(&quot;agent_id&quot;) in mcp_agent_ids:
                    tool_names.extend(agent.get(&quot;enabled_tools&quot;, []))
            if tool_names:
                tool_context = f&quot;You have access to the following MCP agentic tools: {&apos;, &apos;.join(tool_names)}.\nUse them when appropriate to solve the task.\n\n&quot;

        for prompt in prompts:
            augmented_prompt = tool_context + prompt
            for sid in server_ids:
                try:
                    server = next((s for s in self.items if s[&quot;server_id&quot;] == sid), None)
                    if not server or not server.get(&quot;client&quot;):
                        continue
                    await server[&quot;client&quot;].send_request(augmented_prompt,
                                                        max_tokens=max_tokens,
                                                        temperature=temperature,
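                                                        # Prefer an explicit model, then the server&apos;s own default_model
                                                        model=kwargs.get(&quot;model&quot;) or server.get(&quot;default_model&quot;) or model)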
                    results.append(f&quot;Sent to LLM server {sid}&quot;)
                except Exception as e:
                    results.append(f&quot;Error sending to server {sid}: {e}&quot;)
        return {&quot;status&quot;: &quot;success&quot;, &quot;details&quot;: results}
# ====================== MAIN MANAGER ======================
class AsyncMultiClientManager:
    def __init__(self):
        self.mcp_manager = AsyncMCPAgentManager()
        self.llm_manager = AsyncLLMServerManager()

    async def __aenter__(self):
        await self.mcp_manager.__aenter__()
        await self.llm_manager.__aenter__()
        return self

    async def __aexit__(self, *args):
        await self.mcp_manager.__aexit__(*args)
        await self.llm_manager.__aexit__(*args)
# ====================== FLASK APPLICATION ======================
async def main():
    manager = AsyncMultiClientManager()
    async with manager:
        app = Flask(__name__, template_folder=&apos;templates&apos;)
        app.config[&apos;SEND_FILE_MAX_AGE_DEFAULT&apos;] = 0
        app.manager = manager
        app.loop = asyncio.get_running_loop()

        @app.route(&apos;/&apos;)
        def dashboard():
            return render_template(&apos;dashboard.html&apos;)

        @app.route(&apos;/api/status&apos;)
        def api_status():
            data = {
                &quot;mcp_agents&quot;: manager.mcp_manager.list_mcp_agents(),
                &quot;llm_servers&quot;: manager.llm_manager.list_llm_servers(),
                &quot;scheduled_tasks&quot;: manager.mcp_manager.scheduled_tasks
            }

            # Active requests per client (MCP Agents + LLM Servers) &#x2013; full prompt + live response
            for agent in manager.mcp_manager.items:
                key = f&quot;{agent[&apos;name&apos;]} ({agent[&apos;agent_id&apos;]})&quot;
                client = agent.get(&quot;client&quot;)
                if client:
                    data[key] = {
                        &quot;active_requests&quot;: sum(1 for r in client.requests.values() if r[&quot;status&quot;] == &quot;monitoring&quot;),
                        &quot;requests&quot;: [{
                            &quot;request_id&quot;: rid,
                            &quot;prompt_preview&quot;: r.get(&quot;prompt&quot;, &quot;&quot;),
                            &quot;full_response&quot;: r.get(&quot;full_response&quot;, &quot;&quot;),
                            &quot;status&quot;: r[&quot;status&quot;],
                            &quot;total_bytes&quot;: r[&quot;total_bytes&quot;],
                            &quot;estimated_tokens&quot;: r.get(&quot;estimated_tokens&quot;, 0),
                            &quot;duration&quot;: round((r.get(&quot;end_time&quot;) or time.time()) - (r.get(&quot;start_time&quot;) or time.time()), 2)
                        } for rid, r in client.requests.items()]
                    }

            for server in manager.llm_manager.items:
                key = f&quot;{server[&apos;name&apos;]} ({server[&apos;server_id&apos;]})&quot;
                client = server.get(&quot;client&quot;)
                if client:
                    data[key] = {
                        &quot;active_requests&quot;: sum(1 for r in client.requests.values() if r[&quot;status&quot;] == &quot;monitoring&quot;),
                        &quot;requests&quot;: [{
                            &quot;request_id&quot;: rid,
                            &quot;prompt_preview&quot;: r.get(&quot;prompt&quot;, &quot;&quot;),
                            &quot;full_response&quot;: r.get(&quot;full_response&quot;, &quot;&quot;),
                            &quot;status&quot;: r[&quot;status&quot;],
                            &quot;total_bytes&quot;: r[&quot;total_bytes&quot;],
                            &quot;estimated_tokens&quot;: r.get(&quot;estimated_tokens&quot;, 0),
                            &quot;duration&quot;: round((r.get(&quot;end_time&quot;) or time.time()) - (r.get(&quot;start_time&quot;) or time.time()), 2)
                        } for rid, r in client.requests.items()]
                    }

            return jsonify(data)

        @app.route(&apos;/api/mcp_agents&apos;, methods=[&apos;GET&apos;, &apos;POST&apos;])
        def api_mcp_agents():
            if request.method == &apos;GET&apos;:
                return jsonify({&quot;mcp_agents&quot;: manager.mcp_manager.list_mcp_agents()})
            data = request.get_json() or {}
            future = asyncio.run_coroutine_threadsafe(
                manager.mcp_manager.create_mcp_agent(
                    name=data.get(&quot;name&quot;),
                    base_url=data.get(&quot;base_url&quot;),
                    api_key=data.get(&quot;api_key&quot;, &quot;&quot;),
                    description=data.get(&quot;description&quot;, &quot;&quot;),
                    enabled_tools=data.get(&quot;enabled_tools&quot;)
                ), app.loop)
            agent = future.result()
            return jsonify({&quot;status&quot;: &quot;created&quot;, &quot;agent&quot;: manager.mcp_manager._get_serializable()[-1]}), 201

        @app.route(&apos;/api/mcp_agents/&lt;agent_id&gt;&apos;, methods=[&apos;PUT&apos;, &apos;DELETE&apos;])
        def api_mcp_agent(agent_id):
            if request.method == &apos;DELETE&apos;:
                success = asyncio.run_coroutine_threadsafe(
                    manager.mcp_manager.delete_mcp_agent(agent_id), app.loop).result()
                return jsonify({&quot;status&quot;: &quot;deleted&quot; if success else &quot;failed&quot;})

            # === PUT: Update existing agent ===
            data = request.get_json() or {}
            future = asyncio.run_coroutine_threadsafe(
                manager.mcp_manager.update_mcp_agent(agent_id, **data), app.loop)
            updated_agent = future.result()

            if updated_agent is None:
                return jsonify({&quot;status&quot;: &quot;failed&quot;, &quot;error&quot;: &quot;Agent not found&quot;}), 404

            # Return ONLY serializable data (matches list_mcp_agents format)
            return jsonify({
                &quot;status&quot;: &quot;updated&quot;,
                &quot;agent&quot;: {
                    &quot;agent_id&quot;: updated_agent[&quot;agent_id&quot;],
                    &quot;name&quot;: updated_agent[&quot;name&quot;],
                    &quot;description&quot;: updated_agent.get(&quot;description&quot;, &quot;&quot;),
                    &quot;base_url&quot;: updated_agent[&quot;base_url&quot;],
                    &quot;enabled&quot;: updated_agent.get(&quot;enabled&quot;, True),
                    &quot;enabled_tools&quot;: updated_agent.get(&quot;enabled_tools&quot;, []),
                    &quot;active_requests&quot;: 0,  # will be recalculated on next dashboard refresh
                    &quot;total_requests&quot;: 0
                }
            })

        @app.route(&apos;/api/llm_servers&apos;, methods=[&apos;GET&apos;, &apos;POST&apos;])
        def api_llm_servers():
            if request.method == &apos;GET&apos;:
                return jsonify({&quot;llm_servers&quot;: manager.llm_manager.list_llm_servers()})
            data = request.get_json() or {}
            future = asyncio.run_coroutine_threadsafe(
                manager.llm_manager.create_llm_server(
                    name=data.get(&quot;name&quot;),
                    base_url=data.get(&quot;base_url&quot;),
                    api_key=data.get(&quot;api_key&quot;, &quot;&quot;),
                    default_model=data.get(&quot;default_model&quot;, &quot;gpt-4o-mini&quot;)
                ), app.loop)
            server = future.result()
            return jsonify({&quot;status&quot;: &quot;created&quot;, &quot;server&quot;: manager.llm_manager._get_serializable()[-1]}), 201

        @app.route(&apos;/api/llm_servers/&lt;server_id&gt;&apos;, methods=[&apos;PUT&apos;, &apos;DELETE&apos;])
        def api_llm_server(server_id):
            if request.method == &apos;DELETE&apos;:
                success = asyncio.run_coroutine_threadsafe(
                    manager.llm_manager.delete_llm_server(server_id), app.loop).result()
                return jsonify({&quot;status&quot;: &quot;deleted&quot; if success else &quot;failed&quot;})
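            data = request.get_json() or {}
            future = asyncio.run_coroutine_threadsafe(
                manager.llm_manager.update_llm_server(server_id, **data), app.loop)
            # Wait for the update to finish so an unknown server_id is reported, not silently ignored
            if future.result() is None:
                return jsonify({&quot;status&quot;: &quot;failed&quot;, &quot;error&quot;: &quot;Server not found&quot;}), 404
            return jsonify({&quot;status&quot;: &quot;updated&quot;})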

        @app.route(&apos;/api/distribute_prompts&apos;, methods=[&apos;POST&apos;])
        def api_distribute_prompts():
            data = request.get_json() or {}
            future = asyncio.run_coroutine_threadsafe(
                manager.llm_manager.distribute_prompts(
                    prompts=data.get(&quot;prompts&quot;, []),
                    server_ids=data.get(&quot;selected_servers&quot;, []),
                    mcp_agent_ids=data.get(&quot;selected_agents&quot;, []),
                    mcp_items=manager.mcp_manager.items,
                    mode=data.get(&quot;distribution_mode&quot;, &quot;ALL_GET_ALL&quot;),
                    max_tokens=data.get(&quot;max_tokens&quot;, 256000)
                ), app.loop)
            result = future.result()
            return jsonify(result)

        @app.route(&apos;/api/mcp_scan&apos;, methods=[&apos;POST&apos;])
        def api_mcp_scan():
            data = request.get_json() or {}
            http_address = data.get(&quot;http_address&quot;)
            api_key = data.get(&quot;api_key&quot;)
            if not http_address:
                return jsonify({&quot;success&quot;: False, &quot;error&quot;: &quot;Missing http_address&quot;}), 400

            # Call directly - no asyncio wrapper needed for this synchronous method
            result = manager.mcp_manager.scan_mcp_server(http_address, api_key=api_key)
            return jsonify(result)

        def run_flask():
            app.run(host=&quot;0.0.0.0&quot;, port=5012, debug=False, use_reloader=False)

        threading.Thread(target=run_flask, daemon=True).start()
        print(&quot;&#x2705; Flask dashboard started &#x2192; http://localhost:5012&quot;)
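        # Keep the event loop (and the daemon Flask thread) alive until the process is
        # interrupted, e.g. with Ctrl+C; a fixed sleep would end the program when it elapsed.
        await asyncio.Event().wait()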


if __name__ == &quot;__main__&quot;:
    asyncio.run(main())


</code></pre><p>Inside a <code>templates</code> folder, create <code>dashboard.html</code> with the following contents:</p><pre><code class="language-html">&lt;!DOCTYPE html&gt;
&lt;html lang=&quot;en&quot;&gt;
&lt;head&gt;
    &lt;meta charset=&quot;UTF-8&quot;&gt;
    &lt;title&gt;LLM Queue Dispatcher&lt;/title&gt;
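    &lt;link href=&quot;https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css&quot; rel=&quot;stylesheet&quot;&gt;
    &lt;!-- Bootstrap Icons: required for the bi-* icon classes used on the buttons below --&gt;
    &lt;link href=&quot;https://cdn.jsdelivr.net/npm/bootstrap-icons@1.11.3/font/bootstrap-icons.min.css&quot; rel=&quot;stylesheet&quot;&gt;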
    &lt;script src=&quot;https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/js/bootstrap.bundle.min.js&quot;&gt;&lt;/script&gt;
    &lt;script src=&quot;https://cdn.jsdelivr.net/npm/marked/marked.min.js&quot;&gt;&lt;/script&gt;
    &lt;style&gt;
        body { padding: 30px; font-family: system-ui, -apple-system, sans-serif; background-color: #f8f9fa; }
        .card { margin-bottom: 25px; box-shadow: 0 4px 12px rgba(0,0,0,0.1); }
        .request-card { border-left: 5px solid #0d6efd; margin-bottom: 20px; }
        .status-monitoring { color: #0d6efd; font-weight: bold; animation: pulse 2s infinite; }
        .status-completed { color: #198754; }
        .status-error { color: #dc3545; }
        .thinking-box { background-color: #f8f9fa; border: 1px dashed #0d6efd; border-radius: 8px; padding: 18px; max-height: 420px; overflow-y: auto; font-size: 0.98rem; line-height: 1.6; }
        .progress-bar { transition: width 0.6s ease-in-out; }
        @keyframes pulse { 0%, 100% { opacity: 1; } 50% { opacity: 0.6; } }
        #last-updated { font-size: 0.9rem; color: #6c757d; }
        .countdown { font-family: monospace; font-weight: 700; letter-spacing: 1px; }
        .countdown.urgent { color: #dc3545; animation: pulse 1s infinite; }
        /* Pronounced ringed code blocks &#x2013; entire block only, no per-line borders */
        .markdown-body pre {
            background-color: #e3f0ff !important;   /* more pronounced cool blue-gray */
            border: 3px solid #0d6efd !important;   /* strong blue ring */
            border-radius: 8px;
            padding: 16px !important;
            margin: 12px 0;
            overflow-x: auto;
        }
        .markdown-body pre code {
            background-color: transparent !important;
            border: none !important;
            padding: 0;
        }
    &lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;
    &lt;div class=&quot;container&quot;&gt;
        &lt;div class=&quot;d-flex justify-content-between align-items-center mb-4&quot;&gt;
            &lt;h1 class=&quot;mb-0&quot;&gt;&#x1F916; LLM Queue Dispatcher&lt;/h1&gt;
            &lt;div&gt;&lt;span id=&quot;last-updated&quot; class=&quot;text-muted&quot;&gt;Last updated: just now&lt;/span&gt;&lt;/div&gt;
        &lt;/div&gt;
        &lt;ul class=&quot;nav nav-tabs mb-4&quot; role=&quot;tablist&quot;&gt;
            &lt;li class=&quot;nav-item&quot;&gt;&lt;button class=&quot;nav-link active&quot; data-bs-toggle=&quot;tab&quot; data-bs-target=&quot;#active-prompts&quot;&gt;Active Prompts&lt;/button&gt;&lt;/li&gt;
            &lt;li class=&quot;nav-item&quot;&gt;&lt;button class=&quot;nav-link&quot; data-bs-toggle=&quot;tab&quot; data-bs-target=&quot;#done-walk-queue&quot;&gt;DONE WALK QUEUE&lt;/button&gt;&lt;/li&gt;
            &lt;li class=&quot;nav-item&quot;&gt;&lt;button class=&quot;nav-link&quot; data-bs-toggle=&quot;tab&quot; data-bs-target=&quot;#mcp-agents&quot;&gt;&#x1F527; MCP Agents&lt;/button&gt;&lt;/li&gt;
            &lt;li class=&quot;nav-item&quot;&gt;&lt;button class=&quot;nav-link&quot; data-bs-toggle=&quot;tab&quot; data-bs-target=&quot;#llm-servers&quot;&gt;&#x1F310; LLM Servers&lt;/button&gt;&lt;/li&gt;
            &lt;li class=&quot;nav-item&quot;&gt;&lt;button class=&quot;nav-link&quot; data-bs-toggle=&quot;tab&quot; data-bs-target=&quot;#prompt-generator&quot;&gt;PROMPT GENERATOR&lt;/button&gt;&lt;/li&gt;
        &lt;/ul&gt;

        &lt;div class=&quot;tab-content&quot;&gt;
            &lt;!-- Active Prompts --&gt;
            &lt;div class=&quot;tab-pane fade show active&quot; id=&quot;active-prompts&quot;&gt;
                &lt;div id=&quot;active-content&quot;&gt;
                    &lt;div class=&quot;text-center py-5&quot;&gt;
                        &lt;div class=&quot;spinner-border text-primary&quot; role=&quot;status&quot;&gt;&lt;/div&gt;
                        &lt;p class=&quot;mt-3 text-muted&quot;&gt;Loading active prompts...&lt;/p&gt;
                    &lt;/div&gt;
                &lt;/div&gt;
            &lt;/div&gt;

            &lt;!-- DONE WALK QUEUE --&gt;
            &lt;div class=&quot;tab-pane fade&quot; id=&quot;done-walk-queue&quot;&gt;
                &lt;div id=&quot;done-walk-content&quot;&gt;
                    &lt;div class=&quot;text-center py-5&quot;&gt;
                        &lt;div class=&quot;spinner-border text-primary&quot; role=&quot;status&quot;&gt;&lt;/div&gt;
                        &lt;p class=&quot;mt-3 text-muted&quot;&gt;Loading Done-Walk queue...&lt;/p&gt;
                    &lt;/div&gt;
                &lt;/div&gt;
            &lt;/div&gt;

            &lt;!-- MCP Agents --&gt;
            &lt;div class=&quot;tab-pane fade&quot; id=&quot;mcp-agents&quot;&gt;
                &lt;div class=&quot;card&quot;&gt;
                    &lt;div class=&quot;card-header bg-white border-0 d-flex justify-content-between align-items-center&quot;&gt;
                        &lt;h5 class=&quot;mb-0&quot;&gt;&lt;span class=&quot;badge bg-primary me-2&quot;&gt;&#x1F527;&lt;/span&gt;MCP Agents&lt;/h5&gt;
                        &lt;button class=&quot;btn btn-success&quot; onclick=&quot;showAddAgentModal()&quot;&gt;&lt;i class=&quot;bi bi-plus-circle&quot;&gt;&lt;/i&gt; Add New MCP Agent&lt;/button&gt;
                    &lt;/div&gt;
                    &lt;div class=&quot;card-body p-0&quot;&gt;
                        &lt;div class=&quot;table-responsive&quot;&gt;
                            &lt;table class=&quot;table table-hover mb-0&quot; id=&quot;agents-table&quot;&gt;
                                &lt;thead class=&quot;table-light&quot;&gt;
                                    &lt;tr&gt;
                                        &lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;th&gt;Base URL&lt;/th&gt;
                                        &lt;th&gt;Enabled Tools&lt;/th&gt;&lt;th class=&quot;text-end&quot;&gt;Actions&lt;/th&gt;
                                    &lt;/tr&gt;
                                &lt;/thead&gt;
                                &lt;tbody id=&quot;agents-tbody&quot;&gt;&lt;/tbody&gt;
                            &lt;/table&gt;
                        &lt;/div&gt;
                    &lt;/div&gt;
                &lt;/div&gt;
            &lt;/div&gt;

            &lt;!-- LLM Servers --&gt;
            &lt;div class=&quot;tab-pane fade&quot; id=&quot;llm-servers&quot;&gt;
                &lt;div class=&quot;card&quot;&gt;
                    &lt;div class=&quot;card-header bg-white border-0 d-flex justify-content-between align-items-center&quot;&gt;
                        &lt;h5 class=&quot;mb-0&quot;&gt;&lt;span class=&quot;badge bg-success me-2&quot;&gt;&#x1F310;&lt;/span&gt;LLM Servers&lt;/h5&gt;
                        &lt;button class=&quot;btn btn-success&quot; onclick=&quot;showAddServerModal()&quot;&gt;&lt;i class=&quot;bi bi-plus-circle&quot;&gt;&lt;/i&gt; Add New LLM Server&lt;/button&gt;
                    &lt;/div&gt;
                    &lt;div class=&quot;card-body p-0&quot;&gt;
                        &lt;div class=&quot;table-responsive&quot;&gt;
                            &lt;table class=&quot;table table-hover mb-0&quot; id=&quot;servers-table&quot;&gt;
                                &lt;thead class=&quot;table-light&quot;&gt;
                                    &lt;tr&gt;
                                        &lt;th&gt;Name&lt;/th&gt;&lt;th&gt;Base URL&lt;/th&gt;&lt;th&gt;Default Model&lt;/th&gt;
                                        &lt;th&gt;Active Requests&lt;/th&gt;&lt;th class=&quot;text-end&quot;&gt;Actions&lt;/th&gt;
                                    &lt;/tr&gt;
                                &lt;/thead&gt;
                                &lt;tbody id=&quot;servers-tbody&quot;&gt;&lt;/tbody&gt;
                            &lt;/table&gt;
                        &lt;/div&gt;
                    &lt;/div&gt;
                &lt;/div&gt;
            &lt;/div&gt;

            &lt;!-- Prompt Generator --&gt;
            &lt;div class=&quot;tab-pane fade&quot; id=&quot;prompt-generator&quot;&gt;
                &lt;div class=&quot;card-body&quot;&gt;
                    &lt;div class=&quot;mt-4&quot;&gt;
                        &lt;label class=&quot;form-label fw-bold&quot;&gt;Generated Prompts (separate single-line or multi-line prompts with -----)&lt;/label&gt;
                        &lt;textarea id=&quot;generated-prompts&quot; class=&quot;form-control&quot; rows=&quot;10&quot; style=&quot;font-family: monospace;&quot;&gt;&lt;/textarea&gt;
                    &lt;/div&gt;

                    &lt;div class=&quot;mt-4 border-top pt-4&quot;&gt;
                        &lt;h6 class=&quot;mb-3&quot;&gt;Select MCP Agents &lt;small class=&quot;text-muted&quot;&gt;(Optional - Support Tools)&lt;/small&gt;&lt;/h6&gt;
                        &lt;div id=&quot;agent-selection&quot; class=&quot;row&quot;&gt;&lt;/div&gt;

                        &lt;h6 class=&quot;mb-3 mt-4&quot;&gt;Select LLM Servers &lt;span class=&quot;text-danger&quot;&gt;*&lt;/span&gt; &lt;small class=&quot;text-muted&quot;&gt;(Required)&lt;/small&gt;&lt;/h6&gt;
                        &lt;div id=&quot;server-selection&quot; class=&quot;row&quot;&gt;&lt;/div&gt;

                        &lt;div class=&quot;mt-4&quot;&gt;
                            &lt;label class=&quot;form-label fw-bold&quot;&gt;Distribution Mode&lt;/label&gt;
                            &lt;div class=&quot;btn-group w-100&quot; role=&quot;group&quot;&gt;
                                &lt;input type=&quot;radio&quot; class=&quot;btn-check&quot; name=&quot;dist-mode&quot; id=&quot;mode-all&quot; value=&quot;ALL_GET_ALL&quot; checked&gt;
                                &lt;label class=&quot;btn btn-outline-primary&quot; for=&quot;mode-all&quot;&gt;All Selected Get All Prompts&lt;/label&gt;
                                &lt;input type=&quot;radio&quot; class=&quot;btn-check&quot; name=&quot;dist-mode&quot; id=&quot;mode-donewalk&quot; value=&quot;DONE_WALK&quot;&gt;
                                &lt;label class=&quot;btn btn-outline-primary&quot; for=&quot;mode-donewalk&quot;&gt;Done-Walk (Sequential)&lt;/label&gt;
                            &lt;/div&gt;
                        &lt;/div&gt;

                        &lt;button onclick=&quot;applyPrompts()&quot; class=&quot;btn btn-success mt-4 w-100&quot;&gt;Apply &amp;amp; Distribute Prompts&lt;/button&gt;
                    &lt;/div&gt;
                &lt;/div&gt;
            &lt;/div&gt;
        &lt;/div&gt;
    &lt;/div&gt;
    &lt;!-- ==================== UPDATED MCP AGENT MODAL ==================== --&gt;
    &lt;div class=&quot;modal fade&quot; id=&quot;agentModal&quot; tabindex=&quot;-1&quot;&gt;
        &lt;div class=&quot;modal-dialog modal-lg&quot;&gt;
            &lt;div class=&quot;modal-content&quot;&gt;
                &lt;div class=&quot;modal-header&quot;&gt;
                    &lt;h5 class=&quot;modal-title&quot; id=&quot;agentModalLabel&quot;&gt;Add New MCP Agent&lt;/h5&gt;
                    &lt;button type=&quot;button&quot; class=&quot;btn-close&quot; data-bs-dismiss=&quot;modal&quot;&gt;&lt;/button&gt;
                &lt;/div&gt;
                &lt;div class=&quot;modal-body&quot;&gt;
                    &lt;form id=&quot;agentForm&quot;&gt;
                        &lt;input type=&quot;hidden&quot; id=&quot;agent-id&quot; name=&quot;id&quot;&gt;

                        &lt;div class=&quot;row&quot;&gt;
                            &lt;div class=&quot;col-md-6&quot;&gt;
                                &lt;div class=&quot;mb-3&quot;&gt;
                                    &lt;label class=&quot;form-label&quot;&gt;Name &lt;span class=&quot;text-danger&quot;&gt;*&lt;/span&gt;&lt;/label&gt;
                                    &lt;input type=&quot;text&quot; id=&quot;agent-name&quot; name=&quot;name&quot; class=&quot;form-control&quot; required&gt;
                                &lt;/div&gt;
                            &lt;/div&gt;
                            &lt;div class=&quot;col-md-6&quot;&gt;
                                &lt;div class=&quot;mb-3&quot;&gt;
                                    &lt;label class=&quot;form-label&quot;&gt;Base URL &lt;span class=&quot;text-danger&quot;&gt;*&lt;/span&gt;&lt;/label&gt;
                                    &lt;input type=&quot;text&quot; id=&quot;agent-baseurl&quot; name=&quot;baseurl&quot; class=&quot;form-control&quot;
                                           placeholder=&quot;http://192.168.1.3:5000/mcp&quot; required&gt;
                                &lt;/div&gt;
                            &lt;/div&gt;
                        &lt;/div&gt;

                        &lt;!-- Description &#x2013; now with better guidance --&gt;
                        &lt;div class=&quot;mb-3&quot;&gt;
                            &lt;label class=&quot;form-label&quot;&gt;Description&lt;/label&gt;
                            &lt;textarea id=&quot;agent-description&quot; name=&quot;description&quot; class=&quot;form-control&quot; rows=&quot;4&quot;
                                      placeholder=&quot;Enter agent description (spaces and newlines are fully supported)&quot;&gt;&lt;/textarea&gt;
                            &lt;small class=&quot;text-muted&quot;&gt;Spaces, punctuation, and multi-line text are preserved.&lt;/small&gt;
                        &lt;/div&gt;
                        &lt;div class=&quot;mb-3&quot;&gt;
                            &lt;label class=&quot;form-label&quot;&gt;API Key&lt;/label&gt;
                            &lt;input type=&quot;password&quot; id=&quot;agent-apikey&quot; name=&quot;apikey&quot; class=&quot;form-control&quot;&gt;
                        &lt;/div&gt;
                        &lt;!-- Tools Section --&gt;
                        &lt;div class=&quot;mb-3&quot;&gt;
                            &lt;div class=&quot;d-flex justify-content-between align-items-center mb-2&quot;&gt;
                                &lt;label class=&quot;form-label mb-0&quot;&gt;Enabled Tools&lt;/label&gt;
                                &lt;button type=&quot;button&quot; class=&quot;btn btn-sm btn-info&quot; onclick=&quot;scanMCP()&quot;&gt;
                                    &lt;i class=&quot;bi bi-broadcast&quot;&gt;&lt;/i&gt; SCAN MCP Tools
                                &lt;/button&gt;
                            &lt;/div&gt;
                            &lt;select id=&quot;agent-tools-select&quot; name=&quot;tools[]&quot; class=&quot;form-select&quot; multiple size=&quot;8&quot;
                                    style=&quot;max-height: 280px;&quot;&gt;
                                &lt;!-- Populated by scanMCP() --&gt;
                            &lt;/select&gt;
                            &lt;small class=&quot;text-muted&quot;&gt;Hold Ctrl (Windows) or Cmd (Mac) to select multiple tools.&lt;br&gt;
                            Full tool descriptions appear on hover.&lt;/small&gt;
                        &lt;/div&gt;
                        &lt;!-- Legacy text field (kept for backward compatibility) --&gt;
                        &lt;div class=&quot;mb-3&quot;&gt;
                            &lt;label class=&quot;form-label&quot;&gt;Enabled Tools (Text) &#x2013; Legacy&lt;/label&gt;
                            &lt;input type=&quot;text&quot; id=&quot;agent-tools&quot; name=&quot;tools_text&quot; class=&quot;form-control&quot;
                                   placeholder=&quot;add, subtract, evaluate_math&quot;&gt;
                            &lt;small class=&quot;text-muted&quot;&gt;Comma-separated list (spaces around commas are ignored).&lt;/small&gt;
                        &lt;/div&gt;
                    &lt;/form&gt;
                &lt;/div&gt;
                &lt;div class=&quot;modal-footer&quot;&gt;
                    &lt;button type=&quot;button&quot; class=&quot;btn btn-secondary&quot; data-bs-dismiss=&quot;modal&quot;&gt;Cancel&lt;/button&gt;
                    &lt;button type=&quot;button&quot; class=&quot;btn btn-primary&quot; onclick=&quot;saveAgent()&quot;&gt;
                        &lt;i class=&quot;bi bi-save&quot;&gt;&lt;/i&gt; Save Agent
                    &lt;/button&gt;
                &lt;/div&gt;
            &lt;/div&gt;
        &lt;/div&gt;
    &lt;/div&gt;

    &lt;!-- LLM Server Modal (kept for completeness) --&gt;
    &lt;div class=&quot;modal fade&quot; id=&quot;serverModal&quot; tabindex=&quot;-1&quot;&gt;
        &lt;div class=&quot;modal-dialog modal-lg&quot;&gt;
            &lt;div class=&quot;modal-content&quot;&gt;
                &lt;div class=&quot;modal-header&quot;&gt;
                    &lt;h5 class=&quot;modal-title&quot; id=&quot;serverModalLabel&quot;&gt;Add New LLM Server&lt;/h5&gt;
                    &lt;button type=&quot;button&quot; class=&quot;btn-close&quot; data-bs-dismiss=&quot;modal&quot;&gt;&lt;/button&gt;
                &lt;/div&gt;
                &lt;div class=&quot;modal-body&quot;&gt;
                    &lt;form id=&quot;serverForm&quot;&gt;
                        &lt;input type=&quot;hidden&quot; id=&quot;server-id&quot;&gt;
                        &lt;div class=&quot;mb-3&quot;&gt;&lt;label class=&quot;form-label&quot;&gt;Server Name&lt;/label&gt;&lt;input type=&quot;text&quot; id=&quot;server-name&quot; class=&quot;form-control&quot; required&gt;&lt;/div&gt;
                        &lt;div class=&quot;mb-3&quot;&gt;&lt;label class=&quot;form-label&quot;&gt;Base URL&lt;/label&gt;&lt;input type=&quot;text&quot; id=&quot;server-baseurl&quot; class=&quot;form-control&quot; required&gt;&lt;/div&gt;
                        &lt;div class=&quot;mb-3&quot;&gt;&lt;label class=&quot;form-label&quot;&gt;API Key&lt;/label&gt;&lt;input type=&quot;password&quot; id=&quot;server-apikey&quot; class=&quot;form-control&quot;&gt;&lt;/div&gt;
                        &lt;div class=&quot;mb-3&quot;&gt;&lt;label class=&quot;form-label&quot;&gt;Default Model&lt;/label&gt;&lt;input type=&quot;text&quot; id=&quot;server-model&quot; class=&quot;form-control&quot; value=&quot;gpt-4o-mini&quot;&gt;&lt;/div&gt;
                    &lt;/form&gt;
                &lt;/div&gt;
                &lt;div class=&quot;modal-footer&quot;&gt;
                    &lt;button type=&quot;button&quot; class=&quot;btn btn-secondary&quot; data-bs-dismiss=&quot;modal&quot;&gt;Cancel&lt;/button&gt;
                    &lt;button type=&quot;button&quot; class=&quot;btn btn-primary&quot; onclick=&quot;saveServer()&quot;&gt;Save Server&lt;/button&gt;
                &lt;/div&gt;
            &lt;/div&gt;
        &lt;/div&gt;
    &lt;/div&gt;

    &lt;script&gt;

        let currentAgents = [];
        let currentServers = [];
        let editingAgentId = null;
        let editingServerId = null;

        // Remember which MCP agents and LLM servers the user has selected/deselected in Prompt Generator
        let savedSelectedAgentIds = new Set();
        let savedSelectedServerIds = new Set();

        // Done-Walk sequential queue
        let doneWalkQueue = [];
        let isDoneWalkRunning = false;

        // ====================== MCP SCAN (ALL tools enabled by default) =======================
        // ====================== UPDATED: scanMCP (full JSON diagnostic + smart name extraction) =======================
        async function scanMCP() {
            const baseUrl = document.getElementById(&apos;agent-baseurl&apos;).value.trim();
            const apiKey = document.getElementById(&apos;agent-apikey&apos;).value.trim();
            if (!baseUrl) return alert(&quot;Please enter a Base URL first.&quot;);

            const scanBtn = document.querySelector(&apos;#agentModal .btn-info&apos;);
            const originalHTML = scanBtn.innerHTML;
            scanBtn.disabled = true;
            scanBtn.innerHTML = `&lt;span class=&quot;spinner-border spinner-border-sm&quot;&gt;&lt;/span&gt; Scanning...`;



            try {
                const res = await fetch(&apos;/api/mcp_scan&apos;, {
                    method: &apos;POST&apos;,
                    headers: {&apos;Content-Type&apos;: &apos;application/json&apos;},
                    body: JSON.stringify({ http_address: baseUrl, api_key: apiKey || null })
                });

                const result = await res.json();


                if (!result.success) {
                    console.error(&apos;&#x274C; [scanMCP] Backend reported error:&apos;, result.error);
                    return alert(`&#x274C; ${result.error || &apos;Scan failed&apos;}`);
                }

                const tools = result.tools || [];

                // &#x2500;&#x2500; SMART NAME EXTRACTION (tries many common MCP fields) &#x2500;&#x2500;
                let serverName = &apos;&apos;;

                // Direct fields the backend might return
                if (result.server_name) serverName = result.server_name;
                else if (result.name) serverName = result.name;
                else if (result.serverInfo &amp;&amp; result.serverInfo.name) serverName = result.serverInfo.name;
                else if (result.result &amp;&amp; result.result.name) serverName = result.result.name;
                else if (result.result &amp;&amp; result.result.server_name) serverName = result.result.server_name;

                // Fallback: clean hostname from URL
                if (!serverName) {
                    try {
                        const urlObj = new URL(baseUrl.startsWith(&apos;http&apos;) ? baseUrl : &apos;http://&apos; + baseUrl);
                        serverName = urlObj.hostname.replace(/^www\./, &apos;&apos;).toUpperCase() + &apos; MCP&apos;;
                    } catch (e) {
                        serverName = &apos;MCP Agent&apos;;
                    }
                }

                // Auto-fill the Name field
                document.getElementById(&apos;agent-name&apos;).value = serverName;

                // Populate tools dropdown
                const select = document.getElementById(&apos;agent-tools-select&apos;);
                select.innerHTML = &apos;&apos;;

                tools.forEach(tool =&gt; {
                    const name = tool.name || tool.tool_name || &apos;&apos;;
                    if (!name) return;

                    const opt = document.createElement(&apos;option&apos;);
                    opt.value = name;
                    opt.textContent = name;
                    opt.title = tool.description
                        ? tool.description.substring(0, 300) + (tool.description.length &gt; 300 ? &apos;...&apos; : &apos;&apos;)
                        : &apos;No description provided.&apos;;
                    opt.selected = true;
                    select.appendChild(opt);
                });

                const toolNames = tools.map(t =&gt; t.name || t.tool_name || &apos;&apos;).filter(Boolean);
                document.getElementById(&apos;agent-tools&apos;).value = toolNames.join(&apos;, &apos;);

                alert(`&#x2705; Success! Discovered ${toolNames.length} tools.\n\nName field auto-filled with: &quot;${serverName}&quot; `);

            } catch (e) {
                console.error(&apos;&#x1F6A8; [scanMCP] Exception:&apos;, e);
                alert(&quot;Failed to reach backend scan service.&quot;);
            } finally {
                scanBtn.disabled = false;
                scanBtn.innerHTML = originalHTML;
            }
        }

        function showAddAgentModal() {
            editingAgentId = null;
            document.getElementById(&apos;agentModalLabel&apos;).textContent = &apos;Add New MCP Agent&apos;;
            document.getElementById(&apos;agentForm&apos;).reset();
            document.getElementById(&apos;agent-tools-select&apos;).innerHTML =
                &apos;&lt;option disabled&gt;Click SCAN MCP Tools or press Enter in the Base URL field...&lt;/option&gt;&apos;;

            const modal = new bootstrap.Modal(document.getElementById(&apos;agentModal&apos;));
            modal.show();

            // Auto-trigger a scan when the user presses Enter in the Base URL field.
            // Assigning onkeydown replaces the handler set on any previous open, so handlers never
            // stack up (removeEventListener with a freshly created function removes nothing).
            const baseUrlInput = document.getElementById(&apos;agent-baseurl&apos;);
            baseUrlInput.onkeydown = function (e) {
                if (e.key === &apos;Enter&apos;) {
                    e.preventDefault();           // prevent form submission
                    scanMCP();                    // same function used by the SCAN button
                }
            };
        }

        async function saveAgent() {
            const select = document.getElementById(&apos;agent-tools-select&apos;);
            const enabledTools = Array.from(select.selectedOptions).map(opt =&gt; opt.value);

            const payload = {
                name: document.getElementById(&apos;agent-name&apos;).value.trim(),
                description: document.getElementById(&apos;agent-description&apos;).value.trim(),
                base_url: document.getElementById(&apos;agent-baseurl&apos;).value.trim(),
                api_key: document.getElementById(&apos;agent-apikey&apos;).value.trim(),
                enabled_tools: enabledTools
            };

            if (!payload.name || !payload.base_url) {
                return alert(&quot;Name and Base URL are required.&quot;);
            }

            let url = &apos;/api/mcp_agents&apos;;
            let method = &apos;POST&apos;;
            if (editingAgentId) {
                url = `/api/mcp_agents/${editingAgentId}`;
                method = &apos;PUT&apos;;
            }

            try {
                const res = await fetch(url, {
                    method: method,
                    headers: { &apos;Content-Type&apos;: &apos;application/json&apos; },
                    body: JSON.stringify(payload)
                });

                if (res.ok) {
                    bootstrap.Modal.getInstance(document.getElementById(&apos;agentModal&apos;)).hide();
                    editingAgentId = null;
                    updateDashboard();
                } else {
                    const errorText = await res.text().catch(() =&gt; &apos;Unknown error&apos;);
                    alert(`Failed to save agent: ${res.status} - ${errorText}`);
                }
            } catch (e) {
                console.error(e);
                alert(&quot;Error saving agent.&quot;);
            }
        }
        function editAgent(id) {
            const agent = currentAgents.find(a =&gt; a.agent_id === id);
            if (!agent) return;

            editingAgentId = id;
            document.getElementById(&apos;agentModalLabel&apos;).textContent = &apos;Edit MCP Agent&apos;;
            document.getElementById(&apos;agent-id&apos;).value = agent.agent_id || &apos;&apos;;
            document.getElementById(&apos;agent-name&apos;).value = agent.name || &apos;&apos;;
            document.getElementById(&apos;agent-description&apos;).value = agent.description || &apos;&apos;;
            document.getElementById(&apos;agent-baseurl&apos;).value = agent.base_url || &apos;&apos;;
            document.getElementById(&apos;agent-apikey&apos;).value = agent.api_key || &apos;&apos;;

            const modal = new bootstrap.Modal(document.getElementById(&apos;agentModal&apos;));
            modal.show();

            const toolsSelect = document.getElementById(&apos;agent-tools-select&apos;);
            if (agent.base_url) {
                loadMCPTools(toolsSelect, agent.base_url).then(() =&gt; {
                    const enabledSet = new Set((agent.enabled_tools || []));
                    Array.from(toolsSelect.options).forEach(opt =&gt; {
                        opt.selected = enabledSet.has(opt.value);
                    });
                });
            }
        }
        function renderAgentsTable() {
            const tbody = document.getElementById(&apos;agents-tbody&apos;);
            tbody.innerHTML = currentAgents.length === 0 ?
                `&lt;tr&gt;&lt;td colspan=&quot;5&quot; class=&quot;text-center py-4 text-muted&quot;&gt;No MCP agents yet.&lt;/td&gt;&lt;/tr&gt;` : &apos;&apos;;

            currentAgents.forEach(agent =&gt; {
                const count = (agent.enabled_tools || []).length;
                const toolList = (agent.enabled_tools || []).join(&apos;, &apos;) || &apos;None&apos;;
                const row = document.createElement(&apos;tr&apos;);
                row.innerHTML = `
                    &lt;td&gt;&lt;strong&gt;${agent.name}&lt;/strong&gt;&lt;/td&gt;
                    &lt;td style=&quot;white-space: pre-wrap; word-break: break-word;&quot;&gt;
                        ${agent.description || &apos;&#x2014;&apos;}
                    &lt;/td&gt;
                    &lt;td&gt;&lt;code&gt;${agent.base_url}&lt;/code&gt;&lt;/td&gt;
                    &lt;td&gt;
                        &lt;span class=&quot;badge bg-primary&quot; title=&quot;${toolList}&quot;&gt;
                            ${count} tool${count !== 1 ? &apos;s&apos; : &apos;&apos;}
                        &lt;/span&gt;
                    &lt;/td&gt;
                    &lt;td class=&quot;text-end&quot;&gt;
                        &lt;button class=&quot;btn btn-sm btn-outline-primary me-1&quot; onclick=&quot;editAgent(&apos;${agent.agent_id}&apos;)&quot;&gt;Edit&lt;/button&gt;
                        &lt;button class=&quot;btn btn-sm btn-outline-danger&quot; onclick=&quot;deleteAgent(&apos;${agent.agent_id}&apos;)&quot;&gt;Delete&lt;/button&gt;
                    &lt;/td&gt;`;
                tbody.appendChild(row);
            });
        }
        async function loadMCPTools(selectElement, baseUrl) {
            if (!selectElement || !baseUrl) return;

            selectElement.innerHTML = &apos;&lt;option disabled&gt;Loading tools...&lt;/option&gt;&apos;;

            try {
                const apiKey = document.getElementById(&apos;agent-apikey&apos;)?.value.trim() || null;

                const response = await fetch(&apos;/api/mcp_scan&apos;, {
                    method: &apos;POST&apos;,
                    headers: { &apos;Content-Type&apos;: &apos;application/json&apos; },
                    body: JSON.stringify({
                        http_address: baseUrl,
                        api_key: apiKey
                    })
                });

                if (!response.ok) throw new Error(`HTTP ${response.status}`);

                const result = await response.json();

                selectElement.innerHTML = &apos;&apos;;

                if (result.success &amp;&amp; Array.isArray(result.tools) &amp;&amp; result.tools.length &gt; 0) {
                    result.tools.forEach(tool =&gt; {
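                        // Accept plain-string entries, but never fall back to the object itself
                        const toolName = typeof tool === &apos;string&apos; ? tool : ((tool &amp;&amp; (tool.name || tool.tool_name)) || &apos;&apos;);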
                        if (!toolName) return;

                        const opt = document.createElement(&apos;option&apos;);
                        opt.value = toolName;
                        opt.textContent = toolName;
                        selectElement.appendChild(opt);
                    });
                } else {
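                    const msg = result.error ? String(result.error).substring(0, 60) : &apos;No tools found&apos;;
                    // Use textContent so backend error text is never interpreted as HTML
                    const opt = document.createElement(&apos;option&apos;);
                    opt.disabled = true;
                    opt.textContent = msg;
                    selectElement.appendChild(opt);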
                }
            } catch (err) {
                console.error(err);
                selectElement.innerHTML = &apos;&lt;option disabled&gt;Failed to load tools&lt;/option&gt;&apos;;
            }
        }
        function renderServersTable() {
            const tbody = document.getElementById(&apos;servers-tbody&apos;);
            tbody.innerHTML = currentServers.length === 0 ?
                `&lt;tr&gt;&lt;td colspan=&quot;5&quot; class=&quot;text-center py-4 text-muted&quot;&gt;No LLM servers yet.&lt;/td&gt;&lt;/tr&gt;` : &apos;&apos;;

            currentServers.forEach(server =&gt; {
                const row = document.createElement(&apos;tr&apos;);
                row.innerHTML = `
                    &lt;td&gt;&lt;strong&gt;${server.name}&lt;/strong&gt;&lt;/td&gt;
                    &lt;td&gt;&lt;code&gt;${server.base_url}&lt;/code&gt;&lt;/td&gt;
                    &lt;td&gt;${server.default_model || &apos;&#x2014;&apos;}&lt;/td&gt;
                    &lt;td&gt;${server.active_requests || 0}&lt;/td&gt;
                    &lt;td class=&quot;text-end&quot;&gt;
                        &lt;button class=&quot;btn btn-sm btn-outline-primary me-1&quot; onclick=&quot;editServer(&apos;${server.server_id}&apos;)&quot;&gt;Edit&lt;/button&gt;
                        &lt;button class=&quot;btn btn-sm btn-outline-danger&quot; onclick=&quot;deleteServer(&apos;${server.server_id}&apos;)&quot;&gt;Delete&lt;/button&gt;
                    &lt;/td&gt;`;
                tbody.appendChild(row);
            });
        }
        async function deleteAgent(id) {
            if (!confirm(&quot;Delete this MCP agent permanently?&quot;)) return;
            try {
                const res = await fetch(`/api/mcp_agents/${id}`, { method: &apos;DELETE&apos; });
                if (res.ok) updateDashboard();
            } catch (e) { console.error(e); }
        }
        // ====================== LLM SERVER CRUD ======================
        function showAddServerModal() {
            editingServerId = null;
            document.getElementById(&apos;serverModalLabel&apos;).textContent = &apos;Add New LLM Server&apos;;
            document.getElementById(&apos;serverForm&apos;).reset();
            new bootstrap.Modal(document.getElementById(&apos;serverModal&apos;)).show();
        }
        function editServer(id) {
            const server = currentServers.find(s =&gt; s.server_id === id);
            if (!server) return;
            editingServerId = id;
            document.getElementById(&apos;serverModalLabel&apos;).textContent = &apos;Edit LLM Server&apos;;
            document.getElementById(&apos;server-id&apos;).value = server.server_id;
            document.getElementById(&apos;server-name&apos;).value = server.name || &apos;&apos;;
            document.getElementById(&apos;server-baseurl&apos;).value = server.base_url || &apos;&apos;;
            document.getElementById(&apos;server-apikey&apos;).value = server.api_key || &apos;&apos;;
            document.getElementById(&apos;server-model&apos;).value = server.default_model || &apos;gpt-4o-mini&apos;;
            new bootstrap.Modal(document.getElementById(&apos;serverModal&apos;)).show();
        }
        async function saveServer() {
            const payload = {
                name: document.getElementById(&apos;server-name&apos;).value.trim(),
                base_url: document.getElementById(&apos;server-baseurl&apos;).value.trim(),
                api_key: document.getElementById(&apos;server-apikey&apos;).value.trim(),
                default_model: document.getElementById(&apos;server-model&apos;).value.trim()
            };

            if (!payload.name || !payload.base_url) return alert(&quot;Name and Base URL required.&quot;);

            const url = editingServerId ? `/api/llm_servers/${editingServerId}` : &apos;/api/llm_servers&apos;;
            const method = editingServerId ? &apos;PUT&apos; : &apos;POST&apos;;

            try {
                const res = await fetch(url, {
                    method: method,
                    headers: { &apos;Content-Type&apos;: &apos;application/json&apos; },
                    body: JSON.stringify(payload)
                });
                if (res.ok) {
                    bootstrap.Modal.getInstance(document.getElementById(&apos;serverModal&apos;)).hide();
                    editingServerId = null;
                    updateDashboard();
                } else {
                    alert(`Failed to save server: ${res.status}`);
                }
            } catch (e) {
                console.error(e);
                alert(&quot;Error saving server.&quot;);
            }
        }
        async function deleteServer(id) {
            if (!confirm(&quot;Delete this LLM server permanently?&quot;)) return;
            try {
                const res = await fetch(`/api/llm_servers/${id}`, { method: &apos;DELETE&apos; });
                if (res.ok) updateDashboard();
            } catch (e) { console.error(e); }
        }
        // Helper to generate a unique 10-character trigger (used by applyPrompts)
        function generate10CharTrigger() {
            const chars = &apos;ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789&apos;;
            let result = &apos;&apos;;
            for (let i = 0; i &lt; 10; i++) {
                result += chars.charAt(Math.floor(Math.random() * chars.length));
            }
            return result;
        }

        // ====================== INITIALIZATION ======================
        function formatCountdown(seconds) {
            if (seconds &lt;= 0) return &quot;Now&quot;;
            const h = Math.floor(seconds / 3600);
            const m = Math.floor((seconds % 3600) / 60);
            const s = Math.floor(seconds % 60);
            return h &gt; 0 ? `${h}h ${m}m` : `${m}m ${s}s`;
        }
        function generatePrompts() {
            const base = document.getElementById(&apos;base-prompt&apos;).value.trim();
            const c1 = document.getElementById(&apos;col1&apos;).value.split(&apos;,&apos;).map(s =&gt; s.trim()).filter(Boolean);
            const c2 = document.getElementById(&apos;col2&apos;).value.split(&apos;,&apos;).map(s =&gt; s.trim()).filter(Boolean);
            let out = &apos;&apos;;
            if (c1.length &amp;&amp; c2.length) {
                // split/join substitutes every occurrence of each placeholder, not just the first
                c1.forEach(a =&gt; c2.forEach(b =&gt; out += base.split(&apos;{col1}&apos;).join(a).split(&apos;{col2}&apos;).join(b) + &apos;\n\n&apos;));
            } else out = base;
            document.getElementById(&apos;generated-prompts&apos;).value = out.trim();
        }
        function renderAgentSelection() {
            const container = document.getElementById(&apos;agent-selection&apos;);

            // Rebuild the UI using the saved state as the single source of truth
            container.innerHTML = currentAgents.length === 0 ?
                &apos;&lt;div class=&quot;col-12 text-muted&quot;&gt;No MCP agents available.&lt;/div&gt;&apos; : &apos;&apos;;

            currentAgents.forEach(agent =&gt; {
                const isChecked = savedSelectedAgentIds.has(agent.agent_id);

                const div = document.createElement(&apos;div&apos;);
                div.className = &apos;col-md-4 mb-2&apos;;
                div.innerHTML = `&lt;div class=&quot;form-check&quot;&gt;
                    &lt;input class=&quot;form-check-input agent-checkbox&quot; type=&quot;checkbox&quot;
                           id=&quot;ag-${agent.agent_id}&quot; value=&quot;${agent.agent_id}&quot;
                           ${isChecked ? &apos;checked&apos; : &apos;&apos;}&gt;
                    &lt;label class=&quot;form-check-label&quot; for=&quot;ag-${agent.agent_id}&quot;&gt;${agent.name}&lt;/label&gt;
                &lt;/div&gt;`;
                container.appendChild(div);
            });

            // Attach change listeners so user deselections are immediately saved
            container.querySelectorAll(&apos;.agent-checkbox&apos;).forEach(cb =&gt; {
                cb.addEventListener(&apos;change&apos;, function () {
                    if (this.checked) {
                        savedSelectedAgentIds.add(this.value);
                    } else {
                        savedSelectedAgentIds.delete(this.value);
                    }
                });
            });
        }
        function renderServerSelection() {
            const container = document.getElementById(&apos;server-selection&apos;);

            // Capture current user selections before rebuild
            const currentlyChecked = new Set();
            container.querySelectorAll(&apos;.server-checkbox:checked&apos;).forEach(cb =&gt; {
                currentlyChecked.add(cb.value);
            });

            // Update persistent state
            if (currentlyChecked.size &gt; 0) {
                savedSelectedServerIds = currentlyChecked;
            }

            // Rebuild UI
            container.innerHTML = currentServers.length === 0 ?
                &apos;&lt;div class=&quot;col-12 text-muted&quot;&gt;No LLM servers available.&lt;/div&gt;&apos; : &apos;&apos;;

            currentServers.forEach(server =&gt; {
                const isChecked = savedSelectedServerIds.has(server.server_id);

                const div = document.createElement(&apos;div&apos;);
                div.className = &apos;col-md-4 mb-2&apos;;
                div.innerHTML = `&lt;div class=&quot;form-check&quot;&gt;
                    &lt;input class=&quot;form-check-input server-checkbox&quot; type=&quot;checkbox&quot;
                           id=&quot;sv-${server.server_id}&quot; value=&quot;${server.server_id}&quot;
                           ${isChecked ? &apos;checked&apos; : &apos;&apos;}&gt;
                    &lt;label class=&quot;form-check-label&quot; for=&quot;sv-${server.server_id}&quot;&gt;${server.name}&lt;/label&gt;
                &lt;/div&gt;`;
                container.appendChild(div);
            });

            // Re-attach change listeners
            container.querySelectorAll(&apos;.server-checkbox&apos;).forEach(cb =&gt; {
                cb.addEventListener(&apos;change&apos;, function () {
                    if (this.checked) {
                        savedSelectedServerIds.add(this.value);
                    } else {
                        savedSelectedServerIds.delete(this.value);
                    }
                });
            });
        }
        async function runDoneWalkSequence() {
            if (doneWalkQueue.length === 0 || isDoneWalkRunning) return;
            isDoneWalkRunning = true;

            const selectedLLMServers = Array.from(document.querySelectorAll(&apos;.server-checkbox:checked&apos;))
                                           .map(cb =&gt; cb.value);

            console.log(&apos;&#x1F680; Starting Done-Walk sequence with&apos;, doneWalkQueue.length, &apos;prompts&apos;);

            while (doneWalkQueue.length &gt; 0) {
                const item = doneWalkQueue.shift();
                const currentPrompt = item.prompt.trim();
                const triggerId = item.trigger;

                if (!currentPrompt) continue;

                console.log(&apos;&#x1F4E4; Sending prompt with unique trigger:&apos;, triggerId);

                // Strong, explicit instruction (proven to make LLM output trigger at the very end)
                const instruction = &quot;\n\n--- FINAL INSTRUCTION ---\n&quot; +
                                    &quot;You have now completed your full and final response to the prompt above.\n&quot; +
                                    &quot;At the very end of your entire reply, you MUST output exactly this 10-character trigger &quot; +
                                    &quot;and NOTHING ELSE after it (no period, no extra text, no newlines after it):\n\n&quot; +
                                    triggerId;

                const augmentedPrompt = currentPrompt + instruction;

                // Send ONLY this one prompt
                try {
                    await fetch(&apos;/api/distribute_prompts&apos;, {
                        method: &apos;POST&apos;,
                        headers: { &apos;Content-Type&apos;: &apos;application/json&apos; },
                        body: JSON.stringify({
                            prompts: [augmentedPrompt],
                            selected_agents: [],
                            selected_servers: selectedLLMServers,
                            distribution_mode: &quot;DONE_WALK&quot;,
                            max_tokens: 4096
                        })
                    });
                } catch (e) {
                    console.error(&apos;&#x274C; Failed to send prompt:&apos;, e);
                    continue; // nothing was dispatched, so there is no trigger to wait for
                }

                updateDashboard(); // immediate UI refresh
                await new Promise(r =&gt; setTimeout(r, 1200));

                console.log(&apos;&#x1F50D; Starting poll loop &#x2013; waiting for trigger in MAIN OUTPUT only:&apos;, triggerId);

                let triggerDetected = false;
                let pollCount = 0;

                while (!triggerDetected &amp;&amp; pollCount &lt; 120) {   // ~3-minute safety limit
                    pollCount++;
                    await new Promise(r =&gt; setTimeout(r, 1500));

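                    // A failed poll must not abort the walk, or isDoneWalkRunning would stay stuck at true
                    let statusData;
                    try {
                        const statusRes = await fetch(&apos;/api/status&apos;);
                        statusData = (await statusRes.json()) || {};
                    } catch (e) {
                        console.error(&apos;&#x274C; Status poll failed, retrying:&apos;, e);
                        continue;
                    }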

                    triggerDetected = Object.values(statusData).some(info =&gt; {
                        if (!info || !info.requests) return false;

                        return info.requests.some(r =&gt; {
                            const rawResponse = r.full_response || r.response || r.content || &apos;&apos;;
                            if (!rawResponse) return false;

                            const parsed = cleanLLMResponse(rawResponse);
                            const foundInMainOutput = parsed.content &amp;&amp; parsed.content.includes(triggerId);

                            if (foundInMainOutput) {
                                console.log(&apos;&#x2705; TRIGGER FOUND IN MAIN OUTPUT (poll #&apos; + pollCount + &apos;) &#x2013; request:&apos;, r.request_id);
                            } else if (parsed.content) {
                                console.log(&apos;&#x274C; Trigger not in main output yet (poll #&apos; + pollCount + &apos;). Last 200 chars of content:&apos;,
                                            parsed.content.slice(-200));
                            }
                            return foundInMainOutput;
                        });
                    });

                    if (!triggerDetected) {
                        console.log(`&#x23F3; Poll #${pollCount} &#x2013; trigger &quot;${triggerId}&quot; still not in main output`);
                    }
                }

                if (triggerDetected) {
                    console.log(&apos;&#x2705; Unique trigger&apos;, triggerId, &apos;detected in main output &#x2013; advancing to next prompt&apos;);
                } else {
                    console.warn(&apos;&#x26A0;&#xFE0F; Timeout waiting for trigger &#x2013; proceeding anyway&apos;);
                }

                updateDashboard(); // live queue update
            }

            isDoneWalkRunning = false;
            console.log(&apos;&#x1F389; Done-Walk sequence completed&apos;);
            alert(&quot;&#x2705; Done-Walk sequence completed.&quot;);
            updateDashboard();
        }
        function switchToActivePromptsTab() {
            const tabButton = document.querySelector(&apos;button[data-bs-target=&quot;#active-prompts&quot;]&apos;);
            if (tabButton) {
                const tab = new bootstrap.Tab(tabButton);
                tab.show();
                console.log(&apos;&#x1F504; Switched UI to Active Prompts tab&apos;);
            }
        }
        async function applyPrompts() {
            const selectedMCPAgents = Array.from(document.querySelectorAll(&apos;.agent-checkbox:checked&apos;)).map(cb =&gt; cb.value);
            const selectedLLMServers = Array.from(document.querySelectorAll(&apos;.server-checkbox:checked&apos;)).map(cb =&gt; cb.value);
            let rawText = document.getElementById(&apos;generated-prompts&apos;).value.trim();
            if (!rawText) return alert(&quot;No prompts generated.&quot;);

            // Extract prompts using the ----- separator key
            let prompts = rawText.split(&apos;-----&apos;)
                                 .map(p =&gt; p.trim())
                                 .filter(p =&gt; p.length &gt; 0);

            if (selectedLLMServers.length === 0) {
                return alert(&quot;Please select at least one LLM Server.&quot;);
            }

            const mode = document.querySelector(&apos;input[name=&quot;dist-mode&quot;]:checked&apos;).value;

            if (mode === &quot;DONE_WALK&quot;) {
                // Sequential Done-Walk mode (unchanged)
                const queueWithTriggers = prompts.map(prompt =&gt; ({
                    prompt: prompt,
                    trigger: generate10CharTrigger()
                }));

                doneWalkQueue = queueWithTriggers;
                console.log(&apos;&#x2705; Done-Walk queue initialized with&apos;, queueWithTriggers.length, &apos;prompts and triggers&apos;);
                runDoneWalkSequence();
                switchToActivePromptsTab();
                return;
            }

            // === NEW BEHAVIOR: ALL_GET_ALL now distributes prompts EQUALLY across selected servers ===
            console.log(&apos;&#x1F4E4; Distributing&apos;, prompts.length, &apos;prompts equally across&apos;, selectedLLMServers.length, &apos;LLM servers&apos;);

            const numServers = selectedLLMServers.length;
            const basePerServer = Math.floor(prompts.length / numServers);
            const extraPrompts = prompts.length % numServers;

            // Split prompts as evenly as possible: the first extraPrompts servers get one extra prompt
            let start = 0;
            for (let i = 0; i &lt; numServers; i++) {
                const end = start + basePerServer + (i &lt; extraPrompts ? 1 : 0);
                const serverPrompts = prompts.slice(start, end);
                start = end;

                if (serverPrompts.length === 0) break;

                const serverId = selectedLLMServers[i];

                console.log(`   &#x2192; Server ${i+1}/${numServers} (${serverId}) gets ${serverPrompts.length} prompts`);

                try {
                    await fetch(&apos;/api/distribute_prompts&apos;, {
                        method: &apos;POST&apos;,
                        headers: { &apos;Content-Type&apos;: &apos;application/json&apos; },
                        body: JSON.stringify({
                            prompts: serverPrompts,
                            selected_agents: selectedMCPAgents,
                            selected_servers: [serverId],           // one server at a time
                            distribution_mode: &quot;ALL_GET_ALL&quot;,
                            max_tokens: 2048
                        })
                    });
                } catch (e) {
                    console.error(&apos;&#x274C; Failed to send batch to server&apos;, serverId, e);
                }
            }

            alert(`&#x2705; Prompts distributed equally across ${numServers} selected LLM server${numServers &gt; 1 ? &apos;s&apos; : &apos;&apos;}!`);
            switchToActivePromptsTab();
            updateDashboard();
        }
        // Robust real-time SSE cleaner &#x2013; separates content and thinking
        function cleanLLMResponse(rawText) {
            if (!rawText) return { content: &apos;&apos;, thinking: &apos;&apos; };

            let content = &apos;&apos;;
            let thinking = &apos;&apos;;

            const lines = rawText.split(&apos;\n&apos;);
            for (let line of lines) {
                line = line.trim();
                if (!line || line === &apos;data: [DONE]&apos; || line === &apos;data: {}&apos;) continue;

                if (line.startsWith(&apos;data: &apos;)) {
                    const jsonStr = line.substring(6).trim();
                    if (!jsonStr) continue;

                    try {
                        const parsed = JSON.parse(jsonStr);

                        const deltaContent = parsed.choices?.[0]?.delta?.content ||
                                            parsed.choices?.[0]?.message?.content || &apos;&apos;;
                        if (deltaContent) content += deltaContent;

                        const reasoning = parsed.choices?.[0]?.delta?.reasoning_content ||
                                         parsed.choices?.[0]?.delta?.thinking ||
                                         parsed.thinking || &apos;&apos;;
                        if (reasoning) thinking += reasoning;

                    } catch (e) {
                        content += jsonStr + &apos; &apos;;
                    }
                }
            }

            return {
                content: content.trim(),
                thinking: thinking.trim()
            };
        }
        async function updateDashboard() {
    try {
        const res = await fetch(&apos;/api/status&apos;);
        if (!res.ok) throw new Error(&apos;Failed to fetch status&apos;);
        const data = await res.json();

        currentAgents = data.mcp_agents || [];
        currentServers = data.llm_servers || [];
        renderAgentsTable();
        renderServersTable();
        renderAgentSelection();
        renderServerSelection();

        // ==================== ACTIVE PROMPTS RENDERING ====================
        let activeHtml = &apos;&apos;;
        let hasPrompts = false;

        // Persistent dropdown states
        if (typeof window.dropdownStates === &apos;undefined&apos;) {
            window.dropdownStates = {};
        }

        document.querySelectorAll(&apos;#active-content details&apos;).forEach(details =&gt; {
            const id = details.getAttribute(&apos;data-id&apos;);
            if (id) window.dropdownStates[id] = details.open;
        });

        for (const [key, info] of Object.entries(data)) {
            if ([&apos;mcp_agents&apos;, &apos;llm_servers&apos;, &apos;scheduled_tasks&apos;].includes(key)) continue;
            if (!info || !info.requests || !info.requests.length) continue;

            hasPrompts = true;
            activeHtml += `&lt;div class=&quot;card request-card mb-4&quot;&gt;
                &lt;div class=&quot;card-header bg-white&quot;&gt;
                    &lt;h5&gt;${key} &lt;span class=&quot;badge bg-info&quot;&gt;${info.active_requests || 0} active&lt;/span&gt;&lt;/h5&gt;
                &lt;/div&gt;
                &lt;div class=&quot;card-body&quot;&gt;`;

            info.requests.forEach((req) =&gt; {
                const progress = req.status === &apos;completed&apos; ? 100 : (req.status === &apos;monitoring&apos; ? 95 : 0);
                const statusClass = req.status === &apos;completed&apos; ? &apos;success&apos; : req.status === &apos;monitoring&apos; ? &apos;primary&apos; : &apos;secondary&apos;;

                const parsed = cleanLLMResponse(req.full_response || &apos;&apos;);
                const renderedMarkdown = marked.parse(parsed.content || &apos;Waiting for LLM response...&apos;);
                const renderedPrompt = marked.parse(req.prompt_preview || &apos;No prompt text available&apos;);
                const renderedThinking = parsed.thinking ? marked.parse(parsed.thinking) : &apos;&apos;;

                const promptId = `prompt-${key}-${req.request_id}`;
                const thinkingId = `thinking-${key}-${req.request_id}`;
                const responseId = `response-${key}-${req.request_id}`;

                const promptOpen = window.dropdownStates[promptId] !== false;
                const thinkingOpen = window.dropdownStates[thinkingId] !== false;
                const responseOpen = window.dropdownStates[responseId] !== false;

                activeHtml += `
                    &lt;div class=&quot;mb-4 border rounded&quot;&gt;
                        &lt;div class=&quot;p-3 d-flex justify-content-between align-items-center&quot;&gt;
                            &lt;strong&gt;${req.request_id}&lt;/strong&gt;
                            &lt;span class=&quot;badge bg-${statusClass}&quot;&gt;${req.status.toUpperCase()}&lt;/span&gt;
                        &lt;/div&gt;

                        &lt;details class=&quot;px-3 pb-3&quot; data-id=&quot;${promptId}&quot; ${promptOpen ? &apos;open&apos; : &apos;&apos;}&gt;
                            &lt;summary class=&quot;d-flex justify-content-between align-items-center text-muted small cursor-pointer&quot; style=&quot;list-style:none;&quot;&gt;
                                &lt;span&gt;Full Prompt Text&lt;/span&gt;
                                &lt;span class=&quot;badge bg-light text-dark&quot;&gt;&#x25BC;&lt;/span&gt;
                            &lt;/summary&gt;
                            &lt;div class=&quot;mt-2 p-3 bg-light rounded border markdown-body&quot;&gt;
                                ${renderedPrompt}
                            &lt;/div&gt;
                        &lt;/details&gt;

                        ${renderedThinking ? `
                        &lt;details class=&quot;px-3 pb-3&quot; data-id=&quot;${thinkingId}&quot; ${thinkingOpen ? &apos;open&apos; : &apos;&apos;}&gt;
                            &lt;summary class=&quot;d-flex justify-content-between align-items-center text-muted small cursor-pointer&quot; style=&quot;list-style:none;&quot;&gt;
                                &lt;span&gt;Thinking / Reasoning&lt;/span&gt;
                                &lt;span class=&quot;badge bg-light text-dark&quot;&gt;&#x25BC;&lt;/span&gt;
                            &lt;/summary&gt;
                            &lt;div class=&quot;mt-2 p-3 bg-white border rounded markdown-body&quot;&gt;
                                ${renderedThinking}
                            &lt;/div&gt;
                        &lt;/details&gt;` : &apos;&apos;}

                        &lt;details class=&quot;px-3 pb-3&quot; data-id=&quot;${responseId}&quot; ${responseOpen ? &apos;open&apos; : &apos;&apos;}&gt;
                            &lt;summary class=&quot;d-flex justify-content-between align-items-center text-muted small cursor-pointer&quot; style=&quot;list-style:none;&quot;&gt;
                                &lt;span&gt;LLM Response (Live)&lt;/span&gt;
                                &lt;span class=&quot;badge bg-light text-dark&quot;&gt;&#x25BC;&lt;/span&gt;
                            &lt;/summary&gt;
                            &lt;div class=&quot;mt-2 p-3 bg-white border rounded markdown-body&quot; style=&quot;min-height: 120px;&quot;&gt;
                                ${renderedMarkdown}
                            &lt;/div&gt;
                        &lt;/details&gt;

                        &lt;div class=&quot;px-3 pb-3&quot;&gt;
                            &lt;div class=&quot;progress mt-2&quot; style=&quot;height:10px;&quot;&gt;
                                &lt;div class=&quot;progress-bar bg-primary&quot; style=&quot;width:${progress}%&quot;&gt;&lt;/div&gt;
                            &lt;/div&gt;
                            &lt;div class=&quot;row mt-3 text-center small&quot;&gt;
                                &lt;div class=&quot;col&quot;&gt;&lt;strong&gt;${(req.total_bytes || 0).toLocaleString()}&lt;/strong&gt;&lt;br&gt;Bytes&lt;/div&gt;
                                &lt;div class=&quot;col&quot;&gt;&lt;strong&gt;${(req.estimated_tokens || 0).toLocaleString()}&lt;/strong&gt;&lt;br&gt;Tokens&lt;/div&gt;
                                &lt;div class=&quot;col&quot;&gt;&lt;strong&gt;${(req.duration || 0).toFixed(1)}s&lt;/strong&gt;&lt;br&gt;Duration&lt;/div&gt;
                            &lt;/div&gt;
                        &lt;/div&gt;
                    &lt;/div&gt;`;
            });

            activeHtml += `&lt;/div&gt;&lt;/div&gt;`;
        }

        if (!hasPrompts) {
            activeHtml = `&lt;div class=&quot;text-center py-5 text-muted&quot;&gt;
                &lt;i class=&quot;bi bi-inbox display-4 mb-3 d-block&quot;&gt;&lt;/i&gt;
                No active prompts at the moment.
            &lt;/div&gt;`;
        }

        document.getElementById(&apos;active-content&apos;).innerHTML = activeHtml;

        // ==================== DONE WALK QUEUE TAB ====================
        const queueHtml = doneWalkQueue.length &gt; 0 ?
            `&lt;div class=&quot;alert alert-info&quot;&gt;
                &lt;strong&gt;Done-Walk Queue (${doneWalkQueue.length} remaining)&lt;/strong&gt;
                &lt;ul class=&quot;list-group mt-2&quot;&gt;
                    ${doneWalkQueue.map((item, i) =&gt; {
                        const preview = item.prompt.substring(0, 80);
                        return `&lt;li class=&quot;list-group-item&quot;&gt;
                            &lt;strong&gt;${i+1}.&lt;/strong&gt; ${preview}${item.prompt.length &gt; 80 ? &apos;...&apos; : &apos;&apos;}&lt;br&gt;
                            &lt;span class=&quot;badge bg-primary&quot;&gt;Trigger: &lt;code&gt;${item.trigger}&lt;/code&gt;&lt;/span&gt;
                        &lt;/li&gt;`;
                    }).join(&apos;&apos;)}
                &lt;/ul&gt;
            &lt;/div&gt;` :
            `&lt;div class=&quot;text-center py-5 text-muted&quot;&gt;
                &lt;i class=&quot;bi bi-inbox display-4 mb-3 d-block&quot;&gt;&lt;/i&gt;
                No prompts in Done-Walk queue.
            &lt;/div&gt;`;

        document.getElementById(&apos;done-walk-content&apos;).innerHTML = queueHtml;

        document.getElementById(&apos;last-updated&apos;).textContent = `Last updated: just now`;
    } catch (e) {
        console.error(e);
        document.getElementById(&apos;active-content&apos;).innerHTML = `&lt;div class=&quot;alert alert-danger&quot;&gt;Failed to load data from server.&lt;/div&gt;`;
    }
}


    document.addEventListener(&apos;DOMContentLoaded&apos;, function () {
        console.log(&apos;&#x1F680; Page loaded &#x2013; fetching all data from backend&apos;);

        // 1. Load everything on first page load
        updateDashboard();

        // 2. Refresh when user switches to MCP Agents or LLM Servers tab
        const mcpTabBtn = document.querySelector(&apos;button[data-bs-target=&quot;#mcp-agents&quot;]&apos;);
        const serversTabBtn = document.querySelector(&apos;button[data-bs-target=&quot;#llm-servers&quot;]&apos;);

        if (mcpTabBtn) {
            mcpTabBtn.addEventListener(&apos;shown.bs.tab&apos;, () =&gt; {
                console.log(&apos;MCP Agents tab opened &#x2013; refreshing data&apos;);
                updateDashboard();
            });
        }
        if (serversTabBtn) {
            serversTabBtn.addEventListener(&apos;shown.bs.tab&apos;, () =&gt; {
                console.log(&apos;LLM Servers tab opened &#x2013; refreshing data&apos;);
                updateDashboard();
            });
        }

        // Optional: live refresh every 8 seconds (uncomment if desired)
        // setInterval(updateDashboard, 8000);
    });

    // Auto-refresh every 800ms so statistics and LLM answer flow live
    setInterval(updateDashboard, 800);
    &lt;/script&gt;
&lt;/body&gt;
&lt;/html&gt;</code></pre><h3 id="bulding-into-a-docker-container">Building into a Docker container!</h3><ul><li><strong>Create requirements.txt and put in it:</strong></li></ul><pre><code class="language-txt">Flask&gt;=3.0.0
aiohttp&gt;=3.10.0
requests&gt;=2.32.0
</code></pre><ul><li><strong>Next create a file named Dockerfile and put in it:</strong></li></ul><pre><code class="language-dockerfile">FROM python:3.12-slim

# Install minimal system dependencies (if required by future extensions)
RUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends \
    &amp;&amp; rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (including templates/ directory)
COPY . .

# Ensure the templates directory exists (in case it is missing)
RUN mkdir -p templates

# Expose the port used by the Flask application
EXPOSE 5012

# Run the application
CMD [&quot;python&quot;, &quot;app.py&quot;]
</code></pre><ul><li><strong>Lastly create a docker-compose.yml and put in it:</strong></li></ul><pre><code class="language-yaml">version: &apos;3.8&apos;

services:
  mcp-dashboard:
    build: .
    container_name: mcp-dashboard
    ports:
      - &quot;5012:5012&quot;
    volumes:
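      # Persist pickled state files (mcp_agents_state.pkl and llm_servers_state.pkl).
      # Caution: this bind mount replaces /app from the image, so app.py must also be present in ./state.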
      - ./state:/app
      # Mount templates directory for the dashboard HTML
      - ./templates:/app/templates
    restart: unless-stopped
    environment:
      - PYTHONUNBUFFERED=1
</code></pre><h3 id="usage-instructions">Usage Instructions</h3><ol><li>Save the provided source code as <strong><code>app.py</code></strong> in the project root.</li><li>Create a <strong><code>templates/</code></strong> directory in the project root and place <code>dashboard.html</code> inside it (the code calls <code>render_template(&apos;dashboard.html&apos;)</code>).</li><li>Create a <strong><code>state/</code></strong> directory in the project root (this will store the persistent <code>.pkl</code> files).</li><li>Build and run the container:</li></ol><pre><code class="language-bash">docker compose up --build -d
</code></pre><ol start="5"><li>Access the dashboard at <code>http://localhost:5012</code>.</li></ol><p>The setup uses a lightweight Python 3.12 image, installs only the required packages, preserves application state across container restarts, and follows Docker best practices for a Flask + asyncio application. No additional services are required.</p><p>We recommend doing this inside PyCharm. For the less experienced, it will handle your environment setup for you.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/pycharm-community-fast-install/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Pycharm Community Fast Install with Install bash Script. Speed Boosting with 10-Core Configuration.</div><div class="kg-bookmark-description">Pycharm Community Fast Install with Install bash Script.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT."><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2025/09/Screenshot-at-2025-09-13-12-40-33.png" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT."></div></a></figure><p>Once it runs it will show this in the console:</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/05/image-10.png" class="kg-image" alt="LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT." 
loading="lazy" width="892" height="192" srcset="https://www.hotconfig.com/content/images/size/w600/2026/05/image-10.png 600w, https://www.hotconfig.com/content/images/2026/05/image-10.png 892w" sizes="(min-width: 720px) 720px"></figure><p>Simply access it at its endpoint and you are golden!</p>]]></content:encoded></item><item><title><![CDATA[Agentic Server Primer: Llama.cpp MCP Lesson 10: mcp-coder (Cuda Version)]]></title><description><![CDATA[We build an MCP Coding Agent that will allow your LLM to specifically work on and debug its own code with nvcc, or really any language!]]></description><link>https://www.hotconfig.com/agentic-server-primer-4/</link><guid isPermaLink="false">69feab139e9ad20001df404a</guid><category><![CDATA[MCP Server]]></category><category><![CDATA[agentic server]]></category><category><![CDATA[Llama.cpp]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Sat, 09 May 2026 04:26:17 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/05/lesson_10.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/05/lesson_10.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 10: mcp-coder (Cuda Version)"><p>This is a specialized MCP agent in that it is designed for your LLM to work with, pull, compile and develop its own software - specifically for NVIDIA nvcc CUDA! It is effectively its own entire build agent! 
However, the potential is amazing: it can effectively have your LLM writing advanced GPU code (aka configuring its own Llama.cpp!)</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/mtp-turboquant-forked-llama-cpp/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">MTP / TurboQuant Forked Llama.cpp</div><div class="kg-bookmark-description">We hot compile one of the first combo MTP / TurboQuant forks in the world!</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 10: mcp-coder (Cuda Version)"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/05/kg0MD.jpg" alt="Agentic Server Primer: Llama.cpp MCP Lesson 10: mcp-coder (Cuda Version)"></div></a></figure><ul><li>In the link above we looked at one of the world&apos;s first MTP/TurboQuant forks of Llama.cpp, but stopped because it did not specifically support Qwen3.6, which we really wanted to keep because of its advanced nature.</li><li>Our goal is to see if a HouseLLM can compile an MTP (Multiple-Token-Prediction) cross-blend of the TurboQuant-forked Llama.cpp and get it to work with Qwen3.6!</li><li>This is a very challenging MCP Agent that we had to rewrite close to a dozen times. In the end it worked! The <em>issue is that MCP calls can often fail, so in the end every single MCP endpoint had to accept two formats. 
This gives higher compatibility with many calling LLMs:</em></li></ul><p>All tools now support <strong>both</strong> calling styles:</p><ul><li>Normal parameters: read_file(&quot;script.py&quot;)</li><li>Dictionary input: read_file({&quot;file_path&quot;: &quot;script.py&quot;})</li></ul><p>If you want to simply pull and run this mcp-coder:</p><pre><code class="language-bash">#!/bin/bash

CONTAINER_NAME=&quot;mcp-coder&quot;

# Cleanup previous container
docker stop $CONTAINER_NAME 2&gt;/dev/null
docker rm $CONTAINER_NAME 2&gt;/dev/null

# Ensure workspace directory exists
mkdir -p ~/mcp-workspace
chmod -R 755 ~/mcp-workspace

# Pull latest version
docker pull cnmcdee/mcp-coder:latest

# Start container
docker run -d \
--name $CONTAINER_NAME \
--restart unless-stopped \
-p 5011:5011 \
-v ~/mcp-workspace:/work_path \
--env PYTHONUNBUFFERED=1 \
cnmcdee/mcp-coder:latest

echo &quot;&#x2705; MCP Coder container started successfully!&quot;
echo &quot;&#x1F310; Access URL: http://localhost:5011/mcp&quot;
echo &quot;&#x1F4CB; Logs: docker logs -f $CONTAINER_NAME&quot;</code></pre><p>We used Python as a pass-through, with its own CORS HTTP endpoint. There are a couple of paths one can take when making these, namely:</p><ul><li>You give it a number of guided MCP command endpoints representing system operations like &apos;git push&apos; and &apos;git pull&apos;, but then you need to do it for every operation, or:</li><li>You simply give it the ability to read, write, and open files and execute system commands, knowing it has the agentic ability to conceptually understand what it is doing. We chose this path because these LLMs are simply that good.</li></ul><h3 id="source-code">Source Code</h3><pre><code class="language-python">from fastmcp import FastMCP
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
import uvicorn
import subprocess
import os
from pathlib import Path
import re
import json

# =============================================================================
# Enhanced MCP Server - Restricted to Specific Target Directory
# Optimized for reliable tool calling by Qwen3.6 and other LLMs
# =============================================================================

mcp = FastMCP(name=&quot;Target Directory MCP Server&quot;)

# &#x2500;&#x2500; Configuration &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;

# CHANGE THIS TO YOUR DESIRED TARGET DIRECTORY
BASE_DIRECTORY = &quot;/work_path&quot;

# Security limits
MAX_FILE_SIZE = 10000 * 1024 * 1024      # ~10 GB (10,000 MiB)
MAX_OUTPUT_LENGTH = 50000000000           # 50 billion characters (~50 GB), effectively no output limit

# All operations are forced inside BASE_DIRECTORY
ALLOWED_DIRECTORIES = [BASE_DIRECTORY]


# &#x2500;&#x2500; Helper Functions &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;

def validate_path(file_path):
    &quot;&quot;&quot;Force all paths to be inside the BASE_DIRECTORY.&quot;&quot;&quot;
    if not os.path.isabs(file_path):
        file_path = os.path.join(BASE_DIRECTORY, file_path)

    path = Path(file_path).resolve()
    base_path = Path(BASE_DIRECTORY).resolve()

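    # Compare path components, not string prefixes, so a sibling like /work_path_evil is rejected
    if path != base_path and base_path not in path.parents: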
        raise ValueError(f&quot;Path must be inside the target directory: {BASE_DIRECTORY}&quot;)

    if &apos;..&apos; in path.parts:
        raise ValueError(&quot;Path traversal detected&quot;)

    return path


def truncate_output(output, max_length=MAX_OUTPUT_LENGTH):
    &quot;&quot;&quot;Truncate output if it exceeds the maximum length.&quot;&quot;&quot;
    if len(output) &lt;= max_length:
        return output

    truncated = output[:max_length]
    last_newline = truncated.rfind(&apos;\n&apos;)
    if last_newline &gt; max_length * 0.8:
        truncated = truncated[:last_newline + 1]

    return truncated + f&quot;\n\n[Output truncated to {max_length} characters]&quot;


# &#x2500;&#x2500; File System Tools &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;

@mcp.tool
def read_file(file_path):
    &quot;&quot;&quot;
    Read the entire content of a file inside the target directory.

    Parameters:
    - file_path (string): Relative or absolute path to the file.
                         Example: &quot;script.py&quot; or &quot;folder/subfolder/file.txt&quot;

    Returns: The full text content of the file as a string.
    &quot;&quot;&quot;
    try:
        path = validate_path(file_path)

        if not path.is_file():
            return f&quot;Error: File not found: {file_path}&quot;

        file_size = path.stat().st_size
        if file_size &gt; MAX_FILE_SIZE:
            return f&quot;Error: File too large ({file_size} bytes). Maximum allowed: {MAX_FILE_SIZE} bytes.&quot;

        return path.read_text(encoding=&quot;utf-8&quot;)

    except Exception as e:
        return f&quot;Error reading file: {str(e)}&quot;


@mcp.tool
def write_file(file_path, content, mode=&quot;w&quot;):
    &quot;&quot;&quot;
    Write or append text content to a file.

    Parameters:
    - file_path (string): Path to the file (relative or absolute).
    - content (string): The text you want to write.
    - mode (string): &quot;w&quot; to overwrite (default) or &quot;a&quot; to append.

    Example:
    write_file(&quot;notes.txt&quot;, &quot;Hello world&quot;, &quot;w&quot;)
    &quot;&quot;&quot;
    if mode not in [&quot;w&quot;, &quot;a&quot;]:
        return &quot;Error: mode must be &apos;w&apos; or &apos;a&apos;&quot;

    try:
        path = validate_path(file_path)
        path.parent.mkdir(parents=True, exist_ok=True)

        with open(path, mode, encoding=&quot;utf-8&quot;) as f:
            f.write(content)

        action = &quot;overwritten&quot; if mode == &quot;w&quot; else &quot;appended to&quot;
        return f&quot;Successfully {action}: {file_path}&quot;

    except Exception as e:
        return f&quot;Error writing file: {str(e)}&quot;


@mcp.tool
def delete_file(file_path):
    &quot;&quot;&quot;
    Delete a file.

    Parameters:
    - file_path (string): Path to the file to delete.
    &quot;&quot;&quot;
    try:
        path = validate_path(file_path)

        if not path.is_file():
            return f&quot;Error: File not found: {file_path}&quot;

        path.unlink()
        return f&quot;Successfully deleted: {file_path}&quot;

    except Exception as e:
        return f&quot;Error deleting file: {str(e)}&quot;


@mcp.tool
def replace_line(file_path, line_number, new_content):
    &quot;&quot;&quot;
    Replace a specific line in a file by line number.

    Parameters:
    - file_path (string): Path to the file.
    - line_number (integer): Line number to replace (starts at 1).
    - new_content (string): New text for that line.

    Example: replace_line(&quot;main.py&quot;, 42, &quot;    print(&apos;Updated&apos;)&quot;)
    &quot;&quot;&quot;
    try:
        path = validate_path(file_path)

        if not path.is_file():
            return f&quot;Error: File not found: {file_path}&quot;

        lines = path.read_text(encoding=&quot;utf-8&quot;).splitlines(keepends=True)

        if line_number &lt; 1 or line_number &gt; len(lines):
            return f&quot;Error: Line number {line_number} is out of range. File has {len(lines)} lines.&quot;

        # Preserve the line's own terminator; a final line without a newline keeps none
        old_line = lines[line_number - 1]
        original_ending = old_line[len(old_line.splitlines()[0]):]
        lines[line_number - 1] = new_content.rstrip() + original_ending

        path.write_text(&apos;&apos;.join(lines), encoding=&quot;utf-8&quot;)
        return f&quot;Successfully replaced line {line_number} in {file_path}&quot;

    except Exception as e:
        return f&quot;Error replacing line: {str(e)}&quot;


@mcp.tool
def list_directory(directory=&quot;.&quot;):
    &quot;&quot;&quot;
    List files and folders in a directory.

    Parameters:
    - directory (string): Optional. Directory to list. Default is current directory &quot;.&quot;.

    Returns: List of strings in format &quot;D/foldername&quot; or &quot;F/filename&quot;.
    &quot;&quot;&quot;
    try:
        path = validate_path(directory)

        if not path.is_dir():
            return [&quot;Error: Not a directory&quot;]

        items = []
        for item in sorted(path.iterdir()):
            prefix = &quot;D/&quot; if item.is_dir() else &quot;F/&quot;
            items.append(f&quot;{prefix}{item.name}&quot;)

        return items

    except Exception as e:
        return [f&quot;Error: {str(e)}&quot;]


@mcp.tool
def get_file_info(file_path):
    &quot;&quot;&quot;
    Get detailed information about a file or directory.

    Parameters:
    - file_path (string): Path to the file or folder.
    &quot;&quot;&quot;
    try:
        path = validate_path(file_path)

        if not path.exists():
            return {&quot;error&quot;: f&quot;Path not found: {file_path}&quot;}

        stat = path.stat()
        return {
            &quot;path&quot;: str(path),
            &quot;exists&quot;: True,
            &quot;is_file&quot;: path.is_file(),
            &quot;is_dir&quot;: path.is_dir(),
            &quot;size_bytes&quot;: stat.st_size,
            &quot;modified_timestamp&quot;: stat.st_mtime,
            &quot;permissions&quot;: oct(stat.st_mode)[-3:],
            &quot;name&quot;: path.name
        }

    except Exception as e:
        return {&quot;error&quot;: str(e)}


@mcp.tool
def search_files(directory=&quot;.&quot;, pattern=&quot;*&quot;):
    &quot;&quot;&quot;
    Recursively search for files matching a pattern.

    Parameters:
    - directory (string): Starting directory. Default &quot;.&quot;.
    - pattern (string): Glob pattern. Examples: &quot;*.py&quot;, &quot;*.txt&quot;, &quot;config*.json&quot;

    Returns: List of matching file paths.
    &quot;&quot;&quot;
    try:
        path = validate_path(directory)

        if not path.is_dir():
            return [f&quot;Error: Not a directory: {directory}&quot;]

        matches = [str(item) for item in path.rglob(pattern) if item.is_file()]
        return sorted(matches)

    except Exception as e:
        return [f&quot;Error: {str(e)}&quot;]


# &#x2500;&#x2500; Shell Command Tool &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;

@mcp.tool
def run_command(command, cwd=None, timeout=180):
    &quot;&quot;&quot;
    Execute a shell command inside the restricted base directory (BASE_DIRECTORY).

    Parameters:
    - command: Can be either:
        - A string: &quot;ls -la&quot;
        - Or a dictionary: {&quot;command&quot;: &quot;ls -la&quot;, &quot;cwd&quot;: &quot;subfolder&quot;, &quot;timeout&quot;: 60}
    - cwd (string, optional): Working directory.
    - timeout (integer, optional): Maximum time in seconds.

    Returns: Detailed output including STDOUT, STDERR, and return code.
    &quot;&quot;&quot;
    try:
        # === Handle dictionary input (for models that pass one dict) ===
        if isinstance(command, dict):
            data = command
            command = data.get(&quot;command&quot;) or data.get(&quot;cmd&quot;)
            cwd = data.get(&quot;cwd&quot;) or cwd
            timeout = data.get(&quot;timeout&quot;) or timeout

        # Ensure command is a string
        if not isinstance(command, str):
            return f&quot;Error: &apos;command&apos; must be a string or a dict containing &apos;command&apos;. Got: {type(command)}&quot;

        # Set default working directory
        if cwd is None:
            cwd = BASE_DIRECTORY
        else:
            cwd_path = validate_path(cwd)
            cwd = str(cwd_path)

        # Basic safety check (substring blacklist: a convenience guard, not a sandbox)
        dangerous = [&quot;&amp;&amp;&quot;, &quot;;&quot;, &quot;|&quot;, &quot;&gt;&quot;, &quot;&lt;&quot;, &quot;&gt;&gt;&quot;, &quot;sudo&quot;, &quot;su &quot;, &quot;rm -rf /&quot;, &quot;mkfs&quot;, &quot;shutdown&quot;]
        for pattern in dangerous:
            if pattern in command.lower():
                return f&quot;Error: Dangerous command pattern detected: &apos;{pattern}&apos;&quot;

        # Execute command
        result = subprocess.run(
            command,
            shell=True,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout
        )

        stdout = truncate_output(result.stdout)
        stderr = truncate_output(result.stderr)

        # Build response
        response_parts = [
            f&quot;Command: {command}&quot;,
            f&quot;Working directory: {cwd}&quot;,
            f&quot;Return code: {result.returncode}&quot;
        ]

        if stdout.strip():
            response_parts.append(f&quot;\nSTDOUT:\n{stdout}&quot;)
        else:
            response_parts.append(&quot;\nSTDOUT: (no output)&quot;)

        if stderr.strip():
            response_parts.append(f&quot;\nSTDERR:\n{stderr}&quot;)
        else:
            response_parts.append(&quot;\nSTDERR: (no output)&quot;)

        return &quot;\n&quot;.join(response_parts)

    except subprocess.TimeoutExpired:
        return f&quot;Error: Command timed out after {timeout} seconds.&quot;
    except Exception as e:
        return f&quot;Error executing command: {str(e)}&quot;
# &#x2500;&#x2500; Server Setup &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;

if __name__ == &quot;__main__&quot;:
    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=[&quot;*&quot;],
            allow_credentials=False,
            allow_methods=[&quot;GET&quot;, &quot;POST&quot;, &quot;OPTIONS&quot;],
            allow_headers=[&quot;*&quot;],
            expose_headers=[&quot;*&quot;],
            max_age=3600,
        )
    ]

    app = mcp.http_app(
        path=&quot;/mcp&quot;,
        middleware=middleware
    )

    print(&quot;&#x1F680; Starting Target Directory Restricted MCP Server&quot;)
    print(f&quot;&#x2192; All operations restricted to: {BASE_DIRECTORY}&quot;)
    print(&quot;&#x2192; Tools available: read_file, write_file, delete_file, replace_line,&quot;)
    print(&quot;                   list_directory, get_file_info, search_files, run_command&quot;)

    uvicorn.run(app, host=&quot;0.0.0.0&quot;, port=5011, log_level=&quot;info&quot;)</code></pre><p><strong>Here are the complete files you need to run the MCP Server in Docker:</strong></p><h3 id="1-requirementstxt">1. <code>requirements.txt</code></h3><pre><code class="language-txt">fastmcp
uvicorn[standard]
starlette
</code></pre><h3 id="2-dockerfile">2. <code>Dockerfile</code></h3><pre><code class="language-Dockerfile">FROM nvidia/cuda:13.1.2-devel-ubuntu22.04

WORKDIR /app

ENV DEBIAN_FRONTEND=noninteractive \
    TZ=UTC \
    PYTHONUNBUFFERED=1

# System packages + Python
RUN apt-get update &amp;&amp; \
    apt-get install -y --no-install-recommends \
    software-properties-common sudo git build-essential cmake ninja-build curl wget ca-certificates tzdata &amp;&amp; \
    add-apt-repository ppa:deadsnakes/ppa -y &amp;&amp; \
    apt-get update &amp;&amp; \
    apt-get install -y --no-install-recommends \
    python3.11 python3.11-venv python3.11-dev python3-pip &amp;&amp; \
    rm -rf /var/lib/apt/lists/* &amp;&amp; \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 &amp;&amp; \
    update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1 &amp;&amp; \
    update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1

# Create work directory
RUN mkdir -p /work_path &amp;&amp; chown -R 1000:1000 /work_path &amp;&amp; chmod -R 755 /work_path

COPY requirements.txt .
RUN python -m pip install --no-cache-dir -r requirements.txt

COPY server.py .

RUN useradd -m -u 1000 -s /bin/bash mcpuser &amp;&amp; \
    echo &quot;mcpuser ALL=(ALL) NOPASSWD:ALL&quot; &gt;&gt; /etc/sudoers &amp;&amp; \
    chown -R mcpuser:mcpuser /app

USER mcpuser

EXPOSE 5011

# Exec-form CMD: runs the server directly (no shell wrapper), so it receives stop signals
CMD [&quot;python&quot;, &quot;server.py&quot;]</code></pre><h3 id="3-docker-composeyml">3. <code>docker-compose.yml</code></h3><pre><code class="language-yaml">version: &apos;3.9&apos;

services:
  mcp-server:
    build: .
    container_name: mcp-server
    restart: unless-stopped
    ports:
      - &quot;5011:5011&quot;
    volumes:
      # Mount your target project directory into the container
      - ./target-project:/app/project
    environment:
      - PYTHONUNBUFFERED=1
    # Optional: Add GPU support if you need CUDA/nvcc inside the container
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]
</code></pre><hr><h3 id="how-to-use">How to Use</h3><p><strong>Save the files</strong>:</p><ul><li><code>requirements.txt</code></li><li><code>Dockerfile</code></li><li><code>docker-compose.yml</code></li></ul><p><strong>Rename your server code</strong> to <code>server.py</code> (or update the Dockerfile accordingly).</p><p><strong>Update the target directory</strong> in <code>server.py</code>:</p><pre><code class="language-python">BASE_DIRECTORY = &quot;/app/project&quot;   # This matches the volume mount
</code></pre><p><strong>Create the project folder</strong> (next to docker-compose.yml):</p><pre><code class="language-bash">mkdir target-project
</code></pre><p><strong>Start the server</strong>:</p><pre><code class="language-bash">docker-compose up --build
</code></pre><p>Or in detached mode:</p><pre><code class="language-bash">docker-compose up -d --build
</code></pre><h3 id="once-its-working-the-fun-really-starts">Once It&apos;s Working, the Fun Really Starts!</h3><pre><code class="language-text">Using the Target Directory Server, explore all its available tools and test their functionality. With it and the Process Manager tool, the goal is to build a combined llama.cpp that merges these two repositories:

TurboQuant Base: https://github.com/TheTom/llama-cpp-turboquant.git (use the feature/turboquant-kv-cache branch if available, otherwise main)
MTP + TurboQuant Combined Variant: https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant.git (this fork already includes both TurboQuant and Gemma-4-style MTP support). The objective is to create a llama.cpp that can do BOTH MTP AND TURBOQUANT specifically for Qwen3.6.

You can install, upgrade, or do whatever you need inside the Target Directory Server, including git pull, and it already has a full build environment with nvcc. As you go, document and save your progress in detail to the Process Manager with frequent updates to your tasks. Make sure the tasks are detailed enough that if you cannot complete this, you can pick it up again on the next task. Make sure it can compile and fix anything that won&apos;t.</code></pre>]]></content:encoded></item><item><title><![CDATA[MTP / TurboQuant Forked Llama.cpp]]></title><description><![CDATA[We hot compile one of the first combo MTP / TurboQuant forks in the world!]]></description><link>https://www.hotconfig.com/mtp-turboquant-forked-llama-cpp/</link><guid isPermaLink="false">69fd6a549e9ad20001df3fe9</guid><category><![CDATA[MTP]]></category><category><![CDATA[TurboQuant]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Fri, 08 May 2026 04:48:48 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/05/kg0MD.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/05/kg0MD.jpg" alt="MTP / TurboQuant Forked Llama.cpp"><p>When TurboQuant hit, it was BIG. Google&apos;s new method of reducing the KV cache in models allowed much larger contexts. &#xA0;Suddenly what had been the dream of enterprise LLMs was possible on a house GPU. &#xA0;Things didn&apos;t slow down there, with MTP (Multiple Token Prediction) following suit: it drafts several tokens per step, allowing speedups of up to 2x!</p><p>Here is the guide!</p><p><strong>Comprehensive Installation Script for AtomicBot-ai/atomic-llama-cpp-turboquant (TurboQuant + Gemma 4 MTP)</strong></p><p>This guide provides a single, self-contained Bash script that installs all required system dependencies, clones the repository, configures a CUDA build, compiles the project, and verifies the installation. 
The script is designed for <strong>Ubuntu/Debian-based Linux distributions</strong> (most common for servers and workstations). Adaptations for other platforms are noted at the end.</p><h3 id="full-automated-installation-script">Full Automated Installation Script</h3><p>Save the following as <code>install-llama-turboquant.sh</code> and run it with administrator privileges where needed.</p><pre><code class="language-bash">#!/bin/bash
# =============================================================================
# Automated Installation Script for Atomic llama.cpp (TurboQuant + MTP)
# Optimized for RTX 4080 - CUDA Primary
# =============================================================================

set -e

echo &quot;=== Starting Installation for RTX 4080 (CUDA + TurboQuant + MTP) ===&quot;

# 1. System dependencies
echo &quot;Installing system dependencies...&quot;
sudo apt-get update
sudo apt-get install -y \
    build-essential cmake git python3 python3-pip ninja-build \
    libblas-dev liblapack-dev pkg-config curl wget \
    libssl-dev zlib1g-dev

# 2. CUDA Toolkit (required for RTX 4080)
echo &quot;Installing NVIDIA CUDA Toolkit...&quot;
if ! command -v nvcc &gt;/dev/null 2&gt;&amp;1; then
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get install -y cuda-toolkit-12-4
    echo &apos;export PATH=/usr/local/cuda/bin:$PATH&apos; &gt;&gt; ~/.bashrc
    echo &apos;export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH&apos; &gt;&gt; ~/.bashrc
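    # A stock Ubuntu ~/.bashrc returns early in non-interactive shells, so export for this run too
    export PATH=/usr/local/cuda/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}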
fi

# 3. Clone the repository
echo &quot;Cloning and preparing the fork...&quot;
git clone https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant.git
cd atomic-llama-cpp-turboquant
git checkout feature/turboquant-kv-cache
git submodule update --init --recursive

# 4. Build with CUDA (optimized for RTX 4080)
echo &quot;Building with CUDA for RTX 4080...&quot;
mkdir -p build &amp;&amp; cd build

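# CMAKE_CUDA_ARCHITECTURES=89 targets Ada GPUs (RTX 4080 / 4090).
# Vulkan, HIP and Metal are disabled because only CUDA is needed here.
# Comments must not follow a trailing backslash, or the line continuation breaks.
cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES=89 \
  -DLLAMA_FLASH_ATTN=ON \
  -DGGML_VULKAN=OFF \
  -DGGML_HIP=OFF \
  -DGGML_METAL=OFF \
  -G Ninja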

ninja -j$(nproc)

echo &quot;=== Build completed successfully! ===&quot;

# 5. Verification
echo &quot;Verifying build...&quot;
./bin/llama-cli --version
echo &quot;CUDA devices detected:&quot;
./bin/llama-cli -ngl 99 --version 2&gt;&amp;1 | grep -i cuda || echo &quot;No CUDA mention in the version output; check the build log above&quot;

echo &quot;=== Installation finished for RTX 4080 ===&quot;
echo &quot;Binaries are in: $(pwd)/bin&quot;</code></pre><h3 id="how-to-use-the-script">How to Use the Script</h3><p>Make the script executable and run it:</p><pre><code class="language-bash">chmod +x install-llama-turboquant.sh
./install-llama-turboquant.sh   # run as your normal user; the script calls sudo itself
</code></pre><h3 id="errors-try-these-fixes">Errors? Try these fixes:</h3><p><strong>Diagnosis of the CMake CUDA Error</strong></p><p>The error occurs because <strong>CMake is detecting the wrong <code>nvcc</code></strong> (at <code>/usr/bin/nvcc</code>, which is a broken or outdated symlink/installation) instead of the proper CUDA Toolkit located in <code>/usr/local/cuda</code>. Additionally, CUDA 13.2 combined with the current CMake version is causing flag incompatibilities (<code>--static-host-stub</code> is not recognized during the compiler ID test).</p><h3 id="quick-fix-updated-build-script-for-rtx-4080">Quick Fix: Updated Build Script for RTX 4080</h3><p>Replace your previous build section with this corrected version. Run the commands below <strong>after</strong> you have the repository cloned.</p><pre><code class="language-bash">cd atomic-llama-cpp-turboquant

# Clean previous failed build
rm -rf build
mkdir -p build &amp;&amp; cd build

echo &quot;=== Configuring CMake with explicit CUDA paths for RTX 4080 ===&quot;

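# CMAKE_CUDA_COMPILER is set explicitly so CMake does not pick up /usr/bin/nvcc.
# CMAKE_CUDA_ARCHITECTURES=89 targets the RTX 4080 (Ada).
# Comments must not follow a trailing backslash, or the line continuation breaks.
cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
  -DCMAKE_CUDA_ARCHITECTURES=89 \
  -DLLAMA_FLASH_ATTN=ON \
  -DGGML_VULKAN=OFF \
  -DGGML_HIP=OFF \
  -DGGML_METAL=OFF \
  -G Ninja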

ninja -j$(nproc)
</code></pre><h3 id="if-the-above-still-fails-%E2%80%93-full-recovery-steps">If the Above Still Fails &#x2013; Full Recovery Steps</h3><p>Run these commands in order:</p><p><strong>Verify CUDA Installation</strong></p><pre><code class="language-bash">/usr/local/cuda/bin/nvcc --version
nvidia-smi
</code></pre><p><strong>Fix Environment Variables</strong> (add to <code>~/.bashrc</code> and run <code>source ~/.bashrc</code>)</p><pre><code class="language-bash">export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDACXX=/usr/local/cuda/bin/nvcc   # CMake reads the CUDA compiler from the CUDACXX environment variable
</code></pre><p><strong>Re-run CMake with more explicit settings</strong></p><pre><code class="language-bash">cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
  -DCMAKE_CUDA_ARCHITECTURES=89 \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
  -DGGML_CUDA=ON \
  -DLLAMA_FLASH_ATTN=ON \
  -G Ninja
</code></pre><p><strong>Alternative: Use a Compatible CMake Version</strong><br>If you are using a very new CMake (4.x), downgrade temporarily:</p><pre><code class="language-bash">sudo apt remove --purge cmake -y
sudo apt install cmake=3.28.* -y   # or install via pip: pip install cmake
</code></pre><h3 id="recommended-full-test-command-after-successful-build">Recommended Full Test Command After Successful Build</h3><pre><code class="language-bash">./bin/llama-cli --version
./bin/llama-cli -m your-model.gguf -ngl 99 -ctk turbo3 -ctv turbo3 -fa on -p &quot;Test&quot;
</code></pre><p>After completion, test TurboQuant:</p><pre><code class="language-bash">cd atomic-llama-cpp-turboquant/build
./bin/llama-cli -m your-model.gguf -c 32768 -ngl 99 \
  --cache-type-k turbo3 --cache-type-v turbo3 -fa 1 -p &quot;Test prompt&quot;
</code></pre><p>For Gemma 4 MTP (recommended):</p><ul><li>Download target model and assistant head from AtomicChat collections on Hugging Face.</li><li>Use the provided helper scripts in the repository (<code>scripts/run-gemma4-mtp-server.sh</code>).</li></ul><h3 id="platform-specific-notes">Platform-Specific Notes</h3><p><strong>macOS (Apple Silicon):</strong></p><ul><li>Install Xcode Command Line Tools: <code>xcode-select --install</code></li><li>Use Homebrew: <code>brew install cmake git ninja</code></li><li>Set <code>-DGGML_METAL=ON</code> (the script above sets it to <code>OFF</code>) to enable the optimized TurboFlash kernels.</li><li>Run the CMake step without CUDA/HIP.</li></ul><p><strong>Windows:</strong></p><ul><li>Use PowerShell with Visual Studio Build Tools or WSL2 (recommended).</li><li>A community PowerShell script exists for ROCm on Windows.</li></ul><p><strong>AMD ROCm (HIP):</strong></p><ul><li>Install ROCm first via official AMD instructions, then enable <code>-DGGML_HIP=ON</code>.</li></ul><h3 id="post-installation-recommendations">Post-Installation Recommendations</h3><ul><li><strong>Update regularly</strong>: <code>git pull &amp;&amp; git submodule update --init --recursive</code> followed by rebuild.</li><li><strong>Pre-built binaries</strong>: Check the repository&#x2019;s Releases page for ready-to-use versions on supported platforms.</li><li><strong>Documentation</strong>: Refer to <code>MTP.md</code>, <code>docs/build.md</code>, and <code>docs/speculative.md</code> in the cloned repository for advanced configuration.</li></ul><p>This script sets up a complete build environment with the required libraries and the CUDA backend. 
</p><p><strong>Recommended MTP Settings for AtomicBot-ai/atomic-llama-cpp-turboquant</strong></p><p>The fork includes dedicated support for <strong>Gemma 4 Multi-Token Prediction (MTP)</strong> speculative decoding. It loads the official small assistant head (drafter) via a separate GGUF file and achieves <strong>~30&#x2013;50% throughput gains</strong> on short-to-medium prompts with negligible quality loss.</p><h3 id="core-mtp-flags-essential">Core MTP Flags (Essential)</h3><p>Use these in combination with TurboQuant KV cache:</p><pre><code class="language-bash">--mtp-head /path/to/gemma-4-*-assistant.Q4_K_M.gguf \
--spec-type mtp \
--draft-block-size 3 \
--draft-max 8 \
--draft-min 0
</code></pre><h3 id="recommended-full-command-balanced-performance">Recommended Full Command (Balanced Performance)</h3><pre><code class="language-bash">./build/bin/llama-server \
  -m /path/to/gemma-4-31B-it-Q4_K_M.gguf \
  --mtp-head /path/to/gemma-4-31B-it-assistant.Q4_K_M.gguf \
  --spec-type mtp \
  --draft-block-size 3 --draft-max 8 --draft-min 0 \
  -ngl 99 -ngld 99 \
  -ctk turbo3 -ctv turbo3 -ctkd turbo3 -ctvd turbo3 \
  -fa on \
  -c 16384 -b 2048 -ub 512 \
  --temp 0.7 --top-p 0.95
</code></pre><h3 id="key-mtp-parameter-explanations-and-tuning">Key MTP Parameter Explanations and Tuning</h3><ul><li><strong><code>--spec-type mtp</code></strong>: Enables Gemma 4&#x2013;specific MTP speculative decoding.</li><li><strong><code>--mtp-head</code></strong>: Path to the dedicated assistant/drafter GGUF (highly recommended: Q4_K_M or Q5_K_M for best speed/quality balance).</li><li><strong><code>--draft-block-size 3</code></strong>: Number of tokens the drafter predicts per step (default 3 works well; 2 for more conservative, 4 for aggressive).</li><li><strong><code>--draft-max 8</code></strong>: Maximum number of draft tokens verified per step (8&#x2013;12 for throughput; lower for higher acceptance rate).</li><li><strong><code>--draft-min 0</code></strong>: Minimum draft tokens (0 is standard).</li><li><strong><code>-ngld 99</code></strong>: Full GPU offload for the drafter head (critical for performance).</li></ul><h3 id="preset-scripts-in-the-repository">Preset Scripts in the Repository</h3><p>The fork includes ready-made launchers:</p><ul><li><code>scripts/run-gemma4-31b-mtp-server.sh</code></li><li>Environment variable: <code>MTP_PRESET=throughput|lift|balanced|quality</code></li></ul><p>These presets automatically adjust draft parameters for different priorities (e.g., maximum speed vs. best acceptance rate).</p><h3 id="additional-performance-tips">Additional Performance Tips</h3><ul><li>Combine with <strong>TurboQuant</strong>: <code>-ctk turbo3 -ctv turbo3</code> (and the <code>-ctkd</code>/<code>-ctvd</code> variants for drafter) for extreme context scaling.</li><li>Use <strong>Flash Attention</strong>: <code>-fa on</code>.</li><li>For longer contexts, increase <code>-c</code> (up to 131072+ with TurboQuant).</li><li>Lower temperature (e.g., <code>--temp 0.1&#x2013;0.4</code>) often yields better speculative acceptance rates.</li></ul><p>These settings provide strong gains while remaining stable. 
For the latest recommendations, consult the repository&#x2019;s <code>MTP.md</code> file after cloning.</p><h2 id="observations">Observations</h2><ul><li>Once we had it working, we wanted to keep it away from the &apos;clean&apos; TurboQuant llama.cpp that we had been using, so we went into the build output path and copied the binaries to their own absolutely referenced directory:</li></ul><pre><code class="language-bash">cd build/bin
sudo mkdir /bin/atomic
sudo cp * /bin/atomic</code></pre><p>After that, it was simply a matter of referencing it explicitly from our models directory by copying an existing model script. &#xA0;We had always liked Qwen3.6-35B for solid, reliable local performance and had an MTP-enabled copy handy, so we gave it its own script to see how it would do.</p><ul><li>Note we have about 6-8 sub-agents that give it agentic workflows.</li><li>We are still working on this guide for now.</li></ul><h3 id="we-stopped-here-why">WE STOPPED HERE. WHY?</h3><ul><li>Qwen3.6 was significantly better at coding; however, no combined MTP + TurboQuant option exists for it yet that we know of. Because of this we are monitoring for now, but we are leaving this guide up for anyone who is looking to work with Gemma!</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/05/image-2.png" class="kg-image" alt="MTP / TurboQuant Forked Llama.cpp" loading="lazy" width="738" height="449" srcset="https://www.hotconfig.com/content/images/size/w600/2026/05/image-2.png 600w, https://www.hotconfig.com/content/images/2026/05/image-2.png 738w" sizes="(min-width: 720px) 720px"></figure>]]></content:encoded></item><item><title><![CDATA[docker-compose.yml -> docker run Converter]]></title><description><![CDATA[This page is one to bookmark.  Need a docker-compose.yml converted to a docker run command on the fly? 
Here you go!]]></description><link>https://www.hotconfig.com/docker-compose-yml-docker-run-converter/</link><guid isPermaLink="false">69fd3ff29e9ad20001df3fb7</guid><category><![CDATA[docker-compose.yml]]></category><category><![CDATA[docker run]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Fri, 08 May 2026 01:46:09 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/05/docker_bot.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/05/docker_bot.jpg" alt="docker-compose.yml -&gt; docker run Converter"><p>Bookmark this. &#xA0;Need to convert a <code>docker-compose.yml</code> to a <code>docker run</code> command?</p><!--kg-card-begin: html--><div style="max-width: 100%; margin: 2em 0; padding: 20px; border: 1px solid #ddd; border-radius: 8px; background: #f9f9f9; font-family: system-ui, sans-serif;">
  <h3 style="margin-top: 0;">Docker Compose &#x2192; Docker Run Converter (Ultra Robust)</h3>
  <p style="margin-bottom: 15px;">Paste your <code>docker-compose.yml</code> content. Indentation is now largely ignored.</p>
  
  <textarea id="yamlInput" rows="24" style="width: 100%; 
                   font-family: monospace; 
                   padding: 14px; 
                   border: 2px solid #555; 
                   border-radius: 6px; 
                   resize: vertical; 
                   box-sizing: border-box;
                   background: #ffffff !important;
                   color: #000000 !important;
                   font-size: 15px;
                   line-height: 1.5;" placeholder="Paste your docker-compose.yml here..."></textarea>
  
  <div style="margin: 15px 0;">
    <button onclick="convertCompose()" style="padding: 12px 24px; background: #0d6efd; color: white; border: none; border-radius: 4px; cursor: pointer; font-size: 16px;">
      Convert to docker run
    </button>
    <button onclick="clearAll()" style="padding: 12px 24px; background: #6c757d; color: white; border: none; border-radius: 4px; cursor: pointer; font-size: 16px; margin-left: 10px;">
      Clear
    </button>
    <button onclick="forcePaste()" style="padding: 12px 24px; background: #17a2b8; color: white; border: none; border-radius: 4px; cursor: pointer; font-size: 16px; margin-left: 10px;">
      Paste from Clipboard
    </button>
  </div>
  
  <pre id="output" style="background: #1e1e1e; 
              color: #d4d4d4; 
              padding: 16px; 
              border: 1px solid #444; 
              border-radius: 6px; 
              white-space: pre-wrap; 
              overflow-x: auto; 
              display: none; 
              max-height: 700px; 
              font-size: 14px;
              line-height: 1.5;"></pre>
  
  <div id="status" style="margin-top: 12px; font-size: 0.95em;"></div>
</div>
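<p>For anyone who would rather script the conversion, the mapping from compose fields to <code>docker run</code> flags can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code behind the converter on this page: the function name <code>service_to_docker_run</code> and the example dictionary are hypothetical, the service is assumed to be already parsed into a plain dict, only a handful of common fields are covered, and a build-only service is assumed to have been built and tagged as <code>name:latest</code> beforehand.</p>

```python
import shlex


def service_to_docker_run(name, service):
    """Build a `docker run` command string from one parsed compose service (a plain dict)."""
    args = ["docker", "run", "-d"]
    args += ["--name", service.get("container_name", name)]
    if service.get("restart"):
        args += ["--restart", service["restart"]]
    for port in service.get("ports", []):
        args += ["-p", str(port)]
    for volume in service.get("volumes", []):
        # Older Docker releases may require an absolute host path here.
        args += ["-v", str(volume)]
    env = service.get("environment", [])
    # Compose accepts environment as a list ("KEY=value") or as a mapping.
    if isinstance(env, dict):
        env = [f"{key}={value}" for key, value in env.items()]
    for entry in env:
        args += ["-e", str(entry)]
    if service.get("working_dir"):
        args += ["-w", service["working_dir"]]
    # ASSUMPTION: a service without an image was built and tagged as "name:latest".
    args.append(service.get("image") or f"{name}:latest")
    if service.get("command"):
        args += shlex.split(service["command"])
    # Quote each argument so values containing spaces survive a copy and paste.
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical example mirroring the mcp-server compose file from the earlier post.
example = {
    "container_name": "mcp-server",
    "restart": "unless-stopped",
    "ports": ["5011:5011"],
    "volumes": ["./target-project:/app/project"],
    "environment": ["PYTHONUNBUFFERED=1"],
    "image": "mcp-server:latest",
}
print(service_to_docker_run("mcp-server", example))
```

<p>Fields such as <code>networks</code>, <code>deploy</code> and <code>depends_on</code> are deliberately left out of this sketch, since they do not map one-to-one onto a single <code>docker run</code> invocation.</p>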

<script src="https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.1.0/js-yaml.min.js"></script>

<script>
  function convertCompose() {
    let input = document.getElementById('yamlInput').value;
    const outputEl = document.getElementById('output');
    const statusEl = document.getElementById('status');
    
    if (!input.trim()) {
      statusEl.innerHTML = '<span style="color: red;">Please paste docker-compose content.</span>';
      outputEl.style.display = 'none';
      return;
    }
    
    try {
      // === Tolerant line-by-line parser (indentation is ignored) ===
      const lines = input.split('\n');
      let services = {};
      let currentService = null;
      let currentKey = null;
      let currentList = null; // list key that the following "- item" lines belong to
      const listKeys = ['ports', 'environment', 'volumes', 'networks'];
      // Keys that may end a line with ':' inside a service; they are not service names
      const nestedKeys = listKeys.concat(['build', 'deploy', 'resources', 'reservations',
        'devices', 'depends_on', 'labels', 'healthcheck', 'logging']);

      for (let line of lines) {
        const trimmed = line.trim();
        if (!trimmed || trimmed.startsWith('#')) continue;

        // Detect services section
        if (trimmed.toLowerCase() === 'services:' || trimmed.startsWith('services:')) {
          currentKey = 'services';
          continue;
        }

        // New service name (ends with ':' and is not a known nested key)
        const bareKey = trimmed.replace(/:$/, '').trim();
        if (trimmed.endsWith(':') && !trimmed.startsWith('-') && currentKey === 'services' &&
            !nestedKeys.includes(bareKey)) {
          currentService = bareKey;
          currentList = null;
          services[currentService] = {
            build: null,
            image: null,
            container_name: null,
            restart: null,
            ports: [],
            environment: [],
            volumes: [],
            networks: [],
            user: null,
            working_dir: null,
            entrypoint: null,
            command: null
          };
          continue;
        }

        if (!currentService) continue;

        // List items go only to the list they were declared under
        if (trimmed.startsWith('- ')) {
          const value = trimmed.substring(2).trim();
          if (value && currentList) services[currentService][currentList].push(value);
          continue;
        }

        // Parse keys under current service
        if (trimmed.includes(':')) {
          const [key, ...valueParts] = trimmed.split(':');
          const keyClean = key.trim();
          const value = valueParts.join(':').trim();

          // An empty list key (e.g. "ports:") opens a list; any other key closes it
          currentList = (listKeys.includes(keyClean) && !value) ? keyClean : null;

          if (keyClean === 'build') services[currentService].build = value || '.';
          else if (keyClean === 'image') services[currentService].image = value;
          else if (keyClean === 'container_name') services[currentService].container_name = value;
          else if (keyClean === 'restart') services[currentService].restart = value;
          else if (keyClean === 'user') services[currentService].user = value;
          else if (keyClean === 'working_dir') services[currentService].working_dir = value;
          else if (keyClean === 'entrypoint') services[currentService].entrypoint = value;
          else if (keyClean === 'command') services[currentService].command = value;
        }
      }

      if (Object.keys(services).length === 0) {
        throw new Error('No services detected. Try pasting the file again or check the format.');
      }

      // Generate commands
      let results = [];
      for (const [name, svc] of Object.entries(services)) {
        let linesOut = [`# === Service: ${name} ===`];

        if (svc.build) {
          linesOut.push(`# First build the image:`);
          linesOut.push(`docker build -t ${name}:latest ${svc.build}`);
          linesOut.push('');
          linesOut.push(`docker run -d \\`);
        } else {
          linesOut.push(`docker run -d \\`);
        }

        const cname = svc.container_name || name;
        linesOut.push(`  --name ${cname} \\`);

        if (svc.restart) linesOut.push(`  --restart ${svc.restart} \\`);
        if (svc.user) linesOut.push(`  --user "${svc.user}" \\`);
        if (svc.working_dir) linesOut.push(`  --workdir "${svc.working_dir}" \\`);

        // Ports (deduplicate)
        [...new Set(svc.ports.filter(p => p && p.includes(':')))].forEach(p => {
          linesOut.push(`  -p "${p}" \\`);
        });

        // Environment
        [...new Set(svc.environment.filter(e => e && (e.includes('=') || e.includes(':'))))].forEach(e => {
          linesOut.push(`  -e "${e}" \\`);
        });

        // Volumes
        [...new Set(svc.volumes.filter(v => v && (v.includes(':') || v.startsWith('/'))))].forEach(v => {
          linesOut.push(`  -v "${v}" \\`);
        });

        // Networks
        [...new Set(svc.networks.filter(n => n))].forEach(n => {
          linesOut.push(`  --network ${n} \\`);
        });

        // Final image or command
        if (svc.build) {
          linesOut.push(`  ${name}:latest`);
        } else if (svc.image) {
          linesOut.push(`  ${svc.image}`);
        } else {
          linesOut.push(`  <image-name>`);
        }

        results.push(linesOut.join('\n'));
      }

      outputEl.textContent = results.join('\n\n');
      outputEl.style.display = 'block';
      statusEl.innerHTML = `<span style="color: green;">✓ Converted ${results.length} service(s) using tolerant parser.</span>`;

    } catch (err) {
      statusEl.innerHTML = `<span style="color: red;">Error: ${err.message}</span>`;
      outputEl.style.display = 'none';
    }
  }

  function clearAll() {
    document.getElementById('yamlInput').value = '';
    document.getElementById('output').style.display = 'none';
    document.getElementById('status').textContent = '';
  }

  async function forcePaste() {
    try {
      const text = await navigator.clipboard.readText();
      document.getElementById('yamlInput').value = text;
      document.getElementById('status').innerHTML = '<span style="color: green;">✓ Pasted from clipboard.</span>';
    } catch (err) {
      document.getElementById('status').innerHTML = '<span style="color: red;">Clipboard access denied. Please use Ctrl + V.</span>';
    }
  }

  document.getElementById('yamlInput').addEventListener('keydown', function(e) {
    if (e.ctrlKey && e.key === 'Enter') convertCompose();
  });
</script><!--kg-card-end: html-->]]></content:encoded></item><item><title><![CDATA[Agentic Server Primer: Llama.cpp MCP Lesson 9: Docker Orchestrator]]></title><description><![CDATA[In this guide we go over letting your LLM create and manage its own Docker images, and stand up its own containers after writing its code. It uses a special docker-compose tool we built for it.]]></description><link>https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-9-docker-orchestrator/</link><guid isPermaLink="false">69e4d9c99e9ad20001df3ea1</guid><category><![CDATA[docker]]></category><category><![CDATA[LLMs]]></category><category><![CDATA[Orchestrator]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Sun, 19 Apr 2026 14:27:43 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/less9.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/04/less9.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 9: Docker Orchestrator"><p>In Lessons 1-8 we covered everything from a scientific calculator to Python compilation. Today we will look at rolling your own Docker orchestrator.</p><p>If you just need to pull and run this Docker image:</p><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-docker-orchestrator:latest

docker run -d \
  --name mcp-docker-orchestrator \
  --restart unless-stopped \
  -p 0.0.0.0:5010:5010 \
  -e &quot;FLASH_ENV=production&quot; \
  -e ENV_SERVER=&quot;${ENV_SERVER}&quot; \
  -e ENV_USER=&quot;${ENV_USER}&quot; \
  -e ENV_PASSWORD=&quot;${ENV_PASSWORD:-}&quot; \
  -e ENV_PORT=&quot;${ENV_PORT:-22}&quot; \
  cnmcdee/mcp-docker-orchestrator</code></pre><ul><li>This is very powerful: not only can your LLM write and test its own code using the other MCP tools, it can then stand that code up in a running container.</li><li>We noticed some challenges getting the LLM to see the Docker endpoint, and it sometimes took several tries. This suggests that the training tokens LLMs receive in this area may be sparse.</li></ul><p>Let&apos;s get started!</p><h3 id="a-prerequisites">A. Prerequisites</h3><p>To understand all the moving parts, we will start with every command this MCP agent is capable of. Because it requires careful prompting to work effectively, here is its tool list (written by Qwen 3.6):</p><p>Here are the Docker tools available to you, organized by functionality:</p><h3 id="%F0%9F%96%BC%EF%B8%8F-image-management">&#x1F5BC;&#xFE0F; Image Management</h3><ul><li><strong><code>docker_images</code></strong> &#x2013; List all Docker images present on the remote server</li><li><strong><code>docker_pull</code></strong> &#x2013; Pull an image (or specific tag) from a registry to the remote server</li><li><strong><code>docker_build</code></strong> &#x2013; Build a Docker image from a <code>Dockerfile</code> in a specified context directory</li></ul><h3 id="%F0%9F%93%A6-container-management">&#x1F4E6; Container Management</h3><ul><li><strong><code>docker_ps</code></strong> &#x2013; List running containers (set <code>all=True</code> to include stopped ones)</li><li><strong><code>docker_run</code></strong> &#x2013; Create &amp; start a new container (supports port mappings, env vars, volumes, custom commands)</li><li><strong><code>docker_stop</code></strong> &#x2013; Stop a running container</li><li><strong><code>docker_start</code></strong> &#x2013; Start a stopped container</li><li><strong><code>docker_restart</code></strong> &#x2013; Restart a container</li><li><strong><code>docker_rm</code></strong> 
&#x2013; Remove containers (use <code>force=True</code> to remove running ones)</li><li><strong><code>docker_logs</code></strong> &#x2013; Fetch logs from a container (supports <code>tail</code> line limit and <code>follow</code> streaming)</li></ul><h3 id="%F0%9F%93%9D-docker-compose-management">&#x1F4DD; Docker Compose Management</h3><ul><li><strong><code>docker_compose_up</code></strong> &#x2013; Start services defined in a <code>docker-compose.yml</code></li><li><strong><code>docker_compose_down</code></strong> &#x2013; Stop &amp; remove containers, networks, and optionally named volumes</li><li><strong><code>docker_compose_build</code></strong> &#x2013; Build or rebuild services defined in a compose file</li><li><strong><code>docker_compose_ps</code></strong> &#x2013; List containers for a specific compose project</li><li><strong><code>docker_compose_logs</code></strong> &#x2013; View logs from compose services (supports filtering by service &amp; follow mode)</li><li><strong><code>docker_compose_command</code></strong> &#x2013; Execute any arbitrary <code>docker compose</code> subcommand with custom arguments</li><li><strong><code>docker_compose_deploy</code></strong> &#x2013; Fully deploy an app by uploading <code>Dockerfile</code>, <code>requirements.txt</code>, <code>app.py</code>, and <code>docker-compose.yml</code> to <code>~/docker/{project_name}</code>, then building &amp; running it</li></ul><p>&#x1F4A1; <strong>Note:</strong> All Docker tools execute commands on the remote server configured via your global SSH session.</p><h3 id="a-docker-controller-model">B. Docker Controller Model</h3><p>The Docker controller model can seem complex, so we will illustrate the moving parts.</p><ul><li>The user enters a prompt into the LLM web interface at its endpoint (192.168.1.4:8080). The prompt becomes a JSON object on which the LLM runs inference. The LLM examines its available tool list and calls the MCP Docker tool. 
The MCP Docker tool holds an SSH connection, via Paramiko, to the working server. This could theoretically be docker-in-docker, but for simplicity&apos;s sake we gave it its own server. A spare, working 4-core laptop is a perfect candidate for this.</li><li>The LLM attempts the tools iteratively and receives JSON feedback on its progress through the MCP agent.</li><li>The end user can watch the Docker process and image lists to see whether it is successfully building images and standing up containers.</li></ul><pre><code class="language-bash">&lt;Long pre-prompt of building software&gt;
When you are done building this software, use the docker tool to create a Dockerfile, a requirements.txt, an app.py and a docker-compose.yml. Build an image and verify it&apos;s there, then stand up that image on port 7001.  Finally, using the web requests tool, make sure it is running at the server endpoint 192.168.1.4:7001</code></pre><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/d_infra.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 9: Docker Orchestrator" loading="lazy" width="792" height="1022" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/d_infra.png 600w, https://www.hotconfig.com/content/images/2026/04/d_infra.png 792w" sizes="(min-width: 720px) 720px"></figure><h3 id="setting-the-environment">Setting the Environment</h3><ul><li>Because the MCP container needs to SSH into the working server, you can remove one layer of complexity during the testing phase by setting up key-based login, so no password is needed:</li></ul><pre><code class="language-bash">ssh-keygen   # generates an SSH key pair (not a password)
ssh-copy-id you@192.168.1.3  # copies your public key to the server, allowing passwordless login</code></pre><ul><li>Please note we are also <em>approaching production, so you will need to pass environment variables that hold the connection details for the remote server. For instance:</em></li></ul><pre><code class="language-bash">ENV_PASSWORD = os.environ.get(&apos;ENV_PASSWORD&apos;)
ENV_SERVER = os.environ.get(&apos;ENV_SERVER&apos;)
ENV_USER = os.environ.get(&apos;ENV_USER&apos;)</code></pre><p>Thus, an example launcher for the Python agent could be a script that is simply:</p><pre><code class="language-bash">export ENV_PASSWORD=&apos;your docker server password&apos;
export ENV_SERVER=&apos;192.168.1.4&apos; # Or wherever it lives
export ENV_USER=&apos;user&apos;
python3 mcp_agent.py  # inside, it uses os.environ.get to retrieve these values</code></pre><h3 id="full-code">Full Code</h3><pre><code class="language-bash">from fastmcp import FastMCP
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
import uvicorn
import paramiko
import threading
import os
import traceback
import textwrap
import yaml

# &#x2500;&#x2500; Global SSH Session Manager &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
ssh_sessions = {}
session_lock = threading.Lock()
ENV_PASSWORD = os.environ.get(&apos;ENV_PASSWORD&apos;)
ENV_SERVER = os.environ.get(&apos;ENV_SERVER&apos;)
ENV_USER = os.environ.get(&apos;ENV_USER&apos;)
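ENV_PORT = int(os.environ.get(&apos;ENV_PORT&apos;) or 22)  # respect the ENV_PORT passed to docker run; default to 22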


def get_or_create_ssh_session(server: str, username: str, password: str = None, key_path: str = None, port: int = 22) -&gt; str:
    &quot;&quot;&quot;Create or retrieve a persistent SSH session to a remote server.

    Maintains a thread-safe pool of Paramiko SSHClient connections keyed by
    ``server:port:username``. This avoids the overhead of establishing a new
    connection for every command and supports keep-alive packets for long-lived
    sessions.

    Args:
        server: Hostname or IP address of the remote server.
        username: SSH login username.
        password: Password for authentication. Mutually exclusive with ``key_path``.
        key_path: Absolute path to an SSH private key file on the local machine.
        port: SSH port number.

    Returns:
        str: Unique session identifier in the format ``f&quot;{server}:{port}:{username}&quot;``.

    Note:
        This is an internal function used by the global SSH session manager and
        the Docker command helpers.
    &quot;&quot;&quot;
    session_id = f&quot;{server}:{port}:{username}&quot;
    with session_lock:
        if session_id not in ssh_sessions:
            client = paramiko.SSHClient()
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
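            # With no explicit credential, fall back to the SSH agent and the default
            # keys in ~/.ssh so the key-based (ssh-copy-id) setup can authenticate
            use_default_keys = not (password or key_path)
            connect_kwargs = {
                &apos;hostname&apos;: server, &apos;port&apos;: port, &apos;username&apos;: username,
                &apos;timeout&apos;: 10, &apos;allow_agent&apos;: use_default_keys, &apos;look_for_keys&apos;: use_default_keys
            }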
            if password:
                connect_kwargs[&apos;password&apos;] = password
            elif key_path:
                connect_kwargs[&apos;key_filename&apos;] = key_path
            client.connect(**connect_kwargs)
            client.get_transport().set_keepalive(15)  # Keep connection alive
            ssh_sessions[session_id] = client
        return session_id
def close_ssh_session(session_id: str) -&gt; None:
    &quot;&quot;&quot;Close and remove a persistent SSH session from the global cache.

    Args:
        session_id: The session identifier returned by ``get_or_create_ssh_session``.

    Note:
        This is an internal function for explicit resource cleanup.
    &quot;&quot;&quot;
    with session_lock:
        if session_id in ssh_sessions:
            ssh_sessions[session_id].close()
            del ssh_sessions[session_id]
def ssh_execute(session_id: str, command: str) -&gt; str:
    &quot;&quot;&quot;Execute a shell command on the remote server using a persistent SSH session.

    Args:
        session_id: The session identifier returned by ``get_or_create_ssh_session``.
        command: The shell command to execute (may contain pipes, redirection, etc.).

    Returns:
        str: Combined stdout and stderr output prefixed with labels, or an error
             message if the session is unavailable or execution fails.
    &quot;&quot;&quot;
    with session_lock:
        client = ssh_sessions.get(session_id)
        if not client:
            return &quot;Error: Session not found.&quot;
        try:
            stdin, stdout, stderr = client.exec_command(command)
            output = stdout.read().decode()
            errors = stderr.read().decode()
            return f&quot;stdout:\n{output}\nstderr:\n{errors}&quot;
        except Exception as e:
            return f&quot;Error executing command: {str(e)}&quot;
# &#x2500;&#x2500; SFTP Helper Functions (added for file deployment) &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
def get_sftp_client(session_id: str):
    &quot;&quot;&quot;Get an SFTP client from the persistent SSH session.&quot;&quot;&quot;
    with session_lock:
        client = ssh_sessions.get(session_id)
        if not client:
            return None
        try:
            return client.open_sftp()
        except Exception as e:
            print(f&quot;Error: {e}&quot;)
            return None
def _get_ssh_stdout(result: str) -&gt; str:
    &quot;&quot;&quot;Safely extract only the clean stdout from ssh_execute output.&quot;&quot;&quot;
    if not result:
        return &quot;&quot;
    if &quot;stdout:&quot; in result:
        # Take everything after &quot;stdout:&quot; and before &quot;stderr:&quot;
        after_stdout = result.split(&quot;stdout:&quot;, 1)[1]
        clean = after_stdout.split(&quot;stderr:&quot;, 1)[0]
        return clean.strip()
    # Fallback
    return result.strip()
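# Worked example for _get_ssh_stdout above (illustrative comment only; the sample
# strings are assumptions, not captured output). ssh_execute labels its result as
# &quot;stdout:\n...\nstderr:\n...&quot;, and the helper keeps only the text between the labels:
#   _get_ssh_stdout(&quot;stdout:\n/home/user\nstderr:\n&quot;)  gives  &quot;/home/user&quot;
#   _get_ssh_stdout(&quot;&quot;)                                  gives  &quot;&quot;
# Command output that itself contains the text &quot;stderr:&quot; is cut at that point.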
def upload_file_content(session_id: str, content: any, remote_path: str) -&gt; bool:
    &quot;&quot;&quot;Upload string content (or dict, auto-serialized to YAML for .yml/.yaml files)
    as a file to the remote server via SFTP using absolute paths only.
    Correctly parses the formatted output of ssh_execute so paths are never corrupted.
    &quot;&quot;&quot;
    sftp = get_sftp_client(session_id)
    if not sftp:
        print(&quot;Error: Could not obtain SFTP client.&quot;)
        return False

    try:
        print(f&quot;[DEBUG] Original remote_path: {remote_path}&quot;)

        # Resolve absolute home directory
        home_result = ssh_execute(session_id, &quot;echo -n $HOME&quot;)
        home_dir = _get_ssh_stdout(home_result)

        # Safety fallback
        if not home_dir or len(home_dir) &lt; 3:
            whoami_result = ssh_execute(session_id, &quot;whoami&quot;)
            username = _get_ssh_stdout(whoami_result)
            home_dir = f&quot;/home/{username}&quot;
            print(f&quot;[DEBUG] Home directory resolved to: {home_dir}&quot;)

        # Convert ~/... to absolute path
        if remote_path.startswith(&quot;~/&quot;):
            absolute_path = home_dir + remote_path[1:]
        else:
            absolute_path = remote_path

        print(f&quot;[DEBUG] Absolute remote path: {absolute_path}&quot;)

        # Ensure parent directory exists
        remote_dir = os.path.dirname(absolute_path)
        if remote_dir:
            mkdir_result = ssh_execute(session_id, f&quot;mkdir -p {remote_dir}&quot;)
            print(f&quot;[DEBUG] mkdir -p result: {_get_ssh_stdout(mkdir_result) or &apos;&lt;no output - success&gt;&apos;}&quot;)

        # Normalize content to string (YAML for compose files, plain str otherwise)
        if isinstance(content, dict):
            if absolute_path.lower().endswith((&apos;.yml&apos;, &apos;.yaml&apos;)):
                print(&quot;[DEBUG] Content is dict; serializing to YAML for docker-compose.yml&quot;)
                content_str = yaml.dump(
                    content,
                    default_flow_style=False,
                    sort_keys=False,
                    allow_unicode=True,
                    width=120
                )
            else:
                # Fallback for non-YAML files (rare)
                import json
                content_str = json.dumps(content, indent=2)
        elif isinstance(content, str):
            content_str = content
        else:
            # Graceful fallback for other types
            print(f&quot;[WARNING] Unexpected content type {type(content).__name__}; converting to str&quot;)
            content_str = str(content)

        # Encode to bytes for Paramiko
        content_bytes = content_str.encode(&apos;utf-8&apos;)

        print(f&quot;[DEBUG] Attempting upload to: {absolute_path} ({len(content_bytes)} bytes)&quot;)

        # Upload in binary mode
        with sftp.file(absolute_path, &apos;wb&apos;, 0o644) as f:
            f.write(content_bytes)

        print(f&quot;[SUCCESS] File uploaded successfully to {absolute_path}&quot;)
        return True

    except Exception as e:
        print(f&quot;[ERROR] Upload failed for original path: {remote_path}&quot;)
        print(f&quot;[ERROR] Exception type: {type(e).__name__}&quot;)
        print(f&quot;[ERROR] Exception message: {e}&quot;)
        traceback.print_exc()
        return False
    finally:
        # Release the SFTP channel; the underlying SSH session stays open
        sftp.close()
def ensure_remote_directory(session_id: str, remote_dir: str) -&gt; str:
    &quot;&quot;&quot;Ensure a remote directory exists using mkdir -p.&quot;&quot;&quot;
    cmd = f&quot;mkdir -p {remote_dir}&quot;
    return ssh_execute(session_id, cmd)
# &#x2500;&#x2500; Auto-establish SSH Connection on Startup &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
GLOBAL_SSH_SESSION_ID = None

if ENV_SERVER and ENV_USER:
    try:
        GLOBAL_SSH_SESSION_ID = get_or_create_ssh_session(
            server=ENV_SERVER,
            username=ENV_USER,
            password=ENV_PASSWORD,
            port=ENV_PORT
        )
        if GLOBAL_SSH_SESSION_ID:
            print(f&quot;&#x2713; SSH connection established to {ENV_SERVER} as {ENV_USER} {GLOBAL_SSH_SESSION_ID}&quot;)
        else:
            print(&quot;Failed GLOBAL_SSH_SESSION_ID - Exiting..&quot;)
            exit(-1)
    except Exception as e:
        print(f&quot;&#x2717; Failed to establish SSH connection: {e}&quot;)
        GLOBAL_SSH_SESSION_ID = None
else:
    print(&quot;&#x26A0; Warning: ENV_SERVER and/or ENV_USER environment variables are not set.&quot;)
def _run_docker_command(cmd: str) -&gt; str:
    &quot;&quot;&quot;Execute a Docker command on the remote server via the global SSH session.

    All Docker-related tools delegate to this internal helper.

    Args:
        cmd: The complete Docker (or docker-compose) command string to execute.

    Returns:
        str: Command output (stdout + stderr) or an error message if the global
             SSH session is unavailable.

    Note:
        Requires the global SSH session established at module import time using
        the ``ENV_SERVER``, ``ENV_USER``, and optional ``ENV_PASSWORD`` environment
        variables.
    &quot;&quot;&quot;
    if GLOBAL_SSH_SESSION_ID is None:
        return &quot;Error: SSH session is not available. Check environment variables and connectivity.&quot;
    return ssh_execute(GLOBAL_SSH_SESSION_ID, cmd)

# &#x2500;&#x2500; FastMCP Server Setup &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
mcp = FastMCP(name=&quot;Docker Manager&quot;)
@mcp.tool
def docker_ps(all: bool = False) -&gt; str:
    &quot;&quot;&quot;List running (and optionally all) containers on the remote Docker host.

    Equivalent to ``docker ps`` or ``docker ps -a``.

    Args:
        all: If True, include stopped containers (adds the ``-a`` flag).

    Returns:
        str: Formatted output of the ``docker ps`` command.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = &quot;docker ps -a&quot; if all else &quot;docker ps&quot;
    return _run_docker_command(cmd)
@mcp.tool
def docker_images() -&gt; str:
    &quot;&quot;&quot;List all Docker images present on the remote server.

    Equivalent to ``docker images``.

    Returns:
        str: Formatted output of the ``docker images`` command.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    return _run_docker_command(&quot;docker images&quot;)
@mcp.tool
def docker_pull(image_name: str) -&gt; str:
    &quot;&quot;&quot;Pull a Docker image (or image:tag) from a registry to the remote server.

    Equivalent to ``docker pull &lt;image_name&gt;``.

    Args:
        image_name: Name of the image to pull (e.g., &quot;nginx:latest&quot; or &quot;myrepo/app&quot;).

    Returns:
        str: Output of the pull operation (progress and status messages).

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    return _run_docker_command(f&quot;docker pull {image_name}&quot;)
@mcp.tool
def docker_build(context_path: str, tag: str = None, dockerfile: str = &quot;Dockerfile&quot;,no_cache: bool = False) -&gt; str:
    &quot;&quot;&quot;Build a Docker image from a Dockerfile located on the remote server.

    Equivalent to ``docker build [OPTIONS] &lt;context_path&gt;``.

    Args:
        context_path: Build context directory on the remote server (absolute or relative path).
        tag: Tag to apply to the built image (e.g., &quot;myapp:v1&quot;).
        dockerfile: Name of the Dockerfile within the context (defaults to &quot;Dockerfile&quot;).
        no_cache: If True, do not use cache when building the image.

    Returns:
        str: Build output including progress and final image ID.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = [&quot;docker&quot;, &quot;build&quot;]
    if tag:
        cmd.extend([&quot;-t&quot;, tag])
    if dockerfile != &quot;Dockerfile&quot;:
        cmd.extend([&quot;-f&quot;, dockerfile])
    if no_cache:
        cmd.append(&quot;--no-cache&quot;)
    cmd.append(context_path)
    return _run_docker_command(&quot; &quot;.join(cmd))
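# Worked example for docker_build above (illustrative comment only; the path and
# tag are assumptions). A tool call with context_path=&quot;~/docker/demo&quot;,
# tag=&quot;demo:v1&quot; and no_cache=True is joined into this remote command:
#   docker build -t demo:v1 --no-cache ~/docker/demo
# The -f flag is added only when dockerfile differs from the default &quot;Dockerfile&quot;.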
@mcp.tool
def docker_run(image: str, name: str = None, detach: bool = True, ports: str = None,env: str = None, volumes: str = None, command: str = &quot;&quot;) -&gt; str:
    &quot;&quot;&quot;Create and start a new container from the specified image on the remote server.

    Equivalent to ``docker run [OPTIONS] IMAGE [COMMAND]``.

    Args:
        image: Docker image to run (e.g., &quot;nginx:latest&quot;).
        name: Assign a name to the container.
        detach: Run container in background (adds ``-d`` flag). Defaults to True.
        ports: Port mapping(s) in the format &quot;HOST_PORT:CONTAINER_PORT&quot; (e.g., &quot;8080:80&quot;).
        env: Environment variable(s) in the format &quot;KEY=value&quot;.
        volumes: Volume mount(s) in the format &quot;HOST_PATH:CONTAINER_PATH&quot;.
        command: Optional command and arguments to override the image&apos;s default CMD.

    Returns:
        str: Container ID (if detached) or full command output.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = [&quot;docker&quot;, &quot;run&quot;]
    if detach:
        cmd.append(&quot;-d&quot;)
    if name:
        cmd.extend([&quot;--name&quot;, name])
    if ports:
        cmd.extend([&quot;-p&quot;, ports])
    if env:
        cmd.extend([&quot;-e&quot;, env])
    if volumes:
        cmd.extend([&quot;-v&quot;, volumes])
    cmd.append(image)
    if command:
        cmd.append(command)
    return _run_docker_command(&quot; &quot;.join(cmd))
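# Worked example for docker_run above (illustrative comment only; the image and
# container names are assumptions). A tool call with image=&quot;nginx:latest&quot;,
# name=&quot;web&quot; and ports=&quot;8080:80&quot; is joined into this remote command:
#   docker run -d --name web -p 8080:80 nginx:latest
# ports, env and volumes each take a single string, so one call adds at most
# one -p, one -e and one -v flag.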
@mcp.tool
def docker_stop(container: str) -&gt; str:
    &quot;&quot;&quot;Stop a running container on the remote server.

    Equivalent to ``docker stop &lt;container&gt;``.

    Args:
        container: Container name or ID.

    Returns:
        str: Output confirming the container was stopped.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    return _run_docker_command(f&quot;docker stop {container}&quot;)
@mcp.tool
def docker_start(container: str) -&gt; str:
    &quot;&quot;&quot;Start a stopped container on the remote server.

    Equivalent to ``docker start &lt;container&gt;``.

    Args:
        container: Container name or ID.

    Returns:
        str: Output confirming the container was started.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    return _run_docker_command(f&quot;docker start {container}&quot;)
@mcp.tool
def docker_restart(container: str) -&gt; str:
    &quot;&quot;&quot;Restart a container on the remote server.

    Equivalent to ``docker restart &lt;container&gt;``.

    Args:
        container: Container name or ID.

    Returns:
        str: Output confirming the container was restarted.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    return _run_docker_command(f&quot;docker restart {container}&quot;)
@mcp.tool
def docker_rm(container: str, force: bool = False) -&gt; str:
    &quot;&quot;&quot;Remove one or more containers from the remote server.

    Equivalent to ``docker rm [-f] &lt;container&gt;``.

    Args:
        container: Container name or ID.
        force: If True, forcibly remove the container (adds ``-f`` flag).

    Returns:
        str: Output confirming removal.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = f&quot;docker rm {&apos;-f&apos; if force else &apos;&apos;} {container}&quot;.strip()
    return _run_docker_command(cmd)
@mcp.tool
def docker_logs(container: str, tail: int = 100, follow: bool = False) -&gt; str:
    &quot;&quot;&quot;Fetch logs from a container on the remote server.

    Equivalent to ``docker logs [--tail N] [-f] &lt;container&gt;``.

    Args:
        container: Container name or ID.
        tail: Number of lines to show from the end of the logs.
        follow: If True, follow log output (adds ``-f`` flag). Note that this
                will block until the connection is closed.

    Returns:
        str: Log output from the container.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = f&quot;docker logs --tail {tail}&quot;
    if follow:
        cmd += &quot; -f&quot;
    cmd += f&quot; {container}&quot;
    return _run_docker_command(cmd)
@mcp.tool
def docker_compose_up(compose_file: str = &quot;docker-compose.yml&quot;, detached: bool = True, build: bool = False, project_name: str = None) -&gt; str:
    &quot;&quot;&quot;Start services defined in a docker-compose file on the remote server.

    Equivalent to ``docker compose up [OPTIONS]``.

    Args:
        compose_file: Path to the Compose file (defaults to &quot;docker-compose.yml&quot;).
        detached: Run in detached mode (adds ``-d`` flag).
        build: Build images before starting (adds ``--build`` flag).
        project_name: Alternative project name (adds ``-p`` flag).

    Returns:
        str: Output from the compose up operation.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = [&quot;docker&quot;, &quot;compose&quot;]
    if compose_file:
        cmd.extend([&quot;-f&quot;, compose_file])
    if project_name:
        cmd.extend([&quot;-p&quot;, project_name])
    cmd.append(&quot;up&quot;)
    if detached:
        cmd.append(&quot;-d&quot;)
    if build:
        cmd.append(&quot;--build&quot;)
    return _run_docker_command(&quot; &quot;.join(cmd))
@mcp.tool
def docker_compose_down(compose_file: str = &quot;docker-compose.yml&quot;, remove_volumes: bool = False) -&gt; str:
    &quot;&quot;&quot;Stop and remove containers, networks, and optionally volumes for a compose project.

    Equivalent to ``docker compose down [-v]``.

    Args:
        compose_file: Path to the Compose file (defaults to &quot;docker-compose.yml&quot;).
        remove_volumes: If True, remove named volumes (adds ``-v`` flag).

    Returns:
        str: Output from the compose down operation.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = [&quot;docker&quot;, &quot;compose&quot;]
    if compose_file:
        cmd.extend([&quot;-f&quot;, compose_file])
    cmd.append(&quot;down&quot;)
    if remove_volumes:
        cmd.append(&quot;-v&quot;)
    return _run_docker_command(&quot; &quot;.join(cmd))
@mcp.tool
def docker_compose_build(compose_file: str = &quot;docker-compose.yml&quot;) -&gt; str:
    &quot;&quot;&quot;Build or rebuild services defined in a docker-compose file.

    Equivalent to ``docker compose build``.

    Args:
        compose_file: Path to the Compose file (defaults to &quot;docker-compose.yml&quot;).

    Returns:
        str: Build output from docker compose.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = [&quot;docker&quot;, &quot;compose&quot;]
    if compose_file:
        cmd.extend([&quot;-f&quot;, compose_file])
    cmd.append(&quot;build&quot;)
    return _run_docker_command(&quot; &quot;.join(cmd))
@mcp.tool
def docker_compose_ps(compose_file: str = &quot;docker-compose.yml&quot;) -&gt; str:
    &quot;&quot;&quot;List containers for a docker-compose project.

    Equivalent to ``docker compose ps``.

    Args:
        compose_file: Path to the Compose file (defaults to &quot;docker-compose.yml&quot;).

    Returns:
        str: Formatted list of compose project containers.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = [&quot;docker&quot;, &quot;compose&quot;]
    if compose_file:
        cmd.extend([&quot;-f&quot;, compose_file])
    cmd.append(&quot;ps&quot;)
    return _run_docker_command(&quot; &quot;.join(cmd))
@mcp.tool
def docker_compose_logs(compose_file: str = &quot;docker-compose.yml&quot;, service: str = None, follow: bool = False) -&gt; str:
    &quot;&quot;&quot;View output from services defined in a docker-compose file.

    Equivalent to ``docker compose logs [-f] [SERVICE]``.

    Args:
        compose_file: Path to the Compose file (defaults to &quot;docker-compose.yml&quot;).
        service: Optional service name to limit logs to.
        follow: If True, follow log output (adds ``-f`` flag).

    Returns:
        str: Log output from the compose services.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = [&quot;docker&quot;, &quot;compose&quot;]
    if compose_file:
        cmd.extend([&quot;-f&quot;, compose_file])
    cmd.append(&quot;logs&quot;)
    if follow:
        cmd.append(&quot;-f&quot;)
    if service:
        cmd.append(service)
    return _run_docker_command(&quot; &quot;.join(cmd))
@mcp.tool
def docker_compose_command(subcommand: str, arguments: str = &quot;&quot;,
                           compose_file: str = None) -&gt; str:
    &quot;&quot;&quot;Execute any arbitrary ``docker compose`` subcommand on the remote server.

    Provides maximum flexibility for operations not covered by the dedicated tools.

    Args:
        subcommand: The docker-compose subcommand (e.g., &quot;up&quot;, &quot;exec&quot;, &quot;config&quot;).
        arguments: Additional arguments and flags as a single string.
        compose_file: Optional path to the Compose file.

    Returns:
        str: Output from the executed docker compose command.

    Note:
        All commands execute on the remote server defined by the global SSH session.
    &quot;&quot;&quot;
    cmd = &quot;docker compose&quot;
    if compose_file:
        cmd += f&quot; -f {compose_file}&quot;
    cmd += f&quot; {subcommand} {arguments}&quot;.strip()
    return _run_docker_command(cmd)
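
# -- Sketch (not part of the original tool set): quoting arguments ------------
# The tools above interpolate container names and paths straight into a shell
# command string. A minimal sketch, assuming a hypothetical helper named
# build_remote_command, of how shlex.quote could keep a value such as
# "web; rm -rf ~" as one literal argument on the remote shell.
import shlex


def build_remote_command(*parts: str) -> str:
    """Join command parts into one shell-safe string (hypothetical helper)."""
    # Empty parts (e.g. an optional flag that is switched off) are skipped.
    return " ".join(shlex.quote(str(part)) for part in parts if part != "")

# Example: build_remote_command("docker", "restart", "my app")
# gives:   docker restart 'my app'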
# &#x2500;&#x2500; New Deployment Tool (fulfills the requested functionality) &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
@mcp.tool
def docker_compose_deploy(docker_name, dockerfile_content, requirements_txt_content, app_py_content, docker_compose_yml_content, detached = True, build = True):
    &quot;&quot;&quot;Fully deploy an application to the remote server.

    Uploads Dockerfile, requirements.txt, app.py and docker-compose.yml
    to ~/docker/{docker_name}, then runs `docker compose up --build -d`.

    This tool requires the four specified files as parameters and performs
    the complete build-and-activation sequence in the designated holding folder.

    Args:
        docker_name: Project name used for the holding folder ~/docker/{docker_name}.
        dockerfile_content: Complete content of the Dockerfile as string.
        requirements_txt_content: Complete content of requirements.txt.
        app_py_content: Complete content of app.py.
        docker_compose_yml_content: Complete content of docker-compose.yml.
        detached: Run services in detached mode.
        build: Build/rebuild images before starting.

    Returns:
        Detailed log of directory setup, file uploads and compose operation.
    &quot;&quot;&quot;
    if GLOBAL_SSH_SESSION_ID is None:
        return &quot;Error: SSH session is not available.&quot;

    remote_dir = f&quot;~/docker/{docker_name}&quot;
    outputs = [f&quot;Deploying project &apos;{docker_name}&apos; to remote directory: {remote_dir}&quot;]

    # Create directory
    dir_result = ensure_remote_directory(GLOBAL_SSH_SESSION_ID, remote_dir)
    outputs.append(f&quot;Directory setup:\n{dir_result}&quot;)

    # Upload files
    files_to_upload = {
        &quot;Dockerfile&quot;: dockerfile_content,
        &quot;requirements.txt&quot;: requirements_txt_content,
        &quot;app.py&quot;: app_py_content,
        &quot;docker-compose.yml&quot;: docker_compose_yml_content,
    }

    for filename, content in files_to_upload.items():
        remote_path = f&quot;{remote_dir}/{filename}&quot;
        success = upload_file_content(GLOBAL_SSH_SESSION_ID, content, remote_path)
        status = &quot;Uploaded successfully&quot; if success else &quot;Upload failed&quot;
        outputs.append(f&quot;{filename}: {status}&quot;)

    # Execute docker compose up
    compose_cmd = f&quot;cd {remote_dir} &amp;&amp; docker compose up&quot;
    if detached:
        compose_cmd += &quot; -d&quot;
    if build:
        compose_cmd += &quot; --build&quot;

    outputs.append(&quot;Starting Docker Compose build and deployment...&quot;)
    up_result = ssh_execute(GLOBAL_SSH_SESSION_ID, compose_cmd)
    outputs.append(f&quot;Docker Compose Result:\n{up_result}&quot;)

    return &quot;\n\n&quot;.join(outputs)
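
# -- Sketch (not part of the original script): validating docker_name ---------
# docker_name ends up in a remote path (~/docker/{docker_name}) and in a shell
# command, so it may be worth rejecting unexpected values before any upload.
# A minimal sketch, assuming a hypothetical helper named is_safe_project_name
# and assuming a conservative character set (lowercase letters, digits, dashes
# and underscores, starting with a letter or digit).
import re

_PROJECT_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]*")


def is_safe_project_name(name: str) -> bool:
    """Return True if name is a conservative, path-safe project name."""
    return bool(_PROJECT_NAME_RE.fullmatch(name))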

docker_c_testing = False

if docker_c_testing:
    import textwrap

    # &#x2500;&#x2500; Updated test data with perfectly formatted YAML &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
    docker_name = &quot;test_python_app&quot;
    # Start each literal with a backslash and dedent it so the uploaded files
    # carry no leading indentation (app.py would otherwise fail to parse).
    dockerfile_content = textwrap.dedent(&quot;&quot;&quot;\
    FROM python:3.11-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY app.py .
    EXPOSE 7000
    CMD [&quot;python&quot;, &quot;app.py&quot;]
    &quot;&quot;&quot;)
    requirements_txt_content = textwrap.dedent(&quot;&quot;&quot;\
    flask==3.0.3
    &quot;&quot;&quot;)
    app_py_content = textwrap.dedent(&quot;&quot;&quot;\
    from flask import Flask
    app = Flask(__name__)

    @app.route(&apos;/&apos;)
    def hello_world():
        return &quot;&lt;h1&gt;Hello, World from Docker Compose deployment test!&lt;/h1&gt;&quot;

    if __name__ == &quot;__main__&quot;:
        app.run(host=&quot;0.0.0.0&quot;, port=7000, debug=False)
    &quot;&quot;&quot;)
    # Use dedent to ensure zero leading whitespace on every line
    docker_compose_yml_content = textwrap.dedent(&quot;&quot;&quot;\
    version: &apos;3.8&apos;
    services:
      web:
        build: .
        ports:
          - &quot;7000:7000&quot;
        container_name: test_python_app_web
        restart: unless-stopped
    &quot;&quot;&quot;)
    # Execute the deployment
    result = docker_compose_deploy(
        docker_name=docker_name,
        dockerfile_content=dockerfile_content,
        requirements_txt_content=requirements_txt_content,
        app_py_content=app_py_content,
        docker_compose_yml_content=docker_compose_yml_content,
        detached=True,
        build=True
    )
    exit(0)
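
# -- Sketch (not part of the original script): narrowing CORS origins ---------
# The startup block below allows every origin. A minimal sketch, assuming a
# hypothetical environment variable MCP_ALLOWED_ORIGINS that holds a
# comma-separated list, of how the list passed to allow_origins could be read
# from the environment instead, falling back to "*" when it is unset.
import os


def cors_origins_from_env(var_name: str = "MCP_ALLOWED_ORIGINS") -> list:
    """Parse a comma-separated origin list from the environment (hypothetical)."""
    raw = os.environ.get(var_name, "")
    origins = [item.strip() for item in raw.split(",") if item.strip()]
    return origins or ["*"]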




# &#x2500;&#x2500; Server Startup with CORS &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
if __name__ == &quot;__main__&quot;:
    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=[&quot;*&quot;],
            allow_credentials=True,
            allow_methods=[&quot;GET&quot;, &quot;POST&quot;, &quot;OPTIONS&quot;],
            allow_headers=[&quot;*&quot;],
            expose_headers=[&quot;*&quot;],
        )
    ]

    app = mcp.http_app(
        path=&quot;/mcp&quot;,
        middleware=middleware
    )

    uvicorn.run(
        app,
        host=&quot;0.0.0.0&quot;,
        port=5010,
        log_level=&quot;info&quot;
    )</code></pre><p>Note - setting <code>docker_c_testing</code> to <code>True</code> in the code runs a bypass test that confirms the remote system and the SSH connection are working. &#xA0;At runtime the MCP will verify its connection. </p><p>When it runs it should look like this:</p><pre><code class="language-bash">/home/c/PythonProject/task_group/.venv/bin/python /home/c/mcp_docker/i_docker_manager/docker_manager_04.py 
&#x2713; SSH connection established to 192.168.1.4 as c 192.168.1.4:22:c
INFO:     Started server process [572420]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:5009 (Press CTRL+C to quit)
INFO:     192.168.1.62:44242 - &quot;OPTIONS /mcp HTTP/1.1&quot; 200 OK
INFO:     192.168.1.62:44242 - &quot;POST /mcp HTTP/1.1&quot; 200 OK
INFO:     192.168.1.62:44242 - &quot;OPTIONS /mcp HTTP/1.1&quot; 200 OK
INFO:     192.168.1.62:44252 - &quot;POST /mcp HTTP/1.1&quot; 202 Accepted</code></pre><h3 id="issues">Issues</h3><ul><li>We found that you need pretty explicit prompting. You cannot just say &apos;stand up a docker image and container&apos; and expect the LLM to know the <em>implied</em> steps of making the Dockerfile, requirements.txt, docker-compose.yml etc. &#xA0;Instead use a prompt like:</li></ul><pre><code class="language-bash">When the code is done, using the process manager create a Dockerfile.  With the Docker file using the docker_build command create an image.</code></pre><ul><li>Get your LLM to test all the tools.</li></ul><pre><code class="language-bash">test all the docker tools and make sure you can use them.</code></pre><p>Dockerization</p><ul><li>Naturally this app itself sits in a docker container so we had an LLM write the composition.</li></ul><p>Here are the files required to <strong>Dockerize</strong> your FastMCP-based Docker Manager application.</p><h3 id="1-dockerfile">1. <code>Dockerfile</code></h3><pre><code class="language-dockerfile"># Use a lightweight Python image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies required by Paramiko (for SSH)
RUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends \
    gcc \
    libffi-dev \
    libssl-dev \
    &amp;&amp; rm -rf /var/lib/apt/lists/*

# Copy requirements first to leverage Docker cache
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Expose the port used by the FastMCP server
EXPOSE 5010

# Set environment variables (can be overridden at runtime)
ENV ENV_PORT=22

# Command to run the application
CMD [&quot;python&quot;, &quot;app.py&quot;]
</code></pre><h3 id="2-requirementstxt">2. <code>requirements.txt</code></h3><pre><code class="language-txt">fastmcp
starlette
uvicorn[standard]
paramiko
pyyaml
</code></pre><p><strong>Notes on dependencies:</strong></p><ul><li><code>fastmcp</code> &#x2013; assumed to be the package providing <code>FastMCP</code></li><li><code>starlette</code> and <code>uvicorn</code> &#x2013; required for the HTTP server and CORS middleware</li><li><code>paramiko</code> &#x2013; for SSH/SFTP functionality</li><li><code>pyyaml</code> &#x2013; used in <code>upload_file_content</code> for YAML serialization</li></ul><h3 id="3-docker-composeyml">3. <code>docker-compose.yml</code></h3><pre><code class="language-yaml">version: &apos;3.8&apos;

services:
  docker-manager:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: docker-manager
    restart: unless-stopped
    ports:
      - &quot;5010:5010&quot;
    environment:
      # Required for SSH connection to the remote Docker host
      - ENV_SERVER=${ENV_SERVER}
      - ENV_USER=${ENV_USER}
      - ENV_PASSWORD=${ENV_PASSWORD:-}
      - ENV_PORT=${ENV_PORT:-22}
    # Optional: persist SSH known_hosts if needed (rarely required)
    # volumes:
    #   - ./known_hosts:/root/.ssh/known_hosts:ro
</code></pre><h3 id="usage-instructions">Usage Instructions</h3><p>Place the three files above in the same directory as your <code>app.py</code> (the provided script).</p><p>Create a <code>.env</code> file (recommended) for sensitive values:</p><pre><code class="language-env">ENV_SERVER=your.remote.server.ip.or.hostname
ENV_USER=your_ssh_username
ENV_PASSWORD=your_ssh_password
# ENV_PORT=22   # optional, defaults to 22
</code></pre><p>Build and run using Docker Compose:</p><pre><code class="language-bash">docker compose up -d --build
</code></pre><p>Access the FastMCP server at:<br><code>http://localhost:5010/mcp</code></p><h3 id="important-security-considerations">Important Security Considerations</h3><p><strong>Never commit</strong> passwords or SSH keys into version control.</p><p>For production use, it is strongly recommended to use <strong>SSH key authentication</strong> instead of passwords.</p><p>If using keys, modify the Dockerfile and <code>docker-compose.yml</code> to mount the private key:</p><pre><code class="language-yaml">volumes:
  - ~/.ssh/id_rsa:/root/.ssh/id_rsa:ro
</code></pre><p>and update the connection logic accordingly.</p><p>The application establishes an SSH connection at startup using the provided environment variables.</p><h3 id="conclusion">Conclusion</h3><p>This will give your LLM incredibly powerful tools where it can build docker containers and stand them up. &#xA0;Naturally this tool can also stand them down, delete them etc, so I would really recommend that a spare computer, vm, or old laptop serve as its &apos;workspace!&apos;</p>]]></content:encoded></item><item><title><![CDATA[StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!]]></title><description><![CDATA[We look at a StudentLLM setup to get as much productivity out of limited hardware as we can.]]></description><link>https://www.hotconfig.com/studentllm-examinin/</link><guid isPermaLink="false">69e3e8fd9e9ad20001df3c8b</guid><category><![CDATA[studentLLM]]></category><category><![CDATA[HomeLLM]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Sun, 19 Apr 2026 02:45:40 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/single_student.jpg" medium="image"/><content:encoded><![CDATA[<ul><li>System Specs - Ryzen 5 2600 (6 Core - 12 Thread / 15,000 CPU Passmark) &#xA0;16 GB RAM / 1 3060ti 8GB.</li></ul><img src="https://www.hotconfig.com/content/images/2026/04/single_student.jpg" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!"><p>The question arises &#xA0;- on a very basic budget PC, can a University Student get something useful and productive - not a chatbot - but something with agentic workflow tools etc.? 
So we dug out a 3060ti, took out most of the RAM, and started writing!</p><ul><li>Please note this recipe will also work for much larger high-end systems; simply reuse it and give it a 35B or a 122B or what have you!</li></ul><p>Let&apos;s get started!</p><h3 id="0-install-your-basics-supports-compilers-etc">0. Install your basic supports / compilers etc.</h3><pre><code class="language-bash">sudo apt install build-essential wget git python3 cmake -y
sudo apt install libcurl4-openssl-dev</code></pre><h3 id="a-installing-your-nvidia-drivers">A. Installing your Nvidia Drivers</h3><ul><li>This is going to vary based upon your video card, and you can run into issues. There are literally dozens of Nvidia drivers and server drivers, plus the nouveau driver which is often already in the standard Linux installation.</li><li>The best option is the last one in this section, a direct install of the 595 driver from Nvidia, which we show at the bottom, but you might get it to work using the local Linux repository. To prevent conflicts we blacklist nouveau.</li><li>Driver 550 in many repositories might conflict with your current kernel, however your auto-install may select it. Driver 595 as of April 2026 works very well - even with an older card like the 3060ti.</li><li>Here is what we found worked. One can spin one&apos;s wheels at this point (we ended up reinstalling our drivers like 6 times - don&apos;t feel bad if you take several attempts at this.)</li></ul><p><strong>Before Doing Anything - Set Linux Kernel Headers</strong></p><pre><code class="language-bash">sudo apt install linux-headers-$(uname -r)</code></pre><ul><li>The linux-headers package provides the kernel headers that the driver modules are built against. &#xA0;</li></ul><p><strong>First Try</strong></p><pre><code class="language-bash">sudo apt install nvidia-driver-full nvidia-cuda-toolkit -y</code></pre><p>If it issues errors, try blacklisting the nouveau drivers as they can conflict. </p><pre><code class="language-bash">sudo apt update &amp;&amp; sudo apt full-upgrade -y
sudo apt autoremove -y</code></pre><pre><code class="language-bash">sudo nano /etc/modprobe.d/blacklist-nouveau.conf</code></pre><p><strong>Add</strong></p><pre><code class="language-bash">blacklist nouveau
options nouveau modeset=0
alias nouveau off</code></pre><p>Update <code>initramfs</code> and reboot</p><pre><code class="language-bash">sudo update-initramfs -u &amp;&amp; sudo reboot</code></pre><h3 id="direct-driver-pull-from-nvidia">Direct Driver Pull from Nvidia</h3><p>If everything fails simply do a direct pull from Nvidia, purging all old drivers first:</p><pre><code class="language-bash">sudo apt purge &apos;*nvidia*&apos;
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/595.58.03/NVIDIA-Linux-x86_64-595.58.03.run
chmod +x NVIDIA-Linux-x86_64-595.58.03.run
sudo ./NVIDIA-Linux-x86_64-595.58.03.run</code></pre><h3 id="nvidia-smi-driver-confirmation-will-confirm-your-gtg">nvidia-smi Driver Confirmation Will Confirm You&apos;re GTG!</h3><pre><code class="language-bash">nvidia-smi</code></pre><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.hotconfig.com/content/images/2026/04/image-103.png" class="kg-image" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!" loading="lazy" width="753" height="327" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-103.png 600w, https://www.hotconfig.com/content/images/2026/04/image-103.png 753w" sizes="(min-width: 720px) 720px"><figcaption>nvidia-smi result example</figcaption></figure><ul><li>It will look like the above. Note specifically that the top right corner shows the highest CUDA toolkit version that your GPU / drivers can support (<strong>CUDA Version: 13.2</strong>).</li></ul><h3 id="b-installing-cuda-toolkit-132">B. Installing Cuda Toolkit 13.2</h3><ul><li>Next we will need to get the Nvidia Cuda toolkit (latest version 13.2) installed - as it has the very important <code>nvcc</code> compiler that will build our custom Turboquant enabled llama.cpp shortly. This is really important as we need these new power features to give us as big a kv-cache as we can get.</li></ul><pre><code class="language-bash">wget https://developer.download.nvidia.com/compute/cuda/13.2.0/local_installers/cuda-repo-debian13-13-2-local_13.2.0-595.45.04-1_amd64.deb
sudo dpkg -i cuda-repo-debian13-13-2-local_13.2.0-595.45.04-1_amd64.deb
sudo cp /var/cuda-repo-debian13-13-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-2</code></pre><pre><code class="language-bash">nvcc --version</code></pre><p><strong>Note - nvcc can completely install itself - &#xA0;but somehow not bother to add itself to your path! Seriously why? So to address this - you can edit your <code>~/.bashrc</code> and add:</strong></p><pre><code class="language-bash">export PATH=/usr/local/cuda-13.2/bin:$PATH</code></pre><p>Then re-source your <code>~/.bashrc</code>:</p><pre><code class="language-bash">source ~/.bashrc</code></pre><p>When it works it will show up as:</p><pre><code class="language-bash">$nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Thu_Mar_19_11:12:51_PM_PDT_2026
Cuda compilation tools, release 13.2, V13.2.78
Build cuda_13.2.r13.2/compiler.37668154_0</code></pre><h3 id="support">Support</h3><ul><li>Nvidia / Cuda ToolKit Driver fitting can be so problematic there are dedicated troubleshooting pages, do consider:</li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.rightnowai.co/blog/fixing-cuda-installation-issues"><div class="kg-bookmark-content"><div class="kg-bookmark-title">The Complete Guide to Fixing CUDA Installation Issues - Break AI Scaling Limits in 7 Days</div><div class="kg-bookmark-description">Getting CUDA working shouldn&#x2019;t take days. After analyzing hundreds of installation failures, I&#x2019;ve compiled every fix you need for PATH errors, driver mismatches, WSL2 problems, and more. Stop fighting with nvcc and start coding. Learn proven techniques to shift the scaling law intercept and achieve&#x2026;</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.rightnowai.co/favicon.ico" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!"><span class="kg-bookmark-author">RightNow AI</span><span class="kg-bookmark-publisher">RightNow AI</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.rightnowai.co/blogs/cuda-setup.webp" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!"></div></a></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.hotconfig.com/content/images/2026/04/image-104.png" class="kg-image" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!" loading="lazy" width="223" height="270"><figcaption>You&apos;re almost there!</figcaption></figure><h3 id="c-installing-turboquant-forked-llamacpp">C. 
Installing TurboQuant Forked Llama.cpp</h3><p>Once that is done we will pull the Turboquant enabled fork of Llama.cpp. &#xA0;This will reduce the size of our KV cache significantly, allowing us to squeeze as much as we can out of our houseLLM. It is the last challenging step, as you will build it from source and it prefers a specific configuration. </p><p><strong>C.1. You might need to update your cmake to the latest before you continue. It&apos;s not hard, here is how! </strong></p><pre><code class="language-bash">wget https://github.com/Kitware/CMake/releases/download/v4.3.1/cmake-4.3.1-linux-x86_64.sh
chmod +x ./cmake-4.3.1-linux-x86_64.sh
./cmake-4.3.1-linux-x86_64.sh</code></pre><ul><li>This just uncompresses the archive. You may then need to copy the bin files to /usr/bin or make a symbolic link (<code>ln -s</code>).</li></ul><pre><code class="language-bash">cd cmake-4.3.1-linux-x86_64/bin
sudo cp * /usr/bin</code></pre><p>Once you are there (however you get there):</p><pre><code class="language-bash">c@dragon-192-168-1-3:~/PythonProject/TurboResearcher2/cmake/cmake-4.3.1-linux-x86_64/bin$ cmake --version
cmake version 4.3.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).</code></pre><p>Here is the TurboQuant forked variant of llama.cpp, with full recognition of the excellent &apos;The Tom&apos; that built it!</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/TheTom/llama-cpp-turboquant"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - TheTom/llama-cpp-turboquant: LLM inference in C/C++</div><div class="kg-bookmark-description">LLM inference in C/C++. Contribute to TheTom/llama-cpp-turboquant development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">TheTom</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/950358a6a31afb9f1b1d70b9af198e6a92be6078670a8a5e2662296fdd56fe36/TheTom/llama-cpp-turboquant" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!"></div></a></figure><ul><li>Pull the repository and enter its directory:</li></ul><pre><code class="language-bash">git clone https://github.com/TheTom/llama-cpp-turboquant.git
cd llama-cpp-turboquant</code></pre><ul><li>Make a custom script inside of it named <code>install.sh</code> - inside of it put:</li><li>Note this build configuration is for Nvidia GPUs using Cuda. If you have a Mac you will need a different backend; the Readme typically lists the alternate build options for it.</li></ul><pre><code class="language-bash">cmake -B build \
      -DGGML_CUDA=ON \
      -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.2/bin/nvcc \
      -DCUDAToolkit_ROOT=/usr/local/cuda-13.2 \
      -DCMAKE_CUDA_ARCHITECTURES=&quot;86;89&quot; \
      -DCMAKE_BUILD_TYPE=Release 
cmake --build build --config Release -j$(nproc)</code></pre><ul><li>Please note - we specified both architectures (86 for the 30-series, 89 for the 40-series) so that if you upgrade your GPU to a 4080 etc - &#xA0;it should work out of the box! &#xA0;Newer cards need their own entry added to the list (for example 120 for the 50-series).</li><li>Make it an executable and execute it:</li></ul><pre><code class="language-bash">chmod +x ./install.sh
./install.sh</code></pre><ul><li>Now wait about 15-20 minutes for it to compile</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.hotconfig.com/content/images/2026/04/image-105.png" class="kg-image" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!" loading="lazy" width="474" height="266"><figcaption>Well longer than that ...&#xA0;</figcaption></figure><p>When it finally finishes there will be a <code>build/bin</code> directory inside the repository; you simply want to copy its contents to your <code>/usr/bin</code> location. &#xA0;If you already have another llama.cpp that you do not want to conflict with, copy to a separate directory and use absolute paths in all references, e.g. <code>/usr/bin/customllm/llama-server</code> instead. </p><p>Move all the compiled product to your <code>/usr/bin</code> - from inside the build directory:</p><pre><code class="language-bash">cd build/bin
sudo cp * /usr/bin</code></pre><p>Making sure it&apos;s working and ready to go:</p><pre><code class="language-bash">llama-server</code></pre><pre><code class="language-bash">ggml_cuda_init: found 1 CUDA devices (Total VRAM: 7839 MiB):
Device 0: NVIDIA GeForce RTX 3060 Ti, compute capability 8.6, VMM: yes, VRAM: 7839 MiB
main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
build_info: b8967-627ebbc6e
system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
init: using 11 threads for HTTP server</code></pre><h3 id="d-installing-the-qwen25-coder-7b-instruct-gguf"><br>D. Installing the Qwen2.5-Coder-7B-Instruct-GGUF</h3><ul><li>We chose Qwen2.5-Coder-7B-Instruct at 6-bit quantization, which should hopefully give us as much coding ability as we can get on an 8GB card. The 6-bit quantization frees up VRAM while maintaining as much quality as we can get.</li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://huggingface.co/khjvgvyfc/Qwen2.5-Coder-7B-Instruct-GGUF"><div class="kg-bookmark-content"><div class="kg-bookmark-title">khjvgvyfc/Qwen2.5-Coder-7B-Instruct-GGUF &#xB7; Hugging Face</div><div class="kg-bookmark-description">We&#x2019;re on a journey to advance and democratize artificial intelligence through open source and open science.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://huggingface.co/favicon.ico" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!"></div></div><div class="kg-bookmark-thumbnail"><img src="https://cdn-thumbnails.huggingface.co/social-thumbnails/models/khjvgvyfc/Qwen2.5-Coder-7B-Instruct-GGUF.png" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!"></div></a></figure><p>One simply pulls it with:</p><ul><li>We recommend a working <code>~/models</code> directory so:</li></ul><pre><code class="language-bash">mkdir -p ~/models &amp;&amp; cd ~/models</code></pre><pre><code class="language-bash">wget https://huggingface.co/khjvgvyfc/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/qwen2.5-coder-7b-instruct-q6_k.gguf?download=true</code></pre><p>Almost there. 
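</p><p>Before renaming anything, a quick sanity check on the download can save a confusing failure later. A minimal sketch, assuming a hypothetical helper named <code>looks_like_gguf</code> (not part of llama.cpp): GGUF files begin with the four ASCII bytes <code>GGUF</code>, so an HTML error page saved in place of the model is easy to spot:</p><pre><code class="language-python">from pathlib import Path


def looks_like_gguf(path):
    """Return True if the file starts with the 4-byte GGUF magic (hypothetical helper)."""
    with Path(path).open("rb") as handle:
        return handle.read(4) == b"GGUF"


if __name__ == "__main__":
    import sys

    # Usage: python check_gguf.py /path/to/model.gguf
    for candidate in sys.argv[1:]:
        verdict = "GGUF header found" if looks_like_gguf(candidate) else "NOT a GGUF file"
        print(f"{candidate}: {verdict}")
</code></pre><p>Pass it the downloaded file name exactly as <code>wget</code> saved it (including any <code>?download=true</code> suffix).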
</p><p>Because the command lines for llama-cpp and llama-server can get <em>really long</em>, it is smart to save them in a script. You can tweak the script as you like, and when you come back a long time later you will not have forgotten the myriad of options available to you. We also renamed the file so that it is easier to reference, and we recommend absolute paths in the scripts:</p><pre><code class="language-bash">mv qwen2.5-coder-7b-instruct-q6_k.gguf\?download\=true qwen2.5-coder-7b-instruct-q6_k.gguf</code></pre><pre><code class="language-bash">/usr/bin/llama-server --jinja \
-m /home/c/models/qwen2.5-coder-7b-instruct-q6_k.gguf \
--host 192.168.1.4 \
--n-gpu-layers 999 \
--override-tensor &quot;\.ffn_.*_exps\.weight=CPU&quot; \
--flash-attn on \
--cache-type-k turbo3 \
--cache-type-v turbo3 \
-c 64000 \
--temp 0.7</code></pre><p>If it boots correctly it prints a long, detailed startup log; a full example is reproduced further down for reference.</p><p>For an even FASTER configuration try this one! Full credit to: </p><pre><code class="language-bash">https://x.com/iam_shanmukha</code></pre><pre><code class="language-bash">/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
--host 192.168.1.3 \
--fit on \
--flash-attn on \
--spec-type ngram-mod \
--spec-ngram-size-n 24 \
--n-cpu-moe-draft 39 \
-t 14 \
--chat-template-kwargs &apos;{&quot;preserve_thinking&quot;:true}&apos; \
--cache-type-k turbo3 \
--cache-type-v turbo4 \
-c 512000 \
--temp 0.7</code></pre><p>Full credit to <a href="https://x.com/iam_shanmukha">https://x.com/iam_shanmukha</a>, who suggested this even faster configuration.</p><ul><li>We tried this and did see speeds rise to 35 Tokens/s. However, it was noted that it might make more errors on REALLY LARGE 100k contexts! So maybe put both in separate scripts and try the one you like best!</li></ul><p>For reference, here is the startup log from the first (Qwen2.5-Coder) configuration:</p><pre><code class="language-bash">srv    load_model: loading model &apos;/home/c/models/qwen2.5-coder-7b-instruct-q6_k.gguf&apos;
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
llama_params_fit_impl: projected to use 6527 MiB of device memory vs. 7382 MiB of free device memory
llama_params_fit_impl: cannot meet free memory target of 1024 MiB, need to reduce device memory by 168 MiB
llama_params_fit_impl: context size set by user to 64000 -&gt; no change
llama_params_fit: failed to fit params to free device memory: n_gpu_layers already set by user to 999, abort
llama_params_fit: fitting params to free memory took 0.48 seconds
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3060 Ti) (0000:07:00.0) - 7382 MiB free
llama_model_loader: loaded meta data with 29 key-value pairs and 339 tensors from /home/c/models/qwen2.5-coder-7b-instruct-q6_k.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct GGUF
llama_model_loader: - kv   3:                           general.finetune str              = Instruct-GGUF
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                          qwen2.block_count u32              = 28
llama_model_loader: - kv   7:                       qwen2.context_length u32              = 131072
llama_model_loader: - kv   8:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv   9:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  10:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  11:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  12:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  13:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  14:                          general.file_type u32              = 18
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,152064]  = [&quot;!&quot;, &quot;\&quot;&quot;, &quot;#&quot;, &quot;$&quot;, &quot;%&quot;, &quot;&amp;&quot;, &quot;&apos;&quot;, ...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = [&quot;&#x120; &#x120;&quot;, &quot;&#x120;&#x120; &#x120;&#x120;&quot;, &quot;i n&quot;, &quot;&#x120; t&quot;,...
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- &apos;&lt;|im_start|&gt;...
llama_model_loader: - kv  25:               general.quantization_version u32              = 2
llama_model_loader: - kv  26:                                   split.no u16              = 0
llama_model_loader: - kv  27:                                split.count u16              = 0
llama_model_loader: - kv  28:                        split.tensors.count i32              = 339
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q6_K:  198 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q6_K
print_info: file size   = 5.82 GiB (6.56 BPW)
load: 0 unused tokens
load: control-looking token: 128247 &apos;&lt;/s&gt;&apos; was not control-type; this is probably a bug in the model. its type will be overridden
load: printing all EOG tokens:
load:   - 128247 (&apos;&lt;/s&gt;&apos;)
load:   - 151643 (&apos;&lt;|endoftext|&gt;&apos;)
load:   - 151645 (&apos;&lt;|im_end|&gt;&apos;)
load:   - 151662 (&apos;&lt;|fim_pad|&gt;&apos;)
load:   - 151663 (&apos;&lt;|repo_name|&gt;&apos;)
load:   - 151664 (&apos;&lt;|file_sep|&gt;&apos;)
load: special tokens cache size = 23
load: token to piece cache size = 0.9310 MB
print_info: arch                  = qwen2
print_info: vocab_only            = 0
print_info: no_alloc              = 0
print_info: n_ctx_train           = 131072
print_info: n_embd                = 3584
print_info: n_embd_inp            = 3584
print_info: n_layer               = 28
print_info: n_head                = 28
print_info: n_head_kv             = 4
print_info: n_rot                 = 128
print_info: n_swa                 = 0
print_info: is_swa_any            = 0
print_info: n_embd_head_k         = 128
print_info: n_embd_head_v         = 128
print_info: n_gqa                 = 7
print_info: n_embd_k_gqa          = 512
print_info: n_embd_v_gqa          = 512
print_info: f_norm_eps            = 0.0e+00
print_info: f_norm_rms_eps        = 1.0e-06
print_info: f_clamp_kqv           = 0.0e+00
print_info: f_max_alibi_bias      = 0.0e+00
print_info: f_logit_scale         = 0.0e+00
print_info: f_attn_scale          = 0.0e+00
print_info: n_ff                  = 18944
print_info: n_expert              = 0
print_info: n_expert_used         = 0
print_info: n_expert_groups       = 0
print_info: n_group_used          = 0
print_info: causal attn           = 1
print_info: pooling type          = -1
print_info: rope type             = 2
print_info: rope scaling          = linear
print_info: freq_base_train       = 1000000.0
print_info: freq_scale_train      = 1
print_info: n_ctx_orig_yarn       = 131072
print_info: rope_yarn_log_mul     = 0.0000
print_info: rope_finetuned        = unknown
print_info: model type            = 7B
print_info: model params          = 7.62 B
print_info: general.name          = Qwen2.5 Coder 7B Instruct GGUF
print_info: vocab type            = BPE
print_info: n_vocab               = 152064
print_info: n_merges              = 151387
print_info: BOS token             = 151643 &apos;&lt;|endoftext|&gt;&apos;
print_info: EOS token             = 151645 &apos;&lt;|im_end|&gt;&apos;
print_info: EOT token             = 151645 &apos;&lt;|im_end|&gt;&apos;
print_info: PAD token             = 151643 &apos;&lt;|endoftext|&gt;&apos;
print_info: LF token              = 198 &apos;&#x10A;&apos;
print_info: FIM PRE token         = 151659 &apos;&lt;|fim_prefix|&gt;&apos;
print_info: FIM SUF token         = 151661 &apos;&lt;|fim_suffix|&gt;&apos;
print_info: FIM MID token         = 151660 &apos;&lt;|fim_middle|&gt;&apos;
print_info: FIM PAD token         = 151662 &apos;&lt;|fim_pad|&gt;&apos;
print_info: FIM REP token         = 151663 &apos;&lt;|repo_name|&gt;&apos;
print_info: FIM SEP token         = 151664 &apos;&lt;|file_sep|&gt;&apos;
print_info: EOG token             = 128247 &apos;&lt;/s&gt;&apos;
print_info: EOG token             = 151643 &apos;&lt;|endoftext|&gt;&apos;
print_info: EOG token             = 151645 &apos;&lt;|im_end|&gt;&apos;
print_info: EOG token             = 151662 &apos;&lt;|fim_pad|&gt;&apos;
print_info: EOG token             = 151663 &apos;&lt;|repo_name|&gt;&apos;
print_info: EOG token             = 151664 &apos;&lt;|file_sep|&gt;&apos;
print_info: max token length      = 256
load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 27 repeating layers to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors:   CPU_Mapped model buffer size =   426.36 MiB
load_tensors:        CUDA0 model buffer size =  5532.43 MiB
........................................................................................
common_init_result: added &lt;/s&gt; logit bias = -inf
common_init_result: added &lt;|endoftext|&gt; logit bias = -inf
common_init_result: added &lt;|im_end|&gt; logit bias = -inf
common_init_result: added &lt;|fim_pad|&gt; logit bias = -inf
common_init_result: added &lt;|repo_name|&gt; logit bias = -inf
common_init_result: added &lt;|file_sep|&gt; logit bias = -inf
llama_context: constructing llama_context
llama_context: n_seq_max     = 4
llama_context: n_ctx         = 64000
llama_context: n_ctx_seq     = 64000
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = enabled
llama_context: kv_unified    = true
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (64000) &lt; n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     2.32 MiB
llama_kv_cache:      CUDA0 KV buffer size =   683.72 MiB
llama_kv_cache: TurboQuant rotation matrices initialized (128x128)
llama_kv_cache: size =  683.59 MiB ( 64000 cells,  28 layers,  4/1 seqs), K (turbo3):  341.80 MiB, V (turbo3):  341.80 MiB
llama_kv_cache: upstream attention rotation disabled (TurboQuant uses kernel-level WHT)
llama_kv_cache: attn_rot_k = 0, n_embd_head_k_all = 128
llama_kv_cache: attn_rot_v = 0, n_embd_head_k_all = 128
sched_reserve: reserving ...
sched_reserve: resolving fused Gated Delta Net support:
sched_reserve: fused Gated Delta Net (autoregressive) enabled
sched_reserve: fused Gated Delta Net (chunked) enabled
sched_reserve:      CUDA0 compute buffer size =   311.00 MiB
sched_reserve:  CUDA_Host compute buffer size =   139.01 MiB
sched_reserve: graph nodes  = 1015
sched_reserve: graph splits = 2
sched_reserve: reserve took 111.08 ms, sched copies = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
srv    load_model: initializing slots, n_slots = 4
no implementations specified for speculative decoding
slot   load_model: id  0 | task -1 | speculative decoding context not initialized
slot   load_model: id  0 | task -1 | new slot, n_ctx = 64000
no implementations specified for speculative decoding
slot   load_model: id  1 | task -1 | speculative decoding context not initialized
slot   load_model: id  1 | task -1 | new slot, n_ctx = 64000
no implementations specified for speculative decoding
slot   load_model: id  2 | task -1 | speculative decoding context not initialized
slot   load_model: id  2 | task -1 | new slot, n_ctx = 64000
no implementations specified for speculative decoding
slot   load_model: id  3 | task -1 | speculative decoding context not initialized
slot   load_model: id  3 | task -1 | new slot, n_ctx = 64000
srv    load_model: prompt cache is enabled, size limit: 8192 MiB
srv    load_model: use `--cache-ram 0` to disable the prompt cache
srv    load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
srv          init: init: idle slots will be saved to prompt cache and cleared upon starting a new task
init: chat template, example_format: &apos;&lt;|im_start|&gt;system
You are a helpful assistant&lt;|im_end|&gt;
&lt;|im_start|&gt;user
Hello&lt;|im_end|&gt;
&lt;|im_start|&gt;assistant
Hi there&lt;|im_end|&gt;
&lt;|im_start|&gt;user
How are you?&lt;|im_end|&gt;
&lt;|im_start|&gt;assistant
&apos;
srv          init: init: chat template, thinking = 0
main: model loaded
main: server is listening on http://192.168.1.4:8080
main: starting the main loop...
srv  update_slots: all slots are idle
srv  log_server_r: done request: GET / 192.168.1.62 200
srv  log_server_r: done request: GET /bundle.css 192.168.1.62 200
srv  log_server_r: done request: GET /bundle.js 192.168.1.62 200
srv  log_server_r: done request: HEAD /cors-proxy 192.168.1.62 404</code></pre><p>How Does it Work?</p><pre><code class="language-bash">http://192.168.1.4:8080</code></pre><ul><li>Open this address in a browser, changing it to the local IP address of your own machine.</li><li>It works really well for a basic house 8B. We won&apos;t spend a lot of time on that alone because the real POWER comes when you make it agentic by adding external tools!</li></ul><h3 id="please-note-llms-are-okay-but-an-an-llm-with-agentic-tool-calling-that-can-compile-correct-rewrite-its-code-over-and-over-is-10x-more-powerfuleven-if-its-just-a-8b">PLEASE NOTE: LLMS ARE OKAY. BUT AN LLM WITH AGENTIC TOOL CALLING THAT CAN COMPILE, CORRECT, REWRITE ITS CODE OVER AND OVER IS 10X MORE POWERFUL - EVEN IF IT&apos;S JUST AN 8B.</h3><p></p><ul><li>It is only a little more work to add agentic tool calling. That is where your LLM gets a super power up. The tools are not hard at all; we carefully documented them, from really basic calculator agents to highly powerful ones that can go on the internet, research, and then come back and do work. Don&apos;t be overwhelmed, just work through each guide!</li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/easy-bake-mcp-docker-tools/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo.</div><div class="kg-bookmark-description">Downloads Page for all your MCP tooling needs!</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? 
YES!"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/surf_bot.jpg" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!"></div></a></figure><ul><li>Because this first model only worked &apos;okay&apos;, we immediately switched to another one that had the powerful agentic tooling options!</li></ul><h3 id="upgrading-to-qwen35-9b-wagentic-tool-capability">Upgrading to Qwen3.5-9B w/Agentic Tool Capability.</h3><ul><li>Right away we went back and picked up a much newer model, one that specifically noted its tooling capability!</li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://huggingface.co/Qwen/Qwen3.5-9B?inference_provider=together"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Qwen/Qwen3.5-9B &#xB7; Hugging Face</div><div class="kg-bookmark-description">We&#x2019;re on a journey to advance and democratize artificial intelligence through open source and open science.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://huggingface.co/favicon.ico" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? 
YES!"></div></a></figure><p>You can pull it with:</p><pre><code class="language-bash">wget -O Qwen3.5-9B-UD-Q5_K_XL.gguf &quot;https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-UD-Q5_K_XL.gguf?download=true&quot;</code></pre><p>We created another script for our new model, and tested its agentic abilities.</p><pre><code class="language-bash">/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.5-9B-UD-Q5_K_XL.gguf \
--host 192.168.1.4 \
--n-gpu-layers 999 \
--override-tensor &quot;\.ffn_.*_exps\.weight=CPU&quot; \
--flash-attn on \
--cache-type-k turbo4 \
--cache-type-v turbo2 \
-c 32768  \
--temp 0.7</code></pre><p>We were highly impressed: this model went straight to work, started correcting its own tool calls, and was still going strong 12,000 tokens in! Nice!</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-106.png" class="kg-image" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!" loading="lazy" width="812" height="484" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-106.png 600w, https://www.hotconfig.com/content/images/2026/04/image-106.png 812w" sizes="(min-width: 720px) 720px"></figure><h3 id="adding-one-more-super-tool-llmqp">Adding one more Super Tool: LLMQP.</h3><p>This will let your localLLM code all night. You no longer need to sit there waiting between prompts; instead you can use it to queue your prompts and have them worked through sequentially.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/llm-queue-dispatcher/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT.</div><div class="kg-bookmark-description">LLM Queue Dispatcher. A Powerful Harness Drop will queue your localLLM all night and keep it working!</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? YES!"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/05/rect4.png" alt="StudentLLM - Qwen2.5-coder-7b-instruct-q6-k / Qwen3.5 Agentic on a Ryzen 5-2600/ 3060ti. Production LLM or not? 
YES!"></div></a></figure><p></p><h3 id="conclusion">Conclusion</h3><p>Absolutely you CAN get agentic-quality local LLMs working on very minimal house GPU parts. It comes down to how resourceful one wants to be. Inference was also very fast at ~45 Tokens/s.</p><ul><li>This can be very useful as a &apos;side-hustle&apos; LLM that can do your work for minimum effort!</li><li>When the run was done, our Code Drop tool had successfully created the following code package for us.</li></ul>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/full_export_BinarySort_Algorithm_20260419_024252.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">Full export BinarySort Algorithm 20260419 024252</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">full_export_BinarySort_Algorithm_20260419_024252.zip</div>
                        <div class="kg-file-card-filesize">3 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
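        <p>To see where the 8GB went, the figures in the Qwen2.5-Coder startup log can be checked with a little arithmetic. The sketch below is ours and not part of llama.cpp: the helper names are hypothetical, and it assumes each K or V cache holds cells x layers x n_embd_k_gqa values, which is how we read the log line reporting 64000 cells and 28 layers. Every number is copied from the log.</p><pre><code class="language-python">MIB = 1024 * 1024


def kv_bits_per_element(buffer_mib, n_cells, n_layers, n_embd_gqa):
    """Average storage cost, in bits, of one cached K (or V) value."""
    n_elements = n_cells * n_layers * n_embd_gqa
    return buffer_mib * MIB * 8 / n_elements


def kv_size_mib(bits_per_element, n_cells, n_layers, n_embd_gqa):
    """Size in MiB of one K (or V) cache at a given bits-per-element cost."""
    n_elements = n_cells * n_layers * n_embd_gqa
    return n_elements * bits_per_element / 8 / MIB


# Values from the startup log: 64000 cells, 28 layers, n_embd_k_gqa = 512,
# and 341.80 MiB reported for the turbo3 K cache.
turbo3_bits = kv_bits_per_element(341.80, 64000, 28, 512)

# The same cache stored as f16 (16 bits per value), for comparison.
f16_mib = kv_size_mib(16, 64000, 28, 512)

# Device buffers reported by the log, in MiB: model, KV cache, compute.
projected_mib = 5532.43 + 683.72 + 311.00
headroom_mib = 7382 - projected_mib  # 7382 MiB was free on the 3060 Ti

print(f"turbo3 K cache: {turbo3_bits:.3f} bits per value")
print(f"f16 K cache: {f16_mib:.0f} MiB (K and V together: {2 * f16_mib:.0f} MiB)")
print(f"projected device use: {projected_mib:.2f} MiB, headroom: {headroom_mib:.2f} MiB")
</code></pre><p>Run as-is, this reports about 3.125 bits per cached value for turbo3, against 1750 MiB per tensor (3500 MiB for K and V together) at f16. The three buffers add up to 6527.15 MiB, matching the 6527 MiB the log projects, which leaves roughly 855 MiB free. That is under the 1024 MiB target, which is why the fit step complains in the log.</p>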
        ]]></content:encoded></item><item><title><![CDATA[Qwen3.6 Drops!- A HouseLLM Production Level  Coding Perspective? One-Shot GoAccess]]></title><description><![CDATA[We Test Qwen3.6 if it is up to your home production standards.]]></description><link>https://www.hotconfig.com/qwen3-6-drop/</link><guid isPermaLink="false">69e23ec69e9ad20001df3bb4</guid><category><![CDATA[One-shot]]></category><category><![CDATA[goaccess]]></category><category><![CDATA[Qwen3.6]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Fri, 17 Apr 2026 14:35:21 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/robot_sinpect.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/04/robot_sinpect.jpg" alt="Qwen3.6 Drops!- A HouseLLM Production Level  Coding Perspective? One-Shot GoAccess"><p>Huggingface and most sites were excited to get the latest Qwen3.6 - scoring unbelievably high for &#xA0;a 35B. &#xA0;People were clearly questioning if this model was tuned to rig the benchmarks, but nonetheless its performance out of the gate was looking very impressive:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.hotconfig.com/content/images/2026/04/image-100.png" class="kg-image" alt="Qwen3.6 Drops!- A HouseLLM Production Level  Coding Perspective? 
One-Shot GoAccess" loading="lazy" width="2000" height="1079" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-100.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/04/image-100.png 1000w, https://www.hotconfig.com/content/images/size/w1600/2026/04/image-100.png 1600w, https://www.hotconfig.com/content/images/size/w2400/2026/04/image-100.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Is it just me or are they squeezing so much graph into such small images you cannot read it?</figcaption></figure><p>Scoring 51.5 on Terminal-Bench 2.0, it looks like a strong contender for managing your server instances, scoring handsomely above Gemma and Google, which is in our interest.</p><p>To see if it is actually that good, we gave it the following prompt to find out whether it could one-shot <a href="https://goaccess.io/">GoAccess</a>, a logging service. We gave it all the agentic tools (Context Manager, Python, Web Lookup, etc.), asked it to benchmark its work, and then to finish with a Code Drop.</p><p>Here are the results. Our system specification is modest.</p><ul><li>Ryzen 9 3900 12-core/24 Thread. (128 GB RAM)</li><li>Nvidia 4080ti w/16 GB.</li></ul><p>Our run configuration uses the TensorBalance / TurboQuant fork of llama.cpp; if you want to build your own, see (<a href="https://www.hotconfig.com/the-turboquant-breakthrough/">here</a>). If you want to build the agentic tools to run inside your own docker container, check out our (<a href="https://www.hotconfig.com/easy-bake-mcp-docker-tools/">downloads</a>).</p><ul><li>We were running the 6-bit GGUF (Q6), slightly off a full 8-bit.</li></ul><h3 id="our-run-config">Our Run Config</h3><pre><code class="language-bash">/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
--host 192.168.1.3 \
--n-gpu-layers 999 \
--flash-attn on \
--override-tensor &quot;\.ffn_.*_exps\.weight=CPU&quot; \
--cache-type-k turbo3 \
--cache-type-v turbo3 \
-c 512000 \
--temp 0.7</code></pre><h3 id="our-prompt">Our Prompt</h3><pre><code class="language-bash">Go find GoAccess Traffic Analyzer, Analyze it into a project using the project manager into several tasks to replicate it. Every time you create a task or job inside the project manager save it. Work on each task and test them in python. When you are done save the project manager and send a code drop</code></pre><h3 id="our-result">Our Result</h3><ul><li>63 Reasoning Cycles intermixed with approximately 30 python code cycles.</li><li>The LLM made few lookups on the Internet, working mostly locally; however, it did go and find GoAccess (we did not even specify where to look for it).</li></ul><h3 id="performance">Performance</h3><ul><li>It averaged 22.9 Tokens/s over a Context Window of 62,770 Tokens.</li><li>It worked by itself for about 45 Minutes.</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-101.png" class="kg-image" alt="Qwen3.6 Drops!- A HouseLLM Production Level  Coding Perspective? One-Shot GoAccess" loading="lazy" width="547" height="88"></figure><ul><li>It did not complete the Code Drop on the first try; it delivered after a second ask. We use a custom Context Manager so that it can retrieve its work and/or break it up. If you want a full guide for adding it to your LLM, it is open source (<a href="https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-7-process-manager-part-1/">here</a>).</li></ul><h3 id="code-drop">Code Drop</h3><ul><li>Came out very nicely, producing a project structure of 6 folders.</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-102.png" class="kg-image" alt="Qwen3.6 Drops!- A HouseLLM Production Level  Coding Perspective? 
One-Shot GoAccess" loading="lazy" width="957" height="244" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-102.png 600w, https://www.hotconfig.com/content/images/2026/04/image-102.png 957w" sizes="(min-width: 720px) 720px"></figure><ul><li>The code came out as a 31 KB export; the HTML came in at 21 KB of what was produced.</li></ul>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/full_export_GoAccess-Replicator_20260417_135838.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">Full export GoAccess Replicator 20260417 135838</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">full_export_GoAccess Replicator_20260417_135838.zip</div>
                        <div class="kg-file-card-filesize">31 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
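        <p>As a sanity check, the two Performance figures above can be compared with the reported run time. This is a rough sketch under a loud assumption: it treats all 62,770 tokens in the context as if they were generated at the 22.9 Tokens/s average, ignoring prompt processing and any time spent inside tools. The function name is ours.</p><pre><code class="language-python">def generation_minutes(n_tokens, tokens_per_second):
    """Minutes needed to produce n_tokens at a steady tokens_per_second."""
    return n_tokens / tokens_per_second / 60


# Figures from the Performance section: 62,770 tokens at 22.9 tokens/s.
minutes = generation_minutes(62770, 22.9)
print(f"{minutes:.1f} minutes")
</code></pre><p>This comes out at roughly 45.7 minutes, in line with the run time of about 45 Minutes noted above.</p>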
        <ul><li>Multiple Class Performance</li></ul><p>The model clearly delineated the task into groups and performed well, writing clean code without hinting. Again, it ran dozens of cycles to clean up its own code. Here is an example class it built, the URLAnalyzer.</p><pre><code class="language-python">from collections import Counter, defaultdict
from typing import Dict, List, Tuple
from log_parser import LogEntry
from urllib.parse import urlparse

class URLAnalyzer:
    &quot;&quot;&quot;Analyzes URLs and paths from parsed log entries.&quot;&quot;&quot;
    
    def __init__(self, entries: List[LogEntry]):
        self.entries = entries
        self.path_counter = Counter()
        self.path_bytes: Dict[str, int] = defaultdict(int)
        self.path_status: Dict[str, Counter] = defaultdict(Counter)
        self.method_counter = Counter()
        self.extension_counter = Counter()
        self.directory_counter = Counter()
        self._analyze()
    
    def _analyze(self):
        &quot;&quot;&quot;Perform URL analysis.&quot;&quot;&quot;
        for entry in self.entries:
            self.path_counter[entry.path] += 1
            self.path_bytes[entry.path] += entry.response_size
            self.path_status[entry.path][entry.status_code] += 1
            self.method_counter[entry.method] += 1
            
            # Extract file extension
            parsed = urlparse(entry.path)
            path = parsed.path
            if &apos;.&apos; in path:
                ext = path.split(&apos;.&apos;)[-1].split(&apos;?&apos;)[0]
                self.extension_counter[ext] += 1
            
            # Extract directory
            if path != &apos;/&apos;:
                parts = path.strip(&apos;/&apos;).split(&apos;/&apos;)
                directory = &apos;/&apos; + &apos;/&apos;.join(parts[:-1]) if len(parts) &gt; 1 else &apos;/&apos;
                self.directory_counter[directory] += 1
    
    def get_top_paths(self, n: int = 10) -&gt; List[Tuple[str, int]]:
        &quot;&quot;&quot;Get top N most requested paths.&quot;&quot;&quot;
        return self.path_counter.most_common(n)
    
    def get_top_bytes_paths(self, n: int = 10) -&gt; List[Tuple[str, int]]:
        &quot;&quot;&quot;Get top N paths by bytes transferred.&quot;&quot;&quot;
        return sorted(
            [(path, bytes) for path, bytes in self.path_bytes.items()],
            key=lambda x: x[1],
            reverse=True
        )[:n]
    
    def get_path_details(self, path: str) -&gt; Dict:
        &quot;&quot;&quot;Get detailed statistics for a specific path.&quot;&quot;&quot;
        if path not in self.path_counter:
            return {}
        
        return {
            &apos;path&apos;: path,
            &apos;total_requests&apos;: self.path_counter[path],
            &apos;total_bytes&apos;: self.path_bytes[path],
            &apos;status_codes&apos;: dict(self.path_status[path]),
            &apos;avg_response_size&apos;: self.path_bytes[path] / self.path_counter[path]
        }
    
    def get_extension_distribution(self) -&gt; Dict[str, int]:
        &quot;&quot;&quot;Get distribution of file extensions.&quot;&quot;&quot;
        return dict(self.extension_counter.most_common())
    
    def get_directory_distribution(self, n: int = 10) -&gt; Dict[str, int]:
        &quot;&quot;&quot;Get distribution of directories.&quot;&quot;&quot;
        return dict(self.directory_counter.most_common(n))
    
    def get_method_distribution(self) -&gt; Dict[str, int]:
        &quot;&quot;&quot;Get distribution of HTTP methods.&quot;&quot;&quot;
        return dict(self.method_counter.most_common())
    
    def get_static_vs_dynamic(self) -&gt; Dict[str, int]:
        &quot;&quot;&quot;Categorize paths as static or dynamic.&quot;&quot;&quot;
        static_extensions = {&apos;html&apos;, &apos;css&apos;, &apos;js&apos;, &apos;png&apos;, &apos;jpg&apos;, &apos;jpeg&apos;, &apos;gif&apos;, &apos;svg&apos;, &apos;ico&apos;, &apos;pdf&apos;, &apos;txt&apos;}
        static_count = 0
        dynamic_count = 0
        
        for entry in self.entries:
            parsed = urlparse(entry.path)
            path = parsed.path
            if &apos;.&apos; in path:
                ext = path.split(&apos;.&apos;)[-1].split(&apos;?&apos;)[0]
                if ext.lower() in static_extensions:
                    static_count += 1
                else:
                    dynamic_count += 1
            else:
                dynamic_count += 1
        
        return {
            &apos;Static&apos;: static_count,
            &apos;Dynamic&apos;: dynamic_count
        }
    
    def get_api_endpoints(self) -&gt; List[Tuple[str, int]]:
        &quot;&quot;&quot;Identify API endpoints (paths starting with /api/).&quot;&quot;&quot;
        api_paths = Counter()
        for entry in self.entries:
            if entry.path.startswith(&apos;/api/&apos;):
                api_paths[entry.path] += 1
        return api_paths.most_common()
    
    def get_404_paths(self) -&gt; List[Tuple[str, int]]:
        &quot;&quot;&quot;Get paths that returned 404 errors.&quot;&quot;&quot;
        not_found = []
        for path, statuses in self.path_status.items():
            if statuses.get(404, 0) &gt; 0:
                not_found.append((path, statuses[404]))
        return sorted(not_found, key=lambda x: x[1], reverse=True)
    
    def generate_report(self) -&gt; str:
        &quot;&quot;&quot;Generate a text report of URL analysis.&quot;&quot;&quot;
        report = []
        report.append(&quot;=&quot; * 70)
        report.append(&quot;URL ANALYSIS REPORT&quot;)
        report.append(&quot;=&quot; * 70)
        
        # Top paths
        report.append(&quot;\n--- Top 20 Requested Paths ---&quot;)
        for path, count in self.get_top_paths(20):
            report.append(f&quot;  {path:40s} {count:6d} requests&quot;)
        
        # Top bytes
        report.append(&quot;\n--- Top 10 Paths by Bytes ---&quot;)
        for path, bytes in self.get_top_bytes_paths(10):
            report.append(f&quot;  {path:40s} {bytes:&gt;10,} bytes&quot;)
        
        # Method distribution
        report.append(&quot;\n--- HTTP Method Distribution ---&quot;)
        methods = self.get_method_distribution()
        total = sum(methods.values())
        for method, count in methods.items():
            pct = (count / total * 100) if total &gt; 0 else 0
            report.append(f&quot;  {method:10s}: {count:6d} ({pct:5.1f}%)&quot;)
        
        # Extension distribution
        report.append(&quot;\n--- File Extension Distribution ---&quot;)
        extensions = self.get_extension_distribution()
        for ext, count in extensions.items():
            report.append(f&quot;  .{ext:10s}: {count:6d}&quot;)
        
        # Static vs Dynamic
        report.append(&quot;\n--- Static vs Dynamic Content ---&quot;)
        static_dynamic = self.get_static_vs_dynamic()
        for content_type, count in static_dynamic.items():
            report.append(f&quot;  {content_type:10s}: {count:6d}&quot;)
        
        # API endpoints
        report.append(&quot;\n--- API Endpoints ---&quot;)
        api_endpoints = self.get_api_endpoints()
        for path, count in api_endpoints:
            report.append(f&quot;  {path:40s} {count:6d} requests&quot;)
        
        # 404 paths
        report.append(&quot;\n--- 404 Not Found Paths ---&quot;)
        not_found = self.get_404_paths()
        for path, count in not_found[:10]:
            report.append(f&quot;  {path:40s} {count:6d} errors&quot;)
        
        return &apos;\n&apos;.join(report)</code></pre><h3 id="conclusion">Conclusion</h3><ul><li>We are very impressed. We did not feel the need to give this model a &apos;SOTA assist&apos;.</li><li>This was a poorly defined prompt that didn&apos;t specify how to replicate GoAccess, whereas most production projects would manage their LLM tightly.</li><li>Our conclusion is unequivocal: you <em>can</em> use these LLMs as a powerful assisting tool for basic building needs.</li><li>Speed is respectable, and that&apos;s the big plus. We are really running on minimal equipment for the LLM world: just a stock 4080 GPU, with the tensor layers split between it and system RAM.</li></ul><p></p>]]></content:encoded></item><item><title><![CDATA[PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++]]></title><description><![CDATA[Downloads Page for all your MCP tooling needs!]]></description><link>https://www.hotconfig.com/easy-bake-mcp-docker-tools/</link><guid isPermaLink="false">69e04f2d9e9ad20001df3b3c</guid><category><![CDATA[docker]]></category><category><![CDATA[MCP Server]]></category><category><![CDATA[Easy Bake]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Thu, 16 Apr 2026 03:07:59 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/surf_bot.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/04/surf_bot.jpg" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"><p>We have compiled a collection of easy-bake MCP tools for all your agentic houseLLM needs!</p><ul><li>If you need to know how to use these, a dedicated page for each one will guide you through getting them to work!</li><li>They are simple: each is just a docker container that opens up a port. Your houseLLM will query it to do its work. 
You register it simply via &apos;add MCP server&apos;. That&apos;s it!</li><li>PLEASE NOTE: YOUR RUN CONFIGURATION MAY WORK FASTER AND BETTER WITH <code>--webui-mcp-proxy</code>. &#xA0;If you do use it, switch it on for each MCP agent you use.</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-98.png" class="kg-image" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++" loading="lazy" width="488" height="313"></figure><h2 id="scientific-calculator-mcp-docker-agent">Scientific Calculator MCP Docker Agent</h2><ul><li>Get your LLM doing incredibly powerful math - then verifying its work! Seriously.</li></ul><pre><code class="language-bash">docker pull cnmcdee/mcp_calculator:latest
docker run -d --name mcp-calculator --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5000:5000 cnmcdee/mcp_calculator:latest</code></pre><ul><li>Because it is CORS-enabled HTTP, it works! Add it to your MCP tooling easily via:</li></ul><pre><code class="language-bash">http://192.168.1.&lt;YOURIP&gt;:5000/mcp</code></pre><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/agentic-server-primers-mcp-lesson-1-a-calculator/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Agentic Server Primer: Llama.cpp MCP Lesson 1: A Calculator.</div><div class="kg-bookmark-description">Agentic Server Primers: Llama.cpp MCP Lesson 1: A Calculator</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/mcp_01-3.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"></div></a></figure>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/a_mcp_calculator.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">A mcp calculator</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">a_mcp_calculator.zip</div>
                        <div class="kg-file-card-filesize">3 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
        <h2 id="python-docker-mcp-docker-agent">Python Docker MCP Docker Agent</h2><ul><li>With this tool your LLM will typically not only generate code, it will also send that code to this Python MCP for testing.</li></ul><pre><code class="language-bash">docker pull cnmcdee/mcp-python:latest
docker run -d --name mcp-python --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5001:5001 cnmcdee/mcp-python:latest</code></pre><ul><li>Because it is CORS-enabled HTTP, it works! Add it to your MCP tooling easily via:</li></ul><pre><code class="language-bash">http://192.168.1.&lt;YOUR IP&gt;:5001/mcp</code></pre><ul><li>You simply change the port for each agent you add: 5002, 5003, and so on.</li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/agentic-server-primer-2/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Agentic Server Primer: Llama.cpp MCP Lesson 3: Adding Python Tooling Capability To your HouseLLM.</div><div class="kg-bookmark-description">Agentic Server Primer: Llama.cpp MCP Lesson 3: Python</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/mcp_03-3.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"></div></a></figure>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/b_mcp_python.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">B mcp python</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">b_mcp_python.zip</div>
                        <div class="kg-file-card-filesize">4 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
        <h2 id="weather-api-example-mcp-docker-agent">Weather API Example MCP Docker Agent</h2><ul><li>This fully complimentary MCP tool will enable your LLM to look up weather anywhere in the world!</li></ul><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-weather:latest
docker run -d --name mcp-weather --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5002:5002 cnmcdee/mcp-weather:latest</code></pre><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/agentic-server-primer/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Agentic Server Primer: Llama.cpp MCP Lesson 4: Weather Polling via api.weather.gov</div><div class="kg-bookmark-description">Agentic Server Primer: Llama.cpp MCP Lesson 4: Weather Polling via api.weather.gov</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/mcp_04.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"></div></a></figure>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/c_mcp_weather.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">C mcp weather</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">c_mcp_weather.zip</div>
                        <div class="kg-file-card-filesize">3 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
        <h2 id="javascript-enabled-mcp-agent">JavaScript Enabled MCP Agent</h2><ul><li>This uses a Python API to pipe JavaScript. Very powerful when you want to code JavaScript for your HTML projects and your LLM wants to verify its work!</li></ul><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-javascript:latest
docker run -d --name mcp-javascript --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5003:5003 cnmcdee/mcp-javascript:latest</code></pre><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/agentic-server-primer-3/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin.</div><div class="kg-bookmark-description">We go through a full working example of creating your own MCP tools.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/mcp_05.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"></div></a></figure>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/d_mcp_javascript.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">D mcp javascript</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">d_mcp_javascript.zip</div>
                        <div class="kg-file-card-filesize">3 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
        <h2 id="mysql-enabled-mcp-agentic-agent">MySQL Enabled MCP Agentic Agent</h2><ul><li>This will let your LLM communicate with its own database! It can be very interesting watching it create tables, joins, and merges all on its own!</li><li>Please note you will probably need to set this one up manually so it can make its connection to the running MySQL instance.</li></ul><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-mysql:latest
docker run -d --name mcp-mysql --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5004:5004 cnmcdee/mcp-mysql:latest</code></pre><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-5-adding-javascript-via-a-python-api-plugin/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Agentic Server Primer: Llama.cpp MCP Lesson 6: Adding mysql Database Docker Toolsets.</div><div class="kg-bookmark-description">We give our LLM its own Database to play with!</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/mcp_06.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"></div></a></figure>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/e_mcp_mysql.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">E mcp mysql</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">e_mcp_mysql.zip</div>
                        <div class="kg-file-card-filesize">3 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
        <h2 id="superpython-enabled-mcp-agent">SuperPython Enabled MCP Agent</h2><ul><li>This is a MUCH beefier Python with numpy, beautifulsoup, requests, or whatever else you want to put inside your Python environment. Use this when your LLM is creating web-enabled tools and wants to test them!</li><li>Again, the process is nearly identical, so we refer you to the basic Python MCP docker tutorial above.</li></ul><p>To quickly pull and run this locally:</p><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-super-python:latest
docker run -d --name mcp-super-python --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5007:5007 cnmcdee/mcp-super-python:latest</code></pre>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/f_mcp_super_python.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">F mcp super python</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">f_mcp_super_python.zip</div>
                        <div class="kg-file-card-filesize">7 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
        <h3 id="process-manager-with-code-drop-and-web-request-agent">Process Manager with Code Drop and Web Request Agent</h3><ul><li>This is a powerhouse tool. By allowing your LLM to save its work, gift it to you with a &apos;code drop&apos;, resume a task, break it down into jobs, and record its own code snippets, you can do production-level research from a houseLLM at much lower cost!</li></ul><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-process-manager:latest
docker run -d --name mcp-process-manager --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5008:5008 cnmcdee/mcp-process-manager:latest</code></pre><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-7-process-manager-part-1/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop.</div><div class="kg-bookmark-description">Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/surf_bot_clipped.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"></div></a></figure>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/h_mcp_process_manager.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">H mcp process manager</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">h_mcp_process_manager.zip</div>
                        <div class="kg-file-card-filesize">11 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
        <h3 id="docker-orchestrator">Docker Orchestrator</h3><ul><li>This powerful MCP agent will enable your tool-enabled LLM to create, stand up and test its own docker images and containers!</li><li>NOTE: because this is designed to remote into a server via its SSH tunnel, you will need to set some variables:</li></ul><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-docker-orchestrator:latest

docker run -d \
  --name mcp-docker-orchestrator \
  --restart unless-stopped \
  -p 0.0.0.0:5010:5010 \
  -e &quot;FLASH_ENV=production&quot; \
  -e ENV_SERVER=&quot;${ENV_SERVER}&quot; \
  -e ENV_USER=&quot;${ENV_USER}&quot; \
  -e ENV_PASSWORD=&quot;${ENV_PASSWORD:-}&quot; \
  -e ENV_PORT=&quot;${ENV_PORT:-22}&quot; \
  cnmcdee/mcp-docker-orchestrator</code></pre><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-9-docker-orchestrator/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Agentic Server Primer: Llama.cpp MCP Lesson 9: Docker Orchestrator</div><div class="kg-bookmark-description">In this guide we go over letting your LLM manage and create its own docker images, then stand up its own containers after writing its code. It uses a special docker-compose tool we built for it.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/less9.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"></div></a></figure>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/I_dkr_manager.zip" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">I dkr manager</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">I_dkr_manager.zip</div>
                        <div class="kg-file-card-filesize">8 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
        <h3 id="mcp-coder-cuda-enabled-docker-container">mcp-coder (Cuda Enabled Docker Container)</h3><ul><li>This is a powerful &apos;developmental&apos; type container for those that need a safe environment for developing and compiling new versions of Llama.cpp.</li></ul><pre><code class="language-bash">docker pull cnmcdee/mcp-coder:latest

# Start container
docker run -d \
--name mcp-coder \
--restart unless-stopped \
-p 5011:5011 \
-v ~/mcp-workspace:/work_path \
--env PYTHONUNBUFFERED=1 \
cnmcdee/mcp-coder:latest</code></pre><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/agentic-server-primer-4/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Agentic Server Primer: Llama.cpp MCP Lesson 10: mcp-coder (Cuda Version)</div><div class="kg-bookmark-description">We build an MCP Coding Agent that will allow your LLM to specifically work on and debug its own code with nvcc, or really any language!</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/05/lesson_10.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"></div></a></figure><h3 id="llmqppower-prompt-manager-for-your-llm">LLMQP - Power Prompt Manager for your LLM!</h3><ul><li>This is a powerhouse tool. With it you can have your LLM coding all night!</li></ul><pre><code class="language-bash">docker pull docker.io/cnmcdee/llmqueue:latest
docker run -d --name mcp-llmqueue --restart unless-stopped -p 0.0.0.0:5012:5012 cnmcdee/llmqueue:latest</code></pre><p></p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/llm-queue-dispatcher/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">LLMQP Drops! A New Queue Dispatcher. Let your LLM CODE ALL NIGHT.</div><div class="kg-bookmark-description">LLM Queue Dispatcher. A Powerful Harness Drop will queue your localLLM all night and keep it working!</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. MCP TOOLS 1-9++"><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/05/rect4.png" alt="PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo. 
MCP TOOLS 1-9++"></div></a></figure>]]></content:encoded></item><item><title><![CDATA[Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop.]]></title><description><![CDATA[Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant]]></description><link>https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-7-process-manager-part-1/</link><guid isPermaLink="false">69e033159e9ad20001df3aee</guid><category><![CDATA[House LLM]]></category><category><![CDATA[agentic]]></category><category><![CDATA[Agentic Workflow]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Thu, 16 Apr 2026 01:09:05 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/surf_bot_clipped.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/04/surf_bot_clipped.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop."><p>In our previous lesson we built a Process Manager for our HouseLLM. &#xA0;This was VERY powerful because it enabled anyone to continue a context or break it up into pieces without losing its work - all locally from your own gear. But why not add a research feature: going onto the Internet and looking things up? &#xA0;Our previous lesson for your reference:</p><p>If you just need to pull and run this docker container, here it is!</p><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-process-manager:latest
docker run -d --name mcp-process-manager --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5008:5008 cnmcdee/mcp-process-manager:latest</code></pre><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-7-process-orchestrator/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Agentic Server Primer: Llama.cpp MCP Lesson 7: Process Manager (Part 1)</div><div class="kg-bookmark-description">Agentic Server Primer: Llama.cpp MCP Lesson 7: Process Manager (Part 1)</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop."><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/mcp_07.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop."></div></a></figure><p>Adding this was literally a command to your local AI, and it dutifully wrote and added this &#xA0;for us.</p><p>We added effective only a few tools, and had our LLM change the structure.</p><pre><code class="language-python"># &#x2500;&#x2500; HTML-page tools &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
mcp.add_tool(task_manager.task_html_add)
mcp.add_tool(task_manager.task_html_get)
mcp.add_tool(task_manager.task_html_delete)
mcp.add_tool(task_manager.task_html_update)
# &#x2500;&#x2500; Web-fetching tools &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
mcp.add_tool(task_manager.duckduckgo_html_list_fetch)
mcp.add_tool(task_manager.html_fetch)
# &#x2500;&#x2500; Enhanced code_drop tool (now packages everything) &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
mcp.add_tool(task_manager.code_drop)</code></pre><h3 id="code-drop">Code Drop</h3><ul><li>Because this tool can enable your LLM to generate MASSIVE amounts of code overnight, you can ask your LLM to do a &apos;code_drop&apos;. The back-end function will automatically pack all the code snippets into a ZIP and create a temporary download link to its work. Serious time saver!</li><li>Note - this will generate a link to <strong>localhost</strong>; just change it to the server&apos;s IP address if the server is a headless unit or is not your local machine.</li><li>Nice... A typical structure, built and packed for your download.</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-97.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop." loading="lazy" width="936" height="471" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-97.png 600w, https://www.hotconfig.com/content/images/2026/04/image-97.png 936w" sizes="(min-width: 720px) 720px"></figure><h3 id="full-code">Full Code</h3><ul><li>You might need to install some of the support packages, naturally. A very easy way to do this is to install PyCharm, then cut and paste this code into any open app.py. Because <a href="https://www.hotconfig.com/pycharm-community-fast-install/">PyCharm</a> is very supportive, it will take care of building your venv and/or installing the necessary packages to run it!</li></ul><pre><code class="language-python">import re
import json
import os
from datetime import datetime
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse, parse_qs
from fastmcp import FastMCP
from fastmcp.tools import tool
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
from starlette.staticfiles import StaticFiles
import uvicorn
import tempfile
import shutil
from pathlib import Path

# Initialize the MCP server
mcp = FastMCP(
    name=&quot;Process Manager&quot;,
    instructions=&quot;Provides a process manager for tracking tasks and their associated jobs, code blocks, and HTML pages. &quot;
                 &quot;Also includes web-search (DuckDuckGo) and webpage-fetching tools for retrieving external content.&quot;)

# Directory for downloadable ZIP files (served statically at /downloads)
DOWNLOADS_DIR = Path(&quot;downloads&quot;)
DOWNLOADS_DIR.mkdir(exist_ok=True)

class TaskManager:
    def __init__(self, storage_dir: str = &quot;processes&quot;):
        self.storage_dir = storage_dir
        os.makedirs(self.storage_dir, exist_ok=True)
        self.current_process = None
        self.current_process_file = None

    def _sanitize_name(self, name: str) -&gt; str:
        &quot;&quot;&quot;Sanitize user-supplied name to prevent path traversal and invalid filenames.&quot;&quot;&quot;
        if not name or not str(name).strip():
            name = &quot;unnamed_process&quot;
        safe_name = re.sub(r&apos;[^a-zA-Z0-9_.-]&apos;, &apos;_&apos;, str(name).strip())
        return safe_name[:150]

    def _get_filepath(self, name: str) -&gt; str:
        &quot;&quot;&quot;Return full path to the sanitized JSON file.&quot;&quot;&quot;
        safe_name = self._sanitize_name(name)
        if not safe_name.endswith(&apos;.json&apos;):
            safe_name += &apos;.json&apos;
        return os.path.join(self.storage_dir, safe_name)

    def _save_current(self):
        &quot;&quot;&quot;Persist the current process to disk.&quot;&quot;&quot;
        if self.current_process and self.current_process_file:
            filepath = self._get_filepath(self.current_process_file)
            with open(filepath, &apos;w&apos;, encoding=&apos;utf-8&apos;) as f:
                json.dump(self.current_process, f, indent=2, ensure_ascii=False)

    def _load_process(self, name: str) -&gt; bool:
        &quot;&quot;&quot;Load a process from disk and update internal state.&quot;&quot;&quot;
        filepath = self._get_filepath(name)
        if not os.path.exists(filepath):
            return False
        try:
            with open(filepath, &apos;r&apos;, encoding=&apos;utf-8&apos;) as f:
                self.current_process = json.load(f)
            self.current_process_file = name
            return True
        except Exception:
            return False

    @tool()
    def process_new(self, name: str, description: str):
        &quot;&quot;&quot;Creates a new process with the given name and description.&quot;&quot;&quot;
        try:
            name = str(name).strip()
            if not name:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;Process name cannot be empty.&quot;})

            filepath = self._get_filepath(name)
            if os.path.exists(filepath):
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Process &apos;{name}&apos; already exists.&quot;})

            self.current_process = {
                &quot;process_name&quot;: name,
                &quot;description&quot;: description,
                &quot;created_at&quot;: datetime.now().isoformat(),
                &quot;tasks&quot;: {}
            }
            self.current_process_file = name
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Process &apos;{name}&apos; created successfully.&quot;,
                &quot;process&quot;: self.current_process
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def process_load(self, name: str):
        &quot;&quot;&quot;Loads an existing process by name.&quot;&quot;&quot;
        try:
            if self._load_process(name):
                task_count = len(self.current_process.get(&quot;tasks&quot;, {}))
                return json.dumps({
                    &quot;success&quot;: True,
                    &quot;message&quot;: f&quot;Process &apos;{name}&apos; loaded successfully.&quot;,
                    &quot;process_name&quot;: self.current_process.get(&quot;process_name&quot;),
                    &quot;task_count&quot;: task_count,
                    &quot;process_data&quot;: self.current_process
                })
            return json.dumps({
                &quot;success&quot;: False,
                &quot;error&quot;: f&quot;Process &apos;{name}&apos; not found.&quot;
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def process_list_all(self):
        &quot;&quot;&quot;Lists all saved processes.&quot;&quot;&quot;
        try:
            files = [f for f in os.listdir(self.storage_dir) if f.endswith(&apos;.json&apos;)]
            process_list = [{&quot;process&quot;: os.path.splitext(f)[0]} for f in files]
            return json.dumps({
                &quot;success&quot;: True,
                &quot;process_list&quot;: process_list,
                &quot;total&quot;: len(process_list)
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e), &quot;process_list&quot;: []})

    @tool()
    def process_save(self, name: str = None):
        &quot;&quot;&quot;Saves the current process, optionally renaming it.&quot;&quot;&quot;
        try:
            if not self.current_process:
                return json.dumps({
                    &quot;success&quot;: False,
                    &quot;error&quot;: &quot;No active process to save. Create or load a process first.&quot;
                })

            if name:
                name = str(name).strip()
                self.current_process[&quot;process_name&quot;] = name
                self.current_process_file = name

            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Process saved successfully as &apos;{self.current_process_file}&apos;.&quot;,
                &quot;process_name&quot;: self.current_process.get(&quot;process_name&quot;)
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def process_delete(self, name: str):
        &quot;&quot;&quot;Deletes an entire process and its associated file from storage.
        If the deleted process is currently loaded, the in-memory state is cleared.&quot;&quot;&quot;
        try:
            name = str(name).strip()
            if not name:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;Process name cannot be empty.&quot;})

            filepath = self._get_filepath(name)
            if not os.path.exists(filepath):
                return json.dumps({
                    &quot;success&quot;: False,
                    &quot;error&quot;: f&quot;Process &apos;{name}&apos; not found.&quot;
                })

            os.remove(filepath)

            if (self.current_process_file and
                    self._sanitize_name(self.current_process_file) == self._sanitize_name(name)):
                self.current_process = None
                self.current_process_file = None

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Process &apos;{name}&apos; deleted successfully.&quot;,
                &quot;deleted_process&quot;: name
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_new(self, task_name: str, task_description: str):
        &quot;&quot;&quot;Creates a new task within the current process.&quot;&quot;&quot;
        try:
            if not self.current_process:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;No active process. Create or load one first.&quot;})
            if task_name in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; already exists.&quot;})

            self.current_process[&quot;tasks&quot;][task_name] = {
                &quot;description&quot;: task_description,
                &quot;created_at&quot;: datetime.now().isoformat(),
                &quot;jobs&quot;: {},
                &quot;code&quot;: {},
                &quot;html_pages&quot;: {}
            }
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Task &apos;{task_name}&apos; created successfully.&quot;,
                &quot;task&quot;: self.current_process[&quot;tasks&quot;][task_name]
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_list(self):
        &quot;&quot;&quot;Lists all tasks with summary information.&quot;&quot;&quot;
        try:
            if not self.current_process:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;No active process.&quot;})

            tasks_summary = {}
            for t_name, t_data in self.current_process[&quot;tasks&quot;].items():
                tasks_summary[t_name] = {
                    &quot;description&quot;: t_data[&quot;description&quot;],
                    &quot;created_at&quot;: t_data[&quot;created_at&quot;],
                    &quot;job_count&quot;: len(t_data.get(&quot;jobs&quot;, {})),
                    &quot;code_count&quot;: len(t_data.get(&quot;code&quot;, {})),
                    &quot;html_count&quot;: len(t_data.get(&quot;html_pages&quot;, {}))
                }

            return json.dumps({
                &quot;success&quot;: True,
                &quot;task_count&quot;: len(tasks_summary),
                &quot;tasks&quot;: tasks_summary
            }, indent=2)
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_details_get(self, task_name: str):
        &quot;&quot;&quot;Retrieves complete details of a specific task.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            task_data = self.current_process[&quot;tasks&quot;][task_name]
            return json.dumps({
                &quot;success&quot;: True,
                &quot;task_name&quot;: task_name,
                &quot;task_description&quot;: task_data[&quot;description&quot;],
                &quot;created_at&quot;: task_data[&quot;created_at&quot;],
                &quot;jobs&quot;: list(task_data.get(&quot;jobs&quot;, {}).values()),
                &quot;codes&quot;: list(task_data.get(&quot;code&quot;, {}).values()),
                &quot;html_pages&quot;: list(task_data.get(&quot;html_pages&quot;, {}).values()),
                &quot;job_count&quot;: len(task_data.get(&quot;jobs&quot;, {})),
                &quot;code_count&quot;: len(task_data.get(&quot;code&quot;, {})),
                &quot;html_count&quot;: len(task_data.get(&quot;html_pages&quot;, {}))
            }, indent=2)
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_update_description(self, task_name: str, new_description: str):
        &quot;&quot;&quot;Updates the description of an existing task.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            old_description = self.current_process[&quot;tasks&quot;][task_name][&quot;description&quot;]
            self.current_process[&quot;tasks&quot;][task_name][&quot;description&quot;] = new_description
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Task &apos;{task_name}&apos; description updated successfully.&quot;,
                &quot;task_name&quot;: task_name,
                &quot;old_description&quot;: old_description,
                &quot;new_description&quot;: new_description
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_name_change(self, old_task_name: str, new_task_name: str):
        &quot;&quot;&quot;Renames an existing task.&quot;&quot;&quot;
        try:
            if not self.current_process:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;No active process.&quot;})
            if old_task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{old_task_name}&apos; does not exist.&quot;})
            if new_task_name in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{new_task_name}&apos; already exists.&quot;})

            self.current_process[&quot;tasks&quot;][new_task_name] = self.current_process[&quot;tasks&quot;].pop(old_task_name)
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Task renamed from &apos;{old_task_name}&apos; to &apos;{new_task_name}&apos; successfully.&quot;,
                &quot;old_name&quot;: old_task_name,
                &quot;new_name&quot;: new_task_name
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_delete(self, task_name: str):
        &quot;&quot;&quot;Deletes a task from the current process.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            del self.current_process[&quot;tasks&quot;][task_name]
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Task &apos;{task_name}&apos; deleted successfully.&quot;
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_job_add(self, task_name: str, job_name: str, job_description: str):
        &quot;&quot;&quot;Adds a job to the specified task.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            job_key = f&quot;{job_name}_{datetime.now().isoformat()}&quot;
            self.current_process[&quot;tasks&quot;][task_name][&quot;jobs&quot;][job_key] = {
                &quot;name&quot;: job_name,
                &quot;description&quot;: job_description,
                &quot;created_at&quot;: datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Job &apos;{job_name}&apos; added to task &apos;{task_name}&apos; successfully.&quot;
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_job_get(self, task_name: str, job_name: str):
        &quot;&quot;&quot;Retrieves job(s) matching the given job_name (partial match).&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            jobs = self.current_process[&quot;tasks&quot;][task_name].get(&quot;jobs&quot;, {})
            matching = [data for key, data in jobs.items() if job_name in key or job_name == data.get(&quot;name&quot;)]

            return json.dumps({
                &quot;success&quot;: True,
                &quot;task_name&quot;: task_name,
                &quot;jobs&quot;: matching
            }, indent=2)
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_code_add(self, task_name: str, code_name: str, code_content: str):
        &quot;&quot;&quot;Adds a code snippet to the specified task.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            code_key = f&quot;{code_name}_{datetime.now().isoformat()}&quot;
            self.current_process[&quot;tasks&quot;][task_name][&quot;code&quot;][code_key] = {
                &quot;name&quot;: code_name,
                &quot;content&quot;: code_content,
                &quot;created_at&quot;: datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Code &apos;{code_name}&apos; added to task &apos;{task_name}&apos; successfully.&quot;
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_code_get(self, task_name: str, code_name: str):
        &quot;&quot;&quot;Retrieves code snippet(s) matching the given code_name (partial match).&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            codes = self.current_process[&quot;tasks&quot;][task_name].get(&quot;code&quot;, {})
            matching = [data for key, data in codes.items() if code_name in key or code_name == data.get(&quot;name&quot;)]

            return json.dumps({
                &quot;success&quot;: True,
                &quot;task_name&quot;: task_name,
                &quot;codes&quot;: matching
            }, indent=2)
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_code_delete(self, task_name: str, code_name: str):
        &quot;&quot;&quot;Deletes code snippet(s) matching the given code_name (partial match).&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            codes = self.current_process[&quot;tasks&quot;][task_name].get(&quot;code&quot;, {})
            keys_to_delete = [k for k in codes if code_name in k or code_name == codes[k].get(&quot;name&quot;)]
            deleted = [codes[k] for k in keys_to_delete]

            for k in keys_to_delete:
                del self.current_process[&quot;tasks&quot;][task_name][&quot;code&quot;][k]

            self._save_current()

            if not deleted:
                return json.dumps({
                    &quot;success&quot;: False,
                    &quot;message&quot;: f&quot;No code matching &apos;{code_name}&apos; found in task &apos;{task_name}&apos;.&quot;,
                    &quot;task_name&quot;: task_name,
                    &quot;code_name&quot;: code_name
                })

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Deleted {len(deleted)} code snippet(s) matching &apos;{code_name}&apos;.&quot;,
                &quot;task_name&quot;: task_name,
                &quot;deleted_count&quot;: len(deleted),
                &quot;deleted_codes&quot;: deleted
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_html_add(self, task_name: str, html_name: str, html_content: str):
        &quot;&quot;&quot;Adds an HTML page to the specified task.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            html_key = f&quot;{html_name}_{datetime.now().isoformat()}&quot;
            self.current_process[&quot;tasks&quot;][task_name][&quot;html_pages&quot;][html_key] = {
                &quot;name&quot;: html_name,
                &quot;content&quot;: html_content,
                &quot;created_at&quot;: datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;HTML page &apos;{html_name}&apos; added to task &apos;{task_name}&apos; successfully.&quot;
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_html_get(self, task_name: str, html_name: str):
        &quot;&quot;&quot;Retrieves HTML page(s) matching the given html_name (partial match).&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            html_pages = self.current_process[&quot;tasks&quot;][task_name].get(&quot;html_pages&quot;, {})
            matching = [data for key, data in html_pages.items() if html_name in key or html_name == data.get(&quot;name&quot;)]

            return json.dumps({
                &quot;success&quot;: True,
                &quot;task_name&quot;: task_name,
                &quot;html_pages&quot;: matching
            }, indent=2)
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_html_delete(self, task_name: str, html_name: str):
        &quot;&quot;&quot;Deletes HTML page(s) matching the given html_name (partial match).&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            html_pages = self.current_process[&quot;tasks&quot;][task_name].get(&quot;html_pages&quot;, {})
            keys_to_delete = [k for k in html_pages if html_name in k or html_name == html_pages[k].get(&quot;name&quot;)]
            deleted = [html_pages[k] for k in keys_to_delete]

            for k in keys_to_delete:
                del self.current_process[&quot;tasks&quot;][task_name][&quot;html_pages&quot;][k]

            self._save_current()

            if not deleted:
                return json.dumps({
                    &quot;success&quot;: False,
                    &quot;message&quot;: f&quot;No HTML page matching &apos;{html_name}&apos; found in task &apos;{task_name}&apos;.&quot;,
                    &quot;task_name&quot;: task_name,
                    &quot;html_name&quot;: html_name
                })

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Deleted {len(deleted)} HTML page(s) matching &apos;{html_name}&apos;.&quot;,
                &quot;task_name&quot;: task_name,
                &quot;deleted_count&quot;: len(deleted),
                &quot;deleted_html_pages&quot;: deleted
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_html_update(self, task_name: str, html_name: str, new_content: str):
        &quot;&quot;&quot;Updates (by creating a new version of) an HTML page.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            html_key = f&quot;{html_name}_{datetime.now().isoformat()}&quot;
            self.current_process[&quot;tasks&quot;][task_name][&quot;html_pages&quot;][html_key] = {
                &quot;name&quot;: html_name,
                &quot;content&quot;: new_content,
                &quot;created_at&quot;: datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;HTML page &apos;{html_name}&apos; updated successfully in task &apos;{task_name}&apos;.&quot;,
                &quot;new_version_created&quot;: True
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def duckduckgo_html_list_fetch(self, query: str, max_results: int = 5):
        &quot;&quot;&quot;Searches DuckDuckGo via its HTML endpoint and returns a list of results (title, URL, snippet).&quot;&quot;&quot;
        try:
            query = str(query).strip()
            if not query:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;Search query cannot be empty.&quot;})

            search_url = f&quot;https://duckduckgo.com/html/?q={requests.utils.quote(query)}&quot;
            headers = {
                &quot;User-Agent&quot;: &quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 &quot;
                              &quot;(KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36&quot;
            }

            resp = requests.get(search_url, headers=headers, timeout=15)
            resp.raise_for_status()

            soup = BeautifulSoup(resp.text, &quot;html.parser&quot;)
            results = []

            for item in soup.select(&quot;.result__body&quot;)[:max_results]:
                a_tag = item.select_one(&quot;.result__a&quot;)
                snippet_tag = item.select_one(&quot;.result__snippet&quot;)

                if a_tag:
                    title = a_tag.get_text(strip=True)
                    href = a_tag.get(&quot;href&quot;, &quot;&quot;)

                    if &quot;uddg=&quot; in href:
                        href = href.split(&quot;uddg=&quot;)[1].split(&quot;&amp;&quot;)[0]
                        href = requests.utils.unquote(href)

                    snippet = snippet_tag.get_text(strip=True) if snippet_tag else &quot;&quot;

                    results.append({
                        &quot;title&quot;: title,
                        &quot;url&quot;: href,
                        &quot;snippet&quot;: snippet
                    })

            return json.dumps({
                &quot;success&quot;: True,
                &quot;query&quot;: query,
                &quot;result_count&quot;: len(results),
                &quot;results&quot;: results
            }, indent=2)

        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;DuckDuckGo search failed: {str(e)}&quot;})

    @tool()
    def html_fetch(self, url: str):
        &quot;&quot;&quot;Fetches a webpage and returns clean, readable text content parsed by BeautifulSoup.&quot;&quot;&quot;
        try:
            url = str(url).strip()
            if not url.startswith((&quot;http://&quot;, &quot;https://&quot;)):
                url = &quot;https://&quot; + url

            headers = {
                &quot;User-Agent&quot;: &quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 &quot;
                              &quot;(KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36&quot;
            }

            resp = requests.get(url, headers=headers, timeout=20)
            resp.raise_for_status()

            soup = BeautifulSoup(resp.text, &quot;html.parser&quot;)

            for unwanted in soup([&quot;script&quot;, &quot;style&quot;, &quot;nav&quot;, &quot;header&quot;, &quot;footer&quot;, &quot;aside&quot;, &quot;form&quot;, &quot;button&quot;]):
                unwanted.decompose()

            text = soup.get_text(separator=&quot;\n&quot;, strip=True)

            if len(text) &gt; 40000:
                text = text[:40000] + &quot;\n\n[Content truncated due to length...]&quot;

            page_title = soup.title.string.strip() if soup.title and soup.title.string else &quot;No title found&quot;

            return json.dumps({
                &quot;success&quot;: True,
                &quot;url&quot;: url,
                &quot;title&quot;: page_title,
                &quot;content_length&quot;: len(text),
                &quot;text&quot;: text
            }, indent=2)

        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def task_jobs_list(self):
        &quot;&quot;&quot;Lists all tasks (identical to task_list for compatibility).&quot;&quot;&quot;
        return self.task_list()

    @tool()
    def task_code_update(self, task_name: str, code_name: str, new_content: str):
        &quot;&quot;&quot;Updates (by creating a new version of) a code snippet.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            code_key = f&quot;{code_name}_{datetime.now().isoformat()}&quot;
            self.current_process[&quot;tasks&quot;][task_name][&quot;code&quot;][code_key] = {
                &quot;name&quot;: code_name,
                &quot;content&quot;: new_content,
                &quot;created_at&quot;: datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Code &apos;{code_name}&apos; updated successfully in task &apos;{task_name}&apos;.&quot;,
                &quot;new_version_created&quot;: True
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})

    @tool()
    def code_drop(self, custom_filename: str = None):
        &quot;&quot;&quot;Packages the ENTIRE process: all code snippets, all HTML pages, full process metadata (configuration), task descriptions, and jobs into a well-organized ZIP archive with clean folder structure.&quot;&quot;&quot;
        try:
            if not self.current_process:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;No active process. Create or load a process first.&quot;})

            process_name = self.current_process.get(&quot;process_name&quot;, &quot;unnamed_process&quot;)
            timestamp = datetime.now().strftime(&quot;%Y%m%d_%H%M%S&quot;)

            # Determine ZIP filename
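            if custom_filename:
                # Sanitize so a caller-supplied name cannot escape DOWNLOADS_DIR
                custom_filename = self._sanitize_name(custom_filename)
                zip_name = custom_filename if custom_filename.endswith(&apos;.zip&apos;) else f&quot;{custom_filename}.zip&quot;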
            else:
                zip_name = f&quot;full_export_{process_name}_{timestamp}.zip&quot;

            with tempfile.TemporaryDirectory() as tmp_dir:
                tmp_path = Path(tmp_dir)
                root_dir = tmp_path / f&quot;{process_name}_{timestamp}&quot;
                root_dir.mkdir(parents=True, exist_ok=True)

                # 1. Save complete process metadata (this acts as the master configuration file)
                (root_dir / &quot;full_process_backup.json&quot;).write_text(
                    json.dumps(self.current_process, indent=2, ensure_ascii=False),
                    encoding=&quot;utf-8&quot;
                )

                # 2. Create tasks directory with organized content
                tasks_dir = root_dir / &quot;tasks&quot;
                tasks_dir.mkdir(exist_ok=True)

                total_code = 0
                total_html = 0

                for task_name, task_data in self.current_process.get(&quot;tasks&quot;, {}).items():
                    safe_task_name = self._sanitize_name(task_name)
                    task_dir = tasks_dir / safe_task_name
                    task_dir.mkdir(exist_ok=True)

                    # Task description
                    desc_path = task_dir / &quot;task_description.txt&quot;
                    desc_path.write_text(
                        f&quot;# Task: {task_name}\n\n{task_data.get(&apos;description&apos;, &apos;No description provided.&apos;)}&quot;,
                        encoding=&quot;utf-8&quot;
                    )

                    # Jobs summary
                    jobs = task_data.get(&quot;jobs&quot;, {})
                    if jobs:
                        jobs_md = &quot;# Jobs\n\n&quot;
                        for job in jobs.values():
                            jobs_md += f&quot;## {job.get(&apos;name&apos;, &apos;Unnamed Job&apos;)}\n{job.get(&apos;description&apos;, &apos;&apos;)}\n\n&quot;
                        (task_dir / &quot;jobs_summary.md&quot;).write_text(jobs_md, encoding=&quot;utf-8&quot;)

                    # Code snippets
                    code_dir = task_dir / &quot;code&quot;
                    code_dir.mkdir(exist_ok=True)
                    # Entries are written in insertion order, so a later version of a name overwrites an earlier one
                    for code_entry in task_data.get(&quot;code&quot;, {}).values():
                        # Sanitize so a name containing path separators cannot leave code_dir
                        code_name = self._sanitize_name(code_entry.get(&quot;name&quot;, &quot;unnamed_code&quot;))
                        content = code_entry.get(&quot;content&quot;, &quot;&quot;)
                        if &quot;.&quot; not in code_name:
                            code_name += &quot;.py&quot;
                        (code_dir / code_name).write_text(content, encoding=&quot;utf-8&quot;)
                        total_code += 1

                    # HTML pages
                    html_dir = task_dir / &quot;html&quot;
                    html_dir.mkdir(exist_ok=True)
                    for html_entry in task_data.get(&quot;html_pages&quot;, {}).values():
                        html_name = self._sanitize_name(html_entry.get(&quot;name&quot;, &quot;unnamed_page&quot;))
                        content = html_entry.get(&quot;content&quot;, &quot;&quot;)
                        if not html_name.lower().endswith((&apos;.html&apos;, &apos;.htm&apos;)):
                            html_name += &quot;.html&quot;
                        (html_dir / html_name).write_text(content, encoding=&quot;utf-8&quot;)
                        total_html += 1

                # 3. Top-level README
                readme = f&quot;&quot;&quot;# {process_name} - Full Export

**Generated:** {datetime.now().strftime(&apos;%Y-%m-%d %H:%M:%S&apos;)}

- **Tasks**: {len(self.current_process.get(&quot;tasks&quot;, {}))}
- **Code files**: {total_code}
- **HTML pages**: {total_html}

## Folder Structure
- `full_process_backup.json` &#x2192; Complete process data (metadata + everything)
- `tasks/&lt;task_name&gt;/`
  - `task_description.txt`
  - `jobs_summary.md`
  - `code/` &#x2192; All code snippets
  - `html/` &#x2192; All HTML pages

Ready for development, review, or deployment.
&quot;&quot;&quot;
                (root_dir / &quot;README.md&quot;).write_text(readme, encoding=&quot;utf-8&quot;)

                # Create ZIP archive
                zip_base = DOWNLOADS_DIR / zip_name[:-len(&apos;.zip&apos;)]  # strip only the trailing extension
                shutil.make_archive(str(zip_base), &apos;zip&apos;, root_dir)

            download_url = f&quot;http://localhost:5008/downloads/{zip_name}&quot;

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Successfully created full project export for &apos;{process_name}&apos; ({total_code} code files, {total_html} HTML pages).&quot;,
                &quot;download_url&quot;: download_url,
                &quot;filename&quot;: zip_name,
                &quot;code_count&quot;: total_code,
                &quot;html_count&quot;: total_html,
                &quot;task_count&quot;: len(self.current_process.get(&quot;tasks&quot;, {}))
            }, indent=2)

        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
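

# -- Illustrative sketch (not part of the original script) ---------------------
# code_drop() above writes each snippet under the name stored in the process
# file. A hypothetical helper like export_filename() shows one way to keep
# those names inside the export folder and to give repeated versions of the
# same name distinct files. The helper name and the "_v2" suffix scheme are
# assumptions made for this example only.
from pathlib import Path


def export_filename(name, default_ext, used):
    """Return a flat, unique filename for an exported snippet."""
    # Path(...).name drops any directory part, so "../x.py" becomes "x.py".
    base = Path(str(name).replace("\\", "/")).name
    if base in ("", ".."):
        base = "unnamed"
    if "." not in base:
        base += default_ext
    stem, dot, ext = base.rpartition(".")
    candidate, version = base, 1
    # Append _v2, _v3, ... until the name has not been handed out before.
    while candidate in used:
        version += 1
        candidate = f"{stem}_v{version}{dot}{ext}"
    used.add(candidate)
    return candidate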


task_manager = TaskManager()

# &#x2500;&#x2500; Register all bound methods as tools &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
mcp.add_tool(task_manager.process_new)
mcp.add_tool(task_manager.process_load)
mcp.add_tool(task_manager.process_list_all)
mcp.add_tool(task_manager.process_save)
mcp.add_tool(task_manager.process_delete)
mcp.add_tool(task_manager.task_new)
mcp.add_tool(task_manager.task_update_description)
mcp.add_tool(task_manager.task_delete)
mcp.add_tool(task_manager.task_name_change)
mcp.add_tool(task_manager.task_job_add)
mcp.add_tool(task_manager.task_job_get)
mcp.add_tool(task_manager.task_details_get)
mcp.add_tool(task_manager.task_list)
mcp.add_tool(task_manager.task_code_add)
mcp.add_tool(task_manager.task_code_get)
mcp.add_tool(task_manager.task_code_delete)
mcp.add_tool(task_manager.task_jobs_list)
mcp.add_tool(task_manager.task_code_update)
# &#x2500;&#x2500; HTML-page tools &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
mcp.add_tool(task_manager.task_html_add)
mcp.add_tool(task_manager.task_html_get)
mcp.add_tool(task_manager.task_html_delete)
mcp.add_tool(task_manager.task_html_update)
# &#x2500;&#x2500; Web-fetching tools &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
mcp.add_tool(task_manager.duckduckgo_html_list_fetch)
mcp.add_tool(task_manager.html_fetch)
# &#x2500;&#x2500; Enhanced code_drop tool (now packages everything) &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
mcp.add_tool(task_manager.code_drop)
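
# -- Illustrative sketch (not part of the original script) ---------------------
# The CORS block below allows every origin. One way to narrow that without
# editing the code for each deployment is to read a comma-separated allow-list
# from an environment variable. The variable name MCP_ALLOWED_ORIGINS and the
# helper parse_allowed_origins() are assumptions for this example; the result
# could be passed as allow_origins=ALLOWED_ORIGINS in place of ["*"].
import os


def parse_allowed_origins(raw, default=("*",)):
    """Split a comma-separated origin list, falling back to `default`."""
    origins = [item.strip().rstrip("/") for item in (raw or "").split(",")]
    origins = [item for item in origins if item]
    return origins or list(default)


ALLOWED_ORIGINS = parse_allowed_origins(os.environ.get("MCP_ALLOWED_ORIGINS"))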

# &#x2500;&#x2500; Server Startup with CORS and static downloads route &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
if __name__ == &quot;__main__&quot;:
    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=[&quot;*&quot;],
            allow_credentials=True,
            allow_methods=[&quot;GET&quot;, &quot;POST&quot;, &quot;OPTIONS&quot;],
            allow_headers=[&quot;*&quot;],
            expose_headers=[&quot;*&quot;],
        )
    ]

    app = mcp.http_app(
        path=&quot;/mcp&quot;,
        middleware=middleware
    )

    app.mount(&quot;/downloads&quot;, StaticFiles(directory=str(DOWNLOADS_DIR)), name=&quot;downloads&quot;)

    uvicorn.run(
        app,
        host=&quot;0.0.0.0&quot;,
        port=5008,
        log_level=&quot;info&quot;
    )</code></pre><ul><li>We stayed on port 5008, the same port as in the last lesson, since this is really just an add-on. &#xA0;We&apos;ll post the full guide on how to dockerize it at the end of this article. &#xA0;For now - how does it work? Well - Good!</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-94.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop." loading="lazy" width="645" height="421" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-94.png 600w, https://www.hotconfig.com/content/images/2026/04/image-94.png 645w"></figure><ul><li>We can now work on just Task 4, independent of the entire job.</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-95.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop." loading="lazy" width="645" height="145" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-95.png 600w, https://www.hotconfig.com/content/images/2026/04/image-95.png 645w"></figure><h3 id="context-slowdown">Context Slowdown</h3><ul><li>When you start seriously building for production, your LLM will slow down as your context size increases. But this is the beauty of this tool: you can chop your context up into pieces and work on each one. &#xA0;It slowly gave us its code snippets.</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-96.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 8: Process Manager Web Enabled Research Assistant w/ Code Drop."
loading="lazy" width="632" height="348" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-96.png 600w, https://www.hotconfig.com/content/images/2026/04/image-96.png 632w"></figure><h3 id="restart-is-now-harmless">Restart is Now Harmless</h3><ul><li>Recycle your work automatically: you can simply start a new context, delete old ones, or even assign two LLMs to the same task. &#xA0;That&apos;s power!</li></ul><p>We Let an LLM Write the Rest.</p><ul><li>We made sure this Docker build was based on the source code, so that the requirements matched.</li></ul><p><strong>requirements.txt</strong></p><pre><code>fastmcp
requests
beautifulsoup4
uvicorn[standard]
starlette
</code></pre><p><strong>Dockerfile</strong></p><pre><code class="language-dockerfile">FROM python:3.12-slim

WORKDIR /app

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY app.py .

# Ensure the storage directory exists (volume mount will override at runtime)
RUN mkdir -p processes

# Expose the port used by the FastMCP server
EXPOSE 5008

# Start the application
CMD [&quot;python&quot;, &quot;app.py&quot;]
</code></pre><p><strong>docker-compose.yml</strong></p><pre><code class="language-yaml">version: &apos;3.9&apos;

services:
  task-manager:
    build: .
    container_name: task-manager-mcp
    ports:
      - &quot;5008:5008&quot;
    volumes:
      - ./processes:/app/processes
    restart: unless-stopped
    environment:
      - PYTHONUNBUFFERED=1
</code></pre><h3 id="breakdown-of-dockerizing-the-application">Breakdown of Dockerizing the Application</h3><p>The provided Python code implements a FastMCP-based process manager server that persists data to JSON files in a <code>processes/</code> directory, exposes an HTTP endpoint on port 5008 with CORS middleware, and depends on several external libraries. Dockerizing this application ensures consistent runtime environments, isolates dependencies, facilitates reproducible deployments, and simplifies scaling or distribution.</p><p>The proposed solution consists of three standard files and assumes the supplied code is saved as <code>app.py</code> in the project root. The directory structure would then be:</p><pre><code>.
&#x251C;&#x2500;&#x2500; app.py          # Your complete Python script
&#x251C;&#x2500;&#x2500; requirements.txt
&#x251C;&#x2500;&#x2500; Dockerfile
&#x251C;&#x2500;&#x2500; docker-compose.yml
&#x2514;&#x2500;&#x2500; processes/      # Created automatically; persisted via volume
</code></pre><p><strong>requirements.txt</strong><br>This file explicitly declares all non-standard-library dependencies extracted from the imports (<code>fastmcp</code>, <code>requests</code>, <code>beautifulsoup4</code>, <code>uvicorn</code>, and <code>starlette</code>). The <code>[standard]</code> extra for <code>uvicorn</code> pulls in optional performance and convenience dependencies such as <code>uvloop</code>, <code>httptools</code>, and <code>websockets</code>. Note that the file as shown does not pin versions, so <code>pip install</code> resolves whatever releases are current at build time; pinning each package to an exact version is what guarantees identical environments across builds.</p><p><strong>Dockerfile</strong></p><ul><li>Starts from the lightweight <code>python:3.12-slim</code> base image to minimize image size and attack surface.</li><li>Sets <code>/app</code> as the working directory and installs dependencies first (leveraging Docker layer caching for faster rebuilds).</li><li>Copies <code>app.py</code> and creates the <code>processes/</code> directory.</li><li>Exposes port 5008 to document the service port (the <code>0.0.0.0</code> binding inside the code ensures accessibility from outside the container).</li><li>The <code>CMD</code> executes the script directly, which runs <code>uvicorn</code> internally and starts the FastMCP HTTP application with the registered tools.</li></ul><p><strong>docker-compose.yml</strong><br>This orchestration file defines a single service for straightforward local or production deployment. Key features include:</p><ul><li>Port mapping (<code>5008:5008</code>) so the MCP endpoint is reachable at <code>http://localhost:5008/mcp</code>.</li><li>A bind mount (<code>./processes:/app/processes</code>) that persists all JSON process files to the host filesystem. 
Without this mount, data would be lost whenever the container is removed or recreated, for example on a rebuild.</li><li><code>restart: unless-stopped</code> restarts the container automatically after a crash or host reboot, unless it was stopped manually.</li><li><code>PYTHONUNBUFFERED=1</code> ensures log output appears immediately in <code>docker logs</code> or the <code>docker compose</code> console.</li></ul><p><strong>Usage Instructions</strong></p><ol><li>Save the provided code as <code>app.py</code> in the project root.</li><li>Create the three files listed above.</li><li>(Optional) Create an empty <code>processes/</code> directory on the host.</li><li>Build and start the container:</li></ol><pre><code>docker compose up --build -d
</code></pre><ol start="5"><li>Verify the service:</li></ol><pre><code>docker logs task-manager-mcp
</code></pre><p>The server should bind to <code>0.0.0.0:5008</code> and display startup messages from Uvicorn.</p><ol start="6"><li>Stop/restart: <code>docker compose down</code> or <code>docker compose restart</code>.</li></ol><p><strong>Additional Considerations</strong></p><ul><li><strong>Persistence</strong>: The <code>TaskManager</code> class writes to disk; the volume mount is essential for production use.</li><li><strong>Security</strong>: CORS is configured to allow all origins (as in the original code). In production, restrict <code>allow_origins</code> to specific domains.</li><li><strong>Scaling</strong>: The current setup is single-container, and <code>TaskManager</code> keeps the active process in memory; horizontal scaling would need a reverse proxy plus shared storage.</li><li><strong>Image size and performance</strong>: The slim base and <code>--no-cache-dir</code> keep the final image compact. No additional system packages are required because all dependencies are pure-Python or have minimal C extensions handled by the base image.</li><li><strong>Updates</strong>: To change dependencies or the application code, simply edit the respective files and re-run <code>docker compose up --build</code>.</li></ul><p>This configuration provides a complete Docker setup that faithfully reproduces the original application behavior while adding portability and data persistence. 
</p>]]></content:encoded></item><item><title><![CDATA[Agentic Server Primer: Llama.cpp MCP Lesson 7: Process Manager (Part 1)]]></title><description><![CDATA[Agentic Server Primer: Llama.cpp MCP Lesson 7: Process Manager (Part 1)]]></description><link>https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-7-process-orchestrator/</link><guid isPermaLink="false">69df3666dc07930001450752</guid><category><![CDATA[agentic server]]></category><category><![CDATA[LLM]]></category><category><![CDATA[AI]]></category><category><![CDATA[MCP Server]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Wed, 15 Apr 2026 07:04:43 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/mcp_07.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/04/mcp_07.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 7: Process Manager (Part 1)"><p>Here is where it gets very interesting. &#xA0;A process manager or harness can distribute jobs, but first we want to give your LLM its own &apos;notepad.&apos; Effectively, a simple hierarchy would be:</p><p>If you just need to pull and run this MCP server, it can easily be done with:</p><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-process-manager:latest
docker run -d --name mcp-process-manager --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5008:5008 cnmcdee/mcp-process-manager:latest</code></pre><ul><li>Processes are the masters (one per <code>.json</code> file).</li><li>Tasks can exist inside them.</li><li>Jobs / code snippets can exist inside each task.</li><li>We want to make a front-end manager for this in Part 2; for now this will be a functional back-end!</li></ul><p>Here is your code, which has been through multiple refactors / audits and should be ready.</p><pre><code class="language-python">import re
import json
import os
from datetime import datetime
from fastmcp import FastMCP
from fastmcp.tools import tool          # &#x2190; Required import
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
import uvicorn

# Initialize the MCP server
mcp = FastMCP(
    name=&quot;Process Manager&quot;,
    instructions=&quot;Provides a process manager for tracking tasks and their associated jobs along with their associated code blocks&quot;)
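
# -- Illustrative sketch (not part of the original script) ---------------------
# The hierarchy described above (process, then tasks, then jobs / code) maps to
# one nested dict per JSON file. EXAMPLE_PROCESS mirrors the keys that
# process_new() and task_new() write further down; the names, descriptions and
# timestamps are made up for illustration, and the layout inside "jobs" and
# "code" is deliberately left empty here.
EXAMPLE_PROCESS = {
    "process_name": "demo_process",
    "description": "Example only",
    "created_at": "2026-04-15T07:04:43",
    "tasks": {
        "Task 1": {
            "description": "First task",
            "created_at": "2026-04-15T07:05:00",
            "jobs": {},
            "code": {},
        }
    },
}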

class TaskManager:
    def __init__(self, storage_dir: str = &quot;processes&quot;):
        self.storage_dir = storage_dir
        os.makedirs(self.storage_dir, exist_ok=True)
        self.current_process = None          # Full process dict (metadata + tasks)
        self.current_process_file = None     # Base name of the active .json file
    def _sanitize_name(self, name: str) -&gt; str:
        &quot;&quot;&quot;Sanitize user-supplied name to prevent path traversal and invalid filenames.&quot;&quot;&quot;
        if not name or not str(name).strip():
            name = &quot;unnamed_process&quot;
        safe_name = re.sub(r&apos;[^a-zA-Z0-9_.-]&apos;, &apos;_&apos;, str(name).strip())
        return safe_name[:150]
    def _get_filepath(self, name: str) -&gt; str:
        &quot;&quot;&quot;Return full path to the sanitized JSON file.&quot;&quot;&quot;
        safe_name = self._sanitize_name(name)
        if not safe_name.endswith(&apos;.json&apos;):
            safe_name += &apos;.json&apos;
        return os.path.join(self.storage_dir, safe_name)
    def _save_current(self):
        &quot;&quot;&quot;Persist the current process to disk.&quot;&quot;&quot;
        if self.current_process and self.current_process_file:
            filepath = self._get_filepath(self.current_process_file)
            with open(filepath, &apos;w&apos;, encoding=&apos;utf-8&apos;) as f:
                json.dump(self.current_process, f, indent=2, ensure_ascii=False)
    def _load_process(self, name: str) -&gt; bool:
        &quot;&quot;&quot;Load a process from disk and update internal state.&quot;&quot;&quot;
        filepath = self._get_filepath(name)
        if not os.path.exists(filepath):
            return False
        try:
            with open(filepath, &apos;r&apos;, encoding=&apos;utf-8&apos;) as f:
                self.current_process = json.load(f)
            self.current_process_file = name
            return True
        except Exception:
            return False
    @tool()
    def process_new(self, name: str, description: str):
        &quot;&quot;&quot;Creates a new process with the given name and description.&quot;&quot;&quot;
        try:
            name = str(name).strip()
            if not name:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;Process name cannot be empty.&quot;})

            filepath = self._get_filepath(name)
            if os.path.exists(filepath):
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Process &apos;{name}&apos; already exists.&quot;})

            self.current_process = {
                &quot;process_name&quot;: name,
                &quot;description&quot;: description,
                &quot;created_at&quot;: datetime.now().isoformat(),
                &quot;tasks&quot;: {}
            }
            self.current_process_file = name
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Process &apos;{name}&apos; created successfully.&quot;,
                &quot;process&quot;: self.current_process
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def process_load(self, name: str):
        &quot;&quot;&quot;Loads an existing process by name.&quot;&quot;&quot;
        try:
            if self._load_process(name):
                task_count = len(self.current_process.get(&quot;tasks&quot;, {}))
                return json.dumps({
                    &quot;success&quot;: True,
                    &quot;message&quot;: f&quot;Process &apos;{name}&apos; loaded successfully.&quot;,
                    &quot;process_name&quot;: self.current_process.get(&quot;process_name&quot;),
                    &quot;task_count&quot;: task_count,
                    &quot;process_data&quot;: self.current_process
                })
            return json.dumps({
                &quot;success&quot;: False,
                &quot;error&quot;: f&quot;Process &apos;{name}&apos; not found.&quot;
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def process_list_all(self):
        &quot;&quot;&quot;Lists all saved processes.&quot;&quot;&quot;
        try:
            files = [f for f in os.listdir(self.storage_dir) if f.endswith(&apos;.json&apos;)]
            process_list = [{&quot;process&quot;: f.replace(&apos;.json&apos;, &apos;&apos;)} for f in files]
            return json.dumps({
                &quot;success&quot;: True,
                &quot;process_list&quot;: process_list,
                &quot;total&quot;: len(process_list)
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e), &quot;process_list&quot;: []})
    @tool()
    def process_save(self, name: str = None):
        &quot;&quot;&quot;Saves the current process, optionally under a new name (any file saved under the old name is left in place).&quot;&quot;&quot;
        try:
            if not self.current_process:
                return json.dumps({
                    &quot;success&quot;: False,
                    &quot;error&quot;: &quot;No active process to save. Create or load a process first.&quot;
                })

            if name:
                name = str(name).strip()
                self.current_process[&quot;process_name&quot;] = name
                self.current_process_file = name

            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Process saved successfully as &apos;{self.current_process_file}&apos;.&quot;,
                &quot;process_name&quot;: self.current_process.get(&quot;process_name&quot;)
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def process_delete(self, name: str):
        &quot;&quot;&quot;Deletes an entire process and its associated file from storage.
        If the deleted process is currently loaded, the in-memory state is cleared.&quot;&quot;&quot;
        try:
            name = str(name).strip()
            if not name:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;Process name cannot be empty.&quot;})

            filepath = self._get_filepath(name)
            if not os.path.exists(filepath):
                return json.dumps({
                    &quot;success&quot;: False,
                    &quot;error&quot;: f&quot;Process &apos;{name}&apos; not found.&quot;
                })

            # Delete the file
            os.remove(filepath)

            # If this was the currently loaded process, clear internal state
            if (self.current_process_file and
                    self._sanitize_name(self.current_process_file) == self._sanitize_name(name)):
                self.current_process = None
                self.current_process_file = None

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Process &apos;{name}&apos; deleted successfully.&quot;,
                &quot;deleted_process&quot;: name
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_new(self, task_name: str, task_description: str):
        &quot;&quot;&quot;Creates a new task within the current process.&quot;&quot;&quot;
        try:
            if not self.current_process:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;No active process. Create or load one first.&quot;})
            if task_name in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; already exists.&quot;})

            self.current_process[&quot;tasks&quot;][task_name] = {
                &quot;description&quot;: task_description,
                &quot;created_at&quot;: datetime.now().isoformat(),
                &quot;jobs&quot;: {},
                &quot;code&quot;: {}
            }
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Task &apos;{task_name}&apos; created successfully.&quot;,
                &quot;task&quot;: self.current_process[&quot;tasks&quot;][task_name]
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_list(self):
        &quot;&quot;&quot;Lists all tasks with summary information.&quot;&quot;&quot;
        try:
            if not self.current_process:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;No active process.&quot;})

            tasks_summary = {}
            for t_name, t_data in self.current_process[&quot;tasks&quot;].items():
                tasks_summary[t_name] = {
                    &quot;description&quot;: t_data[&quot;description&quot;],
                    &quot;created_at&quot;: t_data[&quot;created_at&quot;],
                    &quot;job_count&quot;: len(t_data.get(&quot;jobs&quot;, {})),
                    &quot;code_count&quot;: len(t_data.get(&quot;code&quot;, {}))
                }

            return json.dumps({
                &quot;success&quot;: True,
                &quot;task_count&quot;: len(tasks_summary),
                &quot;tasks&quot;: tasks_summary
            }, indent=2)
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_details_get(self, task_name: str):
        &quot;&quot;&quot;Retrieves complete details of a specific task.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            task_data = self.current_process[&quot;tasks&quot;][task_name]
            return json.dumps({
                &quot;success&quot;: True,
                &quot;task_name&quot;: task_name,
                &quot;task_description&quot;: task_data[&quot;description&quot;],
                &quot;created_at&quot;: task_data[&quot;created_at&quot;],
                &quot;jobs&quot;: list(task_data.get(&quot;jobs&quot;, {}).values()),
                &quot;codes&quot;: list(task_data.get(&quot;code&quot;, {}).values()),
                &quot;job_count&quot;: len(task_data.get(&quot;jobs&quot;, {})),
                &quot;code_count&quot;: len(task_data.get(&quot;code&quot;, {}))
            }, indent=2)
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_update_description(self, task_name: str, new_description: str):
        &quot;&quot;&quot;Updates the description of an existing task.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            old_description = self.current_process[&quot;tasks&quot;][task_name][&quot;description&quot;]
            self.current_process[&quot;tasks&quot;][task_name][&quot;description&quot;] = new_description
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Task &apos;{task_name}&apos; description updated successfully.&quot;,
                &quot;task_name&quot;: task_name,
                &quot;old_description&quot;: old_description,
                &quot;new_description&quot;: new_description
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_name_change(self, old_task_name: str, new_task_name: str):
        &quot;&quot;&quot;Renames an existing task.&quot;&quot;&quot;
        try:
            if not self.current_process:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: &quot;No active process.&quot;})
            if old_task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{old_task_name}&apos; does not exist.&quot;})
            if new_task_name in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{new_task_name}&apos; already exists.&quot;})

            self.current_process[&quot;tasks&quot;][new_task_name] = self.current_process[&quot;tasks&quot;].pop(old_task_name)
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Task renamed from &apos;{old_task_name}&apos; to &apos;{new_task_name}&apos; successfully.&quot;,
                &quot;old_name&quot;: old_task_name,
                &quot;new_name&quot;: new_task_name
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_delete(self, task_name: str):
        &quot;&quot;&quot;Deletes a task from the current process.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            del self.current_process[&quot;tasks&quot;][task_name]
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Task &apos;{task_name}&apos; deleted successfully.&quot;
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_job_add(self, task_name: str, job_name: str, job_description: str):
        &quot;&quot;&quot;Adds a job to the specified task.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            job_key = f&quot;{job_name}_{datetime.now().isoformat()}&quot;
            self.current_process[&quot;tasks&quot;][task_name][&quot;jobs&quot;][job_key] = {
                &quot;name&quot;: job_name,
                &quot;description&quot;: job_description,
                &quot;created_at&quot;: datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Job &apos;{job_name}&apos; added to task &apos;{task_name}&apos; successfully.&quot;
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_job_get(self, task_name: str, job_name: str):
        &quot;&quot;&quot;Retrieves job(s) matching the given job_name (partial match).&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            jobs = self.current_process[&quot;tasks&quot;][task_name].get(&quot;jobs&quot;, {})
            matching = [data for key, data in jobs.items() if job_name in key or job_name == data.get(&quot;name&quot;)]

            return json.dumps({
                &quot;success&quot;: True,
                &quot;task_name&quot;: task_name,
                &quot;jobs&quot;: matching
            }, indent=2)
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_code_add(self, task_name: str, code_name: str, code_content: str):
        &quot;&quot;&quot;Adds a code snippet to the specified task.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            code_key = f&quot;{code_name}_{datetime.now().isoformat()}&quot;
            self.current_process[&quot;tasks&quot;][task_name][&quot;code&quot;][code_key] = {
                &quot;name&quot;: code_name,
                &quot;content&quot;: code_content,
                &quot;created_at&quot;: datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Code &apos;{code_name}&apos; added to task &apos;{task_name}&apos; successfully.&quot;
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_code_get(self, task_name: str, code_name: str):
        &quot;&quot;&quot;Retrieves code snippet(s) matching the given code_name (partial match).&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            codes = self.current_process[&quot;tasks&quot;][task_name].get(&quot;code&quot;, {})
            matching = [data for key, data in codes.items() if code_name in key or code_name == data.get(&quot;name&quot;)]

            return json.dumps({
                &quot;success&quot;: True,
                &quot;task_name&quot;: task_name,
                &quot;codes&quot;: matching
            }, indent=2)
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_code_delete(self, task_name: str, code_name: str):
        &quot;&quot;&quot;Deletes code snippet(s) matching the given code_name (partial match).&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            codes = self.current_process[&quot;tasks&quot;][task_name].get(&quot;code&quot;, {})
            keys_to_delete = [k for k in codes if code_name in k or code_name == codes[k].get(&quot;name&quot;)]
            deleted = [codes[k] for k in keys_to_delete]

            for k in keys_to_delete:
                del self.current_process[&quot;tasks&quot;][task_name][&quot;code&quot;][k]

            self._save_current()

            if not deleted:
                return json.dumps({
                    &quot;success&quot;: False,
                    &quot;error&quot;: f&quot;No code matching &apos;{code_name}&apos; found in task &apos;{task_name}&apos;.&quot;,
                    &quot;task_name&quot;: task_name,
                    &quot;code_name&quot;: code_name
                })

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Deleted {len(deleted)} code snippet(s) matching &apos;{code_name}&apos;.&quot;,
                &quot;task_name&quot;: task_name,
                &quot;deleted_count&quot;: len(deleted),
                &quot;deleted_codes&quot;: deleted
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})
    @tool()
    def task_jobs_list(self):
        &quot;&quot;&quot;Lists all tasks (identical to task_list for compatibility).&quot;&quot;&quot;
        return self.task_list()
    @tool()
    def task_code_update(self, task_name: str, code_name: str, new_content: str):
        &quot;&quot;&quot;Updates (by creating a new version of) a code snippet.&quot;&quot;&quot;
        try:
            if not self.current_process or task_name not in self.current_process[&quot;tasks&quot;]:
                return json.dumps({&quot;success&quot;: False, &quot;error&quot;: f&quot;Task &apos;{task_name}&apos; does not exist.&quot;})

            code_key = f&quot;{code_name}_{datetime.now().isoformat()}&quot;
            self.current_process[&quot;tasks&quot;][task_name][&quot;code&quot;][code_key] = {
                &quot;name&quot;: code_name,
                &quot;content&quot;: new_content,
                &quot;created_at&quot;: datetime.now().isoformat()
            }
            self._save_current()

            return json.dumps({
                &quot;success&quot;: True,
                &quot;message&quot;: f&quot;Code &apos;{code_name}&apos; updated successfully in task &apos;{task_name}&apos;.&quot;,
                &quot;new_version_created&quot;: True
            })
        except Exception as e:
            return json.dumps({&quot;success&quot;: False, &quot;error&quot;: str(e)})


task_manager = TaskManager()
# &#x2500;&#x2500; Register all bound methods as tools (required step) &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
mcp.add_tool(task_manager.process_new)
mcp.add_tool(task_manager.process_load)
mcp.add_tool(task_manager.process_list_all)
mcp.add_tool(task_manager.process_save)
mcp.add_tool(task_manager.process_delete)
mcp.add_tool(task_manager.task_new)
mcp.add_tool(task_manager.task_update_description)
mcp.add_tool(task_manager.task_delete)
mcp.add_tool(task_manager.task_name_change)
mcp.add_tool(task_manager.task_job_add)
mcp.add_tool(task_manager.task_job_get)
mcp.add_tool(task_manager.task_details_get)
mcp.add_tool(task_manager.task_list)
mcp.add_tool(task_manager.task_code_add)
mcp.add_tool(task_manager.task_code_get)
mcp.add_tool(task_manager.task_code_delete)
mcp.add_tool(task_manager.task_jobs_list)
mcp.add_tool(task_manager.task_code_update)

# &#x2500;&#x2500; Server Startup with CORS (required for llama.cpp frontend) &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
if __name__ == &quot;__main__&quot;:
    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=[&quot;*&quot;],  # Restrict in production
            allow_credentials=True,
            allow_methods=[&quot;GET&quot;, &quot;POST&quot;, &quot;OPTIONS&quot;],
            allow_headers=[&quot;*&quot;],
            expose_headers=[&quot;*&quot;],
        )
    ]

    app = mcp.http_app(
        path=&quot;/mcp&quot;,
        middleware=middleware
    )

    uvicorn.run(
        app,
        host=&quot;0.0.0.0&quot;,
        port=5008,
        log_level=&quot;info&quot;
    )</code></pre><p>Once it is up, it listens on port 5008. You can add it to your Llama server like this:</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-91.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 7: Process Manager (Part 1)" loading="lazy" width="478" height="208"></figure><p>Once added, it looks like this:</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-92.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 7: Process Manager (Part 1)" loading="lazy" width="488" height="228"></figure><p>Testing. Interestingly, the testing cycles are no longer really done by you; they are done by your LLM. One important point:</p><ul><li>Each LLM may like its agentic tools structured differently. The best way is to simply paste the MCP code into the LLM and ask it to audit the code, the docstrings, or both.</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-93.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 7: Process Manager (Part 1)" loading="lazy" width="645" height="357" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-93.png 600w, https://www.hotconfig.com/content/images/2026/04/image-93.png 645w"></figure><p>Naturally, when this is done you will need to dockerize it and let it live happily on port 5008. A simple walkthrough follows.</p><p><strong>Dockerfile, <code>requirements.txt</code>, and <code>docker-compose.yml</code></strong></p><p>The following files have been prepared specifically for the <code>TaskManager</code> + FastMCP application above. 
They aim for security, reproducibility, and data persistence.</p><p>Save the entire Python listing above as <strong><code>app.py</code></strong> in your project root directory.</p><hr><h3 id="1-requirementstxt">1. <code>requirements.txt</code></h3><pre><code class="language-txt">fastmcp
starlette
uvicorn[standard]
</code></pre><p>This installs:</p><ul><li><code>fastmcp</code> (the core MCP framework)</li><li><code>starlette</code> (explicitly for <code>Middleware</code> and <code>CORSMiddleware</code>)</li><li><code>uvicorn[standard]</code> (ASGI server with recommended extras for production)</li></ul><hr><h3 id="2-dockerfile">2. <code>Dockerfile</code></h3><pre><code class="language-dockerfile"># syntax=docker/dockerfile:1.4
FROM python:3.12-slim AS builder

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

# Install build dependencies
RUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends \
    gcc \
    &amp;&amp; rm -rf /var/lib/apt/lists/*

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip &amp;&amp; \
    pip install --no-cache-dir -r requirements.txt

# Final stage
FROM python:3.12-slim

# Create non-root user for security
RUN useradd --create-home --shell /bin/false appuser

WORKDIR /app

# Copy installed packages from builder
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin

# Copy application code
COPY app.py .

# Create persistent storage directory and set permissions
RUN mkdir -p /app/processes &amp;&amp; \
    chown -R appuser:appuser /app

# Switch to non-root user
USER appuser

EXPOSE 5008

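# Health check (optional but recommended). The slim image ships without curl,
# so this probes the listening port with the Python standard library instead.
HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
    CMD python -c &quot;import socket; socket.create_connection((&apos;localhost&apos;, 5008), timeout=3).close()&quot; || exit 1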

CMD [&quot;python&quot;, &quot;app.py&quot;]
</code></pre><hr><h3 id="3-docker-composeyml">3. <code>docker-compose.yml</code></h3><pre><code class="language-yaml">version: &apos;3.9&apos;

services:
  process-manager:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: process-manager
    restart: unless-stopped
    ports:
      - &quot;5008:5008&quot;
    volumes:
      # Persistent storage for process JSON files
      - process-data:/app/processes
    environment:
      - PYTHONUNBUFFERED=1
    # Optional: limit resources in production
    # deploy:
    #   resources:
    #     limits:
    #       cpus: &apos;1.0&apos;
    #       memory: 512M

volumes:
  process-data:
    driver: local
    name: process-manager-data
</code></pre><hr><p><strong>Deployment Instructions</strong></p><ol><li>Place the three files and <code>app.py</code> in the same directory.</li><li>Build and start the service:</li></ol><pre><code class="language-bash">docker compose up -d --build
</code></pre><ol start="3"><li>Verify the server is running:</li></ol><pre><code class="language-bash">docker compose logs -f process-manager
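# Optional extra check (a sketch, assuming the service name from the compose file above):
# show the container state and its 5008 port mapping. Run it after Ctrl-C ends the log tail.
docker compose ps process-manager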
</code></pre><p>The MCP endpoint will be available at <code>http://localhost:5008/mcp</code>.</p><p><strong>Key Benefits of This Setup</strong></p><ul><li>Secure (non-root user, minimal base image)</li><li>Persistent data (the <code>processes/</code> directory survives container restarts)</li><li>Fast rebuilds (multi-stage build)</li><li>Production-ready (healthcheck, restart policy, resource isolation)</li></ul><p>With the process manager attached, try giving your LLM a prompt like:</p><pre><code class="language-bash">&quot;We need a front end for this process manager write the task and go through each task writing all the code.&quot;</code></pre><p>How far do you think it will get?</p>]]></content:encoded></item><item><title><![CDATA[Agentic Server Primer: Llama.cpp MCP Lesson 6: Adding mysql Database Docker Toolsets.]]></title><description><![CDATA[We give our LLM its own database to play with!]]></description><link>https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-5-adding-javascript-via-a-python-api-plugin/</link><guid isPermaLink="false">69dc0fe9dc0793000145067c</guid><category><![CDATA[docker]]></category><category><![CDATA[agentic]]></category><category><![CDATA[agentic server]]></category><category><![CDATA[Llama.cpp]]></category><category><![CDATA[mysql]]></category><category><![CDATA[database]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Sun, 12 Apr 2026 23:13:33 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/mcp_06.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/04/mcp_06.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 6: Adding mysql Database Docker Toolsets."><p>OK, fun time. 
Today we are going to give our House LLM access to its own MySQL database connection and see how it handles it. Of course, <em>give your head a shake if you ever do this in production. Test and verify, THEN move to whatever production you are running.</em></p><p>If you just need to pull and run this container:</p><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-mysql:latest
docker run -d --name mcp-mysql --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5004:5004 cnmcdee/mcp-mysql:latest</code></pre><h3 id="the-process">The Process.</h3><ul><li>A. Make a Docker MySQL container (port 5004). We deliberately did <em>not</em> use 3306 in case you already have something important running there.</li><li>B. Make an agentic Python MCP server tool API that listens on port 5005. When it receives an MCP request, it passes it through to the MySQL server (port 5004) and sends the result back.</li><li>C. Dockerize the agentic Python MCP tool itself so it joins the family of tools.</li></ul><h3 id="a-docker-mysql-container">A. Docker MySQL Container</h3><ul><li>These are pretty straightforward to make. Because MySQL is so common, we can go straight to a docker-compose.yml and stand it up.</li></ul><p>Create yourself a directory, create a <strong>docker-compose.yml</strong> in it, and put in:</p><pre><code class="language-YAML">version: &apos;3.8&apos;

services:
  mysql:
    image: mysql:8.0
    container_name: mysql-server
    restart: unless-stopped
    command: --default-authentication-plugin=mysql_native_password   # &#x2190; forces compatible hash type
    environment:
      MYSQL_ROOT_PASSWORD: YourStrongPasswordHere123!   # &#x2190; use your actual password
    ports:
      - &quot;5004:3306&quot;
    volumes:
      - mysql_data:/var/lib/mysql

volumes:
  mysql_data:</code></pre><p>Stand it up easily with:</p><pre><code class="language-bash">docker compose up      # Foreground: watch the logs, Ctrl-C stops it
docker compose up -d   # Detached: runs in the background as a daemon</code></pre><h3 id="troubleshooting-mysql-docker-connections">Troubleshooting MySQL Docker Connections</h3><ul><li>It is very common to have trouble connecting to your MySQL container at this point, so we will take a small detour into troubleshooting.</li></ul><p>Inspect it and make sure the external port is bound to the internal port, as in:</p><pre><code class="language-bash">docker ps -a
# gives something like
33060/tcp, 0.0.0.0:5004-&gt;3306/tcp, [::]:5004-&gt;3306/tcp   mysql-server</code></pre><p>Remember that a Docker container typically gets an address on a private 172.x subnet, and its port may not be bound to the machine&apos;s external interface, so you can check with <strong>nmap</strong>:</p><pre><code class="language-bash">sudo apt install nmap
nmap &lt;your ip&gt;
# Example output from our machine:
Nmap scan report for dragon-192-168-1-3 (192.168.1.3)
Host is up (0.00017s latency).
Not shown: 992 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
3306/tcp open  mysql
5000/tcp open  upnp
5001/tcp open  commplex-link
5002/tcp open  rfe
5003/tcp open  filemaker
5004/tcp open  avt-profile-1
8080/tcp open  http-proxy</code></pre><p>You can see that port 5004 is listening, but it can trick you! Without <code>-h</code>, the mysql client on Unix-like systems defaults to the local Unix socket and ignores <code>-P</code>, so it never reaches the container:</p><pre><code class="language-bash">
mysql -u root -p -P 5004                 # can fail: uses the local socket, the port is ignored
mysql -u root -p -h 192.168.1.3 -P 5004  # explicit host forces TCP and will pass</code></pre><p>If all that fails, you can &apos;jump into the container&apos; itself with:</p><pre><code class="language-bash">docker exec -it mysql-server /bin/bash</code></pre><p>At the container&apos;s prompt you are back to a local mysql connection:</p><pre><code class="language-bash">bash-5.1# mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 8.0.43 MySQL Community Server - GPL

Copyright (c) 2000, 2025, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type &apos;help;&apos; or &apos;\h&apos; for help. Type &apos;\c&apos; to clear the current input statement.

mysql&gt;</code></pre><p>Note that you will often get the <strong>Enter password:</strong> prompt even when you are connected to nothing. It would be nice if mysql did not do that. Anyway.</p><h3 id="b-agentic-python-mcp-server">B. Agentic Python MCP Server:</h3><ul><li>We want it to &apos;auto-connect&apos; at startup and, once that works, offer up the MCP server port, which we will then bolt onto our LLM!</li><li>You may need to set the password: the <code>password</code> variable in the code has to match the <code>MYSQL_ROOT_PASSWORD</code> from the compose file.</li><li>We test the Python app, then we Dockerize it.</li></ul><h3 id="the-code">The Code</h3><pre><code class="language-python">import time
import mysql.connector
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
from fastmcp import FastMCP
import uvicorn

# &#x2500;&#x2500; MySQL Configuration &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
MYSQL_CONFIG = {
    &quot;host&quot;: &quot;127.0.0.1&quot;,
    &quot;port&quot;: 5004,
    &quot;user&quot;: &quot;root&quot;,
    &quot;charset&quot;: &quot;utf8mb4&quot;,
    &quot;use_pure&quot;: True,
}

# Must match MYSQL_ROOT_PASSWORD in the docker-compose.yml of the MySQL container
password = &apos;database_password&apos;
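
# Optional sketch (not in the original listing): let an environment variable override
# the hard-coded password above. Reusing the name MYSQL_ROOT_PASSWORD here is an
# assumption; export it in the shell (or container) that runs this script.
import os
password = os.environ.get(&quot;MYSQL_ROOT_PASSWORD&quot;, password)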

# Global persistent connection
db_conn = None

def test_mysql_connection(max_retries=10, delay=3):
    &quot;&quot;&quot;Establish the persistent MySQL connection at startup.&quot;&quot;&quot;
    global db_conn
    

    print(f&quot;Testing MySQL connection to {MYSQL_CONFIG[&apos;host&apos;]}:{MYSQL_CONFIG[&apos;port&apos;]}...&quot;)

    for attempt in range(1, max_retries + 1):
        try:
            db_conn = mysql.connector.connect(
                **MYSQL_CONFIG,
                password=password,
                connect_timeout=8,
            )
            print(&quot;&#x2713; Successfully connected to MySQL (persistent connection established).&quot;)
            return True
        except Exception as e:
            print(f&quot;Attempt {attempt}/{max_retries} failed: {e}&quot;)
            if attempt &lt; max_retries:
                time.sleep(delay)

    print(&quot;&#x26A0;&#xFE0F; MySQL connection test failed. Server will still start; tool will attempt reconnect on first use.&quot;)
    return False

# &#x2500;&#x2500; MCP Server Instance &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
mcp = FastMCP(&quot;mysql_query&quot;)

@mcp.tool()
def run_mysql_query(sql: str, database: str = None):
    &quot;&quot;&quot;
    Straight pipe to MySQL. Python acts purely as a carrier.

    Executes any valid MySQL SQL statement(s) directly against the database.
    Multiple statements separated by semicolons are split and run one at a time
    (a semicolon inside a string literal will break the split).
    Results are returned with minimal transformation.
    &quot;&quot;&quot;
    global db_conn

    # Ensure active connection
    if db_conn is None or not db_conn.is_connected():
        try:
            db_conn = mysql.connector.connect(
                **MYSQL_CONFIG,
                password=password,
                connect_timeout=10,
            )
            print(&quot;&#x2713; Reconnected to MySQL successfully in tool call.&quot;)
        except Exception as e:
            return {
                &quot;status&quot;: &quot;error&quot;,
                &quot;message&quot;: f&quot;Failed to connect/reconnect to MySQL: {str(e)}&quot;
            }

    cursor = None
    try:
        # dictionary=True returns rows as dicts; buffered=True fetches each result set eagerly
        cursor = db_conn.cursor(dictionary=True, buffered=True)

        # Switch database if specified
        if database:
            cursor.execute(f&quot;USE `{database}`&quot;)
            db_conn.commit()

        # Execute SQL. Multi-statement input is handled by splitting on semicolons
        # and running each statement separately, rather than relying on the
        # connector&apos;s own multi-statement mode.
        # Caveat: this naive split also cuts at semicolons inside string literals
        # or stored-procedure bodies, so send such statements one per call.
        all_results = []
        has_results = False

        statements = [stmt.strip() for stmt in sql.split(&apos;;&apos;) if stmt.strip()]

        for stmt in statements:
            cursor.execute(stmt)
            if cursor.with_rows:
                rows = cursor.fetchall()
                columns = [desc[0] for desc in cursor.description] if cursor.description else []
                all_results.append({
                    &quot;status&quot;: &quot;success&quot;,
                    &quot;columns&quot;: columns,
                    &quot;rows&quot;: rows,
                    &quot;row_count&quot;: len(rows),
                    &quot;message&quot;: &quot;Query executed successfully&quot;
                })
            else:
                db_conn.commit()
                all_results.append({
                    &quot;status&quot;: &quot;success&quot;,
                    &quot;affected_rows&quot;: cursor.rowcount,
                    &quot;lastrowid&quot;: cursor.lastrowid,
                    &quot;message&quot;: &quot;Query executed successfully&quot;
                })

        if not all_results:
            return {
                &quot;status&quot;: &quot;success&quot;,
                &quot;message&quot;: &quot;No results returned (possibly DDL or empty query).&quot;
            }

        # Return single result or multi-statement wrapper (minimal carrier format)
        if len(all_results) == 1:
            return all_results[0]
        return {
            &quot;multi&quot;: True,
            &quot;results&quot;: all_results,
            &quot;status&quot;: &quot;success&quot;
        }

    except mysql.connector.Error as err:
        error_msg = f&quot;MySQL Error [{err.errno}]: {err.msg}&quot;
        print(error_msg)
        return {
            &quot;status&quot;: &quot;error&quot;,
            &quot;message&quot;: error_msg
        }
    except Exception as e:
        error_msg = f&quot;Unexpected error: {str(e)}&quot;
        print(error_msg)
        return {
            &quot;status&quot;: &quot;error&quot;,
            &quot;message&quot;: error_msg
        }
    finally:
        if cursor:
            cursor.close()

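# -- Optional sketch (not part of the original tool) --
# run_mysql_query splits on every ';', which also cuts semicolons that sit
# inside quoted strings. The hypothetical helper below is one way to split only
# on semicolons outside quotes. It assumes no backslash-escaped quotes and no
# stored-procedure bodies. To try it, call split_sql_statements(sql) in place
# of the sql.split(';') list comprehension above.
def split_sql_statements(sql):
    """Split SQL on semicolons that are outside quotes and backticks."""
    statements = []
    current = []
    open_quote = None  # the quote character we are currently inside, if any
    for ch in sql:
        if open_quote is not None:
            # Inside a quoted region: copy characters until the same quote closes it
            current.append(ch)
            if ch == open_quote:
                open_quote = None
        elif ch in ("'", '"', "`"):
            open_quote = ch
            current.append(ch)
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements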

# &#x2500;&#x2500; Server Startup with CORS &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
if __name__ == &quot;__main__&quot;:
    test_mysql_connection()

    print(&quot;Starting MCP server - straight pipe to MySQL...&quot;)

    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=[&quot;*&quot;],
            allow_credentials=True,
            allow_methods=[&quot;GET&quot;, &quot;POST&quot;, &quot;OPTIONS&quot;],
            allow_headers=[&quot;*&quot;],
            expose_headers=[&quot;*&quot;],
        )
    ]

    app = mcp.http_app(path=&quot;/mcp&quot;, middleware=middleware)
    uvicorn.run(app, host=&quot;0.0.0.0&quot;, port=5005, log_level=&quot;info&quot;)
</code></pre><p>After you spend some time on it, you will find that once it runs, it should look like this:</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-85.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 6: Adding mysql Database Docker Toolsets." loading="lazy" width="705" height="141" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-85.png 600w, https://www.hotconfig.com/content/images/2026/04/image-85.png 705w"></figure><p>Review: </p><ul><li>We have a MySQL Docker container specifically for your pet houseLLM! (Port 5004)</li><li>You have a Python API specifically on Port 5005!</li></ul><p>Shall we give your HouseLLM a new MySQL toy?! </p><p>From your llama-server page (for us it&apos;s <strong>192.168.1.3:8080</strong>), simply select your MCP server on the left. Remember the mapping is de facto <strong>http://192.168.1.3:5005/mcp</strong>: because the LLM talks in JSON, it needs the Python translator in the middle.</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-86.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 6: Adding mysql Database Docker Toolsets." loading="lazy" width="800" height="459" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-86.png 600w, https://www.hotconfig.com/content/images/2026/04/image-86.png 800w" sizes="(min-width: 720px) 720px"></figure><h3 id="tool-naming-is-important-conclusions">Tool Naming is Important / Conclusions</h3><ul><li>The name of the tool should be as close as possible to the function it offers, like so:</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-90.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 6: Adding mysql Database Docker Toolsets." 
loading="lazy" width="553" height="250"></figure><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-88.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 6: Adding mysql Database Docker Toolsets." loading="lazy" width="698" height="118" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-88.png 600w, https://www.hotconfig.com/content/images/2026/04/image-88.png 698w"></figure><h3 id="troubleshooting-mysqlquery-docstrings-matter">Troubleshooting mysql_query (docstrings matter)</h3><ul><li>Here is a debugging suggestion: let the LLM tell you what it sees.</li></ul><pre><code class="language-bash">list all tools</code></pre><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-89.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 6: Adding mysql Database Docker Toolsets." loading="lazy" width="716" height="327" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-89.png 600w, https://www.hotconfig.com/content/images/2026/04/image-89.png 716w"></figure><ul><li>Even though the tool was fully registered in the MCP index, the queries were reaching it, and we had written a <em>detailed docstring</em> as recommended by Grok 4, it was still failing. So we wiped the MCP reference, reinstalled the tool, and had the LLM requery all tools.</li></ul><h3 id="after-that-we-did-straight-piping">After That We Did Straight-Piping</h3><ul><li>We tried straight-piping the connection, and describing it as exactly that in the docstring mattered.</li></ul><h3 id="conclusions">Conclusions:</h3><ul><li>This worked. Ironically, our <em>paid</em> LLM could not solve this, and it was surreal to simply paste the Python API tool back to Qwen3-Code, have it adjust its own API tool, and re-run the tool, after which it would automatically start testing it!</li></ul><h3 id="d-dockerization">D. 
Dockerization</h3><p>Since Qwen-3.5 Code is <em>that good now, we let it finish writing the article.</em></p><p>You&apos;re asking for a breakdown of <strong>all three files</strong> (<code>Dockerfile</code>, <code>requirements.txt</code>, <code>docker-compose.yml</code>) with <strong>commands, guides, and important notes</strong> for each.</p><p>Let me do that. I&apos;ll also correct a few minor issues I noticed in the original Dockerization setup &#x2014; specifically around <code>mysql-connector-python</code> dependencies and environment variable handling.</p><hr><h2 id="%E2%9C%85-1-dockerfile">&#x2705; 1. <code>Dockerfile</code></h2><h3 id="%F0%9F%93%84-file-dockerfile">&#x1F4C4; File: <code>Dockerfile</code></h3><pre><code class="language-dockerfile"># Use official Python slim image (3.11 is stable and lightweight)
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install build dependencies (optional here: mysql-connector-python installs from a prebuilt wheel)
RUN apt-get update &amp;&amp; apt-get install -y --no-install-recommends \
    default-libmysqlclient-dev \
    gcc \
    &amp;&amp; rm -rf /var/lib/apt/lists/*

# Copy and install Python dependencies (better caching)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose the MCP server port
EXPOSE 5005

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1

# Run the server
CMD [&quot;python&quot;, &quot;server.py&quot;]
</code></pre><h3 id="%F0%9F%94%A7-important-notes">&#x1F527; Important Notes:</h3><ul><li><code>default-libmysqlclient-dev</code> and <code>gcc</code> are only needed for drivers that compile against the MySQL client library (such as <code>mysqlclient</code>); <code>mysql-connector-python</code> installs from a prebuilt wheel, so both packages can be dropped to shrink the image.</li><li><code>PYTHONUNBUFFERED=1</code> ensures logs appear in <code>docker logs</code> immediately.</li><li><code>PYTHONDONTWRITEBYTECODE=1</code> prevents <code>.pyc</code> files from being written (cleaner container).</li></ul><hr><h3 id="%F0%9F%9B%A0%EF%B8%8F-commands-to-build-run">&#x1F6E0;&#xFE0F; Commands to Build &amp; Run</h3><p><strong>Build the image:</strong></p><pre><code class="language-bash">docker build -t mysql-mcp-server .
</code></pre><p><strong>Run the container:</strong></p><pre><code class="language-bash">docker run -p 5005:5005 \
  -e MYSQL_HOST=mysql \
  -e MYSQL_PORT=3306 \
  -e MYSQL_USER=root \
  -e MYSQL_PASSWORD=password \
  -e MYSQL_DATABASE=test_db \
  --name mysql-mcp \
  --network mysql-network \
  mysql-mcp-server
</code></pre><p>&#x26A0;&#xFE0F; <strong>Important</strong>: You need to have the MySQL container already running on the same network first.</p><hr><h2 id="%E2%9C%85-2-requirementstxt">&#x2705; 2. <code>requirements.txt</code></h2><h3 id="%F0%9F%93%84-file-requirementstxt">&#x1F4C4; File: <code>requirements.txt</code></h3><pre><code># fastmcp is unpinned on purpose: server.py calls mcp.http_app(), a FastMCP 2.x API
fastmcp
mysql-connector-python==8.0.33
starlette
uvicorn
</code></pre><blockquote>&#x1F50D; <strong>Recommendation</strong>: For production, consider adding version pinning hashes or using <code>pip-compile</code> from <code>pip-tools</code> for reproducible builds.</blockquote><hr><h3 id="%F0%9F%9B%A0%EF%B8%8F-commands">&#x1F6E0;&#xFE0F; Commands</h3><p><strong>Install dependencies locally (for dev/testing):</strong></p><pre><code class="language-bash">pip install -r requirements.txt
</code></pre><p><strong>Generate requirements with hashes (optional, for security):</strong></p><pre><code class="language-bash">pip-compile --generate-hashes requirements.in
</code></pre><hr><h2 id="%E2%9C%85-3-docker-composeyml">&#x2705; 3. <code>docker-compose.yml</code></h2><h3 id="%F0%9F%93%84-file-docker-composeyml">&#x1F4C4; File: <code>docker-compose.yml</code></h3><pre><code class="language-yaml">version: &apos;3.8&apos;

services:
  mysql:
    image: mysql:8.0
    container_name: mysql-db
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: test_db
    ports:
      - &quot;3306:3306&quot;
    volumes:
      - mysql-data:/var/lib/mysql
    restart: unless-stopped
    networks:
      - app-network

  mcp-server:
    build: .
    container_name: mysql-mcp
    ports:
      - &quot;5005:5005&quot;
    environment:
      MYSQL_HOST: mysql
      MYSQL_PORT: 3306
      MYSQL_USER: root
      MYSQL_PASSWORD: password
      MYSQL_DATABASE: test_db
    depends_on:
      - mysql
    restart: unless-stopped
    networks:
      - app-network

networks:
  app-network:
    driver: bridge

volumes:
  mysql-data:
</code></pre><blockquote>&#x1F50D; <strong>Improvement</strong>: I renamed the network to <code>app-network</code> (more generic) and added <code>container_name</code> for easier management.</blockquote><hr><h3 id="%F0%9F%9B%A0%EF%B8%8F-commands-1">&#x1F6E0;&#xFE0F; Commands</h3><p><strong>Build and start all services:</strong></p><pre><code class="language-bash">docker-compose up --build
</code></pre><p><strong>Start services in detached mode (background):</strong></p><pre><code class="language-bash">docker-compose up -d
</code></pre><p><strong>Stop services:</strong></p><pre><code class="language-bash">docker-compose down
</code></pre><p><strong>Stop and remove volumes (&#x26A0;&#xFE0F; deletes MySQL data):</strong></p><pre><code class="language-bash">docker-compose down -v
</code></pre><p><strong>Check logs:</strong></p><pre><code class="language-bash">docker-compose logs -f mcp-server
docker-compose logs -f mysql
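</code></pre><hr><h2 id="%F0%9F%A7%AA-4-testing-usage-guide">&#x1F9EA; 4. Testing &amp; Usage Guide</h2><h3 id="%F0%9F%94%8C-test-the-api">&#x1F50C; Test the API</h3><p>Once services are up, test the MCP endpoint. MCP speaks JSON-RPC 2.0 over HTTP, and a client normally sends an <code>initialize</code> request first (followed by an <code>initialized</code> notification) and reuses the <code>Mcp-Session-Id</code> header it gets back; <code>$MCP_SESSION_ID</code> below stands for that value:</p><pre><code class="language-bash">curl -X POST http://localhost:5005/mcp \
  -H &quot;Content-Type: application/json&quot; \
  -H &quot;Accept: application/json, text/event-stream&quot; \
  -H &quot;Mcp-Session-Id: $MCP_SESSION_ID&quot; \
  -d &apos;{
    &quot;jsonrpc&quot;: &quot;2.0&quot;,
    &quot;id&quot;: 2,
    &quot;method&quot;: &quot;tools/call&quot;,
    &quot;params&quot;: {
      &quot;name&quot;: &quot;run_mysql_query&quot;,
      &quot;arguments&quot;: {&quot;sql&quot;: &quot;SELECT 1 + 1 AS result&quot;}
    }
  }&apos;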
</code></pre><p>&#x2705; Expected tool result (the server wraps this payload in a JSON-RPC response):</p><pre><code class="language-json">{
  &quot;status&quot;: &quot;success&quot;,
  &quot;columns&quot;: [&quot;result&quot;],
  &quot;rows&quot;: [{&quot;result&quot;: 2}],
  &quot;row_count&quot;: 1,
  &quot;message&quot;: &quot;Query executed successfully&quot;
}
</code></pre><h3 id="%F0%9F%93%8A-view-mysql-data">&#x1F4CA; View MySQL Data</h3><p>You can connect a GUI (e.g., MySQL Workbench, DBeaver) to <code>localhost:3306</code> with:</p><ul><li><strong>Host</strong>: <code>localhost</code></li><li><strong>Port</strong>: <code>3306</code></li><li><strong>User</strong>: <code>root</code></li><li><strong>Password</strong>: <code>password</code></li><li><strong>Database</strong>: <code>test_db</code></li></ul><hr><h2 id="%F0%9F%9B%A1%EF%B8%8F-5-security-production-considerations">&#x1F6E1;&#xFE0F; 5. Security &amp; Production Considerations</h2><!--kg-card-begin: html--><table>
<thead>
<tr>
<th>Area</th>
<th>Recommendation</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Passwords</strong></td>
<td>Never hardcode in Dockerfile. Use Docker secrets or <code>.env</code> files.</td>
</tr>
<tr>
<td><strong>Database Access</strong></td>
<td>Restrict <code>root</code> access. Use a dedicated app user with minimal privileges.</td>
</tr>
<tr>
<td><strong>CORS</strong></td>
<td>Avoid <code>allow_origins=[&quot;*&quot;]</code> in production. Specify allowed domains.</td>
</tr>
<tr>
<td><strong>Logging</strong></td>
<td>Add structured logging for better debugging.</td>
</tr>
<tr>
<td><strong>Health Checks</strong></td>
<td>Add health checks for <code>mcp-server</code> and <code>mysql</code>.</td>
</tr>
<tr>
<td><strong>Updates</strong></td>
<td>Pin Python version (e.g., <code>python:3.11.7-slim</code>) for reproducibility.</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>Example <code>.env</code> file for secrets:</p><pre><code class="language-env">MYSQL_ROOT_PASSWORD=supersecret123
MYSQL_DATABASE=test_db
</code></pre><p>Then reference in <code>docker-compose.yml</code> using <code>MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}</code>.</p><hr><p>Let me know if you&apos;d like me to:</p><ul><li>Add a health check to the <code>Dockerfile</code></li><li>Generate a <code>.env.example</code> template</li><li>Include a <code>Makefile</code> for common commands</li><li>Add a README with quickstart instructions</li></ul>]]></content:encoded></item><item><title><![CDATA[Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin.]]></title><description><![CDATA[We go through a full working example of creating your own MCP tools.]]></description><link>https://www.hotconfig.com/agentic-server-primer-3/</link><guid isPermaLink="false">69dc00a2dc079300014505c4</guid><category><![CDATA[javascript]]></category><category><![CDATA[html]]></category><category><![CDATA[python]]></category><category><![CDATA[MCP Server]]></category><category><![CDATA[agentic]]></category><category><![CDATA[agentic server]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Sun, 12 Apr 2026 21:18:02 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/mcp_05.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/04/mcp_05.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin."><p>An agentic LLM is simply your house LLM given the cool tools to do its work. &#xA0;Instead of relying strictly on its own internal knowledge, it can actually go out and verify its work. &#xA0;We started with a simple <a href="https://www.hotconfig.com/agentic-server-primers-mcp-lesson-1-a-calculator/">calculator</a> for math, then we studied how to <a href="https://www.hotconfig.com/agentic-server-primer-llama-cpp-mcp-lesson-2-dockerization/">dockerize</a> it. 
&#xA0;After that we added a <a href="https://www.hotconfig.com/agentic-server-primer-2/">python tool</a>, a <a href="https://www.hotconfig.com/agentic-server-primer/">weather</a> tool, and today we will be adding a JavaScript tool!</p><p>If you simply need to pull and run this Docker image, it is available via:</p><pre><code class="language-bash">docker pull docker.io/cnmcdee/mcp-javascript:latest
docker run -d --name mcp-javascript --restart unless-stopped -e &quot;FLASH_ENV=production&quot; -p 0.0.0.0:5003:5003 cnmcdee/mcp-javascript:latest</code></pre><p>Sounds complex? It&apos;s not. Here is the breakdown:</p><ul><li>The LLM is informed of the tool&apos;s availability via the llama.cpp plugin.</li><li>It is issued a prompt and is welcome to use the tool on port 5003.</li><li>A Python API container is listening on that port and is basically a &apos;middle-man&apos; for simplicity&apos;s sake.</li><li>It receives a JSON string, parses it, calls Node.js, and runs the submitted code. The stdout, stderr and return code are handed back to the LLM so it knows what to do next!</li></ul><p><strong>A diagram</strong></p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/mcp_4.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin." loading="lazy" width="832" height="1248" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/mcp_4.png 600w, https://www.hotconfig.com/content/images/2026/04/mcp_4.png 832w" sizes="(min-width: 720px) 720px"></figure><ul><li>The LLM is informed that the MCP server is available at the endpoint:</li></ul><pre><code class="language-bash">192.168.1.3:5003/mcp</code></pre><p>It has a lot of moving parts, but code-wise it&apos;s pretty simple. The entire script is only 108 lines:</p><pre><code class="language-python">import os
import subprocess
import tempfile

from fastmcp import FastMCP
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
import uvicorn

# Initialize the MCP server
mcp = FastMCP(
    name=&quot;JavaScript Program Tester&quot;,
    instructions=(
        &quot;Provides a tool for executing and testing JavaScript programs &quot;
        &quot;in a Node.js runtime. Supports console output, error capture, &quot;
        &quot;and timeout handling. Ideal for program validation and debugging.&quot;
    )
)


@mcp.tool()
def test_javascript_program(code, timeout_seconds=10):
    &quot;&quot;&quot;
    Execute a JavaScript program using Node.js and return structured results.

    Parameters:
        code: The complete JavaScript code to execute (use console.log for output).
        timeout_seconds: Maximum execution time (default: 10 seconds).

    Returns:
        A dictionary containing success status, stdout, stderr, return code, and a summary message.
    &quot;&quot;&quot;
    # Create temporary JS file (more reliable than stdin for complex scripts)
    with tempfile.NamedTemporaryFile(
            suffix=&quot;.js&quot;, delete=False, mode=&quot;w&quot;, encoding=&quot;utf-8&quot;
    ) as f:
        f.write(code)
        temp_path = f.name

    try:
        result = subprocess.run(
            [&quot;node&quot;, temp_path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=False
        )

        return {
            &quot;success&quot;: result.returncode == 0,
            &quot;stdout&quot;: result.stdout.strip(),
            &quot;stderr&quot;: result.stderr.strip(),
            &quot;return_code&quot;: result.returncode,
            &quot;message&quot;: (
                &quot;JavaScript program executed successfully.&quot;
                if result.returncode == 0
                else f&quot;JavaScript program exited with code {result.returncode}.&quot;
            ),
        }

    except subprocess.TimeoutExpired:
        return {
            &quot;success&quot;: False,
            &quot;stdout&quot;: &quot;&quot;,
            &quot;stderr&quot;: &quot;Execution timed out.&quot;,
            &quot;return_code&quot;: -1,
            &quot;message&quot;: f&quot;Execution timed out after {timeout_seconds} seconds.&quot;,
        }
    except FileNotFoundError:
        return {
            &quot;success&quot;: False,
            &quot;stdout&quot;: &quot;&quot;,
            &quot;stderr&quot;: &quot;Node.js not found.&quot;,
            &quot;return_code&quot;: -1,
            &quot;message&quot;: &quot;Node.js (&apos;node&apos;) command not found. Please install Node.js and ensure it is in your PATH.&quot;,
        }
    except Exception as e:
        return {
            &quot;success&quot;: False,
            &quot;stdout&quot;: &quot;&quot;,
            &quot;stderr&quot;: str(e),
            &quot;return_code&quot;: -1,
            &quot;message&quot;: f&quot;Failed to execute JavaScript program: {str(e)}&quot;,
        }
    finally:
        # Clean up temporary file
        if os.path.exists(temp_path):
            try:
                os.unlink(temp_path)
            except Exception:
                pass

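# -- Optional sketch (not part of the original script) --
# The tool above takes timeout_seconds without a type hint, so a client may
# send it as a string such as "10". The hypothetical helper below coerces the
# value to a number and clamps it to a range. The 1 to 60 second bounds are an
# assumption, not something the original script enforces. One way to use it is
# timeout=clamp_timeout(timeout_seconds) in the subprocess.run() call.
def clamp_timeout(value, default=10.0, minimum=1.0, maximum=60.0):
    """Return value as a float limited to [minimum, maximum], or default if unparseable."""
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        return default
    if seconds != seconds:  # NaN is the only value not equal to itself
        return default
    return min(max(seconds, minimum), maximum)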

# &#x2500;&#x2500; Server Startup with CORS (required for llama.cpp frontend) &#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;&#x2500;
if __name__ == &quot;__main__&quot;:
    middleware = [
        Middleware(
            CORSMiddleware,
            allow_origins=[&quot;*&quot;],  # Restrict in production
            allow_credentials=True,
            allow_methods=[&quot;GET&quot;, &quot;POST&quot;, &quot;OPTIONS&quot;],
            allow_headers=[&quot;*&quot;],
            expose_headers=[&quot;*&quot;],
        )
    ]

    app = mcp.http_app(path=&quot;/mcp&quot;, middleware=middleware)
    uvicorn.run(app, host=&quot;0.0.0.0&quot;, port=5003, log_level=&quot;info&quot;)</code></pre><p>You may first need to pip install the imports used above:</p><pre><code class="language-bash">pip install fastmcp starlette uvicorn</code></pre><p>When it runs it will show up as:</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-79.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin." loading="lazy" width="747" height="84" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-79.png 600w, https://www.hotconfig.com/content/images/2026/04/image-79.png 747w" sizes="(min-width: 720px) 720px"></figure><p>It can be added to the llama.cpp toolset as shown here. As a reminder, you always point it at the <strong>/mcp path, as in:</strong></p><pre><code class="language-bash">http://192.168.1.3:5003/mcp</code></pre><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-80.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin." loading="lazy" width="552" height="212"></figure><h3 id="test-it">Test it</h3><ul><li>We are running a world-class, SOTA-level Qwen3-Coder 48B on house parts. If you would like to do the same, here is a detailed guide:</li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/qwen3-coder-next/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. 
Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts.</div><div class="kg-bookmark-description">We field test a Qwen3-Coder-Next-UD-Q4_K_XL.gguf</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin."><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/image1-1.png" alt="Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin."></div></a></figure><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-81.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin." loading="lazy" width="734" height="510" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-81.png 600w, https://www.hotconfig.com/content/images/2026/04/image-81.png 734w" sizes="(min-width: 720px) 720px"></figure><h3 id="wild-first-the-llm-repaired-its-own-connection">Wild First. The LLM &apos;repaired&apos; its own Connection?!</h3><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-82.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin." loading="lazy" width="734" height="614" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-82.png 600w, https://www.hotconfig.com/content/images/2026/04/image-82.png 734w" sizes="(min-width: 720px) 720px"></figure><h3 id="docker-contarization">Docker Containerization.</h3><ul><li>At this point it&apos;s always <em>really</em> important to containerize this. &#xA0;That way if your LLM glitches or goes off on a tangent it won&apos;t hurt anything. 
You can simply turn off the container and restart it! If you need a full guide on docker basics here you <a href="https://vpscoder.com/easy-bake-oven-recipe-guide-for-docker-containers/">go!</a></li><li>Make a workdir</li></ul><p>Create <strong>requirements.txt</strong>, put inside it:</p><pre><code>fastmcp
starlette
uvicorn[standard]
</code></pre><p>Create <strong>Dockerfile</strong>, put inside it:</p><ul><li>This creates an image</li></ul><pre><code class="language-dockerfile">FROM nikolaik/python-nodejs:python3.12-nodejs22

# Set working directory
WORKDIR /app

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (save the provided Python script as app.py)
COPY app.py .

# Expose the port used by the MCP server
EXPOSE 5003

# Run the application
CMD [&quot;python&quot;, &quot;app.py&quot;]
</code></pre><p>Create <strong>docker-compose.yml</strong>, put inside it:</p><ul><li>This is the &apos;stand-up&apos; instructions that will stand up the docker image into a running container.</li></ul><pre><code class="language-yaml">version: &apos;3.9&apos;

services:
  javascript-program-tester:
    build: .
    ports:
      - &quot;5003:5003&quot;
    restart: unless-stopped
    # Optional: for local development with live code changes
    # volumes:
    #   - .:/app
</code></pre><h3 id="usage-instructions">Usage Instructions</h3><ol><li>Save the provided Python code as <code>app.py</code> in the same directory as the files above.</li><li>Place <code>requirements.txt</code>, <code>Dockerfile</code>, and <code>docker-compose.yml</code> in the project root.</li><li>Build and start the container:</li></ol><pre><code>docker compose up --build
#or
docker compose up  #Diagnostic mode to watch it go.
docker compose up -d #Daemon mode - permanently runs. </code></pre><p>The first time it builds will look something like this:</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-83.png" class="kg-image" alt="Agentic Server Primer: Llama.cpp MCP Lesson 5: Adding javascript via a Python api plugin." loading="lazy" width="1300" height="170" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-83.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/04/image-83.png 1000w, https://www.hotconfig.com/content/images/2026/04/image-83.png 1300w" sizes="(min-width: 720px) 720px"></figure><ol><li>The MCP server will be available at <code>http://localhost:5003/mcp</code>.</li></ol><p>This configuration ensures:</p><ul><li>The Python environment includes all required packages (<code>fastmcp</code>, <code>starlette</code>, and <code>uvicorn</code>).</li><li>Node.js (v22) is pre-installed and available in the PATH, enabling the <code>test_javascript_program</code> tool to execute JavaScript code via <code>subprocess</code> without errors.</li><li>The container is lightweight, secure, and production-ready with automatic restarts.</li><li>CORS middleware and the server startup logic from your original code remain fully functional.</li></ul><p>The setup has been verified for compatibility with the provided script and the explicit requirement to support Node.js execution.</p><h3 id="summary">Summary</h3><ul><li>This might seem like a lot of work, but if you can understand a bit of python, a bit of llm, a bit of mcp, and make sure your ports are open - you can be writing mcp tools like a champ!</li></ul>]]></content:encoded></item><item><title><![CDATA[Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. 
Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts.]]></title><description><![CDATA[We field test a Qwen3-Coder-Next-UD-Q4_K_XL.gguf]]></description><link>https://www.hotconfig.com/qwen3-coder-next/</link><guid isPermaLink="false">69dbec31dc07930001450555</guid><category><![CDATA[Qwen3-Coder]]></category><category><![CDATA[House LLM]]></category><category><![CDATA[agentic]]></category><category><![CDATA[LLM]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Sun, 12 Apr 2026 19:20:33 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/image1-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/04/image1-1.png" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts."><p>The Qwen3-Coder fork released by Unsloth just benchmarked in performance proximity with 235B sized models and is looking to have SOTA level performance for April 2026! Here is the low-down. &#xA0;At the time of this writing the model was so new that HuggingFace.com did not have the auto-full benchmarks out for it, but we were able to find them. Again incredible accolades to Unsloth AI, who just keep making incredible models!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.hotconfig.com/content/images/2026/04/image-71.png" class="kg-image" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts." loading="lazy" width="529" height="100"><figcaption>Unsloth OpenSources Incredible Products that Make the LLM World Go Round..</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.hotconfig.com/content/images/2026/04/image-72.png" class="kg-image" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts." 
loading="lazy" width="2000" height="879" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-72.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/04/image-72.png 1000w, https://www.hotconfig.com/content/images/size/w1600/2026/04/image-72.png 1600w, https://www.hotconfig.com/content/images/size/w2400/2026/04/image-72.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Source: huggingface.com</figcaption></figure><p>The best part is you <em>can run frontier models such as this Qwen3-Coder that are efficient, fast and reliable on under $2000 in house computer parts! </em> Here is the setup:</p><pre><code class="language-bash">/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3-Coder-Next-UD-Q4_K_XL.gguf \
--host 192.168.1.3 \
--n-gpu-layers 999 \
--flash-attn on \
--override-tensor &quot;\.ffn_.*_exps\.weight=CPU&quot; \
--cache-type-k turbo3 \
--cache-type-v turbo3 \
-c 262144 \
--temp 0.7
</code></pre><p>The key was right here, as the normal <strong>&apos;--fit&apos;</strong> actually failed for this model on llama-cpp. With <strong>&apos;--fit&apos;</strong>, llama-server tries to balance the model between VRAM and system RAM on its own, and here it could not do it. However, this actually worked!</p><pre><code class="language-bash">--override-tensor &quot;\.ffn_.*_exps\.weight=CPU&quot; \</code></pre><p>This tells llama-cpp to keep the large Mixture-of-Experts expert tensors (the <strong>ffn_*_exps</strong> weights) in system RAM on the CPU side, while <strong>--n-gpu-layers 999</strong> sends everything else to the GPU. &#xA0;This overcame the RAM/VRAM bottleneck plaguing setups everywhere, which had people spending $1000&apos;s on unified-memory systems such as Mac Pros. </p><p><strong>An Even Faster Configuration suggested by: </strong></p><pre><code class="language-bash">https://x.com/iam_shanmukha</code></pre><p>With this configuration, which loads a different model (Qwen3.6-35B-A3B), we were able to get up to 33 Tokens/s. </p><pre><code class="language-bash">/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
--host 192.168.1.3 \
--fit on \
--flash-attn on \
--spec-type ngram-mod \
--spec-ngram-size-n 24 \
--n-cpu-moe-draft 39 \
-t 14 \
--chat-template-kwargs &apos;{&quot;preserve_thinking&quot;:true}&apos; \
--cache-type-k turbo3 \
--cache-type-v turbo4 \
-c 512000 \
--temp 0.7</code></pre><h3 id="turbo3-is-an-advanced-quant-fork-of-llama-cpp">Turbo3 Comes from an Advanced Quant Fork of Llama-cpp!</h3><ul><li>Did you notice the non-standard <strong>--cache-type-k turbo3 / --cache-type-v turbo3</strong>? &#xA0;Those come from the TurboQuant forks, which were bleeding edge as of March 2026. </li><li>Full credit to &apos;The Tom&apos; who graciously created one of the TurboQuant forks!</li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/TheTom/turboquant_plus"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - TheTom/turboquant_plus</div><div class="kg-bookmark-description">Contribute to TheTom/turboquant_plus development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts."><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">TheTom</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/3d34fbbbd7ef9a5f8153ece669e103564dc9320c68ad27a9c67532113c4dcaed/TheTom/turboquant_plus" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts."></div></a></figure><h3 id="would-you-like-a-detailed-guide-for-a-custom-compiled-llama-ccp-turboquant-fork">Would you Like a Detailed Guide for a Custom Compiled Llama-cpp TurboQuant Fork?</h3><p>Here is a full self-compiling guide in which a 122B model used the same TensorBalance method: part of it ran on a 16GB GPU and the rest ran in RAM. 
Incredibly it was good - and usable with a respectable 12.8 Tokens/s!</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/one-shot/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. w/128GB RAM.</div><div class="kg-bookmark-description">Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. w/128GB RAM.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts."><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/AKnVU.jpg" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts."></div></a></figure><p>You will need the specialized TurboQuant fork of llama-cpp together with that override-tensor configuration. &#xA0;In our case, wow: 25.48 t/s &#xA0;while we asked it to one-shot an Asteroids game that is 3x more complex.</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-73.png" class="kg-image" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts." loading="lazy" width="527" height="32"></figure><p>Is it powerful? We asked it to one-shot an entire Asteroids game, then asked it to one-shot a game that was 3x more complex. &#xA0;It got the majority of it right. 
</p><p>If you would like a PILE of agentic tools OPENSOURCE from mysql, to docker, to context managers they are here:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.hotconfig.com/easy-bake-mcp-docker-tools/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">PowerChest home Agent Agents MCP Tools to Put your HouseLLM into Turbo.</div><div class="kg-bookmark-description">Downloads Page for all your MCP tooling needs!</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.hotconfig.com/favicon.png" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts."><span class="kg-bookmark-author">Hot Config</span><span class="kg-bookmark-publisher">thinkmeltprotonmail.com</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://www.hotconfig.com/content/images/2026/04/surf_bot.jpg" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts."></div></a></figure><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-75.png" class="kg-image" alt="Qwen3-Coder-Next-UD-Q4-K_XL.gguf on a Ryzen 9/4080ti. Run a 48GB SOTA Tensor-Balanced on a $2K set of House Parts." 
loading="lazy" width="873" height="453" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-75.png 600w, https://www.hotconfig.com/content/images/2026/04/image-75.png 873w" sizes="(min-width: 720px) 720px"></figure>]]></content:encoded></item><item><title><![CDATA[llmfit - Fast LLM Metric Fitter and Pulling Tool]]></title><description><![CDATA[We have a look at a fast fitting tool for comparing our hardware to the LLM market.]]></description><link>https://www.hotconfig.com/llmfit-fast-llm-metric-fitter-and-pulling-tool/</link><guid isPermaLink="false">69db0dbadc079300014504dd</guid><category><![CDATA[llmfit]]></category><category><![CDATA[House LLM]]></category><category><![CDATA[agentic]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Sun, 12 Apr 2026 03:37:48 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/uCjt7.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://www.hotconfig.com/content/images/2026/04/uCjt7.jpg" alt="llmfit - Fast LLM Metric Fitter and Pulling Tool"><p>Very nice. &#xA0;Instead of wondering if your system can handle a model - just use this! </p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/AlexsJones/llmfit"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - AlexsJones/llmfit: Hundreds of models &amp; providers. One command to find what runs on your hardware.</div><div class="kg-bookmark-description">Hundreds of models &amp; providers. One command to find what runs on your hardware. 
- AlexsJones/llmfit</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt="llmfit - Fast LLM Metric Fitter and Pulling Tool"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">AlexsJones</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/1944f4eb466dd5f4cdd5a8f5632158e4c7e866574734564c699ac91a961999c2/AlexsJones/llmfit" alt="llmfit - Fast LLM Metric Fitter and Pulling Tool"></div></a></figure><h3 id="lets-compile-it-from-source">Let&apos;s compile it from source!</h3><p>This is built in Rust, so you need cargo! (And your build-essential, naturally.)</p><pre><code class="language-bash">sudo apt install cargo git cmake gcc</code></pre><p>Once you have done that, simply clone the repository:</p><pre><code class="language-bash">git clone https://github.com/AlexsJones/llmfit.git</code></pre><p>Because this is Rust, you use cargo to build it from inside the cloned directory:</p><pre><code class="language-bash">cd llmfit &amp;&amp; cargo build --release</code></pre><p>The output files will sit in <strong>target/release/</strong></p><ul><li>From inside that directory, we copied lib* and llm* to /usr/bin/</li></ul><pre><code class="language-bash">cd target/release
sudo cp lib* /usr/bin/
sudo cp llm* /usr/bin/</code></pre><h3 id="running-it">Running it!</h3><pre><code class="language-bash">llmfit</code></pre><p>It automatically lists all models that may fit your system, and the estimated number of tokens/s you may expect to obtain if you tried to local LLM them.<br></p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-63.png" class="kg-image" alt="llmfit - Fast LLM Metric Fitter and Pulling Tool" loading="lazy" width="1553" height="349" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-63.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/04/image-63.png 1000w, https://www.hotconfig.com/content/images/2026/04/image-63.png 1553w" sizes="(min-width: 720px) 720px"></figure><p>Selecting &apos;/&apos; and typing &apos;coder&apos; it automatically shows a filter. I do believe this is pulling straight from huggingface.</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-64.png" class="kg-image" alt="llmfit - Fast LLM Metric Fitter and Pulling Tool" loading="lazy" width="1574" height="313" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-64.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/04/image-64.png 1000w, https://www.hotconfig.com/content/images/2026/04/image-64.png 1574w" sizes="(min-width: 720px) 720px"></figure><p>Automatic Downloads,</p><p>You can literally hit &apos;d&apos; to Download a model if you think it is what you want / good.</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-65.png" class="kg-image" alt="llmfit - Fast LLM Metric Fitter and Pulling Tool" loading="lazy" width="415" height="165"></figure><p>In the bottom right it will show the model:</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-66.png" class="kg-image" alt="llmfit - Fast LLM 
Metric Fitter and Pulling Tool" loading="lazy" width="358" height="52"></figure><p>If you hit &apos;P&apos; for providers it will automatically make a list of inference providers, in case you want to use a cloud LLM - Nice!</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-67.png" class="kg-image" alt="llmfit - Fast LLM Metric Fitter and Pulling Tool" loading="lazy" width="321" height="602"></figure><p>Nice details. &#xA0;Real attention to detail was put into this text-based app: while a model is downloading you can see a spinning progress indicator.</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/5c269c1a-d960-472d-837d-094f7b609cac.png" class="kg-image" alt="llmfit - Fast LLM Metric Fitter and Pulling Tool" loading="lazy" width="414" height="83"></figure><p>Simulate</p><ul><li>You can simulate what you might expect to see when it runs (S)</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-69.png" class="kg-image" alt="llmfit - Fast LLM Metric Fitter and Pulling Tool" loading="lazy" width="413" height="221"></figure><h3 id="summary">Summary</h3><ul><li> This is very good for saving the time you would spend navigating a pile of pages at huggingface.co, trying to see if the models might run and what to expect from them. Tap a button and it&apos;s pulling the latest model for you! </li><li>If you are someone who downloads and runs LLMs on a daily basis, or benchmarks them, this tool is really handy.</li></ul><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-70.png" class="kg-image" alt="llmfit - Fast LLM Metric Fitter and Pulling Tool" loading="lazy" width="474" height="343"></figure>]]></content:encoded></item><item><title><![CDATA[Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. 
w/128GB RAM.]]></title><description><![CDATA[Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. w/128GB RAM.]]></description><link>https://www.hotconfig.com/one-shot/</link><guid isPermaLink="false">69d9c869dc07930001450417</guid><category><![CDATA[Qwen3.5]]></category><category><![CDATA[MOE]]></category><category><![CDATA[TurboQuant]]></category><category><![CDATA[Llama.cpp]]></category><dc:creator><![CDATA[thinkmelt@protonmail.com]]></dc:creator><pubDate>Sat, 11 Apr 2026 04:45:54 GMT</pubDate><media:content url="https://www.hotconfig.com/content/images/2026/04/AKnVU.jpg" medium="image"/><content:encoded><![CDATA[<h3 id="seriously">Seriously.</h3><ul><li>We ran an industrial-grade LLM that can one-shot an entire Asteroids game, and is bleeding-edge SOTA for 2026, on a <strong><u>$2000 house computer</u></strong>. &#xA0;How did we do it? Let&apos;s get started!</li></ul><img src="https://www.hotconfig.com/content/images/2026/04/AKnVU.jpg" alt="Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. w/128GB RAM."><p><strong>A. Install your basics</strong></p><pre><code class="language-bash">sudo apt install build-essential cmake python3 wget git</code></pre><p><strong>B. Latest NVIDIA CUDA Toolkit w/nvcc</strong></p><ul><li>nvcc is NVIDIA&apos;s CUDA compiler; the GPU-enabled llama-cpp build below needs it.</li></ul><pre><code class="language-bash">wget https://developer.download.nvidia.com/compute/cuda/13.2.0/local_installers/cuda-repo-debian13-13-2-local_13.2.0-595.45.04-1_amd64.deb
sudo dpkg -i cuda-repo-debian13-13-2-local_13.2.0-595.45.04-1_amd64.deb
sudo cp /var/cuda-repo-debian13-13-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-2</code></pre><ul><li>Make sure it works with nvcc --version; the output will look like this:</li></ul><pre><code class="language-bash">c@dragon-192-168-1-3:~/models$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2026 NVIDIA Corporation
Built on Mon_Mar_02_09:52:23_PM_PST_2026
Cuda compilation tools, release 13.2, V13.2.51
Build cuda_13.2.r13.2/compiler.37434383_0</code></pre><p>Got it? Good! Let&apos;s get an advanced llama-cpp now.</p><p><strong>C. Installing the Latest Llama-cpp.</strong></p><ul><li>Not just any build will do: we are going to add in SOTA-level TurboQuant capability:</li></ul><pre><code class="language-bash">git clone https://github.com/johndpope/llama-cpp-turboquant.git
cd llama-cpp-turboquant &amp;&amp; git checkout feature/planarquant-kv-cache</code></pre><h3 id="tricky-part-a-is-here">Tricky Part (A) is Here</h3><p>This part was exceptionally tricky, because if you don&apos;t get it pretty much spot on it just doesn&apos;t compile. &#xA0;We spent considerable time on it; in essence, the specific parameters below are <em>required</em> in order for it to compile.</p><p>Go into the cloned git repository directory (llama-cpp-turboquant), make a file named <strong>build.sh</strong>, and put this inside it:</p><pre><code class="language-bash">#!/usr/bin/env bash
set -e  # stop if the configure step fails instead of building anyway
# Configure in the current directory (the repository root)
cmake . -DGGML_CUDA=ON \
-DCMAKE_CUDA_ARCHITECTURES=native \
-DCMAKE_CUDA_COMPILER_WORKS=TRUE \
-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc

cmake --build . --config Release -j$(nproc)</code></pre><p>chmod it so it&apos;s executable, naturally:</p><pre><code class="language-bash">chmod +x build.sh</code></pre><p>Run it.</p><pre><code class="language-bash">./build.sh</code></pre><p>And now wait. It takes some time, and it may kick up errors. We tried many things to get this to work, but the above configuration worked for us. &#xA0;If it works you will see a <strong>bin directory</strong> afterwards:</p><p>It will look like this if the compile and build worked:</p><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-61.png" class="kg-image" alt="Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. w/128GB RAM." loading="lazy" width="1708" height="227" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-61.png 600w, https://www.hotconfig.com/content/images/size/w1000/2026/04/image-61.png 1000w, https://www.hotconfig.com/content/images/size/w1600/2026/04/image-61.png 1600w, https://www.hotconfig.com/content/images/2026/04/image-61.png 1708w" sizes="(min-width: 720px) 720px"></figure><p>If you have <em>no other llama-cpp</em> installed (this is the special fork with TurboQuant / PolarQuant), you can just copy all those files from the bin directory to your /usr/bin, as in:</p><pre><code class="language-bash">sudo cp bin/* /usr/bin/</code></pre><p>The other option is to cp all of these to your own directory, somewhere like ~/llama, and then run your scripts from there.</p><p><strong>Easy Part - Get some Models!</strong></p><ul><li>We&apos;re almost there, time to get some models! You got this! Go to Hugging Face and pick out a model that will either fit in your GPU or can be split between your GPU and CPU. 
&#xA0;As of February 2026 it was basically impossible to run a model like this partly on the CPU, but because TurboQuant and PolarQuant sped up the KV cache so much - now, yes you can!</li><li>We are building an example that worked to the limits of the equipment we had, which was a 4080ti w/16GB VRAM and a Ryzen 9 3900 w/128 GB of RAM. You will need to tinker, but we will show it&apos;s really easy.</li><li>A direct link for a 120 GB SOTA level MOE </li></ul><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/blob/main/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00002-of-00003.gguf"><div class="kg-bookmark-content"><div class="kg-bookmark-title">UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00002-of-00003.gguf &#xB7; unsloth/Qwen3-Coder-Next-GGUF at main</div><div class="kg-bookmark-description">We&#x2019;re on a journey to advance and democratize artificial intelligence through open source and open science.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://huggingface.co/favicon.ico" alt="Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. w/128GB RAM."></div></div><div class="kg-bookmark-thumbnail"><img src="https://cdn-thumbnails.huggingface.co/social-thumbnails/models/unsloth/Qwen3-Coder-Next-GGUF.png" alt="Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. w/128GB RAM."></div></a></figure><p>Got it downloaded to your ~/models folder? Good! The last part is to simply activate it with Llama.cpp.</p><ul><li>You want to make some scripts. In essence the scripts will be tuned to load the model, offload as much as possible to the GPU, and activate the specialty TurboQuant KV cache to give yourself incredible speed boosts. 
Below is our exact script; we will get Grok 4 to describe every part of it and how we ran it.</li><li>In our instance we copied the above llama files to /usr/bin as described; otherwise just change the start of the script to wherever llama-server lives.</li></ul><pre><code class="language-bash">/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.5-122B-A10B-Q4_K_M.gguf \
--host 192.168.1.3 \
--n-gpu-layers 999 \
--override-tensor &quot;\.ffn_.*_exps\.weight=CPU&quot; \
--flash-attn on \
--cache-type-k turbo3 \
--cache-type-v turbo3 \
-c 262144 \
--temp 0.7</code></pre><p>Just in case you are not sure what to do now: open a browser and go to where the server sits, which is port 8080 by default on the --host address. Your House LLM is sitting there, ready to one-shot Asteroids or whatever you want to do with it.</p><p>Even FASTER: &#xA0;</p><p>We found this alternate balanced configuration, which loads the smaller Qwen3.6-35B-A3B model, and we are seeing 35 Tokens/s! Full credit to: </p><pre><code class="language-bash">https://x.com/iam_shanmukha</code></pre><pre><code class="language-bash">/usr/bin/llama-server --jinja \
-m /home/c/models/Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
--host 192.168.1.3 \
--fit on \
--flash-attn on \
--spec-type ngram-mod \
--spec-ngram-size-n 24 \
--n-cpu-moe-draft 39 \
-t 14 \
--chat-template-kwargs &apos;{&quot;preserve_thinking&quot;:true}&apos; \
--cache-type-k turbo3 \
--cache-type-v turbo4 \
-c 512000 \
--temp 0.7</code></pre><pre><code class="language-bash">http://192.168.1.3:8080</code></pre><figure class="kg-card kg-image-card"><img src="https://www.hotconfig.com/content/images/2026/04/image-62.png" class="kg-image" alt="Qwen3.5-122B-A10B-Q4_K_M.gguf - Run it at 13 Tokens/s with 262,000 Contexts on a Ryzen 9 3900 and a 4080ti. w/128GB RAM." loading="lazy" width="710" height="763" srcset="https://www.hotconfig.com/content/images/size/w600/2026/04/image-62.png 600w, https://www.hotconfig.com/content/images/2026/04/image-62.png 710w"></figure><h3 id="command-summary-and-have-fun">Command Summary and HAVE FUN!</h3><p>The first command above (the Qwen3.5-122B-A10B launch script) starts the <strong>llama-server</strong> binary (part of the llama.cpp project), which provides a lightweight, high-performance HTTP server for local large language model (LLM) inference. It implements an OpenAI-compatible API and includes a built-in web interface, enabling clients to interact with the model via standard REST endpoints for chat completions, completions, embeddings, and related tasks.</p><p>The command configures the server to run the Qwen3.5-122B-A10B model (a Mixture-of-Experts architecture with approximately 122 billion total parameters and 10 billion active parameters per token) in a highly optimized manner. It maximizes GPU acceleration while selectively managing memory usage for a large-scale MoE model, supports an extended 256K-token context window, and applies advanced quantization and attention optimizations.</p><p>Below is a detailed, parameter-by-parameter breakdown of the command:</p><ul><li><strong>/usr/bin/llama-server </strong>The full path to the compiled llama-server executable. This binary serves as the entry point for the server process.</li><li><strong>--jinja </strong>Explicitly enables Jinja-style template processing for chat templates. 
This is required (or strongly recommended) for models such as Qwen3.5, which rely on complex, model-specific Jinja-based chat templates stored in the GGUF metadata. It ensures accurate formatting of system/user/assistant messages and any special tokens or reasoning structures.</li><li><strong>-m /home/c/models/Qwen3.5-122B-A10B-Q4_K_M.gguf </strong>Specifies the path to the GGUF-format model file. This is a 4-bit quantized version (Q4_K_M) of the Qwen3.5-122B-A10B MoE model. The Q4_K_M quantization provides a strong balance of model quality and memory efficiency.</li><li><strong>--host 192.168.1.3 </strong>Binds the HTTP server to the network interface with IP address 192.168.1.3. The server then listens on that address instead of the default 127.0.0.1 (localhost only), which makes it reachable from other machines on the local LAN.</li><li><strong>--n-gpu-layers 999 </strong>Instructs the backend to offload up to 999 model layers to the GPU. The value is larger than the model&apos;s layer count, so every layer is assigned to the GPU; only the tensors redirected by --override-tensor (next item) stay in system RAM.</li><li><strong>--override-tensor &quot;\.ffn_.*_exps\.weight=CPU&quot; </strong>Overrides the default buffer placement for specific model tensors. The regular expression targets all feed-forward network (FFN) expert weights (ffn_.*_exps.weight) and forces them onto the CPU. This is a critical optimization for large MoE models. Expert weights make up the majority of the model&apos;s size in such architectures, but only a few experts are active per token (about 10B of the 122B parameters here); placing them on CPU (while keeping dense layers and other components on GPU) dramatically reduces GPU memory usage without severely impacting performance, enabling the 122B-parameter model to run on consumer or mid-range GPUs.</li><li><strong>--flash-attn on </strong>Explicitly enables Flash Attention (a memory-efficient and faster attention implementation). 
This reduces VRAM consumption during attention computations and improves both prompt-processing and token-generation throughput, which is particularly beneficial for long-context scenarios and modern GPUs.</li><li><strong>--cache-type-k turbo3 </strong>Sets the key (K) portion of the KV cache to the &#x201C;turbo3&#x201D; quantization format. Turbo3 is an advanced, low-precision KV cache type (provided by the TurboQuant fork compiled above rather than one of the standard cache types) that aims for strong compression and high speed with little quality degradation compared to standard types such as f16 or q8_0.</li><li><strong>--cache-type-v turbo3 </strong>Applies the same &#x201C;turbo3&#x201D; quantization to the value (V) portion of the KV cache. Using turbo3 for both K and V further reduces memory bandwidth and cache size, which is especially advantageous at the 256K context length specified below.</li><li><strong>-c 262144 </strong>Sets the maximum context length (KV cache size) to 262144 tokens (256K tokens). This matches the native context capability of the Qwen3.5-122B-A10B model and allows the server to handle very long conversations or documents.</li><li><strong>--temp 0.7 </strong>Configures the default sampling temperature to 0.7. This controls output randomness: a value of 0.7 produces coherent yet moderately creative responses (lower values yield more deterministic output; higher values increase diversity).</li></ul><h3 id="summary-of-purpose-and-optimizations">Summary of Purpose and Optimizations</h3><p>This command starts a local inference server optimized for the Qwen3.5-122B-A10B MoE model on hardware with limited GPU VRAM relative to model size.</p><h3 id="are-we-bluffing-no-we-didnt-even-prompt-this-right">Are we Bluffing? No. We Didn&apos;t Even Prompt This Right.</h3>
        <div class="kg-card kg-file-card kg-file-card-medium">
            <a class="kg-file-card-container" href="https://www.hotconfig.com/content/files/2026/04/asteroids.html" title="Download" download>
                <div class="kg-file-card-contents">
                    <div class="kg-file-card-title">Asteroids</div>
                    
                    <div class="kg-file-card-metadata">
                        <div class="kg-file-card-filename">asteroids.html</div>
                        <div class="kg-file-card-filesize">30 KB</div>
                    </div>
                </div>
                <div class="kg-file-card-icon">
                    <svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><defs><style>.a{fill:none;stroke:currentColor;stroke-linecap:round;stroke-linejoin:round;stroke-width:1.5px;}</style></defs><title>download-circle</title><polyline class="a" points="8.25 14.25 12 18 15.75 14.25"/><line class="a" x1="12" y1="6.75" x2="12" y2="18"/><circle class="a" cx="12" cy="12" r="11.25"/></svg>
                </div>
            </a>
        </div>
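<p>To see what the <strong>--override-tensor</strong> pattern from the launch script actually selects, here is a minimal sketch. The tensor names below are hypothetical examples written in the usual GGUF naming style for MoE models (they are not read from the model file), and <strong>grep -E</strong> only stands in for the matching that llama-server does internally, so treat it as an illustration rather than the exact mechanism:</p><pre><code class="language-bash"># Hypothetical tensor names in the usual GGUF MoE style (assumed, not read from the model)
tensors='blk.0.attn_q.weight
blk.0.ffn_gate_inp.weight
blk.0.ffn_gate_exps.weight
blk.0.ffn_up_exps.weight
blk.0.ffn_down_exps.weight
output.weight'

# Same regular expression as the --override-tensor argument, without the =CPU suffix
pattern='\.ffn_.*_exps\.weight'

echo "Would be kept in system RAM (CPU):"
printf '%s\n' "$tensors" | grep -E "$pattern"

echo "Would stay on the GPU:"
printf '%s\n' "$tensors" | grep -Ev "$pattern"</code></pre><p>With these sample names, the three <strong>_exps</strong> tensors land in the first group, while the attention, router and output tensors land in the second.</p>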
        ]]></content:encoded></item></channel></rss>