Building a Fully Local, Air-Gapped Splunk AI Assistant Using LibreChat, Ollama, and Splunk MCP
This guide shows how to build a fully local AI assistant for Splunk-adjacent workflows using LibreChat as the web UI and Ollama as the model backend, with a clear path to Splunk MCP integration later. The design keeps data on-prem, avoids cloud AI services, and is built for repeatable operations in controlled environments.
What you will build
- LibreChat running in Docker on a Raspberry Pi 5 (or similar Linux host)
- Ollama serving local models from a second machine (or the same machine)
- A validated LibreChat custom endpoint that connects to Ollama over HTTP
- A foundation for adding Splunk MCP tools later without cloud dependencies
1) Background and Motivation
Many organizations cannot send operational data to cloud-hosted LLM services. In Splunk environments, that restriction is common in regulated sectors, internal SOC networks, critical infrastructure, and any environment with strict data sovereignty rules.
Air-gapped design means no dependency on external AI APIs at runtime. You control the model host, the UI host, network paths, and logs. That gives you predictable behavior and clearer audit boundaries.
Tool-driven patterns matter here. A model should not have direct unrestricted access to production systems. Instead, it should call controlled tools that enforce identity, permissions, and scope.
MCP (Model Context Protocol) is a controlled tool interface for this pattern. The model does not talk directly to Splunk internals. It requests tool actions, and the tool layer enforces RBAC, tokens, and approved operations. This article focuses on the local UI and local LLM side needed before adding tools. It does not depend on Splunk AI Assistant.
A web UI is also practical: many analysts, responders, and platform users are not SPL experts. A UI lets them ask for help safely while you retain backend control.
2) Architecture Overview
Architecture in plain words:
- Raspberry Pi 5 runs LibreChat in Docker and provides the browser UI.
- A separate machine (Windows, Linux, or macOS) runs Ollama and serves local models over HTTP.
- LibreChat sends prompts to Ollama through a configured custom endpoint.
- Later, a Splunk MCP Server can be added as the tool provider.
LibreChat is the operator-facing interface: authentication, conversation controls, endpoint wiring, and model selection behavior.
Ollama is the local model runtime: it downloads or imports models and serves them via a local API. No cloud model endpoint is required.
Docker is used to package services with predictable dependencies and repeatable startup. You avoid host package drift and keep deployment behavior consistent.
A Pi 5 is usually sufficient for the UI tier because it is handling web/API services, not heavy model inference. Inference load stays on the Ollama host.
Internet is not required for day-to-day operation once components and models are already available in your internal environment.
3) Prerequisites
Minimum requirements:
- Raspberry Pi 5 (or equivalent Linux server) for LibreChat
- SSH access to manage the host
- Docker Engine and Docker Compose plugin installed
- A second machine (or same machine) running Ollama; Windows GPU hosts are a common choice
- Basic terminal familiarity
Required Downloads and References
- LibreChat source repository
- Ollama downloads (Windows, Linux, macOS)
- Docker Engine install docs
- Docker Compose install docs
- Splunk MCP Server app on Splunkbase
- Splunk docs: Configure the Splunk MCP Server
Install Docker and Compose on Raspberry Pi OS
On current Raspberry Pi OS builds, Docker packages are available from apt:
sudo apt update
sudo apt install -y docker.io docker-compose-plugin git curl
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
Log out and back in after group changes, then verify:
docker --version
docker compose version
If docker-compose-plugin is unavailable in your repo mirror, install Docker
using your standard internal package method, then confirm docker compose
works before continuing.
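As a quick sanity check that the daemon is running and your user can reach it without sudo, you can query the server directly. The hello-world test is optional and assumes the image is available locally or through an internal registry mirror in air-gapped environments.
# Confirm the Docker daemon is reachable as your user (no sudo).
docker info --format '{{.ServerVersion}}'
# Optional: run a throwaway test container if the image is available internally.
docker run --rm hello-world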
4) Installing LibreChat
LibreChat source is hosted on GitHub. Clone the repository so you can run the supported Compose workflow and maintain clean local overrides.
git clone https://github.com/danny-avila/LibreChat.git
cd LibreChat
cp .env.example .env
The .env file is a plain text key/value file used by Docker Compose and
LibreChat startup scripts for environment-specific settings.
Do not edit core compose files directly. Keep local changes in override files so updates from upstream remain easy to merge.
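A quick way to confirm the work tree will stay clean for future upstream pulls is to list local modifications. This is a minimal check, and files already covered by the repository's .gitignore (such as .env) will not appear.
# Only your own override and config files should show up here.
git status --short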
5) Understanding LibreChat’s Docker Setup
LibreChat is a multi-service application. You typically see service definitions for the API/backend, web client, MongoDB, and supporting components. This separation makes startup and dependencies explicit.
The backend service is named api. It is the service that reads
librechat.yaml, validates endpoint configuration, and brokers model calls.
Docker Compose supports layered configuration. A base file defines defaults and an override file applies local changes. That is why override files exist.
In plain terms: an override file is a patch. It only contains the changes you need, while the base file remains untouched.
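You can render the merged result before starting anything, which is a useful habit when troubleshooting later. This sketch uses the same file names introduced in the next sections.
# Render the effective configuration after the override is applied.
docker compose -f deploy-compose.yml -f docker-compose.override.yaml config | less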
6) Creating the Docker Override File
Create docker-compose.override.yaml in the LibreChat repository root. It should contain only the local settings required for this deployment.
The target service must be api, because that is where
librechat.yaml is consumed.
services:
  api:
    volumes:
      - ./librechat.yaml:/app/librechat.yaml:ro
    extra_hosts:
      - "host.docker.internal:host-gateway"
Line by line explanation:
- volumes: mounts your host config file into the container so the API can read it.
- :ro: read-only mount. The container can read config but cannot modify it.
- extra_hosts: adds a name-to-IP mapping inside the container.
- host.docker.internal:host-gateway: lets containers reach services on the Docker host in a portable way.
Why localhost fails in containers: inside Docker, localhost
means "this container," not your host and not another machine.
7) Installing and Configuring Ollama
Ollama runs LLM models locally and exposes them through an HTTP API. LibreChat connects to that API endpoint.
The /v1/ path matters because LibreChat expects an OpenAI-compatible API
shape for custom endpoints.
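Once Ollama is installed and serving at least one model (both covered below), you can confirm that shape directly. This is a hedged sketch using the example model name from this guide; substitute whatever model you actually serve.
# List models through the OpenAI-compatible API.
curl http://127.0.0.1:11434/v1/models
# Minimal chat completion through the same compatibility layer.
curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss:20b", "messages": [{"role": "user", "content": "Say hello"}]}'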
Pick your Ollama host OS
If your GPU is on a Windows gaming machine, use that as the Ollama host and keep LibreChat on the Pi. That split is common and works well.
Windows (recommended when your GPU is on Windows)
- Install Ollama for Windows from the official installer and start the app.
- Open PowerShell and verify the local API:
curl http://127.0.0.1:11434/api/tags
netstat -ano | findstr 11434
Expose Ollama to your LAN so LibreChat can reach it:
setx OLLAMA_HOST "0.0.0.0:11434"
Then fully restart Ollama (quit and reopen). Add a firewall rule for TCP 11434:
New-NetFirewallRule -DisplayName "Ollama 11434" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 11434
Linux
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
Verify listener and API:
ss -ltnp | rg 11434
curl http://127.0.0.1:11434/api/tags
Bind to all interfaces if LibreChat is on another host:
sudo systemctl edit ollama
Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Then reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
macOS
Install Ollama for macOS, launch it, then verify:
curl http://127.0.0.1:11434/api/tags
lsof -iTCP:11434 -sTCP:LISTEN
For remote access from LibreChat, set OLLAMA_HOST=0.0.0.0:11434 in the
Ollama runtime environment and restart Ollama.
Expected behavior on any OS: /api/tags returns JSON with a
models list. If the list is empty, pull a model first.
Also allow TCP/11434 in the host firewall between the LibreChat host and Ollama host.
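If /api/tags returns an empty list, pull or import at least one model on the Ollama host first. The model name below matches the librechat.yaml example later in this guide; in fully air-gapped environments, import the model from internal storage instead of pulling from the internet.
# On the Ollama host: make a model available, then confirm it is served.
ollama pull gpt-oss:20b
ollama list
curl http://127.0.0.1:11434/api/tags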
8) Creating librechat.yaml
LibreChat requires a YAML configuration to define endpoints and model behavior. The API validates this file at startup using a schema.
Schema validation errors often mention Zod. In plain language, that means a required field is missing or has the wrong type.
Required sections include:
- version
- endpoints
- models.default for your custom endpoint definition
Working example:
version: "1.0.0"
endpoints:
custom:
- name: "Ollama"
apiKey: "ollama"
baseURL: "http://192.168.169.173:11434/v1/"
models:
fetch: true
default:
- "gpt-oss:20b"
titleModel: "current_model"
summarize: false
modelDisplayLabel: "Ollama" Field explanation:
version: config schema version expected by LibreChat.endpoints.custom: list of non-default providers.name: provider name shown in UI.apiKey: required field for provider shape; local placeholder is fine.baseURL: Ollama endpoint, including trailing/v1/.models.fetch: query provider for available models dynamically.models.default: startup/default model list for the provider.titleModel: title behavior in conversations.summarize: whether to run summary behavior for this endpoint.modelDisplayLabel: friendly label shown in model selection UI.
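Because startup validation is strict, it helps to confirm the file is at least syntactically valid YAML before bringing the stack up. This is a minimal sketch that assumes Python 3 with PyYAML is available on the Pi; any YAML linter works equally well.
# Basic syntax check only; schema (Zod) validation still happens at API startup.
python3 -c "import yaml; yaml.safe_load(open('librechat.yaml')); print('YAML syntax OK')"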
9) Bringing LibreChat Online
Use deploy-compose.yml as the base runtime compose file, then add your
override file explicitly.
docker compose -f deploy-compose.yml -f docker-compose.override.yaml up -d
docker compose -f deploy-compose.yml -f docker-compose.override.yaml logs -f api
Why specify files explicitly: it avoids ambiguity and guarantees the same startup path each time.
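If you prefer not to retype both -f flags, Docker Compose also honors the COMPOSE_FILE environment variable; this optional convenience sketch pins the same file set for the current shell session.
# Colon-separated on Linux; the explicit -f form above remains equivalent.
export COMPOSE_FILE=deploy-compose.yml:docker-compose.override.yaml
docker compose up -d
docker compose logs -f api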
Common errors and fixes
- service has no image: you likely used the wrong compose file set. Use deploy-compose.yml with your override.
- invalid configuration: YAML syntax error or wrong indentation in librechat.yaml.
- missing fields: required keys like version, endpoints, or models.default are absent.
Logs are the source of truth for startup failures. Check API logs first whenever the UI loads but model calls fail.
10) Verifying Ollama Connectivity
Test from inside the api container, not just from the host shell. This
confirms real container-to-Ollama network reachability.
docker compose -f deploy-compose.yml -f docker-compose.override.yaml exec api curl http://OLLAMA_IP:11434/api/tags
Expected output is JSON containing a models array. If this request fails,
check IP, port, firewall, and Ollama binding.
If Ollama runs on Windows, use the Windows host LAN IP (for example from ipconfig) in both this test command and baseURL. Do not use localhost in baseURL: inside the container it points at the container itself. If Ollama runs on the same machine as LibreChat, use host.docker.internal (mapped in the override file) or the host's LAN IP instead.
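If the curl from inside the container fails, it helps to separate raw TCP reachability from application-level issues. This sketch runs from the Pi host shell and assumes netcat is installed; OLLAMA_IP is a placeholder for your Ollama host address.
# Test raw TCP reachability to the Ollama port from the LibreChat host.
nc -vz OLLAMA_IP 11434
# Success here but a failing /api/tags points at the Ollama service itself;
# failure here points at IP, routing, firewall, or OLLAMA_HOST binding.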
11) Selecting Models in LibreChat
Models are governed by backend configuration and provider state. They are not freely added from the GUI because endpoint admins control what is available.
With fetch: true, LibreChat asks Ollama for all available models and shows
them in selection lists.
Governance implication: if certain models are disallowed, remove them at the Ollama layer. That is safer than relying on UI-only restrictions.
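Removing a disallowed model at the Ollama layer is a one-line operation on the Ollama host; the model name below is a hypothetical example.
# List locally available models, then remove any that are not approved.
ollama list
ollama rm unapproved-model:latest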
12) Security and Air-Gapped Considerations
- No cloud AI dependency is required for inference.
- Data stays inside your controlled network boundary.
- Docker provides process and filesystem isolation per service.
- Future MCP integration can enforce token-scoped tool access.
- Architecture aligns with least privilege and auditability goals common in Splunk operations.
This is not "secure by default" without operations discipline, but it is a strong base for controlled deployment in restricted environments.
13) What This Enables Next
Once the local UI and model path is stable, you can extend the platform with:
- Splunk MCP Server integration for tool-based query and action workflows
- Constrained tool execution tied to Splunk roles and service accounts
- RAG patterns for SPL guidance grounded in approved internal docs
- Role-based endpoint access and model policies
- Audit logs across UI requests, tool calls, and backend actions
Related guides: Splunk Homelab Guide, Splunk FIPS Compatibility Guide, and GoSplunk Utilities.
14) Summary
You built a fully local assistant stack with LibreChat as the front-end control plane and Ollama as the on-prem model runtime. The components communicate over explicit network paths and validated configuration, with no dependency on cloud AI APIs.
This approach works because each layer has a clear responsibility: UI mediation, model serving, and future tool governance through MCP. It scales by separating UI and inference hosts and by enforcing policy at the endpoint and model layers.
For organizations that need deterministic behavior, stronger audit boundaries, and local data control, this architecture is a practical and safer foundation than sending Splunk context to external AI services.