Macalinao Studio — AI Systems, Implemented and Operated

AI Stack

01 · Stack

Claude · Hermes Agent · Paperclip · GPT-5.5

The agentic stack I implement against — a Claude / GPT-5.5 reasoning layer fronted by Hermes (the agent runtime) and Paperclip (the gateway / cron / delegation tier). One default agent, many models, all calls audited at the edge. Underneath it sits a persistent second brain — a file-based knowledge wiki the agents read first and write back to, with a scheduled "dream sequence" pass that ingests new material and lints the index.
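
The write-back loop for a file-based wiki can be sketched in a few lines. This is a minimal illustration, not the studio's implementation: the `pages/`, `index.md`, and `log.md` layout and the function names `ingest` and `lint_index` are assumptions made for the example.

```python
from datetime import datetime, timezone
from pathlib import Path
import re


def slugify(title: str) -> str:
    """Turn a page title into a filesystem-safe slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def ingest(wiki: Path, title: str, body: str) -> Path:
    """Write one page, then rebuild the index and append to the log."""
    pages = wiki / "pages"
    pages.mkdir(parents=True, exist_ok=True)
    page = pages / f"{slugify(title)}.md"
    page.write_text(f"# {title}\n\n{body.strip()}\n", encoding="utf-8")

    # Rebuild the index from what is on disk so it cannot drift from the pages.
    lines = ["# Index", ""]
    for p in sorted(pages.glob("*.md")):
        heading = p.read_text(encoding="utf-8").splitlines()[0].lstrip("# ")
        lines.append(f"- [{heading}](pages/{p.name})")
    (wiki / "index.md").write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Append-only log: one line per ingest, timestamped in UTC.
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with (wiki / "log.md").open("a", encoding="utf-8") as log:
        log.write(f"- {stamp} ingested {page.name}\n")
    return page


def lint_index(wiki: Path) -> list[str]:
    """Return index link targets whose page file is missing."""
    index = (wiki / "index.md").read_text(encoding="utf-8")
    targets = re.findall(r"\]\((pages/[^)]+)\)", index)
    return [t for t in targets if not (wiki / t).exists()]
```

Because everything is plain markdown on disk, the scheduled pass only needs to call `ingest` for new material and `lint_index` afterwards; any model or a plain `grep` can read the result.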

What I'm learning

A knowledge base the agent maintains itself (ingest → page → index → log) turns chat ephemera into institutional memory — sessions stop re-discovering the same facts.
Plain markdown beats a database for agent memory: greppable, diffable, and any model can read it cold.
Centralizing access through one gateway makes cost, audit, and rate limits enforceable. Issuing direct keys per tool is the slow drift into chaos.
"Agent" is a verb (a thing that decides what to call) more than a noun. Picking one default keeps the verb consistent.
Tool inventory belongs in the agent layer, not in the prompt — keeps prompts small and tools swappable.
One agent, many models is easier to operate than many agents, one model.
Cron-driven agent delegation is dramatically simpler than event-driven for small teams. Most "real-time" requirements aren't.
Mass-update workflows require idempotency from day one — or the rollback story gets ugly fast.
Audit logging at the gateway saves arguing about what an agent actually called when something breaks at 2am.
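
The gateway points above (one entry point, enforceable rate limits, an audit record for every call) can be sketched as follows. This is a hypothetical illustration: the `Gateway` class, its field names, and the in-memory log are assumptions for the example and say nothing about how Hermes or Paperclip are actually built.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Gateway:
    """Hypothetical single entry point for model calls: rate limit, audit, then call."""
    backends: dict[str, Callable[[str], str]]
    max_calls_per_minute: int = 60
    audit_log: list[str] = field(default_factory=list)
    _recent: dict[str, list[float]] = field(default_factory=dict)

    def call(self, caller: str, model: str, prompt: str) -> str:
        now = time.time()
        # Keep only this caller's accepted calls from the last 60 seconds.
        window = [t for t in self._recent.get(caller, []) if now - t < 60]
        if model not in self.backends:
            status = "unknown_model"
        elif len(window) >= self.max_calls_per_minute:
            status = "rate_limited"
        else:
            status = "ok"
            window.append(now)
        self._recent[caller] = window

        # One JSON line per request, written before the backend runs,
        # so rejected and failed calls are recorded too.
        self.audit_log.append(json.dumps({
            "ts": now, "caller": caller, "model": model,
            "prompt_chars": len(prompt), "status": status,
        }))
        if status != "ok":
            raise RuntimeError(f"{status}: caller={caller} model={model}")
        return self.backends[model](prompt)
```

With every tool routed through `call`, "what did the agent actually call" becomes a question answered by reading `audit_log` rather than by reconstructing it from per-tool keys.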

Tools

Claude · Hermes · Paperclip · GPT-5.5 · Second Brain

Posture

One default agent
Many models
Audit at the edge

Generation Stack

02 · Stack

ComfyUI · GPT-Image2 · LTX-Video · AceStep 1.5 XL

The media-generation stack covering image, video, and audio — running as local generation on a private multi-node GPU farm. ComfyUI is the workflow editor; GPT-Image2 handles batch influencer renders; LTX-Video is the video diffusion runner; AceStep 1.5 XL covers music. LoRA fine-tuning is baked into the pipeline, and centralized orchestration handles model routing, throughput shaping, and node-level health.

What I'm learning

A saved workflow .json is more reproducible than any prompt — treat it like code, commit it, diff it.
Custom nodes age fast; pin the ones you depend on or accept the breakage.
Splitting the workflow at the latent stage (load, sample, decode separately) is the cheapest debugging move.
Influencer enrichment as structured JSON beats freeform prompt rewrites every time — and survives reruns.
Visual-DNA maps make characters consistent across hundreds of frames without re-prompting from scratch.
Video models reward thinking in shots, not seconds — pacing is a prompt input, not a render setting.
Music generation is most useful when the visual cut already exists — composition lives in the video, not the audio.
VRAM is the binding constraint on multi-model serving — utilization without VRAM tracking is a misleading green dashboard.
LoRA training stays cheap if the base model is locked and only the adapter cycles — full retrains rarely earn their cost.
Auto-load/unload is the difference between "real cluster" and "one big model that won't move."
Cost-per-generation only becomes a real number once node-level telemetry is wired in.
Long-running generations should checkpoint to disk; "cluster goes down at 80%" should not erase the work.
Throughput shaping per-tenant is the difference between "shared farm" and "one user starves everyone else."
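
The VRAM and auto-load/unload points above can be illustrated with a small placement tracker. This is a sketch under stated assumptions: the `VramPool` class, the least-recently-used eviction policy, and the model sizes in the usage note are hypothetical, and a real farm would read sizes and free memory from node telemetry.

```python
from collections import OrderedDict


class VramPool:
    """Tracks which models are resident on one GPU and evicts the least recently used."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.resident = OrderedDict()  # model name -> size in GB, oldest use first

    def used_gb(self) -> float:
        return sum(self.resident.values())

    def acquire(self, name: str, size_gb: float) -> list:
        """Make `name` resident and return the names evicted to make room."""
        if size_gb > self.capacity_gb:
            raise ValueError(f"{name} needs {size_gb} GB, GPU has {self.capacity_gb} GB")
        if name in self.resident:
            self.resident.move_to_end(name)  # already loaded: mark as recently used
            return []
        evicted = []
        while self.used_gb() + size_gb > self.capacity_gb:
            victim, _ = self.resident.popitem(last=False)  # drop the oldest entry
            evicted.append(victim)
        self.resident[name] = size_gb
        return evicted
```

For example, on a 24 GB pool holding a 12 GB image model and a 10 GB video model, acquiring an 8 GB music model evicts the image model first. Tracking `used_gb` alongside utilization is what keeps the dashboard honest.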

Tools

ComfyUI · GPT-Image2 · LTX-Video · AceStep

Coverage

Image · Video
Audio · Music
LoRAs · Fine-tunes
Workflow as code

Local generation

Multi-node farm
Internal-only
Auto-routing

Hosted Stack

03 · Stack

Mail & Web Operations · Cloud Service · Backups · Disaster Recovery

The self-hosted operations layer — managed mail with proper authentication and an outbound relay, plus a portfolio-wide edge / DNS / proxy setup with structured contact-form sweeping. Backed by a managed cloud service tier, daily off-host backups, and a warm standby that turns disaster recovery into a DNS swap rather than a rebuild. All managed as data, all built to roll back per zone.

What I'm learning

Sender reputation is the metric that matters for mail; everything else (authentication, relay choice) is upstream of it.
An outbound relay for transactional mail is a much better story than sending direct from the mail server.
The dry-run output IS the rollout plan. Anything you can't reproduce in dry-run won't behave the same on apply.
Parked domains are a contact-form attack surface most teams forget exists. Sweep them like any other input.
Edge rules outlive any single deployment script — write them as data, not as one-shot commands.
A managed cloud service tier earns its cost the first time a local node drops and traffic just keeps flowing.
A backup you've never restored from is a hope, not a backup — and "off-host" is the word in "off-host backup" that actually saves you.
Warm standby beats cold backup: when disaster recovery is a DNS swap instead of a rebuild, you'll actually reach for it under pressure.
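
The "dry-run is the rollout plan" and "rules as data" points can be sketched together: compute the change list once from data, then feed the same list to both the dry run and the apply. The `Record` shape and the `plan` / `apply` names are assumptions for illustration, and the addresses are documentation-range placeholders.

```python
from typing import NamedTuple


class Record(NamedTuple):
    zone: str
    name: str
    rtype: str
    value: str


def plan(current: set, desired: set) -> list:
    """Compute the (action, record) changes that turn `current` into `desired`."""
    changes = [("delete", r) for r in sorted(current - desired)]
    changes += [("create", r) for r in sorted(desired - current)]
    return changes


def apply(current: set, changes: list, dry_run: bool = True) -> set:
    """Print every change; only build the new state when dry_run is False."""
    result = set(current)
    for action, record in changes:
        mode = "DRY-RUN" if dry_run else "APPLY"
        print(f"{mode} {action} {record.zone} {record.name} {record.rtype} {record.value}")
        if not dry_run:
            if action == "delete":
                result.discard(record)
            else:
                result.add(record)
    return result


# A disaster-recovery swap expressed as data: point www at the warm standby.
primary = {Record("example.com", "www", "A", "192.0.2.10")}
standby = {Record("example.com", "www", "A", "198.51.100.10")}
changes = plan(primary, standby)
apply(primary, changes, dry_run=True)
```

Rolling back a zone is the same call with the arguments reversed, `plan(standby, primary)`, which is one reason to keep the rules as data rather than one-shot commands.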

Approach

Mail · Web · DNS · Edge · Cloud · Backups · Recovery

Stack

SES · AWS

Posture

Self-hosted
Managed deliverability
Warm standby
Phased rollout

AI Image/Video In House

04 · Live

3D Character Modeling · AI Web & UX Design · AI Film

In-house creative production built on AI image generation. A single generated character sheet becomes a 3D asset — image to mesh, auto-staged and lit in Blender, finished with a photoreal diffusion pass. The same image work drives AI-designed web products (an AI generation studio, live in production behind its own auth) and long-form AI film storytelling, where characters hold identity across hundreds of shots.

What I'm learning

Image-to-mesh models turn one character sheet into a riggable 3D asset in minutes — the bottleneck moved from modeling to art direction.
Driving Blender programmatically (stage, light, camera, render) makes 3D output reproducible the same way workflow JSON made image output reproducible.
3D for geometry, diffusion for skin: a photoreal img2img pass over a 3D render gives consistent characters in controlled poses.
Vibe-coded sites ship fastest when the design tokens are locked before the pages — six iterations of this site taught that rule.
Shipping an AI image studio as a live web product means deployment, auth, and tunnels are part of the design, not an afterthought.
Narrative film is consistency management — face anchors, seed discipline, and shot-level decomposition, not longer prompts.
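
One way to practice seed discipline is to derive each shot's seed from its identity instead of picking it by hand. This is a minimal sketch; the key format and the `shot_seed` name are assumptions, and the 32-bit range is chosen for illustration rather than taken from any particular sampler.

```python
import hashlib


def shot_seed(character_id: str, scene: int, shot: int, take: int = 0) -> int:
    """Derive a stable 32-bit seed from the shot's identity."""
    key = f"{character_id}|scene={scene}|shot={shot}|take={take}".encode("utf-8")
    # hashlib is used because Python's built-in hash() of a str changes between runs.
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


# Shot-level decomposition: each shot carries its own seed and face anchor reference.
shots = [
    {"scene": 1, "shot": n, "face_anchor": "anchors/lead.png",  # hypothetical path
     "seed": shot_seed("lead", scene=1, shot=n)}
    for n in range(1, 4)
]
```

Re-rendering a single shot then reproduces the same seed, while bumping `take` gives a deliberate variation that is still recorded in the shot list.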

Ernie Image → Krea 2 → Open the design studio →

Approach

Img→3D · Blender · Web/UX · AI Film

Output

3D characters
Live web studio
Story-driven film

AI Attendant and Assistant

05 · Live

Voice Assistant · Phone, Discord & Web · Layered Memory · All AI Influencers Voiced

A speech-first assistant running in production on the studio's own GPUs — an open-weight reasoning model behind a self-hosted speech pipeline, reachable over a real phone line, Discord, and the browser. Every AI Influencer in the roster now carries a unique voice, turning each character into a distinct conversational presence. Layered memory — vector recall plus a relationship graph — lets it remember across calls and sessions, with web search wired in for fresh answers. A working demo is available.
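
The two memory layers can be sketched side by side. This is a toy illustration only: `embed` is a bag-of-words stand-in for a real embedding model, and the `LayeredMemory` class and its method names are assumptions, not the production design.

```python
import math
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LayeredMemory:
    def __init__(self):
        self.utterances = []           # (text, vector) pairs for vector recall
        self.graph = defaultdict(set)  # subject -> {(relation, object)}

    def remember(self, text: str) -> None:
        self.utterances.append((text, embed(text)))

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.graph[subject].add((relation, obj))

    def recall(self, query: str, k: int = 1) -> list:
        """Vector layer: return the k stored utterances most similar to the query."""
        q = embed(query)
        ranked = sorted(self.utterances, key=lambda u: cosine(q, u[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def connections(self, subject: str) -> list:
        """Graph layer: return the relations recorded for a subject, sorted."""
        return sorted(self.graph.get(subject, set()))
```

`recall` answers questions about what was said, and `connections` answers questions about how people and things relate; an assistant that persists both across calls can draw on either.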

What I'm learning

Turn-taking is harder than transcription — the assistant that interrupts itself loses every conversation.
A real phone line is the honest benchmark: telephony adds latency you can't refactor away, so the budget is spent before the model says a word.
Serving the LLM, TTS, and STT on shared GPUs is a memory-placement problem — capacity planning is VRAM planning.
Memory has layers: vector recall answers "what did we say," a relationship graph answers "how does it connect." An assistant needs both.
Model selection matters more than parameter tuning for TTS quality — and multi-language coverage locks the engine choice early.
Latency budget is real; sub-second response is the floor, sub-300ms first-syllable is the goal.
A voice influencer is a brand asset — it should be consistent across calls, sessions, even years.
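
The latency budget above is simple arithmetic once the stages are listed. The sketch below uses the two thresholds from the text (sub-second floor, sub-300ms goal); the stage names and millisecond figures are illustrative assumptions, not measurements from the production system.

```python
FIRST_AUDIO_GOAL_MS = 300   # sub-300ms first-syllable goal
RESPONSE_FLOOR_MS = 1000    # sub-second response floor


def report(stages: dict) -> str:
    """Sum serial stage latencies and compare the total against the goal and floor."""
    total = sum(stages.values())
    if total < FIRST_AUDIO_GOAL_MS:
        verdict = "meets goal"
    elif total < RESPONSE_FLOOR_MS:
        verdict = "meets floor only"
    else:
        verdict = "over budget"
    worst = max(stages, key=stages.get)  # stage with the largest share of the budget
    return f"{total:.0f} ms, {verdict}; largest stage: {worst}"


# Hypothetical figures for one phone turn, in milliseconds.
phone_turn = {
    "telephony_round_trip": 120,
    "end_of_speech_detection": 150,
    "stt_final": 80,
    "llm_first_token": 200,
    "tts_first_chunk": 90,
}
print(report(phone_turn))
```

Laying the stages out this way shows why the phone line is the honest benchmark: the telephony entry is spent before any model runs, and it cannot be tuned away in code.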

Try the demo →

Approach

TTS · STT · PSTN · Memory · Turn-taking

Coverage

Phone · Discord · Web
Layered memory
All AI Influencers voiced
Sub-300ms goal

AI Music Idols

06 · Live

Albums & Music Videos · Character-led Acts · Live Sites

A small label of in-house AI idols — characters with discographies, not producers with stage names. Currently QKeyV (synth · dark pop) and DvYnT (electronic · cinematic), each with a live artist site. Both acts are voiced and attended. The production direction stays deliberately vocal-forward. Releases ship as albums plus visual companions.

What I'm learning

Prompt versions are album versions — v6 replaced v5 the same day; the per-track lane file is the studio session.
A production constraint ("vocal-forward, no auto-tune") survives only if it's written into every track's prompt lane, not held in your head.
Visuals tied to the audio cut from day one beat post-hoc music videos — every time.
Album shape forces a story; single-track-only acts lose continuity fast.
An idol is a character with a discography, not a producer's stage name — write the influencer, then the songs.
Voice consistency across releases is what keeps an act feeling like one act — a distinct voice identity per idol is now in place for exactly this.
The fan loop (release → social → response → next release) is the actual product, not any single song.
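
A production constraint that must live in every prompt lane is easy to check mechanically. This is a minimal sketch: the constraint phrases come from the text above, while the lane contents, track keys, and function names are hypothetical.

```python
REQUIRED_CONSTRAINTS = ("vocal-forward", "no auto-tune")


def missing_constraints(lane_text: str, required=REQUIRED_CONSTRAINTS) -> list:
    """Return the required phrases that do not appear in one track's prompt lane."""
    lowered = lane_text.lower()
    return [phrase for phrase in required if phrase not in lowered]


def lint_album(lanes: dict) -> dict:
    """Map each failing track to its missing phrases; an empty dict means the album passes."""
    return {
        track: missing
        for track, lane in lanes.items()
        if (missing := missing_constraints(lane))
    }


# Hypothetical lane text for two tracks.
album = {
    "01": "Dark pop, vocal-forward, no auto-tune, slow build",
    "02": "Cinematic electronic, vocal-forward",
}
print(lint_album(album))
```

Running a check like this before each release is one way to keep the constraint in the files rather than in someone's head.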

QKeyV → DvYnT →

Acts

QKeyV · DvYnT

Output

Albums + videos
Voiced · Attended
Live artist sites
DSP distribution

AI Influencers, Models, UGC

07 · Live

Personalities · Rosters · Images · Websites · Voices · LoRAs

Multiple rosters of AI Influencers — female models, male brand ambassadors, music idols, and professionals — each with a consistent identity, unique voice, and ready-to-publish content.

What I'm learning

An influencer is a brand asset — versioned, signed off, retired the same way logos are.
When influencers multiply past a few dozen, the registry is the product.
UGC feel comes from imperfection on purpose, not from prompt-perfect renders.
Cross-platform formats cost less when designed up front than retro-fitted.
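
Treating an influencer as a versioned, signed-off, retirable asset suggests a small registry with an explicit lifecycle. The sketch below is hypothetical: the status names, the allowed transitions, and the `Registry` API are assumptions chosen to illustrate the idea.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    SIGNED_OFF = "signed_off"
    RETIRED = "retired"


# Allowed lifecycle moves within one version; retired is terminal for that version.
ALLOWED = {
    Status.DRAFT: {Status.SIGNED_OFF, Status.RETIRED},
    Status.SIGNED_OFF: {Status.RETIRED},
    Status.RETIRED: set(),
}


@dataclass
class Entry:
    name: str
    roster: str
    version: int = 1
    status: Status = Status.DRAFT
    history: list = field(default_factory=list)  # (version, status) pairs, oldest first


class Registry:
    def __init__(self):
        self.entries = {}

    def register(self, name: str, roster: str) -> Entry:
        if name in self.entries:
            raise ValueError(f"{name} is already registered")
        self.entries[name] = Entry(name, roster)
        return self.entries[name]

    def transition(self, name: str, new_status: Status) -> None:
        entry = self.entries[name]
        if new_status not in ALLOWED[entry.status]:
            raise ValueError(f"{name}: {entry.status.value} -> {new_status.value} is not allowed")
        entry.history.append((entry.version, entry.status))
        entry.status = new_status

    def revise(self, name: str) -> None:
        """Start a new draft version; the previous version's state stays in history."""
        entry = self.entries[name]
        entry.history.append((entry.version, entry.status))
        entry.version += 1
        entry.status = Status.DRAFT
```

Once the roster grows past a few dozen characters, queries against a structure like this (which versions are signed off, which are retired) replace memory and folder naming.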

VyAyEy → Male UGC → AI Influencers → Employees →

Rosters

VyAyEy · Male UGC · AI Influencers · Employees

Capabilities

Voiced · Attended
Face-anchored
Brand-safe media
UGC-ready feeds

Use

Campaign assets
Social feeds
UGC seeding

AI Learning Course

08 · 44 Lessons · Live

4 Levels + 3 Specialist Tracks · 19 Live Playgrounds · Content as data, design as code

A ground-up curriculum on transformer-based AI, fully rebuilt as a generated static site. 44 lessons climb four levels (01 Foundations, 02 The Transformer, 03 Training a Model, 04 Running Models), then branch into three specialist tracks: Image, Video, Voice. Lesson content lives as structured JSON; one Python builder renders every page through a single template, so the curriculum is data and the design system is code. 19 live interactive playgrounds run inside the lessons, 24 lessons carry audio narration (125 mp3 clips), and 42 ERNIE-generated infographics illustrate the concepts. Current through 2026: agents & MCP, reasoning models & RLVR, flow matching, provenance & regulation. No login. No paywall.

Open the course →

The climb · 4 levels, then 3 specialist tracks

01 Foundations — tokens, embeddings, the math under next-token prediction.
02 The Transformer — attention, Q/K/V, the block, what a forward pass actually does.
03 Training a Model — pretraining, RLHF, PEFT/LoRA, reasoning models & RLVR.
04 Running Models — inference, the KV-cache, agents & MCP, provenance & regulation.
Specialist tracks — Image (diffusion, flow matching), Video, Voice — each inheriting the same foundation.

Interactive system · lessons you can poke at

19 live playgrounds embedded in lessons — tokenizer, attention heatmap, LoRA rank slider, diffusion denoiser, KV-cache visualizer, and more.
Unified card system across every page: expandable concept cards, animated pipeline flow rails, count-up stat strips.
localStorage progress tracking plus a filter/search course map — resume exactly where you left off.
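
The arithmetic behind a LoRA rank slider is short enough to show directly. For a weight matrix of shape d_out x d_in, a rank-r adapter trains two factors, B (d_out x r) and A (r x d_in), instead of the full matrix. The 4096 x 4096 layer size below is an illustrative assumption, not a figure from the course.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a rank-r update B @ A, with B: d_out x r and A: r x d_in."""
    return rank * (d_out + d_in)


for rank in (1, 8, 64):
    trainable = lora_params(4096, 4096, rank)
    share = trainable / full_params(4096, 4096)
    print(f"rank {rank:>2}: {trainable:>7,} trainable weights ({share:.3%} of full)")
```

The trainable count grows linearly with the rank while the full matrix stays fixed, which is the relationship a slider makes visible.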

What I learned shipping the course

Curriculum survives best when anchored on principles rather than specific products — model names date; the residual stream and the diffusion process won't.
A common foundation that all specialist tracks inherit beats duplicating the basics per track — mirrors how the systems actually work.
Content-as-data pays off at scale: 44 lessons as structured JSON rendered through one template means a design change is one edit, not 44.
Interactive playgrounds teach what prose can't — dragging the LoRA rank slider explains low-rank adaptation faster than any paragraph.
Staying 2026-current is a maintenance contract, not a milestone — agents, RLVR, and regulation will need revisits; the JSON structure makes those cheap.
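
The content-as-data idea can be sketched with the standard library alone. This is not the course's actual builder: the lesson fields (`number`, `level`, `title`, `summary`), the template markup, and the output file naming are assumptions made for the example.

```python
import html
import json
from string import Template

# One template for every page: a design change is an edit here, not in each lesson.
PAGE = Template("""<article class="lesson level-$level">
  <h1>$number · $title</h1>
  <p>$summary</p>
</article>""")


def render(lesson: dict) -> str:
    """Render one lesson dict through the shared template, escaping every value."""
    safe = {key: html.escape(str(value)) for key, value in lesson.items()}
    return PAGE.substitute(safe)


def build(lessons_json: str) -> dict:
    """Return {output filename: page HTML} for every lesson in the JSON document."""
    lessons = json.loads(lessons_json)
    return {f"lesson-{lesson['number']:02d}.html": render(lesson) for lesson in lessons}


# Hypothetical lesson data in the assumed schema.
source = json.dumps([
    {"number": 1, "level": 1, "title": "Tokens", "summary": "How text becomes integers."},
    {"number": 2, "level": 2, "title": "Q/K/V & attention", "summary": "What a head computes."},
])
site = build(source)
```

A real builder would write each entry of `site` to disk; keeping that step separate makes the rendering easy to test without touching the filesystem.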

Curriculum

44 lessons
4 levels
3 specialist tracks

Interactives

19 live playgrounds
Concept cards
Flow rails · Stat strips
Progress tracking

Media

125 mp3 clips
24 narrated lessons
42 ERNIE infographics

Format

Self-paced · No login · Audio · Playgrounds

Source

github.com/ryzenx570/LLM-Understanding
macalinao.com/learning

Security, IDS, Hardening

09 · Live

SOC Posture · LLM-written Daily Briefs · Intrusion Detection · Runbooks

Continuous security work across access policy, an ongoing threat-tracking notebook, an intrusion-detection layer, versioned hardening checklists, and a monitoring runbook. An LLM reads the night's logs and writes the daily security brief, with a continuous monitor sweeping between briefs at a 15-minute cadence; every new service passes a security check-in before it ships. Real attack analysis feeds back into the runbook instead of drifting into an archive.

What I'm learning

A threat database without a monitoring runbook is a graveyard. Alerting is what makes the data load-bearing.
An LLM that reads the logs every morning and writes three paragraphs turns "the audit log exists" into "the audit log gets read."
A security check-in gate on every new service catches the boring 80% — default creds, open ports, missing rate limits — before exposure, not after.
Hardening checklists need versioning the same way code does — each revision should explicitly subsume the last, with a diff.
Brute-force at the perimeter is still the floor of what you'll see, and rate-limit tooling is still the cheapest mitigation.
Snapshot completion does not equal recovery. Restore drills uncover gaps the snapshot job hides.
IDS signal-to-noise ratio drives whether you'll act on it — tune for false-positive cost, not detection rate.
Deploy scripts are part of the attack surface — a mirror-style rsync will happily publish your internal docs if you let it.
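
The deploy-script point suggests a guard that runs before anything is copied. The sketch below is an assumption-laden illustration: the deny patterns and function names are hypothetical, and it checks a list of relative paths rather than calling rsync itself.

```python
from fnmatch import fnmatchcase

# Hypothetical deny patterns. In fnmatch, "*" also matches "/", so nested paths are covered.
DENY_PATTERNS = ("internal/*", "*/internal/*", "*.env", "*runbook*")


def blocked_paths(relative_paths, deny=DENY_PATTERNS) -> list:
    """Return the paths that match any deny pattern, in input order."""
    return [p for p in relative_paths if any(fnmatchcase(p, pat) for pat in deny)]


def check_publish(relative_paths) -> list:
    """Raise before anything is copied if a denied path is in the publish set."""
    blocked = blocked_paths(relative_paths)
    if blocked:
        raise RuntimeError("refusing to deploy, blocked paths: " + ", ".join(blocked))
    return list(relative_paths)
```

A guard like this complements exclude rules in the sync tool: the exclude list prevents the copy, and the guard fails loudly when something sensitive ends up in the publish set in the first place.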

Approach

Hardening · IDS · Rate Limits · Runbook

Stack

Cloudflare · Mail · Certs

Cadence

Daily LLM brief
15-min monitor
Weekly audit

AI systems,
implemented & operated.

Nine projects, three stacks, one workshop.

How a project actually moves.

Route, don't proliferate

One-shot rate is the cost line

Dry-run is the spec

AI Influencer = structured DNA

Memory is the binding constraint

The runbook is the alert
