Est. 2019 · New City, New York · Studio · Lab · Ops

AI systems,
implemented & operated.

Practice
AI implementation, generative pipelines, and infrastructure orchestration — across studio work and a private compute environment.
Based
New City, New York. Operating remotely, with private compute set up for generation-heavy work.
Currently
Shipping voiced AI Influencers, live idol sites, and a working voice attendant demo.
Booking
Q3 · 2026
Accepting brief-stage engagements. 20-min intro, no decks required.
01 — 09
Selected work

Nine projects, three stacks, one workshop.

A snapshot of the work currently moving through the studio. The first three cards are stacks — collections of tools I implement against. The remaining six are projects shipping out of those stacks. Click any card for the full read.

AI Stack — agentic routing topology
AI Stack
01 · Stack
Claude · Hermes Agent · Paperclip · GPT-5.5

The agentic stack I implement against — a Claude / GPT-5.5 reasoning layer fronted by Hermes (the agent runtime) and Paperclip (the gateway / cron / delegation tier). One default agent, many models, all calls audited at the edge. Underneath it sits a persistent second brain — a file-based knowledge wiki the agents read first and write back to, with a scheduled "dream sequence" pass that ingests new material and lints the index.
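A minimal sketch of the ingest → page → index → log loop for a file-based wiki, using only the Python standard library. The directory layout (`pages/`, `index.md`, `log.md`) and the function names `ingest` and `lint_index` are assumptions for illustration; the source does not describe the actual file structure or tooling.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def slugify(title: str) -> str:
    """Turn a page title into a filesystem-safe slug."""
    cleaned = "".join(c.lower() if c.isalnum() else "-" for c in title)
    return "-".join(part for part in cleaned.split("-") if part)


def ingest(wiki: Path, title: str, body: str) -> Path:
    """Ingest one note: write the page, rebuild the index, append to the log."""
    pages = wiki / "pages"
    pages.mkdir(parents=True, exist_ok=True)

    # Page: one markdown file per topic, overwritten on re-ingest.
    page = pages / f"{slugify(title)}.md"
    page.write_text(f"# {title}\n\n{body.strip()}\n", encoding="utf-8")

    # Index: regenerated from the files on disk, so it cannot drift.
    lines = ["# Index", ""]
    for p in sorted(pages.glob("*.md")):
        heading = p.read_text(encoding="utf-8").splitlines()[0].lstrip("# ")
        lines.append(f"- [{heading}](pages/{p.name})")
    (wiki / "index.md").write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Log: append-only record of what changed and when.
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with (wiki / "log.md").open("a", encoding="utf-8") as log:
        log.write(f"- {stamp} ingested {page.name}\n")
    return page


def lint_index(wiki: Path) -> list[str]:
    """Return names of pages on disk that the index does not link to."""
    index = (wiki / "index.md").read_text(encoding="utf-8")
    return [p.name for p in sorted((wiki / "pages").glob("*.md")) if p.name not in index]


if __name__ == "__main__":
    wiki = Path(tempfile.mkdtemp())          # throwaway directory for the demo
    ingest(wiki, "Example Topic", "Placeholder body text.")
    print((wiki / "index.md").read_text(encoding="utf-8"))
    print("unindexed pages:", lint_index(wiki))
```

Because everything is plain markdown on disk, the result stays greppable and diffable, and a scheduled pass could call `lint_index` to catch pages that were added by hand.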

What I'm learning
  • A knowledge base the agent maintains itself (ingest → page → index → log) turns chat ephemera into institutional memory — sessions stop re-discovering the same facts.
  • Plain markdown beats a database for agent memory: greppable, diffable, and any model can read it cold.
  • Centralizing access through one gateway makes cost, audit, and rate limits enforceable. Issuing a direct key per tool is the slow drift into chaos.
  • "Agent" is a verb (a thing that decides what to call) more than a noun. Picking one default keeps the verb consistent.
  • Tool inventory belongs in the agent layer, not in the prompt — keeps prompts small and tools swappable.
  • One agent, many models is easier to operate than many agents, one model.
  • Cron-driven agent delegation is dramatically simpler than event-driven for small teams. Most "real-time" requirements aren't.
  • Mass-update workflows require idempotency from day one — or the rollback story gets ugly fast.
  • Audit logging at the gateway saves arguing about what an agent actually called when something breaks at 2am.
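The gateway points above can be sketched as a single choke point that rate-limits and audits every model call. This is a hedged illustration, not Paperclip's or Hermes's actual interface: the `Gateway` class, its parameters, and the in-memory audit list are all hypothetical, and the backends are plain callables standing in for real model clients.

```python
import json
import time
from collections import defaultdict, deque
from typing import Callable


class Gateway:
    """Single choke point: every model call is rate-limited and audited here."""

    def __init__(self, backends: dict[str, Callable[[str], str]],
                 max_calls: int, window_s: float, clock=time.monotonic):
        self.backends = backends
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.recent = defaultdict(deque)   # caller -> timestamps of recent calls
        self.audit_log: list[str] = []     # JSON lines; a real gateway would persist these

    def call(self, caller: str, model: str, prompt: str) -> str:
        now = self.clock()
        stamps = self.recent[caller]
        # Drop timestamps that have aged out of the sliding window.
        while stamps and now - stamps[0] >= self.window_s:
            stamps.popleft()
        allowed = len(stamps) < self.max_calls and model in self.backends
        # Audit before dispatch so denied calls are recorded too.
        self.audit_log.append(json.dumps(
            {"t": now, "caller": caller, "model": model,
             "prompt_chars": len(prompt), "allowed": allowed}))
        if model not in self.backends:
            raise KeyError(f"unknown model: {model}")
        if not allowed:
            raise RuntimeError(f"rate limit exceeded for {caller}")
        stamps.append(now)
        return self.backends[model](prompt)


if __name__ == "__main__":
    gw = Gateway({"echo": lambda p: p.upper()}, max_calls=2, window_s=60)
    print(gw.call("default-agent", "echo", "hello"))
    print(gw.audit_log[-1])
```

Logging before dispatch is a deliberate choice in this sketch: the record of a denied or failed call is usually the one needed when something breaks.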
Tools
Claude · Hermes · Paperclip · GPT-5.5 · Second Brain
Posture
One default agent
Many models
Audit at the edge
Generation Stack — image, video, audio tools
Generation Stack
02 · Stack
ComfyUI · GPT-Image2 · LTX-Video · AceStep 1.5 XL

The media-generation stack covering image, video, and audio — running as local generation on a private multi-node GPU farm. ComfyUI is the workflow editor; GPT-Image2 handles batch influencer renders; LTX-Video is the video diffusion runner; AceStep 1.5 XL covers music. LoRA fine-tuning is baked into the pipeline, and centralized orchestration handles model routing, throughput shaping, and node-level health.

What I'm learning
  • A saved workflow .json is more reproducible than any prompt — treat it like code, commit it, diff it.
  • Custom nodes age fast; pin the ones you depend on or accept the breakage.
  • Splitting the workflow at the latent stage (load, sample, decode separately) is the cheapest debugging move.
  • Influencer enrichment as structured JSON beats freeform prompt rewrites every time — and survives reruns.
  • Visual-DNA maps make characters consistent across hundreds of frames without re-prompting from scratch.
  • Video models reward thinking in shots, not seconds — pacing is a prompt input, not a render setting.
  • Music generation is most useful when the visual cut already exists — composition lives in the video, not the audio.
  • VRAM is the binding constraint on multi-model serving — utilization without VRAM tracking is a misleading green dashboard.
  • LoRA training stays cheap if the base model is locked and only the adapter cycles — full retrains rarely earn their cost.
  • Auto-load/unload is the difference between "real cluster" and "one big model that won't move."
  • Cost-per-generation only becomes a real number once node-level telemetry is wired in.
  • Long-running generations should checkpoint to disk; "cluster goes down at 80%" should not erase the work.
  • Throughput shaping per-tenant is the difference between "shared farm" and "one user starves everyone else."
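The VRAM and auto-load/unload points above can be illustrated with a small pool that keeps models resident within a memory budget and evicts the least recently used one when a new model needs room. This is a sketch under stated assumptions: `ModelPool` is a hypothetical name, sizes are supplied by the caller rather than measured, and the model names and sizes in the demo are placeholders.

```python
from collections import OrderedDict


class ModelPool:
    """Keep models resident within a VRAM budget, evicting least-recently-used."""

    def __init__(self, vram_budget_gb: float):
        self.budget = vram_budget_gb
        self.loaded: OrderedDict[str, float] = OrderedDict()  # name -> GB, oldest first

    def used(self) -> float:
        """Total VRAM currently claimed by resident models."""
        return sum(self.loaded.values())

    def acquire(self, name: str, size_gb: float) -> list[str]:
        """Make `name` resident; return the models evicted to make room."""
        if size_gb > self.budget:
            raise ValueError(f"{name} needs {size_gb} GB, budget is {self.budget} GB")
        if name in self.loaded:
            self.loaded.move_to_end(name)    # mark as most recently used
            return []
        evicted = []
        while self.used() + size_gb > self.budget:
            victim, _ = self.loaded.popitem(last=False)  # unload the LRU model
            evicted.append(victim)
        self.loaded[name] = size_gb          # a real pool would load weights here
        return evicted


if __name__ == "__main__":
    pool = ModelPool(vram_budget_gb=24.0)    # placeholder budget
    for name, size in [("image", 12.0), ("video", 10.0), ("image", 12.0), ("music", 8.0)]:
        print(name, "evicted:", pool.acquire(name, size), "used GB:", pool.used())
```

Scheduling on `budget - used()` rather than on GPU utilization is what keeps the dashboard honest in this model: a node can be idle and still have no room for the next model.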
Tools
ComfyUI · GPT-Image2 · LTX-Video · AceStep
Coverage
Image · Video
Audio · Music
LoRAs · Fine-tunes
Workflow as code
Local generation
Multi-node farm
Internal-only
Auto-routing
Hosted Stack — self-hosted mail, web, cloud and recovery
Hosted Stack
03 · Stack
Mail & Web Operations · Cloud Service · Backups · Disaster Recovery

The self-hosted operations layer — managed mail with proper authentication and an outbound relay, plus a portfolio-wide edge / DNS / proxy setup with structured contact-form sweeping. Backed by a managed cloud service tier, daily off-host backups, and a warm standby that turns disaster recovery into a DNS swap rather than a rebuild. All managed as data, all built to roll back per zone.

What I'm learning
  • Reputation is the metric that matters; everything else is upstream of it.
  • An outbound relay for transactional mail is a much better story than sending direct from the mail server.
  • The dry-run output IS the rollout plan. Anything you can't reproduce in dry-run won't behave the same on apply.
  • Parked domains are a contact-form attack surface most teams forget exists. Sweep them like any other input.
  • Edge rules outlive any single deployment script — write them as data, not as one-shot commands.
  • A managed cloud service tier earns its cost the first time a local node drops and traffic just keeps flowing.
  • A backup you've never restored from is a hope, not a backup — and "off-host" is the word in "off-host backup" that actually saves you.
  • Warm standby beats cold backup: when disaster recovery is a DNS swap instead of a rebuild, you'll actually reach for it under pressure.
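One way to read "the dry-run output is the rollout plan" and "write rules as data": keep the desired state as a plain mapping, diff it against the current state to produce a change list, and let dry-run and apply walk the same list. The sketch below assumes DNS-style records keyed by (name, type); the `Change`, `plan`, `apply`, and `rollback` names are hypothetical, and the addresses come from the documentation-only 192.0.2.0/24 range.

```python
from dataclasses import dataclass
from typing import Optional

Record = tuple[str, str]  # (name, record type), e.g. ("www", "A")


@dataclass(frozen=True)
class Change:
    action: str            # "create", "update", or "delete"
    key: Record
    old: Optional[str]
    new: Optional[str]


def plan(current: dict, desired: dict) -> list[Change]:
    """Diff current against desired; sorted so the plan itself is diffable."""
    changes = []
    for key in sorted(current.keys() | desired.keys()):
        old, new = current.get(key), desired.get(key)
        if old is None:
            changes.append(Change("create", key, None, new))
        elif new is None:
            changes.append(Change("delete", key, old, None))
        elif old != new:
            changes.append(Change("update", key, old, new))
    return changes


def apply(current: dict, changes: list[Change], dry_run: bool = True) -> dict:
    """Walk the plan; with dry_run=True the original state is returned untouched."""
    result = dict(current)
    for c in changes:
        prefix = "DRY-RUN" if dry_run else "APPLY"
        print(f"{prefix} {c.action} {c.key} {c.old!r} -> {c.new!r}")
        if c.new is None:
            result.pop(c.key, None)
        else:
            result[c.key] = c.new
    return current if dry_run else result


def rollback(changes: list[Change]) -> list[Change]:
    """Invert a plan, so rolling back a zone is just applying another plan."""
    inverse = {"create": "delete", "delete": "create", "update": "update"}
    return [Change(inverse[c.action], c.key, c.new, c.old) for c in reversed(changes)]


if __name__ == "__main__":
    current = {("www", "A"): "192.0.2.10", ("old", "A"): "192.0.2.11"}
    desired = {("www", "A"): "192.0.2.20", ("mail", "A"): "192.0.2.30"}
    apply(current, plan(current, desired), dry_run=True)
```

Because the dry-run prints exactly the changes that apply would execute, reviewing that output is reviewing the rollout.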
Approach
Mail · Web · DNS · Edge · Cloud · Backups · Recovery
Stack
SES · AWS
Posture
Self-hosted
Managed deliverability
Warm standby
Phased rollout
AI Image/Video In House — character sheet to 3D viewport, web generation studio, AI film strip
AI Image/Video In House
04 · Live
3D Character Modeling · AI Web & UX Design · AI Film

In-house creative production built on AI image generation. A single generated character sheet becomes a 3D asset — image to mesh, auto-staged and lit in Blender, finished with a photoreal diffusion pass. The same image work drives AI-designed web products (an AI generation studio, live in production behind its own auth) and long-form AI film storytelling, where characters hold identity across hundreds of shots.

What I'm learning
  • Image-to-mesh models turn one character sheet into a riggable 3D asset in minutes — the bottleneck moved from modeling to art direction.
  • Driving Blender programmatically (stage, light, camera, render) makes 3D output reproducible the same way workflow JSON made image output reproducible.
  • 3D for geometry, diffusion for skin: a photoreal img2img pass over a 3D render gives consistent characters in controlled poses.
  • Vibe-coded sites ship fastest when the design tokens are locked before the pages — six iterations of this site taught that rule.
  • Shipping an AI image studio as a live web product means deployment, auth, and tunnels are part of the design, not an afterthought.
  • Narrative film is consistency management — face anchors, seed discipline, and shot-level decomposition, not longer prompts.
Approach
Img→3D · Blender · Web/UX · AI Film
Output
3D characters
Live web studio
Story-driven film
AI Attendant and Assistant
05 · Live
Voice Assistant · Phone, Discord & Web · Layered Memory · All AI Influencers Voiced

A speech-first assistant running in production on the studio's own GPUs — an open-weight reasoning model behind a self-hosted speech pipeline, reachable over a real phone line, Discord, and the browser. Every AI Influencer in the roster now carries a unique voice, turning each character into a distinct conversational presence. Layered memory — vector recall plus a relationship graph — lets it remember across calls and sessions, with web search wired in for fresh answers. A working demo is available.

What I'm learning
  • Turn-taking is harder than transcription — the assistant that interrupts itself loses every conversation.
  • A real phone line is the honest benchmark: telephony adds latency you can't refactor away, so the budget is spent before the model says a word.
  • Serving the LLM, TTS, and STT on shared GPUs is a memory-placement problem — capacity planning is VRAM planning.
  • Memory has layers: vector recall answers "what did we say," a relationship graph answers "how does it connect." An assistant needs both.
  • Model selection matters more than parameter tuning for TTS quality — and multi-language coverage locks the engine choice early.
  • The latency budget is real: a sub-second response is the floor, and a sub-300ms first syllable is the goal.
  • A voice influencer is a brand asset — it should be consistent across calls, sessions, even years.
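The two memory layers described above can be sketched side by side: a vector layer that answers "what did we say" by similarity, and a graph layer that answers "how does it connect" by following edges. This is a toy illustration only. The bag-of-words `embed` function stands in for a learned embedding model, and `LayeredMemory` and its method names are hypothetical rather than the assistant's real interface.

```python
import math
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LayeredMemory:
    def __init__(self):
        self.utterances: list[tuple[str, Counter]] = []   # vector layer
        self.graph = defaultdict(set)                     # relationship layer

    def remember(self, text: str) -> None:
        """Store an utterance with its vector for later recall."""
        self.utterances.append((text, embed(text)))

    def relate(self, subject: str, relation: str, obj: str) -> None:
        """Record a typed edge from subject to obj."""
        self.graph[subject].add((relation, obj))

    def recall(self, query: str, k: int = 1) -> list[str]:
        """Vector layer: return the k past utterances most similar to the query."""
        q = embed(query)
        ranked = sorted(self.utterances, key=lambda u: cosine(q, u[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def connections(self, subject: str) -> list[tuple[str, str]]:
        """Graph layer: return the edges leaving a subject, sorted for stability."""
        return sorted(self.graph.get(subject, set()))


if __name__ == "__main__":
    memory = LayeredMemory()
    memory.remember("the caller asked about booking an intro call")
    memory.relate("caller", "interested_in", "intro call")
    print(memory.recall("booking a call"))
    print(memory.connections("caller"))
```

The two lookups answer different questions, which is why neither layer replaces the other in this sketch.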

Try the demo →

Approach
TTS · STT · PSTN · Memory · Turn-taking
Coverage
Phone · Discord · Web
Layered memory
All AI Influencers voiced
Sub-300ms goal
AI Music Idols — albums and music videos
AI Music Idols
06 · Live
Albums & Music Videos · Character-led Acts · Live Sites

A small label of in-house AI idols — characters with discographies, not producers with stage names. Currently QKeyV (synth · dark pop) and DvYnT (electronic · cinematic), each with a live artist site. Both acts are voiced and attended. The production direction stays deliberately vocal-forward. Releases ship as albums plus visual companions.

What I'm learning
  • Prompt versions are album versions — v6 replaced v5 the same day; the per-track lane file is the studio session.
  • A production constraint ("vocal-forward, no auto-tune") survives only if it's written into every track's prompt lane, not held in your head.
  • Visuals tied to the audio cut from day one beat post-hoc music videos — every time.
  • Album shape forces a story; single-track-only acts lose continuity fast.
  • An idol is a character with a discography, not a producer's stage name — write the influencer, then the songs.
  • Voice consistency across releases is what keeps an act feeling like one act — a distinct voice identity per idol is now in place for exactly this.
  • The fan loop (release → social → response → next release) is the actual product, not any single song.
QKeyV → DvYnT →
Acts
Output
Albums + videos
Voiced · Attended
Live artist sites
DSP distribution
AI Influencers, Models, UGC roster
AI Influencers, Models, UGC
07 · Live
Personalities · Rosters · Images · Websites · Voices · LoRAs

Multiple rosters of AI Influencers — female models, male brand ambassadors, music idols, and professionals — each with a consistent identity, unique voice, and ready-to-publish content.

What I'm learning
  • An influencer is a brand asset — versioned, signed off, retired the same way logos are.
  • When influencers multiply past a few dozen, the registry is the product.
  • UGC feel comes from imperfection on purpose, not from prompt-perfect renders.
  • Cross-platform formats cost less when designed up front than when retrofitted.
VyAyEy → Male UGC → AI Influencers → Employees →
Capabilities
Voiced · Attended
Face-anchored
Brand-safe media
UGC-ready feeds
Use
Campaign assets
Social feeds
UGC seeding
AI Learning Course — 44 lessons across 4 levels and 3 specialist tracks
AI Learning Course
08 · 44 Lessons · Live
4 Levels + 3 Specialist Tracks · 19 Live Playgrounds · Content as data, design as code

A from-the-ground-up curriculum on transformer-based AI, fully rebuilt as a generated static site. 44 lessons climb four levels (01 Foundations, 02 The Transformer, 03 Training a Model, 04 Running Models), then branch into three specialist tracks: Image, Video, Voice. Lesson content lives as structured JSON; one Python builder renders every page through a single template, so the curriculum is data and the design system is code. 19 live interactive playgrounds run inside the lessons, 24 lessons carry audio narration (125 mp3 clips), and 42 ERNIE-generated infographics illustrate the concepts. Current through 2026: agents & MCP, reasoning models & RLVR, flow matching, provenance & regulation. No login. No paywall.

Open the course →

The climb · 4 levels, then 3 specialist tracks
  • 01 Foundations — tokens, embeddings, the math under next-token prediction.
  • 02 The Transformer — attention, Q/K/V, the block, what a forward pass actually does.
  • 03 Training a Model — pretraining, RLHF, PEFT/LoRA, reasoning models & RLVR.
  • 04 Running Models — inference, the KV-cache, agents & MCP, provenance & regulation.
  • Specialist tracks — Image (diffusion, flow matching), Video, Voice — each inheriting the same foundation.
Interactive system · lessons you can poke at
  • 19 live playgrounds embedded in lessons — tokenizer, attention heatmap, LoRA rank slider, diffusion denoiser, KV-cache visualizer, and more.
  • Homogenized card system across every page: expandable concept cards, animated pipeline flow rails, count-up stat strips.
  • localStorage progress tracking plus a filter/search course map — resume exactly where you left off.
What I learned shipping the course
  • Curriculum survives best when anchored on principles rather than specific products — model names date; the residual stream and the diffusion process won't.
  • A common foundation that all specialist tracks inherit beats duplicating the basics per track — mirrors how the systems actually work.
  • Content-as-data pays off at scale: 44 lessons as structured JSON rendered through one template means a design change is one edit, not 44.
  • Interactive playgrounds teach what prose can't — dragging the LoRA rank slider explains low-rank adaptation faster than any paragraph.
  • Staying 2026-current is a maintenance contract, not a milestone — agents, RLVR, and regulation will need revisits; the JSON structure makes those cheap.
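The content-as-data idea above can be sketched with the standard library alone: lessons live as JSON, and one template renders every page. The field names (`number`, `level`, `title`, `summary`), the `PAGE` markup, and the output filenames are assumptions for illustration; the course's real schema and builder are not shown in the source.

```python
import html
import json
from string import Template

# Hypothetical page template; the real site's template is not shown in the source.
PAGE = Template(
    "<article class=\"lesson level-$level\">\n"
    "  <h1>$number · $title</h1>\n"
    "  <p>$summary</p>\n"
    "</article>\n"
)


def render_lesson(lesson: dict) -> str:
    """Render one lesson dict through the single shared template."""
    # Escape every value so lesson text cannot break the markup.
    safe = {key: html.escape(str(value)) for key, value in lesson.items()}
    # substitute() raises KeyError if a lesson is missing a field the template needs.
    return PAGE.substitute(safe)


def build(lessons_json: str) -> dict[str, str]:
    """Map output filename -> rendered HTML for every lesson in the JSON."""
    lessons = json.loads(lessons_json)
    return {f"lesson-{item['number']}.html": render_lesson(item) for item in lessons}


if __name__ == "__main__":
    # Placeholder lesson data, not taken from the actual course.
    data = json.dumps([
        {"number": "01", "level": 1, "title": "Tokens", "summary": "Placeholder summary."},
    ])
    for filename, page in build(data).items():
        print(filename)
        print(page)
```

With this shape, a design change means editing `PAGE` once, and a malformed lesson fails the build instead of shipping a half-rendered page.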
Curriculum
44 lessons
4 levels
3 specialist tracks
Interactives
19 live playgrounds
Concept cards
Flow rails · Stat strips
Progress tracking
Media
125 mp3 clips
24 narrated lessons
42 ERNIE infographics
Format
Self-paced · No login · Audio · Playgrounds
Source
github.com/ryzenx570/LLM-Understanding
macalinao.com/learning
Security, IDS, Hardening
09 · Live
SOC Posture · LLM-written Daily Briefs · Intrusion Detection · Runbooks

Continuous security work across access policy, an ongoing threat-tracking notebook, an intrusion-detection layer, versioned hardening checklists, and a monitoring runbook. An LLM reads the night's logs and writes the daily security brief, with a continuous monitor sweeping between briefs on a 15-minute cadence; every new service passes a security check-in before it ships. Real attack analysis feeds back into the runbook instead of drifting into the archive.

What I'm learning
  • A threat database without a monitoring runbook is a graveyard. Alerting is what makes the data load-bearing.
  • An LLM that reads the logs every morning and writes three paragraphs turns "the audit log exists" into "the audit log gets read."
  • A security check-in gate on every new service catches the boring 80% — default creds, open ports, missing rate limits — before exposure, not after.
  • Hardening checklists need versioning the same way code does — each revision should explicitly subsume the last, with a diff.
  • Brute-force at the perimeter is still the floor of what you'll see, and rate-limit tooling is still the cheapest mitigation.
  • Snapshot completion does not equal recovery. Restore drills uncover gaps the snapshot job hides.
  • IDS signal-to-noise ratio drives whether you'll act on it — tune for false-positive cost, not detection rate.
  • Deploy scripts are part of the attack surface — a mirror-style rsync will happily publish your internal docs if you let it.
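The perimeter brute-force point above comes down to counting failures per source inside a time window. Below is a minimal sliding-window sketch, assuming login events have already been parsed into (timestamp, source IP, succeeded) tuples sorted by time. The function name, threshold, and window are illustrative defaults, not the studio's tuning, and the addresses in the demo are from documentation-only ranges.

```python
from collections import defaultdict, deque


def find_brute_force(events, threshold=5, window_s=60):
    """Return source IPs with at least `threshold` failed logins within `window_s` seconds.

    `events` is an iterable of (unix_timestamp, source_ip, succeeded) tuples,
    assumed sorted by timestamp.
    """
    recent = defaultdict(deque)   # ip -> timestamps of recent failures
    flagged = []
    for ts, ip, ok in events:
        if ok:
            continue              # only failures count toward the limit
        window = recent[ip]
        window.append(ts)
        # Discard failures that have fallen out of the sliding window.
        while ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold and ip not in flagged:
            flagged.append(ip)
    return flagged


if __name__ == "__main__":
    # Synthetic events: six failures five seconds apart from one address.
    demo = [(i * 5, "203.0.113.7", False) for i in range(6)]
    print(find_brute_force(demo))
```

Raising `threshold` or shortening `window_s` trades detection for fewer false positives, which is the tuning decision the signal-to-noise bullet describes.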
Approach
Hardening · IDS · Rate Limits · Runbook
Stack
Cloudflare · Mail · Certs
Cadence
Daily LLM brief
15-min monitor
Weekly audit
9 Active Projects
3 Stacks Implemented
All AI Influencers Voiced
2 Idols Live
10
Process

How a project actually moves.

There's no proprietary methodology here — just the order of operations that has stopped me losing weekends to preventable rework.

01
Discover

Understand what the system actually does today. Talk to the operators, read the logs, run a baseline. The job before this is usually wrong about what's broken.

02
Architect

Design around the binding constraint, not the most exciting one. Most often that's memory, retry rate, blast radius, or audit. Pick before implementing.

03
Implement

Ship in dry-run first. Wire monitoring before features. Make rollback a button, not a story you tell at the post-mortem.

04
Operate

Cron, runbook, on-call. The most expensive part of any system is what happens after launch — design for the second year, not the first week.

11
Cross-project insights
Nine projects in flight teach you the same six lessons over and over — until you finally write them down.
i / 01

Route, don't proliferate

Centralizing access through one gateway makes cost, audit, and rate limits enforceable. Issuing a direct key per tool is the slow drift into chaos, and it costs you the audit trail you'll wish you had.

From: AI Stack
i / 02

One-shot rate is the cost line

Token spend follows retry rate. Lowering retries — through tighter prompts, smaller diffs, and read-before-edit — saves more than picking a cheaper model ever will.

From: AI Stack · Generation Stack
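The arithmetic behind "token spend follows retry rate" can be made concrete under one simplifying assumption: attempts are independent and succeed with the same probability, so the expected number of attempts is 1 divided by the one-shot rate. The function name and every number below are illustrative, not measurements from the studio.

```python
def expected_cost(cost_per_attempt: float, one_shot_rate: float) -> float:
    """Expected spend to get one accepted result, retrying until success.

    Assumes attempts are independent with the same success probability, so the
    number of attempts is geometric with mean 1 / one_shot_rate.
    """
    if not 0 < one_shot_rate <= 1:
        raise ValueError("one_shot_rate must be in (0, 1]")
    return cost_per_attempt / one_shot_rate


if __name__ == "__main__":
    # Made-up figures: a half-price model that rarely lands on the first try,
    # versus a full-price model that usually does.
    print("cheap model, 25% one-shot:", expected_cost(0.5, 0.25))
    print("full-price model, 80% one-shot:", expected_cost(1.0, 0.8))
```

With these made-up numbers the half-price model costs 2.0 per accepted result against 1.25 for the full-price one, which is the pattern the insight describes: the retry rate can dominate the price per call.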
i / 03

Dry-run is the spec

Whether it's edge config, a fine-tune, or a workflow graph, anything you can't reproduce in dry-run won't behave in production. Treat dry-run output as the deliverable, not a sanity check.

From: Hosted Stack · Generation Stack
i / 04

AI Influencer = structured DNA

A consistent character across a hundred frames, a year of releases, or a five-minute call isn't about a better prompt. It's about persisting the influencer as data and enriching every call from it.

From: Generation Stack · AI Music Idols · AI Influencers/Models/UGC
i / 05

Memory is the binding constraint

On a multi-model compute environment, utilization without memory tracking is a misleading green dashboard. Schedule on memory headroom; the throughput follows.

From: Generation Stack
i / 06

The runbook is the alert

A threat database without a monitoring runbook is a graveyard. Alerting + a written response procedure is what makes any operational data load-bearing — security, ops, or otherwise.

From: Security, IDS, Hardening · Hosted Stack
Let's
build
quietly.
Based

New City, New York

Working Remote

New Brief?

20-min intro call.

No decks required.

Talk to AI →