AI Architecture - Notions on Training and Inference

CPU · GPU · TPU · Edge Computing

The problem is not that AI is expensive. It’s that for years, people paid to train models as if that were the main cost, when the real cost, the one that never stops, is serving every response.

TL;DR

Inference costs exceed training by 15×–20× over a model’s operational lifetime. Optimizing for training while ignoring inference is optimizing the wrong problem.

CPU (Intel Xeon AMX): the correct choice when the model lives alongside the data. Network latency kills any compute gain from moving to a GPU cluster.

NVIDIA GPU (Blackwell/Hopper + TensorRT-LLM): still the default for research and heterogeneous production. CUDA is a 20-year moat. Don’t lock in at peak prices.

Google TPU v6/v7: the right answer for high-volume, predictable inference. Midjourney cut monthly costs from $2.1M to $700K. The CUDA migration barrier no longer exists.

Edge AI: thermodynamics, not algorithms, sets the limits. Pi 5 + Hailo-10H delivers 320 ms TTFT (6.4× faster than CPU-only) with a PCIe x1 bottleneck you need to design around.

The right hardware is not the most powerful. It is the one that matches the problem topology to the silicon architecture without wasting energy or budget.

Introduction

In 2023, NVIDIA published a post titled “What Is AI Computing?” that focused on handling intensive computations, which is particularly useful for embedding design and optimization processes in Machine Learning, and on advancing toward hardware acceleration to find patterns in immense amounts of data, thereby updating the assumptions of ML or AI models. All of this typically runs on GPUs. ...
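The figures quoted in the TL;DR can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, the training cost is normalized to one unit rather than taken from any real budget, and the CPU-only TTFT is merely what the stated 6.4× speedup would imply, not a measured value.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative saving, in percent, when a cost drops from `before` to `after`."""
    return (before - after) / before * 100.0


def implied_baseline(accelerated: float, speedup: float) -> float:
    """Baseline value implied by an accelerated value and a speedup factor."""
    return accelerated * speedup


def lifetime_inference_cost(training_cost: float, ratio: float) -> float:
    """Inference cost over the model's lifetime for a given inference/training ratio."""
    return training_cost * ratio


# Monthly cost figures quoted in the excerpt (USD).
before_usd, after_usd = 2_100_000, 700_000
monthly_savings = before_usd - after_usd
reduction_pct = percent_reduction(before_usd, after_usd)

# TTFT with the accelerator (ms) and the stated speedup over CPU-only.
cpu_only_ttft_ms = implied_baseline(320, 6.4)

# Assumption: training cost normalized to 1 unit; 15x-20x is the quoted range.
inference_low = lifetime_inference_cost(1.0, 15)
inference_high = lifetime_inference_cost(1.0, 20)

if __name__ == "__main__":
    print(f"Monthly savings: ${monthly_savings:,} ({reduction_pct:.1f}% lower)")
    print(f"Implied CPU-only TTFT: about {cpu_only_ttft_ms:.0f} ms")
    print(f"Lifetime inference cost: {inference_low:.0f}-{inference_high:.0f} training units")
```

Normalizing the training cost keeps the comparison independent of any particular budget: whatever training costs, the quoted ratio puts lifetime serving at 15 to 20 times that amount.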

April 5, 2026 · Carlos Daniel Jiménez

Anatomy of an MLOps Pipeline - Part 1: Pipeline and Orchestration

Why This Post Is Not Another Scikit-Learn Tutorial

Most MLOps posts teach you how to train a Random Forest in a notebook and tell you “now put it in production.” This post assumes you already know how to train models. What you probably don’t know is how to build a system where: ...

January 13, 2026 · Carlos Daniel Jiménez

Anatomy of an MLOps Pipeline - Part 2: Deployment and Infrastructure

8. CI/CD with GitHub Actions: The Philosophy of Automated MLOps

The Philosophical Foundation: Why Automation Isn’t Optional

Before diving into YAML, let’s address the fundamental question: why do we automate ML pipelines? The naive answer is “to save time.” The real answer is more profound: because human memory is unreliable, manual processes don’t scale, and production systems demand reproducibility. ...

January 13, 2026 · Carlos Daniel Jiménez

Anatomy of an MLOps Pipeline - Part 3: Production and Best Practices

11. Model and Parameter Selection Strategies

The Complete Flow: Selection → Sweep → Registration

This pipeline implements a three-phase strategy for model optimization, each phase with a specific purpose:

Step 05: Model Selection
├── Compares 5 algorithms with basic GridSearch (5-10 combos/model)
├── Objective: Identify best model family (Random Forest vs Gradient Boosting vs ...)
├── Primary metric: MAPE (Mean Absolute Percentage Error)
└── Output: Best algorithm + initial parameters

Step 06: Hyperparameter Sweep
├── Optimizes ONLY the best algorithm from Step 05
├── Bayesian optimization with 50+ runs (exhaustive search space)
├── Objective: Find optimal configuration of best model
├── Primary metric: wMAPE (Weighted MAPE, less biased)
└── Output: best_params.yaml with optimal hyperparameters

Step 07: Model Registration
├── Trains final model with parameters from Step 06
├── Registers in MLflow Model Registry with rich metadata
├── Transitions to stage (Staging/Production)
└── Output: Versioned model ready for deployment

Why three separate steps? You don’t have the computational resources to run an exhaustive sweep of 5 algorithms × 50 combinations = 250 training runs. First decide strategy (which algorithm), then tactics (which hyperparameters). ...
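The budget argument behind the three steps can be made concrete with a short sketch. It uses only the counts stated above (5 algorithms, 5-10 grid combinations per model, 50 sweep runs); the function names are hypothetical and are not taken from the pipeline's code. The metric helpers use the common definitions of MAPE and wMAPE (total absolute error divided by total absolute actuals), which is an assumption, since the excerpt does not spell out the formulas.

```python
from typing import Sequence


def exhaustive_runs(n_algorithms: int, combos_per_algorithm: int) -> int:
    """Training runs needed to sweep every algorithm in full."""
    return n_algorithms * combos_per_algorithm


def staged_runs(n_algorithms: int, selection_combos: int, sweep_runs: int) -> int:
    """Runs for a coarse selection over all algorithms plus a deep sweep of the winner."""
    return n_algorithms * selection_combos + sweep_runs


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of per-sample absolute percentage errors (undefined if any actual is 0)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def wmape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Total absolute error over total absolute actuals (assumed wMAPE definition)."""
    total_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return total_error / sum(abs(t) for t in y_true)


if __name__ == "__main__":
    print("Exhaustive sweep:", exhaustive_runs(5, 50), "runs")
    print("Staged, 5 combos/model:", staged_runs(5, 5, 50), "runs")
    print("Staged, 10 combos/model:", staged_runs(5, 10, 50), "runs")

    # Toy data: one large and one small actual value.
    actual, predicted = [100.0, 10.0], [110.0, 5.0]
    print(f"MAPE:  {mape(actual, predicted):.3f}")
    print(f"wMAPE: {wmape(actual, predicted):.3f}")
```

Under these counts the staged approach needs between 75 and 100 runs instead of 250. On the toy data, the miss on the small actual dominates MAPE, while wMAPE weights errors by the size of the actuals, which is one common reading of "less biased".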

January 13, 2026 · Carlos Daniel Jiménez

MLOps Guides: A Comprehensive Overview

Exploring the intersection of machine learning and DevOps, from model versioning to automated deployments.

Featured Posts

📦 Artifact Design and Pipeline in MLOps Part I
Introduction to artifacts, MLproject manifests, and pipeline orchestration for reproducible ML workflows.

🤖 MLflow for Generative AI Systems
Learn how to use MLflow for tracing, evaluation, and versioning of LLM applications and Agentic AI systems.

🍓 Raspberry Pi 16GB, Servers, and MLOps
Using Raspberry Pi as a development server for MLOps testing and edge deployments. ...

June 15, 2024 · Carlos Daniel Jiménez
