Posts

AWS EKS vs ECS: Architecture, Cost & Real-World Use Cases (2026)

TL;DR: EKS gives you full Kubernetes power with portability and the CNCF ecosystem. ECS gives you AWS-native simplicity with zero control plane cost. Choose EKS if you need GitOps, ArgoCD, Dapr, or multi-cloud portability. Choose ECS if you want faster setup, lower ops overhead, and a pure AWS-native stack. I’ve run both in production. At Vigo Retail, I architected a 21-service Go microservices platform on EKS handling 8,000 RPS peak and 25M+ requests/month. I’ve also managed ECS clusters for smaller AWS-native projects. This guide is what I wish existed before I made those decisions. ...

Zero DevOps E-commerce with Cloudflare Workers & Turborepo

Tired of maintaining expensive Kubernetes clusters, fine-tuning Auto-scaling groups on AWS, or wiring together complex CI/CD pipelines just to keep an e-commerce store alive? Welcome to the Zero DevOps era. In this post, we dissect Aura Store — a production-grade Cloudflare Workers E-commerce platform built entirely on Edge infrastructure, powered by a Turborepo Monorepo. Everything you see below is drawn directly from the running codebase. 1. Turborepo Monorepo Architecture in Practice Answer-first: Turborepo splits the e-commerce system into four independent apps — storefront-ui, admin-ui, public-api, and admin-api — and two shared packages: database and contract. This separation maximises build speed via Turborepo’s task graph cache and enforces a hard security boundary between the public-facing layer and internal tooling. ...

Kubernetes In-Place Pod Resizing: Scale CPU & Memory Without Restart

Answer-first: In-Place Pod Resizing (GA in Kubernetes v1.35) allows you to modify CPU and memory requests/limits on running containers without restarting the pod — eliminating cold-start disruptions for AI inference, databases, and stateful workloads. This guide covers requirements, production YAML, VPA integration, cost optimization patterns, and gotchas. Before this feature, changing a container’s resource allocation required deleting and recreating the pod. For a stateful database holding connections, an AI model with 30GB of weights loaded in memory, or a long-running batch job — that restart is catastrophic. In-Place Pod Resize finally decouples resource management from pod lifecycle. ...

Go 1.26: Green Tea GC, Faster CGO & Goroutine Leak Detection

Answer-first: Go 1.26 ships three landmark runtime features: the Green Tea garbage collector (10–40% GC overhead reduction), ~30% faster cgo calls for AI inference bindings, and an experimental goroutine leak profile that detects permanently blocked goroutines via GC reachability analysis. Released in February 2026, Go 1.26 is not a routine patch release. It fundamentally changes how the Go runtime manages memory, interacts with C code, and surfaces concurrency bugs. For teams running Golang microservices at scale, these improvements compound across a fleet — zero code changes required. ...

Go Microservices Architecture: Production Guide

Go microservices from domain design to Kubernetes deployment — gRPC, Dapr, OpenTelemetry, and GitOps patterns from a real 21-service production migration.

Magento Development in Vietnam: 2026 Hiring Guide

Vietnam’s Magento talent pool runs deep — but finding engineers who can handle production architecture is harder. Cost tiers, vetting signals, and when to migrate.

Golang gRPC Microservices: Protobuf, TLS & Middleware

Why gRPC for Go Microservices? Answer-first: gRPC is the right choice for Go microservices when you need: binary-efficient serialization (Protobuf is 3–10× smaller than JSON), bidirectional streaming for real-time data, strongly-typed contracts across services, and sub-millisecond inter-service latency. Google, Uber, Netflix, and Square use gRPC as the primary inter-service communication protocol. This guide shows you how to build production-grade Go gRPC services from scratch. The key advantages over REST: gRPC REST/JSON Serialization Protobuf (binary, schema-enforced) JSON (text, schema-optional) Payload size 3–10× smaller Baseline Streaming Unary, Client, Server, Bidirectional HTTP/2 SSE (server-only), WebSocket (separate) Contract .proto file (language-agnostic codegen) OpenAPI (opt-in, often stale) Latency ~0.5ms p50 inter-service ~2–5ms p50 inter-service Browser support gRPC-Web (needs proxy) Native Best for Internal microservices, streaming Public APIs, browser clients Step 1: Define Your Service with Protobuf Create the contract first — Protobuf schema drives code generation for all languages. ...

GraphHopper Distance Matrix: Self-Host & Replace Google Maps API ($510/day → $0)

What Is the GraphHopper Distance Matrix? Answer-first: GraphHopper distance matrix is the /matrix API of the open-source GraphHopper routing engine. It accepts N points and returns an N×N matrix of travel durations (seconds) and distances (meters) based on real road networks from OpenStreetMap — completely free when self-hosted. For 100 delivery stops, it computes 10,000 pairs in under 50ms on a standard VPS. This guide covers everything you need to run GraphHopper distance matrix in production: Docker setup, the /matrix API, Custom Models for truck/motorcycle routing, H3-based Redis caching, and an honest comparison with OSRM, Valhalla, and Google Maps. ...

Composable Banking Architecture: From Monolith to Modular Core

Answer-first: How banks replace monolithic cores (Temenos, Finacle) with composable banking using Go microservices, Saga orchestration, NewSQL ledgers, and Strangler Fig. Legacy core banking systems were designed in a different era. Temenos T24, Finacle, and Flexcube shared one defining assumption: the bank’s entire product catalogue — deposits, lending, payments, trade finance — would live inside a single, tightly coupled application and a single, shared database. That assumption held when banking moved at human speed. It breaks completely when product releases need to go from months to days, when a single fraud engine update must not risk a payments outage, and when engineers on a COBOL codebase are retiring faster than they can be replaced. ...

MySQL Scalability: Read Replicas, Sharding & TiDB

MySQL scalability is the ability to increase database throughput — reads per second, writes per second, or data volume — without rewriting your application. The critical distinction: read scaling (adding replicas) and write scaling (sharding or distributed SQL) require completely different architectural approaches. Choosing the wrong path creates technical debt that takes months to unwind. This guide walks through every stage of the MySQL scaling ladder, from buffer pool tuning through TiDB migration, with Go-specific implementation patterns at each step. ...

Real-Time Inventory Synchronization: Kafka, CDC & Redis for E-commerce

What Is Real-Time Inventory Synchronization? Real-time inventory synchronization is the process of propagating stock count changes from the system of record (database) to all sales channels — web storefront, mobile app, WMS, ERP — in sub-second time. Instead of batch ETL jobs that run every hour, a CDC + Kafka pipeline streams every committed stock change as an event, eliminating overselling and stale stock displays. Handling this during a flash sale — where thousands of users attempt to purchase a highly contested SKU simultaneously — is a pinnacle architectural challenge. Traditional synchronous database updates collapse under lock contention. ...

Go Microservices Distributed Tracing Architecture (2026)

Monitoring complex Go microservices requires more than isolated logs. When a request traverses HTTP APIs, Kafka event streams, and asynchronous worker pools, you need absolute visibility to pinpoint latency bottlenecks and failures. By 2026, OpenTelemetry (OTel) has cemented itself as the vendor-neutral standard for telemetry. This guide explores the architecture of distributed tracing in Go, from SDK context propagation to advanced Collector Gateway configurations. The 2026 Paradigm: OpenTelemetry Pipeline Answer-first: Modern Go observability relies on a decoupled OpenTelemetry pipeline. Go SDKs generate OTLP data, local DaemonSet Agents handle low-latency batching, and centralized Gateways perform tail-based sampling and PII redaction before routing to backends like Tempo or Mimir. ...

Go pprof in Kubernetes: CPU & Memory Profiling

Prerequisite: This guide covers how to profile and diagnose complex performance issues in production. If you are specifically dealing with unbounded goroutine growth, ensure you first understand the foundational concepts in Goroutine Leak Detection and Fix in Production Go Services. Performance degradation in production is inevitable. When a Go microservice suddenly spikes to 90% CPU utilization or triggers an Out-Of-Memory (OOM) kill in Kubernetes, guessing the root cause by staring at the code is rarely effective. You need data. ...

Surge Pricing Algorithm & Spatial Indexing Architecture

Answer-first: Explore the architecture of a real-time Surge Pricing algorithm. Discover how Uber utilizes the H3 spatial index, Kafka, and Flink to calculate dynamic pricing. Why is it that every time it rains, ride-hailing fares double, or even triple? It’s not a human operator manually adjusting the prices behind a desk. Rather, it’s the result of an incredibly sophisticated Stream Processing engine running in the background executing the surge pricing algorithm. ...

Banking Microservices Architecture: Go, Saga & Event Sourcing

1. Introduction: Deconstructing the Legacy Core Answer-first: A modern banking microservices architecture replaces legacy monolithic ledgers (like T24 or Flexcube) using Go for high-throughput transaction routing. The system achieves distributed consistency without two-phase commit (2PC) by combining Event Sourcing (immutable ledger streams), Saga Orchestration (using Temporal or Dapr), the Transactional Outbox pattern, and PostgreSQL unique constraints for API idempotency. For decades, banks relied on monolithic core systems like Temenos T24 or Oracle FLEXCUBE. While robust, these systems present severe bottlenecks for modern digital banking. They were designed for overnight batch processing, not real-time, API-first global transactions. ...

Vitess vs GORM Sharding: MySQL Write Scaling in Go

Answer-first: Vitess vs GORM Sharding for MySQL write scaling: VReplication zero-downtime vs. application-level sharding — ErrMissingShardingKey tradeoffs in Go. When your application reaches millions of users, a single database instance will inevitably become the biggest bottleneck in your entire architecture. To solve this, MySQL database scaling becomes mandatory. You must Scale DB for Microservices using Horizontal Scaling techniques. This article delves into the differences between scaling methods and compares the two most popular Sharding architectures today: Middleware-level Sharding (Vitess) and Application-level Sharding in Go (GORM Sharding plugin). ...

GraphHopper vs CARTO: Order Fulfillment Routing Engine

Answer-first: A comparison between the GraphHopper Distance Matrix API and CARTO Spatial Analytics. A guide to building an order fulfillment routing engine (VRP). In last-mile delivery and logistics, calculating a route is not just about finding the shortest path from point A to point B. When a system needs to coordinate thousands of drivers and orders simultaneously, computational costs can explode exponentially. This article will compare two popular approaches: utilizing GraphHopper for lightning-fast GraphHopper distance matrix calculation, and leveraging the CARTO Spatial Platform (focused on spatial analysis in Cloud Data Warehouses). We will also explore how to integrate this routing data into Real-time Surge Pricing Calculation to optimize operational costs. For routing within geospatial indexing systems (H3 hexagons, Redis GEO), see Part 2 — Geospatial Indexing: H3, S2 & Redis GEO. ...

What's New in Argo CD 3.4 & 3.3: Cluster Pause & Upgrades

Answer-first: Argo CD v3.4 & v3.3 (2026): Cluster Pause, PreDelete Hooks, SemVer breaking change 2014 plus RC: annotation filtering, Teams Workflow, ApplicationSet UI. GitOps is steadily becoming the gold standard for configuration management and application deployment on Kubernetes. Among the tools available, Argo CD continues to maintain its leading position. In the first half of 2026, the Argo project released two landmark versions: Argo CD 3.3 and Argo CD 3.4. These releases address numerous headaches related to application lifecycle management, synchronization performance, and incident response capabilities. ...

Alipay Double 11: 583,000 TPS Architecture Explained

Answer-first: How Alipay’s engineering team scaled Double 11 to 583,000 TPS using LDC unitization, OceanBase, RocketMQ, and SOFAStack. A 2026 deep-dive. At midnight on November 11th, approximately 1.5 billion people across Asia collectively open a single app and start tapping “Buy Now.” In the first 60 seconds, Alipay processes more transactions than a major Western bank handles in an entire day. The 2023 Singles’ Day peak — 583,000 payment transactions per second (TPS) — is not just a headline. It is the product of fourteen years of architectural evolution that has redefined what “production-ready” means for a financial platform. ...

Cloudflare D1 + Durable Objects: Build a Real-Time Cart

Answer-first: Build a real-time shopping cart using Cloudflare D1, Durable Objects, and Workers. Full schema, TypeScript code, and conflict-free concurrent updates. The traditional shopping cart architecture is a familiar set of tradeoffs: Redis for session storage, PostgreSQL for order data, and a backend API tier that coordinates between them. It works, but it introduces latency proportional to the distance between the user and your datacenter, requires operational overhead for Redis cluster management, and struggles with globally concurrent cart edits from the same user across multiple devices. ...

Dapr Workflow Go Tutorial: Orchestrated Saga Pattern

Answer-first: Step-by-step Go code for Orchestrated Saga using Dapr Workflow: durable state, compensating transactions, and banking-grade consistency. Most Go developers building microservices know the Choreography Saga pattern: service A emits an event, service B reacts, service C reacts to B, and so on. If step C fails, services emit “compensation” events in reverse order. The pattern works elegantly for simple flows, but breaks down as the number of steps grows: debugging a failed saga requires tracing events across five message broker topics, and implementing compensation logic requires every service to understand the full saga’s state. ...

Generative UI with MCP: Architecting AI-Native Frontends

Answer-first: Architecting dynamic generative UI applications with Model Context Protocol (MCP): dynamic registries, client-agent state synchronization, security, and a11y. The first generation of AI-powered chat interfaces followed a simple pattern: the user types a message, the LLM generates text, the UI renders text. The second generation added tool calls — the LLM could invoke functions and render the results as text. The third generation — Generative UI — goes further: the LLM generates not just text responses but interactive UI components that are rendered directly in the browser, enabling experiences that feel less like chatting with a text box and more like using a responsive, intelligent application. ...

Go pprof in Kubernetes: Remote Profiling & Flame Graphs

Answer-first: How to safely profile CPU, memory, and goroutines in Go services running in Kubernetes using kubectl port-forward, pprof, and Pyroscope. You’ve instrumented your Go service with net/http/pprof, run go tool pprof locally against the development binary, and spotted the hot path in your flame graph. Then you deploy to Kubernetes and the bottleneck disappears — because the workload profile in Kubernetes differs from local testing (different request mix, connection pool pressure, GC behavior under actual memory pressure, scheduler interference from co-located pods). ...

Goroutine Pool Patterns in Go: errgroup & Backpressure

Answer-first: Production Go concurrency patterns: errgroup worker pools, semaphore-based rate limiting, bounded queues, and graceful backpressure for microservices. Every Go engineer eventually writes the same mistake: a loop that launches goroutines unconditionally. In a demo with 10 items, this works beautifully. In production with 50,000 incoming webhook events, it spawns 50,000 goroutines simultaneously, exhausts memory, and triggers the OOM killer. Kubernetes restarts the pod. The on-call engineer gets paged at 3 AM. ...

GraphRAG vs Naive RAG: Enterprise Architecture Guide

Answer-first: Compare Naive RAG with GraphRAG for enterprise AI pipelines: knowledge graphs, LlamaIndex, chunking, streaming CDC, and security controls for dynamic data. Most RAG (Retrieval-Augmented Generation) implementations look the same: chunk documents, embed them into vectors, store them in a vector database, retrieve by cosine similarity, and inject the top-K chunks into the LLM context. This works for simple document Q&A. It fails systematically for enterprise knowledge bases where the answer to a question depends not on a single document chunk, but on the relationships between dozens of interconnected entities. ...

Order Fulfillment Algorithm: Warehouse to Last-Mile

Answer-first: How e-commerce giants decide which warehouse fulfills your order. Covers Amazon CONDOR, VRP solvers, split shipment logic, and last-mile routing. When you place an order on Amazon at 11:47 PM and it arrives at your door the next morning, every step of that delivery was orchestrated by a set of algorithms making real-time decisions across a network of hundreds of warehouses, thousands of drivers, and millions of items in inventory. None of it happens by chance, and none of it is primarily a human decision. ...

PayPay Architecture: Scaling Payments to 70M Users

Answer-first: An in-depth look at PayPay’s engineering stack: handling 70M users and 7.8B transactions/year using TiDB, Kafka event sourcing, GitOps, and chaos engineering. PayPay launched in October 2018 and grew to 10 million users in just 3 months — a growth rate that no Japanese fintech had ever seen. By 2025, the platform had crossed 70 million registered users and processed 7.8 billion payments per year. Behind this growth is an engineering team that has had to scale not just their infrastructure, but their entire engineering culture: from service standardization and GitOps-driven deployments to chaos engineering and AI-powered fraud detection. ...

Prompt Engineering vs Fine-Tuning: When to Use Each (GPT-5 Era Decision Guide)

Answer-first: A clear decision framework for AI engineers: when to fine-tune (LoRA/QLoRA), when to prompt-engineer, and when RAG is the right answer instead. Three engineers on the same team are trying to build the same thing: a customer support assistant that answers questions in the company’s specific support style, using terminology from their product documentation. One engineer says “just write a better system prompt.” Another says “we need to fine-tune a model.” The third says “this is clearly a RAG problem.” ...

Real-Time Ride-Hailing Architecture: Uber & Grab Stack

Answer-first: How Uber and Grab handle millions of GPS pings/sec: H3 geospatial indexing, Kafka, DISCO matching engine, surge pricing, and RAMEN push notifications. The moment you open the Uber or Grab app, a cascade of real-time systems activates simultaneously: your phone begins transmitting GPS coordinates, a geospatial index updates your location, a matching engine re-evaluates nearby driver availability, a pricing model recalculates the fare based on supply-demand ratios, and a push notification pipeline prepares to deliver your match confirmation in under 3 seconds. ...

Self-Hosting GraphHopper on Kubernetes with OSM Data

Answer-first: Step-by-step guide to deploying GraphHopper on Kubernetes with OpenStreetMap data: Docker image, PVC for OSM PBF files, RAM tuning, and health probes. GraphHopper is arguably the most capable open-source routing engine available — it supports Contraction Hierarchies (CH) for sub-millisecond route queries, custom vehicle profiles, turn restrictions, and the full OpenStreetMap road network. The problem most teams encounter is not the algorithm; it is the operational challenge of running it in Kubernetes: loading a large OSM PBF file, sizing JVM memory correctly, handling the long CH pre-processing startup time, and updating map data without downtime. ...